pmdarima.arima.auto_arima

pmdarima.arima.auto_arima(y, exogenous=None, start_p=2, d=None, start_q=2, max_p=5, max_d=2, max_q=5, start_P=1, D=None, start_Q=1, max_P=2, max_D=1, max_Q=2, max_order=5, m=1, seasonal=True, stationary=False, information_criterion='aic', alpha=0.05, test='kpss', seasonal_test='ocsb', stepwise=True, n_jobs=1, start_params=None, trend=None, method='lbfgs', maxiter=50, offset_test_args=None, seasonal_test_args=None, suppress_warnings=False, error_action='warn', trace=False, random=False, random_state=None, n_fits=10, return_valid_fits=False, out_of_sample_size=0, scoring='mse', scoring_args=None, with_intercept=True, sarimax_kwargs=None, **fit_args)
Automatically discover the optimal order for an ARIMA model.
The auto-ARIMA process seeks to identify the optimal parameters for an ARIMA model, settling on a single fitted ARIMA model. This process is based on the commonly used R function, forecast::auto.arima [3].
Auto-ARIMA works by conducting differencing tests (i.e., Kwiatkowski–Phillips–Schmidt–Shin, Augmented Dickey-Fuller or Phillips–Perron) to determine the order of differencing, d, and then fitting models within the ranges defined by start_p, max_p, start_q and max_q. If the seasonal option is enabled, auto-ARIMA also seeks to identify the optimal P and Q hyper-parameters after conducting the Canova-Hansen test to determine the optimal order of seasonal differencing, D.
In order to find the best model, auto-ARIMA optimizes for a given information_criterion, one of (‘aic’, ‘aicc’, ‘bic’, ‘hqic’, ‘oob’) (Akaike Information Criterion, Corrected Akaike Information Criterion, Bayesian Information Criterion, Hannan-Quinn Information Criterion, or “out of bag” for validation scoring, respectively), and returns the ARIMA which minimizes the value.
Note that due to stationarity issues, auto-ARIMA might not find a suitable model that will converge. If this is the case, a ValueError will be thrown suggesting that stationarity-inducing measures be taken prior to refitting or that a new range of order values be selected. Non-stepwise (i.e., essentially a grid search) selection can be slow, especially for seasonal data. The stepwise algorithm is outlined in Hyndman and Khandakar (2008).
Parameters:
y : array-like or iterable, shape=(n_samples,)
The time series to which to fit the ARIMA estimator. This may either be a Pandas Series object (statsmodels can internally use the dates in the index), or a numpy array. This should be a one-dimensional array of floats, and should not contain any np.nan or np.inf values.
exogenous : array-like, shape=[n_obs, n_vars], optional (default=None)
An optional 2-d array of exogenous variables. If provided, these variables are used as additional features in the regression operation. This should not include a constant or trend. Note that if an ARIMA is fit on exogenous features, it must be provided exogenous features for making predictions.
start_p : int, optional (default=2)
The starting value of p, the order (or number of time lags) of the auto-regressive (“AR”) model. Must be a positive integer.
d : int, optional (default=None)
The order of first-differencing. If None (by default), the value will automatically be selected based on the results of the test (i.e., either the Kwiatkowski–Phillips–Schmidt–Shin, Augmented Dickey-Fuller or the Phillips–Perron test will be conducted to find the most probable value). Must be a positive integer or None. Note that if d is None, the runtime could be significantly longer.
start_q : int, optional (default=2)
The starting value of q, the order of the moving-average (“MA”) model. Must be a positive integer.
max_p : int, optional (default=5)
The maximum value of p, inclusive. Must be a positive integer greater than or equal to start_p.
max_d : int, optional (default=2)
The maximum value of d, or the maximum number of non-seasonal differences. Must be a positive integer greater than or equal to d.
max_q : int, optional (default=5)
The maximum value of q, inclusive. Must be a positive integer greater than start_q.
start_P : int, optional (default=1)
The starting value of P, the order of the auto-regressive portion of the seasonal model.
D : int, optional (default=None)
The order of the seasonal differencing. If None (by default), the value will automatically be selected based on the results of the seasonal_test. Must be a positive integer or None.
start_Q : int, optional (default=1)
The starting value of Q, the order of the moving-average portion of the seasonal model.
max_P : int, optional (default=2)
The maximum value of P, inclusive. Must be a positive integer greater than start_P.
max_D : int, optional (default=1)
The maximum value of D. Must be a positive integer greater than D.
max_Q : int, optional (default=2)
The maximum value of Q, inclusive. Must be a positive integer greater than start_Q.
max_order : int, optional (default=5)
Maximum value of p+q+P+Q if model selection is not stepwise. If the sum of p and q is >= max_order, a model will not be fit with those parameters, and the search will progress to the next combination. Default is 5. If max_order is None, there are no constraints on maximum order.
m : int, optional (default=1)
The period for seasonal differencing; m refers to the number of periods in each season. For example, m is 4 for quarterly data, 12 for monthly data, or 1 for annual (non-seasonal) data. Default is 1. Note that if m == 1 (i.e., is non-seasonal), seasonal will be set to False. For more information on setting this parameter, see Setting m.
seasonal : bool, optional (default=True)
Whether to fit a seasonal ARIMA. Default is True. Note that if seasonal is True and m == 1, seasonal will be set to False.
stationary : bool, optional (default=False)
Whether the time series is stationary and d should be set to zero.
information_criterion : str, optional (default=’aic’)
The information criterion used to select the best ARIMA model. One of pmdarima.arima.auto_arima.VALID_CRITERIA, (‘aic’, ‘aicc’, ‘bic’, ‘hqic’, ‘oob’).
alpha : float, optional (default=0.05)
Level of the test for testing significance.
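As a rough guide to what the information_criterion options measure, the sketch below computes the textbook forms of AIC, AICc, BIC and HQIC from a log-likelihood. The function name and the example numbers are hypothetical, and the way pmdarima and statsmodels count parameters internally is not covered here, so treat this as an illustration of the formulas, not as the library's implementation.

```python
import math


def information_criteria(loglik, k, n):
    """Textbook information criteria (illustrative helper, not part of pmdarima).

    loglik: maximized log-likelihood
    k: number of estimated parameters
    n: number of observations used in the fit
    """
    aic = 2 * k - 2 * loglik
    # Small-sample correction; requires n > k + 1.
    aicc = aic + (2 * k * (k + 1)) / (n - k - 1)
    bic = k * math.log(n) - 2 * loglik
    hqic = 2 * k * math.log(math.log(n)) - 2 * loglik
    return {"aic": aic, "aicc": aicc, "bic": bic, "hqic": hqic}


# Two hypothetical candidate fits on the same 50 observations.
small = information_criteria(loglik=-100.0, k=3, n=50)
large = information_criteria(loglik=-99.5, k=6, n=50)

# Lower is better: here the extra parameters of the larger model do not pay off.
candidates = [("small", small), ("large", large)]
best = min(candidates, key=lambda item: item[1]["aic"])[0]
print(best, small["aic"], large["aic"])
```

Whichever criterion is selected, the candidate with the lowest value is the one retained.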
test : str, optional (default=’kpss’)
Type of unit root test to use in order to detect stationarity if stationary is False and d is None. Default is ‘kpss’ (Kwiatkowski–Phillips–Schmidt–Shin).
seasonal_test : str, optional (default=’ocsb’)
This determines which seasonal unit root test is used if seasonal is True and D is None. Default is ‘OCSB’.
stepwise : bool, optional (default=True)
Whether to use the stepwise algorithm outlined in Hyndman and Khandakar (2008) to identify the optimal model parameters. The stepwise algorithm can be significantly faster than fitting all (or a random subset of) hyper-parameter combinations and is less likely to over-fit the model.
n_jobs : int, optional (default=1)
The number of models to fit in parallel in the case of a grid search (stepwise=False). Default is 1, but -1 can be used to designate “as many as possible”.
start_params : array-like, optional (default=None)
Starting parameters for ARMA(p,q). If None, the default is given by ARMA._fit_start_params.
method : str, optional (default=’lbfgs’)
The method determines which solver from scipy.optimize is used, and it can be chosen from among the following strings:
 ‘newton’ for Newton-Raphson
 ‘nm’ for Nelder-Mead
 ‘bfgs’ for Broyden-Fletcher-Goldfarb-Shanno (BFGS)
 ‘lbfgs’ for limited-memory BFGS with optional box constraints
 ‘powell’ for modified Powell’s method
 ‘cg’ for conjugate gradient
 ‘ncg’ for Newton-conjugate gradient
 ‘basinhopping’ for global basin-hopping solver
The explicit arguments in fit are passed to the solver, with the exception of the basin-hopping solver. Each solver has several optional arguments that are not the same across solvers. These can be passed as **fit_args.
trend : str or None, optional (default=None)
The trend parameter. If with_intercept is True, trend will be used. If with_intercept is False, the trend will be set to a no-intercept value.
maxiter : int, optional (default=50)
The maximum number of function evaluations. Default is 50.
offset_test_args : dict, optional (default=None)
The args to pass to the constructor of the offset (d) test. See pmdarima.arima.stationarity for more details.
seasonal_test_args : dict, optional (default=None)
The args to pass to the constructor of the seasonal offset (D) test. See pmdarima.arima.seasonality for more details.
suppress_warnings : bool, optional (default=False)
Many warnings might be thrown inside of statsmodels. If suppress_warnings is True, all of the warnings coming from ARIMA will be squelched.
error_action : str, optional (default=’warn’)
If unable to fit an ARIMA due to stationarity issues, whether to warn (‘warn’), raise the ValueError (‘raise’) or ignore (‘ignore’). Note that the default behavior is to warn, and fits that fail will be returned as None. This is the recommended behavior, as statsmodels ARIMA and SARIMAX models hit bugs periodically that can cause an otherwise healthy parameter combination to fail for reasons not related to pmdarima.
trace : bool or int, optional (default=False)
Whether to print status on the fits. A value of False will print no debugging information. A value of True will print some. Integer values exceeding 1 will print increasing amounts of debug information at each fit.
random : bool, optional (default=False)
Similar to grid searches, auto_arima provides the capability to perform a “random search” over a hyper-parameter space. If random is True, rather than perform an exhaustive search or stepwise search, only n_fits ARIMA models will be fit (stepwise must be False for this option to do anything).
random_state : int, long or numpy RandomState, optional (default=None)
The PRNG for when random=True. Ensures replicable testing and results.
n_fits : int, optional (default=10)
If random is True and a “random search” is going to be performed, n_fits is the number of ARIMA models to be fit.
return_valid_fits : bool, optional (default=False)
If True, will return all valid ARIMA fits in a list. If False (by default), will only return the best fit.
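To illustrate the random search controlled by random, random_state and n_fits, here is a minimal sketch that draws n_fits candidate (p, q) orders from a grid using a seeded generator. The helper name and the bounds are hypothetical, and pmdarima's own sampling logic may differ; the sketch only shows why fixing random_state makes the set of candidates reproducible.

```python
import itertools
import random


def sample_orders(start_p, max_p, start_q, max_q, n_fits, random_state=None):
    """Draw up to n_fits distinct (p, q) pairs from the full grid (hypothetical helper)."""
    grid = list(itertools.product(range(start_p, max_p + 1),
                                  range(start_q, max_q + 1)))
    rng = random.Random(random_state)
    # Never request more candidates than the grid contains.
    return rng.sample(grid, min(n_fits, len(grid)))


first = sample_orders(2, 5, 2, 5, n_fits=10, random_state=42)
second = sample_orders(2, 5, 2, 5, n_fits=10, random_state=42)

# Same seed, same candidates; only 10 of the 16 grid points would be fit.
print(len(first), first == second)
```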
out_of_sample_size : int, optional (default=0)
The ARIMA class can fit only a portion of the data if specified, in order to retain an “out of bag” sample score. This is the number of examples from the tail of the time series to hold out and use as validation examples. The model will not be fit on these samples, but the observations will be added into the model’s endog and exog arrays so that future forecast values originate from the end of the endogenous vector.
For instance:
    y = [0, 1, 2, 3, 4, 5, 6]
    out_of_sample_size = 2
    > Fit on: [0, 1, 2, 3, 4]
    > Score on: [5, 6]
    > Append [5, 6] to end of self.arima_res_.data.endog values
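The hold-out behaviour in the example above can be sketched in plain Python. The helper names below are hypothetical, and the forecast is a deliberately naive stand-in (repeat the last training value), so this only illustrates the split and the ‘mse’ and ‘mae’ metrics, not how the ARIMA class produces its forecasts.

```python
def split_tail(y, out_of_sample_size):
    """Split y into (fit portion, held-out tail). Hypothetical helper."""
    if out_of_sample_size <= 0:
        return list(y), []
    return list(y[:-out_of_sample_size]), list(y[-out_of_sample_size:])


def mse(actual, predicted):
    """Mean squared error over paired observations."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mae(actual, predicted):
    """Mean absolute error over paired observations."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


y = [0, 1, 2, 3, 4, 5, 6]
train, holdout = split_tail(y, out_of_sample_size=2)

# Naive stand-in forecast: repeat the last value seen during fitting.
forecast = [train[-1]] * len(holdout)

print(train)    # [0, 1, 2, 3, 4]
print(holdout)  # [5, 6]
print(mse(holdout, forecast), mae(holdout, forecast))  # 2.5 1.5
```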
scoring : str, optional (default=’mse’)
If performing validation (i.e., if out_of_sample_size > 0), the metric to use for scoring the out-of-sample data. One of (‘mse’, ‘mae’).
scoring_args : dict, optional (default=None)
A dictionary of keyword arguments to be passed to the scoring metric.
with_intercept : bool, optional (default=True)
Whether to include an intercept term. Default is True.
sarimax_kwargs : dict or None, optional (default=None)
Keyword arguments to pass to the ARIMA constructor.
**fit_args : dict, optional (default=None)
A dictionary of keyword arguments to pass to the ARIMA.fit() method.
See also
Notes
 Fitting with stepwise=False can prove slower, especially when seasonal=True.
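To get a feel for why a non-stepwise search is slower for seasonal models, the sketch below counts candidate (p, q, P, Q) combinations in a grid. The bounds are chosen for illustration and are not the function's defaults, and the filtering rule (skip combinations whose p + q + P + Q exceeds max_order) is an assumption based on the max_order description; the library's exact rule may differ.

```python
import itertools


def count_candidates(p_range, q_range, P_range, Q_range, max_order=None):
    """Count grid-search candidates under an assumed max_order rule (hypothetical helper)."""
    combos = itertools.product(p_range, q_range, P_range, Q_range)
    if max_order is None:
        return sum(1 for _ in combos)
    return sum(1 for combo in combos if sum(combo) <= max_order)


# Illustrative bounds: p, q in 0..5; P, Q in 0..2 (seasonal) or fixed at 0 (non-seasonal).
non_seasonal = count_candidates(range(6), range(6), [0], [0], max_order=5)
seasonal = count_candidates(range(6), range(6), range(3), range(3), max_order=5)
unconstrained = count_candidates(range(6), range(6), range(3), range(3))

print(non_seasonal, seasonal, unconstrained)  # 21 96 324
```

Under these assumptions the seasonal grid holds several times as many candidates as the non-seasonal one, and every candidate is a full model fit.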
References
[R64] https://wikipedia.org/wiki/Autoregressive_integrated_moving_average
[R65] R’s auto-arima source code: https://github.com/robjhyndman/forecast/blob/master/R/arima.R
[R66] R’s auto-arima documentation: https://www.rdocumentation.org/packages/forecast