pmdarima.arima.ARIMA

class pmdarima.arima.ARIMA(order, seasonal_order=(0, 0, 0, 0), start_params=None, method='lbfgs', maxiter=50, suppress_warnings=False, out_of_sample_size=0, scoring='mse', scoring_args=None, trend=None, with_intercept=True, **sarimax_kwargs)

An ARIMA estimator.

An ARIMA (autoregressive integrated moving average) model is a generalization of an autoregressive moving average (ARMA) model and is fitted to time-series data in order to forecast future points. ARIMA models can be especially effective when the data show evidence of non-stationarity.

The “AR” part of ARIMA indicates that the evolving variable of interest is regressed on its own lagged (i.e., prior observed) values. The “MA” part indicates that the regression error is actually a linear combination of error terms whose values occurred contemporaneously and at various times in the past. The “I” (for “integrated”) indicates that the data values have been replaced with the difference between their values and the previous values (and this differencing process may have been performed more than once). The purpose of each of these features is to make the model fit the data as well as possible.
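
The differencing step behind the “I” can be sketched in a few lines of plain Python. The helper name difference is hypothetical and is not part of pmdarima; it only illustrates how differencing once or twice removes a trend from a short series.

```python
def difference(values, d=1):
    """Apply first-order differencing d times (the "I" step)."""
    result = list(values)
    for _ in range(d):
        # Each pass replaces x[t] with x[t] - x[t-1] and drops the first element.
        result = [curr - prev for prev, curr in zip(result, result[1:])]
    return result


series = [1, 4, 9, 16, 25]        # quadratic trend, clearly non-stationary
once = difference(series, d=1)    # [3, 5, 7, 9]: a linear trend remains
twice = difference(series, d=2)   # [2, 2, 2]: constant after two passes
```

Each pass shortens the series by one observation, which is why a model with differencing order d effectively loses the first d data points.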

Non-seasonal ARIMA models are generally denoted ARIMA(p,d,q) where parameters p, d, and q are non-negative integers, p is the order (number of time lags) of the autoregressive model, d is the degree of differencing (the number of times the data have had past values subtracted), and q is the order of the moving-average model. Seasonal ARIMA models are usually denoted ARIMA(p,d,q)(P,D,Q)m, where m refers to the number of periods in each season, and the uppercase P, D, Q refer to the autoregressive, differencing, and moving average terms for the seasonal part of the ARIMA model.

When two out of the three terms are zeros, the model may be referred to based on the non-zero parameter, dropping “AR”, “I” or “MA” from the acronym describing the model. For example, ARIMA(1,0,0) is AR(1), ARIMA(0,1,0) is I(1), and ARIMA(0,0,1) is MA(1). [1]

See notes for more practical information on the ARIMA class.

Parameters:

order : iterable or array-like, shape=(3,)

The (p,d,q) order of the model for the number of AR parameters, differences, and MA parameters to use. p is the order (number of time lags) of the auto-regressive model, and is a non-negative integer. d is the degree of differencing (the number of times the data have had past values subtracted), and is a non-negative integer. q is the order of the moving-average model, and is a non-negative integer.

seasonal_order : array-like, shape=(4,), optional (default=(0, 0, 0, 0))

The (P,D,Q,s) order of the seasonal component of the model for the AR parameters, differences, MA parameters, and periodicity. D must be an integer indicating the integration order of the process, while P and Q may either be integers indicating the AR and MA orders (so that all lags up to those orders are included) or else iterables giving specific AR and/or MA lags to include. s is an integer giving the periodicity (number of periods in a season); it is often 4 for quarterly data or 12 for monthly data. Default is no seasonal effect.

start_params : array-like, optional (default=None)

Starting parameters for ARMA(p,q). If None, the default is given by ARMA._fit_start_params.

method : str, optional (default=’lbfgs’)

The method determines which solver from scipy.optimize is used, and it can be chosen from among the following strings:

  • ‘newton’ for Newton-Raphson

  • ‘nm’ for Nelder-Mead

  • ‘bfgs’ for Broyden-Fletcher-Goldfarb-Shanno (BFGS)

  • ‘lbfgs’ for limited-memory BFGS with optional box constraints

  • ‘powell’ for modified Powell’s method

  • ‘cg’ for conjugate gradient

  • ‘ncg’ for Newton-conjugate gradient

  • ‘basinhopping’ for global basin-hopping solver

The explicit arguments in fit are passed to the solver, with the exception of the basin-hopping solver. Each solver has several optional arguments that are not the same across solvers. These can be passed as **fit_kwargs.

maxiter : int, optional (default=50)

The maximum number of function evaluations. Default is 50.

suppress_warnings : bool, optional (default=False)

Many warnings might be thrown inside of statsmodels. If suppress_warnings is True, all of these warnings will be squelched.

out_of_sample_size : int, optional (default=0)

The number of examples from the tail of the time series to hold out and use as validation examples. The model will not be fit on these samples, but the observations will be added into the model’s endog and exog arrays so that future forecast values originate from the end of the endogenous vector. See update().

For instance:

y = [0, 1, 2, 3, 4, 5, 6]
out_of_sample_size = 2

> Fit on: [0, 1, 2, 3, 4]
> Score on: [5, 6]
> Append [5, 6] to end of self.arima_res_.data.endog values

scoring : str or callable, optional (default=’mse’)

If performing validation (i.e., if out_of_sample_size > 0), the metric to use for scoring the out-of-sample data:

  • If a string, must be a valid metric name importable from sklearn.metrics.

  • If a callable, must adhere to the function signature:

    def foo_loss(y_true, y_pred)
    

Note that models are selected by minimizing loss. If using a maximizing metric (such as sklearn.metrics.r2_score), it is the user’s responsibility to wrap the function such that it returns a negative value for minimizing.
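
One way to wrap a maximizing metric is sketched below. To keep the example dependency-free, r2 is a plain-Python stand-in for sklearn.metrics.r2_score, and both function names are hypothetical.

```python
def r2(y_true, y_pred):
    """Plain-Python coefficient of determination (higher is better)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def neg_r2_loss(y_true, y_pred):
    """Loss-style wrapper with the (y_true, y_pred) signature: lower is better."""
    return -r2(y_true, y_pred)


# A perfect forecast gets the lowest (best) loss.
perfect = neg_r2_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
# Predicting the mean everywhere scores worse than a perfect forecast.
mean_only = neg_r2_loss([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
```

A callable like neg_r2_loss could then be supplied as the scoring argument in place of a string.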

scoring_args : dict, optional (default=None)

A dictionary of keyword arguments to be passed to the scoring metric.

trend : str or None, optional (default=None)

The trend parameter. If with_intercept is True, trend will be used. If with_intercept is False, the trend will be set to a no-intercept value. If None and with_intercept, ‘c’ will be used as a default.

with_intercept : bool, optional (default=True)

Whether to include an intercept term. Default is True.

**sarimax_kwargs : keyword args, optional

Optional arguments to pass to the SARIMAX constructor. Examples of potentially valuable kwargs:

  • time_varying_regression : boolean Whether or not coefficients on the exogenous regressors are allowed to vary over time.

  • enforce_stationarity : boolean Whether or not to transform the AR parameters to enforce stationarity in the auto-regressive component of the model.

  • enforce_invertibility : boolean Whether or not to transform the MA parameters to enforce invertibility in the moving average component of the model.

  • simple_differencing : boolean Whether or not to use partially conditional maximum likelihood estimation for seasonal ARIMA models. If True, differencing is performed prior to estimation, which discards the first \(s D + d\) initial rows but results in a smaller state-space formulation. If False, the full SARIMAX model is put in state-space form so that all datapoints can be used in estimation. Default is False.

  • measurement_error : boolean Whether or not to assume the endogenous observations endog were measured with error. Default is False.

  • mle_regression : boolean Whether or not to estimate the regression coefficients for the exogenous variables as part of maximum likelihood estimation or through the Kalman filter (i.e. recursive least squares). If time_varying_regression is True, this must be set to False. Default is True.

  • hamilton_representation : boolean Whether or not to use the Hamilton representation of an ARMA process (if True) or the Harvey representation (if False). Default is False.

  • concentrate_scale : boolean Whether or not to concentrate the scale (variance of the error term) out of the likelihood. This reduces the number of parameters estimated by maximum likelihood by one, but standard errors will then not be available for the scale parameter.

Attributes

arima_res_

(ModelResultsWrapper) The model results, per statsmodels

endog_index_

(pd.Series or None) If the fitted endog array is a pd.Series, this value will be non-None and is used to validate args for in-sample predictions with non-integer start/end indices

oob_

(float) The MAE or MSE of the out-of-sample records, if out_of_sample_size is > 0, else np.nan

oob_preds_

(np.ndarray or None) The predictions for the out-of-sample records, if out_of_sample_size is > 0, else None

Notes

  • The model internally wraps the statsmodels SARIMAX class.

  • After the model is fit, many more methods become available on the fitted model (e.g., pvalues(), params(), etc.). These are delegate methods which wrap the internal ARIMA results instance.

References

Methods

aic()

Get the AIC, the Akaike Information Criterion.

aicc()

Get the AICc, the corrected Akaike Information Criterion.

arparams()

Get the parameters associated with the AR coefficients in the model.

arroots()

The roots of the AR coefficients are the solution to:

bic()

Get the BIC, the Bayes Information Criterion.

bse()

Get the standard errors of the parameters.

conf_int([alpha])

Returns the confidence interval of the fitted parameters.

df_model()

The model degrees of freedom: k_exog + k_trend + k_ar + k_ma.

df_resid()

Get the residual degrees of freedom.

fit(y[, X])

Fit an ARIMA to a vector, y, of observations with an optional matrix of X variables.

fit_predict(y[, X, n_periods])

Fit an ARIMA to a vector, y, of observations with an optional matrix of exogenous variables, and then generate predictions.

fittedvalues()

Get the fitted values from the model.

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

hqic()

Get the Hannan-Quinn Information Criterion.

maparams()

Get the value of the moving average coefficients.

maroots()

The roots of the MA coefficients are the solution to:

oob()

If the model was built with out_of_sample_size > 0, a validation score will have been computed.

params()

Get the parameters of the model.

plot_diagnostics([variable, lags, fig, figsize])

Plot an ARIMA's diagnostics.

predict([n_periods, X, return_conf_int, alpha])

Forecast future values.

predict_in_sample([X, start, end, dynamic, ...])

Generate in-sample predictions from the fit ARIMA model.

pvalues()

Get the p-values associated with the t-values of the coefficients.

resid()

Get the model residuals.

set_params(**params)

Set the parameters of this estimator.

set_predict_request(*[, alpha, n_periods, ...])

Request metadata passed to the predict method.

summary()

Get a summary of the ARIMA model.

to_dict()

Get the ARIMA model as a dictionary.

update(y[, X, maxiter])

Update the model fit with additional observed endog/exog values.

__init__(order, seasonal_order=(0, 0, 0, 0), start_params=None, method='lbfgs', maxiter=50, suppress_warnings=False, out_of_sample_size=0, scoring='mse', scoring_args=None, trend=None, with_intercept=True, **sarimax_kwargs)