pmdarima.arima.ARIMA

class pmdarima.arima.ARIMA(order, seasonal_order=(0, 0, 0, 0), start_params=None, method='lbfgs', maxiter=50, suppress_warnings=False, out_of_sample_size=0, scoring='mse', scoring_args=None, trend=None, with_intercept=True, **sarimax_kwargs)
An ARIMA estimator.

An ARIMA, or autoregressive integrated moving average, is a generalization of an autoregressive moving average (ARMA) and is fitted to time-series data in an effort to forecast future points. ARIMA models can be especially efficacious in cases where data shows evidence of non-stationarity.

The “AR” part of ARIMA indicates that the evolving variable of interest is regressed on its own lagged (i.e., prior observed) values. The “MA” part indicates that the regression error is actually a linear combination of error terms whose values occurred contemporaneously and at various times in the past. The “I” (for “integrated”) indicates that the data values have been replaced with the difference between their values and the previous values (and this differencing process may have been performed more than once). The purpose of each of these features is to make the model fit the data as well as possible.

Non-seasonal ARIMA models are generally denoted ARIMA(p,d,q), where parameters p, d, and q are non-negative integers: p is the order (number of time lags) of the autoregressive model, d is the degree of differencing (the number of times the data have had past values subtracted), and q is the order of the moving-average model. Seasonal ARIMA models are usually denoted ARIMA(p,d,q)(P,D,Q)m, where m refers to the number of periods in each season, and the uppercase P, D, Q refer to the autoregressive, differencing, and moving average terms for the seasonal part of the ARIMA model.

When two out of the three terms are zeros, the model may be referred to based on the non-zero parameter, dropping “AR”, “I” or “MA” from the acronym describing the model. For example, ARIMA(1,0,0) is AR(1), ARIMA(0,1,0) is I(1), and ARIMA(0,0,1) is MA(1).
[1]

See notes for more practical information on the ARIMA class.

Parameters:

order : iterable or array-like, shape=(3,)
  The (p,d,q) order of the model for the number of AR parameters, differences, and MA parameters to use. p is the order (number of time lags) of the auto-regressive model, d is the degree of differencing (the number of times the data have had past values subtracted), and q is the order of the moving-average model. Each is a non-negative integer.

seasonal_order : array-like, shape=(4,), optional (default=(0, 0, 0, 0))
  The (P,D,Q,s) order of the seasonal component of the model for the AR parameters, differences, MA parameters, and periodicity. D must be an integer indicating the integration order of the process, while P and Q may either be integers indicating the AR and MA orders (so that all lags up to those orders are included) or else iterables giving specific AR and/or MA lags to include. s is an integer giving the periodicity (number of periods in a season); it is often 4 for quarterly data or 12 for monthly data. Default is no seasonal effect.

start_params : array-like, optional (default=None)
  Starting parameters for ARMA(p,q). If None, the default is given by ARMA._fit_start_params.

method : str, optional (default='lbfgs')
  The method determines which solver from scipy.optimize is used, and it can be chosen from among the following strings:

- ‘newton’ for Newton-Raphson
- ‘nm’ for Nelder-Mead
- ‘bfgs’ for Broyden-Fletcher-Goldfarb-Shanno (BFGS)
- ‘lbfgs’ for limited-memory BFGS with optional box constraints
- ‘powell’ for modified Powell’s method
- ‘cg’ for conjugate gradient
- ‘ncg’ for Newton-conjugate gradient
- ‘basinhopping’ for global basin-hopping solver

  The explicit arguments in fit are passed to the solver, with the exception of the basin-hopping solver. Each solver has several optional arguments that are not the same across solvers. These can be passed as **fit_kwargs.

maxiter : int, optional (default=50)
  The maximum number of function evaluations. Default is 50.

suppress_warnings : bool, optional (default=False)
  Many warnings might be thrown inside of statsmodels. If suppress_warnings is True, all of these warnings will be squelched.

out_of_sample_size : int, optional (default=0)
  The number of examples from the tail of the time series to hold out and use as validation examples. The model will not be fit on these samples, but the observations will be added into the model’s endog and exog arrays so that future forecast values originate from the end of the endogenous vector. See update().

  For instance:

  y = [0, 1, 2, 3, 4, 5, 6]
  out_of_sample_size = 2

  > Fit on: [0, 1, 2, 3, 4]
  > Score on: [5, 6]
  > Append [5, 6] to end of self.arima_res_.data.endog values

scoring : str or callable, optional (default='mse')
  If performing validation (i.e., if out_of_sample_size > 0), the metric to use for scoring the out-of-sample data:

- If a string, must be a valid metric name importable from sklearn.metrics.
- If a callable, must adhere to the function signature: def foo_loss(y_true, y_pred)

  Note that models are selected by minimizing loss. If using a maximizing metric (such as sklearn.metrics.r2_score), it is the user’s responsibility to wrap the function such that it returns a negative value for minimizing.

scoring_args : dict, optional (default=None)
  A dictionary of keyword arguments to be passed to the scoring metric.

trend : str or None, optional (default=None)
  The trend parameter. If with_intercept is True, trend will be used. If with_intercept is False, the trend will be set to a no-intercept value. If None and with_intercept, ‘c’ will be used as a default.

with_intercept : bool, optional (default=True)
  Whether to include an intercept term. Default is True.

**sarimax_kwargs : keyword args, optional
  Optional arguments to pass to the SARIMAX constructor. Examples of potentially valuable kwargs:

- time_varying_regression : boolean Whether or not coefficients on the exogenous regressors are allowed to vary over time.
- enforce_stationarity : boolean Whether or not to transform the AR parameters to enforce stationarity in the auto-regressive component of the model.
- enforce_invertibility : boolean Whether or not to transform the MA parameters to enforce invertibility in the moving average component of the model.
- simple_differencing : boolean Whether or not to use partially conditional maximum likelihood estimation for seasonal ARIMA models. If True, differencing is performed prior to estimation, which discards the first \(s D + d\) initial rows but results in a smaller state-space formulation. If False, the full SARIMAX model is put in state-space form so that all datapoints can be used in estimation. Default is False.
- measurement_error : boolean Whether or not to assume the endogenous observations endog were measured with error. Default is False.
- mle_regression : boolean Whether or not to estimate the regression coefficients for the exogenous variables as part of maximum likelihood estimation or through the Kalman filter (i.e. recursive least squares). If time_varying_regression is True, this must be set to False. Default is True.
- hamilton_representation : boolean Whether or not to use the Hamilton representation of an ARMA process (if True) or the Harvey representation (if False). Default is False.
- concentrate_scale : boolean Whether or not to concentrate the scale (variance of the error term) out of the likelihood. This reduces the number of parameters estimated by maximum likelihood by one, but standard errors will then not be available for the scale parameter.
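
The hold-out behaviour of out_of_sample_size and the "minimize a loss" convention of scoring can be sketched in plain Python. This is only an illustration, not the library's internal code: split_holdout and neg_r2_loss are hypothetical helper names, and the local r2_score is a stand-in for sklearn.metrics.r2_score.

```python
def split_holdout(y, out_of_sample_size):
    """Split y into (fit_part, score_part), holding out the tail of the series."""
    if out_of_sample_size <= 0:
        return list(y), []
    cut = len(y) - out_of_sample_size
    return list(y[:cut]), list(y[cut:])


def r2_score(y_true, y_pred):
    """Coefficient of determination; a maximizing metric (higher is better)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def neg_r2_loss(y_true, y_pred):
    """Wrap the maximizing metric so that lower values are better."""
    return -r2_score(y_true, y_pred)


y = [0, 1, 2, 3, 4, 5, 6]
fit_part, score_part = split_holdout(y, out_of_sample_size=2)
print(fit_part)    # [0, 1, 2, 3, 4]
print(score_part)  # [5, 6]

# A perfect forecast of the held-out tail gives the lowest possible loss.
print(neg_r2_loss(score_part, [5.0, 6.0]))  # -1.0
```

A callable shaped like neg_r2_loss matches the def foo_loss(y_true, y_pred) signature described for scoring.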

Attributes

arima_res_ : ModelResultsWrapper
  The model results, per statsmodels.

oob_ : float
  The MAE or MSE of the out-of-sample records, if out_of_sample_size is > 0, else np.nan.

oob_preds_ : np.ndarray or None
  The predictions for the out-of-sample records, if out_of_sample_size is > 0, else None.

Notes

- The model internally wraps the statsmodels SARIMAX class.
- After the model fit, many more methods will become available to the fitted model (e.g., pvalues(), params()). These are delegate methods which wrap the internal ARIMA results instance.

References

[R45] https://wikipedia.org/wiki/Autoregressive_integrated_moving_average

Methods

- aic() - Get the AIC, the Akaike Information Criterion.
- aicc() - Get the AICc, the corrected Akaike Information Criterion.
- arparams() - Get the parameters associated with the AR coefficients in the model.
- arroots() - Get the roots of the AR coefficients.
- bic() - Get the BIC, the Bayes Information Criterion.
- bse() - Get the standard errors of the parameters.
- conf_int([alpha]) - Returns the confidence interval of the fitted parameters.
- df_model() - The model degrees of freedom: k_exog + k_trend + k_ar + k_ma.
- df_resid() - Get the residual degrees of freedom.
- fit(y[, X]) - Fit an ARIMA to a vector, y, of observations with an optional matrix of X variables.
- fit_predict(y[, X, n_periods]) - Fit an ARIMA to a vector, y, of observations with an optional matrix of exogenous variables, and then generate predictions.
- get_params([deep]) - Get parameters for this estimator.
- hqic() - Get the Hannan-Quinn Information Criterion.
- maparams() - Get the value of the moving average coefficients.
- maroots() - Get the roots of the MA coefficients.
- oob() - If the model was built with out_of_sample_size > 0, a validation score will have been computed.
- params() - Get the parameters of the model.
- plot_diagnostics([variable, lags, fig, figsize]) - Plot an ARIMA’s diagnostics.
- predict([n_periods, X, return_conf_int, alpha]) - Forecast future values.
- predict_in_sample([X, start, end, dynamic, …]) - Generate in-sample predictions from the fit ARIMA model.
- pvalues() - Get the p-values associated with the t-values of the coefficients.
- resid() - Get the model residuals.
- set_params(**params) - Set the parameters of this estimator.
- summary() - Get a summary of the ARIMA model.
- to_dict() - Get the ARIMA model as a dictionary.
- update(y[, X, maxiter]) - Update the model fit with additional observed endog/exog values.

__init__(order, seasonal_order=(0, 0, 0, 0), start_params=None, method='lbfgs', maxiter=50, suppress_warnings=False, out_of_sample_size=0, scoring='mse', scoring_args=None, trend=None, with_intercept=True, **sarimax_kwargs)
Initialize self. See help(type(self)) for accurate signature.

aic()
Get the AIC, the Akaike Information Criterion:

  -2 * llf + 2 * df_model

Where df_model (the number of degrees of freedom in the model) includes all AR parameters, MA parameters, parameters on constant terms, and the variance.

Returns:

aic : float
  The AIC

References

[R46] https://en.wikipedia.org/wiki/Akaike_information_criterion

aicc()
Get the AICc, the corrected Akaike Information Criterion:

  AIC + 2 * df_model * (df_model + 1) / (nobs - df_model - 1)

Where df_model (the number of degrees of freedom in the model) includes all AR parameters, MA parameters, parameters on constant terms, and the variance, and nobs is the sample size.

Returns:

aicc : float
  The AICc

References

[R47] https://en.wikipedia.org/wiki/Akaike_information_criterion#AICc
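
The two formulas for AIC and AICc can be evaluated by hand from a log-likelihood. The sketch below simply transcribes them; the function names and the numbers (llf = -120, df_model = 3) are made up for illustration and are not taken from a fitted model.

```python
def aic(llf, df_model):
    """-2 * llf + 2 * df_model"""
    return -2.0 * llf + 2.0 * df_model


def aicc(llf, df_model, nobs):
    """AIC plus the small-sample correction term."""
    correction = 2.0 * df_model * (df_model + 1) / (nobs - df_model - 1)
    return aic(llf, df_model) + correction


llf, df_model = -120.0, 3
print(aic(llf, df_model))              # 246.0
print(aicc(llf, df_model, nobs=30))    # a bit above the AIC for a short series
print(aicc(llf, df_model, nobs=3000))  # nearly equal to the AIC for a long one
```

The correction shrinks toward zero as nobs grows, so AICc matters mostly for short series.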

arparams()
Get the parameters associated with the AR coefficients in the model.

Returns:

arparams : array-like
  The AR coefficients.

arroots()
The roots of the AR coefficients are the solution to:

  (1 - arparams[0] * z - arparams[1] * z^2 - ... - arparams[p-1] * z^p) = 0

where p = k_ar is the AR order. Stability requires that the roots in modulus lie outside the unit circle.

Returns:

arroots : array-like
  The roots of the AR coefficients.
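
For an AR(2) model the polynomial is a quadratic, so the roots and the stability check can be worked out directly. This is a minimal sketch with invented coefficients; ar2_roots and is_stable are hypothetical helpers, not part of the class, and the fitted model obtains its roots through statsmodels.

```python
import cmath


def ar2_roots(a1, a2):
    """Roots of 1 - a1*z - a2*z**2 = 0 (requires a2 != 0)."""
    # Rearranged as a2*z**2 + a1*z - 1 = 0 and solved with the quadratic formula.
    disc = cmath.sqrt(a1 * a1 + 4.0 * a2)
    return ((-a1 + disc) / (2.0 * a2), (-a1 - disc) / (2.0 * a2))


def is_stable(roots):
    """True when every root lies strictly outside the unit circle."""
    return all(abs(z) > 1.0 for z in roots)


roots = ar2_roots(0.5, 0.3)
print([round(abs(z), 4) for z in roots])
print(is_stable(roots))                # True
print(is_stable(ar2_roots(0.9, 0.3)))  # False: one root falls inside the unit circle
```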

bic()
Get the BIC, the Bayes Information Criterion:

  -2 * llf + log(nobs) * df_model

If the model is fit using conditional sum of squares, the number of observations nobs does not include the p pre-sample observations.

Returns:

bic : float
  The BIC

References

[R48] https://en.wikipedia.org/wiki/Bayesian_information_criterion

bse()
Get the standard errors of the parameters. These are computed using the numerical Hessian.

Returns:

bse : array-like
  The BSE

conf_int(alpha=0.05, **kwargs)
Returns the confidence interval of the fitted parameters.

Parameters:

alpha : float, optional (default=0.05)
  The significance level for the confidence interval, i.e., the default alpha = .05 returns a 95% confidence interval.

**kwargs : keyword args or dict
  Keyword arguments to pass to the confidence interval function. Could include ‘cols’ or ‘method’.
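
To make the role of alpha concrete, here is a sketch of a normal-approximation interval built from a coefficient estimate and its standard error. It assumes a normal-based interval; the actual computation is delegated to statsmodels and may differ depending on the ‘method’ keyword. normal_conf_int and the numbers are hypothetical.

```python
from statistics import NormalDist


def normal_conf_int(estimate, std_err, alpha=0.05):
    """Two-sided 100 * (1 - alpha) % interval under a normal approximation."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return estimate - z * std_err, estimate + z * std_err


low, high = normal_conf_int(0.6, 0.1)                  # alpha=0.05 -> 95% interval
low90, high90 = normal_conf_int(0.6, 0.1, alpha=0.10)  # alpha=0.10 -> 90% interval
print(round(low, 3), round(high, 3))
print(round(low90, 3), round(high90, 3))
```

A larger alpha gives a narrower interval, since less coverage is requested.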

df_model()
The model degrees of freedom: k_exog + k_trend + k_ar + k_ma.

Returns:

df_model : array-like
  The degrees of freedom in the model.

df_resid()
Get the residual degrees of freedom:

  nobs - df_model

Returns:

df_resid : array-like
  The residual degrees of freedom.
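
The arithmetic behind df_model() and df_resid() is simple enough to check by hand. The sketch below uses hypothetical function names and an invented ARMA(2,1) model with an intercept, no exogenous regressors, and 100 observations.

```python
def model_df(k_exog, k_trend, k_ar, k_ma):
    """Model degrees of freedom: k_exog + k_trend + k_ar + k_ma."""
    return k_exog + k_trend + k_ar + k_ma


def resid_df(nobs, k_exog, k_trend, k_ar, k_ma):
    """Residual degrees of freedom: nobs - df_model."""
    return nobs - model_df(k_exog, k_trend, k_ar, k_ma)


# ARMA(2,1) with a constant: 0 exogenous + 1 trend + 2 AR + 1 MA terms.
print(model_df(k_exog=0, k_trend=1, k_ar=2, k_ma=1))            # 4
print(resid_df(nobs=100, k_exog=0, k_trend=1, k_ar=2, k_ma=1))  # 96
```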

fit(y, X=None, **fit_args)
Fit an ARIMA to a vector, y, of observations with an optional matrix of X variables.

Parameters:

y : array-like or iterable, shape=(n_samples,)
  The time-series to which to fit the ARIMA estimator. This may either be a Pandas Series object (statsmodels can internally use the dates in the index), or a numpy array. This should be a one-dimensional array of floats, and should not contain any np.nan or np.inf values.

X : array-like, shape=[n_obs, n_vars], optional (default=None)
  An optional 2-d array of exogenous variables. If provided, these variables are used as additional features in the regression operation. This should not include a constant or trend. Note that if an ARIMA is fit on exogenous features, it must be provided exogenous features for making predictions.

**fit_args : dict or kwargs
  Any keyword arguments to pass to the statsmodels ARIMA fit.

fit_predict(y, X=None, n_periods=10, **fit_args)
Fit an ARIMA to a vector, y, of observations with an optional matrix of exogenous variables, and then generate predictions.

Parameters:

y : array-like or iterable, shape=(n_samples,)
  The time-series to which to fit the ARIMA estimator. This may either be a Pandas Series object (statsmodels can internally use the dates in the index), or a numpy array. This should be a one-dimensional array of floats, and should not contain any np.nan or np.inf values.

X : array-like, shape=[n_obs, n_vars], optional (default=None)
  An optional 2-d array of exogenous variables. If provided, these variables are used as additional features in the regression operation. This should not include a constant or trend. Note that if an ARIMA is fit on exogenous features, it must be provided exogenous features for making predictions.

n_periods : int, optional (default=10)
  The number of periods in the future to forecast.

**fit_args : dict or kwargs, optional (default=None)
  Any keyword args to pass to the fit method.

get_params(deep=True)
Get parameters for this estimator.

Parameters:

deep : bool, default=True
  If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params : mapping of string to any
  Parameter names mapped to their values.

hqic()
Get the Hannan-Quinn Information Criterion:

  -2 * llf + 2 * df_model * log(log(nobs))

As with bic(), if the model is fit using conditional sum of squares, the k_ar pre-sample observations are not counted in nobs.

Returns:

hqic : float
  The HQIC

References

[R49] https://en.wikipedia.org/wiki/Hannan-Quinn_information_criterion
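
The HQIC formula can be transcribed directly and set next to the BIC formula documented for bic() to compare their penalties. The inputs below (llf = -120, df_model = 3, nobs = 100) are invented for illustration, and the function names are hypothetical.

```python
import math


def hqic(llf, df_model, nobs):
    """-2 * llf + 2 * df_model * log(log(nobs))"""
    return -2.0 * llf + 2.0 * df_model * math.log(math.log(nobs))


def bic(llf, df_model, nobs):
    """-2 * llf + log(nobs) * df_model"""
    return -2.0 * llf + math.log(nobs) * df_model


llf, df_model, nobs = -120.0, 3, 100
print(round(hqic(llf, df_model, nobs), 4))
print(round(bic(llf, df_model, nobs), 4))

# Per-parameter penalties at nobs=100: 2*log(log(n)) for HQIC versus log(n) for BIC.
print(round(2.0 * math.log(math.log(nobs)), 4), round(math.log(nobs), 4))
```

At this sample size the HQIC charges less per parameter than the BIC, so it penalizes larger models more lightly.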

maparams()
Get the value of the moving average coefficients.

Returns:

maparams : array-like
  The MA coefficients.

maroots()
The roots of the MA coefficients are the solution to:

  (1 + maparams[0] * z + maparams[1] * z^2 + ... + maparams[q-1] * z^q) = 0

Stability requires that the roots in modulus lie outside the unit circle.

Returns:

maroots : array-like
  The MA roots.

oob()
If the model was built with out_of_sample_size > 0, a validation score will have been computed. Otherwise it will be np.nan.

Returns:

oob_ : float
  The “out-of-bag” score.

params()
Get the parameters of the model. The order of variables is the trend coefficients and the k_exog() exogenous coefficients, then the k_ar() AR coefficients, and finally the k_ma() MA coefficients.

Returns:

params : array-like
  The parameters of the model.

plot_diagnostics(variable=0, lags=10, fig=None, figsize=None)
Plot an ARIMA’s diagnostics.

Diagnostic plots for standardized residuals of one endogenous variable.

Parameters:

variable : integer, optional
  Index of the endogenous variable for which the diagnostic plots should be created. Default is 0.

lags : integer, optional
  Number of lags to include in the correlogram. Default is 10.

fig : Matplotlib Figure instance, optional
  If given, subplots are created in this figure instead of in a new figure. Note that the 2x2 grid will be created in the provided figure using fig.add_subplot().

figsize : tuple, optional
  If a figure is created, this argument allows specifying a size. The tuple is (width, height).

See also

statsmodels.graphics.gofplots.qqplot, pmdarima.utils.visualization.plot_acf

Notes

Produces a 2x2 plot grid with the following plots (ordered clockwise from top left):

- Standardized residuals over time
- Histogram plus estimated density of standardized residuals, along with a Normal(0,1) density plotted for reference.
- Normal Q-Q plot, with Normal reference line.
- Correlogram

References

[R50] https://www.statsmodels.org/dev/_modules/statsmodels/tsa/statespace/mlemodel.html#MLEResults.plot_diagnostics

predict(n_periods=10, X=None, return_conf_int=False, alpha=0.05, **kwargs)
Forecast future values.

Generate predictions (forecasts) n_periods in the future. Note that if exogenous variables were used in the model fit, they are also required by predict, which will fail without them.

Parameters:

n_periods : int, optional (default=10)
  The number of periods in the future to forecast.

X : array-like, shape=[n_obs, n_vars], optional (default=None)
  An optional 2-d array of exogenous variables. If provided, these variables are used as additional features in the regression operation. This should not include a constant or trend. Note that if an ARIMA is fit on exogenous features, it must be provided exogenous features for making predictions.

return_conf_int : bool, optional (default=False)
  Whether to get the confidence intervals of the forecasts.

alpha : float, optional (default=0.05)
  The confidence intervals for the forecasts are 100 * (1 - alpha) %.

Returns:

forecasts : array-like, shape=(n_periods,)
  The array of forecasted values.

conf_int : array-like, shape=(n_periods, 2), optional
  The confidence intervals for the forecasts. Only returned if return_conf_int is True.

predict_in_sample(X=None, start=None, end=None, dynamic=False, return_conf_int=False, alpha=0.05, **kwargs)
Generate in-sample predictions from the fit ARIMA model.

Predicts the original training (in-sample) time series values. This can be useful for visualizing the fit, for qualitatively inspecting the efficacy of the model, or for computing the residuals of the model.

Parameters:

X : array-like, shape=[n_obs, n_vars], optional (default=None)
  An optional 2-d array of exogenous variables. If provided, these variables are used as additional features in the regression operation. This should not include a constant or trend. Note that if an ARIMA is fit on exogenous features, it must be provided exogenous features for making predictions.

start : int, optional (default=None)
  Zero-indexed observation number at which to start forecasting, i.e., the first forecast is start. Note that if this value is less than d, the order of differencing, an error will be raised.

end : int, optional (default=None)
  Zero-indexed observation number at which to end forecasting, i.e., the last forecast is end.

dynamic : bool, optional (default=False)
  The dynamic keyword affects in-sample prediction. If dynamic is False, then the in-sample lagged values are used for prediction. If dynamic is True, then in-sample forecasts are used in place of lagged dependent variables. The first forecasted value is start.

return_conf_int : bool, optional (default=False)
  Whether to get the confidence intervals of the forecasts.

alpha : float, optional (default=0.05)
  The confidence intervals for the forecasts are 100 * (1 - alpha) %.

Returns:

preds : array
  The predicted values.

conf_int : array-like, shape=(n_periods, 2), optional
  The confidence intervals for the predictions. Only returned if return_conf_int is True.
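
The difference between dynamic=False and dynamic=True is easiest to see on a toy AR(1) model with known coefficients. The sketch below is a hand-rolled illustration under that assumption; ar1_in_sample is a hypothetical helper, and the real method delegates prediction to the fitted statsmodels results.

```python
def ar1_in_sample(y, const, phi, start, dynamic=False):
    """Predict y[start:] from y_t = const + phi * y_{t-1}; requires start >= 1."""
    preds = []
    prev = y[start - 1]  # last observed value before the first prediction
    for t in range(start, len(y)):
        pred = const + phi * prev
        preds.append(pred)
        # dynamic=False: feed the observed value into the next step.
        # dynamic=True: feed the model's own prediction instead.
        prev = pred if dynamic else y[t]
    return preds


y = [1.0, 2.0, 1.5, 2.5, 2.0]
static_preds = ar1_in_sample(y, const=1.0, phi=0.5, start=1)
dynamic_preds = ar1_in_sample(y, const=1.0, phi=0.5, start=1, dynamic=True)
print(static_preds)   # [1.5, 2.0, 1.75, 2.25]
print(dynamic_preds)  # [1.5, 1.75, 1.875, 1.9375]
```

Both runs agree on the first value (at start) and diverge afterwards, because the dynamic run never looks at the observations past start - 1.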

pvalues()
Get the p-values associated with the t-values of the coefficients. Note that the coefficients are assumed to have a Student’s T distribution.

Returns:

pvalues : array-like
  The p-values.

resid()
Get the model residuals. If the model is fit using ‘mle’, then the residuals are created via the Kalman Filter. If the model is fit using ‘css’, then the residuals are obtained via scipy.signal.lfilter, adjusted such that the first k_ma() residuals are zero. These zero residuals are not returned.

Returns:

resid : array-like
  The model residuals.

set_params(**params)
Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params : dict
  Estimator parameters.

Returns:

self : object
  Estimator instance.

to_dict()
Get the ARIMA model as a dictionary.

Return the dictionary representation of the ARIMA model.

Returns:

res : dictionary
  The ARIMA model as a dictionary.

update(y, X=None, maxiter=None, **kwargs)
Update the model fit with additional observed endog/exog values.

Updating an ARIMA adds new observations to the model, updating the MLE of the parameters accordingly by performing several new iterations (maxiter) from the existing model parameters.

Parameters:

y : array-like or iterable, shape=(n_samples,)
  The time-series data to add to the endogenous samples on which the ARIMA estimator was previously fit. This may either be a Pandas Series object or a numpy array. This should be a one-dimensional array of finite floats.

X : array-like, shape=[n_obs, n_vars], optional (default=None)
  An optional 2-d array of exogenous variables. If the model was fit with an exogenous array of covariates, it will be required for updating the observed values.

maxiter : int, optional (default=None)
  The number of iterations to perform when updating the model. If None, will perform max(5, n_samples // 10) iterations.

**kwargs : keyword args
  Any keyword args that should be passed as **fit_kwargs in the new model fit.

Notes

- Internally, this calls fit again using the OLD model parameters as the starting parameters for the new model’s MLE computation.
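
The default iteration count used when maxiter is None follows the rule max(5, n_samples // 10), where n_samples is the length of the newly supplied y. A minimal sketch of that rule, with a hypothetical function name:

```python
def default_update_maxiter(n_samples, maxiter=None):
    """Iterations for an update: explicit maxiter, else max(5, n_samples // 10)."""
    if maxiter is None:
        return max(5, n_samples // 10)
    return maxiter


print(default_update_maxiter(12))             # 5  (small updates still get 5 iterations)
print(default_update_maxiter(200))            # 20
print(default_update_maxiter(200, maxiter=7)) # 7  (an explicit value wins)
```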