pmdarima.model_selection
.cross_val_predict¶
-
pmdarima.model_selection.
cross_val_predict
(estimator, y, X=None, cv=None, verbose=0, averaging='mean', return_raw_predictions=False, **kwargs)[source][source]¶ Generate cross-validated estimates for each input data point
Parameters: estimator : estimator
An estimator object that implements the
fit
methody : array-like or iterable, shape=(n_samples,)
The time-series array.
X : array-like, shape=[n_obs, n_vars], optional (default=None)
An optional 2-d array of exogenous variables.
cv : BaseTSCrossValidator or None, optional (default=None)
An instance of cross-validation. If None, will use a RollingForecastCV. Note that for cross-validation predictions, the CV step cannot exceed the CV horizon, or there will be a gap between fold predictions.
verbose : integer, optional
The verbosity level.
averaging : str or callable, one of [“median”, “mean”] (default=”mean”)
Unlike normal CV, time series CV might have different folds (windows) forecasting the same time step. After all forecast windows are made, we build a matrix of y x n_folds, populating each fold’s forecasts like so:
nan nan nan # training samples nan nan nan nan nan nan nan nan nan 1 nan nan # test samples 4 3 nan 3 2.5 3.5 nan 6 5 nan nan 4
We then average each time step’s forecasts to end up with our final prediction results.
return_raw_predictions : bool (default=False)
If True, raw predictions are returned instead of averaged ones. This results in a y x h matrix. For example, if h=3, and step=1 then:
nan nan nan # training samples nan nan nan nan nan nan nan nan nan 1 4 2 # test samples 2 5 7 8 9 1 nan nan nan nan nan nan
First column contains all one-step-ahead-predictions, second column all two-step-ahead-predictions etc. Further metrics can then be calculated as desired.
Examples
>>> import pmdarima as pm >>> from pmdarima.model_selection import cross_val_predict, ... RollingForecastCV >>> y = pm.datasets.load_wineind() >>> cv = RollingForecastCV(h=14, step=12) >>> preds = cross_val_predict( ... pm.ARIMA((1, 1, 2), seasonal_order=(0, 1, 1, 12)), y, cv=cv) >>> preds[:5] array([30710.45743168, 34902.94929722, 17994.16587163, 22127.71167249, 25473.60876435])