Cross-validating your time series models
Like scikit-learn, pmdarima provides several strategies for cross-validating
your time series models. The interface was designed to mirror scikit-learn's as
closely as possible, so it should feel familiar if you have used
scikit-learn's model selection tools.
Out:
pmdarima version: 0.0.0
[CV] fold=0 ..........................................................
[CV] fold=1 ..........................................................
[CV] fold=2 ..........................................................
[CV] fold=0 ..........................................................
[CV] fold=1 ..........................................................
[CV] fold=2 ..........................................................
Model 1 CV scores: [23.928976841681518, 22.289666122815046, 3.7484080171047283]
Model 2 CV scores: [1.4413474767581218, 15.413429218463346, 5.572225774148651]
Lowest average SMAPE: 7.47566748979004 (model2)
Best model: ARIMA(maxiter=50, method='lbfgs', order=(1, 1, 2), out_of_sample_size=0,
scoring='mse', scoring_args=None, seasonal_order=(0, 1, 1, 12),
start_params=None, suppress_warnings=True, trend=None,
with_intercept=True)
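The scores above are SMAPE values (symmetric mean absolute percentage error), where lower is better. As a rough illustration of what the metric measures, here is a minimal sketch of one common SMAPE definition, expressed as a percentage between 0 and 200. The function name `smape_sketch` is hypothetical and this is not pmdarima's own implementation; it only assumes the usual formula `mean(200 * |pred - true| / (|pred| + |true|))`.

```python
def smape_sketch(y_true, y_pred):
    """Symmetric mean absolute percentage error, in percent (0 to 200).

    Hypothetical helper for illustration only; assumes one common
    definition of SMAPE, not necessarily the one the library uses.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must not be empty")

    # Each term scales the absolute error by the combined magnitude of the
    # true and predicted values. A pair where both are zero would divide
    # by zero; this sketch does not handle that case.
    terms = [
        200.0 * abs(p - t) / (abs(p) + abs(t))
        for t, p in zip(y_true, y_pred)
    ]
    return sum(terms) / len(terms)


# A perfect forecast scores 0; the score grows as the forecast drifts.
perfect = smape_sketch([100.0, 200.0], [100.0, 200.0])
off = smape_sketch([100.0, 200.0], [110.0, 180.0])
```

Because each fold here forecasts a single step (`h=1`), each CV score is the SMAPE of one forecast against one held-out observation, which may explain why the per-fold scores vary so widely.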
print(__doc__)
# Author: Taylor Smith <taylor.smith@alkaline-ml.com>
import numpy as np
import pmdarima as pm
from pmdarima import model_selection
print("pmdarima version: %s" % pm.__version__)
# Load the data and split it into separate pieces
data = pm.datasets.load_wineind()
train, test = model_selection.train_test_split(data, train_size=165)
# Even though we have a dedicated train/test split, we can (and should) still
# use cross-validation on our training set to get a good estimate of the model
# performance. We can choose which model is better based on how it performs
# over various folds.
model1 = pm.ARIMA(order=(2, 1, 1))
model2 = pm.ARIMA(order=(1, 1, 2),
                  seasonal_order=(0, 1, 1, 12),
                  suppress_warnings=True)
cv = model_selection.SlidingWindowForecastCV(window_size=100, step=24, h=1)
model1_cv_scores = model_selection.cross_val_score(
    model1, train, scoring='smape', cv=cv, verbose=2)
model2_cv_scores = model_selection.cross_val_score(
    model2, train, scoring='smape', cv=cv, verbose=2)
print("Model 1 CV scores: {}".format(model1_cv_scores.tolist()))
print("Model 2 CV scores: {}".format(model2_cv_scores.tolist()))
# Pick based on which has a lower mean error rate
m1_average_error = np.average(model1_cv_scores)
m2_average_error = np.average(model2_cv_scores)
errors = [m1_average_error, m2_average_error]
models = [model1, model2]
# print out the answer
better_index = np.argmin(errors)  # type: int
print("Lowest average SMAPE: {} (model{})".format(
    errors[better_index], better_index + 1))
print("Best model: {}".format(models[better_index]))
Total running time of the script: ( 0 minutes 4.556 seconds)
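To see why the script above produces exactly three folds, it can help to enumerate the windows by hand. The following is a minimal sketch of the sliding-window idea, assuming a fixed-size training window that advances by `step` samples and is followed by `h` test samples. The helper `sliding_window_splits` is hypothetical and is not pmdarima's implementation; it only mirrors the parameter names used in the script.

```python
def sliding_window_splits(n_samples, window_size, step=1, h=1):
    """Yield (train_indices, test_indices) for a fixed-size sliding window.

    Hypothetical helper for illustration only; the real splitter may
    differ in details such as defaults and validation.
    """
    if step < 1 or h < 1 or window_size < 1:
        raise ValueError("window_size, step and h must be positive")
    if window_size + h > n_samples:
        raise ValueError("window_size + h exceeds the number of samples")

    start = 0
    # Stop once the window plus its forecast horizon no longer fits.
    while start + window_size + h <= n_samples:
        end = start + window_size
        train = list(range(start, end))
        test = list(range(end, end + h))
        yield train, test
        start += step


# Same settings as the script: 165 training samples, window of 100,
# step of 24, one-step-ahead forecasts.
folds = list(sliding_window_splits(165, window_size=100, step=24, h=1))
for i, (train_idx, test_idx) in enumerate(folds):
    print("fold={} train=[{}..{}] test={}".format(
        i, train_idx[0], train_idx[-1], test_idx))
```

Under these assumptions the windows start at samples 0, 24 and 48. A fourth window starting at 72 would need samples up to index 172, which is past the 165 available, so only three folds are produced, consistent with the `fold=0` to `fold=2` lines in the output.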