Cross-validating your time series models
Like scikit-learn, pmdarima provides several strategies for cross-validating
your time series models. The interface was designed to mirror scikit-learn's
as closely as possible, which keeps it simple to use.
Out:
pmdarima version: 0.0.0
[CV] fold=0 ..........................................................
[CV] fold=1 ..........................................................
[CV] fold=2 ..........................................................
[CV] fold=0 ..........................................................
[CV] fold=1 ..........................................................
[CV] fold=2 ..........................................................
Model 1 CV scores: [23.92897591530385, 22.289666174351435, 3.7484075369532053]
Model 2 CV scores: [1.4772800360106546, 15.413428596219074, 5.57222577821664]
Lowest average SMAPE: 7.487644803482123 (model2)
Best model: ARIMA(1,1,2)(0,1,1)[12] intercept
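The scores above are SMAPE values (symmetric mean absolute percentage error), where lower is better. As a rough illustration of what such a score measures, here is a minimal pure-Python sketch. It assumes one common definition of SMAPE on a 0 to 200 scale; the `smape` helper below is hypothetical and is not pmdarima's own implementation, which may differ in detail. The last lines recompute the model 2 average from the fold scores printed above.

```python
def smape(y_true, y_pred):
    """Symmetric mean absolute percentage error on a 0-200 scale.

    Illustrative helper only (an assumption, not pmdarima's code).
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    total = 0.0
    for actual, forecast in zip(y_true, y_pred):
        denom = abs(actual) + abs(forecast)
        # Assumption: when both values are zero, count the error as zero
        # instead of dividing by zero.
        if denom != 0:
            total += 200.0 * abs(forecast - actual) / denom
    return total / len(y_true)


# Fold scores copied from the "Model 2 CV scores" line of the output.
model2_scores = [1.4772800360106546, 15.413428596219074, 5.57222577821664]
model2_mean = sum(model2_scores) / len(model2_scores)
print("Model 2 mean SMAPE: {}".format(model2_mean))
```

Averaging the per-fold scores this way is what the script does with `np.average` before picking the model with the lowest mean.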
print(__doc__)
# Author: Taylor Smith <taylor.smith@alkaline-ml.com>
import numpy as np
import pmdarima as pm
from pmdarima import model_selection
print("pmdarima version: %s" % pm.__version__)
# Load the data and split it into separate pieces
data = pm.datasets.load_wineind()
train, test = model_selection.train_test_split(data, train_size=165)
# Even though we have a dedicated train/test split, we can (and should) still
# use cross-validation on our training set to get a good estimate of the model
# performance. We can choose which model is better based on how it performs
# over various folds.
model1 = pm.ARIMA(order=(2, 1, 1))
model2 = pm.ARIMA(order=(1, 1, 2),
                  seasonal_order=(0, 1, 1, 12),
                  suppress_warnings=True)
cv = model_selection.SlidingWindowForecastCV(window_size=100, step=24, h=1)
model1_cv_scores = model_selection.cross_val_score(
    model1, train, scoring='smape', cv=cv, verbose=2)
model2_cv_scores = model_selection.cross_val_score(
    model2, train, scoring='smape', cv=cv, verbose=2)
print("Model 1 CV scores: {}".format(model1_cv_scores.tolist()))
print("Model 2 CV scores: {}".format(model2_cv_scores.tolist()))
# Pick based on which has a lower mean error rate
m1_average_error = np.average(model1_cv_scores)
m2_average_error = np.average(model2_cv_scores)
errors = [m1_average_error, m2_average_error]
models = [model1, model2]
# print out the answer
better_index = np.argmin(errors)  # type: int
print("Lowest average SMAPE: {} (model{})".format(
    errors[better_index], better_index + 1))
print("Best model: {}".format(models[better_index]))
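To make the sliding-window strategy used in the script more concrete, the following sketch generates train/test index ranges for a fixed-size window that advances by `step` observations and forecasts `h` steps ahead. The `sliding_window_splits` function is a hypothetical stand-in written for illustration, not pmdarima's implementation; it assumes a fold is produced only when the full window and forecast horizon fit inside the series. Under that assumption, the script's settings (165 training samples, `window_size=100`, `step=24`, `h=1`) give three folds, which is consistent with the three `fold=` lines in the output.

```python
def sliding_window_splits(n_samples, window_size, step=1, h=1):
    """Yield (train_range, test_range) pairs for a sliding window.

    Illustrative helper only (an assumption, not pmdarima's code).
    """
    if window_size + h > n_samples:
        raise ValueError("window_size + h must not exceed n_samples")

    start = 0
    # Keep sliding while the window plus the forecast horizon still fits.
    while start + window_size + h <= n_samples:
        train_end = start + window_size
        yield range(start, train_end), range(train_end, train_end + h)
        start += step


folds = list(sliding_window_splits(n_samples=165, window_size=100,
                                   step=24, h=1))
for i, (train_idx, test_idx) in enumerate(folds):
    print("fold={}: train=[{}, {}) test={}".format(
        i, train_idx.start, train_idx.stop, list(test_idx)))
```

Because each fold trains only on observations that precede its test points, the temporal order of the series is preserved, unlike a shuffled k-fold split.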
Total running time of the script: (0 minutes 28.504 seconds)