.. _sphx_glr_auto_examples_model_selection_example_cross_validation.py:

========================================
Cross-validating your time series models
========================================

Like scikit-learn, ``pmdarima`` provides several different strategies for
cross-validating your time series models. The interface was designed to behave
as similarly as possible to that of scikit-learn, to keep its usage simple.
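Before walking through the example, it may help to see how two of the
available strategies differ. The sketch below is illustrative rather than part
of the generated script, and the ``h``, ``step``, ``initial`` and
``window_size`` values are arbitrary choices: ``RollingForecastCV`` grows the
training fold with each split, while ``SlidingWindowForecastCV`` (used in the
example) slides a fixed-size window across the series.

.. code-block:: python

    import pmdarima as pm
    from pmdarima import model_selection

    y = pm.datasets.load_wineind()

    # A rolling CV grows the training fold on each split...
    rolling = model_selection.RollingForecastCV(h=12, step=24, initial=120)

    # ...while a sliding-window CV keeps the training fold at a fixed size.
    sliding = model_selection.SlidingWindowForecastCV(h=12, step=24, window_size=120)

    for name, cv in (("rolling", rolling), ("sliding", sliding)):
        sizes = [tr.shape[0] for tr, _ in cv.split(y)]
        print("%s training fold sizes: %s" % (name, sizes))

A fixed window is often the safer choice for long series whose early behavior
is no longer representative of the current dynamics.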
.. rst-class:: sphx-glr-script-out

 Out::

    pmdarima version: 0.0.0
    [CV] fold=0 ..........................................................
    [CV] fold=1 ..........................................................
    [CV] fold=2 ..........................................................
    [CV] fold=0 ..........................................................
    [CV] fold=1 ..........................................................
    [CV] fold=2 ..........................................................
    Model 1 CV scores: [23.928975763465598, 22.289666126689166, 3.7484087132831814]
    Model 2 CV scores: [1.4745570833470711, 15.413429296026031, 5.572225774496019]
    Lowest average SMAPE: 7.486737384623041 (model2)
    Best model: ARIMA(1,1,2)(0,1,1)[12] intercept

|

.. code-block:: python

    print(__doc__)

    # Author: Taylor Smith

    import numpy as np
    import pmdarima as pm
    from pmdarima import model_selection

    print("pmdarima version: %s" % pm.__version__)

    # Load the data and split it into separate pieces
    data = pm.datasets.load_wineind()
    train, test = model_selection.train_test_split(data, train_size=165)

    # Even though we have a dedicated train/test split, we can (and should) still
    # use cross-validation on our training set to get a good estimate of the model
    # performance. We can choose which model is better based on how it performs
    # over various folds.
    model1 = pm.ARIMA(order=(2, 1, 1))
    model2 = pm.ARIMA(order=(1, 1, 2),
                      seasonal_order=(0, 1, 1, 12),
                      suppress_warnings=True)
    cv = model_selection.SlidingWindowForecastCV(window_size=100, step=24, h=1)

    model1_cv_scores = model_selection.cross_val_score(
        model1, train, scoring='smape', cv=cv, verbose=2)

    model2_cv_scores = model_selection.cross_val_score(
        model2, train, scoring='smape', cv=cv, verbose=2)

    print("Model 1 CV scores: {}".format(model1_cv_scores.tolist()))
    print("Model 2 CV scores: {}".format(model2_cv_scores.tolist()))

    # Pick based on which has a lower mean error rate
    m1_average_error = np.average(model1_cv_scores)
    m2_average_error = np.average(model2_cv_scores)
    errors = [m1_average_error, m2_average_error]
    models = [model1, model2]

    # print out the answer
    better_index = np.argmin(errors)  # type: int
    print("Lowest average SMAPE: {} (model{})".format(
        errors[better_index], better_index + 1))
    print("Best model: {}".format(models[better_index]))

**Total running time of the script:** ( 0 minutes 18.107 seconds)

.. only:: html

 .. container:: sphx-glr-footer

  .. container:: sphx-glr-download

     :download:`Download Python source code: example_cross_validation.py <example_cross_validation.py>`

  .. container:: sphx-glr-download

     :download:`Download Jupyter notebook: example_cross_validation.ipynb <example_cross_validation.ipynb>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
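A natural follow-up, not part of the generated script above: once
cross-validation has picked a winner, refit it on the entire training set and
evaluate it once against the held-out test split. This is a minimal sketch
reusing the names from the example (``train``, ``test``, ``model2``); the
final evaluation step is an assumption about typical usage, not something the
script itself performs.

.. code-block:: python

    from pmdarima.metrics import smape

    # Refit the CV-selected model on the full training set...
    model2.fit(train)

    # ...then produce one forecast per held-out observation and score it.
    preds = model2.predict(n_periods=test.shape[0])
    print("Test SMAPE: %.3f" % smape(test, preds))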