.. _sphx_glr_auto_examples_model_selection_example_cross_validation.py:

========================================
Cross-validating your time series models
========================================

Like scikit-learn, ``pmdarima`` provides several different strategies for
cross-validating your time series models. The interface was designed to behave
as similarly as possible to that of scikit-learn, to keep its usage simple.
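Before walking through the example, it may help to see how two of the
available strategies differ. The sketch below is illustrative rather than part
of the generated script, and the ``h``, ``step``, ``initial`` and
``window_size`` values are arbitrary choices: ``RollingForecastCV`` grows the
training fold with each split, while ``SlidingWindowForecastCV`` (used in the
example) slides a fixed-size window across the series.

.. code-block:: python

    import pmdarima as pm
    from pmdarima import model_selection

    y = pm.datasets.load_wineind()

    # A rolling CV grows the training fold on each split...
    rolling = model_selection.RollingForecastCV(h=12, step=24, initial=120)

    # ...while a sliding-window CV keeps the training fold at a fixed size.
    sliding = model_selection.SlidingWindowForecastCV(h=12, step=24, window_size=120)

    for name, cv in (("rolling", rolling), ("sliding", sliding)):
        sizes = [tr.shape[0] for tr, _ in cv.split(y)]
        print("%s training fold sizes: %s" % (name, sizes))

A fixed window is often the safer choice for long series whose early behavior
is no longer representative of the current dynamics.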
.. rst-class:: sphx-glr-script-out

 Out::

    pmdarima version: 0.0.0
    [CV] fold=0 ..........................................................
    [CV] fold=1 ..........................................................
    [CV] fold=2 ..........................................................
    [CV] fold=0 ..........................................................
    [CV] fold=1 ..........................................................
    [CV] fold=2 ..........................................................
    Model 1 CV scores: [23.928975763465598, 22.289666126689166, 3.7484087132831814]
    Model 2 CV scores: [1.4745570833470711, 15.413429296026031, 5.572225774496019]
    Lowest average SMAPE: 7.486737384623041 (model2)
    Best model: ARIMA(1,1,2)(0,1,1)[12] intercept

|

.. code-block:: python

    print(__doc__)

    # Author: Taylor Smith

    import numpy as np
    import pmdarima as pm
    from pmdarima import model_selection

    print("pmdarima version: %s" % pm.__version__)

    # Load the data and split it into separate pieces
    data = pm.datasets.load_wineind()
    train, test = model_selection.train_test_split(data, train_size=165)

    # Even though we have a dedicated train/test split, we can (and should) still
    # use cross-validation on our training set to get a good estimate of the model
    # performance. We can choose which model is better based on how it performs
    # over various folds.
    model1 = pm.ARIMA(order=(2, 1, 1))
    model2 = pm.ARIMA(order=(1, 1, 2),
                      seasonal_order=(0, 1, 1, 12),
                      suppress_warnings=True)
    cv = model_selection.SlidingWindowForecastCV(window_size=100, step=24, h=1)

    model1_cv_scores = model_selection.cross_val_score(
        model1, train, scoring='smape', cv=cv, verbose=2)

    model2_cv_scores = model_selection.cross_val_score(
        model2, train, scoring='smape', cv=cv, verbose=2)

    print("Model 1 CV scores: {}".format(model1_cv_scores.tolist()))
    print("Model 2 CV scores: {}".format(model2_cv_scores.tolist()))

    # Pick based on which has a lower mean error rate
    m1_average_error = np.average(model1_cv_scores)
    m2_average_error = np.average(model2_cv_scores)
    errors = [m1_average_error, m2_average_error]
    models = [model1, model2]

    # print out the answer
    better_index = np.argmin(errors)  # type: int
    print("Lowest average SMAPE: {} (model{})".format(
        errors[better_index], better_index + 1))
    print("Best model: {}".format(models[better_index]))

**Total running time of the script:** ( 0 minutes 18.107 seconds)

.. only:: html

 .. container:: sphx-glr-footer

  .. container:: sphx-glr-download

     :download:`Download Python source code: example_cross_validation.py <example_cross_validation.py>`

  .. container:: sphx-glr-download

     :download:`Download Jupyter notebook: example_cross_validation.ipynb <example_cross_validation.ipynb>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
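A natural follow-up, not part of the generated script above: once
cross-validation has picked a winner, refit it on the entire training set and
evaluate it once against the held-out test split. This is a minimal sketch
reusing the names from the example (``train``, ``test``, ``model2``); the
final evaluation step is an assumption about typical usage, not something the
script itself performs.

.. code-block:: python

    from pmdarima.metrics import smape

    # Refit the CV-selected model on the full training set...
    model2.fit(train)

    # ...then produce one forecast per held-out observation and score it.
    preds = model2.predict(n_periods=test.shape[0])
    print("Test SMAPE: %.3f" % smape(test, preds))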