.. _sphx_glr_auto_examples_preprocessing_example_date_featurizer.py:
=================================================
Modeling quasi-seasonal trends with date features
=================================================
Some trends are common enough to appear seasonal, yet sporadic enough that
approaching them from a seasonal perspective may not be valid. An example of
this is the "end-of-the-month" effect.
In this example, we'll explore how we can create meaningful features that
express seasonal trends without needing to fit a seasonal model.
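
To make this concrete before the full script, here is a minimal,
hand-rolled sketch (plain pandas, not pmdarima's implementation) of the
kind of calendar features the featurizer below derives from a date
column; the column names here are purely illustrative:

.. code-block:: python

    import pandas as pd

    dates = pd.date_range("2023-01-28", periods=5, freq="D")

    # Day-of-month plus an end-of-month flag capture "end-of-the-month"
    # effects without any seasonal model
    feats = pd.DataFrame({
        "DAY-OF-MONTH": dates.day,
        "IS-MONTH-END": dates.is_month_end.astype(int),
    })

    # One-hot weekday dummies, analogous to the DATE-WEEKDAY-* columns
    # shown in the script output below
    feats = feats.join(pd.get_dummies(dates.dayofweek, prefix="WEEKDAY"))
    print(feats)
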
.. rst-class:: sphx-glr-horizontal

    *

      .. image:: /auto_examples/preprocessing/images/sphx_glr_example_date_featurizer_001.png
            :scale: 47

    *

      .. image:: /auto_examples/preprocessing/images/sphx_glr_example_date_featurizer_002.png
            :scale: 47
.. rst-class:: sphx-glr-script-out

 Out::

    pmdarima version: 0.0.0
    Head of generated exog features:
       DATE-WEEKDAY-0  ...  DATE-DAY-OF-MONTH
    0               0  ...                  1
    1               0  ...                  2
    2               0  ...                  3
    3               0  ...                  4
    4               0  ...                  5
    [5 rows x 8 columns]
    Performing stepwise search to minimize aic
    Fit ARIMA(2,1,2)x(0,0,0,0) [intercept=True]; AIC=2839.502, BIC=2891.632, Time=0.828 seconds
    Fit ARIMA(0,1,0)x(0,0,0,0) [intercept=True]; AIC=2864.731, BIC=2901.967, Time=0.031 seconds
    Fit ARIMA(1,1,0)x(0,0,0,0) [intercept=True]; AIC=2861.910, BIC=2902.869, Time=0.357 seconds
    Fit ARIMA(0,1,1)x(0,0,0,0) [intercept=True]; AIC=2859.092, BIC=2900.051, Time=0.492 seconds
    Fit ARIMA(0,1,0)x(0,0,0,0) [intercept=False]; AIC=2862.772, BIC=2896.284, Time=0.227 seconds
    Fit ARIMA(1,1,2)x(0,0,0,0) [intercept=True]; AIC=2837.832, BIC=2886.238, Time=0.757 seconds
    Near non-invertible roots for order (1, 1, 2)(0, 0, 0, 0); setting score to inf (at least one inverse root too close to the border of the unit circle: 0.998)
    Fit ARIMA(2,1,1)x(0,0,0,0) [intercept=True]; AIC=2837.790, BIC=2886.197, Time=0.727 seconds
    Near non-invertible roots for order (2, 1, 1)(0, 0, 0, 0); setting score to inf (at least one inverse root too close to the border of the unit circle: 0.999)
    Fit ARIMA(3,1,2)x(0,0,0,0) [intercept=True]; AIC=2841.291, BIC=2897.145, Time=0.883 seconds
    Near non-invertible roots for order (3, 1, 2)(0, 0, 0, 0); setting score to inf (at least one inverse root too close to the border of the unit circle: 0.998)
    Fit ARIMA(2,1,3)x(0,0,0,0) [intercept=True]; AIC=2841.239, BIC=2897.093, Time=1.013 seconds
    Near non-invertible roots for order (2, 1, 3)(0, 0, 0, 0); setting score to inf (at least one inverse root too close to the border of the unit circle: 0.998)
    Fit ARIMA(1,1,1)x(0,0,0,0) [intercept=True]; AIC=2835.791, BIC=2880.474, Time=0.811 seconds
    Near non-invertible roots for order (1, 1, 1)(0, 0, 0, 0); setting score to inf (at least one inverse root too close to the border of the unit circle: 0.999)
    Fit ARIMA(1,1,3)x(0,0,0,0) [intercept=True]; AIC=2838.701, BIC=2890.831, Time=1.173 seconds
    Near non-invertible roots for order (1, 1, 3)(0, 0, 0, 0); setting score to inf (at least one inverse root too close to the border of the unit circle: 0.996)
    Fit ARIMA(3,1,1)x(0,0,0,0) [intercept=True]; AIC=2838.972, BIC=2891.102, Time=0.781 seconds
    Near non-invertible roots for order (3, 1, 1)(0, 0, 0, 0); setting score to inf (at least one inverse root too close to the border of the unit circle: 0.998)
    Fit ARIMA(3,1,3)x(0,0,0,0) [intercept=True]; AIC=2840.653, BIC=2900.231, Time=1.232 seconds
    Near non-invertible roots for order (3, 1, 3)(0, 0, 0, 0); setting score to inf (at least one inverse root too close to the border of the unit circle: 1.000)
    Final model order: (2, 1, 2)x(0, 0, 0, 0) (constant=True)
    Total fit time: 9.319 seconds
|
.. code-block:: python

    print(__doc__)
    # Author: Taylor Smith 
    import pmdarima as pm
    from pmdarima import arima
    from pmdarima import model_selection
    from pmdarima import pipeline
    from pmdarima import preprocessing
    from pmdarima.datasets._base import load_date_example
    import numpy as np
    from matplotlib import pyplot as plt
    print("pmdarima version: %s" % pm.__version__)
    # Load the data and split it into separate pieces
    y, X = load_date_example()
    y_train, y_test, X_train, X_test = \
        model_selection.train_test_split(y, X, test_size=20)
    # We can examine traits about the time series:
    pm.tsdisplay(y_train, lag_max=10)
    # We can see the ACF increases and decreases rather rapidly, which means we may
    # need some differencing. There also does not appear to be an obvious seasonal
    # trend.
    n_diffs = arima.ndiffs(y_train, max_d=5)
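    # `ndiffs` estimates the differencing order with a stationarity test
    # (KPSS by default); printing it lets us sanity-check the ACF reading
    # above before handing it to auto-ARIMA
    print("Estimated differencing order: d=%d" % n_diffs)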
    # Here's what the featurizer will create for us:
    date_feat = preprocessing.DateFeaturizer(
        column_name="date",  # the name of the date feature in the exog matrix
        with_day_of_week=True,
        with_day_of_month=True)
    _, X_train_feats = date_feat.fit_transform(y_train, X_train)
    print("Head of generated exog features:\n%s" % repr(X_train_feats.head()))
    # We can plug this exog featurizer into a pipeline:
    pipe = pipeline.Pipeline([
        ('date', date_feat),
        ('arima', arima.AutoARIMA(d=n_diffs,
                                  trace=3,
                                  stepwise=True,
                                  suppress_warnings=True,
                                  seasonal=False))
    ])
    pipe.fit(y_train, X_train)
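    # At predict time the pipeline pushes the test exog through the same
    # featurizer before the ARIMA stage sees it, so we pass the raw date
    # column here as well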
    # Plot our forecasts
    forecasts = pipe.predict(exogenous=X_test)
    fig = plt.figure(figsize=(16, 8))
    ax = fig.add_subplot(1, 1, 1)
    n_train = y_train.shape[0]
    x = np.arange(n_train + forecasts.shape[0])
    ax.plot(x[:n_train], y_train, color='blue', label='Training Data')
    ax.plot(x[n_train:], forecasts, color='green', marker='o',
            label='Predicted')
    ax.plot(x[n_train:], y_test, color='red', label='Actual')
    ax.legend(loc='lower left', borderaxespad=0.5)
    ax.set_title('Predicted Foo')
    ax.set_ylabel('# Foo')
    plt.show()
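    # Optionally, score the forecasts against the held-out data; SMAPE is
    # a scale-independent error measure shipped with pmdarima
    from pmdarima.metrics import smape
    print("Test SMAPE: %.3f" % smape(y_test, forecasts))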
    # What next? Try combining different featurizers in your pipeline to enhance
    # a model's predictive power.
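
As a starting point for that suggestion, here is a minimal sketch of such a
combination. It reuses the objects defined above; the weekly period ``m=7``
and ``k=3`` Fourier terms are assumptions chosen only for illustration, so
verify that each featurizer's expectations about your exogenous matrix hold:

.. code-block:: python

    # Hypothetical follow-up: derive date dummies first, then append
    # Fourier terms to the resulting numeric exog matrix
    pipe2 = pipeline.Pipeline([
        ('date', preprocessing.DateFeaturizer(column_name="date")),
        ('fourier', preprocessing.FourierFeaturizer(m=7, k=3)),
        ('arima', arima.AutoARIMA(d=n_diffs, stepwise=True,
                                  suppress_warnings=True, seasonal=False))
    ])
    pipe2.fit(y_train, X_train)
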
**Total running time of the script:** ( 0 minutes  9.491 seconds)
.. only:: html

 .. container:: sphx-glr-footer

  .. container:: sphx-glr-download

     :download:`Download Python source code: example_date_featurizer.py <example_date_featurizer.py>`

  .. container:: sphx-glr-download

     :download:`Download Jupyter notebook: example_date_featurizer.ipynb <example_date_featurizer.ipynb>`
.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_