.. _sphx_glr_auto_examples_preprocessing_example_date_featurizer.py:


=================================================
Modeling quasi-seasonal trends with date features
=================================================


Some trends are common enough to appear seasonal, yet sporadic enough that
approaching them from a seasonal perspective may not be valid. An example of
this is the `"end-of-the-month" effect `_.
In this example, we'll explore how we can create meaningful features that
express seasonal trends without needing to fit a seasonal model.

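
To make the idea concrete, the following is a minimal sketch of the kind of
features that can be derived from a date column: one 0/1 indicator per weekday
plus the day of the month. It uses plain pandas rather than pmdarima, and the
``date_features`` helper, its column names, and the sample dates are
hypothetical, chosen only for illustration.

.. code-block:: python

    import pandas as pd


    def date_features(dates):
        """Build weekday indicators and a day-of-month column (illustrative)."""
        dates = pd.to_datetime(pd.Series(dates)).reset_index(drop=True)
        feats = pd.DataFrame(index=dates.index)

        # One indicator column per weekday, where Monday is 0 and Sunday is 6
        for day in range(7):
            feats["WEEKDAY-%d" % day] = (dates.dt.dayofweek == day).astype(int)

        # The calendar day (1-31) can capture effects such as month-end spikes
        feats["DAY-OF-MONTH"] = dates.dt.day
        return feats


    # Hypothetical sample: five consecutive days starting on a Tuesday
    sample = pd.date_range("2019-01-01", periods=5, freq="D")
    features = date_features(sample)
    print(features.head())

Columns like these depend only on the calendar, so they can be computed for
future dates as well and supplied to a non-seasonal model as exogenous
regressors.
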
.. rst-class:: sphx-glr-horizontal


    *

      .. image:: /auto_examples/preprocessing/images/sphx_glr_example_date_featurizer_001.png
            :scale: 47

    *

      .. image:: /auto_examples/preprocessing/images/sphx_glr_example_date_featurizer_002.png
            :scale: 47


.. rst-class:: sphx-glr-script-out

 Out::

    pmdarima version: 0.0.0
    Head of generated exog features:
       DATE-WEEKDAY-0  ...  DATE-DAY-OF-MONTH
    0               0  ...                  1
    1               0  ...                  2
    2               0  ...                  3
    3               0  ...                  4
    4               0  ...                  5

    [5 rows x 8 columns]
    Performing stepwise search to minimize aic
    Near non-invertible roots for order (2, 1, 2)(0, 0, 0, 0); setting score to inf (at least one inverse root too close to the border of the unit circle: 0.998)
    ARIMA(2,1,2)(0,0,0)[0] intercept : AIC=inf, Time=0.92 sec
    ARIMA(0,1,0)(0,0,0)[0] intercept : AIC=2864.731, Time=0.03 sec
    First viable model found (2864.731)
    ARIMA(1,1,0)(0,0,0)[0] intercept : AIC=2861.910, Time=0.41 sec
    New best model found (2861.910 < 2864.731)
    ARIMA(0,1,1)(0,0,0)[0] intercept : AIC=2859.092, Time=0.80 sec
    New best model found (2859.092 < 2861.910)
    ARIMA(0,1,0)(0,0,0)[0] : AIC=2862.772, Time=0.28 sec
    Near non-invertible roots for order (1, 1, 1)(0, 0, 0, 0); setting score to inf (at least one inverse root too close to the border of the unit circle: 0.999)
    ARIMA(1,1,1)(0,0,0)[0] intercept : AIC=inf, Time=0.87 sec
    ARIMA(0,1,2)(0,0,0)[0] intercept : AIC=2847.648, Time=0.67 sec
    New best model found (2847.648 < 2859.092)
    Near non-invertible roots for order (1, 1, 2)(0, 0, 0, 0); setting score to inf (at least one inverse root too close to the border of the unit circle: 0.998)
    ARIMA(1,1,2)(0,0,0)[0] intercept : AIC=inf, Time=0.90 sec
    ARIMA(0,1,3)(0,0,0)[0] intercept : AIC=2844.490, Time=0.80 sec
    New best model found (2844.490 < 2847.648)
    Near non-invertible roots for order (1, 1, 3)(0, 0, 0, 0); setting score to inf (at least one inverse root too close to the border of the unit circle: 0.996)
    ARIMA(1,1,3)(0,0,0)[0] intercept : AIC=inf, Time=0.97 sec
    ARIMA(0,1,4)(0,0,0)[0] intercept : AIC=2844.554, Time=0.94 sec
    ARIMA(1,1,4)(0,0,0)[0] intercept : AIC=2846.569, Time=0.98 sec
    ARIMA(0,1,3)(0,0,0)[0] : AIC=2842.499, Time=0.63 sec
    New best model found (2842.499 < 2844.490)
    ARIMA(0,1,2)(0,0,0)[0] : AIC=2845.649, Time=0.52 sec
    ARIMA(1,1,3)(0,0,0)[0] : AIC=2840.067, Time=0.86 sec
    New best model found (2840.067 < 2842.499)
    ARIMA(1,1,2)(0,0,0)[0] : AIC=2839.103, Time=0.59 sec
    New best model found (2839.103 < 2840.067)
    ARIMA(1,1,1)(0,0,0)[0] : AIC=2837.109, Time=0.40 sec
    New best model found (2837.109 < 2839.103)
    ARIMA(0,1,1)(0,0,0)[0] : AIC=2857.120, Time=0.32 sec
    ARIMA(1,1,0)(0,0,0)[0] : AIC=2859.946, Time=0.33 sec
    ARIMA(2,1,1)(0,0,0)[0] : AIC=2839.104, Time=0.52 sec
    ARIMA(2,1,0)(0,0,0)[0] : AIC=2854.995, Time=0.50 sec
    ARIMA(2,1,2)(0,0,0)[0] : AIC=2840.740, Time=0.68 sec

    Best model: ARIMA(1,1,1)(0,0,0)[0]
    Total fit time: 13.926 seconds




|


.. code-block:: python

    print(__doc__)

    # Author: Taylor Smith

    import pmdarima as pm
    from pmdarima import arima
    from pmdarima import model_selection
    from pmdarima import pipeline
    from pmdarima import preprocessing
    from pmdarima.datasets._base import load_date_example

    import numpy as np
    from matplotlib import pyplot as plt

    print("pmdarima version: %s" % pm.__version__)

    # Load the data and split it into separate pieces
    y, X = load_date_example()
    y_train, y_test, X_train, X_test = \
        model_selection.train_test_split(y, X, test_size=20)

    # We can examine traits about the time series:
    pm.tsdisplay(y_train, lag_max=10)

    # We can see the ACF increases and decreases rather rapidly, which means we may
    # need some differencing. There also does not appear to be an obvious seasonal
    # trend.
    n_diffs = arima.ndiffs(y_train, max_d=5)

    # Here's what the featurizer will create for us:
    date_feat = preprocessing.DateFeaturizer(
        column_name="date",  # the name of the date feature in the exog matrix
        with_day_of_week=True,
        with_day_of_month=True)

    _, X_train_feats = date_feat.fit_transform(y_train, X_train)
    print("Head of generated exog features:\n%s" % repr(X_train_feats.head()))

    # We can plug this exog featurizer into a pipeline:
    pipe = pipeline.Pipeline([
        ('date', date_feat),
        ('arima', arima.AutoARIMA(d=n_diffs,
                                  trace=3,
                                  stepwise=True,
                                  suppress_warnings=True,
                                  seasonal=False))
    ])

    pipe.fit(y_train, X_train)

    # Plot our forecasts
    forecasts = pipe.predict(exogenous=X_test)

    fig = plt.figure(figsize=(16, 8))
    ax = fig.add_subplot(1, 1, 1)

    n_train = y_train.shape[0]
    x = np.arange(n_train + forecasts.shape[0])

    ax.plot(x[:n_train], y_train, color='blue', label='Training Data')
    ax.plot(x[n_train:], forecasts, color='green', marker='o',
            label='Predicted')
    ax.plot(x[n_train:], y_test, color='red', label='Actual')
    ax.legend(loc='lower left', borderaxespad=0.5)
    ax.set_title('Predicted Foo')
    ax.set_ylabel('# Foo')

    plt.show()

    # What next? Try combining different featurizers in your pipeline to enhance
    # a model's predictive power.

**Total running time of the script:** ( 0 minutes 14.234 seconds)


.. only :: html

 .. container:: sphx-glr-footer


  .. container:: sphx-glr-download

     :download:`Download Python source code: example_date_featurizer.py `



  .. container:: sphx-glr-download

     :download:`Download Jupyter notebook: example_date_featurizer.ipynb `


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_