.. _sphx_glr_auto_examples_preprocessing_example_date_featurizer.py:

=================================================
Modeling quasi-seasonal trends with date features
=================================================

Some trends are common enough to appear seasonal, yet sporadic enough that
approaching them from a seasonal perspective may not be valid. An example of
this is the `"end-of-the-month" effect `_. In this example, we'll explore how
we can create meaningful features that express seasonal trends without needing
to fit a seasonal model.
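As a quick, hypothetical sketch of the idea (plain pandas, not the
``DateFeaturizer`` API used in the script below; the column names here are
made up for illustration), a single date column can be expanded into one-hot
weekday indicators plus a day-of-month feature:

.. code-block:: python

    # Hypothetical illustration only: expand a date column into one-hot
    # weekday indicators plus a day-of-month feature. pmdarima's
    # DateFeaturizer (used in the script below) builds a similar matrix.
    import pandas as pd

    dates = pd.DataFrame({"date": pd.date_range("2021-01-04", periods=7, freq="D")})
    weekday = pd.get_dummies(dates["date"].dt.dayofweek, prefix="WEEKDAY")
    features = weekday.assign(DAY_OF_MONTH=dates["date"].dt.day)
    print(features)

Features like these let a non-seasonal model pick up calendar-driven patterns
(such as end-of-month behavior) through the exogenous matrix.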
.. rst-class:: sphx-glr-horizontal


    *

      .. image:: /auto_examples/preprocessing/images/sphx_glr_example_date_featurizer_001.png
            :scale: 47

    *

      .. image:: /auto_examples/preprocessing/images/sphx_glr_example_date_featurizer_002.png
            :scale: 47


.. rst-class:: sphx-glr-script-out

 Out::

    pmdarima version: 0.0.0
    Head of generated exog features:
       DATE-WEEKDAY-0  ...  DATE-DAY-OF-MONTH
    0               0  ...                  1
    1               0  ...                  2
    2               0  ...                  3
    3               0  ...                  4
    4               0  ...                  5

    [5 rows x 8 columns]
    Performing stepwise search to minimize aic
    Fit ARIMA(2,1,2)x(0,0,0,0) [intercept=True]; AIC=2839.502, BIC=2891.632, Time=0.828 seconds
    Fit ARIMA(0,1,0)x(0,0,0,0) [intercept=True]; AIC=2864.731, BIC=2901.967, Time=0.031 seconds
    Fit ARIMA(1,1,0)x(0,0,0,0) [intercept=True]; AIC=2861.910, BIC=2902.869, Time=0.357 seconds
    Fit ARIMA(0,1,1)x(0,0,0,0) [intercept=True]; AIC=2859.092, BIC=2900.051, Time=0.492 seconds
    Fit ARIMA(0,1,0)x(0,0,0,0) [intercept=False]; AIC=2862.772, BIC=2896.284, Time=0.227 seconds
    Fit ARIMA(1,1,2)x(0,0,0,0) [intercept=True]; AIC=2837.832, BIC=2886.238, Time=0.757 seconds
    Near non-invertible roots for order (1, 1, 2)(0, 0, 0, 0); setting score to inf (at least one inverse root too close to the border of the unit circle: 0.998)
    Fit ARIMA(2,1,1)x(0,0,0,0) [intercept=True]; AIC=2837.790, BIC=2886.197, Time=0.727 seconds
    Near non-invertible roots for order (2, 1, 1)(0, 0, 0, 0); setting score to inf (at least one inverse root too close to the border of the unit circle: 0.999)
    Fit ARIMA(3,1,2)x(0,0,0,0) [intercept=True]; AIC=2841.291, BIC=2897.145, Time=0.883 seconds
    Near non-invertible roots for order (3, 1, 2)(0, 0, 0, 0); setting score to inf (at least one inverse root too close to the border of the unit circle: 0.998)
    Fit ARIMA(2,1,3)x(0,0,0,0) [intercept=True]; AIC=2841.239, BIC=2897.093, Time=1.013 seconds
    Near non-invertible roots for order (2, 1, 3)(0, 0, 0, 0); setting score to inf (at least one inverse root too close to the border of the unit circle: 0.998)
    Fit ARIMA(1,1,1)x(0,0,0,0) [intercept=True]; AIC=2835.791, BIC=2880.474, Time=0.811 seconds
    Near non-invertible roots for order (1, 1, 1)(0, 0, 0, 0); setting score to inf (at least one inverse root too close to the border of the unit circle: 0.999)
    Fit ARIMA(1,1,3)x(0,0,0,0) [intercept=True]; AIC=2838.701, BIC=2890.831, Time=1.173 seconds
    Near non-invertible roots for order (1, 1, 3)(0, 0, 0, 0); setting score to inf (at least one inverse root too close to the border of the unit circle: 0.996)
    Fit ARIMA(3,1,1)x(0,0,0,0) [intercept=True]; AIC=2838.972, BIC=2891.102, Time=0.781 seconds
    Near non-invertible roots for order (3, 1, 1)(0, 0, 0, 0); setting score to inf (at least one inverse root too close to the border of the unit circle: 0.998)
    Fit ARIMA(3,1,3)x(0,0,0,0) [intercept=True]; AIC=2840.653, BIC=2900.231, Time=1.232 seconds
    Near non-invertible roots for order (3, 1, 3)(0, 0, 0, 0); setting score to inf (at least one inverse root too close to the border of the unit circle: 1.000)
    Final model order: (2, 1, 2)x(0, 0, 0, 0) (constant=True)
    Total fit time: 9.319 seconds


|
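The stepwise trace above shows ``AutoARIMA`` comparing candidate orders by
AIC and discarding fits with near non-invertible roots, settling on the
(2, 1, 2) model. As a minimal sketch (not part of the generated script; it
reuses the same ``load_date_example`` data as the script below, and the exact
AIC need not match the trace since settings may differ), a single candidate
order can be fit by hand to inspect the criterion being minimized:

.. code-block:: python

    # Minimal sketch, not part of the generated script: fit one candidate
    # order directly and inspect the AIC that the stepwise search minimizes.
    import pmdarima as pm
    from pmdarima import model_selection
    from pmdarima.datasets._base import load_date_example

    y, X = load_date_example()
    y_train, _, _, _ = model_selection.train_test_split(y, X, test_size=20)

    candidate = pm.ARIMA(order=(2, 1, 2), suppress_warnings=True)
    candidate.fit(y_train)
    print("AIC for ARIMA(2,1,2): %.3f" % candidate.aic())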
.. code-block:: python


    print(__doc__)

    # Author: Taylor Smith

    import pmdarima as pm
    from pmdarima import arima
    from pmdarima import model_selection
    from pmdarima import pipeline
    from pmdarima import preprocessing
    from pmdarima.datasets._base import load_date_example

    import numpy as np
    from matplotlib import pyplot as plt

    print("pmdarima version: %s" % pm.__version__)

    # Load the data and split it into separate pieces
    y, X = load_date_example()
    y_train, y_test, X_train, X_test = \
        model_selection.train_test_split(y, X, test_size=20)

    # We can examine traits about the time series:
    pm.tsdisplay(y_train, lag_max=10)

    # We can see the ACF increases and decreases rather rapidly, which means
    # we may need some differencing. There also does not appear to be an
    # obvious seasonal trend.
    n_diffs = arima.ndiffs(y_train, max_d=5)

    # Here's what the featurizer will create for us:
    date_feat = preprocessing.DateFeaturizer(
        column_name="date",  # the name of the date feature in the exog matrix
        with_day_of_week=True,
        with_day_of_month=True)

    _, X_train_feats = date_feat.fit_transform(y_train, X_train)
    print("Head of generated exog features:\n%s" % repr(X_train_feats.head()))

    # We can plug this exog featurizer into a pipeline:
    pipe = pipeline.Pipeline([
        ('date', date_feat),
        ('arima', arima.AutoARIMA(d=n_diffs,
                                  trace=3,
                                  stepwise=True,
                                  suppress_warnings=True,
                                  seasonal=False))
    ])

    pipe.fit(y_train, X_train)

    # Plot our forecasts
    forecasts = pipe.predict(exogenous=X_test)

    fig = plt.figure(figsize=(16, 8))
    ax = fig.add_subplot(1, 1, 1)

    n_train = y_train.shape[0]
    x = np.arange(n_train + forecasts.shape[0])

    ax.plot(x[:n_train], y_train, color='blue', label='Training Data')
    ax.plot(x[n_train:], forecasts, color='green', marker='o',
            label='Predicted')
    ax.plot(x[n_train:], y_test, color='red', label='Actual')

    ax.legend(loc='lower left', borderaxespad=0.5)
    ax.set_title('Predicted Foo')
    ax.set_ylabel('# Foo')

    plt.show()

    # What next? Try combining different featurizers in your pipeline to
    # enhance a model's predictive power.

**Total running time of the script:** ( 0 minutes 9.491 seconds)


.. only :: html

 .. container:: sphx-glr-footer


  .. container:: sphx-glr-download

     :download:`Download Python source code: example_date_featurizer.py <example_date_featurizer.py>`



  .. container:: sphx-glr-download

     :download:`Download Jupyter notebook: example_date_featurizer.ipynb <example_date_featurizer.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_