Modeling quasi-seasonal trends with date features
Some trends recur often enough to look seasonal, yet too irregularly for a seasonal model to be valid. The “end-of-the-month” effect is one such case. This example shows how to create date features that capture these quasi-seasonal trends without fitting a seasonal model.
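To make the idea of a date feature concrete, here is a minimal sketch using only the Python standard library. It assumes the eight generated columns are seven one-hot weekday indicators plus the day of the month, which is inferred from the truncated column header in the output below. The helper name `date_features` and the start date are hypothetical and are not part of pmdarima.

```python
from datetime import date, timedelta


def date_features(d):
    """Return one-hot weekday flags plus the day of the month for a date."""
    # date.weekday() returns 0 for Monday through 6 for Sunday
    feats = {"DATE-WEEKDAY-%d" % i: int(d.weekday() == i) for i in range(7)}
    feats["DATE-DAY-OF-MONTH"] = d.day
    return feats


# Hypothetical start date, chosen only for illustration
start = date(2020, 1, 1)
rows = [date_features(start + timedelta(days=i)) for i in range(5)]
for row in rows:
    print(row)
```

Each row has exactly one weekday flag set, so a non-seasonal model can learn a separate offset per weekday from these columns.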
Out:
pmdarima version: 0.0.0
Head of generated exog features:
   DATE-WEEKDAY-0  ...  DATE-DAY-OF-MONTH
0               0  ...                  1
1               0  ...                  2
2               0  ...                  3
3               0  ...                  4
4               0  ...                  5
[5 rows x 8 columns]
Performing stepwise search to minimize aic
Fit ARIMA(2,1,2)x(0,0,0,0) [intercept=True]; AIC=2839.502, BIC=2891.632, Time=0.828 seconds
Fit ARIMA(0,1,0)x(0,0,0,0) [intercept=True]; AIC=2864.731, BIC=2901.967, Time=0.031 seconds
Fit ARIMA(1,1,0)x(0,0,0,0) [intercept=True]; AIC=2861.910, BIC=2902.869, Time=0.357 seconds
Fit ARIMA(0,1,1)x(0,0,0,0) [intercept=True]; AIC=2859.092, BIC=2900.051, Time=0.492 seconds
Fit ARIMA(0,1,0)x(0,0,0,0) [intercept=False]; AIC=2862.772, BIC=2896.284, Time=0.227 seconds
Fit ARIMA(1,1,2)x(0,0,0,0) [intercept=True]; AIC=2837.832, BIC=2886.238, Time=0.757 seconds
Near non-invertible roots for order (1, 1, 2)(0, 0, 0, 0); setting score to inf (at least one inverse root too close to the border of the unit circle: 0.998)
Fit ARIMA(2,1,1)x(0,0,0,0) [intercept=True]; AIC=2837.790, BIC=2886.197, Time=0.727 seconds
Near non-invertible roots for order (2, 1, 1)(0, 0, 0, 0); setting score to inf (at least one inverse root too close to the border of the unit circle: 0.999)
Fit ARIMA(3,1,2)x(0,0,0,0) [intercept=True]; AIC=2841.291, BIC=2897.145, Time=0.883 seconds
Near non-invertible roots for order (3, 1, 2)(0, 0, 0, 0); setting score to inf (at least one inverse root too close to the border of the unit circle: 0.998)
Fit ARIMA(2,1,3)x(0,0,0,0) [intercept=True]; AIC=2841.239, BIC=2897.093, Time=1.013 seconds
Near non-invertible roots for order (2, 1, 3)(0, 0, 0, 0); setting score to inf (at least one inverse root too close to the border of the unit circle: 0.998)
Fit ARIMA(1,1,1)x(0,0,0,0) [intercept=True]; AIC=2835.791, BIC=2880.474, Time=0.811 seconds
Near non-invertible roots for order (1, 1, 1)(0, 0, 0, 0); setting score to inf (at least one inverse root too close to the border of the unit circle: 0.999)
Fit ARIMA(1,1,3)x(0,0,0,0) [intercept=True]; AIC=2838.701, BIC=2890.831, Time=1.173 seconds
Near non-invertible roots for order (1, 1, 3)(0, 0, 0, 0); setting score to inf (at least one inverse root too close to the border of the unit circle: 0.996)
Fit ARIMA(3,1,1)x(0,0,0,0) [intercept=True]; AIC=2838.972, BIC=2891.102, Time=0.781 seconds
Near non-invertible roots for order (3, 1, 1)(0, 0, 0, 0); setting score to inf (at least one inverse root too close to the border of the unit circle: 0.998)
Fit ARIMA(3,1,3)x(0,0,0,0) [intercept=True]; AIC=2840.653, BIC=2900.231, Time=1.232 seconds
Near non-invertible roots for order (3, 1, 3)(0, 0, 0, 0); setting score to inf (at least one inverse root too close to the border of the unit circle: 1.000)
Final model order: (2, 1, 2)x(0, 0, 0, 0) (constant=True)
Total fit time: 9.319 seconds
print(__doc__)
# Author: Taylor Smith <taylor.smith@alkaline-ml.com>
import pmdarima as pm
from pmdarima import arima
from pmdarima import model_selection
from pmdarima import pipeline
from pmdarima import preprocessing
from pmdarima.datasets._base import load_date_example
import numpy as np
from matplotlib import pyplot as plt
print("pmdarima version: %s" % pm.__version__)
# Load the data and split it into separate pieces
y, X = load_date_example()
y_train, y_test, X_train, X_test = \
    model_selection.train_test_split(y, X, test_size=20)
# We can examine traits about the time series:
pm.tsdisplay(y_train, lag_max=10)
# We can see the ACF increases and decreases rather rapidly, which means we may
# need some differencing. There also does not appear to be an obvious seasonal
# trend.
n_diffs = arima.ndiffs(y_train, max_d=5)
# Here's what the featurizer will create for us:
date_feat = preprocessing.DateFeaturizer(
    column_name="date",  # the name of the date feature in the exog matrix
    with_day_of_week=True,
    with_day_of_month=True)
_, X_train_feats = date_feat.fit_transform(y_train, X_train)
print("Head of generated exog features:\n%s" % repr(X_train_feats.head()))
# We can plug this exog featurizer into a pipeline:
pipe = pipeline.Pipeline([
    ('date', date_feat),
    ('arima', arima.AutoARIMA(d=n_diffs,
                              trace=3,
                              stepwise=True,
                              suppress_warnings=True,
                              seasonal=False))
])
pipe.fit(y_train, X_train)
# Plot our forecasts
forecasts = pipe.predict(exogenous=X_test)
fig = plt.figure(figsize=(16, 8))
ax = fig.add_subplot(1, 1, 1)
n_train = y_train.shape[0]
x = np.arange(n_train + forecasts.shape[0])
ax.plot(x[:n_train], y_train, color='blue', label='Training Data')
ax.plot(x[n_train:], forecasts, color='green', marker='o',
        label='Predicted')
ax.plot(x[n_train:], y_test, color='red', label='Actual')
ax.legend(loc='lower left', borderaxespad=0.5)
ax.set_title('Predicted Foo')
ax.set_ylabel('# Foo')
plt.show()
# What next? Try combining different featurizers in your pipeline to enhance
# a model's predictive power.
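As one possible next step, the "end-of-the-month" effect mentioned in the introduction can be expressed directly as a feature. Because months differ in length, the day of the month alone does not say how close a date is to month end. The sketch below is an illustration under stated assumptions, not part of pmdarima: the function name `month_end_features`, the column names, and the `window` parameter are all hypothetical.

```python
import calendar
from datetime import date


def month_end_features(d, window=3):
    """Return the days remaining in d's month and an end-of-month flag."""
    # calendar.monthrange returns (weekday of the 1st, number of days in month)
    days_in_month = calendar.monthrange(d.year, d.month)[1]
    days_left = days_in_month - d.day
    return {
        "days_to_month_end": days_left,
        # Flag dates that fall within the last `window` days of the month
        "is_month_end": int(days_left < window),
    }


# February 2020 has 29 days, so the 27th is two days before month end
late_feb = month_end_features(date(2020, 2, 27))
mid_jan = month_end_features(date(2021, 1, 15))
```

Columns like these could be added to the exogenous matrix alongside the featurizer output, so that the model sees proximity to month end regardless of how long the month is.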
Total running time of the script: (0 minutes 9.491 seconds)