pmdarima.preprocessing.DateFeaturizer

class pmdarima.preprocessing.DateFeaturizer(column_name, with_day_of_week=True, with_day_of_month=True, prefix=None)[source]

Create exogenous date features

Given an exogenous feature of dtype pd.Timestamp, creates a set of dummy and ordinal variables indicating:

  • Day of the week
    Particular days of the week may align with quasi-seasonal trends.
  • Day of the month
    Useful for modeling patterns such as the end-of-month effect, e.g., a department spends the remainder of its monthly budget to avoid future budget cuts, so the last Friday of the month is heavy on spending.
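
To make the two feature types concrete, here is a minimal plain-Python sketch of what "dummy variables for the day of the week" and "an ordinal day of the month" mean for a few dates. It is an illustration only, not the featurizer's implementation: the helper name date_features is hypothetical, and the Monday=0 ordering is an assumption borrowed from Python's date.weekday().

```python
from datetime import date, timedelta


def date_features(d):
    """Return (weekday dummies, day of month) for one date.

    Hypothetical helper for illustration; not part of pmdarima.
    """
    # Assumption: Monday=0 ... Sunday=6, as in date.weekday().
    weekday_dummies = [1 if d.weekday() == i else 0 for i in range(7)]
    # The day of the month is kept as a single ordinal value (1-31).
    return weekday_dummies, d.day


start = date(2019, 1, 1)  # an arbitrary example date (a Tuesday)
rows = [date_features(start + timedelta(days=k)) for k in range(3)]

for dummies, day_of_month in rows:
    print(dummies, day_of_month)
```

Each date yields exactly one active weekday dummy plus one ordinal value, which matches the shape of the transformed output shown in the Examples section.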

The motivation for this featurizer comes from a blog post by Rob Hyndman [1] on modeling quasi-seasonal patterns in time series. Note that an exogenous array _must_ be provided at inference.

Parameters:

column_name : str

The name of the date column. Because the column is looked up by name, the exogenous array must be a pandas DataFrame; an np.ndarray is not permitted, unlike with some other featurizers.

with_day_of_week : bool, optional (default=True)

Whether to include dummy variables for the day of the week (in {0, 1}).

with_day_of_month : bool, optional (default=True)

Whether to include an ordinal feature for the day of the month (1-31).

prefix : str or None, optional (default=None)

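The prefix for the generated feature names. In the Examples section, the generated columns are named DATE-WEEKDAY-0 through DATE-WEEKDAY-6 and DATE-DAY-OF-MONTH.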

Notes

  • To use time series with holes, an X array must be provided at prediction time. Other featurizers automatically create exogenous arrays into the future for inference, but this is not currently possible with the date featurizer. Your code must provide the dates for which you are forecasting as exogenous features.
  • The column_name field is dropped in the transformed exogenous array.
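
Because the forecast dates must be supplied by the caller, it can help to see one way of building them. The sketch below assumes a daily series with no gaps and a date column named 'date'; both the column name and the X_train / X_future variable names are assumptions made for illustration, and only pandas calls are used.

```python
import pandas as pd

# Hypothetical training frame; the 'date' column name is an assumption.
X_train = pd.DataFrame(
    {"date": pd.date_range("2019-01-01", periods=5, freq="D")}
)

n_periods = 3  # number of steps to forecast

# Start one day after the last observed date and extend n_periods days.
last_date = X_train["date"].iloc[-1]
future_dates = pd.date_range(
    last_date + pd.Timedelta(days=1), periods=n_periods, freq="D"
)

# The frame to supply as the exogenous array at prediction time.
X_future = pd.DataFrame({"date": future_dates})
print(X_future)
```

A frame like X_future would then be passed as the X array when forecasting. For a series with holes, the dates would instead be listed explicitly rather than generated with a fixed frequency.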

References

[1] https://robjhyndman.com/hyndsight/monthly-seasonality/

Examples

>>> from pmdarima.datasets._base import load_date_example
>>> from pmdarima.preprocessing import DateFeaturizer
>>> y, X = load_date_example()
>>> feat = DateFeaturizer(column_name='date')
>>> _, X_prime = feat.fit_transform(y, X)
>>> X_prime.head()
   DATE-WEEKDAY-0  DATE-WEEKDAY-1  ...  DATE-WEEKDAY-6  DATE-DAY-OF-MONTH
0               0               1  ...               0                  1
1               0               0  ...               0                  2
2               0               0  ...               0                  3
3               0               0  ...               0                  4
4               0               0  ...               0                  5

Methods

fit(y[, X]) Fit the transformer
fit_transform(y[, X]) Fit and transform the arrays
get_params([deep]) Get parameters for this estimator.
set_params(**params) Set the parameters of this estimator.
transform(y[, X]) Create date features

__init__(column_name, with_day_of_week=True, with_day_of_month=True, prefix=None)[source]

Initialize self. See help(type(self)) for accurate signature.

fit(y, X=None, **kwargs)[source]

Fit the transformer

Parameters:

y : array-like or None, shape=(n_samples,)

The endogenous (time-series) array.

X : array-like, shape=(n_samples, n_features)

The exogenous array of additional covariates. Must include the column_name feature, which must be of dtype pd.Timestamp.

fit_transform(y, X=None, **kwargs)[source]

Fit and transform the arrays

Parameters:

y : array-like or None, shape=(n_samples,)

The endogenous (time-series) array.

X : array-like or None, shape=(n_samples, n_features), optional

The exogenous array of additional covariates.

**kwargs : keyword args

Keyword arguments required by the transform function.

get_params(deep=True)[source]

Get parameters for this estimator.

Parameters:

deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params : mapping of string to any

Parameter names mapped to their values.

set_params(**params)[source]

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params : dict

Estimator parameters.

Returns:

self : object

Estimator instance.

transform(y, X=None, **kwargs)[source]

Create date features

When an ARIMA is fit with an X array, it must also be forecast with one. Unlike other exogenous featurizers, however, the DateFeaturizer cannot create the future features itself, so an X array containing the forecast dates is always required at inference time.

Parameters:

y : array-like or None, shape=(n_samples,)

The endogenous (time-series) array. This is unused and technically optional for the date features, since they are computed from the column_name feature of X.

X : array-like, shape=(n_samples, n_features)

The exogenous array of additional covariates. The column_name feature must be present and of dtype pd.Timestamp.
