pmdarima.preprocessing.DateFeaturizer

class pmdarima.preprocessing.DateFeaturizer(column_name, with_day_of_week=True, with_day_of_month=True, prefix=None)
Create exogenous date features
Given an exogenous feature of dtype pd.Timestamp, creates a set of dummy and ordinal variables indicating:
- Day of the week
  - Particular days of the week may align with quasi-seasonal trends.
- Day of the month
  - Useful for modeling things like the end-of-month effect, i.e., a department spends the remainder of its monthly budget to avoid future budget cuts, and the last Friday of the month is heavy on spending.
The motivation for this featurizer comes from a blog post by Rob Hyndman [1] on modeling quasi-seasonal patterns in time series. Note that an exogenous array _must_ be provided at inference.
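As a rough illustration of the two feature families described above, the following standard-library sketch builds one row of features per date. It is not the library's implementation: the helper name sketch_date_features is hypothetical, the column names simply mimic the documented example output, and the Monday=0 weekday numbering is an assumption borrowed from Python's datetime module.

```python
from datetime import date


def sketch_date_features(dates, prefix="DATE"):
    """Illustrative only: build weekday dummies and a day-of-month ordinal."""
    rows = []
    for d in dates:
        # Seven dummy variables in {0, 1}; exactly one is set per date.
        # Assumption: weekday numbering follows datetime (Monday=0).
        row = {f"{prefix}-WEEKDAY-{i}": int(d.weekday() == i) for i in range(7)}
        # One ordinal variable in 1-31.
        row[f"{prefix}-DAY-OF-MONTH"] = d.day
        rows.append(row)
    return rows


# The last Friday of May 2019, followed by the first day of June.
features = sketch_date_features([date(2019, 5, 31), date(2019, 6, 1)])
```

Each row holds eight values: seven weekday dummies, of which one is set, and the day of the month.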
Parameters: column_name : str
The name of the date column. This forces the exogenous array to be a Pandas DataFrame, and does not permit a np.ndarray as other featurizers may.
with_day_of_week : bool, optional (default=True)
Whether to include dummy variables for the day of the week (in {0, 1}).
with_day_of_month : bool, optional (default=True)
Whether to include an ordinal feature for the day of the month (1-31).
prefix : str or None, optional (default=None)
The prefix prepended to the generated feature names.
Notes
- In order to use time series with holes, it is required that an X array be provided at prediction time. Other featurizers automatically create exog arrays into the future for inference, but this is not possible currently with the date featurizer. Your code must provide the dates for which you are forecasting as exog features.
- The column_name field is dropped in the transformed exogenous array.
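Because the forecast dates must be supplied by the caller, one way to prepare them is to extend the last observed date with pandas. This is a minimal sketch under assumptions: the column is named 'date', the data are daily, and the names history, n_periods and X_future are placeholders rather than anything defined by the library.

```python
import pandas as pd

# Hypothetical training frame with a daily date column.
history = pd.DataFrame(
    {"date": pd.date_range("2019-01-01", periods=5, freq="D")}
)

n_periods = 3  # number of steps to forecast (assumed)

# Start one day after the last observed date and generate n_periods dates.
last_date = history["date"].iloc[-1]
future_dates = pd.date_range(
    start=last_date + pd.Timedelta(days=1), periods=n_periods, freq="D"
)

# The future exogenous frame must carry the same date column name.
X_future = pd.DataFrame({"date": future_dates})
```

A frame like X_future would then be passed as the X array at prediction time, with one row per forecast period.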
References
[1] https://robjhyndman.com/hyndsight/monthly-seasonality/
Examples
>>> from pmdarima.preprocessing import DateFeaturizer
>>> from pmdarima.datasets._base import load_date_example
>>> y, X = load_date_example()
>>> feat = DateFeaturizer(column_name='date')
>>> _, X_prime = feat.fit_transform(y, X)
>>> X_prime.head()
   DATE-WEEKDAY-0  DATE-WEEKDAY-1  ...  DATE-WEEKDAY-6  DATE-DAY-OF-MONTH
0               0               1  ...               0                  1
1               0               0  ...               0                  2
2               0               0  ...               0                  3
3               0               0  ...               0                  4
4               0               0  ...               0                  5
Methods
fit(y[, X])              Fit the transformer
fit_transform(y[, X])    Fit and transform the arrays
get_params([deep])       Get parameters for this estimator.
set_params(**params)     Set the parameters of this estimator.
transform(y[, X])        Create date features
__init__(column_name, with_day_of_week=True, with_day_of_month=True, prefix=None)
Initialize self. See help(type(self)) for accurate signature.
fit(y, X=None, **kwargs)
Fit the transformer
Parameters: y : array-like or None, shape=(n_samples,)
The endogenous (time-series) array.
X : array-like, shape=(n_samples, n_features)
The exogenous array of additional covariates. Must include the column_name feature, which must be a pd.Timestamp dtype.
fit_transform(y, X=None, **kwargs)
Fit and transform the arrays
Parameters: y : array-like or None, shape=(n_samples,)
The endogenous (time-series) array.
X : array-like or None, shape=(n_samples, n_features), optional
The exogenous array of additional covariates.
**kwargs : keyword args
Keyword arguments required by the transform function.
get_params(deep=True)
Get parameters for this estimator.
Parameters: deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params : mapping of string to any
Parameter names mapped to their values.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters: **params : dict
Estimator parameters.
Returns: self : object
Estimator instance.
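To make the <component>__<parameter> convention concrete, the sketch below shows how such keys can be separated from an estimator's own parameters by splitting on the first double underscore. It illustrates the naming scheme only and is not the library's code; the helper split_nested_params and the component name 'date' are hypothetical.

```python
def split_nested_params(params):
    """Illustrative only: group '<component>__<parameter>' keys by component."""
    own, nested = {}, {}
    for key, value in params.items():
        # partition splits on the first '__' only, so deeper nesting
        # (a__b__c) stays intact in the remainder for the component to handle.
        component, delim, sub_key = key.partition("__")
        if delim:
            nested.setdefault(component, {})[sub_key] = value
        else:
            own[key] = value
    return own, nested


# 'prefix' targets the object itself; 'date__with_day_of_week' targets
# a (hypothetical) nested component named 'date'.
own, nested = split_nested_params(
    {"prefix": "DT", "date__with_day_of_week": False}
)
```

Here own collects the parameters addressed to the object itself, while nested maps each component name to the parameters that would be forwarded to it.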
transform(y, X=None, **kwargs)
Create date features
When an ARIMA is fit with an X array, forecasts must also be made with one. Unlike other exogenous featurizers, however, the DateFeaturizer cannot build that array itself, so an X array containing the forecast dates is required at inference time.
Parameters: y : array-like or None, shape=(n_samples,)
The endogenous (time-series) array. This is unused and technically optional for the date featurizer.
X : array-like, shape=(n_samples, n_features)
The exogenous array of additional covariates. The column_name feature must be present, and of dtype pd.Timestamp.