## Speed Up Forecasting While Preserving Performance

When approaching a forecasting task for the first time, it's good practice to start with standard baseline solutions. In the first part of the project, **we must dedicate our time to understanding the business needs and conducting adequate exploratory analyses**. If we can't wait to build a predictive model, the best option is to fit naive models. This is also useful because it helps us understand the data, adopt an adequate validation strategy, and surface more sophisticated ideas.

After this preliminary phase, **when we are more confident about the achieved results, we can focus on the engineering choices** needed to develop the most adequate solution. There are many activities we could optimize: from data processing to model inference, we have to take care of a lot of aspects to make our solution work at its best.

**Sometimes providing forecasts quickly and effectively is a real need**. In that case, we have to configure our pipeline to produce predictions as fast as possible while maintaining an adequate performance level. Retraining the models from scratch is important but not mandatory. Since retraining may not grant a performance boost or added stability, we risk wasting precious time if we retrain every time we have to forecast. The possibility to reuse models and **make forecasts without a mandatory fit is the first way to speed up forecasting**.

At the same time, **we could increase the speed of forecasting with some simple yet effective tricks**. For example, feature selection is a well-known technique for reducing the dimensionality of the input received by a predictive model. It is an important step in most machine learning pipelines, applied primarily to improve performance. **When discarding features, we also reduce the complexity of the model, which results in lower inference times**.

In this post, **we demonstrate feature selection's effectiveness in reducing the inference time of forecasting while avoiding significant drops in performance**. To facilitate and standardize forecasting with any machine learning model, I developed **tspiral**, a Python package that goes beyond classic recursive forecasting by offering various forecasting techniques. Its tight integration with scikit-learn makes it possible to bring the rich ecosystem built on top of scikit-learn to the time series field.

We simulate multiple time series with an hourly frequency and a double seasonality (daily and weekly). We also add a trend, obtained from a smoothed random walk, to introduce stochastic behavior.
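The article does not report its exact generator, but a series of this kind can be sketched as follows (the amplitudes, smoothing window, and noise scale are illustrative assumptions):

```python
import numpy as np

def simulate_series(n_hours=24 * 7 * 20, seed=0):
    """Hypothetical sketch: hourly data with daily and weekly
    seasonality plus a smoothed random-walk trend."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_hours)
    daily = 10 * np.sin(2 * np.pi * t / 24)          # daily seasonality
    weekly = 20 * np.sin(2 * np.pi * t / (24 * 7))   # weekly seasonality
    walk = np.cumsum(rng.normal(size=n_hours))       # random walk
    kernel = np.ones(24) / 24                        # 24-hour moving average
    trend = np.convolve(walk, kernel, mode='same')   # smoothed trend
    noise = rng.normal(scale=2.0, size=n_hours)
    return daily + weekly + trend + noise

y = simulate_series()
```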

The last part of each series is used as a test set, where we measure forecasting errors and the inference time required to make predictions. For our experiment, we simulate a hundred independent time series. We say "independent" because the series are not related to each other, even though they show very similar behaviors. This allows us to model them individually.

Both recursive and direct forecasting strategies are tested. These methodologies forecast time series using lagged values of the target as input. In other words, to forecast the next hour's value we use the previously available hourly observations, rearranged in a friendlier tabular format. **Carrying out feature selection for time series forecasting is as simple as in standard tabular supervised tasks**. The selection algorithm simply operates on the lagged target features. Below is an example of feature selection using recursive forecasting.

```python
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.feature_selection import SelectFromModel
from tsprial.forecasting import ForecastingCascade

max_lags = 72

recursive_model = ForecastingCascade(
    make_pipeline(
        SelectFromModel(
            Ridge(), threshold='median',
            max_features=max_lags,
        ),
        Ridge()
    ),
    lags=range(1, 169),
    use_exog=False
)
recursive_model.fit(None, y)
selected_lags = recursive_model.estimator_['selectfrommodel'].get_support(indices=True)
```

We use the importance weights of a meta-estimator (the coefficients, in the case of a linear model) to select the important features from the training data. This is a naive and fast way of selecting features, but **feature selection for time series can be carried out with the same techniques usually applied in tabular regression tasks**.
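To make the mechanics concrete, here is a minimal sketch, independent of tspiral, that builds a lagged design matrix by hand and applies the same `SelectFromModel` filter (the series and lag count are illustrative):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.feature_selection import SelectFromModel

rng = np.random.default_rng(0)
series = np.sin(2 * np.pi * np.arange(500) / 24) + rng.normal(scale=0.1, size=500)

n_lags = 48
# each row holds the previous n_lags values; the target is the next value
X = np.stack([series[i:i + n_lags] for i in range(len(series) - n_lags)])
target = series[n_lags:]

# keep only the lags whose coefficient magnitude is above the median
selector = SelectFromModel(Ridge(), threshold='median').fit(X, target)
selected_lags = selector.get_support(indices=True)
```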

In the case of direct forecasting, where we have to fit a separate estimator for each forecasting step, the procedure remains much the same: we make the selection for each forecasting step. Each estimator selects a different subset of important lags. To aggregate the results into a unique set of meaningful lags, we keep the ones that are selected most frequently.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.feature_selection import SelectFromModel
from tsprial.forecasting import ForecastingChain

max_lags = 72

direct_model = ForecastingChain(
    make_pipeline(
        SelectFromModel(
            Ridge(), threshold='median',
        ),
        Ridge()
    ),
    n_estimators=168,
    lags=range(1, 169),
    use_exog=False,
    n_jobs=-1
)
direct_model.fit(None, y)
# aggregate: keep the lags most frequently selected across the estimators
selected_lags = np.argsort(np.asarray([
    est.estimator_['selectfrommodel'].get_support()
    for est in direct_model.estimators_
]).sum(0))[-max_lags:]
```
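To see what the aggregation step does, here is a toy illustration with three hypothetical estimators and five candidate lags:

```python
import numpy as np

# hypothetical boolean support masks from three direct estimators
supports = np.asarray([
    [True,  False, True,  False, False],
    [True,  True,  False, False, False],
    [True,  False, True,  False, True ],
])
counts = supports.sum(0)        # how often each lag was selected: [3 1 2 0 1]
top2 = np.argsort(counts)[-2:]  # indices of the 2 most frequently selected lags
print(sorted(top2))             # → [0, 2]
```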

Lag selection is strictly connected with model performance. In a pure autoregressive context, without additional exogenous variables, lagged target values are the only valuable information for providing good forecasts. Expecting a significant improvement in the error metrics from feature selection may be too optimistic. Let's inspect the results.

We fit three variations of the recursive and direct methodologies. First, we consider all the lags up to 168 hours in the past (*full*). Then, we use only periodical lags (*dummy*). Finally, we fit our models considering only the meaningful lags selected on the training data (*filtered*).
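The article does not spell out the exact lag sets, but plausible configurations for the three variants (an assumption for illustration) might look like this:

```python
# hypothetical lag sets for the three variants (illustrative, not the
# article's exact configuration)
full_lags = list(range(1, 169))                   # every hourly lag up to one week
dummy_lags = [1] + [24 * k for k in range(1, 8)]  # periodical (daily/weekly) lags
# filtered_lags would come from SelectFromModel on the training data
print(len(full_lags), dummy_lags)
```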

The direct approach is the most accurate. As expected, the *full* methodology performs better than the *dummy* and *filtered* ones, but the differences are not so marked: *full* and *filtered* behave almost the same way. Are the same differences also present in the inference times?

The *dummy* method is the fastest since it uses the fewest features. For the same reason, the *filtered* method is faster than the *full* one. Surprisingly, the *filtered* approach takes roughly half the inference time of the *full* one. This is a great result: **we can get good forecasts faster by simply applying a simple feature selection**.
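A minimal way to reproduce such a timing comparison (the models, shapes, and repetition count are illustrative; any fitted estimators with a `predict` method would do):

```python
import time
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X_full = rng.normal(size=(2000, 168))   # all 168 lagged features
X_small = X_full[:, :24]                # a reduced lag subset
y = rng.normal(size=2000)

full = Ridge().fit(X_full, y)
small = Ridge().fit(X_small, y)

def time_predict(model, X, n=200):
    """Total wall-clock time for n repeated predict calls."""
    start = time.perf_counter()
    for _ in range(n):
        model.predict(X)
    return time.perf_counter() - start

t_full = time_predict(full, X_full)
t_small = time_predict(small, X_small)
# fewer input features generally means faster inference
```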

In this post, we leveraged the ability of tspiral to handle time series simply by using scikit-learn. This simplified the identification of meaningful autoregressive lags and made it possible to apply feature selection to time series. In the end, we saw how to decrease the inference time of forecasting by simply applying an adequate lag selection.

*Republished from [Time Series Forecasting with Feature Selection: Why you may need it](https://towardsdatascience.com/time-series-forecasting-with-feature-selection-why-you-may-need-it-696b23ecc329), originally published on Towards Data Science.*