Time Series Forecasting Analysis (2): How to Develop a Skillful Machine Learning Time Series Forecasting Model

全心思
2023-12-01

Process Overview

The goal of this process is to get a “good enough” forecast model as fast as possible.

This process may or may not deliver the best possible model, but it will deliver a good model: a model that is better than a baseline prediction, if such a model exists.

Typically, this process will deliver a model that is 80% to 90% of what can be achieved on the problem.

The process is fast. As such, it focuses on automation. Hyperparameters are searched rather than specified based on careful analysis. You are encouraged to test suites of models in parallel, rapidly getting an idea of what works and what doesn’t.

Nevertheless, the process is flexible, allowing you to circle back or go as deep as you like on a given step if you have the time and resources.

This process is divided into four parts; they are:

  • Define Problem
  • Design Test Harness
  • Test Models
  • Finalize Model

You will notice that the process is different from a classical linear work-through of a predictive modeling problem. This is because it is designed to get a working forecast model fast and then slow down and see if you can get a better model.

How to Use This Process

The biggest mistake is skipping steps.

For example, the mistake that almost all beginners make is going straight to modeling without a strong idea of what problem is being solved or how to robustly evaluate candidate solutions. This almost always results in a lot of wasted time.

Slow down, follow the process, and complete each step.

I recommend having separate code for each experiment that can be re-run at any time.

This is important so that you can circle back when you discover a bug, fix the code, and re-run an experiment. You are running experiments and iterating quickly, but if you are sloppy, then you cannot trust any of your results. This is especially important when it comes to the design of your test harness for evaluating candidate models.

Let’s take a closer look at each step of the process.

1. Define Problem

Define your time series problem.

Some topics to consider and motivating questions within each topic are as follows:

  • Inputs vs. Outputs: What are the inputs and outputs for a forecast?
  • Endogenous vs. Exogenous: What are the endogenous and exogenous variables?
  • Unstructured vs. Structured: Are the time series variables unstructured or structured?
  • Regression vs. Classification: Are you working on a regression or classification predictive modeling problem? What are some alternate ways to frame your time series forecasting problem?
  • Univariate vs. Multivariate: Are you working on a univariate or multivariate time series problem?
  • Single-step vs. Multi-step: Do you require a single-step or a multi-step forecast?
  • Static vs. Dynamic: Do you require a static or a dynamically updated model?

Answer each question even if you have to estimate or guess.

Some useful tools to help get answers include:

  • Data visualizations (e.g. line plots, etc.).
  • Statistical analysis (e.g. ACF/PACF plots, etc.; see the sketch after this list).
  • Domain experts.
  • Project stakeholders.

Update your answers to these questions as you learn more.
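As a quick illustration of the first two tools, here is a minimal sketch in Python that draws a line plot and ACF/PACF plots for a placeholder series; the data and plot layout are assumptions for the example only.

```python
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# placeholder series: a trend plus noise, standing in for your data
series = np.arange(120) * 0.5 + np.random.randn(120)

fig, axes = plt.subplots(3, 1, figsize=(8, 9))
axes[0].plot(series)                  # simple line plot of the raw series
axes[0].set_title('Line plot')
plot_acf(series, ax=axes[1])          # autocorrelation function
plot_pacf(series, ax=axes[2])         # partial autocorrelation function
plt.tight_layout()
plt.show()
```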

2. Design Test Harness

Design a test harness that you can use to evaluate candidate models.

This includes both the method used to estimate model skill and the metric used to evaluate predictions.

Below is a common time series forecasting model evaluation scheme if you are looking for ideas:

  • Split the dataset into a train and test set.
  • Fit a candidate approach on the training dataset.
  • Make predictions on the test set directly or using walk-forward validation.
  • Calculate a metric that compares the predictions to the expected values.

The test harness must be robust and you must have complete trust in the results it provides.
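As an illustration, below is a minimal sketch of such a harness in Python: a placeholder series, a simple train/test split, walk-forward validation with a persistence forecast, and RMSE as the metric. The data and split size are arbitrary assumptions for the example.

```python
from math import sqrt
from sklearn.metrics import mean_squared_error

# placeholder univariate series standing in for your data
data = [float(i) + (i % 7) for i in range(100)]

# split into train and test sets
n_test = 20
train, test = data[:-n_test], data[-n_test:]

# walk-forward validation with a persistence (naive) forecast
history = list(train)
predictions = []
for t in range(len(test)):
    yhat = history[-1]        # persistence: predict the last observed value
    predictions.append(yhat)
    history.append(test[t])   # record the true observation before the next step

rmse = sqrt(mean_squared_error(test, predictions))
print('Persistence RMSE: %.3f' % rmse)
```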

An important consideration is to ensure that any coefficients used for data preparation are estimated from the training dataset only and then applied on the test set. This might include mean and standard deviation in the case of data standardization.
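For instance, here is a sketch of this idea using scikit-learn's StandardScaler on placeholder data: the mean and standard deviation are estimated from the training portion only and then applied to both splits.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# placeholder series, shaped as a column for scikit-learn
series = np.arange(100, dtype=float).reshape(-1, 1)
train, test = series[:80], series[80:]

# estimate the mean and standard deviation from the training data only
scaler = StandardScaler()
scaler.fit(train)

# apply the same coefficients to both splits
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)

# predictions should be inverted back to the original scale before scoring:
# preds = scaler.inverse_transform(preds_scaled)
```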

3. Test Models

Test many models using your test harness.

I recommend carefully designing experiments to test a suite of configurations for standard models and letting them run. Each experiment can record results to a file, to allow you to quickly discover the top three to five most skilful configurations from each run.
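A minimal sketch of this bookkeeping, assuming each experiment yields a (configuration name, error) pair: results are appended to a CSV file so the best few configurations can be recovered later. The file name and scores are illustrative.

```python
import csv

def save_result(filename, name, score):
    # append one (configuration, score) row per completed experiment
    with open(filename, 'a', newline='') as f:
        csv.writer(f).writerow([name, score])

def top_results(filename, n=5):
    # load every recorded result and return the n lowest-error configurations
    with open(filename, newline='') as f:
        rows = [(name, float(score)) for name, score in csv.reader(f)]
    return sorted(rows, key=lambda r: r[1])[:n]

# hypothetical usage with made-up scores
save_result('results.csv', 'persistence-lag1', 7.32)
save_result('results.csv', 'arima-1-1-1', 5.18)
print(top_results('results.csv', n=3))
```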

Some common classes of methods that you can design experiments around include the following:

  • Baseline.
    Persistence (grid search the lag observation that is persisted; see the sketch after this list).
    Rolling moving average.
  • Autoregression.
    ARMA for stationary data.
    ARIMA for data with a trend.
    SARIMA for data with seasonality.
  • Exponential Smoothing.
    Simple Smoothing.
    Holt-Winters Smoothing.
  • Linear Machine Learning.
    Linear Regression.
    Ridge Regression.
    Lasso Regression.
    Elastic Net Regression.
    ….
  • Nonlinear Machine Learning.
    k-Nearest Neighbors.
    Classification and Regression Trees.
    Support Vector Regression.
  • Ensemble Machine Learning.
    Bagging.
    Boosting.
    Random Forest.
    Gradient Boosting.
  • Deep Learning.
    MLP.
    CNN.
    LSTM.
    Hybrids.

This list is based on a univariate time series forecasting problem, but you can adapt it for the specifics of your problem, e.g. use VAR/VARMA/etc. in the case of multivariate time series forecasting.
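As referenced in the Baseline item above, here is a minimal sketch of grid searching the persisted lag on a placeholder series; the series, test size, and lag range are assumptions for the example.

```python
from math import sqrt
from sklearn.metrics import mean_squared_error

# placeholder series with a rough repeating pattern
data = [float(i % 12) for i in range(120)]
n_test = 24
train, test = data[:-n_test], data[-n_test:]

def persistence_rmse(train, test, lag):
    # predict each test value with the observation `lag` steps earlier
    history = list(train)
    predictions = []
    for t in range(len(test)):
        predictions.append(history[-lag])
        history.append(test[t])
    return sqrt(mean_squared_error(test, predictions))

# grid search the persisted lag and keep the best configuration
scores = {lag: persistence_rmse(train, test, lag) for lag in range(1, 25)}
best_lag = min(scores, key=scores.get)
print('best lag=%d RMSE=%.3f' % (best_lag, scores[best_lag]))
```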

Slot in more of your favorite classical time series forecasting methods and machine learning methods as you see fit.

Order here is important and is structured in increasing complexity from classical to modern methods. Early approaches are simple and give good results fast; later approaches are slower and more complex, but also have a higher bar to clear to be skillful.

The resulting model skill can be used in a ratchet. For example, the skill of the best persistence configuration provides a baseline that all other models must outperform. If an autoregression model does better than persistence, it becomes the new level to outperform in order for a method to be considered skillful.

Ideally, you want to exhaust each level before moving on to the next. E.g. get the most out of Autoregression methods and use the results as a new baseline to define “skilful” before moving on to Exponential Smoothing methods.

I put deep learning at the end as generally neural networks are poor at time series forecasting, but there is still a lot of room for improvement and experimentation in this area.

Directions for Model Improvement

The more time and resources that you have, the more configurations that you can evaluate.

For example, with more time and resources, you could:

  • Search model configurations at a finer resolution around a configuration known to already perform well.
  • Search more model hyperparameter configurations.
  • Use analysis to set better bounds on model hyperparameters to be searched.
  • Use domain knowledge to better prepare data or engineer input features.
  • Explore different, potentially more complex methods.
  • Explore ensembles of well-performing base models.

I also encourage you to include data preparation schemes as hyperparameters for model runs.

Some methods will perform some basic data preparation, such as differencing in ARIMA; nevertheless, it is often unclear exactly what data preparation schemes, or combinations of schemes, are required to best present a dataset to a modeling algorithm. Rather than guess, grid search and decide based on real results.
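One way to sketch this is to treat the preparation scheme itself as a searched configuration. The example below compares no scaling, standardization, and normalization around a single illustrative one-lag linear model, inverting the transform before scoring so the schemes are comparable on the original scale; the data and model are assumptions, not a prescription.

```python
from math import sqrt

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# placeholder series standing in for your data
series = np.arange(120, dtype=float) + np.random.randn(120)

def evaluate(series, scaler, n_test=24):
    values = series.reshape(-1, 1)
    train, test = values[:-n_test], values[-n_test:]
    if scaler is not None:
        scaler.fit(train)                      # coefficients from train only
        train, test = scaler.transform(train), scaler.transform(test)
    # one-lag linear autoregression as a stand-in candidate model
    X_train, y_train = train[:-1], train[1:].ravel()
    X_test, y_test = test[:-1], test[1:].ravel()
    preds = LinearRegression().fit(X_train, y_train).predict(X_test)
    if scaler is not None:
        # invert the transform so every scheme is scored on the original scale
        preds = scaler.inverse_transform(preds.reshape(-1, 1)).ravel()
        y_test = scaler.inverse_transform(y_test.reshape(-1, 1)).ravel()
    return sqrt(mean_squared_error(y_test, preds))

# treat the preparation scheme itself as a searched hyperparameter
schemes = {'none': None, 'standardize': StandardScaler(), 'normalize': MinMaxScaler()}
results = {name: evaluate(series, scaler) for name, scaler in schemes.items()}
best = min(results, key=results.get)
print('best scheme: %s (RMSE=%.3f)' % (best, results[best]))
```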

Data Preparation Schemes

Some data preparation schemes to consider include:

  • Differencing to remove a trend.
  • Seasonal differencing to remove seasonality.
  • Standardize to center.
  • Normalize to rescale.
  • Power Transform to make normal.

So much searching can be slow.

Speeding Up Model Evaluation

Some ideas to speed up the evaluation of models include:

  • Use multiple machines in parallel via cloud hardware (such as Amazon EC2).
  • Reduce the size of the train or test dataset to make the evaluation process faster.
  • Use a more coarse grid of hyperparameters and circle back if you have time later.
  • Perhaps do not refit a model for each step in walk-forward validation (one interpretation is sketched below).
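One interpretation of that last idea is sketched below: the model is fit once on the training data, and walk-forward evaluation then only rolls the inputs forward without re-estimating coefficients. The lag feature, model, and data are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# placeholder series and a single-lag framing (illustrative choices)
series = np.sin(np.arange(200) / 5.0)
X, y = series[:-1].reshape(-1, 1), series[1:]
n_test = 40
X_train, y_train = X[:-n_test], y[:-n_test]
X_test, y_test = X[-n_test:], y[-n_test:]

# fit the model once on the training data only ...
model = LinearRegression().fit(X_train, y_train)

# ... then walk forward without refitting: each step still uses the true
# previous observation as input, but the coefficients stay fixed
predictions = [model.predict(x.reshape(1, -1))[0] for x in X_test]
rmse = np.sqrt(np.mean((np.array(predictions) - y_test) ** 2))
print('fixed-model walk-forward RMSE: %.3f' % rmse)
```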

4. Finalize Model

At the end of the previous step, you know whether your time series is predictable.

If it is predictable, you will have a list of the top 5 to 10 candidate models that are skillful on the problem.

You can pick one or multiple models and finalize them. This involves training a new final model on all available historical data (train and test).
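A minimal sketch of finalizing, assuming a simple one-lag linear model stands in for the chosen configuration: refit on all available history, save the model to file, and make a one-step forecast. The data and model are placeholders for whatever your harness actually selected.

```python
import pickle
import numpy as np
from sklearn.linear_model import LinearRegression

# placeholder train/test arrays standing in for the data used by the harness
train = np.sin(np.arange(160) / 5.0)
test = np.sin(np.arange(160, 200) / 5.0)

# refit the chosen configuration on all available history (train + test)
series = np.concatenate([train, test])
X, y = series[:-1].reshape(-1, 1), series[1:]   # illustrative one-lag framing
final_model = LinearRegression().fit(X, y)

# save the finalized model to file for later use in making predictions
with open('final_model.pkl', 'wb') as f:
    pickle.dump(final_model, f)

# one-step forecast for the future from the last observed value
next_value = final_model.predict(series[-1:].reshape(1, -1))[0]
print('next forecast: %.3f' % next_value)
```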

The model is ready for use; for example:

  • Make a prediction for the future.
  • Save the model to file for later use in making predictions.
  • Incorporate the model into software for making predictions.

If you have time, you can always circle back to the previous step and see if you can further improve upon the final model.

This may be required periodically if the data changes significantly over time.
