Time series forecasting is one of the most important topics in data science. Almost every business needs to predict the future in order to make better decisions and allocate resources more effectively.
This repository provides examples and best practice guidelines for building forecasting solutions. Its goal is to offer a comprehensive set of tools and examples that leverage recent advances in forecasting algorithms to build solutions and operationalize them. Rather than creating implementations from scratch, we draw from existing state-of-the-art libraries and build additional utilities around processing and featurizing the data, optimizing and evaluating models, and scaling up to the cloud.
The examples and best practices are provided as Python Jupyter notebooks, R markdown files, and a library of utility functions. We hope that these examples and utilities can significantly reduce the “time to market” by simplifying the path from defining the business problem to developing a solution. In addition, the example notebooks serve as guidelines and showcase best practices and usage of the tools in both Python and R.
We've carried out a cleanup of large obsolete files to reduce the size of this repo. If you had cloned or forked it previously, please delete and clone/fork it again to avoid any potential merge conflicts.
The following is a summary of the models and methods for developing forecasting solutions covered in this repository. The examples are organized by use case. Currently, we focus on a retail sales forecasting use case, as it is widely used in assortment planning, inventory optimization, and price optimization. To enable high-throughput forecasting scenarios, we have included examples for forecasting multiple time series with distributed training techniques such as Ray in Python, the parallel package in R, and multi-threading in LightGBM. Note that HTML links are provided next to the R examples for the best viewing experience when reading this document on our github.io page.
Model | Language | Description |
---|---|---|
Auto ARIMA | Python | Auto Regressive Integrated Moving Average (ARIMA) model that is automatically selected |
Linear Regression | Python | Linear regression model trained on lagged features of the target variable and external features |
LightGBM | Python | Gradient boosting decision tree implemented with the LightGBM package for high accuracy and fast speed |
DilatedCNN | Python | Dilated Convolutional Neural Network that captures long-range temporal flow with dilated causal connections |
Mean Forecast (.html) | R | Simple forecasting method based on historical mean |
ARIMA (.html) | R | ARIMA model with or without external features |
ETS (.html) | R | Exponential Smoothing algorithm with additive errors |
Prophet (.html) | R | Automated forecasting procedure based on an additive model with non-linear trends |
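To make two of the ideas above concrete, here is a minimal sketch of a linear regression trained on a lagged feature of the target, applied to several independent series in parallel. It is not the code used in the notebooks: it uses only the Python standard library, swaps Ray for `concurrent.futures.ThreadPoolExecutor`, uses a single lag with no external features, and the store names and sales values are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def make_lagged_pairs(series, lag=1):
    """Pair each value with the value observed `lag` steps earlier."""
    return [(series[t - lag], series[t]) for t in range(lag, len(series))]


def fit_lag_regression(series, lag=1):
    """Fit y_t = intercept + slope * y_(t-lag) by ordinary least squares."""
    pairs = make_lagged_pairs(series, lag)
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # A constant series has zero variance; fall back to predicting the mean.
    slope = cov_xy / var_x if var_x else 0.0
    intercept = mean_y - slope * mean_x
    return intercept, slope


def forecast_next(series, lag=1):
    """One-step-ahead forecast from the fitted lag regression."""
    intercept, slope = fit_lag_regression(series, lag)
    return intercept + slope * series[-lag]


def forecast_all(series_by_name, max_workers=4):
    """Fit and forecast every series independently, in parallel."""
    names = list(series_by_name)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(forecast_next, (series_by_name[n] for n in names))
        return dict(zip(names, results))


# Hypothetical weekly sales for three stores (illustrative values only).
sales = {
    "store_a": [10, 12, 14, 16, 18],
    "store_b": [5, 5, 5, 5],
    "store_c": [1, 2, 4, 8, 16],
}
forecasts = forecast_all(sales)
print(forecasts)
```

Because each series is fitted independently, the work parallelizes trivially. Threads keep the sketch simple; for CPU-bound models, a process pool or a framework such as Ray is likely the better fit.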
The repository also comes with AzureML-themed notebooks and best-practice recipes to accelerate the development of scalable, production-grade forecasting solutions on Azure. In particular, we have the following examples for forecasting with Azure AutoML as well as tuning and deploying a forecasting model on Azure.
Method | Language | Description |
---|---|---|
Azure AutoML | Python | AzureML service that automates the model development process and identifies the best machine learning pipeline |
HyperDrive | Python | AzureML service for tuning hyperparameters of machine learning models in parallel in the cloud |
AzureML Web Service | Python | AzureML service for deploying a model as a web service on Azure Container Instances |
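The idea behind parallel hyperparameter tuning can be illustrated locally without any Azure dependency. The sketch below does not use the HyperDrive API. It assumes a simple exponential smoothing forecaster with a single hyperparameter `alpha`, samples candidate values at random, and scores them concurrently with the standard library. The series values and function names are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def ses_one_step_mse(series, alpha):
    """Mean squared one-step-ahead error of simple exponential smoothing."""
    level = series[0]
    squared_errors = []
    for actual in series[1:]:
        # The forecast for this step is the current smoothed level.
        squared_errors.append((actual - level) ** 2)
        level = alpha * actual + (1 - alpha) * level
    return sum(squared_errors) / len(squared_errors)


def random_search(series, n_trials=20, seed=0, max_workers=4):
    """Sample alpha values at random and score them in parallel."""
    rng = random.Random(seed)
    candidates = [rng.uniform(0.01, 1.0) for _ in range(n_trials)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = list(pool.map(lambda a: ses_one_step_mse(series, a), candidates))
    best_score, best_alpha = min(zip(scores, candidates))
    return best_alpha, best_score


# Hypothetical weekly sales (illustrative values only).
weekly_sales = [120, 132, 128, 140, 151, 149, 160, 172]
best_alpha, best_score = random_search(weekly_sales)
print(f"best alpha: {best_alpha:.3f}, one-step MSE: {best_score:.2f}")
```

A managed service adds what this sketch leaves out, such as distributing trials across cloud compute and tracking the results of each run.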
To quickly get started with the Python examples on your local machine, follow these steps.
1. Install Anaconda with Python >= 3.6. Miniconda is a quick way to get started.
2. Clone the repository and change into its root directory:
   - `git clone https://github.com/microsoft/forecasting`
   - `cd forecasting/`
3. Run the setup script to create the conda environment. Execute one of the following commands from the root of the Forecasting repo, depending on your operating system:
   - Linux: `./tools/environment_setup.sh`
   - Windows: `tools\environment_setup.bat`

   Note that on Windows you need to run the batch script from Anaconda Prompt. The script creates a conda environment `forecasting_env` and installs the forecasting utility library `fclib`.
4. Start the Jupyter notebook server:
   - `jupyter notebook`
5. Run the LightGBM single-round notebook under the `00_quick_start` folder. Make sure that the selected Jupyter kernel is `forecasting_env`.
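After running the setup script, a quick sanity check can confirm that the environment looks as expected. The snippet below is a minimal sketch, not part of the repository. It assumes the environment has been activated with conda (which sets the `CONDA_DEFAULT_ENV` variable) and only checks whether `fclib` can be found, without importing it. The function name `check_environment` is hypothetical.

```python
import importlib.util
import os
import sys


def check_environment(env_name="forecasting_env", package="fclib"):
    """Report whether the interpreter, conda env, and package look right."""
    active_env = os.environ.get("CONDA_DEFAULT_ENV")
    return {
        "python_ok": sys.version_info >= (3, 6),
        "env_active": active_env == env_name,
        # find_spec returns None when the package cannot be located.
        "package_installed": importlib.util.find_spec(package) is not None,
    }


status = check_environment()
for name, ok in status.items():
    print(f"{name}: {'OK' if ok else 'MISSING'}")
```

If `env_active` or `package_installed` is reported as missing, activating `forecasting_env` and rerunning the setup script is a reasonable first step.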
If you have any issues with the above setup, or want more detailed instructions on how to set up your environment and run the examples provided in the repository on a local or remote machine, please see the Setup Guide.
To run the R examples, we assume you already have R installed on your machine. If not, follow the instructions on CRAN to download and install R.
The recommended editor is RStudio, which supports interactive editing and previewing of R notebooks. However, you can use any editor or IDE that supports RMarkdown. In particular, Visual Studio Code with the R extension can be used to edit and render the notebook files. The rendered `.nb.html` files can be viewed in any modern web browser.
The examples use the Tidyverts family of packages, which is a modern framework for time series analysis that builds on the widely-used Tidyverse family. The Tidyverts framework is still under active development, so it's recommended that you update your packages regularly to get the latest bug fixes and features.
Our target audience for this repository includes data scientists and machine learning engineers with varying levels of knowledge in forecasting as our content is source-only and targets custom machine learning modelling. The utilities and examples provided are intended to be solution accelerators for real-world forecasting problems.
We hope that the open source community will contribute to the content and bring in the latest state-of-the-art (SOTA) algorithms. This project welcomes contributions and suggestions. Before contributing, please see our Contributing Guide.
The following is a list of related repositories that you may find helpful.
Repository | Description |
---|---|
Deep Learning for Time Series Forecasting | A collection of examples for using deep neural networks for time series forecasting with Keras. |
Microsoft AI Github | Find other Best Practice projects, and Azure AI designed patterns in our central repository. |
Build | Branch | Status |
---|---|---|
Linux CPU | master | |
Linux CPU | staging | |