time-series-machine-learning

Machine learning models for time series analysis
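License: Apache-2.0
Development language: Python
Category: Neural networks / artificial intelligence, machine learning / deep learning
Software type: Open-source software
Region: Unknown
Submitter: 章阳波
Operating system: Cross-platform
Open-source organization:
Target audience: Unknown
Software overview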

Time Series Prediction with Machine Learning

A collection of different Machine Learning models for time series prediction, specifically for predicting the market price given a currency chart and a target.

Requirements

Required dependency: numpy. Other dependencies are optional, but to diversify the final ensemble of models, it's recommended to install these packages: tensorflow, xgboost.

Tested with Python versions: 2.7.14, 3.6.0.

Fetching data

There is one built-in data provider, which fetches the data from the Poloniex exchange. Currently, all models have been tested with cryptocurrency charts.

The fetched data format is standard security OHLC trading info: date, high, low, open, close, volume, quoteVolume, weightedAverage. But the models are agnostic of the particular time series features and can be trained with a subset or superset of these features.

To fetch the data, run the run_fetch.py script from the root directory:

# Fetches the default tickers: BTC_ETH, BTC_LTC, BTC_XRP, BTC_ZEC for all time periods.
$ ./run_fetch.py

By default, the data is fetched for all time periods available in Poloniex (day, 4h, 2h, 30m, 15m, 5m) and is stored in the _data directory. One can specify the tickers and periods via command-line arguments.

# Fetches just BTC_ETH ticker data for only 3 time periods.
$ ./run_fetch.py BTC_ETH --period=2h,4h,day

Note: subsequent runs don't fetch all charts from scratch, only the updates since the last run.

Training the models

To start training, run the run_train.py script from the root directory:

# Trains all models until stopped.
# The defaults: 
# - tickers: BTC_ETH, BTC_LTC, BTC_XRP, BTC_ZEC
# - period: day
# - target: high
$ ./run_train.py

# Trains the models for specified parameters.
$ ./run_train.py --period=4h --target=low BTC_BCH

By default, the script trains all available methods (see below) with random hyper-parameters, cross-validates each model and saves the resulting weights if the performance is better than the current average (the limit can be configured).

All models are saved to the _zoo directory. Note that models saved early may perform much worse than later ones, so you're welcome to clean up the models you're definitely not interested in, because they can only spoil the final ensemble.

Note 1: specifying multiple periods and targets will force the script to train all combinations of those. Currently, the models do not reuse weights for different targets. In other words, if you set --target=low,high, it will train separate models for low and for high.

Note 2: under the hood, the models work with transformed data. In particular, high, low, open, close and volume are transformed to percent changes. Hence, the prediction for these columns is also a percent change.
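The transformation in Note 2 can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names to_percent_changes and apply_percent_change are hypothetical, and it assumes changes are expressed as fractions (0.02 meaning 2%), which is consistent with the scale of the example report further down.

```python
def to_percent_changes(values):
    """Convert raw values into relative changes between consecutive steps.

    Assumption: a change is stored as a fraction, e.g. 0.02 for +2%.
    """
    return [(curr - prev) / prev for prev, curr in zip(values, values[1:])]


def apply_percent_change(last_value, change):
    """Turn a predicted relative change back into an absolute value."""
    return last_value * (1.0 + change)


# Hypothetical closing prices for three consecutive periods.
closes = [100.0, 102.0, 99.96]
changes = to_percent_changes(closes)  # roughly [0.02, -0.02]

# If a model predicts a +1% change, the implied next close is:
predicted_close = apply_percent_change(closes[-1], 0.01)
print(changes, predicted_close)
```

Note that a series of n raw values yields n-1 changes, so the first time step has no transformed value.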

Machine Learning methods

Currently supported methods:

  • Ordinary linear model. Even though it's very simple, linear regression turns out to show pretty good results and complements the more complex models in the final ensemble.
  • Gradient boosting (using xgboost implementation).
  • Deep neural network (in tensorflow).
  • Recurrent neural network: LSTM, GRU, one or multi-layered (in tensorflow as well).
  • Convolutional neural network for 1-dimensional data (in tensorflow as well).

All models take a window of a certain size (named k) as input and predict a single target value for the next time step. Example: a window size of k=10 means that the model accepts the array (x[t-10], x[t-9], ..., x[t-1]) to predict x[t].target. Each x[i] includes a number of features (open, close, volume, etc.). Thus, the model takes 10 * features values in and outputs a single value: the percent change for the target column.
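The windowing described above can be sketched like this. It is an illustrative sketch only: make_windows and its arguments are hypothetical names, the rows are assumed to already hold percent changes, and flattening each window into one list is an assumption that suits the linear and boosting models (recurrent or convolutional models would more likely keep the k x features shape).

```python
def make_windows(rows, target_index, k):
    """Build (input, target) pairs from a list of per-step feature rows.

    rows[t] is the feature list for time step t. For every t >= k, the
    input is rows[t-k] .. rows[t-1] flattened into k * features values,
    and the target is a single column of rows[t].
    """
    inputs, targets = [], []
    for t in range(k, len(rows)):
        window = rows[t - k:t]
        inputs.append([value for row in window for value in row])
        targets.append(rows[t][target_index])
    return inputs, targets


# Hypothetical data: 5 time steps, 2 features each (e.g. high, volume).
rows = [[0.01, 0.5], [0.02, 0.4], [-0.01, 0.3], [0.03, 0.2], [0.00, 0.1]]
inputs, targets = make_windows(rows, target_index=0, k=3)

print(len(inputs), len(inputs[0]))  # 2 samples, 3 * 2 = 6 values each
print(targets)
```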

Inspecting the model

Saved models consist of the following files:

  • run-params.txt: each model has the following run parameters:
    • Ticker name, e.g., BTC_ETH.
    • Time period, e.g., 4h.
    • Target column, e.g., high (means the model is predicting the next high price).
    • Model class, e.g., RecurrentModel.
    • The k value, which denotes the input length, e.g., k=16 with period=day means the model needs 16 days to predict the next one.
  • model-params.txt: holds the specific hyper-parameters that the model was trained with.
  • stats.txt: evaluation statistics (for both training and test sets, see the details below).
  • One or several files holding the internal weights.

Each model is evaluated for both training and test set, but the final evaluation score is computed only from the test set.

Here's the example report:

# Test results:
Mean absolute error: 0.019528
SD absolute error:   0.023731
Sign accuracy:       0.635158
Mean squared error:  0.000944
Sqrt of MSE:         0.030732
Mean error:          -0.001543
Residuals stats:     mean=0.0195 std=0.0238 percentile=[0%=0.0000 25%=0.0044 50%=0.0114 75%=0.0252 90%=0.0479 100%=0.1917]
Relative residuals:  mean=1.1517 std=0.8706 percentile=[0%=0.0049 25%=0.6961 50%=0.9032 75%=1.2391 90%=2.3504 100%=4.8597]

You should read it like this:

  • The model is on average 0.019528, or about 2%, away from the ground-truth percent change (absolute difference), but only -0.001543 away when the sign is taken into account. In other words, the model underestimates and overestimates the target about equally, usually by 2%.

  • The standard deviation of residuals is also about 2%: 0.023731, so it's rarely far off the target.

  • The model is 63% right about the sign of the change: 0.635158. For example, this means that when the model says "Buy!", it may be wrong about how high the predicted price will be, but the price will go up in 63% of the cases.

  • Residuals and relative residuals show the percentiles of the error distribution. In particular, in 75% of the cases the residual is less than 2.5% away from the ground truth (0.0252) and no more than 124% larger relatively (1.2391).

    Example: if truth=0.01 and prediction=0.02, then residual=0.01 (1% away) and relative_residual=1.0 (100% larger).
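The statistics in the report can be reproduced along these lines. This is a sketch under stated assumptions, not the project's implementation: the function name summarize is hypothetical, the error is taken as prediction minus truth, the standard deviation is the population form, a zero counts as a non-positive sign, and the relative residual is the absolute error divided by the absolute truth (the definition that matches the truth=0.01, prediction=0.02 example).

```python
from statistics import mean, pstdev


def summarize(truth, predicted):
    """Compute report-style statistics for paired truth/prediction lists."""
    errors = [p - t for t, p in zip(truth, predicted)]
    abs_errors = [abs(e) for e in errors]
    # A prediction "hits" the sign when both values are positive or both are not.
    sign_hits = [(p > 0) == (t > 0) for t, p in zip(truth, predicted)]
    # Relative residuals are undefined for a zero truth value, so skip those.
    relative = [abs(e) / abs(t) for e, t in zip(errors, truth) if t != 0]
    return {
        "mean_abs_error": mean(abs_errors),
        "sd_abs_error": pstdev(abs_errors),
        "sign_accuracy": sum(sign_hits) / len(sign_hits),
        "mean_squared_error": mean(e * e for e in errors),
        "mean_error": mean(errors),
        "relative_residuals": relative,
    }


# Made-up values; the first pair is the truth=0.01, prediction=0.02 example.
truth = [0.01, -0.02, 0.03, -0.01]
predicted = [0.02, -0.01, -0.01, -0.02]
stats = summarize(truth, predicted)
print(stats)
```

In this toy data the third prediction has the wrong sign, so the sign accuracy is 3 out of 4.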

In the end, the report is summarized into a single evaluation result, which is mean_abs_error + risk_factor * sd_abs_error. You can vary the risk_factor to weigh average performance against worst-case performance: a higher value penalizes models with widely spread errors. By default, risk_factor=1.0, hence the model above is evaluated at 0.0433 (0.019528 + 0.023731). Lower evaluation is better.
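As a worked example of this formula, the sketch below scores the model from the report and compares it with a second, made-up model to show how risk_factor changes the ranking. The function name evaluation and the numbers for model B are hypothetical.

```python
def evaluation(mean_abs_error, sd_abs_error, risk_factor=1.0):
    """Single-number score from the report; lower is better."""
    return mean_abs_error + risk_factor * sd_abs_error


# Model A: the values from the example report above.
score_a = evaluation(0.019528, 0.023731)
print(round(score_a, 4))  # 0.0433

# Model B (hypothetical): slightly worse on average, but more consistent.
score_b = evaluation(0.022, 0.018)

# With the default risk_factor=1.0, the more consistent model B wins.
print(score_b < score_a)

# With risk_factor=0.0, only the average error counts and model A wins.
print(evaluation(0.019528, 0.023731, risk_factor=0.0)
      < evaluation(0.022, 0.018, risk_factor=0.0))
```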

Running predictions

The run_predict.py script downloads the current trading data for the selected currencies and runs an ensemble of the best models (5 by default) that have been saved for these currencies, period and target. The resulting prediction is the aggregated value of the constituent model predictions.

# Runs ensemble of best models for BTC_ETH ticker and outputs the aggregated prediction.
# Default period: day, default target: high.
$ ./run_predict.py BTC_ETH
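The ensemble step can be pictured with the sketch below. The text above does not say how predictions are aggregated, so the plain average used here is an assumption, as are the function name ensemble_predict and the dictionary layout used to represent a saved model.

```python
from statistics import mean


def ensemble_predict(models, window, top_n=5):
    """Average the predictions of the top_n models with the lowest evaluation.

    Each model is a dict with an "evaluation" score (lower is better) and a
    "predict" callable. Both the layout and the use of a plain mean are
    assumptions made for illustration.
    """
    best = sorted(models, key=lambda m: m["evaluation"])[:top_n]
    return mean(m["predict"](window) for m in best)


# Three hypothetical saved models that each return a fixed percent change.
models = [
    {"evaluation": 0.05, "predict": lambda window: 0.10},
    {"evaluation": 0.03, "predict": lambda window: 0.02},
    {"evaluation": 0.04, "predict": lambda window: 0.04},
]

# Only the two best-scoring models contribute: (0.02 + 0.04) / 2.
prediction = ensemble_predict(models, window=[0.01, -0.02, 0.03], top_n=2)
print(prediction)
```

Picking models by their evaluation score is also why stale, poorly performing entries in _zoo are worth cleaning up: if too few good models exist, weak ones end up in the ensemble.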

License

Apache 2.0
