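This post walks through converting stock data from CSV files into bin files.
The stock data is fetched with the efinance package; for detailed usage, see its GitHub page: https://github.com/Micro-sheep/efinance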
import pandas as pd
import efinance


def get_k_data(stock_code, begin="20200101", end="20210101") -> pd.DataFrame:
    """
    Fetch K-line data for a stock with the efinance package.
    :param stock_code: stock code
    :param begin: start date
    :param end: end date
    :return: DataFrame with columns date, open, close, high, low, volume, turnover
    """
    k_dataframe: pd.DataFrame = efinance.stock.get_quote_history(stock_code, beg=begin, end=end, fqt=0)
    # Keep the first nine columns and give them English names.
    k_dataframe = k_dataframe.iloc[:, :9]
    k_dataframe.columns = ['name', 'code', 'date', 'open', 'close', 'high', 'low', 'volume', 'turnover']
    # Drop name and code; reassigning avoids modifying a slice in place.
    k_dataframe = k_dataframe.drop(columns=['name', 'code'])
    return k_dataframe


if __name__ == '__main__':
    k_line_df = get_k_data(stock_code='600519')  # Kweichow Moutai share price
    k_line_df.to_csv("600519.csv", index=False)
This produces CSV data in the following form:
date,open,close,high,low,volume,turnover
2020-01-02,1128.0,1130.0,1145.06,1116.0,148099,16696837120.0
2020-01-03,1117.0,1078.56,1117.0,1076.9,130319,14266380544.0
2020-01-06,1070.86,1077.99,1092.9,1067.3,63415,6853917696.0
2020-01-07,1077.5,1094.53,1099.0,1076.4,47854,5220697088.0
2020-01-08,1085.05,1088.14,1095.5,1082.58,25008,2720371920.0
2020-01-09,1094.0,1102.7,1105.39,1090.0,37406,4110949808.0
........
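Before converting, it can be worth sanity-checking the exported rows. The sketch below is not part of efinance or Qlib: it uses only the standard library, and the helper name find_bad_rows is made up for illustration. It flags rows whose dates are not strictly ascending or whose high/low values do not bound the open and close, using the first rows of the sample above.

```python
import csv
import io

# First rows of the sample CSV shown above.
SAMPLE = """date,open,close,high,low,volume,turnover
2020-01-02,1128.0,1130.0,1145.06,1116.0,148099,16696837120.0
2020-01-03,1117.0,1078.56,1117.0,1076.9,130319,14266380544.0
2020-01-06,1070.86,1077.99,1092.9,1067.3,63415,6853917696.0
"""


def find_bad_rows(csv_file):
    """Return (line_number, reason) pairs for rows that look inconsistent.

    Hypothetical helper for illustration only.
    """
    problems = []
    previous_date = ""
    # Data rows start on line 2 because line 1 is the header.
    for line_number, row in enumerate(csv.DictReader(csv_file), start=2):
        open_, close = float(row["open"]), float(row["close"])
        high, low = float(row["high"]), float(row["low"])
        # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
        if row["date"] <= previous_date:
            problems.append((line_number, "date not strictly ascending"))
        if high < max(open_, close) or low > min(open_, close):
            problems.append((line_number, "high/low does not bound open/close"))
        previous_date = row["date"]
    return problems


if __name__ == "__main__":
    print(find_bad_rows(io.StringIO(SAMPLE)))
```

To check the real file instead of the embedded sample, pass open("600519.csv", newline="") to the same function.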
There are two ways to do the conversion: from the command line, or from code. Qlib already provides the conversion logic in scripts/dump_bin.py, in class DumpDataAll(DumpDataBase).
python qlib/scripts/dump_bin.py dump_all --csv_path . --qlib_dir my_qlib_dir/ --include_fields open,close,high,low,volume,turnover
# Assumes the Qlib repository root is on sys.path so that scripts/ is importable.
from scripts.dump_bin import DumpDataAll

if __name__ == '__main__':
    dump_util = DumpDataAll(csv_path=".", qlib_dir="my_qlib_dir/",
                            include_fields='open,close,high,low,volume,turnover')
    dump_util.dump()
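Comparing the two approaches shows that each -- flag on the command line corresponds to an argument passed in the code. The value of include_fields is the list of column names to pack, and these names must match the columns of the CSV file one to one.
The newly generated my_qlib_dir/ contains three folders: calendars, features, and instruments.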
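Because the include_fields names have to match the CSV header, a quick check before running the conversion can catch a typo early. This is a minimal sketch, not something Qlib provides; the helper name missing_fields is hypothetical.

```python
def missing_fields(header_line, include_fields):
    """Return the include_fields names that are absent from a CSV header line.

    Hypothetical helper for illustration; not part of Qlib.
    """
    header = [name.strip() for name in header_line.split(",")]
    wanted = [name.strip() for name in include_fields.split(",") if name.strip()]
    return [name for name in wanted if name not in header]


if __name__ == "__main__":
    # Header and field list taken from the CSV and the command shown earlier.
    header = "date,open,close,high,low,volume,turnover"
    fields = "open,close,high,low,volume,turnover"
    print(missing_fields(header, fields))
```

For the real file, the header line can be read with open("600519.csv").readline() and passed in unchanged, since surrounding whitespace is stripped from each name.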
import qlib
from qlib.data import D

qlib.init(provider_uri="my_qlib_dir/")  # root directory that holds all the data
calendar = D.calendar()  # trading calendar
market_instrument = D.instruments(market='all')  # select the market (the stock universe)
# all stocks in the market
total_code = D.list_instruments(instruments=market_instrument, start_time='2020-01-01', end_time='2020-05-01',
                                as_list=True)
print(total_code)
# the actual data
field_data = D.features(instruments=["600519"], start_time='2020-01-01', end_time='2020-05-01',
                        fields=["$open", "$close", "$high", "$low"])
print(field_data.head())
Output:
['600519']
                             $open       $close        $high         $low
instrument datetime
600519     2020-01-02  1128.000000  1130.000000  1145.060059  1116.000000
           2020-01-03  1117.000000  1078.560059  1117.000000  1076.900024
           2020-01-06  1070.859985  1077.989990  1092.900024  1067.300049
           2020-01-07  1077.500000  1094.530029  1099.000000  1076.400024
           2020-01-08  1085.050049  1088.140015  1095.500000  1082.579956
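The extra digits in this output (1145.060059 where the CSV had 1145.06) are what one would expect if the values are stored as 32-bit floats. The sketch below assumes that each feature file is a flat little-endian float32 array whose first element is the start index into the calendar, followed by the values; that layout and the file name high.day.bin are assumptions to verify against your Qlib version, and both helper names are made up for illustration.

```python
import os
import struct
import tempfile


def write_feature_bin(path, start_index, values):
    """Write [start_index, *values] as little-endian float32 (assumed layout)."""
    payload = [float(start_index), *values]
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{len(payload)}f", *payload))


def read_feature_bin(path):
    """Return (start_index, values) read back from a file in the assumed layout."""
    with open(path, "rb") as f:
        raw = f.read()
    # Each float32 occupies 4 bytes.
    numbers = struct.unpack(f"<{len(raw) // 4}f", raw)
    return int(numbers[0]), list(numbers[1:])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "high.day.bin")
        # High prices from the first three CSV rows.
        write_feature_bin(path, 0, [1145.06, 1117.0, 1092.9])
        start, highs = read_feature_bin(path)
        print(start, [f"{v:.6f}" for v in highs])
```

After the round trip, 1145.06 and 1092.9 come back as 1145.060059 and 1092.900024, matching the $high column above, while 1117.0 is exactly representable and is unchanged.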
For more, see: https://qlib.readthedocs.io/en/latest/start/getdata.html#examples