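The main goal of the FM (Factorization Machine) model is to solve the problem of how to combine features when the data is sparse, so the algorithm is mainly used for feature engineering tasks such as building feature combinations.
Reference (Chinese translation of the user manual): https://blog.csdn.net/chloezhao/article/details/53462411
For details, see readme.pdf.
Go into the libfm directory under src and run make all. This generates three executables in the bin directory: convert, libFM, and transpose.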
• scripts
  - triple_format_to_libfm.pl: a Perl script that converts a comma- or tab-separated dataset into libFM format.
• src: source files of libFM and the tools
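libFM supports two input file formats: a text format and a binary format. The text format is recommended for beginners.
The text format is the same as that of SVMlight and LIBSVM. For example, the line 4 0:1.5 3:-7.9 means y = 4, x0 = 1.5, x3 = -7.9 (the target y comes first, followed by index:value pairs).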
For binary classification, cases with y > 0 are regarded as the positive class and with y ≤ 0 as the negative class.
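As a minimal sketch of how one line of this text format can be read, the Python function below splits a line into the target and a sparse index-to-value mapping. The names parse_libfm_line and is_positive are hypothetical helpers for illustration and are not part of libFM.

```python
def parse_libfm_line(line):
    """Parse one line of the text format: '<y> <index>:<value> ...'.

    Hypothetical helper for illustration; not part of libFM.
    """
    tokens = line.split()
    target = float(tokens[0])  # the first token is the target y
    features = {}
    for token in tokens[1:]:
        index, value = token.split(":")
        features[int(index)] = float(value)
    return target, features


def is_positive(target):
    # Binary classification rule: y > 0 is positive, y <= 0 is negative.
    return target > 0


y, x = parse_libfm_line("4 0:1.5 3:-7.9")
print(y, x, is_positive(y))
```

Only the non-zero entries are stored, which is what makes the format suitable for sparse data.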
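Recommender systems often use a file format such as userid, itemid, rating.
1. Text format
The Perl script that converts such files into the libFM format is in the scripts directory.
# Convert ratings.dat from MovieLens 1M into libFM format. The output is written to a file with the .libfm extension, here ratings.dat.libfm.
./triple_format_to_libfm.pl -in ratings.dat -target 2 -delete_column 3 -separator "::"
If the dataset consists of multiple files, convert them in a single call:
./triple_format_to_libfm.pl -in train.txt,test.txt -target 2 -delete_column 3 -separator "::"
If the conversion script is run separately on each file, the variable ids will not match across the files.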
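The id mismatch can be illustrated with a small Python sketch. It is not a port of the Perl script; triples_to_libfm and its shared id_map argument are assumptions made for illustration. Each (column, value) pair gets the next free feature index, so the mapping has to be shared between the train and test files.

```python
def triples_to_libfm(rows, id_map):
    """Convert (user, item, rating) rows into libFM text lines.

    id_map is shared across calls so that the same user or item always
    receives the same feature index. Illustrative only.
    """
    lines = []
    for user, item, rating in rows:
        indices = []
        for column, value in (("user", user), ("item", item)):
            key = (column, value)
            if key not in id_map:
                id_map[key] = len(id_map)  # next unused feature index
            indices.append(id_map[key])
        features = " ".join(f"{i}:1" for i in sorted(indices))
        lines.append(f"{rating} {features}")
    return lines


shared = {}
train = triples_to_libfm([("u1", "i1", 5), ("u2", "i1", 3)], shared)
test = triples_to_libfm([("u1", "i2", 4)], shared)
print(train)
print(test)
```

If the test rows were converted with a fresh, empty mapping instead, item i2 would receive index 1, which in the training file refers to item i1.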
# Train an FM on the converted files. -method accepts sgd, als, or mcmc; sgd is shown here.
./libFM -task r -train ml1m-train.libfm -test ml1m-test.libfm -dim '1,1,8' -iter 1000 -method sgd -learn_rate 0.01 -regular '0,0,0.01' -init_stdev 0.1
2. Binary format
Use the convert tool in the bin folder:
./convert --ifile ratings.dat.libfm --ofilex ratings.x --ofiley ratings.y
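It outputs two files:
(1) a file containing the design matrix X, i.e. the predictor variables
(2) a file containing the prediction target y
It is recommended to use .x and .y as the respective file extensions.
Transpose the data: ./transpose --ifile ratings.x --ofile ratings.xt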
3. libFM
The libFM tool trains an FM model from a training dataset (-train) and a test dataset (-test).
Example: a regression task using an FM with bias, 1-way interactions, and a factorization of k = 8 for pairwise interactions:
./libFM -task r -train ml1m-train.libfm -test ml1m-test.libfm -dim '1,1,8'
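To make the three parts of -dim concrete, here is a minimal Python sketch of the standard second-order FM prediction: a bias w0 (k0), 1-way weights w (k1), and k-dimensional factor vectors V for the pairwise interactions (k2 = k). The function name fm_predict and the toy sizes are assumptions for illustration; this is not libFM's own code.

```python
import random


def fm_predict(x, w0, w, V):
    """FM prediction for a sparse input x given as {index: value}.

    w0: bias, w: list of 1-way weights,
    V: list of k-dimensional factor vectors, one per feature.
    """
    k = len(V[0])
    y = w0 + sum(w[i] * v for i, v in x.items())
    # Pairwise term sum_{i<j} <V_i, V_j> x_i x_j, computed per factor f as
    # 0.5 * ((sum_i V[i][f] x_i)^2 - sum_i (V[i][f] x_i)^2).
    for f in range(k):
        s = sum(V[i][f] * v for i, v in x.items())
        s_sq = sum((V[i][f] * v) ** 2 for i, v in x.items())
        y += 0.5 * (s * s - s_sq)
    return y


random.seed(0)
n_features, k = 4, 8  # k = 8 mirrors the last entry of -dim '1,1,8'
w0 = 0.0
w = [0.0] * n_features
# Factors drawn with a small standard deviation, in the spirit of -init_stdev 0.1.
V = [[random.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_features)]
print(fm_predict({0: 1.5, 3: -7.9}, w0, w, V))
```

The rewritten pairwise term costs time proportional to k times the number of non-zero entries, rather than looping over every pair of features.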
libFM help output:
----------------------------------------------------------------------------
libFM
Version: 1.4.2
Author: Steffen Rendle, srendle@libfm.org
WWW: http://www.libfm.org/
This program comes with ABSOLUTELY NO WARRANTY; for details see license.txt.
This is free software, and you are welcome to redistribute it under certain
conditions; for details see license.txt.
----------------------------------------------------------------------------
-cache_size     cache size for data storage (only applicable if data is
                in binary format), default=infty
-dim            'k0,k1,k2': k0=use bias, k1=use 1-way interactions,
                k2=dim of 2-way interactions; default=1,1,8
-help           this screen
-init_stdev     stdev for initialization of 2-way factors; default=0.1
-iter           number of iterations; default=100
-learn_rate     learn_rate for SGD; default=0.1
-meta           filename for meta information about data set
-method         learning method (SGD, SGDA, ALS, MCMC); default=MCMC
-out            filename for output
-regular        'r0,r1,r2' for SGD and ALS: r0=bias regularization,
                r1=1-way regularization, r2=2-way regularization
-relation       BS: filenames for the relations, default=''
-rlog           write measurements within iterations to a file;
                default=''
-task           r=regression, c=binary classification [MANDATORY]
-test           filename for test data [MANDATORY]
-train          filename for training data [MANDATORY]
-validation     filename for validation data (only for SGDA)
-verbosity      how much infos to print; default=0
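When libFM is called from a script, it can help to assemble the argument list programmatically. The sketch below uses only flags that appear in the help text above; the helper build_libfm_command is hypothetical, and the file names are the ones used in the earlier examples.

```python
MANDATORY = ("task", "train", "test")  # marked [MANDATORY] in the help text


def build_libfm_command(binary, options):
    """Turn an options dict into an argument list for the libFM binary.

    Hypothetical helper; it only formats flags and does not run anything.
    """
    missing = [name for name in MANDATORY if name not in options]
    if missing:
        raise ValueError("missing mandatory options: " + ", ".join(missing))
    args = [binary]
    for name, value in options.items():
        args += [f"-{name}", str(value)]
    return args


cmd = build_libfm_command("./libFM", {
    "task": "r",
    "train": "ml1m-train.libfm",
    "test": "ml1m-test.libfm",
    "dim": "1,1,8",
    "iter": 1000,
    "method": "sgd",
    "learn_rate": 0.01,
    "regular": "0,0,0.01",
    "init_stdev": 0.1,
})
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) once the binary has been built. Passing a list avoids shell quoting issues with values such as 1,1,8.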