
[Deep Learning] LibFM in Practice

邹胜泫
2023-12-01

Training data used in this article: https://download.csdn.net/download/qq_31573519/12344779

1. Prepare the data

Download the data from the address above. The data format is as follows.

Data description
Each row has five comma-separated columns: User ID, item ID, category ID, behavior type, timestamp

Field          Explanation
User ID        An integer, the serialized ID that represents a user
Item ID        An integer, the serialized ID that represents an item
Category ID    An integer, the serialized ID of the category the item belongs to
Behavior type  A string, one of 'pv' or 'buy'
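
To make the column layout concrete, here is a minimal sketch of reading one row in Python. The `Behavior` tuple, the `parse_row` helper and the sample row are hypothetical names and values chosen for illustration; they are not part of the data set or of LibFM.

```python
from typing import NamedTuple


class Behavior(NamedTuple):
    user_id: int
    item_id: int
    category_id: int
    behavior_type: str  # 'pv' or 'buy'
    timestamp: str      # kept as text; the field table does not describe it


def parse_row(line: str) -> Behavior:
    """Split one comma-separated row into its five named fields."""
    user, item, category, behavior, timestamp = line.strip().split(',')
    return Behavior(int(user), int(item), int(category), behavior, timestamp)


# Hypothetical sample row, not taken from the real data set.
row = parse_row('1,2268318,2520377,pv,1511544070')
print(row.user_id, row.behavior_type)
```

Naming the fields this way avoids positional lookups such as temp[3] when the row is used later.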

2. Data processing (feature engineering)

vim process.py

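import random

# Read every non-empty record into memory so the rows can be shuffled.
file_path = '/Users/Documents/data.csv'
with open(file_path, 'r') as infile:
    lines = [line.strip() for line in infile if line.strip()]

# Shuffle first: step 3 splits the output with head/tail, so the row order matters.
random.shuffle(lines)

# Rows go to stdout; redirect them to a file such as ./data/train_all_data, which step 3 reads.
for line in lines:
    # Columns: user ID, item ID, category ID, behavior type, timestamp
    temp = line.split(',')
    # The label is 1 for a purchase ('buy') and 0 for anything else ('pv').
    label = '1' if temp[3] == 'buy' else '0'
    # One libFM row: "label index:value ...", with the raw IDs used as feature indices.
    print(label + ' ' + temp[0] + ':1 ' + temp[1] + ':1 ' + temp[2] + ':1')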

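Note: the data must be shuffled with random.shuffle(lines), because the train/test split in the next step simply takes the first and last rows of the file.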
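
One thing to be aware of: process.py writes the raw user, item and category IDs directly as feature indices, so a user and an item that happen to share the same number end up on the same feature. If that sharing is not intended, one possible fix is to give every (field, ID) pair its own index. The sketch below is an assumption about how that could be done, not the article's method; `make_encoder`, `to_libfm_row` and the sample rows are hypothetical.

```python
def make_encoder():
    """Return a function that maps (field, raw_id) pairs to unique feature indices."""
    index = {}

    def encode(field, raw_id):
        # The first time a pair is seen it receives the next free index.
        return index.setdefault((field, raw_id), len(index))

    return encode


def to_libfm_row(line, encode):
    """Convert one CSV row into a libFM row with collision-free indices."""
    user, item, category, behavior = line.strip().split(',')[:4]
    label = '1' if behavior == 'buy' else '0'
    features = sorted([
        encode('user', user),
        encode('item', item),
        encode('category', category),
    ])
    return label + ' ' + ' '.join(f'{i}:1' for i in features)


encode = make_encoder()
# Hypothetical rows: user 5 and item 5 share a raw ID but get different indices.
first = to_libfm_row('5,5,9,buy,100', encode)
second = to_libfm_row('5,7,9,pv,101', encode)
print(first)   # 1 0:1 1:1 2:1
print(second)  # 0 0:1 2:1 3:1
```

The same encoder must be used for the whole file so that the training and test rows agree on what each index means.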

3. Split into training and test sets

head -2196681 ./data/train_all_data > ./data/train_shuffle
tail -549170 ./data/train_all_data > ./data/test_shuffle
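
The two row counts add up to 2,745,851, which matches an 80/20 split. The 0.8 fraction is inferred from those numbers rather than stated in the article; under that assumption, the sizes passed to head and tail could be derived like this (`split_sizes` is a hypothetical helper):

```python
def split_sizes(total_rows, train_fraction=0.8):
    """Return (train_rows, test_rows) for a head/tail split of a shuffled file."""
    train_rows = round(total_rows * train_fraction)
    return train_rows, total_rows - train_rows


total = 2196681 + 549170  # rows in ./data/train_all_data, per the commands above
sizes = split_sizes(total)
print(sizes)  # (2196681, 549170)
```

Because the test size is computed as the remainder, the head and tail ranges never overlap and together cover every row.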
Check whether the ratio of positive to negative samples in the training and test sets is as expected:
cat ./data/train_shuffle| grep '^1' | wc -l
   47756
cat ./data/train_shuffle| grep '^0' | wc -l
 2148925
cat ./data/test_shuffle| grep '^0' | wc -l
  536951
cat ./data/test_shuffle| grep '^1' | wc -l
   12219
   
>>> 47756.0/2148925
0.022223204625568597
>>> 12219.0/536951
0.022756266400472295
>>>

Both ratios are about 0.02 (0.0222 for training, 0.0228 for test), as expected, so the shuffle logic is fine.
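
The same check can be done in one pass in Python instead of four grep commands plus manual division. This is a minimal sketch; `label_ratio` and the in-memory sample rows are hypothetical, and the function assumes every row starts with the label followed by a space, as produced by process.py.

```python
from collections import Counter


def label_ratio(rows):
    """Return positives / negatives, the same ratio computed by hand above."""
    counts = Counter(row.split(' ', 1)[0] for row in rows if row.strip())
    return counts['1'] / counts['0']


# Hypothetical sample rows; with a real file, pass an open file object instead,
# for example: label_ratio(open('./data/train_shuffle'))
sample = ['1 3:1 4:1 5:1', '0 3:1 6:1 5:1', '0 7:1 8:1 5:1', '0 9:1 8:1 5:1']
ratio = label_ratio(sample)
print(ratio)
```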

4. Training: train.sh

vim train.sh

#./bin/libFM -task c -method sgd -train ./data/train_shuffle -test ./data/test_shuffle -dim '1,1,8' -out result_v1 -save_model model_v1 -iter 20 -learn_rate 0.01
#./bin/libFM -task c -method sgd -train ./data/train_shuffle -test ./data/test_shuffle -dim '1,1,16' -out result_v1 -iter 20 -learn_rate 0.01
./bin/libFM -task c -method sgd -train ./data/train_shuffle -test ./data/test_shuffle -dim '1,1,32' -out result_v1 -iter 100 -learn_rate 0.01
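
With only about 2% positive samples, accuracy says little, since always predicting 0 would already score close to 98%. A ranking metric such as AUC is more informative. The sketch below computes AUC with the rank-sum formula. It assumes that result_v1 contains one predicted score per line, in the same order as ./data/test_shuffle; please verify that against your libFM version. The `auc` helper is hypothetical and not part of LibFM.

```python
def auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) formula."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Group tied scores and give each of them the average 1-based rank.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    positives = sum(labels)
    negatives = len(labels) - positives
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - positives * (positives + 1) / 2) / (positives * negatives)


# Hypothetical toy data; with real files (assumed layout, see above):
#   labels = [int(l.split(' ', 1)[0]) for l in open('./data/test_shuffle')]
#   scores = [float(l) for l in open('result_v1')]
toy_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(toy_auc)  # 0.75
```

Comparing this value across the '1,1,8', '1,1,16' and '1,1,32' runs gives a like-for-like way to judge whether a larger factor dimension helps.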