
An Introduction to Microsoft's Open-Source GitHub Project: Microsoft Recommenders

安坚诚
2023-12-01

Project GitHub repository: https://github.com/microsoft/recommenders

Introduction

This repository contains examples and best practices for building recommendation systems, provided as Jupyter notebooks.

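The examples detail our learnings on five key tasks:

  1. Prepare Data: prepare and load data for each recommendation algorithm
  2. Model: build models using various classical and deep learning recommendation algorithms, such as Alternating Least Squares (ALS) or eXtreme Deep Factorization Machines (xDeepFM)
  3. Evaluate: evaluate algorithms with offline metrics
  4. Model Select and Optimize: tune and optimize hyperparameters for recommendation models
  5. Operationalize: operationalize models in a production environment on Azure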

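Several utilities are provided in reco_utils to support common tasks such as loading datasets in the format expected by different algorithms, evaluating model outputs, and splitting training/test data. Implementations of several state-of-the-art algorithms are included for self-study and for customization in your own applications. See the reco_utils documentation.

For a more detailed overview of the repository, see the documents on the wiki page.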

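Getting Started

For more details on setting up your machine locally, on a Data Science Virtual Machine (DSVM), or on Azure Databricks, see the Setup Guide.

  • 1. To set up on your local machine, install Anaconda with Python >= 3.6. Miniconda is a quick way to get started.
  • 2. Clone the repository: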
git clone https://github.com/Microsoft/Recommenders
  • 3. Run the conda file generation script to create a conda environment. (This is for a basic Python environment; for PySpark and GPU environment setup, see SETUP.md.)
cd Recommenders
python tools/generate_conda_file.py
conda env create -f reco_base.yaml  
  • 4. Activate the conda environment and register it with Jupyter:
conda activate reco_base
python -m ipykernel install --user --name reco_base --display-name "Python (reco)"
  • 5. Start the Jupyter notebook server:
jupyter notebook
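  • 6. Run the SAR Python CPU MovieLens notebook under the 00_quick_start folder. Make sure to change the kernel to "Python (reco)".

Note: the Alternating Least Squares (ALS) notebooks require a PySpark environment to run. Follow the steps in the Setup Guide to run these notebooks in a PySpark environment. For the deep learning algorithms, a GPU machine is recommended.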
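Before opening a notebook, it can help to confirm that the active kernel meets the requirements mentioned above. The following is only a sketch: the function name `check_environment` is hypothetical, and the package names (`pyspark` for the ALS notebooks, `tensorflow` and `torch` as typical deep learning backends) are assumptions for illustration rather than the repository's official dependency list.

```python
import importlib.util
import sys


def check_environment(optional_packages=("pyspark", "tensorflow", "torch")):
    """Report whether the interpreter and some optional packages look usable.

    The default package names are assumptions for illustration; adjust the
    tuple to match the environment you actually created.
    """
    # The setup steps ask for Python >= 3.6.
    report = {"python_ok": sys.version_info >= (3, 6)}
    for name in optional_packages:
        # find_spec returns None when a top-level package cannot be found.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, found in check_environment().items():
        print(f"{key}: {'yes' if found else 'no'}")
```

Running this inside the "Python (reco)" kernel shows at a glance whether the PySpark or GPU-oriented notebooks are likely to import their dependencies. It does not verify that a GPU is actually visible to those libraries.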

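Algorithms

The table below lists the recommendation algorithms currently available in the repository. Notebooks are linked under the Environment column when different implementations are available.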

| Algorithm | Environment | Type | Description |
| --- | --- | --- | --- |
| Alternating Least Squares (ALS) | PySpark | Collaborative Filtering | Matrix factorization algorithm for explicit or implicit feedback in large datasets, optimized by Spark MLlib for scalability and distributed computing capability |
| Attentive Asynchronous Singular Value Decomposition (A2SVD)* | Python CPU / Python GPU | Collaborative Filtering | Sequential-based algorithm that aims to capture both long and short-term user preferences using attention mechanism |
| Cornac/Bayesian Personalized Ranking (BPR) | Python CPU | Collaborative Filtering | Matrix factorization algorithm for predicting item ranking with implicit feedback |
| Cornac/Bilateral Variational Autoencoder (BiVAE) | Python CPU / Python GPU | Collaborative Filtering | Generative model for dyadic data (e.g., user-item interactions) |
| Convolutional Sequence Embedding Recommendation (Caser) | Python CPU / Python GPU | Collaborative Filtering | Algorithm based on convolutions that aims to capture both the user's general preferences and sequential patterns |
| Deep Knowledge-Aware Network (DKN)* | Python CPU / Python GPU | Content-Based Filtering | Deep learning algorithm incorporating a knowledge graph and article embeddings to provide powerful news or article recommendations |
| Extreme Deep Factorization Machine (xDeepFM)* | Python CPU / Python GPU | Hybrid | Deep learning based algorithm for implicit and explicit feedback with user/item features |
| FastAI Embedding Dot Bias (FAST) | Python CPU / Python GPU | Collaborative Filtering | General purpose algorithm with embeddings and biases for users and items |
| LightFM/Hybrid Matrix Factorization | Python CPU | Hybrid | Hybrid matrix factorization algorithm for both implicit and explicit feedback |
| LightGBM/Gradient Boosting Tree* | Python CPU / PySpark | Content-Based Filtering | Gradient Boosting Tree algorithm for fast training and low memory usage in content-based problems |
| LightGCN | Python CPU / Python GPU | Collaborative Filtering | Deep learning algorithm which simplifies the design of GCN for predicting implicit feedback |
| GeoIMC | Python CPU | Hybrid | Matrix completion algorithm that takes into account user and item features using Riemannian conjugate gradients optimization and following a geometric approach |
| GRU4Rec | Python CPU / Python GPU | Collaborative Filtering | Sequential-based algorithm that aims to capture both long and short-term user preferences using recurrent neural networks |
| Multinomial VAE | Python CPU / Python GPU | Collaborative Filtering | Generative model for predicting user/item interactions |
| Neural Recommendation with Long- and Short-term User Representations (LSTUR)* | Python CPU / Python GPU | Content-Based Filtering | Neural recommendation algorithm with long- and short-term user interest modeling |
| Neural Recommendation with Attentive Multi-View Learning (NAML)* | Python CPU / Python GPU | Content-Based Filtering | Neural recommendation algorithm with attentive multi-view learning |
| Neural Collaborative Filtering (NCF) | Python CPU / Python GPU | Collaborative Filtering | Deep learning algorithm with enhanced performance for implicit feedback |
| Neural Recommendation with Personalized Attention (NPA)* | Python CPU / Python GPU | Content-Based Filtering | Neural recommendation algorithm with personalized attention network |
| Neural Recommendation with Multi-Head Self-Attention (NRMS)* | Python CPU / Python GPU | Content-Based Filtering | Neural recommendation algorithm with multi-head self-attention |
| Next Item Recommendation (NextItNet) | Python CPU / Python GPU | Collaborative Filtering | Algorithm based on dilated convolutions and residual network that aims to capture sequential patterns |
| Restricted Boltzmann Machines (RBM) | Python CPU / Python GPU | Collaborative Filtering | Neural network based algorithm for learning the underlying probability distribution for explicit or implicit feedback |
| Riemannian Low-rank Matrix Completion (RLRMC)* | Python CPU | Collaborative Filtering | Matrix factorization algorithm using Riemannian conjugate gradients optimization with small memory consumption |
| Simple Algorithm for Recommendation (SAR)* | Python CPU | Collaborative Filtering | Similarity-based algorithm for implicit feedback datasets |
| Short-term and Long-term Preference Integrated Recommender (SLi-Rec)* | Python CPU / Python GPU | Collaborative Filtering | Sequential-based algorithm that aims to capture both long and short-term user preferences using attention mechanism, a time-aware controller and a content-aware controller |
| Multi-Interest-Aware Sequential User Modeling (SUM)* | Python CPU / Python GPU | Collaborative Filtering | An enhanced memory network-based sequential user model which aims to capture users' multiple interests |
| Standard VAE | Python CPU / Python GPU | Collaborative Filtering | Generative model for predicting user/item interactions |
| Surprise/Singular Value Decomposition (SVD) | Python CPU | Collaborative Filtering | Matrix factorization algorithm for predicting explicit rating feedback in datasets that are not very large |
| Term Frequency - Inverse Document Frequency (TF-IDF) | Python CPU | Content-Based Filtering | Simple similarity-based algorithm for content-based recommendations with text datasets |
| Vowpal Wabbit (VW)* | Python CPU (online training) | Content-Based Filtering | Fast online learning algorithms, great for scenarios where user features / context are constantly changing |
| Wide and Deep | Python CPU / Python GPU | Hybrid | Deep learning algorithm that can memorize feature interactions and generalize user features |
| xLearn/Factorization Machine (FM) & Field-Aware FM (FFM) | Python CPU | Hybrid | Quick and memory efficient algorithm to predict labels with user/item features |

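Note: * indicates algorithms invented or contributed by Microsoft.

Independent or incubating algorithms and utilities are candidates for the contrib folder. This folder holds contributions that may not fit easily into the core repository, or that need time to refactor or mature the code and add the necessary tests.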

| Algorithm | Environment | Type | Description |
| --- | --- | --- | --- |
| SARplus* | PySpark | Collaborative Filtering | Optimized implementation of SAR for Spark |
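SAR is described in the tables above as a similarity-based algorithm for implicit feedback. To make that idea concrete, here is a toy sketch of an item-to-item recommender built from co-occurrence counts. It is not the repository's SAR implementation: the function names are hypothetical, Jaccard similarity is just one possible choice, and details such as time decay or rating weights are left out.

```python
from collections import defaultdict
from itertools import combinations


def item_similarity(user_items):
    """Jaccard similarity between items, from co-occurrence in user histories."""
    item_count = defaultdict(int)   # number of users who interacted with an item
    pair_count = defaultdict(int)   # number of users who interacted with both items
    for items in user_items.values():
        unique = sorted(set(items))
        for item in unique:
            item_count[item] += 1
        for a, b in combinations(unique, 2):
            pair_count[(a, b)] += 1

    sim = defaultdict(dict)
    for (a, b), both in pair_count.items():
        score = both / (item_count[a] + item_count[b] - both)
        sim[a][b] = score
        sim[b][a] = score
    return sim


def recommend(user, user_items, sim, top_k=10):
    """Score unseen items by summed similarity to the user's seen items."""
    seen = set(user_items[user])
    scores = defaultdict(float)
    for item in seen:
        for other, score in sim.get(item, {}).items():
            if other not in seen:
                scores[other] += score
    # Sort by descending score, breaking ties by item id for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_k]]


if __name__ == "__main__":
    history = {"u1": ["a", "b"], "u2": ["a", "b", "c"], "u3": ["b", "c", "d"]}
    print(recommend("u1", history, item_similarity(history)))
```

Because the model is just a table of item-to-item similarities, it needs only implicit signals (who interacted with what), which is why methods of this kind suit implicit feedback datasets.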

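Preliminary Comparison

We provide a benchmark notebook to illustrate how different algorithms can be evaluated and compared. In this notebook, the MovieLens dataset is split into training/test sets at a 75/25 ratio using a stratified split. A recommendation model is then trained with each of the collaborative filtering algorithms below.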
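As an illustration of what a 75/25 stratified split means, the sketch below splits interactions per user so that every user contributes roughly 75% of their rows to training and the rest to testing. It assumes stratification by user and uses a hypothetical `stratified_split` helper written with the standard library only; the repository ships its own splitters, which this does not reproduce.

```python
import random
from collections import defaultdict


def stratified_split(interactions, ratio=0.75, seed=42):
    """Split (user, item, rating) tuples so each user keeps ~ratio in train."""
    by_user = defaultdict(list)
    for row in interactions:
        by_user[row[0]].append(row)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for user in sorted(by_user):
        rows = by_user[user][:]
        rng.shuffle(rows)
        cut = round(len(rows) * ratio)
        train.extend(rows[:cut])
        test.extend(rows[cut:])
    return train, test


if __name__ == "__main__":
    data = [(u, i, 1.0) for u in range(3) for i in range(8)]
    train, test = stratified_split(data)
    print(len(train), len(test))
```

Splitting per user, rather than shuffling all rows globally, keeps every user represented on both sides, so ranking metrics can be computed for each of them.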

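We use empirical parameter values reported in the literature. For ranking metrics we use k=10 (the top 10 recommended items). We ran the comparison on a Standard NC6s_v2 Azure DSVM (6 vCPUs, 112 GB memory and 1 P100 GPU). Spark ALS was run in local standalone mode. The table below shows results on MovieLens 100k, with each algorithm run for 15 epochs.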
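For readers unfamiliar with ranking metrics at a cutoff k, the following sketch computes Precision@k, Recall@k and nDCG@k for a single user with binary relevance. The function names are illustrative and the definitions are the common textbook ones; the repository's evaluation utilities may treat edge cases differently, and the reported figures are averages over users.

```python
import math


def precision_at_k(recommended, relevant, k=10):
    """Fraction of the top-k recommendations that are relevant."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


def recall_at_k(recommended, relevant, k=10):
    """Fraction of the relevant items that appear in the top-k."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(recommended, relevant, k=10):
    """Discounted gain of the top-k, normalized by the best achievable gain."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(recommended[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


if __name__ == "__main__":
    recs = ["a", "b", "c", "d"]
    truth = {"a", "c", "x"}
    print(precision_at_k(recs, truth, k=4))
    print(recall_at_k(recs, truth, k=4))
    print(ndcg_at_k(recs, truth, k=4))
```

Precision and recall only count hits, while nDCG also rewards placing relevant items near the top of the list.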

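| Algo | MAP | nDCG@k | Precision@k | Recall@k | RMSE | MAE | R2 | Explained Variance |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ALS | 0.004732 | 0.044239 | 0.048462 | 0.017796 | 0.965038 | 0.753001 | 0.255647 | 0.251648 |
| BiVAE | 0.146126 | 0.475077 | 0.411771 | 0.219145 | N/A | N/A | N/A | N/A |
| BPR | 0.132478 | 0.441997 | 0.388229 | 0.212522 | N/A | N/A | N/A | N/A |
| FastAI | 0.025503 | 0.147866 | 0.130329 | 0.053824 | 0.943084 | 0.744337 | 0.285308 | 0.287671 |
| LightGCN | 0.088526 | 0.419846 | 0.379626 | 0.144336 | N/A | N/A | N/A | N/A |
| NCF | 0.107720 | 0.396118 | 0.347296 | 0.180775 | N/A | N/A | N/A | N/A |
| SAR | 0.110591 | 0.382461 | 0.330753 | 0.176385 | 1.253805 | 1.048484 | -0.569363 | 0.030474 |
| SVD | 0.012873 | 0.095930 | 0.091198 | 0.032783 | 0.938681 | 0.742690 | 0.291967 | 0.291971 |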
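The rating metrics in the last four columns can be sketched as follows, using the standard definitions (the helper name `rating_metrics` is hypothetical, and this is not the repository's evaluation code). One consequence of these definitions is that R2 equals explained variance minus the squared mean error divided by the variance of the true ratings, so a systematic offset in predictions lowers R2, possibly below zero, without lowering explained variance. The SAR row, with a negative R2 and a small positive explained variance, is consistent with that pattern.

```python
import math


def rating_metrics(y_true, y_pred):
    """RMSE, MAE, R2 and explained variance for paired rating lists.

    Assumes the lists are non-empty, equally long, and that y_true is not
    constant (otherwise the variance terms would be zero).
    """
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    mean_err = sum(errors) / n

    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    var_err = sum((e - mean_err) ** 2 for e in errors) / n
    var_true = ss_tot / n

    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1.0 - ss_res / ss_tot,
        "explained_variance": 1.0 - var_err / var_true,
    }


if __name__ == "__main__":
    # Predictions with the right shape but a constant offset of +2.
    print(rating_metrics([1, 2, 3, 4], [3, 4, 5, 6]))
```

In the example, the predictions track the true ratings exactly apart from the offset, so explained variance stays at its maximum while R2 is penalized.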

 

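Related Projects

Microsoft AI Github: find other best-practice projects and Azure AI design patterns in our central repository.

NLP best practices: best practices and examples on NLP.

Computer vision best practices: best practices and examples on computer vision.

Forecasting best practices: best practices and examples on time series forecasting.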

References

  • A. Argyriou, M. González-Fierro, and L. Zhang, "Microsoft Recommenders: Best Practices for Production-Ready Recommendation Systems", WWW 2020: International World Wide Web Conference Taipei, 2020. Available online: https://dl.acm.org/doi/abs/10.1145/3366424.3382692
  • L. Zhang, T. Wu, X. Xie, A. Argyriou, M. González-Fierro and J. Lian, "Building Production-Ready Recommendation System at Scale", ACM SIGKDD Conference on Knowledge Discovery and Data Mining 2019 (KDD 2019), 2019.
  • S. Graham, J.K. Min, T. Wu, "Microsoft recommenders: tools to accelerate developing recommender systems", RecSys '19: Proceedings of the 13th ACM Conference on Recommender Systems, 2019. Available online: https://dl.acm.org/doi/10.1145/3298689.3346967