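This repository contains examples and best practices for building recommendation systems, provided as Jupyter notebooks.
The examples detail our learnings on five key tasks.
Several utilities are provided in reco_utils to support common tasks such as loading datasets in the format expected by different algorithms, evaluating model outputs, and splitting training/test data. Implementations of several state-of-the-art algorithms are included for self-study and customization in your own applications. See the reco_utils documentation.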
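As a rough illustration of the kind of loading step such utilities take care of, the sketch below parses rating records from a tab-separated layout (user id, item id, rating, timestamp). The `Rating` tuple, the `parse_ratings` function, and the sample rows are hypothetical and are not part of reco_utils; in practice the library's own loaders should be preferred.

```python
from typing import List, NamedTuple


class Rating(NamedTuple):
    """One user-item interaction (hypothetical container, not a reco_utils type)."""
    user_id: int
    item_id: int
    rating: float
    timestamp: int


def parse_ratings(text: str, sep: str = "\t") -> List[Rating]:
    """Parse lines of 'user<sep>item<sep>rating<sep>timestamp' into Rating records."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user, item, rating, ts = line.split(sep)
        records.append(Rating(int(user), int(item), float(rating), int(ts)))
    return records


# Made-up sample rows, including one blank line that the parser skips.
sample = "1\t10\t4\t1000\n1\t20\t5\t1005\n\n2\t10\t3\t1010\n"
ratings = parse_ratings(sample)
print(ratings[0])
```

Keeping every interaction in one fixed record shape is what lets the same data feed different algorithms, splitters, and evaluators without per-algorithm conversion code.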
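For a more detailed overview of the repository, please see the documents on the wiki page.
For more details on setting up your machine locally, on a Data Science Virtual Machine (DSVM), or on Azure Databricks, see the setup guide.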
git clone https://github.com/Microsoft/Recommenders
cd Recommenders
python tools/generate_conda_file.py
conda env create -f reco_base.yaml
conda activate reco_base
python -m ipykernel install --user --name reco_base --display-name "Python (reco)"
jupyter notebook
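NOTE: The Alternating Least Squares (ALS) notebooks require a PySpark environment to run. Please follow the steps in the setup guide to run these notebooks in a PySpark environment. For the deep learning algorithms, it is recommended to use a GPU machine.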
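Before opening a notebook, it can help to check which optional dependencies the active environment actually provides. The following is a minimal sketch using only the Python standard library; the package names passed in (`pyspark`, `tensorflow`, `torch`) are assumptions about what a given notebook might need, and the check says nothing about whether a GPU is present or usable.

```python
import importlib.util
from typing import Dict, Iterable


def available_packages(names: Iterable[str]) -> Dict[str, bool]:
    """Report which top-level packages can be imported in the current environment."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# The package names below are assumptions, not a list taken from the setup guide.
status = available_packages(["pyspark", "tensorflow", "torch"])
for name, found in status.items():
    print(f"{name}: {'available' if found else 'missing'}")
```

If `pyspark` is reported as missing, the ALS notebooks will not run in that environment, and the setup guide's PySpark instructions apply.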
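The table below lists the recommender algorithms currently available in the repository. Notebooks are linked under the Environment column when different implementations are available.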
Algorithm | Environment | Type | Description |
---|---|---|---|
Alternating Least Squares (ALS) | PySpark | Collaborative Filtering | Matrix factorization algorithm for explicit or implicit feedback in large datasets, optimized by Spark MLLib for scalability and distributed computing capability |
Attentive Asynchronous Singular Value Decomposition (A2SVD)* | Python CPU / Python GPU | Collaborative Filtering | Sequential-based algorithm that aims to capture both long and short-term user preferences using attention mechanism |
Cornac/Bayesian Personalized Ranking (BPR) | Python CPU | Collaborative Filtering | Matrix factorization algorithm for predicting item ranking with implicit feedback |
Cornac/Bilateral Variational Autoencoder (BiVAE) | Python CPU / Python GPU | Collaborative Filtering | Generative model for dyadic data (e.g., user-item interactions) |
Convolutional Sequence Embedding Recommendation (Caser) | Python CPU / Python GPU | Collaborative Filtering | Convolution-based algorithm that aims to capture both the user's general preferences and sequential patterns |
Deep Knowledge-Aware Network (DKN)* | Python CPU / Python GPU | Content-Based Filtering | Deep learning algorithm incorporating a knowledge graph and article embeddings to provide powerful news or article recommendations |
Extreme Deep Factorization Machine (xDeepFM)* | Python CPU / Python GPU | Hybrid | Deep learning based algorithm for implicit and explicit feedback with user/item features |
FastAI Embedding Dot Bias (FAST) | Python CPU / Python GPU | Collaborative Filtering | General purpose algorithm with embeddings and biases for users and items |
LightFM/Hybrid Matrix Factorization | Python CPU | Hybrid | Hybrid matrix factorization algorithm for both implicit and explicit feedback |
LightGBM/Gradient Boosting Tree* | Python CPU / PySpark | Content-Based Filtering | Gradient Boosting Tree algorithm for fast training and low memory usage in content-based problems |
LightGCN | Python CPU / Python GPU | Collaborative Filtering | Deep learning algorithm which simplifies the design of GCN for predicting implicit feedback |
GeoIMC | Python CPU | Hybrid | Matrix completion algorithm that takes into account user and item features, using Riemannian conjugate gradients optimization and following a geometric approach |
GRU4Rec | Python CPU / Python GPU | Collaborative Filtering | Sequential-based algorithm that aims to capture both long and short-term user preferences using recurrent neural networks |
Multinomial VAE | Python CPU / Python GPU | Collaborative Filtering | Generative Model for predicting user/item interactions |
Neural Recommendation with Long- and Short-term User Representations (LSTUR)* | Python CPU / Python GPU | Content-Based Filtering | Neural recommendation algorithm with long- and short-term user interest modeling |
Neural Recommendation with Attentive Multi-View Learning (NAML)* | Python CPU / Python GPU | Content-Based Filtering | Neural recommendation algorithm with attentive multi-view learning |
Neural Collaborative Filtering (NCF) | Python CPU / Python GPU | Collaborative Filtering | Deep learning algorithm with enhanced performance for implicit feedback |
Neural Recommendation with Personalized Attention (NPA)* | Python CPU / Python GPU | Content-Based Filtering | Neural recommendation algorithm with personalized attention network |
Neural Recommendation with Multi-Head Self-Attention (NRMS)* | Python CPU / Python GPU | Content-Based Filtering | Neural recommendation algorithm with multi-head self-attention |
Next Item Recommendation (NextItNet) | Python CPU / Python GPU | Collaborative Filtering | Algorithm based on dilated convolutions and residual network that aims to capture sequential patterns |
Restricted Boltzmann Machines (RBM) | Python CPU / Python GPU | Collaborative Filtering | Neural network based algorithm for learning the underlying probability distribution for explicit or implicit feedback |
Riemannian Low-rank Matrix Completion (RLRMC)* | Python CPU | Collaborative Filtering | Matrix factorization algorithm using Riemannian conjugate gradients optimization with small memory consumption. |
Simple Algorithm for Recommendation (SAR)* | Python CPU | Collaborative Filtering | Similarity-based algorithm for implicit feedback dataset |
Short-term and Long-term Preference Integrated Recommender (SLi-Rec)* | Python CPU / Python GPU | Collaborative Filtering | Sequential-based algorithm that aims to capture both long and short-term user preferences using attention mechanism, a time-aware controller and a content-aware controller |
Multi-Interest-Aware Sequential User Modeling (SUM)* | Python CPU / Python GPU | Collaborative Filtering | An enhanced memory network-based sequential user model which aims to capture users' multiple interests. |
Standard VAE | Python CPU / Python GPU | Collaborative Filtering | Generative Model for predicting user/item interactions |
Surprise/Singular Value Decomposition (SVD) | Python CPU | Collaborative Filtering | Matrix factorization algorithm for predicting explicit rating feedback in datasets that are not very large |
Term Frequency - Inverse Document Frequency (TF-IDF) | Python CPU | Content-Based Filtering | Simple similarity-based algorithm for content-based recommendations with text datasets |
Vowpal Wabbit (VW)* | Python CPU (online training) | Content-Based Filtering | Fast online learning algorithms, great for scenarios where user features / context are constantly changing |
Wide and Deep | Python CPU / Python GPU | Hybrid | Deep learning algorithm that can memorize feature interactions and generalize user features |
xLearn/Factorization Machine (FM) & Field-Aware FM (FFM) | Python CPU | Hybrid | Quick and memory efficient algorithm to predict labels with user/item features |
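NOTE: * indicates algorithms invented/contributed by Microsoft.

Independent or incubating algorithms and utilities are candidates for the contrib folder. This folder houses contributions that may not easily fit into the core repository, or that need time to refactor or mature the code and add the necessary tests.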
Algorithm | Environment | Type | Description |
---|---|---|---|
SARplus * | PySpark | Collaborative Filtering | Optimized implementation of SAR for Spark |
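We provide a benchmark notebook to illustrate how different algorithms can be evaluated and compared. In this notebook, the MovieLens dataset is split into training/test sets at a 75/25 ratio using a stratified split. A recommendation model is trained with each of the collaborative filtering algorithms below.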
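The idea of a stratified 75/25 split can be sketched as follows, assuming that "stratified" means each user's interactions are split at the same ratio so that every user appears in both sets. This is a standard-library illustration with hypothetical names (`stratified_split`, `Interaction`) and made-up toy data, not the splitter the benchmark notebook actually calls.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Interaction = Tuple[int, int, float]  # (user_id, item_id, rating)


def stratified_split(
    interactions: Sequence[Interaction], ratio: float = 0.75, seed: int = 42
) -> Tuple[List[Interaction], List[Interaction]]:
    """Split each user's interactions so roughly `ratio` of them land in train."""
    rng = random.Random(seed)

    # Group interactions by user so the ratio is applied per user.
    by_user: Dict[int, List[Interaction]] = defaultdict(list)
    for row in interactions:
        by_user[row[0]].append(row)

    train: List[Interaction] = []
    test: List[Interaction] = []
    for user in sorted(by_user):
        rows = by_user[user][:]
        rng.shuffle(rows)  # random assignment within each user
        cut = round(len(rows) * ratio)
        train.extend(rows[:cut])
        test.extend(rows[cut:])
    return train, test


# Toy data: user 1 has 8 interactions, user 2 has 4 (made-up values).
data = [(1, i, 4.0) for i in range(8)] + [(2, i, 3.0) for i in range(4)]
train, test = stratified_split(data)
print(len(train), len(test))  # 9 3
```

Splitting per user rather than over the whole table avoids test users with no training history, which would otherwise distort the ranking metrics.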
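We use empirical parameter values reported in the literature. For ranking metrics we use k=10 (top 10 recommended items). We run the comparison on a Standard NC6s_v2 Azure DSVM (6 vCPUs, 112 GB memory, and 1 P100 GPU). Spark ALS is run in local standalone mode. In this table we show the results on MovieLens 100k, running the algorithms for 15 epochs.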
Algo | MAP | nDCG@k | Precision@k | Recall@k | RMSE | MAE | R2 | Explained Variance |
---|---|---|---|---|---|---|---|---|
ALS | 0.004732 | 0.044239 | 0.048462 | 0.017796 | 0.965038 | 0.753001 | 0.255647 | 0.251648 |
BiVAE | 0.146126 | 0.475077 | 0.411771 | 0.219145 | N/A | N/A | N/A | N/A |
BPR | 0.132478 | 0.441997 | 0.388229 | 0.212522 | N/A | N/A | N/A | N/A |
FastAI | 0.025503 | 0.147866 | 0.130329 | 0.053824 | 0.943084 | 0.744337 | 0.285308 | 0.287671 |
LightGCN | 0.088526 | 0.419846 | 0.379626 | 0.144336 | N/A | N/A | N/A | N/A |
NCF | 0.107720 | 0.396118 | 0.347296 | 0.180775 | N/A | N/A | N/A | N/A |
SAR | 0.110591 | 0.382461 | 0.330753 | 0.176385 | 1.253805 | 1.048484 | -0.569363 | 0.030474 |
SVD | 0.012873 | 0.095930 | 0.091198 | 0.032783 | 0.938681 | 0.742690 | 0.291967 | 0.291971 |
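To make the ranking columns concrete, here is a minimal per-user sketch of Precision@k, Recall@k, and nDCG@k with binary relevance. The function names and toy inputs are hypothetical, and the exact conventions (for example, dividing precision by k, or how scores are averaged over users) are assumptions that may differ from the evaluation code behind the table.

```python
import math
from typing import Sequence, Set


def precision_at_k(recommended: Sequence[int], relevant: Set[int], k: int = 10) -> float:
    """Fraction of the top-k slots occupied by relevant items."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


def recall_at_k(recommended: Sequence[int], relevant: Set[int], k: int = 10) -> float:
    """Fraction of the relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(recommended: Sequence[int], relevant: Set[int], k: int = 10) -> float:
    """Binary-relevance DCG of the top k, normalized by the best achievable DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(recommended[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Toy example for one user (made-up item ids): items 10 and 30 are hits.
recommended = [10, 20, 30, 40, 50]
relevant = {10, 30, 99}
print(precision_at_k(recommended, relevant, k=5))  # 0.4
print(recall_at_k(recommended, relevant, k=5))
print(ndcg_at_k(recommended, relevant, k=5))
```

A benchmark-level score would average these per-user values over all test users. The rating metrics (RMSE, MAE, R2, Explained Variance) are reported as N/A for algorithms that produce only rankings rather than predicted ratings.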
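Microsoft AI Github: Find other best-practice projects and Azure AI design patterns in our central repository.
NLP best practices: Best practices and examples on NLP.
Computer vision best practices: Best practices and examples on computer vision.
Forecasting best practices: Best practices and examples on time series forecasting.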