We have a new release, Recommenders 0.7.0! We have changed the names of the folders that contain the source code so that they are more informative. This means you will need to change any import statements that reference the recommenders package. Specifically, the folder reco_utils has been renamed to recommenders, and its subfolders have been renamed according to issue 1390.
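To illustrate what the change to import statements can look like, here is a minimal sketch that rewrites old-style module paths in a piece of source text. The subfolder mapping in `RENAMES` is an assumption and should be checked against issue 1390 before use; `migrate_imports` is a hypothetical helper and is not part of the package.

```python
import re

# Assumed old -> new module prefixes; verify against issue 1390 before relying on them.
RENAMES = {
    "reco_utils.common": "recommenders.utils",
    "reco_utils.dataset": "recommenders.datasets",
    "reco_utils.recommender": "recommenders.models",
    "reco_utils": "recommenders",  # fallback for subfolders whose name is unchanged
}


def migrate_imports(source: str) -> str:
    """Rewrite old-style module paths, touching import lines only."""
    # Try the longest prefixes first so the generic fallback does not win too early.
    prefixes = sorted(RENAMES, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(p) for p in prefixes) + r")\b")

    def fix_line(line: str) -> str:
        if line.lstrip().startswith(("import ", "from ")):
            return pattern.sub(lambda m: RENAMES[m.group(1)], line)
        return line  # leave strings, comments and other code untouched

    return "\n".join(fix_line(line) for line in source.split("\n"))


if __name__ == "__main__":
    print(migrate_imports("from reco_utils.dataset import movielens"))
```

Restricting the rewrite to lines that start with `import` or `from` keeps the helper from changing string literals or comments that happen to mention the old name.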
The previous release (0.6.0) is compatible with the old style of naming of modules.
The recommenders package now supports three types of environments: venv and virtualenv with Python 3.6, conda with Python versions 3.6 and 3.7.
We have also added new evaluation metrics: novelty, serendipity, diversity and coverage (see the evaluation notebooks).
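To give a feel for two of these metrics, the sketch below computes catalog coverage and a popularity-based novelty score in plain Python. It uses one common formulation of each metric; the function names are hypothetical and the definitions used in the package may differ in detail, so treat the evaluation notebooks as the reference.

```python
import math
from collections import Counter


def catalog_coverage(train_items, recommended_items):
    """Share of distinct training items that appear in at least one recommendation."""
    catalog = set(train_items)
    return len(set(recommended_items) & catalog) / len(catalog)


def mean_novelty(train_items, recommended_items):
    """Average self-information, -log2 p(item), over the recommended items.

    p(item) is the item's share of all training interactions, so rarely
    seen items contribute a higher novelty score.
    """
    counts = Counter(train_items)
    total = sum(counts.values())
    scores = [-math.log2(counts[i] / total) for i in recommended_items if i in counts]
    # Items never seen in training are skipped; return 0.0 if nothing is left.
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    train = ["a", "a", "b", "c"]   # one entry per training interaction
    recs = ["a", "b"]              # items recommended across all users
    print(catalog_coverage(train, recs), mean_novelty(train, recs))
```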
Code coverage reports are now generated for every PR, using Codecov.
Starting with release 0.6.0, Recommenders has been available on PyPI and can be installed using pip!
Here you can find the PyPI page: https://pypi.org/project/recommenders/
Here you can find the package documentation: https://microsoft-recommenders.readthedocs.io/en/latest/
This repository contains examples and best practices for building recommendation systems, provided as Jupyter notebooks. The examples detail our learnings on five key tasks:
Several utilities are provided in recommenders to support common tasks such as loading datasets in the format expected by different algorithms, evaluating model outputs, and splitting training/test data. Implementations of several state-of-the-art algorithms are included for self-study and customization in your own applications. See the recommenders documentation.
For a more detailed overview of the repository, please see the documents on the wiki page.
Please see the setup guide for more details on setting up your machine locally, on a data science virtual machine (DSVM) or on Azure Databricks.
The installation of the recommenders package has been tested with Python versions 3.6 and 3.7 and currently does not support version 3.8 and above. It is recommended to install the package and its dependencies inside a clean environment (such as conda, venv or virtualenv).
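Before installing, it can help to confirm that the interpreter in the new environment is one of the versions named above. The following is a small sketch; `is_supported` is a hypothetical helper, not something the package provides.

```python
import sys

# Python versions this README says the package has been tested with.
SUPPORTED = {(3, 6), (3, 7)}


def is_supported(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor) version is in the tested set."""
    return tuple(version_info[:2]) in SUPPORTED


if __name__ == "__main__":
    if not is_supported():
        print("Python %d.%d is outside the tested versions" % tuple(sys.version_info[:2]))
```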
To set up on your local machine:
To install core utilities, CPU-based algorithms, and dependencies:
On Linux, first install the build tools and Python libraries required for compilation:
sudo apt-get install -y build-essential libpython<version>
where <version> should be 3.6 or 3.7 as appropriate.
On Windows you will need Microsoft C++ Build Tools.
Create a conda or virtual environment. See the setup guide for more details.
Within the created environment, install the package from PyPI:
pip install --upgrade pip
pip install --upgrade setuptools
pip install recommenders[examples]
In the case of conda, you also need to run the following after the pip installation:
conda install numpy-base
Register your conda or virtual environment as a kernel in Jupyter:
python -m ipykernel install --user --name my_environment_name --display-name "Python (reco)"
Start the Jupyter notebook server:
jupyter notebook
Run a notebook from the 00_quick_start folder. Make sure to change the kernel to "Python (reco)".
For additional options to install the package (support for GPU, Spark etc.) see this guide.
NOTE - The Alternating Least Squares (ALS) notebooks require a PySpark environment to run. Please follow the steps in the setup guide to run these notebooks in a PySpark environment. For the deep learning algorithms, it is recommended to use a GPU machine and to follow the steps in the setup guide to set up Nvidia libraries.
NOTE for DSVM Users - Please follow the steps in the Dependencies setup - Set PySpark environment variables on Linux or MacOS and Troubleshooting for the DSVM sections if you encounter any issue.
DOCKER - Another easy way to try the recommenders repository and get started quickly is to build docker images suitable for different environments.
The table below lists the recommender algorithms currently available in the repository. Notebooks are linked under the Environment column when different implementations are available.
Algorithm | Environment | Type | Description |
---|---|---|---|
Alternating Least Squares (ALS) | PySpark | Collaborative Filtering | Matrix factorization algorithm for explicit or implicit feedback in large datasets, optimized by Spark MLLib for scalability and distributed computing capability |
Attentive Asynchronous Singular Value Decomposition (A2SVD)* | Python CPU / Python GPU | Collaborative Filtering | Sequential-based algorithm that aims to capture both long and short-term user preferences using attention mechanism |
Cornac/Bayesian Personalized Ranking (BPR) | Python CPU | Collaborative Filtering | Matrix factorization algorithm for predicting item ranking with implicit feedback |
Cornac/Bilateral Variational Autoencoder (BiVAE) | Python CPU / Python GPU | Collaborative Filtering | Generative model for dyadic data (e.g., user-item interactions) |
Convolutional Sequence Embedding Recommendation (Caser) | Python CPU / Python GPU | Collaborative Filtering | Algorithm based on convolutions that aims to capture both users' general preferences and sequential patterns |
Deep Knowledge-Aware Network (DKN)* | Python CPU / Python GPU | Content-Based Filtering | Deep learning algorithm incorporating a knowledge graph and article embeddings to provide powerful news or article recommendations |
Extreme Deep Factorization Machine (xDeepFM)* | Python CPU / Python GPU | Hybrid | Deep learning based algorithm for implicit and explicit feedback with user/item features |
FastAI Embedding Dot Bias (FAST) | Python CPU / Python GPU | Collaborative Filtering | General purpose algorithm with embeddings and biases for users and items |
LightFM/Hybrid Matrix Factorization | Python CPU | Hybrid | Hybrid matrix factorization algorithm for both implicit and explicit feedback |
LightGBM/Gradient Boosting Tree* | Python CPU / PySpark | Content-Based Filtering | Gradient Boosting Tree algorithm for fast training and low memory usage in content-based problems |
LightGCN | Python CPU / Python GPU | Collaborative Filtering | Deep learning algorithm which simplifies the design of GCN for predicting implicit feedback |
GeoIMC* | Python CPU | Hybrid | Matrix completion algorithm that takes into account user and item features using Riemannian conjugate gradients optimization and following a geometric approach. |
GRU4Rec | Python CPU / Python GPU | Collaborative Filtering | Sequential-based algorithm that aims to capture both long and short-term user preferences using recurrent neural networks |
Multinomial VAE | Python CPU / Python GPU | Collaborative Filtering | Generative Model for predicting user/item interactions |
Neural Recommendation with Long- and Short-term User Representations (LSTUR)* | Python CPU / Python GPU | Content-Based Filtering | Neural recommendation algorithm with long- and short-term user interest modeling |
Neural Recommendation with Attentive Multi-View Learning (NAML)* | Python CPU / Python GPU | Content-Based Filtering | Neural recommendation algorithm with attentive multi-view learning |
Neural Collaborative Filtering (NCF) | Python CPU / Python GPU | Collaborative Filtering | Deep learning algorithm with enhanced performance for implicit feedback |
Neural Recommendation with Personalized Attention (NPA)* | Python CPU / Python GPU | Content-Based Filtering | Neural recommendation algorithm with personalized attention network |
Neural Recommendation with Multi-Head Self-Attention (NRMS)* | Python CPU / Python GPU | Content-Based Filtering | Neural recommendation algorithm with multi-head self-attention |
Next Item Recommendation (NextItNet) | Python CPU / Python GPU | Collaborative Filtering | Algorithm based on dilated convolutions and residual network that aims to capture sequential patterns |
Restricted Boltzmann Machines (RBM) | Python CPU / Python GPU | Collaborative Filtering | Neural network based algorithm for learning the underlying probability distribution for explicit or implicit feedback |
Riemannian Low-rank Matrix Completion (RLRMC)* | Python CPU | Collaborative Filtering | Matrix factorization algorithm using Riemannian conjugate gradients optimization with small memory consumption. |
Simple Algorithm for Recommendation (SAR)* | Python CPU | Collaborative Filtering | Similarity-based algorithm for implicit feedback datasets |
Short-term and Long-term Preference Integrated Recommender (SLi-Rec)* | Python CPU / Python GPU | Collaborative Filtering | Sequential-based algorithm that aims to capture both long and short-term user preferences using attention mechanism, a time-aware controller and a content-aware controller |
Multi-Interest-Aware Sequential User Modeling (SUM)* | Python CPU / Python GPU | Collaborative Filtering | An enhanced memory network-based sequential user model which aims to capture users' multiple interests. |
Standard VAE | Python CPU / Python GPU | Collaborative Filtering | Generative Model for predicting user/item interactions |
Surprise/Singular Value Decomposition (SVD) | Python CPU | Collaborative Filtering | Matrix factorization algorithm for predicting explicit rating feedback in datasets that are not very large |
Term Frequency - Inverse Document Frequency (TF-IDF) | Python CPU | Content-Based Filtering | Simple similarity-based algorithm for content-based recommendations with text datasets |
Vowpal Wabbit (VW)* | Python CPU (online training) | Content-Based Filtering | Fast online learning algorithms, great for scenarios where user features / context are constantly changing |
Wide and Deep | Python CPU / Python GPU | Hybrid | Deep learning algorithm that can memorize feature interactions and generalize user features |
xLearn/Factorization Machine (FM) & Field-Aware FM (FFM) | Python CPU | Hybrid | Quick and memory efficient algorithm to predict labels with user/item features |
NOTE: * indicates algorithms invented/contributed by Microsoft.
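As a rough illustration of what a similarity-based collaborative filtering approach on implicit feedback can look like, the toy sketch below builds item-to-item Jaccard similarities from co-occurrence and scores unseen items for a user. It is not the implementation of SAR or of any other algorithm in the table; the function names and the data layout are assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations


def jaccard_item_similarity(user_items):
    """Build item-item Jaccard similarity from implicit feedback.

    user_items: dict mapping each user to the set of items they interacted with.
    """
    item_users = defaultdict(set)
    for user, items in user_items.items():
        for item in items:
            item_users[item].add(user)

    sim = defaultdict(dict)
    for a, b in combinations(sorted(item_users), 2):
        both = len(item_users[a] & item_users[b])      # users who touched both items
        if both:
            either = len(item_users[a] | item_users[b])  # users who touched either item
            sim[a][b] = sim[b][a] = both / either
    return sim


def recommend(user, user_items, sim, top_k=10):
    """Rank unseen items by their summed similarity to the user's seen items."""
    seen = user_items[user]
    scores = defaultdict(float)
    for item in seen:
        for other, s in sim.get(item, {}).items():
            if other not in seen:
                scores[other] += s
    # Sort by descending score, breaking ties by item id for determinism.
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_k]


if __name__ == "__main__":
    data = {"u1": {"a", "b"}, "u2": {"a", "b", "c"}, "u3": {"a"}}
    sim = jaccard_item_similarity(data)
    print(recommend("u3", data, sim))
```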
Independent or incubating algorithms and utilities are candidates for the contrib folder. This will house contributions which may not easily fit into the core repository or need time to refactor or mature the code and add necessary tests.
Algorithm | Environment | Type | Description |
---|---|---|---|
SARplus * | PySpark | Collaborative Filtering | Optimized implementation of SAR for Spark |
We provide a benchmark notebook to illustrate how different algorithms could be evaluated and compared. In this notebook, the MovieLens dataset is split into training/test sets at a 75/25 ratio using a stratified split. A recommendation model is trained using each of the collaborative filtering algorithms below. We utilize empirical parameter values reported in literature here. For ranking metrics we use k=10 (top 10 recommended items). We run the comparison on a Standard NC6s_v2 Azure DSVM (6 vCPUs, 112 GB memory and 1 P100 GPU). Spark ALS is run in local standalone mode. In this table we show the results on MovieLens 100k, running the algorithms for 15 epochs.
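The idea behind a 75/25 stratified split can be sketched in plain Python: each user's interactions are shuffled and divided separately, so every user is represented in both sets at roughly the same ratio. This is a minimal sketch under that assumption, not the splitter used in the benchmark notebook; `stratified_split` and its parameters are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(interactions, ratio=0.75, seed=42):
    """Split (user, item, rating) tuples per user.

    Each user keeps roughly `ratio` of their interactions in the training
    set and the remainder in the test set.
    """
    rng = random.Random(seed)  # local RNG so the split is reproducible
    by_user = defaultdict(list)
    for row in interactions:
        by_user[row[0]].append(row)

    train, test = [], []
    for user in sorted(by_user):
        rows = by_user[user]
        rng.shuffle(rows)
        cut = round(len(rows) * ratio)
        train.extend(rows[:cut])
        test.extend(rows[cut:])
    return train, test


if __name__ == "__main__":
    data = [(u, i, 1.0) for u in (1, 2) for i in range(4)]
    train, test = stratified_split(data)
    print(len(train), len(test))
```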
Algo | MAP | nDCG@k | Precision@k | Recall@k | RMSE | MAE | R2 | Explained Variance |
---|---|---|---|---|---|---|---|---|
ALS | 0.004732 | 0.044239 | 0.048462 | 0.017796 | 0.965038 | 0.753001 | 0.255647 | 0.251648 |
BiVAE | 0.146126 | 0.475077 | 0.411771 | 0.219145 | N/A | N/A | N/A | N/A |
BPR | 0.132478 | 0.441997 | 0.388229 | 0.212522 | N/A | N/A | N/A | N/A |
FastAI | 0.025503 | 0.147866 | 0.130329 | 0.053824 | 0.943084 | 0.744337 | 0.285308 | 0.287671 |
LightGCN | 0.088526 | 0.419846 | 0.379626 | 0.144336 | N/A | N/A | N/A | N/A |
NCF | 0.107720 | 0.396118 | 0.347296 | 0.180775 | N/A | N/A | N/A | N/A |
SAR | 0.110591 | 0.382461 | 0.330753 | 0.176385 | 1.253805 | 1.048484 | -0.569363 | 0.030474 |
SVD | 0.012873 | 0.095930 | 0.091198 | 0.032783 | 0.938681 | 0.742690 | 0.291967 | 0.291971 |
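For readers unfamiliar with the columns in the table above, the sketch below shows one common way to compute Precision@k, Recall@k, nDCG@k (with binary relevance) and RMSE for a single user. The function names are hypothetical, and the benchmark's exact definitions and per-user averaging may differ; see the evaluation utilities in the package for the versions actually used.

```python
import math


def precision_recall_at_k(recommended, relevant, k=10):
    """Return (precision@k, recall@k) for one user's ranked recommendations."""
    hits = len(set(recommended[:k]) & set(relevant))
    return hits / k, hits / len(relevant)


def ndcg_at_k(recommended, relevant, k=10):
    """nDCG@k with binary relevance: a hit at 0-based rank r adds 1 / log2(r + 2)."""
    relevant = set(relevant)
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(recommended[:k])
        if item in relevant
    )
    # Ideal ordering places all relevant items at the top of the list.
    ideal = sum(1.0 / math.log2(rank + 2) for rank in range(min(k, len(relevant))))
    return dcg / ideal


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted ratings."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


if __name__ == "__main__":
    recs = ["a", "b", "c", "d"]   # ranked list for one user, assumed duplicate-free
    truth = ["a", "c", "x"]       # items the user interacted with in the test set
    print(precision_recall_at_k(recs, truth, k=4), ndcg_at_k(recs, truth, k=4))
```

Rating metrics such as RMSE only apply to algorithms that predict explicit ratings, which is consistent with the N/A entries for the ranking-only models in the table.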
This project adheres to Microsoft's Open Source Code of Conduct in order to foster a welcoming and inspiring community for all.
This project welcomes contributions and suggestions. Before contributing, please see our contribution guidelines.
These tests are the nightly builds, which run the smoke and integration tests. main is our principal branch and staging is our development branch. We use pytest for testing Python utilities in recommenders and papermill for the notebooks. For more information about the testing pipelines, please see the test documentation.
The following tests run on a Linux DSVM daily. These machines run 24/7.
Build Type | Branches |
---|---|
Linux CPU | main, staging |
Linux GPU | main, staging |
Linux Spark | main, staging |