News: If you're interested in gaining some insight into ML/AI technical interviews, please check out my new machine learning interview enlightener repo.
Note: This repo is under continuous development, and all feedback and contributions are very welcome.
Deploying deep learning models in production can be challenging, as it involves far more than just training a model with good performance. Several distinct components need to be designed and developed in order to deploy a production-level deep learning system (shown below):
This repo aims to be an engineering guideline for building production-level deep learning systems which will be deployed in real world applications.
Fun fact: 85% of AI projects fail. [1] Potential reasons include:
Technically infeasible or poorly scoped
Never make the leap to production
Unclear success criteria (metrics)
Poor team management
1. ML Projects lifecycle
Importance of understanding state of the art in your domain:
Helps to understand what is possible
Helps to know what to try next
2. Mental Model for ML project
The two important factors to consider when defining and prioritizing ML projects:
High Impact:
Complex parts of your pipeline
Where "cheap prediction" is valuable
Where automating complicated manual process is valuable
Low Cost:
Cost is driven by:
Data availability
Performance requirements: costs tend to scale super-linearly in the accuracy requirement
Problem difficulty:
Some of the hard problems include: unsupervised learning, reinforcement learning, and certain categories of supervised learning
Full stack pipeline
The following figure represents a high level overview of different components in a production level deep learning system:
In the following, we will go through each module and recommend toolsets and frameworks as well as best practices from practitioners that fit each component.
1. Data Management
1.1 Data Sources
Supervised deep learning requires a lot of labeled data
Labeling own data is costly!
Here are some resources for data:
Open source data (good to start with, but not an advantage)
Data augmentation (a MUST for computer vision, an option for NLP; see the sketch after this list)
Synthetic data (almost always worth starting with, esp. in NLP)
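As a minimal illustration of image augmentation, here is a typical torchvision pipeline; the specific transforms and parameter values are illustrative choices, not prescriptions:

```python
import torchvision.transforms as T

# A common augmentation pipeline for image classification;
# transforms and magnitudes here are examples, tune them per task.
train_transforms = T.Compose([
    T.RandomResizedCrop(224),                     # random crop, then resize
    T.RandomHorizontalFlip(),                     # mirror with probability 0.5
    T.ColorJitter(brightness=0.2, contrast=0.2),  # mild photometric noise
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406],       # ImageNet statistics
                std=[0.229, 0.224, 0.225]),
])
```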
1.2 Data Labeling
Requires: separate software stack (labeling platforms), temporary labor, and QC
Sources of labor for labeling:
Crowdsourcing (Mechanical Turk): cheap and scalable, less reliable, needs QC
Hiring own annotators: less QC needed, expensive, slow to scale
Feature Store: store, access, and share machine learning features (feature extraction can be computationally expensive and nearly impossible to scale, so re-using features across models and teams is key to high-performance ML teams; see the sketch after this list).
Dolt: a SQL database with Git-like version control for data and schema
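The core idea behind a feature store can be sketched in a few lines: compute a feature once, key it by its inputs, and let every model read from the shared cache. This is a toy illustration only; `extract_features` and the on-disk layout are hypothetical, not any particular product's API:

```python
import hashlib
import json
import os
import pickle

CACHE_DIR = "feature_cache"  # hypothetical shared location (in practice: a DB or object store)

def cached_features(raw_record: dict, extract_features):
    """Return features for raw_record, computing them only on a cache miss."""
    key = hashlib.sha256(json.dumps(raw_record, sort_keys=True).encode()).hexdigest()
    path = os.path.join(CACHE_DIR, f"{key}.pkl")
    if os.path.exists(path):                  # another model or team already computed it
        with open(path, "rb") as f:
            return pickle.load(f)
    features = extract_features(raw_record)   # the expensive step, done once
    os.makedirs(CACHE_DIR, exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(features, f)
    return features
```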
1.5. Data Processing
Training data for production models may come from different sources, including data stored in databases and object stores, log processing, and the outputs of other classifiers.
There are dependencies between tasks; each needs to be kicked off after its dependencies finish. For example, training on new log data requires a preprocessing step before training.
Makefiles are not scalable here; "workflow managers" become essential in this regard.
Airflow by Airbnb: dynamic, extensible, elegant, and scalable (the most widely used); a minimal DAG sketch follows this list
DAG workflow
Robust conditional execution: retry in case of failure
Pusher supports Docker images with TensorFlow Serving
Whole workflow in a single .py file
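A minimal Airflow sketch of the log-preprocessing example above, where training only runs after preprocessing succeeds (assuming Airflow 2.x; the DAG id, schedule, and task bodies are placeholders):

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def preprocess_logs():
    ...  # parse and clean the new log data (placeholder)

def train_model():
    ...  # train on the preprocessed data (placeholder)

with DAG(
    dag_id="train_on_new_logs",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    preprocess = PythonOperator(task_id="preprocess", python_callable=preprocess_logs)
    train = PythonOperator(task_id="train", python_callable=train_model)
    preprocess >> train  # train is kicked off only after preprocess finishes
```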
2. Development, Training, and Evaluation
2.1. Software engineering
Winner language: Python
Editors:
Vim
Emacs
VS Code (recommended by the author): built-in Git staging and diff, code linting, and remote project access over SSH
Notebooks: great as the starting point of a project, hard to scale (fun fact: Netflix's notebook-driven architecture is an exception, entirely based on the nteract suite).
nteract: a next-gen React-based UI for Jupyter notebooks
Papermill: an nteract library built for parameterizing, executing, and analyzing Jupyter Notebooks.
Commuter: another nteract project which provides a read-only display of notebooks (e.g. from S3 buckets).
Streamlit: interactive data science tool with applets
Comet: lets you track code, experiments, and results on ML projects
Weights & Biases: Record and visualize every detail of your research with easy collaboration
MLFlow Tracking: for logging parameters, code versions, metrics, and output files, as well as visualizing results (see the sketch after this list).
Automatic experiment tracking with one line of code in Python
Side-by-side comparison of experiments
Hyperparameter tuning
Supports Kubernetes-based jobs
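For example, the MLflow tracking API looks roughly like this (the parameter names, metric, and training helper are made up for illustration):

```python
import mlflow

def train_one_epoch(epoch):      # placeholder for the real training loop
    return 1.0 / (epoch + 1)     # pretend validation loss

with mlflow.start_run():
    mlflow.log_param("learning_rate", 1e-3)   # illustrative hyperparameters
    mlflow.log_param("batch_size", 64)
    for epoch in range(10):
        # log one metric value per epoch; the UI plots them over `step`
        mlflow.log_metric("val_loss", train_one_epoch(epoch), step=epoch)
```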
2.5. Hyperparameter Tuning
Approaches:
Grid search
Random search (see the sketch after this list)
Bayesian Optimization
HyperBand and Asynchronous Successive Halving Algorithm (ASHA)
Population-based Training
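Grid and random search are simple enough to sketch directly, and random search is often a strong baseline. A minimal version follows; `train_and_evaluate` is a hypothetical stand-in for your training loop, and the search space is illustrative:

```python
import math
import random

def train_and_evaluate(lr, dropout):
    return random.random()  # hypothetical: returns validation accuracy

best_score, best_config = -math.inf, None
for _ in range(20):  # 20 random trials
    config = {
        "lr": 10 ** random.uniform(-5, -2),  # sample learning rate log-uniformly
        "dropout": random.uniform(0.0, 0.5),
    }
    score = train_and_evaluate(**config)
    if score > best_score:
        best_score, best_config = score, config
print(best_config, best_score)
```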
Platforms:
Ray Tune: a Python library for hyperparameter tuning at any scale, with a focus on deep learning and deep reinforcement learning. Supports any machine learning framework, including PyTorch, XGBoost, MXNet, and Keras.
Katib: a Kubernetes-native system for hyperparameter tuning and neural architecture search, inspired by [Google Vizier](https://static.googleusercontent.com/media/research.google.com/ja//pubs/archive/bcb15507f4b52991a0783013df4222240e942381.pdf); supports multiple ML/DL frameworks (e.g. TensorFlow, MXNet, and PyTorch).
Hyperas: a simple wrapper around hyperopt for Keras, with a simple template notation to define hyperparameter ranges to tune.
SIGOPT: a scalable, enterprise-grade optimization platform
Sweeps from [Weights & Biases](https://www.wandb.com/): parameters are not explicitly specified by the developer; instead, they are approximated and learned by a machine learning model.
Keras Tuner: A hyperparameter tuner for Keras, specifically for tf.keras with TensorFlow 2.0.
2.6. Distributed Training
Data parallelism: use it when iteration time is too long (supported by both TensorFlow and PyTorch); see the sketch below.
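In PyTorch, the simplest form of single-machine data parallelism is a one-line wrap. This is a minimal sketch with a toy model; for serious workloads, `torch.nn.parallel.DistributedDataParallel` is the recommended option:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10))
if torch.cuda.device_count() > 1:
    # replicate the model and split each input batch across the visible GPUs
    model = nn.DataParallel(model)
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
```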
CPU inference is preferable if it meets the requirements.
Scale by adding more servers, or going serverless.
GPU inference:
TensorFlow Serving or Clipper (see the example request after this list)
Adaptive batching is useful
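For reference, querying a TensorFlow Serving REST endpoint looks roughly like this; the model name, port, and input shape are assumptions for illustration:

```python
import requests

# TF Serving's REST predict API; "my_model" and port 8501 are illustrative
url = "http://localhost:8501/v1/models/my_model:predict"
payload = {"instances": [[0.1, 0.2, 0.3, 0.4]]}  # one 4-feature example
response = requests.post(url, json=payload)
predictions = response.json()["predictions"]     # one prediction per instance
```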
(Bonus) Deploying Jupyter Notebooks:
Kubeflow Fairing is a hybrid deployment package that lets you deploy your Jupyter notebook code!
4.3. Service Mesh and Traffic Routing
Transitioning from monolithic applications towards a distributed microservice architecture can be challenging.
A service mesh (consisting of a network of microservices) reduces the complexity of such deployments and eases the strain on development teams.
Istio: a service mesh that eases the creation of a network of deployed services with load balancing, service-to-service authentication, and monitoring, with few or no changes to service code.
4.4. Monitoring:
Purpose of monitoring:
Alerts for downtime, errors, and distribution shifts
Catching service and data regressions
Cloud providers' solutions are decent
Kiali: an observability console for Istio with service mesh configuration capabilities. It answers these questions: How are the microservices connected? How are they performing?
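Beyond cloud-provider tooling, exposing your own service metrics is straightforward with the Python `prometheus_client` library. A minimal sketch follows; the metric names and the `model` stub are illustrative:

```python
from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter("inference_requests_total", "Number of prediction requests")
LATENCY = Histogram("inference_latency_seconds", "Prediction latency in seconds")

def model(features):
    return features  # placeholder for the real model call

@LATENCY.time()          # record how long each prediction takes
def predict(features):
    REQUESTS.inc()       # count every request; useful for downtime/error alerts
    return model(features)

# Prometheus scrapes metrics from :8000/metrics; in a real service this
# runs alongside the serving loop rather than at the end of a script.
start_http_server(8000)
```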
Are we done?
4.5. Deploying on Embedded and Mobile Devices
Main challenge: memory footprint and compute constraints
Solutions:
Quantization (see the sketch after this list)
Reduced model size
MobileNets
Knowledge Distillation
DistilBERT (for NLP)
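As an example of quantization, PyTorch's dynamic quantization converts `Linear` layers to int8 weights in one call. This is a minimal sketch with a toy model; the actual memory and latency savings depend on the architecture:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10))
model.eval()

# Replace Linear weights with int8 versions; activations are quantized on the fly
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
```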
Embedded and Mobile Frameworks:
TensorFlow Lite
PyTorch Mobile
Core ML
ML Kit
FRITZ
OpenVINO
Model Conversion:
Open Neural Network Exchange (ONNX): open-source format for deep learning models
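Exporting a PyTorch model to ONNX, for instance, is a single call. A minimal sketch, where the model, input shape, and tensor names are illustrative:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2).eval()
dummy_input = torch.randn(1, 10)  # example input that fixes the graph's shapes

# Trace the model and write an ONNX graph that other runtimes can load
torch.onnx.export(model, dummy_input, "model.onnx",
                  input_names=["features"], output_names=["logits"])
```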