amazon-sagemaker-examples

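License: Apache-2.0 License
Language: Python
Category: Neural Networks/Artificial Intelligence, Machine Learning/Deep Learning
Software type: Open source
Region: Unknown
Submitted by: 景靖琪
Operating system: Cross-platform
Open source organization:
Target audience: Unknown
Software Overview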

Amazon SageMaker Examples

Example Jupyter notebooks that demonstrate how to build, train, and deploy machine learning models using Amazon SageMaker.

Background

Amazon SageMaker is a fully managed service for data science and machine learning (ML) workflows. You can use Amazon SageMaker to simplify the process of building, training, and deploying ML models.

The SageMaker example notebooks are Jupyter notebooks that demonstrate the usage of Amazon SageMaker.

Setup

The quickest setup to run example notebooks includes:

Usage

These example notebooks are automatically loaded into SageMaker Notebook Instances. They can be accessed by clicking on the SageMaker Examples tab in Jupyter or the SageMaker logo in JupyterLab.

Although most examples utilize key Amazon SageMaker functionality like distributed, managed training or real-time hosted endpoints, these notebooks can be run outside of Amazon SageMaker Notebook Instances with minimal modification (updating IAM role definition and installing the necessary libraries).
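
As a minimal sketch of the "updating IAM role definition" step, the role ARN can be read from configuration instead of being looked up from the notebook environment. The variable name `SAGEMAKER_ROLE_ARN` and the helper `resolve_role_arn` are hypothetical names chosen for this illustration, and the ARN pattern is a simplified shape check rather than a full validation.

```python
import os
import re

# Simplified shape of an IAM role ARN: arn:<partition>:iam::<12-digit account>:role/<name>.
# The partition may be aws, aws-cn, aws-us-gov, and so on.
ROLE_ARN_PATTERN = re.compile(r"^arn:aws[a-z-]*:iam::\d{12}:role/[\w+=,.@/-]+$")


def resolve_role_arn(env=None, var_name="SAGEMAKER_ROLE_ARN"):
    """Return the role ARN stored in `var_name`, or raise if it looks wrong.

    `SAGEMAKER_ROLE_ARN` is a hypothetical variable name for this sketch.
    On a Notebook Instance the examples usually call
    sagemaker.get_execution_role() instead of reading configuration.
    """
    env = os.environ if env is None else env
    arn = env.get(var_name, "").strip()
    if not ROLE_ARN_PATTERN.match(arn):
        raise ValueError(f"{var_name} is not set to a valid IAM role ARN: {arn!r}")
    return arn


if __name__ == "__main__":
    # Placeholder account id and role name, for illustration only.
    example_env = {"SAGEMAKER_ROLE_ARN": "arn:aws:iam::123456789012:role/MySageMakerRole"}
    print(resolve_role_arn(example_env))
```

The returned string can then be passed wherever a notebook expects its `role` variable.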

Examples

Introduction to Ground Truth Labeling Jobs

These examples provide quick walkthroughs to get you up and running with the labeling job workflow for Amazon SageMaker Ground Truth.

Introduction to Applying Machine Learning

These examples provide a gentle introduction to machine learning concepts as they are applied in practical use cases across a variety of sectors.

  • Targeted Direct Marketing predicts potential customers that are most likely to convert based on customer and aggregate level metrics, using Amazon SageMaker's implementation of XGBoost.
  • Predicting Customer Churn uses customer interaction and service usage data to find those most likely to churn, and then walks through the cost/benefit trade-offs of providing retention incentives. This uses Amazon SageMaker's implementation of XGBoost to create a highly predictive model.
  • Time-series Forecasting generates a forecast for topline product demand using Amazon SageMaker's Linear Learner algorithm.
  • Cancer Prediction predicts Breast Cancer based on features derived from images, using SageMaker's Linear Learner.
  • Ensembling predicts income using two Amazon SageMaker models to show the advantages in ensembling.
  • Video Game Sales develops a binary prediction model for the success of video games based on review scores.
  • MXNet Gluon Recommender System uses neural network embeddings for non-linear matrix factorization to predict user movie ratings on Amazon digital reviews.
  • Fair Linear Learner is an example of an effective way to create fair linear models with respect to sensitive features.
  • Population Segmentation of US Census Data using PCA and Kmeans analyzes US census data and reduces dimensionality using PCA then clusters US counties using KMeans to identify segments of similar counties.
  • Document Embedding using Object2Vec is an example to embed a large collection of documents in a common low-dimensional space, so that the semantic distances between these documents are preserved.
  • Traffic violations forecasting using DeepAR uses daily traffic violation data to predict patterns and seasonality with the Amazon DeepAR algorithm.
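
The cost/benefit trade-off mentioned for the churn example can be sketched as a threshold search over model scores. This is an illustration only, not the notebook's code: the function names, the cost figures, and the assumption that every targeted customer is retained are all hypothetical.

```python
def expected_cost(scores, churned, threshold, incentive_cost=100.0, churn_cost=500.0):
    """Total cost of offering an incentive to everyone scored at or above `threshold`.

    Assumptions of this sketch: each targeted customer costs `incentive_cost`
    and is retained; each churner who is not targeted costs `churn_cost`.
    """
    cost = 0.0
    for score, did_churn in zip(scores, churned):
        if score >= threshold:
            cost += incentive_cost   # targeted: pay the incentive
        elif did_churn:
            cost += churn_cost       # missed churner: lose the customer
    return cost


def best_threshold(scores, churned, candidates):
    """Return the candidate threshold with the lowest total cost."""
    return min(candidates, key=lambda t: expected_cost(scores, churned, t))


# Toy scores and outcomes, made up for the illustration.
scores = [0.9, 0.8, 0.4, 0.3, 0.1]
churned = [True, True, True, False, False]
candidates = [0.2, 0.35, 0.5, 0.95]

if __name__ == "__main__":
    for t in candidates:
        print(t, expected_cost(scores, churned, t))
    print("best:", best_threshold(scores, churned, candidates))
```

With these toy numbers a low threshold wins because a missed churner costs five times as much as an incentive; changing the two cost parameters moves the optimum.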

SageMaker Automatic Model Tuning

These examples introduce SageMaker's hyperparameter tuning functionality which helps deliver the best possible predictions by running a large number of training jobs to determine which hyperparameter values are the most impactful.

  • XGBoost Tuning shows how to use SageMaker hyperparameter tuning to improve your model fits for the Targeted Direct Marketing task.
  • BlazingText Tuning shows how to use SageMaker hyperparameter tuning with the BlazingText built-in algorithm and 20_newsgroups dataset.
  • TensorFlow Tuning shows how to use SageMaker hyperparameter tuning with the pre-built TensorFlow container and MNIST dataset.
  • MXNet Tuning shows how to use SageMaker hyperparameter tuning with the pre-built MXNet container and MNIST dataset.
  • HuggingFace Tuning shows how to use SageMaker hyperparameter tuning with the pre-built HuggingFace container and 20_newsgroups dataset.
  • Keras BYO Tuning shows how to use SageMaker hyperparameter tuning with a custom container running a Keras convolutional network on CIFAR-10 data.
  • R BYO Tuning shows how to use SageMaker hyperparameter tuning with the custom container from the Bring Your Own R Algorithm example.
  • Analyzing Results is a shared notebook that can be used after each of the above notebooks to provide analysis on how training jobs with different hyperparameters performed.
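
The idea of running many training jobs and keeping the best hyperparameters can be sketched with plain random search. This is a local toy, not SageMaker's tuner: `random_search` and `toy_validation_loss` are hypothetical names, and the objective function stands in for a real training job.

```python
import random


def random_search(objective, ranges, max_jobs=20, seed=0):
    """Sample hyperparameters uniformly from `ranges` and keep the lowest score.

    `ranges` maps a hyperparameter name to a (low, high) tuple.
    Each call to `objective` plays the role of one training job.
    """
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(max_jobs):
        params = {name: rng.uniform(low, high) for name, (low, high) in ranges.items()}
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_validation_loss(params):
    """Stand-in for a validation metric; smallest at eta=0.3, alpha=1.0."""
    return (params["eta"] - 0.3) ** 2 + (params["alpha"] - 1.0) ** 2


if __name__ == "__main__":
    ranges = {"eta": (0.0, 1.0), "alpha": (0.0, 2.0)}
    params, loss = random_search(toy_validation_loss, ranges, max_jobs=50)
    print(params, loss)
```

A managed tuner adds what this sketch leaves out: parallel jobs, smarter search strategies, and tracking of every job's metrics.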

Introduction to Amazon Algorithms

These examples provide quick walkthroughs to get you up and running with Amazon SageMaker's custom developed algorithms. Most of these algorithms can train on distributed hardware, scale incredibly well, and are faster and cheaper than popular alternatives.

  • k-means is our introductory example for Amazon SageMaker. It walks through the process of clustering MNIST images of handwritten digits using Amazon SageMaker k-means.
  • Factorization Machines showcases Amazon SageMaker's implementation of the algorithm to predict whether a handwritten digit from the MNIST dataset is a 0 or not using a binary classifier.
  • Latent Dirichlet Allocation (LDA) introduces topic modeling using Amazon SageMaker Latent Dirichlet Allocation (LDA) on a synthetic dataset.
  • Linear Learner predicts whether a handwritten digit from the MNIST dataset is a 0 or not using a binary classifier from Amazon SageMaker Linear Learner.
  • Neural Topic Model (NTM) uses Amazon SageMaker Neural Topic Model (NTM) to uncover topics in documents from a synthetic data source, where topic distributions are known.
  • Principal Components Analysis (PCA) uses Amazon SageMaker PCA to calculate eigendigits from MNIST.
  • Seq2Seq uses the Amazon SageMaker Seq2Seq algorithm that's built on top of Sockeye, which is a sequence-to-sequence framework for Neural Machine Translation based on MXNet. Seq2Seq implements state-of-the-art encoder-decoder architectures which can also be used for tasks like Abstractive Summarization in addition to Machine Translation. This notebook shows translation from English to German text.
  • Image Classification includes full training and transfer learning examples of Amazon SageMaker's Image Classification algorithm. This uses a ResNet deep convolutional neural network to classify images from the Caltech dataset.
  • XGBoost for regression predicts the age of abalone (Abalone dataset) using regression from Amazon SageMaker's implementation of XGBoost.
  • XGBoost for multi-class classification uses Amazon SageMaker's implementation of XGBoost to classify handwritten digits from the MNIST dataset as one of the ten digits using a multi-class classifier. Both single machine and distributed use-cases are presented.
  • DeepAR for time series forecasting illustrates how to use the Amazon SageMaker DeepAR algorithm for time series forecasting on a synthetically generated data set.
  • BlazingText Word2Vec generates Word2Vec embeddings from a cleaned text dump of Wikipedia articles using SageMaker's fast and scalable BlazingText implementation.
  • Object detection for bird images demonstrates how to use the Amazon SageMaker Object Detection algorithm with a public dataset of Bird images.
  • Object2Vec for movie recommendation demonstrates how Object2Vec can be used to model data consisting of pairs of singleton tokens using movie recommendation as a running example.
  • Object2Vec for multi-label classification shows how the Object2Vec algorithm can train on data consisting of pairs of sequences and singleton tokens using the setting of genre prediction of movies based on their plot descriptions.
  • Object2Vec for sentence similarity explains how to train Object2Vec using sequence pairs as input using sentence similarity analysis as the application.
  • IP Insights for suspicious logins shows how to train IP Insights on login events for a web server to identify suspicious login attempts.
  • Semantic Segmentation shows how to train a semantic segmentation algorithm using the Amazon SageMaker Semantic Segmentation algorithm. It also demonstrates how to host the model and produce segmentation masks and probability of segmentation.
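
For readers new to the clustering idea behind the introductory k-means example, here is a small sketch of the textbook Lloyd's algorithm on 2-D points. It is not the SageMaker implementation, which is built for large distributed datasets; the function names and the toy data are made up for this illustration.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm with caller-supplied initial centroids.

    Returns the final centroids and, for each point, the index of its cluster.
    """
    centroids = [tuple(c) for c in centroids]
    assignments = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, assignments


if __name__ == "__main__":
    points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(kmeans(points, centroids=[(0, 0), (10, 10)]))
```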

Amazon SageMaker RL

The following provide examples demonstrating different capabilities of Amazon SageMaker RL.

  • Cartpole using Coach demonstrates the simplest use case of Amazon SageMaker RL using Intel's RL Coach.
  • AWS DeepRacer demonstrates AWS DeepRacer training using RL Coach in the Gazebo environment.
  • HVAC using EnergyPlus demonstrates the training of HVAC systems using the EnergyPlus environment.
  • Knapsack Problem demonstrates how to solve the knapsack problem using a custom environment.
  • Mountain Car Mountain car is a classic RL problem. This notebook explains how to solve this using the OpenAI Gym environment.
  • Distributed Neural Network Compression This notebook explains how to compress ResNets using RL, using a custom environment and the RLLib toolkit.
  • Portfolio Management This notebook uses a custom Gym environment to manage multiple financial investments.
  • Autoscaling demonstrates how to adjust load depending on demand. This uses RL Coach and a custom environment.
  • Roboschool is an open source physics simulator that is commonly used to train RL policies for robotic systems. This notebook demonstrates training a few agents using it.
  • Stable Baselines In this notebook example, we will make the HalfCheetah agent learn to walk using the stable-baselines, which are a set of improved implementations of Reinforcement Learning (RL) algorithms based on OpenAI Baselines.
  • Travelling Salesman is a classic NP-hard problem, which this notebook solves with Amazon SageMaker RL.
  • Tic-tac-toe is a simple implementation of a custom Gym environment to train and deploy an RL agent in Coach that then plays tic-tac-toe interactively in a Jupyter Notebook.
  • Unity Game Agent shows how to use RL algorithms to train an agent to play a Unity3D game.
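
When experimenting with an RL agent on a problem such as knapsack, it can help to compare its result on small instances against an exact answer. The sketch below is the textbook dynamic-programming solution for the 0/1 knapsack problem, not the RL approach the notebook uses; the function name and the item list are hypothetical.

```python
def knapsack(items, capacity):
    """Best total value for the 0/1 knapsack problem.

    `items` is a list of (weight, value) pairs with non-negative integer
    weights; `capacity` is a non-negative integer. Each item is used at most once.
    """
    best = [0] * (capacity + 1)  # best[c] = best value achievable with capacity c
    for weight, value in items:
        # Iterate capacities downwards so each item is counted only once.
        for cap in range(capacity, weight - 1, -1):
            best[cap] = max(best[cap], best[cap - weight] + value)
    return best[capacity]


if __name__ == "__main__":
    items = [(1, 1), (3, 4), (4, 5), (5, 7)]
    print(knapsack(items, capacity=7))
```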

Scientific Details of Algorithms

These examples provide more thorough mathematical treatment on a select group of algorithms.

  • Streaming Median sequentially introduces concepts used in streaming algorithms, which many SageMaker algorithms rely on to deliver speed and scalability.
  • Latent Dirichlet Allocation (LDA) dives into Amazon SageMaker's spectral decomposition approach to LDA.
  • Linear Learner features shows how to use the class weights and loss functions features of the SageMaker Linear Learner algorithm to improve performance on a credit card fraud prediction task.
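
As a small illustration of the streaming setting, where values arrive one at a time and are never revisited, the sketch below keeps an exact running median with two heaps. This is one common technique and an assumption of this example; the notebook may present different or approximate methods, and the class name `StreamingMedian` is hypothetical.

```python
import heapq


class StreamingMedian:
    """Exact running median using a max-heap and a min-heap."""

    def __init__(self):
        self._low = []   # smaller half, stored negated so heapq acts as a max-heap
        self._high = []  # larger half, a regular min-heap

    def add(self, x):
        """Insert one value in O(log n) time."""
        heapq.heappush(self._low, -x)
        # Move the largest of the smaller half across, keeping the halves ordered.
        heapq.heappush(self._high, -heapq.heappop(self._low))
        # Rebalance so the smaller half is never the shorter one.
        if len(self._high) > len(self._low):
            heapq.heappush(self._low, -heapq.heappop(self._high))

    def median(self):
        """Return the median of all values seen so far."""
        if not self._low:
            raise ValueError("no values seen yet")
        if len(self._low) > len(self._high):
            return -self._low[0]
        return (-self._low[0] + self._high[0]) / 2


if __name__ == "__main__":
    stream = StreamingMedian()
    for value in [5, 2, 9]:
        stream.add(value)
        print(value, stream.median())
```

Unlike sketch-based approaches, this version stores every value, so its memory grows with the stream.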

Amazon SageMaker Debugger

These examples provide an introduction to SageMaker Debugger, which provides debugging and monitoring capabilities for training machine learning and deep learning algorithms. Note that although these notebooks focus on a specific framework, the same approach works with all the frameworks that Amazon SageMaker Debugger supports. The notebooks below are listed in the order in which we recommend you review them.

Amazon SageMaker Clarify

These examples provide an introduction to SageMaker Clarify which provides machine learning developers with greater visibility into their training data and models so they can identify and limit bias and explain predictions.

  • Fairness and Explainability with SageMaker Clarify shows how to use SageMaker Clarify Processor API to measure the pre-training bias of a dataset and post-training bias of a model, and explain the importance of the input features on the model's decision.
  • Amazon SageMaker Clarify Model Monitors shows how to use SageMaker Clarify Model Monitor API to schedule bias monitor to monitor predictions for bias drift on a regular basis, and schedule explainability monitor to monitor predictions for feature attribution drift on a regular basis.
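
To make "pre-training bias" concrete, the sketch below computes two simple dataset-level measures by hand: class imbalance, (n_a - n_d) / (n_a + n_d), and the difference in proportions of positive labels between the two groups. It does not call the Clarify API; the function names and toy data are hypothetical, and the formulas are written from the commonly documented definitions, so check them against the Clarify documentation before relying on them.

```python
def class_imbalance(in_facet):
    """(n_a - n_d) / (n_a + n_d).

    `in_facet` holds one boolean per record: True if the record belongs to
    the group of interest (d), False otherwise (a).
    """
    n_d = sum(1 for f in in_facet if f)
    n_a = len(in_facet) - n_d
    if n_a + n_d == 0:
        raise ValueError("empty dataset")
    return (n_a - n_d) / (n_a + n_d)


def label_proportion_difference(in_facet, labels):
    """Positive-label rate of group a minus positive-label rate of group d."""

    def positive_rate(group_flag):
        group = [y for f, y in zip(in_facet, labels) if bool(f) == group_flag]
        if not group:
            raise ValueError("one of the groups is empty")
        return sum(1 for y in group if y == 1) / len(group)

    return positive_rate(False) - positive_rate(True)


if __name__ == "__main__":
    # Toy data: 6 records outside the group of interest, 4 inside it.
    in_facet = [False] * 6 + [True] * 4
    labels = [1, 1, 1, 0, 0, 0, 1, 0, 0, 0]
    print(class_imbalance(in_facet))
    print(label_proportion_difference(in_facet, labels))
```

Values near zero indicate balanced representation and similar label rates; larger magnitudes suggest the dataset deserves a closer look.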

Advanced Amazon SageMaker Functionality

These examples showcase unique functionality available in Amazon SageMaker. They cover a broad range of topics and will utilize a variety of methods, but aim to provide the user with sufficient insight or inspiration to develop within Amazon SageMaker.

  • Data Distribution Types showcases the difference between two methods for sending data from S3 to Amazon SageMaker Training instances. This has particular implication for scalability and accuracy of distributed training.
  • Encrypting Your Data shows how to use Server Side KMS encrypted data with Amazon SageMaker training. The IAM role used for S3 access needs to have permissions to encrypt and decrypt data with the KMS key.
  • Using Parquet Data shows how to bring Parquet data sitting in S3 into an Amazon SageMaker Notebook and convert it into the recordIO-protobuf format that many SageMaker algorithms consume.
  • Connecting to Redshift demonstrates how to copy data from Redshift to S3 and vice-versa without leaving Amazon SageMaker Notebooks.
  • Bring Your Own XGBoost Model shows how to use Amazon SageMaker Algorithms containers to bring a pre-trained model to a realtime hosted endpoint without ever needing to think about REST APIs.
  • Bring Your Own k-means Model shows how to take a model that's been fit elsewhere and use Amazon SageMaker Algorithms containers to host it.
  • Bring Your Own R Algorithm shows how to bring your own algorithm container to Amazon SageMaker using the R language.
  • Installing the R Kernel shows how to install the R kernel into an Amazon SageMaker Notebook Instance.
  • Bring Your Own scikit Algorithm provides a detailed walkthrough on how to package a scikit learn algorithm for training and production-ready hosting.
  • Bring Your Own MXNet Model shows how to bring a model trained anywhere using MXNet into Amazon SageMaker.
  • Bring Your Own TensorFlow Model shows how to bring a model trained anywhere using TensorFlow into Amazon SageMaker.
  • Inference Pipeline with SparkML and XGBoost shows how to deploy an Inference Pipeline with SparkML for data pre-processing and XGBoost for training on the Abalone dataset. The pre-processing code is written once and used between training and inference.
  • Inference Pipeline with SparkML and BlazingText shows how to deploy an Inference Pipeline with SparkML for data pre-processing and BlazingText for training on the DBPedia dataset. The pre-processing code is written once and used between training and inference.
  • Experiment Management Capabilities with Search shows how to organize Training Jobs into projects, and track relationships between Models, Endpoints, and Training Jobs.
  • Host Multiple Models with Your Own Algorithm shows how to deploy multiple models to a realtime hosted endpoint with your own custom algorithm.
  • Host Multiple Models with XGBoost shows how to deploy multiple models to a realtime hosted endpoint using a multi-model enabled XGBoost container.
  • Host Multiple Models with SKLearn shows how to deploy multiple models to a realtime hosted endpoint using a multi-model enabled SKLearn container.
  • SageMaker Training and Inference with Script Mode shows how to use custom training and inference scripts, similar to those you would use outside of SageMaker, with SageMaker's prebuilt containers for various frameworks like Scikit-learn, PyTorch, and XGBoost.
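
The difference between the two S3 data distribution types, `FullyReplicated` and `ShardedByS3Key`, can be pictured with a small local simulation. The helper `distribute` is hypothetical, and the round-robin assignment of objects to instances is an assumption of this sketch; the service decides the actual assignment.

```python
def distribute(keys, instance_count, distribution="FullyReplicated"):
    """Return, for each training instance, the list of S3 keys it would receive."""
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    if distribution == "FullyReplicated":
        # Every instance gets a full copy of the dataset.
        return [list(keys) for _ in range(instance_count)]
    if distribution == "ShardedByS3Key":
        # Each instance gets a disjoint subset (round-robin is assumed here).
        return [list(keys[i::instance_count]) for i in range(instance_count)]
    raise ValueError(f"unknown distribution type: {distribution!r}")


if __name__ == "__main__":
    keys = [f"train/part-{i}.csv" for i in range(5)]
    print(distribute(keys, 2, "FullyReplicated"))
    print(distribute(keys, 2, "ShardedByS3Key"))
```

Sharding reduces the data each instance downloads and processes, while replication lets every instance see every record, which is why the choice affects both scalability and accuracy.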

Amazon SageMaker Neo Compilation Jobs

These examples provide an introduction to using Neo to optimize deep learning models.

Amazon SageMaker Processing

These examples show you how to use SageMaker Processing jobs to run data processing workloads.
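
A data processing workload of this kind is typically an ordinary script that reads input files and writes output files. The sketch below splits a CSV file into train and test sets using only the standard library. The function name `split_csv` and the every-fifth-row rule are hypothetical, and the container paths mentioned in the comment are a convention assumed here (the job configuration sets them), not something this page specifies.

```python
import csv
import os


def split_csv(input_path, output_dir, test_every=5):
    """Copy rows of `input_path` into train.csv and test.csv under `output_dir`.

    The header is written to both files; every `test_every`-th data row goes
    to the test set. Returns the number of data rows written to each file.
    """
    os.makedirs(output_dir, exist_ok=True)
    counts = {"train": 0, "test": 0}
    with open(input_path, newline="") as src, \
            open(os.path.join(output_dir, "train.csv"), "w", newline="") as train_f, \
            open(os.path.join(output_dir, "test.csv"), "w", newline="") as test_f:
        reader = csv.reader(src)
        train, test = csv.writer(train_f), csv.writer(test_f)
        header = next(reader, None)
        if header is None:
            return counts  # empty input: nothing to split
        train.writerow(header)
        test.writerow(header)
        for i, row in enumerate(reader):
            if i % test_every == test_every - 1:
                test.writerow(row)
                counts["test"] += 1
            else:
                train.writerow(row)
                counts["train"] += 1
    return counts


if __name__ == "__main__":
    # Assumed container paths; adjust to whatever the processing job mounts.
    print(split_csv("/opt/ml/processing/input/data.csv", "/opt/ml/processing/output"))
```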

Amazon SageMaker Pre-Built Framework Containers and the Python SDK

Pre-Built Deep Learning Framework Containers

These examples show you how to train and host in pre-built deep learning framework containers using the SageMaker Python SDK.

Pre-Built Machine Learning Framework Containers

These examples show you how to build Machine Learning models with frameworks like Apache Spark or Scikit-learn using SageMaker Python SDK.

Using Amazon SageMaker with Apache Spark

These examples show how to use Amazon SageMaker for model training, hosting, and inference through Apache Spark using SageMaker Spark. SageMaker Spark allows you to interleave Spark Pipeline stages with Pipeline stages that interact with Amazon SageMaker.

AWS Marketplace

Create algorithms/model packages for listing in AWS Marketplace for machine learning.

This example shows you how to package a model-package/algorithm for listing in AWS Marketplace for machine learning.

Once you have created an algorithm or a model package to be listed in the AWS Marketplace, the next step is to list it in AWS Marketplace, and provide a sample notebook that customers can use to try your algorithm or model package.

Use algorithms, data, and model packages from AWS Marketplace.

These examples show you how to use model-packages and algorithms from AWS Marketplace and dataset products from AWS Data Exchange, for machine learning.

⚖️ License

This library is licensed under the Apache 2.0 License. For more details, please take a look at the LICENSE file.

Contributing

Although we're extremely excited to receive contributions from the community, we're still working on the best mechanism to take in examples from external sources. Please bear with us in the short term if pull requests take longer than expected or are closed. Please read our contributing guidelines if you'd like to open an issue or submit a pull request.

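  • Amazon SageMaker supports the full workflow from model training to deployment. The example below is the introductory example from its reference manual; it records the overall workflow, and the specific code operations can be found in the manual. 1 Create an Amazon account and set up an IAM user (Identity and Access Management) 2 Create an S3 bucket (Amazon Simple Storage Service) to store the training data and the tuned model code/model artifacts (mo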

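Related Resources
  • I'm working on a project where I need to send video from my IP camera to a Kinesis video stream and use SageMaker to host my ML model, which will then analyze the video from the Kinesis video stream in real time. I followed this link: https://aws.amazon.com/blogs/machine-learning/analyse-live-video-at-scale-in-real-time-using-amazon-kinesis

  • I want to run a local training job in SageMaker. Following this AWS notebook (https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker-python-sdk/mxnet_gluon_mnist/mxnet_mnist_with_gluon_local_mode.ipynb), I was able to train and predict locally. Is there a way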

  • SageMaker Python SDK SageMaker Python SDK is an open source library for training and deploying machine learning models on Amazon SageMaker. With the SDK, you can train and deploy models using popular

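  • Amazon Kinesis is a managed, scalable, cloud-based service that allows real-time processing of large amounts of streaming data per second. It is designed for real-time applications and lets developers ingest any amount of data from multiple sources, scaling up and down, and can run on EC2 instances. It is used to capture, store, and process data from large distributed streams such as event logs and social media feeds. After processing the data, Kinesis distributes it to multiple consumers simultaneously. How to use Amazon KCL? It is used in situations where we need fast-moving data and its continuous processing.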

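  • I have a 500 GB CSV file and a MySQL database with 1.5 TB of data, and I want to run AWS SageMaker classification and regression algorithms and random forest. Can AWS SageMaker support this? Can the model read and train in batches or chunks? Any examples of it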

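  • Amazon S3 (Simple Storage Service) is a scalable, high-speed, low-cost web-based service designed for online backup and archiving of data and applications. It allows uploading, storing, and downloading any type of file up to 5 GB in size. This service allows subscribers to access the same systems that Amazon uses to run its own websites. Subscribers control the accessibility of data, that is, whether it is privately or publicly accessible. How to configure S3? The following are the steps to configure an S3 account. Step 1 - Open the Amazon S3 console using this link - ht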