Mixed precision in AI frameworks (Automatic Mixed Precision): mixed-precision computation that uses Tensor Cores, with up to 3x speedup; (Get up to 3X speedup running on Tensor Cores with just a few lines of code added to your existing training script)
Deep Learning Primitives (cuDNN): the standard library for GPU-accelerated deep learning; (High-performance building blocks for deep neural network applications including convolutions, activation functions, and tensor transformations)
Input Data Processing (DALI): a highly parallel data loading and data augmentation library (mainly for images and video); (An open source data loading and augmentation library that is fast, portable and flexible)
Multi-GPU Communication (NCCL): an excellent tool for collective (group) communication, implemented with double-tree; (Collective communication routines, such as all-gather, reduce, and broadcast that accelerate multi-GPU deep learning training)
Deep Learning for Video Analytics (DeepStream SDK): High-level C++ API and runtime for GPU-accelerated transcoding and deep learning inference
Optical Flow for Video Inference (Optical Flow SDK): Set of high-level APIs that expose the latest hardware capability of Turing GPUs dedicated for computing the optical flow of pixels between images. Also useful for calculating stereo disparity and depth estimation.
High level SDK for tuning domain specific DNNs (Transfer Learning Toolkit): transfer learning; (Enabling end to end Deep Learning workflows for industries)
AI enabled Annotation for Medical Imaging (AI-Assisted Annotation SDK): could not open the page, no access permission??; (AI-assisted annotation for medical imaging related data labeling)
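Deep Learning GPU Training System (DIGITS): a web-based visualization tool for datasets, models, and training (much like TensorBoard); it is only a visualization layer wrapped around core components such as the compute framework; (Rapidly train highly accurate deep neural networks (DNNs) for image classification, segmentation and object detection tasks)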
Linear Algebra (cuBLAS): the standard library for GPU matrix computation; (GPU-accelerated BLAS functionality that delivers 6x to 17x faster performance than CPU-only BLAS libraries)
Sparse Matrix Operations (cuSPARSE): the standard library for sparse matrix computation (it really is used for model weight pruning); (GPU-accelerated linear algebra subroutines for sparse matrices that deliver up to 8x faster performance than CPU BLAS (MKL), ideal for applications such as natural language processing)
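
To make the weight-pruning remark in the cuSPARSE entry concrete, here is a minimal pure-Python sketch of the compressed sparse row (CSR) layout and a sparse matrix-vector product over it. This is not the cuSPARSE API: the helper names `dense_to_csr` and `csr_matvec` and the `pruned_weights` matrix are hypothetical and only illustrate the kind of storage format and operation that a sparse library accelerates on the GPU.

```python
def dense_to_csr(dense):
    """Convert a dense matrix (list of rows) into CSR arrays.

    Returns (values, col_indices, row_ptr), where row i owns the
    slice values[row_ptr[i]:row_ptr[i + 1]].
    """
    values, col_indices, row_ptr = [], [], [0]
    for row in dense:
        for j, x in enumerate(row):
            if x != 0:
                values.append(x)        # keep only non-zero entries
                col_indices.append(j)   # remember which column each came from
        row_ptr.append(len(values))     # end offset of this row
    return values, col_indices, row_ptr


def csr_matvec(values, col_indices, row_ptr, x):
    """Compute y = A @ x for a CSR matrix A, touching only non-zeros."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_indices[k]]
        y.append(acc)
    return y


# Hypothetical weight matrix after pruning: most entries are zero.
pruned_weights = [
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [1.5, 0.0, 0.0, -1.0],
]

values, col_indices, row_ptr = dense_to_csr(pruned_weights)
y = csr_matvec(values, col_indices, row_ptr, [1.0, 2.0, 3.0, 4.0])
print(values, col_indices, row_ptr, y)
```

Only 3 of the 12 weights are stored and multiplied here, which is why pruned models can benefit from sparse routines. Whether that turns into a real speedup depends on the sparsity level and the hardware.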