machine-learning-collection

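License: MIT License
Language: Python
Category: Neural Networks/AI, Machine Learning/Deep Learning
Software type: Open-source software
Region: Unknown
Submitted by: 储臻
Operating system: Cross-platform
Open-source organization:
Target audience: Unknown
Software overview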

Machine Learning Collection

Libraries, tools, recipes, sample code, and workshop content contributed by Microsoft for machine learning and deep learning.


Boosting

  • LightGBM - A fast, distributed, high performance gradient boosting framework
  • Explainable Boosting Machines - an interpretable model developed at Microsoft Research that uses bagging, gradient boosting, and automatic interaction detection to estimate generalized additive models.
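Both entries build on gradient boosting. As a rough illustration of the core idea only (this is not LightGBM's or EBM's API, and it omits their histogram binning, leaf-wise growth, and bagging), here is a minimal sketch assuming squared-error loss and depth-1 regression trees ("stumps") as weak learners. The names `Stump`, `fit_stump`, and `TinyGradientBoosting` are hypothetical. Each round fits a stump to the current residuals, which are the negative gradient of the squared loss, and adds a shrunken copy of it to the ensemble.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class Stump:
    """Depth-1 regression tree: a single threshold split on one feature."""
    feature: int
    threshold: float
    left_value: float
    right_value: float

    def predict(self, x: Sequence[float]) -> float:
        return self.left_value if x[self.feature] <= self.threshold else self.right_value


def fit_stump(X: List[List[float]], targets: List[float]) -> Stump:
    """Exhaustively pick the split that minimizes squared error on `targets`."""
    best, best_err = None, float("inf")
    for f in range(len(X[0])):
        for threshold in sorted({row[f] for row in X}):
            left = [t for row, t in zip(X, targets) if row[f] <= threshold]
            right = [t for row, t in zip(X, targets) if row[f] > threshold]
            if not left or not right:
                continue  # a split must put at least one sample on each side
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            err = sum((t - lv) ** 2 for t in left) + sum((t - rv) ** 2 for t in right)
            if err < best_err:
                best, best_err = Stump(f, threshold, lv, rv), err
    if best is None:  # no valid split (all rows identical): constant prediction
        mean = sum(targets) / len(targets)
        best = Stump(0, float("inf"), mean, mean)
    return best


@dataclass
class TinyGradientBoosting:
    """Hypothetical minimal gradient-boosted regressor (squared loss, stumps)."""
    n_rounds: int = 50
    learning_rate: float = 0.1
    base: float = 0.0
    stumps: List[Stump] = field(default_factory=list)

    def fit(self, X: List[List[float]], y: List[float]) -> "TinyGradientBoosting":
        self.base = sum(y) / len(y)  # the mean minimizes squared loss for a constant model
        preds = [self.base] * len(y)
        self.stumps = []
        for _ in range(self.n_rounds):
            # Negative gradient of 0.5 * (y - F)^2 with respect to F is the residual y - F.
            residuals = [yi - pi for yi, pi in zip(y, preds)]
            stump = fit_stump(X, residuals)
            self.stumps.append(stump)
            preds = [p + self.learning_rate * stump.predict(x) for p, x in zip(preds, X)]
        return self

    def predict(self, x: Sequence[float]) -> float:
        return self.base + self.learning_rate * sum(s.predict(x) for s in self.stumps)
```

For example, fitting `TinyGradientBoosting().fit([[float(i)] for i in range(10)], [0.0] * 5 + [10.0] * 5)` drives the predictions toward 0 and 10 on the two halves; a smaller `learning_rate` needs more rounds but is usually less prone to overfitting.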

AutoML

  • Neural Network Intelligence - An open-source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression, and hyper-parameter tuning.
  • Archai - Reproducible Rapid Research for Neural Architecture Search (NAS).
  • FLAML - A fast and lightweight AutoML library.
  • Azure Automated Machine Learning - Automated Machine Learning for Tabular data (regression, classification and forecasting) by Azure Machine Learning

Neural Network

  • PyMarlin - Lightweight Deep Learning Model Training library based on PyTorch.
  • bayesianize - A Bayesian neural network wrapper in PyTorch.
  • O-CNN - Octree-based convolutional neural networks for 3D shape analysis.
  • ResNet - deep residual network.
  • CNTK - Microsoft Cognitive Toolkit (CNTK), an open-source deep-learning toolkit.
  • InfiniBatch - Efficient, check-pointed data loading for deep learning with massive data sets.
  • Models under Hugging Face - Microsoft shares transformer models on Hugging Face: 51 pretrained models (as of June 28, 2021).
  • Muzic - Music Understanding and Generation with Artificial Intelligence.

Graph & Network

  • graspologic - utilities and algorithms designed for the processing and analysis of graphs with specialized graph statistical algorithms.
  • TF Graph Neural Network Samples - TensorFlow implementations of graph neural networks.
  • ptgnn - PyTorch Graph Neural Network Library
  • StemGNN - spectral temporal graph neural network (StemGNN) for multivariate time-series forecasting.
  • SPTAG - a distributed approximate nearest neighbor (ANN) search library.

Vision

  • Microsoft Vision Model ResNet50 - a large pretrained ResNet-50 vision model trained on a search engine's web-scale image data.
  • Oscar - Object-Semantics Aligned Pre-training for Vision-Language Tasks.
  • TorchGeo - a PyTorch domain library, similar to torchvision, that provides datasets, transforms, samplers, and pre-trained models specific to geospatial data.
  • Swin Transformer - an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".

Time Series

NLP

  • T-ULRv2 - Turing multilingual language model.
  • Turing-NLG - Turing Natural Language Generation, 17 billion-parameter language model.
  • DeBERTa - Decoding-enhanced BERT with Disentangled Attention
  • UniLM - Unified Language Model Pre-training / Pre-training for NLP and Beyond
  • Unicoder - Unicoder model for understanding and generation.
  • NeuronBlocks - build NLP DNN models like playing with Lego.
  • Multilingual Model Transfer - new deep learning models for bootstrapping language understanding models for languages with no labeled data using labeled data from other languages.
  • MT-DNN - multi-task deep neural networks for natural language understanding.
  • inmt - Interactive Neural Machine Translation-lite.
  • OpenKP - open-domain keyphrase extraction: automatically extracting the keyphrases that are salient to a document's meaning, an essential step in semantic document understanding.
  • DeText - a deep neural text understanding framework for ranking and classification tasks.
  • Genalog - an open source, cross-platform python package allowing generation of synthetic document images with custom degradations and text alignment capabilities.
  • FastFormers - highly efficient transformer models for NLU.
  • VERSEAGILITY - a Python-based toolkit to ramp up your custom natural language processing (NLP) task, allowing you to bring your own data and bring models into production. It is a central component of the Microsoft Data Science Toolkit.
  • DPU Utilities - Utilities used by the Deep Program Understanding team.

Online Machine Learning

  • Vowpal Wabbit - fast, efficient, and flexible online machine learning techniques for reinforcement learning, supervised learning, and more.

Recommendation

  • Recommenders - examples and best practices for building recommendation systems (A2SVD, DKN, xDeepFM, LightGBM, LSTUR, NAML, NPA, NRMS, RLRMC, SAR, and Vowpal Wabbit were invented or contributed by Microsoft).
  • GDMIX - A deep ranking personalization framework
  • rankerEval - A fast numpy-based implementation of ranking metrics for information retrieval and recommendation.
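rankerEval is described as a NumPy-based implementation of ranking metrics; its own API is not shown here. As a plain-Python illustration of what two such metrics compute, the sketch below implements NDCG@k, using linear gain rel / log2(rank + 1) (one of several common gain conventions), and reciprocal rank. The function names are hypothetical, and `relevances` is assumed to hold graded relevance labels in the order the system ranked the items.

```python
import math
from typing import List, Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k ranks (linear gain, log2 discount)."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG@k normalized by the DCG@k of the ideal (relevance-sorted) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def reciprocal_rank(relevances: Sequence[float]) -> float:
    """1 / rank of the first relevant item (relevance > 0), or 0 if there is none."""
    for rank, rel in enumerate(relevances, start=1):
        if rel > 0:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(queries: List[Sequence[float]]) -> float:
    """Average reciprocal rank over several queries."""
    return sum(reciprocal_rank(q) for q in queries) / len(queries)
```

For the textbook ranking `[3, 2, 3, 0, 1, 2]`, `dcg_at_k(..., 6)` gives about 6.861, and `ndcg_at_k(..., 6)` gives about 0.961 when the ideal ordering is built only from those six items. A library implementation would typically vectorize these loops over many queries at once.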

Distributed

  • DeepSpeed - DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.
  • MMLSpark - machine learning library on Spark.
  • photon-ml - a scalable machine learning library on Apache Spark.
  • TonY - framework to natively run deep learning frameworks on Apache Hadoop.
  • isolation-forest - A Spark/Scala implementation of the isolation forest unsupervised outlier detection algorithm.
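The isolation-forest entry is a Spark/Scala library. To show the underlying algorithm rather than that library's API, here is a single-machine Python sketch: random trees isolate points by splitting on a random feature at a random value, and anomalies, being few and different, tend to be isolated close to the root. The score has the usual form 2^(-E[h(x)] / c(psi)), where c(n) is the average path length of an unsuccessful binary-search-tree lookup among n points. All names and defaults (`IsolationForestSketch`, `n_trees=100`, `sample_size=256`) are assumptions, and the sketch simplifies by ending a branch when the randomly chosen feature is constant.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence

EULER_GAMMA = 0.5772156649


def avg_path_length(n: int) -> float:
    """c(n): average path length of an unsuccessful BST search among n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


@dataclass
class Node:
    size: int                      # number of training points that reached this node
    feature: Optional[int] = None  # None marks a leaf
    split: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_tree(points: List[Sequence[float]], depth: int, max_depth: int,
               rng: random.Random) -> Node:
    if depth >= max_depth or len(points) <= 1:
        return Node(size=len(points))
    feature = rng.randrange(len(points[0]))
    lo = min(p[feature] for p in points)
    hi = max(p[feature] for p in points)
    if lo == hi:  # simplification: stop instead of retrying another feature
        return Node(size=len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[feature] < split]
    right = [p for p in points if p[feature] >= split]
    return Node(len(points), feature, split,
                build_tree(left, depth + 1, max_depth, rng),
                build_tree(right, depth + 1, max_depth, rng))


def path_length(x: Sequence[float], node: Node, depth: int = 0) -> float:
    if node.feature is None:
        # A leaf holding several points adds the expected depth of the unbuilt subtree.
        return depth + avg_path_length(node.size)
    child = node.left if x[node.feature] < node.split else node.right
    return path_length(x, child, depth + 1)


class IsolationForestSketch:
    """Hypothetical minimal isolation forest; scores near 1 suggest anomalies."""

    def __init__(self, n_trees: int = 100, sample_size: int = 256, seed: int = 0):
        self.n_trees = n_trees
        self.sample_size = sample_size
        self.rng = random.Random(seed)
        self.trees: List[Node] = []
        self.psi = 0

    def fit(self, data: List[Sequence[float]]) -> "IsolationForestSketch":
        if len(data) < 2:
            raise ValueError("need at least two points")
        self.psi = min(self.sample_size, len(data))
        max_depth = math.ceil(math.log2(self.psi))  # height limit from the subsample size
        self.trees = [build_tree(self.rng.sample(data, self.psi), 0, max_depth, self.rng)
                      for _ in range(self.n_trees)]
        return self

    def score(self, x: Sequence[float]) -> float:
        mean_depth = sum(path_length(x, t) for t in self.trees) / len(self.trees)
        return 2.0 ** (-mean_depth / avg_path_length(self.psi))
```

Fitting on fifty clustered values plus one far-away point should give the far point a clearly higher score than a point inside the cluster. A distributed version, as in the Spark library, would typically build each tree independently from its own subsample, which parallelizes naturally.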

Causal Inference

  • EconML - Python package for estimating heterogeneous treatment effects from observational data via machine learning.
  • DoWhy - Python library for causal inference that supports explicit modeling and testing of causal assumptions.

Responsible AI

  • InterpretML - a toolkit to help understand models and enable responsible machine learning.
    • Interpret Community - extends the interpret repo with additional interpretability techniques and utility functions.
    • DiCE - diverse counterfactual explanations.
    • Interpret-Text - state-of-the-art explainers for text-based ML models, with visualizations in a dashboard.
  • fairlearn - Python package to assess and improve the fairness of machine learning models.
  • LiFT - LinkedIn Fairness Toolkit.
  • RobustDG - Toolkit for building machine learning models that generalize to unseen domains and are robust to privacy and other attacks.
  • SHAP - a game-theoretic approach to explain the output of any machine learning model (Scott Lundberg, Microsoft Research).
  • LIME - explaining the predictions of any machine learning classifier (Marco Tulio Ribeiro, Microsoft Research).
  • BackwardCompatibilityML - Project for open sourcing research efforts on Backward Compatibility in Machine Learning
  • confidential-ml-utils - Python utilities for training and deploying ML models against data you can't see.
  • presidio - context aware, pluggable and customizable data protection and anonymization service for text and images.
    • Presidio-research - This package features data-science related tasks for developing new recognizers for Presidio.
  • Confidential ONNX Inference Server - An Open Enclave port of the ONNX inference server with data encryption and attestation capabilities to enable confidential inference on Azure Confidential Computing.
  • Responsible-AI-Widgets - responsible AI user interfaces for Fairlearn, interpret-community, and Error Analysis, as well as foundational building blocks that they rely on.
  • Error Analysis - A toolkit to help analyze and improve model accuracy.
  • Secure Data Sandbox - A toolkit for conducting machine learning trials against confidential data.
  • shrike - Python utilities to aid "compliant experimentation" in Azure Machine Learning: training ML models without seeing the training data.
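SHAP attributes a prediction to input features using Shapley values from cooperative game theory: phi_i = sum over coalitions S not containing i of |S|! (n - |S| - 1)! / n! * (v(S ∪ {i}) - v(S)). The SHAP library itself relies on model-specific and sampling-based estimators; the sketch below is not its API but a brute-force computation of exact Shapley values for a small model. It assumes that a feature absent from a coalition takes its value from a fixed baseline vector, which is only one of several conventions. The cost grows as 2^n in the number of features, so it suits toy examples only; `shapley_values` and the toy models are hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence, Set


def shapley_values(model: Callable[[List[float]], float],
                   x: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of model(x) relative to model(baseline)."""
    n = len(x)

    def value(coalition: Set[int]) -> float:
        # Features in the coalition come from x; all others from the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


if __name__ == "__main__":
    def linear(z: List[float]) -> float:
        # For a linear model each attribution reduces to w_i * (x_i - baseline_i).
        return 2 * z[0] + 3 * z[1] - z[2]

    def product(z: List[float]) -> float:
        # A pure interaction: the joint effect is split equally between the features.
        return z[0] * z[1]

    print(shapley_values(linear, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]))
    print(shapley_values(product, [1.0, 1.0], [0.0, 0.0]))
```

The attributions always sum to model(x) - model(baseline) (the efficiency property), which is a quick sanity check on any Shapley-based explainer.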

Optimization

  • ONNXRuntime - cross-platform, high-performance ML inference and training accelerator.
  • Hummingbird - compiles trained ML models into tensor computations for faster inference.
  • EdgeML - provides code for machine learning algorithms for edge devices developed at Microsoft Research India.
  • DirectML - high-performance, hardware-accelerated DirectX 12 library for machine learning.
  • MMdnn - a set of tools to help users interoperate among different deep learning frameworks, e.g., for model conversion and visualization.
  • InfiniBatch - Efficient, check-pointed data loading for deep learning with massive data sets.
  • InferenceSchema - Schema decoration for inference code
  • nnfusion - flexible and efficient deep neural network compiler.

Reinforcement Learning

  • AirSim - open-source simulator for autonomous vehicles, built on Unreal Engine / Unity, from Microsoft Research.
  • TextWorld - TextWorld is a sandbox learning environment for the training and evaluation of reinforcement learning (RL) agents on text-based games.
  • Moab - Project Moab, a new open-source balancing robot to help engineers and developers learn how to build real-world autonomous control systems with Project Bonsai.
  • MARO - multi-agent resource optimization (MARO) platform.
  • Training Data-Driven or Surrogate Simulators - build simulations from data for use in RL and in the Bonsai machine teaching platform.
  • Bonsai - low code industrial machine teaching platform.
    • Bonsai Python SDK - A python library for integrating data sources with Bonsai BRAIN.

Security

  • counterfit - a CLI that provides a generic automation layer for assessing the security of ML models.

Windows

Datasets

  • COCO Dataset - COCO is a large-scale object detection, segmentation, and captioning dataset.
  • MS MARCO - collection of datasets focused on deep learning in search.
  • InnerEye CreateDataset - InnerEye dataset creation tool for the InnerEye-DeepLearning library. Transforms DICOM data into masks for training deep learning models.
  • Sepsis Cohort from MIMIC III - Sepsis cohort from MIMIC dataset.
  • MIND : Microsoft News Dataset - a large-scale dataset for news recommendation research.
  • Dataset for AI for Earth - AIForEarthDataSets is a collection of datasets for AI research.
  • ORBIT - a collection of videos of objects in clean and cluttered scenes recorded by people who are blind/low-vision on a mobile phone.

Debug & Benchmark

  • tensorwatch - debugging, monitoring, and visualization for Python machine learning and data science.
  • Pyright - static type checker for Python.
  • Bench ML - Python library to benchmark popular pre-built cloud AI APIs.
  • debugpy - An implementation of the Debug Adapter Protocol for Python
  • kineto - A CPU+GPU profiling library, contributed by the Azure AI Platform team, that provides access to timeline traces and hardware performance counters.
  • SuperBenchmark - a benchmarking and diagnosis tool for AI infrastructure (software & hardware).

Pipeline

  • GitHub Actions - Automate all your software workflows, now with world-class CI/CD. Build, test, and deploy your code right from GitHub.
  • Azure Pipelines - Automate your builds and deployments with Pipelines so you spend less time with the nuts and bolts and more time being creative.
  • Dagli - framework for defining machine learning models, including feature generation and transformations, as a DAG.

Platform

  • AI for Earth API Platform - distributed infrastructure that provides secure, scalable, and customizable API hosting, designed to handle the needs of long-running/asynchronous machine learning model inference.

  • Open Platform for AI (OpenPAI) - resource scheduling and cluster management for AI.

    • OpenPAI Runtime - Runtime for deep learning workload.
    • OpenPAI Protocol - OpenPAI protocol enables job sharing and portability.
    • Openpaimarketplace - A marketplace which stores examples and job templates of openpai.
    • OpenPAI FrameworkController - built to orchestrate all kinds of applications on Kubernetes by a single controller.
    • HiveDScheduler - Kubernetes scheduler for deep learning.
    • OpenPAI JS SDK - The JavaScript SDK is designed to help OpenPAI developers offer a user-friendly experience.
    • OpenPAI VS Code Client - Extension to connect OpenPAI clusters, submit AI jobs, simulate jobs locally, manage files, and so on.
  • MLOS - Data Science powered infrastructure and methodology to democratize and automate Performance Engineering.

  • Platform for Situated Intelligence - an open-source framework for multimodal, integrative AI.

  • Qlib - an AI-oriented quantitative investment platform.

Tagging

  • TagAnomaly - Anomaly detection analysis and labeling tool, specifically for multiple time series (one time series per category)
  • VoTT - Visual object tagging tool

Developer tool

  • Visual Studio Code - Code editor redefined and optimized for building and debugging modern web and cloud applications.
  • Gather - adds gather functionality in the Python language to the Jupyter Extension.
  • Pylance - an extension that works alongside Python in Visual Studio Code to provide performant language support.
  • Azure ML Snippets - VS Code snippets for Azure Machine Learning.

Sample Code

Community

  • AI@Edge Community - find the resources you need to create solutions using intelligence at the edge through combinations of hardware, machine learning (ML), artificial intelligence (AI), and Microsoft Azure services.
  • Global AI Community - empowers developers who are passionate about AI to share knowledge through events and meetups.
  • Deep Learning Lab (Japan) - provides information on development cases and the latest technology trends related to deep learning.

Workshop

Competition

Book

Learning

Blog, News & Webinar


---

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.
