awesome-machine-learning-on-source-code

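License: CC-BY-SA-4.0
Programming language: Python
Category: Neural Networks/Artificial Intelligence, Machine Learning/Deep Learning
Software type: Open source
Region: Unknown
Submitted by: 濮阳默
Operating system: Cross-platform
Open source organization:
Intended audience: Unknown
 Software overview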

Awesome Machine Learning On Source Code

Notice: This repository is no longer actively maintained. No further updates will be made, and issues and PRs will not be answered. An actively maintained alternative can be found at the ml4code.github.io repository.

A curated list of awesome research papers, datasets and software projects devoted to machine learning and source code. #MLonCode

Contents

  • Posts
  • Talks
  • Software
  • Datasets
  • Credits
  • Contributions
  • License
  • Digests

    Conferences

    Competitions

    • CodRep - competition on automatic program repair: given a source line, find the insertion point.

    Papers

    Program Synthesis and Induction

    Source Code Analysis and Language modeling

    Neural Network Architectures and Algorithms

    Embeddings in Software Engineering

    Program Translation

    Code Suggestion and Completion

    Program Repair and Bug Detection

    APIs and Code Mining

    Code Optimization

    Topic Modeling

    Sentiment Analysis

    Code Summarization

    Clone Detection

    Differentiable Interpreters

    Related research

    AST Differencing

    Binary Data Modeling

    Soft Clustering Using T-mixture Models

    Natural Language Parsing and Comprehension

    Posts

    Talks

    Software

    Machine Learning

    • Differentiable Neural Computer (DNC) - TensorFlow implementation of the Differentiable Neural Computer.
    • sourced.ml - Abstracts feature extraction from source code syntax trees and working with ML models.
    • vecino - Finds similar Git repositories.
    • apollo - Source code deduplication at scale, research.
    • gemini - Source code deduplication at scale, production.
    • enry - Insanely fast file-based programming language detector.
    • hercules - Git repository mining framework with batteries on top of go-git.
    • DeepCS - Keras and PyTorch implementations of DeepCS (Deep Code Search).
    • Code Neuron - Recurrent neural network to detect code blocks in natural language text.
    • Naturalize - Language-agnostic framework for learning coding conventions from a codebase and then exploiting this information to suggest better identifier names and formatting changes in the code.
    • Extreme Source Code Summarization - Convolutional attention neural network that learns to summarize source code into a short method name-like summary by just looking at the source code tokens.
    • Summarizing Source Code using a Neural Attention Model - CODE-NN, uses LSTM networks with attention to produce sentences that describe C# code snippets and SQL queries from StackOverflow. Implemented in Torch, for C# and SQL.
    • Probabilistic API Miner - Near parameter-free probabilistic algorithm for mining the most interesting API patterns from a list of API call sequences.
    • Interesting Sequence Miner - Novel algorithm that mines the most interesting sequences under a probabilistic model. It is able to efficiently infer interesting sequences directly from the database.
    • TASSAL - Tool for the automatic summarization of source code using autofolding. Autofolding automatically creates a summary of a source code file by folding non-essential code and comment blocks.
    • JNice2Predict - Efficient and scalable open-source framework for structured prediction, enabling one to build new statistical engines more quickly.
    • Clone Digger - Clone detection for Python and Java.
    • Sensibility - Uses LSTMs to detect and correct syntax errors in Java source code.
    • DeepBugs - Framework for learning bug detectors from an existing code corpus.
    • DeepSim - Deep learning-based approach to measure code functional similarity.
    • rnn-autocomplete - Neural code autocompletion with RNN (bachelor's thesis).
    • MindsDB - Explainable AutoML framework for developers, for building, training and using state-of-the-art ML models in as little as one line of code.

    Utilities

    • go-git - Highly extensible Git implementation in pure Go which is friendly to data mining.
    • bblfsh - Self-hosted server for source code parsing.
    • engine - Scalable and distributed data retrieval pipeline for source code.
    • minhashcuda - Weighted MinHash implementation on CUDA to efficiently find duplicates.
    • kmcuda - k-means on CUDA to cluster and to search for nearest neighbors in dense space.
    • wmd-relax - Python package which finds nearest neighbors under Word Mover's Distance.
    • Tregex, Tsurgeon and Semgrex - Tregex is a utility for matching patterns in trees, based on tree relationships and regular expression matches on nodes (the name is short for "tree regular expressions").
    • source{d} models - Machine Learning models for MLonCode trained using the source{d} stack.
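
Several entries above deal with finding duplicate or similar code (apollo, gemini, minhashcuda). As a rough illustration of the underlying idea only, the sketch below implements plain, unweighted MinHash on the CPU with the Python standard library. It is not the Weighted MinHash algorithm or the API of minhashcuda, and the names `shingles`, `minhash_signature` and `estimated_jaccard` are hypothetical, chosen for this example.

```python
import hashlib


def shingles(tokens, k=3):
    """Return the set of k-token shingles (sliding windows) of a token list."""
    return {tuple(tokens[i:i + k]) for i in range(max(len(tokens) - k + 1, 1))}


def minhash_signature(items, num_hashes=64):
    """Plain (unweighted) MinHash: keep the minimum hash per seeded function."""
    signature = []
    for seed in range(num_hashes):
        # Assumption: a per-seed salt stands in for a family of hash functions.
        salt = seed.to_bytes(8, "little")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(repr(item).encode(), digest_size=8, salt=salt).digest(),
                "little",
            )
            for item in items
        ))
    return signature


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching positions, an estimate of Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


if __name__ == "__main__":
    # Toy whitespace-tokenized snippets, purely illustrative.
    first = "def add ( a , b ) : return a + b".split()
    second = "def add ( x , y ) : return x + y".split()
    sig_first = minhash_signature(shingles(first))
    sig_second = minhash_signature(shingles(second))
    print(estimated_jaccard(sig_first, sig_second))
```

The estimate gets tighter as `num_hashes` grows. Tools aimed at large corpora typically add weighting and an index over the signatures, which this sketch leaves out.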

    Datasets

    Credits

    Contributions

    See CONTRIBUTING.md. TL;DR: create a pull request which is signed off.

    License
