For more tasks, datasets and results in Chinese, check out the Chinese NLP website.
This document aims to track the progress in Natural Language Processing (NLP) and give an overview of the state-of-the-art (SOTA) across the most common NLP tasks and their corresponding datasets.

It aims to cover both traditional and core NLP tasks such as dependency parsing and part-of-speech tagging as well as more recent ones such as reading comprehension and natural language inference. The main objective is to provide the reader with a quick overview of benchmark datasets and the state-of-the-art for their task of interest, which serves as a stepping stone for further research. To this end, if there is a place where results for a task are already published and regularly maintained, such as a public leaderboard, the reader will be pointed there.
If you want to find this document again in the future, just go to nlpprogress.com or nlpsota.com in your browser.
### Results

Results reported in published papers are preferred; an exception may be made for influential preprints.
### Datasets

Datasets should have been used for evaluation in at least one published paper besides the one that introduced the dataset.
### Code

We recommend adding a link to an implementation if available. You can add a `Code` column (see below) to the table if it does not exist. In the `Code` column, indicate an official implementation with `Official`. If an unofficial implementation is available, use `Link` (see below). If no implementation is available, you can leave the cell empty.
If you would like to add a new result, you can just click on the small edit button in the top-right corner of the file for the respective task (see below).

This allows you to edit the file in Markdown. Simply add a row to the corresponding table in the same format. Make sure that the table stays sorted (with the best result on top). After you've made your change, check that the table still renders correctly by clicking on the "Preview changes" tab at the top of the page. If everything looks good, go to the bottom of the page, where you see the below form.

Add a name for your proposed change, an optional description, indicate that you would like to "Create a new branch for this commit and start a pull request", and click on "Propose file change".
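The sorting requirement above ("best result on top") can be checked mechanically before opening a pull request. Below is a minimal sketch; the helper names and the sample table are illustrative, not part of the repository, and it assumes the score sits in the second column and that higher is better:

```python
# Illustrative helper: check that a Markdown results table keeps
# the best (highest) score in its first data row.

def scores_from_markdown_table(table: str) -> list[float]:
    """Extract the Score column (second cell) from each data row."""
    scores = []
    for line in table.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Skip the header row and the |---|---| separator row.
        if cells[1] == "Score" or set(cells[1]) <= {"-", ":"}:
            continue
        scores.append(float(cells[1]))
    return scores

def is_sorted_best_first(table: str) -> bool:
    scores = scores_from_markdown_table(table)
    return scores == sorted(scores, reverse=True)

table = """
| Model | Score | Paper / Source | Code |
| --- | --- | --- | --- |
| Model A | 92.8 | Some et al. (2018) | Official |
| Model B | 89.3 | Other et al. (2018) | Link |
"""

print(is_sorted_best_first(table))  # True for this example
```

For metrics where lower is better (e.g. perplexity), the `reverse=True` would need to be dropped.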
For adding a new dataset or task, you can also follow the steps above. Alternatively, you can fork the repository. In both cases, follow the steps below:
In the results table, replace `Score` with the metric of your dataset:

| Model | Score | Paper / Source | Code |
| --- | --- | --- | --- |
These are tasks and datasets that are still missing:
You can extract all the data into a structured, machine-readable JSON format with parsed tasks, descriptions and SOTA tables.
The instructions are in `structured/README.md`.
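Once extracted, the JSON can be queried programmatically, e.g. to find the best-scoring model for a dataset. The snippet below is a sketch only: the exact schema is defined in `structured/README.md`, and the field names used here (`datasets`, `sota`, `rows`, `metric_values`) are assumptions for illustration:

```python
import json

# Hypothetical record in the shape of an extracted task entry.
sample = json.loads("""
{
  "task": "Example task",
  "datasets": [
    {
      "name": "Example dataset",
      "sota": {
        "metrics": ["F1"],
        "rows": [
          {"model": "Model A", "metric_values": {"F1": 93.5}},
          {"model": "Model B", "metric_values": {"F1": 91.2}}
        ]
      }
    }
  ]
}
""")

# Pick the row with the highest F1 score.
best = max(
    sample["datasets"][0]["sota"]["rows"],
    key=lambda r: r["metric_values"]["F1"],
)
print(best["model"])  # Model A
```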
Instructions for building the website locally using Jekyll can be found here.