NLP-progress

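License: MIT License
Programming language: Python
Category: Neural networks / artificial intelligence, machine learning / deep learning
Software type: Open source
Region: Unknown
Submitted by: 叶鹭洋
Operating system: Cross-platform
Open-source organization:
Target audience: Unknown
Software overview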

Tracking Progress in Natural Language Processing

Table of contents

English

Vietnamese

Hindi

Chinese

For more tasks, datasets and results in Chinese, check out the Chinese NLP website.

French

Russian

Spanish

Portuguese

Korean

Nepali

Bengali

Persian

Turkish

German

This document aims to track the progress in Natural Language Processing (NLP) and give an overview of the state-of-the-art (SOTA) across the most common NLP tasks and their corresponding datasets.

It aims to cover both traditional and core NLP tasks such as dependency parsing and part-of-speech tagging as well as more recent ones such as reading comprehension and natural language inference. The main objective is to provide the reader with a quick overview of benchmark datasets and the state-of-the-art for their task of interest, which serves as a stepping stone for further research. To this end, if there is a place where results for a task are already published and regularly maintained, such as a public leaderboard, the reader will be pointed there.

If you want to find this document again in the future, just go to nlpprogress.com or nlpsota.com in your browser.

Contributing

Guidelines

Results   Results reported in published papers are preferred; an exception may be made for influential preprints.

Datasets   Datasets should have been used for evaluation in at least one published paper besides the one that introduced the dataset.

Code   We recommend adding a link to an implementation if available. You can add a Code column (see below) to the table if it does not exist. In the Code column, indicate an official implementation with Official. If an unofficial implementation is available, use Link (see below). If no implementation is available, you can leave the cell empty.
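The Code column rule above can be sketched as a small helper. This is only an illustration: the function name `code_cell`, the example URLs, and the choice to render the label as a Markdown link are assumptions, not something the guidelines prescribe.

```python
from typing import Optional


def code_cell(url: Optional[str], official: bool = False) -> str:
    """Build the text for the Code column of a results table (hypothetical helper)."""
    if not url:
        # No implementation available: the cell is left empty.
        return ""
    # "Official" marks an official implementation, "Link" an unofficial one.
    label = "Official" if official else "Link"
    # Assumption: the label is written as a Markdown link to the implementation.
    return f"[{label}]({url})"


print(code_cell("https://example.com/official-repo", official=True))
print(code_cell("https://example.com/reimplementation"))
print(repr(code_cell(None)))
```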

Adding a new result

If you would like to add a new result, you can just click on the small edit button in the top-right corner of the file for the respective task (see below).

This allows you to edit the file in Markdown. Simply add a row to the corresponding table in the same format. Make sure that the table stays sorted (with the best result on top). After you've made your change, make sure that the table still looks OK by clicking on the "Preview changes" tab at the top of the page. If everything looks good, go to the bottom of the page, where you see the below form.

Add a name for your proposed change, an optional description, indicate that you would like to "Create a new branch for this commit and start a pull request", and click on "Propose file change".
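If you prefer to prepare the new row offline before pasting it into the editor, the "add a row and keep the table sorted" step can be sketched as follows. This is a minimal illustration, not tooling shipped with the project: the function names, model names, and scores are made up, and it assumes the score cells hold plain numbers where higher is better.

```python
def parse_row(line: str) -> list[str]:
    """Split one Markdown table row into stripped cell values."""
    line = line.strip().removeprefix("|").removesuffix("|")
    return [cell.strip() for cell in line.split("|")]


def format_row(cells: list[str]) -> str:
    """Join cell values back into a Markdown table row."""
    return "| " + " | ".join(cells) + " |"


def add_result(table: str, new_row: list[str], score_col: int = 1,
               higher_is_better: bool = True) -> str:
    """Insert new_row and re-sort the body so the best result is on top."""
    lines = table.strip().splitlines()
    header, separator, body = lines[0], lines[1], lines[2:]
    rows = [parse_row(line) for line in body] + [new_row]
    # Assumption: every score cell is a plain number such as "91.2".
    rows.sort(key=lambda cells: float(cells[score_col]), reverse=higher_is_better)
    return "\n".join([header, separator] + [format_row(row) for row in rows])


table = """
| Model | Score | Paper / Source | Code |
| --- | --- | --- | --- |
| Model A | 91.2 | Paper A | |
| Model B | 89.7 | Paper B | |
"""
updated = add_result(table, ["Model C", "90.4", "Paper C", ""])
print(updated)
```

For metrics where lower is better (for example an error rate), pass `higher_is_better=False`.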

Adding a new dataset or task

To add a new dataset or task, you can also follow the steps above. Alternatively, you can fork the repository. In both cases, follow the steps below:

  1. If your task is completely new, create a new file and link to it in the table of contents above.
  2. If not, add your task or dataset to the respective section of the corresponding file (in alphabetical order).
  3. Briefly describe the dataset/task and include relevant references.
  4. Describe the evaluation setting and evaluation metric.
  5. Show what an annotated example of the dataset/task looks like.
  6. Add a download link if available.
  7. Copy the table below and fill in at least two results (including the state-of-the-art) for your dataset/task (change Score to the metric of your dataset). If your dataset/task has multiple metrics, add them to the right of Score.
  8. Submit your change as a pull request.

| Model | Score | Paper / Source | Code |
| --- | --- | --- | --- |
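As a rough sketch of step 7, the snippet below generates a results table whose Score column is replaced by one or more metric columns. The function `results_table`, the metric names, and the example rows are hypothetical placeholders; the only parts taken from the steps above are the column layout and the requirement of at least two results.

```python
def results_table(metrics: list[str], rows: list[dict[str, str]]) -> str:
    """Render a Markdown results table with one column per metric."""
    if len(rows) < 2:
        # Step 7 asks for at least two results, including the state-of-the-art.
        raise ValueError("fill in at least two results")
    # Metric columns take the place of Score, between Model and Paper / Source.
    columns = ["Model"] + metrics + ["Paper / Source", "Code"]
    lines = [
        "| " + " | ".join(columns) + " |",
        "| " + " | ".join("---" for _ in columns) + " |",
    ]
    for row in rows:
        # Missing values (for example no Code link) become empty cells.
        lines.append("| " + " | ".join(row.get(col, "") for col in columns) + " |")
    return "\n".join(lines)


example = results_table(
    ["F1", "Accuracy"],
    [
        {"Model": "Model A", "F1": "91.2", "Accuracy": "93.0", "Paper / Source": "Paper A"},
        {"Model": "Model B", "F1": "89.7", "Accuracy": "92.1", "Paper / Source": "Paper B"},
    ],
)
print(example)
```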

Wish list

These are tasks and datasets that are still missing:

  • Bilingual dictionary induction
  • Discourse parsing
  • Keyphrase extraction
  • Knowledge base population (KBP)
  • More dialogue tasks
  • Semi-supervised learning
  • Frame-semantic parsing (FrameNet full-sentence analysis)

Exporting into a structured format

You can extract all the data into a structured, machine-readable JSON format with parsed tasks, descriptions and SOTA tables.

The instructions are in structured/README.md.
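To give a feel for what "parsed tasks, descriptions and SOTA tables" could look like, here is a minimal sketch that turns one Markdown results table into JSON. It is not the project's export script, and the keys `task`, `description`, and `sota` as well as the example content are assumptions made for illustration; structured/README.md remains the reference for the actual format.

```python
import json


def split_row(line: str) -> list[str]:
    """Split one Markdown table row into stripped cell values."""
    line = line.strip().removeprefix("|").removesuffix("|")
    return [cell.strip() for cell in line.split("|")]


def table_to_records(table: str) -> list[dict[str, str]]:
    """Convert a Markdown table into a list of {column name: value} dicts."""
    lines = [line for line in table.strip().splitlines() if line.strip()]
    header = split_row(lines[0])
    # lines[1] is the separator row (| --- | --- |), so data starts at index 2.
    return [dict(zip(header, split_row(line))) for line in lines[2:]]


table = """
| Model | Score | Paper / Source | Code |
| --- | --- | --- | --- |
| Model A | 91.2 | Paper A | |
| Model B | 89.7 | Paper B | |
"""

# Hypothetical output layout; the real schema may differ.
export = {
    "task": "Example task",
    "description": "Placeholder description of the task.",
    "sota": table_to_records(table),
}
print(json.dumps(export, indent=2))
```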

Instructions for building the site locally

Instructions for building the website locally using Jekyll can be found here.

  • GitHub: https://github.com/sebastianruder/NLP-progress

  • Official website: https://nlpprogress.com/
