| Date | Paper | Citations | Code | Pretrained Models |
| --- | --- | --- | --- | --- |
| - | Language Models are Unsupervised Multitask Learners | N/A | TF Pytorch, TF2.0 Keras | GPT-2(117M, 124M, 345M, 355M, 774M, 1558M) |
| 2017/08 | Learned in Translation: Contextualized Word Vectors | 524 | Pytorch Keras | CoVe |
| 2018/01 | Universal Language Model Fine-tuning for Text Classification | 167 | Pytorch | ULMFiT(English, Zoo) |
| 2018/02 | Deep contextualized word representations | 999+ | Pytorch TF | ELMo(AllenNLP, TF-Hub) |
| 2018/04 | Efficient Contextualized Representation: Language Model Pruning for Sequence Labeling | 26 | Pytorch | LD-Net |
| 2018/07 | Towards Better UD Parsing: Deep Contextualized Word Embeddings, Ensemble, and Treebank Concatenation | 120 | Pytorch | ELMo |
| 2018/08 | Direct Output Connection for a High-Rank Language Model | 24 | Pytorch | DOC |
| 2018/10 | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | 999+ | TF Keras Pytorch, TF2.0 MXNet PaddlePaddle TF Keras | BERT(BERT, ERNIE, KoBERT) |
| 2018/?? | Contextual String Embeddings for Sequence Labeling | 486 | Pytorch | Flair |
| 2018/?? | Improving Language Understanding by Generative Pre-Training | 999+ | TF Keras Pytorch, TF2.0 | GPT |
| 2019/01 | Multi-Task Deep Neural Networks for Natural Language Understanding | 364 | Pytorch | MT-DNN |
| 2019/01 | BioBERT: pre-trained biomedical language representation model for biomedical text mining | 634 | TF | BioBERT |
| 2019/01 | Cross-lingual Language Model Pretraining | 639 | Pytorch Pytorch, TF2.0 | XLM |
| 2019/01 | Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | 754 | TF Pytorch Pytorch, TF2.0 | Transformer-XL |
| 2019/02 | Efficient Contextual Representation Learning Without Softmax Layer | 2 | Pytorch | - |
| 2019/03 | SciBERT: Pretrained Contextualized Embeddings for Scientific Text | 124 | Pytorch, TF | SciBERT |
| 2019/04 | Publicly Available Clinical BERT Embeddings | 229 | Text | clinicalBERT |
| 2019/04 | ClinicalBERT: Modeling Clinical Notes and Predicting Hospital Readmission | 84 | Pytorch | ClinicalBERT |
| 2019/05 | ERNIE: Enhanced Language Representation with Informative Entities | 210 | Pytorch | ERNIE |
| 2019/05 | Unified Language Model Pre-training for Natural Language Understanding and Generation | 278 | Pytorch | UniLMv1(unilm1-large-cased, unilm1-base-cased) |
| 2019/05 | HIBERT: Document Level Pre-training of Hierarchical Bidirectional Transformers for Document Summarization | 81 | - | - |
| 2019/06 | Pre-Training with Whole Word Masking for Chinese BERT | 98 | Pytorch, TF | BERT-wwm |
| 2019/06 | XLNet: Generalized Autoregressive Pretraining for Language Understanding | 999+ | TF Pytorch, TF2.0 | XLNet |
| 2019/07 | ERNIE 2.0: A Continual Pre-training Framework for Language Understanding | 107 | PaddlePaddle | ERNIE 2.0 |
| 2019/07 | SpanBERT: Improving Pre-training by Representing and Predicting Spans | 282 | Pytorch | SpanBERT |
| 2019/07 | RoBERTa: A Robustly Optimized BERT Pretraining Approach | 999+ | Pytorch Pytorch, TF2.0 | RoBERTa |
| 2019/09 | Subword ELMo | 1 | Pytorch | - |
| 2019/09 | Knowledge Enhanced Contextual Word Representations | 115 | - | - |
| 2019/09 | TinyBERT: Distilling BERT for Natural Language Understanding | 129 | - | - |
| 2019/09 | Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism | 136 | Pytorch | Megatron-LM(BERT-345M, GPT-2-345M) |
| 2019/09 | MultiFiT: Efficient Multi-lingual Language Model Fine-tuning | 29 | Pytorch | - |
| 2019/09 | Extreme Language Model Compression with Optimal Subwords and Shared Projections | 32 | - | - |
| 2019/09 | MULE: Multimodal Universal Language Embedding | 5 | - | - |
| 2019/09 | Unicoder: A Universal Language Encoder by Pre-training with Multiple Cross-lingual Tasks | 51 | - | - |
| 2019/09 | K-BERT: Enabling Language Representation with Knowledge Graph | 59 | - | - |
| 2019/09 | UNITER: Learning UNiversal Image-TExt Representations | 60 | - | - |
| 2019/10 | ALBERT: A Lite BERT for Self-supervised Learning of Language Representations | 803 | TF | - |
| 2019/10 | BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension | 349 | Pytorch | BART(bart.base, bart.large, bart.large.mnli, bart.large.cnn, bart.large.xsum) |
| 2019/10 | DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | 481 | Pytorch, TF2.0 | DistilBERT |
| 2019/11 | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | 696 | TF | T5 |
| 2019/11 | CamemBERT: a Tasty French Language Model | 102 | - | CamemBERT |
| 2019/11 | ZEN: Pre-training Chinese Text Encoder Enhanced by N-gram Representations | 15 | Pytorch | - |
| 2019/11 | Unsupervised Cross-lingual Representation Learning at Scale | 319 | Pytorch | XLM-R (XLM-RoBERTa)(xlmr.large, xlmr.base) |
| 2020/01 | ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training | 35 | Pytorch | ProphetNet(ProphetNet-large-16GB, ProphetNet-large-160GB) |
| 2020/02 | CodeBERT: A Pre-Trained Model for Programming and Natural Languages | 25 | Pytorch | CodeBERT |
| 2020/02 | UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training | 33 | Pytorch | - |
| 2020/03 | ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators | 203 | TF | ELECTRA(ELECTRA-Small, ELECTRA-Base, ELECTRA-Large) |
| 2020/04 | MPNet: Masked and Permuted Pre-training for Language Understanding | 5 | Pytorch | MPNet |
| 2020/05 | ParsBERT: Transformer-based Model for Persian Language Understanding | 1 | Pytorch | ParsBERT |
| 2020/05 | Language Models are Few-Shot Learners | 382 | - | - |
| 2020/07 | InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training | 12 | Pytorch | - |
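
Many of the checkpoints listed above (BERT, GPT-2, RoBERTa, XLNet, DistilBERT, ALBERT, ELECTRA, BART, T5, CamemBERT, XLM-R, among others) are also distributed through the Hugging Face `transformers` library. The snippet below is a minimal sketch of extracting contextual embeddings from one of them; it assumes `torch` and `transformers` are installed, and `bert-base-uncased` is only an illustrative choice of checkpoint, not a recommendation from this list.

```python
# Minimal sketch: load a pretrained encoder from the table above and pull
# contextual token embeddings. Assumes `pip install torch transformers`;
# "bert-base-uncased" is just one example checkpoint.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "bert-base-uncased"  # swap in e.g. "roberta-base" or "xlnet-base-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()

sentence = "Pretrained language models produce contextual word representations."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Shape: (batch_size, sequence_length, hidden_size); hidden_size is 768 for bert-base.
token_embeddings = outputs.last_hidden_state
print(token_embeddings.shape)
```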