bert-chainer

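License: Readme
Development language: Python
Category: Neural networks / artificial intelligence, natural language processing
Software type: Open source
Region: Unknown
Submitter: 鲜于德泽
Operating system: Cross-platform
Open source organization:
Intended audience: Unknown
Software overview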

Chainer implementation of Google AI's BERT model with a script to load Google's pre-trained models

This repository contains a Chainer reimplementation of Google's TensorFlow repository for the BERT model for the paper BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.

This implementation can load any pre-trained TensorFlow checkpoint for BERT (in particular Google's pre-trained models) and a conversion script is provided (see below).

In the current implementation, we can

  • build BertModel and load pre-trained checkpoints from TensorFlow
  • use BERT for sentence-level classification tasks (on GLUE) (run_classifier.py)
  • use BERT for token-level classification tasks (on SQuAD) (run_squad.py)
  • extract token-level multi-layer features from sentences (extract_features.py)

Not implemented:

This README follows the great README of PyTorch's BERT repository by the huggingface team.

Loading a TensorFlow checkpoint (e.g. Google's pre-trained models)

You can convert any TensorFlow checkpoint for BERT (in particular the pre-trained models released by Google) into a Chainer save file using the convert_tf_checkpoint_to_chainer.py script.

This script takes a TensorFlow checkpoint as input (three files starting with bert_model.ckpt) and creates a Chainer model (an npz file) for this configuration, which can then be loaded with chainer.serializers.load_npz() (see the examples in run_classifier.py or run_squad.py).
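Assuming the dump is an ordinary NumPy .npz archive (the format that chainer.serializers.load_npz() reads), its contents can be listed without Chainer. The following is only a minimal sketch: the helper name list_npz_arrays and the toy parameter names are hypothetical, and a small stand-in archive is written first so that the example runs without a real checkpoint.

```python
import os
import tempfile

import numpy as np


def list_npz_arrays(npz_path):
    """Return {array_name: shape} for every array stored in an .npz archive."""
    with np.load(npz_path) as archive:
        return {name: archive[name].shape for name in archive.files}


# Stand-in archive with made-up parameter names (a real dump will differ).
tmp_dir = tempfile.mkdtemp()
toy_path = os.path.join(tmp_dir, "toy_model.npz")
np.savez(toy_path, **{"encoder/W": np.zeros((4, 3), dtype=np.float32),
                      "encoder/b": np.zeros((4,), dtype=np.float32)})

shapes = list_npz_arrays(toy_path)
for name in sorted(shapes):
    print(name, shapes[name])
```

Pointing list_npz_arrays at a converted file such as $BERT_BASE_DIR/arrays_bert_model.ckpt.npz should give a quick view of which parameters the conversion produced.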

You only need to run this conversion script once to get a Chainer model. You can then disregard the TensorFlow checkpoint (the three files starting with bert_model.ckpt) but be sure to keep the configuration file (bert_config.json) and the vocabulary file (vocab.txt) as these are needed for the Chainer model too.
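As a small sanity check before fine-tuning, one could verify that the three files named above are present in the model directory. This is a sketch only: the helper missing_model_files is hypothetical and not part of the repository, and the npz file name is the one used in the example commands of this README.

```python
import tempfile
from pathlib import Path

# File names taken from this README; adjust if your dump path differs.
REQUIRED_FILES = ("bert_config.json", "vocab.txt", "arrays_bert_model.ckpt.npz")


def missing_model_files(model_dir, required=REQUIRED_FILES):
    """Return the required file names that are absent from model_dir."""
    model_dir = Path(model_dir)
    return [name for name in required if not (model_dir / name).is_file()]


# Demo on a temporary directory that only contains a vocabulary file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "vocab.txt").write_text("[PAD]\n[CLS]\n[SEP]\n")
    demo_missing = missing_model_files(tmp)
    print(demo_missing)
```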

To run this specific conversion script you will need to have TensorFlow and Chainer installed (pip install tensorflow). The rest of the repository only requires Chainer.

Here is an example of the conversion process for a pre-trained BERT-Base Uncased model:

export BERT_BASE_DIR=/path/to/bert/uncased_L-12_H-768_A-12

python convert_tf_checkpoint_to_chainer.py \
  --tf_checkpoint_path $BERT_BASE_DIR/bert_model.ckpt \
  --npz_dump_path $BERT_BASE_DIR/arrays_bert_model.ckpt.npz

Google's pre-trained models for the conversion can be downloaded from the official BERT repository (https://github.com/google-research/bert).

Chainer models for BERT

We included three Chainer models in this repository, which you will find in modeling.py:

  • BertModel - the basic BERT Transformer model
  • BertClassifier - the BERT model with a sequence classification head on top
  • BertSQuAD - the BERT model with a token classification head on top

Here are some details on each class.

1. BertModel

BertModel is the basic BERT Transformer model with a layer of summed token, position and sequence embeddings followed by a series of identical self-attention blocks (12 for BERT-base, 24 for BERT-large).

The inputs and outputs are identical to those of the TensorFlow model.

We detail them here. This model takes as inputs:

  • input_ids: an int array of shape [batch_size, sequence_length] with the word token indices in the vocabulary (see the token preprocessing logic in the run_classifier.py script)
  • token_type_ids: an optional int array of shape [batch_size, sequence_length] with the token type indices selected in [0, 1]. Type 0 corresponds to a sentence A token and type 1 corresponds to a sentence B token (see the BERT paper for more details).
  • attention_mask: an optional array of shape [batch_size, sequence_length] with values selected in [0, 1]. It is the mask to use when an input sequence is shorter than the longest input sequence in the current batch, i.e. the mask typically used for attention when a batch contains sentences of varying length.
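To make the three inputs concrete, here is a minimal sketch of how one example could be turned into input_ids, token_type_ids and attention_mask. It follows the usual BERT convention ([CLS] first, [SEP] after each sentence, zero padding); the function build_inputs and the toy vocabulary are hypothetical, and the repository's actual preprocessing lives in run_classifier.py.

```python
def build_inputs(tokens_a, tokens_b, vocab, max_seq_length):
    """Build input_ids, token_type_ids and attention_mask for one example."""
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"]
    token_type_ids = [0] * len(tokens)  # sentence A, including [CLS] and its [SEP]
    if tokens_b:
        tokens += tokens_b + ["[SEP]"]
        token_type_ids += [1] * (len(tokens_b) + 1)  # sentence B and its [SEP]
    if len(tokens) > max_seq_length:
        raise ValueError("sequence longer than max_seq_length")

    input_ids = [vocab[t] for t in tokens]
    attention_mask = [1] * len(input_ids)  # 1 marks a real token

    # Zero-pad all three lists up to max_seq_length; 0 in the mask marks padding.
    padding = [0] * (max_seq_length - len(input_ids))
    return input_ids + padding, token_type_ids + padding, attention_mask + padding


# Toy vocabulary (hypothetical ids, not taken from a real vocab.txt).
toy_vocab = {"[PAD]": 0, "[CLS]": 1, "[SEP]": 2, "hello": 3, "world": 4, "hi": 5}
ids, types, mask = build_inputs(["hello", "world"], ["hi"], toy_vocab, max_seq_length=8)
print(ids)    # [1, 3, 4, 2, 5, 2, 0, 0]
print(types)  # [0, 0, 0, 0, 1, 1, 0, 0]
print(mask)   # [1, 1, 1, 1, 1, 1, 0, 0]
```

Stacking several such examples gives arrays of shape [batch_size, sequence_length], which would then be converted to integer arrays before being passed to the model.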

This model can return several kinds of outputs, each obtained by calling the corresponding method (e.g. .get_pooled_output()):

  • all_encoder_layers: a list of Variables of size [batch_size, sequence_length, hidden_size], containing the full sequence of hidden states at the end of each attention block (i.e. 12 full sequences for BERT-base, 24 for BERT-large)
  • pooled_output: a Variable of size [batch_size, hidden_size] which is the output of a classifier pretrained on top of the hidden state associated with the first token of the input ([CLS]) to train on the Next-Sentence task (see the BERT paper)
  • get_sequence_output: a Variable of size [batch_size, sequence_length, hidden_size] which is the output of the final BERT block
  • get_embedding_output: a Variable of size [batch_size, sequence_length, hidden_size] which is the summed embedding of tokens, segments and positions

An example on how to use this class is given in the extract_features.py script which can be used to extract the hidden states of the model for a given input.

2. BertClassifier

BertClassifier is a fine-tuning model that includes BertModel and a sequence-level (sequence or pair of sequences) classifier on top of the BertModel.

The sequence-level classifier is a linear layer that takes as input the last hidden state of the first token in the input sequence (see Figures 3a and 3b in the BERT paper).
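The idea of a linear layer on the first token's hidden state can be sketched in plain Python as follows. This is an illustration under assumptions, not the code in modeling.py: the function name classify_first_token, the toy weights and the softmax used to turn logits into probabilities are all choices made for this example.

```python
import math


def classify_first_token(sequence_output, weight, bias):
    """Apply a linear layer and a softmax to the first token's hidden state.

    sequence_output: hidden vectors, one per token ([sequence_length, hidden_size])
    weight:          [num_labels, hidden_size]
    bias:            [num_labels]
    """
    first_hidden = sequence_output[0]
    logits = [sum(w * h for w, h in zip(row, first_hidden)) + b
              for row, b in zip(weight, bias)]
    # Numerically stable softmax over the label logits.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy values: 2 tokens, hidden_size 2, 2 labels.
toy_sequence_output = [[1.0, 0.0], [0.5, 0.5]]
toy_weight = [[1.0, 0.0], [0.0, 1.0]]
toy_bias = [0.0, 0.0]
probs = classify_first_token(toy_sequence_output, toy_weight, toy_bias)
print(probs)
```

Only the first row of sequence_output influences the result, which is why the same head works for a single sequence and for a pair of sequences.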

An example of how to use this class is given in the run_classifier.py script, which can be used to fine-tune a single-sequence (or sequence-pair) classifier using BERT, for example for the MRPC task.

3. BertSQuAD

BertSQuAD is a fine-tuning model that includes BertModel with a token-level classifier on top of the full sequence of last hidden states.

The token-level classifier takes as input the full sequence of last hidden states and computes several (e.g. two) scores for each token. These can be, for example, the score that a given token is a start_span token and the score that it is an end_span token (see Figures 3c and 3d in the BERT paper).
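Given per-token start and end scores, an answer span still has to be chosen. One common heuristic, shown here as a sketch and not necessarily what run_squad.py does, is to pick the pair (start, end) with the highest summed score such that start <= end and the span is not too long. The function best_span and the max_answer_length parameter are assumptions made for this example.

```python
def best_span(start_scores, end_scores, max_answer_length=30):
    """Return (start, end, score) maximizing start_scores[start] + end_scores[end]
    subject to start <= end and a span of at most max_answer_length tokens."""
    best = None
    for start, s_score in enumerate(start_scores):
        last = min(len(end_scores), start + max_answer_length)
        for end in range(start, last):
            score = s_score + end_scores[end]
            if best is None or score > best[2]:
                best = (start, end, score)
    return best


# Toy scores for a 4-token sequence.
start_scores = [0.1, 2.0, 0.3, 0.2]
end_scores = [1.5, 0.2, 1.8, 0.4]
span = best_span(start_scores, end_scores)
print(span[:2])  # (1, 2)
```

In the toy scores, token 0 has a high end score, but it cannot be paired with the best start (token 1) because the end would precede the start.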

An example on how to use this class is given in the run_squad.py script which can be used to fine-tune a token classifier using BERT, for example for the SQuAD task.

Installation, requirements

This code was tested on Python 3.5+. The requirements are:

  • Chainer
  • progressbar2

Fine-tuning with BERT: running the examples

We showcase the same examples as the original implementation: fine-tuning a sequence-level classifier on the MRPC classification corpus and a token-level classifier on the question answering dataset SQuAD.

Prepare the pretrained BERT model

First, download the BERT-Base checkpoint, unzip it to some directory $BERT_BASE_DIR, and convert it to its Chainer version as explained in the previous section.

wget https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip
unzip uncased_L-12_H-768_A-12.zip
export BERT_BASE_DIR=./uncased_L-12_H-768_A-12
python convert_tf_checkpoint_to_chainer.py \
  --tf_checkpoint_path $BERT_BASE_DIR/bert_model.ckpt \
  --npz_dump_path $BERT_BASE_DIR/arrays_bert_model.ckpt.npz

Sentence (or Pair) Classification with GLUE Dataset

Prepare GLUE dataset

Before running these examples you should download the GLUE data by running this script and unpack it to some directory $GLUE_DIR.

wget https://gist.github.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e/raw/17b8dd0d724281ed7c3b2aeeda662b92809aadd5/download_glue_data.py
python download_glue_data.py
export GLUE_DIR=./glue_data

Train and evaluate

This example code fine-tunes BERT-Base on the Microsoft Research Paraphrase Corpus (MRPC) and runs in a few minutes on a single Tesla P100.

python run_classifier.py \
  --task_name MRPC \
  --do_train True \
  --do_eval True \
  --do_lower_case True \
  --data_dir $GLUE_DIR/MRPC/ \
  --vocab_file $BERT_BASE_DIR/vocab.txt \
  --bert_config_file $BERT_BASE_DIR/bert_config.json \
  --init_checkpoint $BERT_BASE_DIR/arrays_bert_model.ckpt.npz \
  --max_seq_length 128 \
  --train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3.0 \
  --output_dir=./mrpc_output

Our test runs on a few seeds with the original implementation's hyper-parameters gave evaluation results around 86.

Token-level Classification with SQuAD QA Dataset

Prepare SQuAD dataset

The SQuAD data files (train-v1.1.json and dev-v1.1.json) should be downloaded and saved in a $SQUAD_DIR directory. The fine-tuning run below takes a few hours on a single Tesla P100.

export SQUAD_DIR=/path/to/SQUAD

python run_squad.py \
  --vocab_file=$BERT_BASE_DIR/vocab.txt \
  --bert_config_file=$BERT_BASE_DIR/bert_config.json \
  --init_checkpoint=$BERT_BASE_DIR/arrays_bert_model.ckpt.npz \
  --do_train=True \
  --do_predict=True \
  --train_file=$SQUAD_DIR/train-v1.1.json \
  --predict_file=$SQUAD_DIR/dev-v1.1.json \
  --train_batch_size=12 \
  --learning_rate=3e-5 \
  --num_train_epochs=2.0 \
  --max_seq_length=384 \
  --doc_stride=128 \
  --output_dir=./squad_output

Training with the previous hyper-parameters gave us the following results:

{"exact_match": 79.81078524124882, "f1": 87.74743306449187}

The result was slightly worse than the one reported by the original repository (e.g. f1=88.4).
