transformer-models

Deep Learning Transformer models in MATLAB

Transformer Models for MATLAB

This repository implements deep learning transformer models in MATLAB.

Requirements

BERT and FinBERT

  • MATLAB R2021a or later
  • Deep Learning Toolbox
  • Text Analytics Toolbox

GPT-2

  • MATLAB R2020a or later
  • Deep Learning Toolbox

Getting Started

Download or clone this repository to your machine and open it in MATLAB.

Functions

bert

mdl = bert loads a pretrained BERT transformer model and, if necessary, downloads the model weights. The output mdl is a structure with fields Tokenizer and Parameters that contain the BERT tokenizer and the model parameters, respectively.

mdl = bert("Model",modelName) specifies which BERT model variant to use:

  • "base" (default) - A 12 layer model with hidden size 768.
  • "multilingual-cased" - A 12 layer model with hidden size 768. The tokenizer is case-sensitive. This model was trained on multi-lingual data.
  • "medium" - An 8 layer model with hidden size 512.
  • "small" - A 4 layer model with hidden size 512.
  • "mini" - A 4 layer model with hidden size 256.
  • "tiny" - A 2 layer model with hidden size 128.

bert.model

Z = bert.model(X,parameters) performs inference with a BERT model on the input 1-by-numInputTokens-by-numObservations array of encoded tokens with the specified parameters. The output Z is an array of size (NumHeads*HeadSize)-by-numInputTokens-by-numObservations. The element Z(:,i,j) corresponds to the BERT embedding of input token X(1,i,j).

Z = bert.model(X,parameters,Name,Value) specifies additional options using one or more name-value pairs:

  • "PaddingCode" - Positive integer corresponding to the padding token. The default is 1.
  • "InputMask" - Mask indicating which elements to include for computation, specified as a logical array the same size as X or as an empty array. The mask must be false at indices positions corresponds to padding, and true elsewhere. If the mask is [], then the function determines padding according to the PaddingCode name-value pair. The default is [].
  • "DropoutProb" - Probability of dropout for the output activation. The default is 0.
  • "AttentionDropoutProb" - Probability of dropout used in the attention layer. The default is 0.
  • "Outputs" - Indices of the layers to return outputs from, specified as a vector of positive integers, or "last". If "Outputs" is "last", then the function returns outputs from the final encoder layer only. The default is "last".
  • "SeparatorCode" - Separator token specified as a positive integer. The default is 103.

finbert

mdl = finbert loads a pretrained BERT transformer model for sentiment analysis of financial text. The output mdl is a structure with fields Tokenizer and Parameters that contain the BERT tokenizer and the model parameters, respectively.

mdl = finbert("Model",modelName) specifies which FinBERT model variant to use:

  • "sentiment-model" (default) - The fine-tuned sentiment classifier model.
  • "language-model" - The FinBERT pretrained language model, which uses a BERT-Base architecture.

finbert.sentimentModel

sentiment = finbert.sentimentModel(X,parameters) classifies the sentiment of the input 1-by-numInputTokens-by-numObservations array of encoded tokens with the specified parameters. The output sentiment is a categorical array with categories "positive", "neutral", or "negative".

[sentiment, scores] = finbert.sentimentModel(X,parameters) also returns the corresponding sentiment scores in the range [-1 1].
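
A minimal end-to-end sketch (the encode call on the tokenizer is again an assumption based on the repository's examples):

    % Classify the sentiment of a piece of financial text.
    mdl = finbert;                             % "sentiment-model" variant by default
    str = "The company reported strong quarterly earnings growth.";
    tokenCodes = encode(mdl.Tokenizer,str);    % assumed tokenizer method
    X = dlarray(tokenCodes{1});                % 1-by-numInputTokens
    [sentiment,scores] = finbert.sentimentModel(X,mdl.Parameters)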

gpt2

mdl = gpt2 loads a pretrained GPT-2 transformer model and, if necessary, downloads the model weights.

generateSummary

summary = generateSummary(mdl,text) generates a summary of the string or char array text using the transformer model mdl. The output summary is a char array.

summary = generateSummary(mdl,text,Name,Value) specifies additional options using one or more name-value pairs.

  • "MaxSummaryLength" - The maximum number of tokens in the generated summary. The default is 50.
  • "TopK" - The number of tokens to sample from when generating the summary. The default is 2.
  • "Temperature" - Temperature applied to the GPT-2 output probability distribution. The default is 1.
  • "StopCharacter" - Character to indicate that the summary is complete. The default is ".".

Example: Classify Text Data Using BERT

The simplest use of a pretrained BERT model is to use it as a feature extractor. In particular, you can use the BERT model to convert documents to feature vectors, which you can then use as inputs to train a deep learning classification network.

The example ClassifyTextDataUsingBERT.m shows how to use a pretrained BERT model to classify failure events given a data set of factory reports.
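
A rough sketch of the feature-extraction step (the encode call and the use of the first token's embedding as the document feature are assumptions based on the repository's example and common BERT practice):

    % Use BERT as a feature extractor: take the embedding of the first
    % token ([CLS]) as a fixed-length feature vector for the document.
    mdl = bert;
    str = "Scanner is jammed and will not feed paper.";
    tokenCodes = encode(mdl.Tokenizer,str);    % assumed tokenizer method
    Z = bert.model(dlarray(tokenCodes{1}),mdl.Parameters);
    featureVector = extractdata(Z(:,1,1));     % one column per token; first token is [CLS]

Repeating this for every document gives a fixed-size feature matrix that you can use to train an ordinary classification network.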

Example: Fine-Tune Pretrained BERT Model

To get the most out of a pretrained BERT model, you can retrain and fine-tune the BERT parameter weights for your task.

The example FineTuneBERT.m shows how to fine-tune a pretrained BERT model to classify failure events given a data set of factory reports.

Example: Analyze Sentiment with FinBERT

FinBERT is a BERT-based model trained on financial text data and fine-tuned for sentiment analysis.

The example SentimentAnalysisWithFinBERT.m shows how to classify the sentiment of financial news reports using a pretrained FinBERT model.

Example: Predict Masked Tokens Using BERT and FinBERT

BERT models are trained to perform various tasks. One of these tasks, known as masked language modeling, is predicting tokens in text that have been replaced by a mask value.

The example PredictMaskedTokensUsingBERT.m shows how to predict masked tokens and calculate the token probabilities using a pretrained BERT model.

The example PredictMaskedTokensUsingFinBERT.m shows how to predict masked tokens in financial text and calculate the token probabilities using a pretrained FinBERT model.

Example: Summarize Text Using GPT-2

Transformer networks such as GPT-2 can be used to summarize a piece of text. The trained GPT-2 transformer can generate text given an initial sequence of words as input. The model was trained on comments left on various web pages and internet forums.

Because many of these comments contain a summary indicated by the statement "TL;DR" (Too long, didn't read), you can use the transformer model to generate a summary by appending "TL;DR" to the input text. The generateSummary function takes the input text, automatically appends the string "TL;DR", and generates the summary.

The example SummarizeTextUsingTransformersExample.m shows how to summarize a piece of text using GPT-2.
