I am training a model with a PyTorch GRU for a text classification task (the output dimension is 5). My network is implemented as shown in the code below.
class GRU(nn.Module):
    def __init__(self, model_param: ModelParam):
        super(GRU, self).__init__()
        self.embedding = nn.Embedding(model_param.vocab_size, model_param.embed_dim)
        # Build with pre-trained embedding vectors, if given.
        if model_param.vocab_embedding is not None:
            self.embedding.weight.data.copy_(model_param.vocab_embedding)
            self.embedding.weight.requires_grad = False

        self.rnn = nn.GRU(model_param.embed_dim,
                          model_param.hidden_dim,
                          num_layers=2,
                          bias=True,
                          batch_first=True,
                          dropout=0.5,
                          bidirectional=False)
        self.dropout = nn.Dropout(0.5)
        self.fc = nn.Sequential(
            nn.Linear(in_features=model_param.hidden_dim, out_features=128),
            nn.Linear(in_features=128, out_features=model_param.output_dim)
        )

    def forward(self, x, labels=None):
        '''
        :param x: torch.tensor, of shape [batch_size, max_seq_len].
        :param labels: torch.tensor, of shape [batch_size]. Not used in this model.
        :return outputs: torch.tensor, of shape [batch_size, output_dim].
        '''
        # [batch_size, max_seq_len, embed_dim].
        features = self.dropout(self.embedding(x))
        # [batch_size, max_seq_len, hidden_dim].
        outputs, _ = self.rnn(features)
        # [batch_size, hidden_dim].
        outputs = outputs[:, -1, :]
        return self.fc(self.dropout(outputs))
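For reference, here is a minimal sketch of how the network can be instantiated and sanity-checked with a dummy batch (the ModelParam fields below are assumptions, only there to make the snippet self-contained):

import torch
import torch.nn as nn
from types import SimpleNamespace

# Hypothetical stand-in for ModelParam, just for this shape check.
model_param = SimpleNamespace(vocab_size=10000, embed_dim=100, hidden_dim=256,
                              output_dim=5, vocab_embedding=None)
model = GRU(model_param)

# Dummy batch: 32 sequences of 50 token ids each.
dummy_x = torch.randint(0, model_param.vocab_size, (32, 50))
print(model(dummy_x).shape)  # expected: torch.Size([32, 5])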
I am using nn.CrossEntropyLoss() as the loss function and optim.SGD as the optimizer. They are defined as follows:
# Loss function and optimizer.
loss_func = nn.CrossEntropyLoss()
optimizer = SGD(model.parameters(), lr=learning_rate, weight_decay=0.9)
My training procedure roughly looks like this:
for batch in train_iter:
    optimizer.zero_grad()
    # The prediction of the model, and its corresponding loss.
    prediction = model(batch.text.type(torch.LongTensor).to(device), batch.label.to(device))
    loss = loss_func(prediction, batch.label.to(device))
    loss.backward()
    optimizer.step()
    # Record total loss.
    epoch_losses.append(loss.item() / batch_size)
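The validation accuracy and loss in the logs below are computed with an evaluation loop roughly like the following (a sketch; valid_iter and the exact bookkeeping are assumptions):

model.eval()
correct, total, valid_losses = 0, 0, []
with torch.no_grad():
    for batch in valid_iter:
        text = batch.text.type(torch.LongTensor).to(device)
        labels = batch.label.to(device)
        prediction = model(text, labels)
        valid_losses.append(loss_func(prediction, labels).item())
        correct += (prediction.argmax(dim=1) == labels).sum().item()
        total += labels.size(0)
print(f'valid acc: [{correct / total:.3f}] ({correct} in {total}), '
      f'validate loss {sum(valid_losses) / len(valid_losses):.6f}')
model.train()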
When I train this model, the validation accuracy and loss are reported like this:
Epoch 1/300 valid acc: [0.839] (16668 in 19873), time spent 631.497 sec. Validate loss 1.506138. Best validate epoch is 1.
Epoch 2/300 valid acc: [0.839] (16668 in 19873), time spent 627.631 sec. Validate loss 1.577007. Best validate epoch is 2.
Epoch 3/300 valid acc: [0.839] (16668 in 19873), time spent 631.427 sec. Validate loss 1.580756. Best validate epoch is 3.
Epoch 4/300 valid acc: [0.839] (16668 in 19873), time spent 605.352 sec. Validate loss 1.581306. Best validate epoch is 4.
Epoch 5/300 valid acc: [0.839] (16668 in 19873), time spent 388.487 sec. Validate loss 1.581431. Best validate epoch is 5.
Epoch 6/300 valid acc: [0.839] (16668 in 19873), time spent 360.344 sec. Validate loss 1.581464. Best validate epoch is 6.
Epoch 7/300 valid acc: [0.839] (16668 in 19873), time spent 624.345 sec. Validate loss 1.581473. Best validate epoch is 7.
Epoch 8/300 valid acc: [0.839] (16668 in 19873), time spent 622.059 sec. Validate loss 1.581477. Best validate epoch is 8.
Epoch 9/300 valid acc: [0.839] (16668 in 19873), time spent 651.425 sec. Validate loss 1.581478. Best validate epoch is 9.
Epoch 10/300 valid acc: [0.839] (16668 in 19873), time spent 697.475 sec. Validate loss 1.581478. Best validate epoch is 10.
...
This shows that the validation loss stops decreasing after epoch 9, and the validation accuracy has not changed since the first epoch. (Note that in my dataset one of the labels accounts for 83% of the samples, from which it can be inferred that my model tends to predict every sequence as that same label; however, the same thing also happens when I train on another dataset that is relatively balanced.) Has anyone encountered this before? I would like to know whether I made a mistake in designing the model or in the training procedure. Thanks for your help!
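For what it is worth, a quick way to check the collapse described above is to count the predicted classes on the validation set, e.g. (a sketch; valid_iter is an assumption):

import collections

model.eval()
pred_counts = collections.Counter()
with torch.no_grad():
    for batch in valid_iter:
        logits = model(batch.text.type(torch.LongTensor).to(device))
        pred_counts.update(logits.argmax(dim=1).tolist())
print(pred_counts)  # if the model has collapsed, one class receives (almost) all predictions
model.train()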
Update on Nov. 19: I have added a figure showing the behaviour of the loss during training. From it you can see that both the training loss and the validation loss become constant after epoch 5. (Figure: training and validation loss over 20 epochs.)
I have now found that the main reason the loss does not decrease is that the weight decay set in the optimizer is far too high:
optimizer = SGD(model.parameters(), lr=learning_rate, weight_decay=0.9)
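To see why this matters: PyTorch's SGD implements weight decay as an L2 term added to the gradient, i.e. w <- w - lr * (grad + weight_decay * w). With weight_decay=0.9 the 0.9 * w term easily dominates the actual loss gradient and steadily pulls every parameter toward zero. A small sketch of that effect in isolation:

import torch
from torch.optim import SGD

w = torch.nn.Parameter(torch.ones(1))
opt = SGD([w], lr=1e-3, weight_decay=0.9)
for _ in range(10000):
    opt.zero_grad()
    (0.0 * w).sum().backward()   # zero loss gradient: only weight decay acts
    opt.step()
print(w.item())  # roughly (1 - 1e-3 * 0.9) ** 10000 ≈ 1.2e-4, the weight has all but vanished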
So I fixed this by changing the weight decay to 5e-5:
optimizer = SGD(model.parameters(), lr=learning_rate, weight_decay=5e-5)
This time the loss of my network starts to decrease. However, the accuracy does not improve at all.
Epoch 1/100 valid acc: [0.839] (16668 in 19873), time spent 398.154 sec. Validate loss 0.713456. Best validate epoch is 1.
Epoch 2/100 valid acc: [0.839] (16668 in 19873), time spent 572.057 sec. Validate loss 0.631721. Best validate epoch is 2.
Epoch 3/100 valid acc: [0.839] (16668 in 19873), time spent 580.867 sec. Validate loss 0.613186. Best validate epoch is 3.
Epoch 4/100 valid acc: [0.839] (16668 in 19873), time spent 561.953 sec. Validate loss 0.601883. Best validate epoch is 4.
Epoch 5/100 valid acc: [0.839] (16668 in 19873), time spent 564.913 sec. Validate loss 0.596573. Best validate epoch is 5.
Epoch 6/100 valid acc: [0.839] (16668 in 19873), time spent 574.525 sec. Validate loss 0.592848. Best validate epoch is 6.
Epoch 7/100 valid acc: [0.839] (16668 in 19873), time spent 580.885 sec. Validate loss 0.591074. Best validate epoch is 7.
Epoch 8/100 valid acc: [0.839] (16668 in 19873), time spent 455.228 sec. Validate loss 0.589787. Best validate epoch is 8.
Epoch 9/100 valid acc: [0.839] (16668 in 19873), time spent 582.756 sec. Validate loss 0.588691. Best validate epoch is 9.
Epoch 10/100 valid acc: [0.839] (16668 in 19873), time spent 583.997 sec. Validate loss 0.588260. Best validate epoch is 10.
Epoch 11/100 valid acc: [0.839] (16668 in 19873), time spent 599.630 sec. Validate loss 0.588224. Best validate epoch is 11.
Epoch 12/100 valid acc: [0.839] (16668 in 19873), time spent 597.713 sec. Validate loss 0.586977. Best validate epoch is 12.
Epoch 13/100 valid acc: [0.839] (16668 in 19873), time spent 605.038 sec. Validate loss 0.587937. Best validate epoch is 13.
Epoch 14/100 valid acc: [0.839] (16668 in 19873), time spent 598.712 sec. Validate loss 0.587059. Best validate epoch is 14.
Epoch 15/100 valid acc: [0.839] (16668 in 19873), time spent 409.344 sec. Validate loss 0.587293. Best validate epoch is 15.
...
The training loss behaves as shown in the figure.
I am wondering whether a learning rate of 1e-3 and a weight decay of 5e-5 are reasonable settings. The batch size I use is 32.
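As a side note on the numbers above: the reported accuracy never moves away from the majority-class baseline, which is easy to verify from the counts in the logs (the ~83% figure is from my dataset description earlier):

# 16668 correct out of 19873 validation samples, in every epoch above.
print(16668 / 19873)  # ≈ 0.8387, i.e. the constant 0.839 in the logs
# This is roughly the share of the majority label (~83%), so the network still seems
# to predict the same class for every sequence even though the loss decreases.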