In multi-class classification, cross-entropy loss is usually computed against one-hot labels. The plain cross-entropy loss only accounts for the loss at the position of the correct label and ignores the positions of the incorrect labels. As a result, the model may fit the training set very well, yet because the incorrect-label positions are never penalized, it becomes over-confident and is more likely to predict wrongly on unseen data, i.e., it overfits.
Label smoothing can mitigate this overfitting to some extent.
Step 1: Softmax for multi-class classification
p_i = \frac{e^{z_i}}{\sum_{j=1}^{n} e^{z_j}}
where p_i is the probability that the current sample belongs to class i, z_i is the logit of the current sample for class i, and n is the total number of classes.
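As a quick check of the formula (a minimal sketch with made-up logits, not from the original post), it can be evaluated directly in PyTorch and compared against the built-in softmax:

import torch

z = torch.tensor([1.0, 2.0, 3.0])        # made-up logits z_i for one sample, n = 3 classes
p = torch.exp(z) / torch.exp(z).sum()    # p_i = e^{z_i} / sum_j e^{z_j}
print(p)                                 # ≈ tensor([0.0900, 0.2447, 0.6652])
print(torch.softmax(z, dim=0))           # same result from the built-in softmax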
Step 2: Cross-entropy loss
crossLoss = -\frac{1}{M} \sum_{m=1}^{M} \sum_{i=1}^{n} y_i \log p_i
where M is the total number of samples.
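A minimal sketch of this formula in PyTorch (the probabilities and one-hot labels below are made up for illustration, not from the original post):

import torch

# probabilities p and one-hot labels y for a batch of M = 2 samples, n = 5 classes
p = torch.tensor([[0.1, 0.1, 0.1, 0.36, 0.34],
                  [0.2, 0.5, 0.1, 0.10, 0.10]])
y = torch.tensor([[0., 0., 0., 1., 0.],
                  [0., 1., 0., 0., 0.]])
loss = -(y * torch.log(p)).sum(dim=1).mean()   # -1/M * sum_m sum_i y_i * log(p_i)
print(loss)                                    # ≈ tensor(0.8574)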
Example:
Suppose a batch of samples with n = 5 classes in total, and consider one sample whose one-hot label is [0, 0, 0, 1, 0]. Suppose the model (e.g., a fully connected layer) produces logits that, after softmax, give the probability vector
p = [0.1, 0.1, 0.1, 0.36, 0.34]
Plugging these into the formula above, the loss for this single sample is:
loss = -(0 \cdot \log 0.1 + 0 \cdot \log 0.1 + 0 \cdot \log 0.1 + 1 \cdot \log 0.36 + 0 \cdot \log 0.34) = -\log 0.36 \approx 1.02
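The value can be checked in two lines (assuming natural logarithms, as deep learning frameworks use):

import math

p_true = 0.36               # predicted probability at the correct label
print(-math.log(p_true))    # ≈ 1.02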
This traditional cross-entropy loss only considers the loss at the correct label's position and ignores the incorrect positions. Let's now see how the cross-entropy loss with label smoothing is computed.
Using the same example as above: a batch of samples with n = 5 classes, one sample whose one-hot label is [0, 0, 0, 1, 0], and a model (e.g., a fully connected layer) whose logits give, after softmax, the probability vector
p = [0.1, 0.1, 0.1, 0.36, 0.34]
Let the label-smoothing factor be \epsilon = 0.1. The smoothed label is computed as follows:
y_1 = (1 - \epsilon) \cdot [0, 0, 0, 1, 0] = [0, 0, 0, 0.9, 0]
y_2 = \epsilon \cdot [1, 1, 1, 1, 1] / 5 = [0.1, 0.1, 0.1, 0.1, 0.1] / 5 = [0.02, 0.02, 0.02, 0.02, 0.02]
y = y_1 + y_2 = [0.02, 0.02, 0.02, 0.92, 0.02]
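The same smoothed label can be produced in a single expression; a minimal sketch (variable names are assumptions, not from the original post):

import torch

eps, n = 0.1, 5
one_hot = torch.tensor([0., 0., 0., 1., 0.])
y = (1 - eps) * one_hot + eps / n    # y1 + y2 in one step
print(y)                             # tensor([0.0200, 0.0200, 0.0200, 0.9200, 0.0200])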
y is the new label after smoothing; from here the loss is computed with the standard cross-entropy steps, e.g.:
loss = -y \cdot \log p = -[0.02, 0.02, 0.02, 0.92, 0.02] \cdot \log([0.1, 0.1, 0.1, 0.36, 0.34]) \approx 1.10
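Checking the number (a small sketch, assuming natural logarithms):

import torch

y = torch.tensor([0.02, 0.02, 0.02, 0.92, 0.02])
p = torch.tensor([0.1, 0.1, 0.1, 0.36, 0.34])
print(-(y * torch.log(p)).sum())     # ≈ 1.10, larger than the 1.02 obtained without smoothing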
As the example shows, the loss with label smoothing is larger than the plain cross-entropy loss. In other words, for the smoothed loss to drop to the level of the plain cross-entropy loss, the model has to learn better, which pushes it toward classifying correctly.
Label smoothing can be applied wherever the cross-entropy loss is used. A PyTorch implementation is shown below.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CELossWithLabelSmoothing(nn.Module):
    ''' Cross-entropy loss with label smoothing '''

    def __init__(self, label_smooth=0.1, class_num=3755):
        super().__init__()
        self.label_smooth = label_smooth
        self.class_num = class_num

    def forward(self, pred, target):
        '''
        Args:
            pred: raw logits from the model, shape [N, class_num]
            target: ground-truth class indices, shape [N]
        '''
        eps = 1e-12
        if self.label_smooth is not None:
            # cross-entropy loss with label smoothing
            logprobs = F.log_softmax(pred, dim=1)       # softmax + log
            target = F.one_hot(target, self.class_num)  # convert class indices to one-hot
            # label smoothing
            # implementation 1: mix the one-hot label with a uniform distribution
            #                   (this matches the worked example above)
            # target = (1.0 - self.label_smooth) * target + self.label_smooth / self.class_num
            # implementation 2: clamp the one-hot label, spreading label_smooth over the wrong classes
            target = torch.clamp(target.float(),
                                 min=self.label_smooth / (self.class_num - 1),
                                 max=1.0 - self.label_smooth)
            loss = -1 * torch.sum(target * logprobs, 1)
        else:
            # standard cross-entropy loss: -z_y + log(sum_i exp(z_i))
            loss = (-1. * pred.gather(1, target.unsqueeze(-1)).squeeze(-1)
                    + torch.log(torch.exp(pred).sum(dim=1) + eps))
        return loss.mean()


if __name__ == '__main__':
    loss2 = CELossWithLabelSmoothing(label_smooth=0.2, class_num=3)
    x = torch.tensor([[0.1, 8, 0.1], [0.1, 0.1, 8]], dtype=torch.float)
    y = torch.tensor([1, 2])
    print(loss2(x, y))
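For comparison, recent PyTorch releases (1.10 and later) support label smoothing directly through the label_smoothing argument of torch.nn.CrossEntropyLoss, which corresponds to implementation 1 above (mixing the one-hot label with a uniform distribution). A minimal usage sketch, not part of the original code; its output differs slightly from CELossWithLabelSmoothing because that class uses implementation 2:

import torch
import torch.nn as nn

x = torch.tensor([[0.1, 8, 0.1], [0.1, 0.1, 8]], dtype=torch.float)
y = torch.tensor([1, 2])

# built-in cross entropy with label smoothing (PyTorch >= 1.10);
# it smooths the target as (1 - eps) * one_hot + eps / class_num
builtin = nn.CrossEntropyLoss(label_smoothing=0.2)
print(builtin(x, y))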