Distiller: Preparing a Model for Quantization (Quantizing Your Own Neural Network Model)

岳正浩

2023-12-01

This article is a translation of the official Distiller documentation page Preparing a Model for Quantization.

Background

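Note: If you only want a concise list of the modifications needed to ensure that a model is quantized correctly in Distiller, you can skip this section and go straight to the next one.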

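Distiller provides an automatic mechanism for converting a “vanilla” FP32 PyTorch model into its quantized counterpart (for both quantization-aware training and post-training quantization). This mechanism works at the PyTorch “module” level. By “module” we mean any subclass of the torch.nn.Module class. The Distiller Quantizer can detect modules and replace them with other modules.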
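To make the idea of module-level replacement concrete, here is a minimal sketch in plain PyTorch. It is not Distiller's Quantizer: `FakeQuantReLU` and `replace_modules` are hypothetical names introduced only for illustration, and the "replacement" simply wraps the original module.

```python
import torch
import torch.nn as nn


class FakeQuantReLU(nn.Module):
    """Hypothetical stand-in for a quantized replacement module."""

    def __init__(self, wrapped):
        super().__init__()
        self.wrapped = wrapped

    def forward(self, x):
        # A real replacement would quantize here; this one only delegates.
        return self.wrapped(x)


def replace_modules(model, target_type, factory):
    """Recursively swap every child of type `target_type` for `factory(child)`."""
    for name, child in list(model.named_children()):
        if isinstance(child, target_type):
            setattr(model, name, factory(child))
        else:
            replace_modules(child, target_type, factory)
    return model


model = nn.Sequential(
    nn.Conv2d(3, 8, 3),
    nn.ReLU(),
    nn.Sequential(nn.Conv2d(8, 8, 3), nn.ReLU()),
)
replace_modules(model, nn.ReLU, FakeQuantReLU)

# Names of the modules that were swapped out.
replaced = [name for name, m in model.named_modules() if isinstance(m, FakeQuantReLU)]
print(replaced)
```

The key point is that only operations registered as modules can be found and swapped this way.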

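However, PyTorch does not require every operation to be defined as a module. Operations are often executed through directly overloaded tensor operators (+, - and so on) and through functions in the torch namespace (for example torch.cat()). There is also the torch.nn.functional namespace, which provides functional equivalents of the modules in torch.nn. When an operation does not maintain any state, it is often invoked through its functional counterpart even if a dedicated nn.Module exists for it. For example, nn.functional.relu() is called instead of creating an instance of nn.ReLU and invoking that. Such non-module operations are called directly from a module's forward function. There are ways to discover these operations up front, and Distiller uses them for various purposes. Even so, these operations can only be replaced by resorting to “dirty” Python tricks, which we would rather avoid for a number of reasons.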
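The following sketch illustrates why functional calls are invisible to a module-level mechanism. The two small networks and the `leaf_module_types` helper are hypothetical and exist only for this comparison.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FunctionalNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 4)

    def forward(self, x):
        # The ReLU is a functional call, so it is never registered as a module.
        return F.relu(self.fc(x))


class ModularNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 4)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.fc(x))


def leaf_module_types(model):
    """List the class names of all modules that have no children."""
    return [type(m).__name__ for m in model.modules() if len(list(m.children())) == 0]


functional_ops = leaf_module_types(FunctionalNet())
modular_ops = leaf_module_types(ModularNet())
print(functional_ops)  # the ReLU does not appear
print(modular_ops)     # the ReLU appears as a module
```

Both networks compute the same function, but only the second one exposes the ReLU to anything that walks the module tree.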

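In addition, there may be cases where the same module instance is re-used several times in the forward function. This is also a problem for Distiller. Several flows will not work as expected if each call to an operation is not “tied” to a dedicated module instance. For example:

When collecting statistics, each invocation of a re-used module overwrites the statistics collected for the previous invocation. In the end, the statistics for all invocations except the last one are lost.
“Net-aware” quantization relies on a 1:1 mapping from each operation executed in the model to the module that invoked it. With re-used modules, this mapping is no longer 1:1.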
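The statistics problem can be reproduced with a plain PyTorch forward hook. This is a minimal sketch, assuming a collector that keys its records by module name; it does not use Distiller's own statistics collectors, and `ReusedReLU` and `make_hook` are hypothetical names.

```python
import torch
import torch.nn as nn


class ReusedReLU(nn.Module):
    def __init__(self):
        super().__init__()
        self.relu = nn.ReLU()

    def forward(self, x):
        a = self.relu(x)           # first call sees the raw input
        return self.relu(a - 5.0)  # second call re-uses the same instance


stats = {}
calls = []


def make_hook(name):
    def hook(module, inputs, output):
        calls.append(name)
        # Keyed by module name, so a second call replaces the first record.
        stats[name] = {
            "in_min": inputs[0].min().item(),
            "in_max": inputs[0].max().item(),
        }
    return hook


model = ReusedReLU()
for name, module in model.named_modules():
    if isinstance(module, nn.ReLU):
        module.register_forward_hook(make_hook(name))

model(torch.tensor([-2.0, 0.0, 10.0]))
print(calls)  # the hook fired twice
print(stats)  # only the second call's input range survives
```

The first call saw inputs in the range [-2, 10], but only the second call's range of [-5, 5] is left in `stats`.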

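Hence, to make sure that all supported operations in a model are quantized correctly by Distiller, it may be necessary to modify the model code before passing it to the quantizer. Note that the exact set of supported operations may vary between the available quantizers.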

Model Preparation To-Do List

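The steps required to prepare a model for quantization can be summarized as follows:

1. Replace direct tensor operations with modules
2. Replace re-used modules with dedicated instances
3. Replace torch.nn.functional calls with equivalent modules
4. Special cases: replace modules that cannot be quantized with variants that can

In the next section we will see examples of items 1-3 in this list.

As for the “special cases”, at the moment the only such case is LSTM. See the section after the example for details.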

Model Preparation Example

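We will use the following simple module as an example. It is loosely based on the ResNet implementation in torchvision, with a few changes that make little sense in themselves and are there only to demonstrate the different modifications that may be required.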

import torch
import torch.nn as nn
import torch.nn.functional as F

class BasicModule(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size):
        super(BasicModule, self).__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, kernel_size)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU()
        self.conv2 = nn.Conv2d(out_ch, out_ch, kernel_size)
        self.bn2 = nn.BatchNorm2d(out_ch)

    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)

        # (1) Overloaded tensor addition operation
        # Alternatively, could be called via a tensor function: out.add_(identity)
        out += identity
        # (2) Relu module re-used
        out = self.relu(out)

        # (3) Using operation from 'torch' namespace
        out = torch.cat([identity, out], dim=1)
        # (4) Using function from torch.nn.functional
        out = F.sigmoid(out)

        return out

Replace direct tensor operations with modules

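The addition (1) and concatenation (3) operations in the forward function are examples of direct tensor operations. These operations have no equivalent modules defined in torch.nn. Hence, if we want to quantize them, we must implement modules that call them. In Distiller we have implemented a few simple wrapper modules for common operations. These are defined in the distiller.modules namespace. Specifically, the addition operation should be replaced with the EltwiseAdd module, and the concatenation operation with the Concat module.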
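For readers who want to see what such wrappers amount to, here is a minimal sketch of an addition wrapper and a concatenation wrapper. These are illustrative stand-ins written for this article (`EltwiseAddSketch` and `ConcatSketch` are hypothetical names), not the code shipped in distiller.modules, which should be used in practice.

```python
import torch
import torch.nn as nn


class EltwiseAddSketch(nn.Module):
    """Wraps element-wise addition so that it exists as a module."""

    def __init__(self, inplace=False):
        super().__init__()
        self.inplace = inplace

    def forward(self, *inputs):
        res = inputs[0]
        for t in inputs[1:]:
            if self.inplace:
                res += t        # modifies the first input in place
            else:
                res = res + t   # allocates a new tensor
        return res


class ConcatSketch(nn.Module):
    """Wraps torch.cat; the dimension moves into the constructor."""

    def __init__(self, dim=0):
        super().__init__()
        self.dim = dim

    def forward(self, *inputs):
        return torch.cat(inputs, dim=self.dim)


a = torch.ones(1, 2, 3, 3)
b = torch.full((1, 2, 3, 3), 2.0)
summed = EltwiseAddSketch()(a, b)
joined = ConcatSketch(dim=1)(a, b)
print(summed.shape, joined.shape)
```

Because each wrapper is an nn.Module, every addition or concatenation in the model gets its own instance that a module-level mechanism can find and replace.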

Replace re-used modules with dedicated instances

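The relu operations above are invoked through a module, but both calls use the same instance (2). We need to create a second instance of nn.ReLU in __init__ and use it for the second call in forward.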
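In larger models, re-used instances can be hard to spot by eye. The sketch below counts how often each leaf module fires during a single forward pass, using standard PyTorch forward hooks. `find_reused_modules` and the two demo classes are hypothetical helpers written for this article, not part of Distiller.

```python
from collections import Counter

import torch
import torch.nn as nn


def find_reused_modules(model, sample_input):
    """Return names of leaf modules invoked more than once in one forward pass."""
    counts = Counter()
    handles = []
    for name, module in model.named_modules():
        if len(list(module.children())) == 0:
            # The hook returns None, so the module output is left untouched.
            handles.append(module.register_forward_hook(
                lambda m, i, o, name=name: counts.update([name])))
    with torch.no_grad():
        model(sample_input)
    for handle in handles:
        handle.remove()
    return sorted(name for name, count in counts.items() if count > 1)


class SharedReLU(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(4, 4)
        self.fc2 = nn.Linear(4, 4)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.fc2(self.relu(self.fc1(x))))


class DedicatedReLU(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(4, 4)
        self.fc2 = nn.Linear(4, 4)
        self.relu1 = nn.ReLU()
        self.relu2 = nn.ReLU()

    def forward(self, x):
        return self.relu2(self.fc2(self.relu1(self.fc1(x))))


x = torch.randn(2, 4)
shared_result = find_reused_modules(SharedReLU(), x)
dedicated_result = find_reused_modules(DedicatedReLU(), x)
print(shared_result, dedicated_result)
```

An empty result after the fix indicates that every leaf module was called at most once for that input.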

Replace `torch.nn.functional` calls with equivalent modules

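The sigmoid (4) operation is invoked through the functional interface. Luckily, operations in torch.nn.functional have equivalent modules, so we can simply use those. In this case we need to create an instance of torch.nn.Sigmoid.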
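As a quick check that the swap does not change results, the sketch below compares functional calls with their module counterparts. The softmax case is an extra example that does not appear in the original model; it is included to show that arguments of the functional call (here `dim`) move into the module's constructor.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(2, 5)

# Functional form and module form of the same operation.
sigmoid_module = nn.Sigmoid()
same_sigmoid = torch.allclose(torch.sigmoid(x), sigmoid_module(x))

# Arguments passed at call time become constructor arguments.
softmax_module = nn.Softmax(dim=1)
same_softmax = torch.allclose(F.softmax(x, dim=1), softmax_module(x))

print(same_sigmoid, same_softmax)
```

The module instances should be created in __init__ rather than inside forward, so that they are registered as children of the model.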

Putting it all together

After making all of the changes described above, we end up with:

import torch.nn as nn
import distiller.modules

class BasicModule(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size):
        super(BasicModule, self).__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, kernel_size)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.relu1 = nn.ReLU()
        self.conv2 = nn.Conv2d(out_ch, out_ch, kernel_size)
        self.bn2 = nn.BatchNorm2d(out_ch)

        # Fixes start here
        # (1) Replace '+=' with an inplace module
        self.add = distiller.modules.EltwiseAdd(inplace=True)
        # (2) Separate instance for each relu call
        self.relu2 = nn.ReLU()
        # (3) Dedicated module instead of tensor op
        self.concat = distiller.modules.Concat(dim=1)
        # (4) Dedicated module instead of functional call
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu1(out)
        out = self.conv2(out)
        out = self.bn2(out)

        out = self.add(out, identity)
        out = self.relu2(out)
        out = self.concat(identity, out)
        out = self.sigmoid(out)

        return out
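A reasonable sanity check after such a refactoring is that the prepared module still computes the same function as the original. The sketch below does this with compact versions of both blocks. To stay runnable without Distiller installed, it uses local stand-ins (`AddSketch`, `ConcatSketch`) in place of the distiller.modules wrappers, and it assumes equal input and output channel counts with a 1x1 kernel so that the skip connection shapes match.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AddSketch(nn.Module):
    """Local stand-in for an element-wise addition wrapper."""

    def forward(self, a, b):
        return a + b


class ConcatSketch(nn.Module):
    """Local stand-in for a concatenation wrapper."""

    def __init__(self, dim):
        super().__init__()
        self.dim = dim

    def forward(self, *tensors):
        return torch.cat(tensors, dim=self.dim)


class OriginalBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 1)
        self.bn1 = nn.BatchNorm2d(ch)
        self.relu = nn.ReLU()
        self.conv2 = nn.Conv2d(ch, ch, 1)
        self.bn2 = nn.BatchNorm2d(ch)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = self.relu(out + x)           # tensor op and re-used module
        out = torch.cat([x, out], dim=1)   # torch namespace function
        return F.sigmoid(out)              # functional call


class PreparedBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 1)
        self.bn1 = nn.BatchNorm2d(ch)
        self.relu1 = nn.ReLU()
        self.conv2 = nn.Conv2d(ch, ch, 1)
        self.bn2 = nn.BatchNorm2d(ch)
        self.add = AddSketch()
        self.relu2 = nn.ReLU()
        self.concat = ConcatSketch(dim=1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        out = self.relu1(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = self.relu2(self.add(out, x))
        out = self.concat(x, out)
        return self.sigmoid(out)


torch.manual_seed(0)
original = OriginalBlock(4).eval()
prepared = PreparedBlock(4).eval()
# The added modules hold no parameters, so the state dicts have the same keys.
prepared.load_state_dict(original.state_dict())

x = torch.randn(2, 4, 8, 8)
with torch.no_grad():
    outputs_match = torch.allclose(original(x), prepared(x))
print(outputs_match)
```

Since none of the newly added modules carries parameters, weights trained with the original definition load into the prepared one unchanged.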

Special Case: LSTM (a “compound” module)

Background

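LSTMs are a special case. An LSTM block is composed of building blocks such as fully-connected layers and sigmoid/tanh non-linearities, all of which have dedicated modules in torch.nn. However, the LSTM implementation provided in PyTorch does not use these building blocks. For optimization purposes, all of the internal operations are implemented at the C++ level. The only parts of the model exposed at the Python level are the parameters of the fully-connected layers. Hence, all we can do with the PyTorch LSTM module is quantize the inputs and outputs of the entire block and quantize the parameters of the FC layers. We cannot quantize the internal stages of the block at all. Beyond merely quantizing the internal stages, we would also like the option to control the quantization parameters of each internal stage separately.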
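This is easy to confirm by inspecting a PyTorch LSTM directly: it has no child modules, only flat weight and bias parameters. The sizes used below are arbitrary.

```python
import torch.nn as nn

lstm = nn.LSTM(input_size=4, hidden_size=8, num_layers=1)

# No sub-modules are registered, so there is nothing to replace inside the block.
child_modules = list(lstm.children())

# Only the fully-connected weights and biases are visible from Python.
param_names = [name for name, _ in lstm.named_parameters()]

print(child_modules)
print(param_names)
```

With no internal modules to hook into, a module-level quantizer can only treat the whole LSTM as a single opaque operation.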

What to do

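Distiller provides a “modular” implementation of LSTM, composed entirely of operations defined at the Python level. We provide DistillerLSTM and DistillerLSTMCell implementations, which parallel the LSTM and LSTMCell provided by PyTorch.

A function that converts all LSTM instances in a model to the Distiller variants is also provided: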

model = distiller.modules.convert_model_to_distiller_lstm(model)
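To give a feel for what a “modular” LSTM cell looks like, here is a minimal sketch built only from torch.nn modules and checked against nn.LSTMCell. `ModularLSTMCell` is a hypothetical class written for this article and is not the DistillerLSTMCell implementation. For brevity it leaves the additions and multiplications as plain tensor operations; a fully prepared version would wrap those in modules too, as described earlier.

```python
import torch
import torch.nn as nn


class ModularLSTMCell(nn.Module):
    """LSTM cell expressed with Python-level modules (illustrative only)."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.fc_x = nn.Linear(input_size, 4 * hidden_size)
        self.fc_h = nn.Linear(hidden_size, 4 * hidden_size)
        # One dedicated activation instance per use, never re-used.
        self.act_i = nn.Sigmoid()
        self.act_f = nn.Sigmoid()
        self.act_g = nn.Tanh()
        self.act_o = nn.Sigmoid()
        self.act_c = nn.Tanh()

    def forward(self, x, state):
        h, c = state
        gates = self.fc_x(x) + self.fc_h(h)
        # PyTorch stores the gates in the order: input, forget, cell, output.
        i, f, g, o = gates.chunk(4, dim=1)
        i, f, g, o = self.act_i(i), self.act_f(f), self.act_g(g), self.act_o(o)
        c_next = f * c + i * g
        h_next = o * self.act_c(c_next)
        return h_next, c_next


torch.manual_seed(0)
reference = nn.LSTMCell(4, 8)
modular = ModularLSTMCell(4, 8)
with torch.no_grad():
    # Copy the reference parameters so both cells compute the same function.
    modular.fc_x.weight.copy_(reference.weight_ih)
    modular.fc_x.bias.copy_(reference.bias_ih)
    modular.fc_h.weight.copy_(reference.weight_hh)
    modular.fc_h.bias.copy_(reference.bias_hh)

x = torch.randn(2, 4)
h0, c0 = torch.randn(2, 8), torch.randn(2, 8)
with torch.no_grad():
    h_ref, c_ref = reference(x, (h0, c0))
    h_mod, c_mod = modular(x, (h0, c0))

cells_match = torch.allclose(h_ref, h_mod, atol=1e-5) and torch.allclose(c_ref, c_mod, atol=1e-5)
print(cells_match)
```

Because every activation is now a separate module instance, each internal stage can be observed and configured on its own, which is the motivation given above for the modular implementation.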
