CLASS torch.autograd.no_grad
Context-manager that disables gradient calculation.
Disabling gradient calculation is useful for inference, when you are sure that you will not call Tensor.backward(). It will reduce memory consumption for computations that would otherwise have requires_grad=True. In this mode, the result of every computation will have requires_grad=False, even when the inputs have requires_grad=True.
Also functions as a decorator.
Example:
>>> x = torch.tensor([1], requires_grad=True)
>>> with torch.no_grad():
...     y = x * 2
>>> y.requires_grad
False
>>> @torch.no_grad()
... def doubler(x):
...     return x * 2
>>> z = doubler(x)
>>> z.requires_grad
False
https://pytorch.org/tutorials/beginner/data_loading_tutorial.html
Call model.train() before training and model.eval() before testing/validation to switch batch normalization and dropout between their training-time and evaluation-time behavior.
Both torch.nn.functional (imported as F) and torch.nn (nn for short) provide many of the same operations, e.g. F.relu vs. nn.ReLU; F also has loss functions such as F.nll_loss. One thing I noticed when trying it: functions called via F.* do not show up in print(model), since they are not registered as submodules.
CrossEntropyLoss computes the log-softmax internally, whereas nll_loss requires you to apply log_softmax manually:
import torch
import torch.nn.functional as F

input = torch.randn(3, 5, requires_grad=True)
target = torch.tensor([1, 0, 4])

output = F.nll_loss(F.log_softmax(input, dim=1), target)
print("nll_loss:")
print(output)

criterion = torch.nn.CrossEntropyLoss()
loss = criterion(input, target)
print("cross entropy:")
print(loss)
The output is:
nll_loss:
tensor(2.3012, grad_fn=<NllLossBackward>)
cross entropy:
tensor(2.3012, grad_fn=<NllLossBackward>)
.item() returns the value of a single-element tensor as a built-in Python number:
>>> x = torch.tensor([1.0])
>>> x.item()
1.0
torch.manual_seed(args.seed)
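Set the random seed for reproducibility. A minimal sketch of seeding the RNGs that are typically involved (the fixed value 42 stands in for args.seed; seeding random/numpy as well is an addition beyond the snippet above):

import random

import numpy as np
import torch

seed = 42  # stand-in for args.seed
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(seed)  # also seed all GPUs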
Concatenate tensors:
r = torch.cat([a, b, c], dim=1)  # a, b, c are tensors with matching sizes except along dim 1
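A runnable sketch with illustrative tensors:

import torch

a = torch.zeros(2, 3)
b = torch.ones(2, 5)

# Concatenate along dim 1; all other dims must match.
r = torch.cat([a, b], dim=1)
print(r.shape)  # torch.Size([2, 8])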
F.pad can pad a tensor with a constant value:
>>> import torch
>>> import torch.nn.functional as F
>>> sample = torch.rand((10, 3, 5, 5))
>>> result = F.pad(sample, (0, 0, 0, 0, 0, 3))  # pads with 0 by default
>>> result.size()
torch.Size([10, 6, 5, 5])
The second argument to F.pad specifies how much to pad at the start and end of each dimension; reading the tuple from left to right, the pairs go from the last dimension to the first.
Matrix multiplication:
r = x.mm(Weight)
r = torch.mm(x,Weight)
fc = torch.nn.Linear(W_target.size(0), 1)
for param in fc.parameters():
    param.data.add_(-0.1 * param.grad.data)
def adjust_learning_rate(optimizer, epoch):
    """Sets the learning rate to the initial LR decayed by 10 every 30 epochs"""
    lr = args.lr * (0.1 ** (epoch // 30))
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr
Use it during training like this:
for epoch in range(args.start_epoch, args.epochs):
    adjust_learning_rate(optimizer, epoch)
    ...
PyTorch also ships ready-made learning-rate schedulers (torch.optim.lr_scheduler).
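For example, the decay-by-10-every-30-epochs policy above can be expressed with the built-in StepLR scheduler; a minimal sketch (the model and learning rate are placeholders):

import torch

model = torch.nn.Linear(10, 2)  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

for epoch in range(90):
    # ... train for one epoch, calling optimizer.step() on each batch ...
    scheduler.step()  # multiplies the lr by gamma every step_size epochs
    print(epoch, optimizer.param_groups[0]['lr'])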
nn.AdaptiveAvgPool2d((1, 1))
You only need to specify the desired output size.
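A quick sketch showing that the output size is fixed regardless of the input's spatial size:

import torch
import torch.nn as nn

pool = nn.AdaptiveAvgPool2d((1, 1))
x = torch.randn(8, 64, 13, 13)  # any spatial size works
print(pool(x).shape)            # torch.Size([8, 64, 1, 1])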
.\aten\src\THC\THCGeneral.cpp:87
Replacing version 1.0.1 with 1.0.0 fixed it.
https://github.com/pytorch/pytorch/issues/18981
Official documentation
The state dict is a dictionary describing a model's state: a mapping from each layer to its parameters. It can also describe an optimizer; see the state_dict documentation for details.
(Recommended) Save:
torch.save(model.state_dict(), PATH)
Load:
model = TheModelClass(*args, **kwargs)
model.load_state_dict(torch.load(PATH))
model.eval()
Model files conventionally end with .pt or .pth.
Call model.eval() before inference to switch dropout and batch normalization layers to their evaluation behavior. Note that you cannot pass a path directly to model.load_state_dict(PATH); load the state dict first with torch.load(PATH).
Save:
torch.save(model, PATH)
Load:
model = torch.load(PATH)
model.eval()
This is intuitive to write, but the drawback is that the saved model is bound to its model class: the class definition must be available when loading, so you may hit all kinds of errors in other modules or after refactoring. For example, a model saved directly with torch.save needs its class to be importable (e.g. defined in __main__) at load time, otherwise you get:
AttributeError: Can't get attribute 'Flatten' on <module '__main__'>
Save:
torch.save({
    'epoch': epoch,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'loss': loss,
    ...
}, PATH)
Note that the optimizer's state dict is saved here as well, so that the optimizer's internal state is preserved when resuming.
Load:
model = TheModelClass(*args, **kwargs)
optimizer = TheOptimizerClass(*args, **kwargs)
checkpoint = torch.load(PATH)
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
epoch = checkpoint['epoch']
loss = checkpoint['loss']
model.eval()
# - or -
model.train()
Checkpoints like this conventionally end with .tar.
Saving multiple models, warm-starting a model from another model's parameters, and saving/loading across GPU and CPU are covered in the documentation.
If you trained your model using Adam, you need to save the optimizer state dict as well and reload that. Also, if you used any learning rate decay, you need to reload the state of the scheduler because it gets reset if you don't, and you may end up with a higher learning rate that will make the solution state oscillate. Finally, if you have any dropout or batch norm in your model architecture, and you saved your model after a test loop (in which case model.eval() was called), make sure to call model.train() before the training loop.
Use .item() here; otherwise the accumulated value can come out wrong (I was getting 0):
acc += torch.sum(pred_label == target).item()
torch.numel()
Returns the number of elements in a tensor.
import traceback

try:
    train()
except (RuntimeError, KeyboardInterrupt):
    print('Save ckpt on exception ...')
    save_checkpoint(model, infos, optimizer)
    print('Save ckpt done.')
    stack_trace = traceback.format_exc()
    print(stack_trace)
torchvision.transforms.ToTensor
It converts a uint8 image (H×W×C, values 0–255) to a float tensor (C×H×W, values 0.0–1.0), and the result can apparently be fed into the network directly.
Normalize should come after ToTensor (it operates on tensors, not PIL images)!!
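A minimal sketch of the usual ordering inside a Compose (the mean/std values here are the common ImageNet statistics):

from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),  # uint8 HWC [0, 255] -> float CHW [0.0, 1.0]
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])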
Use torch.cuda.synchronize() (e.g. when timing), because PyTorch runs CUDA code asynchronously with respect to the CPU; for details see
https://blog.csdn.net/u013548568/article/details/81368019
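A minimal timing sketch along those lines, assuming a GPU is available (the model and input are placeholders):

import time

import torch

model = torch.nn.Linear(1024, 1024).cuda()  # placeholder model
x = torch.randn(64, 1024, device='cuda')

torch.cuda.synchronize()  # finish pending work before starting the clock
start = time.time()
y = model(x)
torch.cuda.synchronize()  # wait for the kernel to finish before stopping the clock
elapsed = time.time() - start
print(f'forward pass took {elapsed * 1000:.2f} ms')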
DataLoader's num_workers argument sets the number of worker processes used to load batches. In my experiments, with a Dataset class that does a lot of preprocessing, a larger num_workers gave a big speedup (setting it to 4 instead of 1 was roughly 15–20x faster). A discussion of how to set num_workers is here.
When training on a GPU, DataLoader's pin_memory should generally be set to True; see here and here.
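A minimal sketch combining both settings, assuming a GPU is available (the dataset here is a random placeholder):

import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(1000, 3, 32, 32), torch.randint(0, 10, (1000,)))
loader = DataLoader(
    dataset,
    batch_size=64,
    shuffle=True,
    num_workers=4,    # worker processes for loading/preprocessing
    pin_memory=True,  # speeds up host-to-GPU copies when training on GPU
)

for images, labels in loader:
    images = images.cuda(non_blocking=True)  # non_blocking pairs well with pin_memory
    labels = labels.cuda(non_blocking=True)
    break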
import torch.nn as nn
import torchvision

# Fine-tuning: replace the final fully-connected layer with one sized for our classes
model = torchvision.models.resnet18(pretrained=True)
num_ftrs = model.fc.in_features
model.fc = nn.Linear(num_ftrs, num_classes)
# Feature extraction: additionally freeze all pretrained parameters
model = torchvision.models.resnet18(pretrained=True)
for param in model.parameters():
    param.requires_grad = False
num_ftrs = model.fc.in_features
model.fc = nn.Linear(num_ftrs, num_classes)
A more convenient approach:
model = torchvision.models.resnet18(pretrained=True)
print(list(model.children()))
Convert the children to a list, pick out the layers you want, and wrap them back up with nn.Sequential.
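A minimal sketch of that idea, assuming we want everything up to (but not including) the final fc layer:

import torch
import torch.nn as nn
import torchvision

model = torchvision.models.resnet18(pretrained=True)

# Keep all children except the last fully-connected layer.
backbone = nn.Sequential(*list(model.children())[:-1])

x = torch.randn(1, 3, 224, 224)
print(backbone(x).shape)  # torch.Size([1, 512, 1, 1])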
def imshow(inp, title=None):
    """Imshow for Tensor."""
    inp = inp.numpy().transpose((1, 2, 0))
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    inp = std * inp + mean
    inp = np.clip(inp, 0, 1)
    plt.imshow(inp)
    if title is not None:
        plt.title(title)
    plt.pause(0.001)  # pause a bit so that plots are updated

inputs, classes = next(iter(dataloaders['train']))

# Make a grid from batch
out = torchvision.utils.make_grid(inputs)
imshow(out, title=[class_names[x] for x in classes])
Writing AlexNet like this makes the forward pass too tedious:
class AlexNet(nn.Module):
    def __init__(self):
        super(AlexNet, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(in_channels=8, out_channels=16, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3, padding=1)
        self.conv4 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, padding=1)
        self.conv5 = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, padding=1)
        self.fc1 = nn.Linear(128, 256)
        self.fc2 = nn.Linear(256, 256)
        self.fc3 = nn.Linear(256, 10)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, 2, 2)
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2, 2)
        x = F.relu(self.conv3(x))
        x = F.max_pool2d(x, 2, 2)
        x = F.relu(self.conv4(x))
        x = F.max_pool2d(x, 2, 2)
        x = F.relu(self.conv5(x))
        x = F.max_pool2d(x, 2, 2)
        x = x.view((x.size(0), -1))
        x = F.dropout(self.fc1(x), 0.5)
        x = F.relu(x)
        x = F.dropout(self.fc2(x), 0.5)
        x = F.relu(x)
        x = self.fc3(x)
        return x
It is cleaner to group the layers with nn.Sequential:
class AlexNet(nn.Module):
    def __init__(self, num_classes):
        super(AlexNet, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=5),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(64, 192, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(192, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        self.fc = nn.Linear(256, num_classes)

    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), -1)
        x = self.fc(x)
        return x
RNN hidden-state initialization is usually written like this:
def init_hidden(self, batch_size):
    weight = next(self.parameters())
    return weight.new_zeros(self.num_layers, batch_size, self.rnn_size)
weight is only used so that the initialized hidden state has the same dtype and device as the model's parameters.
A method similar to numpy's repeat (Tensor.expand):
https://pytorch.org/docs/stable/tensors.html?highlight=expand#torch.Tensor.expand
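A small sketch of Tensor.expand (it returns a broadcasted view of singleton dimensions without copying data):

import torch

x = torch.tensor([[1], [2], [3]])  # shape (3, 1)
y = x.expand(3, 4)                 # shape (3, 4), no memory copy
print(y)
# tensor([[1, 1, 1, 1],
#         [2, 2, 2, 2],
#         [3, 3, 3, 3]])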
if next(self.parameters()).is_cuda:
    mask = mask.cuda()
if len(image.shape) == 2:
    image = image[:, :, np.newaxis]
    image = np.concatenate((image, image, image), axis=2)
Adapted from here.
import torch
import torchvision

dummy_img = torch.zeros((1, 3, 800, 800)).float()
print(dummy_img.shape)
# Out: torch.Size([1, 3, 800, 800])

model = torchvision.models.vgg16(pretrained=True)
fe = list(model.features)
print(fe)

fee = []
k = dummy_img.clone()
for i in fe:
    k = i(k)
    if k.size()[2] < 800 // 16:
        break
    fee.append(i)

out_channels = k.size()[1]
https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/linear.py
def kaiming_uniform_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu'):
    fan = _calculate_correct_fan(tensor, mode)
    gain = calculate_gain(nonlinearity, a)
    std = gain / math.sqrt(fan)
    bound = math.sqrt(3.0) * std  # Calculate uniform bounds from standard deviation
    with torch.no_grad():
        return tensor.uniform_(-bound, bound)
kaiming_uniform_ initializes the tensor with a uniform distribution, sampling from $U(-\text{bound}, \text{bound})$, where
$\text{bound} = \sqrt{\frac{6}{(1 + a^2) \times \text{fan\_in}}}$
In the 2-D case, fan_in is tensor.size(1), i.e. the dimensionality of the input vector.
def kaiming_normal_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu'):
    fan = _calculate_correct_fan(tensor, mode)
    gain = calculate_gain(nonlinearity, a)
    std = gain / math.sqrt(fan)
    with torch.no_grad():
        return tensor.normal_(0, std)
kaiming_normal_ initializes the tensor by sampling from $\mathcal{N}(0, \text{std})$, where
$\text{std} = \sqrt{\frac{2}{(1 + a^2) \times \text{fan\_in}}}$
As before, fan_in is tensor.size(1) when the tensor is 2-D.
def reset_parameters(self):
    init.kaiming_uniform_(self.weight, a=math.sqrt(5))
    if self.bias is not None:
        fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
        bound = 1 / math.sqrt(fan_in)
        init.uniform_(self.bias, -bound, bound)
The weight W is sampled from $U(-\text{bound}, \text{bound})$, where
$\text{bound} = \sqrt{\frac{1}{\text{fan\_in}}}$
Here fan_in is the size of W's second dimension, i.e. the dimensionality of the input vector that the Linear layer acts on.
The bias is also sampled from $U(-\text{bound}, \text{bound})$, with the same bound as W.
Taking the 2-D case as an example, the weight of a convolutional layer is actually a 4-D tensor:
if transposed:
    self.weight = Parameter(torch.Tensor(
        in_channels, out_channels // groups, *kernel_size))
else:
    self.weight = Parameter(torch.Tensor(
        out_channels, in_channels // groups, *kernel_size))
if bias:
    self.bias = Parameter(torch.Tensor(out_channels))
else:
    self.register_parameter('bias', None)
For example, a conv layer with 3 input channels, 64 output channels, and kernel_size=3 has a weight tensor of shape 64×3×3×3, which is initialized like this:
def reset_parameters(self):
    init.kaiming_uniform_(self.weight, a=math.sqrt(5))
    if self.bias is not None:
        fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
        bound = 1 / math.sqrt(fan_in)
        init.uniform_(self.bias, -bound, bound)
As with Linear, kaiming_uniform_ is used by default, sampling from $U(-\text{bound}, \text{bound})$, where
$\text{bound} = \sqrt{\frac{1}{\text{fan\_in}}}$
fan_in is computed as follows:
num_input_fmaps = tensor.size(1)
num_output_fmaps = tensor.size(0)
receptive_field_size = 1
if tensor.dim() > 2:
    receptive_field_size = tensor[0][0].numel()
fan_in = num_input_fmaps * receptive_field_size
fan_out = num_output_fmaps * receptive_field_size
That is,
$\text{fan\_in} = \text{in\_channels} \times \text{kernel\_size}^2$
def reset_parameters(self):
    self.reset_running_stats()
    if self.affine:
        init.uniform_(self.weight)
        init.zeros_(self.bias)
The weight is initialized from $U(0, 1)$ and the bias is initialized to 0.
When keeping submodules in a list or dict, remember to use nn.ModuleList and nn.ModuleDict so that their parameters are registered with the parent module:
class MyModule(nn.Module):
    def __init__(self):
        super(MyModule, self).__init__()
        self.linears = nn.ModuleList([nn.Linear(10, 10) for i in range(10)])

    def forward(self, x):
        # ModuleList can act as an iterable, or be indexed using ints
        for i, l in enumerate(self.linears):
            x = self.linears[i // 2](x) + l(x)
        return x
class MyModule(nn.Module):
    def __init__(self):
        super(MyModule, self).__init__()
        self.choices = nn.ModuleDict({
            'conv': nn.Conv2d(10, 10, 3),
            'pool': nn.MaxPool2d(3)
        })
        self.activations = nn.ModuleDict([
            ['lrelu', nn.LeakyReLU()],
            ['prelu', nn.PReLU()]
        ])

    def forward(self, x, choice, act):
        x = self.choices[choice](x)
        x = self.activations[act](x)
        return x
https://pytorch.org/docs/stable/notes/broadcasting.html#broadcasting-semantics
CrossEntropyLoss does not support multi-label targets; use BCEWithLogitsLoss instead, which is also numerically more stable than applying a sigmoid manually followed by a BCE loss.
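A minimal multi-label sketch (the shapes are illustrative):

import torch
import torch.nn as nn

logits = torch.randn(4, 5)                     # raw scores for 4 samples, 5 labels
targets = torch.randint(0, 2, (4, 5)).float()  # multi-hot targets

criterion = nn.BCEWithLogitsLoss()  # applies sigmoid internally, more stable
loss = criterion(logits, targets)
print(loss)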
DataLoader's .dataset attribute returns the underlying dataset, on which you can then do other things such as calling len().
A quick note: nn.LSTM returns output, (hidden_state, cell), and nn.LSTMCell behaves similarly, except that nn.LSTM's output contains the outputs (i.e. hidden states) of all time steps, while nn.LSTMCell's output is that of the current time step only. The state returned by both is the state of the current time step, but nn.LSTMCell's state has shape (batch_size, hidden_dim), while nn.LSTM's state additionally has leading dimensions for num_layers and direction.
Because the hidden state returned by nn.LSTM has shape (num_layers * num_directions, batch, hidden_size), state[0][-1] refers to the hidden state of the topmost layer.
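A shape-checking sketch of the above (sizes are arbitrary):

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=16, num_layers=2)
x = torch.randn(5, 3, 8)  # (seq_len, batch, input_size)

output, (h, c) = lstm(x)
print(output.shape)       # torch.Size([5, 3, 16]) -> outputs of all time steps
print(h.shape, c.shape)   # torch.Size([2, 3, 16]) -> (num_layers * num_directions, batch, hidden_size)
print(h[-1].shape)        # torch.Size([3, 16])    -> topmost layer's hidden state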
From the official documentation (on Module.register_buffer):
Adds a persistent buffer to the module.
This is typically used to register a buffer that should not be considered a model parameter. For example, BatchNorm’s running_mean is not a parameter, but is part of the persistent state.
Buffers can be accessed as attributes using given names.
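A minimal sketch of registering a buffer (the module here is illustrative):

import torch
import torch.nn as nn

class WithBuffer(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 4)
        # Persistent state that is not a parameter: it gets no gradient, but it is
        # saved in state_dict() and moved by .to()/.cuda() together with the module.
        self.register_buffer('running_mean', torch.zeros(4))

m = WithBuffer()
print([name for name, _ in m.named_parameters()])  # ['fc.weight', 'fc.bias']
print(list(m.state_dict().keys()))                 # contains 'running_mean' in addition to 'fc.weight', 'fc.bias'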
torch.topk is a handy function that returns the top-k values (and their indices) along a given dimension of a tensor.
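A small sketch:

import torch

x = torch.tensor([[1.0, 5.0, 3.0], [4.0, 2.0, 6.0]])
values, indices = torch.topk(x, k=2, dim=1)
print(values)   # tensor([[5., 3.], [6., 4.]])
print(indices)  # tensor([[1, 2], [2, 0]])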
Suppose a.shape = (2, 3, 4). Then torch.chunk(a, 4, dim=-1) splits a along its last dimension into 4 tensors of shape (2, 3, 1).
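And the corresponding sketch:

import torch

a = torch.randn(2, 3, 4)
chunks = torch.chunk(a, 4, dim=-1)
print(len(chunks), chunks[0].shape)  # 4 torch.Size([2, 3, 1])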