CLASS torch.autograd.no_grad
Context-manager that disables gradient calculation.
Disabling gradient calculation is useful for inference, when you are sure that you will not call Tensor.backward(). It will reduce memory consumption for computations that would otherwise have requires_grad=True. In this mode, the result of every computation will have requires_grad=False, even when the inputs have requires_grad=True.
Also functions as a decorator.
Example:
>>> x = torch.tensor([1], requires_grad=True)
>>> with torch.no_grad():
...     y = x * 2
>>> y.requires_grad
False
>>> @torch.no_grad()
... def doubler(x):
...     return x * 2
>>> z = doubler(x)
>>> z.requires_grad
False
https://pytorch.org/tutorials/beginner/data_loading_tutorial.html
Call model.train() before training and model.eval() before testing/validation to switch batch normalization and dropout between their training-time and evaluation-time behavior.
Both torch.nn.functional (imported as F) and torch.nn (nn for short) provide many of the same operations, e.g. F.relu vs. nn.ReLU; F also has loss functions such as F.nll_loss. One thing I noticed when trying it: functions called via F.* do not show up in print(model), since they are not registered as submodules.
CrossEntropyLoss computes the log-softmax internally, whereas nll_loss requires you to apply log_softmax manually:
import torch
import torch.nn.functional as F

input = torch.randn(3, 5, requires_grad=True)
target = torch.tensor([1, 0, 4])

output = F.nll_loss(F.log_softmax(input, dim=1), target)
print("nll_loss:")
print(output)

criterion = torch.nn.CrossEntropyLoss()
loss = criterion(input, target)
print("cross entropy:")
print(loss)
The output is:
nll_loss:
tensor(2.3012, grad_fn=<NllLossBackward>)
cross entropy:
tensor(2.3012, grad_fn=<NllLossBackward>)
.item() returns the value of a single-element tensor as a built-in Python number:
>>> x = torch.tensor([1.0])
>>> x.item()
1.0
torch.manual_seed(args.seed)
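Set the random seed for reproducibility. A minimal sketch of seeding the RNGs that are typically involved (the fixed value 42 stands in for args.seed; seeding random/numpy as well is an addition beyond the snippet above):

import random

import numpy as np
import torch

seed = 42  # stand-in for args.seed
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(seed)  # also seed all GPUs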
Concatenate tensors:
r = torch.cat([a, b, c], dim=1)  # a, b, c are tensors with matching sizes except along dim 1
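A runnable sketch with illustrative tensors:

import torch

a = torch.zeros(2, 3)
b = torch.ones(2, 5)

# Concatenate along dim 1; all other dims must match.
r = torch.cat([a, b], dim=1)
print(r.shape)  # torch.Size([2, 8])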
F.pad can pad a tensor with a constant value:
>>> import torch
>>> import torch.nn.functional as F
>>> sample = torch.rand((10, 3, 5, 5))
>>> result = F.pad(sample, (0, 0, 0, 0, 0, 3))  # pads with 0 by default
>>> result.size()
torch.Size([10, 6, 5, 5])
The second argument to F.pad specifies how much to pad at the start and end of each dimension; reading the tuple from left to right, the pairs go from the last dimension to the first.
Matrix multiplication:
r = x.mm(Weight)
r = torch.mm(x,Weight)
fc = torch.nn.Linear(W_target.size(0), 1)
for param in fc.parameters():
    param.data.add_(-0.1 * param.grad.data)
def adjust_learning_rate(optimizer, epoch):
    """Sets the learning rate to the initial LR decayed by 10 every 30 epochs"""
    lr = args.lr * (0.1 ** (epoch // 30))
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr
Use it during training like this:
for epoch in range(args.start_epoch, args.epochs):
    adjust_learning_rate(optimizer, epoch)
    ...
PyTorch also ships ready-made learning-rate schedulers (torch.optim.lr_scheduler).
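For example, the decay-by-10-every-30-epochs policy above can be expressed with the built-in StepLR scheduler; a minimal sketch (the model and learning rate are placeholders):

import torch

model = torch.nn.Linear(10, 2)  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

for epoch in range(90):
    # ... train for one epoch, calling optimizer.step() on each batch ...
    scheduler.step()  # multiplies the lr by gamma every step_size epochs
    print(epoch, optimizer.param_groups[0]['lr'])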
nn.AdaptiveAvgPool2d((1, 1))
You only need to specify the desired output size.
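A quick sketch showing that the output size is fixed regardless of the input's spatial size:

import torch
import torch.nn as nn

pool = nn.AdaptiveAvgPool2d((1, 1))
x = torch.randn(8, 64, 13, 13)  # any spatial size works
print(pool(x).shape)            # torch.Size([8, 64, 1, 1])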
.\aten\src\THC\THCGeneral.cpp:87
Replacing version 1.0.1 with 1.0.0 fixed it.
https://github.com/pytorch/pytorch/issues/18981
Official documentation
The state dict is a dictionary describing a model's state: a mapping from each layer to its parameters. It can also describe an optimizer; see the state_dict documentation for details.
(Recommended) Save:
torch.save(model.state_dict(), PATH)
Load:
model = TheModelClass(*args, **kwargs)
model.load_state_dict(torch.load(PATH))
model.eval()
Model files conventionally end with .pt or .pth.
Call model.eval() before inference to switch dropout and batch normalization layers to their evaluation behavior. Note that you cannot pass a path directly to model.load_state_dict(PATH); load the state dict first with torch.load(PATH).
Save:
torch.save(model, PATH)
Load:
model = torch.load(PATH)
model.eval()
This is intuitive to write, but the drawback is that the saved model is bound to its model class: the class definition must be available when loading, so you may hit all kinds of errors in other modules or after refactoring. For example, a model saved directly with torch.save needs its class to be importable (e.g. defined in __main__) at load time, otherwise you get:
AttributeError: Can't get attribute 'Flatten' on <module '__main__'>
Save:
torch.save({
    'epoch': epoch,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'loss': loss,
    ...
}, PATH)
Note that the optimizer's state dict is saved here as well, so that the optimizer's internal state is preserved when resuming.
Load:
model = TheModelClass(*args, **kwargs)
optimizer = TheOptimizerClass(*args, **kwargs)
checkpoint = torch.load(PATH)
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
epoch = checkpoint['epoch']
loss = checkpoint['loss']
model.eval()
# - or -
model.train()
Checkpoints like this conventionally end with .tar.
Saving multiple models, warm-starting a model from another model's parameters, and saving/loading across GPU and CPU are covered in the documentation.
If you trained your model using Adam, you need to save the optimizer state dict as well and reload that. Also, if you used any learning rate decay, you need to reload the state of the scheduler because it gets reset if you don't, and you may end up with a higher learning rate that will make the solution state oscillate. Finally, if you have any dropout or batch norm in your model architecture, and you saved your model after a test loop (in which case model.eval() was called), make sure to call model.train() before the training loop.
Use .item() here; otherwise the accumulated value can come out wrong (I was getting 0):
acc += torch.sum(pred_label == target).item()
torch.numel()
Returns the number of elements in a tensor.
import traceback

try:
    train()
except (RuntimeError, KeyboardInterrupt):
    print('Save ckpt on exception ...')
    save_checkpoint(model, infos, optimizer)
    print('Save ckpt done.')
    stack_trace = traceback.format_exc()
    print(stack_trace)
torchvision.transforms.ToTensor
It converts a uint8 image (H×W×C, values 0–255) to a float tensor (C×H×W, values 0.0–1.0), and the result can apparently be fed into the network directly.
Normalize should come after ToTensor (it operates on tensors, not PIL images)!!
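A minimal sketch of the usual ordering inside a Compose (the mean/std values here are the common ImageNet statistics):

from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),  # uint8 HWC [0, 255] -> float CHW [0.0, 1.0]
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])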
Use torch.cuda.synchronize() (e.g. when timing), because PyTorch runs CUDA code asynchronously with respect to the CPU; for details see
https://blog.csdn.net/u013548568/article/details/81368019
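A minimal timing sketch along those lines, assuming a GPU is available (the model and input are placeholders):

import time

import torch

model = torch.nn.Linear(1024, 1024).cuda()  # placeholder model
x = torch.randn(64, 1024, device='cuda')

torch.cuda.synchronize()  # finish pending work before starting the clock
start = time.time()
y = model(x)
torch.cuda.synchronize()  # wait for the kernel to finish before stopping the clock
elapsed = time.time() - start
print(f'forward pass took {elapsed * 1000:.2f} ms')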
DataLoader's num_workers argument sets the number of worker processes used to load batches. In my experiments, with a Dataset class that does a lot of preprocessing, a larger num_workers gave a big speedup (setting it to 4 instead of 1 was roughly 15–20x faster). A discussion of how to set num_workers is here.
When training on a GPU, DataLoader's pin_memory should generally be set to True; see here and here.
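A minimal sketch combining both settings, assuming a GPU is available (the dataset here is a random placeholder):

import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(1000, 3, 32, 32), torch.randint(0, 10, (1000,)))
loader = DataLoader(
    dataset,
    batch_size=64,
    shuffle=True,
    num_workers=4,    # worker processes for loading/preprocessing
    pin_memory=True,  # speeds up host-to-GPU copies when training on GPU
)

for images, labels in loader:
    images = images.cuda(non_blocking=True)  # non_blocking pairs well with pin_memory
    labels = labels.cuda(non_blocking=True)
    break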
import torch.nn as nn
import torchvision

# Fine-tuning: replace the final fully-connected layer with one sized for our classes
model = torchvision.models.resnet18(pretrained=True)
num_ftrs = model.fc.in_features
model.fc = nn.Linear(num_ftrs, num_classes)
# Feature extraction: additionally freeze all pretrained parameters
model = torchvision.models.resnet18(pretrained=True)
for param in model.parameters():
    param.requires_grad = False
num_ftrs = model.fc.in_features
model.fc = nn.Linear(num_ftrs, num_classes)
A more convenient approach:
model = torchvision.models.resnet18(pretrained=True)
print(list(model.children()))
Convert the children to a list, pick out the layers you want, and wrap them back up with nn.Sequential.
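A minimal sketch of that idea, assuming we want everything up to (but not including) the final fc layer:

import torch
import torch.nn as nn
import torchvision

model = torchvision.models.resnet18(pretrained=True)

# Keep all children except the last fully-connected layer.
backbone = nn.Sequential(*list(model.children())[:-1])

x = torch.randn(1, 3, 224, 224)
print(backbone(x).shape)  # torch.Size([1, 512, 1, 1])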
def imshow(inp, title=None):
    """Imshow for Tensor."""
    inp = inp.numpy().transpose((1, 2, 0))
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    inp = std * inp + mean
    inp = np.clip(inp, 0, 1)
    plt.imshow(inp)
    if title is not None:
        plt.title(title)
    plt.pause(0.001)  # pause a bit so that plots are updated

inputs, classes = next(iter(dataloaders['train']))

# Make a grid from batch
out = torchvision.utils.make_grid(inputs)
imshow(out, title=[class_names[x] for x in classes])
Writing AlexNet like this makes the forward pass too tedious:
class AlexNet(nn.Module):
    def __init__(self):
        super(AlexNet, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(in_channels=8, out_channels=16, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3, padding=1)
        self.conv4 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, padding=1)
        self.conv5 = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, padding=1)
        self.fc1 = nn.Linear(128, 256)
        self.fc2 = nn.Linear(256, 256)
        self.fc3 = nn.Linear(256, 10)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, 2, 2)
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2, 2)
        x = F.relu(self.conv3(x))
        x = F.max_pool2d(x, 2, 2)
        x = F.relu(self.conv4(x))
        x = F.max_pool2d(x, 2, 2)
        x = F.relu(self.conv5(x))
        x = F.max_pool2d(x, 2, 2)
        x = x.view((x.size(0), -1))
        x = F.dropout(self.fc1(x), 0.5)
        x = F.relu(x)
        x = F.dropout(self.fc2(x), 0.5)
        x = F.relu(x)
        x = self.fc3(x)
        return x
It is cleaner to group the layers with nn.Sequential:
class AlexNet(nn.Module):
    def __init__(self, num_classes):
        super(AlexNet, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=5),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(64, 192, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(192, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        self.fc = nn.Linear(256, num_classes)

    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), -1)
        x = self.fc(x)
        return x
RNN hidden-state initialization is usually written like this:
def init_hidden(self, batch_size):
    weight = next(self.parameters())
    return weight.new_zeros(self.num_layers, batch_size, self.rnn_size)
weight is only used so that the initialized hidden state has the same dtype and device as the model's parameters.
A method similar to numpy's repeat (Tensor.expand):
https://pytorch.org/docs/stable/tensors.html?highlight=expand#torch.Tensor.expand
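A small sketch of Tensor.expand (it returns a broadcasted view of singleton dimensions without copying data):

import torch

x = torch.tensor([[1], [2], [3]])  # shape (3, 1)
y = x.expand(3, 4)                 # shape (3, 4), no memory copy
print(y)
# tensor([[1, 1, 1, 1],
#         [2, 2, 2, 2],
#         [3, 3, 3, 3]])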
if next(self.parameters()).is_cuda:
    mask = mask.cuda()
if len(image.shape) == 2:
    image = image[:, :, np.newaxis]
    image = np.concatenate((image, image, image), axis=2)
Adapted from here.
import torch
import torchvision

dummy_img = torch.zeros((1, 3, 800, 800)).float()
print(dummy_img.shape)
# Out: torch.Size([1, 3, 800, 800])

model = torchvision.models.vgg16(pretrained=True)
fe = list(model.features)
print(fe)

fee = []
k = dummy_img.clone()
for i in fe:
    k = i(k)
    if k.size()[2] < 800 // 16:
        break
    fee.append(i)

out_channels = k.size()[1]
https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/linear.py
def kaiming_uniform_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu'):
    fan = _calculate_correct_fan(tensor, mode)
    gain = calculate_gain(nonlinearity, a)
    std = gain / math.sqrt(fan)
    bound = math.sqrt(3.0) * std  # Calculate uniform bounds from standard deviation
    with torch.no_grad():
        return tensor.uniform_(-bound, bound)
kaiming_uniform_ initializes the tensor with a uniform distribution, sampling from $U(-\text{bound}, \text{bound})$, where
$\text{bound} = \sqrt{\frac{6}{(1 + a^2) \times \text{fan\_in}}}$
In the 2-D case, fan_in is tensor.size(1), i.e. the dimensionality of the input vector.
def kaiming_normal_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu'):
    fan = _calculate_correct_fan(tensor, mode)
    gain = calculate_gain(nonlinearity, a)
    std = gain / math.sqrt(fan)
    with torch.no_grad():
        return tensor.normal_(0, std)
kaiming_normal_ initializes the tensor by sampling from $\mathcal{N}(0, \text{std})$, where
$\text{std} = \sqrt{\frac{2}{(1 + a^2) \times \text{fan\_in}}}$
As before, fan_in is tensor.size(1) when the tensor is 2-D.
def reset_parameters(self):
    init.kaiming_uniform_(self.weight, a=math.sqrt(5))
    if self.bias is not None:
        fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
        bound = 1 / math.sqrt(fan_in)
        init.uniform_(self.bias, -bound, bound)
The weight W is sampled from $U(-\text{bound}, \text{bound})$, where
$\text{bound} = \sqrt{\frac{1}{\text{fan\_in}}}$
Here fan_in is the size of W's second dimension, i.e. the dimensionality of the input vector that the Linear layer acts on.
The bias is also sampled from $U(-\text{bound}, \text{bound})$, with the same bound as W.
Taking the 2-D case as an example, the weight of a convolutional layer is actually a 4-D tensor:
if transposed:
    self.weight = Parameter(torch.Tensor(
        in_channels, out_channels // groups, *kernel_size))
else:
    self.weight = Parameter(torch.Tensor(
        out_channels, in_channels // groups, *kernel_size))
if bias:
    self.bias = Parameter(torch.Tensor(out_channels))
else:
    self.register_parameter('bias', None)
For example, a conv layer with 3 input channels, 64 output channels, and kernel_size=3 has a weight tensor of shape 64×3×3×3, which is initialized like this:
def reset_parameters(self):
    init.kaiming_uniform_(self.weight, a=math.sqrt(5))
    if self.bias is not None:
        fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
        bound = 1 / math.sqrt(fan_in)
        init.uniform_(self.bias, -bound, bound)
As with Linear, kaiming_uniform_ is used by default, sampling from $U(-\text{bound}, \text{bound})$, where
$\text{bound} = \sqrt{\frac{1}{\text{fan\_in}}}$
fan_in is computed as follows:
num_input_fmaps = tensor.size(1)
num_output_fmaps = tensor.size(0)
receptive_field_size = 1
if tensor.dim() > 2:
    receptive_field_size = tensor[0][0].numel()
fan_in = num_input_fmaps * receptive_field_size
fan_out = num_output_fmaps * receptive_field_size
That is,
$\text{fan\_in} = \text{in\_channels} \times \text{kernel\_size}^2$
def reset_parameters(self):
    self.reset_running_stats()
    if self.affine:
        init.uniform_(self.weight)
        init.zeros_(self.bias)
The weight is initialized from $U(0, 1)$ and the bias is initialized to 0.
When keeping submodules in a list or dict, remember to use nn.ModuleList and nn.ModuleDict so that their parameters are registered with the parent module:
class MyModule(nn.Module):
    def __init__(self):
        super(MyModule, self).__init__()
        self.linears = nn.ModuleList([nn.Linear(10, 10) for i in range(10)])

    def forward(self, x):
        # ModuleList can act as an iterable, or be indexed using ints
        for i, l in enumerate(self.linears):
            x = self.linears[i // 2](x) + l(x)
        return x
class MyModule(nn.Module):
    def __init__(self):
        super(MyModule, self).__init__()
        self.choices = nn.ModuleDict({
            'conv': nn.Conv2d(10, 10, 3),
            'pool': nn.MaxPool2d(3)
        })
        self.activations = nn.ModuleDict([
            ['lrelu', nn.LeakyReLU()],
            ['prelu', nn.PReLU()]
        ])

    def forward(self, x, choice, act):
        x = self.choices[choice](x)
        x = self.activations[act](x)
        return x
https://pytorch.org/docs/stable/notes/broadcasting.html#broadcasting-semantics
CrossEntropyLoss does not support multi-label targets; use BCEWithLogitsLoss instead, which is also numerically more stable than applying a sigmoid manually followed by a BCE loss.
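A minimal multi-label sketch (the shapes are illustrative):

import torch
import torch.nn as nn

logits = torch.randn(4, 5)                     # raw scores for 4 samples, 5 labels
targets = torch.randint(0, 2, (4, 5)).float()  # multi-hot targets

criterion = nn.BCEWithLogitsLoss()  # applies sigmoid internally, more stable
loss = criterion(logits, targets)
print(loss)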
DataLoader's .dataset attribute returns the underlying dataset, on which you can then do other things such as calling len().
A quick note: nn.LSTM returns output, (hidden_state, cell), and nn.LSTMCell behaves similarly, except that nn.LSTM's output contains the outputs (i.e. hidden states) of all time steps, while nn.LSTMCell's output is that of the current time step only. The state returned by both is the state of the current time step, but nn.LSTMCell's state has shape (batch_size, hidden_dim), while nn.LSTM's state additionally has leading dimensions for num_layers and direction.
Because the hidden state returned by nn.LSTM has shape (num_layers * num_directions, batch, hidden_size), state[0][-1] refers to the hidden state of the topmost layer.
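A shape-checking sketch of the above (sizes are arbitrary):

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=16, num_layers=2)
x = torch.randn(5, 3, 8)  # (seq_len, batch, input_size)

output, (h, c) = lstm(x)
print(output.shape)       # torch.Size([5, 3, 16]) -> outputs of all time steps
print(h.shape, c.shape)   # torch.Size([2, 3, 16]) -> (num_layers * num_directions, batch, hidden_size)
print(h[-1].shape)        # torch.Size([3, 16])    -> topmost layer's hidden state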
From the official documentation (on Module.register_buffer):
Adds a persistent buffer to the module.
This is typically used to register a buffer that should not be considered a model parameter. For example, BatchNorm’s running_mean is not a parameter, but is part of the persistent state.
Buffers can be accessed as attributes using given names.
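A minimal sketch of registering a buffer (the module here is illustrative):

import torch
import torch.nn as nn

class WithBuffer(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 4)
        # Persistent state that is not a parameter: it gets no gradient, but it is
        # saved in state_dict() and moved by .to()/.cuda() together with the module.
        self.register_buffer('running_mean', torch.zeros(4))

m = WithBuffer()
print([name for name, _ in m.named_parameters()])  # ['fc.weight', 'fc.bias']
print(list(m.state_dict().keys()))                 # contains 'running_mean' in addition to 'fc.weight', 'fc.bias'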
torch.topk is a handy function that returns the top-k values (and their indices) along a given dimension of a tensor.
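A small sketch:

import torch

x = torch.tensor([[1.0, 5.0, 3.0], [4.0, 2.0, 6.0]])
values, indices = torch.topk(x, k=2, dim=1)
print(values)   # tensor([[5., 3.], [6., 4.]])
print(indices)  # tensor([[1, 2], [2, 0]])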
Suppose a.shape = (2, 3, 4). Then torch.chunk(a, 4, dim=-1) splits a along its last dimension into 4 tensors of shape (2, 3, 1).
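And the corresponding sketch:

import torch

a = torch.randn(2, 3, 4)
chunks = torch.chunk(a, 4, dim=-1)
print(len(chunks), chunks[0].shape)  # 4 torch.Size([2, 3, 1])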