with torch.no_grad() and model.eval() in PyTorch

蓝侯林

2023-12-01

with torch.no_grad():

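1. About `with`

with is Python's statement for context managers. Put simply, when an operation always needs a fixed setup step on entry and a fixed cleanup step on exit, the setup and cleanup can be handed to with, and the work itself goes in the with block. Writing a file (which must be opened and then closed) is a typical case.

Below is an example of writing a file using with.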

# filename, pathSh, altas and patientNumber are assumed to be defined earlier.
with open(filename, 'w') as sh:
    sh.write("#!/bin/bash\n")
    sh.write("#$ -N " + 'IC' + altas + str(patientNumber) + altas + '\n')
    sh.write("#$ -o " + pathSh + altas + 'log.log\n')
    sh.write("#$ -e " + pathSh + altas + 'err.log\n')
    sh.write('source ~/.bashrc\n')
    sh.write('. "/home/kjsun/anaconda3/etc/profile.d/conda.sh"\n')
    sh.write('conda activate python27\n')
    sh.write('echo "to python"\n')
    sh.write('echo "finish"\n')
# No sh.close() is needed: the with block closes the file on exit.

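The with statement evaluates the expression that follows it and binds the result to the variable after as (here sh). The indented block then uses that variable, and when the block ends the file is closed automatically, so an explicit call to close() is unnecessary.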
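To make the enter/exit mechanics concrete, here is a minimal sketch of a hand-written context manager. ManagedFile is a hypothetical class invented for illustration (it is not how open() is implemented), and the temporary path is likewise an assumption.

```python
import os
import tempfile


class ManagedFile:
    """Hypothetical stand-in that mimics what open() offers as a context manager."""

    def __init__(self, path, mode):
        self.path = path
        self.mode = mode
        self.handle = None

    def __enter__(self):
        # Runs on entering the with block; the return value is bound by `as`.
        self.handle = open(self.path, self.mode)
        return self.handle

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs on leaving the block, even if the body raised an exception.
        self.handle.close()
        return False  # do not swallow exceptions


path = os.path.join(tempfile.mkdtemp(), "job.sh")
with ManagedFile(path, "w") as sh:
    sh.write("#!/bin/bash\n")

file_closed_after_block = sh.closed  # the file was closed by __exit__
```

torch.no_grad() follows the same protocol: entering the block switches gradient tracking off, and leaving it restores the previous setting.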

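2. About `with torch.no_grad()`:

In PyTorch, not every operation needs a computation graph (the record of the computation that makes gradient backpropagation possible). By default, however, operations on tensors that require gradients do build one. When the graph is not needed, wrap the code in with torch.no_grad(): to stop PyTorch from building it for everything inside the block.

The two cases, with and without it, are shown below:

(1) Using `with torch.no_grad()`: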

# net and testloader are assumed to be the trained model and test DataLoader.
correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))
print(outputs)

Output:

Accuracy of the network on the 10000 test images: 55 %
tensor([[-2.9141, -3.8210,  2.1426,  3.0883,  2.6363,  2.6878,  2.8766,  0.3396,
         -4.7505, -3.8502],
        [-1.4012, -4.5747,  1.8557,  3.8178,  1.1430,  3.9522, -0.4563,  1.2740,
         -3.7763, -3.3633],
        [ 1.3090,  0.1812,  0.4852,  0.1315,  0.5297, -0.3215, -2.0045,  1.0426,
         -3.2699, -0.5084],
        [-0.5357, -1.9851, -0.2835, -0.3110,  2.6453,  0.7452, -1.4148,  5.6919,
         -6.3235, -1.6220]])

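Here the printed outputs tensor shows no grad_fn: its grad_fn is None and its requires_grad is False.

(2) Without `with torch.no_grad()`:

The same loop without the context manager: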

# net and testloader are assumed to be the trained model and test DataLoader.
correct = 0
total = 0
for data in testloader:
    images, labels = data
    outputs = net(images)
    _, predicted = torch.max(outputs, 1)
    total += labels.size(0)
    correct += (predicted == labels).sum().item()
print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))
print(outputs)

Output:

Accuracy of the network on the 10000 test images: 55 %
tensor([[-2.9141, -3.8210,  2.1426,  3.0883,  2.6363,  2.6878,  2.8766,  0.3396,
         -4.7505, -3.8502],
        [-1.4012, -4.5747,  1.8557,  3.8178,  1.1430,  3.9522, -0.4563,  1.2740,
         -3.7763, -3.3633],
        [ 1.3090,  0.1812,  0.4852,  0.1315,  0.5297, -0.3215, -2.0045,  1.0426,
         -3.2699, -0.5084],
        [-0.5357, -1.9851, -0.2835, -0.3110,  2.6453,  0.7452, -1.4148,  5.6919,
         -6.3235, -1.6220]], grad_fn=<AddmmBackward>)

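This time the tensor carries grad_fn=<AddmmBackward>, which means the result is part of a computation graph and can take part in backpropagation. The computed values themselves, however, are identical in the two cases.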
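The comparison can be reproduced without a dataset. The sketch below is a minimal check, assuming a single nn.Linear layer as a stand-in for the trained net and random inputs in place of real images (both are hypothetical).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Linear(8, 10)        # hypothetical stand-in for the trained network
images = torch.randn(4, 8)    # hypothetical batch of 4 inputs

with torch.no_grad():
    out_no_grad = net(images)     # no graph is recorded here

out_with_grad = net(images)       # default behaviour: the graph is recorded

print(out_no_grad.requires_grad, out_no_grad.grad_fn)      # False None
print(out_with_grad.requires_grad)                         # True
print(out_with_grad.grad_fn is not None)                   # True
print(torch.allclose(out_no_grad, out_with_grad))          # True
```

Only the autograd bookkeeping differs between the two forward passes; the numbers are the same.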

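model.eval() vs. with torch.no_grad()

※ In common:

Both are used when running validation in PyTorch. Of the two, model.eval() is the call that switches the model to evaluation mode.

It tells dropout and batchnorm layers to switch between train and val behavior.
In train mode, a dropout layer zeroes each activation with the configured probability p (so the keep probability is 1 - p), and a batchnorm layer keeps computing the mean and var of the incoming data and updating its running estimates.
In val mode, a dropout layer lets every activation through, and a batchnorm layer stops computing and updating mean and var and uses the values learned during training.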
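The mode switch described above can be observed directly on the two layer types. This is a small sketch under assumed tensor shapes and values; the variable names are illustrative only.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.ones(1000)

drop = nn.Dropout(p=0.5)   # p is the probability of zeroing an element
drop.train()
y_train = drop(x)          # entries are either 0 or 1 / (1 - p) = 2
drop.eval()
y_eval = drop(x)           # identity in eval mode

bn = nn.BatchNorm1d(3)
batch = torch.randn(16, 3) + 5.0
bn.train()
bn(batch)                                    # updates the running statistics
mean_after_train = bn.running_mean.clone()
bn.eval()
bn(batch + 100.0)                            # uses, but does not update, them
mean_after_eval = bn.running_mean.clone()

print(sorted(y_train.unique().tolist()))             # [0.0, 2.0]
print(torch.equal(y_eval, x))                        # True
print(torch.equal(mean_after_train, mean_after_eval))  # True
```

The surviving activations are scaled by 1 / (1 - p) during training so that the expected activation matches what eval mode produces.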

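※ Differences:

1. model.eval() does not change how gradients are handled: the forward pass still records the computation graph and stores what backward would need, exactly as in training mode; validation code simply never runs the backward pass.
2. with torch.no_grad() switches autograd recording off, so no gradient bookkeeping is done, which speeds up the computation and saves GPU memory, but it does not affect the behavior of dropout and batchnorm layers.

In other words, if memory use and run time are not a concern, model.eval() alone is enough to get correct validation results; with torch.no_grad() goes a step further by saving time and GPU memory (nothing has to be computed or stored for gradients), so validation runs faster and can use larger batches.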
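Because the two mechanisms are independent, validation code usually combines them. The following is one possible sketch, assuming a small hypothetical model and randomly generated batches; evaluate is an illustrative helper, not a PyTorch API.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Hypothetical model and data, used only to make the sketch runnable.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Dropout(p=0.5), nn.Linear(16, 3))
val_batches = [(torch.randn(4, 8), torch.randint(0, 3, (4,))) for _ in range(5)]


def evaluate(model, batches):
    """Return accuracy over batches with eval-mode layers and autograd disabled."""
    was_training = model.training
    model.eval()                    # dropout / batchnorm use inference behaviour
    correct, total = 0, 0
    with torch.no_grad():           # no graph is recorded for these forward passes
        for inputs, labels in batches:
            predicted = model(inputs).argmax(dim=1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    model.train(was_training)       # restore the previous mode
    return correct / total


model.train()
accuracy = evaluate(model, val_batches)

with torch.no_grad():
    mode_inside_no_grad = model.training          # still True: no_grad does not call eval()
model.eval()
grad_enabled_in_eval = torch.is_grad_enabled()    # still True: eval() does not disable autograd
model.train()
```

Restoring the previous mode at the end of evaluate avoids accidentally continuing training with dropout and batchnorm stuck in eval behavior.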

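requires_grad, volatile, model.eval() and no_grad

Summary:
requires_grad=True: gradients are computed for the tensor;
requires_grad=False: gradients are not computed for the tensor;
model.eval(): autograd still tracks operations on the data, although validation code does not run backpropagation;
with torch.no_grad() or @torch.no_grad(): operations inside are not tracked, so no gradients are computed and no backpropagation takes place. (torch.no_grad() is the replacement for volatile in newer PyTorch versions.)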

requires_grad

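requires_grad is an attribute of a tensor (of Variable, before Variable was merged into Tensor in PyTorch 0.4). It defaults to False. If a node has requires_grad set to True, every node that depends on it also has requires_grad=True, and gradients are then computed for that tensor.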
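The propagation rule can be checked on a few small tensors. This is a minimal sketch; the tensor names are illustrative.

```python
import torch

a = torch.ones(3)                       # default: requires_grad=False
w = torch.ones(3, requires_grad=True)   # explicitly tracked

b = a * 2    # depends only on untracked tensors
c = a * w    # depends on a tracked tensor, so it is tracked too

c.sum().backward()   # d(sum(a * w)) / dw = a

print(a.requires_grad, b.requires_grad, c.requires_grad)   # False False True
print(w.grad)                                              # tensor([1., 1., 1.])
print(a.grad)                                              # None
```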

volatile

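volatile was another important flag on Variable. Setting volatile=True on a node marked every node depending on it as volatile=True, and it took priority over requires_grad=True.
A volatile=True node was never differentiated: even with requires_grad=True, no backpropagation took place. For situations that need no backward pass (inference, testing), the flag gave some speed-up and reportedly cut GPU memory use by about half, because nothing had to be kept for gradients.
Note, however, that volatile has been removed (it was deprecated in PyTorch 0.4.0); use with torch.no_grad() instead.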

no_grad

torch.no_grad() is a context manager; the statements inside it are run without gradient tracking.
torch.no_grad() is the replacement for volatile in newer PyTorch versions.
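Besides the with form, torch.no_grad() can also be applied as a decorator, and torch.set_grad_enabled() takes a boolean when one code path has to serve both training and evaluation. The sketch below assumes a hypothetical single-layer model; predict and is_train are illustrative names.

```python
import torch
import torch.nn as nn

net = nn.Linear(4, 2)   # hypothetical model


@torch.no_grad()
def predict(x):
    """Forward pass with autograd disabled for the whole function body."""
    return net(x)


x = torch.randn(3, 4)
out = predict(x)
grad_mode_restored = torch.is_grad_enabled()   # tracking is back on after the call

# set_grad_enabled takes a bool, so the same block can serve train and eval.
is_train = False
with torch.set_grad_enabled(is_train):
    out2 = net(x)

print(out.requires_grad, out2.requires_grad, grad_mode_restored)   # False False True
```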
