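Generally, when you work with image, text, audio, or video data, you can use standard Python packages to load the data into a NumPy array, and then convert that array into a torch.*Tensor.
For images specifically, we have created a package called torchvision. It can directly load common datasets such as ImageNet, CIFAR10, and MNIST, and it also provides the most common data transforms. This is a great convenience and saves you from writing boilerplate code.
This tutorial uses the CIFAR10 dataset. It has ten classes: ‘airplane’, ‘automobile’, ‘bird’, ‘cat’, ‘deer’, ‘dog’, ‘frog’, ‘horse’, ‘ship’, ‘truck’. The images in CIFAR10 have 3 channels and are 32x32 pixels in size.
With torchvision, loading CIFAR10 is so easy (Mom never has to worry about my studies again…)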
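As a minimal sketch of that conversion step, assuming NumPy and PyTorch are installed, the snippet below turns a small made-up array into a tensor. The name image_hwc and its contents are placeholders for illustration, not real image data.

```python
import numpy as np
import torch

# Hypothetical stand-in for decoded image data: height x width x channels, uint8.
image_hwc = np.arange(2 * 2 * 3, dtype=np.uint8).reshape(2, 2, 3)

# torch.from_numpy shares memory with the array instead of copying it.
tensor_hwc = torch.from_numpy(image_hwc)

# PyTorch convolution layers expect channels first, so reorder to C x H x W
# and scale the values to floating point in [0, 1].
tensor_chw = tensor_hwc.permute(2, 0, 1).float() / 255.0

print(tensor_chw.shape, tensor_chw.dtype)
```

For real images, torchvision's ToTensor transform performs a similar reordering and scaling for you.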
import torch
import torchvision
import torchvision.transforms as tfs
# torchvision datasets yield PILImage images with values in the range [0, 1].
# We convert them to Tensors and normalize them to the range [-1, 1].
transform = tfs.Compose([tfs.ToTensor(),
                         tfs.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4, shuffle=True, num_workers=2)
testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4, shuffle=False, num_workers=2)
classes = ('plane', 'car', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck')
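The Normalize step above computes (x - mean) / std for each channel. A small sketch in plain Python shows why a mean and std of 0.5 map the range [0, 1] onto [-1, 1], and why dividing by 2 and adding 0.5 undoes it. The helper names normalize and unnormalize are made up for this illustration and are not part of torchvision.

```python
def normalize(x, mean=0.5, std=0.5):
    """Apply the per-value formula used by Normalize: (x - mean) / std."""
    return (x - mean) / std


def unnormalize(y, mean=0.5, std=0.5):
    """Invert normalize; with mean = std = 0.5 this equals y / 2 + 0.5."""
    return y * std + mean


for pixel in (0.0, 0.25, 0.5, 1.0):
    scaled = normalize(pixel)
    print(pixel, "->", scaled, "->", unnormalize(scaled))
```

The smallest input 0.0 lands on -1.0 and the largest input 1.0 lands on 1.0, so the whole range is covered symmetrically around zero.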
import matplotlib.pyplot as plt
import numpy as np
# function to show an image
def imshow(img):
    img = img / 2 + 0.5  # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()
# get some random training images
dataiter = iter(trainloader)
images, labels = next(dataiter)
# show images
imshow(torchvision.utils.make_grid(images))
# print labels
print(' '.join('%5s' % classes[labels[j]] for j in range(4)))
import torch.nn as nn
import torch.nn.functional as F
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)  # 3 input image channels, 6 output channels, 5x5 square convolution kernel
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # an affine operation: y = Wx + b
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)
    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))  # max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)  # if the window is square you can specify a single number
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features
net = Net()
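The 16*5*5 input size of fc1 follows from how each layer shrinks a 32x32 image. As a rough sketch, the arithmetic can be traced in plain Python. The helper conv_output_size is a hypothetical name, and it assumes no padding and the default stride of 1 for the convolutions, which matches the nn.Conv2d calls above.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


size = 32                                           # CIFAR10 images are 32x32
size = conv_output_size(size, kernel=5)             # conv1: 32 -> 28
size = conv_output_size(size, kernel=2, stride=2)   # 2x2 max pool: 28 -> 14
size = conv_output_size(size, kernel=5)             # conv2: 14 -> 10
size = conv_output_size(size, kernel=2, stride=2)   # 2x2 max pool: 10 -> 5

flat_features = 16 * size * size   # conv2 produces 16 channels
print(size, flat_features)         # 5 400
```

The flattened feature count of 400 is exactly what num_flat_features returns for one sample at that point in forward.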
We use a classification cross-entropy loss and SGD with momentum.
import torch.optim as optim
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
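To get a feel for what the cross-entropy criterion measures, here is a minimal plain-Python sketch for a single sample: the negative log of the softmax probability assigned to the correct class. The function name cross_entropy and the example scores are made up for illustration; nn.CrossEntropyLoss additionally averages over the batch by default.

```python
import math


def cross_entropy(scores, target):
    """Cross-entropy for one sample: -log(softmax(scores)[target])."""
    peak = max(scores)  # subtract the max for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(s - peak) for s in scores))
    return log_sum_exp - scores[target]


# Ten equal scores mean every class gets probability 1/10.
uniform_scores = [0.0] * 10
print(cross_entropy(uniform_scores, target=3))    # ln(10), about 2.303

# A clearly higher score for the correct class gives a much smaller loss.
confident_scores = [0.0] * 10
confident_scores[3] = 5.0
print(cross_entropy(confident_scores, target=3))
```

A network that has learned nothing yet should therefore show a loss near ln(10) ≈ 2.303 on ten classes, which is in line with the first value in the training log of this tutorial.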
for epoch in range(2):  # loop over the dataset multiple times
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs
        inputs, labels = data
        # zero the parameter gradients
        optimizer.zero_grad()
        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:  # print every 2000 mini-batches
            print('[%d, %d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0
print('Finished Training.')
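The SGD optimizer configured earlier uses momentum=0.9. As a rough sketch of what that means, the update below keeps a running velocity of past gradients and moves the parameter along it. This follows the basic form used by PyTorch's SGD (no dampening, no Nesterov); the function name sgd_momentum_step and the toy objective are assumptions for illustration only.

```python
def sgd_momentum_step(param, grad, velocity, lr=0.001, momentum=0.9):
    """One update: the velocity accumulates gradients, then moves the parameter."""
    velocity = momentum * velocity + grad
    param = param - lr * velocity
    return param, velocity


# Toy objective f(w) = w ** 2 with gradient 2 * w (an illustrative assumption).
w, v = 1.0, 0.0
for step in range(3):
    w, v = sgd_momentum_step(w, 2 * w, v, lr=0.1)
    print(step, w, v)
```

Because the velocity carries over between steps, consecutive gradients that point the same way make the parameter move faster than plain SGD would with the same learning rate.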
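We have trained the network for two passes over the training data, but we still need to check whether it has learned anything.
As a first step, let's display some images from the test set.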
dataiter = iter(testloader)
images, labels = next(dataiter)
# show images
imshow(torchvision.utils.make_grid(images))
print('GroundTruth: ', ' '.join('%5s' % classes[labels[j]] for j in range(4)))
# GroundTruth: cat ship ship plane
outputs = net(images)
# The outputs are energies for the 10 classes.
# The higher the energy for a class, the more the network
# thinks that the image belongs to that class.
# So, let's get the index of the highest energy.
_, predicted = torch.max(outputs, 1)
print('predicted: ', ' '.join('%5s' % classes[predicted[j]] for j in range(4)))
# Training output
[1, 2000] loss: 2.195
[1, 4000] loss: 1.789
[1, 6000] loss: 1.633
[1, 8000] loss: 1.534
[1, 10000] loss: 1.511
[1, 12000] loss: 1.433
[2, 2000] loss: 1.387
[2, 4000] loss: 1.368
[2, 6000] loss: 1.338
[2, 8000] loss: 1.307
[2, 10000] loss: 1.273
[2, 12000] loss: 1.281
Finished Training.
predicted: horse bird plane truck
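Picking the class with the highest energy is an argmax over each row of the network output. A minimal plain-Python sketch of that step is shown below; the scores are invented for illustration and are not real network outputs, and argmax is a hypothetical helper standing in for the indices that torch.max returns along dimension 1.

```python
def argmax(values):
    """Index of the largest value in a list of scores."""
    return max(range(len(values)), key=lambda i: values[i])


classes = ('plane', 'car', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck')

# Hypothetical scores for two images, one row per image.
outputs = [
    [0.1, -1.2, 0.3, 2.5, 0.0, 1.9, -0.4, 0.2, -0.8, -2.0],
    [1.0, 0.4, -0.3, -1.1, -0.9, -1.5, -2.2, -0.7, 3.1, 0.6],
]
predicted = [argmax(row) for row in outputs]
print(' '.join(classes[p] for p in predicted))   # cat ship
```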
Let's see how the network performs on the whole test set.
correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
print('Accuracy of the network on the 10000 test images: %d %%'
      % (100 * correct / total))
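The trained network does better than chance: picking one of the ten classes at random would give an accuracy of only about 10%.
So which classes does it perform well on, and which does it perform poorly on?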
class_correct = [0.0] * 10
class_total = [0.0] * 10
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        c = (predicted == labels)
        for i in range(len(labels)):
            label = labels[i].item()
            class_correct[label] += c[i].item()
            class_total[label] += 1
for i in range(10):
    print('Accuracy of %5s : %2d %%' %
          (classes[i], 100 * class_correct[i] / class_total[i]))
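The same per-class bookkeeping can be tried out without a trained network. The sketch below uses a made-up three-class subset with invented labels and predictions, and counts totals and hits with collections.Counter instead of two lists.

```python
from collections import Counter

classes = ('cat', 'dog', 'ship')    # hypothetical three-class subset
labels = [0, 0, 1, 1, 1, 2]         # made-up ground truth
predicted = [0, 1, 1, 1, 0, 2]      # made-up predictions

# Count how often each class occurs and how often it is predicted correctly.
class_total = Counter(labels)
class_correct = Counter(l for l, p in zip(labels, predicted) if l == p)

for index, name in enumerate(classes):
    accuracy = 100 * class_correct[index] / class_total[index]
    print('Accuracy of %5s : %2d %%' % (name, accuracy))
```

A class that never appears in the labels would cause a division by zero here, so real code may want to guard against an empty class.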