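Machine Learning | A Concise PyTorch Tutorial, Part 1
The previous articles covered feature normalization and tensors. This is the first of two concise PyTorch tutorials, focused on simple hands-on examples.
1. Basic arithmetic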
```python
import torch
a = torch.tensor([2, 3, 4])
b = torch.tensor([3, 4, 5])
print("a + b: ", (a + b).numpy())
print("a - b: ", (a - b).numpy())
print("a * b: ", (a * b).numpy())
print("a / b: ", (a / b).numpy())
```
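Addition, subtraction, multiplication and division need little explanation. Note that / performs true division, so the two integer tensors give a floating-point result. The output is: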
```
a + b: [5 7 9]
a - b: [-1 -1 -1]
a * b: [ 6 12 20]
a / b: [0.6666667 0.75 0.8 ]
```
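For integer division, torch.div accepts a rounding_mode argument (available in PyTorch 1.8 and later). The minimal sketch below first confirms the dtype that / produces for the tensors above. It then uses two extra tensors, c and d, made up for illustration, to contrast "floor" with "trunc". The two modes differ only when the quotient is negative.

```python
import torch

a = torch.tensor([2, 3, 4])
b = torch.tensor([3, 4, 5])

# "/" is true division: int64 inputs produce a float32 tensor
true_div = a / b
print(true_div.dtype, true_div)

# Integer division with an explicit rounding mode; the result stays int64
c = torch.tensor([-7, 7])
d = torch.tensor([2, 2])
floor_div = torch.div(c, d, rounding_mode="floor")  # rounds toward -inf
trunc_div = torch.div(c, d, rounding_mode="trunc")  # rounds toward zero
print(floor_div.tolist(), trunc_div.tolist())  # [-4, 3] [-3, 3]
```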
2. Linear regression
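Linear regression finds the straight line that comes as close as possible to a set of known points, in the sense of minimizing the squared error. Here the model is y = w·x with no bias term, as shown below:
Figure 1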
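This model has a single weight and no bias, so the least-squares line can also be computed directly. Minimizing Σ(w·xᵢ − yᵢ)² gives w* = Σxᵢyᵢ / Σxᵢ².

The sketch below regenerates the same synthetic data as the script that follows. It uses the same seed, and torch.linspace draws no random numbers, so Y comes out identical. It then cross-checks the formula against torch.linalg.lstsq, which assumes PyTorch 1.9 or later. The result serves as a reference point for the w that SGD learns; it is not part of the original tutorial.

```python
import torch

torch.manual_seed(42)
X = torch.linspace(-1, 1, 101)
Y = 2 * X + torch.randn(X.size()) * 0.33

# Least squares for y = w * x (no bias): minimize sum((w*x - y)^2).
# Setting the derivative with respect to w to zero gives w* = sum(x*y) / sum(x*x).
w_closed = (X * Y).sum() / (X * X).sum()

# Cross-check with PyTorch's least-squares solver (design matrix has one column: x)
w_lstsq = torch.linalg.lstsq(X.unsqueeze(1), Y.unsqueeze(1)).solution

print("closed-form w = %.4f" % w_closed.item())
print("lstsq w       = %.4f" % w_lstsq.item())
```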
```python
import torch
from torch import optim

def build_model1():
    # A single linear layer y = w * x (no bias), passed to the Sequential constructor
    return torch.nn.Sequential(
        torch.nn.Linear(1, 1, bias=False)
    )

def build_model2():
    # Same model as build_model1, built by registering a named submodule
    model = torch.nn.Sequential()
    model.add_module("linear", torch.nn.Linear(1, 1, bias=False))
    return model

def train(model, loss, optimizer, x, y):
    model.train()
    optimizer.zero_grad()
    # Reshape x to (batch, 1) for the Linear layer, then squeeze back to (batch,)
    fx = model(x.view(len(x), 1)).squeeze()
    output = loss(fx, y)
    output.backward()
    optimizer.step()
    return output.item()

def main():
    torch.manual_seed(42)
    X = torch.linspace(-1, 1, 101, requires_grad=False)
    # Targets: y = 2x plus Gaussian noise with standard deviation 0.33
    Y = 2 * X + torch.randn(X.size()) * 0.33
    print("X: ", X.numpy(), ", Y: ", Y.numpy())
    model = build_model1()
    loss = torch.nn.MSELoss(reduction='mean')
    optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    batch_size = 10

    for i in range(100):
        cost = 0.
        num_batches = len(X) // batch_size
        for k in range(num_batches):
            start, end = k * batch_size, (k + 1) * batch_size
            cost += train(model, loss, optimizer, X[start:end], Y[start:end])
        print("Epoch = %d, cost = %s" % (i + 1, cost / num_batches))

    # The weight tensor has shape (1, 1); .item() extracts it as a Python float
    w = next(model.parameters()).data
    print("w = %.2f" % w.item())

if __name__ == "__main__":
    main()
```
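The sketch below shows what output.backward() and optimizer.step() compute inside train. It replaces the one-weight Linear layer with a bare tensor w, a hypothetical stand-in, and compares autograd's gradient with the hand-derived MSE gradient. It then applies one plain SGD update by hand. The momentum term used by the script's optimizer is deliberately left out.

```python
import torch

x = torch.linspace(-1, 1, 10)
y = 2 * x
# Hypothetical stand-in for the single weight of Linear(1, 1, bias=False)
w = torch.tensor([0.5], requires_grad=True)

loss_fn = torch.nn.MSELoss(reduction="mean")
loss = loss_fn(w * x, y)
loss.backward()  # autograd stores dLoss/dw in w.grad

# By hand: loss = mean((w*x - y)^2), so dLoss/dw = mean(2 * (w*x - y) * x)
manual_grad = (2 * (w.detach() * x - y) * x).mean()
print(w.grad.item(), manual_grad.item())

# A plain SGD update without momentum: w <- w - lr * grad
lr = 0.01
with torch.no_grad():
    w -= lr * w.grad
print(w.item())
```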
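(1) Start with the main function. torch.manual_seed(42) seeds the random number generator so that every run produces the same sequence of random numbers. It takes an integer seed and is used wherever randomness is involved, such as initializing and training neural networks, to make results reproducible;
(2) torch.linspace(-1, 1, 101, requires_grad=False) generates evenly spaced values over an interval. Its positional arguments are the start value, the end value and the number of points. It returns a tensor holding those points, here 101 values from -1 to 1 in steps of 0.02. requires_grad=False (the default) means autograd does not track X;
(3) How build_model1 works:
- torch.nn.Sequential(torch.nn.Linear(1, 1, bias=False)) passes a linear layer to the nn.Sequential constructor and returns a model containing just that layer. With one input feature, one output feature and no bias, the model computes y = w·x;
- build_model2 does the same thing as build_model1, but adds the layer with add_module() as a submodule named linear;
(4) torch.nn.MSELoss(reduction='mean') defines the loss function: the mean squared error, averaged over the batch;
(5) optim.SGD(model.parameters(), lr=0.01, momentum=0.9) creates a stochastic gradient descent (SGD) optimizer over the model's parameters, with learning rate 0.01 and momentum 0.9;
(6) The training set is split into mini-batches of batch_size = 10, and the whole set is looped over for 100 epochs;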
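(7) Next is the training function train, which performs one optimization step on a mini-batch and returns its loss. It takes the following parameters:
- model: the neural network model, usually an instance of a class derived from nn.Module;
- loss: the loss function, which measures the difference between the model's predictions and the true values;
- optimizer: the optimizer, which updates the model's parameters;
- x: the input data, a torch.Tensor;
- y: the target data, a torch.Tensor;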
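(8) train follows the standard PyTorch training-step pattern:
- set the model to training mode, which enables the training-time behavior of layers such as dropout and batch normalization;
- clear the gradients stored from the previous step, because PyTorch accumulates gradients by default;
- pass the input through the model to get predictions, then pass the predictions and targets to the loss function to compute the loss;
- backpropagate the loss to compute the gradients of the model parameters;
- let the optimizer update the parameters to reduce the loss;
- return the loss as a Python scalar;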
(9) Finally, print("Epoch = %d, cost = %s" % (i + 1, cost / num_batches)) prints the current epoch and the average loss per batch. The code above prints:
```
...
Epoch = 95, cost = 0.10514946877956391
Epoch = 96, cost = 0.10514946877956391
Epoch = 97, cost = 0.10514946877956391
Epoch = 98, cost = 0.10514946877956391
Epoch = 99, cost = 0.10514946877956391
Epoch = 100, cost = 0.10514946877956391
w = 1.98
```
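The manual batching in main() computes num_batches = 101 // 10 = 10. Each epoch therefore uses only X[0:100], and the 101st point is never trained on.

A hedged alternative is torch.utils.data.DataLoader over a TensorDataset. By default it keeps the final partial batch, and drop_last=True reproduces the manual loop's 10 batches. The sketch below uses noise-free targets for simplicity and only inspects batch counts and sizes; wiring the loader into the article's train function is not shown.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

X = torch.linspace(-1, 1, 101)
Y = 2 * X
batch_size = 10

# Manual slicing as in main(): 10 full batches, so 100 of the 101 samples are used
num_batches = len(X) // batch_size
samples_used = num_batches * batch_size

# DataLoader keeps the last, smaller batch unless drop_last=True
loader = DataLoader(TensorDataset(X, Y), batch_size=batch_size, shuffle=False)
batch_sizes = [xb.shape[0] for xb, yb in loader]

loader_drop = DataLoader(TensorDataset(X, Y), batch_size=batch_size, drop_last=True)
print(num_batches, samples_used, len(loader), batch_sizes[-1], len(loader_drop))  # 10 100 11 1 10
```

Passing shuffle=True would also present the batches in a different order each epoch, which the manual slicing never does.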
3. Logistic regression
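Logistic regression fits an S-shaped curve (the sigmoid of a linear function) to points whose labels are discrete, so that the curve's value can be read as a class probability, as shown below. The code applies its multi-class form to MNIST digit classification:
Figure 2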
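The model in this section is a single Linear layer with no activation. It still behaves as multi-class logistic (softmax) regression, because torch.nn.CrossEntropyLoss applies log-softmax to the raw scores before taking the negative log-likelihood. The sketch below checks that equivalence. The 4-sample batch of scores and its labels are made up for illustration.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)           # hypothetical raw scores: 4 samples, 10 classes
targets = torch.tensor([3, 0, 9, 1])  # hypothetical integer class labels

ce = torch.nn.CrossEntropyLoss(reduction="mean")(logits, targets)
# The same loss spelled out: log-softmax over classes, then negative log-likelihood
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

probs = F.softmax(logits, dim=1)      # class probabilities; each row sums to 1
# Equivalently: average of -log(probability assigned to the true class)
by_hand = -probs[torch.arange(4), targets].log().mean()
print(ce.item(), manual.item(), by_hand.item())
```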
```python
import numpy as np
import torch
from torch import optim
from data_util import load_mnist  # the author's own MNIST helper (not shown)

def build_model(input_dim, output_dim):
    # A single linear layer mapping input_dim features to output_dim class scores
    return torch.nn.Sequential(
        torch.nn.Linear(
            input_dim, output_dim, bias=False)
    )

def train(model, loss, optimizer, x_val, y_val):
    model.train()
    optimizer.zero_grad()
    fx = model(x_val)
    output = loss(fx, y_val)
    output.backward()
    optimizer.step()
    return output.item()

def predict(model, x_val):
    model.eval()
    output = model(x_val)
    # The index of the highest score in each row is the predicted class
    return output.data.numpy().argmax(axis=1)

def main():
    torch.manual_seed(42)
    # onehot=False: labels are integer class indices, the form CrossEntropyLoss expects here
    trX, teX, trY, teY = load_mnist(onehot=False)
    trX = torch.from_numpy(trX).float()
    teX = torch.from_numpy(teX).float()
    trY = torch.tensor(trY)

    n_examples, n_features = trX.size()
    n_classes = 10
    model = build_model(n_features, n_classes)
    # CrossEntropyLoss applies log-softmax internally, so the model outputs raw scores
    loss = torch.nn.CrossEntropyLoss(reduction='mean')
    optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    batch_size = 100

    for i in range(100):
        cost = 0.
        num_batches = n_examples // batch_size
        for k in range(num_batches):
            start, end = k * batch_size, (k + 1) * batch_size
            cost += train(model, loss, optimizer,
                          trX[start:end], trY[start:end])
        predY = predict(model, teX)
        print("Epoch %d, cost = %f, acc = %.2f%%"
              % (i + 1, cost / num_batches, 100. * np.mean(predY == teY)))

if __name__ == "__main__":
    main()
```
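(1) Again, start with main. torch.manual_seed(42) was covered above and is skipped here;
(2) load_mnist is the author's own helper for downloading the MNIST dataset. It returns trX and teX, the training and test inputs, and trY and teY, the corresponding labels;
(3) How build_model works: torch.nn.Sequential(torch.nn.Linear(input_dim, output_dim, bias=False)) builds a model with a single linear layer that has input_dim input features, output_dim output features and no bias. Here input_dim is n_features, the number of pixels per flattened image, and n_classes=10 gives one output per digit class;
(4) The remaining steps define the loss function and the gradient-descent optimizer, split the training set into mini-batches of batch_size, and run train for 100 epochs;
(5) optim.SGD(model.parameters(), lr=0.01, momentum=0.9) creates a stochastic gradient descent (SGD) optimizer, as in the linear regression example;
(6) After each epoch, predict is called with two arguments: model (the trained model) and teX (the data to predict). Its steps are:
- model.eval() switches the model to evaluation mode, which changes layers such as dropout and batch normalization to their inference behavior. This model has no such layers, so the switch has no effect here, and it does not by itself turn off gradient tracking;
- the input is passed through the model to get one score per class;
- the output is converted to a NumPy array, and argmax(axis=1) picks the highest-scoring class as each sample's prediction;
(7) Finally, print("Epoch %d, cost = %f, acc = %.2f%%" % (i + 1, cost / num_batches, 100. * np.mean(predY == teY))) prints the current epoch, the loss and the test accuracy. The code above prints the following (it runs quickly, but the accuracy is relatively low):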
```
...
Epoch 91, cost = 0.252863, acc = 92.52%
Epoch 92, cost = 0.252717, acc = 92.51%
Epoch 93, cost = 0.252573, acc = 92.50%
Epoch 94, cost = 0.252431, acc = 92.50%
Epoch 95, cost = 0.252291, acc = 92.52%
Epoch 96, cost = 0.252153, acc = 92.52%
Epoch 97, cost = 0.252016, acc = 92.51%
Epoch 98, cost = 0.251882, acc = 92.51%
Epoch 99, cost = 0.251749, acc = 92.51%
Epoch 100, cost = 0.251617, acc = 92.51%
```
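predict calls model.eval() but still records an autograd graph, and it reaches the values through output.data. A common alternative is to wrap inference in torch.no_grad(), as in the sketch below.

The random inputs and labels stand in for teX and teY and are hypothetical, so the accuracy it prints is meaningless. The point is the shapes and the accuracy formula. The 784 input features assume flattened 28×28 images.

```python
import numpy as np
import torch

def predict(model, x_val):
    model.eval()
    with torch.no_grad():  # inference only: no autograd graph is recorded
        output = model(x_val)
    return output.argmax(dim=1).numpy()  # predicted class per sample

torch.manual_seed(0)
model = torch.nn.Sequential(torch.nn.Linear(784, 10, bias=False))
teX = torch.rand(5, 784)          # hypothetical stand-in for test images
teY = np.array([0, 1, 2, 3, 4])   # hypothetical stand-in for test labels

predY = predict(model, teX)
acc = 100. * np.mean(predY == teY)  # same accuracy formula as main()
print(predY, "acc = %.2f%%" % acc)
```

Because train() calls model.train() at the start of every batch, leaving the model in evaluation mode after predict does not affect the next epoch.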
4. Neural network
LeNet, a classic network for character classification, is shown below:
Figure 3
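- Define a multilayer neural network
- Preprocess the dataset and prepare it as input to the network
- Feed the data into the network
- Compute the network's loss
- Backpropagate to compute the gradients
- Update the network's weights with the optimizer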
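Figure 3's LeNet is convolutional, while the build_model in the code below is fully connected. For comparison, here is a hedged LeNet-5-style sketch in PyTorch. It assumes 28×28 single-channel MNIST images, padding the first convolution so that the classic layer sizes still fit. ReLU and max pooling are modern substitutions for the original activations and subsampling. The name build_lenet is hypothetical and is not part of the article's code.

```python
import torch
from torch import nn

def build_lenet(num_classes=10):
    # LeNet-5-style layout; ReLU and max pooling are modern substitutions
    return nn.Sequential(
        nn.Conv2d(1, 6, kernel_size=5, padding=2),  # 1x28x28 -> 6x28x28
        nn.ReLU(),
        nn.MaxPool2d(2),                            # -> 6x14x14
        nn.Conv2d(6, 16, kernel_size=5),            # -> 16x10x10
        nn.ReLU(),
        nn.MaxPool2d(2),                            # -> 16x5x5
        nn.Flatten(),                               # -> 400 features
        nn.Linear(16 * 5 * 5, 120),
        nn.ReLU(),
        nn.Linear(120, 84),
        nn.ReLU(),
        nn.Linear(84, num_classes),
    )

model = build_lenet()
x = torch.rand(8, 784)                 # a batch of flattened 28x28 images (random here)
logits = model(x.view(-1, 1, 28, 28))  # convolution layers expect (N, C, H, W)
n_params = sum(p.numel() for p in model.parameters())
print(logits.shape, n_params)  # torch.Size([8, 10]) 61706
```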
```python
import numpy as np
import torch
from torch import optim
from data_util import load_mnist  # the author's own MNIST helper (not shown)

def build_model(input_dim, output_dim):
    # Fully connected network: input -> 512 hidden units -> Sigmoid -> output_dim scores
    return torch.nn.Sequential(
        torch.nn.Linear(input_dim, 512, bias=False),
        torch.nn.Sigmoid(),
        torch.nn.Linear(512, output_dim, bias=False)
    )

def train(model, loss, optimizer, x_val, y_val):
    model.train()
    optimizer.zero_grad()
    fx = model(x_val)
    output = loss(fx, y_val)
    output.backward()
    optimizer.step()
    return output.item()

def predict(model, x_val):
    model.eval()
    output = model(x_val)
    # The index of the highest score in each row is the predicted class
    return output.data.numpy().argmax(axis=1)

def main():
    torch.manual_seed(42)
    # onehot=False: labels are integer class indices, the form CrossEntropyLoss expects here
    trX, teX, trY, teY = load_mnist(onehot=False)
    trX = torch.from_numpy(trX).float()
    teX = torch.from_numpy(teX).float()
    trY = torch.tensor(trY)

    n_examples, n_features = trX.size()
    n_classes = 10
    model = build_model(n_features, n_classes)
    loss = torch.nn.CrossEntropyLoss(reduction='mean')
    optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    batch_size = 100

    for i in range(100):
        cost = 0.
        num_batches = n_examples // batch_size
        for k in range(num_batches):
            start, end = k * batch_size, (k + 1) * batch_size
            cost += train(model, loss, optimizer,
                          trX[start:end], trY[start:end])
        predY = predict(model, teX)
        print("Epoch %d, cost = %f, acc = %.2f%%"
              % (i + 1, cost / num_batches, 100. * np.mean(predY == teY)))

if __name__ == "__main__":
    main()
```
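(1) Apart from build_model, this neural network code differs little from the logistic regression code. Here build_model stacks three pieces: a linear layer with input_dim input features and 512 output features, a Sigmoid activation, and a linear layer with 512 input features and output_dim output features. This is a fully connected network with one hidden layer of 512 units, not the convolutional LeNet of Figure 3;
(2) Finally, print("Epoch %d, cost = %f, acc = %.2f%%" % (i + 1, cost / num_batches, 100. * np.mean(predY == teY))) prints the current epoch, the loss and the test accuracy. The code above prints the following (it takes longer to run than logistic regression, but the accuracy is much higher):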
```
...
Epoch 91, cost = 0.054484, acc = 97.58%
Epoch 92, cost = 0.053753, acc = 97.56%
Epoch 93, cost = 0.053036, acc = 97.60%
Epoch 94, cost = 0.052332, acc = 97.61%
Epoch 95, cost = 0.051641, acc = 97.63%
Epoch 96, cost = 0.050964, acc = 97.66%
Epoch 97, cost = 0.050298, acc = 97.66%
Epoch 98, cost = 0.049645, acc = 97.67%
Epoch 99, cost = 0.049003, acc = 97.67%
Epoch 100, cost = 0.048373, acc = 97.68%
```
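The higher accuracy comes with far more weights. The sketch below counts the trainable parameters of the logistic regression model and of this network. It assumes flattened 28×28 MNIST images (n_features = 784). Neither model has bias terms, matching the article's code.

```python
import torch

def build_model(input_dim, output_dim):
    # Same architecture as the neural network section
    return torch.nn.Sequential(
        torch.nn.Linear(input_dim, 512, bias=False),
        torch.nn.Sigmoid(),
        torch.nn.Linear(512, output_dim, bias=False)
    )

def count_params(model):
    # Total number of scalar weights across all parameter tensors
    return sum(p.numel() for p in model.parameters())

logreg = torch.nn.Sequential(torch.nn.Linear(784, 10, bias=False))
mlp = build_model(784, 10)
print(count_params(logreg), count_params(mlp))  # 7840 406528
```

The 406,528 weights of the network break down as 784 × 512 = 401,408 in the hidden layer plus 512 × 10 = 5,120 in the output layer.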