Introduction to pyTorch
Deep Learning has gone from a breakthrough but mysterious field to a well-known and widely applied technology. In recent years (or months), several frameworks, based mainly on Python, were created to simplify Deep Learning and make it available to the wider community of software engineers. In this battle to become the reference framework, some stand out, such as Theano, Keras and especially Google’s TensorFlow and Facebook’s pyTorch. This article is the first of a series of tutorials on pyTorch that will go from the basic gradient descent algorithm to very advanced concepts and complex models. The goal of this article is to give you a general but useful view of the gradient descent algorithm used in all Deep Learning frameworks.
Do not miss the other articles in this series:
- Introduction to pyTorch #1 : The gradient descent algorithm;
- Introduction to pyTorch #3 : Image classification with CNN;
The Linear Regression
A linear regression model is a regression model which seeks to establish a linear relationship between one variable and one or several other variables. Given $n$ samples, a linear regression model assumes that the relationship between the dependent variable $y_i$ and the $p$ predictors $x_{i1}, \dots, x_{ip}$ is linear. This relationship is modeled with an additional unobserved variable that adds noise, thus the model is defined by

$$y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip} + \varepsilon_i, \qquad i = 1, \dots, n \qquad (1)$$

These $n$ equations are sometimes stacked together and written in vector form as $y = X\beta + \varepsilon$. The model’s parameters are the $\beta_j$ variables, written as a $(p+1)$-dimensional parameter vector, where $\beta_0$ is the constant term. In the rest of the article we will use pyTorch to find these parameters with Stochastic Gradient Descent (SGD). In most cases we could use a direct analytic method (a short least-squares sketch is given after the parameter list below), but we use SGD here as an example. We will use the following model,

$$y = a\,x + c + \varepsilon \qquad (2)$$
with the following parameters:
- a, the coefficient of our only predictor;
- c, the constant term (or bias);
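As mentioned above, for a plain linear regression the parameters can also be found analytically. The following is a minimal sketch of this least-squares approach using numpy; the toy values and variable names are only illustrative and this snippet is not used in the rest of the article.

# Analytic least-squares solution on a tiny toy example
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 4.0 * x + 2.0                           # noiseless samples from y = a*x + c
A = np.stack([x, np.ones_like(x)], axis=1)  # design matrix [x, 1]
coeffs, _, _, _ = np.linalg.lstsq(A, y, rcond=None)
a_hat, c_hat = coeffs
print(a_hat, c_hat)                         # close to 4.0 and 2.0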
First, we will set true values for these two parameters and generate samples using a random number generator for $x$ and the noise; finally, we will use the generated samples to find an approximation of the true parameters with a numerical optimization algorithm. We start by importing some useful packages.
# Imports
import argparse
import matplotlib.pyplot as plt
import math
import numpy as np
from matplotlib import cm
import torch
from torch.autograd import Variable
import torch.nn as nn
import torch.optim as optim
We import packages like matplotlib for visualization, numpy for numerical tools and pyTorch to define our model and optimization methods. We then use torch’s manual_seed() function and numpy’s seed() function to initialize the random number generators; that way we will always get the same results.
# Random seed
torch.manual_seed(1)
np.random.seed(1)
We define the parameter values with which we will generate the samples of our dataset.
# True parameter values
a = 4
c = 2
The variable v sets the noise magnitude.
# Noise parameter
v = 8
And n_samples sets the number of samples we are going to generate.
# Number of samples
n_samples = 50
We will put the $x$ and $y$ values in two arrays $X$ and $Y$. The values for $x$ will be in $[0, 10]$. For each sample, we generate the $x$ value with the rand() function (which gives a float number in $[0, 1)$) and use this value in the linear equation. We use the rand() function again for the noise ($\varepsilon \in [-v, v]$).
# Generate samples
X = np.zeros(n_samples)
Y = np.zeros(n_samples)
for i in range(n_samples):
    x = np.random.rand()*10.0
    y = a*x + c + v*(2*np.random.rand()-1.0)
    X[i] = x
    Y[i] = y
# end for
First, we create a linear model. The first parameter of the nn.Linear() object is the input size (the number of predictors) and the second is the output size (the number of dependent variables), both equal to 1 here. We set bias to True as it corresponds to our $c$ parameter.
# Linear layer
linear = nn.Linear(1, 1, bias=True)
linear.cuda()
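To see what nn.Linear(1, 1) actually contains, here is a small standalone sketch (not part of the article’s script) that inspects the freshly initialized parameters; the weight and the bias play the roles of $a$ and $c$ in our model.

# Inspect the parameters of a freshly created linear layer
import torch.nn as nn

layer = nn.Linear(1, 1, bias=True)
print(layer.weight.size())  # torch.Size([1, 1]) -> the coefficient a
print(layer.bias.size())    # torch.Size([1])    -> the bias c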
We now need an objective function which measures the difference between the current model’s output $\hat{y}_i$ and the true output $y_i$. Here we use the Mean Squared Error (MSE), which measures the error as the average squared difference between $\hat{y}_i$ and $y_i$,

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 \qquad (3)$$
To use the MSE with pyTorch, there is the nn.MSELoss() object.
# Objective function is Mean Squared Error
criterion = nn.MSELoss()
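As a quick sanity check, the small sketch below (with made-up values, independent of the article’s data) verifies that nn.MSELoss() computes exactly the average of the squared differences from equation (3).

# Compare nn.MSELoss() with a manual implementation of equation (3)
import torch
import torch.nn as nn
from torch.autograd import Variable

y_hat = Variable(torch.Tensor([[1.0], [2.0], [3.0]]))  # model outputs
y     = Variable(torch.Tensor([[1.5], [2.0], [2.5]]))  # true outputs
manual = ((y - y_hat) ** 2).mean()
module = nn.MSELoss()(y_hat, y)
print(manual, module)  # both give (0.25 + 0.0 + 0.25) / 3, about 0.1667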
We set the learning rate to 0.01.
# Learning parameters
learning_parameters = 0.01
The optim package has an SGD object for the stochastic gradient descent algorithm. The first argument is the list of parameters we want to optimize and the second is the learning rate.
# Optimizer
optimizer = optim.SGD(linear.parameters(), lr=learning_parameters)
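For reference, since no momentum is configured here, each call to the optimizer’s step() method applies the plain SGD update to every parameter $\theta$ in linear.parameters(), with learning rate $\eta = 0.01$:

$$\theta \leftarrow \theta - \eta \, \nabla_{\theta} \, \mathrm{MSE}$$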
We will do 500 iterations.
# Loop over the data set
for epoch in range(500):
We take our generated samples and add a dimension at the end, as we have a one-dimensional feature vector and a one-dimensional output vector (the resulting tensors have shape n_samples × 1).
    # Inputs and outputs (n_samples * in_features)
    inputs, outputs = torch.Tensor(X).unsqueeze(1), torch.Tensor(Y).unsqueeze(1)
We transform the sample tensors into Variable objects. You can remove the cuda() calls if you don’t have a GPU in your computer.
    # To variable
    inputs, outputs = Variable(inputs.cuda()), Variable(outputs.cuda())
Then we set the gradient of each parameter to zero.
    # Zero param gradients
    optimizer.zero_grad()
Then, we run a forward pass, feeding the inputs into our linear layer, and compute the Mean Squared Error between the model’s output (linear_outputs) and the target (outputs). The backward() function computes the gradient of each parameter based on the computed MSE. And finally, the step() function updates the parameters using the computed gradients.
    # Forward + Backward + optimize
    linear_outputs = linear(inputs)
    loss = criterion(linear_outputs, outputs)
    loss.backward()
    optimizer.step()
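For our simple model $\hat{y}_i = a\,x_i + c$, the gradients that backward() stores in the parameters’ .grad attributes are simply the partial derivatives of equation (3):

$$\frac{\partial\,\mathrm{MSE}}{\partial a} = -\frac{2}{n}\sum_{i=1}^{n} x_i \left( y_i - \hat{y}_i \right), \qquad \frac{\partial\,\mathrm{MSE}}{\partial c} = -\frac{2}{n}\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)$$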
Every 10 iterations, we display the MSE.
    # Print result
    if epoch % 10 == 0:
        print(u"Loss {} : {}".format(epoch, loss.data[0]))
    # end if
# end for
At the end of the iterations, we get our two approximated parameters $a$ and $c$, and display them.
# Get and print parameter
model_a = float(list(linear.parameters())[0])
model_c = float(linear.bias)
print(u"Found a : {}".format(model_a))
print(u"Found c : {}".format(model_c))
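As a side note, the same values can also be read directly from the layer’s weight and bias attributes; the short sketch below is an equivalent alternative, not the code used to produce the results shown at the end.

# Read the learned parameters directly from the layer's attributes
model_a = linear.weight.data[0][0]  # weight matrix has shape (1, 1)
model_c = linear.bias.data[0]       # bias vector has shape (1,)
print(u"Found a : {}".format(model_a))
print(u"Found c : {}".format(model_c))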
And finally, we display the true model (in blue), the model found by SGD (in green), and all the generated samples (in red).
# Show points and line
plt.scatter(X, Y, c='r', marker='o', s=1)
plt.plot([0, 10], [c, a * 10 + c], c='b')
plt.plot([0, 10], [model_c, model_a * 10 + model_c], c='g')
plt.show()
And here is the final result:
Loss 0 : 456.620697021
Loss 10 : 8.18101978302
Loss 20 : 8.01882457733
Loss 30 : 7.87698125839
Loss 40 : 7.75292825699
Loss 50 : 7.64443922043
Loss 60 : 7.54955673218
Loss 70 : 7.46657657623
Loss 80 : 7.39400577545
Loss 90 : 7.33053779602
Loss 100 : 7.27503061295
Loss 110 : 7.22648668289
Loss 120 : 7.18403053284
Loss 130 : 7.14690303802
Loss 140 : 7.11443138123
Loss 150 : 7.08603286743
Loss 160 : 7.0611948967
Loss 170 : 7.0394744873
Loss 180 : 7.02047872543
Loss 190 : 7.003865242
Loss 200 : 6.98933458328
Loss 210 : 6.97662782669
Loss 220 : 6.96551513672
Loss 230 : 6.95579576492
Loss 240 : 6.94729471207
Loss 250 : 6.93986082077
Loss 260 : 6.93336200714
Loss 270 : 6.92767572403
Loss 280 : 6.92270326614
Loss 290 : 6.91835308075
Loss 300 : 6.91455078125
Loss 310 : 6.91122436523
Loss 320 : 6.90831565857
Loss 330 : 6.90577077866
Loss 340 : 6.90354633331
Loss 350 : 6.90160083771
Loss 360 : 6.89989852905
Loss 370 : 6.89840984344
Loss 380 : 6.89710950851
Loss 390 : 6.89597034454
Loss 400 : 6.89497566223
Loss 410 : 6.89410400391
Loss 420 : 6.8933429718
Loss 430 : 6.89267683029
Loss 440 : 6.8920955658
Loss 450 : 6.89158439636
Loss 460 : 6.89113903046
Loss 470 : 6.89074993134
Loss 480 : 6.89040851593
Loss 490 : 6.89011240005
Found a : 4.03545856476
Found c : 2.14601278305