# Deep Learning with Torch: 7

## Using Optim

Posted by Fabio Fumarola on October 26, 2015

## Abstract:

In this post we analyze how to use optim to train a neural network.

## Overview

This package implements several optimization methods that can be used to train a neural network.

In the previous posts we showed how to train a neural network using a for loop and a learning function. The method gradientUpgrade performs a learning step, which consists of a forward pass, a backward pass, and an update of the network weights.

function gradientUpgrade(model, x, y, criterion, learningRate, i)
   local prediction = model:forward(x)
   local err = criterion:forward(prediction, y)
   if i % 100 == 0 then
      print('error for iteration ' .. i .. ' is ' .. err)
   end
   -- reset the accumulated gradients, then backpropagate
   model:zeroGradParameters()
   local gradOutputs = criterion:backward(prediction, y)
   model:backward(x, gradOutputs)
   model:updateParameters(learningRate)
end


However, the Optim package already implements several optimization algorithms, such as:

• Stochastic Gradient Descent (optim.sgd)
• Averaged Stochastic Gradient Descent (optim.asgd)
• L-BFGS (optim.lbfgs)
• Conjugate Gradient (optim.cg)
• Adagrad (optim.adagrad)
• Adadelta (optim.adadelta)
• Adam (optim.adam)
• RMSProp (optim.rmsprop)

All of these methods can be used to train a neural network.

Each optimization method is based on the same interface:

w_new, fs = optim.method(func, w, state)


where:

• w_new is the new parameter vector (after optimization),
• fs is a table containing all the values of the objective, as evaluated during the optimization procedure:
• fs[1] is the value before optimization, and
• fs[#fs] is the most optimized one (the lowest),
• func is a closure with the following interface: f, df_dw = func(w), normally called feval,
• w is the trainable/adjustable parameter vector,
• state is a table of algorithm-dependent parameters.
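As a concrete illustration of this interface, here is a minimal sketch (the variable names are hypothetical, not from the example below) that minimizes a simple quadratic f(w) = (w - 3)^2 with optim.sgd:

```lua
require 'torch'
require 'optim'

-- the parameter vector: a single scalar stored in a 1-element tensor,
-- since optim operates on flat tensors of parameters
w = torch.Tensor{0}

-- feval: given a parameter vector, return the loss and its gradient
feval = function(w)
   local f = (w[1] - 3)^2                      -- objective f(w) = (w - 3)^2
   local df_dw = torch.Tensor{2 * (w[1] - 3)}  -- its derivative
   return f, df_dw
end

state = {learningRate = 0.1}
for i = 1, 100 do
   w_new, fs = optim.sgd(feval, w, state)
end
-- after the loop, w is close to the minimum at 3,
-- and fs[1] holds the loss of the last step
print(w[1])
```

Note that optim.sgd updates the parameter tensor in place, so w and w_new refer to the same storage; this is why the training loops below pass the same x on every iteration.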

Each method has a list of parameters that can be checked in the source code. Below is a simple example of training with optim.

algo_params = {
   learningRate = 1e-3,
   momentum = 0.5
}

-- x is the flattened parameter vector of the model
x, dl_dx = model:getParameters()

for i, sample in ipairs(training_samples) do
   local feval = function(x_new)
      -- evaluate the loss and the gradient on the current mini-batch

      return loss_mini_batch, dl_dx_mini_batch
   end

   optim.sgd(feval, x, algo_params)
end


## Example LSTM with SGD

We take the LSTM example with a Sequencer from the previous post, and replace the for loop and the gradientUpgrade function with a feval function.

require 'rnn'
require 'optim'

batchSize = 50
rho = 5
hiddenSize = 64
nIndex = 10000

-- define the model: each layer is decorated with a Sequencer
-- so that it accepts a table of rho time steps
model = nn.Sequential()
   :add(nn.Sequencer(nn.LookupTable(nIndex, hiddenSize)))
   :add(nn.Sequencer(nn.LSTM(hiddenSize, hiddenSize)))
   :add(nn.Sequencer(nn.Linear(hiddenSize, nIndex)))
   :add(nn.Sequencer(nn.LogSoftMax()))

criterion = nn.SequencerCriterion(nn.ClassNLLCriterion())



This defines the model, with each layer decorated with a Sequencer. Note that the criterion is decorated with nn.SequencerCriterion.

-- create a dummy dataset (task: predict the next item)
dataset = torch.randperm(nIndex)

-- offsets is a set of convenient pointers to iterate over the dataset
offsets = {}
for i = 1, batchSize do
   table.insert(offsets, math.ceil(math.random() * nIndex))
end
offsets = torch.LongTensor(offsets)

-- method to compute a batch
function nextBatch()
   local inputs, targets = {}, {}
   for step = 1, rho do
      -- get a batch of inputs
      table.insert(inputs, dataset:index(1, offsets))
      -- shift the batch indexes by one, wrapping around at the end
      offsets:add(1)
      for j = 1, batchSize do
         if offsets[j] > nIndex then
            offsets[j] = 1
         end
      end
      -- fill the batch of targets
      table.insert(targets, dataset:index(1, offsets))
   end
   return inputs, targets
end



Defines:

• a dummy dataset composed of a random permutation from 1 to nIndex,
• an offset table to store a list of pointers to scan the dataset, and
• a method to get the next batch.
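The way the offsets drive the batching can be sketched in isolation. This is a minimal illustration (with a small nIndex and hand-picked offsets for readability, not the values used in the example):

```lua
require 'torch'

nIndex, batchSize = 10, 4
dataset = torch.randperm(nIndex)

-- one pointer per batch row; the last one sits at the end of the dataset
offsets = torch.LongTensor{1, 4, 7, 10}

input = dataset:index(1, offsets)   -- current items, one per row
offsets:add(1)                      -- shift every pointer by one position
for j = 1, batchSize do             -- wrap pointers that ran past the end
   if offsets[j] > nIndex then offsets[j] = 1 end
end
target = dataset:index(1, offsets)  -- the items to predict
print(offsets)                      -- 2, 5, 8, 1: the last pointer wrapped
```

Each call advances every pointer by one, so each batch row scans the dataset sequentially from its own random starting position.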

-- get the flattened weights and the gradients of the loss wrt the weights
x, dl_dx = model:getParameters()

-- In the following code we define a closure, feval, which computes
-- the value of the loss function at a given point x_new, and the gradient of
-- that function with respect to x. x is the vector of trainable weights;
-- feval extracts a mini-batch via the nextBatch method
feval = function(x_new)
   -- copy the new weights if they changed
   if x ~= x_new then
      x:copy(x_new)
   end
   -- select a training batch
   local inputs, targets = nextBatch()
   -- reset gradients (gradients are always accumulated, to accommodate batch methods)
   dl_dx:zero()

-- evaluate the loss function and its derivative with respect to x, given a mini batch
local prediction = model:forward(inputs)
local loss_x = criterion:forward(prediction, targets)
model:backward(inputs, criterion:backward(prediction, targets))

return loss_x, dl_dx
end



This gets the parameters from the built model and defines the feval function for the optimization method.

sgd_params = {
learningRate = 0.1,
learningRateDecay = 1e-4,
weightDecay = 0,
momentum = 0
}

-- cycle on data
for i = 1,1e4 do
-- train a mini_batch of batchSize in parallel
_, fs = optim.sgd(feval,x, sgd_params)

if sgd_params.evalCounter % 100 == 0 then
print('error for iteration ' .. sgd_params.evalCounter  .. ' is ' .. fs[1] / rho)
-- print(sgd_params)
end
end



This defines the parameters for the method and the main loop that performs mini-batch training on the dataset. Every 100 mini-batches it prints the error.

Check the runnable examples for classification and regression.