Preventing a gradient from updating the weights of sub-networks in an architecture

I have an architecture built with nngraph, shown below:

require 'nn'
require 'nngraph'

input = nn.Identity()()
net1 = nn.Sequential():add(nn.SpatialConvolution(1, 5, 3, 3)):add(nn.ReLU(true)):add(nn.SpatialConvolution(5, 20, 4, 4))
net2 = nn.Sequential():add(nn.SpatialFullConvolution(20, 5, 4, 4)):add(nn.ReLU(true)):add(nn.SpatialFullConvolution(5, 1, 3, 3)):add(nn.Sigmoid())
net3 = nn.Sequential():add(nn.SpatialConvolution(1, 20, 3, 3)):add(nn.ReLU(true)):add(nn.SpatialConvolution(20, 40, 4, 4)):add(nn.ReLU(true)):add(nn.SpatialConvolution(40, 2, 3, 3)):add(nn.Sigmoid())

output1 = net1(input)
output2 = net2(output1)
output3 = net3(output2)
gMod = nn.gModule({input}, {output1, output3})

target1 = torch.rand(20, 51, 51)
target2 = torch.rand(2, 49, 49)
target2[target2:gt(0.5)] = 1
target2[target2:lt(0.5)] = 0
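-- Forward pass (keep the input tensor: backward needs the same tensor)
x = torch.rand(1, 56, 56)
out1, out2 = unpack(gMod:forward(x))

cr1 = nn.MSECriterion()
cr1:forward(out1, target1)
gradient1 = cr1:backward(out1, target1)

cr2 = nn.BCECriterion()
cr2:forward(out2, target2)
gradient2 = cr2:backward(out2, target2)

-- Now update the network weights
LR = 0.001
gMod:zeroGradParameters()  -- clear accumulated gradients; nn does not zero them automatically
gMod:backward(x, {gradient1, gradient2})  -- pass the input tensor, not the nngraph node `input`
gMod:updateParameters(LR)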

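I'd like to know:

1) How do I stop gradient2 from updating the weights of net1, so that it only updates the weights of net2 and net3?

2) How do I prevent gradient2 from updating the weights of net3, while still updating the weights of the other sub-networks?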

user6076729

Have you tried stopping backpropagation at net1?

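-- Make net1's backward pass a no-op: it neither computes a gradient for its input
-- nor accumulates gradients for its own weights
net1.updateGradInput = function(self, input, gradOutput) end
net1.accGradParameters = function(self, input, gradOutput, scale) end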

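Just put this code after gradient1 = cr1:backward(out1, target1) and it should work. Keep in mind that net1 receives gradient1 and the part of gradient2 coming back through net2 as a single summed gradient in one backward call, so these overrides freeze net1 completely: gradient1 no longer updates its weights either.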
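If net1 should still learn from gradient1 and only gradient2 must be kept away from it, one option is to cut the gradient on the edge between net1 and net2 instead of disabling net1's backward pass. A minimal sketch, assuming a hypothetical pass-through module named nn.StopGradient (it is not part of nn) that returns its input unchanged in the forward pass and a zero gradient in the backward pass. The check at the end uses gradient1 = 0 and a random tensor standing in for gradient2:

```lua
require 'nn'
require 'nngraph'

-- Hypothetical helper module (not part of nn): identity in the forward pass,
-- zero gradient in the backward pass, no parameters of its own.
local StopGradient, parent = torch.class('nn.StopGradient', 'nn.Module')

function StopGradient:__init()
   parent.__init(self)
end

function StopGradient:updateOutput(input)
   self.output = input
   return self.output
end

function StopGradient:updateGradInput(input, gradOutput)
   self.gradInput:resizeAs(input):zero()
   return self.gradInput
end

-- Same sub-networks as in the question
input = nn.Identity()()
net1 = nn.Sequential():add(nn.SpatialConvolution(1, 5, 3, 3)):add(nn.ReLU(true)):add(nn.SpatialConvolution(5, 20, 4, 4))
net2 = nn.Sequential():add(nn.SpatialFullConvolution(20, 5, 4, 4)):add(nn.ReLU(true)):add(nn.SpatialFullConvolution(5, 1, 3, 3)):add(nn.Sigmoid())
net3 = nn.Sequential():add(nn.SpatialConvolution(1, 20, 3, 3)):add(nn.ReLU(true)):add(nn.SpatialConvolution(20, 40, 4, 4)):add(nn.ReLU(true)):add(nn.SpatialConvolution(40, 2, 3, 3)):add(nn.Sigmoid())

output1 = net1(input)
output2 = net2(nn.StopGradient()(output1))  -- gradient2 is cut here
output3 = net3(output2)
gMod = nn.gModule({input}, {output1, output3})

-- Backward pass with gradient1 = 0 and a random stand-in for gradient2
x = torch.rand(1, 56, 56)
out1, out2 = unpack(gMod:forward(x))
gMod:zeroGradParameters()
gMod:backward(x, {torch.zeros(out1:size()), torch.rand(out2:size())})

net1GradNorm = net1:get(1).gradWeight:norm()  -- expected to be 0
net2GradNorm = net2:get(1).gradWeight:norm()  -- expected to be > 0
print(net1GradNorm, net2GradNorm)
```

With this module in place, net1's weight gradient comes only from gradient1, while net2 and net3 still accumulate gradients from gradient2. The sub-networks themselves stay unmodified, at the cost of one extra node in the graph.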

2016-08-02 07:43:17
user2838606

Question 1

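This one is a bit tricky, but entirely doable. To keep gradient2 from updating the weights of net1, override the updateGradInput() function of the layer right after net1 (the first layer of net2) so that it returns a zero tensor. That layer still uses gradient2 to accumulate gradients for its own weights, so net2 and net3 keep learning from gradient2, but nothing from gradient2 reaches net1, which is then updated by gradient1 alone. Here is the code: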

input = nn.Identity()()
net1 = nn.Sequential():add(nn.SpatialConvolution(1, 5, 3, 3)):add(nn.ReLU(true)):add(nn.SpatialConvolution(5, 20, 4, 4))
net2 = nn.Sequential():add(nn.SpatialFullConvolution(20, 5, 4, 4)):add(nn.ReLU(true)):add(nn.SpatialFullConvolution(5, 1, 3, 3)):add(nn.Sigmoid())
net3 = nn.Sequential():add(nn.SpatialConvolution(1, 20, 3, 3)):add(nn.ReLU(true)):add(nn.SpatialConvolution(20, 40, 4, 4)):add(nn.ReLU(true)):add(nn.SpatialConvolution(40, 2, 3, 3)):add(nn.Sigmoid())

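-- Override updateGradInput of net2's first layer so that it returns a zero tensor:
-- gradient2 no longer flows back into net1, but this layer still accumulates
-- gradients for its own weights (accGradParameters is unchanged)
local tempLayer = net2:get(1)
function tempLayer:updateGradInput(input, gradOutput)
   self.gradInput:resizeAs(input):zero()
   return self.gradInput
end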

output1 = net1(input)
output2 = net2(output1)
output3 = net3(output2)
gMod = nn.gModule({input}, {output1, output3})

-- Everything else is the same as before...
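A quick way to check that this override does what question 1 asks is to compare net1's accumulated weight gradient from a full backward pass with the one obtained when gradient2 is zero: if gradient2 never reaches net1, the two should match, while net2's first layer still receives a non-zero gradient. A minimal sketch with random tensors standing in for gradient1 and gradient2; gradientFor is a hypothetical helper name:

```lua
require 'nn'
require 'nngraph'

input = nn.Identity()()
net1 = nn.Sequential():add(nn.SpatialConvolution(1, 5, 3, 3)):add(nn.ReLU(true)):add(nn.SpatialConvolution(5, 20, 4, 4))
net2 = nn.Sequential():add(nn.SpatialFullConvolution(20, 5, 4, 4)):add(nn.ReLU(true)):add(nn.SpatialFullConvolution(5, 1, 3, 3)):add(nn.Sigmoid())
net3 = nn.Sequential():add(nn.SpatialConvolution(1, 20, 3, 3)):add(nn.ReLU(true)):add(nn.SpatialConvolution(20, 40, 4, 4)):add(nn.ReLU(true)):add(nn.SpatialConvolution(40, 2, 3, 3)):add(nn.Sigmoid())

-- Override from the answer: net2's first layer sends a zero gradient back to net1
local tempLayer = net2:get(1)
function tempLayer:updateGradInput(input, gradOutput)
   self.gradInput:resizeAs(input):zero()
   return self.gradInput
end

output1 = net1(input)
output2 = net2(output1)
output3 = net3(output2)
gMod = nn.gModule({input}, {output1, output3})

x = torch.rand(1, 56, 56)
out1, out2 = unpack(gMod:forward(x))
g1 = torch.rand(out1:size())  -- stand-in for gradient1
g2 = torch.rand(out2:size())  -- stand-in for gradient2

-- Hypothetical helper: one forward/backward pass, returning copies of the
-- weight gradients of the first layers of net1 and net2
local function gradientFor(gradOut1, gradOut2)
   gMod:forward(x)
   gMod:zeroGradParameters()
   gMod:backward(x, {gradOut1, gradOut2})
   return net1:get(1).gradWeight:clone(), net2:get(1).gradWeight:clone()
end

net1Full, net2Full = gradientFor(g1, g2)
net1Only1 = gradientFor(g1, torch.zeros(out2:size()))

-- Close to 0 if gradient2 contributed nothing to net1's weight gradient
net1Diff = (net1Full - net1Only1):abs():max()
net2GradNorm = net2Full:norm()
print(net1Diff, net2GradNorm)
```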

Question 2

input = nn.Identity()()
net1 = nn.Sequential():add(nn.SpatialConvolution(1, 5, 3, 3)):add(nn.ReLU(true)):add(nn.SpatialConvolution(5, 20, 4, 4))
net2 = nn.Sequential():add(nn.SpatialFullConvolution(20, 5, 4, 4)):add(nn.ReLU(true)):add(nn.SpatialFullConvolution(5, 1, 3, 3)):add(nn.Sigmoid())
net3 = nn.Sequential():add(nn.SpatialConvolution(1, 20, 3, 3)):add(nn.ReLU(true)):add(nn.SpatialConvolution(20, 40, 4, 4)):add(nn.ReLU(true)):add(nn.SpatialConvolution(40, 2, 3, 3)):add(nn.Sigmoid())

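-- Overriding updateParameters on net3 makes it a no-op, so gMod:updateParameters(LR) leaves net3's weights unchanged.
-- Gradients are still computed through net3 during gMod:backward, so gradient2 still reaches net2 and net1.
net3.updateParameters = function() end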

output1 = net1(input)
output2 = net2(output1)
output3 = net3(output2)
gMod = nn.gModule({input}, {output1, output3})

-- Everything else is the same as before...
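An alternative sketch for question 2 overrides accGradParameters instead of updateParameters, so net3 never accumulates weight gradients while its updateGradInput still passes gradient2 back to net2 and net1. One possible reason to prefer it (an assumption about your training loop): if you later switch to optim with a flattened parameter tensor from getParameters(), an overridden updateParameters is never called, whereas zero gradients keep plain SGD without weight decay from moving net3's weights. The check compares weights before and after one update, with random tensors standing in for gradient1 and gradient2:

```lua
require 'nn'
require 'nngraph'

input = nn.Identity()()
net1 = nn.Sequential():add(nn.SpatialConvolution(1, 5, 3, 3)):add(nn.ReLU(true)):add(nn.SpatialConvolution(5, 20, 4, 4))
net2 = nn.Sequential():add(nn.SpatialFullConvolution(20, 5, 4, 4)):add(nn.ReLU(true)):add(nn.SpatialFullConvolution(5, 1, 3, 3)):add(nn.Sigmoid())
net3 = nn.Sequential():add(nn.SpatialConvolution(1, 20, 3, 3)):add(nn.ReLU(true)):add(nn.SpatialConvolution(20, 40, 4, 4)):add(nn.ReLU(true)):add(nn.SpatialConvolution(40, 2, 3, 3)):add(nn.Sigmoid())

-- Freeze net3: never accumulate gradients for its own weights. updateGradInput
-- is left untouched, so gradient2 still flows through net3 into net2 and net1.
net3.accGradParameters = function(self, input, gradOutput, scale) end

output1 = net1(input)
output2 = net2(output1)
output3 = net3(output2)
gMod = nn.gModule({input}, {output1, output3})

-- One training step with random stand-ins for gradient1 and gradient2
x = torch.rand(1, 56, 56)
out1, out2 = unpack(gMod:forward(x))
gMod:zeroGradParameters()
gMod:backward(x, {torch.rand(out1:size()), torch.rand(out2:size())})

net2Before = net2:get(1).weight:clone()
net3Before = net3:get(1).weight:clone()
gMod:updateParameters(0.001)

net2Change = (net2:get(1).weight - net2Before):norm()  -- expected to be > 0
net3Change = (net3:get(1).weight - net3Before):norm()  -- expected to be 0
print(net2Change, net3Change)
```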
2016-08-04 21:08:11