Torch - Using multiple threads to load tensors into a queue for training

2018-5-8 3:30:42


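I would like to use the threads library (or perhaps the parallel library) to load and preprocess data into a queue, but I am not sure how it works. In summary:

Load the data (tensors), preprocess the tensors (this takes time, which is why I am asking), and put them into a queue. I would like as many threads as possible doing this, so that the model never waits, or at least does not wait long.
Take the tensor at the head of the queue, pass it to the model, and remove it from the queue.

I don't really understand the examples at https://github.com/torch/threads. An example or a hint showing where I should load data into the queue and where I should train would be great.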

Edited 14/03/2016

The example at https://github.com/torch/threads/blob/master/test/test-low-level.lua uses low-level threads. Does anyone know how to get data out of those threads and into the main thread?

User 117844

Have a look at this multi-threaded data provider:

https://github.com/soumith/dcgan.torch/blob/master/data/data.lua

It runs this file inside each thread:

https://github.com/soumith/dcgan.torch/blob/master/data/data.lua#L18

It is called here:

https://github.com/soumith/dcgan.torch/blob/master/data/data.lua#L30-L43

Then, to queue a job onto the thread pool, you provide two functions:

https://github.com/soumith/dcgan.torch/blob/master/data/data.lua#L84

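The first function runs inside the thread. The second runs in the main thread after the first has finished, and receives whatever the first one returned, which is how data gets from a worker back to the main thread. Hope that makes it a little clearer.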
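To make the two-function pattern concrete, here is a minimal sketch, not the code from dcgan.torch. The helper loadAndPreprocess, the thread and batch counts, and the print call standing in for a training step are all hypothetical placeholders; only threads.Threads, addjob, synchronize and terminate come from the threads library.

```lua
-- Sketch only: loadAndPreprocess and the counts below are assumptions made
-- for illustration; they are not part of dcgan.torch or the threads library.
require 'torch'
local threads = require 'threads'

local nThreads = 4   -- assumed number of loader threads
local nBatches = 8   -- assumed number of batches to process

local pool = threads.Threads(
  nThreads,
  function(threadId)
    -- Runs once in each worker thread: load libraries and define helpers there.
    require 'torch'
    function loadAndPreprocess(i)
      -- Placeholder for the real (slow) loading and preprocessing work.
      return torch.Tensor(4):fill(i)
    end
  end
)

local processed = 0
for i = 1, nBatches do
  pool:addjob(
    -- First function: executed in a worker thread.
    function()
      return loadAndPreprocess(i)
    end,
    -- Second function: executed in the main thread with the first function's
    -- return values, so this is where a training step would go.
    function(batch)
      processed = processed + 1
      print(string.format('batch %d, sum = %g', processed, batch:sum()))
    end
  )
end

pool:synchronize()  -- wait for all jobs and run their main-thread callbacks
pool:terminate()
```

Here the pool's internal job queue plays the role of the queue from the question: workers preprocess in parallel, and the main thread only ever sees finished batches. Batches may arrive in a different order than they were queued.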

2016-02-25 17:28:27

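User 2104596

If the example from Soumith's answer above is not easy to use, I suggest building your own pipeline from scratch. Here is an example with two synchronized threads: one writes the data and the other reads it: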

local t = require 'threads'
t.Threads.serialization('threads.sharedserialize')
local tds = require 'tds'
local dict = tds.Hash()  -- must be a local (captured as an upvalue by the job); only a table or tds.Hash() works here

dict[1] = torch.zeros(4)

local m1 = t.Mutex()
local m2 = t.Mutex()
local m1id  = m1:id()
local m2id  = m2:id()

m1:lock()

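local pool = t.Threads(
  1,
  function(threadIdx)
    -- Load in the worker what the job uses (torch.randn, the shared tds.Hash).
    require 'torch'
    require 'tds'
  end
)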

pool:addjob(
  function()
    local t = require 'threads'
    local m1 = t.Mutex(m1id)
    local m2 = t.Mutex(m2id)

    while true do
      m2:lock()
      dict[1] = torch.randn(4)
      m1:unlock()

      print ('W ===> ')
      print(dict[1])
      collectgarbage()
      collectgarbage()
    end

    return __threadid
  end,
  function(id)
  end
)

-- Code executed by the main thread:
local a = 1
while true do
  m1:lock()
  a = dict[1]
  m2:unlock()

  print('R --> ')
  print(a)
end

2016-07-07 12:46:56

Author:

User 5407700
