Poor performance of ffi.cast in LuaJIT

Why is ffi.cast so slow in the following code?

prof = require 'profile'

local ffi = require("ffi")

ffi.cdef[[
struct message {
    int field_a;
};

]]

function cast_test1()
   bytes = ffi.new("char[100000000]")

   sum = 0
   t1 = prof.rdtsc()
   for i=1,1000000 do
      sum = sum + i
   end
   t2 = prof.rdtsc()

   print("test1", tonumber(t2-t1))
end

function cast_test2()
   bytes = ffi.new("char[100000000]")

   sum = 0
   t1 = prof.rdtsc()
   for i=1,1000000 do
      sum = sum + i
      msg = ffi.cast("struct message *", bytes+ i * 16)
--      msg.field_a = i
   end
   t2 = prof.rdtsc()

   print("test2", tonumber(t2-t1))
end

cast_test1()
cast_test2()

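The loop that contains ffi.cast appears to run about 30 times slower (94474000 vs. 3227528 cycles in the output below). Is there any way to fix this?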

% luajit -v  cast_tests.lua
LuaJIT 2.0.3 -- Copyright (C) 2005-2014 Mike Pall. http://luajit.org/
test1   3227528
test2   94474000
User 244989

It looks like the global msg variable is the culprit. Replacing it with a local variable makes the loop about 20 times faster :)

This holds for both luajit-2.0.3 and luajit-2.1.

function cast_test3()
   local bytes = ffi.new("char[100000000]")
   local sum = 0
   local t1 = prof.rdtsc()
   for i=1,1000000 do
      sum = sum + i
      local msg = ffi.cast("struct message *", bytes+ i * 4)
      msg.field_a = i
   end
   local t2 = prof.rdtsc()
   local sum2 = 0
   for i=1,1000000 do
      local msg = ffi.cast("struct message *", bytes+ i * 4)
      sum2 = sum2 + msg.field_a
   end

   local t3 = prof.rdtsc()
   print(sum, sum2)
   print("test3", tonumber(t2-t1), tonumber(t3-t2))
end

cast_test3()

Results:

% /usr/bin/luajit -v    cast_tests.lua           ~/Projects/lua_tests/lua_rdtsc
LuaJIT 2.0.3 -- Copyright (C) 2005-2014 Mike Pall. http://luajit.org/
500000500000    500000500000
test3   4502508 4850884
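
For anyone who wants to reproduce the comparison without the `profile` module, here is a minimal sketch that runs the same loop twice, once storing the cast result in a global and once in a local. It is not the code from the thread: `os.clock()` stands in for `prof.rdtsc()`, and the names `struct demo_message`, `cast_global`, and `cast_local` are made up for illustration. A plausible explanation for the gap (my assumption, not something confirmed above) is that a value stored in a global escapes the loop, so a pointer cdata object has to be allocated on every iteration, while a local that never escapes can be optimized away by the JIT.

```lua
-- Sketch only: requires LuaJIT (the ffi module is not part of plain Lua).
-- os.clock() replaces prof.rdtsc(), so times are CPU seconds, not cycles.
local ffi = require("ffi")

-- Hypothetical struct name, chosen to avoid clashing with "struct message".
ffi.cdef[[
struct demo_message {
    int field_a;
};
]]

local N = 1000000
-- Large enough for offsets 4 .. N*4 plus one 4-byte int at the end.
local bytes = ffi.new("char[?]", (N + 1) * 4)

-- Variant 1: the cast result is stored in a global, as in the question.
local function cast_global()
   local sum = 0
   local t0 = os.clock()
   for i = 1, N do
      msg = ffi.cast("struct demo_message *", bytes + i * 4)  -- global store
      msg.field_a = i
      sum = sum + msg.field_a
   end
   return sum, os.clock() - t0
end

-- Variant 2: the same loop, but the cast result stays in a local.
local function cast_local()
   local sum = 0
   local t0 = os.clock()
   for i = 1, N do
      local msg = ffi.cast("struct demo_message *", bytes + i * 4)
      msg.field_a = i
      sum = sum + msg.field_a
   end
   return sum, os.clock() - t0
end

local sum_g, time_g = cast_global()
local sum_l, time_l = cast_local()
print("global", sum_g, time_g)
print("local ", sum_l, time_l)
```

Both variants should print the same sum; only the timings are expected to differ, and by how much will depend on the LuaJIT version and the machine.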
2016-02-05 10:10:32