Byte representation of a Unicode string

2018-2-13 10:43:45


This is Python 3 code:

>>> bytes(json.dumps({'Ä':0}), "utf-8")
b'{"\\u00c4": 0}'

json.dumps() returns a Unicode string, and bytes() returns its byte representation: the string encoded as UTF-8.
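The doubled backslash in the output above is only how Python prints a bytes literal; the data itself holds a single backslash. A minimal sketch to confirm this, which also shows the `ensure_ascii=False` option of `json.dumps` for the case where raw UTF-8 bytes are wanted instead of `\uXXXX` escapes (the variable names are illustrative):

```python
import json

# Default: non-ASCII characters are escaped, so the result is pure ASCII.
escaped = bytes(json.dumps({'Ä': 0}), "utf-8")
print(escaped)               # b'{"\\u00c4": 0}'

# The repr shows "\\", but the byte string holds exactly one backslash.
print(escaped.count(b"\\"))  # 1

# ensure_ascii=False keeps the character; encoding then yields UTF-8 bytes.
raw = json.dumps({'Ä': 0}, ensure_ascii=False).encode("utf-8")
print(raw)                   # b'{"\xc3\x84": 0}'
```

So any Lua port only has to produce one backslash followed by `u00c4`, not two.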

How can I achieve the same result in Lua? I need the byte representation of a JSON object that contains non-ASCII characters.

User 6834680

You have to do it manually.

local function utf8_to_unicode(utf8str, pos)
   local code, size = utf8str:byte(pos), 1
   if code >= 0xC0 and code < 0xFE then
      local mask = 64
      code = code - 128
      repeat
         local next_byte = utf8str:byte(pos + size) or 0
         if next_byte >= 0x80 and next_byte < 0xC0 then
            code, size = (code - mask - 2) * 64 + next_byte, size + 1
         else
            code, size = utf8str:byte(pos), 1
         end
         mask = mask * 32
      until code < mask
   end
   -- returns the code point and the number of bytes in this UTF-8 character
   return code, size
end

function utf8_to_python(utf8str)
   local pos = 1
   local z = ''
   while pos <= #utf8str do
      local unicode, size = utf8_to_unicode(utf8str, pos)
      pos = pos + size
      if unicode < 0x80 then
         -- ASCII is copied through unchanged
         z = z..string.char(unicode)
      elseif unicode < 0x10000 then
         -- '\\u' in Lua source is one backslash followed by 'u'
         z = z..string.format('\\u%04x', unicode)
      else
         -- Python string-literal style; json.dumps would write a surrogate pair here
         z = z..string.format('\\U%08x', unicode)
      end
   end
   return z
end

Usage:

local json = require('json')
local x = {['Ä'] = 0}
local y = json.encode(x)
print(y)                       -->  {"Ä":0}
local z = utf8_to_python(y)
print(z)                       -->  {"\u00c4":0}
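For readers who want to check the arithmetic in `utf8_to_unicode`, here is a minimal Python sketch of the same idea: the leading byte determines the sequence length, and each continuation byte contributes six more bits. The name `decode_utf8_at` is hypothetical, and like the Lua version it falls back to the raw byte on malformed input and does not detect overlong encodings.

```python
def decode_utf8_at(data: bytes, pos: int = 0):
    """Return (code_point, size) for the UTF-8 sequence starting at data[pos].

    Hypothetical helper for illustration only.
    """
    lead = data[pos]
    if lead < 0x80:
        return lead, 1                   # plain ASCII
    if 0xC0 <= lead < 0xE0:
        size, code = 2, lead & 0x1F      # 110xxxxx
    elif 0xE0 <= lead < 0xF0:
        size, code = 3, lead & 0x0F      # 1110xxxx
    elif 0xF0 <= lead < 0xF8:
        size, code = 4, lead & 0x07      # 11110xxx
    else:
        return lead, 1                   # stray continuation or invalid lead byte
    tail = data[pos + 1:pos + size]
    if len(tail) != size - 1 or any(b & 0xC0 != 0x80 for b in tail):
        return lead, 1                   # truncated or malformed sequence
    for b in tail:
        code = (code << 6) | (b & 0x3F)  # each continuation byte adds 6 bits
    return code, size

print(decode_utf8_at("Ä".encode("utf-8")))   # (196, 2)
print(decode_utf8_at("€".encode("utf-8")))   # (8364, 3)
print(decode_utf8_at("😀".encode("utf-8")))  # (128512, 4)
```

The first result, 196, is 0xC4, which is where the `\u00c4` in the escaped output comes from.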

2018-02-13 11:34:54

User 9383219

A simpler version using string.gsub:

local function python_escape(str)
  return (string.gsub(
    str,
    -- a leading byte followed by one or more continuation bytes;
    -- decimal version of the pattern for Lua 5.1: "[\194-\244][\128-\191]+",
    "[\xC2-\xF4][\x80-\xBF]+",
    function (non_ASCII)
      -- utf8.codepoint requires Lua 5.3 or later
      local codepoint = utf8.codepoint(non_ASCII)
      if codepoint <= 0xFFFF then
        return ("\\u%04x"):format(codepoint)
      else
        -- Python string-literal style; json.dumps would write a surrogate pair here
        return ("\\U%08x"):format(codepoint)
      end
    end))
end

I wrapped the call in parentheses, (string.gsub(--[[...]])), to discard string.gsub's second return value (the number of substitutions).
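As a reference point for the substitution approach, the sketch below does the same replacement in Python with `re.sub`, which returns only the new string (`re.subn` is the variant that also returns the count). One difference from the Lua code is worth noting: for code points above U+FFFF, `json.dumps` does not write `\U` followed by eight hex digits but a UTF-16 surrogate pair, and the sketch follows that behavior. The name `escape_non_ascii` is hypothetical.

```python
import json
import re

def escape_non_ascii(text: str) -> str:
    """Replace every non-ASCII character with JSON-style \\uXXXX escapes."""
    def repl(match):
        cp = ord(match.group())
        if cp <= 0xFFFF:
            return "\\u%04x" % cp
        # Above the BMP: split into a UTF-16 high/low surrogate pair.
        cp -= 0x10000
        return "\\u%04x\\u%04x" % (0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF))
    return re.sub(r"[^\x00-\x7f]", repl, text)

for sample in ("Ä", "€", "😀"):
    doc = json.dumps({sample: 0}, ensure_ascii=False)
    print(escape_non_ascii(doc))
    # The hand-written escaping agrees with json.dumps' default output.
    assert escape_non_ascii(doc) == json.dumps({sample: 0})
```

A Lua port that must match Python's JSON output byte for byte would need the same surrogate-pair branch in place of the `\U%08x` format.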

2018-02-19 23:37:56


Author:

User 227024
