Question:

C++: convert an ASCII string with Unicode escapes to a UTF-8 string

怀飞扬

2023-03-14

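I need to read in a standard ASCII-style string containing Unicode escapes and convert it into a std::string containing the UTF-8 encoded equivalent. For example, "\u03a0" (a std::string of 6 characters) should be converted into a std::string of two characters, 0xCE and 0xA0, as raw binary bytes.

I'd be happy with a simple answer using ICU or Boost, but I haven't found one.

(This is similar to converting a Unicode string to an escaped ASCII string, except that I ultimately need to end up with UTF-8. It's fine to use Unicode as an intermediate step.)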

2 answers

宰父君昊

2023-03-14

Try something like this:

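#include <cstdint>
#include <string>

// Encodes a single Unicode code point as UTF-8.
std::string to_utf8(std::uint32_t cp)
{
    /*
    If using C++11 or later, you can do this instead
    (needs <codecvt> and <locale>; std::wstring_convert and
    std::codecvt_utf8 are deprecated since C++17):

    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
    return conv.to_bytes( (char32_t)cp );

    Otherwise...
    */

    std::string result;

    // Number of UTF-8 bytes needed for this code point.
    int count;
    if (cp <= 0x007F)
        count = 1;
    else if (cp <= 0x07FF)
        count = 2;
    else if (cp <= 0xFFFF)
        count = 3;
    else if (cp <= 0x10FFFF)
        count = 4;
    else
        return result; // or throw an exception

    result.resize(count);

    if (count > 1)
    {
        // Fill the continuation bytes (10xxxxxx) from last to first,
        // 6 payload bits each.
        for (int i = count-1; i > 0; --i)
        {
            result[i] = (char) (0x80 | (cp & 0x3F));
            cp >>= 6;
        }

        // Set the lead-byte marker bits: 110xxxxx, 1110xxxx or 11110xxx.
        for (int i = 0; i < count; ++i)
            cp |= (1 << (7-i));
    }

    result[0] = (char) cp;

    return result;
}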
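For comparison, here is a sketch of the same encoding with the bit layout of each length written out explicitly, which makes it easier to check against the UTF-8 table. encode_utf8 is a hypothetical name, not part of the answer above; unlike to_utf8, it also rejects the UTF-16 surrogate range U+D800..U+DFFF, which is not valid in UTF-8.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: encodes one code point as UTF-8.
// Returns an empty string for values above U+10FFFF and for
// surrogates (U+D800..U+DFFF), which UTF-8 cannot represent.
std::string encode_utf8(std::uint32_t cp)
{
    std::string out;
    if (cp <= 0x7F) {
        // 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp <= 0x7FF) {
        // 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0xFFFF) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return out; // lone surrogate: not encodable
        // 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {
        // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out; // still empty if cp > 0x10FFFF
}
```

For example, encode_utf8(0x03A0) yields the two bytes 0xCE 0xA0.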

Then call to_utf8() for each \u escape in the string:

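// Needs <sstream> and <string>, plus to_utf8() from above.
std::string str = ...; // "\\u03a0"
std::string::size_type startIdx = 0;
do
{
    // Find the next "\u" escape.
    startIdx = str.find("\\u", startIdx);
    if (startIdx == std::string::npos) break;

    // The hex digits end at the first non-hex character or at the end of
    // the string; a \uXXXX escape uses at most 4 of them.
    std::string::size_type endIdx = str.find_first_not_of("0123456789abcdefABCDEF", startIdx+2);
    if (endIdx == std::string::npos) endIdx = str.length(); // escape at end of string
    if (endIdx - (startIdx+2) > 4) endIdx = startIdx+2+4;

    std::string tmpStr = str.substr(startIdx+2, endIdx-(startIdx+2));
    std::istringstream iss(tmpStr);

    std::uint32_t cp;
    if (iss >> std::hex >> cp)
    {
        // Replace the whole "\uXXXX" sequence with its UTF-8 bytes.
        std::string utf8 = to_utf8(cp);
        str.replace(startIdx, 2+tmpStr.length(), utf8);
        startIdx += utf8.length();
    }
    else
        startIdx += 2; // "\u" not followed by hex digits: skip it
}
while (true);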
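The loop above edits str in place, moving the rest of the string on every replacement. A single-pass alternative could build a new string instead. This is a sketch under stated assumptions, not the answerer's code: unescape_to_utf8 and its helpers are hypothetical names, each escape is assumed to have exactly four hex digits (as in JSON or Java source), and a UTF-16 surrogate pair such as \ud83d\ude00 is combined into one code point. Malformed escapes and unpaired surrogates are copied through unchanged, and, like the original, an escaped backslash before u is not treated specially.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

namespace {

// Value of one hex digit, or -1 if c is not a hex digit.
int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Reads exactly four hex digits starting at s[pos].
bool read_hex4(const std::string& s, std::size_t pos, std::uint32_t& out)
{
    if (pos + 4 > s.size()) return false;
    out = 0;
    for (std::size_t i = pos; i < pos + 4; ++i) {
        int v = hex_value(s[i]);
        if (v < 0) return false;
        out = (out << 4) | static_cast<std::uint32_t>(v);
    }
    return true;
}

// Appends the UTF-8 bytes of a code point that is already known to be valid.
void append_utf8(std::uint32_t cp, std::string& out)
{
    if (cp <= 0x7F) {
        out += static_cast<char>(cp);
    } else if (cp <= 0x7FF) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0xFFFF) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

} // namespace

// Hypothetical helper: returns a copy of `in` with every \uXXXX escape
// replaced by its UTF-8 bytes.
std::string unescape_to_utf8(const std::string& in)
{
    std::string out;
    out.reserve(in.size());
    std::size_t i = 0;
    while (i < in.size()) {
        std::uint32_t cp = 0;
        if (in[i] == '\\' && i + 1 < in.size() && in[i + 1] == 'u'
            && read_hex4(in, i + 2, cp)) {
            std::size_t next = i + 6; // just past "\uXXXX"
            std::uint32_t lo = 0;
            // High surrogate followed by a low-surrogate escape: combine them.
            if (cp >= 0xD800 && cp <= 0xDBFF && next + 1 < in.size()
                && in[next] == '\\' && in[next + 1] == 'u'
                && read_hex4(in, next + 2, lo) && lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                next += 6;
            }
            if (cp < 0xD800 || cp > 0xDFFF) {
                append_utf8(cp, out);
                i = next;
                continue;
            }
            // Unpaired surrogate: fall through and copy the text literally.
        }
        out += in[i];
        ++i;
    }
    return out;
}
```

For example, unescape_to_utf8("\\u03a0") returns the two bytes 0xCE 0xA0, and unescape_to_utf8("\\ud83d\\ude00") returns the four bytes 0xF0 0x9F 0x98 0x80 (U+1F600).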

唐增

2023-03-14

(\u03a0 is the escape for U+03A0, GREEK CAPITAL LETTER PI; its UTF-8 encoding is 0xCE 0xA0.)
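Those two bytes can be checked by hand. A small compile-time sketch (pi_cp, pi_lead and pi_trail are illustrative names only) splits the code point into the two-byte form 110xxxxx 10xxxxxx:

```cpp
#include <cstdint>

// U+03A0 lies in 0x80..0x7FF, so it takes the two-byte form.
// Padded to 11 payload bits: 01110 100000.
constexpr std::uint32_t pi_cp = 0x03A0;

// Lead byte: "110" marker plus the high 5 bits (0x03A0 >> 6 == 0x0E).
constexpr unsigned pi_lead = 0xC0 | (pi_cp >> 6);

// Continuation byte: "10" marker plus the low 6 bits (0x03A0 & 0x3F == 0x20).
constexpr unsigned pi_trail = 0x80 | (pi_cp & 0x3F);

static_assert(pi_cp >= 0x80 && pi_cp <= 0x7FF, "needs the two-byte form");
static_assert(pi_lead == 0xCE, "lead byte");
static_assert(pi_trail == 0xA0, "continuation byte");
```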

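You need to:

1. Get the number 0x03a0 from the string "\u03a0": remove the backslash and the u, parse 03a0 as hexadecimal, and store it in a wchar_t. Repeat until you have a (wide) string.
2. Convert 0x3A0 to UTF-8. C++11 has a codecvt_utf8 that may be useful (note that it is deprecated since C++17).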
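The two steps above might look roughly like this. It is a sketch, not the answerer's code: parse_escapes and to_utf8_codecvt are hypothetical names, char32_t and std::u32string stand in for wchar_t (which is only 16 bits on Windows, too narrow for code points above U+FFFF), each escape is assumed to have exactly four hex digits, and surrogate pairs are not combined. Because std::wstring_convert and std::codecvt_utf8 are deprecated since C++17, expect deprecation warnings.

```cpp
#include <codecvt>
#include <cstddef>
#include <locale>
#include <string>

// Step 1: collect code points into a UTF-32 string. "\uXXXX" becomes one
// code point; any other byte is copied as its own value (so bytes >= 0x80
// are treated as Latin-1, an assumption that only matters for non-ASCII input).
std::u32string parse_escapes(const std::string& in)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        if (i + 6 <= in.size() && in[i] == '\\' && in[i + 1] == 'u') {
            std::string hex = in.substr(i + 2, 4);
            if (hex.find_first_not_of("0123456789abcdefABCDEF") == std::string::npos) {
                out += static_cast<char32_t>(std::stoul(hex, nullptr, 16));
                i += 6;
                continue;
            }
        }
        out += static_cast<char32_t>(static_cast<unsigned char>(in[i]));
        ++i;
    }
    return out;
}

// Step 2: convert UTF-32 to UTF-8 with the (deprecated) standard facet.
std::string to_utf8_codecvt(const std::u32string& s)
{
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
    return conv.to_bytes(s);
}
```

Usage: to_utf8_codecvt(parse_escapes("\\u03a0")) produces the bytes 0xCE 0xA0.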
