Question:

How do I correctly read Unicode from an InputStream?

暨弘毅

2023-03-14

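I found that other people had the same problem, and theirs was solved by specifying UTF-8 in the InputStreamReader constructor:

Reading InputStream as UTF-8

This does not work for me, and I don't know why. No matter what I try, I always get the escaped Unicode values (backslash, u, then hex digits) instead of the actual language characters. What am I doing wrong here? Thanks in advance!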

// InputStream is is a FileInputStream:
public void load(InputStream is) throws Exception {

    BufferedReader br = null;

    try {
        // Passing "UTF8" or "UTF-8" to this constructor makes no difference for me:
        br = new BufferedReader(new InputStreamReader(is, StandardCharsets.UTF_8));
        String line = null;         
        while ((line = br.readLine()) != null) {
            // The following prints "got line: chinese = \u4f60\u597d" instead of "got line: chinese = 你好"
            System.out.println("got line: " + line);
        }
    } finally {
        if (br != null) {
            br.close();
        }
    }       
}

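Please note: this is not a font problem. I know this because if I use a ResourceBundle on the same file, the Chinese characters print correctly in the IDE console. But whenever I try to read the file manually with a FileInputStream, something keeps converting the characters into the backslash-u convention, even though I tell it to use UTF-8 encoding. I also tried changing the encoding in the project's JVM arguments, but still got no satisfactory result. Thanks again for any suggestions.

Also, using a ResourceBundle as the final solution is not an option for me. There are legitimate reasons why it is not the right tool for this particular project and why I need to do this explicitly myself.

Edit: I tried pulling the bytes from the InputStream manually, completely bypassing InputStreamReader and its constructor, which seemed to ignore my encoding argument. This only results in the same behavior: the backslash-u convention instead of the correct characters. It is hard to understand why I can't get this to work the way it seems to work for everyone else. Is there some system/OS setting that overrides Java's ability to handle Unicode correctly? I am using Java version 1.8.0_65 (64-bit) on Windows 7 version 6.1 (also 64-bit).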

public void load(InputStream is) throws Exception {     
    String line = null;     
    try {
        while ((line = readLine(is)) != null) {
            // The following prints "got line: chinese = \u4f60\u597d" instead of "got line: chinese = 你好"
            System.out.println("got line: " + line);                
        }           
    } finally {
        is.close();
    }       
}

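private String readLine(InputStream is) throws Exception {
    List<Byte> bytesList = new ArrayList<>();
    while (true) {
        // Keep the result as an int: casting to byte first would make the
        // data byte 0xFF indistinguishable from the end-of-stream marker -1.
        // (read() signals end of stream with -1; it does not throw EOFException.)
        int b = is.read();
        if (b == -1) {
            // End of stream: return the final unterminated line, or null if nothing is left.
            return bytesList.isEmpty() ? null : bytesToString(bytesList);
        }
        if (b == '\n') {
            // An empty line yields "" rather than null, so it does not end the caller's loop.
            return bytesToString(bytesList);
        }
        bytesList.add((byte) b);
    }
}

private String bytesToString(List<Byte> bytesList) {
    byte[] bytes = new byte[bytesList.size()];
    for (int i = 0; i < bytes.length; i++) {
        bytes[i] = bytesList.get(i);
    }
    // Name the charset explicitly; new String(bytes, 0, len) would use the platform default.
    return new String(bytes, StandardCharsets.UTF_8);
}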
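
One way to narrow this kind of problem down is to check what the file actually contains. The following is a minimal sketch, not part of the original post: the class name EscapeCheck, the helper firstLine and both sample contents are made up for illustration. It decodes two byte sequences as UTF-8, one holding the escape as literal ASCII text and one holding the real UTF-8 bytes for the same two characters. If a file holds the former, no charset argument can change the result, because a backslash, a 'u' and four hex digits are ordinary ASCII characters in every common encoding.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

class EscapeCheck {

    // Reads the first line of the stream as UTF-8, the same way load() does.
    static String firstLine(InputStream is) throws IOException {
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(is, StandardCharsets.UTF_8))) {
            return br.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical content A: the escape is stored as plain ASCII text
        // (the doubled backslash here produces one literal backslash).
        byte[] escaped = "chinese = \\u4f60\\u597d".getBytes(StandardCharsets.US_ASCII);
        // Hypothetical content B: the two characters are stored as real UTF-8 bytes.
        byte[] utf8 = "chinese = \u4f60\u597d".getBytes(StandardCharsets.UTF_8);

        String a = firstLine(new ByteArrayInputStream(escaped));
        String b = firstLine(new ByteArrayInputStream(utf8));

        // A keeps all twelve escape characters; B holds two real characters.
        System.out.println("A length: " + a.length()); // 22
        System.out.println("B length: " + b.length()); // 12
        System.out.println("A contains a backslash: " + (a.indexOf('\\') >= 0)); // true
    }
}
```

Comparing lengths rather than printing the strings keeps the check independent of what the console can display.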

1 answer

颜霖

2023-03-14

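In case anyone else runs into the same trouble, I did manage to find a solution. Since ResourceBundle always did the right thing for me, I dug into why, and found that java.util.Properties does all the magic in its loadConvert() function. In other words, the file itself contains the escapes as literal text, so the charset passed to InputStreamReader is not the issue. After the BufferedReader hands me a line of text from the file, I need to explicitly decode the Unicode escape sequences in that string, roughly like this: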

public void load(InputStream is) throws Exception {

    BufferedReader br = null;

    try {
        // Passing "UTF8" or "UTF-8" to this constructor makes no difference for me:
        br = new BufferedReader(new InputStreamReader(is, StandardCharsets.UTF_8));
        String line = null;         
        while ((line = br.readLine()) != null) {
            // The following prints "got line: chinese = \u4f60\u597d" instead of "got line: chinese = 你好"
            System.out.println("got line: " + line);
            line = decodeUni(line);
            // The following prints "decoded line: chinese = 你好" exactly as it should!
            System.out.println("decoded line: " + line);
        }
    } finally {
        if (br != null) {
            br.close();
        }
    }       
}

// Converts encoded "\\uxxxx" to unicode chars
private String decodeUni(String string) {

    char[] charsIn = string.toCharArray();
    int len = charsIn.length;
    char[] charsOut = new char[len];
    char ch;
    int outLen = 0;
    int off = 0;
    int end = off + len;

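    while (off < end) {
        ch = charsIn[off++];
        // Is this the start of an escape such as "\\uxxxx"?
        // (A lone trailing backslash is copied through unchanged.)
        if (ch == '\\' && off < end) {
            ch = charsIn[off++];
            if(ch == 'u') {
                // Yep! Convert the hex part to the correct character.
                int value = 0;
                for (int i = 0; i < 4; i++) {
                    // Fewer than four hex digits left: report it instead of overrunning the array.
                    if (off >= end) {
                        throw new IllegalArgumentException("Malformed \\uxxxx encoding: " + string);
                    }
                    ch = charsIn[off++];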
                    switch (ch) {
                        case '0': case '1': case '2': case '3': case '4':
                        case '5': case '6': case '7': case '8': case '9': {
                            value = (value << 4) + ch - '0';
                            break;
                        }
                        case 'a': case 'b': case 'c': case 'd': case 'e': case 'f': {
                            value = (value << 4) + 10 + ch - 'a';
                            break;
                        }
                        case 'A': case 'B': case 'C': case 'D': case 'E': case 'F': {
                            value = (value << 4) + 10 + ch - 'A';
                            break;
                        }
                        default: throw new IllegalArgumentException("Malformed \\uxxxx encoding: " + string);
                    }
                }
                charsOut[outLen++] = (char)value;
            } else {
                // Starts with a slash but not "\\u", handle the other possible escaped characters.
                switch (ch) {
                    case 't':
                        ch = '\t';
                        break;
                    case 'r':
                        ch = '\r';
                        break;
                    case 'n':
                        ch = '\n'; 
                        break;
                    case 'f':
                        ch = '\f';
                        break;
                    default:
                        break;
                }
                charsOut[outLen++] = ch;
            }
        } else {
            // Doesn't start with a slash, leave as-is.
            charsOut[outLen++] = ch;
        }
    }
    return new String(charsOut, 0, outLen).trim();
}
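
As a possible alternative when the file follows the .properties format (key = value lines), java.util.Properties.load(Reader) applies this escape decoding itself while parsing. The following is a minimal sketch under that assumption; the class name PropertiesEscapeDemo, the helper loadProps and the sample content are hypothetical, and whether Properties is acceptable depends on the same project constraints that ruled out ResourceBundle.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

class PropertiesEscapeDemo {

    // Loads key/value pairs; Properties decodes \uXXXX escapes while parsing.
    static Properties loadProps(Reader reader) throws IOException {
        Properties props = new Properties();
        props.load(reader);
        return props;
    }

    public static void main(String[] args) throws IOException {
        // Sample content with the escape stored as literal text
        // (the doubled backslash produces one literal backslash).
        String content = "chinese = \\u4f60\\u597d\n";
        Properties props = loadProps(new StringReader(content));

        String value = props.getProperty("chinese");
        // The twelve escape characters have become two real characters.
        System.out.println("decoded length: " + value.length()); // 2
        System.out.println("decoded value: " + value);
    }
}
```

For a real file, the StringReader would be replaced by new InputStreamReader(is, StandardCharsets.UTF_8), so that any non-escaped characters in the file are still read as UTF-8.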

