Preserving numeric character entities (for example `&#13;` and `&#10;`) when parsing XML in Java

2023-03-14

Question:

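I am parsing XML in Java that contains numeric character entities, such as (but not limited to) `&#10;` and `&#13;` (line feed and carriage return). While parsing, I append the text content of the nodes to a StringBuilder so that I can write it to a text file later.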

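However, when I write the String to a file or print it, these entities have already been resolved and turned into actual newline/whitespace characters.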
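
The behavior described here can be reproduced with a minimal sketch. It assumes an in-memory XML string that mirrors the demo attribute; the class name `EntityResolutionDemo` and the helper `rootAttribute` are hypothetical names chosen for illustration. Printing the code points of the parsed value shows that the parser has already replaced each character reference with the character it refers to (13 for carriage return, 10 for line feed).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; the XML string mirrors the attribute in the demo file.
final class EntityResolutionDemo {

    // Parses the XML text and returns the named attribute of the root element.
    static String rootAttribute(String xml, String attributeName) throws Exception {
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return document.getDocumentElement().getAttribute(attributeName);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Field attributeWithChar=\"symbols &#13;  &#10;\" />";
        String value = rootAttribute(xml, "attributeWithChar");
        // Print every code point so the control characters become visible.
        value.chars().forEach(c -> System.out.print(c + " "));
        System.out.println();
    }
}
```

By the time the DOM is available, the original `&#13;` notation is gone, so it cannot be recovered from the parsed value alone.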

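How can I keep the original numeric character entity notation when iterating over the nodes of an XML file in Java and storing their text content in a String?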

Example demo XML file:

<?xml version="1.0" encoding="UTF-8"?>
<ABCD version="2">    
    <Field attributeWithChar="A string followed by special symbols &#13;  &#10;" />
</ABCD>

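Sample Java code. It loads the XML, iterates over the elements, and collects the text content of each element's attributes into a StringBuilder. After the iteration it writes the StringBuilder to the console and to a file, but the `&#13;` and `&#10;` symbols are not written as symbols.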

How can I preserve these symbols when storing them in a string? Could you please help me? Thank you.

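import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Demo is an arbitrary class name.
public class Demo {
    public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory documentFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder documentBuilder = documentFactory.newDocumentBuilder();
        Document document = documentBuilder.parse(new File("path/to/demo.xml"));
        StringBuilder sb = new StringBuilder();

        // Collect the value of every attribute on every element.
        NodeList nodeList = document.getElementsByTagName("*");
        for (int i = 0; i < nodeList.getLength(); i++) {
            Node node = nodeList.item(i);
            if (node.getNodeType() == Node.ELEMENT_NODE) {
                NamedNodeMap nnp = node.getAttributes();
                for (int j = 0; j < nnp.getLength(); j++) {
                    sb.append(nnp.item(j).getTextContent());
                }
            }
        }
        System.out.println(sb.toString());

        // Write the collected text to a UTF-8 encoded file.
        try (Writer writer = new BufferedWriter(new OutputStreamWriter(
                new FileOutputStream("path/to/demo_output.xml"), StandardCharsets.UTF_8))) {
            writer.write(sb.toString());
        }
    }
}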

Answer:

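You need to escape all XML entities before parsing the file into a Document. You can do this by escaping the ampersand `&` with its corresponding XML entity, `&amp;`. The parser then sees `&amp;#13;` and decodes it back to the literal text `&#13;`. Something like: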

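import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

DocumentBuilder documentBuilder =
        DocumentBuilderFactory.newInstance().newDocumentBuilder();

String xmlContents = new String(Files.readAllBytes(Paths.get("demo.xml")), StandardCharsets.UTF_8);

// Escape every '&' before parsing so that entity references survive as literal text.
Document document = documentBuilder.parse(
        new InputSource(new StringReader(xmlContents.replace("&", "&amp;"))));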

Output:

2A string followed by special symbols &#13;  &#10;
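
The approach can be tried end to end with the following minimal sketch, which works on an in-memory string instead of a file. The class name `PreserveEntitiesDemo` and its helper methods are hypothetical names for illustration. `escapeAllAmpersands` follows the answer as written. `escapeNumericReferences` is an assumed variant, not part of the original answer, that escapes only numeric character references, so named entities such as `&amp;` are still decoded normally.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class comparing two ways of escaping before parsing.
final class PreserveEntitiesDemo {

    // The answer's approach: escape every '&', so all entities survive as literal text.
    static String escapeAllAmpersands(String xml) {
        return xml.replace("&", "&amp;");
    }

    // Assumed variant: escape only decimal or hexadecimal character references.
    static String escapeNumericReferences(String xml) {
        return xml.replaceAll("&(#[0-9]+;|#x[0-9a-fA-F]+;)", "&amp;$1");
    }

    // Parses the XML text and returns the named attribute of the root element.
    static String rootAttribute(String xml, String attributeName) throws Exception {
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return document.getDocumentElement().getAttribute(attributeName);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Field attributeWithChar=\"A &amp; B &#13;  &#10;\" />";
        // prints: A &amp; B &#13;  &#10;
        System.out.println(rootAttribute(escapeAllAmpersands(xml), "attributeWithChar"));
        // prints: A & B &#13;  &#10;
        System.out.println(rootAttribute(escapeNumericReferences(xml), "attributeWithChar"));
    }
}
```

Escaping every ampersand is the simplest option when the output should reproduce the source text exactly. The narrower pattern may be preferable when named entities should still be resolved by the parser.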
