Question:

C#: creating a Word document with OpenXML gives an XML parsing error (when the replacement string contains spaces)

谭玉泽

2023-03-14

I am trying to create a Word document from a Word template using OpenXML in my C# application. Here is my code so far:

DirectoryInfo tempDir = new DirectoryInfo(Server.MapPath("~\\Files\\WordTemplates\\"));

DirectoryInfo docsDir = new DirectoryInfo(Server.MapPath("~\\Files\\FinanceDocuments\\"));

string ype = "test Merge"; //if ype string contains spaces then I get this error
string sourceFile = tempDir + "\\PaymentOrderTemplate.dotx";
string destinationFile = docsDir + "\\" + "PaymentOrder.doc";

// Create a copy of the template file and open the copy 
File.Copy(sourceFile, destinationFile, true);

// Create key/value pairs: each key is the text to be replaced,
// and each value is the text that takes its place in the document.
Dictionary<string, string> keyValues = new Dictionary<string, string>();
keyValues.Add("ype", ype);                
SearchAndReplace(destinationFile, keyValues);
Process.Start(destinationFile);

And the SearchAndReplace function:

public static void SearchAndReplace(string document, Dictionary<string, string> dict)
{
    using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(document, true))
    {
        string docText = null;

        using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
        {
            docText = sr.ReadToEnd();
        }

        foreach (KeyValuePair<string, string> item in dict)
        {
            Regex regexText = new Regex(item.Key);
            docText = regexText.Replace(docText, item.Value);
        }

        using (StreamWriter sw = new StreamWriter(
                  wordDoc.MainDocumentPart.GetStream(FileMode.Create)))
        {
            sw.Write(docText);
        }
    }
}

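However, when I try to open the exported file, I get the following error:

XML parsing error

Location: Part: /word/document.xml, Line: 2, Column: 2142

The first lines of document.xml: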

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>


<w:document xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" xmlns:cx="http://schemas.microsoft.com/office/drawing/2014/chartex" xmlns:cx1="http://schemas.microsoft.com/office/drawing/2015/9/8/chartex" xmlns:cx2="http://schemas.microsoft.com/office/drawing/2015/10/21/chartex" xmlns:cx3="http://schemas.microsoft.com/office/drawing/2016/5/9/chartex" xmlns:cx4="http://schemas.microsoft.com/office/drawing/2016/5/10/chartex" xmlns:cx5="http://schemas.microsoft.com/office/drawing/2016/5/11/chartex" xmlns:cx6="http://schemas.microsoft.com/office/drawing/2016/5/12/chartex" xmlns:cx7="http://schemas.microsoft.com/office/drawing/2016/5/13/chartex" xmlns:cx8="http://schemas.microsoft.com/office/drawing/2016/5/14/chartex" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:aink="http://schemas.microsoft.com/office/drawing/2016/ink" xmlns:am3d="http://schemas.microsoft.com/office/drawing/2017/model3d" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml" xmlns:w16cid="http://schemas.microsoft.com/office/word/2016/wordml/cid" xmlns:w16se="http://schemas.microsoft.com/office/word/2015/wordml/symex" xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup" xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" 
xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape" mc:Ignorable="w14 w15 w16se w16cid wp14">

<w:body>

<w:tbl>

<w:tblPr>

<w:tblW w:w="10348" w:ttest Merge="dxa"/>

<w:tblInd w:w="108" w:ttest Merge="dxa"/>

<w:tblBorders>

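Edit: I found that this problem occurs because I use merge fields in the Word template. If I use plain text, it works. But in that case it will be slow, because it has to check every word in the template and replace it if it matches. Is it possible to do this another way?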

1 answer

邰伟彦

2023-03-14

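Disclaimer: you appear to be using the OpenXML SDK, since your code looks almost identical to the code found here: https://msdn.microsoft.com/en-us/library/bb508261(v=office.12).aspx. I have never used this SDK in my life, and my answer is based on an educated guess about what is happening.

The operation you perform on the Word document appears to be affecting parts of the document you did not intend to touch.

I believe the call to MainDocumentPart.GetStream() just gives you more or less direct access to the document's XML, which you then handle as text and run a series of plain-text replacements on? I think this is very likely the cause of the problem: you intend to edit the document text, but you accidentally corrupt the XML node structure in the process.

Here is a simple HTML document: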

<html>
 <head><title>Damage report</title></head>
 <body>
  <p>The soldier was shot once in the body and twice in the head</p>
 </body>
</html>

You decide to run a find/replace to be more specific about where the soldier was shot:

var html = File.ReadAllText(@"c:\my.html");
html = html.Replace("body", "chest");
html = html.Replace("head", "forehead");
File.WriteAllText(@"c:\my.html", html);

There is just one problem: your document is now ruined:

<html>
 <forehead><title>Damage report</title></forehead>
 <chest>
  <p>The soldier was shot once in the chest and twice in the forehead</p>
 </chest>
</html>

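A browser can no longer make sense of it (well, I suppose it still parses, but it is meaningless), because the replace operation broke things.

You are replacing "ype" with "test Merge", but this appears to be clobbering occurrences of the word "type", which very likely appear in XML attribute or element names, and turning them into "ttest Merge". The w:ttest Merge="dxa" attributes in your output were w:type="dxa" before the replacement, and an XML attribute name cannot contain a space, which would explain why the error only shows up when the replacement string contains one.

To change the text content of XML document nodes correctly, you should parse the text into an XML document object model representation, iterate over the nodes, change the text, and then serialize the whole thing back to XML text. The Office SDK does appear to provide a way to do this, because you can treat the document as a collection of class object instances and write code such as the following snippet (also from MSDN):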
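
The effect can be reproduced in isolation. The following is a minimal sketch, assuming the template originally contained an element like the w:tblW line shown in the question; the class and method names (ReplaceDemo, NaiveReplace) are made up for illustration.

```csharp
using System;

public static class ReplaceDemo
{
    // The same kind of plain-text replacement the question applies to document.xml.
    // It cannot tell document text apart from element and attribute names.
    public static string NaiveReplace(string xml, string key, string value)
    {
        return xml.Replace(key, value);
    }

    public static void Main()
    {
        // Assumed original form of the attribute, before the replacement ran.
        string xml = "<w:tblW w:w=\"10348\" w:type=\"dxa\"/>";

        Console.WriteLine(NaiveReplace(xml, "ype", "test Merge"));
        // prints: <w:tblW w:w="10348" w:ttest Merge="dxa"/>
    }
}
```

The output matches the broken markup in the question: the key "ype" was found inside the attribute name "type", not inside any visible document text.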

// Create a Wordprocessing document. 
using (WordprocessingDocument myDoc = WordprocessingDocument.Create(docName, WordprocessingDocumentType.Document)) 
{ 
   // Add a new main document part. 
   MainDocumentPart mainPart = myDoc.AddMainDocumentPart(); 
   //Create DOM tree for simple document. 
   mainPart.Document = new Document(); 
   Body body = new Body(); 
   Paragraph p = new Paragraph(); 
   Run r = new Run(); 
   Text t = new Text("Hello World!"); 
   //Append elements appropriately. 
   r.Append(t); 
   p.Append(r); 
   body.Append(p); 
   mainPart.Document.Append(body); 
   // Save changes to the main document part. 
   mainPart.Document.Save(); 
}

You should look for another way to access the document elements rather than using streams / direct low-level XML access. For example:

https://blogs.msdn.microsoft.com/brian_jones/2009/01/28/traversing-in-the-open-xml-dom/ 
https://www.gemboxsoftware.com/document/articles/find-replace-word-csharp

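Or you could start from a related SO question such as: Search and replace text in OpenXML (added file) (though the answer you need may be in the links within that question).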
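
As a rough sketch of what DOM-based access could look like with the OpenXML SDK (untested against your template, and the class name DomSearchAndReplace is made up for illustration), the replacement can be limited to the text nodes of the document, so element and attribute names are never touched:

```csharp
using System.Collections.Generic;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class DomSearchAndReplace
{
    // Replaces each key with its value inside text (w:t) elements only.
    public static void Replace(string path, IDictionary<string, string> replacements)
    {
        using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(path, true))
        {
            Document document = wordDoc.MainDocumentPart.Document;

            // Visit only the text elements of the main document part.
            foreach (Text text in document.Descendants<Text>())
            {
                foreach (KeyValuePair<string, string> item in replacements)
                {
                    // Literal string replacement, so keys are not treated as regex patterns.
                    text.Text = text.Text.Replace(item.Key, item.Value);
                }
            }

            // Serialize the modified DOM back into the package.
            document.Save();
        }
    }
}
```

Two caveats with this sketch: Word may split what looks like one word across several runs, in which case a key will not be found in any single text element, and merge fields are stored as field codes rather than plain text, so they may need separate handling. A key as short as "ype" will also still match inside ordinary words such as "type" in the visible text, so a more distinctive placeholder is safer.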
