Question:

Open XML - Find and replace multiple placeholders in document template [duplicate]

暴英达

2023-03-14

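I know there are many posts on this topic, but none of them seems to address this exact problem. I am trying to build a small generic document generator POC. I am using Open XML.

My code looks like this: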

    private static void ReplacePlaceholders<T>(string templateDocumentPath, T templateObject)
        where T : class
    {

        using (var templateDocument = WordprocessingDocument.Open(templateDocumentPath, true))
        {
            string templateDocumentText = null;
            using (var streamReader = new StreamReader(templateDocument.MainDocumentPart.GetStream()))
            {
                templateDocumentText = streamReader.ReadToEnd();
            }

            var props = templateObject.GetType().GetProperties();
            foreach (var prop in props)
            {
                var regexText = new Regex($"{prop.Name}");
                templateDocumentText =
                    regexText.Replace(templateDocumentText, prop.GetValue(templateObject).ToString());
            }

            using var streamWriter = new StreamWriter(templateDocument.MainDocumentPart.GetStream(FileMode.Create));
            streamWriter.Write(templateDocumentText);
        }
    }

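The code mostly works as expected. The problem is the following:

The text returned by StreamReader.ReadToEnd() has my placeholders split across tags, so my replace method only replaces the words that are not split.

In this case, my code searches for the word "Firstname" but only finds "irstname", so it does not replace it.

Is there a way to scan the whole .docx word by word and replace the placeholders?

(Edit) A partial solution/workaround I found: I noticed that you have to type the placeholder into the .docx in one go (without re-editing it). For example, if I write "firstname" and then go back and change it to "Firstname", Word splits the word into "F" and "irstname". If the word is not edited, it is not split.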

孔磊

2023-03-14

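In short, the solution to your problem is to use the OpenXmlRegex utility class of the Open-Xml-PowerTools, as demonstrated in the unit test further below.

With Open XML, the same text can be represented in multiple ways. If Microsoft Word was involved in creating the Open XML markup, the edits made to produce that text play an important role, because Word keeps track of which edits were made in which editing session. So, for example, the w:p (paragraph) elements shown in the following two extreme scenarios represent precisely the same text. Anything between those two extremes is possible, so any real solution must be able to deal with that.

The following markup is nice and simple: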

<w:p>
  <w:r>
    <w:t>Firstname</w:t>
  </w:r>
</w:p>

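While you will typically not find the following markup, it represents the theoretical extreme in which each character has its own w:r and w:t element.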

<w:p>
  <w:r><w:t>F</w:t></w:r>
  <w:r><w:t>i</w:t></w:r>
  <w:r><w:t>r</w:t></w:r>
  <w:r><w:t>s</w:t></w:r>
  <w:r><w:t>t</w:t></w:r>
  <w:r><w:t>n</w:t></w:r>
  <w:r><w:t>a</w:t></w:r>
  <w:r><w:t>m</w:t></w:r>
  <w:r><w:t>e</w:t></w:r>
</w:p>
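
That both extremes carry the same text can be checked with a few lines of LINQ to XML. The following is only a minimal sketch and not part of the Open-Xml-PowerTools; the class SplitRunDemo and its method names are hypothetical and chosen for illustration. It assumes paragraphs that contain nothing but plain w:r and w:t elements.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class SplitRunDemo
{
    private static readonly XNamespace Ns =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Builds a w:p with one unformatted w:r/w:t pair per given text fragment.
    public static XElement CreateParagraph(params string[] runTexts) =>
        new XElement(Ns + "p",
            new XAttribute(XNamespace.Xmlns + "w", Ns.NamespaceName),
            runTexts.Select(rt => new XElement(Ns + "r", new XElement(Ns + "t", rt))));

    // Concatenates the values of all w:t descendants, i.e., the visible text.
    public static string GetText(XElement paragraph) =>
        string.Concat(paragraph.Descendants(Ns + "t").Select(t => t.Value));

    // Searches the serialized markup, which is what a replacement on the raw
    // part stream (as in the question) effectively does.
    public static bool MarkupContains(XElement paragraph, string value) =>
        paragraph.ToString(SaveOptions.DisableFormatting).Contains(value);
}
```

For a paragraph built with CreateParagraph("F", "irstname"), GetText returns "Firstname", whereas MarkupContains(paragraph, "Firstname") is false because the run boundary sits between "F" and "irstname" in the markup. This is the failure described in the question.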

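You might ask why I use this extreme example if it does not occur in practice. The answer is that it plays an essential role in the solution, in case you want to roll your own.

To get it right, you have to:

1. transform the runs (w:r) of your paragraph (w:p) into single-character runs (i.e., w:r elements with one single-character w:t or one w:sym each), retaining the run properties (w:rPr);
2. perform the search-and-replace operation on those single-character runs (using a few other tricks); and
3. considering the potentially different run properties (w:rPr) of the runs resulting from the search-and-replace operation, transform those resulting runs back into the fewest "coalesced" runs required to represent the text and its formatting.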
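
The three steps can be sketched with LINQ to XML as follows. This is a deliberately naive illustration under stated assumptions and not the Open-Xml-PowerTools implementation: the class NaiveRunReplacer is a hypothetical name, the search is a literal (non-regex) match, the paragraph is assumed to contain only w:pPr and w:r children, and fields, content controls, and revision markup are ignored entirely.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class NaiveRunReplacer
{
    private static readonly XNamespace Ns =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    public static void Replace(XElement paragraph, string search, string replacement)
    {
        // Step 1: split every run into single-character runs, copying w:rPr.
        List<XElement> singles = paragraph.Elements(Ns + "r").SelectMany(Split).ToList();

        // One char per single run; non-text content (e.g., w:sym) is mapped
        // to U+FFFC so that it can never be part of a match.
        string text = string.Concat(
            singles.Select(r => r.Element(Ns + "t")?.Value ?? "\uFFFC"));

        // Step 2: search and replace on the single-character runs.
        var replaced = new List<XElement>();
        int pos = 0;
        while (pos < singles.Count)
        {
            if (search.Length > 0 &&
                string.CompareOrdinal(text, pos, search, 0, search.Length) == 0)
            {
                // The replacement inherits the formatting of the first matched character.
                replaced.Add(MakeRun(singles[pos].Element(Ns + "rPr"), NewText(replacement)));
                pos += search.Length;
            }
            else
            {
                replaced.Add(singles[pos++]);
            }
        }

        // Step 3: coalesce adjacent text runs that have identical run properties.
        var coalesced = new List<XElement>();
        foreach (XElement run in replaced)
        {
            XElement last = coalesced.LastOrDefault();
            XElement t = run.Element(Ns + "t");
            XElement lastT = last?.Element(Ns + "t");
            if (t != null && lastT != null &&
                XNode.DeepEquals(run.Element(Ns + "rPr"), last.Element(Ns + "rPr")))
                lastT.Value += t.Value;
            else
                coalesced.Add(run);
        }

        // Swap the old runs for the coalesced ones (assumes only w:pPr and w:r children).
        paragraph.Elements(Ns + "r").Remove();
        paragraph.Add(coalesced);
    }

    // Yields one run per character of each w:t and one run per other child (e.g., w:sym).
    private static IEnumerable<XElement> Split(XElement run)
    {
        XElement rPr = run.Element(Ns + "rPr");
        foreach (XElement child in run.Elements().Where(e => e.Name != Ns + "rPr"))
        {
            if (child.Name == Ns + "t")
                foreach (char c in child.Value)
                    yield return MakeRun(rPr, NewText(c.ToString()));
            else
                yield return MakeRun(rPr, new XElement(child));
        }
    }

    // Always sets xml:space="preserve" so leading and trailing spaces survive.
    private static XElement NewText(string value) =>
        new XElement(Ns + "t", new XAttribute(XNamespace.Xml + "space", "preserve"), value);

    private static XElement MakeRun(XElement rPr, XElement content) =>
        new XElement(Ns + "r", rPr == null ? null : new XElement(rPr), content);
}
```

Splitting into single-character runs is what makes the match independent of the original run boundaries, and coalescing afterwards keeps the markup from staying in the extreme one-run-per-character form.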

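When replacing text, you should not lose or alter the formatting of text that is not affected by the replacement. Nor should you remove unaffected fields or content controls (w:sdt). Oh, and by the way, don't forget revision markup such as w:ins and w:del...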

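The good news is that you don't have to roll your own. The OpenXmlRegex utility class of Eric White's Open-Xml-PowerTools implements the above algorithm (and more). I have successfully used it in large-scale RFP and contracting scenarios and have also contributed back to it.

In this section, I am going to demonstrate how to use the Open-Xml-PowerTools to replace the placeholder text "Firstname" (as in the question) with various first names (using "Bernie" in the sample output document).

Let's first look at the following sample document, which is created by the unit test shown a little later. Note that we have formatted runs and a symbol. As in the question, the placeholder "Firstname" is split into two runs, i.e., "F" and "irstname".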

<?xml version="1.0" encoding="utf-8"?>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p>
      <w:r>
        <w:rPr>
          <w:i />
        </w:rPr>
        <w:t xml:space="preserve">Hello </w:t>
      </w:r>
      <w:r>
        <w:rPr>
          <w:b />
        </w:rPr>
        <w:t>F</w:t>
      </w:r>
      <w:r>
        <w:rPr>
          <w:b />
        </w:rPr>
        <w:t>irstname</w:t>
      </w:r>
      <w:r>
        <w:t xml:space="preserve"> </w:t>
      </w:r>
      <w:r>
        <w:sym w:font="Wingdings" w:char="F04A" />
      </w:r>
    </w:p>
  </w:body>
</w:document>

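The following is the document that results from replacing "Firstname" with "Bernie", provided you do it right. Note that the formatting is retained and that we have not lost the symbol.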

<?xml version="1.0" encoding="utf-8"?>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p>
      <w:r>
        <w:rPr>
          <w:i />
        </w:rPr>
        <w:t xml:space="preserve">Hello </w:t>
      </w:r>
      <w:r>
        <w:rPr>
          <w:b />
        </w:rPr>
        <w:t>Bernie</w:t>
      </w:r>
      <w:r>
        <w:t xml:space="preserve"> </w:t>
      </w:r>
      <w:r>
        <w:sym w:font="Wingdings" w:char="F04A" />
      </w:r>
    </w:p>
  </w:body>
</w:document>

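Next, here is a complete unit test that demonstrates how to use the OpenXmlRegex.Replace() method. The unit test also demonstrates that this works regardless of how the placeholder (e.g., "Firstname") is split across one or more runs: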

[Theory]
[InlineData("1 Run", "Firstname", new[] { "Firstname" }, "Albert")]
[InlineData("2 Runs", "Firstname", new[] { "F", "irstname" }, "Bernie")]
[InlineData("9 Runs", "Firstname", new[] { "F", "i", "r", "s", "t", "n", "a", "m", "e" }, "Charly")]
public void Replace_PlaceholderInOneOrMoreRuns_SuccessfullyReplaced(
    string example,
    string propName,
    IEnumerable<string> runTexts,
    string replacement)
{
    // Create a test WordprocessingDocument on a MemoryStream.
    using MemoryStream stream = CreateWordprocessingDocument(runTexts);

    // Save the Word document before replacing the placeholder.
    // You can use this to inspect the input Word document.
    File.WriteAllBytes($"{example} before Replacing.docx", stream.ToArray());

    // Replace the placeholder identified by propName with the replacement text.
    using (WordprocessingDocument wordDocument = WordprocessingDocument.Open(stream, true))
    {
        // Read the root element, a w:document in this case.
        // Note that GetXElement() is a shortcut for GetXDocument().Root.
        // This caches the root element and we can later write it back
        // to the main document part, using the PutXDocument() method.
        XElement document = wordDocument.MainDocumentPart.GetXElement();

        // Specify the parameters of the OpenXmlRegex.Replace() method,
        // noting that the replacement is given as a parameter.
        IEnumerable<XElement> content = document.Descendants(W.p);
        var regex = new Regex(propName);

        // Perform the replacement, thereby modifying the root element.
        OpenXmlRegex.Replace(content, regex, replacement, null);

        // Write the changed root element back to the main document part.
        wordDocument.MainDocumentPart.PutXDocument();
    }

    // Assert that we have done it right.
    AssertReplacementWasSuccessful(stream, replacement);

    // Save the Word document after having replaced the placeholder.
    // You can use this to inspect the output Word document.
    File.WriteAllBytes($"{example} after Replacing.docx", stream.ToArray());
}

private static MemoryStream CreateWordprocessingDocument(IEnumerable<string> runTexts)
{
    var stream = new MemoryStream();
    const WordprocessingDocumentType type = WordprocessingDocumentType.Document;

    using (WordprocessingDocument wordDocument = WordprocessingDocument.Create(stream, type))
    {
        MainDocumentPart mainDocumentPart = wordDocument.AddMainDocumentPart();
        mainDocumentPart.PutXDocument(new XDocument(CreateDocument(runTexts)));
    }

    return stream;
}

private static XElement CreateDocument(IEnumerable<string> runTexts)
{
    // Produce a w:document with a single w:p that contains:
    // (1) one italic run with some lead-in, i.e., "Hello " in this example;
    // (2) one or more bold runs for the placeholder, which might or might not be split;
    // (3) one run with just a space; and
    // (4) one run with a symbol (i.e., a Wingdings smiley face).
    return new XElement(W.document,
        new XAttribute(XNamespace.Xmlns + "w", "http://schemas.openxmlformats.org/wordprocessingml/2006/main"),
        new XElement(W.body,
            new XElement(W.p,
                new XElement(W.r,
                    new XElement(W.rPr,
                        new XElement(W.i)),
                    new XElement(W.t,
                        new XAttribute(XNamespace.Xml + "space", "preserve"),
                        "Hello ")),
                runTexts.Select(rt =>
                    new XElement(W.r,
                        new XElement(W.rPr,
                            new XElement(W.b)),
                        new XElement(W.t, rt))),
                new XElement(W.r,
                    new XElement(W.t,
                        new XAttribute(XNamespace.Xml + "space", "preserve"),
                        " ")),
                new XElement(W.r,
                    new XElement(W.sym,
                        new XAttribute(W.font, "Wingdings"),
                        new XAttribute(W._char, "F04A"))))));
}

private static void AssertReplacementWasSuccessful(MemoryStream stream, string replacement)
{
    using WordprocessingDocument wordDocument = WordprocessingDocument.Open(stream, false);

    XElement document = wordDocument.MainDocumentPart.GetXElement();
    XElement paragraph = document.Descendants(W.p).Single();
    List<XElement> runs = paragraph.Elements(W.r).ToList();

    // We have the expected number of runs, i.e., the lead-in, the first name,
    // a space character, and the symbol.
    Assert.Equal(4, runs.Count);

    // We still have the lead-in "Hello " and it is still formatted in italics.
    Assert.True(runs[0].Value == "Hello " && runs[0].Elements(W.rPr).Elements(W.i).Any());

    // We have successfully replaced our "Firstname" placeholder and the
    // concrete first name is formatted in bold, exactly like the placeholder.
    Assert.True(runs[1].Value == replacement && runs[1].Elements(W.rPr).Elements(W.b).Any());

    // We still have the space between the first name and the symbol and it
    // is unformatted.
    Assert.True(runs[2].Value == " " && !runs[2].Elements(W.rPr).Any());

    // Finally, we still have our smiley face symbol run.
    Assert.True(IsSymbolRun(runs[3], "Wingdings", "F04A"));
}

private static bool IsSymbolRun(XElement run, string fontValue, string charValue)
{
    XElement sym = run.Elements(W.sym).FirstOrDefault();
    if (sym == null) return false;

    return (string) sym.Attribute(W.font) == fontValue &&
           (string) sym.Attribute(W._char) == charValue;
}

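While it might be tempting to use the InnerText property of the Paragraph class (or of other subclasses of the OpenXmlElement class), the problem is that you will ignore any markup other than text (w:t). For example, if your paragraph contains symbols (w:sym elements, e.g., the smiley face used in the example above), those will be lost because the InnerText property does not consider them. The following unit test demonstrates that: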

[Theory]
[InlineData("Hello Firstname ", new[] { "Firstname" })]
[InlineData("Hello Firstname ", new[] { "F", "irstname" })]
[InlineData("Hello Firstname ", new[] { "F", "i", "r", "s", "t", "n", "a", "m", "e" })]
public void InnerText_ParagraphWithSymbols_SymbolIgnored(string expectedInnerText, IEnumerable<string> runTexts)
{
    // Create Word document with smiley face symbol at the end.
    using MemoryStream stream = CreateWordprocessingDocument(runTexts);
    using WordprocessingDocument wordDocument = WordprocessingDocument.Open(stream, false);

    Document document = wordDocument.MainDocumentPart.Document;
    Paragraph paragraph = document.Descendants<Paragraph>().Single();

    string innerText = paragraph.InnerText;

    // Note that the innerText does not contain the smiley face symbol.
    Assert.Equal(expectedInnerText, innerText);
}

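Note that you might not need to consider all of the above in simple use cases. But if you have to deal with real-life documents or the markup changes made by Microsoft Word, chances are you cannot ignore the complexity. And wait until you need to deal with revision markup...

As always, the full source code can be found in my CodeSnippets GitHub repository. Look for the OpenXmlRegexTests class.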
