Question:

docx4j conversion html->docx->html

吴松

2023-03-14

(*Cross-posted from http://www.docx4java.org/forums/xhtml-import-f28/html-docx-html-inserts-a-lot-of-space-t1966.html#p6791?sid=78b64a02482926c4dbdbbafbf50d0a914; will be updated when answered.)

I have created an HTML test document with the following content:

<html><ul><li>TEST LINE 1</li><li>TEST LINE 2</li></ul></html>

My code then creates a docx from this HTML and converts that docx back to HTML, as follows:

    // Package names as in docx4j 3.x; they may differ in other versions.
    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.OutputStream;
    import javax.xml.transform.stream.StreamResult;
    import org.docx4j.XmlUtils;
    import org.docx4j.convert.in.xhtml.XHTMLImporterImpl;
    import org.docx4j.convert.out.HTMLSettings;
    import org.docx4j.convert.out.html.AbstractHtmlExporter;
    import org.docx4j.convert.out.html.HtmlExporterNG2;
    import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
    import org.docx4j.openpackaging.parts.WordprocessingML.NumberingDefinitionsPart;

    // Step 1: HTML -> docx
    WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.createPackage();

    // The <ul> list needs numbering definitions, so add the default numbering part
    NumberingDefinitionsPart ndp = new NumberingDefinitionsPart();
    wordMLPackage.getMainDocumentPart().addTargetPart(ndp);
    ndp.unmarshalDefaultNumbering();

    XHTMLImporterImpl xHTMLImporter = new XHTMLImporterImpl(wordMLPackage);
    xHTMLImporter.setHyperlinkStyle("Hyperlink");

    wordMLPackage.getMainDocumentPart().getContent()
            .addAll(xHTMLImporter.convert(new File("test.html"), null));

    // Dump the generated document.xml for inspection
    System.out.println(XmlUtils.marshaltoString(wordMLPackage
            .getMainDocumentPart().getJaxbElement(), true, true));

    wordMLPackage.save(new File("test.docx"));

    // Step 2: docx -> HTML
    WordprocessingMLPackage docx = WordprocessingMLPackage.load(new File("test.docx"));
    AbstractHtmlExporter exporter = new HtmlExporterNG2();
    HTMLSettings htmlSettings = new HTMLSettings();
    // Write to a separate file so the source test.html is not overwritten
    try (OutputStream os = new FileOutputStream("test-out.html")) {
        exporter.html(docx, new StreamResult(os), htmlSettings);
    }

The resulting HTML is:

<?xml version="1.0" encoding="UTF-8"?><html xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
<head>
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
<style>
<!--/*paged media */ div.header {display: none }div.footer {display: none } /*@media print { */@page { size: A4; margin: 10%; @top-center {content: element(header) } @bottom-center {content: element(footer) } }/*element styles*/ .del  {text-decoration:line-through;color:red;} .ins {text-decoration:none;background:#c0ffc0;padding:1px;}
 /* TABLE STYLES */ 

 /* PARAGRAPH STYLES */ 
.DocDefaults {display:block;margin-bottom: 4mm;line-height: 115%;font-size: 11.0pt;}
.Normal {display:block;}

 /* CHARACTER STYLES */ span.DefaultParagraphFont {display:inline;}
-->
</style>
<script type="text/javascript">
<!--function toggleDiv(divid){if(document.getElementById(divid).style.display == 'none'){document.getElementById(divid).style.display = 'block';}else{document.getElementById(divid).style.display = 'none';}}
--></script>
</head>
<body>

  <!-- userBodyTop goes here -->




<div class="document">


<p class="Normal DocDefaults " style="text-align: left;position: relative; margin-left: 17mm;text-indent: -0.25in;margin-bottom: 0in;">&bull;  <span class="DefaultParagraphFont " style="font-weight: normal;color: #000000;font-style: normal;font-size: 11.0pt;">TEST LINE 1</span>
</p>


<p class="Normal DocDefaults " style="text-align: left;position: relative; margin-left: 17mm;text-indent: -0.25in;margin-bottom: 0in;">&bull;  <span class="DefaultParagraphFont " style="font-weight: normal;color: #000000;font-style: normal;font-size: 11.0pt;">TEST LINE 2</span>
</p>
</div>







  <!-- userBodyTail goes here -->


</body>
</html>

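Now there is a lot of extra space after each line. I'm not sure why this happens; the conversion seems to add a lot of extra whitespace/carriage returns.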

1 answer

轩辕嘉平

2023-03-14

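It isn't clear from your question whether you are concerned about whitespace in the (X)HTML source file or whitespace in the rendered page (presumably in CKEditor). If the latter, the browser and CKEditor versions may be relevant.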

Whitespace may or may not be significant; try searching for "XHTML significant whitespace" for more information.
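As an illustration of why source whitespace often does not reach the rendered page: under the CSS default `white-space: normal`, a browser collapses each run of spaces, tabs and line breaks in text into a single space, and whitespace-only text between block elements such as `<p>` is not rendered at all. The minimal sketch below (the class name `WhitespaceDemo` is hypothetical) models only the collapsing step, not the additional removal of spaces at line edges.

```java
import java.util.regex.Pattern;

public class WhitespaceDemo {
    // CSS white space characters: space, tab, line feed, carriage return, form feed.
    // U+00A0 (&nbsp;) is deliberately NOT included: a browser never collapses it.
    private static final Pattern CSS_WHITESPACE_RUN = Pattern.compile("[ \\t\\n\\r\\f]+");

    /** Collapses each run of CSS white space into a single space, as white-space: normal does. */
    static String collapse(String text) {
        return CSS_WHITESPACE_RUN.matcher(text).replaceAll(" ");
    }

    public static void main(String[] args) {
        // The exported HTML puts two spaces after the bullet; a browser shows only one.
        String source = "\u2022  TEST LINE 1";
        System.out.println("[" + collapse(source) + "]");
    }
}
```

So blank lines and doubled spaces in the HTML source are usually not what makes the page look spaced out; if the rendered page shows extra space, styling is the more likely place to look.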

As background, depending on the docx4j property docx4j.convert.out.html.outputMethodXML, docx4j uses either

<xsl:output method="html" encoding="utf-8" omit-xml-declaration="no" indent="no"
      doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN"
      doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"/>

or

<xsl:output method="xml" encoding="utf-8" omit-xml-declaration="no" indent="no"
      doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN"
      doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"/>

Either way, indent="no" is set, so the serializer is not asked to indent the output.
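The `method` and `indent` attributes of `xsl:output` correspond to the JAXP output properties `OutputKeys.METHOD` and `OutputKeys.INDENT`. A minimal sketch using only the JDK's built-in identity `Transformer` (not docx4j; class and method names are illustrative) serializes the question's test document both ways: with indent set to "no" the markup comes back unchanged, while "yes" makes the serializer insert line breaks between elements.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class IndentDemo {
    /** Parses the XML string and re-serializes it with method="xml" and the given indent setting. */
    static String serialize(String xml, boolean indent) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        // newTransformer() without a stylesheet is the identity transform
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.METHOD, "xml");
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        t.setOutputProperty(OutputKeys.INDENT, indent ? "yes" : "no");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String html = "<html><ul><li>TEST LINE 1</li><li>TEST LINE 2</li></ul></html>";
        System.out.println("indent=no:\n" + serialize(html, false));
        System.out.println("indent=yes:\n" + serialize(html, true));
    }
}
```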

Similar questions:

Docx4j: converting HTML to docx

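I have run into a new problem converting HTML to docx: it throws the exception org.xml.sax.SAXParseException; lineNumber: 4; columnNumber: 73; The entity "nbsp" was referenced, but not declared. As I understand it, this is because docx4j treats my file as XML when converting it to docx, but XML has only 5 predefined entities, and entities such as nbsp are not defined in XML. How can I get docx4j to convert HTML to docx without declaring the nbsp entity in the doctype?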
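One common workaround (an assumption here, not something docx4j does for you) is to rewrite HTML named entities as numeric character references before the markup reaches an XML parser, because numeric references such as `&#160;` need no declaration. A minimal sketch, assuming Java 9+ and a deliberately small, hypothetical entity table:

```java
import java.io.StringReader;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

public class EntityFix {
    // Small illustrative subset of HTML named entities; extend as needed.
    // The five XML predefined entities (amp, lt, gt, quot, apos) are not listed, so they are left as-is.
    private static final Map<String, Integer> HTML_ENTITIES = Map.of(
            "nbsp", 160, "copy", 169, "mdash", 8212, "hellip", 8230);

    private static final Pattern NAMED_ENTITY = Pattern.compile("&([A-Za-z][A-Za-z0-9]*);");

    /** Replaces known HTML named entities with numeric character references; others are kept. */
    static String toNumericReferences(String html) {
        Matcher m = NAMED_ENTITY.matcher(html);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            Integer code = HTML_ENTITIES.get(m.group(1));
            String replacement = (code != null) ? "&#" + code + ";" : m.group();
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    /** Parses the string as XML and returns the root element's text content. */
    static String parseText(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String fixed = toNumericReferences("<p>a&nbsp;b &amp; c</p>"); // "<p>a&#160;b &amp; c</p>"
        System.out.println(fixed);
        System.out.println(parseText(fixed).equals("a\u00A0b & c"));
    }
}
```

The rewritten string can then be handed to the XHTML importer instead of the raw HTML. A full HTML-to-XHTML cleaner covers far more entities and malformations than this table does.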
Converting HTML to Docx using Docx4j

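I have been trying to convert HTML content to docx using their library. After running my app a docx file is created, but it is blank, even though the HTML does have content. Please check my code below; I have included all the necessary libraries from the AndroidDocxtoHTML sample on git. Code: I don't understand what is missing from my code that results in a blank document. I found this code for Java and modified it for Android. Some people suggested using the nightly build jar for xhtml conv…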
docx4j: converting HTML to docx - table formatting

Converting part of a docx to HTML using docx4j

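I have an application that pulls some data from a database and then saves it in a docx file. Part of this data is HTML code, so I use docx4j to convert that HTML into docx format. There is a related post here. Now I want to use docx4j to convert this part of the text (in a table cell of the docx file) back to HTML and save the HTML code to the database. Or is there a better solution for converting docx to HTML? I hope I have made myself clear. Any hints are appreciated.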
docx4j converts docx to incorrectly formatted HTML

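I have some problems with the docx4j samples. I need to convert a file from docx to HTML and back. I'm trying to compile the ConvertInXHTMLDocument.java sample. The HTML file it creates is fine, but converting it back to docx throws an exception about missing closing tags (META, img, etc.). Has anyone run into this problem?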
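The exported HTML in the question at the top shows the same pattern: `<META http-equiv=...>` is left unclosed, which is valid HTML but not well-formed XML. As a rough sketch only, void elements can be self-closed before re-import. It is regex-based and assumes no `>` inside attribute values and no such tags inside comments or scripts; the tag list is a hypothetical subset, and a real HTML-to-XHTML cleaner is more robust.

```java
import java.util.regex.Pattern;

public class VoidElementFix {
    // A subset of HTML void elements that may appear unclosed (not the full HTML list).
    private static final Pattern VOID_TAG = Pattern.compile(
            "<(meta|img|br|hr|link|input)\\b([^>]*?)\\s*/?>", Pattern.CASE_INSENSITIVE);

    /** Rewrites <tag ...> and <tag .../> for the listed void elements as <tag .../>. */
    static String selfCloseVoidElements(String html) {
        return VOID_TAG.matcher(html).replaceAll("<$1$2/>");
    }

    public static void main(String[] args) {
        String html = "<META http-equiv=\"Content-Type\" content=\"text/html\"><img src=\"a.png\"><br/>";
        System.out.println(selfCloseVoidElements(html));
    }
}
```

Already self-closed tags such as `<br/>` pass through unchanged, so applying the rewrite twice gives the same result.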
How to convert HTML to .docx using docx4j? [closed]

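Sorry, I can't post anything I have tried, because I haven't tried anything for this task yet, although I have used … to convert … obtained from … into …, to output it in the application's …. Please enlighten me; I'm lost in stress and confusion…!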
