Question:

Reading equations

林俊英

2023-03-14

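I want to read data from Word (.docx) files and save it to my database, so that I can fetch it from the database when needed and display it on my HTML page. I use Apache POI to read the data from the docx file, but it cannot pick up the formulas. Please help!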

2 answers

胥宏义

2023-03-14

Adding to @Axel Richter's answer: I found it hard to track down the required set of dependencies.

        <dependency>
            <groupId>org.apache.poi</groupId>
            <artifactId>ooxml-schemas</artifactId>
            <version>1.4</version>
        </dependency>
        <dependency>
            <groupId>org.apache.poi</groupId>
            <artifactId>poi-ooxml</artifactId>
            <version>3.15</version>
        </dependency>

As for Office 2019, I don't think it ships OMML2MML.XSL. Here is a link to the file: https://github.com/Versal/word2markdown/blob/master/libs/omml2mml.xsl

赏逸春

2023-03-14

Word *.docx files are ZIP archives containing XML files, and those XML files are Office Open XML. The formulas contained in Word *.docx documents are Office MathML (OMML).
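
To make the "ZIP archive of XML files" point concrete, here is a minimal sketch that lists the entries of a ZIP stream using only java.util.zip. The class name DocxZipEntries and the in-memory sample archive are assumptions for illustration; a real .docx contains more parts than the two shown here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class DocxZipEntries {

    // Returns the names of all entries in a ZIP stream, in archive order.
    static List<String> listEntries(InputStream in) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    // Builds a tiny in-memory archive that mimics part of the docx layout (illustration only).
    static byte[] sampleArchive() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (String name : new String[] {"[Content_Types].xml", "word/document.xml"}) {
                zip.putNextEntry(new ZipEntry(name));
                zip.write("<x/>".getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // With a real file, pass new FileInputStream("Formula.docx") instead.
        System.out.println(listEntries(new ByteArrayInputStream(sampleArchive())));
    }
}
```

Run against a real document, the listing should include word/document.xml, which is the part where the body text and the OMML formulas live.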

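Unfortunately, this XML format is not well known outside Microsoft Office, so it cannot be used directly in HTML. Fortunately, it is XML, and XML data can be transformed with XSLT. So we can transform the OMML into MathML, for example, which is usable in a much wider range of use cases.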
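
As a rough illustration of what "transforming XML with XSLT" means in Java, the sketch below applies a toy stylesheet with the standard javax.xml.transform API. The class name XsltSketch, the toy stylesheet, and the <formula> input element are all assumptions made for this example; the actual OMML to MathML conversion needs a far larger stylesheet.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltSketch {

    // Toy stylesheet: wraps the text of a <formula> root element in a <math> element.
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
      + "<xsl:template match=\"/formula\">"
      + "<math><xsl:value-of select=\".\"/></math>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the given XSLT to the given XML and returns the result as a string.
    static String transform(String xml, String xslt) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform("<formula>x+y</formula>", STYLESHEET));
    }
}
```

The same three pieces (a stylesheet source, an input source, and a result) appear in any XSLT transformation done this way; only the stylesheet and the input change.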

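The transformation through XSLT relies mainly on an XSL definition of the transformation. Unfortunately, creating one is not easy either. Fortunately, Microsoft has already done it: if you have a current Microsoft Office installed, you can find the file OMML2MML.XSL in the Microsoft Office program directory under %ProgramFiles%\. If you cannot find it there, search the web for it.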

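So, knowing all this, we can get the OMML from the XWPFDocument, transform it into MathML, and then save it for later use.

My example stores the formulas it finds as MathML in an ArrayList of strings. You should be able to store these strings in your database as well.

The example needs the full ooxml-schemas-1.3.jar, as mentioned in https://poi.apache.org/faq.html#faq-N10025. This is because it uses CTOMath, which is not included in the smaller poi-ooxml-schemas jar.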

Word document:

Java code:

import java.io.*;
import org.apache.poi.xwpf.usermodel.*;

import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTP;
import org.openxmlformats.schemas.officeDocument.x2006.math.CTOMath;
import org.openxmlformats.schemas.officeDocument.x2006.math.CTOMathPara;

import org.w3c.dom.Node;

import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.transform.stream.StreamResult;

import java.awt.Desktop;

import java.util.List;
import java.util.ArrayList;

/*
needs the full ooxml-schemas-1.3.jar as mentioned in https://poi.apache.org/faq.html#faq-N10025
*/

public class WordReadFormulas {

 static File stylesheet = new File("OMML2MML.XSL");
 static TransformerFactory tFactory = TransformerFactory.newInstance();
 static StreamSource stylesource = new StreamSource(stylesheet); 

 static String getMathML(CTOMath ctomath) throws Exception {
  Transformer transformer = tFactory.newTransformer(stylesource);

  Node node = ctomath.getDomNode();

  DOMSource source = new DOMSource(node);
  StringWriter stringwriter = new StringWriter();
  StreamResult result = new StreamResult(stringwriter);
  transformer.setOutputProperty("omit-xml-declaration", "yes");
  transformer.transform(source, result);

  String mathML = stringwriter.toString();
  stringwriter.close();

  //The native OMML2MML.XSL transforms OMML into MathML as XML with special namespaces.
  //We don't need these since we want to use the MathML in HTML, not in XML.
  //So ideally we should change OMML2MML.XSL so that it does not emit them.
  //But to keep this example as simple as possible, we use replace to get rid of the XML specifics.
  mathML = mathML.replaceAll("xmlns:m=\"http://schemas.openxmlformats.org/officeDocument/2006/math\"", "");
  mathML = mathML.replaceAll("xmlns:mml", "xmlns");
  mathML = mathML.replaceAll("mml:", "");

  return mathML;
 }

 public static void main(String[] args) throws Exception {
    
  XWPFDocument document = new XWPFDocument(new FileInputStream("Formula.docx"));

  //storing the found MathML in an ArrayList of strings
  List<String> mathMLList = new ArrayList<String>();

  //getting the formulas out of all body elements
  for (IBodyElement ibodyelement : document.getBodyElements()) {
   if (ibodyelement.getElementType().equals(BodyElementType.PARAGRAPH)) {
    XWPFParagraph paragraph = (XWPFParagraph)ibodyelement;
    for (CTOMath ctomath : paragraph.getCTP().getOMathList()) {
     mathMLList.add(getMathML(ctomath));
    }
    for (CTOMathPara ctomathpara : paragraph.getCTP().getOMathParaList()) {
     for (CTOMath ctomath : ctomathpara.getOMathList()) {
      mathMLList.add(getMathML(ctomath));
     }
    }
   } else if (ibodyelement.getElementType().equals(BodyElementType.TABLE)) {
    XWPFTable table = (XWPFTable)ibodyelement; 
    for (XWPFTableRow row : table.getRows()) {
     for (XWPFTableCell cell : row.getTableCells()) {
      for (XWPFParagraph paragraph : cell.getParagraphs()) {
       for (CTOMath ctomath : paragraph.getCTP().getOMathList()) {
        mathMLList.add(getMathML(ctomath));
       }
       for (CTOMathPara ctomathpara : paragraph.getCTP().getOMathParaList()) {
        for (CTOMath ctomath : ctomathpara.getOMathList()) {
         mathMLList.add(getMathML(ctomath));
        }
       }
      }
     }
    }
   }
  }

  document.close();

  //creating a sample HTML file 
  String encoding = "UTF-8";
  FileOutputStream fos = new FileOutputStream("result.html");
  OutputStreamWriter writer = new OutputStreamWriter(fos, encoding);
  writer.write("<!DOCTYPE html>\n");
  writer.write("<html lang=\"en\">");
  writer.write("<head>");
  writer.write("<meta charset=\"utf-8\"/>");

  //using MathJax for helping all browsers to interpret MathML
  writer.write("<script type=\"text/javascript\"");
  writer.write(" async src=\"https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=MML_CHTML\"");
  writer.write(">");
  writer.write("</script>");

  writer.write("</head>");
  writer.write("<body>");
  writer.write("<p>The following formulas were found in the Word document: </p>");

  int i = 1;
  for (String mathML : mathMLList) {
   writer.write("<p>Formula" + i++ + ":</p>");
   writer.write(mathML);
   writer.write("<p/>");
  }

  writer.write("</body>");
  writer.write("</html>");
  writer.close();

  Desktop.getDesktop().browse(new File("result.html").toURI());

 }
}
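
One possible refinement of the cleanup step in getMathML: String.replaceAll treats its first argument as a regular expression, so the dots in the namespace URL match any character. The sketch below does the same three substitutions with String.replace, which matches literally. The class name MathMLCleanup and the sample input string are assumptions for illustration, not output captured from OMML2MML.XSL.

```java
class MathMLCleanup {

    // Namespace declaration for the OMML prefix, as it appears in the replaceAll call above.
    static final String OMML_NS_DECL =
        "xmlns:m=\"http://schemas.openxmlformats.org/officeDocument/2006/math\"";

    // Removes the OMML namespace declaration and the mml: prefix using literal matching.
    static String stripPrefixes(String mathML) {
        return mathML
            .replace(OMML_NS_DECL, "")
            .replace("xmlns:mml", "xmlns")
            .replace("mml:", "");
    }

    public static void main(String[] args) {
        // Hand-written sample in the prefixed shape described in the code comments above.
        String sample = "<mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" "
            + OMML_NS_DECL + "><mml:mi>x</mml:mi></mml:math>";
        System.out.println(stripPrefixes(sample));
    }
}
```

This is still plain string surgery, so it shares the limitation noted in the code comments: adjusting the stylesheet itself would be the cleaner fix.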

Result:

I just tested this code with Apache POI 5.0.0 and it works. You need poi-ooxml-full-5.0.0.jar for Apache POI 5.0.0. Please read https://poi.apache.org/help/faq.html#faq-N10025 to see which ooxml library is required for which Apache POI version.
