Question:

How to convert HTML to a well-formatted DOCX while keeping the style attributes unchanged

丁嘉庆
2023-03-14
public boolean convertHTMLToDocx(String inputFilePath, String outputFilePath, boolean headerFlag,
        boolean footerFlag,String orientation, String logoPath, String margin, JSONObject json,boolean isArabic) {
    boolean conversionFlag;
    boolean orientationFlag = false;
    try {
        if(!orientation.equalsIgnoreCase("Y")){
            orientationFlag = true;
        }
        String stringFromFile = FileUtils.readFileToString(new File(inputFilePath), "UTF-8");
        String unescaped = stringFromFile;
        WordprocessingMLPackage wordMLPackage  = WordprocessingMLPackage.createPackage();
        NumberingDefinitionsPart ndp = new NumberingDefinitionsPart();
        wordMLPackage.getMainDocumentPart().addTargetPart(ndp);
        ndp.unmarshalDefaultNumbering();

        ImportXHTMLProperties.setProperty("docx4j-ImportXHTML.Bidi.Heuristic", true);
        ImportXHTMLProperties.setProperty("docx4j-ImportXHTML.Element.Heading.MapToStyle", true);
        ImportXHTMLProperties.setProperty("docx4j-ImportXHTML.fonts.default.serif", "Frutiger LT Arabic 45 Light");
        ImportXHTMLProperties.setProperty("docx4j-ImportXHTML.fonts.default.sans-serif", "Frutiger LT Arabic 45 Light");
        ImportXHTMLProperties.setProperty("docx4j-ImportXHTML.fonts.default.monospace", "Frutiger LT Arabic 45 Light");

        XHTMLImporterImpl xHTMLImporter = new XHTMLImporterImpl(wordMLPackage);
        xHTMLImporter.setHyperlinkStyle("Hyperlink");
        xHTMLImporter.setParagraphFormatting(FormattingOption.CLASS_PLUS_OTHER);
        xHTMLImporter.setTableFormatting(FormattingOption.CLASS_PLUS_OTHER);
        xHTMLImporter.setRunFormatting(FormattingOption.CLASS_PLUS_OTHER);

        wordMLPackage.getMainDocumentPart().getContent().addAll(xHTMLImporter.convert(unescaped, ""));

        // Debug aid only: the marshalled XML string is never used, so this call can be dropped.
        // XmlUtils.marshaltoString(wordMLPackage.getMainDocumentPart().getJaxbElement(), true, true);
        File output = new File(outputFilePath);

        wordMLPackage.save(output);

        Console.log("file path where it is stored is" + " " + output.getAbsolutePath());
        if (headerFlag || footerFlag) {
            // Re-open the saved package; try-with-resources closes the stream
            // even if loading fails (the stream was never closed before).
            try (InputStream in = new FileInputStream(output)) {
                wordMLPackage = WordprocessingMLPackage.load(in);
            }
            if (headerFlag) {
                // set Header
            }
            if (footerFlag) {
                // set Footer
            }

            wordMLPackage.save(output);
            Console.log("Finished editing the word document");
        }
        conversionFlag = true;
    } catch (InvalidFormatException e) {
        Error.log("Invalid format found:-" + getStackTrace(e));
        conversionFlag = false;
    } catch (Exception e) {
        Error.log("Error while converting:-" + getStackTrace(e));
        conversionFlag = false;
    }

    return conversionFlag;
}

1 answer

微生景胜
2023-03-14

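Here is how I approached it. It is not the best approach, but I have seen it implemented in organizations. In those setups, WAR files are deployed on the application server to serve static and dynamic content over HTTP.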

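So I simply write the HTML bytes to a .doc file instead of a .docx. This way the final Word document looks exactly like the HTML. The only problem I ran into was that binary (base64-embedded) images were not displayed: an empty box appeared in place of each image.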
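The "write the HTML to a .doc file" step can be sketched as below. This is a minimal sketch, not the answerer's exact code: the class `HtmlAsDoc`, the method `writeHtmlAsDoc` and the file name `report.doc` are hypothetical, and it relies on the answer's report that Word opens HTML saved under a `.doc` name and renders it like the original page.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class HtmlAsDoc {
    // Writes the HTML markup byte-for-byte (UTF-8) into a file whose name ends in ".doc".
    // No conversion happens here; Word itself interprets the HTML when the file is opened.
    static Path writeHtmlAsDoc(String html, Path docFile) throws IOException {
        return Files.write(docFile, html.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Declaring the charset in the HTML helps Word decode non-ASCII text (e.g. Arabic).
        String html = "<html><head><meta charset=\"UTF-8\"></head>"
                + "<body><p style=\"color:red\">Hello</p></body></html>";
        Path out = writeHtmlAsDoc(html, Paths.get("report.doc"));
        System.out.println("Wrote " + out.toAbsolutePath());
    }
}
```

Because the file is still HTML inside, any `<img src="data:...">` stays inline, which is why the answer extracts such images to files first.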

So I wrote two files:

Edit: you can create a new WAR file to host these images, or reuse the one that generates them.
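Whichever WAR hosts the images, each `<img>` ends up pointing at a URL of the form `<servlet>?image=<path on server>`. A small sketch of building such a URL with proper encoding follows; `ImageUrls`, `imageUrl` and the base URL are hypothetical, and `URLEncoder.encode(String, Charset)` assumes Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class ImageUrls {
    // Builds "<servletBase>?image=<encoded path>" for an image saved on the server's disk.
    // servletBase is hypothetical, e.g. "http://host:8080/images-war/images".
    static String imageUrl(String servletBase, String filePathOnServer) {
        // Forward slashes also work for java.io.File on Windows (same idea as replace("\\", "/") in the answer).
        String normalized = filePathOnServer.replace('\\', '/');
        // Encode so spaces, ':' and '/' survive in the query string;
        // the servlet container decodes the value again in getParameter().
        return servletBase + "?image=" + URLEncoder.encode(normalized, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(imageUrl("http://host:8080/images-war/images", "C:\\data\\images\\Image 1.png"));
        // prints http://host:8080/images-war/images?image=C%3A%2Fdata%2Fimages%2FImage+1.png
    }
}
```

Encoding on the way out makes ad-hoc fixes such as replacing `%20` in the servlet unnecessary.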

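My experience: for English documents, use docx4j for the .docx conversion. For Arabic, Hebrew or other RTL languages, do the .doc conversion described above. All such .doc documents can then easily be converted to .docx from MS Word.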
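If you want to apply that rule of thumb automatically, one option is to scan the HTML for right-to-left characters and pick the path accordingly. This is a hypothetical helper (`ConversionRouter` and its enum are not from the answer); it only checks Unicode bidi classes via `Character.getDirectionality`, so text written as numeric entities such as `&#1575;` would not be detected.

```java
final class ConversionRouter {
    // The two conversion paths described in the answer.
    enum Target { DOCX_VIA_DOCX4J, DOC_VIA_HTML }

    // True if any code point has a right-to-left bidi class (Hebrew = R, Arabic = AL).
    static boolean containsRtl(String text) {
        return text.codePoints().anyMatch(cp -> {
            byte dir = Character.getDirectionality(cp);
            return dir == Character.DIRECTIONALITY_RIGHT_TO_LEFT
                    || dir == Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC;
        });
    }

    // RTL content goes through the HTML-as-.doc route, everything else through docx4j.
    static Target chooseTarget(String html) {
        return containsRtl(html) ? Target.DOC_VIA_HTML : Target.DOCX_VIA_DOCX4J;
    }

    public static void main(String[] args) {
        System.out.println(chooseTarget("<p>Quarterly report</p>"));           // DOCX_VIA_DOCX4J
        System.out.println(chooseTarget("<p>\u062A\u0642\u0631\u064A\u0631</p>")); // DOC_VIA_HTML
    }
}
```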

Both files are listed below; adapt them as needed:

        public static void writeHTMLDatatoDoc(String content, String inputHTMLFile, String outputDocFile, String uniqueName) throws Exception {
            // URL of the servlet that streams an image from the server's disk (ImageServlet below)
            String baseTag = getRemoteServerURL() + "/{war_deployment_desciptor}/images?image=";
            String tag = "Image_";
            String pathOnServer = getDiskPath() + File.separator + "TemplateGeneration"
                    + File.separator + "generatedTemplates" + File.separator + uniqueName + File.separator + "images" + File.separator;

            // Make sure the image directory exists before writing into it
            new File(pathOnServer).mkdirs();

            // Capture group 1 is the value of the src attribute
            Pattern p = Pattern.compile("<img [^>]*src=[\"']([^\"']*)");
            Matcher m = p.matcher(content);
            while (m.find()) {
                // srcTag holds e.g. "data:image/png;base64,iVBORw0..." for an embedded image
                String srcTag = m.group(1);

                // Only embedded (base64) images are extracted; this check is made per image,
                // so a regular URL after a base64 image is left untouched
                if (!srcTag.startsWith("data:") || !srcTag.contains("base64,")) {
                    continue;
                }

                // Derive the extension from the MIME type, e.g. "image/png" -> ".png"
                String mime = extractMimeType(srcTag);
                String ext = mime.contains("/") ? "." + mime.substring(mime.indexOf('/') + 1) : ".png";

                // Images for this entity are stored as Image_{i}.{ext}; take the highest
                // existing counter and use the next one
                int i = findiDynamicallyFromFilesCreatedForWI(pathOnServer) + 1;

                // Strip the "data:image/png;base64," prefix and decode the payload
                String encoded = srcTag.substring(srcTag.indexOf(',') + 1);
                byte[] imageByteArray = decodeImage(encoded);

                // Write the image into the local directory on the server
                try (FileOutputStream imageOutFile = new FileOutputStream(pathOnServer + tag + i + ext)) {
                    imageOutFile.write(imageByteArray);
                }

                // Point the <img> at the servlet URL instead of the inline data
                String replacement = (baseTag + pathOnServer + tag + i + ext).replace("\\", "/");
                content = content.replace(srcTag, replacement);
            }

            // Rewrite the HTML file with the new image URLs
            writeHTMLData(content, inputHTMLFile);

            // Write the same content to the .doc file
            writeHTMLData(content, outputDocFile);
        }
    
        public static int findiDynamicallyFromFilesCreatedForWI(String pathOnServer) {
            // Returns the highest counter among files named Image_{n}.{ext}, or 0 if there are none.
            // Counters are compared numerically: a lexicographic sort would put "Image_9" after "Image_10".
            int maxCount = 0;
            String[] dirListing = new File(pathOnServer).list();
            if (dirListing == null) {
                return maxCount; // directory does not exist (yet)
            }
            for (String name : dirListing) {
                int underscore = name.indexOf('_');
                int dot = name.lastIndexOf('.');
                if (underscore < 0 || dot <= underscore) {
                    continue; // not an Image_{n}.{ext} file
                }
                try {
                    maxCount = Math.max(maxCount, Integer.parseInt(name.substring(underscore + 1, dot)));
                } catch (NumberFormatException e) {
                    // ignore files whose counter part is not a number
                }
            }
            return maxCount;
        }
    
        private static String extractMimeType(final String encoded) {
            final Pattern mime = Pattern.compile("^data:([a-zA-Z0-9]+/[a-zA-Z0-9]+).*,.*");
            final Matcher matcher = mime.matcher(encoded);
            if (!matcher.find())
                return "";
            return matcher.group(1).toLowerCase();
        }
    
        private static void writeHTMLData(String inputData, String outputFilepath) {
            // try-with-resources closes the writer even if write() fails
            try (BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(
                    new FileOutputStream(outputFilepath), StandardCharsets.UTF_8))) {
                writer.write(inputData);
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    
        public static byte[] decodeImage(String imageDataString) {
            return Base64.decodeBase64(imageDataString);
        }
    
        private static String readHTMLData(String inputFile) {
            StringBuilder data = new StringBuilder();
            String str;

            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(new FileInputStream(new File(inputFile)), StandardCharsets.UTF_8))) {
                while ((str = reader.readLine()) != null) {
                    // readLine() strips the line terminator; put it back so words on
                    // adjacent lines are not glued together
                    data.append(str).append('\n');
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
            return data.toString();
        }
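The data-URI handling in the first file (MIME type, extension, base64 payload) can be tested on its own. The sketch below uses the JDK's `java.util.Base64` instead of commons-codec; the class `DataUri` is hypothetical, and it only handles the plain `data:<type>/<subtype>;base64,` form (extra parameters such as `;charset=` are not supported).

```java
import java.util.Base64;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DataUri {
    // data:<type>/<subtype>;base64,<payload>
    private static final Pattern DATA_URI = Pattern.compile(
            "^data:([a-zA-Z0-9.+-]+/[a-zA-Z0-9.+-]+);base64,(.*)$", Pattern.DOTALL);

    final String mimeType;
    final byte[] bytes;

    private DataUri(String mimeType, byte[] bytes) {
        this.mimeType = mimeType;
        this.bytes = bytes;
    }

    // Returns null when src is an ordinary URL rather than a base64 data URI.
    static DataUri parse(String src) {
        Matcher m = DATA_URI.matcher(src.trim());
        if (!m.matches()) {
            return null;
        }
        // The MIME decoder skips line breaks that some editors insert into long payloads.
        byte[] decoded = Base64.getMimeDecoder().decode(m.group(2));
        return new DataUri(m.group(1).toLowerCase(Locale.ROOT), decoded);
    }

    // "image/png" -> ".png"; a couple of common subtypes get their usual file extensions.
    String extension() {
        String subtype = mimeType.substring(mimeType.indexOf('/') + 1);
        switch (subtype) {
            case "jpeg": return ".jpg";
            case "svg+xml": return ".svg";
            default: return "." + subtype;
        }
    }
}
```

For example, `DataUri.parse("data:image/png;base64,aGVsbG8=")` yields MIME type `image/png`, extension `.png` and the five bytes of `hello`, while a normal `src` such as `images/logo.png` returns `null` and can be left alone.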
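 // ImageServlet.java
 import java.io.File;
 import java.io.IOException;
 import java.nio.file.Files;
 
 import javax.servlet.ServletException;
 import javax.servlet.http.HttpServlet;
 import javax.servlet.http.HttpServletRequest;
 import javax.servlet.http.HttpServletResponse;
 import com.newgen.clos.logging.consoleLogger.Console;
 
 // Streams an image file from the server's disk; mapped to ".../images" to match baseTag above
 public class ImageServlet extends HttpServlet {
     public ImageServlet() {
         super();
     }
 
     @Override
     protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
         String param = request.getParameter("image");
         Console.log("Image Servlet executed");
         Console.log("File Name Requested: " + param);
         if (param == null) {
             response.sendError(HttpServletResponse.SC_BAD_REQUEST);
             return;
         }
         // Strings are immutable: the result of replace() must be assigned back
         param = param.replace("\"", "");
         param = param.replace("%20", " ");
         // Note: this serves any path the client names; restrict it to the image directory in production
         File file = new File(param);
         if (!file.isFile()) {
             response.sendError(HttpServletResponse.SC_NOT_FOUND);
             return;
         }
         String mimeType = getServletContext().getMimeType(file.getName());
         response.setContentType(mimeType != null ? mimeType : "application/octet-stream");
         response.setHeader("Content-Length", String.valueOf(file.length()));
         response.setHeader("Content-Disposition", "inline; filename=\"" + file.getName() + "\"");
         Files.copy(file.toPath(), response.getOutputStream());
     }
 }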
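Because the servlet takes an absolute file path from the query string, any client could request arbitrary files on the server. One way to limit it to the generated-images directory is sketched below; `ImagePathGuard` and the base directory are hypothetical, and symbolic links are not resolved (use `toRealPath()` if that matters in your setup).

```java
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;

final class ImagePathGuard {
    // Returns the normalized absolute path if it lies inside baseDir, otherwise null.
    // normalize() collapses "..", and Path.startsWith compares whole name elements,
    // so "/srv/images-other" is not treated as being inside "/srv/images".
    static Path resolveInside(String baseDir, String requested) {
        try {
            Path base = Paths.get(baseDir).toAbsolutePath().normalize();
            Path target = Paths.get(requested).toAbsolutePath().normalize();
            return target.startsWith(base) ? target : null;
        } catch (InvalidPathException e) {
            return null; // malformed path string
        }
    }
}
```

In `doGet`, this could replace `new File(param)`: call `resolveInside(imageRoot, param)` with the directory that holds the `TemplateGeneration` images, and answer with `response.sendError(HttpServletResponse.SC_FORBIDDEN)` when it returns `null`.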
 
Related content:
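  • Question: I have a problem. I'm trying to convert some strings to dates, but I don't know the format the dates will arrive in. It might be or, etc. How can I convert these strings to Date? I tried this: But when I print someDate, it prints like this: 2019-08-05 12:42:48.638 CEST, which means, but when I run the code above, the date object now becomes, to say the least. Any ideas how to get the date format right? Answer: You can't!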

  • Clang accepts the following code, but gcc rejects it. Here is the error message:

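  • Question: I'm used to aligning form fields perfectly with. This is how I usually write forms: I know this is bad practice and I want to use CSS,,, or a cleaner method. But the fact is, for forms, works really well. Everything lines up exactly, the spacing is perfect, all the errors sit next to each other, and so on. I recently tried using and tags for forms, but I ended up going back to tables because they looked much better. How can I get this kind of aligned table layout without using s? Answer: This may not get much support, but here is my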

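  • I'm using org.joda.time.LocalDate and LocalDateTime. I get a Unix timestamp from an external source and want to create a LocalDate(Time) from it. The catch is that the interface of that external system defines all dates/times to be in the UTC time zone. So I want to avoid any implicit conversion from that timestamp to the local system's default time zone, which may differ from UTC. There is a LocalDateTime constructor for exactly this, so I tried (as an example): The result surprised me a bit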

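  • I'm having some trouble with the docx4j samples. I need to convert a file from docx to HTML and back. I'm trying to compile the ConvertInXHTMLDocument.java sample. It creates the HTML file fine, but when converting it back to docx it throws an exception about missing closing tags (META, img, etc.). Has anyone run into this?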