Question:

How to convert HTML to a well-formed DOCX while keeping style attributes intact

丁嘉庆

2023-03-14

public boolean convertHTMLToDocx(String inputFilePath, String outputFilePath, boolean headerFlag,
        boolean footerFlag,String orientation, String logoPath, String margin, JSONObject json,boolean isArabic) {
    boolean conversionFlag;
    boolean orientationFlag = false;
    try {
        if(!orientation.equalsIgnoreCase("Y")){
            orientationFlag = true;
        }
        String stringFromFile = FileUtils.readFileToString(new File(inputFilePath), "UTF-8");
        String unescaped = stringFromFile;
        WordprocessingMLPackage wordMLPackage  = WordprocessingMLPackage.createPackage();
        NumberingDefinitionsPart ndp = new NumberingDefinitionsPart();
        wordMLPackage.getMainDocumentPart().addTargetPart(ndp);
        ndp.unmarshalDefaultNumbering();

        ImportXHTMLProperties.setProperty("docx4j-ImportXHTML.Bidi.Heuristic", true);
        ImportXHTMLProperties.setProperty("docx4j-ImportXHTML.Element.Heading.MapToStyle", true);
        ImportXHTMLProperties.setProperty("docx4j-ImportXHTML.fonts.default.serif", "Frutiger LT Arabic 45 Light");
        ImportXHTMLProperties.setProperty("docx4j-ImportXHTML.fonts.default.sans-serif", "Frutiger LT Arabic 45 Light");
        ImportXHTMLProperties.setProperty("docx4j-ImportXHTML.fonts.default.monospace", "Frutiger LT Arabic 45 Light");

        XHTMLImporterImpl xHTMLImporter = new XHTMLImporterImpl(wordMLPackage);
        xHTMLImporter.setHyperlinkStyle("Hyperlink");
        xHTMLImporter.setParagraphFormatting(FormattingOption.CLASS_PLUS_OTHER);
        xHTMLImporter.setTableFormatting(FormattingOption.CLASS_PLUS_OTHER);
        xHTMLImporter.setRunFormatting(FormattingOption.CLASS_PLUS_OTHER);

        wordMLPackage.getMainDocumentPart().getContent().addAll(xHTMLImporter.convert(unescaped, ""));

        XmlUtils.marshaltoString(wordMLPackage.getMainDocumentPart().getJaxbElement(),true,true);
        File output = new File(outputFilePath);

        wordMLPackage.save(output);

        Console.log("file path where it is stored is" + " " + output.getAbsolutePath());
        if (headerFlag || footerFlag) {
            File file = new File(outputFilePath);
            // Reload the saved package; try-with-resources closes the stream after loading
            try (InputStream in = new FileInputStream(file)) {
                wordMLPackage = WordprocessingMLPackage.load(in);
            }
            if (headerFlag) {
                // set Header 
            }
            if (footerFlag) {
                // set Footer
            }

            wordMLPackage.save(file);
            Console.log("Finished editing the word document");
        }
        conversionFlag = true;
    } catch (InvalidFormatException e) {
        Error.log("Invalid format found:-" + getStackTrace(e));
        conversionFlag = false;
    } catch (Exception e) {
        Error.log("Error while converting:-" + getStackTrace(e));
        conversionFlag = false;
    }

    return conversionFlag;
}

微生景胜

2023-03-14

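Here is how I approached it. It is not the best approach, but I have seen it implemented in organizations. In those setups, a WAR file is deployed on the application server to host static and dynamic content for HTTP requests.

So I used a simple byte-array write to a .doc file rather than a .docx. This way, the final Word document looks exactly like the HTML. The only problem I faced was that binary (base64-embedded) images were not displayed; a box appeared in place of each image.

So I wrote two files:

Edit: you can create a new WAR file to host these images, or use the one that generates them.

My experience: for English documents, use docx4j for the .docx conversion. For Arabic, Hebrew, or other RTL languages, do the .doc conversion as shown above. All such .doc documents can then easily be converted to .docx from MS Word.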
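The rule of thumb above can be sketched as a small helper that picks the target format from the text direction. This is only an illustration: the class `OutputFormatChooser` and its methods are hypothetical names, not part of docx4j, and the check simply looks for any strongly right-to-left character.

```java
final class OutputFormatChooser {

    /** Returns true if the text contains at least one strongly right-to-left character. */
    static boolean containsRtl(String text) {
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            byte direction = Character.getDirectionality(codePoint);
            if (direction == Character.DIRECTIONALITY_RIGHT_TO_LEFT
                    || direction == Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC) {
                return true;
            }
            i += Character.charCount(codePoint);
        }
        return false;
    }

    /** Hypothetical routing rule: RTL content goes to .doc, everything else to .docx. */
    static String chooseExtension(String html) {
        return containsRtl(html) ? ".doc" : ".docx";
    }
}
```

A caller could branch on `chooseExtension(html)` to decide between the docx4j route from the question and the plain .doc write described here.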

Both files are listed below; change them as needed:

        public static void writeHTMLDatatoDoc(String content, String inputHTMLFile,String outputDocFile,String uniqueName) throws Exception {
            String baseTag = getRemoteServerURL()+"/{war_deployment_desciptor}/images?image=";
            String tag = "Image_";
            String ext = ".png";
            String srcTag = "";
            String pathOnServer = getDiskPath() + File.separator + "TemplateGeneration"
                    + File.separator + "generatedTemplates" + File.separator + uniqueName + File.separator + "images" + File.separator;
    
            int i = 0;
            boolean binaryimgFlag = false;
    
            Pattern p = Pattern.compile("<img [^>]*src=[\\\"']([^\\\"^']*)");
            Matcher m = p.matcher(content);
            while (m.find()) {
                // Group 1 of the pattern is the value of the src attribute.
                // srcTag will contain data such as data:image/png;base64,AAABAAEAEBAAAAEAGABoAw.........
                // The whole value is replaced later with a URL to the image on local disk
                srcTag = m.group(1);
                
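                // Evaluate the flag for every image; otherwise a normal <img> that follows
                // an embedded one would also be treated as binary
                binaryimgFlag = srcTag.contains("base64");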
                if(binaryimgFlag) {
                    
                    // Derive the file extension from the MIME type, e.g. image/png -> .png.
                    // A MIME type is separated by '/', not '.', so split on the slash.
                    String mimeType = extractMimeType(srcTag);
                    int slash = mimeType.lastIndexOf('/');
                    if (slash != -1 && slash < mimeType.length() - 1)
                        ext = "." + mimeType.substring(slash + 1);
                    else
                        ext = ".png";
                    
                    // Read the files already created for the different documents of this unique entity.
                    // The location contains all image files as Image_{i}.{image_extension}
                    // Find the highest counter used in the image names so far,
                    // then increase i to generate the next image as Image_{incremented_i}.{image_extension}
                    i = findiDynamicallyFromFilesCreatedForWI(pathOnServer);
                    i++; // Increase count for next image
                    
                    // save whole data to replace later
                    String srcTagBegin = srcTag; 
                    
                    // Remove data:image/png;base64, from srcTag , so I get only encoded image data.
                    // Decode this using Base64 decoder.
                    srcTag = srcTag.substring(srcTag.indexOf(",") + 1, srcTag.length());
                    byte[] imageByteArray = decodeImage(srcTag);
                    
                    // Construct the replacement URL
                    String replacement = baseTag+pathOnServer+tag+i+ext;
                    replacement = replacement.replace("\\", "/");
    
                    // Write the image into the local directory on the server;
                    // try-with-resources closes the stream even if the write fails
                    try (FileOutputStream imageOutFile = new FileOutputStream(pathOnServer + tag + i + ext)) {
                        imageOutFile.write(imageByteArray);
                    }
                    content = content.replace(srcTagBegin, replacement);
                }
            }
            
            //Re write HTML file
            writeHTMLData(content,inputHTMLFile);
    
            // write content to doc file
            writeHTMLData(content,outputDocFile);
        }
    
        public static int findiDynamicallyFromFilesCreatedForWI(String pathOnServer) {
            int maxFileCount = 0;
            String[] dirListing = new File(pathOnServer).list();
            // list() returns null when the directory does not exist or cannot be read
            if (dirListing == null) {
                return maxFileCount;
            }
            for (String name : dirListing) {
                int underscore = name.indexOf('_');
                int dot = name.lastIndexOf('.');
                if (underscore == -1 || dot <= underscore) {
                    continue; // not an Image_{i}.{ext} file
                }
                try {
                    // Compare numerically: a lexicographic sort would rank Image_9 above Image_10
                    maxFileCount = Math.max(maxFileCount, Integer.parseInt(name.substring(underscore + 1, dot)));
                } catch (NumberFormatException e) {
                    // Ignore files that do not follow the naming scheme
                }
            }
            return maxFileCount;
        }
    
        private static String extractMimeType(final String encoded) {
            final Pattern mime = Pattern.compile("^data:([a-zA-Z0-9]+/[a-zA-Z0-9]+).*,.*");
            final Matcher matcher = mime.matcher(encoded);
            if (!matcher.find())
                return "";
            return matcher.group(1).toLowerCase();
        }
    
        private static void writeHTMLData(String inputData, String outputFilepath) {
            BufferedWriter writer = null;
            try {
                writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(new File(outputFilepath)), Charset.forName("UTF-8")));
                writer.write(inputData);
            } catch (IOException e) {
                e.printStackTrace();
            } finally {
                try {
                    if(writer != null)
                        writer.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    
        public static byte[] decodeImage(String imageDataString) {
            return Base64.decodeBase64(imageDataString);
        }
    
        private static String readHTMLData(String inputFile) {
            StringBuilder data = new StringBuilder();
            String str;
    
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(new FileInputStream(new File(inputFile)), StandardCharsets.UTF_8))) {
                while ((str = reader.readLine()) != null) {
                    // Keep the line break: in HTML it counts as whitespace, and dropping it can fuse adjacent words
                    data.append(str).append('\n');
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
            return data.toString();
        }
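The data-URI handling inside `writeHTMLDatatoDoc` can also be looked at in isolation. The following is a minimal sketch, assuming the `src` value has the form `data:<mime>;base64,<payload>`; the class `DataUriImage` is a hypothetical name, and it uses the JDK's `java.util.Base64` instead of the Commons Codec `Base64` used above.

```java
import java.util.Base64;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DataUriImage {
    // Group 1: MIME type, group 2: base64 payload
    private static final Pattern DATA_URI = Pattern.compile(
            "^data:([a-zA-Z0-9.+-]+/[a-zA-Z0-9.+-]+);base64,(.*)$", Pattern.DOTALL);

    final String extension; // file extension including the dot, e.g. ".png"
    final byte[] bytes;     // decoded image data

    private DataUriImage(String extension, byte[] bytes) {
        this.extension = extension;
        this.bytes = bytes;
    }

    /** Returns the decoded image, or null if src is not a base64 data URI. */
    static DataUriImage parse(String src) {
        Matcher m = DATA_URI.matcher(src);
        if (!m.matches()) {
            return null;
        }
        String mime = m.group(1).toLowerCase(Locale.ROOT);
        String subtype = mime.substring(mime.indexOf('/') + 1);
        int plus = subtype.indexOf('+');
        if (plus > 0) {
            subtype = subtype.substring(0, plus); // image/svg+xml -> svg
        }
        // The MIME decoder tolerates line breaks inside the payload
        byte[] decoded = Base64.getMimeDecoder().decode(m.group(2));
        return new DataUriImage("." + subtype, decoded);
    }
}
```

For an ordinary URL such as `https://example.com/a.png`, `parse` returns `null`, so the caller can leave that `<img>` tag untouched.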

 import java.io.File;
 import java.io.IOException;
 import java.nio.file.Files;
 
 import javax.servlet.ServletException;
 import javax.servlet.http.HttpServlet;
 import javax.servlet.http.HttpServletRequest;
 import javax.servlet.http.HttpServletResponse;
 import com.newgen.clos.logging.consoleLogger.Console;
 public class ImageServlet extends HttpServlet {
     public ImageServlet() {
         super();
     }
 
     protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
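         String param = request.getParameter("image");
         Console.log("Image Servlet executed");
         Console.log("File Name Requested: " + param);
         if (param == null) {
             response.sendError(HttpServletResponse.SC_BAD_REQUEST, "Missing image parameter");
             return;
         }
         // Strings are immutable, so the result of replace() must be assigned back
         param = param.replace("\"", "").replace("%20", " ");
         File file = new File(param);
         if (!file.isFile()) {
             response.sendError(HttpServletResponse.SC_NOT_FOUND);
             return;
         }
         // Note: this serves whatever path the caller supplies; restrict it to the image directory in production
         response.setHeader("Content-Type", getServletContext().getMimeType(param));
         response.setHeader("Content-Length", String.valueOf(file.length()));
         response.setHeader("Content-Disposition", "inline; filename=\"" + file.getName() + "\"");
         Files.copy(file.toPath(), response.getOutputStream());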
     }
 }
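Because the servlet opens whatever path arrives in the `image` parameter, it may be worth confining requests to the directory where the images are written. A minimal sketch, assuming a single base directory; `ImagePathResolver` and `resolveInside` are hypothetical names and not part of the Servlet API:

```java
import java.nio.file.Path;

final class ImagePathResolver {

    /**
     * Resolves the requested path against baseDir and returns it only if the
     * normalized result still lies inside baseDir; otherwise returns null.
     */
    static Path resolveInside(Path baseDir, String requested) {
        if (requested == null || requested.isEmpty()) {
            return null;
        }
        Path base = baseDir.toAbsolutePath().normalize();
        // normalize() collapses ".." segments before the containment check
        Path candidate = base.resolve(requested).normalize();
        return candidate.startsWith(base) ? candidate : null;
    }
}
```

In `doGet`, a `null` result could be answered with `response.sendError(HttpServletResponse.SC_FORBIDDEN)` before any file is opened.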
