The DocFormat enumeration includes DOCX as an output format option for Word documents. To convert a PDF file to DOCX, follow the approach described below.
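Steps: load the source PDF into a Document object, then call save() with SaveFormat.DocX.
The code is as follows: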
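// Uses com.aspose.pdf.Document and com.aspose.pdf.SaveFormat;
// _dataDir is the sample data directory defined elsewhere in the examples.
public static void ConvertPDFtoWord_DOCX_Format() {
    // Open the source PDF document
    Document pdfDocument = new Document(_dataDir + "PDFToDOC.pdf");
    // Save the resultant DOCX file (the extension matches SaveFormat.DocX)
    pdfDocument.save(_dataDir + "saveOptionsOutput_out.docx", SaveFormat.DocX);
}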
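The output file name should carry the extension that matches the chosen format (.docx for DocX, .doc for Doc). The following is a minimal sketch, using only the Java standard library, of deriving the output name from the source PDF name. The class OutputNames and the method withExtension are hypothetical helpers introduced for illustration; they are not part of Aspose.PDF.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Aspose.PDF: builds an output path
// whose extension matches the target Word format.
final class OutputNames {
    private OutputNames() {}

    // Replaces the extension of the file name, or appends one if the
    // name has no extension. The result stays in the same directory.
    static Path withExtension(Path source, String extension) {
        String name = source.getFileName().toString();
        int dot = name.lastIndexOf('.');
        // dot > 0 so that a leading dot (hidden file) is not treated as an extension
        String base = (dot > 0) ? name.substring(0, dot) : name;
        return source.resolveSibling(base + "." + extension);
    }

    public static void main(String[] args) {
        // "data" stands in for the sample data directory (an assumption)
        Path pdf = Paths.get("data", "PDFToDOC.pdf");
        System.out.println(withExtension(pdf, "docx"));
        System.out.println(withExtension(pdf, "doc"));
    }
}
```

The resulting path can be converted with toString() and passed to Document.save() together with the matching format value.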
The DocSaveOptions class has a Format property that selects the output document format (DOC or DOCX). To convert a PDF to DOCX, set this property to DocSaveOptions.DocFormat.DocX.
The code is as follows:
public static void ConvertPDFtoWord_Advanced_DOCX_Format()
{
    // Open the source PDF document
    Document pdfDocument = new Document(_dataDir + "PDFToDOC.pdf");
    // Instantiate DocSaveOptions object
    DocSaveOptions saveOptions = new DocSaveOptions();
    // Specify the output format as DOCX
    saveOptions.setFormat(DocSaveOptions.DocFormat.DocX);
    // Set other DocSaveOptions params
    // ....
    // Save document in docx format
    pdfDocument.save("ConvertToDOCX_out.docx", saveOptions);
}
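Steps: load the source PDF into a Document object, then call save() with SaveFormat.Doc to produce a DOC file.
The code is as follows: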
public static void main(String[] args) throws IOException {
    ConvertPDFtoWord();
    ConvertPDFtoWordDocAdvanced();
}

public static void ConvertPDFtoWord() {
    // Open the source PDF document
    Document pdfDocument = new Document(_dataDir + "PDFToDOC.pdf");
    // Save the file into MS document format
    pdfDocument.save(_dataDir + "PDFToDOC_out.doc", SaveFormat.Doc);
}
Converting with DocSaveOptions
public static void ConvertPDFtoWordDocAdvanced()
{
    Path pdfFile = Paths.get(_dataDir.toString(), "PDF-to-DOC.pdf");
    Path docFile = Paths.get(_dataDir.toString(), "PDF-to-DOC.doc");
    // Open the source PDF document
    Document pdfDocument = new Document(pdfFile.toString());
    DocSaveOptions saveOptions = new DocSaveOptions();
    // Specify the output format as DOC
    saveOptions.setFormat(DocSaveOptions.DocFormat.Doc);
    // Set the recognition mode as Flow
    saveOptions.setMode(DocSaveOptions.RecognitionMode.Flow);
    // Set the Horizontal proximity as 2.5
    saveOptions.setRelativeHorizontalProximity(2.5f);
    // Enable recognition of bullets during the conversion process
    saveOptions.setRecognizeBullets(true);
    // Save the document in DOC format using the options above
    pdfDocument.save(docFile.toString(), saveOptions);
}
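The DocSaveOptions class provides many properties that control how a PDF is converted to DOC. The Mode property specifies the recognition mode used for the PDF content; its values come from the RecognitionMode enumeration.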