Extracting Text from PDF Files with PDFBox

葛志国

2023-12-01

Recently, while building an index with Lucene, I needed PDFBox to extract the text of PDF files, but it kept reporting this error:

java.lang.Throwable: Warning: You did not close the PDF Document

This is annoying, and the message is raised from inside the third-party library.

I am recording here the solution I found online.

The original code:

StringBuffer content = new StringBuffer(""); // content of the file
FileInputStream fis = new FileInputStream(f);
PDFParser p = new PDFParser(fis);
p.parse();
PDFTextStripper ts = new PDFTextStripper();
// Neither the document returned by getPDDocument() nor the stream is ever closed here
content.append(ts.getText(p.getPDDocument()));

Code that does not report the error (the document is closed explicitly in a finally block):

StringBuffer content = new StringBuffer(""); // content of the file
PDDocument pdfDocument = null;
try {
        FileInputStream fis = new FileInputStream(f);
        PDFTextStripper stripper = new PDFTextStripper();
        pdfDocument = PDDocument.load(fis);
        StringWriter writer = new StringWriter();
        stripper.writeText(pdfDocument, writer);
        content.append(writer.getBuffer().toString());
        fis.close();
} catch (java.io.IOException e) {
        System.err.println("IOException = " + e);
        System.exit(1);
} finally {
        if (pdfDocument != null) {
                // System.err.println("Closing document " + f + "...");
                org.pdfbox.cos.COSDocument cos = pdfDocument.getDocument();
                cos.close();
                // System.err.println("Closed " + cos);
                pdfDocument.close();
        }
}
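
The fix above works because the document is closed on every path, including when extraction fails. On Java 7 and later, the same guarantee can be written more compactly with try-with-resources. The sketch below is only an illustration of that pattern: CloseDemo, FakeDocument and extract are hypothetical names, and FakeDocument is a stand-in written for this example, not a PDFBox class, so the code runs without the library. If the PDFBox version in use has PDDocument implement java.io.Closeable (recent Apache PDFBox releases do, as far as I know), the same shape should apply to the real document object.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.StringWriter;

public class CloseDemo {

    /** Hypothetical stand-in for a PDF document; NOT part of PDFBox. */
    public static final class FakeDocument implements Closeable {
        private final String text;
        private boolean closed = false;

        public FakeDocument(String text) {
            this.text = text;
        }

        /** Writes the document text, or fails if the text is null. */
        public void writeText(StringWriter out) throws IOException {
            if (text == null) {
                throw new IOException("simulated parse failure");
            }
            out.write(text);
        }

        public boolean isClosed() {
            return closed;
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Extracts the text; try-with-resources closes the document on success and on failure. */
    public static String extract(FakeDocument doc) throws IOException {
        try (FakeDocument d = doc) {
            StringWriter writer = new StringWriter();
            d.writeText(writer);
            return writer.toString();
        }
    }

    public static void main(String[] args) {
        FakeDocument ok = new FakeDocument("hello");
        FakeDocument bad = new FakeDocument(null);
        try {
            System.out.println(extract(ok));   // prints "hello"
            extract(bad);                      // throws IOException
        } catch (IOException e) {
            System.out.println("IOException = " + e.getMessage());
        }
        // Both documents were closed, even though the second extraction failed.
        System.out.println(ok.isClosed() + " " + bad.isClosed()); // prints "true true"
    }
}
```

Compared with the explicit finally block, this form needs no null check and cannot skip the close call when an exception is thrown midway.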
