我想把PDF文档转换成图像。我用的是Ghost4j。
问题:Ghost4J需要gsdll32。dll文件,我不想使用dll文件。
问题1:是否有任何方法,在ghost4j转换图像没有dll?
问题2:我在PDFBox API中找到了解决方案<代码>组织。阿帕奇。pdfbox。pdmodel。PDPagep具有将PDF页面转换为图像格式的方法convertToImage()。
PDDocument doc = PDDocument.load(new File("/document.pdf"));
List<PDPage>pages = doc.getDocumentCatalog().getAllPages();
PDPage page = pages.get(0);
BufferedImage image =page.convertToImage();
File outputfile = new File("/image.png");
ImageIO.write(image, "png", outputfile);
doc.close();
我只有PDF文档上的文本。当我运行这段代码时,我有一个例外:
Aug 12, 2013 6:00:24 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: BDC
Exception in thread "main" java.lang.ExceptionInInitializerError
at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.getawtFont(PDTrueTypeFont.java:481)
at org.apache.pdfbox.pdmodel.font.PDSimpleFont.drawString(PDSimpleFont.java:109)
at org.apache.pdfbox.pdfviewer.PageDrawer.processTextPosition(PageDrawer.java:235)
at org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:496)
at org.apache.pdfbox.util.operator.ShowTextGlyph.process(ShowTextGlyph.java:62)
at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:554)
at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:268)
at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
at org.apache.pdfbox.pdfviewer.PageDrawer.drawPage(PageDrawer.java:125)
at org.apache.pdfbox.pdmodel.PDPage.convertToImage(PDPage.java:781)
at org.apache.pdfbox.pdmodel.PDPage.convertToImage(PDPage.java:712)
at ge.eid.esignature.adessa.pades.sign.PDFtoImage.main(PDFtoImage.java:25)
Caused by: java.lang.IllegalArgumentException
at java.nio.Buffer.position(Buffer.java:216)
at sun.font.TrueTypeFont.lookupName(TrueTypeFont.java:1153)
at sun.font.TrueTypeFont.getPostscriptName(TrueTypeFont.java:1205)
at java.awt.Font.getPSName(Font.java:1156)
at org.apache.pdfbox.pdmodel.font.FontManager.loadFonts(FontManager.java:101)
at org.apache.pdfbox.pdmodel.font.FontManager.<clinit>(FontManager.java:53)
... 13 more
通过PDFBox的方式是避免本机绑定的好方法。尝试使用PDFBox中的PDFImageWriter,我在几行中也使用了它,它工作得很好。您必须提取PDF文档并使用它的写入器。
PDFImageWriter.write(doc, "png", null, , Integer.MAX_VALUE, "picture");
对于所有页面。
PDFImageWriter.write(doc, "png", null, 0, 0, "picture");
请参阅:PDFImageWriter Javadoc
您可以尝试使用nonSequentialParser来避免某些PDF文件(带有增量更新)的错误:
PDF文档=PDDocument.loadnonSeq(new File("/document.pdf"));
您可以轻松地将04-Resest-Headers.pdf文件页面转换为图像格式。
使用PDF Box将所有PDF页面转换为Java的图像格式。
Apache PDFBox 1.8的解决方案。*版本:
Jar需要pdfbox-1.8.3。罐子
还是maven依赖
<dependency>
<groupId>org.apache.pdfbox</groupId>
<artifactId>pdfbox</artifactId>
<version>1.8.3</version>
</dependency>
以下是解决方案:
package com.pdf.pdfbox.examples;
import java.awt.image.BufferedImage;
import java.io.File;
import java.util.List;
import javax.imageio.ImageIO;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
@SuppressWarnings("unchecked")
public class ConvertPDFPagesToImages {
public static void main(String[] args) {
try {
String sourceDir = "C:/Documents/04-Request-Headers.pdf"; // Pdf files are read from this folder
String destinationDir = "C:/Documents/Converted_PdfFiles_to_Image/"; // converted images from pdf document are saved here
File sourceFile = new File(sourceDir);
File destinationFile = new File(destinationDir);
if (!destinationFile.exists()) {
destinationFile.mkdir();
System.out.println("Folder Created -> "+ destinationFile.getAbsolutePath());
}
if (sourceFile.exists()) {
System.out.println("Images copied to Folder: "+ destinationFile.getName());
PDDocument document = PDDocument.load(sourceDir);
List<PDPage> list = document.getDocumentCatalog().getAllPages();
System.out.println("Total files to be converted -> "+ list.size());
String fileName = sourceFile.getName().replace(".pdf", "");
int pageNumber = 1;
for (PDPage page : list) {
BufferedImage image = page.convertToImage();
File outputfile = new File(destinationDir + fileName +"_"+ pageNumber +".png");
System.out.println("Image Created -> "+ outputfile.getName());
ImageIO.write(image, "png", outputfile);
pageNumber++;
}
document.close();
System.out.println("Converted Images are saved at -> "+ destinationFile.getAbsolutePath());
} else {
System.err.println(sourceFile.getName() +" File not exists");
}
} catch (Exception e) {
e.printStackTrace();
}
}
}
可以将图像转换为jpg、jpeg、png、bmp、gif
格式。
注:我提到了主要使用的图像格式。
ImageIO.write(image , "jpg", new File( destinationDir +fileName+"_"+pageNumber+".jpg" ));
ImageIO.write(image , "jpeg", new File( destinationDir +fileName+"_"+pageNumber+".jpeg" ));
ImageIO.write(image , "png", new File( destinationDir +fileName+"_"+pageNumber+".png" ));
ImageIO.write(image , "bmp", new File( destinationDir +fileName+"_"+pageNumber+".bmp" ));
ImageIO.write(image , "gif", new File( destinationDir +fileName+"_"+pageNumber+".gif" ));
控制台输出:
Images copied to Folder: Converted_PdfFiles_to_Image
Total files to be converted -> 13
Aug 06, 2014 1:35:49 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_1.png
Aug 06, 2014 1:35:50 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_2.png
Aug 06, 2014 1:35:51 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_3.png
Aug 06, 2014 1:35:51 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_4.png
Aug 06, 2014 1:35:52 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_5.png
Aug 06, 2014 1:35:52 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_6.png
Aug 06, 2014 1:35:53 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_7.png
Aug 06, 2014 1:35:53 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_8.png
Aug 06, 2014 1:35:54 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_9.png
Aug 06, 2014 1:35:54 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_10.png
Aug 06, 2014 1:35:54 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_11.png
Aug 06, 2014 1:35:55 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_12.png
Aug 06, 2014 1:35:55 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_13.png
Converted Images are saved at -> C:\Documents\Converted_PdfFiles_to_Image
Apache PDFBox 2.0的解决方案。*版本:
所需罐子pdfbox-2.0.16.jar、fontbox-2.0.16.jar、commons-logging-1.2.jar
或者来自pom。xml依赖关系
<!-- https://mvnrepository.com/artifact/org.apache.pdfbox/pdfbox -->
<dependency>
<groupId>org.apache.pdfbox</groupId>
<artifactId>pdfbox</artifactId>
<version>2.0.16</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.pdfbox/fontbox -->
<dependency>
<groupId>org.apache.pdfbox</groupId>
<artifactId>fontbox</artifactId>
<version>2.0.16</version>
</dependency>
<!-- https://mvnrepository.com/artifact/commons-logging/commons-logging -->
<dependency>
<groupId>commons-logging</groupId>
<artifactId>commons-logging</artifactId>
<version>1.2</version>
</dependency>
2.0.16版本的解决方案:
package com.pdf.pdfbox.examples;
import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.ImageType;
import org.apache.pdfbox.rendering.PDFRenderer;
/**
*
* @author venkataudaykiranp
*
* @version 2.0.16(Apache PDFBox version support)
*
*/
public class ConvertPDFPagesToImages {
public static void main(String[] args) {
try {
String sourceDir = "C:\\Users\\venkataudaykiranp\\Downloads\\04-Request-Headers.pdf"; // Pdf files are read from this folder
String destinationDir = "C:\\Users\\venkataudaykiranp\\Downloads\\Converted_PdfFiles_to_Image/"; // converted images from pdf document are saved here
File sourceFile = new File(sourceDir);
File destinationFile = new File(destinationDir);
if (!destinationFile.exists()) {
destinationFile.mkdir();
System.out.println("Folder Created -> "+ destinationFile.getAbsolutePath());
}
if (sourceFile.exists()) {
System.out.println("Images copied to Folder Location: "+ destinationFile.getAbsolutePath());
PDDocument document = PDDocument.load(sourceFile);
PDFRenderer pdfRenderer = new PDFRenderer(document);
int numberOfPages = document.getNumberOfPages();
System.out.println("Total files to be converting -> "+ numberOfPages);
String fileName = sourceFile.getName().replace(".pdf", "");
String fileExtension= "png";
/*
* 600 dpi give good image clarity but size of each image is 2x times of 300 dpi.
* Ex: 1. For 300dpi 04-Request-Headers_2.png expected size is 797 KB
* 2. For 600dpi 04-Request-Headers_2.png expected size is 2.42 MB
*/
int dpi = 300;// use less dpi for to save more space in harddisk. For professional usage you can use more than 300dpi
for (int i = 0; i < numberOfPages; ++i) {
File outPutFile = new File(destinationDir + fileName +"_"+ (i+1) +"."+ fileExtension);
BufferedImage bImage = pdfRenderer.renderImageWithDPI(i, dpi, ImageType.RGB);
ImageIO.write(bImage, fileExtension, outPutFile);
}
document.close();
System.out.println("Converted Images are saved at -> "+ destinationFile.getAbsolutePath());
} else {
System.err.println(sourceFile.getName() +" File not exists");
}
} catch (Exception e) {
e.printStackTrace();
}
}
}
我想把PDF文档转换成图像。我用的是Ghost4j。 问题:Ghost4J需要gsdll32。dll文件,我不想使用dll文件。 问题1:是否有任何方法,在ghost4j转换图像没有dll? 问题2:我在PDFBox API中找到了解决方案<代码>组织。阿帕奇。pdfbox。pdmodel。PDPagep具有将PDF页面转换为图像格式的方法convertToImage()。 我只有PDF文档上的文
有人能给我举个例子,说明如何使用ApachePDFBox转换不同图像中的PDF文件(PDF的每一页对应一个图像)?
问题内容: 按照目前的情况,这个问题并不适合我们的问答形式。我们希望答案得到事实,参考或专业知识的支持,但是这个问题可能会引起辩论,争论,民意调查或扩展讨论。如果您认为此问题可以解决并且可以重新提出,请访问帮助中心以获取指导。 7年前关闭。 我需要从现有的(X)HTML文档自动生成PDF文件。输入文件(报告)使用非常简单的基于表的布局,因此可能不需要支持真正精美的JavaScript / CSS。
问题内容: 我需要从现有的(X)HTML文档自动生成PDF文件。输入文件(报告)使用非常简单的基于表的布局,因此可能不需要支持真正精美的JavaScript / CSS。 由于我习惯于在Java中工作,因此最好在Java项目中轻松使用的解决方案。不过,它仅需要在Windows系统上工作。 一种可行的方法,但不会产生高质量的输出(至少是开箱即用的),一种方法是使用CSS2XSLFO和Apache F
问题内容: 我有大量文本字符串,这些字符串显然是PDF文件的原始数据,我需要将其重新制作为PDF。 目前,我正在将字符串读取到StringBuffer中,但是如果需要,可以更改它。从那里,我尝试将其写到文件中并更改扩展名(我真的希望这样做能起作用,但是我有点不知道),我尝试将其带入String,然后从中取出byte []。并将其写入文件,或使用DataOutputStream将字节放入文件中。这些
问题内容: 我想将PDF文件转换为CSV文件。我为此使用iText库。程序运行正常,但输出格式不正确。所有数据都在csv文件的第一行中。输出应与pdf文件完全相同(表示带有换行符)。请帮忙。提前致谢。 问题答案: 您需要在每个表行之后在缓冲区中引入一个换行符’\ n’。