
Apache PDFBox - A Java PDF Library

苏华荣
2023-12-01

The Apache PDFBox® library is an open source Java tool for working with PDF documents. It can create new PDF documents, manipulate existing documents and extract content from documents. Apache PDFBox also includes several command-line utilities. Apache PDFBox is published under the Apache License v2.0.

Features:

  • Extract Text: Extract Unicode text from PDF files.
  • Split & Merge: Split a single PDF into many files or merge multiple PDF files.
  • Fill Forms: Extract data from PDF forms or fill a PDF form.
  • Preflight: Validate PDF files against the PDF/A-1b standard.
  • Print: Print a PDF file using the standard Java printing API.
  • Save as Image: Save PDFs as image files, such as PNG or JPEG.
  • Create PDFs: Create a PDF from scratch, with embedded fonts and images.
  • Signing: Digitally sign PDF files.

Getting Started

Maven

To use the latest release you'll need to add the following dependency:

<dependency>
  <groupId>org.apache.pdfbox</groupId>
  <artifactId>pdfbox</artifactId>
  <version>2.0.22</version>
</dependency>

PDFBox and Java 8

Java 8 switched its color management module to LittleCMS, and as a result users may experience slow performance in color operations. One workaround is to disable LittleCMS in favor of the old KCMS (Kodak Color Management System) by:

  • Starting the JVM with -Dsun.java2d.cmm=sun.java2d.cmm.kcms.KcmsServiceProvider, or
  • Calling System.setProperty("sun.java2d.cmm", "sun.java2d.cmm.kcms.KcmsServiceProvider")

Source: https://bugs.openjdk.java.net/browse/JDK-8041125
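
A minimal sketch of the second option. It assumes the property is set at the very start of the program, before any java.awt.color or PDFBox rendering code has initialized color management (setting it later presumably has no effect), and it only makes sense on a runtime that still ships KCMS, such as the Java 8 runtime this section is about. The class name `KcmsSetup` is hypothetical.

```java
// Hypothetical startup helper (class name is illustrative): select the
// legacy KCMS color management module instead of LittleCMS by setting
// the system property named in the text above.
class KcmsSetup {
    static final String CMM_PROPERTY = "sun.java2d.cmm";
    static final String KCMS_PROVIDER = "sun.java2d.cmm.kcms.KcmsServiceProvider";

    // Assumption: the JDK presumably picks its CMM the first time color
    // management is used, so call this before any color conversion.
    static void preferKcms() {
        System.setProperty(CMM_PROPERTY, KCMS_PROVIDER);
    }

    public static void main(String[] args) {
        preferKcms();
        System.out.println(CMM_PROPERTY + " = " + System.getProperty(CMM_PROPERTY));
        // ... open and render PDF documents with PDFBox after this point ...
    }
}
```

Passing the -D flag from the first bullet achieves the same thing without touching the code.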

Rendering Performance (since 2.0.4)

PDFBox 2.0.4 introduced a new command line setting

-Dorg.apache.pdfbox.rendering.UsePureJavaCMYKConversion=true

which may improve the performance of rendering PDFs on some systems especially if there are a lot of images on a page.
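
Because this setting is an ordinary Java system property, it can also be set from code. A minimal sketch, assuming it is set at startup before any page is rendered (exactly when PDFBox reads the property is not stated here, so setting it early is the safe choice). The class name `CmykConversionSetup` is hypothetical.

```java
// Sketch: enable PDFBox's pure-Java CMYK conversion from code instead of
// on the command line. Assumption: the flag must be set before PDFBox's
// rendering code reads it, so do this first thing at startup.
class CmykConversionSetup {
    static final String PURE_JAVA_CMYK =
            "org.apache.pdfbox.rendering.UsePureJavaCMYKConversion";

    static void enablePureJavaCmyk() {
        System.setProperty(PURE_JAVA_CMYK, "true");
    }

    public static void main(String[] args) {
        enablePureJavaCmyk();
        // Boolean.getBoolean reads a system property and parses it as a boolean
        System.out.println(PURE_JAVA_CMYK + " = " + Boolean.getBoolean(PURE_JAVA_CMYK));
    }
}
```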

Core Components

The three PDFBox components are named pdfbox, fontbox and xmpbox. The Maven groupId of all PDFBox components is org.apache.pdfbox.

Minimum Requirements

PDFBox has the following basic dependencies:

Commons Logging is a generic wrapper around different logging frameworks, so you will either need a logging library such as log4j as well, or let commons-logging fall back to the standard java.util.logging API included in the Java platform.
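
If you rely on the java.util.logging fallback, PDFBox's log output can be tuned through that API. A minimal sketch, assuming commons-logging has selected its java.util.logging adapter (no log4j on the classpath) and that PDFBox names its loggers after its own classes, so they all sit under `org.apache.pdfbox`. The class name `PdfBoxLogging` is hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch: raise the threshold of the "org.apache.pdfbox" parent logger so
// that child loggers named after PDFBox classes inherit it.
class PdfBoxLogging {
    // Keep a strong reference: java.util.logging holds loggers only weakly,
    // and a level set on a logger that gets garbage-collected would be lost.
    private static final Logger PDFBOX_ROOT = Logger.getLogger("org.apache.pdfbox");

    // Let only SEVERE messages from PDFBox through (e.g. hide warnings).
    static void quietPdfBox() {
        PDFBOX_ROOT.setLevel(Level.SEVERE);
    }

    public static void main(String[] args) {
        quietPdfBox();
        Logger child = Logger.getLogger("org.apache.pdfbox.rendering.PDFRenderer");
        // The child has no level of its own, so it inherits SEVERE from its parent
        System.out.println("WARNING loggable: " + child.isLoggable(Level.WARNING));
    }
}
```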

Font Handling

For font handling the fontbox component is needed.

XMP Metadata

To support XMP metadata the xmpbox component is needed.

Include Dependencies Using Maven

To add the pdfbox, fontbox, xmpbox and commons-logging jars to your application, the easiest thing is to declare the Maven dependency shown below. This gives you the main pdfbox library directly and the other required jars as transitive dependencies.

<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>...</version>
</dependency>
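
Since fontbox, xmpbox and commons-logging arrive only as transitive dependencies, a small startup check can confirm that they actually ended up on the runtime classpath. This is a sketch: the probe class names are assumptions, picked as well-known public classes of each artifact in the 2.0.x line, and `PdfBoxClasspathCheck` is a hypothetical name.

```java
// Sketch: report which PDFBox-related jars are visible at runtime.
// The probe class names are assumptions (representative public classes).
class PdfBoxClasspathCheck {
    static final String[][] PROBES = {
        {"pdfbox", "org.apache.pdfbox.pdmodel.PDDocument"},
        {"fontbox", "org.apache.fontbox.ttf.TrueTypeFont"},
        {"xmpbox", "org.apache.xmpbox.XMPMetadata"},
        {"commons-logging", "org.apache.commons.logging.LogFactory"},
    };

    // True if the class can be located; initialize=false avoids running
    // the class's static initializers just for this check.
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, PdfBoxClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (String[] probe : PROBES) {
            System.out.println(probe[0] + ": " + (isPresent(probe[1]) ? "found" : "MISSING"));
        }
    }
}
```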

Optional Components

PDFBox does not ship with all features enabled. Third party components are necessary to get full support for certain functionality.

JAI Image I/O

PDF supports embedded image files; however, support for some formats requires third-party libraries that are distributed under terms incompatible with the Apache 2.0 license:

These libraries are optional and will be loaded if present on the classpath, otherwise support for these image formats will be disabled and a warning will be logged when an unsupported image is encountered.

Maven dependencies for these components can be found in parent/pom.xml. Change the scope of the components if needed. Please make sure that any third-party licenses are suitable for your project.

For example, to include the JBIG2 library, add the following to your project's pom.xml:

<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>jbig2-imageio</artifactId>
    <version>3.0.0</version>
</dependency>
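
Because these image libraries are optional and picked up only if present on the classpath, it can help to check at startup whether a reader is actually registered, rather than discovering it from a warning in the log. The sketch below queries the standard javax.imageio registry; the format names "jbig2" and "jpeg2000" are assumptions about what those plugins register, `ImagePluginCheck` is a hypothetical name, and it does not claim that PDFBox itself looks readers up exactly this way.

```java
import javax.imageio.ImageIO;

// Sketch: check whether an Image I/O reader for a given format name is
// registered. Plugin jars on the application classpath are discovered
// automatically by the default ImageIO registry.
class ImagePluginCheck {
    static boolean hasReader(String formatName) {
        return ImageIO.getImageReadersByFormatName(formatName).hasNext();
    }

    public static void main(String[] args) {
        // "png" is built into the JDK; the other two names are assumptions
        for (String format : new String[] {"png", "jbig2", "jpeg2000"}) {
            System.out.println(format + ": " + (hasReader(format) ? "available" : "not available"));
        }
    }
}
```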

Encryption and Signing 

Encrypting and signing PDFs requires the bcprov, bcmail and bcpkix libraries from the Legion of the Bouncy Castle. These can be included in your Maven project using the following dependencies:

<dependency>
    <groupId>org.bouncycastle</groupId>
    <artifactId>bcprov-jdk15on</artifactId>
    <version>1.64</version>
</dependency>

<dependency>
    <groupId>org.bouncycastle</groupId>
    <artifactId>bcmail-jdk15on</artifactId>
    <version>1.64</version>
</dependency>

<dependency>
    <groupId>org.bouncycastle</groupId>
    <artifactId>bcpkix-jdk15on</artifactId>
    <version>1.64</version>
</dependency>

Java Cryptography Extension (JCE) 

256-bit AES encryption requires a JDK with "unlimited strength" cryptography, which requires extra files to be installed. For JDK 7, see Java Cryptography Extension (JCE). If these files are not installed, building PDFBox will throw an exception with the following message:

JCE unlimited strength jurisdiction policy files are not installed
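
Rather than running into that exception, an application can check the policy up front before attempting 256-bit AES encryption. `Cipher.getMaxAllowedKeyLength` is standard JCE API; the class name `JceStrengthCheck` is hypothetical.

```java
import java.security.NoSuchAlgorithmException;
import javax.crypto.Cipher;

// Sketch: detect whether this JRE's crypto policy permits 256-bit AES keys.
class JceStrengthCheck {
    static boolean supportsAes256() {
        try {
            // Returns Integer.MAX_VALUE when the unlimited-strength policy is in effect
            return Cipher.getMaxAllowedKeyLength("AES") >= 256;
        } catch (NoSuchAlgorithmException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(supportsAes256()
                ? "256-bit AES is allowed"
                : "256-bit AES is restricted: install the JCE unlimited strength policy files");
    }
}
```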