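We have been using the iText-based PdfVeryDenseMergeTool, which we found in the question "How to remove whitespace on merge", to merge multiple PDF files into a single PDF file. The tool merges the PDFs without leaving any whitespace between them, and where possible it also splits an individual PDF across pages.
We would like to port PdfVeryDenseMergeTool to PDFBox. We found a PDFBox 2 based PDFDenseMerge tool that can merge PDFs like this:
Individual PDFs:
Densely merged PDF:
We are looking for something like this (which the iText-based PdfVeryDenseMergeTool already does, but we want to achieve it with PDFBox 2):
While attempting the port, we found that PdfVeryDenseMergeTool uses PageVerticalAnalyzer, which extends an iText PDF render listener and performs some action every time text, an image, or an arc is drawn in the PDF. All of this render information is then used to split a single PDF across multiple pages. We looked for a similar render listener in PDFBox 2 but found that the available PDFRenderer class only has methods for rendering to images. So we are not sure how to port PageVerticalAnalyzer to PDFBox.
If anyone can suggest a way forward, we would greatly appreciate the help.
Thanks a lot!
Edit, 7 February 2020
Currently we are extending PDFGraphicsStreamEngine from PDFBox to create a custom rendering engine that keeps track of the coordinates of images, text lines, and arcs as they are drawn. This custom engine will be the port of PageVerticalAnalyzer. After that, we hope to be able to port PdfVeryDenseMergeTool to PDFBox.
Edit, 8 February 2020
Here is a very simple port of PageVerticalAnalyzer that handles images and text. I am a PDFBox newbie, so my logic for handling images may be shaky. Here is the basic approach:
Text: for each glyph printed, get its bottom y and set topY = bottom + charHeight, then mark these top/bottom points.
Images: for each call to drawImage(), there seem to be two ways to figure out where the image was drawn. The first is to use the coordinates from the last call to appendRectangle(); the second is to use the last sequence of moveTo(), multiple lineTo(), and closePath() calls. I give priority to the latter. If I cannot find any path (I found one in one PDF; in another, I only found appendRectangle() before drawImage()), I use the former. If neither exists, I don't know what to do. This is how I assume PDFBox marks image coordinates using moveTo()/lineTo()/closePath():
Here is my current implementation: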
import java.awt.geom.Point2D;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.pdfbox.contentstream.PDFGraphicsStreamEngine;
import org.apache.pdfbox.cos.COSArray;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.graphics.image.PDImage;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotation;
import org.apache.pdfbox.util.Matrix;
import org.apache.pdfbox.util.Vector;
public class PageVerticalAnalyzer extends PDFGraphicsStreamEngine
{
/**
* This is a port of iText based PageVerticalAnalyzer found here
* https://github.com/mkl-public/testarea-itext5/blob/master/src/main/java/mkl/testarea/itext5/merge/PageVerticalAnalyzer.java
*
* @param page PDF Page
*/
protected PageVerticalAnalyzer(PDPage page)
{
super(page);
}
public static void main(String[] args) throws IOException
{
File file = new File("q2.pdf");
try (PDDocument doc = PDDocument.load(file))
{
PDPage page = doc.getPage(0);
PageVerticalAnalyzer engine = new PageVerticalAnalyzer(page);
engine.run();
System.out.println(engine.verticalFlips);
}
}
/**
* Runs the engine on the current page.
*
* @throws IOException If there is an IO error while drawing the page.
*/
public void run() throws IOException
{
processPage(getPage());
for (PDAnnotation annotation : getPage().getAnnotations())
{
showAnnotation(annotation);
}
}
// All path related stuff
@Override
public void clip(int windingRule) throws IOException
{
System.out.println("clip");
}
@Override
public void moveTo(float x, float y) throws IOException
{
System.out.printf("moveTo %.2f %.2f%n", x, y);
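// start with bottom == top; closePath() fills in the bottom later
// (unboxing "(Float) null" here would throw a NullPointerException at runtime)
lastPathBottomTop = new float[] {y, y};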
}
@Override
public void lineTo(float x, float y) throws IOException
{
System.out.printf("lineTo %.2f %.2f%n", x, y);
lastLineTo = new float[] {x, y};
}
@Override
public void curveTo(float x1, float y1, float x2, float y2, float x3, float y3) throws IOException
{
System.out.printf("curveTo %.2f %.2f, %.2f %.2f, %.2f %.2f%n", x1, y1, x2, y2, x3, y3);
}
@Override
public Point2D getCurrentPoint() throws IOException
{
// if you want to build paths, you'll need to keep track of this like PageDrawer does
return new Point2D.Float(0, 0);
}
@Override
public void closePath() throws IOException
{
System.out.println("closePath");
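// guard against paths that are closed without a preceding moveTo()/lineTo()
if (lastPathBottomTop != null && lastLineTo != null)
lastPathBottomTop[0] = lastLineTo[1];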
lastLineTo = null;
}
@Override
public void endPath() throws IOException
{
System.out.println("endPath");
}
@Override
public void strokePath() throws IOException
{
System.out.println("strokePath");
}
@Override
public void fillPath(int windingRule) throws IOException
{
System.out.println("fillPath");
}
@Override
public void fillAndStrokePath(int windingRule) throws IOException
{
System.out.println("fillAndStrokePath");
}
@Override
public void shadingFill(COSName shadingName) throws IOException
{
System.out.println("shadingFill " + shadingName.toString());
}
// Rectangle related stuff
@Override
public void appendRectangle(Point2D p0, Point2D p1, Point2D p2, Point2D p3) throws IOException
{
System.out.printf("appendRectangle %.2f %.2f, %.2f %.2f, %.2f %.2f, %.2f %.2f%n",
p0.getX(), p0.getY(), p1.getX(), p1.getY(),
p2.getX(), p2.getY(), p3.getX(), p3.getY());
lastRectBottomTop = new float[] {(float) p0.getY(), (float) p3.getY()};
}
// Image drawing
@Override
public void drawImage(PDImage pdImage) throws IOException
{
System.out.println("drawImage");
if (lastPathBottomTop != null) {
addVerticalUseSection(lastPathBottomTop[0], lastPathBottomTop[1]);
} else if (lastRectBottomTop != null ){
addVerticalUseSection(lastRectBottomTop[0], lastRectBottomTop[1]);
} else {
throw new Error("Drawing image without last reference!");
}
lastPathBottomTop = null;
lastRectBottomTop = null;
}
// All text related stuff
@Override
public void showTextString(byte[] string) throws IOException
{
System.out.print("showTextString \"");
super.showTextString(string);
System.out.println("\"");
}
@Override
public void showTextStrings(COSArray array) throws IOException
{
System.out.print("showTextStrings \"");
super.showTextStrings(array);
System.out.println("\"");
}
@Override
protected void showGlyph(Matrix textRenderingMatrix, PDFont font, int code, String unicode,
Vector displacement) throws IOException
{
// print the actual character that is being rendered
System.out.print(unicode);
super.showGlyph(textRenderingMatrix, font, code, unicode, displacement);
// rendering matrix seems to contain bounding box of dimensions the char
// and an x/y point where bounding box starts
//System.out.println(textRenderingMatrix.toString());
// y of the bottom of the char
// not sure why the y value is in the 8th column
// when I print the matrix, it shows up in the 6th column
float yBottom = textRenderingMatrix.getValue(0, 7);
// height of the char
// using the value in the first column as the char height
float yTop = yBottom + textRenderingMatrix.getValue(0, 0);
addVerticalUseSection(yBottom, yTop);
}
// Keeping track of bottom/top point pairs
void addVerticalUseSection(float from, float to)
{
if (to < from)
{
float temp = to;
to = from;
from = temp;
}
int i=0, j=0;
for (; i<verticalFlips.size(); i++)
{
float flip = verticalFlips.get(i);
if (flip < from)
continue;
for (j=i; j<verticalFlips.size(); j++)
{
flip = verticalFlips.get(j);
if (flip < to)
continue;
break;
}
break;
}
boolean fromOutsideInterval = i%2==0;
boolean toOutsideInterval = j%2==0;
while (j-- > i)
verticalFlips.remove(j);
if (toOutsideInterval)
verticalFlips.add(i, to);
if (fromOutsideInterval)
verticalFlips.add(i, from);
}
final List<Float> verticalFlips = new ArrayList<Float>();
private float[] lastRectBottomTop;
private float[] lastPathBottomTop;
private float[] lastLineTo;
}
I am looking for answers to the following questions:
This answer has the same issue as the original iText version.
PageVerticalAnalyzer can be ported from iText to PDFBox as follows:
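import java.awt.Shape;
import java.awt.geom.AffineTransform;
import java.awt.geom.GeneralPath;
import java.awt.geom.Point2D;
import java.awt.geom.Rectangle2D;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.fontbox.util.BoundingBox;
import org.apache.pdfbox.contentstream.PDFGraphicsStreamEngine;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.font.PDCIDFontType2;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDSimpleFont;
import org.apache.pdfbox.pdmodel.font.PDTrueTypeFont;
import org.apache.pdfbox.pdmodel.font.PDType0Font;
import org.apache.pdfbox.pdmodel.font.PDType3CharProc;
import org.apache.pdfbox.pdmodel.font.PDType3Font;
import org.apache.pdfbox.pdmodel.font.PDVectorFont;
import org.apache.pdfbox.pdmodel.graphics.image.PDImage;
import org.apache.pdfbox.util.Matrix;
import org.apache.pdfbox.util.Vector;
public class PageVerticalAnalyzer extends PDFGraphicsStreamEngine {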
protected PageVerticalAnalyzer(PDPage page) {
super(page);
}
public List<Float> getVerticalFlips() {
return verticalFlips;
}
//
// Text
//
@Override
protected void showGlyph(Matrix textRenderingMatrix, PDFont font, int code, String unicode, Vector displacement)
throws IOException {
super.showGlyph(textRenderingMatrix, font, code, unicode, displacement);
Shape shape = calculateGlyphBounds(textRenderingMatrix, font, code);
if (shape != null) {
Rectangle2D rect = shape.getBounds2D();
addVerticalUseSection(rect.getMinY(), rect.getMaxY());
}
}
/**
* Copy of <code>org.apache.pdfbox.examples.util.DrawPrintTextLocations.calculateGlyphBounds(Matrix, PDFont, int)</code>.
*/
private Shape calculateGlyphBounds(Matrix textRenderingMatrix, PDFont font, int code) throws IOException
{
GeneralPath path = null;
AffineTransform at = textRenderingMatrix.createAffineTransform();
at.concatenate(font.getFontMatrix().createAffineTransform());
if (font instanceof PDType3Font)
{
// It is difficult to calculate the real individual glyph bounds for type 3 fonts
// because these are not vector fonts, the content stream could contain almost anything
// that is found in page content streams.
PDType3Font t3Font = (PDType3Font) font;
PDType3CharProc charProc = t3Font.getCharProc(code);
if (charProc != null)
{
BoundingBox fontBBox = t3Font.getBoundingBox();
PDRectangle glyphBBox = charProc.getGlyphBBox();
if (glyphBBox != null)
{
// PDFBOX-3850: glyph bbox could be larger than the font bbox
glyphBBox.setLowerLeftX(Math.max(fontBBox.getLowerLeftX(), glyphBBox.getLowerLeftX()));
glyphBBox.setLowerLeftY(Math.max(fontBBox.getLowerLeftY(), glyphBBox.getLowerLeftY()));
glyphBBox.setUpperRightX(Math.min(fontBBox.getUpperRightX(), glyphBBox.getUpperRightX()));
glyphBBox.setUpperRightY(Math.min(fontBBox.getUpperRightY(), glyphBBox.getUpperRightY()));
path = glyphBBox.toGeneralPath();
}
}
}
else if (font instanceof PDVectorFont)
{
PDVectorFont vectorFont = (PDVectorFont) font;
path = vectorFont.getPath(code);
if (font instanceof PDTrueTypeFont)
{
PDTrueTypeFont ttFont = (PDTrueTypeFont) font;
int unitsPerEm = ttFont.getTrueTypeFont().getHeader().getUnitsPerEm();
at.scale(1000d / unitsPerEm, 1000d / unitsPerEm);
}
if (font instanceof PDType0Font)
{
PDType0Font t0font = (PDType0Font) font;
if (t0font.getDescendantFont() instanceof PDCIDFontType2)
{
int unitsPerEm = ((PDCIDFontType2) t0font.getDescendantFont()).getTrueTypeFont().getHeader().getUnitsPerEm();
at.scale(1000d / unitsPerEm, 1000d / unitsPerEm);
}
}
}
else if (font instanceof PDSimpleFont)
{
PDSimpleFont simpleFont = (PDSimpleFont) font;
// these two lines do not always work, e.g. for the TT fonts in file 032431.pdf
// which is why PDVectorFont is tried first.
String name = simpleFont.getEncoding().getName(code);
path = simpleFont.getPath(name);
}
else
{
// shouldn't happen, please open issue in JIRA
System.out.println("Unknown font class: " + font.getClass());
}
if (path == null)
{
return null;
}
return at.createTransformedShape(path.getBounds2D());
}
//
// Bitmaps
//
@Override
public void drawImage(PDImage pdImage) throws IOException {
Matrix ctm = getGraphicsState().getCurrentTransformationMatrix();
Section section = null;
for (int x = 0; x < 2; x++) {
for (int y = 0; y < 2; y++) {
Point2D.Float point = ctm.transformPoint(x, y);
if (section == null)
section = new Section(point.y);
else
section.extendTo(point.y);
}
}
addVerticalUseSection(section.from, section.to);
}
//
// Paths
//
@Override
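public void appendRectangle(Point2D p0, Point2D p1, Point2D p2, Point2D p3) throws IOException {
subPath = null;
Section section = new Section(p0.getY());
section.extendTo(p1.getY()).extendTo(p2.getY()).extendTo(p3.getY());
// register the rectangle as part of the current path; otherwise stroked or
// filled rectangles would never reach addVerticalUseSection()
path.add(section);
currentPoint = p0;
}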
@Override
public void clip(int windingRule) throws IOException {
}
@Override
public void moveTo(float x, float y) throws IOException {
subPath = new Section(y);
path.add(subPath);
currentPoint = new Point2D.Float(x, y);
}
@Override
public void lineTo(float x, float y) throws IOException {
if (subPath == null) {
subPath = new Section(y);
path.add(subPath);
} else
subPath.extendTo(y);
currentPoint = new Point2D.Float(x, y);
}
/**
* Beware! This is incorrect! The control points may be outside
* the vertically used range
*/
@Override
public void curveTo(float x1, float y1, float x2, float y2, float x3, float y3) throws IOException {
if (subPath == null) {
subPath = new Section(y1);
path.add(subPath);
} else
subPath.extendTo(y1);
subPath.extendTo(y2).extendTo(y3);
currentPoint = new Point2D.Float(x3, y3);
}
@Override
public Point2D getCurrentPoint() throws IOException {
return currentPoint;
}
@Override
public void closePath() throws IOException {
}
@Override
public void endPath() throws IOException {
path.clear();
subPath = null;
}
@Override
public void strokePath() throws IOException {
for (Section section : path) {
addVerticalUseSection(section.from, section.to);
}
path.clear();
subPath = null;
}
@Override
public void fillPath(int windingRule) throws IOException {
for (Section section : path) {
addVerticalUseSection(section.from, section.to);
}
path.clear();
subPath = null;
}
@Override
public void fillAndStrokePath(int windingRule) throws IOException {
for (Section section : path) {
addVerticalUseSection(section.from, section.to);
}
path.clear();
subPath = null;
}
@Override
public void shadingFill(COSName shadingName) throws IOException {
// TODO Auto-generated method stub
}
Point2D currentPoint = null;
List<Section> path = new ArrayList<Section>();
Section subPath = null;
static class Section {
Section(double value) {
this((float)value);
}
Section(float value) {
from = value;
to = value;
}
Section extendTo(double value) {
return extendTo((float)value);
}
Section extendTo(float value) {
if (value < from)
from = value;
else if (value > to)
to = value;
return this;
}
private float from;
private float to;
}
void addVerticalUseSection(double from, double to) {
addVerticalUseSection((float)from, (float)to);
}
void addVerticalUseSection(float from, float to) {
if (to < from) {
float temp = to;
to = from;
from = temp;
}
int i=0, j=0;
for (; i<verticalFlips.size(); i++) {
float flip = verticalFlips.get(i);
if (flip < from)
continue;
for (j=i; j<verticalFlips.size(); j++) {
flip = verticalFlips.get(j);
if (flip < to)
continue;
break;
}
break;
}
boolean fromOutsideInterval = i%2==0;
boolean toOutsideInterval = j%2==0;
while (j-- > i)
verticalFlips.remove(j);
if (toOutsideInterval)
verticalFlips.add(i, to);
if (fromOutsideInterval)
verticalFlips.add(i, from);
}
final List<Float> verticalFlips = new ArrayList<Float>();
}
(PageVerticalAnalyzer.java)
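To make the `verticalFlips` bookkeeping easier to follow, here is a small standalone sketch that runs the same `addVerticalUseSection` logic on plain numbers, without PDFBox. The class name `VerticalFlipsDemo` and the sample coordinates are made up for illustration. The list always holds an even number of ascending values that alternate between the bottom and the top of a vertically used band; overlapping bands are merged.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; only the method body is taken from the analyzer above.
class VerticalFlipsDemo {
    final List<Float> verticalFlips = new ArrayList<>();

    void addVerticalUseSection(float from, float to) {
        if (to < from) {            // normalize so that from <= to
            float temp = to;
            to = from;
            from = temp;
        }
        int i = 0, j = 0;
        for (; i < verticalFlips.size(); i++) {
            float flip = verticalFlips.get(i);
            if (flip < from)
                continue;           // i ends at the first flip >= from
            for (j = i; j < verticalFlips.size(); j++) {
                flip = verticalFlips.get(j);
                if (flip < to)
                    continue;       // j ends at the first flip >= to
                break;
            }
            break;
        }
        // an even index means the value falls into a gap between used bands
        boolean fromOutsideInterval = i % 2 == 0;
        boolean toOutsideInterval = j % 2 == 0;
        while (j-- > i)
            verticalFlips.remove(j); // drop flips swallowed by the new band
        if (toOutsideInterval)
            verticalFlips.add(i, to);
        if (fromOutsideInterval)
            verticalFlips.add(i, from);
    }

    public static void main(String[] args) {
        VerticalFlipsDemo demo = new VerticalFlipsDemo();
        demo.addVerticalUseSection(700f, 720f); // first band
        demo.addVerticalUseSection(100f, 150f); // separate band further down
        demo.addVerticalUseSection(140f, 200f); // overlaps the second band
        demo.addVerticalUseSection(710f, 705f); // reversed and fully contained
        System.out.println(demo.verticalFlips); // prints [100.0, 200.0, 700.0, 720.0]
    }
}
```

The merge tool later reads this list from the end backwards, because PDF y coordinates grow upwards and the topmost band therefore sits last.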
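The PageVerticalAnalyzer implementation is actually similar to the BoundingBoxFinder from this answer. Just like there, I borrowed from the PDFBox example DrawPrintTextLocations to determine the glyph outlines.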
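Furthermore, the curve handling has an issue that it shares with the original iText 5 PageVerticalAnalyzer. As in that answer, the control points are treated as if they were located on the actual curve, but usually they are not, and they may lie far outside the vertical range the curve really uses. One could use the corresponding AWT classes instead of the path handling implemented here, but that may not be possible on platforms like Android.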
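One way around the control point issue that needs no AWT classes is to compute the real vertical extent of each cubic Bézier segment. The following is only a minimal sketch and not part of the ported class: `CubicBezierYRange`, `yRange`, and `evaluate` are hypothetical names. It looks for parameter values strictly between 0 and 1 at which the derivative of the y coordinate vanishes and evaluates the curve there as well as at both end points. Here `y0` is the y coordinate of the current point, `y1` and `y2` are those of the control points, and `y3` is that of the end point, matching the arguments of `curveTo`.

```java
// Hypothetical helper, shown only to illustrate the idea.
class CubicBezierYRange {

    /** Returns {minY, maxY} actually reached by the cubic Bezier curve. */
    static double[] yRange(double y0, double y1, double y2, double y3) {
        // the end points are always on the curve
        double min = Math.min(y0, y3);
        double max = Math.max(y0, y3);

        // B'(t) / 3 = a*t^2 + b*t + c
        double d0 = y1 - y0, d1 = y2 - y1, d2 = y3 - y2;
        double a = d0 - 2 * d1 + d2;
        double b = 2 * (d1 - d0);
        double c = d0;

        double[] candidates;
        if (Math.abs(a) < 1e-12) {
            // derivative is (almost) linear
            candidates = Math.abs(b) < 1e-12 ? new double[0] : new double[] { -c / b };
        } else {
            double disc = b * b - 4 * a * c;
            if (disc < 0) {
                candidates = new double[0]; // no real extremum
            } else {
                double root = Math.sqrt(disc);
                candidates = new double[] { (-b + root) / (2 * a), (-b - root) / (2 * a) };
            }
        }

        for (double t : candidates) {
            if (t > 0 && t < 1) { // only extrema inside the segment count
                double y = evaluate(y0, y1, y2, y3, t);
                min = Math.min(min, y);
                max = Math.max(max, y);
            }
        }
        return new double[] { min, max };
    }

    /** Evaluates the y coordinate of the curve at parameter t. */
    static double evaluate(double y0, double y1, double y2, double y3, double t) {
        double s = 1 - t;
        return s * s * s * y0 + 3 * s * s * t * y1 + 3 * s * t * t * y2 + t * t * t * y3;
    }

    public static void main(String[] args) {
        // control points reach y = 100, but the curve itself only reaches y = 75
        double[] range = yRange(0, 100, 100, 0);
        System.out.println(range[0] + " .. " + range[1]); // prints 0.0 .. 75.0
    }
}
```

In `curveTo` one could then extend the current sub-path section by the two returned values instead of by `y1`, `y2`, and `y3`, taking `y0` from `currentPoint`.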
Just like there, this class ignores annotations, but the iText 5 version ignored them, too. This class also ignores clip paths...
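import java.io.IOException;
import java.io.OutputStream;
import java.util.List;
import org.apache.pdfbox.multipdf.LayerUtility;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.graphics.form.PDFormXObject;
import org.apache.pdfbox.util.Matrix;
public class PdfVeryDenseMergeTool {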
public PdfVeryDenseMergeTool(PDRectangle size, float top, float bottom, float gap)
{
this.pageSize = size;
this.topMargin = top;
this.bottomMargin = bottom;
this.gap = gap;
}
public void merge(OutputStream outputStream, Iterable<PDDocument> inputs) throws IOException
{
try
{
openDocument();
for (PDDocument input: inputs)
{
merge(input);
}
if (currentContents != null) {
currentContents.close();
currentContents = null;
}
document.save(outputStream);
}
finally
{
closeDocument();
}
}
void openDocument() throws IOException
{
document = new PDDocument();
newPage();
}
void closeDocument() throws IOException
{
try
{
if (currentContents != null) {
currentContents.close();
currentContents = null;
}
document.close();
}
finally
{
this.document = null;
this.yPosition = 0;
}
}
void newPage() throws IOException
{
if (currentContents != null) {
currentContents.close();
currentContents = null;
}
currentPage = new PDPage(pageSize);
document.addPage(currentPage);
yPosition = pageSize.getUpperRightY() - topMargin;
currentContents = new PDPageContentStream(document, currentPage);
}
void merge(PDDocument input) throws IOException
{
for (PDPage page : input.getPages())
{
merge(input, page);
}
}
void merge(PDDocument sourceDoc, PDPage page) throws IOException
{
PDRectangle pageSizeToImport = page.getCropBox();
PageVerticalAnalyzer analyzer = new PageVerticalAnalyzer(page);
analyzer.processPage(page);
List<Float> verticalFlips = analyzer.getVerticalFlips();
if (verticalFlips.size() < 2)
return;
LayerUtility layerUtility = new LayerUtility(document);
PDFormXObject form = layerUtility.importPageAsForm(sourceDoc, page);
int startFlip = verticalFlips.size() - 1;
boolean first = true;
while (startFlip > 0)
{
if (!first)
newPage();
float freeSpace = yPosition - pageSize.getLowerLeftY() - bottomMargin;
int endFlip = startFlip + 1;
while ((endFlip > 1) && (verticalFlips.get(startFlip) - verticalFlips.get(endFlip - 2) < freeSpace))
endFlip -=2;
if (endFlip < startFlip)
{
float height = verticalFlips.get(startFlip) - verticalFlips.get(endFlip);
currentContents.saveGraphicsState();
currentContents.addRect(0, yPosition - height, pageSizeToImport.getWidth(), height);
currentContents.clip();
Matrix matrix = Matrix.getTranslateInstance(0, (float)(yPosition - (verticalFlips.get(startFlip) - pageSizeToImport.getLowerLeftY())));
currentContents.transform(matrix);
currentContents.drawForm(form);
currentContents.restoreGraphicsState();
yPosition -= height + gap;
startFlip = endFlip - 1;
}
else if (!first)
throw new IllegalArgumentException(String.format("Page %s content sections too large.", page));
first = false;
}
}
PDDocument document = null;
PDPage currentPage = null;
PDPageContentStream currentContents = null;
float yPosition = 0;
final PDRectangle pageSize;
final float topMargin;
final float bottomMargin;
final float gap;
}
(PdfVeryDenseMergeTool.java)
This is essentially a straightforward port of the iText 5 PdfVeryDenseMergeTool, nothing special.
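The part of the port that is easiest to get wrong is the loop in `merge(PDDocument, PDPage)` that walks the flips from the top of the source page downwards and decides how many bands still fit. The sketch below replays only that decision logic on plain numbers, without PDFBox. `DenseSlicePlanner`, `Slice`, and `plan` are hypothetical names, and margins and the gap are folded into the two parameters `usableHeight` (free height of an empty target page) and `alreadyUsed` (height consumed on the current target page).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical planner that mirrors the slicing loop of the merge tool.
class DenseSlicePlanner {

    /** One slice of the source page and the target page it is drawn on. */
    static final class Slice {
        final int targetPage;
        final float bottom;
        final float top;

        Slice(int targetPage, float bottom, float top) {
            this.targetPage = targetPage;
            this.bottom = bottom;
            this.top = top;
        }

        @Override
        public String toString() {
            return "page " + targetPage + ": " + bottom + " to " + top;
        }
    }

    static List<Slice> plan(List<Float> flips, float usableHeight, float alreadyUsed) {
        List<Slice> slices = new ArrayList<>();
        int page = 0;
        float freeSpace = usableHeight - alreadyUsed;
        int startFlip = flips.size() - 1;   // top of the topmost band
        boolean first = true;
        while (startFlip > 0) {
            if (!first) {                   // every later slice starts a fresh page
                page++;
                freeSpace = usableHeight;
            }
            int endFlip = startFlip + 1;
            // take in further bands (two flips each) while they still fit
            while (endFlip > 1 && flips.get(startFlip) - flips.get(endFlip - 2) < freeSpace)
                endFlip -= 2;
            if (endFlip < startFlip) {
                slices.add(new Slice(page, flips.get(endFlip), flips.get(startFlip)));
                startFlip = endFlip - 1;
            } else if (!first) {
                throw new IllegalArgumentException("Content section too large for a page.");
            }
            first = false;
        }
        return slices;
    }

    public static void main(String[] args) {
        List<Float> flips = Arrays.asList(100f, 200f, 300f, 400f, 500f, 700f);
        for (Slice slice : plan(flips, 350f, 0f))
            System.out.println(slice);
    }
}
```

With these sample numbers the topmost band (500 to 700) stays on the current target page, while the two lower bands, including the empty space between them (100 to 400), go together onto the next page. A band taller than an empty page cannot be placed at all, which is where the tool throws its IllegalArgumentException.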
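One simply creates a PdfVeryDenseMergeTool instance with the formatting information (page size, top and bottom margins, gap) and then starts the merge with PDDocument instances as sources: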
PDDocument document1 = ...;
...
PDDocument documentN = ...;
PdfVeryDenseMergeTool tool = new PdfVeryDenseMergeTool(PDRectangle.A4, 30, 30, 10);
tool.merge(new FileOutputStream(RESULT_FILE), Arrays.asList(document1, ..., documentN));
(DenseMerging test testVeryDenseMerging)