Question:

PDFBox: Do PDDocument and PDPage reference each other?

欧君之

2023-03-14

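Does a PDPage object contain a reference to the PDDocument it belongs to? In other words, does a PDPage know about its PDDocument?

Somewhere in my application I have a list of documents. These documents are merged into a new PDDocument: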

PDFMergerUtility pdfMerger = new PDFMergerUtility();

PDDocument mergedPDDocument = new PDDocument();
for (PDDocument pdfDocument : documentList) {
    pdfMerger.appendDocument(mergedPDDocument, pdfDocument);
}

This PDDocument is then split into bundles of 10 pages:

Splitter splitter = new Splitter();
splitter.setSplitAtPage(bundleSize);
List<PDDocument> bundleList = splitter.split(mergedPDDocument);

My question now is:
If I loop over the pages of these split PDDocuments in the list, is there a way to know which PDDocument a page originally belonged to?

Also, if you have a PDPage object, can you get information from it, such as its page number? Or can you get that some other way?

谢嘉

2023-03-14

Does a PDPage object contain a reference to the PDDocument it belongs to? In other words, does a PDPage know about its PDDocument?

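Unfortunately, a PDPage does not contain a reference to its parent PDDocument. Its dictionary does, however, link to its parent page tree node, whose Kids array lists the neighbouring pages, so you can navigate between pages without a reference to the parent PDDocument.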

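There is a workaround for finding out where a PDPage sits in the document when no PDDocument is available. Every PDPage has a dictionary holding information such as the page size, resources, fonts and content. One of its entries is called Parent. It points to a page tree node whose Kids array contains everything needed to create shallow clones of the pages with the constructor PDPage(COSDictionary). The pages are in the correct order, so a page number can be obtained from a page's position in that array. Note that this only yields document-wide page numbers when all pages sit directly under a single page tree node; in a nested page tree the Kids array covers only one branch.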

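Once a list of documents has been merged into a single document, all references to the original documents are lost. You can confirm this by looking at the Parent entry inside a PDPage and following it to the parent object: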

COSName {Parent} : COSObject {
  COSDictionary {
    COSName {Kids} : COSArray {
      COSObject {
        COSDictionary {
          COSName {TrimBox} : COSArray {0; 0; 612; 792;};
          COSName {MediaBox} : COSArray {0; 0; 612; 792;};
          COSName {CropBox} : COSArray {0; 0; 612; 792;};
          COSName {Resources} : COSDictionary {
            ...
          };
          COSName {Contents} : COSObject {
            ...
          };
          COSName {Parent} : 1781256139;
          COSName {StructParents} : COSInt {68};
          COSName {ArtBox} : COSArray {0; 0; 612; 792; };
          COSName {BleedBox} : COSArray {0; 0; 612; 792; };
          COSName {Type} : COSName {Page};
        }
      }

    ...

    COSName {Count} : COSInt {4};
    COSName {Type} : COSName {Pages};
  }
};
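
Because the merged document keeps no trace of its sources, one possible workaround is to record the page ranges yourself while merging. The following is only a minimal sketch under stated assumptions: the class PageOriginIndex and its methods are hypothetical names, not part of PDFBox; the page counts would come from PDDocument.getNumberOfPages() of each source, in merge order; and it assumes the merger appends pages in order and the Splitter emits consecutive bundles of bundleSize pages.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not part of PDFBox): remembers at which merged page
// index each source document starts, so a page can be traced back later.
class PageOriginIndex {
    // Key: index of the first merged page of a source; value: position of
    // that source in the original document list.
    private final TreeMap<Integer, Integer> firstPageToSource = new TreeMap<>();
    private int totalPages = 0;
    private int sourceCount = 0;

    // Call once per source document, in merge order, with its page count
    // (assumed to be taken from PDDocument.getNumberOfPages()).
    void addSource(int pageCount) {
        if (pageCount < 0) {
            throw new IllegalArgumentException("pageCount must not be negative");
        }
        if (pageCount > 0) {
            firstPageToSource.put(totalPages, sourceCount);
        }
        totalPages += pageCount;
        sourceCount++;
    }

    // Zero-based page index in the merged document -> source position.
    int sourceOfMergedPage(int mergedPageIndex) {
        if (mergedPageIndex < 0 || mergedPageIndex >= totalPages) {
            throw new IndexOutOfBoundsException("no such merged page: " + mergedPageIndex);
        }
        Map.Entry<Integer, Integer> entry = firstPageToSource.floorEntry(mergedPageIndex);
        return entry.getValue();
    }

    // Page inside a bundle -> source position, assuming every bundle except
    // possibly the last holds exactly bundleSize consecutive merged pages.
    int sourceOfBundlePage(int bundleIndex, int pageInBundle, int bundleSize) {
        return sourceOfMergedPage(bundleIndex * bundleSize + pageInBundle);
    }

    public static void main(String[] args) {
        PageOriginIndex index = new PageOriginIndex();
        int[] pageCounts = {3, 0, 5}; // illustrative page counts of three sources
        for (int pageCount : pageCounts) {
            index.addSource(pageCount);
        }
        int bundleSize = 4;
        System.out.println(index.sourceOfBundlePage(0, 2, bundleSize)); // prints 0
        System.out.println(index.sourceOfBundlePage(0, 3, bundleSize)); // prints 2
        System.out.println(index.sourceOfBundlePage(1, 3, bundleSize)); // prints 2
    }
}
```

In the merge loop from the question, addSource would be called next to each appendDocument call; while looping over bundleList, the bundle's position in the list and the page's position in the bundle are then enough to look up the source.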

Source code

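I wrote the following code to show how the information in the PDPage dictionary can be used to navigate back and forth between pages, and how to get the page number from a page's position in the array.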

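import java.io.File;
import java.io.IOException;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

import org.apache.pdfbox.cos.COSArray;
import org.apache.pdfbox.cos.COSBase;
import org.apache.pdfbox.cos.COSDictionary;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.cos.COSObject;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.encryption.InvalidPasswordException;

public class PDPageUtils {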
    public static void main(String[] args) throws InvalidPasswordException, IOException {
        System.setProperty("sun.java2d.cmm", "sun.java2d.cmm.kcms.KcmsServiceProvider");

        PDDocument document = null;
        try {
            String filename = "src/main/resources/pdf/us-017.pdf";
            document = PDDocument.load(new File(filename));

            System.out.println("listIterator(PDPage)");
            ListIterator<PDPage> pageIterator = listIterator(document.getPage(0));
            while (pageIterator.hasNext()) {
                PDPage page = pageIterator.next();
                // After next(), previousIndex() is the index of the page just returned.
                System.out.println("page #: " + pageIterator.previousIndex() + ", Structural Parent Key: " + page.getStructParents());
            }
        } finally {
            if (document != null) {
                document.close();
            }
        }
    }

    /**
     * Returns a <code>ListIterator</code> initialized with the list of pages from
     * the dictionary embedded in the specified <code>PDPage</code>. The current
     * position of this <code>ListIterator</code> is set to the position of the
     * specified <code>PDPage</code>. Only pages that are direct kids of the
     * page's parent node are included.
     *
     * @param page the specified <code>PDPage</code>
     * @return a <code>ListIterator</code> positioned at the specified page
     *
     * @see java.util.ListIterator
     * @see org.apache.pdfbox.pdmodel.PDPage
     */
    public static ListIterator<PDPage> listIterator(PDPage page) {
        List<PDPage> pages = new LinkedList<PDPage>();

        COSDictionary pageDictionary = page.getCOSObject();
        COSDictionary parentDictionary = pageDictionary.getCOSDictionary(COSName.PARENT);
        COSArray kidsArray = parentDictionary.getCOSArray(COSName.KIDS);

        List<? extends COSBase> kidList = kidsArray.toList();
        for (COSBase kid : kidList) {
            if (kid instanceof COSObject) {
                COSObject kidObject = (COSObject) kid;
                COSBase type = kidObject.getDictionaryObject(COSName.TYPE);
                if (type == COSName.PAGE) {
                    COSBase kidPageBase = kidObject.getObject();
                    if (kidPageBase instanceof COSDictionary) {
                        COSDictionary kidPageDictionary = (COSDictionary) kidPageBase;
                        pages.add(new PDPage(kidPageDictionary));
                    }
                }
            }
        }
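        int index = pages.indexOf(page);
        if (index < 0) {
            // Without this check, listIterator(-1) fails with an unhelpful IndexOutOfBoundsException.
            throw new IllegalArgumentException("page not found among the Kids of its parent");
        }
        return pages.listIterator(index);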
    }
}

Sample output

In this example the PDF document has 4 pages and the iterator is initialized with the first page. Note that the page number is previousIndex():

System.out.println("listIterator(PDPage)");
ListIterator<PDPage> pageIterator = listIterator(document.getPage(0));
while (pageIterator.hasNext()) {
    PDPage page = pageIterator.next();
    System.out.println("page #: " + pageIterator.previousIndex() + ", Structural Parent Key: " + page.getStructParents());
}

listIterator(PDPage)
page #: 0, Structural Parent Key: 68
page #: 1, Structural Parent Key: 69
page #: 2, Structural Parent Key: 70
page #: 3, Structural Parent Key: 71

You can also start at the last page and navigate backwards. Note that the page number is now nextIndex().

ListIterator<PDPage> pageIterator = listIterator(document.getPage(3));
pageIterator.next();
while (pageIterator.hasPrevious()) {
    PDPage page = pageIterator.previous();
    System.out.println("page #: " + pageIterator.nextIndex() + ", Structural Parent Key: " + page.getStructParents());
}

listIterator(PDPage)
page #: 3, Structural Parent Key: 71
page #: 2, Structural Parent Key: 70
page #: 1, Structural Parent Key: 69
page #: 0, Structural Parent Key: 68
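
The switch between previousIndex() and nextIndex() in the two loops above follows from how java.util.ListIterator works: its cursor sits between elements, so after next() the element just returned lies behind the cursor, and after previous() it lies in front of it. The sketch below illustrates this with plain strings standing in for pages; the class CursorIndexDemo and its helper methods are hypothetical names chosen for illustration only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.ListIterator;

// Illustration only: strings stand in for PDPage objects.
class CursorIndexDemo {
    // Moves forward one element and returns the index of the element just returned.
    static <T> int indexAfterNext(ListIterator<T> iterator) {
        iterator.next();
        return iterator.previousIndex();
    }

    // Moves backward one element and returns the index of the element just returned.
    static <T> int indexAfterPrevious(ListIterator<T> iterator) {
        iterator.previous();
        return iterator.nextIndex();
    }

    public static void main(String[] args) {
        List<String> pages = Arrays.asList("p0", "p1", "p2", "p3");

        // Forward: start in front of the first element.
        ListIterator<String> forward = pages.listIterator(0);
        while (forward.hasNext()) {
            System.out.println("forward index: " + indexAfterNext(forward));
        }

        // Backward: start behind the last element.
        ListIterator<String> backward = pages.listIterator(pages.size());
        while (backward.hasPrevious()) {
            System.out.println("backward index: " + indexAfterPrevious(backward));
        }
    }
}
```

The forward loop reports the indices 0 to 3 and the backward loop 3 to 0, matching the page numbers in the sample output.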
