我面临以下用例:
我收到一个包含许多文档的pdf。每个文档具有不同的页数。它们由条形码页分隔。
是否可以拆分包含多个文档的多页PDF,这些文档由带有条形码的页面分隔,并为每个文档创建一个新的PDF?
我听说我们可以用Itext:https://developers.itextpdf.com/examples/stamping-content-existing-pdfs/clone-splitting-pdf-file拆分pdf
但是我在网上找不到当我检测条形码页面时分割它的方法。
更新:@mkl我发现了如何使用zxing从二维码中读取文本:它可以使用简单的png文件
File QRfile = new File("test.png");
BufferedImage bufferedImg = ImageIO.read(QRfile);
LuminanceSource source = new BufferedImageLuminanceSource(bufferedImg);
BinaryBitmap bitmap = new BinaryBitmap(new HybridBinarizer(source));
Result result = new MultiFormatReader().decode(bitmap);
System.out.println("Barcode Format: " + result.getBarcodeFormat());
System.out.println("Content: " + result.getText());
但它不是循环工作的。我用pdf文档测试(7页)
这里是JAVA代码:
PdfDocument pdfDoc;
pdfDoc = new PdfDocument(new PdfReader(pathName));
logger.debug("pdfDoc OK");
PdfDocumentContentParser contentParser = new PdfDocumentContentParser(pdfDoc);
for (int page = 1; page <= pdfDoc.getNumberOfPages(); page++)
{
logger.debug("page: " + page);
contentParser.processContent(page, new IEventListener()
{
@Override
public Set<EventType> getSupportedEvents()
{
logger.debug("inside getSupportedEvents");
return Collections.singleton(RENDER_IMAGE);
}
@Override
public void eventOccurred(IEventData data, EventType type)
{
index = index + 1;
logger.debug("inside eventOccurred - data: " + data);
logger.debug("inside eventOccurred - type: " + type);
logger.debug("inside eventOccurred - index: " + index);
if (data instanceof ImageRenderInfo)
{
logger.debug("data instanceof ImageRenderInfo");
ImageRenderInfo imageRenderInfo = (ImageRenderInfo) data;
byte[] bytes = imageRenderInfo.getImage().getImageBytes();
try
{
logger.debug("avant Files writer");
String pngName = "C:/alfresco/klinck/splitImage-" + index + ".png";
logger.debug("pngName: " + pngName);
Files.write(new File(pngName).toPath(), bytes);
logger.debug("Files written");
File QRfile = new File(pngName);
logger.debug("QR File trouvé ! ");
BufferedImage bufferedImg = ImageIO.read(QRfile);
logger.debug("bufferedImg OK ");
LuminanceSource source = new BufferedImageLuminanceSource(bufferedImg);
logger.debug("source OK ");
BinaryBitmap bitmap = new BinaryBitmap(new HybridBinarizer(source));
logger.debug("bitmap OK");
Result result = new MultiFormatReader().decode(bitmap);
logger.debug("SplitFluxJobExcecuter - resultBarcodeFormat: " + result.getBarcodeFormat());
logger.debug("SplitFluxJobExcecuter - result.getText(): " + result.getText());
}catch (Exception e)
{
logger.error("SplitJobExecuter Exception : " + ExceptionUtils.getStackTrace(e));
}
}
}
int index = 0;
});
}
第一页包含3个图像(1个二维码)。在最后一步中,我得到了“com.google.zxing.NotFoundException”。
这是日志:
2018-07-25 16:27:00,227 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] pdfDoc OK
2018-07-25 16:27:00,227 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] page: 1
2018-07-25 16:27:00,237 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] inside getSupportedEvents
2018-07-25 16:27:00,265 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] inside eventOccurred - data: com.itextpdf.kernel.pdf.canvas.parser.data.ImageRenderInfo@2472ac79
2018-07-25 16:27:00,266 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] inside eventOccurred - type: RENDER_IMAGE
2018-07-25 16:27:00,266 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] inside eventOccurred - index: 1
2018-07-25 16:27:00,266 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] data instanceof ImageRenderInfo
2018-07-25 16:27:00,266 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] avant Files writer
2018-07-25 16:27:00,266 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] pngName: C:/alfresco/klinck/splitImage-1.png
2018-07-25 16:27:00,270 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] Files written
2018-07-25 16:27:00,270 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] QR File trouvé !
2018-07-25 16:27:00,304 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] bufferedImg OK
2018-07-25 16:27:00,305 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] source OK
2018-07-25 16:27:00,306 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] bitmap OK
2018-07-25 16:27:00,407 ERROR [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] SplitJobExecuter Exception : com.google.zxing.NotFoundException
2018-07-25 16:27:00,407 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] inside eventOccurred - data: com.itextpdf.kernel.pdf.canvas.parser.data.ImageRenderInfo@6e036aea
2018-07-25 16:27:00,407 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] inside eventOccurred - type: RENDER_IMAGE
2018-07-25 16:27:00,407 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] inside eventOccurred - index: 2
2018-07-25 16:27:00,407 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] data instanceof ImageRenderInfo
2018-07-25 16:27:00,408 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] avant Files writer
2018-07-25 16:27:00,408 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] pngName: C:/alfresco/klinck/splitImage-2.png
2018-07-25 16:27:00,411 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] Files written
2018-07-25 16:27:00,411 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] QR File trouvé !
2018-07-25 16:27:00,415 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] bufferedImg OK
2018-07-25 16:27:00,415 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] source OK
2018-07-25 16:27:00,415 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] bitmap OK
2018-07-25 16:27:00,473 ERROR [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] SplitJobExecuter Exception : com.google.zxing.NotFoundException
2018-07-25 16:27:00,474 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] inside eventOccurred - data: com.itextpdf.kernel.pdf.canvas.parser.data.ImageRenderInfo@4c205db7
2018-07-25 16:27:00,474 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] inside eventOccurred - type: RENDER_IMAGE
2018-07-25 16:27:00,474 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] inside eventOccurred - index: 3
2018-07-25 16:27:00,474 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] data instanceof ImageRenderInfo
2018-07-25 16:27:00,474 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] avant Files writer
2018-07-25 16:27:00,474 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] pngName: C:/alfresco/klinck/splitImage-3.png
2018-07-25 16:27:00,478 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] Files written
2018-07-25 16:27:00,478 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] QR File trouvé !
2018-07-25 16:27:00,479 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] bufferedImg OK
2018-07-25 16:27:00,479 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] source OK
2018-07-25 16:27:00,479 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] bitmap OK
2018-07-25 16:27:00,484 ERROR [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] SplitJobExecuter Exception : com.google.zxing.NotFoundException
从第2页到第7页,错误消息不同:
2018-07-25 16:27:00,487 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] page: 2
2018-07-25 16:27:00,488 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] inside getSupportedEvents
2018-07-25 16:27:00,488 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] inside eventOccurred - data: com.itextpdf.kernel.pdf.canvas.parser.data.ImageRenderInfo@6d41ffa2
2018-07-25 16:27:00,488 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] inside eventOccurred - type: RENDER_IMAGE
2018-07-25 16:27:00,488 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] inside eventOccurred - index: 1
2018-07-25 16:27:00,489 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] data instanceof ImageRenderInfo
2018-07-25 16:27:00,489 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] avant Files writer
2018-07-25 16:27:00,489 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] pngName: C:/alfresco/klinck/splitImage-1.png
2018-07-25 16:27:00,492 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] Files written
2018-07-25 16:27:00,493 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] QR File trouvé !
2018-07-25 16:27:00,493 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] bufferedImg OK
2018-07-25 16:27:00,493 ERROR [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] SplitJobExecuter Exception : java.lang.NullPointerException
at com.google.zxing.client.j2se.BufferedImageLuminanceSource.<init>(BufferedImageLuminanceSource.java:42)
at com.klinck.mc.jobs.SplitFluxJobExecuter$1.eventOccurred(SplitFluxJobExecuter.java:150)
at com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor.eventOccurred(PdfCanvasProcessor.java:534)
at com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor.displayImage(PdfCanvasProcessor.java:573)
at com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor.access$5800(PdfCanvasProcessor.java:108)
at com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor$ImageXObjectDoHandler.handleXObject(PdfCanvasProcessor.java:1420)
at com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor.displayXObject(PdfCanvasProcessor.java:566)
at com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor.access$5600(PdfCanvasProcessor.java:108)
at com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor$DoOperator.invoke(PdfCanvasProcessor.java:1285)
at com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor.invokeOperator(PdfCanvasProcessor.java:452)
at com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor.processContent(PdfCanvasProcessor.java:281)
at com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor.processPageContent(PdfCanvasProcessor.java:302)
at com.itextpdf.kernel.pdf.canvas.parser.PdfDocumentContentParser.processContent(PdfDocumentContentParser.java:77)
at com.itextpdf.kernel.pdf.canvas.parser.PdfDocumentContentParser.processContent(PdfDocumentContentParser.java:90)
at com.klinck.mc.jobs.SplitFluxJobExecuter.execute(SplitFluxJobExecuter.java:118)
at com.klinck.mc.jobs.SplitFluxJob$1.doWork(SplitFluxJob.java:27)
at org.alfresco.repo.security.authentication.AuthenticationUtil.runAs(AuthenticationUtil.java:555)
at com.klinck.mc.jobs.SplitFluxJob.executeJob(SplitFluxJob.java:24)
at org.alfresco.schedule.ScheduledJobLockExecuter.execute(ScheduledJobLockExecuter.java:94)
at org.alfresco.schedule.AbstractScheduledLockedJob.executeInternal(AbstractScheduledLockedJob.java:72)
at org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:114)
at org.quartz.core.JobRunShell.run(JobRunShell.java:216)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:563)
更新2
我认为出现错误消息“com.google.zxing.NotFoundException”是因为图像不包含文本消息或太大:com.google.中兴。当执行核心java程序时,NotFoundException异常出现?
它通过以下方法对我有效:
步骤1:
检测特定二维码并将页码存储在列表中:
PdfDocument pdfDoc;
pdfDoc = new PdfDocument(new PdfReader(pathName));
logger.debug("pdfDoc OK");
PdfDocumentContentParser contentParser = new PdfDocumentContentParser(pdfDoc);
List<Integer> pageList = new ArrayList<Integer>();
int[] currentPage = new int[1];
for ( int page = 1; page <= pdfDoc.getNumberOfPages(); page++) {
currentPage[0] = page;
contentParser.processContent(page, new IEventListener() {
@Override
public Set<EventType> getSupportedEvents() {
return Collections.singleton(RENDER_IMAGE);
}
@Override
public void eventOccurred(IEventData data, EventType type) {
index = index + 1;
if (data instanceof ImageRenderInfo) {
logger.debug("data instanceof ImageRenderInfo");
ImageRenderInfo imageRenderInfo = (ImageRenderInfo) data;
byte[] bytes = imageRenderInfo.getImage().getImageBytes();
String pngName = coreServices.getSplitFolderTemp() +"Page-" + currentPage[0] + "_Image-" + index + ".png";
logger.debug("pngName: " + pngName);
File image = new File(pngName);
try {
// le QR code KLINCK est stocké dans la première image de la feuille de séparation.
if (index == 1) {
// ZXING - > Read Data from QR Code
Files.write(new File(pngName).toPath(), bytes);
BufferedImage bufferedImg = ImageIO.read(image);
LuminanceSource source = new BufferedImageLuminanceSource(bufferedImg);
BinaryBitmap bitmap = new BinaryBitmap(new HybridBinarizer(source));
Result result = new MultiFormatReader().decode(bitmap);
if (result.getBarcodeFormat().toString().equals("QR_CODE") && result.getText().toString().equals("SEPARATEUR")) {
// on stocke les numéros de pages des QR Code Klinck
pageList.add(currentPage[0]);
logger.debug("QR code Klinck trouvé en page: " + currentPage[0]);
}
}
}
catch (Exception e) {
logger.error("l'image détectée n'est pas le QR Code Klinck : " + ExceptionUtils.getStackTrace(e));
}
if (image.delete())
logger.debug("immage supprimée");
}
}
int index = 0;
});
}
步骤2:创建PDF
logger.debug("Création des PDFs");
if (pageList.size() == 0) {
logger.debug("un seul document ");
PdfDocument pdfDest = new PdfDocument(new PdfWriter("C:/alfresco/klinck/onePdf.pdf"));
pdfDoc.copyPagesTo(1,pdfDoc.getNumberOfPages(), pdfDest);
pdfDest.close();
} else {
// 2) Un ou plusieurs QR code = au moins deux documents
logger.debug("longueur liste: " + pageList.size());
int start = 1;
for (int index = 0; index < pageList.size(); index++) {
logger.debug("QR Code Klinck trouvé en page " + pageList.get(index) );
logger.debug("Prochain document , page " + start + " à " + pageList.get(index) + "- 1");
// la 1ère page du document initial ne doit pas être un séparateur
if (pageList.get(index) != 1) {
PdfDocument pdfDest = new PdfDocument(new PdfWriter("C:/alfresco/klinck/splitPdf-" + start + ".pdf"));
pdfDoc.copyPagesTo(start,pageList.get(index)-1, pdfDest);
pdfDest.close();
}
start = pageList.get(index) + 1;
}
// gestion du dernier document
PdfDocument pdfDest = new PdfDocument(new PdfWriter("C:/alfresco/klinck/splitPdf-" + start + ".pdf"));
pdfDoc.copyPagesTo(start, pdfDoc.getNumberOfPages(), pdfDest);
pdfDest.close();
}
pdfDoc.close();
问题内容: 我有一个小样本数据: 好像 我想用’-‘分隔符分隔列’V’并将其移至另一个名为’allele’的列 到目前为止,我尝试过的代码不完整,无法正常工作: 要么 问题答案: 与vectoried一起使用:
我有一个输入字符串,其中包含由分隔符(| |)分隔的4个ID。我使用的代码如下: 但有些情况下并非所有ID都存在,如: 在上面的场景中,拆分不会分为4个部分,并且无法判断拆分数组中缺少哪个id。 有人可以帮助一个有效的解决方案。
如何将过滤器列表拆分为单个过滤器元件?split2String在线程“main”java.util.regex中导致:异常。PatternSyntaxException:索引10或(|和)附近的未闭合组(
免责声明: 我使用的是iText5。我知道这通常不受欢迎(相对于使用iText7),但我正在使用大量使用iText5的遗留代码,升级不在我的控制范围内。 null 进展/办法: 我扩展了以生成包含字体信息(大小和系列、粗体或斜体等)以及位置信息(相对于绝对坐标系,原点位于输入PDF第一页的左上角)的XML。 然后逐页生成一个新的PDF(根据上面概述的要求,每个页都是所需的长度),根据每个新页的边界
在上一章中,我们已经了解了如何将JavaScript添加到PDF文档中。 现在让我们学习如何将给定的PDF文档拆分成多个文档。 拆分PDF文档中的页面 您可以使用名为Splitter的类将给定的PDF文档拆分为多个PDF文档。 此类用于将给定的PDF文档拆分为多个其他文档。 以下是拆分现有PDF文档的步骤 第1步:加载现有PDF文档 使用PDDocument类的静态方法load()加载现有PDF文
我是 Perl 的新手,但根据我阅读的文档,看起来 Perl 中的 split 函数要求正则表达式模式而不是字符串分隔符作为第一个参数,但我发现使用 之类的东西仍然可以正确拆分字符串。 基于此,我尝试使用可变分隔符(例如。< code>print (split($var,$ string))[0] where < code > $ var = ' ' )并发现它不起作用。我做错了什么? 谢谢! 编