Question:

Importing a PDF into a String in Java

仉刚洁

2023-03-14

I need to extract text from a PDF file using Java. I found iText, but it is not working the way I want. Here is my code:

package com.itextpdf.mavenproject1;


import com.itextpdf.forms.PdfAcroForm;
import com.itextpdf.forms.fields.PdfButtonFormField;
import com.itextpdf.forms.fields.PdfFormField;
import com.itextpdf.io.font.FontConstants;
import com.itextpdf.kernel.font.PdfFontFactory;
import com.itextpdf.kernel.geom.Rectangle;
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfTextExtractor;
import com.itextpdf.kernel.pdf.PdfString;
import com.itextpdf.kernel.pdf.PdfWriter;
import com.itextpdf.kernel.pdf.action.PdfAction;
import com.itextpdf.kernel.pdf.annot.PdfAnnotation;
import com.itextpdf.kernel.pdf.annot.PdfTextAnnotation;
import com.itextpdf.kernel.pdf.canvas.PdfCanvas;
import com.itextpdf.kernel.pdf.canvas.parser.PdfTextExtractor;
import com.itextpdf.test.annotations.WrapToTest;
import java.io.File;
import java.io.IOException;

public class zczytywanie {

    public static void main(String args[]) throws IOException {


       PdfDocument pdfDoc = new PdfDocument(new PdfReader("D:/pdf/pdf"));

       String page= PdfTextExtractor.getTextFromPage(pdfDoc, 1);

       System.out.println(page);

    }
}

It reports an error on the line where I call PdfTextExtractor (PdfDocument cannot be converted to PdfPage), although I read that pdfDoc has to be a PdfReader.

This does not work for me either:

PdfReader pdfDoc = new PdfReader("D:/pdf/pdf");

1 answer

陈实

2023-03-14

You can try PDFBox or Tika. Here is an example using PDFBox.

Add the PDFBox jar dependency to pom.xml:

<dependencies>
        <dependency>
            <groupId>org.apache.pdfbox</groupId>
            <artifactId>pdfbox</artifactId>
            <version>2.0.23</version>
        </dependency>
</dependencies>

Java class:

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

import java.io.File;
import java.io.IOException;

public class TestPDF {
    public static void main(String[] args) {
        // try-with-resources closes the document when the block exits
        try (PDDocument document = PDDocument.load(new File("/path_to_your_pdf_file"))) {
            // Skip encrypted documents; their text may not be readable
            if (!document.isEncrypted()) {
                PDFTextStripper tStripper = new PDFTextStripper();
                // Order text by its position on the page instead of content-stream order
                tStripper.setSortByPosition(true);

                // Extract the text of the whole document as one String
                String pdfFileInText = tStripper.getText(document);
                System.out.println("Text:" + pdfFileInText);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
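
The question reads only page 1, while the example above extracts the whole document. A minimal sketch of per-page extraction, assuming the PDFBox 2.0.x API from the dependency above: PageTextExtractor and getTextFromPage are hypothetical names chosen for illustration (they are not part of PDFBox), and the file path is taken from the first command-line argument.

```java
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

import java.io.File;
import java.io.IOException;

public class PageTextExtractor {

    // Hypothetical helper, not a PDFBox method: returns the text of a single page.
    // pageNumber is 1-based, which matches PDFTextStripper's page numbering.
    static String getTextFromPage(PDDocument document, int pageNumber) throws IOException {
        if (pageNumber < 1 || pageNumber > document.getNumberOfPages()) {
            throw new IllegalArgumentException("No such page: " + pageNumber);
        }
        PDFTextStripper stripper = new PDFTextStripper();
        // Restrict extraction to exactly one page
        stripper.setStartPage(pageNumber);
        stripper.setEndPage(pageNumber);
        return stripper.getText(document);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: PageTextExtractor <path-to-pdf>");
            return;
        }
        // try-with-resources closes the document when the block exits
        try (PDDocument document = PDDocument.load(new File(args[0]))) {
            System.out.println(getTextFromPage(document, 1));
        }
    }
}
```

Limiting the stripper with setStartPage and setEndPage avoids extracting the whole document and then splitting the resulting String by hand.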
