Question:

How to get a PDF file from the web using Java streams

樊杰

2023-03-14

I need to download a PDF file from the web, for example from this link: http://www.math.uni-goettingen.de/zirkel/loesungen/blatt15/loes15.pdf. I have to do it with streams. With an image it works fine:

public static void main(String[] args) {
        try {           
                //get the url page from the arguments array
                String arg = args[0];
                URL url = new URL("https://cs7065.vk.me/c637923/v637923205/25608/AD8WhOSx1ic.jpg");

                try{
                    //jpg
                    InputStream in = new BufferedInputStream(url.openStream());
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    byte[] buf = new byte[131072];
                    int n = 0;
                    while (-1!=(n=in.read(buf)))
                    {
                       out.write(buf, 0, n);
                    }
                    out.close();
                    in.close();
                    byte[] response = out.toByteArray();
                    FileOutputStream fos = new FileOutputStream("borrowed_image.jpg");
                    fos.write(response);
                    fos.close();
                 }
        catch (Exception e) {
            e.printStackTrace();
        }
    }

But for a PDF it does not work. What could be the problem?

2 answers

闻人德庸

2023-03-14

Try this; it does the job (the resulting PDF is readable). Check whether any exception is thrown when the URL is requested.

public static void main(String[] args) {
        try {
            //the URL is hard-coded here; args is not read
            URL url = new URL("http://www.math.uni-goettingen.de/zirkel/loesungen/blatt15/loes15.pdf");

            try {
                InputStream in = new BufferedInputStream(url.openStream());
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[131072];
                int n = 0;
                while (-1 != (n = in.read(buf))) {
                    out.write(buf, 0, n);
                }
                out.close();
                in.close();
                byte[] response = out.toByteArray();
                FileOutputStream fos = new FileOutputStream("loes15.pdf");
                fos.write(response);
                fos.close();
            } catch (Exception e) {
                e.printStackTrace();
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
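
To follow up on checking what the request actually returns, here is a minimal diagnostic sketch. It is not part of the answer above. The class name PdfResponseCheck and the helper looksLikePdf are hypothetical names chosen for illustration, and the code assumes Java 11 or later (for InputStream.readNBytes). It prints the HTTP status and Content-Type and checks whether the body begins with the "%PDF-" marker that opens a PDF file. If the status is not 200 or the content type is HTML, the server probably sent something other than the PDF (an error page or a redirect, for example), which would explain an unreadable file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class PdfResponseCheck {

    // Returns true if the data starts with the "%PDF-" marker that opens a PDF file.
    public static boolean looksLikePdf(byte[] data) {
        byte[] magic = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        if (data == null || data.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (data[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.out.println("usage: java PdfResponseCheck <url>");
            return;
        }
        // Assumes an http or https URL, so the cast to HttpURLConnection is valid.
        HttpURLConnection conn = (HttpURLConnection) new URL(args[0]).openConnection();
        try {
            System.out.println("HTTP status:  " + conn.getResponseCode());
            System.out.println("Content-Type: " + conn.getContentType());
            // getInputStream() throws an IOException for 4xx/5xx responses.
            try (InputStream in = conn.getInputStream()) {
                byte[] head = in.readNBytes(5); // first bytes of the body (Java 11+)
                System.out.println("Starts with %PDF-: " + looksLikePdf(head));
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

Run it with the URL as the only argument, for example: java PdfResponseCheck http://www.math.uni-goettingen.de/zirkel/loesungen/blatt15/loes15.pdf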

羊冠玉

2023-03-14

I made a few small edits to your code to fix the syntax errors, and it seems to work (see below). Consider moving the close() calls into a finally block.

package org.snb;

import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.net.URL;

public class PdfTester {
    public static void main(String[] args) {
        //get the url page from the arguments array

        try{
            //String arg = args[0];
            URL url = new URL("http://www.pdf995.com/samples/pdf.pdf");
            //read the PDF bytes from the URL stream
            InputStream in = new BufferedInputStream(url.openStream());
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[131072];
            int n = 0;
            while (-1!=(n=in.read(buf)))
            {
               out.write(buf, 0, n);
            }
            out.close();
            in.close();
            byte[] response = out.toByteArray();
            FileOutputStream fos = new FileOutputStream("/tmp/bart.pdf");
            fos.write(response);
            fos.close();
         }
        catch (Exception e) {
            e.printStackTrace();
        }
    }
}
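
The advice about closing the streams reliably can be sketched with try-with-resources, which closes both streams even when an exception is thrown and so plays the role of a finally block. This is an illustrative variant, not the answerer's code. The class name StreamDownload and the methods copy and download are hypothetical, and the sketch writes straight to the file instead of buffering the whole download in a ByteArrayOutputStream.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class StreamDownload {

    // Copies every byte from in to out and returns the number of bytes copied.
    public static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        return total;
    }

    // Downloads the resource at address into target.
    // try-with-resources closes both streams, also on failure.
    public static long download(String address, Path target) throws IOException {
        try (InputStream in = new BufferedInputStream(new URL(address).openStream());
             OutputStream out = Files.newOutputStream(target)) {
            return copy(in, out);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: java StreamDownload <url> <target-file>");
            return;
        }
        long bytes = download(args[0], Paths.get(args[1]));
        System.out.println("Wrote " + bytes + " bytes to " + args[1]);
    }
}
```

For example: java StreamDownload http://www.pdf995.com/samples/pdf.pdf /tmp/bart.pdf. Because the data is written as it arrives, memory use stays at the size of the copy buffer regardless of the size of the PDF.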
