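Preface: an Android project of mine recently needed to scrape a PDF from a web page and show it in an ImageView. Nearly every solution I found online relies on pdfbox together with ImageIO, but Android does not provide ImageIO, so I used Android's built-in PdfRenderer instead.
The choice of scraping tool is a matter of preference; I went with Jsoup: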
connection = Jsoup.connect(url);
connection.header("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36");
response = connection
        .cookies(cookies_innet)
        .cookies(cookies)
        .ignoreContentType(true)   // otherwise Jsoup rejects non-HTML content types such as application/pdf
        .followRedirects(true)
        .method(Connection.Method.GET)
        .execute();
Here url is the PDF's download link. The call returns a response; let's print its contentType:
System.out.println(response.contentType());
The output is:
application/pdf
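The Content-Type header only tells us what the server claims to be sending. As a small extra check, here is a sketch that looks at the first bytes of the body instead, since a well-formed PDF starts with the ASCII marker `%PDF-`. The class and method names (`PdfSniffer`, `looksLikePdf`) are my own and not part of Jsoup or Android.

```java
import java.nio.charset.StandardCharsets;

final class PdfSniffer {
    // A well-formed PDF begins with "%PDF-" followed by a version, e.g. "%PDF-1.7".
    private static final byte[] MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if the given leading bytes match the PDF signature. */
    static boolean looksLikePdf(byte[] head) {
        if (head == null || head.length < MAGIC.length) {
            return false;
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (head[i] != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] pdf = "%PDF-1.7\n".getBytes(StandardCharsets.US_ASCII);
        byte[] html = "<html>login</html>".getBytes(StandardCharsets.US_ASCII);
        System.out.println(looksLikePdf(pdf));
        System.out.println(looksLikePdf(html));
    }
}
```

A check like this can catch the case where an expired cookie makes the server answer with a login page rather than the PDF.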
First, here is how to save the PDF stream as JPG or PNG files with pdfbox:
<dependencies>
<dependency>
<groupId>org.apache.pdfbox</groupId>
<artifactId>pdfbox</artifactId>
<version>2.0.4</version>
</dependency>
</dependencies>
First get the response's bodyStream():
BufferedInputStream in = response.bodyStream();
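Then render each page and write it to a file:
PDDocument doc = PDDocument.load(in);
PDFRenderer renderer = new PDFRenderer(doc);
int pageCount = doc.getNumberOfPages();
for (int i = 0; i < pageCount; i++) {
    BufferedImage image = renderer.renderImageWithDPI(i, 500);
    // BufferedImage image = renderer.renderImage(i, 1.0f);
    // One file per page; a fixed name such as "total.png" would be overwritten on every iteration.
    ImageIO.write(image, "PNG", new File("total-" + (i + 1) + ".png"));
}
doc.close();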
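A note on the DPI value: PDF page sizes are measured in points (1/72 inch), so rendering at a given DPI scales the page by dpi / 72. The sketch below estimates the resulting pixel size and raw memory use. The A4 size of 595 x 842 points and the 4 bytes per pixel are assumptions for illustration, and `DpiMath` is a made-up helper, not a pdfbox class.

```java
final class DpiMath {
    // PDF user space: one point is 1/72 inch.
    static final float POINTS_PER_INCH = 72f;

    /** Approximate pixel length of a side of `points` PDF points rendered at `dpi`. */
    static int pixels(float points, float dpi) {
        return Math.round(points * dpi / POINTS_PER_INCH);
    }

    /** Rough size in bytes of an uncompressed image, assuming 4 bytes per pixel. */
    static long rawBytes(int width, int height) {
        return 4L * width * height;
    }

    public static void main(String[] args) {
        // Assumed A4 page; the real size comes from the PDF itself.
        int w = pixels(595f, 500f);
        int h = pixels(842f, 500f);
        long mib = rawBytes(w, h) / (1024 * 1024);
        System.out.println(w + " x " + h + " px, about " + mib + " MiB uncompressed");
    }
}
```

At 500 DPI a single A4 page comes out at several thousand pixels per side, so a lower value may be enough if the image is only meant for on-screen viewing.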
Unfortunately, ImageIO is not available on Android, so we have to fall back on Android's own PDF package:
android.graphics.pdf.PdfRenderer
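First, save the PDF stream to a PDF file. (I am not sure whether this step is strictly necessary. My reasoning was that creating the ParcelFileDescriptor below needs a PDF file object, so I saved the stream to a file first. The PdfRenderer documentation does ask for a seekable file descriptor, which a network stream is not, so going through a file seems a reasonable way to do it.)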
// Despite its name, this method saves the downloaded PDF stream to score.pdf.
static void saveImage(BufferedInputStream in) throws IOException {
    System.out.println(Environment.getExternalStorageDirectory() + "/score.pdf");
    File file = new File(Environment.getExternalStorageDirectory() + "/score.pdf");
    int index; // number of bytes returned by the last read()
    byte[] bytes = new byte[1024];
    FileOutputStream out = new FileOutputStream(file);
    while ((index = in.read(bytes)) != -1) {
        out.write(bytes, 0, index);
    }
    // close() flushes the stream, so flushing after every chunk is not needed
    in.close();
    out.close();
}
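The copy loop above leaves the streams open if a read or write throws. Below is a minimal sketch of the same idea using try-with-resources, written against plain java.io so it can be tried off-device. `StreamSaver` and `save` are names I made up, and the temp file stands in for score.pdf.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

final class StreamSaver {
    /** Copies everything from `in` into `target` and returns the number of bytes written. */
    static long save(InputStream in, File target) throws IOException {
        long total = 0;
        byte[] buffer = new byte[8192];
        // try-with-resources closes both streams even if read() or write() throws.
        try (InputStream source = in; OutputStream out = new FileOutputStream(target)) {
            int read;
            while ((read = source.read(buffer)) != -1) {
                out.write(buffer, 0, read);
                total += read;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("score", ".pdf");
        tmp.deleteOnExit();
        byte[] demo = "%PDF-1.7 demo".getBytes(StandardCharsets.US_ASCII);
        long written = save(new ByteArrayInputStream(demo), tmp);
        System.out.println(written + " bytes written to " + tmp);
    }
}
```

On Android the target would be the score.pdf file from above, and the input would be response.bodyStream().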
Next, create a PdfRenderer from that PDF file:
File file = new File(Environment.getExternalStorageDirectory() +"/score.pdf");
ParcelFileDescriptor pdfFile = ParcelFileDescriptor.open(file, ParcelFileDescriptor.MODE_READ_ONLY);
PdfRenderer renderer = new PdfRenderer(pdfFile);
Finally, render the pages into Bitmaps and show one in the ImageView:
final int pageCount = renderer.getPageCount(); // number of pages in the PDF
Bitmap[] bitmaps = new Bitmap[pageCount];      // one bitmap per PDF page
Display display = getWindowManager().getDefaultDisplay();
Point outSize = new Point();
display.getSize(outSize);          // required: this call fills outSize
int screenWidth = outSize.x;       // screen width in pixels
int screenHeight = outSize.y;      // screen height in pixels
for (int i = 0; i < pageCount; i++) {
    PdfRenderer.Page page = renderer.openPage(i); // open page i
    // Keep the page's aspect ratio; the bitmap's height is set to the screen width.
    Bitmap bitmap = Bitmap.createBitmap(page.getWidth() * screenWidth / page.getHeight(), screenWidth, Bitmap.Config.ARGB_8888);
    // bitmap = adjustPhotoRotation(bitmap, 90);
    page.render(bitmap, null, null, PdfRenderer.Page.RENDER_MODE_FOR_DISPLAY);
    bitmaps[i] = bitmap;
    page.close(); // a page must be closed before the next one is opened
}
renderer.close(); // the bitmaps are already rendered, so the renderer can be released
imageView.setImageBitmap(bitmaps[0]);
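The bitmap size in the loop above is just aspect-ratio arithmetic: the code fixes the bitmap's height and derives the width from the page's proportions. Here is a small sketch of that calculation, together with the variant that fixes the width instead. `PageFit` is a hypothetical helper, and the 595 x 842 page and 1080 pixel target are example values, not numbers from my project. Note that it rounds to the nearest pixel, whereas the integer division above truncates.

```java
final class PageFit {
    /** Width that keeps the aspect ratio when the page is scaled to `targetHeight` pixels. */
    static int widthForHeight(int pageWidth, int pageHeight, int targetHeight) {
        requirePositive(pageWidth, pageHeight, targetHeight);
        // Compute in double to avoid int overflow and early truncation.
        return (int) Math.max(1L, Math.round((double) pageWidth * targetHeight / pageHeight));
    }

    /** Height that keeps the aspect ratio when the page is scaled to `targetWidth` pixels. */
    static int heightForWidth(int pageWidth, int pageHeight, int targetWidth) {
        requirePositive(pageWidth, pageHeight, targetWidth);
        return (int) Math.max(1L, Math.round((double) pageHeight * targetWidth / pageWidth));
    }

    private static void requirePositive(int... values) {
        for (int v : values) {
            if (v <= 0) {
                throw new IllegalArgumentException("dimensions must be positive");
            }
        }
    }

    public static void main(String[] args) {
        // Example values only: an A4-sized page and a 1080 px target.
        System.out.println("fixed height 1080 -> width " + widthForHeight(595, 842, 1080));
        System.out.println("fixed width 1080 -> height " + heightForWidth(595, 842, 1080));
    }
}
```

In the loop above, widthForHeight(page.getWidth(), page.getHeight(), screenWidth) corresponds to the first argument passed to Bitmap.createBitmap. To fill the screen width in portrait instead, heightForWidth would give the bitmap's height.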