Python downloads a corrupted file: requests raises http.client.IncompleteRead with Connection broken: ChunkedEncodi...

戚晨
2023-12-01

I am trying to download a PDF file with the requests module. My code is as follows:

import requests

url = ""

# stream=True defers the body download until the response is iterated
r = requests.get(
    url,
    stream=True,
    timeout=(60, 120),  # (connect timeout, read timeout) in seconds
    headers={
        'Connection': 'keep-alive',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36 Edge/12.10136',
    },
)
print(r.headers)
print(r.status_code)

try:
    with open('blah.pdf', 'wb') as f:
        # iterating the response yields the body in small chunks
        for chunk in r:
            # print(chunk)
            f.write(chunk)
except Exception as e:
    print(e)

The output is as follows:

{'Cache-Control': 'private', 'Transfer-Encoding': 'chunked', 'Content-Type': 'application/pdf', 'Server': 'Microsoft-IIS/7.5', 'X-AspNet-Version': '4.0.30319', 'X-Powered-By': 'ASP.NET', 'Date': 'Wed, 02 Oct 2019 05:17:11 GMT', 'Set-Cookie': 'bbb=rd102o00000000000000000000ffff978433aao80; path=/; Httponly; Secure'}
200
('Connection broken: IncompleteRead(0 bytes read, 2 more expected)', IncompleteRead(0 bytes read, 2 more expected))

Here is the full stack trace:

Traceback (most recent call last):
  File "/storage/anaconda3/lib/python3.7/site-packages/urllib3/response.py", line 425, in _error_catcher
    yield
  File "/storage/anaconda3/lib/python3.7/site-packages/urllib3/response.py", line 755, in read_chunked
    chunk = self._handle_chunk(amt)
  File "/storage/anaconda3/lib/python3.7/site-packages/urllib3/response.py", line 709, in _handle_chunk
    self._fp._safe_read(2) # Toss the CRLF at the end of the chunk.
  File "/storage/anaconda3/lib/python3.7/http/client.py", line 612, in _safe_read
    raise IncompleteRead(b''.join(s), amt)
http.client.IncompleteRead: IncompleteRead(0 bytes read, 2 more expected)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/storage/anaconda3/lib/python3.7/site-packages/requests/models.py", line 750, in generate
    for chunk in self.raw.stream(chunk_size, decode_content=True):
  File "/storage/anaconda3/lib/python3.7/site-packages/urllib3/response.py", line 560, in stream
    for line in self.read_chunked(amt, decode_content=decode_content):
  File "/storage/anaconda3/lib/python3.7/site-packages/urllib3/response.py", line 781, in read_chunked
    self._original_response.close()
  File "/storage/anaconda3/lib/python3.7/contextlib.py", line 130, in __exit__
    self.gen.throw(type, value, traceback)
  File "/storage/anaconda3/lib/python3.7/site-packages/urllib3/response.py", line 443, in _error_catcher
    raise ProtocolError("Connection broken: %r" % e, e)
urllib3.exceptions.ProtocolError: ('Connection broken: IncompleteRead(0 bytes read, 2 more expected)', IncompleteRead(0 bytes read, 2 more expected))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test.py", line 12, in <module>
    for chunk in r:
  File "/storage/anaconda3/lib/python3.7/site-packages/requests/models.py", line 753, in generate
    raise ChunkedEncodingError(e)
requests.exceptions.ChunkedEncodingError: ('Connection broken: IncompleteRead(0 bytes read, 2 more expected)', IncompleteRead(0 bytes read, 2 more expected))
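For context on the innermost frame: in chunked transfer encoding, each chunk is a hexadecimal size line, the chunk data, and a trailing CRLF, and the body ends with a zero-size chunk. The trace fails while reading that 2-byte CRLF, which is consistent with the connection ending right after a chunk's data. The sketch below is a simplified in-memory decoder written for illustration only; `decode_chunked` and `IncompleteChunk` are hypothetical names, and this is not urllib3's actual implementation.

```python
class IncompleteChunk(Exception):
    """Raised when the body ends before a chunk is fully framed."""


def decode_chunked(raw: bytes) -> bytes:
    """Decode an HTTP/1.1 chunked body that is held entirely in memory."""
    body = bytearray()
    pos = 0
    while True:
        eol = raw.find(b"\r\n", pos)
        if eol == -1:
            raise IncompleteChunk("missing chunk-size line")
        # The size is hexadecimal; anything after ';' is a chunk extension.
        size = int(raw[pos:eol].split(b";", 1)[0].strip(), 16)
        pos = eol + 2
        if size == 0:
            # Zero-size chunk marks the end; trailers are ignored here.
            return bytes(body)
        data = raw[pos:pos + size]
        if len(data) < size:
            raise IncompleteChunk(
                f"{len(data)} bytes read, {size - len(data)} more expected"
            )
        body += data
        pos += size
        # Every chunk's data must be followed by CRLF.
        crlf = raw[pos:pos + 2]
        if len(crlf) < 2:
            raise IncompleteChunk(
                f"{len(crlf)} bytes read, {2 - len(crlf)} more expected"
            )
        if crlf != b"\r\n":
            raise ValueError("malformed chunk terminator")
        pos += 2
```

Feeding it `b"4\r\nWiki"` (chunk data with nothing after it) raises `IncompleteChunk` with the same "0 bytes read, 2 more expected" wording seen in the trace.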

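When I open the PDF in a web browser (e.g. Google Chrome), Chrome's built-in PDF plugin loads it correctly and I can read it in the browser. However, if I try to download it by clicking the download icon, it fails with "Failed - Network Error". Firefox can neither load nor download it. (Both Firefox and Chrome are upgraded to the latest version.) When I tested it on a Windows machine, Microsoft Edge was able to download the PDF, but...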

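I tried several command-line tools, such as curl, wget, and aria2c (with the appropriate headers set to match the browser's request), and none of them could download the PDF.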

wget output:

connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/pdf]
Saving to: ‘blah.pdf’

[ <=> ] 101.68K 66.1KB/s in 1.5s

2019-10-02 11:29:50 (69.1 KB/s) - Read error at byte 108786 (Success).

The file downloaded with wget is corrupted.
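One rough way to tell whether a saved file is truncated or merely cut off after its last useful byte is to check the PDF framing: a PDF starts with a `%PDF-` header and normally carries an `%%EOF` marker near its end. The sketch below is a heuristic only, and `looks_like_complete_pdf` and `tail_bytes` are names made up for this example.

```python
import os


def looks_like_complete_pdf(path, tail_bytes=1024):
    """Heuristic: True if the file has a PDF header and an %%EOF near the end."""
    with open(path, "rb") as f:
        head = f.read(5)
        # Jump to the end to learn the size, then read only the tail.
        f.seek(0, os.SEEK_END)
        size = f.tell()
        f.seek(max(0, size - tail_bytes))
        tail = f.read()
    return head == b"%PDF-" and b"%%EOF" in tail
```

A `False` result on `blah.pdf` would suggest that bytes are really missing; a `True` result does not prove the file is valid, only that its start and end look plausible.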

Another thing I tried was inspecting the response with a combination of mitm and chromedriver + Selenium.

The automated Chrome browser fails to load the PDF and shows this error:

502 Bad Gateway

HttpSyntaxException('Malformed chunked body',)

How can I download this PDF file using the requests module? Any help would be greatly appreciated.
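As a starting point for experiments, here is a minimal sketch that keeps whatever bytes arrive before the stream breaks and reports how many were saved. It does not recover data the server never sent. `save_stream` and `download` are hypothetical helper names, and whether the resulting file is usable depends on what the server actually delivered.

```python
def save_stream(chunks, path, tolerated=(Exception,)):
    """Write an iterable of byte chunks to path.

    Returns (bytes_written, error), where error is None on a clean finish
    or the tolerated exception that interrupted the stream.
    """
    written = 0
    error = None
    with open(path, "wb") as f:
        try:
            for chunk in chunks:
                f.write(chunk)
                written += len(chunk)
        except tolerated as exc:
            # Keep what was written so far instead of discarding it.
            error = exc
    return written, error


def download(url, path):
    """Stream url to path, tolerating a broken chunked stream."""
    # Imported here so save_stream can be used without requests installed.
    import requests

    with requests.get(url, stream=True, timeout=(60, 120)) as r:
        r.raise_for_status()
        return save_stream(
            r.iter_content(chunk_size=8192),
            path,
            tolerated=(requests.exceptions.ChunkedEncodingError,),
        )
```

Calling `download(url, 'blah.pdf')` returns the byte count and the caught `ChunkedEncodingError` (or `None`), which can be compared against the byte offset that wget reported.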
