Corrupted file download in Python: python - Python requests throws Connection Broken: ChunkedEncodi... with http.client.IncompleteRead when trying to download a file

戚晨

2023-12-01

I am trying to download a PDF file with the requests module. My code is as follows:

import requests

url = ""

# Stream the response so the body is read chunk by chunk instead of all at once.
r = requests.get(url, stream=True, timeout=(60, 120), headers={'Connection': 'keep-alive','User-Agent': 'Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36 Edge/12.10136'})
print(r.headers)
print(r.status_code)

try:
    with open('blah.pdf', 'wb') as f:
        # Iterating over the response yields the body in small byte chunks.
        for chunk in r:
            # print(chunk)
            f.write(chunk)
except Exception as e:
    print(e)

The output is as follows:

{'Cache-Control': 'private', 'Transfer-Encoding': 'chunked', 'Content-Type': 'application/pdf', 'Server': 'Microsoft-IIS/7.5', 'X-AspNet-Version': '4.0.30319', 'X-Powered-By': 'ASP.NET', 'Date': 'Wed, 02 Oct 2019 05:17:11 GMT', 'Set-Cookie': 'bbb=rd102o00000000000000000000ffff978433aao80; path=/; Httponly; Secure'}

200

('Connection broken: IncompleteRead(0 bytes read, 2 more expected)', IncompleteRead(0 bytes read, 2 more expected))

Here is the full stack trace:

Traceback (most recent call last):
  File "/storage/anaconda3/lib/python3.7/site-packages/urllib3/response.py", line 425, in _error_catcher
    yield
  File "/storage/anaconda3/lib/python3.7/site-packages/urllib3/response.py", line 755, in read_chunked
    chunk = self._handle_chunk(amt)
  File "/storage/anaconda3/lib/python3.7/site-packages/urllib3/response.py", line 709, in _handle_chunk
    self._fp._safe_read(2) # Toss the CRLF at the end of the chunk.
  File "/storage/anaconda3/lib/python3.7/http/client.py", line 612, in _safe_read
    raise IncompleteRead(b''.join(s), amt)
http.client.IncompleteRead: IncompleteRead(0 bytes read, 2 more expected)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/storage/anaconda3/lib/python3.7/site-packages/requests/models.py", line 750, in generate
    for chunk in self.raw.stream(chunk_size, decode_content=True):
  File "/storage/anaconda3/lib/python3.7/site-packages/urllib3/response.py", line 560, in stream
    for line in self.read_chunked(amt, decode_content=decode_content):
  File "/storage/anaconda3/lib/python3.7/site-packages/urllib3/response.py", line 781, in read_chunked
    self._original_response.close()
  File "/storage/anaconda3/lib/python3.7/contextlib.py", line 130, in __exit__
    self.gen.throw(type, value, traceback)
  File "/storage/anaconda3/lib/python3.7/site-packages/urllib3/response.py", line 443, in _error_catcher
    raise ProtocolError("Connection broken: %r" % e, e)
urllib3.exceptions.ProtocolError: ('Connection broken: IncompleteRead(0 bytes read, 2 more expected)', IncompleteRead(0 bytes read, 2 more expected))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test.py", line 12, in <module>
    for chunk in r:
  File "/storage/anaconda3/lib/python3.7/site-packages/requests/models.py", line 753, in generate
    raise ChunkedEncodingError(e)
requests.exceptions.ChunkedEncodingError: ('Connection broken: IncompleteRead(0 bytes read, 2 more expected)', IncompleteRead(0 bytes read, 2 more expected))
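For context on the "0 bytes read, 2 more expected" part of the message: in HTTP/1.1 chunked transfer encoding, each chunk is sent as a hexadecimal size line, then the data, then a trailing CRLF, and the body ends with a zero-size chunk. The traceback fails in `_handle_chunk` while reading those two CRLF bytes, which suggests the stream ended right after some chunk's data. The following is a minimal sketch of that framing, not urllib3's actual implementation; the names `decode_chunked` and `TruncatedChunk` are made up for illustration, and trailers are ignored.

```python
import io


class TruncatedChunk(Exception):
    """Raised when the stream ends before a chunk is fully framed."""


def decode_chunked(raw: bytes) -> bytes:
    """Decode an HTTP/1.1 chunked body that is held entirely in memory."""
    stream = io.BytesIO(raw)
    body = bytearray()
    while True:
        size_line = stream.readline()
        if not size_line.endswith(b"\r\n"):
            raise TruncatedChunk("stream ended while reading a chunk size")
        # The size is hexadecimal; anything after ';' is a chunk extension.
        size = int(size_line.split(b";", 1)[0].strip(), 16)
        if size == 0:
            # A zero-size chunk marks the end of the body.
            return bytes(body)
        data = stream.read(size)
        if len(data) < size:
            raise TruncatedChunk(
                f"{len(data)} bytes read, {size - len(data)} more expected"
            )
        body += data
        # Every chunk's data must be followed by exactly CRLF.
        crlf = stream.read(2)
        if len(crlf) < 2:
            raise TruncatedChunk(
                f"{len(crlf)} bytes read, {2 - len(crlf)} more expected"
            )
        if crlf != b"\r\n":
            raise ValueError("chunk data is not followed by CRLF")
```

With this sketch, `decode_chunked(b"5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n")` returns `b'hello world'`, while `decode_chunked(b"5\r\nhello")` raises `TruncatedChunk` with the message `0 bytes read, 2 more expected`, the same shape of failure as in the traceback.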

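When I open the PDF in a web browser (for example Google Chrome), Chrome's built-in PDF plugin loads it correctly and I can read it in the browser. However, if I try to download it by clicking the download icon, it fails with "Failed - Network Error". Firefox cannot load or download it at all. (Both Firefox and Chrome are updated to the latest version.) When I tested it on a Windows machine, Microsoft Edge was able to download the PDF, but...

I also tried several command-line tools, such as curl, wget, and aria2c (with the headers set to match the browser request). None of them could download the PDF.

wget output: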

connected.

HTTP request sent, awaiting response... 200 OK

Length: unspecified [application/pdf]

Saving to: ‘blah.pdf’

[ <=> ] 101.68K 66.1KB/s in 1.5s

2019-10-02 11:29:50 (69.1 KB/s) - Read error at byte 108786 (Success).

The file downloaded with wget is corrupted.
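One quick way to tell a truncated download from a plausibly complete one is to look at the PDF framing: a PDF file starts with a `%PDF-` header and normally ends with an `%%EOF` marker. A rough sketch follows; the helper names `looks_like_complete_pdf` and `check_pdf_file` and the 1024-byte tail window are assumptions for illustration, and passing this check does not prove the file is valid.

```python
from pathlib import Path


def looks_like_complete_pdf(data: bytes, tail_size: int = 1024) -> bool:
    """Heuristic: '%PDF-' at the very start and '%%EOF' near the end."""
    return data.startswith(b"%PDF-") and b"%%EOF" in data[-tail_size:]


def check_pdf_file(path) -> bool:
    """Apply the same heuristic to a file on disk (reads the whole file)."""
    return looks_like_complete_pdf(Path(path).read_bytes())
```

For example, `check_pdf_file('blah.pdf')` returning `False` would be consistent with the download having been cut off before the end of the document.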

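Another thing I tried was inspecting the response with mitm and a chromedriver + Selenium combination.

The automated Chrome browser failed to load the PDF and showed this error: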

502 Bad Gateway

HttpSyntaxException('Malformed chunked body',)

How can I download this PDF file using the requests module? Any help would be greatly appreciated.
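One thing that may be worth trying, since the error is raised only once the stream breaks: write whatever arrives, treat `ChunkedEncodingError` as a possibly premature end of stream rather than a reason to discard the file, and then inspect the result. The code below is only a sketch under that assumption. `save_chunks` and `download_tolerant` are hypothetical helper names, the chunk size of 8192 is arbitrary, and there is no guarantee that the bytes received before the error form a complete PDF.

```python
def save_chunks(chunks, path, tolerated_errors):
    """Write byte chunks to path; return (bytes_written, error or None)."""
    written = 0
    error = None
    with open(path, "wb") as f:
        try:
            for chunk in chunks:
                f.write(chunk)
                written += len(chunk)
        except tolerated_errors as exc:
            # Keep what arrived so far and report the error to the caller.
            error = exc
    return written, error


def download_tolerant(url, path, timeout=(60, 120)):
    """Stream url to path with requests, tolerating a broken chunked stream."""
    # Third-party dependency; imported here so save_chunks stays dependency-free.
    import requests

    with requests.get(url, stream=True, timeout=timeout) as r:
        r.raise_for_status()
        return save_chunks(
            r.iter_content(chunk_size=8192),
            path,
            (requests.exceptions.ChunkedEncodingError,),
        )
```

A call such as `written, error = download_tolerant(url, 'blah.pdf')` would then report how many bytes were saved and whether the stream broke, which can be compared with the byte count wget reported before its read error.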
