Question:

Cannot fetch a response with "Content-Disposition: attachment" using Python requests

邓建柏

2023-03-14

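Using Firefox, I log in to a download site and click one of its query buttons. A small window titled "Opening report1.csv" pops up, where I can choose "Open" or "Save File". I save the file.

For this action, Live HTTP Headers shows me: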

https://myserver/ReportPage?download

GET /ReportPage?download

HTTP/1.1 200 OK
Date: Sat, 30 Dec 2017 22:37:40 GMT
Server: Apache-Coyote/1.1
Last-Modified: Sat, 30 Dec 2017 22:37:40 GMT
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Pragma: no-cache
Cache-Control: no-cache, no-store
Content-Disposition: attachment; filename="report1.csv"; filename*=UTF-8''report1.csv
Content-Type: text/csv
Content-Length: 332369
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive

Now I try to reproduce this with requests.

$ python3
>>> import requests
>>> from lxml import html
>>>
>>> s = requests.Session()
>>> s.verify = './myserver.crt'  # certificate of myserver for https
>>>
>>> # get the login web page to enter username and password
... r = s.get( 'https://myserver' )
>>>
>>> # Get url for logging in. It's the action-attribute in the form anywhere.
... # We use xpath.
... tree = html.fromstring(r.text)
>>> loginUrl = 'https://myserver/' + list(tree.xpath("//form[@id='id4']/@action"))[0]
>>> print( loginUrl )   # it contains a session-id
https://myserver/./;jsessionid=77EA70CB95252426439097E274286966?0-1.loginForm
>>>
>>> # logging in with username and password
... r = s.post( loginUrl, data = {'username':'ingo','password':'mypassword'} )
>>> print( r.status_code )
200
>>> # try to get the download file using url from Live HTTP headers
... downloadQueryUrl = 'https://myserver/ReportPage?download&NAME=ALL&DATE=THISYEAR'
>>> r = s.get( downloadQueryUrl )
>>> print( r.status_code)
200
>>> print( r. headers )
{'Connection': 'Keep-Alive',
'Date': 'Sun, 31 Dec 2017 14:46:03 GMT',
'Cache-Control': 'no-cache, no-store',
'Keep-Alive': 'timeout=5, max=94',
'Transfer-Encoding': 'chunked',
'Expires': 'Thu, 01 Jan 1970 00:00:00 GMT',
'Pragma': 'no-cache',
'Content-Encoding': 'gzip',
'Content-Type': 'text/html;charset=UTF-8',
'Server': 'Apache-Coyote/1.1',
'Vary': 'Accept-Encoding'}
>>> print( r.url )
https://myserver/ReportPage?4&NAME=ALL&DATE=THISYEAR
>>>

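The request succeeds, but I don't get the file download. There is no "Content-Disposition: attachment;" entry in the headers. I only get the page from which the query starts, i.e. the page that would be the referer.

Does this have something to do with the session cookie? requests seems to manage that automatically. Is there special handling for csv files? Do I have to use streaming? Is the download URL shown by Live HTTP Headers the right one? Maybe it is created dynamically?

How can I fetch the response with "Content-Disposition: attachment" from myserver and download its file with requests?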

1 answer

杨甫

2023-03-14

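I figured it out. @Patrick Mevzek pointed me in the right direction. Thanks for that.

After logging in, I must not stay on the first landing page and call the query from there. Instead, I request the report page, extract the query URL from it, and then request that URL. Now the response has "Content-Disposition: attachment;" in its headers. Printing its text to stdout is then simple. I prefer this because I can redirect the output to any file. Informational messages go to stderr, so they don't clutter the redirected output. A typical call is ./download

For completeness, here is the script template, without any error checking, to make clear how it works.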
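The link-extraction step can be sketched in isolation. This is only an illustration under assumptions: it uses the standard library's html.parser instead of lxml so that it runs without extra packages, the helper names LinkByTextParser and find_download_url are made up here, and the sample HTML and its href are invented, not taken from myserver. It also resolves the href with urljoin against the page URL, rather than by string concatenation, so relative links such as "./..." are handled.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkByTextParser(HTMLParser):
    """Collect the href of every <a> whose visible text equals `wanted`."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted
        self.hrefs = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self._href = dict(attrs).get('href')
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._href is not None:
            if ''.join(self._text).strip() == self.wanted:
                self.hrefs.append(self._href)
            self._href = None


def find_download_url(page_url, page_html, link_text='Download (UTF8)'):
    """Return the absolute URL of the first link labelled `link_text`."""
    parser = LinkByTextParser(link_text)
    parser.feed(page_html)
    if not parser.hrefs:
        raise LookupError('no link with text %r found' % link_text)
    # Resolve relative hrefs against the page they came from.
    return urljoin(page_url, parser.hrefs[0])


# Invented sample page; the real href on myserver will look different.
sample_html = (
    '<html><body>'
    '<a href="./logout">Logout</a>'
    '<a href="./ReportPage?5-1.download">Download (UTF8)</a>'
    '</body></html>'
)
download_url = find_download_url(
    'https://myserver/ReportPage?NAME=ALL&DATE=THISYEAR', sample_html)
print(download_url)
```

With a real session, the page HTML would come from r.text of the report-page request, and the returned URL would be passed to s.get().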

#!/usr/bin/python3

import requests
import sys
from lxml import html

s = requests.Session()
s.verify = './myserver.crt'  # certificate of myserver for https

# get the login web site to enter username and password
r = s.get( 'https://myserver' )

# Get url for logging in. It's the action-attribute in the form anywhere.
# We use xpath.
tree = html.fromstring(r.text)
loginUrl = 'https://myserver/' + tree.xpath("//form[@id='id4']/@action")[0]

# logging in with username and password and go to ReportPage with queries
r = s.post( loginUrl, data = {'username':'ingo','password':'mypassword'} )
queryUrl = 'https://myserver/ReportPage?NAME=ALL&DATE=THISYEAR'
r = s.get( queryUrl )

# Get the download link for this query from this site. It's a link anywhere
# with value 'Download (UTF8)'
tree = html.fromstring( r.text )
downloadUrl = 'https://myserver/' + tree.xpath("//a[.='Download (UTF8)']/@href")[0]

# get the download file
r = s.get( downloadUrl )
if r.headers.get('Content-Disposition'):
    print( 'Downloading ...', file=sys.stderr )
    print( r.text )

# log out
r = s.get( 'https://myserver/logout' )
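If the file should be written to disk under the name the server suggests, instead of being printed to stdout, something along these lines might work. It is a sketch, not part of the original script: the helper names filename_from_content_disposition and save_attachment are hypothetical, session is assumed to be an already logged-in requests.Session, and parsing the header with email.message is my own choice. It also treats a missing Content-Disposition header as an error, since that is the symptom described in the question (an HTML page came back instead of the file).

```python
import os
from email.message import Message


def filename_from_content_disposition(value, default='download.bin'):
    """Extract the filename parameter from a Content-Disposition value."""
    msg = Message()
    msg['Content-Disposition'] = value
    name = msg.get_filename()
    # Keep only the last path component so the server cannot choose a directory.
    return os.path.basename(name) if name else default


def save_attachment(session, url, directory='.'):
    """Stream `url` to a file in `directory` and return the file's path.

    `session` is assumed to be a logged-in requests.Session.
    """
    with session.get(url, stream=True) as r:
        r.raise_for_status()
        disposition = r.headers.get('Content-Disposition')
        if not disposition:
            # Probably an HTML page (e.g. the query form), not the download.
            raise RuntimeError('no Content-Disposition header in response')
        path = os.path.join(
            directory, filename_from_content_disposition(disposition))
        with open(path, 'wb') as f:
            # Write the body in chunks so large reports are not held in memory.
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)
    return path
```

In the script above, the call would replace the final print, for example save_attachment(s, downloadUrl). Streaming is optional for a file of a few hundred kilobytes; it mainly matters for large downloads.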
