curl -- The Art of Scripting HTTP Requests Using Curl

商飞翮
2023-12-01

curl http://site.{one,two,three}.com
curl ftp://ftp.numericals.com/file[1-100].txt
curl ftp://ftp.numericals.com/file[001-100].txt
curl http://any.org/archive[1996-1999]/vol[1-4]/part{a,b,c}.html
curl http://www.numericals.com/file[1-100:10].txt
curl http://www.letters.com/file[a-z:2].txt

curl --trace-ascii debugdump.txt http://www.example.com/
curl http://curl.haxx.se
curl "http://www.hotmail.com/when/junk.cgi?birthyear=1905&press=OK"
curl --data "birthyear=1905&press=%20OK%20" http://www.example.com/when.cgi
curl --data-urlencode "name=I am Daniel" http://www.example.com
curl --form upload=@localfilename --form press=OK [URL]
curl --upload-file uploadfile http://www.example.com/receive.cgi
curl --user name:password http://www.example.com
curl --proxy-user proxyuser:proxypassword curl.haxx.se
curl --referer http://www.example.com http://www.example.com
curl --user-agent "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)" [URL]
curl --user-agent "Mozilla/4.73 [en] (X11; U; Linux 2.2.15 i686)" [URL]
curl --location http://www.example.com

curl --cookie "name=Daniel" http://www.example.com
curl --dump-header headers_and_cookies http://www.example.com
curl --cookie stored_cookies_in_file http://www.example.com
curl --cookie nada --location http://www.example.com
curl --cookie cookies.txt --cookie-jar newcookies.txt http://www.example.com

curl https://secure.example.com
curl --cert mycert.pem https://secure.example.com
curl --data "<xml>" --header "Content-Type: text/xml" --request PROPFIND url.com
curl --header "Host:" http://www.example.com
curl --header "Destination: http://nowhere" http://example.com

======================================================================

Online:  http://curl.haxx.se/docs/httpscripting.html
Date:    Jan 19, 2011
 
                The Art Of Scripting HTTP Requests Using Curl
                =============================================
 
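 This document assumes that you are familiar with HTML and basic networking.

 Scripting is essential to building a good computer system. Unix can be
 extended with shell scripts, and a range of tools can run automated commands;
 this is one reason scripting has been so successful there.

 The growing number of web applications has made "HTTP scripting" more and
 more frequently needed. Automatically extracting information from web
 sites, impersonating users, and posting or uploading data to web servers are
 all increasingly important tasks.

 Curl is a command line tool for doing all sorts of URL manipulations and
 transfers, but this document focuses only on HTTP requests. I assume that
 you already know how to run 'curl --help' or 'curl --manual' to get basic
 information about curl.

 Curl is not written to do everything for you. It sends requests, fetches
 data, sends data and retrieves information. You will probably need a
 scripting language, or repeated manual invocations, to glue everything
 together.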
 
1. The HTTP Protocol
 
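 HTTP is the protocol used to fetch data from web servers. It is a simple
 protocol built on top of TCP/IP. The protocol also lets the client send
 information to the server using a few different methods, as shown below.

 HTTP consists of plain text lines sent from the client to the server to
 request a particular action. The server then replies with a few text lines
 of its own before the actual requested content is sent to the client.

 The client, curl, sends an HTTP request. The request contains a method (such
 as GET, POST or HEAD), a number of request headers and sometimes a request
 body. The HTTP server responds with a status line (indicating whether things
 went well), response headers and a response body. The "body" part is the
 plain data you requested, such as HTML or an image.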
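
 To make the shape of a response concrete, here is a minimal sketch that
 splits a response into status line, headers and body. The response text is
 invented for illustration and is not real server output; the variable names
 are assumptions of this sketch as well.

```shell
# A canned HTTP response (invented). Real responses end each line with CRLF,
# so a real capture would first need the carriage returns stripped.
response='HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 13

<html></html>'

# The status line is the first line; its second word is the status code.
status_line=$(printf '%s\n' "$response" | sed -n '1p')
status_code=$(printf '%s\n' "$status_line" | awk '{ print $2 }')

# Headers run from line 2 up to the first empty line; the body follows it.
headers=$(printf '%s\n' "$response" | sed -n '2,/^$/p' | sed '/^$/d')
body=$(printf '%s\n' "$response" | sed '1,/^$/d')
header_count=$(printf '%s\n' "$headers" | wc -l | tr -d ' ')

echo "status:  $status_code"
echo "headers: $header_count lines"
echo "body:    $body"
```

 The empty line is the only thing separating headers from body, which is why
 both sed expressions key on it.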

 1.1 See the Protocol
 
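  Using curl's option --verbose (-v as a short option) displays the commands
  curl sends to the server, as well as a few other informational texts.

  --verbose is commonly used to debug or understand the interaction between
  curl and the server.

  Sometimes even --verbose is not enough. Then --trace and --trace-ascii offer
  even more detail, as they show everything curl sends and receives. Use them
  like this: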
 
      curl --trace-ascii debugdump.txt http://www.example.com/
 
2. URL
 
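 The Uniform Resource Locator (URL) format is how you specify the address of a
 particular resource on the Internet. Examples are http://curl.haxx.se and
 https://yourbank.com.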
 
3. GET a page
 
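 The simplest and most common HTTP operation is to get a URL. The URL can
 refer to a web page, an image or a file. The client issues a GET request to
 the server and the server sends back the requested document:

        curl http://curl.haxx.se

 Running this command in a terminal fetches the entire web page.

 All HTTP replies contain a set of response headers that are normally hidden.
 Use curl's --include (-i) option to display them along with the document.
 You can also ask for the headers only with --head (-I), which makes curl
 issue a HEAD request.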
 
4. Forms
 
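 Forms are the general way a web site presents an HTML page with fields for
 the user to enter data in. After filling them in, the user presses an 'OK' or
 'submit' button to send the data to the server. The server then uses the
 submitted data to decide how to act: using the entered words to search a
 database, adding the information to a bug tracking system, displaying an
 entered address on a map, or treating it as a login prompt to verify what
 the user is allowed to do.

 Of course there has to be some kind of program on the server end to receive
 the data you send. You cannot just invent something out of the air.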

 4.1 GET
 
  A GET-form uses the method GET, as specified in HTML like:
 
        <form method="GET" action="junk.cgi">
          <input type=text name="birthyear">
          <input type=submit name=press value="OK">
        </form>
 
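  In your favorite browser, this form appears with a text box to fill in. If
  you enter '1905' and press the OK button, the browser creates a new URL to
  GET. That URL is the original one with "junk.cgi?birthyear=1905&press=OK"
  appended to its path part.

  If the original form was seen on the page "www.hotmail.com/when/birth.html",
  the second page you get becomes
  "www.hotmail.com/when/junk.cgi?birthyear=1905&press=OK".

  Most search engines work this way.

  To make curl do the GET form post for you, just enter the expected URL: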
 
        curl "http://www.hotmail.com/when/junk.cgi?birthyear=1905&press=OK"
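
  The way a browser derives that URL can be sketched in a few lines of shell.
  This is only an illustration: the variable names are hypothetical, and it
  assumes a relative action and field values that need no percent-encoding.

```shell
# Inputs taken from the example form above.
form_page="http://www.hotmail.com/when/birth.html"   # page holding the form
action="junk.cgi"                                   # the form's action attribute
query="birthyear=1905&press=OK"                      # name=value pairs joined by '&'

# A relative action replaces the last path segment of the form page's URL.
base="${form_page%/*}"
url="$base/$action?$query"

echo "$url"
# With network access, the request itself would be: curl "$url"
```

  Quoting the URL on the curl command line matters, because an unquoted '&'
  would be interpreted by the shell.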
 
 4.2 POST
 
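  The GET method makes all input field names and values show up in the URL.
  That is handy when you want to bookmark the page with your given data, but
  the drawback is that whatever you typed into the fields is visible in the
  URL.

  The HTTP protocol therefore also offers the POST method. With POST, the
  client sends the data separated from the URL, so you will not see any of it
  in the URL address field.

  The form looks very similar to the GET one: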
 
        <form method="POST" action="junk.cgi">
          <input type=text name="birthyear">
          <input type=submit name=press value=" OK ">
        </form>
 
  To post this form with curl, enter a command line like:

        curl --data "birthyear=1905&press=%20OK%20" http://www.example.com/when.cgi
 
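  This kind of POST uses the Content-Type application/x-www-form-urlencoded.

  The data you send to the server must already be properly encoded; with
  --data, curl will not do that for you. For example, if you want the data to
  contain a space, you need to replace that space with %20. Data that does not
  follow this rule is likely to be received incorrectly.

  Recent curl versions can URL-encode POST data for you, like this: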
 
        curl --data-urlencode "name=I am Daniel" http://www.example.com
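
  For a feel of what URL encoding does, here is a small percent-encoder in
  portable shell. The urlencode function is a hypothetical helper written for
  this document, not something curl provides, and curl's own output may differ
  in detail (for instance in how a space is written).

```shell
# Keep RFC 3986 unreserved characters (letters, digits, "-", ".", "_", "~")
# and write every other byte as %XX.
urlencode() {
    # od turns the input into one hex pair per byte.
    printf '%s' "$1" | od -An -v -tx1 | tr -s ' ' '\n' | while read -r hex; do
        [ -n "$hex" ] || continue
        case "$hex" in
            3[0-9]|4[1-9a-f]|5[0-9a]|6[1-9a-f]|7[0-9a]|2d|2e|5f|7e)
                # Unreserved: print the byte itself via its octal escape.
                printf "\\$(printf '%03o' "0x$hex")" ;;
            *)
                # Everything else: percent sign plus upper-case hex.
                printf '%%%s' "$(printf '%s' "$hex" | tr 'a-f' 'A-F')" ;;
        esac
    done
}

# Encode only the value, then assemble the name=value pair by hand.
value=$(urlencode " OK ")
echo "birthyear=1905&press=$value"
```

  Only the value is encoded here; the '=' and '&' separators must stay
  literal, which is also why --data-urlencode takes the pair in name=content
  form.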
 
 4.3 File Upload POST
 
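  Back in 1995, RFC 1867 defined an additional way to post data over HTTP.

  This method is designed to support file uploads. A form that allows a user
  to upload a file could be written like this in HTML: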
 
    <form method="POST" enctype='multipart/form-data' action="upload.cgi">
      <input type=file name=upload>
      <input type=submit name=press value="OK">
    </form>
 
  The Content-Type sent for this kind of form is multipart/form-data.

  To post to a form like this with curl, enter a command line like:
 
        curl --form upload=@localfilename --form press=OK [URL]
 
 4.4 Hidden Fields
 
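  A common way for HTML-based applications to pass state information between
  pages is to add hidden fields to the forms. Hidden fields are not shown to
  the user, but they are passed along just like all the other fields.

  For example: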
 
    <form method="POST" action="foobar.cgi">
      <input type=text name="birthyear">
      <input type=hidden name="person" value="daniel">
      <input type=submit name="press" value="OK">
    </form>
 
  To post this with curl, you do not have to think about whether a field is
  hidden or not. To curl they are all the same:
 
        curl --data "birthyear=1905&press=OK&person=daniel" [URL]
 
 4.5 Figure Out What A POST Looks Like
 
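  When you are about to fill in a form and send it to a server with curl
  instead of a browser, you naturally want to send a POST exactly the way
  your browser does.

  An easy way to see this is to save the HTML page with the form to your
  local disk, change the 'method' to GET, and press the submit button (you can
  also change the action URL if you want to).

  You will then clearly see the data appended to the URL.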
 
5. PUT
 
 Perhaps the best way to upload data to an HTTP server is to use PUT.

 Put a file to an HTTP server with curl:
 
        curl --upload-file uploadfile http://www.example.com/receive.cgi
 
6. HTTP Authentication
 
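 HTTP Authentication is the ability to tell the server your user name and
 password so that it can verify that you are allowed to make the request.
 The Basic authentication used in HTTP (which curl uses by default) is plain
 text based: the user name and password are only slightly obfuscated, so
 anyone sniffing the network can still read them.

 To tell curl to use a user name and password for authentication: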
 
        curl --user name:password http://www.example.com
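
 The "slight obfuscation" of Basic authentication is plain base64 over
 "user:password". The sketch below rebuilds the request header by hand to
 show that; it assumes a base64 command is available (it is in GNU coreutils
 and most other systems, but it is not part of POSIX), and the variable names
 are made up for this illustration.

```shell
# Placeholder credentials from the example above.
credentials="name:password"

# Basic auth encodes the pair with base64; this is encoding, not encryption.
token=$(printf '%s' "$credentials" | base64)
header="Authorization: Basic $token"

echo "$header"
```

 Anyone who captures this header can recover the password with a single
 base64 decode, which is why Basic authentication over unencrypted HTTP is
 easy to sniff.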
 
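 If the server requires a different authentication method (which you can
 tell by looking at the headers it returns), the options --ntlm, --digest,
 --negotiate or --anyauth might suit you.

 Sometimes HTTP access is only available through an HTTP proxy, which is
 common in many companies. The proxy may require its own user name and
 password before it lets the client reach the Internet. To specify those with
 curl: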
 
        curl --proxy-user proxyuser:proxypassword curl.haxx.se
 
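 If your proxy requires authentication with the NTLM method, use
 --proxy-ntlm; if it requires Digest, use --proxy-digest.

 If you use either of these user+password options but leave out the password
 part, curl will prompt for the password interactively.

 Note that when a program runs, its arguments may be visible to anyone who
 lists the running processes on the system. If you pass the user name and
 password as command line options, other users may therefore be able to see
 your password.

 Keep in mind that although this is how HTTP Authentication works, very many
 web sites do not use it for their logins. See the Web Login chapter further
 below for more details.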
 
7. Referer
 
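 An HTTP request may include a 'referer' field (yes, it is misspelled), which
 tells the server which URL the client came from. Some programs and scripts
 check the referer field of requests to verify that the visit does not come
 from an external site or an unknown page. This is a rather unreliable way of
 checking, since the field is easy to forge, but many scripts still do it.

 Use curl to set the referer field:

        curl --referer http://www.example.com http://www.example.com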
 
8. User Agent
 
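 Very much like the referer field, HTTP requests may carry a User-Agent field
 that names the client in use. Many applications use this information to
 decide how to display pages. Silly web programmers try to make different
 pages for different browsers, often using different kinds of javascript,
 vbscript and so on.

 At times, you will see that a page fetched with curl does not look like the
 page your browser gets. Then you know it is time to set the User-Agent field
 to fool the server into thinking you are one of those browsers.

 To make curl look like Internet Explorer 5 on Windows 2000: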
 
  curl --user-agent "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)" [URL]
 
 Or to look like Netscape 4.73 on Linux:
 
  curl --user-agent "Mozilla/4.73 [en] (X11; U; Linux 2.2.15 i686)" [URL]
 
9. Redirects
 
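 When a resource is requested from a server, the reply may include a hint
 about where the browser should go next to find the page, or a new page
 holding newly generated output. The header that tells the browser to
 redirect is Location:.

 By default, curl does not follow Location: headers; it simply displays such
 pages the same way it displays all HTTP replies.

 To tell curl to follow a Location: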
 
        curl --location http://www.example.com
 
 If you use curl to POST to a site that immediately redirects you to another
 page, you can use --location (-L) together with --data/--form.
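
 What "following a Location:" starts from can be sketched offline: find the
 header in the reply and take its value as the next URL. The reply below is
 invented for illustration, and the variable names are assumptions of this
 sketch.

```shell
# A canned redirect reply (invented).
headers='HTTP/1.1 302 Found
Location: http://www.example.com/new/page.html
Content-Length: 0'

# Pull out the target of the Location: header, matching the header name
# case-insensitively and dropping any carriage returns first.
location=$(printf '%s\n' "$headers" | tr -d '\r' |
    awk 'tolower($1) == "location:" { print $2 }')

echo "$location"
# A manual follow-up request would then be: curl "$location"
```

 Header names are case-insensitive, so the comparison lower-cases the name
 rather than matching "Location:" literally.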

 
10. Cookies
 
 Web browsers do "client side state control" by using cookies. Cookies are
 just names with associated contents. The cookies are sent to the client by
 the server. The server tells the client for what path and host name it wants
 the cookie sent back, and it also sends an expiration date and a few more
 properties.
 
 When a client communicates with a server with a name and path as previously
 specified in a received cookie, the client sends back the cookies and their
 contents to the server, unless of course they are expired.
 
 Many applications and servers use this method to connect a series of requests
 into a single logical session. To be able to use curl on such occasions, we
 must be able to record and send back cookies the way the web application
 expects them, the same way browsers deal with them.
 
 The simplest way to send a few cookies to the server when getting a page with
 curl is to add them on the command line like:
 
        curl --cookie "name=Daniel" http://www.example.com
 
 Cookies are sent as common HTTP headers. This is practical as it allows curl
 to record cookies simply by recording headers. Record cookies with curl by
 using the --dump-header (-D) option like:
 
        curl --dump-header headers_and_cookies http://www.example.com
 
 (Take note that the --cookie-jar option described below is a better way to
 store cookies.)
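
 As a rough sketch of what recording cookies from headers amounts to, the
 snippet below pulls the name=value pairs out of a dumped header file and
 joins them the way a Cookie: request header carries them. The file contents
 are invented, mktemp is assumed to be available, and the attribute handling
 is deliberately simplistic.

```shell
# Stand-in for a file written by --dump-header (contents invented).
headers_file=$(mktemp)
cat > "$headers_file" <<'EOF'
HTTP/1.1 200 OK
Content-Type: text/html
Set-Cookie: name=Daniel; Path=/
Set-Cookie: session=abc123; Path=/; HttpOnly
EOF

# Keep only the name=value part of each Set-Cookie header, then join the
# pairs with "; ".
cookies=$(tr -d '\r' < "$headers_file" |
    sed -n 's/^[Ss]et-[Cc]ookie: *\([^;]*\).*/\1/p' |
    paste -s -d ';' - | sed 's/;/; /g')

echo "Cookie: $cookies"
rm -f "$headers_file"
```

 This ignores path, domain and expiry, which is exactly the bookkeeping that
 curl's own cookie handling does for you.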
 
 Curl has a full-blown cookie parsing engine built in that comes into use if
 you want to reconnect to a server and use cookies that were stored from a
 previous connection (or hand-crafted to fool the server into believing you
 had a previous connection). To use previously stored cookies, you run curl
 like:
 
        curl --cookie stored_cookies_in_file http://www.example.com
 
 Curl's "cookie engine" gets enabled when you use the --cookie option. If you
 only want curl to understand received cookies, use --cookie with a file that
 doesn't exist. For example, if you want to let curl understand cookies from
 a page and follow a location (and thus possibly send back cookies it
 received), you can invoke it like:
 
        curl --cookie nada --location http://www.example.com
 
 Curl has the ability to read and write cookie files that use the same file
 format that Netscape and Mozilla do. It is a convenient way to share cookies
 between browsers and automatic scripts. The --cookie (-b) switch
 automatically detects if a given file is such a cookie file and parses it,
 and by using the --cookie-jar (-c) option you'll make curl write a new cookie
 file at the end of an operation:
 
        curl --cookie cookies.txt --cookie-jar newcookies.txt http://www.example.com
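
 The Netscape cookie file format is plain text with seven tab-separated
 fields per cookie, so it is easy to inspect from a script. The sketch below
 writes a small hand-made file and lists the cookies in it; the cookie values
 are invented, and mktemp is assumed to be available.

```shell
# Fields, in order: domain, include-subdomains flag, path, secure flag,
# expiry as a Unix timestamp, cookie name, cookie value.
cookie_file=$(mktemp)
printf '# Netscape HTTP Cookie File\n' > "$cookie_file"
printf 'www.example.com\tFALSE\t/\tFALSE\t0\tname\tDaniel\n' >> "$cookie_file"
printf '.example.com\tTRUE\t/\tTRUE\t2000000000\tsession\tabc123\n' >> "$cookie_file"

# Print name=value for every cookie line, skipping comment lines and any
# line that does not have exactly seven fields.
pairs=$(awk -F '\t' '!/^#/ && NF == 7 { print $6 "=" $7 }' "$cookie_file")

echo "$pairs"
rm -f "$cookie_file"
```

 Note that the comment filter is an assumption of this sketch: it treats
 every line starting with '#' as a comment and would therefore skip any
 cookie entry that a tool chooses to write with a '#'-prefixed marker.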
 
11. HTTPS
 
 There are a few ways to do secure HTTP transfers. By far the most common
 protocol for doing this is what is generally known as HTTPS, HTTP over
 SSL. SSL encrypts all the data that is sent and received over the network and
 thus makes it harder for attackers to spy on sensitive information.
 
 SSL (or TLS as the latest version of the standard is called) offers a
 truckload of advanced features to allow all those encryptions and key
 infrastructure mechanisms encrypted HTTP requires.
 
 Curl supports encrypted fetches thanks to the freely available OpenSSL
 libraries. To get a page from an HTTPS server, simply run curl like:
 
        curl https://secure.example.com
 
 11.1 Certificates
 
  In the HTTPS world, you use certificates to validate that you are the one
  you claim to be, as an addition to normal passwords. Curl supports
  client-side certificates. All certificates are locked with a pass phrase,
  which you need to enter before the certificate can be used by curl. The
  pass phrase can be specified on the command line or, if not, entered
  interactively when curl asks for it. Use a certificate with curl on an
  HTTPS server like:
 
        curl --cert mycert.pem https://secure.example.com
 
  curl also tries to verify that the server is who it claims to be, by
  verifying the server's certificate against a locally stored CA cert
  bundle. Failing the verification will cause curl to deny the connection. You
  must then use --insecure (-k) in case you want to tell curl to ignore that
  the server can't be verified.
 
  More about server certificate verification and ca cert bundles can be read
  in the SSLCERTS document, available online here:
 
        http://curl.haxx.se/docs/sslcerts.html
 
12. Custom Request Elements
 
 Doing fancy stuff, you may need to add or change elements of a single curl
 request.
 
 For example, you can change the POST request to a PROPFIND and send the data
 as "Content-Type: text/xml" (instead of the default Content-Type) like this:
 
        curl --data "<xml>" --header "Content-Type: text/xml" --request PROPFIND url.com
 
 You can delete a default header by providing one without content. For
 instance, you can ruin the request by chopping off the Host: header:
 
        curl --header "Host:" http://www.example.com
 
 You can add headers the same way. Your server may want a "Destination:"
 header, and you can add it:
 
        curl --header "Destination: http://nowhere" http://example.com
 
13. Web Login
 
 While not strictly just HTTP related, it still causes a lot of people
 problems, so here's the executive run-down of how the vast majority of all
 login forms work and how to log in to them using curl.

 It can also be noted that to do this properly in an automated fashion, you
 will most certainly need to script things and do multiple curl invocations.
 
 First, servers mostly use cookies to track the logged-in status of the
 client, so you will need to capture the cookies you receive in the
 responses. Then, many sites also set a special cookie on the login page (to
 make sure you got there through their login page) so you should make a habit
 of first getting the login-form page to capture the cookies set there.
 
 Some web-based login systems feature various amounts of javascript, and
 sometimes they use such code to set or modify cookie contents. Possibly they
 do that to prevent programmed logins, like the ones this document describes.
 Anyway, if reading the code isn't enough to let you repeat the behavior
 manually, capturing the HTTP requests done by your browser and analyzing the
 sent cookies is usually a working method to work out how to shortcut the
 javascript need.
 
 In the actual <form> tag for the login, lots of sites fill in random/session
 or otherwise secretly generated hidden fields, and you may need to first
 capture the HTML code for the login form and extract all the hidden fields
 to be able to do a proper login POST. Remember that the contents need to be
 URL encoded when sent in a normal POST.
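
 Extracting the hidden fields can be sketched with sed. The login page below
 is invented, and the pattern assumes one input per line, the name attribute
 before the value attribute, double-quoted attributes, and values that need
 no URL encoding; real pages are rarely this tidy and may call for a proper
 HTML parser.

```shell
# Stand-in for a downloaded login page (form and values invented).
login_html='<form method="POST" action="login.cgi">
<input type=text name="user">
<input type=hidden name="token" value="abc123">
<input type=hidden name="next" value="home">
</form>'

# Turn each hidden input into name=value and join the pairs with "&".
hidden=$(printf '%s\n' "$login_html" |
    sed -n 's/.*type=hidden name="\([^"]*\)" value="\([^"]*\)".*/\1=\2/p' |
    paste -s -d '&' -)

echo "$hidden"
# The login POST would then append the visible fields, for example:
#   curl --data "user=daniel&$hidden" [URL]
```

 Because hidden values are often regenerated per session, this extraction
 has to run against a freshly fetched login page each time.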
 
14. Debug
 
 Many times when you run curl on a site, you'll notice that the site doesn't
 seem to respond the same way to your curl requests as it does to your
 browser's.
 
 Then you need to start making your curl requests more similar to your
 browser's requests:
 
 * Use the --trace-ascii option to store fully detailed logs of the requests
   for easier analyzing and better understanding
 
 * Make sure you check for and use cookies when needed (both reading with
   --cookie and writing with --cookie-jar)
 
 * Set user-agent to one that a recent popular browser uses
 
 * Set referer like it is set by the browser
 
 * If you use POST, make sure you send all the fields and in the same order as
   the browser does it. (See chapter 4.5 above)
 
 A very good helper for getting this right is the LiveHTTPHeader tool, which
 lets you view all headers you send and receive with Mozilla/Firefox (even
 when using HTTPS).

 A more raw approach is to capture the HTTP traffic on the network with tools
 such as ethereal or tcpdump and check which headers were sent and received
 by the browser. (HTTPS makes this technique ineffective, since the traffic
 is encrypted.)
 
15. References
 
 RFC 2616 is a must-read if you want an in-depth understanding of the HTTP
 protocol.
 
 RFC 3986 explains the URL syntax.
 
 RFC 2109 defines how cookies are supposed to work.
 
 RFC 1867 defines the HTTP post upload format.
 
 http://curl.haxx.se is the home of the cURL project

