
crawl.py

阚小云
2023-12-01

>crawl.py http://www.hao123.com/index.htm

The output is as follows:

parsedurl =  ParseResult(scheme='http', netloc='www.hao123.com', path='/index.htm', params='', query='', fragment='')
path =  www.hao123.com/index.htm
ext =  ('www.hao123.com/index', '.htm')
path =  www.hao123.com/index.htm
ldir =  www.hao123.com
ldir =  www.hao123.com
path =  www.hao123.com/index.htm
self.url =  http://www.hao123.com/index.htm
self.file =  www.hao123.com/index.htm
retval =  ('www.hao123.com/index.htm', <httplib.HTTPMessage instance at 0x010F9968>)
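
The debug lines above trace how the start URL is turned into a local file name: the URL is parsed, the host and path are joined, the extension is inspected, and the directory part (`ldir`) is derived. The following is a minimal Python 3 sketch of that mapping, not the script's actual source. The helper names `url_to_local_path` and `local_dir` and the default file name `index.htm` are assumptions made for illustration.

```python
import os
from urllib.parse import urlparse


def url_to_local_path(url, default="index.htm"):
    """Map a URL to a relative local file path (hypothetical helper)."""
    parsed = urlparse(url)
    url_path = parsed.path
    # Assumption: a URL path without a file extension is treated as a
    # directory, so a default file name is appended to it.
    if os.path.splitext(url_path)[1] == "":
        url_path = url_path.rstrip("/") + "/" + default
    # Join host and path, e.g. 'www.hao123.com' + '/index.htm'.
    return parsed.netloc + url_path


def local_dir(path):
    """Return the directory part of the local path (the log's 'ldir')."""
    return os.path.dirname(path)


path = url_to_local_path("http://www.hao123.com/index.htm")
print("path =", path)
print("ext =", os.path.splitext(path))
print("ldir =", local_dir(path))
```

The extension test is applied to the URL path only, so the dot in a host name such as `www.hao123.com` is not mistaken for a file extension.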

( 1 )
URL: http://www.hao123.com/index.htm
FILE: www.hao123.com/index.htm
http://www.hao123.com                                         ... new, added to Q
http://www.hao123.com/redian/tongzhi.htm                      ... new, added to Q
http://utility.hao123.com/quality_form.php                    ... discarded, not in domain
*  javascript:void(0)                                            ... discarded, javascript
http://www.hao123.com/redian/scookie.htm                      ... new, added to Q
*  javascript:void(0)                                            ... discarded, javascript
*  javascript:void(0)                                            ... discarded, javascript
*  javascript:void(0)                                            ... discarded, javascript
http://www.hao123.com                                         ... discarded, already in Q
http://wenku.baidu.com                                        ... discarded, not in domain
http://baike.baidu.com                                        ... discarded, not in domain
http://jingyan.baidu.com                                      ... discarded, not in domain
http://hi.baidu.com                                           ... discarded, not in domain
http://top.baidu.com                                          ... discarded, not in domain
http://dict.baidu.com                                         ... discarded, not in domain
http://s.baidu.com                                            ... discarded, not in domain
http://www.baidu.com                                          ... discarded, not in domain
http://www.hao123.com/daquan/shfwsite.htm                     ... new, added to Q
http://www.hao123.com/netbuy.htm                              ... new, added to Q
http://www.hao123.com/caipiao.htm                             ... new, added to Q
http://www.hao123.com/haoserver/index.htm                     ... new, added to Q
http://www.hao123.com/tianqi.htm                              ... new, added to Q
http://www.hao123.com/stock.htm                               ... new, added to Q
http://www.hao123.com/stock3.htm                              ... new, added to Q
http://www.hao123.com/bankjt.htm                              ... new, added to Q
http://www.hao123.com/lvyou.htm                               ... new, added to Q

..........
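
Each link in the listing above ends up in one of four states: new and queued, already queued, outside the start domain, or a `javascript:` pseudo-link. The sketch below shows one way such a filter could be written in Python 3. The function name `classify_link` is hypothetical, and the exact host-name comparison is an assumption inferred from the log (where `utility.hao123.com` is rejected), not the script's confirmed logic.

```python
from urllib.parse import urlparse


def classify_link(link, domain, queue):
    """Return a status string for one link, queueing it if it is new."""
    # Pseudo-links such as 'javascript:void(0)' cannot be fetched.
    if link.lower().startswith("javascript:"):
        return "discarded, javascript"
    # Assumption: only links whose host equals the start host are followed.
    if urlparse(link).netloc != domain:
        return "discarded, not in domain"
    if link in queue:
        return "discarded, already in Q"
    queue.append(link)
    return "new, added to Q"


queue = []
links = [
    "http://www.hao123.com",
    "http://utility.hao123.com/quality_form.php",
    "javascript:void(0)",
    "http://www.hao123.com",
]
for link in links:
    status = classify_link(link, "www.hao123.com", queue)
    print(f"{link:<60} ... {status}")
```

The `javascript:` test runs first because such links have no host part and would otherwise be reported as out of domain. This sketch tracks only the queue; a full crawler would also need to remember pages it has already downloaded.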
