(proxy_ip_project) C:\Users\user>scrapy --help
Scrapy 1.5.0 - no active project
Usage:
scrapy <command> [options] [args]
Available commands:
bench Run quick benchmark test
fetch Fetch a URL using the Scrapy downloader
genspider Generate new spider using pre-defined templates
runspider Run a self-contained spider (without creating a project)
settings Get settings values
shell Interactive scraping console
startproject Create new project
version Print Scrapy version
view Open URL in browser, as seen by Scrapy
[ more ] More commands available when run from project directory
Use "scrapy <command> -h" to see more info about a command
(proxy_ip_project) C:\Users\user>scrapy version
Scrapy 1.5.0
(proxy_ip_project) C:\Users\user>scrapy version -v
Scrapy : 1.5.0
lxml : 4.2.0.0
libxml2 : 2.9.8
cssselect : 1.0.3
parsel : 1.5.0
w3lib : 1.19.0
Twisted : 18.7.0
Python : 3.6.5 |Anaconda, Inc.| (default, Mar 29 2018, 13:32:41) [MSC v.1900 64 bit (AMD64)]
pyOpenSSL : 18.0.0 (OpenSSL 1.0.2p 14 Aug 2018)
cryptography : 2.3.1
Platform : Windows-7-6.1.7601-SP1
The default Scrapy project structure:
scrapy.cfg
myproject/
    __init__.py
    items.py
    pipelines.py
    settings.py
    spiders/
        __init__.py
        spider1.py
        spider2.py
        ...
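As a rough illustration of what goes into the spiders/ directory, here is a minimal spider sketch. The module name spider1.py matches the structure above, but the spider name and the example.com start URL are placeholders, not part of the original project:

# spiders/spider1.py - minimal spider sketch (name and start URL are placeholders)
import scrapy

class Spider1(scrapy.Spider):
    name = 'spider1'                      # the name that `scrapy list` would print
    start_urls = ['http://example.com']   # placeholder start page

    def parse(self, response):
        # extract the page title; adjust the XPath to the real target site
        yield {'title': response.xpath('//title/text()').extract_first()}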
List all available spiders in the current project: >> scrapy list
Launch the Scrapy shell for a given URL: >> scrapy shell "url" (keep the double quotes). A very useful command for debugging extracted data, testing XPath expressions, inspecting page source, and so on.
Fetch the given URL and parse it with the matching spider: >> scrapy parse <url> [options]
Supported options include --spider=SPIDER (force a specific spider) and -c / --callback (the spider method to use for parsing).
Run a spider written in a single Python file, without creating a project: >> scrapy runspider <spider_file.py> (see the sketch after this list).
Run a quick benchmark; handy for verifying that Scrapy is installed correctly: >> scrapy bench
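As mentioned above for scrapy runspider, a self-contained spider is just a single .py file holding a Spider subclass. A minimal sketch, assuming the public test site quotes.toscrape.com as the target (the file name and selectors are illustrative, not from the original project):

# standalone_spider.py - self-contained spider for `scrapy runspider` (illustrative)
import scrapy

class QuotesSpider(scrapy.Spider):
    name = 'standalone'
    start_urls = ['http://quotes.toscrape.com']
    custom_settings = {'CLOSESPIDER_PAGECOUNT': 5}   # keep the test run short

    def parse(self, response):
        for quote in response.css('div.quote'):
            yield {
                'text': quote.css('span.text::text').extract_first(),
                'author': quote.css('small.author::text').extract_first(),
            }

Run it with >> scrapy runspider standalone_spider.py -o quotes.json; the -o switch writes the scraped items to a feed file. The transcript below shows the scrapy bench command from the last item above, run in the same environment.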
(proxy_ip_project) C:\Users\user>scrapy bench
2018-09-25 09:53:53 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: scrapybot)
2018-09-25 09:53:53 [scrapy.utils.log] INFO: Versions: lxml 4.2.0.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.0, w3lib 1.19.0, Twisted 18.7.0, Python 3.6.5 |Anaconda, Inc.| (default, Mar 29 2018, 13:32:41) [MSC v.1900 64 bit (AMD64)], pyOpenSSL 18.0.0 (OpenSSL 1.0.2p 14 Aug 2018), cryptography 2.3.1, Platform Windows-7-6.1.7601-SP1
2018-09-25 09:53:55 [scrapy.crawler] INFO: Overridden settings: {'CLOSESPIDER_TIMEOUT': 10, 'LOGSTATS_INTERVAL': 1, 'LOG_LEVEL': 'INFO'}
2018-09-25 09:53:56 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.logstats.LogStats']
2018-09-25 09:54:00 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-09-25 09:54:00 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-09-25 09:54:00 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-09-25 09:54:00 [scrapy.core.engine] INFO: Spider opened
2018-09-25 09:54:00 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-09-25 09:54:01 [scrapy.extensions.logstats] INFO: Crawled 29 pages (at 1740 pages/min), scraped 0 items (at 0 items/min)
2018-09-25 09:54:02 [scrapy.extensions.logstats] INFO: Crawled 53 pages (at 1440 pages/min), scraped 0 items (at 0 items/min)
2018-09-25 09:54:03 [scrapy.extensions.logstats] INFO: Crawled 93 pages (at 2400 pages/min), scraped 0 items (at 0 items/min)
2018-09-25 09:54:04 [scrapy.extensions.logstats] INFO: Crawled 125 pages (at 1920 pages/min), scraped 0 items (at 0 items/min)
2018-09-25 09:54:05 [scrapy.extensions.logstats] INFO: Crawled 157 pages (at 1920 pages/min), scraped 0 items (at 0 items/min)
2018-09-25 09:54:06 [scrapy.extensions.logstats] INFO: Crawled 189 pages (at 1920 pages/min), scraped 0 items (at 0 items/min)
2018-09-25 09:54:07 [scrapy.extensions.logstats] INFO: Crawled 221 pages (at 1920 pages/min), scraped 0 items (at 0 items/min)
2018-09-25 09:54:08 [scrapy.extensions.logstats] INFO: Crawled 253 pages (at 1920 pages/min), scraped 0 items (at 0 items/min)
2018-09-25 09:54:09 [scrapy.extensions.logstats] INFO: Crawled 277 pages (at 1440 pages/min), scraped 0 items (at 0 items/min)
2018-09-25 09:54:10 [scrapy.core.engine] INFO: Closing spider (closespider_timeout)
2018-09-25 09:54:10 [scrapy.extensions.logstats] INFO: Crawled 309 pages (at 1920 pages/min), scraped 0 items (at 0 items/min)
2018-09-25 09:54:11 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 123687,
'downloader/request_count': 325,
'downloader/request_method_count/GET': 325,
'downloader/response_bytes': 787569,
'downloader/response_count': 325,
'downloader/response_status_count/200': 325,
'finish_reason': 'closespider_timeout',
'finish_time': datetime.datetime(2018, 9, 25, 1, 54, 11, 740825),
'log_count/INFO': 17,
'request_depth_max': 12,
'response_received_count': 325,
'scheduler/dequeued': 325,
'scheduler/dequeued/memory': 325,
'scheduler/enqueued': 6501,
'scheduler/enqueued/memory': 6501,
'start_time': datetime.datetime(2018, 9, 25, 1, 54, 0, 743196)}
2018-09-25 09:54:11 [scrapy.core.engine] INFO: Spider closed (closespider_timeout)
(proxy_ip_project) C:\Users\user>
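The bench run above overrides CLOSESPIDER_TIMEOUT, LOGSTATS_INTERVAL and LOG_LEVEL, which is why the crawl stops itself after roughly ten seconds and prints a crawl-rate line every second. As a sketch, a similar timed, link-following crawl can be driven from Python with CrawlerProcess; the spider and target URL below are placeholders, not the spider that bench uses internally:

# timed_run.py - a bench-style timed crawl driven from Python (illustrative sketch)
import scrapy
from scrapy.crawler import CrawlerProcess

class TimedSpider(scrapy.Spider):
    name = 'timed'
    start_urls = ['http://quotes.toscrape.com']   # placeholder target

    def parse(self, response):
        # follow every link on the page, roughly like bench's link-following spider
        for href in response.css('a::attr(href)').extract():
            yield response.follow(href, callback=self.parse)

if __name__ == '__main__':
    process = CrawlerProcess(settings={
        'CLOSESPIDER_TIMEOUT': 10,   # stop after 10 seconds, as in the log above
        'LOGSTATS_INTERVAL': 1,      # log crawl-rate stats every second
        'LOG_LEVEL': 'INFO',
    })
    process.crawl(TimedSpider)
    process.start()                  # blocks until the crawl finishes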
Reference: https://scrapy-chs.readthedocs.io/zh_CN/0.24/topics/commands.html