This topic is genuinely hard. After going through a lot of material, I finally understand a little of it.
import aiohttp
import asyncio
import time

async def request(client):
    async with client.get('http://httpbin.org/get') as response:
        assert response.status == 200
        # response.text() is a coroutine, so it must be awaited
        print(await response.text())

async def session():
    async with aiohttp.ClientSession() as client:
        await request(client)

async def main():
    tasks = []
    for i in range(10):
        tasks.append(asyncio.create_task(session()))
    await asyncio.wait(tasks)

start = time.time()
asyncio.run(main())
end = time.time()
print("aiohttp crawler took: {}".format(end - start))
Analysis:
1. The await inside request marks the point where the coroutine yields control: while one request is waiting on the network, the event loop switches to another session() task, so the ten requests run concurrently instead of one after another.
2. The client object is an aiohttp.ClientSession instance, and response is a ClientResponse object; aiohttp uses ClientSession to manage connections across requests.
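Point 1 can be checked without any network access. In the sketch below (my own illustration, with asyncio.sleep standing in for client.get), five tasks that each wait 0.2 seconds finish in roughly 0.2 seconds total rather than 1 second, because the event loop runs the other tasks while one is waiting:

```python
import asyncio
import time

async def fake_request(i):
    # asyncio.sleep stands in for waiting on a network response;
    # awaiting it hands control back to the event loop
    await asyncio.sleep(0.2)
    return i

async def main():
    tasks = [asyncio.create_task(fake_request(i)) for i in range(5)]
    # gather returns the results in submission order
    return await asyncio.gather(*tasks)

start = time.time()
results = asyncio.run(main())
elapsed = time.time() - start
print(results, round(elapsed, 1))
```

If the five waits ran one after another the script would take about a second; the measured time stays close to 0.2 seconds because the awaits overlap.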
Example: asynchronously scraping the Douban Movie Top 250
import aiohttp
import asyncio
import time
import re
import fake_useragent

# pick a random User-Agent so requests look less like a script
ua = fake_useragent.UserAgent()
useragent = ua.random

def myre(result):
    # non-greedy match for each movie title inside its <li> block
    pattern = re.compile(r'<li>.*?<span class="title">(?P<name>.*?)</span>', re.S)
    target = pattern.finditer(result)
    for it in target:
        print(it.group('name'))

async def request(session, url):
    async with session.get(url) as response:
        result = await response.text()
        myre(result)

async def single(url):
    async with aiohttp.ClientSession(headers={'User-Agent': useragent}) as session:
        await request(session, url)

async def main():
    tasks = []
    for i in range(10):
        # each page holds 25 entries; start= is the offset into the list
        url = "https://movie.douban.com/top250" + "?start=" + str(i * 25)
        tasks.append(asyncio.create_task(single(url)))
    await asyncio.wait(tasks)

start = time.time()
asyncio.run(main())
end = time.time()
print("Total time: {}".format(end - start))
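The regular expression in myre can be sanity-checked offline against a hand-written snippet that mimics the Top 250 list markup (the titles below are sample data I made up, not a real fetch):

```python
import re

pattern = re.compile(r'<li>.*?<span class="title">(?P<name>.*?)</span>', re.S)

# hand-written sample mimicking the Douban list structure
html = (
    '<li><span class="title">肖申克的救赎</span></li>'
    '<li><span class="title">霸王别姬</span></li>'
)
names = [m.group('name') for m in pattern.finditer(html)]
print(names)
```

The re.S flag matters on the real page, because each <li> block there spans multiple lines and the dot must be allowed to match newlines.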