Question:

A multithreaded Python Selenium web scraper occasionally raises an error when run concurrently. What is the cause?

仲孙飞文
2024-01-06

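My company needs to save dynamic HTML pages as PDF files, and I plan to implement this with Python + Selenium. The idea is to call chromedriver's Page.printToPDF command once the page has finished loading, take the print response, and convert it into a PDF file. To check how this behaves under concurrency, I simulated it with multiple threads and found that an error occasionally occurs, whereas a single run works fine. I don't know what the cause is.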
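For reference, the PDF step described above might look roughly like the following sketch. It assumes `driver` is a Selenium Chrome WebDriver whose page has already finished loading; the helper names `pdf_bytes_from_cdp_result` and `save_page_as_pdf` are hypothetical and are not part of the original script.

```python
import base64
from pathlib import Path


def pdf_bytes_from_cdp_result(result):
    """Decode the base64-encoded "data" field returned by Page.printToPDF."""
    return base64.b64decode(result["data"])


def save_page_as_pdf(driver, output_path, print_options=None):
    """Print the page currently loaded in `driver` and write it to `output_path`.

    Assumption: `driver` is a Selenium Chrome WebDriver and the page has
    already finished loading. Returns the number of bytes written.
    """
    # execute_cdp_cmd sends a raw Chrome DevTools Protocol command to the browser.
    result = driver.execute_cdp_cmd("Page.printToPDF", print_options or {})
    pdf_bytes = pdf_bytes_from_cdp_result(result)
    Path(output_path).write_bytes(pdf_bytes)
    return len(pdf_bytes)
```

A call could then look like `save_page_as_pdf(driver, "page.pdf", {"printBackground": True})`, made after `driver.get(...)` has returned.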

python 3.9.0
selenium 4.16.0

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support.expected_conditions import staleness_of
from urllib.parse import urlparse
import json
import base64
import threading
import time


def test():
    webdriver_service = Service(r"D:\work\chromedriver-win64\chromedriver.exe")
    webdriver_options = Options()
    webdriver_options.binary_location = r"D:\work\chrome-win64\chrome.exe"
    webdriver_options.add_argument('--no-sandbox')
    #webdriver_options.add_argument('--headless')
    webdriver_options.add_argument('--disable-gpu')
    webdriver_options.add_argument("--remote-debugging-port=9225")
    webdriver_options.add_argument("--incognito")
    #webdriver_options.page_load_strategy = 'eager'
    #webdriver_options.add_argument('--disable-dev-shm-usage')
    webdriver_prefs = {}
    webdriver_options.experimental_options['prefs'] = webdriver_prefs
    webdriver_prefs['profile.default_content_settings'] = {'images': 2}
    driver = webdriver.Chrome(options=webdriver_options, service=webdriver_service)
    print(driver.session_id)
    driver.get("https://www.baidu.com")
    driver.quit()


if __name__ == '__main__':
    t1 = threading.Thread(target=test)
    t2 = threading.Thread(target=test)
    t3 = threading.Thread(target=test)
    t1.start()
    t2.start()
    t3.start()
Exception in thread Thread-2:
Traceback (most recent call last):
  File "C:\Users\Lenovo\AppData\Local\Programs\Python\Python39\lib\threading.py", line 950, in _bootstrap_inner
    self.run()
  File "C:\Users\Lenovo\AppData\Local\Programs\Python\Python39\lib\threading.py", line 888, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\Lenovo\Desktop\canon\pythonProject\http\convert_html.py", line 104, in test
    raise e
  File "C:\Users\Lenovo\Desktop\canon\pythonProject\http\convert_html.py", line 102, in test
    driver = webdriver.Chrome(options=webdriver_options, service=webdriver_service)
  File "C:\Users\Lenovo\Desktop\canon\pythonProject\.venv\lib\site-packages\selenium\webdriver\chrome\webdriver.py", line 45, in __init__
    super().__init__(
  File "C:\Users\Lenovo\Desktop\canon\pythonProject\.venv\lib\site-packages\selenium\webdriver\chromium\webdriver.py", line 61, in __init__
    super().__init__(command_executor=executor, options=options)
  File "C:\Users\Lenovo\Desktop\canon\pythonProject\.venv\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 209, in __init__
    self.start_session(capabilities)
  File "C:\Users\Lenovo\Desktop\canon\pythonProject\.venv\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 293, in start_session
    response = self.execute(Command.NEW_SESSION, caps)["value"]
  File "C:\Users\Lenovo\Desktop\canon\pythonProject\.venv\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 348, in execute
    self.error_handler.check_response(response)
  File "C:\Users\Lenovo\Desktop\canon\pythonProject\.venv\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 229, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: disconnected: Unable to receive message from renderer
  (failed to check if window was closed: disconnected: not connected to DevTools)
  (Session info: chrome=120.0.6099.71)
Stacktrace:
    GetHandleVerifier [0x00007FF6341E4D02+56194]
    (No symbol) [0x00007FF6341504B2]
    (No symbol) [0x00007FF633FF76AA]
    (No symbol) [0x00007FF633FE0839]
    (No symbol) [0x00007FF633FE06EB]
    (No symbol) [0x00007FF633FDEE3D]
    (No symbol) [0x00007FF633FDF603]
    (No symbol) [0x00007FF633FDE026]
    (No symbol) [0x00007FF633FEEC0F]
    (No symbol) [0x00007FF633FE02A8]
    (No symbol) [0x00007FF633FDEE3D]
    (No symbol) [0x00007FF633FDF603]
    (No symbol) [0x00007FF633FDE026]
    (No symbol) [0x00007FF633FE8394]
    (No symbol) [0x00007FF633FE02A8]
    (No symbol) [0x00007FF633FDEE3D]
    (No symbol) [0x00007FF633FDF603]
    (No symbol) [0x00007FF633FDE026]
    (No symbol) [0x00007FF633FE55B2]
    (No symbol) [0x00007FF633FE02A8]
    (No symbol) [0x00007FF633FDEE3D]
    (No symbol) [0x00007FF633FDF603]
    (No symbol) [0x00007FF633FDE026]
    (No symbol) [0x00007FF633FD5EEA]
    (No symbol) [0x00007FF633FDD62D]
    (No symbol) [0x00007FF633FDD1DF]
    (No symbol) [0x00007FF633FF9931]
    (No symbol) [0x00007FF633FD040E]
    (No symbol) [0x00007FF633FCFCAC]
    (No symbol) [0x00007FF634070A1C]
    (No symbol) [0x00007FF634065C23]
    (No symbol) [0x00007FF634034A45]
    (No symbol) [0x00007FF634035AD4]
    GetHandleVerifier [0x00007FF63455D5BB+3695675]
    GetHandleVerifier [0x00007FF6345B6197+4059159]
    GetHandleVerifier [0x00007FF6345ADF63+4025827]
    GetHandleVerifier [0x00007FF63427F029+687785]
    (No symbol) [0x00007FF63415B508]
    (No symbol) [0x00007FF634157564]
    (No symbol) [0x00007FF6341576E9]
    (No symbol) [0x00007FF634148094]
    BaseThreadInitThunk [0x00007FF97B5E7C24+20]
    RtlUserThreadStart [0x00007FF97C3CD4D1+33]

1 answer

西门奇希
2024-01-06

I tried it out, and the conflict mainly comes from this line:

webdriver_options.add_argument("--remote-debugging-port=9225")

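All three threads share the same remote debugging port here, which is probably what goes wrong. After either removing this line or passing the port in as a parameter so that each thread uses a different port, the test runs normally.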
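The second option, giving each thread its own port, could be sketched as follows. This is only an illustration under assumptions: the helper names (`assign_ports`, `build_chrome_arguments`, `run_worker`, `launch_workers`) are hypothetical, the ports are simply counted up from the 9225 used in the question, and the `Service` path and `binary_location` from the original script are left out for brevity and would need to be set the same way as in the question.

```python
import threading

BASE_DEBUG_PORT = 9225  # the port used in the question; workers count up from it


def assign_ports(worker_count, base_port=BASE_DEBUG_PORT):
    """Return one distinct remote debugging port per worker thread."""
    return [base_port + index for index in range(worker_count)]


def build_chrome_arguments(debug_port):
    """Return the Chrome flags from the question with a per-thread debug port."""
    return [
        "--no-sandbox",
        "--disable-gpu",
        f"--remote-debugging-port={debug_port}",
        "--incognito",
    ]


def run_worker(debug_port, url="https://www.baidu.com"):
    """Start one browser on its own debug port, load `url`, then quit."""
    # Imported here so the helpers above can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for argument in build_chrome_arguments(debug_port):
        options.add_argument(argument)
    # Assumption: the driver and browser paths are configured as in the question.
    driver = webdriver.Chrome(options=options)
    try:
        print(driver.session_id)
        driver.get(url)
    finally:
        # Always release the browser, even if loading the page fails.
        driver.quit()


def launch_workers(worker_count):
    """Run `worker_count` browsers concurrently, each on a different port."""
    threads = [
        threading.Thread(target=run_worker, args=(port,))
        for port in assign_ports(worker_count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
```

Calling `launch_workers(3)` would mirror the three threads in the question. If nothing else needs to attach to the browser through the debugging port, dropping the `--remote-debugging-port` argument entirely is the simpler of the two fixes mentioned above.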
