Question:

A multithreaded Python Selenium scraper occasionally fails when run concurrently. What is the cause?

仲孙飞文

2024-01-06

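Our company needs to save dynamic HTML pages as PDF files, and I plan to implement this with Python + Selenium: call chromedriver's Page.printToPDF command, fetch the print response once the page has finished loading, and finally convert it to a PDF and save it. With concurrency in mind, I simulated the load with multiple threads and found that an error shows up occasionally, while a single run works fine. I don't know what the cause is.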
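For readers unfamiliar with the print step described above, here is a minimal sketch of it. It assumes a Chromium-based Selenium 4 driver (which exposes `execute_cdp_cmd`) and that the CDP result carries the PDF as base64 text in its `data` field; the helper names `decode_pdf` and `print_page_to_pdf` are made up for illustration and are not from the original post.

```python
import base64
from pathlib import Path


def decode_pdf(cdp_result: dict) -> bytes:
    """Decode the base64 'data' field of a Page.printToPDF result into raw bytes."""
    return base64.b64decode(cdp_result["data"])


def print_page_to_pdf(driver, target: str) -> int:
    """Print the page currently loaded in `driver` and write it to `target`.

    `driver` is assumed to be a Chromium-based Selenium 4 WebDriver.
    Returns the number of bytes written.
    """
    # printBackground is an optional Page.printToPDF parameter.
    result = driver.execute_cdp_cmd("Page.printToPDF", {"printBackground": True})
    pdf_bytes = decode_pdf(result)
    Path(target).write_bytes(pdf_bytes)
    return len(pdf_bytes)
```

Call `print_page_to_pdf(driver, "page.pdf")` after `driver.get(...)` has returned and any dynamic content you care about has rendered.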

python 3.9.0
selenium 4.16.0

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support.expected_conditions import staleness_of
from urllib.parse import urlparse
import json
import base64
import threading
import time

def test():
    webdriver_service = Service(r"D:\work\chromedriver-win64\chromedriver.exe")
    webdriver_options = Options()
    webdriver_options.binary_location = r"D:\work\chrome-win64\chrome.exe"
    webdriver_options.add_argument('--no-sandbox')
    #webdriver_options.add_argument('--headless')
    webdriver_options.add_argument('--disable-gpu')
    webdriver_options.add_argument("--remote-debugging-port=9225")
    webdriver_options.add_argument("--incognito")
    #webdriver_options.page_load_strategy = 'eager'
    #webdriver_options.add_argument('--disable-dev-shm-usage')
    webdriver_prefs = {}
    webdriver_options.experimental_options['prefs'] = webdriver_prefs
    webdriver_prefs['profile.default_content_settings'] = {'images': 2}
    driver = webdriver.Chrome(options=webdriver_options, service=webdriver_service)
    print(driver.session_id)
    driver.get("https://www.baidu.com")
    driver.quit()

if __name__ == '__main__':
    t1 = threading.Thread(target=test)
    t2 = threading.Thread(target=test)
    t3 = threading.Thread(target=test)
    t1.start()
    t2.start()
    t3.start()

Exception in thread Thread-2:
Traceback (most recent call last):
  File "C:\Users\Lenovo\AppData\Local\Programs\Python\Python39\lib\threading.py", line 950, in _bootstrap_inner
    self.run()
  File "C:\Users\Lenovo\AppData\Local\Programs\Python\Python39\lib\threading.py", line 888, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\Lenovo\Desktop\canon\pythonProject\http\convert_html.py", line 104, in test
    raise e
  File "C:\Users\Lenovo\Desktop\canon\pythonProject\http\convert_html.py", line 102, in test
    driver = webdriver.Chrome(options=webdriver_options, service=webdriver_service)
  File "C:\Users\Lenovo\Desktop\canon\pythonProject\.venv\lib\site-packages\selenium\webdriver\chrome\webdriver.py", line 45, in __init__
    super().__init__(
  File "C:\Users\Lenovo\Desktop\canon\pythonProject\.venv\lib\site-packages\selenium\webdriver\chromium\webdriver.py", line 61, in __init__
    super().__init__(command_executor=executor, options=options)
  File "C:\Users\Lenovo\Desktop\canon\pythonProject\.venv\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 209, in __init__
    self.start_session(capabilities)
  File "C:\Users\Lenovo\Desktop\canon\pythonProject\.venv\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 293, in start_session
    response = self.execute(Command.NEW_SESSION, caps)["value"]
  File "C:\Users\Lenovo\Desktop\canon\pythonProject\.venv\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 348, in execute
    self.error_handler.check_response(response)
  File "C:\Users\Lenovo\Desktop\canon\pythonProject\.venv\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 229, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: disconnected: Unable to receive message from renderer
  (failed to check if window was closed: disconnected: not connected to DevTools)
  (Session info: chrome=120.0.6099.71)
Stacktrace:
    GetHandleVerifier [0x00007FF6341E4D02+56194]
    (No symbol) [0x00007FF6341504B2]
    (No symbol) [0x00007FF633FF76AA]
    (No symbol) [0x00007FF633FE0839]
    (No symbol) [0x00007FF633FE06EB]
    (No symbol) [0x00007FF633FDEE3D]
    (No symbol) [0x00007FF633FDF603]
    (No symbol) [0x00007FF633FDE026]
    (No symbol) [0x00007FF633FEEC0F]
    (No symbol) [0x00007FF633FE02A8]
    (No symbol) [0x00007FF633FDEE3D]
    (No symbol) [0x00007FF633FDF603]
    (No symbol) [0x00007FF633FDE026]
    (No symbol) [0x00007FF633FE8394]
    (No symbol) [0x00007FF633FE02A8]
    (No symbol) [0x00007FF633FDEE3D]
    (No symbol) [0x00007FF633FDF603]
    (No symbol) [0x00007FF633FDE026]
    (No symbol) [0x00007FF633FE55B2]
    (No symbol) [0x00007FF633FE02A8]
    (No symbol) [0x00007FF633FDEE3D]
    (No symbol) [0x00007FF633FDF603]
    (No symbol) [0x00007FF633FDE026]
    (No symbol) [0x00007FF633FD5EEA]
    (No symbol) [0x00007FF633FDD62D]
    (No symbol) [0x00007FF633FDD1DF]
    (No symbol) [0x00007FF633FF9931]
    (No symbol) [0x00007FF633FD040E]
    (No symbol) [0x00007FF633FCFCAC]
    (No symbol) [0x00007FF634070A1C]
    (No symbol) [0x00007FF634065C23]
    (No symbol) [0x00007FF634034A45]
    (No symbol) [0x00007FF634035AD4]
    GetHandleVerifier [0x00007FF63455D5BB+3695675]
    GetHandleVerifier [0x00007FF6345B6197+4059159]
    GetHandleVerifier [0x00007FF6345ADF63+4025827]
    GetHandleVerifier [0x00007FF63427F029+687785]
    (No symbol) [0x00007FF63415B508]
    (No symbol) [0x00007FF634157564]
    (No symbol) [0x00007FF6341576E9]
    (No symbol) [0x00007FF634148094]
    BaseThreadInitThunk [0x00007FF97B5E7C24+20]
    RtlUserThreadStart [0x00007FF97C3CD4D1+33]

1 answer

西门奇希

2024-01-06

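I tried it, and the conflict mainly comes from this line:

webdriver_options.add_argument("--remote-debugging-port=9225")

Every thread launches its own Chrome instance, but they all ask for the same remote debugging port, which is probably what goes wrong. Either remove this line, or pass the port in as a parameter so that each thread uses a different one. With either change the test runs normally.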
