Chrome crashes after several hours while multiprocessing with Selenium through Python

华俊贤

2023-03-14

Question:

This is the error traceback after several hours of scraping:

The process started from chrome location /usr/bin/google-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.

Here is my Selenium Python setup:

#scrape.py
from selenium import webdriver
from selenium.common.exceptions import *
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.chrome.options import Options

def run_scrape(link):
    chrome_options = Options()
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument("--headless")
    chrome_options.add_argument('--disable-dev-shm-usage')
    chrome_options.add_argument("--lang=en")
    chrome_options.add_argument("--start-maximized")
    chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"])
    chrome_options.add_experimental_option('useAutomationExtension', False)
    chrome_options.add_argument("user-agent=Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36")
    chrome_options.binary_location = "/usr/bin/google-chrome"
    browser = webdriver.Chrome(executable_path=r'/usr/local/bin/chromedriver', options=chrome_options)
    browser.get(link)  # link passed here
    try:
        pass  # scrape process
    except:
        pass  # other stuffs
    browser.quit()



#multiprocess.py
import time
from multiprocessing import Pool
from scrape import *

if __name__ == '__main__':
    start_time = time.time()
    links = []  # list of links to be scraped
    pool = Pool(20)
    results = pool.map(run_scrape, links)
    pool.close()
    print("Total Time Processed: "+"--- %s seconds ---" % (time.time() - start_time))

Chrome, ChromeDriver, and Selenium versions:

ChromeDriver 79.0.3945.36 (3582db32b33893869b8c1339e8f4d9ed1816f143-refs/branch-heads/3945@{#614})
Google Chrome 79.0.3945.79
Selenium Version: 4.0.0a3

I would like to know why Chrome is closing while the other processes keep running.

Answer:

I took your code, modified it slightly to suit my test environment, and here is the execution result:

Code blocks:
- multiprocess.py:
```
import time
from multiprocessing import Pool
from multiprocessingPool.scrape import run_scrape

if __name__ == '__main__':
    start_time = time.time()
    links = ["https://selenium.dev/downloads/", "https://selenium.dev/documentation/en/"]
    pool = Pool(2)
    results = pool.map(run_scrape, links)
    pool.close()
    print("Total Time Processed: "+"--- %s seconds ---" % (time.time() - start_time))
```
- scrape.py:
```
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException, TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options

def run_scrape(link):
    chrome_options = Options()
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument("--headless")
    chrome_options.add_argument('--disable-dev-shm-usage')
    chrome_options.add_argument("--lang=en")
    chrome_options.add_argument("--start-maximized")
    chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"])
    chrome_options.add_experimental_option('useAutomationExtension', False)
    chrome_options.add_argument("user-agent=Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36")
    chrome_options.binary_location = r'C:\Program Files (x86)\Google\Chrome\Application\chrome.exe'
    browser = webdriver.Chrome(executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe', options=chrome_options)
    browser.get(link)
    try:
        print(browser.title)
    except (NoSuchElementException, TimeoutException):
        print("Error")
    browser.quit()
```
Console output:
```
Downloads
The Selenium Browser Automation Project :: Documentation for Selenium
Total Time Processed: --- 10.248600006103516 seconds ---
```

Conclusion

It is evident that your program is logically sound.

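This use case

As you mentioned, this error surfaces after several hours of scraping. I suspect this is because WebDriver is not thread-safe. That said, if you can serialize access to the underlying driver instance, you can share a single reference across more than one thread. This is not advisable. You can, however, always instantiate one WebDriver instance per thread.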

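Ideally, the thread-safety issue lies not in your code but in the actual browser bindings. They all assume that only one command arrives at a time (as it would from a real user). On the other hand, you can always instantiate one WebDriver instance per thread, which launches multiple browser tabs/windows. Up to this point, your program looks fine.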
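The "one WebDriver instance per thread" idea can be sketched with `threading.local`. This is only a minimal sketch, not the answerer's code: `FakeDriver`, `get_driver`, `fetch_title`, and `run_all` are hypothetical names, and `FakeDriver` is a stand-in for `webdriver.Chrome` so that the example runs without a browser. Note that the question's own code uses processes rather than threads, and already creates a separate browser in every `run_scrape` call.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_local = threading.local()  # each thread sees its own attributes on this object


class FakeDriver:
    """Hypothetical stand-in for webdriver.Chrome so the sketch runs without a browser."""

    def __init__(self):
        self.title = ""

    def get(self, url):
        self.title = "title of " + url

    def quit(self):
        pass


def get_driver(make_driver=FakeDriver):
    """Return the calling thread's driver, creating it on first use."""
    driver = getattr(_local, "driver", None)
    if driver is None:
        driver = make_driver()
        _local.driver = driver
    return driver


def fetch_title(url):
    driver = get_driver()  # never shared with another thread
    driver.get(url)
    return driver.title


def run_all(urls, workers=2):
    # Each worker thread lazily creates, then reuses, its own driver.
    with ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(fetch_title, urls))
```

With a real browser, the factory would build a Chrome driver instead of `FakeDriver`, and each thread would still need to call `quit()` on its own driver when it finishes, otherwise browser processes are left behind.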

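Now, different threads can run against the same WebDriver, but the results will not be what you expect. The reason is that when you use multithreading to run different tests on different tabs/windows, a bit of thread-safe coding is required. Otherwise, actions such as click() or send_keys() go to whichever open tab/window currently has focus, regardless of the thread you expected to be running. In essence, all the tests run simultaneously on the same focused tab/window rather than on the intended tab/window.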
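A hedged sketch of the "thread-safe coding" described above: a lock makes "switch to my window, then act" a single atomic step, so another thread cannot change the focused window in between. `SerializedDriver`, `run_in_window`, `FakeDriver`, and `demo` are hypothetical names used for illustration. The fake driver mimics only the two Selenium members used here, `switch_to.window(handle)` and `current_window_handle`.

```python
import threading


class _FakeSwitchTo:
    def __init__(self, driver):
        self._driver = driver

    def window(self, handle):
        self._driver.current_window_handle = handle


class FakeDriver:
    """Hypothetical stand-in for a shared WebDriver with several tabs."""

    def __init__(self):
        self.current_window_handle = None
        self.switch_to = _FakeSwitchTo(self)


class SerializedDriver:
    """Wraps one shared driver so only one thread talks to it at a time."""

    def __init__(self, driver):
        self._driver = driver
        self._lock = threading.Lock()

    def run_in_window(self, handle, action):
        # Switching focus and acting happen under the same lock.
        with self._lock:
            self._driver.switch_to.window(handle)
            return action(self._driver)


def demo(n_threads=8, rounds=50):
    shared = SerializedDriver(FakeDriver())
    mismatches = []

    def worker(handle):
        for _ in range(rounds):
            seen = shared.run_in_window(handle, lambda d: d.current_window_handle)
            if seen != handle:
                mismatches.append((handle, seen))

    threads = [threading.Thread(target=worker, args=("tab-%d" % i,))
               for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return mismatches  # stays empty because every action ran in its own window
```

Serializing access this way removes the benefit of running in parallel, which is one reason the answer recommends a separate driver per thread instead.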
