Taking a full-page screenshot with Selenium and headless Chrome

东郭赞
2023-12-01
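#!/usr/bin/python
import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

url = "https://www.sina.com.cn"
chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.add_argument("--disable-gpu")

# Selenium 4 style: the driver path goes through Service and the options
# through `options=`; the older `executable_path=` and `chrome_options=`
# keyword arguments are no longer accepted by recent releases.
browser = webdriver.Chrome(service=Service("./chromedriver"), options=chrome_options)
browser.set_page_load_timeout(300)
browser.set_script_timeout(300)
browser.get(url)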
# Get the actual page dimensions using JavaScript.
# Browsers report the size in several places, so take the largest value.
width = browser.execute_script(
    "return Math.max("
    "document.body.scrollWidth, "
    "document.body.offsetWidth, "
    "document.documentElement.clientWidth, "
    "document.documentElement.scrollWidth, "
    "document.documentElement.offsetWidth);"
)

height = browser.execute_script(
    "return Math.max("
    "document.body.scrollHeight, "
    "document.body.offsetHeight, "
    "document.documentElement.clientHeight, "
    "document.documentElement.scrollHeight, "
    "document.documentElement.offsetHeight);"
)

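# Resize the window to the full page size so the screenshot covers the whole page
browser.set_window_size(width, height)
# Give the page a moment to re-render at the new size
time.sleep(3)

browser.get_screenshot_as_file("./sina.png")
browser.quit()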
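
The measure-then-resize step in the script above can be factored into a small reusable helper. The following is only a sketch: the names `page_size_script` and `resize_to_full_page` are hypothetical and not part of Selenium, and the helper assumes it is handed an object that provides `execute_script()` and `set_window_size()`, as a Selenium WebDriver does.

```python
# Sketch only: page_size_script and resize_to_full_page are hypothetical
# helper names, not Selenium APIs.

# The DOM properties that may report the page size; {0} is "Width" or "Height".
DIMENSION_SOURCES = (
    "document.body.scroll{0}",
    "document.body.offset{0}",
    "document.documentElement.client{0}",
    "document.documentElement.scroll{0}",
    "document.documentElement.offset{0}",
)


def page_size_script(dimension):
    """Build the JavaScript that returns the largest reported width or height."""
    if dimension not in ("Width", "Height"):
        raise ValueError("dimension must be 'Width' or 'Height'")
    candidates = ", ".join(src.format(dimension) for src in DIMENSION_SOURCES)
    return "return Math.max({});".format(candidates)


def resize_to_full_page(driver):
    """Resize the window to the full page size and return (width, height).

    `driver` is assumed to expose execute_script() and set_window_size(),
    as a Selenium WebDriver does.
    """
    width = driver.execute_script(page_size_script("Width"))
    height = driver.execute_script(page_size_script("Height"))
    driver.set_window_size(width, height)
    return width, height
```

With a real driver, calling `resize_to_full_page(browser)` right after `browser.get(url)` would stand in for the two `execute_script` calls and the `set_window_size` call; the short wait and the screenshot call stay the same.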

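Note: if all you need is a screenshot, recent versions of PhantomJS can also take one, and they capture the whole page by default. However, PhantomJS has problems of its own: its browser engine is outdated, its processes are not cleaned up properly on exit, it is easily caught by anti-scraping measures, and it is no longer maintained. For these reasons, headless Chrome is now the preferred choice.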

References:
1. http://yizeng.me/2014/02/23/how-to-get-window-size-resize-or-maximize-window-using-selenium-webdriver/#heading-python
2. https://gist.github.com/elcamino/5f562564ecd2fb86f559
3. https://developers.google.com/web/updates/2017/04/headless-chrome#drivers
