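I am new to Selenium and am working on a project that requires scraping URLs from a page.
Source: https://www.autofurnish.com/audi-car-accessories
I want to scrape the URLs of these products. I have managed to do that, but I am stuck on the scrolling part. I need to scrape the URLs of all the products on this page. It is a huge page with a lot of results.
What I have tried:
1.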
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
I tried this code, but it just scrolls straight down to the very end, and not all of the products get loaded.
2.
data = driver.find_elements(By.XPATH, "//h2[@class='product-title']//a")
for i in data:
    driver.execute_script("arguments[0].scrollIntoView();", i)
items = []
last_height = driver.execute_script("return document.body.scrollHeight")
item_targetcount = 1000
while item_targetcount
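I tried to get help from: "How can I scroll down in Python Selenium" and "Scroll to an element step by step using WebDriver?". I also tried watching some YouTube videos, but I still could not solve this problem.
My main code for scraping the other details is: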
prod_details = []
for model in models:
    # open the brand dropdown, type the model name and confirm it
    driver.find_element(By.XPATH, "//span[@aria-labelledby='select2-brand-container']").click()
    time.sleep(2)
    driver.find_element(By.XPATH, "//input[@class='select2-search__field']").send_keys(model)
    driver.find_element(By.XPATH, "//input[@class='select2-search__field']").send_keys(Keys.ENTER)
    driver.find_element(By.XPATH, "//div[@class='btnred sbv-link sbv-inactive']").click()
    time.sleep(3)
    # collect the product links that are currently loaded
    prod = driver.find_elements(By.XPATH, "//h2[@class='product-title']//a")
    for link in prod:
        prod_details.append(link.get_attribute("href"))
    # go back to the home page before searching for the next model
    driver.get('https://www.autofurnish.com/')
    time.sleep(2)
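I am still unable to load the page completely and get all of the output.
To extract the value of the href attribute from the elements, you can use a list comprehension together with either of the following locator strategies:
Using CSS_SELECTOR: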
driver.get('https://www.autofurnish.com/audi-car-accessories#/pageSize=32&viewMode=grid&orderBy=0')
print([my_elem.get_attribute("href") for my_elem in driver.find_elements(By.CSS_SELECTOR, "h2.product-title a")])
driver.quit()
Using XPATH:
driver.get('https://www.autofurnish.com/audi-car-accessories#/pageSize=32&viewMode=grid&orderBy=0')
print([my_elem.get_attribute("href") for my_elem in driver.find_elements(By.XPATH, "//h2[@class='product-title']//a")])
driver.quit()
Console output:
['https://www.autofurnish.com/combo-of-7d-premium-car-pillow-neck-rest-hecta-6841-back-cushion-hecta-6851-each-set-of-two-beige', 'https://www.autofurnish.com/combo-of-7d-premium-car-pillow-neck-rest-hecta-6840-back-cushion-hecta-6850-each-set-of-two-black', 'https://www.autofurnish.com/combo-of-7d-premium-car-pillow-neck-rest-hecta-6843-back-cushion-hecta-6853-each-set-of-two-coffee', 'https://www.autofurnish.com/combo-of-7d-premium-car-pillow-neck-rest-hecta-6842-back-cushion-hecta-6852-each-set-of-two-tan', 'https://www.autofurnish.com/universal-2d-premium-leather-car-foot-mats-for-2-rows-beige', 'https://www.autofurnish.com/universal-2d-premium-leather-car-foot-mats-for-2-rows-black', 'https://www.autofurnish.com/universal-2d-premium-leather-car-foot-mats-for-2-rows-coffee', 'https://www.autofurnish.com/universal-2d-premium-leather-car-foot-mats-for-2-rows-tan', 'https://www.autofurnish.com/autofurnish-3d-car-auto-seat-back-multi-pocket-storage-bag-organizer-holder-hanger-accessory-beige', 'https://www.autofurnish.com/3d-car-auto-seat-back-multi-pocket-storage-bag-organizer-holder-hanger-accessory-black', 'https://www.autofurnish.com/3d-car-auto-seat-back-multi-pocket-storage-bag-organizer-holder-hanger-accessory-coffee', 'https://www.autofurnish.com/3d-car-auto-seat-back-multi-pocket-storage-bag-organizer-holder-hanger-accessory-set-of-two-beige', 'https://www.autofurnish.com/3d-car-auto-seat-back-multi-pocket-storage-bag-organizer-holder-hanger-accessory-set-of-two-black', 'https://www.autofurnish.com/3d-car-auto-seat-back-multi-pocket-storage-bag-organizer-holder-hanger-accessory-set-of-two-brown', 'https://www.autofurnish.com/3d-car-auto-seat-back-multi-pocket-storage-bag-organizer-holder-hanger-accessory-set-of-two', 'https://www.autofurnish.com/3d-car-auto-seat-back-multi-pocket-storage-bag-organizer-holder-hanger-accessory-tan', 'https://www.autofurnish.com/3d-car-auto-seat-back-multi-pocket-storage-bag-organizer-with-car-meal-tray-beige', 
'https://www.autofurnish.com/3d-car-auto-seat-back-multi-pocket-storage-bag-organizer-with-car-meal-tray-black', 'https://www.autofurnish.com/3d-car-auto-seat-back-multi-pocket-storage-bag-organizer-with-car-meal-tray-coffee', 'https://www.autofurnish.com/3d-car-auto-seat-back-multi-pocket-storage-bag-organizer-with-car-meal-tray-set-of-2-beige', 'https://www.autofurnish.com/3d-car-auto-seat-back-multi-pocket-storage-bag-organizer-with-car-meal-tray-set-of-2-black', 'https://www.autofurnish.com/3d-car-auto-seat-back-multi-pocket-storage-bag-organizer-with-car-meal-tray-set-of-2-coffee', 'https://www.autofurnish.com/3d-car-auto-seat-back-multi-pocket-storage-bag-organizer-with-car-meal-tray-set-of-2-tan', 'https://www.autofurnish.com/3d-car-auto-seat-back-multi-pocket-storage-bag-organizer-with-car-meal-tray-tan', 'https://www.autofurnish.com/5d-premium-custom-fitted-car-mats-for-audi-a4-2021-beige', 'https://www.autofurnish.com/5d-premium-custom-fitted-car-mats-for-audi-a4-2021-black', 'https://www.autofurnish.com/5d-premium-custom-fitted-car-mats-for-audi-a4-2021-coffee', 'https://www.autofurnish.com/5d-premium-custom-fitted-car-mats-for-audi-a4-2021-tan', 'https://www.autofurnish.com/5d-premium-custom-fitted-car-mats-for-audi-a6-2020-beige', 'https://www.autofurnish.com/5d-premium-custom-fitted-car-mats-for-audi-a6-2020-black', 'https://www.autofurnish.com/5d-premium-custom-fitted-car-mats-for-audi-a6-2020-coffee', 'https://www.autofurnish.com/5d-premium-custom-fitted-car-mats-for-audi-a6-2020-tan']
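This was a pretty tricky one... I ran into several unexpected problems while trying to get it to work.
The main issue was waiting for the loading spinner and keeping it on the screen. I originally tried scrolling to the bottom of the page like you did, which sends the page into an infinite loop of loading new product sections, because the footer is so large that the loading spinner ends up above the visible part of the page (at least for me). I fixed this by scrolling to the last visible product instead, which is low enough to trigger the next section to load but not so low that the page goes into infinite-loading mode.
In most cases, when a loading spinner is involved, you want to wait for it to become visible and then invisible. This prevents bad timing situations and is the most reliable way to wait for the new products to load.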
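The "visible, then invisible" wait can be sketched without any browser dependency, which makes the two phases easier to see. This is a minimal illustration, not the Selenium API: `wait_for_spinner_cycle` and its `is_visible` callable are hypothetical names introduced here, and a spinner that flashes faster than the polling interval could be missed.

```python
import time
from typing import Callable


def wait_for_spinner_cycle(
    is_visible: Callable[[], bool],
    timeout: float = 60.0,
    poll: float = 0.25,
) -> bool:
    """Wait for a spinner to appear and then disappear.

    Returns True if the full cycle was observed, False if the spinner
    never appeared before the timeout. Raises TimeoutError if it
    appeared but was still visible when the timeout expired.
    """
    deadline = time.monotonic() + timeout
    # Phase 1: wait for the spinner to become visible.
    while not is_visible():
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
    # Phase 2: wait for the spinner to become invisible again.
    while is_visible():
        if time.monotonic() >= deadline:
            raise TimeoutError("spinner still visible after timeout")
        time.sleep(poll)
    return True
```

With Selenium, `is_visible` could be something like `lambda: any(e.is_displayed() for e in driver.find_elements(By.CSS_SELECTOR, "div.infinite-scroll-loader"))`. Returning False instead of raising when the spinner never shows up is a design choice for this sketch, so the caller can treat "nothing more to load" as a normal outcome.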
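The basic flow is: find the products currently on the page, scroll to the last one, wait for the loading spinner to appear and then disappear, and repeat until the product count stops growing.
The code: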
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
...
# may need to adjust the timeout based on your experience... the site is really slow for me
wait = WebDriverWait(driver, 60)
new_count = 0
old_count = 0
while True:
    old_count = new_count
    products = wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "h2.product-title > a")))
    new_count = len(products)
    # scroll down to last product to trigger the loading spinner
    driver.execute_script("arguments[0].scrollIntoView();", products[-1])
    # wait for loading spinner to appear and then disappear
    wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.infinite-scroll-loader")))
    wait.until(EC.invisibility_of_element_located((By.CSS_SELECTOR, "div.infinite-scroll-loader")))
    # if the count didn't change, we've loaded all products on the page
    # I put a max of 50 products to load as a demo. You can adjust higher as needed but you should put something reasonably sized here to prevent the script from running for an hour
    if new_count == old_count or new_count > 50:
        break
# print results
print(len(products))
for product in products:
    print(product.get_attribute("href"))
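The stopping rule in the loop above (stop when the count no longer grows, or when it passes a cap) can be pulled out into a small helper that does not depend on Selenium, so it can be tried against a fake page first. This is only a sketch: `load_until_stable`, `get_items` and `load_more` are hypothetical names, and in a real run `load_more` would be the "scroll to the last product and wait for the spinner" step.

```python
from typing import Callable, List


def load_until_stable(
    get_items: Callable[[], List[str]],
    load_more: Callable[[], None],
    max_items: int = 50,
) -> List[str]:
    """Call load_more() until the item count stops growing or exceeds max_items."""
    old_count = -1
    items = get_items()
    while len(items) != old_count and len(items) <= max_items:
        old_count = len(items)
        load_more()          # trigger the next batch (assumed to block until loaded)
        items = get_items()  # re-read the list after loading
    return items
```

Because the cap is checked after each batch, the result can overshoot `max_items` by up to one batch, which matches the behavior of the `new_count > 50` check in the loop above.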