
Using Selenium + Scrapy

那谦

2023-03-14

Question:

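I'm trying to combine Scrapy with Selenium. I want to interact with JavaScript and still have the powerful scraping framework that Scrapy provides. I wrote a script that visits http://www.iens.nl, enters "Amsterdam" in the search bar, and clicks the search button; the click works. After the search button is clicked, I want Scrapy to retrieve elements from the newly rendered page. Unfortunately, Scrapy doesn't return any values.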

Here is what my code looks like:

from selenium import webdriver
from scrapy.loader import ItemLoader
from scrapy import Request
from scrapy.crawler import CrawlerProcess
from properties import PropertiesItem
import scrapy


class BasicSpider(scrapy.Spider):
    name = "basic"
    allowed_domains = ["web"]
    # Start on a property page
    start_urls = ['http://www.iens.nl']

    def __init__(self):
        chrome_path = '/Users/username/Documents/chromedriver'
        self.driver = webdriver.Chrome(chrome_path)

    def parse(self, response):
        self.driver.get(response.url)
        text_box = self.driver.find_element_by_xpath('//*[@id="searchText"]')
        submit_button = self.driver.find_element_by_xpath('//*[@id="button_search"]')
        text_box.send_keys("Amsterdam")
        submit_button.click()

        l = ItemLoader(item=PropertiesItem(), response=response)
        l.add_xpath('description', '//*[@id="results"]/ul/li[1]/div[2]/h3/a/')

        return l.load_item()


process = CrawlerProcess()
process.crawl(BasicSpider)
process.start()

properties is a separate script that looks like this:

from scrapy.item import Item, Field

class PropertiesItem(Item):
    # Primary fields
    description = Field()

Q: How can I get the XPath to find the element I called "description" on the page Selenium has navigated to, and return it as output?

Thanks in advance!

Answer:

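The response object you pass to your ItemLoader is Scrapy's response for the start URL (http://www.iens.nl). It is not the page Selenium ends up on after the search.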

I suggest creating a new Selector from the page source that Selenium returns:

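from scrapy import Selector
...

# Read the HTML that Selenium has rendered after the search button was clicked
selenium_response_text = self.driver.page_source

# Wrap that HTML in a Selector and give it to the ItemLoader instead of the Scrapy response
new_selector = Selector(text=selenium_response_text)
l = ItemLoader(item=PropertiesItem(), selector=new_selector)
...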

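This way, add_xpath extracts the information from the HTML that Selenium rendered. It no longer reads from the Scrapy response, which you don't actually need here.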

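Similar material:

09. Using Selenium

9.1 Crawling dynamically rendered pages. When the data comes back directly in the response (that is, it is visible in the response content), we crawl it with urllib, requests, or the Scrapy framework. For typical pages rendered dynamically by JavaScript (loaded via Ajax), we can scrape the information by analyzing the Ajax requests. Even when the data is fetched via Ajax, some parameters may be encrypted and the content may be generated later by JavaScript. That makes the pattern hard to find directly, as on Taobao pages. To solve…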
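Using Selenium and WebDriver

Quoted from ChromeDriver - WebDriver for Chrome: WebDriver is an open-source tool for automated testing that supports multiple browsers. It provides capabilities such as navigating web pages, user input, and JavaScript execution. ChromeDriver is a standalone server that implements the WebDriver wire protocol for Chromium. It is developed by the same teams that develop Chromium and WebDriver. To be able to make chrom…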
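Using Selenium and WebDriver

Quoted from ChromeDriver - WebDriver for Chrome: WebDriver is an open-source tool for automated testing that supports multiple browsers. It provides capabilities such as navigating web pages, user input, and JavaScript execution. ChromeDriver is a standalone server that implements the WebDriver wire protocol for Chromium. It is developed by the same teams that develop Chromium and WebDriver. With Spectron…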
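Using Selenium and WebDriver

Quoted from ChromeDriver - WebDriver for Chrome: WebDriver is an open-source tool for automated testing that supports multiple browsers. It provides capabilities such as navigating web pages, user input, and JavaScript execution. ChromeDriver is a standalone server that implements the WebDriver wire protocol for Chromium. It is developed by the same teams that develop Chromium and WebDriver. Using S…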
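An introduction to using Selenium

Selenium is a powerful integration-testing tool from ThoughtWorks. I recently worked on a system migration project that happened to use this tool, so I'd like to share some of my experience with it. I hope more people will use powerful, free tools like this to ensure quality. There is a fair amount of existing Selenium documentation, but all of it is too basic. When using Selenium, I mostly read the API documentation directly. Fortunately the API is good: by going through it one item at a time, you can find what you need…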
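12. Using Selenium with the Scrapy framework

Goal: this section's example uses Selenium through the Scrapy framework, demonstrated with PhantomJS, to crawl Taobao product information and store it in a MongoDB database. Preparation: make sure PhantomJS and MongoDB are both installed and run properly, and that the Scrapy, Selenium, and PyMongo libraries are installed. ① Create the project: first create a new project named scrapyseleniumtest: scrapy startp…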

Related reading

Keeping the Selenium browser open with phpunit / selenium
Using if / else in Selenium
Selecting an iframe with Python + Selenium
Using a paste command with Selenium
Using Selenium RemoteWebDriver in C#

Related articles

Introduction to Selenium
Selenium tutorial
Creating test cases manually in Selenium IDE
Installing Selenium WebDriver
Selenium WebDriver architecture

Related Q&A

Using Selenium with Chrome on a Mac
Running Selenium with crontab (Python)
Using XPath in Selenium Python
Accessing a div with Selenium IDE
Selenium uses too much memory

Related tools

Selenium
Selenium Grid
Selenium Java Evidence
selenium-simple-test
Selenium-python-helium

Related documentation

Selenium documentation (Chinese)
Selenium WebDriver quick tutorial
Selenium IDE help documentation v3.9
Selenium IDE help documentation v2.9