Question:

Why does Scrapy return duplicate results?

章学义
2023-03-14

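I'm trying out Scrapy and have run into a problem: my script returns duplicate results. I'm scraping URLs from a parent page and then following each individual URL to get its associated date. After each nested URL is scraped, the output seems to repeat the whole list of URLs from the parent page.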

Here is the script:


    import scrapy
    from aeon.items import AeonItem
    from scrapy.http.request import Request

    class AeonSpider(scrapy.Spider):
        name = "aeon"
        allowed_domains = ["aeon.co"]
        start_urls = [
            "http://aeon.co/magazine/technology"
        ]

        def parse(self, response):
            items = []
            for sel in response.xpath('//*[@id="latestPosts"]'):
                item = AeonItem()
                item['primary_url'] = sel.xpath('div/div/div/a/@href').extract()    

                for each in item['primary_url']:
                    yield Request(each, callback=self.parse_next_page,meta={'item':item})

        def parse_next_page(self, response):
            for sel in response.xpath('//*[@id="top"]'):
                item = response.meta['item']
                item['comments'] =  sel.xpath('div[5]/div[3]/div[2]/div/p/em/span[@class="instapaper_datepublished"]/text()').extract()
                return item

Here is the JSON output:


    {"comments": ["13 February 2014"], "primary_url": ["http://aeon.co/magazine/science/the-search-for-quantum-gravity/", "http://aeon.co/magazine/philosophy/should-generation-ted-take-a-more-sceptical-view/", "http://aeon.co/magazine/technology/the-elon-musk-interview-on-mars/", "http://aeon.co/video/technology/analogue-people-in-a-digital-age-a-short-film-about-technology/", "http://aeon.co/video/technology/boxa-short-film-about-projection-mapping/", "http://aeon.co/video/technology/how-to-sharpen-pencils-a-short-film-about-a-master-artisan/", "http://aeon.co/magazine/technology/do-we-want-minority-report-policing/", "http://aeon.co/magazine/health/can-you-have-self-worth-without-self-love/", "http://aeon.co/magazine/technology/i-learnt-to-survive-like-an-11th-century-farmer/", "http://aeon.co/magazine/technology/can-tiny-plankton-help-reverse-climate-change/", "http://aeon.co/magazine/technology/are-halophytes-the-crop-of-the-future/", "http://aeon.co/magazine/technology/how-will-sexbots-change-human-relationships/", "http://aeon.co/video/technology/robotic-cheetah-a-short-film-about-biomimetic-robotics/", "http://aeon.co/video/technology/internet-archive-a-short-film-about-accessing-knowledge/", "http://aeon.co/video/technology/a-tiny-planet-a-short-film-about-wondrous-video-technology/", "http://aeon.co/magazine/culture/there-is-fortuitous-beauty-in-a-brute-force-attack/", "http://aeon.co/magazine/technology/can-we-design-systems-to-automate-ethics/", "http://aeon.co/magazine/technology/before-minecraft-or-snapchat-there-was-micromuse/", "http://aeon.co/magazine/technology/meet-darpas-new-generation-of-humanoid-robots/", "http://aeon.co/magazine/technology/the-problem-with-too-much-information/", "http://aeon.co/magazine/technology/can-nyc-be-completely-self-reliant/", "http://aeon.co/video/technology/theo-a-short-film-about-the-wind-eating-strandbeest/", "http://aeon.co/video/technology/terminal-a-short-film-about-the-mechanical-ballet-of-cargo/", 
"http://aeon.co/video/technology/metropolis-ii-a-short-film-about-the-city-of-tomorrow/", "http://aeon.co/magazine/culture/digital-art-should-be-about-possibilities-not-technicalities/", "http://aeon.co/magazine/society/can-sustainability-really-hope-to-beat-consumerism/", "http://aeon.co/magazine/technology/is-technology-making-the-world-too-complex/", "http://aeon.co/magazine/culture/creepypasta-is-how-the-internet-learns-our-fears/", "http://aeon.co/magazine/technology/virtual-afterlives-will-transform-humanity/", "http://aeon.co/magazine/technology/what-will-happen-to-my-online-identity-when-i-die/", "http://aeon.co/magazine/society/what-does-silicon-valley-tell-us-about-innovation/", "http://aeon.co/magazine/technology/the-rise-of-biotechnology-and-the-loss-of-scientific-neutrality/", "http://aeon.co/magazine/culture/why-i-gave-up-living-in-an-off-grid-commune/"]}
    {"comments": ["31 January 2014"], "primary_url": ["http://aeon.co/magazine/science/the-search-for-quantum-gravity/", "http://aeon.co/magazine/philosophy/should-generation-ted-take-a-more-sceptical-view/", "http://aeon.co/magazine/technology/the-elon-musk-interview-on-mars/", "http://aeon.co/video/technology/analogue-people-in-a-digital-age-a-short-film-about-technology/", "http://aeon.co/video/technology/boxa-short-film-about-projection-mapping/", "http://aeon.co/video/technology/how-to-sharpen-pencils-a-short-film-about-a-master-artisan/", "http://aeon.co/magazine/technology/do-we-want-minority-report-policing/", "http://aeon.co/magazine/health/can-you-have-self-worth-without-self-love/", "http://aeon.co/magazine/technology/i-learnt-to-survive-like-an-11th-century-farmer/", "http://aeon.co/magazine/technology/can-tiny-plankton-help-reverse-climate-change/", "http://aeon.co/magazine/technology/are-halophytes-the-crop-of-the-future/", "http://aeon.co/magazine/technology/how-will-sexbots-change-human-relationships/", "http://aeon.co/video/technology/robotic-cheetah-a-short-film-about-biomimetic-robotics/", "http://aeon.co/video/technology/internet-archive-a-short-film-about-accessing-knowledge/", "http://aeon.co/video/technology/a-tiny-planet-a-short-film-about-wondrous-video-technology/", "http://aeon.co/magazine/culture/there-is-fortuitous-beauty-in-a-brute-force-attack/", "http://aeon.co/magazine/technology/can-we-design-systems-to-automate-ethics/", "http://aeon.co/magazine/technology/before-minecraft-or-snapchat-there-was-micromuse/", "http://aeon.co/magazine/technology/meet-darpas-new-generation-of-humanoid-robots/", "http://aeon.co/magazine/technology/the-problem-with-too-much-information/", "http://aeon.co/magazine/technology/can-nyc-be-completely-self-reliant/", "http://aeon.co/video/technology/theo-a-short-film-about-the-wind-eating-strandbeest/", "http://aeon.co/video/technology/terminal-a-short-film-about-the-mechanical-ballet-of-cargo/", 
"http://aeon.co/video/technology/metropolis-ii-a-short-film-about-the-city-of-tomorrow/", "http://aeon.co/magazine/culture/digital-art-should-be-about-possibilities-not-technicalities/", "http://aeon.co/magazine/society/can-sustainability-really-hope-to-beat-consumerism/", "http://aeon.co/magazine/technology/is-technology-making-the-world-too-complex/", "http://aeon.co/magazine/culture/creepypasta-is-how-the-internet-learns-our-fears/", "http://aeon.co/magazine/technology/virtual-afterlives-will-transform-humanity/", "http://aeon.co/magazine/technology/what-will-happen-to-my-online-identity-when-i-die/", "http://aeon.co/magazine/society/what-does-silicon-valley-tell-us-about-innovation/", "http://aeon.co/magazine/technology/the-rise-of-biotechnology-and-the-loss-of-scientific-neutrality/", "http://aeon.co/magazine/culture/why-i-gave-up-living-in-an-off-grid-commune/"]}
    {"comments": ["12 March 2014"], "primary_url": ["http://aeon.co/magazine/science/the-search-for-quantum-gravity/", "http://aeon.co/magazine/philosophy/should-generation-ted-take-a-more-sceptical-view/", "http://aeon.co/magazine/technology/the-elon-musk-interview-on-mars/", "http://aeon.co/video/technology/analogue-people-in-a-digital-age-a-short-film-about-technology/", "http://aeon.co/video/technology/boxa-short-film-about-projection-mapping/", "http://aeon.co/video/technology/how-to-sharpen-pencils-a-short-film-about-a-master-artisan/", "http://aeon.co/magazine/technology/do-we-want-minority-report-policing/", "http://aeon.co/magazine/health/can-you-have-self-worth-without-self-love/", "http://aeon.co/magazine/technology/i-learnt-to-survive-like-an-11th-century-farmer/", "http://aeon.co/magazine/technology/can-tiny-plankton-help-reverse-climate-change/", "http://aeon.co/magazine/technology/are-halophytes-the-crop-of-the-future/", "http://aeon.co/magazine/technology/how-will-sexbots-change-human-relationships/", "http://aeon.co/video/technology/robotic-cheetah-a-short-film-about-biomimetic-robotics/", "http://aeon.co/video/technology/internet-archive-a-short-film-about-accessing-knowledge/", "http://aeon.co/video/technology/a-tiny-planet-a-short-film-about-wondrous-video-technology/", "http://aeon.co/magazine/culture/there-is-fortuitous-beauty-in-a-brute-force-attack/", "http://aeon.co/magazine/technology/can-we-design-systems-to-automate-ethics/", "http://aeon.co/magazine/technology/before-minecraft-or-snapchat-there-was-micromuse/", "http://aeon.co/magazine/technology/meet-darpas-new-generation-of-humanoid-robots/", "http://aeon.co/magazine/technology/the-problem-with-too-much-information/", "http://aeon.co/magazine/technology/can-nyc-be-completely-self-reliant/", "http://aeon.co/video/technology/theo-a-short-film-about-the-wind-eating-strandbeest/", "http://aeon.co/video/technology/terminal-a-short-film-about-the-mechanical-ballet-of-cargo/", 
"http://aeon.co/video/technology/metropolis-ii-a-short-film-about-the-city-of-tomorrow/", "http://aeon.co/magazine/culture/digital-art-should-be-about-possibilities-not-technicalities/", "http://aeon.co/magazine/society/can-sustainability-really-hope-to-beat-consumerism/", "http://aeon.co/magazine/technology/is-technology-making-the-world-too-complex/", "http://aeon.co/magazine/culture/creepypasta-is-how-the-internet-learns-our-fears/", "http://aeon.co/magazine/technology/virtual-afterlives-will-transform-humanity/", "http://aeon.co/magazine/technology/what-will-happen-to-my-online-identity-when-i-die/", "http://aeon.co/magazine/society/what-does-silicon-valley-tell-us-about-innovation/", "http://aeon.co/magazine/technology/the-rise-of-biotechnology-and-the-loss-of-scientific-neutrality/", "http://aeon.co/magazine/culture/why-i-gave-up-living-in-an-off-grid-commune/"]}
    {"comments": ["31 March 2014"], "primary_url": ["http://aeon.co/magazine/science/the-search-for-quantum-gravity/", "http://aeon.co/magazine/philosophy/should-generation-ted-take-a-more-sceptical-view/", "http://aeon.co/magazine/technology/the-elon-musk-interview-on-mars/", "http://aeon.co/video/technology/analogue-people-in-a-digital-age-a-short-film-about-technology/", "http://aeon.co/video/technology/boxa-short-film-about-projection-mapping/", "http://aeon.co/video/technology/how-to-sharpen-pencils-a-short-film-about-a-master-artisan/", "http://aeon.co/magazine/technology/do-we-want-minority-report-policing/", "http://aeon.co/magazine/health/can-you-have-self-worth-without-self-love/", "http://aeon.co/magazine/technology/i-learnt-to-survive-like-an-11th-century-farmer/", "http://aeon.co/magazine/technology/can-tiny-plankton-help-reverse-climate-change/", "http://aeon.co/magazine/technology/are-halophytes-the-crop-of-the-future/", "http://aeon.co/magazine/technology/how-will-sexbots-change-human-relationships/", "http://aeon.co/video/technology/robotic-cheetah-a-short-film-about-biomimetic-robotics/", "http://aeon.co/video/technology/internet-archive-a-short-film-about-accessing-knowledge/", "http://aeon.co/video/technology/a-tiny-planet-a-short-film-about-wondrous-video-technology/", "http://aeon.co/magazine/culture/there-is-fortuitous-beauty-in-a-brute-force-attack/", "http://aeon.co/magazine/technology/can-we-design-systems-to-automate-ethics/", "http://aeon.co/magazine/technology/before-minecraft-or-snapchat-there-was-micromuse/", "http://aeon.co/magazine/technology/meet-darpas-new-generation-of-humanoid-robots/", "http://aeon.co/magazine/technology/the-problem-with-too-much-information/", "http://aeon.co/magazine/technology/can-nyc-be-completely-self-reliant/", "http://aeon.co/video/technology/theo-a-short-film-about-the-wind-eating-strandbeest/", "http://aeon.co/video/technology/terminal-a-short-film-about-the-mechanical-ballet-of-cargo/", 
"http://aeon.co/video/technology/metropolis-ii-a-short-film-about-the-city-of-tomorrow/", "http://aeon.co/magazine/culture/digital-art-should-be-about-possibilities-not-technicalities/", "http://aeon.co/magazine/society/can-sustainability-really-hope-to-beat-consumerism/", "http://aeon.co/magazine/technology/is-technology-making-the-world-too-complex/", "http://aeon.co/magazine/culture/creepypasta-is-how-the-internet-learns-our-fears/", "http://aeon.co/magazine/technology/virtual-afterlives-will-transform-humanity/", "http://aeon.co/magazine/technology/what-will-happen-to-my-online-identity-when-i-die/", "http://aeon.co/magazine/society/what-does-silicon-valley-tell-us-about-innovation/", "http://aeon.co/magazine/technology/the-rise-of-biotechnology-and-the-loss-of-scientific-neutrality/", "http://aeon.co/magazine/culture/why-i-gave-up-living-in-an-off-grid-commune/"]}
    {"comments": ["30 May 2014"], "primary_url": ["http://aeon.co/magazine/science/the-search-for-quantum-gravity/", "http://aeon.co/magazine/philosophy/should-generation-ted-take-a-more-sceptical-view/", "http://aeon.co/magazine/technology/the-elon-musk-interview-on-mars/", "http://aeon.co/video/technology/analogue-people-in-a-digital-age-a-short-film-about-technology/", "http://aeon.co/video/technology/boxa-short-film-about-projection-mapping/", "http://aeon.co/video/technology/how-to-sharpen-pencils-a-short-film-about-a-master-artisan/", "http://aeon.co/magazine/technology/do-we-want-minority-report-policing/", "http://aeon.co/magazine/health/can-you-have-self-worth-without-self-love/", "http://aeon.co/magazine/technology/i-learnt-to-survive-like-an-11th-century-farmer/", "http://aeon.co/magazine/technology/can-tiny-plankton-help-reverse-climate-change/", "http://aeon.co/magazine/technology/are-halophytes-the-crop-of-the-future/", "http://aeon.co/magazine/technology/how-will-sexbots-change-human-relationships/", "http://aeon.co/video/technology/robotic-cheetah-a-short-film-about-biomimetic-robotics/", "http://aeon.co/video/technology/internet-archive-a-short-film-about-accessing-knowledge/", "http://aeon.co/video/technology/a-tiny-planet-a-short-film-about-wondrous-video-technology/", "http://aeon.co/magazine/culture/there-is-fortuitous-beauty-in-a-brute-force-attack/", "http://aeon.co/magazine/technology/can-we-design-systems-to-automate-ethics/", "http://aeon.co/magazine/technology/before-minecraft-or-snapchat-there-was-micromuse/", "http://aeon.co/magazine/technology/meet-darpas-new-generation-of-humanoid-robots/", "http://aeon.co/magazine/technology/the-problem-with-too-much-information/", "http://aeon.co/magazine/technology/can-nyc-be-completely-self-reliant/", "http://aeon.co/video/technology/theo-a-short-film-about-the-wind-eating-strandbeest/", "http://aeon.co/video/technology/terminal-a-short-film-about-the-mechanical-ballet-of-cargo/", 
"http://aeon.co/video/technology/metropolis-ii-a-short-film-about-the-city-of-tomorrow/", "http://aeon.co/magazine/culture/digital-art-should-be-about-possibilities-not-technicalities/", "http://aeon.co/magazine/society/can-sustainability-really-hope-to-beat-consumerism/", "http://aeon.co/magazine/technology/is-technology-making-the-world-too-complex/", "http://aeon.co/magazine/culture/creepypasta-is-how-the-internet-learns-our-fears/", "http://aeon.co/magazine/technology/virtual-afterlives-will-transform-humanity/", "http://aeon.co/magazine/technology/what-will-happen-to-my-online-identity-when-i-die/", "http://aeon.co/magazine/society/what-does-silicon-valley-tell-us-about-innovation/", "http://aeon.co/magazine/technology/the-rise-of-biotechnology-and-the-loss-of-scientific-neutrality/", "http://aeon.co/magazine/culture/why-i-gave-up-living-in-an-off-grid-commune/"]}

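To reiterate: I'm having trouble outputting one list of URLs from the parent page and a corresponding list of dates, one date from each nested URL. I'm new to Scrapy and Python, so I'm hoping someone can point me in the right direction.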

1 answer

上官锦
2023-03-14

Your code is iterating at the wrong level.

The response.xpath('//*[@id="latestPosts"]') part returns a list containing only one selector, and that single selector contains all of the article links.
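To see the difference without hitting the live site, here is a small sketch. It uses the standard library's xml.etree.ElementTree, which understands a limited XPath subset, as a stand-in for Scrapy's selectors; the markup and the aeon.co/magazine/a/ style URLs are hypothetical simplifications, not the real page HTML.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup (not the real aeon.co HTML): one
# #latestPosts container whose article links sit three <div>s deep.
PAGE = """
<html><body>
  <div id="latestPosts">
    <div><div><div><a href="http://aeon.co/magazine/a/">A</a></div></div></div>
    <div><div><div><a href="http://aeon.co/magazine/b/">B</a></div></div></div>
    <div><div><div><a href="http://aeon.co/magazine/c/">C</a></div></div></div>
  </div>
</body></html>
"""

root = ET.fromstring(PAGE.strip())

# Original loop target: matches the container itself, so the loop body
# runs once and its relative path collects every link into a single item.
containers = root.findall(".//*[@id='latestPosts']")
links_in_one_item = [a.get("href") for a in containers[0].findall("div/div/div/a")]

# Answer's loop target: one element per article, so the loop body runs
# once per link and each item gets exactly one URL.
articles = root.findall(".//*[@id='latestPosts']/div/div/div")
one_link_per_item = [[a.get("href") for a in el.findall("./a")] for el in articles]

print(len(containers))      # 1
print(links_in_one_item)    # all three URLs in one list
print(one_link_per_item)    # three one-element lists
```

In Scrapy the same two paths would go into response.xpath() and sel.xpath(); only the number of loop iterations changes.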

Try changing your loop to:

    for sel in response.xpath('//*[@id="latestPosts"]/div/div/div'):
        item = AeonItem()
        item['primary_url'] = sel.xpath('./a/@href').extract()

        ...

You will probably want to apply the same change in the other callback as well; I'll leave the rest of the fun to you. =)
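The repeated lists in the JSON come from one item being shared by every follow-up request: each article callback fills in its own date and returns that same item, full URL list included. The sketch below simulates this in plain Python rather than Scrapy; the URLs and dates are placeholders, and the dictionary lookup is a hypothetical stand-in for downloading and parsing each article page.

```python
# Hypothetical stand-in for fetching each article page: URL -> placeholder date.
ARTICLE_DATES = {
    "http://aeon.co/magazine/a/": "1 January 2014",
    "http://aeon.co/magazine/b/": "2 January 2014",
    "http://aeon.co/magazine/c/": "3 January 2014",
}
LINKS = list(ARTICLE_DATES)


def parse_next_page(url, item):
    """Mimic the article callback: set the date, then emit a copy of the item
    (roughly what happens when Scrapy exports a returned item)."""
    item["comments"] = [ARTICLE_DATES[url]]
    return dict(item)


def crawl_original():
    # One item holding every link, shared by every follow-up request.
    item = {"primary_url": LINKS}
    return [parse_next_page(url, item) for url in item["primary_url"]]


def crawl_fixed():
    # One fresh item per link, as in the answer's loop.
    return [parse_next_page(url, {"primary_url": [url]}) for url in LINKS]


for record in crawl_original():
    print(record)   # full link list repeated, with a different date each time
for record in crawl_fixed():
    print(record)   # one URL paired with its own date
```

crawl_original() yields records shaped like the question's output (the whole list plus one date), while crawl_fixed() pairs each URL with its own date, which is the result the question asks for.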

  • Scrapy Selectors: see the section on nested selectors and working with relative XPaths
  • A nice XPath tutorial
  • XPath tips from the web scraping trenches
Similar questions:
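  • When I run a normal SELECT, the correct result comes back, but when I run a SELECT for the DB uptime, it always returns the same first result. I did check the Postgres logs, and I can see that the select is being executed.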

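  • I'm just starting with JS and actually like the asynchronous side of it (I come from Python), but I'm not sure why some functions return a Promise. Specifically, the code below makes me wonder what is being returned: Streams aside, the HTTP response we get afterwards is a block of text that the client later interprets, as part of HTTP content parsing, to extract the headers, the body, and other interesting elements. The point is that this block of text arrives in one piece, so the first call already has the entire response; why is parsing the JSON body an asynchronous operation, unlike the second?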

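  • Question: This question already has an answer here (closed 7 years ago). Possible duplicate: How do I "echo" a "Resource ID #6" from a MySQL response in PHP? I'm new to PHP and SQL, and I'm trying to make a PHP page list the number of enums in a table. I'm using this code, but it returns Resource ID #2: Answer: Because you get a MySQL resource back when you execute the query. Use something like it to fetch the next row; it returns an array indexed by column name. In your case, probably. Here is a fix for your snippet plus a few small improvements: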

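  • This article covers the interview question "What does Math.min() < Math.max() return, and why?", mainly the answering technique and the points to watch for when it is asked; readers who need it can use it as a reference. It returns false, because Math.min() returns Infinity and Math.max() returns -Infinity. A guess at the implementation: the function accepts a variable number of arguments; when only one argument is passed, it has to be compared with -Infinity (the smallest number in JS), so that argument is returned however small it is. A single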

  • line.FlatMap(WordSutil::GetWords): bad return type in method reference. The method code:

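  • Question: I have a bunch of candidates, each of whom has one or more jobs; each job is at a company and uses some skills. Bad ASCII art follows: Here is my database: Here is my attempt at the query (note that I intend to change the wildcards to field names; I'm just trying to get something working): HeidiSQL says: What is wrong with the query? I hope the bad ASCII art makes clear what I'm trying to achieve. (Also, does the order in which I join the tables make any speed difference? I'll worry about the new MyS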