
Find_All function returns an unexpected result set

东郭展
2023-03-14
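Question

I am using Python 3.8.2 with bs4 BeautifulSoup. I am trying to find every instance of a tag and list each one in the result set, one per line. However, the result set contains more lines than the site's original content. That is because the first row of the result set contains every instance of the tag, the second row contains every instance except the first, the third row contains every instance except the first and second, and so on through the rest of the result set.

Here is the code: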

from bs4 import BeautifulSoup
import requests

url = "https://www.sainsburys.co.uk/shop/gb/groceries/drinks/seeall"

html_content = requests.get(url, timeout=5)
soup = BeautifulSoup(html_content.text)

test_1 = soup.find('ul',{"class": "productLister gridView"})

test = test_1.find_all("li", attrs={"class": "gridItem"})

How can I get each instance of <li class="gridItem"> listed on its own, one per line?

Thanks


Answer:

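The website runs JavaScript that renders its data dynamically after the page has loaded.

The requests library cannot render JavaScript on the fly, so you could use selenium or requests_html instead. There are plenty of modules that can do this.

There is another option on the table, though: tracking down where the data is rendered from. I was able to find the XHR request that retrieves the data from the back-end API and renders it on the user's side.

You can find the XHR request by opening Developer Tools, going to the Network tab, and checking the XHR/JS requests, depending on the type of call (for example, fetch).

You can achieve your goal with the code below.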

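Please note the following points:

  1. The website holds 3068 items.
  2. I raised the number of items per page to 120 with the parameter "pageSize": "120".
  3. 3068 / 120 is about 25.6, which rounds up to 26 pages of up to 120 items each.
  4. So you will need to loop over range(0, 3120, 120), that is 0 > 120 > 240 and so on, incrementing the parameter "beginIndex": "0" inside the for loop.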
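The arithmetic in the list above can be checked with a few lines of Python. This is only a sketch: the names `TOTAL_ITEMS`, `PAGE_SIZE` and `page_offsets` are made up for illustration, and the item count is the one quoted in the answer, which may have changed on the live site.

```python
import math

TOTAL_ITEMS = 3068  # item count quoted in the answer (assumed still valid)
PAGE_SIZE = 120     # value used for the "pageSize" parameter


def page_offsets(total_items, page_size):
    """Return the beginIndex value for every page needed to cover total_items."""
    pages = math.ceil(total_items / page_size)  # round up so the last partial page is included
    return [page * page_size for page in range(pages)]


offsets = page_offsets(TOTAL_ITEMS, PAGE_SIZE)
print(len(offsets), offsets[:3], offsets[-1])  # 26 [0, 120, 240] 3000
```

The result matches range(0, 3120, 120): 26 offsets, the last one being 3000.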

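Since you did not tell us your end goal, the code below is a starting point. I assume you are after the name or price (url, img) or something similar; you will find it in there.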

import requests
from bs4 import BeautifulSoup

params = {
    "langId": "44",
    "storeId": "10151",
    "catalogId": "10241",
    "categoryId": "12192",
    "parent_category_rn": "",
    "top_category": "12192",
    "pageSize": "120",
    "orderBy": "FAVOURITES_FIRST",
    "searchTerm": "",
    "catSeeAll": "true",
    "beginIndex": "0",
    "categoryFacetId1": "12192",
    "categoryFacetId2": "",
    "requesttype": "ajax"
}


def main(url):
    # Reuse one session for the request (connection pooling, cookies).
    with requests.Session() as req:
        # The endpoint returns JSON; each product is an HTML fragment in 'result'.
        r = req.post(url, params=params).json()
        for item in r[5]['productLists']:
            for nest in item['products']:
                soup = BeautifulSoup(nest['result'], 'html.parser')
                target = soup.find("div", class_="productNameAndPromotions")
                name = target.h3.a.text.strip()
                link = target.h3.a.get("href")  # renamed so it does not shadow the url argument
                img = "https" + target.h3.a.img.get("src")
                price = soup.find(
                    "p", class_="pricePerUnit").get_text(strip=True)
                print(name, price, img, link)


main("https://www.sainsburys.co.uk/webapp/wcs/stores/servlet/gb/groceries/drinks/AjaxApplyFilterSearchResultView")
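The code above requests only the first page ("beginIndex": "0"). One way to cover the remaining pages, as the notes suggest, is to generate a separate params dict for each offset. The sketch below is an assumption about how that could be done, not part of the original answer: `page_params` is a hypothetical helper, and `base` is a shortened stand-in for the full `params` dict.

```python
def page_params(base_params, total_items, page_size):
    """Yield one params dict per page, each with its own beginIndex."""
    for begin_index in range(0, total_items, page_size):
        page = dict(base_params)  # copy so the base dict stays unchanged
        page["beginIndex"] = str(begin_index)
        page["pageSize"] = str(page_size)
        yield page


# Shortened stand-in for the full params dict (illustrative only).
base = {"pageSize": "120", "beginIndex": "0", "requesttype": "ajax"}
pages = list(page_params(base, total_items=3068, page_size=120))
print(len(pages), pages[1]["beginIndex"], pages[-1]["beginIndex"])  # 26 120 3000
```

Each generated dict could then be passed as params= to req.post inside main, one request per page.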

Abbreviated output (name and price):

Sainsbury's British Semi Skimmed Milk 2.27L (4 pint) £1.10/unit
Sainsbury's British Semi Skimmed Milk 1.13L (2 pint) 80p/unit
Sainsbury's British Whole Milk 2.27L (4 pint) £1.10/unit
Cravendale Purefilter Semi Skimmed Milk 2L £1.90/unit
Sainsbury's British Skimmed Milk 2.27L (4 pint) £1.10/unit
Sainsbury's British Semi Skimmed Milk, SO Organic 2.27L (4 pint) £1.80/unit
Sainsbury's Sparkling Water, Basics 2L 25p/unit
Sainsbury's British Skimmed Milk 1.13L (2 pint) 80p/unit
Sainsbury's 100% Pure Squeezed Smooth Orange Juice, Not From Concentrate 1L £1.30/unit
Sainsbury's Water, Basics 2L 25p/unit
Sainsbury's British Whole Milk 1.13L (2 pint) 80p/unit
Sainsbury's Smooth Pure Orange Juice 1L 95p/unit
Pepsi Max 2L £1.90/unit
Sainsbury's Caledonian Still Water 4x2L £1.50/unit
Highland Spring Still Water 12x500ml £3.00/unit
Sainsbury's 100% Pressed Apple Juice, Not From Concentrate 1L £1.30/unit
Sainsbury's British Whole Milk, SO Organic 2.27L (4 Pint) £1.80/unit
Lactofree Semi Skimmed Lactose Free Fresh Dairy Drink 1L £1.50/unit
Diet Coke 8x330ml £4.00/unit
Alpro Roasted Almond Unsweetened UHT Drink 1L £1.80/unit
Robinsons Orange Squash No Added Sugar 1L £1.65/unit
Sainsbury's Soda Water 1L 60p/unit
Sainsbury's Caledonian Sparkling Water 4x2L £1.60/unit
Tropicana Smooth Orange Juice 950ml £2.45/unit
Sainsbury's Diet Indian Tonic Water 1L 60p/unit
Sainsbury's Pure Apple Juice 1L 95p/unit
Robinsons Apple & Blackcurrant Squash No Added Sugar 1L £1.65/unit
Sainsbury's Sparkling Flavoured Water, Lemon & Lime 1L 50p/unit
Sainsbury's Conegliano Prosecco, Taste the Difference 75cl £8.00/unit
Sainsbury's Unsweetened Soya Drink 1L 90p/unit
Sainsbury's British Semi Skimmed Milk, SO Organic 1.13L (2 pint) £1.15/unit
Sainsbury's Caledonian Sparkling Water 6x500ml £1.50/unit
Sainsbury's Apple & Blackcurrant Squash, No Added Sugar 1.5L £1.00/unit
Highland Spring Still Water 6x1.5L £3.00/unit
Alpro Roasted Almond Unsweetened Fresh Drink 1L £1.85/unit
Sainsbury's Semi Skimmed Long Life Milk 1L 90p/unit
Tropicana Smooth Orange Juice 1.6L £2.50/unit
Sainsbury's 100% Pure Squeezed Orange Juice with Bits, Not From Concentrate 1L £1.30/unit
Cravendale Purefilter Semi Skimmed Milk 1L £1.15/unit
Sainsbury's Caledonian Still Water Sports Cap 6x500ml £1.50/unit
Sainsbury's Double Strength Orange Squash, No Added Sugar 1.5L £1.00/unit
Diet Coke 18x330ml £7.00/unit
Sainsbury's Indian Tonic Water 1L 60p/unit
Sainsbury's Pure Orange Juice 1L 85p/unit
Sainsbury's Pure Apple Juice 6x200ml £1.50/unit
Buxton Still Natural Mineral Water 8x500ml £2.00/unit
Sainsbury's Whole Long Life Milk 1L £1.05/unit
Cravendale Purefilter Skimmed Milk 2L £1.90/unit
Sainsbury's Sparkling Flavoured Water, Blackcurrant & Cherry 1L 50p/unit
Innocent Smooth Orange Juice 1.35L £3.00/unit
Alpro Original Soya Fresh Drink 1L £1.55/unit
Sainsbury's Still Flavoured Water, Strawberry & Kiwi 1L 50p/unit
Sainsbury's British Filtered Semi Skimmed Milk 2L £1.35/unit
Sainsbury's Sparkling Flavoured Water, Mango & Passionfruit 1L 50p/unit
Sainsbury's Caledonian Still Water 5L £1.10/unit
McGuigan Estate Merlot 75cl £5.10/unit
Schweppes Slimline Tonic Water 1L £1.50/unit
PG tips Pyramid Tea Bags x240 696g £4.50/unit
Sainsbury's Sparkling Flavoured Water, Strawberry & Kiwi 1L 50p/unit
Sainsbury's Caledonian Sparkling Water 2L 55p/unit
Sainsbury's Sweetened Soya Drink 1L 90p/unit
Sainsbury's 100% Pure Squeezed Smooth Orange Juice, Not From Concentrate 1.75L £2.10/unit
Sainsbury's Diet Lemonade 2L 60p/unit
Sainsbury's Apple & Mango Juice, Not From Concentrate 1L £1.30/unit
Robinsons Summer Fruits Squash No Added Sugar 1L £1.65/unit
Sainsbury's 100% Pure Squeezed Pineapple Juice, Not From Concentrate 1L £1.30/unit
Clearsprings Sauvignon Blanc 75cl £5.50/unit
Phantom River Sauvignon Blanc 75cl £5.00/unit
Nestle Pure Life Still Spring Water 12x500ml £2.50/unit
Buxton Sparkling Natural Mineral Water 8x500ml £2.10/unit
Brancott Estate Sauvignon Blanc 75cl £6.75/unit
Schweppes Slimline Lemonade 2L £1.30/unit
McGuigan Estate South Australian Shiraz 75cl £5.10/unit
Coca-Cola Zero Sugar 8x330ml £4.00/unit
Villa Maria Private Bin Sauvignon Blanc 75cl £9.25/unit
Diet Coke Caffeine Free 8x330ml £4.00/unit
Sainsbury's British Skimmed Milk, SO Organic 1.13L (2 pint) £1.15/unit
Sainsbury's Kids Caledonian Still Water 6x300ml £1.10/unit
Canti Prosecco 75cl £7.50/unit
Oatly Enriched with Calcium Oat UHT Drink 1L £1.50/unit
Sainsbury's Pure Orange Juice 6x200ml £1.50/unit
Sainsbury's Still Flavoured Water, Lemon & Lime 1L 50p/unit
Valdo Prosecco Marca Oro 75cl £8.50/unit
Oyster Bay Sauvignon Blanc 75cl £8.00/unit
Ribena Blackcurrant Squash 850ml £2.30/unit
Volvic Mineral Water 6x1.5L £3.40/unit
Campo Viejo Rioja Tempranillo 75cl £6.75/unit
Nescafé Azera Americano Instant Coffee 100g £4.60/unit
Tropicana Orange Juice Original 950ml £2.45/unit
Sainsbury's Double Strength Orange & Mango Squash, No Added Sugar 1.5L £1.00/unit  
Robinsons Lemon Squash No Added Sugar 1L £1.65/unit
Schweppes Lemonade 2L £1.30/unit
Robinsons Orange & Pineapple Squash No Added Sugar 1L £1.65/unit
Sainsbury's Diet Indian Tonic with Lime 1L 60p/unit
St Helen's Farm Semi Skimmed Goats Milk 1L £1.80/unit
Sainsbury's Double Strength Orange, Lemon & Pineapple Squash, No Added Sugar 1.5L £1.00/unit
Sainsbury's Double Strength Summerfruits Squash, No Added Sugar 1.5L £1.00/unit    
Alpro Oat UHT Drink 1L £1.80/unit
Innocent Smooth Orange Juice 900ml £1.50/unit
Sainsbury's British Whole Milk, SO Organic 1.13L (2 pint) £1.15/unit
Sainsbury's Skimmed Long Life Milk 1L 80p/unit
Nescafé Gold Blend Instant Coffee 200g £7.00/unit
Highland Spring Still Water Sports Cap 12x330ml £3.00/unit
Sainsbury's Cava Brut 75cl £6.00/unit
Alpro Light Unsweetened Soya Fresh Drink 1L £1.55/unit
Sainsbury's Caledonian Still Water 2L 50p/unit
Koko Coconut UHT Drink 1L £1.50/unit
Sainsbury's House Pinot Grigio 75cl £4.50/unit
Sainsbury's Cola Zero 2L 45p/unit
St Helen's Farm Whole Goats Milk 1L £1.80/unit
Sainsbury's Double Strength Cherries & Berries Squash, No Added Sugar 1.5L £1.00/unit
Sainsbury's Lemonade 2L 60p/unit
Sainsbury's Pure Orange Juice With Bits 1L 85p/unit
Sainsbury's Pinot Grigio, Taste the Difference 75cl £6.00/unit
Schweppes Tonic Water 1L £1.50/unit
Sainsbury's Cranberry Juice Drink 1L 85p/unit
Nescafé Gold Blend Instant Coffee Refill 150g £3.50/unit
Sainsbury's Gold Roast Instant Coffee 200g £3.15/unit
Sainsbury's Pure Orange Juice with Juicy Bits 1L 95p/unit
Edizione 789 Di Mondelli Prosecco 75cl £6.25/unit

