Question:

How to extract text from a div tag using BeautifulSoup and Python

和嘉澍
2023-03-14

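I am trying to extract the text inside a div tag using the BeautifulSoup package in Python.

For example, I want to extract the text inside one div tag, as well as the text inside another.

When I run my code, it fails with the following error: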

---------------------------------------------------------------------------
     60     ### article_body = s.find('div', {"class": 'card-content t-small bt p20'}).text
     61     # text_info = s.find_all("div", {"class": "card-content is-spaced"}

f:\aienv\lib\site-packages\bs4\element.py in __getattr__(self, key)
   2172         """Raise a helpful exception to explain a common code fix."""
   2173         raise AttributeError(
-> 2174             "ResultSet object has no attribute '%s'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?" % key
   2175         )

AttributeError: ResultSet object has no attribute 'find'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?

<div class="card-content t-small bt p20" style="max-height:50vh" data-viewsize='{"d":{"height": {"max": 1}}, "offset":"JobSearch.jobViewSize"}'>
<h2 class="h6">Job Description</h2>
<p>The Executive Chef has full knowledge and capability of managing the general operations of the kitchen, specialty outlets kitchen including Stewarding.</p>
<h2 class="h6 p10t">Skills</h2>
<p>•  Provide, develop, train and maintain a professional workforce• Excellent in English both in oral and written.• Computer knowledge is required and good in correspondences and reports writing.</p>
<h2 class="h6 p10t">Job Details</h2>
<dl class="dlist is-spaced is-fitted t-small m0">
<div>
<dt>Job Location</dt>
<dd> Al Olaya, Riyadh , Saudi Arabia </dd>
</div>
<div>
<dt>Company Industry</dt>
<dd>Food & Beverage Production; Entertainment; Catering, Food Service, & Restaurant</dd>
</div>
<div>
<dt>Company Type</dt>
<dd>Employer (Private Sector)</dd>
</div>
<div>
<dt>Job Role</dt>
<dd>Hospitality and Tourism</dd>
</div>
<div>
<dt>Employment Type</dt>
<dd>Unspecified</dd>
</div>
<div>
<dt>Monthly Salary Range</dt>
<dd>$4,000 - $5,000</dd>
</div>
<div>
<dt>Number of Vacancies</dt>
<dd>1</dd>
</div>
</dl>
<h2 class="h6 p10t">Preferred Candidate</h2>
<dl class="dlist is-spaced is-fitted t-small m0">
<div>
<dt>Career Level</dt>
<dd>Management</dd>
</div>
<div>
<dt>Years of Experience</dt>
<dd>Min: 10 Max: 20</dd>
</div>
<div>
<dt>Residence Location</dt>
<dd> Riyadh, Saudi Arabia ; Algeria; Bahrain; Comoros; Djibouti; Egypt; Iraq; Jordan; Kuwait; Lebanon; Libya; Mauritania; Morocco; Oman; Palestine; Qatar; Saudi Arabia; Somalia; Sudan; Syria; Tunisia; United Arab Emirates; Yemen</dd>
</div>
<div>
<dt>Gender</dt>
<dd>Male</dd>
</div>
<div>
<dt>Age</dt>
<dd>Min: 26 Max: 55</dd>
</div>
</dl>
</div>

==========================================================================

import time
import requests
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    requests.get("https://www.bayt.com/en/international/jobs/executive-chef-jobs/").content,
    "lxml"
)

links = []
for a in soup.select("h2.m0.t-regular a"):
    if a['href'] not in links:
        links.append("https://www.bayt.com"+ a['href'])

for link in links:
    s = BeautifulSoup(requests.get(link).content, "lxml")
    text_info = s.find_all("div",{"class":"card-content is-spaced"})
    text_desc = text_info.find('div', attrs={'class':'card-content t-small bt p20'}).getText(strip=True)
    
    print(f"{date_published} {title}\n\n{text_desc}\n", "-" * 80)

2 answers

胥承
2023-03-14

To get the job description and the other details, use the following CSS selectors.

import requests
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    requests.get("https://www.bayt.com/en/international/jobs/executive-chef-jobs/").content, "lxml")

# Collect the absolute URL of each job posting, skipping duplicates.
links = []
for a in soup.select("h2.m0.t-regular a"):
    url = "https://www.bayt.com" + a['href']
    if url not in links:
        links.append(url)

for link in links:
    print(link)
    s = BeautifulSoup(requests.get(link).content, "lxml")
    # The first <p> inside the card holds the job description.
    jobdesc = s.select_one("div[class='card-content is-spaced'] p")
    print(jobdesc.text)
    # <dt> elements are the field labels, <dd> elements the matching values.
    alldt = [dt.text for dt in s.select("div[class='card-content is-spaced'] dt")]
    print(alldt)
    alldd = [dd.text for dd in s.select("div[class='card-content is-spaced'] dd")]
    print(alldd)
    print("-" * 80)

Console output:

https://www.bayt.com/en/qatar/jobs/executive-chef-4276199/
The ideal candidate is a seasoned chef with a background in fine dining. You will run an efficient kitchen by consistently looking to improve the menu, producing quality food, and working closely with rthe other staffs in the overall food and beverage operations of the palace.

['Job Location', 'Company Industry', 'Company Type', 'Job Role', 'Employment Type', 'Monthly Salary Range', 'Number of Vacancies', 'Career Level', 'Years of Experience', 'Residence Location', 'Gender', 'Nationality', 'Degree', 'Age']
[' Doha, Qatar ', 'Food & Beverage Production', 'Employer (Private Sector)', 'Management', 'Contractor', 'Unspecified', '2', 'Senior Executive', 'Min: 5', 'India; Lebanon', 'Male', 'Bahrain; Kuwait; Oman; Qatar; Saudi Arabia; United Arab Emirates', 'Certification / diploma', 'Min: 36']
--------------------------------------------------------------------------------
https://www.bayt.com/en/saudi-arabia/jobs/executive-chef-for-5-star-hotel-4274940/
The Executive Chef has full knowledge and capability of managing the general operations of the kitchen, specialty outlets kitchen including Stewarding. Responsibility includes food preparations that are used for banqueting, conferences, outside events, and catering. Basically ensures the culinary dishes are of high-quality prepared and served to enhance the guest experience. Monitors local competitors and compare their operations with the Food & Beverage Preparation enable to modify and develop a popular menu as needed so they remain effective for the purpose of the restaurants and other establishments. Also performs many administrative tasks including kitchen item requisition, ordering supplies, and maintain the highest professional food quality, hygiene, and sanitation standards.

['Job Location', 'Company Industry', 'Company Type', 'Job Role', 'Employment Type', 'Monthly Salary Range', 'Number of Vacancies', 'Career Level', 'Years of Experience', 'Residence Location', 'Gender', 'Age']
[' Al Olaya, Riyadh , Saudi Arabia ', 'Food & Beverage Production; Entertainment; Catering, Food Service, & Restaurant', 'Employer (Private Sector)', 'Hospitality and Tourism', 'Unspecified', '$4,000 - $5,000', '1', 'Management', 'Min: 10 Max: 20', ' Riyadh,Saudi Arabia ; Algeria; Bahrain; Comoros; Djibouti; Egypt; Iraq; Jordan; Kuwait; Lebanon; Libya; Mauritania; Morocco; Oman; Palestine; Qatar; Saudi Arabia; Somalia; Sudan; Syria; Tunisia; United Arab Emirates; Yemen', 'Male', 'Min: 26 Max: 55']
--------------------------------------------------------------------------------
https://www.bayt.com/en/saudi-arabia/jobs/executive-chef-4273678/

['Job Location', 'Company Industry', 'Company Type', 'Job Role', 'Employment Type', 'Monthly Salary Range', 'Number of Vacancies', 'Career Level', 'Residence Location']
[' Riyadh, Saudi Arabia ', 'Hospitality & Accomodation', 'Employer (Private Sector)', 'Hospitality and Tourism', 'Unspecified', 'Unspecified', 'Unspecified', 'Management', 'Saudi Arabia']
--------------------------------------------------------------------------------
https://www.bayt.com/en/other/jobs/executive-chef-4-58272955/
 Unit Description:  Artisan Restaurant Collection has a great Executive Chef 4 (resource lasting up-to6 months)opportunity in the Los Angeles area of California for a new piece of business.  The Artisan Restaurant Collection was imagined and created in California by a market need for local sustainable, chef driven, farm to fork food created with love.  The Executive Chef 4 will have total culinary responsibilities including the supervision ofhourly staff with a focus on amazing fresh food for this location.  The Ideal candidate must have 
['Job Location', 'Company Industry', 'Company Type', 'Job Role', 'Employment Type', 'Monthly Salary Range', 'Number of Vacancies']
['Other', 'Other Business Support Services', 'Unspecified', 'Hospitality and Tourism', 'Full Time Employee', 'Unspecified', 'Unspecified']
--------------------------------------------------------------------------------
https://www.bayt.com/en/other/jobs/executive-chef-3-58273086/
 Unit Description:  Artisan Restaurant Collection has a great Executive Chef 3 opportunity in San Jose, California for a new business venture.  The Artisan Restaurant Collection was imagined and created in California by a market need for local sustainable, chef driven, farm to fork food created with love.  The Executive Chef 3 will have total culinary responsibilities including the supervision ofhourly staff with a focus on amazing Asian food for this location.  The Ideal candidate must have 
['Job Location', 'Company Industry', 'Company Type', 'Job Role', 'Employment Type', 'Monthly Salary Range', 'Number of Vacancies']
['Other', 'Other Business Support Services', 'Unspecified', 'Hospitality and Tourism', 'Full Time Employee', 'Unspecified', 'Unspecified']
--------------------------------------------------------------------------------
and so on...

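
The label and value lists printed above can also be paired into one mapping per job. What follows is a minimal sketch, not part of the original answer. It assumes the markup matches the HTML sample in the question, where each `dt` is followed by its `dd` inside the same wrapper. The inline `html` string is a trimmed, illustrative copy of that sample, `job_details` is a hypothetical name, and the built-in `html.parser` is used instead of `lxml` so that the snippet needs no extra dependency or network access.

```python
from bs4 import BeautifulSoup

# Trimmed, illustrative copy of the markup from the question (assumption:
# the live page keeps the same dt/dd layout).
html = """
<div class="card-content t-small bt p20">
  <h2 class="h6">Job Description</h2>
  <p>The Executive Chef manages the general operations of the kitchen.</p>
  <dl class="dlist is-spaced is-fitted t-small m0">
    <div><dt>Job Location</dt><dd> Al Olaya, Riyadh , Saudi Arabia </dd></div>
    <div><dt>Job Role</dt><dd>Hospitality and Tourism</dd></div>
  </dl>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
card = soup.select_one("div.card-content")

# Pair every <dt> label with the <dd> value that follows it.
job_details = {}
for dt in card.select("dl dt"):
    dd = dt.find_next_sibling("dd")
    if dd is not None:  # skip labels that have no value element
        job_details[dt.get_text(strip=True)] = dd.get_text(" ", strip=True)

print(job_details)
```

Looking up each `dd` from its own `dt` keeps labels and values aligned even when a posting omits a field, which two independent lists cannot guarantee.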
索吕恭
2023-03-14

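You are calling find_all and then using its result as if it were a single element. You probably need to loop over the result (for text in text_info:) and extract the information inside that loop. If you only need the first div, use find instead of find_all.

Hope this helps!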
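
Both options can be tried on a small inline document. This is a minimal sketch under stated assumptions: the two `card-content is-spaced` divs and their paragraph text are made up for illustration, and `html.parser` is used so that nothing beyond Beautiful Soup needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment with two matching divs.
html = """
<div class="card-content is-spaced"><p>First job description</p></div>
<div class="card-content is-spaced"><p>Second job description</p></div>
"""
soup = BeautifulSoup(html, "html.parser")

# find_all() returns a ResultSet (a list of tags), which has no find() method.
text_info = soup.find_all("div", {"class": "card-content is-spaced"})
error_message = ""
try:
    text_info.find("p")
except AttributeError as exc:
    error_message = str(exc)  # the same error as in the question

# Option 1: loop over the ResultSet and query each tag individually.
descriptions = [div.find("p").get_text(strip=True) for div in text_info]

# Option 2: call find() when only the first match is needed.
first_div = soup.find("div", {"class": "card-content is-spaced"})
first_description = first_div.find("p").get_text(strip=True)

print(error_message)
print(descriptions)
print(first_description)
```

On a real page `find()` returns `None` when nothing matches, so checking the result before chaining another call avoids a second kind of AttributeError.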
