Question:

Python BeautifulSoup: retrieving text from a div tag

劳高爽
2023-03-14

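I am new to web scraping. I am using Beautiful Soup to scrape the Google Play Store, but I am stuck on retrieving text from a div tag. The div tag looks like this: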

a = <div class="LVQB0b"><div class="QoPmEb"></div><div><span class="X43Kjb">Education.com</span><span class="p2TkOb">August 15, 2019</span></div>Thanks for your feedback. We are sorry to hear you're having trouble with the app. This is a known issue and our team has fixed it. Please restart the app and let us know at support@education.com if you have any further trouble. Thanks!</div>

I want to retrieve the text starting from "Thanks for your feedback". I am using the following code to retrieve it:

response = a.find('div',{'class':'LVQB0b'}).get_text()

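However, the command above also returns unwanted text, namely 'Education.com' and the date. I am not sure how to retrieve text from a div tag that has no class name, as in the example above. Any guidance would be appreciated.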

3 answers

西门旻
2023-03-14

You can also use next_sibling or find_next_sibling(text=True):

from bs4 import BeautifulSoup

html= '''<div class="LVQB0b"><div class="QoPmEb"></div><div><span class="X43Kjb">Education.com</span><span class="p2TkOb">August 15, 2019</span></div>Thanks for your feedback. We are sorry to hear you're having trouble with the app. This is a known issue and our team has fixed it. Please restart the app and let us know at support@education.com if you have any further trouble. Thanks!</div>'''
soup = BeautifulSoup(html, 'html.parser')
print(soup.find('div',class_='QoPmEb').find_next('div').next_sibling)

Or, with find_next_sibling(text=True):

from bs4 import BeautifulSoup

html= '''<div class="LVQB0b"><div class="QoPmEb"></div><div><span class="X43Kjb">Education.com</span><span class="p2TkOb">August 15, 2019</span></div>Thanks for your feedback. We are sorry to hear you're having trouble with the app. This is a known issue and our team has fixed it. Please restart the app and let us know at support@education.com if you have any further trouble. Thanks!</div>'''
soup = BeautifulSoup(html, 'html.parser')
print(soup.find('div',class_='QoPmEb').find_next('div').find_next_sibling(text=True))
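
Both snippets assume the reply text is the node that directly follows the unnamed div. Here is a minimal sketch that wraps the same lookup in a helper and returns None when an element is missing. The function name reply_after_header is hypothetical, and string=True is the newer spelling of text=True in Beautiful Soup 4.4 and later:

```python
from bs4 import BeautifulSoup

HTML = '''<div class="LVQB0b"><div class="QoPmEb"></div><div><span class="X43Kjb">Education.com</span><span class="p2TkOb">August 15, 2019</span></div>Thanks for your feedback. We are sorry to hear you're having trouble with the app. This is a known issue and our team has fixed it. Please restart the app and let us know at support@education.com if you have any further trouble. Thanks!</div>'''


def reply_after_header(markup):
    """Return the text node that follows the header div, or None if absent."""
    soup = BeautifulSoup(markup, 'html.parser')
    marker = soup.find('div', class_='QoPmEb')
    if marker is None:
        return None
    # The unnamed div holding the two spans is the next div sibling
    header = marker.find_next_sibling('div')
    if header is None:
        return None
    # The reply is the first text node after that div, at the same level
    node = header.find_next_sibling(string=True)
    return str(node).strip() if node is not None else None


reply = reply_after_header(HTML)
print(reply)
```

On markup that lacks the QoPmEb div, the helper returns None instead of raising AttributeError.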

蓟捷
2023-03-14

The unwanted text is part of the span elements. You can find those elements and remove their text from the result:

response = a.find('div',{'class':'LVQB0b'}).get_text()
unwanted = a.select('.LVQB0b span')
for el in unwanted:
    response = response.replace(el.get_text(), '')
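
A self-contained sketch of this approach, assuming a is the BeautifulSoup object parsed from the HTML in the question. Plain string replacement would also strip the same text if it happened to appear inside the reply, so this variant (a small deviation from the snippet above) removes only the first occurrence of each span's text:

```python
from bs4 import BeautifulSoup

html = '''<div class="LVQB0b"><div class="QoPmEb"></div><div><span class="X43Kjb">Education.com</span><span class="p2TkOb">August 15, 2019</span></div>Thanks for your feedback. We are sorry to hear you're having trouble with the app. This is a known issue and our team has fixed it. Please restart the app and let us know at support@education.com if you have any further trouble. Thanks!</div>'''
a = BeautifulSoup(html, 'html.parser')

response = a.find('div', {'class': 'LVQB0b'}).get_text()
for el in a.select('.LVQB0b span'):
    # count=1 removes only the first match, leaving later occurrences intact
    response = response.replace(el.get_text(), '', 1)
response = response.strip()
print(response)
```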

斜浩穰
2023-03-14

Use find(text=True, recursive=False)

Example:

from bs4 import BeautifulSoup

s = '''<div class="LVQB0b"><div class="QoPmEb"></div><div><span class="X43Kjb">Education.com</span><span class="p2TkOb">August 15, 2019</span></div>Thanks for your feedback. We are sorry to hear you're having trouble with the app. This is a known issue and our team has fixed it. Please restart the app and let us know at support@education.com if you have any further trouble. Thanks!</div>'''    
html = BeautifulSoup(s, 'html.parser')
print(html.find('div',{'class':'LVQB0b'}).find(text=True, recursive=False))

Output:

Thanks for your feedback. We are sorry to hear you're having trouble with the app. This is a known issue and our team has fixed it. Please restart the app and let us know at support@education.com if you have any further trouble. Thanks!
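
find() returns only the first direct text node. If the outer div could contain several text nodes of its own (an assumption, since the sample has just one), a sketch with find_all(string=True, recursive=False) collects all of them; string= is the newer name for text= in Beautiful Soup 4.4 and later:

```python
from bs4 import BeautifulSoup

s = '''<div class="LVQB0b"><div class="QoPmEb"></div><div><span class="X43Kjb">Education.com</span><span class="p2TkOb">August 15, 2019</span></div>Thanks for your feedback. We are sorry to hear you're having trouble with the app. This is a known issue and our team has fixed it. Please restart the app and let us know at support@education.com if you have any further trouble. Thanks!</div>'''
soup = BeautifulSoup(s, 'html.parser')
outer = soup.find('div', {'class': 'LVQB0b'})

# recursive=False limits the search to direct children of the outer div,
# so text nested inside the inner div and spans is skipped
direct_text = outer.find_all(string=True, recursive=False)

# Join the non-empty pieces with a single space
reply = ' '.join(t.strip() for t in direct_text if t.strip())
print(reply)
```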