Question:

Python BeautifulSoup: retrieving text from a div tag

劳高爽

2023-03-14

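I am new to web scraping. I am using Beautiful Soup to scrape the Google Play Store. However, I am stuck on retrieving text from a div tag. The div tag looks like this: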

a = `<div class="LVQB0b"><div class="QoPmEb"></div><div><span class="X43Kjb">Education.com</span><span class="p2TkOb">August 15, 2019</span></div>Thanks for your feedback. We are sorry to hear you're having trouble with the app. This is a known issue and our team has fixed it. Please restart the app and let us know at support@education.com if you have any further trouble. Thanks!</div>`

I want to retrieve the text starting from "Thanks for your feedback". I used the following code to retrieve the text:

response = a.find('div',{'class':'LVQB0b'}).get_text()

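However, the command above also returns unwanted text, namely 'Education.com' and the date. I am not sure how to retrieve text from a div tag that has no class name, as in the example above. Looking forward to your guidance.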

3 answers

西门旻

2023-03-14

You can also use next_sibling or find_next_sibling(text=True).

from bs4 import BeautifulSoup

html= '''<div class="LVQB0b"><div class="QoPmEb"></div><div><span class="X43Kjb">Education.com</span><span class="p2TkOb">August 15, 2019</span></div>Thanks for your feedback. We are sorry to hear you're having trouble with the app. This is a known issue and our team has fixed it. Please restart the app and let us know at support@education.com if you have any further trouble. Thanks!</div>'''
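soup = BeautifulSoup(html, 'html.parser')
# find_next('div') moves from the empty QoPmEb div to the unnamed div;
# the reply text is the node immediately after that div.
print(soup.find('div',class_='QoPmEb').find_next('div').next_sibling)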

from bs4 import BeautifulSoup

html= '''<div class="LVQB0b"><div class="QoPmEb"></div><div><span class="X43Kjb">Education.com</span><span class="p2TkOb">August 15, 2019</span></div>Thanks for your feedback. We are sorry to hear you're having trouble with the app. This is a known issue and our team has fixed it. Please restart the app and let us know at support@education.com if you have any further trouble. Thanks!</div>'''
soup = BeautifulSoup(html, 'html.parser')
# text=True restricts the sibling search to text nodes, skipping any tags.
print(soup.find('div',class_='QoPmEb').find_next('div').find_next_sibling(text=True))
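
For reference, here is a minimal sketch of the same sibling lookup written with string=True. It assumes Beautiful Soup 4.4.0 or later, where `string` is the documented name for the older `text` argument. The shortened HTML and the names `meta_div` and `reply` are illustrative and not from the original post.

```python
from bs4 import BeautifulSoup

# Shortened version of the markup from the question (same structure, shorter reply text).
html = ('<div class="LVQB0b"><div class="QoPmEb"></div>'
        '<div><span class="X43Kjb">Education.com</span>'
        '<span class="p2TkOb">August 15, 2019</span></div>'
        'Thanks for your feedback. Please restart the app. Thanks!</div>')
soup = BeautifulSoup(html, 'html.parser')

# The unnamed div holding the two spans is the next sibling of the empty QoPmEb div.
meta_div = soup.find('div', class_='QoPmEb').find_next_sibling('div')

# string=True matches the first text node among the following siblings.
reply = meta_div.find_next_sibling(string=True)
print(reply.strip())
```

Using find_next_sibling('div') instead of find_next('div') keeps the lookup at the same nesting level, which may be slightly more robust if the empty div ever gains nested content.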

蓟捷

2023-03-14

The unwanted text is part of the span elements. You can find those elements and remove their text from the result.

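# `a` is assumed to be the parsed BeautifulSoup object
response = a.find('div',{'class':'LVQB0b'}).get_text()
# Select every span inside the outer div and strip its text from the result
unwanted = a.select('.LVQB0b span')
for el in unwanted:
    response = response.replace(el.get_text(), '')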
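
A self-contained sketch of this approach, assuming `a` in the snippet above is the parsed BeautifulSoup object. The shortened HTML is illustrative. One caveat: str.replace removes every occurrence of the span text, so a reply that itself contained the exact string 'Education.com' would lose it as well.

```python
from bs4 import BeautifulSoup

# Shortened version of the markup from the question (same structure, shorter reply text).
html = ('<div class="LVQB0b"><div class="QoPmEb"></div>'
        '<div><span class="X43Kjb">Education.com</span>'
        '<span class="p2TkOb">August 15, 2019</span></div>'
        'Thanks for your feedback. Write to support@education.com. Thanks!</div>')
a = BeautifulSoup(html, 'html.parser')

# get_text() concatenates the text of all descendants, spans included.
response = a.find('div', {'class': 'LVQB0b'}).get_text()

# Remove the text of each span from the combined string.
for el in a.select('.LVQB0b span'):
    response = response.replace(el.get_text(), '')

response = response.strip()
print(response)
```

The lowercase 'education.com' in the e-mail address survives here only because str.replace is case-sensitive.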

斜浩穰

2023-03-14

Use find(text=True, recursive=False).

Example:

from bs4 import BeautifulSoup

s = '''<div class="LVQB0b"><div class="QoPmEb"></div><div><span class="X43Kjb">Education.com</span><span class="p2TkOb">August 15, 2019</span></div>Thanks for your feedback. We are sorry to hear you're having trouble with the app. This is a known issue and our team has fixed it. Please restart the app and let us know at support@education.com if you have any further trouble. Thanks!</div>'''    
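html = BeautifulSoup(s, 'html.parser')
# recursive=False limits the search to direct children of the outer div;
# text=True returns the first text node among them.
print(html.find('div',{'class':'LVQB0b'}).find(text=True, recursive=False))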

Output:

Thanks for your feedback. We are sorry to hear you're having trouble with the app. This is a known issue and our team has fixed it. Please restart the app and let us know at support@education.com if you have any further trouble. Thanks!
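
Note that find returns only the first direct text node. If the reply were split into several text nodes (for example by a line break tag), a variant with find_all could collect all of them. The sketch below uses hypothetical markup with a `<br/>` that is not in the original page, and `string=True`, the newer name for `text=True` in Beautiful Soup 4.4.0 and later.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the reply is split into two direct text nodes by a <br/>.
s = ('<div class="LVQB0b"><div><span class="X43Kjb">Education.com</span></div>'
     'Thanks for your feedback.<br/>Please restart the app.</div>')
soup = BeautifulSoup(s, 'html.parser')
outer = soup.find('div', {'class': 'LVQB0b'})

# recursive=False restricts the search to direct children;
# string=True keeps only text nodes, so the nested span text is skipped.
parts = outer.find_all(string=True, recursive=False)

# Join the non-empty pieces with a single space.
reply = ' '.join(part.strip() for part in parts if part.strip())
print(reply)
```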
