
Smart HTML parsing with newspaper, a Python crawler framework

傅峻
2023-12-01

How do you use newspaper to parse web pages intelligently?

Installation

pip3 install newspaper3k

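To use newspaper as the page downloader as well, follow the example from the official site:

from newspaper import Article

# Example URL from the official documentation
url = 'http://fox13now.com/2013/12/30/new-year-new-laws-obamacare-pot-guns-and-drones/'
article = Article(url)
article.download()  # fetch the page HTML
article.parse()     # extract the title, publish date, and body text
print(article.title)
print(article.publish_date)
print(article.text)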

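scrapy + newspaper: pass the HTML that Scrapy has already fetched straight to newspaper for parsing

from newspaper import Article
from newspaper import Config

config = Config()
config.follow_meta_refresh = True
config.language = 'zh'  # treat the page as Chinese
# The URL can be left empty because the HTML is supplied directly.
article = Article('', config=config)
# html is the page source Scrapy already fetched, e.g. response.text in a spider callback
article.download(input_html=html)
article.parse()
print(article.title)
print(article.publish_date)
print(article.text)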
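
The steps above can be wrapped in a small helper so that a spider callback gets the parsed fields back as a dict. This is only a minimal sketch: the names extract_article and article_factory are hypothetical and not part of newspaper. The default factory builds the same Article and Config as in the snippet above, and the optional article_factory argument is an assumption added here so the helper can be tried out without newspaper3k installed.

```python
from typing import Any, Callable, Dict, Optional


def _default_article_factory(url: str, language: str) -> Any:
    # Imported lazily so this module loads even if newspaper3k is missing.
    from newspaper import Article, Config

    config = Config()
    config.follow_meta_refresh = True
    config.language = language
    return Article(url, config=config)


def extract_article(
    html: str,
    url: str = "",
    language: str = "zh",
    article_factory: Optional[Callable[[str, str], Any]] = None,
) -> Dict[str, Any]:
    """Parse already-fetched HTML and return title, publish date, and text.

    extract_article and article_factory are hypothetical names used for
    illustration; they are not part of the newspaper API.
    """
    factory = article_factory or _default_article_factory
    article = factory(url, language)
    # Hand over the HTML that was already fetched instead of the URL.
    article.download(input_html=html)
    article.parse()
    return {
        "title": article.title,
        "publish_date": article.publish_date,
        "text": article.text,
    }
```

In a Scrapy callback this would presumably be called as extract_article(response.text, url=response.url), and the returned dict could be yielded as the item.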
