Other

Applications that use Beautiful Soup

Lots of real-world applications use Beautiful Soup. Here are the publicly visible applications that I know about:

  • Scrape 'N' Feed is designed to work with Beautiful Soup to build RSS feeds for sites that don't have them.
  • htmlatex uses Beautiful Soup to find LaTeX equations and render them as graphics.
  • chmtopdf converts CHM files to PDF format. Who am I to argue with that?
  • Duncan Gough's Fotopic backup uses Beautiful Soup to scrape the Fotopic website.
  • Iñigo Serna's googlenews.py uses Beautiful Soup to scrape Google News (it's in the parse_entry and parse_category functions).
  • The Weather Office Screen Scraper uses Beautiful Soup to scrape the Canadian government's weather office site.
  • News Clues uses Beautiful Soup to parse RSS feeds.
  • BlinkFlash uses Beautiful Soup to automate form submission for an online service.
  • The linky link checker uses Beautiful Soup to find a page's links and images that need checking.
  • Matt Croydon got Beautiful Soup 1.x to work on his Nokia Series 60 smartphone. C.R. Sandeep wrote a real-time currency converter for the Series 60 using Beautiful Soup, but he won't show us how he did it.
  • Here's a short script from jacobian.org to fix the metadata on music files downloaded from allofmp3.com.
  • The Python Community Server uses Beautiful Soup in its spam detector.
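Most of the applications above follow the same basic pattern: fetch a page, hand the markup to Beautiful Soup, and pull out the tags you care about. As a rough illustration of that pattern (not code from any of the projects listed), here is a minimal sketch that collects a page's links and images, which is roughly the starting point for a link checker like linky. It uses the modern bs4 package; the applications above were written against Beautiful Soup 1.x-3.x, whose import and method names differ.

    from bs4 import BeautifulSoup

    html = """
    <html><body>
      <a href="http://example.com/page">a link</a>
      <img src="/images/logo.png">
      <a href="broken.html">another link
    </body></html>
    """

    soup = BeautifulSoup(html, "html.parser")

    # Collect the href of every <a> and the src of every <img>.
    links = [a.get("href") for a in soup.find_all("a")]
    images = [img.get("src") for img in soup.find_all("img")]

    print(links)   # ['http://example.com/page', 'broken.html']
    print(images)  # ['/images/logo.png']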

Similar libraries

I've found several other parsers for various languages that can handle bad markup, do tree traversal for you, or are otherwise more useful than your average parser.

  • I've ported Beautiful Soup to Ruby. The result is Rubyful Soup.
  • Hpricot is giving Rubyful Soup a run for its money.
  • ElementTree is a fast Python XML parser with a bad attitude. I love it.
  • Tag Soup is an XML/HTML parser written in Java which rewrites bad HTML into parseable HTML.
  • HtmlPrag is a Scheme library for parsing bad HTML.
  • xmltramp is a nice take on a 'standard' XML/XHTML parser. Like most parsers, it makes you traverse the tree yourself, but it's easy to use.
  • pullparser includes a tree-traversal method.
  • Mike Foord didn't like the way Beautiful Soup can change HTML if you write the tree back out, so he wrote HTML Scraper. It's basically a version of HTMLParser that can handle bad HTML. It might be obsolete with the release of Beautiful Soup 3.0, though; I'm not sure.
  • Ka-Ping Yee's scrape.py combines page scraping with URL opening.
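What most of these libraries have in common, and what sets them apart from a strict parser, is that they accept bad markup and (in Beautiful Soup's case) search the parse tree for you rather than making you traverse it yourself. A minimal sketch of both points, again using the modern bs4 package rather than the Beautiful Soup 3 API:

    from bs4 import BeautifulSoup

    # Mismatched tags: the <i> is never closed and </b> arrives too early.
    bad_html = "<b>bold <i>bold italic</b> trailing text"

    soup = BeautifulSoup(bad_html, "html.parser")

    # The broken markup is still turned into a usable tree...
    print(soup.prettify())

    # ...which can be queried directly instead of walked node by node.
    print(soup.find("i").get_text())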
