Other

Applications that use Beautiful Soup

Lots of real-world applications use Beautiful Soup. Here are the publicly visible applications that I know about:

  • Scrape 'N' Feed is designed to work with Beautiful Soup to build RSS feeds for sites that don't have them.
  • htmlatex uses Beautiful Soup to find LaTeX equations and render them as graphics.
  • chmtopdf converts CHM files to PDF format. Who am I to argue with that?
  • Duncan Gough's Fotopic backup uses Beautiful Soup to scrape the Fotopic website.
  • Iñigo Serna's googlenews.py uses Beautiful Soup to scrape Google News (it's in the parse_entry and parse_category functions).
  • The Weather Office Screen Scraper uses Beautiful Soup to scrape the Canadian government's weather office site.
  • News Clues uses Beautiful Soup to parse RSS feeds.
  • BlinkFlash uses Beautiful Soup to automate form submission for an online service.
  • The linky link checker uses Beautiful Soup to find a page's links and images that need checking.
  • Matt Croydon got Beautiful Soup 1.x to work on his Nokia Series 60 smartphone. C.R. Sandeep wrote a real-time currency converter for the Series 60 using Beautiful Soup, but he won't show us how he did it.
  • Here's a short script from jacobian.org to fix the metadata on music files downloaded from allofmp3.com.
  • The Python Community Server uses Beautiful Soup in its spam detector.
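Most of the applications above follow the same basic pattern: fetch a page, hand the markup to Beautiful Soup, and pull out the tags you care about. As a rough illustration of that pattern (not code from any of the projects listed), here is a minimal sketch that collects a page's links and images, which is roughly the starting point for a link checker like linky. It uses the modern bs4 package; the applications above were written against Beautiful Soup 1.x-3.x, whose import and method names differ.

    from bs4 import BeautifulSoup

    html = """
    <html><body>
      <a href="http://example.com/page">a link</a>
      <img src="/images/logo.png">
      <a href="broken.html">another link
    </body></html>
    """

    soup = BeautifulSoup(html, "html.parser")

    # Collect the href of every <a> and the src of every <img>.
    links = [a.get("href") for a in soup.find_all("a")]
    images = [img.get("src") for img in soup.find_all("img")]

    print(links)   # ['http://example.com/page', 'broken.html']
    print(images)  # ['/images/logo.png']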

Similar libraries

I've found several other parsers for various languages that can handle bad markup, do tree traversal for you, or are otherwise more useful than your average parser.

  • I've ported Beautiful Soup to Ruby. The result is Rubyful Soup.
  • Hpricot is giving Rubyful Soup a run for its money.
  • ElementTree is a fast Python XML parser with a bad attitude. I love it.
  • Tag Soup is an XML/HTML parser written in Java which rewrites bad HTML into parseable HTML.
  • HtmlPrag is a Scheme library for parsing bad HTML.
  • xmltramp is a nice take on a 'standard' XML/XHTML parser. Like most parsers, it makes you traverse the tree yourself, but it's easy to use.
  • pullparser includes a tree-traversal method.
  • Mike Foord didn't like the way Beautiful Soup can change HTML if you write the tree back out, so he wrote HTML Scraper. It's basically a version of HTMLParser that can handle bad HTML. It might be obsolete with the release of Beautiful Soup 3.0, though; I'm not sure.
  • Ka-Ping Yee's scrape.py combines page scraping with URL opening.
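What most of these libraries have in common, and what sets them apart from a strict parser, is that they accept bad markup and (in Beautiful Soup's case) search the parse tree for you rather than making you traverse it yourself. A minimal sketch of both points, again using the modern bs4 package rather than the Beautiful Soup 3 API:

    from bs4 import BeautifulSoup

    # Mismatched tags: the <i> is never closed and </b> arrives too early.
    bad_html = "<b>bold <i>bold italic</b> trailing text"

    soup = BeautifulSoup(bad_html, "html.parser")

    # The broken markup is still turned into a usable tree...
    print(soup.prettify())

    # ...which can be queried directly instead of walked node by node.
    print(soup.find("i").get_text())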
