whalebot C/C++ web crawler

翟曦
2023-12-01

Summary

Whalebot is an open-source web crawler. It is intended to be simple, fast, and memory-efficient. It was created as a targeted spider, but you may also use it as a general-purpose crawler.

Current release 0.02

Current state: bold means done, normal means TODO.

If something is broken or you have an idea, please visit http://groups.google.com/group/whalebot

Usages

  • It was used to collect papers on a target topic from http://citeseerx.ist.psu.edu for my master's degree work
  • Candidates for the logo were collected using whalebot
  • Eating our own dog food (links for the URL parsing benchmark)

Features

  • Simple configuration from command line and text files
  • Start/Stop/Resume fetching sessions

Posted on 2012-06-24 08:53 by lexus

Reposted from: https://www.cnblogs.com/lexus/archive/2012/06/24/2559703.html
