Software Overview
htmlcxx is an HTML parser and a CSS1 parser for C++. Its parsing rules attempt
to mimic the behavior of Mozilla Firefox, so you should expect parse trees
similar to those Firefox creates. However, it does not insert elements that
are missing from your HTML, so serializing the DOM tree gives exactly the same
output as the original HTML document. Another key feature is an STL-like tree
navigation API, provided by the tree.hh template library.
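The round-trip property can be illustrated without the library. The following is a minimal sketch, not htmlcxx's implementation: a hypothetical `splitNodes` function cuts the input into tag and text pieces without normalizing or inserting anything, and a hypothetical `serialize` function concatenates them again, so the output matches the input byte for byte. The `Piece` struct and both function names are assumptions made for this illustration only.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a parsed node: the raw source text of one
// tag or one run of text, kept exactly as it appeared in the input.
struct Piece {
    bool isTag;
    std::string text;
};

// Split HTML into tag pieces ("<...>") and text pieces without
// normalizing, repairing, or inserting anything.
std::vector<Piece> splitNodes(const std::string& html) {
    std::vector<Piece> pieces;
    std::string::size_type pos = 0;
    while (pos < html.size()) {
        if (html[pos] == '<') {
            // A tag runs up to and including the next '>'; an
            // unterminated tag keeps the rest of the input.
            std::string::size_type close = html.find('>', pos);
            std::string::size_type end =
                (close == std::string::npos) ? html.size() : close + 1;
            pieces.push_back({true, html.substr(pos, end - pos)});
            pos = end;
        } else {
            // Text runs up to the next '<' or the end of the input.
            std::string::size_type open = html.find('<', pos);
            std::string::size_type end =
                (open == std::string::npos) ? html.size() : open;
            pieces.push_back({false, html.substr(pos, end - pos)});
            pos = end;
        }
    }
    return pieces;
}

// Concatenate the pieces in order; nothing was dropped or added, so
// the result equals the original input.
std::string serialize(const std::vector<Piece>& pieces) {
    std::string out;
    for (const Piece& p : pieces) out += p.text;
    return out;
}
```

Because no closing tags are synthesized, even malformed input such as `<p>hey<br>unclosed` survives the round trip unchanged.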
Example code:
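#include <iostream>
#include <string>
#include <htmlcxx/html/ParserDom.h>
using namespace std;
using namespace htmlcxx;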
...
//Parse some html code
string html = "<html><body>hey</body></html>";
HTML::ParserDom parser;
tree<HTML::Node> dom = parser.parseTree(html);
//Print whole DOM tree
cout << dom << endl;
//Dump all links in the tree
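tree<HTML::Node>::iterator it = dom.begin();
tree<HTML::Node>::iterator end = dom.end();
for (; it != end; ++it)
{
    //Note: this matches "A" exactly; lowercase <a> tags may need a
    //case-insensitive comparison
    if (it->tagName() == "A")
    {
        it->parseAttributes();
        //attribute() returns a (found, value) pair
        cout << it->attribute("href").second << endl;
    }
}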
//Dump all text of the document
it = dom.begin();
end = dom.end();
for (; it != end; ++it)
{
    if ((!it->isTag()) && (!it->isComment()))
    {
        cout << it->text();
    }
}
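The link loop above compares `tagName()` with `"A"` exactly. Assuming tag names are reported in whatever case the source document uses (an assumption, not something stated here), a case-insensitive comparison is more robust. The helper below is a hypothetical sketch, and `tagNameEquals` is not part of htmlcxx.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of htmlcxx): compares two tag names
// while ignoring ASCII case.
bool tagNameEquals(const std::string& a, const std::string& b) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        // Cast to unsigned char so std::tolower is well defined for
        // every char value.
        if (std::tolower(static_cast<unsigned char>(a[i])) !=
            std::tolower(static_cast<unsigned char>(b[i]))) {
            return false;
        }
    }
    return true;
}
```

With it, the link test could read `if (tagNameEquals(it->tagName(), "a"))`.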