PHP Simple HTML DOM Parser Manual (Chinese Edition)

赫连晋

2023-12-01


Quick Start


 
// Create a DOM object from a URL or a file
$html = file_get_html('http://www.google.cn/');

// Find all <img> tags
foreach ($html->find('img') as $element)
    echo $element->src . '<br>';

// Find all links
foreach ($html->find('a') as $element)
    echo $element->href . '<br>';
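For an offline point of comparison, here is a minimal sketch of the same two lookups (every <img> src, then every <a> href) written with PHP's bundled DOM extension (DOMDocument) rather than simple_html_dom. The inline HTML string is made-up sample data, and the sketch assumes the dom extension is enabled, as it is in standard PHP builds.

```php
<?php
// Offline comparison sketch using PHP's bundled DOM extension, not simple_html_dom:
// collect every <img> src and every <a> href from an HTML string.
libxml_use_internal_errors(true); // collect parser warnings instead of printing them

$doc = new DOMDocument();
$doc->loadHTML('<html><body><a href="/one">1</a><img src="logo.png"><a href="/two">2</a></body></html>');

$srcs = [];
foreach ($doc->getElementsByTagName('img') as $img) {
    $srcs[] = $img->getAttribute('src');
}

$hrefs = [];
foreach ($doc->getElementsByTagName('a') as $a) {
    $hrefs[] = $a->getAttribute('href');
}

// Print them the same way as the loops above
foreach (array_merge($srcs, $hrefs) as $url) {
    echo $url . '<br>';
}
```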

 
// Create a DOM object from a string
$html = str_get_html('<div id="hello">Hello</div><div id="world">World</div>');

// Give the second <div> (index 1) the class "bar"
$html->find('div', 1)->class = 'bar';

// Replace the inner content of the first <div> whose id is "hello"
$html->find('div[id=hello]', 0)->innertext = 'foo';

echo $html; // Output: <div id="hello">foo</div><div id="world" class="bar">World</div>

 
// Extract the content from HTML (without tags)
echo file_get_html('http://www.google.com/')->plaintext;

 
// Create a DOM object from a URL
$html = file_get_html('http://slashdot.org/');

// Find all article blocks
$articles = array();
foreach ($html->find('div.article') as $article) {
    $item['title']   = $article->find('div.title', 0)->plaintext;
    $item['intro']   = $article->find('div.intro', 0)->plaintext;
    $item['details'] = $article->find('div.details', 0)->plaintext;
    $articles[] = $item;
}

print_r($articles);
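The loop above chains ->plaintext directly onto find(..., 0), which returns null when an article block has no matching <div>; in PHP 8, reading a property of null emits a warning. Below is a minimal sketch of a guard using a hypothetical helper named plaintext_or(), which is not part of simple_html_dom. stdClass objects stand in for parsed elements.

```php
<?php
// Hypothetical helper, not part of simple_html_dom: read ->plaintext from the
// result of find($selector, 0), or fall back to $default when that result is null.
function plaintext_or(?object $node, string $default = ''): string
{
    if ($node === null) {
        return $default;              // the selector matched nothing
    }
    return (string) $node->plaintext;
}

// stdClass stands in for a parsed element in this demo
$title = new stdClass();
$title->plaintext = 'A title';

echo plaintext_or($title), "\n";              // A title
echo plaintext_or(null, '(no intro)'), "\n";  // (no intro)
```

Inside the loop it would replace the direct chain, for example `$item['intro'] = plaintext_or($article->find('div.intro', 0));`.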
   

How to create an HTML DOM object?


 
// Create a DOM object from a string
$html = str_get_html('<html><body>Hello!</body></html>');

// Create a DOM object from a URL
$html = file_get_html('http://www.google.com/');

// Create a DOM object from an HTML file
$html = file_get_html('test.htm');

 
// Create a DOM object
$html = new simple_html_dom();

// Load HTML from a string
$html->load('<html><body>Hello!</body></html>');

// Load HTML from a URL
$html->load_file('http://www.google.cn/');

// Load HTML from a file
$html->load_file('test.htm');

// Output
echo $html;
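PHP's bundled DOM extension draws the same string-versus-file distinction. The comparison sketch below is not simple_html_dom code: DOMDocument::loadHTML() parses a string and DOMDocument::loadHTMLFile() parses a file. A temporary file stands in for test.htm, which is an assumption made so the demo can run anywhere.

```php
<?php
// Comparison sketch with PHP's bundled DOM extension, not simple_html_dom:
// loadHTML() parses a string and loadHTMLFile() parses a file.
libxml_use_internal_errors(true); // collect parser warnings instead of printing them

$markup = '<html><body>Hello!</body></html>';

// From a string (compare $html->load(...))
$fromString = new DOMDocument();
$fromString->loadHTML($markup);

// From a file (compare $html->load_file('test.htm')); a temp file stands in for test.htm
$path = tempnam(sys_get_temp_dir(), 'htm');
file_put_contents($path, $markup);
$fromFile = new DOMDocument();
$fromFile->loadHTMLFile($path);
unlink($path);

echo $fromString->getElementsByTagName('body')->item(0)->textContent, "\n"; // Hello!
echo $fromFile->getElementsByTagName('body')->item(0)->textContent, "\n";   // Hello!
```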

How to find HTML elements?


 
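// Find all anchors; returns an array of element objects
$ret = $html->find('a');

// Find the Nth anchor (zero-based); returns an element object, or null if not found
$ret = $html->find('a', 0);

// Find the last anchor; returns an element object, or null if not found
$ret = $html->find('a', -1);

// Find all <div> elements that have an id attribute
$ret = $html->find('div[id]');

// Find all <div> elements whose id attribute equals foo
$ret = $html->find('div[id=foo]');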
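The index argument of find() follows the rules in the comments above. The hypothetical pick_match() below sketches those rules applied to a plain PHP array of matches. It assumes that negative indexes other than -1 also count from the end; the text above only shows this for -1.

```php
<?php
// Hypothetical helper illustrating the documented find($selector, $idx) index rules:
// no index -> array of all matches; 0 -> first; -1 -> last; nothing there -> null.
// Assumption: other negative indexes count from the end as well.
function pick_match(array $matches, ?int $idx = null)
{
    if ($idx === null) {
        return $matches;                // like find('a')
    }
    if ($idx < 0) {
        $idx = count($matches) + $idx;  // -1 becomes the last position
    }
    return $matches[$idx] ?? null;      // out of range gives null, not an error
}

$anchors = ['first', 'second', 'third'];
echo pick_match($anchors, 0), "\n";   // first
echo pick_match($anchors, -1), "\n";  // third
var_dump(pick_match($anchors, 5));    // NULL
```

Because a miss yields null rather than an error, check the result before chaining further calls such as ->find() or ->plaintext onto it.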

 
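// Find all elements whose id is foo
$ret = $html->find('#foo');

// Find all elements whose class is foo
$ret = $html->find('.foo');

// Find all elements that have an id attribute
$ret = $html->find('*[id]');

// Find all anchors and images
$ret = $html->find('a, img');

// Find all anchors and images that have a title attribute
$ret = $html->find('a[title], img[title]');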

Attribute filters support the following operators:
    
 
    
 

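Filter	Description
[attribute]	Matches elements that have the specified attribute.
[!attribute]	Matches elements that do not have the specified attribute.
[attribute=value]	Matches elements whose specified attribute equals the given value.
[attribute!=value]	Matches all elements except those whose specified attribute equals the given value.
[attribute^=value]	Matches elements whose specified attribute value begins with the given value.
[attribute$=value]	Matches elements whose specified attribute value ends with the given value.
[attribute*=value]	Matches elements whose specified attribute value contains the given value.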
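To make the operator semantics concrete, here is a hypothetical sketch (PHP 8.0+ for match and str_starts_with()) that evaluates one filter against an element's attributes, given as a name => value array. It is not simple_html_dom's own matching code. It also interprets the [attribute!=value] row to include elements that lack the attribute.

```php
<?php
// Hypothetical helper (PHP 8.0+), illustrating the filter table; not simple_html_dom's
// own matching code. $attrs maps attribute name => value for one element.
function attr_filter_matches(array $attrs, string $name, string $op = '', string $value = ''): bool
{
    $has = array_key_exists($name, $attrs);
    $actual = $has ? (string) $attrs[$name] : '';

    return match ($op) {
        ''   => $has,                                      // [attribute]
        '!'  => !$has,                                     // [!attribute]
        '='  => $has && $actual === $value,                // [attribute=value]
        '!=' => !$has || $actual !== $value,               // [attribute!=value]
        '^=' => $has && str_starts_with($actual, $value),  // [attribute^=value]
        '$=' => $has && str_ends_with($actual, $value),    // [attribute$=value]
        '*=' => $has && str_contains($actual, $value),     // [attribute*=value]
        default => throw new InvalidArgumentException("Unsupported operator: $op"),
    };
}

// Example: keep only the elements that would match a[href$=.pdf]
$links = [
    ['href' => 'report.pdf'],
    ['href' => 'index.html'],
    ['title' => 'no href'],
];
$pdfs = array_values(array_filter(
    $links,
    fn (array $attrs): bool => attr_filter_matches($attrs, 'href', '$=', '.pdf')
));
print_r($pdfs);
```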

 
// Find all <li> descendants of <ul> elements
$es = $html->find('ul li');

// Find nested <div> tags (a <div> inside a <div> inside a <div>)
$es = $html->find('div div div');

// Find all <td> descendants of <table class="hello">
$es = $html->find('table.hello td');

// Find all <td> elements with align=center inside a <table>
$es = $html->find('table td[align=center]');
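Descendant selectors match at any depth, not only direct children. The comparison sketch below uses PHP's bundled DOM extension (DOMXPath), not simple_html_dom, with each selector above translated by hand into XPath. The sample markup is made up. The class test uses the common XPath idiom for matching one class in a space-separated class list.

```php
<?php
// Comparison sketch (bundled DOM extension, not simple_html_dom): the selectors above,
// hand-translated to XPath and run against made-up sample markup.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML('<html><body>'
    . '<ul><li>one</li><li>two<ul><li>nested</li></ul></li></ul>'
    . '<table class="hello"><tr><td align="center">c1</td><td>c2</td></tr></table>'
    . '<table><tr><td align="center">c3</td></tr></table>'
    . '</body></html>');
$xpath = new DOMXPath($doc);

// 'ul li': every <li> at any depth below a <ul> (includes the nested one)
$ulLi = $xpath->query('//ul//li');

// 'table.hello td': <td> below a <table> whose class list contains "hello"
$helloTd = $xpath->query("//table[contains(concat(' ', normalize-space(@class), ' '), ' hello ')]//td");

// 'table td[align=center]': <td align="center"> below any <table>
$centerTd = $xpath->query("//table//td[@align='center']");

printf("ul li: %d, table.hello td: %d, table td[align=center]: %d\n",
    $ulLi->length, $helloTd->length, $centerTd->length);
// ul li: 3, table.hello td: 2, table td[align=center]: 2
```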

 
// Find all text blocks
$es = $html->find('text');

// Find all comment (<!--...-->) blocks
$es = $html->find('comment');
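In XPath terms, these correspond to the text() and comment() node tests. The comparison sketch below uses PHP's bundled DOM extension on made-up markup. It is not simple_html_dom code.

```php
<?php
// Comparison sketch (bundled DOM extension, not simple_html_dom): the XPath node tests
// text() and comment() pick out text and comment blocks.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML('<html><body><p>Hi <b>there</b></p><!-- note --></body></html>');
$xpath = new DOMXPath($doc);

$texts = [];
foreach ($xpath->query('//body//text()') as $node) {
    $texts[] = $node->nodeValue;          // text nodes inside <body>
}

$comments = [];
foreach ($xpath->query('//comment()') as $node) {
    $comments[] = trim($node->nodeValue); // comment content without <!-- and -->
}

print_r($texts);     // "Hi " and "there"
print_r($comments);  // "note"
```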

 
// Find all <li> elements inside each <ul>
foreach ($html->find('ul') as $ul) {
    foreach ($ul->find('li') as $li) {
        // Do something here...
    }
}

// Find the first <li> inside the first <ul>
$e = $html->find('ul', 0)->find('li', 0);

How to access the attributes of an HTML element?


 
// Get an attribute (for a valueless attribute such as checked or selected, returns true or false)
$value = $e->href;

// Set an attribute (for a valueless attribute such as checked or selected, set it to true or false)
$e->href = 'my link';

// Remove an attribute by setting its value to null
$e->href = null;

// Check whether an attribute exists
if (isset($e->href))
    echo 'href exists!';
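PHP's bundled DOM extension exposes the same four operations as explicit DOMElement methods. The comparison sketch below is not simple_html_dom code, and its markup is made up. For a valueless attribute such as checked, it tests presence with hasAttribute() rather than relying on the attribute's value.

```php
<?php
// Comparison sketch (bundled DOM extension, not simple_html_dom): get, set, test and
// remove an attribute with explicit DOMElement methods.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML('<html><body><a href="old.html">x</a><input type="checkbox" checked></body></html>');
$a = $doc->getElementsByTagName('a')->item(0);

$value = $a->getAttribute('href');       // get: "old.html" ("" when absent)
$a->setAttribute('href', 'my link');     // set
$newValue = $a->getAttribute('href');    // "my link"
$exists = $a->hasAttribute('href');      // like isset($e->href)
$a->removeAttribute('href');             // like $e->href = null

// Valueless attribute: test for presence rather than reading its value
$input = $doc->getElementsByTagName('input')->item(0);
$isChecked = $input->hasAttribute('checked');

var_dump($value, $newValue, $exists, $a->hasAttribute('href'), $isChecked);
```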

 
// Example
$html = str_get_html("<div>foo <b>bar</b></div>");
$e = $html->find("div", 0);

echo $e->tag;       // Returns: "div"
echo $e->outertext; // Returns: "<div>foo <b>bar</b></div>"
echo $e->innertext; // Returns: "foo <b>bar</b>"
echo $e->plaintext; // Returns: "foo bar"

 

Property	Usage
$e->tag	Read or write the tag name of the element.
$e->outertext	Read or write the outer HTML text of the element.
$e->innertext	Read or write the inner HTML text of the element.
$e->plaintext	Read or write the plain text of the element.
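For comparison, the four properties in the table can be approximated with PHP's bundled DOM extension. nodeName stands in for tag, saveHTML($node) for outertext, the serialized children for innertext, and textContent for plaintext. This is a sketch, not simple_html_dom code. inner_html() is a hypothetical helper, and the markup reuses the <div>foo <b>bar</b></div> example above.

```php
<?php
// Comparison sketch (bundled DOM extension, not simple_html_dom): approximating
// tag / outertext / innertext / plaintext for "<div>foo <b>bar</b></div>".

// Hypothetical helper: serialize each child node and join them (an "inner HTML")
function inner_html(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML('<html><body><div>foo <b>bar</b></div></body></html>');
$div = $doc->getElementsByTagName('div')->item(0);

$tag   = $div->nodeName;        // like $e->tag
$outer = $doc->saveHTML($div);  // like $e->outertext
$inner = inner_html($div);      // like $e->innertext
$plain = $div->textContent;     // like $e->plaintext

echo "$tag\n$outer\n$inner\n$plain\n";
```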

 
// Extract the text content from HTML
echo $html->plaintext;

// Wrap an element
$e->outertext = '<div class="wrap">' . $e->outertext . '</div>';

// Remove an element: set its outertext to an empty string
$e->outertext = '';

// Append an element after it
$e->outertext = $e->outertext . '<div>foo</div>';

// Insert an element before it
$e->outertext = '<div>foo</div>' . $e->outertext;
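The string edits above can also be written as node operations. The comparison sketch below assumes PHP 8.0+ for replaceWith(), before(), after() and remove() on DOMElement, and uses PHP's bundled DOM extension rather than simple_html_dom. The markup and the ids x and gone are made up.

```php
<?php
// Comparison sketch (PHP 8.0+, bundled DOM extension, not simple_html_dom):
// wrap, append, insert and remove as node operations.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML('<html><body><p id="x">hi</p><p id="gone">bye</p></body></html>');
$body = $doc->getElementsByTagName('body')->item(0);
$p = $doc->getElementById('x');

// Wrap: put a <div class="wrap"> where <p> was, then move <p> inside it
$wrap = $doc->createElement('div');
$wrap->setAttribute('class', 'wrap');
$p->replaceWith($wrap);
$wrap->appendChild($p);

// Append after / insert before the wrapper
$wrap->after($doc->createElement('div', 'after'));
$wrap->before($doc->createElement('div', 'before'));

// Remove an element entirely
$doc->getElementById('gone')->remove();

// Serialize the body's children to inspect the result
$result = '';
foreach ($body->childNodes as $child) {
    $result .= $doc->saveHTML($child);
}
echo $result, "\n";
```

Working on nodes avoids hand-writing closing tags, which is where string concatenation on outertext is easiest to get wrong.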

How to traverse the DOM tree?


 
// If you are not familiar with the HTML DOM, it helps to read an introduction to it first.

// Example
echo $html->find("#div1", 0)->children(1)->children(1)->children(2)->id;

// Or
echo $html->getElementById("div1")->childNodes(1)->childNodes(1)->childNodes(2)->getAttribute('id');

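You can also call these methods by their camelCase names; the second example above does this with getElementById(), childNodes() and getAttribute().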
    

Method	Description
mixed $e->children ( [int $index] )	Returns the Nth child object if index is set; otherwise returns an array of children.
element $e->parent ()	Returns the parent of the element.
element $e->first_child ()	Returns the first child of the element, or null if not found.
element $e->last_child ()	Returns the last child of the element, or null if not found.
element $e->next_sibling ()	Returns the next sibling of the element, or null if not found.
element $e->prev_sibling ()	Returns the previous sibling of the element, or null if not found.
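For comparison, here is a sketch of the same kind of navigation using PHP's bundled DOM extension, not simple_html_dom. It requires PHP 8.0+ for firstElementChild, lastElementChild, previousElementSibling and nextElementSibling. The sketch assumes, as in simple_html_dom, that children(n) counts element children only and skips the text between tags. element_child() is a hypothetical helper written for that purpose.

```php
<?php
// Comparison sketch (PHP 8.0+, bundled DOM extension, not simple_html_dom).

// Hypothetical helper: the $index-th *element* child (0-based), skipping text and
// comment nodes, or null when there is no such child
function element_child(DOMElement $el, int $index): ?DOMElement
{
    $i = 0;
    foreach ($el->childNodes as $child) {
        if ($child instanceof DOMElement) {
            if ($i === $index) {
                return $child;
            }
            $i++;
        }
    }
    return null;
}

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML('<html><body><div id="div1"><p id="a">A</p> <p id="b">'
    . '<span id="c">C</span><span id="d">D</span></p></div></body></html>');
$div1 = $doc->getElementById('div1');

$target = element_child(element_child($div1, 1), 1);   // like ->children(1)->children(1)

echo $target->getAttribute('id'), "\n";                          // d
echo $target->parentNode->getAttribute('id'), "\n";              // b  (parent())
echo $div1->firstElementChild->getAttribute('id'), "\n";         // a  (first_child())
echo $div1->lastElementChild->getAttribute('id'), "\n";          // b  (last_child())
echo $target->previousElementSibling->getAttribute('id'), "\n";  // c  (prev_sibling())
var_dump($target->nextElementSibling);                           // NULL (next_sibling())
```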

How to save the contents of a DOM object?


 
// Dump the contents of the DOM tree into a string
$str = $html->save();

// Dump the contents of the DOM tree into a file
$html->save('result.htm');

 
// Dump the contents of the DOM tree into a string (the object converts itself to HTML)
$str = (string) $html;

// Print it!
echo $html;
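PHP's bundled DOM extension offers the same two save targets. In this comparison sketch, which is not simple_html_dom code, DOMDocument::saveHTML() returns the document as a string and saveHTMLFile() writes it to a file. The path under sys_get_temp_dir() is an assumption chosen so the demo can write anywhere.

```php
<?php
// Comparison sketch (bundled DOM extension, not simple_html_dom): save to a string
// and to a file. The temp-directory path is an assumption for the demo.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML('<html><body><p>Hello!</p></body></html>');

$str = $doc->saveHTML();                      // whole document as a string

$path = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'result.htm';
$bytes = $doc->saveHTMLFile($path);           // number of bytes written, or false

echo $str;
```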

How to customize the parser's behavior?


Callback function

 
// Write a function that takes "$element" as its parameter
function my_callback($element) {
    // Hide all <b> tags
    if ($element->tag == 'b')
        $element->outertext = '';
}

// Register the callback function by its name
$html->set_callback('my_callback');

// The callback function is invoked when the DOM is output
echo $html;
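Here is how such a hook can work. The callback is stored when registered, then called once per element while the document is printed, so it can rewrite an element's outertext. The sketch below is a hypothetical, deliberately flat model (TinyDoc and TinyNode are invented names, and there is no real parsing) that assumes PHP 8.0+. It is not simple_html_dom's implementation.

```php
<?php
// Hypothetical, flat model of an output-time callback (PHP 8.0+).
// TinyDoc/TinyNode are invented for illustration; this is not simple_html_dom's code.
final class TinyNode
{
    public function __construct(public string $tag, public string $outertext) {}
}

final class TinyDoc
{
    /** @var callable|null */
    private $callback = null;

    /** @param TinyNode[] $nodes */
    public function __construct(private array $nodes) {}

    public function set_callback(callable $fn): void
    {
        $this->callback = $fn;          // only stored here; nothing runs yet
    }

    public function __toString(): string
    {
        $out = '';
        foreach ($this->nodes as $node) {
            if ($this->callback !== null) {
                ($this->callback)($node);   // let the callback inspect or modify the node
            }
            $out .= $node->outertext;
        }
        return $out;
    }
}

function my_callback(TinyNode $element): void
{
    if ($element->tag === 'b') {
        $element->outertext = '';       // hide every <b>
    }
}

$doc = new TinyDoc([new TinyNode('i', '<i>keep</i>'), new TinyNode('b', '<b>drop</b>')]);
$doc->set_callback('my_callback');
echo $doc, "\n";   // <i>keep</i>
```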

Author: S.C. Chen (me578022@gmail.com)
The idea for this program came from Jose Solorzano's HTML Parser for PHP 4.
Contributors: Yousuke Kumakura, Vadim Voituk, Antcs
Chinese translation: 蜗牛
To report translation errors or discuss the program, visit: 蜗牛的牛窝
ComSing 开发者之家
