Question:

Simple HTML DOM Parser: problem displaying a variable inside a foreach loop

国景铄
2023-03-14

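I created a simple PHP script using the simple_html_dom.php class. It scrapes some information about movies from a website. I have one foreach loop nested inside another. When I try to display the movie name inside the inner foreach loop, I only get the last movie name. What I want is a unique movie name for each item. The problem is with the $movie variable.

(When I echo the $movie variable on line 27 I get the correct result, but I want each movie name included in the YouTube link on line 33…)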

<?php
include("simple_html_dom.php");
    
$tpb = 'https://tpb.party/search/2020/1/99/200';
$html = file_get_html(html_entity_decode($tpb));
    
foreach($html->find('tr.header') as $header) {
    $header->outertext = '';
}
        
foreach($html->find('td') as $bottom) {
    if ($bottom->colspan == '9') {
        $bottom->outertext = '';
    }
}
        
foreach($html->find('td.vertTh') as $vert) {
    $vert->outertext = '';
}   
    
foreach($html->find("div.detName") as $movie) {
    $movie = $movie->plaintext;
    echo $movie;    // Works okay: displays each of the movie titles
    
    foreach($html->find('img') as $img) {
    
        if ($img->outertext == '<img src="https://tpb.party/static/img/11x11p.png" height="11" width="11">') {
            $img->outertext = '&nbsp;&nbsp;<a href="https://www.youtube.com/results?search_query='. $movie /* Doesn't work, only displays one title, not one each of the 30*/ .'" target="_blank"><img src="img/youtube.png" alt="Trailer" title="Trailer" style="width:19px;" width="19" height="18" border="0"></a>';
        }
    }
}   
    
$html->save();
foreach($html->find("table") as $title) {
    echo $title->outertext . '<br>';
}
?>

Original source:

<td>
  <div class="detName"> <a href="https://tpb.party/torrent/37614340/The.Mandalorian.S02E05.Chapter.13.The.Jedi.2020.1080p.WEB-DL.X26" class="detLink" title="Details for The.Mandalorian.S02E05.Chapter.13.The.Jedi.2020.1080p.WEB-DL.X26">The.Mandalorian.S02E05.Chapter.13.The.Jedi.2020.1080p.WEB-DL.X26</a>
  </div>
  <a href="magnet:?xt=urn:btih:4AEE012597EBEA65840A96F62CEBE9926F8ECE5D&dn=The.Mandalorian.S02E05.Chapter.13.The.Jedi.2020.1080p.WEB-DL.X26&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969%2Fannounce&tr=udp%3A%2F%2F9.rarbg.to%3A2920%2Fannounce&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337&tr=udp%3A%2F%2Ftracker.internetwarriors.net%3A1337%2Fannounce&tr=udp%3A%2F%2Ftracker.leechers-paradise.org%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.pirateparty.gr%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.cyberia.is%3A6969%2Fannounce"
    title="Download this torrent using magnet"><img src="https://tpb.party/static/img/icon-magnet.gif" alt="Magnet link" height="12" width="12"></a>
  <a href="https://tpb.party/user/sotnikam/"><img src="https://tpb.party/static/img/vip.gif" alt="VIP" title="VIP" style="width:11px;" border="0" height="11" width="11"></a><img src="https://tpb.party/static/img/11x11p.png" height="11" width="11">
  <font class="detDesc">Uploaded 11-27&nbsp;10:12, Size 2.71&nbsp;GiB, ULed by <a class="detDesc" href="https://tpb.party/user/sotnikam/" title="Browse sotnikam">sotnikam</a> </font>
</td>

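Current result:

The HTML of the IMG element gets replaced, but the link is the same for every element, whereas each element (i.e. each movie) should get its own unique link: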

<td>
  <div class="detName"> <a href="https://tpb.party/torrent/37614340/The.Mandalorian.S02E05.Chapter.13.The.Jedi.2020.1080p.WEB-DL.X26" class="detLink" title="Details for The.Mandalorian.S02E05.Chapter.13.The.Jedi.2020.1080p.WEB-DL.X26">The.Mandalorian.S02E05.Chapter.13.The.Jedi.2020.1080p.WEB-DL.X26</a>
  </div>
  <a href="magnet:?xt=urn:btih:4AEE012597EBEA65840A96F62CEBE9926F8ECE5D&dn=The.Mandalorian.S02E05.Chapter.13.The.Jedi.2020.1080p.WEB-DL.X26&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969%2Fannounce&tr=udp%3A%2F%2F9.rarbg.to%3A2920%2Fannounce&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337&tr=udp%3A%2F%2Ftracker.internetwarriors.net%3A1337%2Fannounce&tr=udp%3A%2F%2Ftracker.leechers-paradise.org%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.pirateparty.gr%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.cyberia.is%3A6969%2Fannounce"
    title="Download this torrent using magnet"><img src="https://tpb.party/static/img/icon-magnet.gif" alt="Magnet link" height="12" width="12"></a>
  <a href="https://tpb.party/user/sotnikam/"><img src="https://tpb.party/static/img/vip.gif" alt="VIP" title="VIP" style="width:11px;" border="0" height="11" width="11"></a>&nbsp;&nbsp;
  <a href="https://www.youtube.com/results?search_query=            The.Mandalorian.S02E05.Chapter.13.The.Jedi.2020.1080p.WEB-DL.X26  " target="_blank"><img src="img/youtube.png" alt="Trailer" title="Trailer" style="width:19px;" width="19" height="18" border="0"></a>
  <font class="detDesc">Uploaded 11-27&nbsp;10:12, Size 2.71&nbsp;GiB, ULed by <a class="detDesc" href="https://tpb.party/user/sotnikam/" title="Browse sotnikam">sotnikam</a> </font>
</td>

2 answers

马寒
2023-03-14

Ideally, you should just extract the data (changing it if needed) and then build your own table from it.

<?php
include("simple_html_dom.php");

$tpb = 'https://tpb.party/search/2020/1/99/200';
$html = file_get_html($tpb);

function remove_junk($movie_name) {
    // you get the idea.. maybe a db or further stripping
    return str_replace([
        'WEB-DL.X26',
        'GalaxyRG',
        '.1080p.WEB-DL.X26', 
        '0.HDRip.XviD.AC3-EVO[TGx]',
        '.720p.BluRay.800MB.x264-'
    ], '', $movie_name);
}

$movies = [];
foreach($html->getElementById("searchResult")->find('tr') as $tr) {
    $td = $tr->find('td');

    // buggy simple_html_dom doesn't see tbody
    if ($tr->parent->tag === 'table' && isset($td[1])) {

        $name = trim($td[1]->find('.detName', 0)->plaintext);

        $links = [];
        foreach ($td[1]->find('a') as $link) {
            $links[] = $link->href;
        }

        $info = $td[1]->find('.detDesc', 0)->plaintext;
        $info = explode(', ', $info);

        $uploaded = trim(str_replace(['Uploaded', '&nbsp;'], ' ', $info[0]));
        $size = trim(str_replace(['Size', '&nbsp;'], ' ', $info[1]));
        $ULed = trim(str_replace(['ULed by'], ' ', $info[2]));

        $movies[] = [
            'name' => $name,
            'links' => [
                'site' => $links[0],
                'magnet' => $links[1],
                'youtube' => 'https://www.youtube.com/results?search_query='.urlencode(remove_junk($name))
            ],
            'uploaded' => $uploaded,
            'size' => $size,
            'ULed' => [
                'user' => $ULed,
                'link' => $links[3]
            ],
            'seeds' => trim($td[2]->plaintext),
            'leecher' => trim($td[3]->plaintext)
        ];
    }
}  

print_r($movies);

This produces an array with the following structure.

Array (
    ... snip
    [30] => Array
        (
            [name] => Pinocchio.2020.720p.WEBRip.800MB.x264-GalaxyRG
            [links] => Array
                (
                    [site] => https://tpb.party/torrent/38038881/Pinocchio.2020.720p.WEBRip.800MB.x264-GalaxyRG
                    [magnet] => magnet:?xt=urn:btih:BF16ACE87DABF2300253B7EDB7600B1BAB3EE02A&dn=Pinocchio.2020.720p.WEBRip.800MB.x264-GalaxyRG&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969%2Fannounce&tr=udp%3A%2F%2F9.rarbg.to%3A2920%2Fannounce&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337&tr=udp%3A%2F%2Ftracker.internetwarriors.net%3A1337%2Fannounce&tr=udp%3A%2F%2Ftracker.leechers-paradise.org%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.pirateparty.gr%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.cyberia.is%3A6969%2Fannounce
                    [youtube] => https://www.youtube.com/results?search_query=Pinocchio.2020
                )

            [uploaded] => 12-07 01:51
            [size] => 798.15 MiB
            [ULed] => Array
                (
                    [user] => sotnikam
                    [link] => https://tpb.party/user/sotnikam/
                )

            [seeds] => 351
            [leecher] => 57
        )

)

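You can then loop over the array to build your own styled table, including the YouTube links. A better approach, though, is to do all the scraping in one task, put the resulting data into a database, and query that instead. That way the results are stored, you don't scrape the site on every request, and you can detect whether the source has changed before displaying a broken page.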
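
As a rough sketch of that loop, the following turns an array shaped like $movies into an HTML table. The helper name render_movies() and the sample row are made up for illustration (they are not part of the answer or of simple_html_dom), and only a few of the fields are rendered. Values are passed through htmlspecialchars() on the assumption that scraped text should not be trusted as markup.

```php
<?php
// Hypothetical sample row shaped like the $movies array above (placeholder values).
$movies = [
    [
        'name'  => 'Example.Movie.2020',
        'links' => [
            'site'    => 'https://example.com/torrent/1',
            'youtube' => 'https://www.youtube.com/results?search_query=' . urlencode('Example.Movie.2020'),
        ],
        'size'  => '700 MiB',
        'seeds' => '10',
    ],
];

// render_movies() is an illustrative helper, not a library function.
function render_movies(array $movies): string
{
    $rows = '';
    foreach ($movies as $movie) {
        // Escape every scraped value before placing it into the markup.
        $rows .= sprintf(
            '<tr><td><a href="%s">%s</a></td><td>%s</td><td>%s</td>'
            . '<td><a href="%s" target="_blank">Trailer</a></td></tr>' . "\n",
            htmlspecialchars($movie['links']['site'], ENT_QUOTES),
            htmlspecialchars($movie['name'], ENT_QUOTES),
            htmlspecialchars($movie['size'], ENT_QUOTES),
            htmlspecialchars($movie['seeds'], ENT_QUOTES),
            htmlspecialchars($movie['links']['youtube'], ENT_QUOTES)
        );
    }
    return "<table>\n" . $rows . "</table>\n";
}

echo render_movies($movies);
```

Because each row is built from its own array entry, every trailer link carries that entry's title, which is the behaviour the question was after.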

微生弘
2023-03-14

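The image you want is a sibling of the detName div (both sit inside the same td), so you can find it by searching within the parent element.

Because find() accepts more complex CSS selectors, you can search specifically for the image you need instead of iterating over every image.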

foreach($html->find("div.detName") as $movieDiv) {
    $movie = $movieDiv->plaintext;
    echo $movie;    // Works okay: displays each of the movie titles
    
    $img = $movieDiv->parent()->find('img[src="https://tpb.party/static/img/11x11p.png"]', 0);
    if ($img) {
        // trim() and urlencode() keep the whitespace around the title out of the query string
        $img->outertext = '&nbsp;&nbsp;<a href="https://www.youtube.com/results?search_query='. urlencode(trim($movie)) .'" target="_blank"><img src="img/youtube.png" alt="Trailer" title="Trailer" style="width:19px;" width="19" height="18" border="0"></a>';
    }
}
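
The same "search within the parent" idea can be tried without the library or a network request. The sketch below swaps in PHP's built-in DOMDocument and DOMXPath instead of simple_html_dom, and runs against a shortened, made-up sample of the markup from the question. The helper name add_trailer_links() and the sample titles are assumptions for illustration only.

```php
<?php
// Shortened, hypothetical sample modelled on the table row shown in the question.
$sample = '<table>'
    . '<tr><td><div class="detName"><a href="#">  Movie.One.2020 </a></div>'
    . '<img src="https://tpb.party/static/img/11x11p.png" height="11" width="11"></td></tr>'
    . '<tr><td><div class="detName"><a href="#">Movie.Two.2020</a></div>'
    . '<img src="https://tpb.party/static/img/11x11p.png" height="11" width="11"></td></tr>'
    . '</table>';

// add_trailer_links() is an illustrative helper, not a library function.
function add_trailer_links(string $html): string
{
    libxml_use_internal_errors(true); // keep parser warnings out of the output
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    foreach ($xpath->query('//div[@class="detName"]') as $div) {
        $title = trim($div->textContent);

        // Search only inside this div's parent cell, not the whole document.
        $img = $xpath->query(
            './/img[@src="https://tpb.party/static/img/11x11p.png"]',
            $div->parentNode
        )->item(0);
        if ($img === null) {
            continue;
        }

        // Replace the placeholder image with a link built from this row's title.
        $link = $doc->createElement('a');
        $link->setAttribute('href', 'https://www.youtube.com/results?search_query=' . urlencode($title));
        $link->setAttribute('target', '_blank');
        $link->appendChild($doc->createTextNode('Trailer'));
        $img->parentNode->replaceChild($link, $img);
    }

    return $doc->saveHTML();
}

echo add_trailer_links($sample);
```

Scoping the image lookup to the current row's cell is what ties each link to its own title. In the question's version, the inner loop walks every img in the document on each pass of the outer loop.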