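I created a simple PHP script using the simple_html_dom.php class to fetch some information about movies from a website. I have a foreach loop inside another foreach loop. When I try to display the movie name inside the inner foreach loop, I get only the last movie name. What I want is for each item to get its own, unique movie name. The problem is the $movie variable.
(When I echo $movie in the outer loop, I get the correct result, but I want each movie name to go into the YouTube link that is built in the inner loop…)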
<?php
include("simple_html_dom.php");
$tpb = 'https://tpb.party/search/2020/1/99/200';
$html = file_get_html(html_entity_decode($tpb));
foreach ($html->find('tr.header') as $header) {
    $header->outertext = '';
}
foreach ($html->find('td') as $bottom) {
    if ($bottom->colspan == '9') {
        $bottom->outertext = '';
    }
}
foreach ($html->find('td.vertTh') as $vert) {
    $vert->outertext = '';
}
foreach ($html->find("div.detName") as $movie) {
    $movie = $movie->plaintext;
    echo $movie; // Works OK: displays each of the movie titles
    foreach ($html->find('img') as $img) {
        if ($img->outertext == '<img src="https://tpb.party/static/img/11x11p.png" height="11" width="11">') {
            // Doesn't work: only one title ends up in the links, not one for each of the 30 movies
            $img->outertext = ' <a href="https://www.youtube.com/results?search_query=' . $movie . '" target="_blank"><img src="img/youtube.png" alt="Trailer" title="Trailer" style="width:19px;" width="19" height="18" border="0"></a>';
        }
    }
}
$html->save();
foreach ($html->find("table") as $title) {
    echo $title->outertext . '<br>';
}
?>
Original source:
<td>
<div class="detName"> <a href="https://tpb.party/torrent/37614340/The.Mandalorian.S02E05.Chapter.13.The.Jedi.2020.1080p.WEB-DL.X26" class="detLink" title="Details for The.Mandalorian.S02E05.Chapter.13.The.Jedi.2020.1080p.WEB-DL.X26">The.Mandalorian.S02E05.Chapter.13.The.Jedi.2020.1080p.WEB-DL.X26</a>
</div>
<a href="magnet:?xt=urn:btih:4AEE012597EBEA65840A96F62CEBE9926F8ECE5D&dn=The.Mandalorian.S02E05.Chapter.13.The.Jedi.2020.1080p.WEB-DL.X26&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969%2Fannounce&tr=udp%3A%2F%2F9.rarbg.to%3A2920%2Fannounce&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337&tr=udp%3A%2F%2Ftracker.internetwarriors.net%3A1337%2Fannounce&tr=udp%3A%2F%2Ftracker.leechers-paradise.org%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.pirateparty.gr%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.cyberia.is%3A6969%2Fannounce"
title="Download this torrent using magnet"><img src="https://tpb.party/static/img/icon-magnet.gif" alt="Magnet link" height="12" width="12"></a>
<a href="https://tpb.party/user/sotnikam/"><img src="https://tpb.party/static/img/vip.gif" alt="VIP" title="VIP" style="width:11px;" border="0" height="11" width="11"></a><img src="https://tpb.party/static/img/11x11p.png" height="11" width="11">
<font class="detDesc">Uploaded 11-27 10:12, Size 2.71 GiB, ULed by <a class="detDesc" href="https://tpb.party/user/sotnikam/" title="Browse sotnikam">sotnikam</a> </font>
</td>
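What it looks like now:
The IMG element's HTML is replaced, but every element gets the same link, whereas each element (i.e. each movie) should get a unique one: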
<td>
<div class="detName"> <a href="https://tpb.party/torrent/37614340/The.Mandalorian.S02E05.Chapter.13.The.Jedi.2020.1080p.WEB-DL.X26" class="detLink" title="Details for The.Mandalorian.S02E05.Chapter.13.The.Jedi.2020.1080p.WEB-DL.X26">The.Mandalorian.S02E05.Chapter.13.The.Jedi.2020.1080p.WEB-DL.X26</a>
</div>
<a href="magnet:?xt=urn:btih:4AEE012597EBEA65840A96F62CEBE9926F8ECE5D&dn=The.Mandalorian.S02E05.Chapter.13.The.Jedi.2020.1080p.WEB-DL.X26&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969%2Fannounce&tr=udp%3A%2F%2F9.rarbg.to%3A2920%2Fannounce&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337&tr=udp%3A%2F%2Ftracker.internetwarriors.net%3A1337%2Fannounce&tr=udp%3A%2F%2Ftracker.leechers-paradise.org%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.pirateparty.gr%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.cyberia.is%3A6969%2Fannounce"
title="Download this torrent using magnet"><img src="https://tpb.party/static/img/icon-magnet.gif" alt="Magnet link" height="12" width="12"></a>
<a href="https://tpb.party/user/sotnikam/"><img src="https://tpb.party/static/img/vip.gif" alt="VIP" title="VIP" style="width:11px;" border="0" height="11" width="11"></a>
<a href="https://www.youtube.com/results?search_query= The.Mandalorian.S02E05.Chapter.13.The.Jedi.2020.1080p.WEB-DL.X26 " target="_blank"><img src="img/youtube.png" alt="Trailer" title="Trailer" style="width:19px;" width="19" height="18" border="0"></a>
<font class="detDesc">Uploaded 11-27 10:12, Size 2.71 GiB, ULed by <a class="detDesc" href="https://tpb.party/user/sotnikam/" title="Browse sotnikam">sotnikam</a> </font>
</td>
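The behaviour can be reproduced without the scraper. In the minimal model below, which is an assumption rather than simple_html_dom itself, plain arrays stand in for the table rows and the string PLACEHOLDER stands in for the 11x11p.png image; link_all_rows() and link_each_row() are hypothetical names. The buggy version mirrors $html->find('img'): its inner loop visits every row's image on every outer pass, so all rows get the same title (in this model the first title wins, because an image that has already been replaced no longer matches the placeholder check). The fixed version only touches the image that belongs to the current row.

```php
<?php
// Hypothetical model of the scraped table: each row has a title and one
// placeholder image. Plain arrays stand in for simple_html_dom nodes.
$rows = [
    ['title' => 'Movie.A.2020', 'img' => 'PLACEHOLDER'],
    ['title' => 'Movie.B.2020', 'img' => 'PLACEHOLDER'],
    ['title' => 'Movie.C.2020', 'img' => 'PLACEHOLDER'],
];

// Buggy pattern: for every title, scan the images of ALL rows, just like
// $html->find('img') scans the whole document.
function link_all_rows(array $rows): array
{
    foreach ($rows as $row) {
        $movie = $row['title'];
        foreach (array_keys($rows) as $i) {
            if ($rows[$i]['img'] === 'PLACEHOLDER') {
                $rows[$i]['img'] = 'search_query=' . urlencode($movie);
            }
        }
    }
    return array_column($rows, 'img');
}

// Fixed pattern: each title only touches the image in its own row.
function link_each_row(array $rows): array
{
    foreach ($rows as $i => $row) {
        if ($row['img'] === 'PLACEHOLDER') {
            $rows[$i]['img'] = 'search_query=' . urlencode($row['title']);
        }
    }
    return array_column($rows, 'img');
}

print_r(link_all_rows($rows));  // three identical links
print_r(link_each_row($rows));  // one distinct link per row
```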
Ideally, you should just extract the data, modify it if needed, and then build your own table from it.
<?php
include("simple_html_dom.php");
$tpb = 'https://tpb.party/search/2020/1/99/200';
$html = file_get_html($tpb);
function remove_junk($movie_name) {
    // you get the idea.. maybe a db or further stripping
    // longer tags go first, because str_replace() applies the search list in order
    return str_replace([
        '.1080p.WEB-DL.X26',
        'WEB-DL.X26',
        'GalaxyRG',
        '0.HDRip.XviD.AC3-EVO[TGx]',
        '.720p.BluRay.800MB.x264-'
    ], '', $movie_name);
}
$movies = [];
foreach ($html->getElementById("searchResult")->find('tr') as $tr) {
    $td = $tr->find('td');
    // buggy simple_html_dom doesn't see tbody, so the row's parent is the table itself
    if ($tr->parent->tag === 'table' && isset($td[1])) {
        $name = trim($td[1]->find('.detName', 0)->plaintext);
        // links in the cell: [0] details page, [1] magnet, [2] VIP badge, [3] uploader
        // (assumes the uploader has a badge link; otherwise the indexes shift)
        $links = [];
        foreach ($td[1]->find('a') as $link) {
            $links[] = $link->href;
        }
        $info = $td[1]->find('.detDesc', 0)->plaintext;
        $info = explode(', ', $info);
        // strip the labels and turn non-breaking spaces (&nbsp; or U+00A0) into normal spaces
        $uploaded = trim(str_replace(['Uploaded', '&nbsp;', "\u{00A0}"], ' ', $info[0]));
        $size = trim(str_replace(['Size', '&nbsp;', "\u{00A0}"], ' ', $info[1]));
        $ULed = trim(str_replace('ULed by', '', $info[2]));
        $movies[] = [
            'name' => $name,
            'links' => [
                'site' => $links[0],
                'magnet' => $links[1],
                'youtube' => 'https://www.youtube.com/results?search_query=' . urlencode(remove_junk($name))
            ],
            'uploaded' => $uploaded,
            'size' => $size,
            'ULed' => [
                'user' => $ULed,
                'link' => $links[3]
            ],
            'seeds' => trim($td[2]->plaintext),
            'leecher' => trim($td[3]->plaintext)
        ];
    }
}
print_r($movies);
This produces an array with the following structure:
Array
(
    ... snip
    [30] => Array
        (
            [name] => Pinocchio.2020.720p.WEBRip.800MB.x264-GalaxyRG
            [links] => Array
                (
                    [site] => https://tpb.party/torrent/38038881/Pinocchio.2020.720p.WEBRip.800MB.x264-GalaxyRG
                    [magnet] => magnet:?xt=urn:btih:BF16ACE87DABF2300253B7EDB7600B1BAB3EE02A&dn=Pinocchio.2020.720p.WEBRip.800MB.x264-GalaxyRG&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969%2Fannounce&tr=udp%3A%2F%2F9.rarbg.to%3A2920%2Fannounce&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337&tr=udp%3A%2F%2Ftracker.internetwarriors.net%3A1337%2Fannounce&tr=udp%3A%2F%2Ftracker.leechers-paradise.org%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.pirateparty.gr%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.cyberia.is%3A6969%2Fannounce
                    [youtube] => https://www.youtube.com/results?search_query=Pinocchio.2020
                )
            [uploaded] => 12-07 01:51
            [size] => 798.15 MiB
            [ULed] => Array
                (
                    [user] => sotnikam
                    [link] => https://tpb.party/user/sotnikam/
                )
            [seeds] => 351
            [leecher] => 57
        )
)
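remove_junk() only strips tags that are on its fixed list, so every new release group or quality tag has to be added by hand. A pattern-based heuristic may generalize better. The sketch below is hypothetical (clean_title() is not part of the answer) and assumes release names follow the common Title.Year.Quality… convention: it keeps everything up to and including the first four-digit year and turns dots into spaces. It will misfire on titles that themselves contain a year-like number (e.g. a "2049" in the title).

```php
<?php
// Hypothetical alternative to remove_junk(): instead of listing every release
// tag, keep the name up to and including the first 19xx/20xx year.
// Assumes names follow the common "Title.Year.Quality..." convention.
function clean_title(string $name): string
{
    if (preg_match('/^(.*?\b(?:19|20)\d{2})\b/', $name, $m)) {
        return str_replace('.', ' ', $m[1]);
    }
    // No year found: fall back to the raw name with dots turned into spaces.
    return str_replace('.', ' ', $name);
}

echo clean_title('Pinocchio.2020.720p.WEBRip.800MB.x264-GalaxyRG'), "\n"; // Pinocchio 2020
```

It would be used in place of remove_junk(), e.g. `'https://www.youtube.com/results?search_query=' . urlencode(clean_title($name))`.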
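You can then loop over the $movies array to build your own styled table, including the YouTube link. A better approach, though, is to scrape everything in a separate task, store the resulting data in a database, and query that instead. That way you keep the data, you don't scrape the site on every request, and you can detect whether the source has changed before displaying a broken page.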
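A minimal sketch of that loop, assuming the $movies structure printed above; render_movies() and h() are hypothetical helpers, and the sample row is copied from that output. Every value goes through htmlspecialchars(), since scraped text and URLs should not be echoed into HTML unescaped.

```php
<?php
// Escape scraped text before putting it into HTML.
function h(string $s): string
{
    return htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
}

// Hypothetical helper: turn the $movies array into a simple HTML table.
function render_movies(array $movies): string
{
    $out = "<table>\n<tr><th>Name</th><th>Size</th><th>Seeds</th><th>Trailer</th></tr>\n";
    foreach ($movies as $m) {
        $out .= '<tr>'
            . '<td><a href="' . h($m['links']['site']) . '">' . h($m['name']) . '</a></td>'
            . '<td>' . h($m['size']) . '</td>'
            . '<td>' . h($m['seeds']) . '</td>'
            . '<td><a href="' . h($m['links']['youtube']) . '" target="_blank">Trailer</a></td>'
            . "</tr>\n";
    }
    return $out . "</table>\n";
}

// One row copied from the print_r() output above (magnet link omitted).
$movies = [[
    'name'  => 'Pinocchio.2020.720p.WEBRip.800MB.x264-GalaxyRG',
    'links' => [
        'site'    => 'https://tpb.party/torrent/38038881/Pinocchio.2020.720p.WEBRip.800MB.x264-GalaxyRG',
        'youtube' => 'https://www.youtube.com/results?search_query=Pinocchio.2020',
    ],
    'size'  => '798.15 MiB',
    'seeds' => '351',
]];

echo render_movies($movies);
```

Because the data is already separated from the markup, the same array could just as well be written to a database by a scheduled job and rendered from there.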
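The image you want is nested in a sibling of the detName div, so you can find it by searching within the div's parent element.
Since find() accepts more complex CSS selectors, you can search for exactly the image you need instead of looping over all images.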
foreach ($html->find("div.detName") as $movieDiv) {
    $movie = trim($movieDiv->plaintext);
    echo $movie; // Works OK: displays each of the movie titles
    // search only inside this row's parent element, not the whole document
    $img = $movieDiv->parent()->find('img[src="https://tpb.party/static/img/11x11p.png"]', 0);
    if ($img) {
        $img->outertext = ' <a href="https://www.youtube.com/results?search_query=' . urlencode($movie) . '" target="_blank"><img src="img/youtube.png" alt="Trailer" title="Trailer" style="width:19px;" width="19" height="18" border="0"></a>';
    }
}
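For comparison, the same row-scoped lookup can be written with PHP's bundled DOM extension instead of simple_html_dom. This is a swapped-in technique, not the answer's code: add_trailer_links() is a hypothetical helper, it assumes ext-dom is enabled (it is in most PHP builds), and it inserts a plain "Trailer" text link rather than the youtube.png image. The key step is the relative XPath query `.//img[...]` evaluated with the enclosing `<td>` as the context node, which plays the role of `$movieDiv->parent()->find(...)`.

```php
<?php
// Sketch using PHP's DOM extension: for each detName div, look for the
// placeholder image only inside the same <td> and replace it with a
// YouTube search link built from that row's title.
function add_trailer_links(string $html): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // tolerate real-world HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);

    $divs = $xpath->query('//div[contains(concat(" ", normalize-space(@class), " "), " detName ")]');
    foreach ($divs as $div) {
        $title = trim($div->textContent);
        $cell = $div->parentNode;       // the enclosing <td>
        // Relative query: only images inside this row's cell.
        $img = $xpath->query('.//img[contains(@src, "11x11p.png")]', $cell)->item(0);
        if ($img === null) {
            continue;
        }
        $a = $doc->createElement('a');
        $a->setAttribute('href', 'https://www.youtube.com/results?search_query=' . urlencode($title));
        $a->setAttribute('target', '_blank');
        $a->appendChild($doc->createTextNode('Trailer'));
        $img->parentNode->replaceChild($a, $img);
    }
    return $doc->saveHTML();
}

// Two trimmed-down rows in the same shape as the source HTML above.
$sample = '<table>'
    . '<tr><td><div class="detName"><a href="#">Movie.A.2020</a></div>'
    . '<img src="https://tpb.party/static/img/11x11p.png" height="11" width="11"></td></tr>'
    . '<tr><td><div class="detName"><a href="#">Movie.B.2020</a></div>'
    . '<img src="https://tpb.party/static/img/11x11p.png" height="11" width="11"></td></tr>'
    . '</table>';

echo add_trailer_links($sample);
```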