
Collection Rule 7: 河溪小说网 (www.518cqdl.com), a collection rule for 河溪小说网, a novel site running the Yidu (易读) system

温星华
2023-12-01

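Some friends said they don't know how to write the find/replace filters, so I'll go through the sites one at a time. I don't have much time, so I'll post one a day. This time it's 河溪小说网.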

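First, you need to filter out the ads the site inserts. The filters are in the <FilterPattern> of <PubContentText>; use them as a reference. There may be other ads I haven't found, so click through more of the site's inner pages to check: www.518cqdl.com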
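To make the filter step concrete, here is a minimal Python sketch of how the filter lines could be applied to a chapter's text. It assumes each line of <FilterPattern> is a separate regular expression whose matches are deleted, and that an optional ♂ separates a pattern from its replacement text (as on the last filter line). Python's re module stands in for the collector's own regex engine, which may differ in details; the function name apply_filters and the sample text are made up for illustration.

```python
import re

# Filter lines from <FilterPattern> in <PubContentText>, with the XML
# entities (&lt; &gt;) unescaped. Raw strings keep \n for the regex engine.
FILTER_LINES = [
    "河溪小说",
    "手机站-m.518cqd.com ",
    "www.518cqdL.com",
    "m.518cqdL.com",
    r"<script.+?</script>|<div.+?>|</div>|<p>|</p>",
    r"【<b>(.|\n)*?</B>】♂",
]


def apply_filters(text: str, filter_lines: list[str]) -> str:
    """Apply each filter line to the text, in order.

    Assumption: each line is a regex; an optional '♂' splits it into
    pattern and replacement, and a missing replacement means "delete".
    """
    for line in filter_lines:
        if not line.strip():
            continue  # skip blank lines instead of matching the empty string
        pattern, _, replacement = line.partition("♂")
        # This sketch inserts the replacement literally (no $1 / \1 groups).
        text = re.sub(pattern, lambda _m: replacement, text)
    return text


if __name__ == "__main__":
    sample = "<p>第一章 正文内容</p>河溪小说 www.518cqdL.com<script>ad()</script>"
    print(apply_filters(sample, FILTER_LINES))
```

Order matters here: the site-name and domain lines run before the tag-stripping line. Also, in a case-sensitive engine the last line (【<b>…</B>】) only matches if the site really closes the tag as </B>, so it is worth checking the page source.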

This rule works with Yidu's own collector. I don't know whether it works with the GuanGuan (关关) collector.



<?xml version="1.0" encoding="UTF-8"?>
<RuleConfigInfo xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
 <NovelIntro>
  <RegexName>NovelIntro</RegexName>
  <Pattern>&lt;meta property="og:description" content="((.|\n)*?)"/&gt;</Pattern>
  <Method/>
  <FilterPattern/>
  <Options/>
 </NovelIntro>
 <PubContentText>
  <RegexName>PubContentText</RegexName>
  <Pattern>&lt;div id="content"&gt;((.|\n)*?)&lt;/div&gt;</Pattern>
  <Method/>
  <FilterPattern>河溪小说
手机站-m.518cqd.com 
www.518cqdL.com
m.518cqdL.com
&lt;script.+?&lt;/script&gt;|&lt;div.+?&gt;|&lt;/div&gt;|&lt;p&gt;|&lt;/p&gt;
【&lt;b&gt;(.|\n)*?&lt;/B&gt;】♂</FilterPattern>
  <Options/>
 </PubContentText>
 <NovelSearchUrl>
  <RegexName>NovelSearchUrl</RegexName>
  <Pattern/>
  <Method/>
  <FilterPattern/>
  <Options/>
 </NovelSearchUrl>
 <NovelList_GetNovelKey>
  <RegexName>NovelList_GetNovelKey</RegexName>
  <Pattern>&lt;span class="s2"&gt;&lt;a href="/info/.+?/(.+?).html"&gt;.+?&lt;/a&gt;</Pattern>
  
  <Method/>
  <FilterPattern/>
  <Options/>
 </NovelList_GetNovelKey>
 <NovelListUrl>
  <RegexName>NovelListUrl</RegexName>
  <Pattern>https://www.518cqdL.com/list/1.html
https://www.518cqdL.com/list/2.html
https://www.518cqdL.com/list/3.html
https://www.518cqdL.com/list/4.html
https://www.518cqdL.com/list/5.html
https://www.518cqdL.com/list/6.html
https://www.518cqdL.com/list/7.html
https://www.518cqdL.com/list/8.html
https://www.518cqdL.com/list/9.html
https://www.518cqdL.com/list/10.html</Pattern>
  <Method/>
  <FilterPattern/>
  <Options/>
 </NovelListUrl>
 <PubChapterRegion>
  <RegexName>PubChapterRegion</RegexName>
  <Pattern/>
  <Method/>
  <FilterPattern/>
  <Options/>
 </PubChapterRegion>
 <NovelName>
  <RegexName>NovelName</RegexName>
  <Pattern>&lt;meta property="og:title" content="(.+?)"/&gt;</Pattern>
  <Method/>
  <FilterPattern/>
  <Options/>
 </NovelName>
 <NovelSearch_GetNovelName>
  <RegexName>NovelSearch_GetNovelName</RegexName>
  <Pattern/>
  <Method/>
  <FilterPattern/>
  <Options/>
 </NovelSearch_GetNovelName>
 <NovelList_GetNovelKey2>
  <RegexName>NovelList_GetNovelKey2</RegexName>
  <Pattern/>
  <Method/>
  <FilterPattern/>
  <Options/>
 </NovelList_GetNovelKey2>
 <LagerSort>
  <RegexName>LagerSort</RegexName>
  <Pattern>&lt;meta property="og:novel:category" content="(.+?)"/&gt;</Pattern>
  <Method/>
  <FilterPattern/>
  <Options/>
 </LagerSort>
 <SmallSort>
  <RegexName>SmallSort</RegexName>
  <Pattern>&lt;meta property="og:novel:category" content="(.+?)"/&gt;</Pattern>
  <Method/>
  <FilterPattern/>
  <Options/>
 </SmallSort>
 <GetSiteUrl>
  <RegexName>GetSiteUrl</RegexName>
  <Pattern>https://www.518cqdL.com</Pattern>
  <Method/>
  <FilterPattern/>
  <Options/>
 </GetSiteUrl>
 <TestSearchNovelName>
  <RegexName>TestSearchNovelName</RegexName>
  <Pattern/>
  <Method/>
  <FilterPattern/>
  <Options/>
 </TestSearchNovelName>
 <NovelDegree>
  <RegexName>NovelDegree</RegexName>
  <Pattern>&lt;meta property="og:novel:status" content="(.+?)"/&gt;</Pattern>
  <Method/>
  <FilterPattern/>
  <Options/>
 </NovelDegree>
 <PubContentText_FT2JT>
  <RegexName>PubContentText_FT2JT</RegexName>
  <Pattern>false</Pattern>
  <Method/>
  <FilterPattern/>
  <Options/>
 </PubContentText_FT2JT>
 <NovelAuthor>
  <RegexName>NovelAuthor</RegexName>
  <Pattern>&lt;meta property="og:novel:author" content="(.+?)"/&gt;</Pattern>
  <Method/>
  <FilterPattern/>
  <Options/>
 </NovelAuthor>
 <NovelInfo_GetNovelPubKey>
  <RegexName>NovelInfo_GetNovelPubKey</RegexName>
  <Pattern>&lt;meta property="og:novel:read_url" content="(.+?)"/&gt;</Pattern>
  <Method/>
  <FilterPattern/>
  <Options/>
 </NovelInfo_GetNovelPubKey>
 <PubContentText_ASCII>
  <RegexName>PubContentText_ASCII</RegexName>
  <Pattern>false</Pattern>
  <Method/>
  <FilterPattern/>
  <Options/>
 </PubContentText_ASCII>
 <NovelCover>
  <RegexName>NovelCover</RegexName>
  <Pattern>&lt;meta property="og:image" content="(.+?)"/&gt;</Pattern>
  <Method/>
  <FilterPattern/>
  <Options/>
 </NovelCover>
 <RuleVersion>
  <RegexName>RuleVersion</RegexName>
  <Pattern>2</Pattern>
  <Method/>
  <FilterPattern/>
  <Options/>
 </RuleVersion>
 <PubContentText_BJ2QJ>
  <RegexName>PubContentText_BJ2QJ</RegexName>
  <Pattern>false</Pattern>
  <Method/>
  <FilterPattern/>
  <Options/>
 </PubContentText_BJ2QJ>
 <NovelInfoExtra>
  <RegexName>NovelInfoExtra</RegexName>
  <Pattern/>
  <Method/>
  <FilterPattern/>
  <Options/>
 </NovelInfoExtra>
 <PubIndexUrl>
  <RegexName>PubIndexUrl</RegexName>
  <Pattern>{NovelPubKey}</Pattern>
  <Method/>
  <FilterPattern/>
  <Options/>
 </PubIndexUrl>
 <NovelDefaultCoverUrl>
  <RegexName>NovelDefaultCoverUrl</RegexName>
  <Pattern>https://www.518cqdL.com/cover/nocover.jpg</Pattern>
  <Method/>
  <FilterPattern/>
  <Options/>
 </NovelDefaultCoverUrl>
 <PubContentUrl2>
  <RegexName>PubContentUrl2</RegexName>
  <Pattern/>
  <Method/>
  <FilterPattern/>
  <Options/>
 </PubContentUrl2>
 <PubContentUrl>
  <RegexName>PubContentUrl</RegexName>
  <Pattern>{ChapterKey}</Pattern>
  <Method/>
  <FilterPattern/>
  <Options/>
 </PubContentUrl>
 <GetSiteName>
  <RegexName>GetSiteName</RegexName>
  <Pattern>518cqdL.com</Pattern>
  <Method/>
  <FilterPattern/>
  <Options/>
 </GetSiteName>
 <PubChapterName>
  <RegexName>PubChapterName</RegexName>
  <Pattern>&lt;a href=".+?" title=".+?"&gt;(.+?)&lt;/a&gt;</Pattern>
  <Method/>
  <FilterPattern/>
  <Options/>
 </PubChapterName>
 <GetSiteCharset>
  <RegexName>GetSiteCharset</RegexName>
  <Pattern>utf8</Pattern>
  <Method/>
  <FilterPattern/>
  <Options/>
 </GetSiteCharset>
 <PubChapter_GetChapterKey>
  <RegexName>PubChapter_GetChapterKey</RegexName>
  <Pattern>&lt;a href="(.+?)" title=".+?"&gt;.+?&lt;/a&gt;</Pattern>
  <Method/>
  <FilterPattern/>
  <Options/>
 </PubChapter_GetChapterKey>
 <NovelSearch_GetNovelKey>
  <RegexName>NovelSearch_GetNovelKey</RegexName>
  <Pattern/>
  <Method/>
  <FilterPattern/>
  <Options/>
 </NovelSearch_GetNovelKey>
 <NovelKeyword>
  <RegexName>NovelKeyword</RegexName>
  <Pattern/>
  <Method/>
  <FilterPattern/>
  <Options/>
 </NovelKeyword>
 <NovelUrl>
  <RegexName>NovelUrl</RegexName>
  <Pattern>https://www.518cqdL.com/info/10/{NovelKey}.html</Pattern>
  <Method/>
  <FilterPattern/>
  <Options/>
 </NovelUrl>
</RuleConfigInfo>
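As I read the rule, its fields chain together: NovelList_GetNovelKey pulls a novel key out of each list page in NovelListUrl, the key is substituted into NovelUrl, and the og: meta tags on the info page fill NovelName, NovelAuthor, NovelDegree and so on. Here is a rough Python sketch of that flow, assuming {NovelKey} is a plain text substitution; the function names and the sample HTML are hypothetical, and the real pages may differ.

```python
import re

# Patterns copied from the rule, with &lt; / &gt; unescaped.
NOVEL_KEY_PATTERN = r'<span class="s2"><a href="/info/.+?/(.+?).html">.+?</a>'
NOVEL_URL_TEMPLATE = "https://www.518cqdL.com/info/10/{NovelKey}.html"
META_PATTERNS = {
    "NovelName": r'<meta property="og:title" content="(.+?)"/>',
    "NovelAuthor": r'<meta property="og:novel:author" content="(.+?)"/>',
    "NovelDegree": r'<meta property="og:novel:status" content="(.+?)"/>',
    "NovelCover": r'<meta property="og:image" content="(.+?)"/>',
    "NovelInfo_GetNovelPubKey": r'<meta property="og:novel:read_url" content="(.+?)"/>',
}


def novel_urls(list_html: str) -> list[str]:
    """Extract novel keys from a list page and build each novel's info URL."""
    keys = re.findall(NOVEL_KEY_PATTERN, list_html)
    # Assumption: {NovelKey} is replaced as plain text.
    return [NOVEL_URL_TEMPLATE.replace("{NovelKey}", key) for key in keys]


def novel_info(info_html: str) -> dict[str, str]:
    """Read the og: meta tags the rule uses on a novel's info page."""
    info = {}
    for field, pattern in META_PATTERNS.items():
        match = re.search(pattern, info_html)
        if match:  # fields whose tag is missing are simply left out
            info[field] = match.group(1)
    return info
```

One thing worth checking: NovelList_GetNovelKey accepts any path segment after /info/, but NovelUrl hard-codes /info/10/, so if the site files novels under other segments those URLs would come out wrong.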

I didn't spend much time on the filters. If you need this rule, look through more of the site's chapter pages and see what ad text it inserts.
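One way to speed up that check, sketched in Python under my own assumption that injected ads usually contain a domain name: run chapter text that has already been filtered through a heuristic that flags any line still containing something domain-like. The pattern, the short TLD list and the name suspicious_lines are hypothetical, and ads without a URL will slip through.

```python
import re

# Heuristic, not part of the rule: something that looks like a domain name.
DOMAIN_LIKE = re.compile(r"(?:[a-z0-9-]+\.)+(?:com|cn|net|org)\b", re.IGNORECASE)


def suspicious_lines(chapter_text: str) -> list[str]:
    """Return lines of (already filtered) chapter text that still contain a domain."""
    return [
        line.strip()
        for line in chapter_text.splitlines()
        if DOMAIN_LIKE.search(line)
    ]
```

Any line it returns is a candidate for a new line in <FilterPattern>.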

There aren't many Yidu sites; after some searching I found these:

www.vgango.com
www.dosrojos.com
www.aavpccv.com
www.infected-mushroom.net
www.peoLpLe.com
www.hexaworLd.net
www.athomechecking.com
www.888cqdL.cn
www.666cqdL.cn
www.178cqdL.cn
www.next-bet.com
www.cosender.com
www.vivaLuta.com
www.sandyall.com

All of these sites can reuse this rule as a template; just change the filters and the domain.
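Since changing the domain is mostly search and replace, here is a small Python sketch (the function name retarget_rule is hypothetical) that swaps one domain for another throughout the rule file's text. It matches case-insensitively because the post spells the domain both as 518cqdl and 518cqdL. It assumes the target site uses the same URL layout (/list/, /info/) as 518cqdl.com; site-specific ad text such as 河溪小说 in the filter still has to be edited by hand.

```python
import re


def retarget_rule(rule_xml: str, old_domain: str, new_domain: str) -> str:
    """Swap old_domain for new_domain everywhere in a rule file's text.

    Matching is case-insensitive, so 518cqdl.com also catches 518cqdL.com.
    Plain text replacement leaves the XML declaration and layout untouched.
    """
    pattern = re.compile(re.escape(old_domain), re.IGNORECASE)
    # A function replacement inserts new_domain literally (no backslash escapes).
    return pattern.sub(lambda _m: new_domain, rule_xml)
```

For example, read the saved rule as UTF-8, call retarget_rule(text, "518cqdl.com", "888cqdl.cn"), save the result as a new rule, then rework the ad filters for the new site. The 手机站-m.518cqd.com filter line spells the domain without the l, so it is not caught by this replacement.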
 
