
Learning Web Scraping: Web-Harvest

宰父淳
2023-12-01

Web-Harvest
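Written in Java
Graphical user interface (just double-click the jar file to launch it)
Parses web pages and extracts their resources according to an XML configuration file you write; simple and elegant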

Download link:
http://web-harvest.sourceforge.net/download.php

Demo:

<?xml version="1.0" encoding="UTF-8"?>
<config charset="UTF-8">
	
	<!-- Page to scan for images -->
	<var-def name="webPage">
		http://www.meituba.com
	</var-def>
	
	<!-- Local directory where the downloaded images are saved -->
	<var-def name="saveLocation">
		D:/home/images
	</var-def>
	
	<!-- Iterate over each distinct img src value found on the page; i is the loop index -->
	<loop item="link" index="i" filter="unique">
		<list>
			<xpath expression="//img/@src">
				<html-to-xml>
					<http url="${webPage}" />
				</html-to-xml>
			</xpath>
		</list>
		<body>
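			<!-- Resolve the src against the page URL, download it, and save it as a binary file.
			     Every file is named with a .jpg extension, whatever the actual image type. -->
			<file action="write" type="binary" path="${saveLocation}/${i}.jpg">
				<http url="${sys.fullUrl(webPage,link)}" />
			</file>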
		</body>
	</loop>
</config>
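
To make the two less obvious steps of the demo concrete (filter="unique" and sys.fullUrl), here is a minimal plain-Java sketch of the same idea: turn each src value into an absolute URL relative to the page, and keep each result only once. It does not use Web-Harvest at all; the class name ImageLinkSketch and the method uniqueAbsoluteUrls are hypothetical names made up for illustration. As an assumption that differs slightly from the config, it de-duplicates the resolved URLs rather than the raw src values.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Web-Harvest: it only illustrates the
// "resolve against the page URL, then keep unique entries" idea of the demo.
class ImageLinkSketch {

    // Resolves every src value against pageUrl and returns the distinct
    // absolute URLs in first-seen order.
    static List<String> uniqueAbsoluteUrls(String pageUrl, List<String> srcValues) {
        // trim() because the demo's var-def body carries surrounding whitespace
        URI base = URI.create(pageUrl.trim());

        // Guard: give a bare host (empty path) an explicit root path so that
        // relative references such as "img/b.jpg" resolve underneath it.
        if ("".equals(base.getPath())) {
            base = base.resolve("/");
        }

        // LinkedHashSet drops duplicates while preserving insertion order.
        Set<String> seen = new LinkedHashSet<>();
        for (String src : srcValues) {
            seen.add(base.resolve(src.trim()).toString());
        }
        return new ArrayList<>(seen);
    }
}
```

Calling ImageLinkSketch.uniqueAbsoluteUrls("http://www.meituba.com", srcValues) with the src values pulled from a page would give the list of URLs that the loop body in the config goes on to download.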

For a Chinese-language API reference, see https://max.book118.com/html/2018/0524/168288967.shtm
