
Crawljax: the ExitNotifier class

宰子琪
2023-12-01

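ExitNotifier records why the crawler stopped and lets other threads wait for that moment. The possible exit reasons are defined by the enum ExitStatus: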

public enum ExitStatus {

	/**
	 * The maximum number of states is reached as defined in
	 * {@link CrawljaxConfiguration#getMaximumStates()}.
	 */
	MAX_STATES("Maximum states passed"),

	/**
	 * The maximum crawl time is reached as defined in
	 * {@link CrawljaxConfiguration#getMaximumRuntime()}.
	 */
	MAX_TIME("Maximum time passed"),

	/**
	 * The crawl is done.
	 */
	EXHAUSTED("Exausted"),

	/**
	 * The crawler quit because of an error.
	 */
	ERROR("Errored"),

	/**
	 * When {@link CrawljaxRunner#stop()} has been called.
	 */
	STOPPED("Stopped manually");

	private final String readableName;

	private ExitStatus(String readableName) {
		this.readableName = readableName;
	}

	@Override
	public String toString() {
		return readableName;
	}
}
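ExitStatus is an enum describing the five ways a crawl can end: the configured maximum number of states was reached, the configured maximum crawl time elapsed, the crawl ran out of work (everything reachable was crawled), the crawl was stopped manually, or the crawl terminated because of an error. Each constant carries a readable name that toString() returns.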
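The effect of overriding toString() can be seen with a standalone copy of the enum (Javadoc trimmed, readable names as quoted above). ExitStatusDemo is a hypothetical helper class added only for illustration: name() still returns the constant's identifier, while printing the value uses the readable name.

```java
// Standalone copy of the ExitStatus enum quoted above, Javadoc trimmed.
enum ExitStatus {
    MAX_STATES("Maximum states passed"),
    MAX_TIME("Maximum time passed"),
    EXHAUSTED("Exausted"), // spelling kept as in the quoted source
    ERROR("Errored"),
    STOPPED("Stopped manually");

    private final String readableName;

    ExitStatus(String readableName) {
        this.readableName = readableName;
    }

    @Override
    public String toString() {
        return readableName;
    }
}

// Hypothetical demo class, not part of Crawljax.
class ExitStatusDemo {
    public static void main(String[] args) {
        for (ExitStatus status : ExitStatus.values()) {
            // name() gives the constant identifier; string concatenation calls toString().
            System.out.println(status.name() + " -> " + status);
        }
    }
}
```

Because only toString() is overridden, ExitStatus.valueOf("STOPPED") still works: valueOf() matches the constant name, not the readable string.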

// One-shot gate: released as soon as any exit condition is signalled.
private final CountDownLatch latch = new CountDownLatch(1);
// Number of states crawled so far, updated atomically.
private final AtomicInteger states = new AtomicInteger();
private final int maxStates;

private ExitStatus reason = ExitStatus.ERROR;

public ExitNotifier(int maxStates) {
	this.maxStates = maxStates;
}

// Blocks the calling thread until the latch is released, then reports why.
public ExitStatus awaitTermination() throws InterruptedException {
	latch.await();
	return reason;
}


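This part declares two concurrency utilities that coordinate the crawler's threads. latch is a CountDownLatch with an initial count of 1: a thread that calls awaitTermination() blocks in latch.await() until another thread counts the latch down, and then receives the recorded exit reason. states is an AtomicInteger, so the number of crawled states can be incremented safely from several threads at once. The constructor takes the maximum number of states as a parameter, and reason starts out as ExitStatus.ERROR until one of the signalling methods below records a different reason.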
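The wait-and-release pattern behind awaitTermination() can be reproduced in isolation. The sketch below uses a hypothetical, trimmed-down class (LatchNotifier, with its own two-value Reason enum), not the Crawljax class: the main thread blocks in awaitTermination() until a second thread calls stop().

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch that keeps only the latch/reason logic; not the Crawljax class.
class LatchNotifier {
    enum Reason { ERROR, STOPPED }

    private final CountDownLatch latch = new CountDownLatch(1);
    private Reason reason = Reason.ERROR;

    Reason awaitTermination() throws InterruptedException {
        latch.await(); // blocks until the count reaches zero
        return reason;
    }

    void stop() {
        reason = Reason.STOPPED; // record the reason first...
        latch.countDown();       // ...then release any waiting thread
    }
}

// Hypothetical demo class.
class LatchNotifierDemo {
    public static void main(String[] args) throws InterruptedException {
        LatchNotifier notifier = new LatchNotifier();
        Thread stopper = new Thread(() -> {
            try {
                Thread.sleep(100); // simulate some work before stopping
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            notifier.stop();
        });
        stopper.start();
        // The main thread waits here until stopper calls stop().
        System.out.println("Exit reason: " + notifier.awaitTermination());
        stopper.join();
    }
}
```

No extra locking on reason is needed for this path: per the CountDownLatch documentation, actions in a thread before countDown() happen-before actions following a successful return from await() in another thread.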

/**
 * @return The new number of states.
 */
public int incrementNumberOfStates() {
	int count = states.incrementAndGet();
	if (count == maxStates) {
		reason = ExitStatus.MAX_STATES;
		latch.countDown();
	}
	return count;
}

public void signalTimeIsUp() {
	reason = ExitStatus.MAX_TIME;
	latch.countDown();
}

/**
 * Signal that all {@link CrawlTaskConsumer}s are done.
 */
public void signalCrawlExhausted() {
	reason = ExitStatus.EXHAUSTED;
	latch.countDown();
}

/**
 * Manually stop the crawl.
 */
public void stop() {
	reason = ExitStatus.STOPPED;
	latch.countDown();
}
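Each of these four methods records the matching exit reason and calls latch.countDown(). Because the latch starts at 1, a single call brings the count to zero and releases every thread waiting in awaitTermination(). incrementNumberOfStates() only does this when the new count equals maxStates; otherwise it just returns the new count.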
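Because states.incrementAndGet() hands every caller a distinct value, exactly one caller can observe count == maxStates, even when several threads report states at once. The following sketch uses a hypothetical StateCounter class, with an extra releases counter added only to make this observable, and runs 200 increments on four threads with maxStates = 50.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the incrementNumberOfStates() logic; not the Crawljax class.
class StateCounter {
    private final CountDownLatch latch = new CountDownLatch(1);
    private final AtomicInteger states = new AtomicInteger();
    // Sketch-only: counts how many times this method released the latch.
    private final AtomicInteger releases = new AtomicInteger();
    private final int maxStates;

    StateCounter(int maxStates) {
        this.maxStates = maxStates;
    }

    int incrementNumberOfStates() {
        // incrementAndGet() is atomic, so every caller sees a different count.
        int count = states.incrementAndGet();
        if (count == maxStates) {
            releases.incrementAndGet();
            latch.countDown();
        }
        return count;
    }

    void await() throws InterruptedException {
        latch.await();
    }

    int states() {
        return states.get();
    }

    int releases() {
        return releases.get();
    }
}

// Hypothetical demo class.
class StateCounterDemo {
    public static void main(String[] args) throws InterruptedException {
        StateCounter counter = new StateCounter(50);
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 200; i++) {
            pool.execute(() -> counter.incrementNumberOfStates());
        }
        counter.await(); // returns once the 50th state has been counted
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println("states=" + counter.states() + ", releases=" + counter.releases());
    }
}
```

After the pool finishes, the program should report 200 states and a single release; increments past 50 simply return larger counts without touching the latch.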

boolean isExitCalled() {
	return latch.getCount() == 0;
}
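The last method, isExitCalled(), returns true once the latch count has reached zero, that is, once any exit reason has been signalled. Callers can check it to decide whether they should stop working.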
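One way a caller might use isExitCalled() is to poll it between units of work. In this sketch the class PollingNotifier and the method runWorker are hypothetical, and a ScheduledExecutorService stands in for whatever triggers signalTimeIsUp(); this is not necessarily how Crawljax enforces its maximum runtime.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch keeping only the latch, signalTimeIsUp() and isExitCalled().
class PollingNotifier {
    private final CountDownLatch latch = new CountDownLatch(1);

    void signalTimeIsUp() {
        latch.countDown();
    }

    boolean isExitCalled() {
        return latch.getCount() == 0;
    }
}

// Hypothetical demo class.
class PollingDemo {
    // Does one "unit of work" per iteration until an exit has been signalled.
    static int runWorker(PollingNotifier notifier) throws InterruptedException {
        int unitsDone = 0;
        while (!notifier.isExitCalled()) {
            Thread.sleep(10); // stand-in for crawling one state
            unitsDone++;
        }
        return unitsDone;
    }

    public static void main(String[] args) throws InterruptedException {
        PollingNotifier notifier = new PollingNotifier();
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        // After 200 ms, report that the maximum runtime has been reached.
        timer.schedule(notifier::signalTimeIsUp, 200, TimeUnit.MILLISECONDS);
        int done = runWorker(notifier);
        timer.shutdown();
        System.out.println("Worker stopped after " + done + " units");
    }
}
```

Since countDown() does nothing once the count is already zero, only the first signal flips isExitCalled() to true. In the quoted ExitNotifier, however, a later signal would still overwrite reason even though the latch is already open.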

