This class describes why the crawler exited.
public enum ExitStatus {
/**
* The maximum number of states is reached as defined in
* {@link CrawljaxConfiguration#getMaximumStates()}.
*/
MAX_STATES("Maximum states passed"),
/**
* The maximum crawl time is reached as defined in
* {@link CrawljaxConfiguration#getMaximumRuntime()}.
*/
MAX_TIME("Maximum time passed"),
/**
* The crawl is done.
*/
EXHAUSTED("Exausted"),
/**
* The crawler quit because of an error.
*/
ERROR("Errored"),
/**
* When {@link CrawljaxRunner#stop()} has been called.
*/
STOPPED("Stopped manually");
private final String readableName;
private ExitStatus(String readableName) {
this.readableName = readableName;
}
@Override
public String toString() {
return readableName;
}
}
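This is an enum describing the five ways a crawl can end: the configured maximum number of states was reached, the configured maximum crawl time elapsed, the crawl finished with nothing left to explore, the crawl was stopped manually, or the crawl terminated because of an error.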
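As a small illustration of how the enum behaves, the sketch below prints each status. Because `toString()` is overridden, string concatenation yields the readable label, while `name()` still returns the constant's identifier. The enum is copied locally (Javadoc stripped) so the sketch compiles on its own; `ExitStatusDemo` and `describe` are hypothetical names that are not part of Crawljax.

```java
class ExitStatusDemo {
    // Local copy of the enum from the listing above, so this sketch is self-contained.
    enum ExitStatus {
        MAX_STATES("Maximum states passed"),
        MAX_TIME("Maximum time passed"),
        EXHAUSTED("Exausted"), // label spelled exactly as in the listing above
        ERROR("Errored"),
        STOPPED("Stopped manually");

        private final String readableName;

        ExitStatus(String readableName) {
            this.readableName = readableName;
        }

        @Override
        public String toString() {
            return readableName;
        }
    }

    // Hypothetical helper: builds a log line from an exit status.
    static String describe(ExitStatus status) {
        // Concatenation calls toString(), so the readable label is used;
        // name() gives the identifier of the enum constant.
        return "Crawl finished: " + status + " [" + status.name() + "]";
    }

    public static void main(String[] args) {
        for (ExitStatus status : ExitStatus.values()) {
            System.out.println(describe(status));
        }
    }
}
```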
private final CountDownLatch latch = new CountDownLatch(1);
private final AtomicInteger states = new AtomicInteger();
private final int maxStates;
private ExitStatus reason = ExitStatus.ERROR;
public ExitNotifier(int maxStates) {
this.maxStates = maxStates;
}
public ExitStatus awaitTermination() throws InterruptedException {
latch.await();
return reason;
}
/**
* @return The new number of states.
*/
public int incrementNumberOfStates() {
int count = states.incrementAndGet();
if (count == maxStates) {
reason = ExitStatus.MAX_STATES;
latch.countDown();
}
return count;
}
public void signalTimeIsUp() {
reason = ExitStatus.MAX_TIME;
latch.countDown();
}
/**
* Signal that all {@link CrawlTaskConsumer}s are done.
*/
public void signalCrawlExhausted() {
reason = ExitStatus.EXHAUSTED;
latch.countDown();
}
/**
* Manually stop the crawl.
*/
public void stop() {
reason = ExitStatus.STOPPED;
latch.countDown();
}
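Each of the four methods above is called when its termination condition occurs. It records the reason and counts the latch down from 1 to 0, which releases any thread blocked in awaitTermination() so that the crawl can shut down.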
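The latch pattern described above can be sketched in isolation. This is a minimal stand-in rather than the Crawljax class itself: `LatchExitDemo`, `Notifier`, `Reason`, and `runUntilMaxStates` are hypothetical names, and only the max-states path is reproduced. One thread blocks in `awaitTermination()` while a worker thread increments the state counter until the limit triggers `countDown()`.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

class LatchExitDemo {
    enum Reason { MAX_STATES, ERROR }

    // Cut-down stand-in for ExitNotifier: one latch, one counter, one reason.
    static class Notifier {
        private final CountDownLatch latch = new CountDownLatch(1);
        private final AtomicInteger states = new AtomicInteger();
        private final int maxStates;
        private Reason reason = Reason.ERROR;

        Notifier(int maxStates) {
            this.maxStates = maxStates;
        }

        Reason awaitTermination() throws InterruptedException {
            latch.await(); // blocks until the latch count reaches 0
            return reason;
        }

        int incrementNumberOfStates() {
            int count = states.incrementAndGet();
            if (count == maxStates) {
                // Write the reason before countDown() so the waiting thread sees it.
                reason = Reason.MAX_STATES;
                latch.countDown();
            }
            return count;
        }
    }

    // Hypothetical driver: a worker "discovers" states while the caller waits.
    // Assumes maxStates > 0; otherwise the latch would never be released.
    static Reason runUntilMaxStates(int maxStates) {
        Notifier notifier = new Notifier(maxStates);
        Thread worker = new Thread(() -> {
            for (int i = 0; i < maxStates; i++) {
                notifier.incrementNumberOfStates();
            }
        });
        worker.start();
        try {
            Reason reason = notifier.awaitTermination();
            worker.join();
            return reason;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runUntilMaxStates(3));
    }
}
```

Starting the latch at 1 makes it a one-shot gate: whichever signal arrives first opens it, and any later `countDown()` call leaves the count at 0.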
boolean isExitCalled() {
return latch.getCount() == 0;
}
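The last method reports whether termination has already been signaled, that is, whether the latch count has reached 0.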
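One plausible use of such a check is a worker loop that polls it between units of work instead of blocking on the latch. The sketch below is an assumption about usage, not code taken from Crawljax; `PollingExitDemo`, `work`, and `stopsCleanly` are hypothetical names.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLong;

class PollingExitDemo {
    private final CountDownLatch latch = new CountDownLatch(1);
    private final AtomicLong iterations = new AtomicLong();

    void stop() {
        latch.countDown();
    }

    boolean isExitCalled() {
        return latch.getCount() == 0;
    }

    // Hypothetical worker loop: keeps working until an exit has been signaled.
    void work() {
        while (!isExitCalled()) {
            iterations.incrementAndGet(); // placeholder for one unit of crawl work
            Thread.yield();
        }
    }

    // Starts a worker, signals stop, and reports whether the worker ended.
    static boolean stopsCleanly() {
        PollingExitDemo demo = new PollingExitDemo();
        Thread worker = new Thread(demo::work);
        worker.start();
        demo.stop();
        try {
            worker.join(5000); // wait up to five seconds for the loop to notice
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !worker.isAlive() && demo.isExitCalled();
    }

    public static void main(String[] args) {
        System.out.println("Worker stopped cleanly: " + stopsCleanly());
    }
}
```

Polling lets a worker finish its current unit of work before it exits, whereas `awaitTermination()` suits a coordinating thread that has nothing else to do until the crawl ends.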