Java GC Mechanisms in Brief: CMS Trigger Conditions

常小白

2023-12-01

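CMS GC (Concurrent Mark Sweep GC) is a mostly concurrent collector whose goal is the shortest possible pause time. It was designed to eliminate the long pauses that Parallel GC and Serial Old GC incur during a Full GC. As the name (Mark Sweep) suggests, CMS GC is based on the mark-sweep algorithm, so a service that runs for a long time can suffer severe memory fragmentation. The implementation is also comparatively complex.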

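By how it is triggered, CMS GC can be divided into the Background Collector and the Foreground Collector. (There is also the Mark-Sweep-Compact Collector, or MSC Collector, which appears below Serial Old as "MSC" in the figure from the earlier "Java GC Mechanisms in Brief" post; that algorithm is essentially Serial Old GC.) Note that the Foreground Collector was removed in Java 9.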

Background Collector

The background collector relies on a CMS background thread that repeatedly checks whether any background-collection trigger condition is met. As soon as one is, it runs a background collection.

void ConcurrentMarkSweepThread::run() {
... // omitted
  while (!_should_terminate) {
    sleepBeforeNextCycle();
    if (_should_terminate) break;
    GCCause::Cause cause = _collector->_full_gc_requested ?
      _collector->_full_gc_cause : GCCause::_cms_concurrent_mark;
    _collector->collect_in_background(false, cause);
  }
... // omitted
}

In each round the thread first waits for up to CMSWaitDuration (default 2 s), then calls shouldConcurrentCollect to check whether a background-collection trigger condition is met.

void ConcurrentMarkSweepThread::sleepBeforeNextCycle() {
  while (!_should_terminate) {
    if (CMSIncrementalMode) {
      icms_wait();
      if(CMSWaitDuration >= 0) {
        // Wait until the next synchronous GC, a concurrent full gc
        // request or a timeout, whichever is earlier.
        wait_on_cms_lock_for_scavenge(CMSWaitDuration);
      }
      return;
    } else {
      if(CMSWaitDuration >= 0) {
        // Wait until the next synchronous GC, a concurrent full gc
        // request or a timeout, whichever is earlier.
        wait_on_cms_lock_for_scavenge(CMSWaitDuration);
      } else {
        // Wait until any cms_lock event or check interval not to call shouldConcurrentCollect permanently
        wait_on_cms_lock(CMSCheckInterval);
      }
    }
    // Check if we should start a CMS collection cycle
    if (_collector->shouldConcurrentCollect()) {
      return;
    }
    // .. collection criterion not yet met, let's go back
    // and wait some more
  }
}

So which conditions does shouldConcurrentCollect check?

bool CMSCollector::shouldConcurrentCollect() {
  // Trigger 1: a full GC was explicitly requested and is to be run concurrently
  if (_full_gc_requested) {
    if (Verbose && PrintGCDetails) {
      gclog_or_tty->print_cr("CMSCollector: collect because of explicit "
                             " gc request (or gc_locker)");
    }
    return true;
  }

  // For debugging purposes, change the type of collection.
  // If the rotation is not on the concurrent collection
  // type, don't start a concurrent collection.
  NOT_PRODUCT(
    if (RotateCMSCollectionTypes &&
        (_cmsGen->debug_collection_type() !=
          ConcurrentMarkSweepGeneration::Concurrent_collection_type)) {
      assert(_cmsGen->debug_collection_type() !=
        ConcurrentMarkSweepGeneration::Unknown_collection_type,
        "Bad cms collection type");
      return false;
    }
  )

  FreelistLocker x(this);
  // ------------------------------------------------------------------
  // Print out lots of information which affects the initiation of
  // a collection.
  if (PrintCMSInitiationStatistics && stats().valid()) {
    gclog_or_tty->print("CMSCollector shouldConcurrentCollect: ");
    gclog_or_tty->stamp();
    gclog_or_tty->cr();
    stats().print_on(gclog_or_tty);
    gclog_or_tty->print_cr("time_until_cms_gen_full %3.7f",
      stats().time_until_cms_gen_full());
    gclog_or_tty->print_cr("free=" SIZE_FORMAT, _cmsGen->free());
    gclog_or_tty->print_cr("contiguous_available=" SIZE_FORMAT,
                           _cmsGen->contiguous_available());
    gclog_or_tty->print_cr("promotion_rate=%g", stats().promotion_rate());
    gclog_or_tty->print_cr("cms_allocation_rate=%g", stats().cms_allocation_rate());
    gclog_or_tty->print_cr("occupancy=%3.7f", _cmsGen->occupancy());
    gclog_or_tty->print_cr("initiatingOccupancy=%3.7f", _cmsGen->initiating_occupancy());
    gclog_or_tty->print_cr("cms_time_since_begin=%3.7f", stats().cms_time_since_begin());
    gclog_or_tty->print_cr("cms_time_since_end=%3.7f", stats().cms_time_since_end());
    gclog_or_tty->print_cr("metadata initialized %d",
      MetaspaceGC::should_concurrent_collect());
  }
  // ------------------------------------------------------------------

  // If the estimated time to complete a cms collection (cms_duration())
  // is less than the estimated time remaining until the cms generation
  // is full, start a collection.
  // Trigger 2: start CMS based on runtime statistics; enabling UseCMSInitiatingOccupancyOnly is strongly recommended
  if (!UseCMSInitiatingOccupancyOnly) {
    if (stats().valid()) {
      if (stats().time_until_cms_start() == 0.0) {
        return true;
      }
    } else {
      // We want to conservatively collect somewhat early in order
      // to try and "bootstrap" our CMS/promotion statistics;
      // this branch will not fire after the first successful CMS
      // collection because the stats should then be valid.
      if (_cmsGen->occupancy() >= _bootstrap_occupancy) {
        if (Verbose && PrintGCDetails) {
          gclog_or_tty->print_cr(
            " CMSCollector: collect for bootstrapping statistics:"
            " occupancy = %f, boot occupancy = %f", _cmsGen->occupancy(),
            _bootstrap_occupancy);
        }
        return true;
      }
    }
  }

  // Otherwise, we start a collection cycle if
  // old gen want a collection cycle started. Each may use
  // an appropriate criterion for making this decision.
  // XXX We need to make sure that the gen expansion
  // criterion dovetails well with this. XXX NEED TO FIX THIS
  // Trigger 3: decide based on whether old-gen memory usage meets the collection criteria
  if (_cmsGen->should_concurrent_collect()) {
    if (Verbose && PrintGCDetails) {
      gclog_or_tty->print_cr("CMS old gen initiated");
    }
    return true;
  }

  // We start a collection if we believe an incremental collection may fail;
  // this is not likely to be productive in practice because it's probably too
  // late anyway.
  GenCollectedHeap* gch = GenCollectedHeap::heap();
  assert(gch->collector_policy()->is_two_generation_policy(),
         "You may want to check the correctness of the following");
  // Trigger 4: collect if an incremental (young) collection is expected to fail
  if (gch->incremental_collection_will_fail(true /* consult_young */)) {
    if (Verbose && PrintGCDetails) {
      gclog_or_tty->print("CMSCollector: collect because incremental collection will fail ");
    }
    return true;
  }

  // Trigger 5: decide based on Metaspace usage
  if (MetaspaceGC::should_concurrent_collect()) {
    if (Verbose && PrintGCDetails) {
      gclog_or_tty->print("CMSCollector: collect for metadata allocation ");
    }
    return true;
  }

  // CMSTriggerInterval starts a CMS cycle if enough time has passed.
  // Trigger 6:
  // 1. if CMSTriggerInterval is 0, return true;
  // 2. if the time since the last CMS cycle began reaches the interval, return true.
  if (CMSTriggerInterval >= 0) {
    if (CMSTriggerInterval == 0) {
      // Trigger always
      return true;
    }

    // Check the CMS time since begin (we do not check the stats validity
    // as we want to be able to trigger the first CMS cycle as well)
    if (stats().cms_time_since_begin() >= (CMSTriggerInterval / ((double) MILLIUNITS))) {
      if (Verbose && PrintGCDetails) {
        if (stats().valid()) {
          gclog_or_tty->print_cr("CMSCollector: collect because of trigger interval (time since last begin %3.7f secs)",
                                 stats().cms_time_since_begin());
        } else {
          gclog_or_tty->print_cr("CMSCollector: collect because of trigger interval (first collection)");
        }
      }
      return true;
    }
  }

  return false;
}

From the code above, the background collector has six broad categories of triggers:

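1. A concurrent full GC request: the GC cause is _gc_locker and GCLockerInvokesConcurrent is set, or the GC cause is _java_lang_system_gc (a System.gc() call) and ExplicitGCInvokesConcurrent is set. Either case triggers a background collection.
2. A dynamic decision from runtime data (only when UseCMSInitiatingOccupancyOnly is not set; setting UseCMSInitiatingOccupancyOnly is strongly recommended). The rule is: if the predicted time to finish a CMS cycle exceeds the predicted time until the old generation fills up, start a collection. This relies on statistics from previous CMS cycles. Before the first CMS cycle, however, no statistics exist yet, so the decision falls back to the old generation's occupancy: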

 // If the estimated time to complete a cms collection (cms_duration())
 // is less than the estimated time remaining until the cms generation
 // is full, start a collection.
 if (!UseCMSInitiatingOccupancyOnly) {
   if (stats().valid()) {
     if (stats().time_until_cms_start() == 0.0) {
       return true;
     }
   } else {
     // We want to conservatively collect somewhat early in order
     // to try and "bootstrap" our CMS/promotion statistics;
     // this branch will not fire after the first successful CMS
     // collection because the stats should then be valid.
     if (_cmsGen->occupancy() >= _bootstrap_occupancy) {
       if (Verbose && PrintGCDetails) {
         gclog_or_tty->print_cr(
           " CMSCollector: collect for bootstrapping statistics:"
           " occupancy = %f, boot occupancy = %f", _cmsGen->occupancy(),
           _bootstrap_occupancy);
       }
       return true;
     }
   }
 }

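What occupancy is that, i.e. what is the value of _bootstrap_occupancy? The answer is 50% (it comes from CMSBootstrapOccupancy, default 50). You may have seen this in practice: without UseCMSInitiatingOccupancyOnly, a CMS GC runs as soon as the old generation reaches 50% occupancy.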
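To make trigger 2 concrete, here is a simplified C++ sketch of the decision. The struct and function names are hypothetical, and the estimates are passed in directly instead of being derived from the collector's runtime statistics. It follows the idea of CMSStats::time_until_cms_start(), which returns 0.0 once the estimated CMS work plus one young-GC period no longer fits before the old generation is expected to be full. The inline HotSpot comment quoted above phrases the condition the other way round; the text describes what time_until_cms_start() actually checks.

```cpp
#include <cassert>

// Hypothetical snapshot of the estimates CMS keeps about itself.
struct CmsStatsSnapshot {
  bool valid;              // false until the first CMS cycle has completed
  double cms_duration;     // estimated seconds a CMS cycle takes
  double gc0_period;       // estimated seconds between young GCs (safety margin)
  double time_until_full;  // estimated seconds until the old gen is full
};

// Simplified take on CMSStats::time_until_cms_start(): 0.0 means "start now",
// because the estimated work no longer fits before the deadline.
double time_until_cms_start(const CmsStatsSnapshot& s) {
  double work = s.cms_duration + s.gc0_period;
  double deadline = s.time_until_full;
  if (work > deadline) {
    return 0.0;
  }
  return deadline - work;  // seconds of slack left (sketch only)
}

// Returns true when the runtime-statistics rule (trigger 2) would start a cycle.
bool start_by_runtime_stats(bool use_occupancy_only,
                            const CmsStatsSnapshot& s,
                            double occupancy,             // fraction in 0.0 .. 1.0
                            double bootstrap_occupancy) { // 0.50 by default
  if (use_occupancy_only) {
    return false;  // UseCMSInitiatingOccupancyOnly disables this rule entirely
  }
  if (s.valid) {
    return time_until_cms_start(s) == 0.0;
  }
  // No statistics yet: collect early once occupancy reaches the bootstrap level.
  return occupancy >= bootstrap_occupancy;
}
```

For instance, with a 5 s CMS estimate, a 1 s young-GC period and 4 s until the old generation is full, the rule fires; with 60 s until full it does not.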

3. Based on the state of the old generation:

// We should be conservative in starting a collection cycle.  To
// start too eagerly runs the risk of collecting too often in the
// extreme.  To collect too rarely falls back on full collections,
// which works, even if not optimum in terms of concurrent work.
// As a work around for too eagerly collecting, use the flag
// UseCMSInitiatingOccupancyOnly.  This also has the advantage of
// giving the user an easily understandable way of controlling the
// collections.
// We want to start a new collection cycle if any of the following
// conditions hold:
// . our current occupancy exceeds the configured initiating occupancy
//   for this generation, or
// . we recently needed to expand this space and have not, since that
//   expansion, done a collection of this generation, or
// . the underlying space believes that it may be a good idea to initiate
//   a concurrent collection (this may be based on criteria such as the
//   following: the space uses linear allocation and linear allocation is
//   going to fail, or there is believed to be excessive fragmentation in
//   the generation, etc... or ...
// [.(currently done by CMSCollector::shouldConcurrentCollect() only for
//   the case of the old generation; see CR 6543076):
//   we may be approaching a point at which allocation requests may fail because
//   we will be out of sufficient free space given allocation rate estimates.]
bool ConcurrentMarkSweepGeneration::should_concurrent_collect() const {

 assert_lock_strong(freelistLock());
 if (occupancy() > initiating_occupancy()) {
   if (PrintGCDetails && Verbose) {
     gclog_or_tty->print(" %s: collect because of occupancy %f / %f  ",
       short_name(), occupancy(), initiating_occupancy());
   }
   return true;
 }
 if (UseCMSInitiatingOccupancyOnly) {
   return false;
 }
 if (expansion_cause() == CMSExpansionCause::_satisfy_allocation) {
   if (PrintGCDetails && Verbose) {
     gclog_or_tty->print(" %s: collect because expanded for allocation ",
       short_name());
   }
   return true;
 }
 if (_cmsSpace->should_concurrent_collect()) {
   if (PrintGCDetails && Verbose) {
     gclog_or_tty->print(" %s: collect because cmsSpace says so ",
       short_name());
   }
   return true;
 }
 return false;
}

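The source shows four cases:
a. Compare the old generation's occupancy with a threshold and run a CMS GC if it is higher, i.e. occupancy() > initiating_occupancy(). occupancy() is clearly the fraction of the old generation currently in use, but what is initiating_occupancy?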

void ConcurrentMarkSweepGeneration::init_initiating_occupancy(intx io, uintx tr) {
  assert(io <= 100 && tr <= 100, "Check the arguments");
  if (io >= 0) {
    _initiating_occupancy = (double)io / 100.0;
  } else {
    _initiating_occupancy = ((100 - MinHeapFreeRatio) +
                             (double)(tr * MinHeapFreeRatio) / 100.0)
                            / 100.0;
  }
}

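Here io is CMSInitiatingOccupancyFraction and tr is CMSTriggerRatio. When CMSInitiatingOccupancyFraction is 0 or greater, the threshold is simply io / 100.0. When it is negative (the default is -1), the threshold is ((100 - MinHeapFreeRatio) + (double)(tr * MinHeapFreeRatio) / 100.0) / 100.0. What does that come to? With the defaults MinHeapFreeRatio = 40 and CMSTriggerRatio = 80, it is ((100 - 40) + 80 * 40 / 100) / 100 = 0.92, i.e. 92%. You may have heard that CMSInitiatingOccupancyFraction defaults to 92, but the flag actually defaults to -1 when unset, so the threshold comes from the second formula. The 92% is the resulting threshold, not the flag's value.
The other three cases are still being traced through the source and will be filled in later.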
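The threshold in case a can be checked with a direct C++ restatement of init_initiating_occupancy() as a free function. The function name and the extra min_heap_free_ratio parameter are illustrative, since the JVM reads MinHeapFreeRatio as a global flag rather than an argument.

```cpp
#include <cassert>
#include <cmath>

// Free-function version of ConcurrentMarkSweepGeneration::init_initiating_occupancy().
//   io                  = CMSInitiatingOccupancyFraction (default -1)
//   tr                  = CMSTriggerRatio                (default 80)
//   min_heap_free_ratio = MinHeapFreeRatio               (default 40)
double initiating_occupancy(long io, unsigned long tr,
                            unsigned long min_heap_free_ratio) {
  assert(io <= 100 && tr <= 100);
  if (io >= 0) {
    // An explicitly configured fraction is used as-is.
    return static_cast<double>(io) / 100.0;
  }
  // Otherwise: the "non-free" share of the heap plus tr% of the free share.
  return ((100 - min_heap_free_ratio) +
          static_cast<double>(tr * min_heap_free_ratio) / 100.0) / 100.0;
}
```

With the defaults, initiating_occupancy(-1, 80, 40) gives 0.92, while setting the fraction explicitly, e.g. initiating_occupancy(75, 80, 40), gives 0.75 regardless of CMSTriggerRatio.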

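4. Whether an incremental collection may fail. What does that mean? In a two-generation system it mainly refers to whether a Young GC will fail. If a Young GC has already failed, or is likely to fail, the JVM decides that a CMS GC is needed.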

  // Returns true if an incremental collection is likely to fail.
  // We optionally consult the young gen, if asked to do so;
  // otherwise we base our answer on whether the previous incremental
  // collection attempt failed with no corrective action as of yet.
  bool incremental_collection_will_fail(bool consult_young) {
    // Assumes a 2-generation system; the first disjunct remembers if an
    // incremental collection failed, even when we thought (second disjunct)
    // that it would not.
    assert(heap()->collector_policy()->is_two_generation_policy(),
           "the following definition may not be suitable for an n(>2)-generation system");
    return incremental_collection_failed() ||
           (consult_young && !get_gen(0)->collection_attempt_is_safe());
  }

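There are two conditions: incremental_collection_failed() and !get_gen(0)->collection_attempt_is_safe(). incremental_collection_failed() means a Young GC has already failed, usually because the old generation did not have enough space for the promoted objects. !get_gen(0)->collection_attempt_is_safe() asks whether promotion out of the young generation is safe, by checking whether the old generation's remaining space can hold the objects a Young GC would promote.
The amount a Young GC will promote cannot be known in advance, so the check compares the available space with both the average amount promoted per Young GC and the maximum amount the current Young GC could promote: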

// av_promo is the average amount promoted per Young GC; max_promotion_in_bytes is the largest possible promotion right now (the space currently used in eden + from)
bool   res = (available >= av_promo) || (available >= max_promotion_in_bytes);
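The two checks can be modelled as two small C++ functions. This is a simplified sketch: the function names only paraphrase the HotSpot ones, and the inputs (available old-gen space, average and worst-case promotion sizes, the "previous young GC failed" flag) are passed in rather than read from the heap.

```cpp
#include <cassert>
#include <cstddef>

// Promotion is treated as safe if the old gen's available space covers either
// the average promotion size or the worst case (all used eden + from space).
bool promotion_attempt_is_safe(std::size_t available,
                               std::size_t av_promo,
                               std::size_t max_promotion_in_bytes) {
  return available >= av_promo || available >= max_promotion_in_bytes;
}

// An incremental (young) collection is expected to fail if the previous one
// already failed, or, when asked to consult the young gen, if promotion is unsafe.
bool incremental_collection_will_fail(bool previous_failed,
                                      bool consult_young,
                                      bool young_attempt_is_safe) {
  return previous_failed || (consult_young && !young_attempt_is_safe);
}
```

Note the "or" in the safety check: covering the average alone is enough, which is why a young GC can still fail with a promotion failure when one collection promotes far more than usual.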

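5. Based on Metaspace. This mainly checks Metaspace's should_concurrent_collect flag, which is set before Metaspace expands if CMSClassUnloadingEnabled is configured; in that case a CMS GC is run. This explains the puzzling CMS GC you sometimes see shortly after an application starts, while old-gen occupancy is still low.
6. Based on CMSTriggerInterval:
a. if CMSTriggerInterval is 0, return true;
b. otherwise, if the time since the last CMS cycle began is at least the trigger interval, return true.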
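Trigger 6 reduces to a few comparisons. The sketch below assumes the interval is in milliseconds, as the division by MILLIUNITS in the source implies, and takes the time since the last CMS cycle began as a plain argument in seconds; the function name is hypothetical. A negative interval disables the rule, matching the CMSTriggerInterval >= 0 guard.

```cpp
#include <cassert>

// trigger_interval_ms stands in for CMSTriggerInterval;
// secs_since_begin stands in for stats().cms_time_since_begin().
bool trigger_interval_elapsed(long trigger_interval_ms, double secs_since_begin) {
  if (trigger_interval_ms < 0) {
    return false;  // rule disabled
  }
  if (trigger_interval_ms == 0) {
    return true;   // "trigger always"
  }
  const double kMillisPerSecond = 1000.0;  // HotSpot divides by MILLIUNITS here
  return secs_since_begin >= trigger_interval_ms / kMillisPerSecond;
}
```

So an interval of 60000 fires once 60 s have passed since the last CMS cycle began, and 0 makes every check of the background thread start a cycle.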

Foreground Collector

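Compared with the background collector's many triggers, the foreground collector's are much simpler: it is usually triggered directly when an object allocation finds too little space, so that memory is reclaimed immediately. It uses mark-sweep without compaction. Note that the Foreground Collector can preempt the Background Collector. The following JVM function decides the CMS collection type, i.e. whether to run MSC or the Foreground Collector: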

// A work method used by foreground collection to determine
// what type of collection (compacting or not, continuing or fresh)
// it should do.
// NOTE: the intent is to make UseCMSCompactAtFullCollection
// and CMSCompactWhenClearAllSoftRefs the default in the future
// and do away with the flags after a suitable period.
void CMSCollector::decide_foreground_collection_type(
  bool clear_all_soft_refs, bool* should_compact,
  bool* should_start_over) {
  // Normally, we'll compact only if the UseCMSCompactAtFullCollection
  // flag is set, and we have either requested a System.gc() or
  // the number of full gc's since the last concurrent cycle
  // has exceeded the threshold set by CMSFullGCsBeforeCompaction,
  // or if an incremental collection has failed
  GenCollectedHeap* gch = GenCollectedHeap::heap();
  assert(gch->collector_policy()->is_two_generation_policy(),
         "You may want to check the correctness of the following");
  // Inform cms gen if this was due to partial collection failing.
  // The CMS gen may use this fact to determine its expansion policy.
  if (gch->incremental_collection_will_fail(false /* don't consult_young */)) {
    assert(!_cmsGen->incremental_collection_failed(),
           "Should have been noticed, reacted to and cleared");
    _cmsGen->set_incremental_collection_failed();
  }
  *should_compact =
    UseCMSCompactAtFullCollection &&
    ((_full_gcs_since_conc_gc >= CMSFullGCsBeforeCompaction) ||
     GCCause::is_user_requested_gc(gch->gc_cause()) ||
     gch->incremental_collection_will_fail(true /* consult_young */));
  *should_start_over = false;
  if (clear_all_soft_refs && !*should_compact) {
    // We are about to do a last ditch collection attempt
    // so it would normally make sense to do a compaction
    // to reclaim as much space as possible.
    if (CMSCompactWhenClearAllSoftRefs) {
      // Default: The rationale is that in this case either
      // we are past the final marking phase, in which case
      // we'd have to start over, or so little has been done
      // that there's little point in saving that work. Compaction
      // appears to be the sensible choice in either case.
      *should_compact = true;
    } else {
      // We have been asked to clear all soft refs, but not to
      // compact. Make sure that we aren't past the final checkpoint
      // phase, for that is where we process soft refs. If we are already
      // past that phase, we'll need to redo the refs discovery phase and
      // if necessary clear soft refs that weren't previously
      // cleared. We do so by remembering the phase in which
      // we came in, and if we are past the refs processing
      // phase, we'll choose to just redo the mark-sweep
      // collection from scratch.
      if (_collectorState > FinalMarking) {
        // We are past the refs processing phase;
        // start over and do a fresh synchronous CMS cycle
        _collectorState = Resetting; // skip to reset to start new cycle
        reset(false /* == !asynch */);
        *should_start_over = true;
      } // else we can continue a possibly ongoing current cycle
    }
  }
}

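decide_foreground_collection_type shows four main situations that lead to compaction, i.e. a compacting Full GC (the first three also require UseCMSCompactAtFullCollection):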

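1. The number of Foreground Collector and compacting Full GCs. This counts the Foreground Collector runs and compacting Full GCs since the last Background Collector cycle; once the count is at least CMSFullGCsBeforeCompaction, a compacting Full GC may run (CMSFullGCsBeforeCompaction defaults to 0, so compacting Full GCs are the default).
2. Whether the GC cause is user-requested. User-requested GC causes are _java_lang_system_gc (System.gc()) and _jvmti_force_gc (a forced GC via JVMTI). So any System.gc() call (provided ExplicitGCInvokesConcurrent is not set) or JVMTI forced GC results in a compacting Full GC.
3. Whether an incremental collection may fail: as described above for the two-generation system, incremental_collection_failed() or !get_gen(0)->collection_attempt_is_safe().
4. Whether all SoftReferences must be cleared. Soft references are normally reclaimed by the GC only when memory runs low; this case covers collections that must clear every soft reference, and if CMSCompactWhenClearAllSoftRefs is set, a compacting Full GC is performed.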
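The compaction decision can be condensed into one pure C++ function. This is a sketch, not the HotSpot code: the ForegroundInputs struct is hypothetical, and the should_start_over branch (restarting a cycle that is already past FinalMarking) is left out because it never changes whether compaction happens.

```cpp
#include <cassert>

// Hypothetical bundle of the inputs decide_foreground_collection_type() reads.
struct ForegroundInputs {
  bool use_compact_at_full_collection;    // UseCMSCompactAtFullCollection
  unsigned full_gcs_since_conc_gc;        // _full_gcs_since_conc_gc
  unsigned full_gcs_before_compaction;    // CMSFullGCsBeforeCompaction
  bool user_requested_gc;                 // System.gc() or JVMTI forced GC
  bool incremental_collection_will_fail;  // young GC failed or is unsafe
  bool clear_all_soft_refs;               // last-ditch collection
  bool compact_when_clear_all_soft_refs;  // CMSCompactWhenClearAllSoftRefs
};

bool should_compact(const ForegroundInputs& in) {
  bool compact = in.use_compact_at_full_collection &&
                 (in.full_gcs_since_conc_gc >= in.full_gcs_before_compaction ||
                  in.user_requested_gc ||
                  in.incremental_collection_will_fail);
  // Clearing all soft refs forces compaction when the flag asks for it,
  // independently of UseCMSCompactAtFullCollection.
  if (in.clear_all_soft_refs && !compact && in.compact_when_clear_all_soft_refs) {
    compact = true;
  }
  return compact;
}
```

With UseCMSCompactAtFullCollection on (its JDK 8 default) and CMSFullGCsBeforeCompaction = 0, the first condition always holds, so every foreground full collection compacts.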

Note:
Changes in JDK 9:
The CMS foreground collector was removed entirely, so apart from the background collector there is only the compacting Full GC. Accordingly, UseCMSCompactAtFullCollection and CMSFullGCsBeforeCompaction are no longer supported.

