JVM garbage collection notes

This is how garbage collection works. While some nod to JVM versions is given, most of "the way it was" is sacrificed in these notes and the assumption is made that Java 8 or later is the version. For historical diversion, please read someone else's articles. They abound anyway.

The JVM heap is divided into two generations:

The young generation is futher subdivided into two sections:

(There are also virtual spaces for both generations and used by garbage collectors to resize other regions.)

I found and stole this illustration from Grzegorz Mirek:

Minor garbage collection

Objects are usually created and used for short periods of time. This maxime underpins the weak-generational hypothesis, a theory of garbage collection.

Objects begin their life in Eden (hence the name). When Eden fills up, a so-called minor garbage collection is done. This is disruptive:

  1. All application threads are halted (called the "stop the world" pause).
  2. Objects that are no longer in use are discarded.
  3. All other objects are removed from Eden into the first survivor space (S0).
  4. The next time minor garbage collection is done, the survivor objects are relocated to a second survivor space (S1).
  5. The third time minor garbage collection is done, the objects relocated to S1 are moved back to S0 (if still in use and not just discarded).

Why toss objects back and forth between S1 and S0? When the object reaches a measurable threshold, i.e.: a configured number of times tossed between S0 and S1, then it's promoted to old or tenured generation for more permanent location (since it's "proved" itself to be less likely to be in need of being collected).

Major garbage collection

Ultimately, old generation will fill up at which time major garbage collection will be performed to clean up and compact that space. It's how major garbage collection is done, how stop-the-world pauses occur, that depends on the specific garbage-collection algorithm chosen.

Full garbage collection

Beside minor and major garbage collection, there is also a full garbage collection that's all about cleaning up the entire heap (both young and old generations). Full garbage collection is triggered by minor garbage collection. For this reason, what's formally termed major garbage collection in the explanation above is often passed upon in silence and full garbage collection is discussed instead.

See this article by Nikita Salnikov-Tarnovski, entitled Minor vs. Major vs. Full Garbage Collection.

There are two advantages to dividing the heap into two regions as described. First, it's always faster to have to process only some portion of the heap (one of the two divisions). That way, the infamous stop-the-world pauses are greatly reduced in number and in length of interruption. Second, during minor garbage collection, all objects from Eden are either moved or discarded. This automatically means that Eden remains a part of the heap that is always compacted (i.e.: not fragmented).

Types of garbage collection

There are three:

  1. serial
  2. parallel
  3. concurrent
  4. G1GC (!)

Serial

This type is performed by only one thread. This necessitates stop-the-world pauses during both minor and full garbage collection. (Remember that "full" in some discussions often means major as well as full.) The mark/copy algorithm is used for young generation whereas the mark/sweep/compact algorithm is used for old generation.

This type of garbage collection is designed for single-threaded environments, usually client-class hosts, and for relatively small heaps. It's enabled using the -XX:+UseSerialGC option to the JVM.

Parallel

The young generation collection is parallelized by multiple threads making minor garbage collection much faster. This leads to short if more frequent stop-the-world pauses. Since Java 7, old generation is collected this way too (and suffers stop-the-world pauses). Both -XX:+UseParalletGC and -XX:+UseParallelOldGC (since "old" means "pre-Java 7") enable parallel garbage collection.

Parallel collection also uses the mark/copy algorithm for young generation and mark/sweep/compact for old generation. Both are executed by multiple threads as noted. To configure the number of threads, use the -XX:ParallelGCThreads=N option.

Parallel is a good choice whenever throughput is more important than reduced latency.

Concurrent

This sort of garbage collection is designed to minimize stop-the-world pauses, that is, reducing latency, making the application run as responsively as possible.

In this type of garbage collection, minor garbage collection is done using multiple threads and a parallel mark/copy algorithm. All application threads are halted during minor garbage collection. Old generation is collected concurrently (application threads are paused for very short periods of time as the background thread scans the space).

During major garbage collection, concurrent mark/sweep is the algorithm. There's no compact after the sweep and the old (or tenured) space isn't compacted, the memory is left fragmented.

When garbage collection isn't able to fit new objects in memory, because of no compaction, the JVM falls back to the serial mark/sweep/compact algorithm to defragment and and compact old generation and performace degradation occurs with all application threads being halted while a single thread carries out the work on the old space.

This type of collection is done using the -X:+UseConcMarkSweepGC option.

G1GC

Specially, and something I use for NiFi and generally recommend (not that I'm any authority—it's just my practice to do so), is garbage-first garbage collection (G1GC). This breaks the heap down into several regions of fixed size (if still maintaining the generational nature of the heap as a whole). This design rids garbage collection of long, stop-the-world pauses for both young and old generation spaces. Each region is collected separately. Objects are copied from one region to another (regions do not map to generation spaces). This produces a heap that is at all times at least partially compacted. G1GC uses an incremental version of the mark/sweep/compact algorithm. It's enabled using the -XX:+UseG1GC option.

G1GC is an option for Java 8 (parallel is the default for server-class hosts). It will be the default in Java 9. All client-class hosts continue to run serial by default.

Garbage-collection options missing from Java 9

On the eve of JDK 9's release, the following options will no longer be supported:

DefNew + CMS -XX:-UseParNewGC -XX:+UseConcMarkSweepGC
ParNew + SerialOld -XX:+UseParNewGC
ParNew + iCMS -Xincgc
ParNew + iCMS -XX:+CMSIncrementalMode -XX:+UseConcMarkSweepGC
DefNew + iCMS -XX:+CMSIncrementalMode -XX:+UseConcMarkSweepGC -XX:-UseParNewGC
CMS foreground -XX:+UseCMSCompactAtFullCollection
CMS foreground -XX:+CMSFullGCsBeforeCompaction
CMS foreground -XX:+UseCMSCollectionPassing