JVM garbage-collection notes

March 2017
last update:

This is how garbage collection works. While some nod to JVM versions is given, most of "the way it was" is sacrificed in these notes and the assumption is made that Java 8 or later is the version. For historical diversion, please read someone else's articles. They abound anyway.

The JVM heap is divided into two generations: young and old (also called tenured).

The young generation is further subdivided into two sections: Eden and the survivor spaces (S0 and S1).

(There are also virtual spaces for both generations, which are used by garbage collectors to resize other regions.)

I found and stole this illustration from Grzegorz Mirek:

Minor garbage collection

Objects are usually created and used for short periods of time. This maxim underpins the weak generational hypothesis, a theory of garbage collection.

Objects begin their life in Eden (hence the name). When Eden fills up, a so-called minor garbage collection is done. This is disruptive:

  1. All application threads are halted (called the "stop the world" pause).
  2. Objects that are no longer in use are discarded.
  3. All other objects are moved from Eden into the first survivor space (S0).
  4. The next time minor garbage collection is done, the survivor objects are relocated to a second survivor space (S1).
  5. The third time minor garbage collection is done, the objects relocated to S1 are moved back to S0 (if still in use and not just discarded).

Why toss objects back and forth between S0 and S1? Each time an object survives a minor collection, its age goes up by one. When it reaches a configured threshold, i.e.: a set number of times tossed between S0 and S1, it's promoted to the old (or tenured) generation for a more permanent location, since it has "proved" itself less likely to need collecting. This is called the tenuring threshold. In region-based terms: if a young-generation region survives and retains enough live objects to avoid being evacuated (moved out of the region with the goal that the region will then be considered free), then this region is promoted, first to survivor, then eventually to tenured.
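The shuffling between Eden, S0, S1 and the old generation can be sketched as a toy model. This is only an illustration of the idea under my own assumptions: the class, its fields and the rule "promote once the age exceeds the threshold" are hypothetical simplifications, not how HotSpot is implemented.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of minor collections (hypothetical, not HotSpot's implementation). */
public class YoungGenSketch {
    /** A stand-in for a heap object: a liveness flag and a survival count. */
    static final class Obj {
        final String name;
        boolean live = true;
        int age = 0;
        Obj(String name) { this.name = name; }
    }

    final List<Obj> eden = new ArrayList<>();
    List<Obj> from = new ArrayList<>(); // survivor space currently holding objects
    List<Obj> to = new ArrayList<>();   // the empty survivor space
    final List<Obj> old = new ArrayList<>();
    final int tenuringThreshold;

    YoungGenSketch(int tenuringThreshold) {
        this.tenuringThreshold = tenuringThreshold;
    }

    Obj allocate(String name) {
        Obj o = new Obj(name);
        eden.add(o); // every object begins its life in Eden
        return o;
    }

    /** One minor collection: copy live objects out of Eden and the occupied survivor space. */
    void minorGc() {
        List<Obj> candidates = new ArrayList<>(eden);
        candidates.addAll(from);
        for (Obj o : candidates) {
            if (!o.live) continue; // dead objects are simply not copied
            o.age++;
            if (o.age > tenuringThreshold) old.add(o); // survived long enough: promote
            else to.add(o);                            // otherwise move to the other survivor space
        }
        eden.clear();
        from.clear();
        List<Obj> tmp = from; // swap the roles of the two survivor spaces
        from = to;
        to = tmp;
    }

    public static void main(String[] args) {
        YoungGenSketch heap = new YoungGenSketch(2);
        heap.allocate("a");
        Obj b = heap.allocate("b");
        b.live = false;  // b becomes garbage before the first collection
        heap.minorGc();  // a moves to a survivor space, b is discarded
        heap.minorGc();  // a moves to the other survivor space
        heap.minorGc();  // a exceeds the threshold and is promoted
        System.out.println("old generation holds " + heap.old.size() + " object(s)");
    }
}
```

Note that Eden and one survivor space are always left empty after each collection, which is why the young generation stays compacted.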

Major garbage collection

Ultimately, the old generation will fill up, at which time a major garbage collection will be performed to clean up and compact that space. How major garbage collection is done, and how stop-the-world pauses occur, depends on the specific garbage-collection algorithm chosen.

Full garbage collection

Besides minor and major garbage collection, there is also full garbage collection, which cleans up the entire heap (both young and old generations). Full garbage collection is often triggered by minor garbage collection. For this reason, what's formally termed major garbage collection in the explanation above is often passed over in silence and full garbage collection is discussed instead.

See this article by Nikita Salnikov-Tarnovski, entitled Minor vs. Major vs. Full Garbage Collection. See also Garbage Collection in Java (part 4)—Garbage First.

There are two advantages to dividing the heap into two regions as described. First, it's always faster to have to process only some portion of the heap (one of the two divisions). That way, the infamous stop-the-world pauses are greatly reduced in number and in length of interruption. Second, during minor garbage collection, all objects from Eden are either moved or discarded. This automatically means that Eden remains a part of the heap that is always compacted (i.e.: not fragmented).

To put it another way, evacuation failures (when it's not possible to move objects out so that an entire region is left free) happen when there aren't any free regions left in the heap. No free regions means nowhere to evacuate objects. This is why G1GC attempts to compact its regions on the fly instead of waiting for an evacuation failure to occur.

Types of garbage collection

There are four:

  1. serial
  2. parallel
  3. concurrent (CMS)
  4. G1GC (!)

Serial

This type is performed by only one thread. This necessitates stop-the-world pauses during both minor and full garbage collection. (Remember that "full" in some discussions often means major as well as full.) The mark/copy algorithm is used for young generation whereas the mark/sweep/compact algorithm is used for old generation.

This type of garbage collection is designed for single-threaded environments, usually client-class hosts, and for relatively small heaps. It's enabled using the -XX:+UseSerialGC option to the JVM.

Parallel

The young generation collection is parallelized across multiple threads, making minor garbage collection much faster. This leads to short, if more frequent, stop-the-world pauses. Since Java 7, the old generation is collected this way too (and suffers stop-the-world pauses). Both -XX:+UseParallelGC and -XX:+UseParallelOldGC (where "Old" refers to the old generation, which before Java 7 was not collected in parallel by default) enable parallel garbage collection.

Parallel collection also uses the mark/copy algorithm for young generation and mark/sweep/compact for old generation. Both are executed by multiple threads as noted. To configure the number of threads, use the -XX:ParallelGCThreads=N option.
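When -XX:ParallelGCThreads is not set, the JVM picks a value from the number of logical processors. The sketch below shows the rule as I understand it from HotSpot's ergonomics: one thread per processor up to 8, then 5/8 of a thread per additional processor, with concurrent threads at roughly a quarter of that. Treat both formulas and the method names as assumptions rather than a specification.

```java
/** Sketch of default GC thread counts (formulas are assumptions, not a specification). */
public class GcThreadDefaultsSketch {
    /** Stop-the-world worker threads: all processors up to 8, then 5/8 of the rest. */
    static int defaultParallelGcThreads(int cpus) {
        return cpus <= 8 ? cpus : 8 + (cpus - 8) * 5 / 8;
    }

    /** Concurrent threads: roughly a quarter of the parallel threads, at least one. */
    static int approxConcGcThreads(int parallelGcThreads) {
        return Math.max(1, (parallelGcThreads + 2) / 4);
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        int parallel = defaultParallelGcThreads(cpus);
        System.out.println(cpus + " processors: about " + parallel
                + " parallel and " + approxConcGcThreads(parallel) + " concurrent GC threads");
    }
}
```

On a 16-processor host this estimate gives 13 parallel threads and 3 concurrent ones.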

Parallel is a good choice whenever throughput is more important than reduced latency.

Concurrent Mark Sweep (CMS)

This sort of garbage collection is designed to minimize stop-the-world pauses, that is, to reduce latency and make the application run as responsively as possible. Applications using it prefer short garbage-collection pauses and can afford to share resources with the garbage collector while running. If an application has a relatively large set of long-lived data (a large, tenured generation) and is run on machines with two or more processors, it will benefit from this collector. However, once you start using heaps in the 10s or 100s of gigabytes, the GC pause times start to ramp up seriously.

In this type of garbage collection, minor garbage collection is done using multiple threads and a parallel mark/copy algorithm. All application threads are halted during minor garbage collection. Old generation is collected concurrently (application threads are paused for very short periods of time as the background thread scans the space).

The first pause is to mark as live the objects directly reachable from the roots (for example, object references from application-thread stacks and registers, static objects, etc.) and from elsewhere in the heap (for example, the young generation). The first pause is called initial mark pause. The second pause comes at the end of the concurrent tracing phase and finds objects that were missed by the concurrent tracing due to updates by the application threads of references in an object after the collector had finished tracing it. This second pause is called the remark pause.

During major garbage collection, concurrent mark/sweep is the algorithm. There's no compaction after the sweep, so the old (or tenured) space is left fragmented.

When garbage collection isn't able to fit new objects in memory because of the lack of compaction, the JVM falls back to the serial mark/sweep/compact algorithm to defragment and compact the old generation. Performance degrades because all application threads are halted while a single thread carries out the work on the old space.
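Why a sweep without compaction hurts can be shown with a toy heap of equal-sized slots. This is a sketch with made-up names and a boolean array standing in for memory; it only illustrates that free space after a sweep can be plentiful yet scattered.

```java
/** Toy heap of fixed-size slots showing sweep versus sweep-plus-compact (illustrative only). */
public class FragmentationSketch {
    /** Sweep: free every slot whose object is dead, without moving anything. */
    static void sweep(boolean[] occupied, boolean[] live) {
        for (int i = 0; i < occupied.length; i++) {
            if (occupied[i] && !live[i]) occupied[i] = false;
        }
    }

    /** Compact: slide all occupied slots to the start of the heap. */
    static void compact(boolean[] occupied) {
        int next = 0;
        for (int i = 0; i < occupied.length; i++) {
            if (occupied[i]) {
                occupied[i] = false;
                occupied[next++] = true;
            }
        }
    }

    static int totalFree(boolean[] occupied) {
        int free = 0;
        for (boolean slot : occupied) if (!slot) free++;
        return free;
    }

    /** Length of the longest run of adjacent free slots. */
    static int largestFreeRun(boolean[] occupied) {
        int best = 0, run = 0;
        for (boolean slot : occupied) {
            run = slot ? 0 : run + 1;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        boolean[] occupied = {true, true, true, true, true, true, true, true};
        boolean[] live     = {true, false, true, false, true, false, true, false};
        sweep(occupied, live);
        System.out.println("after sweep:   free=" + totalFree(occupied)
                + " largest run=" + largestFreeRun(occupied)); // free=4 largest run=1
        compact(occupied);
        System.out.println("after compact: free=" + totalFree(occupied)
                + " largest run=" + largestFreeRun(occupied)); // free=4 largest run=4
    }
}
```

After the sweep, half the heap is free but nothing larger than one slot fits; only compaction makes the free space usable for a bigger object.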

This collector throws an OutOfMemoryError if too much time is spent in garbage collection: if more than 98% of the time is spent in garbage collection while less than 2% of the heap is recovered. This is to prevent applications from running for an extended period of time while making little or no progress because the heap is so small. This behavior can be disabled: -XX:-UseGCOverheadLimit.

This type of collection is enabled using the -XX:+UseConcMarkSweepGC option.

G1GC

Beware: many early articles on G1GC describe problems this collector had before it matured enough to become the default in Java 9. Not every bad behavior you read about remains true of G1GC.

Older types of garbage collection, which might result in longer stop-the-world pauses, work tolerably for service-oriented architecture applications, but not for services (especially clustered applications like Cassandra and Kafka). Indeed, if an application has to stop completely for longer than a few milliseconds, this will likely become a very serious problem for services with a heartbeat, gossip contact, etc.

The goal of G1GC is to mark regions actively and continually and, like CMS, to put off full GC, then run full GC on a single thread. It's observed that full GC is then far less work than in parallel GC's case. So, average throughput is comparable to parallel GC without stopping the world as parallel GC does and, when G1GC does resort to stopping it, the pause isn't nearly as long. Also, G1GC supports enormous heap sizes (for example, up to 32Gb), whereas parallel GC, for the same heap size, takes a very long time to complete full GC.

Specifically, and something I use for NiFi and generally recommend (not that I'm any authority; it's just my practice to do so), is garbage-first garbage collection (G1GC). This breaks the heap down into several regions of fixed size (while still maintaining the generational nature of the heap as a whole). This design rids garbage collection of long, stop-the-world pauses for both young and old generation spaces. Each region is collected separately. Objects are copied from one region to another (regions do not map to generation spaces). This produces a heap that is at all times at least partially compacted. G1GC uses an incremental version of the mark/sweep/compact algorithm. It's enabled using the -XX:+UseG1GC option.
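To get a feel for the fixed-size regions, here is a rough estimate of the region size G1GC might pick for a given heap. It assumes the target of about 2048 regions and the 1 to 32Mb power-of-2 range noted in the tuning list of these notes; the real ergonomics differ in detail, and the class and method names are mine.

```java
/** Rough estimate of a G1 region size (a sketch; real HotSpot ergonomics differ in detail). */
public class G1RegionSizeSketch {
    static final long MB = 1024L * 1024L;
    static final long TARGET_REGIONS = 2048; // assumed target number of regions

    static long regionSizeBytes(long heapBytes) {
        long raw = Math.max(heapBytes / TARGET_REGIONS, 1);
        long pow2 = Long.highestOneBit(raw);           // round down to a power of 2
        return Math.min(Math.max(pow2, MB), 32 * MB);  // clamp to the 1..32Mb range
    }

    public static void main(String[] args) {
        long[] heapsInGb = {1, 4, 14, 128};
        for (long gb : heapsInGb) {
            long size = regionSizeBytes(gb * 1024 * MB);
            System.out.println(gb + "Gb heap: about " + (size / MB) + "Mb regions");
        }
    }
}
```

Under these assumptions a 14Gb heap lands on 4Mb regions, while anything from 64Gb up is capped at 32Mb. An explicit -XX:G1HeapRegionSize=n overrides the estimate.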

G1GC is an option for Java 8 (parallel is the default for server-class hosts). It is the default in Java 9. All client-class hosts continue to run serial by default.

How to know when you need to tune G1GC differently from its defaults? G1GC is intended to avoid full GC as much as possible. It's recommended that if you detect any full GC pauses, you should tune.

Shenandoah

The low pause-time garbage collector performs more collection concurrently with the running Java program, including compaction, making pause times no longer directly proportional to heap size. Collecting a 200Gb or 2Gb heap should have the same behavior.

ZGC

The low-latency garbage collector performs all expensive work concurrently without halting execution of application threads for more than 10ms at a time. It's suitable for applications that require low latency and/or use a very large heap. The heap must be carefully selected to accommodate the live-set of the application and have enough headroom to permit the servicing of allocations while the garbage collector is running. The bigger the heap allocated, the better ZGC works (but, also, the more memory potentially taken away from other processes on the hardware).

Table of JVM garbage collectors

Algorithm                      JVM argument             Java version
Serial                         -XX:+UseSerialGC         Java 1
Parallel                       -XX:+UseParallelGC       Java 1
Concurrent Mark & Sweep (CMS)  -XX:+UseConcMarkSweepGC  Java 4
G1 (G1GC)                      -XX:+UseG1GC             Java 7u9
Epsilon                        -XX:+UseEpsilonGC        Java 11
Shenandoah                     -XX:+UseShenandoahGC     Java 12
Z (ZGC)                        -XX:+UseZGC              Java 11 (experimental)
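To check which collector a running JVM actually ended up with, the standard java.lang.management API can list the active collectors. A minimal sketch follows; the class name is mine, and the bean names printed vary by collector and JVM version.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Lists the garbage collectors active in this JVM, with their counts and accumulated times. */
public class ActiveCollectors {
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(bean.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionCount() and getCollectionTime() return -1 when undefined for a collector
            System.out.println(bean.getName()
                    + ": collections=" + bean.getCollectionCount()
                    + " time=" + bean.getCollectionTime() + "ms");
        }
    }
}
```

Running it once with each JVM argument from the table is a quick way to confirm that the option took effect.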

On-line garbage-collection analysis tool

Tuning can be tedious, complex and even mysterious, but there's help in a tool at GCEasy. You only need to create a garbage-collection log via this JVM command-line option:

-XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:GC-log-file-path

...then upload the resulting log file to GCEasy (if it's very big, a zipped copy will make the process faster).

Garbage-collection options missing from Java 9

As of JDK 9's release, the following option combinations are no longer supported:

DefNew + CMS        -XX:-UseParNewGC -XX:+UseConcMarkSweepGC
ParNew + SerialOld  -XX:+UseParNewGC
ParNew + iCMS       -Xincgc
ParNew + iCMS       -XX:+CMSIncrementalMode -XX:+UseConcMarkSweepGC
DefNew + iCMS       -XX:+CMSIncrementalMode -XX:+UseConcMarkSweepGC -XX:-UseParNewGC
CMS foreground      -XX:+UseCMSCompactAtFullCollection
CMS foreground      -XX:+CMSFullGCsBeforeCompaction
CMS foreground      -XX:+UseCMSCollectionPassing

List of tuning parameters for G1GC

...and their defaults (when noted).

  1. -XX:ConcGCThreads=n: number of parallel marking threads. By default, n is approximately ¼ of the number of parallel garbage collection threads.
  2. -XX:G1HeapRegionSize=n: a power of 2 ranging from 1 to 32Mb. The goal is to have around 2048 regions based on the minimum JVM heap size.
  3. -XX:G1HeapWastePercent=10: percent of heap you are willing to waste. When the reclaimable percentage is less than the heap-waste percentage, garbage collection is not initiated.
  4. -XX:G1MaxNewSizePercent=60: maximum percentage of the heap to use for young generation. Originally experimental.
  5. -XX:G1MixedGCCountTarget=8: sets target number of mixed GCs after a marking cycle to collect old regions with at most G1MixedGCLiveThresholdPercent live data. The goal for mixed collections is to be within this target. Not available in Java HotSpot VM, build 23.
  6. -XX:G1MixedGCLiveThresholdPercent=65: occupancy threshold for an old region to be included in a mixed garbage collection cycle. Experimental. Not available in Java HotSpot VM, build 23.
  7. -XX:G1NewSizePercent=5: percentage of the heap to use as minimum for young generation size. Experimental. Not available in Java HotSpot VM, build 23.
  8. -XX:G1OldCSetRegionThresholdPercent=10: upper limit on number of old regions to be collected during a mixed garbage collection cycle. Not available in Java HotSpot VM, build 23.
  9. -XX:G1ReservePercent=10: percentage of reserve memory to keep free to reduce the risk of to-space overflows. When increasing or decreasing, ensure total Java heap is adjusted by the same amount. Not available in Java HotSpot VM, build 23.
  10. -XX:InitiatingHeapOccupancyPercent=45: heap occupancy threshold that triggers a marking cycle.
  11. -XX:MaxGCPauseMillis=200: target for desired maximum pause time. This value does not adapt to your heap size.
  12. -XX:ParallelGCThreads=n: number of stop-the-world threads. By default, n is the number of logical processors, up to 8.
  13. -XX:GCTimeRatio=nnn: hint to the virtual machine that it's desirable that not more than 1 / ( 1 + nnn ) of the application execution time be spent in the collector, e.g.: if nnn is 19, it sets a goal of 5% of the total time for GC and a throughput goal of 95%, that is, the application should get 19 times as much time as the collector.

    By default this is 99, that is that the collector should run for not more than 1% of the total time, thought to be a good choice for server applications. However, a value that is too high will cause the size of the heap to grow to its maximum.

    At one point, this parameter was dropped from G1GC (including documentation) by mistake (see JDK-6937160: G1 should observe GCTimeRatio).
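The GCTimeRatio arithmetic is easy to get backwards, so here it is worked out in code. It simply evaluates the 1 / ( 1 + nnn ) formula from the list above; the class and method names are mine.

```java
/** Evaluates the GCTimeRatio formula: the fraction of total time the collector may use. */
public class GcTimeRatioSketch {
    static double gcTimeFraction(int gcTimeRatio) {
        return 1.0 / (1.0 + gcTimeRatio);
    }

    public static void main(String[] args) {
        int[] ratios = {19, 99};
        for (int nnn : ratios) {
            double gc = gcTimeFraction(nnn);
            System.out.printf("-XX:GCTimeRatio=%d: GC goal %.0f%%, throughput goal %.0f%%%n",
                    nnn, gc * 100, (1 - gc) * 100);
        }
    }
}
```

A larger nnn means a stricter goal: 19 allows the collector 5% of the time, 99 only 1%.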

Options (already noted above) for tuning mixed garbage collection, that is, organizing both young- and old-generation collections:

  1. -XX:G1HeapWastePercent
  2. -XX:G1MixedGCCountTarget
  3. -XX:G1MixedGCLiveThresholdPercent
  4. -XX:G1OldCSetRegionThresholdPercent
  5. -XX:InitiatingHeapOccupancyPercent

List of JVM memory tuning parameters (reminder)

Example, sets maximum heap size to 14Gb:

$ java -Xmx14G myprogram

Notes:

To avoid using the command line, set these parameters in JVM_OPTS.

The maximum for a 32-bit implementation is 2Gb.

Easy table of optimization options

Description                        Option
Initial heap memory size           -Xms
Maximum heap memory size           -Xmx
Size of young generation           -Xmn
Initial permanent generation size  -XX:PermSize (Java 7 and earlier)
Maximum permanent generation size  -XX:MaxPermSize (Java 7 and earlier)
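To confirm that heap options such as -Xmx were picked up, the running JVM can report its own limits through java.lang.Runtime. A minimal sketch, with a class name of my own choosing:

```java
/** Prints the heap limits the running JVM reports, to verify -Xms/-Xmx settings. */
public class HeapSettings {
    static final long MB = 1024L * 1024L;

    /** Maximum heap the JVM will attempt to use, in megabytes. */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / MB;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("max heap:       " + maxHeapMb() + "Mb");
        System.out.println("committed heap: " + (rt.totalMemory() / MB) + "Mb");
        System.out.println("free in heap:   " + (rt.freeMemory() / MB) + "Mb");
    }
}
```

The reported maximum can come out slightly below the -Xmx value, depending on the collector in use.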

Epsilon, the no-op collector (Java 11)

This garbage collector permits you to allocate as much memory as you want until the allocated heap is used up, then the JVM just shuts down. It's useful to demonstrate a) how much memory your application actually needs and b) just what impact garbage collection has on your application. For example, if you think you only need 4Gb to run in, start your application with -Xmx4g and let it run. If it crashes, rerun it with -XX:+HeapDumpOnOutOfMemoryError, then look at the dump produced when the JVM stops. Running this garbage collector isn't for production mode unless you completely understand how much memory your application uses and your application also uses memory very statically (i.e.: allocates, but never frees or otherwise creates garbage to be collected).

-XX:+UnlockExperimentalVMOptions  Unlock JVM experimental options.
-XX:+UseEpsilonGC                 Enable Epsilon garbage collection.
-XmxNg                            Set heap size to N gigabytes. The JVM will exit as soon as this amount is exceeded.
-XX:+HeapDumpOnOutOfMemoryError   Generate a heap dump as soon as the JVM runs out of memory.
-XX:OnOutOfMemoryError=command    Run specified command when an out-of-memory error occurs.
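A small allocation loop is enough to see Epsilon's behavior for yourself. The sketch below (class name and sizes are mine) creates garbage that an ordinary collector would reclaim without fuss, but that Epsilon never frees.

```java
/** Allocates short-lived arrays; harmless under a normal collector, fatal under Epsilon. */
public class EpsilonChurn {
    /** Allocates chunkCount arrays of chunkBytes each, dropping every one immediately. */
    static long churn(int chunkCount, int chunkBytes) {
        long allocated = 0;
        for (int i = 0; i < chunkCount; i++) {
            byte[] garbage = new byte[chunkBytes];
            garbage[0] = 1;              // touch the array so it is really used
            allocated += garbage.length; // the reference is dropped at the end of each iteration
        }
        return allocated;
    }

    public static void main(String[] args) {
        long total = churn(256, 1024 * 1024); // 256 chunks of 1Mb each
        System.out.println("allocated " + (total / (1024 * 1024)) + "Mb in total");
    }
}
```

Run with an ordinary collector and a small heap (say -Xmx128m), it should finish normally. Run with -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC -Xmx128m, I would expect it to die with an OutOfMemoryError once roughly 128Mb has been allocated, since nothing is ever collected.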

Z garbage collector (or ZGC, new in Java 11)

A low-latency garbage collector for very large heaps, still experimental. It works using its own threads concurrently with your application and uses load barriers for heap references (instead of the pre- and post-write barriers used by G1GC). It uses a lot more memory (something like 22 bits per reference), but is made for environments with huge amounts of memory and huge expectations in terms of memory usage. It remaps objects when memory becomes fragmented, instead of only when new space is needed, and it uses its own threads to do this, reducing pauses from roughly 200 milliseconds to 1. How to use:

-XX:+UnlockExperimentalVMOptions  Unlock JVM experimental options.
-XX:+UseZGC                       Enable ZGC.
-XmxNg                            Set heap size to N gigabytes.
-XX:ConcGCThreads=N               Set number of threads for garbage collection.

ZGC works in three phases. First, marking begins with a short, stop-the-world pause. Next, the concurrent phases run alongside the application. Finally, objects are moved around to free up large areas in the heap, which makes allocations faster. The heap is divided up into pages and only one page is worked on at a time.

Shenandoah garbage collector (new in Java 12)

A heuristic (self-observing or self-learning) garbage collector. It's been around experimentally for some time and is back-portable to Java 8. It's a technically complex garbage collector worthy of explicit research rather than the shallow description I can give here.

-XX:+UnlockExperimentalVMOptions      Unlock JVM experimental options.
-XX:+UseShenandoahGC                  Enable Shenandoah garbage collection.
-XmxNg                                Set heap size to N gigabytes.
-XX:ShenandoahGCHeuristics=heuristic  Select heuristic approach: adaptive, static or compact.

Links