Introduction
The stop-the-world (STW) parallel mark-sweep GC algorithm, which is the basic algorithm employed by the IBM JVM, imposes pause times that are proportional to the size of the heap. Although various improvements have shortened these pauses to acceptable levels for heap sizes currently in use, they are sure to become unacceptably long if the heap grows much larger. This is precisely what the advent of 64-bit platforms and its attendant expansion of address space is expected to bring about. In support of IBM's effort to port the JVM to 64-bit platforms, we are looking into ways to keep the pause times, induced by the existing GC, in check, even on multi-gigabyte heaps.
To this end, several algorithmic enhancements were studied. All these approaches lend themselves to effective parallelization, thus are capable of making use of multiple CPUs. Since they are also compatible with each other, they can be deployed independently.
Generational GC
A well-known algorithm [14] that divides the heap into a nursery, where most new objects are allocated, and an old area, where mature nursery objects are promoted after they become old. Based on observed program behavior, most garbage accrues in the nursery, which therefore requires frequent, but short GCs. Mature objects die infrequently, so the old area does not need GC often. Under these circumstances, most of the GC resources are applied to collect the nursery, thus shortening the overall pause time.
Mostly Concurrent GC
An algorithm, first proposed in [5], for shortening the GC pause time at the expense of some throughput. It delegates most of the marking required for mark-sweep GC to be done concurrently by mutator threads with their application work, thus drastically reducing the amount of marking done inside the (stop-the-world) pause. Additional low priority threads participate in the concurrent marking effort, using up idle CPU time. Measurements show that this enhancement reduces pause times by as much as 80%. This garbage collector is part of the production JVM release 1.3.1. The implementation is described in a paper which appeared in PLDI'02 [17].
Concurrent Sweep
After the mostly concurrent algorithm has reduced the time spent in marking, the sweep phase becomes the dominant contributor to pause time (when compaction is avoided). Most of the sweep work is deferred in this algorithm until after the GC pause, which is to be done by mutator threads concurrently with their application work.
Incremental Compaction
Compaction is not done in every GC cycle, but when it does occur, it is responsible for most of the pause time. This algorithm attempts to reduce compaction time by confining its effort to only a part of the heap at a time. By compacting a different section of each heap's GC cycle, the need for complete heap compaction can be reduced, albeit not eliminated. The algorithm and preliminary results appeared in ISMM'02 [4].
Several patents that deal with large heaps solutions are in the process of being filed.