Skip to main content

On-The-Fly Collector

Java Memory Management

On-the-Fly Concurrent Collection

The goal of this project was to improve the scalability of the JVM by designing and implementing an industry-leading garbage collector for multiprocessor server machines. This was accomplished by employing an on-the-fly concurrent garbage collector, which never stops the execution of user programs for GC (garbage collection), but rather executes concurrently with mutator threads. Additionally, the Java heap manager was redesigned to reduce the serialization of heap allocations.

Our collector was a part of the S/390 JVM JDK1.1.4 (until it was replaced by JDK1.3.1 due to considerations of compatibility with other platforms), and is also part of the JDK on the iSeries V4R2. We enhanced the collector and the heap manager designs for this platform to incorporate new Java features and to keep up with the platform's development.

The design is based on a parallel mark-sweep algorithm described in [7,6]. Rather than stop all mutator threads for the duration of the GC cycle, this algorithm coordinates its progress with the mutators by posting its current status in a global variable. Mutator threads check this status at regular intervals and react accordingly. Therefore, mutator threads are able to continue running while GC is taking place. The amount of overhead incurred by mutator threads, however, is minimal. The implementation differs from the algorithm as published in many details, in order to account for properties peculiar to the Java environment, including support for long-running native methods, object finalization, and weak references.

This GC has been complemented by a binning-type allocation scheme, where memory is divided into chunks that are chained together and organized into bins. Objects of similar sizes are allocated from the same bin. Allocation from bins is very fast, and requires a compare-and-swap operation to remove a chunk from the linked list.

We built an extensive test suite to prove the correctness of the implementation based on Sun's "Java Language Specifications". A paper that describes the design and the implementation of the On-The-Fly Collector appeared in ISMM'00 [9].

Generational On-The-Fly Garbage Collector

Our group dedicated several projects to improving the on-the-fly collector. One such enhancement introduced the idea of generational collection. The generational collection algorithm, first proposed by Lieberman and Hewitt [14] in 1983, is based on the observation that in many applications most objects die young, and relatively few objects remain reachable for a substantial part of the duration of the program. Therefore, it is more effective to scan recently-allocated objects more often, since these are likely to contain collectable garbage. Technically, the memory is partitioned into two areas: one part, the nursery, where all new objects are allocated, and the other, the old area, where objects are moved once they grow old. The collection concentrates efforts on the young generation where more garbage is likely to be found. In addition to obtaining a more efficient collector (on the average), a generational collector also induces better program behavior with respect to page faults. The more frequent collections allow for faster reuse of the memory and thus, the program uses fewer pages. Furthermore, a collection of the nursery does not have to trace the entire heap thereby, touching fewer pages.

Incorporating generations into the on-the-fly collector has never been done before, and we had to solve several synchronization and efficiency problems to make it work. A paper that describes this effort appeared in PLDI'00 [8].

Improvements to the On-The-Fly Collector

Another one of our group's improvement had to do with streamlining the on-the-fly algorithm itself. One of the bottlenecks of on-the-fly collectors, as well as other parallel programs, is synchronization between the various threads, which causes delays and reduced throughput. By implementing improvements to the on-the-fly algorithm put forth by Lamport [13] and by Hudak and Keller [11], as well as by removing the synchronization between the sweep phase of the collector and the allocation of objects by the program threads, our adaptation could avoid costly barrier instructions (e.g., sync on the PowerPC) that would otherwise be required to ensure memory coherence.