Overview
This project aims to build a coherent set of tools for deep program analysis, covering both program understanding and program transformation, for various types of legacy systems. The tools will share a generic, wide-spectrum abstract representation of programs, augmented with techniques such as pointer analysis and program slicing.
The internal representation will be based on the plan calculus [1], essentially a data-flow and control-flow graph. This representation has previously been used in several research projects, including two high-quality commercial program transformation projects. The first system translated IBM assembly language to portable C [2]; the second transformed COBOL programs written for network databases to use relational databases instead [3]. Both systems were able to discover program concepts that were only implicit in the source programs. They deduced subroutine interfaces and precise data types, and they detected unused results (such as the high-order part of products and the condition code) and eliminated them from the target program. The systems also analyzed loops in the source program and identified filters, joins, and aggregation operations. These were then removed from the COBOL source and coded in SQL, resulting in smaller, clearer, and more efficient programs.
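The elimination of unused results mentioned above can be illustrated with a minimal sketch. This is not the plan calculus or the code of either system; it is a toy data-flow graph, and the names Node, live_nodes, and eliminate_unused, as well as the operation labels, are hypothetical and chosen only for illustration. The sketch assumes a multiply instruction that produces a low-order result, a high-order result, and a condition code, of which only the low-order result is stored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """One operation in a toy data-flow graph (hypothetical, for illustration)."""
    name: str           # name of the value this node produces
    op: str             # label of the operation
    inputs: tuple = ()  # names of the values this node consumes


def live_nodes(nodes, outputs):
    """Return the names of all nodes whose results can reach a program output."""
    by_name = {n.name: n for n in nodes}
    live = set()
    worklist = list(outputs)
    while worklist:
        name = worklist.pop()
        if name in live:
            continue
        live.add(name)
        # Everything a live node consumes is live as well.
        worklist.extend(by_name[name].inputs)
    return live


def eliminate_unused(nodes, outputs):
    """Drop every node whose result never contributes to an output."""
    live = live_nodes(nodes, outputs)
    return [n for n in nodes if n.name in live]


# Assumed example: a multiply that yields both halves of the product
# and sets a condition code, followed by a store of the low half only.
graph = [
    Node("a", "load"),
    Node("b", "load"),
    Node("prod_lo", "mul_low", ("a", "b")),
    Node("prod_hi", "mul_high", ("a", "b")),
    Node("cc", "cond_code", ("prod_lo",)),
    Node("result", "store", ("prod_lo",)),
]

kept = eliminate_unused(graph, outputs=["result"])
print([n.name for n in kept])  # ['a', 'b', 'prod_lo', 'result']
```

In this sketch the high-order half of the product and the condition code are dropped because no output depends on them. A real analysis would also have to account for control flow, side effects, and aliasing, which this example ignores.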
This activity is a multi-year infrastructure project, with specific spin-offs every year to address scenarios such as transforming legacy systems to a service-oriented architecture (SOA), advanced refactoring capabilities for legacy systems, and tools for API change management.