IBM Research
2005 Programming Languages Day at Watson

The Sixth IBM Programming Languages Day will be held at the IBM Thomas J. Watson Research Center on Friday, April 22, 2005. The day will be held in cooperation with the New Jersey and New England Programming Languages and Systems Seminars. The main goal of the event is to increase awareness of each other's work, and to encourage interaction and collaboration.

The Programming Languages Day features a keynote presentation and six regular presentations. Simon Peyton Jones of Microsoft Research will deliver the keynote presentation this year.

You are welcome from 9AM onwards, and the keynote presentation will start at 10AM sharp. We expect the program to run until 4PM. The Programming Languages day will be held in room GN-F15 in the Hawthorne-1 building in Hawthorne, New York.

If you plan to attend the Programming Languages Day, please register by sending an e-mail with your name, affiliation, and contact information to raghavac@us.ibm.com so that we can plan for lunch and refreshments.


Agenda

10:00-11:15 Simon Peyton Jones
Microsoft Research (Cambridge)
Composable memory transactions
11:15-11:45 Yitzhak Mandelbaum
Princeton University

A Calculus for Specifying Ad Hoc Data Formats
11:45-12:15 Suad Alagic
University of Southern Maine
Type Erasure: Breaking the Java Type System

12:15-13:15 Lunch  
13:15-14:00 Vijay Saraswat
IBM Research
X10: An Object-Oriented Approach to Non-Uniform Cluster Computing

14:00-14:30 Geoffrey Washburn
University of Pennsylvania
Generalizing Parametricity Using Information Flow


14:30-15:00 Break  
15:00-15:30 Adriana Compagnoni
Stevens Institute for Technology
SIF, A Typed Assembly Language for Non-Interference


15:30-16:00 Eric Allen
Sun Microsystems, Inc.
Encapsulated Upgradable Components


Abstracts

Composable memory transactions
Simon Peyton Jones
Microsoft Research (Cambridge)


Concurrent programs are increasingly important, but are notoriously difficult to write. Familiar problems include forgetten locks, deadlock, missed wake-up calls, and complex error-recovery code. Worse than all these, however, is lack of compositionality: even correctly-implemented concurrency abstractions cannot be composed together to form larger abstractions.
After a decade of stumbling around in the dark, I went to a talk about \emph{Software Transactional Memory} (STM), and my life was changed. In this talk I’ll share my Damascus-road experience with you, by presenting a new shared-memory concurrency model that avoids, by construction, many of the problems that plague lock-based synchronisation schemes. Better still, memory transactions compose beautifully, both in sequence and as choices.
Transactional memory works particularly nicely in a declarative setting, and I’ll describe our realisation in Haskell. Gratifyingly, the pure setting allowed us to come up with two ideas that were previously inaccessible: modular primitives for blocking and choice.
This isn’t a Haskell-ghetto talk, though: I think that transactional memory is the best hope yet for shared-memory concurrency in any language.
Joint work with Tim Harris, Maurice Herlihy, Simon Marlow


A Calculus for Specifying Ad Hoc Data Formats
Yitzhak Mandelbaum
Princeton University

An ad hoc data format is any nonstandard data format for which parsing, querying, analysis or transformation tools are not readily available. Vast amounts of useful data are stored and processed in ad hoc formats. Traditional databases and XML systems provide rich infrastructure for processing well-behaved data, but are of little help when dealing with ad hoc formats. Before anything can be done with an ad hoc data source, someone has to produce a suitable parser for it. Today, people tend to use C or PERL for this task. Unfortunately, writing such parsers is tedious and error-prone. It is complicated by the lack of documentation, convoluted encodings designed to save space, the need to produce efficient code to cope with the scale of the data, and the need to handle errors robustly to avoid corrupting precious data.
As an alternative, a number of languages have been designed to enable users to describe data declaratively in terms of types and constraints, including PADS, DataScript and PacketTypes. While these languages differ significantly in their application, their essential semantics are quite similar. Unfortunately, no formal specification of the semantics of any of these languages exists. Such semantics would be very useful in understanding the existing implementations, as well as in developing new bindings and alternative implementations.
To address this shortcoming, we have developed a calculus of dependent types for describing the physical layout and semantic properties of a data source. It consists of a small number of orthogonal type constructors that can be composed with one another to describe rich and varied data formats. The calculus captures many of the commonalities of the aforementioned data description languages. We present a semantics for the calculus in which we interpret the types simultaneously as the simple types of the data described and as parsing functions for a data stream. The parsing functions are robust, automatically detecting and recording errors in the data stream without halting parsing. We prove two important correctness properties of the parsers. First, we show they are type-correct, outputing data whose type matches the simple-type interpretation of the specification, and second, we show they are “error-correct,” reporting exactly the number of physical and semantic errors that occur in the returned data. Finally, we present a case study of using this calculus to describe the PADS language.

Joint work with: Kathleen Fisher and David Walker.


Type Erasure: Breaking the Java Type System
Suad Alagic
University of Southern Maine

A Type erasure is an implementation idiom for extending Java with generics (parametric polymorphism) underlying the most recent release Java 5.0. We show that this idiom is provably incorrect and as such it leads to major violations of the Java type system that appear in the new release. The new release thus turns a type safe language into an unsafe one as static type checking does not work any more. The run-time type information in this release is incorrect so that dynamic type checking can fail in unexpected ways. This leads to major problems for the programmers relying on the Java reflective capabilities. Further problems are in type violations related to persistence and the Java serialization mechanism. We show that the problem of extending Java with parametric polymorphism does not have a correct solution unless the Java Virtual Machine is extended to handle it properly. We explain the subtleties required by a correct implementation technique that includes representation of parametric classes in the standard Java class file format, representation of the instantiated parametric class objects, extensions of the Java Core Reflection to report type information about (instantiated) parametric classes, and the loading techniques required by this solution for extending Java with generics. Previous solutions for this problem are analyzed as well.

Background paper:
Parametric polymorphism for Java: Is there any hope in sight?, ACM SIGPLAN Notices, December 2004.
Joint work with Brian Cabana, Jeff Faulkner and Mark Royer


X10: An Object-Oriented Approach to Non-Uniform Cluster Computing
Vijay Saraswat
IBM Research

The next generation of high performance computers (e.g. those capable of O(10^{15}) operations per second) will be based on scale-out techniques rather than increasing clock rates. This leads to a notion of clustered computing: a single computer may contain hundreds of thousands of tightly coupled nodes. Unlike a distributed model, failure of a single node is tantamount to failure of the entire machine. However, the cost of memory access by a hardware processor may vary as much as five orders of magnitude across the cluster; hence the notion of a single shared memory may no longer be appropriate for such machines.
We have designed a concrete modern object-oriented programming language, X10, for high performance, high productivity programming of such machines. Past work in the Java Grande Forum has exposed the need for substantial changes in modern OO languages in order to support high performance computation (e.g. support for true multi-dimensional arrays, value types, relaxed exception model, changes to the concurrency model, support for distribution etc.) X10 builds on this past work. A member of the *partitioned global address space* family of languages (which includes Titanium, UPC and Co-Array Fortran), X10 is distinguished by the explicit reification of locality (*places*), termination detection (*finish*), by the use of lock-free synchronization (conditional atomic sections), and by the ability to express clustered data-structures (e.g. arrays scattered across multiple places). X10 smoothly integrates concurrent shared memory access (e.g. {} as expressed by OpenMP or Java threads with message-passing (e.g. as expressed by MPI). We present the design of the core features of the language, experience with a reference implementation and present results from some initial productivity studies.
Joint work with: Philippe Charles, Christopher Donawa, Kemal Ebcioglu, Christian Grothoff, Allan Kielstra, Christoph von Praun, Vivek Sarkar


Generalizing Parametricity Using Information Flow
Geoffrey Washburn
University of Pennsylvania

Run-time type analysis allows programmers to easily and concisely define operations based upon type structure, such as serialization, iterators, and structural equality. However, when types can be inspected at run time, nothing is secret. A module writer cannot use type abstraction to hide implementation details from clients: they can use type analysis to determine the structure of these supposedly “abstract” data types. Furthermore, access control mechanisms do not help isolate the implementation of abstract datatypes from their clients. Buggy or malicious authorized modules may simply leak type information to unauthorized clients, so module implementors cannot reliably tell which parts of a program rely on their type definitions.
Currently, module implementors rely on parametric polymorphism to provide guarantees about the use of their abstract datatypes. Standard parametricity does not hold for a language with run-time type analysis, but in this paper we show how to generalize parametricity so that it does hold in the presence of type analysis and still encompasses the integrity and confidentiality policies that are derived from parametricity. The key is to augment the type system with annotations about information-flow. By tracking the flow of dynamic type information, the implementor of an abstract data type can easily see which parts of the program depend on their chosen implementation.
Joint work with: Stephanie Weirich.


SIF, A Typed Assembly Language for Non-Interference
Adriana Compagnoni
Stevens Institute for Technology

In a multilevel security architecture information can range from having low (public) to high (confidential) security level. Information flow analysis studies whether an attacker can obtain information about the confidential data by observing the output of the system. In particular, the non-interference property states that any two executions of the same program, where only the high-level inputs differ in both executions, does not exhibit any observable difference in the program's output.

Standard perimeter security mechanisms such as access control or digital signatures fail to address the enforcement of information-flow policies. On the other hand, language-based strategies offer a promising approach to information flow security. However, the
challenge of studying information flow for assembly languages is that the control flow constructs that guide the analysis in high-level languages are not present. To address this problem, we developed a typed assembly language that uses low-level linear continuations to impose a stack discipline on the control flow of programs.

In this talk we will present SIF, a typed assembly language for secure information flow analysis with security types. This language provides a stack of continuations that indicate the program points where different branches of code converge. We present a type system that guarantees that these continuations are treated linearly in the sense that they are obligations and must be discharged. Well-typed SIF programs are assembled to machine code that satisfy non-interference.

Joint work with: Ricardo H. Medel and Eduardo Bonelli


Encapsulated Upgradable Components
Eric Allen,
Sun Microsystems, Inc.

Modern software applications and libraries are difficult to deploy, upgrade and maintain. Users regularly encounter incompatible software and broken upgrades with hidden dependencies that are difficult to diagnose and repair,often requiring reinstallation of applications, or even of entire operating systems. In this paper, we argue that these problems correspond to other, well understood, issues that occur during software development, which are solved through the use of module systems. We extend the notion of software modules and present a single solution to problems in both contexts:
encapsulated upgradable components that are carried throughout the software lifecycle.
We propose a set of desirable features that an upgradable component system should provide and present a system with these features, including mechanisms for controlled compilation, testing, installation, linking, and upgrade. Our system allows users and programmers to reason about software as if each deployed component were a self-contained entity. In addition, users can safely upgrade exposed libraries used by components, and programmers can impose restrictions on installations and upgrades. We formally state the key properties of our system. We show how the structure of this component system can be expressed in the component system itself, enabling management and upgrades of components by other components, and how such a system can be efficiently implemented with resource requirements comparable to dynamic linking.
Joint work with: Victor Luchangco and Sam Tobin-Hochstadt


Program committee:
Patricia Johann, Rutgers University
Shriram Krishnamurthi, Brown University
Mukund Raghavachari, IBM T.J. Watson Research Center

 

Privacy Legal Contact IBM www.research Research Sites Page Contact