Arrival and Refreshments: 8:30 - 9:00 GN-F15, Hawthorne 1 Building (Enter in the rear of the building)
Welcome and Introduction: 9:00
Keynote Address: 9:00 - 10:30
Session 1: 10:50 - 12:30
Chair: David Wonnacott, Haverford College
Lunch and Poster Session: 12:30 - 1:30 GN-F15
Session 2: 1:30 - 2:45
Chair: Stephen Fink, IBM Research
Refreshment Break: 2:45 - 3:15
Session 3: 3:15 - 4:25
Chair: Mukund Raghavachari, IBM Research
Refreshment Break: 4:25 - 4:45
Session 4: 4:45 - 6:00
Chair: Matthew Arnold, Rutgers University
Languages like Java and ML have proved successful in the development of robust software, in part because they automatically manage the allocation and deallocation of storage. While automatic storage management has its costs, extant algorithms work well when the overhead can be amortized over all storage operations. For real-time systems, however, each operation must take bounded time. Thus, predictability is more important than raw speed for such systems.
In this talk we present algorithms for storage management that are well suited for real-time systems. For storage allocation, we present variations of Knuth's buddy system that offer bounded-time for allocation as well as reasonable management of storage fragmentation. For garbage collection, we present two algorithms that operate in bounded time. One algorithm is a variation on reference counting, and the other algorithm dynamically schedules an object's collection in response to storage accesses of that object.
We present results for these algorithms based on our implementation
in Sun's JDK interpreter for Java.
Finally, we present an approach for informed storage management,
in which the allocator and collector bias their activity based on
a program's storage needs.
Cache replacement policies play a key role in determining hit rates in set-associative caches. Current cache replacement algorithms rely on runtime trace history, and neither programmers nor compilers can explicitly control cache replacement. This paper describes a novel mechanism to improve cache replacement decisions without the hardware costs of higher set-associativity. We develop compiler-generated auxiliary information using dependence testing and locality analysis to improve cache replacement decisions for scientific programs. We present an ideal theoretical model of our new cache replacement algorithm and prove that it matches or improves LRU for set-associative caches. We present a 16 and 1-bit cache line tag implementation of this model. We call the one bit tag the evict-me (EM) bit. On a miss, the architecture replaces a line if its EM bit is set, or, if no bit is set, it uses the LRU bits. We prove that the EM bit yields replacement decisions at least as good as those of LRU. The EM bit is practical and easy to implement in current set-associative cache architectures. Simulation results on 7 programs show that the EM bit can reduce miss rates in set-associative caches by up to 45\% over LRU. On 6 of the 7 programs, it achieves the same or better hit rates with 2-way set associative as compared to a 4-way cache without an EM bit. A comparison with victim cache shows that the two strategies can work together to further improve cache performance.
Full PaperProviding accurate branch prediction is critical to exploit instruction level parallelism effectively. While most modern processors employ speculative execution to alleviate the branch hazard problem, a large amount of speculative work has to be discarded when a branch misprediction occurs. With the current trend towards deeper pipelining and wider issuing rates of high performance superscalar processors, branch prediction becomes increasingly more important. Even a small improvement of prediction accuracy can mean a significant increase in the performance of the processor. On the other hand, hardware costs are considerable for most modern branch predictors. The struggle to reduce the on-chip resource costs is still continuing.
Some of the existing branch prediction methods are static: compilers use opcode information and profiling statistics to make predictions. Processors such as the MIPS and PowerPC encode compiler's predictions in a single branch-prediction-bit. Other branch prediction methods are dynamic: hardware uses execution history collected at run-time to make predictions. Among them, Two-level Adaptive Branch Prediction schemes achieve high accuracy and are widely used. Some branch prediction methods, such as that used in Intel's most recent processor, Itanium, combine hardware and software techniques to improve the prediction accuracy.
In this paper, we propose a hybrid branch prediction scheme that integrates software and hardware techniques, using various prediction methods for branch instructions of different behaviors. To implement our scheme, the branch instruction format is modified to allow compiler to insert classification bits. An enhanced compiler classifies all branch instructions into four different categories according to their behaviors from profiling runs and encodes the classification in a 2-bit classification field of each branch instruction. Thereby, at run-time, the branch instructions in different classes are handled in different ways. We designed an innovative on-chip branch predictor, named "Switch-Counter". Not only loops but also the branches having basically regular behaviors can be predicted accurately by the Switch-Counter.
Using trace-driven simulations on SPEC95 benchmarks, we measured the branch prediction accuracy of the proposed hybrid prediction scheme and compared our proposed scheme with the conventional two-level branch prediction schemes and a hybrid branch prediction scheme without using the Switch-Counter. The simulation results show that our proposed scheme consistently achieves higher accuracy at approximately similar hardware cost. On average of all the simulated benchmarks, the proposed scheme achieves 0.75% and 1.79% improvements compared with the hybrid scheme without Switch-Counter and the corresponding conventional two-level scheme respectively. The simulation results also show that our proposed hybrid scheme using 4K Bytes hardware resource performs even better than the other schemes using 16K Bytes hardware resources. In other words, our proposal can achieve better prediction accuracy and! in the meantime save considerable hardware costs.
Full PaperIn compiling or developing applications for execution on distributed memory machines, communication optimizations are critical due to their impact on performance. Multithreaded architectures support more than one thread of execution on each processor, with low-cost thread initiation, low-overhead communication, and efficient data transfer and synchronization between threads on different processors. These mechanisms can be used for achieving an effective overlap between communication and computation, and therefore, good performance on communication intensive parallel applications.
In our research, we are focusing on generating correct and efficient multithreaded code for array based programs that involve different classes of reductions. We consider irregular reductions, scalar reductions, and near-neighbor communication patterns. We demonstrate how a compiler can generate threaded code for loops with each of these patterns. Automatic recognition of patterns, thread granularity selection, and generating threaded code for programs with loops of different patterns are topics of our ongoing research.
Full PaperThe goal of this project is to build a Scheme compiler and runtime system that would allow transparently distributed execution of code. The ultimate goal is to allow a network of n machines to be viewed as a single machine with a lot of memory, and n CPUs ignoring the networking layer entirely. This is accomplished by allowing Scheme objects to be stored remotely, and providing a threads-like API. Spawning a "thread" starts a process on a remote node, while providing the synchronization primitives expected of a threads package. A Scheme bytecode compiler and a runtime engine are being build from scratch, allowing distribution of memory to occur at the lowest levels, in turn allowing parts of the compiler itself to use network resources if the need arises. The system is largely independant of network topology and permits systems to enter and leave dynamically to lend resources. Design of the systems involves considerations of distributed mutual exclusion, garbage collection and load balancing. The paper provides some remarks on how this system may be extended to include more failure tolerance and how it can be adapted to other languages.
Full PaperComponent-based programming has become the central part of a paradigm shift in software development, moving programming away from ``from-the-ground-up'' application development and towards a model of plugging together off-the-shelf components. Application frameworks have emerged in the object-oriented programming community as a basis for facilitating component-based programming. While component-based programming at the class level has had some success, there is a growing recognition of the limitations of such a low-level view in developing large applications from off-the-shelf components. There is growing interest in developing module languages specifically for composing class libraries in object-oriented languages.
We are developing an approach to componential programming for a class-based language. Applications of our approach are in statically type-safe monad-based aspect-oriented programming, and in modular mixin-based Internet programming. There are three extensions of class-based languages that we consider useful for the kind of reusable components that we are interested in supporting:
Our vehicle for adding these extensions to a class-based language is mixin modules. A mixin module is a collection of mutually recursive type and implementation definitions. One might expect that a mixin module is then a collection of mutually recursive class definitions, similar to the approach advocated by Bruce et al for a statically safe alternative to virtual types [2]. In our approach, a mixin module is a collection of mutually recursive class and mixin definitions. This separation is a critical aspect of our approach: classes and mixins have different purposes, different operations, and different properties. We have developed a formal type system and an operational semantics for our approach. We are developing an implementation in a compiler for a subset of Java.
In this talk we will focus on giving an informal overview of our work, including its motivation.
[1] Dominic Duggan.
A mixin-based, semantics-based approach to reusing domain-specific
programming languages.
In European Conference on Object-Oriented Programming (ECOOP), Lecture
Notes in Computer Science, Cannes, France, June 2000. Springer-Verlag.
[2] Kim B. Bruce, Martin Odersky, and Philip Wadler.
A statically safe alternative to virtual types.
In European Conference on Object-Oriented Programming (ECOOP), Lecture
Notes in Computer Science, Brussels, Belgium, July 1998. Springer-Verlag.
Despite Java's popularity, several tangible limitations imposed by its type system have become apparent in recent years. The most glaring omission is the lack of a generic mechanism. As a result, many recent projects have extended Java to support polymorphism in the style of C++ templates or Ada generics. One project, GJ [1], adds parametric polymorphism to Java via a homogeneous translation (such that only one class file results from each compiled source file), and produces bytecode that is compatible with the standard Java Virtual Machine. However while GJ's simple translation based on erasure allows for maximum interaction with existing Java code, the new parameterized types that it supports do not operate consistently with Java's semantics for lightweight reflection (i.e. checked type-casts and instanceof operations).
We present Rupiah, an extension of Java based on features adapted from LOOM [2], and implemented by a translation based on GJ. However the translation used in Rupiah differs from GJ in that it harnesses Java's built in reflection utilities to store run-time information about parameterized types. The resulting bytecode can correctly execute cast, instanceof, and array creation expressions because it has access to the necessary run-time type information. We also add a ThisType construct, which solves many of the problems that arise when binary methods are mixed with inheritance, and replace subtyping with a different relation, matching. Finally, we add exact types (which can be used to limit Java's subtype polymorphism), an inheritable virtual constructor mechanism: ThisClass, and compiler features to allow separate compilation of Rupiah source files. These features are implemented in a modified javac compiler. The resulting classfiles are runnable on any Java 1.2 VM. Thus, the contribution of our project is a complete implementation of an extension of Java with a much more robust type system that maintains a close fit with existing Java semantics and philosophies.
[1]
http://www.cs.bell-labs.com/who/wadler/gj/Documents/gj-oopsla-letter.ps
[2]
ftp://www.cs.williams.edu/pub/kim/LOOM.ps.gz
The goal of points-to analysis for Java is to determine the set of objects pointed to by a reference variable or a reference object field. This information has a wide variety of client applications in optimizing compilers and software engineering tools. In this paper we present a points-to analysis for Java based on Andersen's points-to analysis for C. We implement the analysis by using a new constraint-based approach which employs annotated inclusion constraints. Constraint annotations allow us to model precisely and efficiently the semantics of virtual calls and the flow of values through object fields. By solving systems of annotated inclusion constraints, we have been able to perform practical and precise points-to analysis for Java.
We evaluate the performance of the analysis on a large set of Java programs. Our experiments show that the analysis runs in practical time and space. We also show that the points-to solution has significant impact on clients such as object read-write information, call graph construction, virtual call resolution, synchronization removal, and stack-based object allocation. Our results demonstrate that the analysis is a realistic candidate for a relatively precise, practical, general-purpose points-to analysis for Java.
Full PaperVoiceXML is a Web-based markup language for representing human-computer dialogs, just like HTML. While HTML assumes a graphical web browser, with display, keyboard, and mouse, VoiceXML is assumes a voice browser with audio output (computer-synthesized and/or recorded), and audio input (voice and/or keypad tones). VoiceXML leverages the Internet for voice application development and delivery, greatly simplifying these difficult tasks and creating new opportunities.
VoiceXML 1.0 is also a specification of the VoiceXML Forum, an industry consortium of over 300 companies. The Forum is active in the conformance testing, education, and marketing of VoiceXML, and has given control over further language development to the World Wide Web Consortium (W3C). Because it is a specification, applications that work on one conformant VoiceXML platform will work on others as well. VoiceXML is an emerging standard that brings the web development paradigm to the IVR market, which means the existing HTTP gateways to enterprise services and data built using technologies like SSL and cookies can be seamlessly extended to the phone.
VoiceXML is a programming language for describing call flows for interactive voice applications. The VoiceXML language provides a clean and simple means for:
VoiceXML documents can perform programming functions such as arithmetic and text manipulation. This allows a document to check the validity of the user's input. Also, a user's session need not be a simple sequence that runs the same way every time. The document may include "if-then-else" decision making (branching) and other complex structures.
Writing powerful documents is easier when you use Nuance Speech Objects. These are pieces of software that are pre-written, tested, and packaged in a form that is easy for a VoiceXML document to use. Speech Objects conduct dialogs for common functions such as accepting credit card numbers, times and dates, and dollar amounts.
The grammar in VXML is the most important aspect for developing a VXML application. Motorola, IBM, TellMe Studio, BeVocal are the few known firms which are working on VXML application development. The standard version of VXML is yet to develop, since each and every company.
How it works?
A VoiceXML-based voice application is composed of one or more VoiceXML documents. VoiceXML document consists of one or more dialogs. Each dialog defines a call state. A call state can be interactive or informational. In an interactive call state, the voice application prompts the caller to utter a response, and that response is either recognized or rejected. VoiceXML is a derivative of the Extensible Markup Language (XML). XML is the standard format for defining structured documents and data on the Web. XML enables programmers to define an arbitrary vocabulary, formally known as a schema, using a standard, well-defined, easily parsed syntax. One XML schema might describe customer information, another might describe a mathematical equation, and yet another might describe a recipe for chocolate chip cookies. In a nutshell VXML surely has a great future ahead and many companies are opting for it.
Full PaperThe thriving World Wide Web provides users with huge amount of information and services. However, the current WWW relies exclusively on visual interfaces. In order to get information from the web, users must have a computer equipped with a monitor, keyboard and pointing devices. Those who have no access to a computer cannot get any benefit from the web.
VoiceXML enables users to conversationally access the same infrastructures from the web using telephone. It is an XML-based markup language for creating distributed voice applications (voice input, audio output). It is similar to HTML, which is used to create graphical interfaces between users and services. VoiceXML creates and manages dialogs between users and services.
This paper includes two parts:
The first part gives a brief introduction of VoiceXML. I will introduce the general architecture of a system delivering the web content to a user using voice. The function of each component in the architecture, including the speech recognition engine, TTS engine, voice browser, HTTP server and their collaboration will be illustrated in great detail.
The web server could run two kinds of applications. One is the VoiceXML application, the other is HTML application. The voice browser talking with the web severs running VoiceXML application is quite direct. However, the voice browser could not directly talk with HTML-based web servers. In this paper, I present an architecture, which enables the VoiceXML browser to talk with the HTML web servers, with the help of CGI scripts or IBM WebSphere Application Server and HTTP Server.
In the second part, I will discuss some details on the design of the voice applications. The first is the speed issue. I will discuss how to use resource fetching and generate dynamic grammar to enhance speed of speech recognition. The second is how to design the grammar to improve the speech recognition accuracy. The third is to consider different solutions to natural dialog based on the amount of dialog information.
Finally, I will discuss the features and weaknesses of VoiceXML and the trends in its future.
Full PaperVoice Extensible Markup Language, or VoiceXML, is designed for creating dynamic, Internet-powered, phone dialogue applications that feature synthesized speech, digitized audio, recognition of spoken and DTMF (touch-tone) key input, and telephony. The VoiceXML architecture involves various components, which integrate speech recognition software and text-to-speech converter with the existing web architecture. In this paper we present the design and discuss the implementation issues in setting up a VoiceXML end-to-end system for Pace University’s research and application development. It involves setup of a Cisco router 1750 for converting voice over IP, IBM WebSphere Voice server, IBM Web Sphere Application Server, and Microsoft SQL Server. The system will facilitate call traffic analysis reports to understand trends in call volume and application usage, while voice application designers can identify and solve complex usability issues using unique speech recognition tuning facilities provided by the IBM’s Voice Server. The system will allow each developer to have his/her own development space to test and develop new VXML applications, and will offer functionality such as debugging the syntax of the VXML script. We also describe some VoiceXML applications that we developed for Pace University including a student grade information system and demonstrate one that won a prize in the recent IBM Cool Blue VoiceXML programming contest. Finally, we address steps towards solving design issues involving generation of dynamic grammar by proposing a front end with a neural-network learning algorithm that modifies the grammar by adding new words that the system comes across during interaction with the user.
Full PaperThis paper describes the development of a template-based, multi-language VoiceXML application framework. The tools used rely on Apache Forum shareware, with IBM Voice products for audio input/output deployed on an NT platform. This framework takes advantages of many readily available software products to provide a short learning curve for the developer and fast system deployment. The design follows the MVC (Model, View, Controller) software engineering paradigm that maps into abstract layers of presentation, business logic, and data access components. Although VoiceXML is easy to understand, building a good speech application can be complicated. This framework allows the voice user-interface expert to concentrate on building a powerful dialog engine, while leaving the business logic to the programmer. A multi-language, directory-assistance voice application will be demonstrated to illustrate the flexibility and ease of use of this framework. It shows the step-by-step design of the dialog templates, mapping of the dynamic content to the database, wiring of content with business logic, and incremental unit testing via HTML and voice browsers. Further development of this framework could lead to single authoring of multi-modal, multi-channel, and multi-language applications.
Full PaperThe purpose of this paper is to show a design and its implementation for a generic retailing software that satisfies three basic requirements: Designer defined catalog content and structure, separation of catalog presentation from catalog content and capability of catalog delivery in different markup languages. Our implementation employs Java, eXtensible Markup Language (XML), eXtensible Style Language (XSL) transformations and Relational Database Management System (RDBMS) technologies to achieve such a result.
One of the core functionalities in both Business-to-Business (B2B) and Business-to-Consumer (B2C) e-commerce software is Catalog Management. In simple words, a catalog contains all product related information for an enterprise. An enterprise may have several product types which have and require different kinds of product information to be kept in a catalog. Similarly, different enterprises may chose to have different catalog information in their system even for the same product type. Our system provides a simple web based interface to the designer to define the content and the structure of a catalog, e.g. what properties of products should be kept in the catalog and what are their relations to products. Based on the designer's specification, the system modifies the database schema and creates new tables if necessary.
Rapid emergence of web access devices like Personal Digital Assistants (PDAs) and mobile phones, which have limited display capabilities and computing power compared to PCs, causes the web designers to change their content and presentation to make their web site compatible with those devices and hence maintaining multiple web sites at the same time. Separation of content from presentation partly solves this problem if those devices uses the same markup language. For instance, a web site can be designed to have two HTML presentations, one for PCs and one for PDAs. However some devices such as mobile phones use a different mark-up language (Wireless mark-up language, WML). In addition, many web sites use XML to exchange catalogs or other business data. In our system all presentation information is kept in XSL documents. Any query to the catalog returns an XML document as a result. The system matches the resulting XML document with the XSL document and then applies an XSL transformation to generate the final result in target markup language such as XHTML, WML or XML.
Our approach simplifies the maintenance of a web site when multiple presentations of the same information (or its subset) are desired. The designer just needs to define XSL transformations to generate new presentations in desired target markup languages.
Full PaperIn the course of performing dependence analysis, a common technique is to treat the problem as a system of equalities, inequalities, and disequalities. Typically, determining if the disequalities affect the feasibility of the system is done through an exponential enumeration algorithm which is time consuming for large numbers of disequalities. This is a source of significant slow down in practice so even reducing the size of the exponent is useful. It has already been shown that equalities and inequalities can be solved efficiently (polynomial time) on the commonly occurring LI(2)-unit subdomain, so determining how disequalities behave on that subdomain, as well as in general, is an important question.
In this paper we give an overview of how one deals with equalities and inequalities, and then give a thorough examination of some of the concerns that arise when analyzing disequalities. We derive a sequence of efficient algorithms for shifting some or all of the weight off of the exponential algorithm, and analyze its complexity both in general and on the commonly occurring LI(2)-unit subdomain. We discuss the notions of disequalities "adding up" - when disequalities which themselves do not eliminate all of the solutions can collectively eliminate all solutions. We also introduce the notion of a system of inequalities being "splayed open", "prism open" or "closed" - which categorize the shape of the system of constraints and whether or not disequalities can add up. Through careful classification of the disequalities, this algorithm is (in polynomial time) able to identify which of the disequalities needs to be analyzed with the full exponential algorithm, and which can be studied in low order polynomial additional time.
We show that the algorithm takes O(n^5) time in general to classify the disequalities, and only O(n^3) time to classify when we are operating on the LI(2)-unit subdomain. Furthermore, we provide several simple and fast (linear) preprocessing tests which provide conservative information about classification, which may avoid the need for running the quadratic time algorithm and possibly avoid the full exponential test. In the appendices, we give full mathematical proofs for the accuracy and time complexity of each of the tests presented.
Full PaperThis paper describes a general approach for optimized live heap space and live heap space-bound analyses for garbage-collected languages. The approach is based on program analysis and transformations and is fully automatic. The space-bound analysis produces accurate (tight) upper bounds in the presence of partially known input structures. The optimization drastically improve the analysis efficiency. The analyses have been implemented and experimental results confirm their accuracy and efficiency.
Full PaperThis paper deals with a parallel merging algorithm. The original algorithm was designed for a CREW SM SIMD computer. The algorithm is improved and implemented with Multithreading features of Java.
The desired property of parallel merge algorithm such as the number of threads used by the algorithm is sub linear and adaptive. The running time of the algorithm is significantly smaller than its sequential algorithm. The cost of the algorithm is optimal.
For two sorted sequences of size r and s, and in the worst case where r = s = n, the running time is O((n/N) + log n), where N is the number of the threads. The algorithm is cost optimal for N = n/logn.
Full PaperMobile code provides a highly popular, flexible, and functional form of computing which comes with additional security considerations above and beyond traditional computing paradigms. Executing mobile code on a local machine should prompt the user to question the source, correctness, intentions, and integrity of that code. A great deal of work has gone into finding a solution to address these concerns and provide for the safe execution of mobile code. Solutions that include forbidding mobile code use, digitally signing mobile code applications, utilizing certificates, executing mobile code in a protective sandbox, encrypting mobile code applications, proof carrying code, certified code, and combinations of the above have all been proposed or implemented. However, each of these existing solutions have trade-offs that detract from their desirability.
Our work addresses the questions of code integrity and author authentication utilizing steganographic techniques. Steganography is the process of hiding or embedding secret information in other, seemingly innocuous, cover information. Steganography has typically been employed to provide a means for the covert communication of intelligence information in the world of espionage. Modern day uses have included digital watermarks for copyright protection of electronic media (e.g., images, music, and video).
We have implemented a system that embeds a fragile Tamper Detection
Mark (TDM) within Java classfiles. These TDM's can be thought of as
Message Authentication Codes (an encrypted hash code for a given
classfile) and can be extracted by a client to authenticate the source
of the code and to prove that the code has not been altered since the
time it was compiled.
Designing programs around the natural decomposition of a problem into reusable components makes it easier for programmers to write and maintain programs. This principle is known as modularity. Despite the benefits of modularity, it introduces inefficiency into programs, in part due to the creation of intermediate data structures such as lists and trees. Deforestation is a program transformation that eliminates some of these structures, allowing programmers to write modular programs with reckless abandon and still enjoy the efficiency benefits of monolithic programming.
Most existing deforestation algorithms require the input program to be in a restricted form in order to ensure that deforestation occurs. Olaf Chitil has recently developed a type-inference-based algorithm for deforestation, which removes some of these restrictions. Chitil has also developed a prototype which partially implements this algorithm. This prototype requires more development in order to test it on example programs. For example, the prototype accepts input programs which are written in F, a modified version of Core, the intermediate language of the Glasgow Haskell Compiler (GHC). I have developed a Core-to-F translator, which will allow the prototype to be tested on real Haskell programs, which are translated into Core by the GHC front-end and then translated into F by my translator. I have also implemented the full deforestation algorithm of which the prototype is only a part. The next step is to run the deforestation algorithm on various benchmarks, but after that, there are many different directions in which this project could be taken, such as cross-module deforestation, exploring the relationship between inlining and deforestation, and extending Chitil's algorithm to handle a wider range of datatypes. My work will lead to a stronger basis for comparison between different deforestation techniques, and to a better understanding of how deforestation works in practice.
References: Olaf Chitil, ``Type-Inference Based Deforestation of Functional
Programs'', unpublished Ph.D thesis, 2000.
Back to the main
MASPLAS page.