ACM SIGPLAN 2008 Conference on
Programming Language Design and Implementation (PLDI'08)

Tucson, Arizona, June 7 - 13, 2008

http://pldi2008.cs.ucr.edu/

Tutorials Program -- Sunday, June 8

Please choose one tutorial from the morning (M1 or M2) and one from the afternoon (A1 or A2).

Morning Session
8:30 am - 12:00 pm
M1
Demystifying GCC: Under the Hood of the GNU Compiler Collection
Morgan Deters and Ron Cytron (Washington University)
Slides

M2
Transactional Memory: From Semantics to Implementation
Yang Ni and Adam Welc (Intel Corporation)
Slides




Afternoon Session
1:30 pm - 5:00 pm
A1
Analysis and Optimization of Parallel Programs
Sam Midkiff (Purdue University) and Vivek Sarkar (Rice University)
Slides

A2
Building a High Level Language Compiler for GPGPU
Bixia Zheng, Derek Gladding, and Micah Villmow (AMD)
Slides




M1: Demystifying GCC: Under the Hood of the GNU Compiler Collection

Presenters: Morgan Deters and Ron Cytron (Washington University)
8:30am - 12:00pm
Slides

This tutorial studies the implementation details of the GNU Compiler Collection (GCC). Many researchers have avoided using GCC as a research and development platform, largely because GCC's necessary complexity makes it intimidating for new developers. It is our position, however, that GCC's portability, support for multiple language front-ends, and active development community make it a useful platform for much language and compiler research. In this tutorial, we mitigate its chief limitation: its steep learning curve.

After attending this tutorial, participants will be able to navigate GCC's source tree, will understand GCC's intermediate representations, will be comfortable making localized modifications to GCC source, and will have an idea of which edits are needed for more substantial modifications. This tutorial will demystify GCC so that participants can use it as a research and development platform for language and compiler research, add front-ends for new high-level languages, port GCC to new target platforms, and make other contributions.

Morgan Deters is a postdoctoral researcher in the Department of Software at the Technical University of Catalonia in Barcelona. His chief research interests include compiler support for advanced language and runtime mechanisms, static analysis, garbage collection, and software verification. Morgan has worked with and modified GCC for his research and has lectured on the topic. Morgan earned his Ph.D. in computer science at Washington University in St. Louis in 2007.

Ron Cytron is a professor of Computer Science and Engineering at Washington University in St. Louis. He has decades of experience teaching compilation, program optimization, languages, and software development to students, researchers, and industry practitioners. Ron earned his Ph.D. in computer science at the University of Illinois at Urbana-Champaign in 1984.


M2: Transactional Memory: From Semantics to Implementation

Presenters: Yang Ni and Adam Welc (Intel Corporation)
8:30am - 12:00pm
Slides

With single thread performance starting to plateau, HW architects have turned to chip level multiprocessing (CMP) to increase processing power. All major microprocessor companies are aggressively shipping multi-core products in the mainstream computing market. Moore’s law will largely be used to increase HW thread-level parallelism through higher core counts in a CMP environment. CMPs bring new challenges into the design of the software system stack.

In this tutorial, we talk about the shift to multi-core processors and its programming implications. In particular, we focus on transactional programming. Transactions have emerged as a promising alternative to lock-based synchronization, eliminating many of its problems. The tutorial will cover a range of topics related to transactional memory, ranging from the description of high-level language constructs and their semantics to the low-level details of specific algorithms used to support efficient execution of these constructs. We will take a programming systems view of transactional memory and walk the audience through each layer of the system, starting from the top-level programmer's view of transactional memory and working down to the implementation level. We show how transactional memory can avoid the problems of lock-based synchronization, such as deadlock and poor scalability, when lock-based software modules are composed. We discuss how transactional constructs can be added to languages as an alternative to current synchronization constructs. We discuss the semantics of transactional language constructs, including advanced semantic issues related to isolation and nesting. We present software strategies for implementing transactional memory. We show how to integrate transactional memory with other language and runtime features. We also show how to leverage compiler optimizations to reduce the overheads of transactional memory. We will cover language integration and implementation issues related to both the C and Java languages. Finally, we will present some important open research issues.

This tutorial aims to educate the PLDI community on emerging software transactional memory technologies. It gives a comprehensive overview of transactional memory, presents the current state of the art in transactional memory research, and discusses open research problems. It targets members of the community (researchers, practitioners, educators, and developers) interested in emerging programming language technologies for multi-core architectures.

Pre-requisite knowledge: The tutorial assumes the audience has a basic understanding of compilers, language runtimes, parallel programming, and computer architecture.

Yang Ni is a Research Scientist in Intel's Programming Systems Lab. He has been working on programming languages for platforms ranging from mobile devices to chip multiprocessors. His current research focuses on transactional memory. He is a major contributor to the Intel C/C++ TM compiler. Yang received his Ph.D. in Computer Science from Rutgers University.

Adam Welc is a Research Scientist in Intel's Programming Systems Lab. His work is in the area of programming language design and implementation, with specific interests in concurrency control, compiler and run-time system optimizations, transactional processing as well as architectural support for programming languages and applications. Adam received the Master of Science in Computer Science from Poznan University of Technology, Poland, in July 1999. He continued his graduate studies at Purdue University, receiving the Master of Science in Computer Science in May 2003, and the Ph.D. in Computer Science in March 2006.


A1: Analysis and Optimization of Parallel Programs

Presenters: Sam Midkiff (Purdue University), Vivek Sarkar (Rice University)
1:30pm - 5:00pm
Slides

The historical foundations of code optimization, including intermediate representations, data flow analyses, and optimizing transformations, are all deeply entrenched in the von Neumann model of sequential computing. These foundations have to be reworked as compilers are being challenged to generate code for multiple homogeneous and heterogeneous cores. In this tutorial, we summarize the state of the art in analysis and optimization of parallel programs.

The objective of the tutorial is to make compiler practitioners and researchers aware of the limitations of current code analysis and optimization frameworks for parallelism, and introduce them to the changes that will be necessary for targeting parallel hardware. The prerequisite knowledge assumed is familiarity with the foundations of analysis and optimization of sequential programs.

Sam Midkiff is an Associate Professor of Electrical and Computer Engineering at Purdue University. His research activities include compiling for explicitly parallel programs, programming languages and models for parallel programming, compiler analysis for parallel programs, and high performance computing. Prior to joining Purdue he was a Research Staff Member at IBM Research. While at IBM Research he was involved in the IBM PTRAN automatic parallelization system, the xlHPF High Performance Fortran Compiler, the Distributed Resource Management System (DRMS) and the Numerically Intensive Java (NINJA) project, which brought near-Fortran performance to Java. Sam holds a BS from the University of Kentucky, and MS and PhD degrees from the University of Illinois at Urbana-Champaign. He has given tutorials in the past at the ACM Java Grande Conference (2000) and at the ACM Supercomputing conference (2000), and has taught full-length courses.

Vivek Sarkar is the E.D. Butcher Professor of Computer Science at Rice University. He conducts research in programming languages, program analysis, compiler optimizations and virtual machines for parallel and high performance computer systems, and currently leads the Habanero Multicore Software Research project at Rice (www.habanero.rice.edu). Prior to joining Rice, he was senior manager of programming technologies at IBM Research. His past projects include the X10 programming language, the Jikes Research Virtual Machine for the Java language, the ASTI optimizer used in IBM's XL Fortran product compilers, the PTRAN automatic parallelization system, and profile-directed partitioning and scheduling of Sisal programs. Vivek holds a B.Tech. degree from the Indian Institute of Technology, Kanpur, an M.S. degree from University of Wisconsin-Madison, and a Ph.D. from Stanford University. In 1997, he was on sabbatical as a visiting associate professor at MIT, where he was a founding member of the MIT RAW project. Vivek has given several tutorials in past conferences including PLDI 1993, POPL 1996, ASPLOS 1996, PLDI 2000, OOPSLA 2003, ECOOP 2004, OOPSLA 2006, PPoPP 2007, PLDI 2007, and has also taught many short courses and full-length courses.


A2: Building a High Level Language Compiler for GPGPU

Presenters: Bixia Zheng, Derek Gladding, and Micah Villmow (AMD)
1:30pm - 5:00pm
Slides

Commodity graphics processors (GPUs) have tremendous processing power for data-parallel computing. Researchers in general-purpose computing on graphics processors (GPGPU) have demonstrated performance benefits from running computationally intensive tasks on GPUs. However, the traditional way of programming GPUs is very difficult and requires a deep understanding of the graphics processing pipeline. Programming environments such as CUDA (NVIDIA) and CAL (AMD) aim to abstract away the graphics-centric concepts used in programming GPUs and to ease GPU programming. This tutorial presents a case study in building a data-parallel compiler for GPUs based on such an abstraction layer provided by GPU hardware vendors.

In the tutorial, we first give an overview of modern GPU architectures. Then, we present the AMD Compute Abstraction Layer (CAL), which provides driver level control and access for programming the GPU using a virtual instruction set. Third, we present Stanford University's BrookGPU open-source project as well as our extension to the project (Brook+). Brook+ includes a data-parallel language, a compiler and a runtime library for programming GPUs as data-parallel coprocessors. Finally, we walk through a basic algorithm, such as matrix multiplication, and show how to implement the algorithm for CPU, for CAL, and for Brook.

Bixia Zheng is a member of the Stream Compute Software Group at AMD. Her current focus is programming languages and compiler tools for GPGPU. In the past, she worked on Sun SPARC compilers and dynamic binary translation. Bixia holds a PhD from the University of Minnesota.

Derek Gladding works with the Stream Compute Software Group at AMD. He has an MEng from the University of Manchester and has spent most of his career straddling the hardware/software boundary.

Micah Villmow received his Bachelor's degree in Computer Science in Spring 2005 and his Master's in Fall 2006 from Florida State University, specializing in Computer Vision and Parallel Algorithms and Data Structures. Micah was an intern for ATI Research Inc., working on Object Recognition on the GPU using CTM and CAL, and started working at AMD as a stream computing application engineer in Fall 2007.