This new text examines the design and implementation of lcc, a production-quality, retargetable compiler for the ANSI C programming language, designed at AT&T Bell Laboratories and Princeton University. The authors' innovative approach--a "literate program" that intermingles the text with the source code--gives a detailed tour of the code that explains the implementation and design decisions reflected in the software. And while most books describe toy compilers or focus on isolated pieces of code, the authors provide the entire source code for a real compiler, which is available via ftp. Structured as a self-study guide that describes the real-world tradeoffs encountered in building a production-quality compiler, this book is useful to individuals who work in application areas applying or creating language-based tools and techniques.
Suitable for a senior undergraduate or graduate-level second course, and for any researcher or implementer of compilers for parallel or advanced computers.
How to Read This Book.
Memory Management Interface.
Representing Symbol Tables.
Finding and Installing Identifiers.
Structure and Enumeration Types.
Recognizing Character Constants and Strings.
Languages and Grammars.
Ambiguity and Parse Trees.
FIRST and FOLLOW Sets.
Writing Parsing Functions.
Handling Syntax Errors.
Parsing C Expressions.
Unary and Postfix Expressions.
Unary and Postfix Operators.
Labels and Gotos.
Managing Labels and Jumps.
The Main Program.
Eliminating Common Subexpressions.
Flow of Control.
Enforcing Evaluation Order.
Driving Code Generation.
Eliminating Multiply Referenced Nodes.
Organization of the Code Generator.
Generating Code to Copy Blocks.
Labelling the Tree.
Reducing the Tree.
Coordinating Instruction Selection.
Tracking the Register State.
Syntactic and Semantic Analyses.
Code Generation and Optimization.
Testing and Validation.
The compiler is the linchpin of the programmer's toolbox. Working programmers use compilers every day and count heavily on their correctness and reliability. A compiler must accept the standard definition of the programming language so that source code will be portable across platforms. A compiler must generate efficient object code. Perhaps more important, a compiler must generate correct object code; an application is only as reliable as the compiler that compiled it.
A compiler is itself a large and complex application that is worthy of study in its own right. This book tours most of the implementation of lcc, a compiler for the ANSI C programming language. It is to compiling what Software Tools by B.W. Kernighan and P.J. Plauger (Addison-Wesley, 1976) is to text-processing programs such as text editors and macro processors. Software design and implementation are best learned through experience with real tools. This book explains a real compiler in detail and shows most of its code. The accompanying diskette holds the source code for the complete compiler.
lcc is a production compiler. It's been used to compile production programs since 1988 and is now used by hundreds of C programmers daily. Detailing most of a production compiler in a book leaves little room for supporting material, so we present only the theory needed for the implementation at hand and leave the broad survey of compiling techniques to existing texts. The book omits a few language features--those with mundane or repetitive implementations and those deliberately treated only in the exercises--but the full compiler is available on the diskette, and the book makes it understandable.
The obvious use for this book is to learn more about compiler construction. But only a few programmers need to know how to design and implement compilers. Most work on applications and other aspects of systems programming. There are four reasons why this majority of C programmers may benefit from this book.
First, programmers who understand how a C compiler works are often better programmers in general and better C programmers in particular. The compiler writer must understand even the darkest corners of the C language; touring the implementation of those corners reveals much about the language itself and its efficient realization on modern computers.
Second, most texts on programming must necessarily use small examples, which often demonstrate techniques simply and elegantly. Most programmers, however, work on large programs that have evolved--or degenerated--over time. There are few well documented examples of this kind of "programming in the large" that can serve as reference examples. lcc isn't perfect, but this book documents both its good and bad points in detail and thus provides one such reference point.
Third, a compiler is one of the best demonstrations in computer science of the interaction between theory and practice. lcc displays both the places where this interaction is smooth and the results are elegant and the places where practical demands strain the theory, a strain that shows in the resulting code. Exploring these interactions in a real program helps programmers understand when, where, and how to apply different techniques. lcc also illustrates numerous C programming techniques.
Fourth, this book is an example of a "literate program." Like TeX: The Program by D.E. Knuth (Addison-Wesley, 1986), this book is lcc's source code and the prose that describes it. The code is presented in the order that best suits understanding, not in the order dictated by the C programming language. The source code that appears on the diskette is extracted automatically from the book's text files.
This book is well suited for self-study by both academics and professionals. The book and its diskette offer complete documented source code for lcc, so they may interest practitioners who wish to experiment with compilation or those working in application areas that use or implement language-based tools and techniques, such as user interfaces.
The book shows a large software system, warts and all. It could thus be the subject of a postmortem in a software engineering course, for example.
For compiler courses, this book complements traditional compiler texts. It shows one way of implementing a C compiler, while traditional texts survey algorithms for solving the broad range of problems encountered in compiling. Limited space prevents such texts from including more than a toy compiler. Code generation is often treated at a particularly high level to avoid tying the book to a specific computer.
As a result many instructors prepare a substantial programming project to give their students some practical experience. These instructors usually must write these compilers from scratch; students duplicate large portions and have to use the rest with only limited documentation. The situation is trying for both students and instructors, and unsatisfying to boot, because the compilers are still toys. By documenting most of a real compiler and providing the source code, this book offers an alternative.
This book presents full code generators for the MIPS R3000, SPARC, and Intel 386 and successor architectures. It exploits recent research that produces code generators from compact specifications. These methods allow us to present complete code generators for several machines, which no other book does. Presenting several code generators avoids tying the book to a single machine, and helps students appreciate engineering retargetable software.
Assignments can add language features, optimizations, and targets. When used with a traditional survey text, assignments could also replace existing modules with those using alternate algorithms. Such assignments come closer to the actual practice of compiler engineering than assignments that implement most of a toy compiler, where too much time goes to low-level infrastructure and accommodating repetitive language features. Many of the exercises pose just these kinds of engineering problems.
lcc has also been adapted for purposes other than conventional compilation. For example, it's been used for building a C browser and for generating remote-procedure-call stubs from declarations. It could also be used to experiment with language extensions, proposed computer architectures and code-generator technologies.
We assume readers are fluent in C and assembly language for some computer, know what a compiler is and have a general understanding of what one does, and have a working understanding of data structures and algorithms at the level covered in typical undergraduate courses; the material covered by Algorithms in C by R. Sedgewick (Addison-Wesley, 1990), for example, is more than sufficient for understanding lcc.
This book owes much to the many lcc users at AT&T Bell Laboratories, Princeton University, and elsewhere who suffered through bugs and provided valuable feedback. Those who deserve explicit thanks include Hans Boehm, Mary Fernandez, Michael Golan, Paul Haahr, Brian Kernighan, Doug McIlroy, Rob Pike, Dennis Ritchie, and Ravi Sethi. Ronald Guilmette, David Kristol, David Prosser, and Dennis Ritchie provided valuable information concerning the fine points of the ANSI Standard and its interpretation. David Gay helped us adapt the PFORT library of numerical software to be an invaluable stress test for lcc's code generators.
Careful reviews of both our code and our prose by Jack Davidson, Todd Proebsting, Norman Ramsey, William Waite, and David Wall contributed significantly to the quality of both. Our thanks to Steve Beck, who installed and massaged the fonts used for this book, and to Maylee Noah, who did the artwork with Adobe Illustrator.
Christopher W. Fraser
David R. Hanson