Preface

This book is a descendant of Principles of Compiler Design by Alfred V. Aho and Jeffrey D. Ullman. Like its ancestor, it is intended as a text for a first course in compiler design. The emphasis is on solving problems universally encountered in designing a language translator, regardless of the source or target machine.

Although few people are likely to build or even maintain a compiler for a major programming language, the reader can profitably apply the ideas and techniques discussed in this book to general software design. For example, the string matching techniques for building lexical analyzers have also been used in text editors, information retrieval systems, and pattern recognition programs. Context-free grammars and syntax-directed definitions have been used to build many little languages such as the typesetting and figure drawing systems that produced this book. The techniques of code optimization have been used in program verifiers and in programs that produce "structured" programs from unstructured ones.

The major topics in compiler design are covered in depth. The first chapter introduces the basic structure of a compiler and is essential to the rest of the book.

Chapter 2 presents a translator from infix to postfix expressions, built using some of the basic techniques described in this book. Many of the remaining chapters amplify the material in Chapter 2.

Chapter 3 covers lexical analysis, regular expressions, finite-state machines, and scanner-generator tools. The material in this chapter is broadly applicable to text processing.

Chapter 4 covers the major parsing techniques in depth, ranging from the recursive-descent methods that are suitable for hand implementation to the computationally more intensive LR techniques that have been used in parser generators.

Chapter 5 introduces the principal ideas in syntax-directed translation. This chapter is used in the remainder of the book for both specifying and implementing translations.

Chapter 6 presents the main ideas for
performing static semantic checking. Type checking and unification are discussed in detail.

Chapter 7 discusses storage organizations used to support the run-time environment of a program.

Chapter 8 begins with a discussion of intermediate languages and then shows how common programming language constructs can be translated into intermediate code.

Chapter 9 covers target code generation. Included are the basic "on-the-fly" code generation methods, as well as optimal methods for generating code for expressions. Peephole optimization and code-generator generators are also covered.

Chapter 10 is a comprehensive treatment of code optimization. Data-flow analysis methods are covered in detail, as well as the principal methods for global optimization.

Chapter 11 discusses some pragmatic issues that arise in implementing a compiler. Software engineering and testing are particularly important in compiler construction.

Chapter 12 presents case studies of compilers that have been constructed using some of the techniques presented in this book.

Appendix A describes a simple language, a "subset" of Pascal, that can be used as the basis of an implementation project.

The authors have taught both introductory and advanced courses, at the undergraduate and graduate levels, from the material in this book at AT&T Bell Laboratories, Columbia, Princeton, and Stanford.

An introductory compiler course might cover material from the following sections of this book:

    introduction                    Chapter 1 and Sections 2.1-2.5
    lexical analysis                2.6, 3.1-3.4
    symbol tables                   2.7, 7.6
    parsing                         2.4, 4.1-4.4
    syntax-directed translation
    type checking
    run-time organization
    intermediate code generation
    code generation
    code optimization

Information needed for a programming project like the one in Appendix A is introduced in Chapter 2.

A course stressing tools in compiler construction might include the discussion of lexical analyzer generators in Section 3.5, of parser generators in Sections 4.8 and 4.9, of code-generator generators in Section 9.12, and material on techniques for
compiler construction from Chapter 11.

An advanced course might stress the algorithms used in lexical analyzer generators and parser generators discussed in Chapters 3 and 4, the material on type equivalence, overloading, polymorphism, and unification in Chapter 6, the material on run-time storage organization in Chapter 7, the pattern-directed code generation methods discussed in Chapter 9, and material on code optimization from Chapter 10.

Exercises

As before, we rate exercises with stars. Exercises without stars test understanding of definitions, singly starred exercises are intended for more advanced courses, and doubly starred exercises are food for thought.

Acknowledgments

At various stages in the writing of this book, a number of people have given us invaluable comments on the manuscript. In this regard we owe a debt of gratitude to Bill Appelbe, Nelson Beebe, Jon Bentley, Lois Bogess, Rodney Farrow, Stu Feldman, Charles Fischer, Chris Fraser, Art Gittelman, Eric Grosse, Dave Hanson, Fritz Henglein, Robert Henry, Gerard Holzmann, Steve Johnson, Brian Kernighan, Ken Kubota, Daniel Lehmann, Dave MacQueen, Dianne Maki, Alan Martin, Doug McIlroy, Charles McLaughlin, John Mitchell, Elliott Organick, Robert Paige, Phil Pfeiffer, Rob Pike, Kari-Jouko Räihä, Dennis Ritchie, Sriram Sankar, Paul Stoecker, Bjarne Stroustrup, Tom Szymanski, Kim Tracy, Peter Weinberger, Jennifer Widom, and Reinhard Wilhelm.

This book was phototypeset by the authors using the excellent software available on the UNIX system. The typesetting command read

    pic files | tbl | eqn | troff -ms

pic is Brian Kernighan's language for typesetting figures; we owe Brian a special debt of gratitude for accommodating our special and extensive figure-drawing needs so cheerfully. tbl is Mike Lesk's language for laying out tables. eqn is Brian Kernighan and Lorinda Cherry's language for typesetting mathematics. troff is Joe Ossanna's program for formatting text for a phototypesetter, which in our case was a
Mergenthaler Linotron 202/N. The ms package of troff macros was written by Mike Lesk. In addition, we managed the text using make due to Stu Feldman. Cross references within the text were maintained using awk created by Al Aho, Brian Kernighan, and Peter Weinberger, and sed created by Lee McMahon.

The authors would particularly like to acknowledge Patricia Solomon for helping prepare the manuscript for photocomposition. Her cheerfulness and expert typing were greatly appreciated. J. D. Ullman was supported by an Einstein Fellowship of the Israeli Academy of Arts and Sciences during part of the time in which this book was written. Finally, the authors would like to thank AT&T Bell Laboratories for its support during the preparation of the manuscript.

A.V.A., R.S., J.D.U.

Contents

Chapter 1 Introduction to Compiling
1.1 Compilers
1.2 Analysis of the source program
1.3 The phases of a compiler
1.4 Cousins of the compiler
1.5 The grouping of phases
1.6 Compiler-construction tools
Bibliographic notes

Chapter 2 A Simple One-Pass Compiler
2.1 Overview
2.2 Syntax definition
2.3 Syntax-directed translation
2.4 Parsing
2.5 A translator for simple expressions
2.6 Lexical analysis
2.7 Incorporating a symbol table
2.8 Abstract stack machines
2.9 Putting the techniques together
Exercises
Bibliographic notes

Chapter 3 Lexical Analysis
3.1 The role of the lexical analyzer
3.2 Input buffering
3.3 Specification of tokens
3.4 Recognition of tokens
3.5 A language for specifying lexical analyzers
3.6 Finite automata
3.7 From a regular expression to an NFA
3.8 Design of a lexical analyzer generator
3.9 Optimization of DFA-based pattern matchers
Exercises
Bibliographic notes

Chapter 4 Syntax Analysis
4.1 The role of the parser
4.2 Context-free grammars
4.3 Writing a grammar
4.4 Top-down parsing
4.5 Bottom-up parsing
4.6 Operator-precedence parsing
4.7 LR parsers
4.8 Using ambiguous grammars
4.9 Parser generators
Exercises
Bibliographic notes

Chapter 5 Syntax-Directed Translation
5.1 Syntax-directed definitions
5.2 Construction of syntax trees
5.3 Bottom-up evaluation of S-attributed definitions
5.4 L-attributed definitions
5.5 Top-down translation
5.6 Bottom-up evaluation of inherited attributes
5.7 Recursive evaluators
5.8 Space for attribute values at compile time
5.9 Assigning space at compiler-construction time
5.10 Analysis of syntax-directed definitions
Exercises
Bibliographic notes

Chapter 6 Type Checking
6.1 Type systems
6.2 Specification of a simple type checker
6.3 Equivalence of type expressions
6.4 Type conversions
6.5 Overloading of functions and operators
6.6 Polymorphic functions
6.7 An algorithm for unification
Exercises
Bibliographic notes

Chapter 7
7.1 Source language issues
7.2 Storage organization
7.3 Storage-allocation strategies
7.4 Access to nonlocal names
7.5 Parameter passing
7.6 Symbol tables
7.7 Language facilities for dynamic storage allocation
7.8 Dynamic storage allocation techniques
7.9 Storage allocation in Fortran
Exercises
Bibliographic notes

Chapter 8 Intermediate Code Generation
8.1 Intermediate languages
8.2 Declarations
8.3 Assignment statements
8.4 Boolean expressions
8.5 Case statements
8.6 Backpatching
8.7 Procedure calls
Exercises
Bibliographic notes

Chapter 9
9.1 Issues in the design of a code generator
9.2 The target machine
9.3 Run-time storage management
9.4 Basic blocks and flow graphs
9.5 Next-use information
9.6 A simple code generator
9.7 Register allocation and assignment
9.8 The dag representation of basic blocks
9.9 Peephole optimization
9.10 Generating code from dags
9.11 Dynamic programming code-generation algorithm
9.12 Code-generator generators
Exercises
Bibliographic notes

Chapter 10
10.1 Introduction
10.2 The principal sources of optimization
10.3 Optimization of basic blocks
10.4 Loops in flow graphs
10.5 Introduction to global data-flow analysis
10.6 Iterative solution of data-flow equations
10.7 Code-improving transformations
10.8 Dealing with aliases
10.9 Data-flow analysis of structured flow graphs
10.10 Efficient data-flow algorithms
10.11 A tool for data-flow analysis
10.12 Estimation of types
10.13 Symbolic debugging of optimized code
Exercises
Bibliographic notes

Chapter 11 Want to Write a Compiler?
11.1 Planning a compiler
11.2 Approaches to compiler development
11.3 The compiler-development environment
11.4 Testing and maintenance

Chapter 12
12.1 EQN, a preprocessor for typesetting mathematics
12.2 Compilers for Pascal
12.3 The C compilers
12.4 The Fortran H compilers
12.5 The Bliss/11 compiler
12.6 Modula-2 optimizing compiler

Appendix A
A.1 Introduction
A.2 A Pascal subset
A.3 Program structure
A.4 Lexical conventions
A.5 Suggested exercises
A.6 Evolution of the interpreter
A.7 Extensions

CHAPTER 1 Introduction to Compiling

The principles and techniques of compiler writing are so pervasive that the ideas found in this book will be used many times in the career of a computer scientist. Compiler writing spans programming languages, machine architecture, language theory, algorithms, and software engineering. Fortunately, a few basic compiler-writing techniques can be used to construct translators for a wide variety of languages and machines. In this chapter, we introduce the subject of compiling by describing the components of a compiler, the environment in which compilers do their job, and some software tools that make it easier to build compilers.

1.1 COMPILERS

Simply stated, a compiler is a program that reads a program written in one language - the source language - and translates it into an equivalent program in another language - the target language (see Fig. 1.1). As an important part of this translation process, the compiler reports to its user the presence of errors in the source program.

At first glance, the variety of compilers may appear overwhelming. There are thousands of source languages, ranging from traditional
programming languages such as Fortran and Pascal to specialized languages that have arisen in virtually every area of computer application. Target languages are equally varied; a target language may be another programming language, or the machine language of any computer between a microprocessor and a supercomputer. Compilers are sometimes classified as single-pass, multi-pass, load-and-go, debugging, or optimizing, depending on how they have been constructed or on what function they are supposed to perform. Despite this apparent complexity, the basic tasks that any compiler must perform are essentially the same. By understanding these tasks, we can construct compilers for a wide variety of source languages and target machines using the same basic techniques.

Our knowledge about how to organize and write compilers has increased vastly since the first compilers started to appear in the early 1950's. It is difficult to give an exact date for the first compiler because initially a great deal of experimentation and implementation was done independently by several groups. Much of the early work on compiling dealt with the translation of arithmetic formulas into machine code.

Throughout the 1950's, compilers were considered notoriously difficult programs to write. The first Fortran compiler, for example, took 18 staff-years to implement (Backus et al. [1957]). We have since discovered systematic techniques for handling many of the important tasks that occur during compilation. Good implementation languages, programming environments, and software tools have also been developed. With these advances, a substantial compiler can be implemented even as a student project in a one-semester compiler-design course.

There are two parts to compilation: analysis and synthesis. The analysis part breaks up the source program into constituent pieces and