Part 2 of Data Structures and Problem Solving Using C++ covers stacks and compilers, utilities, simulation, graphs and paths, stacks and queues, linked lists, trees, binary search trees, hash tables, a priority queue (the binary heap), splay trees, merging priority queues, and the disjoint set class.
Chapter 12 Stacks and Compilers

Stacks are used extensively in compilers. In this chapter we present two simple components of a compiler: a balanced-symbol checker and a simple calculator. We do so to show simple algorithms that use stacks and to show how the STL classes described earlier in the book are used. In this chapter, we show how to use a stack to check for balanced symbols, how to use a state machine to parse symbols in a balanced-symbol program, and how to use operator precedence parsing to evaluate infix expressions in a simple calculator program.

12.1 Balanced-Symbol Checker

As discussed in Section 7.2, compilers check your programs for syntax errors. Frequently, however, a lack of one symbol (such as a missing */ comment-ender or }) causes the compiler to produce numerous lines of diagnostics without identifying the real error. A useful tool to help debug compiler error messages is a program that checks whether symbols are balanced. In other words, every { must correspond to a }, every [ to a ], and so on. However, simply counting the numbers of each symbol is insufficient. For example, the sequence [()] is legal, but the sequence [(]) is wrong.

12.1.1 Basic Algorithm

A stack is useful here because we know that when a closing symbol such as ) is seen, it matches the most recently seen unclosed (. Therefore, by placing an opening symbol on a stack, we can easily determine whether a closing symbol makes sense. Specifically, we have the following algorithm.

1. Make an empty stack.
2. Read symbols until the end of the file.
   a. If the symbol is an opening symbol, push it onto the stack.
   b. If it is a closing symbol, do the following.
      i.  If the stack is empty, report an error.
      ii. Otherwise, pop the stack. If the symbol popped is not the corresponding opening symbol, report an error.
3. At the end of the file, if the stack is not empty, report an error.

Symbols: ( [ ] }* )* [ eof*
Errors (indicated by *): } when expecting ), ) with no matching opening symbol, and [ unmatched at end of input.
Figure 12.1 Stack operations in a balanced-symbol algorithm

In this algorithm, illustrated in Figure 12.1, the fourth, fifth, and sixth symbols all generate errors. The } is an error because the symbol popped from the top of the stack is a (, so a mismatch is detected. The ) is an error because the stack is empty, so there is no corresponding (. The [ is an error detected when the end of input is encountered and the stack is not empty.

To make this technique work for C++ programs, we need to consider all the contexts in which parentheses, braces, and brackets need not match. For example, we should not consider a parenthesis as a symbol if it occurs inside a comment, string constant, or character constant. We thus need routines to skip comments, string constants, and character constants. A character constant in C++ can be difficult to recognize because of the many escape sequences possible, so we need to simplify things. We want to design a program that works for the bulk of inputs likely to occur.

For the program to be useful, we must not only report mismatches but also attempt to identify where the mismatches occur. Consequently, we keep track of the line numbers where the symbols are seen. When an error is encountered, obtaining an accurate message is always difficult. If there is an extra }, does that mean that the } is extraneous? Or was a { missing earlier?
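The basic algorithm can be sketched in a few lines with the STL stack. The following is only a minimal sketch, not the Balance class developed in this chapter: the names Mismatch, openerFor, and findMismatches are hypothetical, and the sketch makes no attempt to skip comments, string constants, or character constants.

```cpp
#include <istream>
#include <sstream>
#include <stack>
#include <string>
#include <vector>

// One reported error (hypothetical type used only in this sketch).
struct Mismatch
{
    char symbol;   // The offending symbol
    int  line;     // Line on which it was seen
};

// Return the opening symbol that corresponds to a closing symbol,
// or 0 if ch is not a closing symbol.
char openerFor( char ch )
{
    switch( ch )
    {
      case ')': return '(';
      case ']': return '[';
      case '}': return '{';
      default:  return 0;
    }
}

// Run the stack-based algorithm over every character of the stream.
std::vector<Mismatch> findMismatches( std::istream & in )
{
    struct Open { char symbol; int line; };
    std::stack<Open> pending;          // Unclosed opening symbols
    std::vector<Mismatch> errors;
    int line = 1;
    char ch;

    while( in.get( ch ) )
    {
        if( ch == '\n' )
            ++line;
        else if( ch == '(' || ch == '[' || ch == '{' )
            pending.push( Open{ ch, line } );
        else if( openerFor( ch ) != 0 )
        {
            if( pending.empty( ) )
                errors.push_back( Mismatch{ ch, line } );     // No opener at all
            else
            {
                Open top = pending.top( );
                pending.pop( );
                if( top.symbol != openerFor( ch ) )
                    errors.push_back( Mismatch{ ch, line } ); // Wrong opener
            }
        }
    }

    // Anything still on the stack was never closed.
    while( !pending.empty( ) )
    {
        errors.push_back( Mismatch{ pending.top( ).symbol, pending.top( ).line } );
        pending.pop( );
    }
    return errors;
}

// Convenience wrapper for checking a string.
std::vector<Mismatch> findMismatches( const std::string & text )
{
    std::istringstream in( text );
    return findMismatches( in );
}
```

For the input of Figure 12.1, ( [ ] } ) [, this sketch reports three mismatches: the }, the ), and the [ that is still on the stack at the end of input.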
We keep the error handling as simple as possible, but once one error has been reported, the program may get confused and start flagging many errors. Thus only the first error can be considered meaningful. Even so, the program developed here is very useful.

12.1.2 Implementation

The program has two basic components. One part, called tokenization, is the process of scanning an input stream for opening and closing symbols (the tokens) and generating the sequence of tokens that need to be recognized. The second part is running the balanced-symbol algorithm, based on the tokens. The two basic components are represented as separate classes. Figure 12.2 shows the Tokenizer class interface, and Figure 12.3 shows the Balance class interface. The Tokenizer class provides a constructor that requires an istream and then provides a set of accessors that can be used to get the next token (either an opening/closing symbol for the code in this chapter or an identifier for the code in Chapter 13), the current line number, and the number of errors (mismatched quotes and comments). The Tokenizer class maintains most of this information in private data members. The Balance class also provides a similar constructor, but its only publicly visible routine is checkBalance, shown at line 24 of Figure 12.3. Everything else is a supporting routine or a class data member.

We begin by describing the Tokenizer class. inputStream is a reference to an istream object and is initialized at construction. Because of the ios hierarchy (see Section I), it may be initialized with an ifstream object. The current character being scanned is stored in ch, and the current line number is stored in currentLine. Finally, an integer that counts the number of errors is declared at line 37. The constructor, shown at lines 22 and 23, initializes the error count to 0 and the current line number to 1 and sets the istream
reference.

     1  #include <iostream>
     2  #include <string>
     3
     4  using namespace std;
     5
     6  // Tokenizer class
     7  //
     8  // CONSTRUCTION: with an istream that is ready to be read
     9  //
    10  // ******************PUBLIC OPERATIONS***********************
    11  // char getNextOpenClose( ) --> Return next open/close symbol
    12  // int getLineNumber( )     --> Return current line number
    13  // int getErrorCount( )     --> Return number of parsing errors
    14  // string getNextID( )      --> Return next C++ identifier
    15  //                              (see Section 13.2)
    16  // ******************ERRORS**********************************
    17  // Mismatched ', ", and EOF reached in a comment are noted
    18
    19  class Tokenizer
    20  {
    21    public:
    22      Tokenizer( istream & input )
    23        : currentLine( 1 ), errors( 0 ), inputStream( input ) { }
    24
    25        // The public routines
    26      char getNextOpenClose( );
    27      string getNextID( );
    28      int getLineNumber( ) const;
    29      int getErrorCount( ) const;
    30
    31    private:
    32      enum CommentType { SLASH_SLASH, SLASH_STAR };
    33
    34      istream & inputStream;     // Reference to the input stream
    35      char ch;                   // Current character
    36      int currentLine;           // Current line
    37      int errors;                // Number of errors detected
    38
    39        // A host of internal routines
    40      bool nextChar( );
    41      void putBackChar( );
    42      void skipComment( CommentType start );
    43      void skipQuote( char quoteType );
    44      string getRemainingString( );
    45  };

Figure 12.2 The Tokenizer class interface, used to retrieve tokens from an input stream

     1  #include "Tokenizer.h"
     2  #include <iostream>
     3  using namespace std;
     4
     5  // Symbol is the class that will be placed on the Stack
     6  struct Symbol
     7  {
     8      char token;
     9      int  theLine;
    10  };
    11
    12  // Balance class interface: check for balanced symbols
    13  //
    14  // CONSTRUCTION: with an istream object
    15  //
    16  // ******************PUBLIC OPERATIONS********************
    17  // int checkBalance( ) --> Print mismatches
    18  //                         return number of errors
    19
    20  class Balance
    21  {
    22    public:
    23      Balance( istream & input ) : tok( input ), errors( 0 ) { }
    24      int checkBalance( );
    25
    26    private:
    27      Tokenizer tok;     // Token source
    28      int errors;        // Mismatched open/close symbol errors
    29
    30      void checkMatch( const Symbol & opSym, const Symbol & clSym );
    31  };

Figure 12.3 Class interface for a balanced-symbol program

We can now implement the class methods, which, as we mentioned, are concerned with keeping track of the current line and attempting to differentiate symbols that represent opening and closing tokens from those that are inside comments, character constants, and string constants. This general process of recognizing tokens in a stream of symbols is called lexical analysis. Lexical analysis is used to ignore comments and recognize symbols.

Figure 12.4 shows a pair of routines, nextChar and putBackChar. The nextChar method reads the next character from inputStream, assigns it to ch, and updates currentLine if a newline is encountered. It returns false only if the end of the file has been reached. The complementary procedure putBackChar puts the current character, ch, back onto the input stream, and decrements currentLine if the character is a newline. Clearly, putBackChar should be called at most once between calls to nextChar; as it is a private routine, we do not worry about abuse on the part of the class user.

Putting characters back onto the input stream is a commonly used technique in parsing. In many instances we have read one too many characters, and undoing the read is useful. In our case this occurs after processing a /. We must determine whether the next character begins the comment start token; if it does not, we cannot simply disregard it because it could be an opening or closing symbol or a quote. Thus we pretend that it is never read.

    // nextChar sets ch based on the next character in
    // inputStream and adjusts currentLine if necessary.
    // It returns the result of get.
    // putBackChar puts the character back onto inputStream.
    // Both routines adjust currentLine if necessary.
    bool Tokenizer::nextChar( )
    {
        if( !inputStream.get( ch ) )
            return false;
        if( ch == '\n' )
            currentLine++;
        return true;
    }

    void Tokenizer::putBackChar( )
    {
        inputStream.putback( ch );
        if( ch == '\n' )
            currentLine--;
    }

Figure 12.4 The nextChar routine for reading the next character, updating currentLine if necessary, and returning true if not at the end of file; and the putBackChar routine for putting back ch and updating currentLine if necessary

Next is the routine skipComment, shown in Figure 12.5. Its purpose is to skip over the characters in the comment and position the input stream so that the next read is the first character after the comment ends. This technique is complicated by the fact that comments can either begin with //, in which case the line ends the comment, or /*, in which case */ ends the comment.[1]
In the // case, we continually get the next character until either the end of file is reached (in which case the first half of the && operator fails) or we get a newline. At that point we return. Note that the line number is updated automatically by nextChar. Otherwise, we have the /* case, which is processed starting at line 15.

The skipComment routine uses a simplified state machine. The state machine is a common technique used to parse symbols; at any point, it is in some state, and each input character takes it to a new state. Eventually, it reaches a state at which a symbol has been recognized.

In skipComment, at any point, it has matched 0, 1, or 2 characters of the */ terminator, corresponding to states 0, 1, and 2. If it matches two characters, it can return. Thus, inside the loop, it can be in only state 0 or 1 because, if it is in state 1 and sees a /, it returns immediately. Thus the state

[1] We do not consider deviant cases involving \.

     1  // Precondition: We are about to process a comment;
     2  //                 have already seen comment start token
     3  // Postcondition: Stream will be set immediately after
     4  //                 comment ending token
     5  void Tokenizer::skipComment( CommentType start )
     6  {
     7      if( start == SLASH_SLASH )
     8      {
     9          while( nextChar( ) && ( ch != '\n' ) )
    10              ;
    11          return;
    12      }
    13
    14          // Look for */
    15      bool state = false;    // Seen first char in comment ender
    16
    17      while( nextChar( ) )
    18      {
    19          if( state && ch == '/' )
    20              return;
    21          state = ( ch == '*' );
    22      }
    23      errors++;
    24      cout