INTRODUCTION TO COMPUTER SCIENCE
HANDOUT #10. PARSING

K5 & K6, Computer Science Department, Văn Lang University
Second semester, Feb 2002
Instructor: Trần Đức Quang

Major themes:
1. Parse Trees
2. Constructing a Parse Tree

Reading: Section 11.4.

10.1 PARSE TREES

As briefly discussed in the previous handout, we can show that a string s belongs to the language L(<S>), for some syntactic category <S>, by repeatedly applying productions:

1. Start with some strings derived from basis productions, those that have no syntactic category in the body.
2. Then "apply" productions to strings already derived for the various syntactic categories. Each application substitutes strings for the occurrences of syntactic categories in the body of the production, and thereby constructs a string that belongs to the syntactic category of the head.
3. Eventually, construct the string s by applying a production with <S> at the head.

It is often useful to draw the "proof" that s is in L(<S>) as a tree, which we call a parse tree. The nodes of a parse tree are labeled by terminals, by syntactic categories, or by the symbol ε, subject to these rules:

1. The leaves are labeled only by terminals or ε.
2. The interior nodes are labeled only by syntactic categories.
3. Every interior node v represents the application of a production. That is, there must be some production such that:
   a) The syntactic category labeling v is the head of the production, and
   b) The labels of the children of v, from the left, form the body of the production.

Here is the parse tree for the string 3*(2+14) using the grammar of Handout #9, with the syntactic categories <Expression>, <Number>, and <Digit> abbreviated to <E>, <N>, and <D>, respectively.

               <E>
            /   |   \
       <E>      *      <E>
        |           /   |   \
       <N>        (    <E>    )
        |            /  |  \
       <D>       <E>    +    <E>
        |         |           |
        3        <N>         <N>
                  |         /   \
                 <D>      <N>   <D>
                  |        |     |
                  2       <D>    4
                           |
                           1

The string 3*(2+14), read off the leaves from left to right, is called the yield of the above parse tree.
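The three labeling rules and the notion of yield can be made concrete with a small program. The following is only a minimal Python sketch, not part of the handout: the names Node, PRODUCTIONS, is_parse_tree, and tree_yield are hypothetical, and the dictionary encoding of the grammar is an assumption. Since this grammar has no production with ε in the body, ε leaves do not occur here.

```python
from dataclasses import dataclass, field

# Grammar of Handout #9 with abbreviated category names.
# Each body is a tuple of symbols; this dictionary encoding is an assumption.
PRODUCTIONS = {
    "<E>": [("<N>",), ("(", "<E>", ")"),
            ("<E>", "+", "<E>"), ("<E>", "-", "<E>"),
            ("<E>", "*", "<E>"), ("<E>", "/", "<E>")],
    "<D>": [(d,) for d in "0123456789"],
    "<N>": [("<D>",), ("<N>", "<D>")],
}


@dataclass
class Node:
    label: str                                    # terminal or syntactic category
    children: list = field(default_factory=list)  # empty for a leaf


def is_parse_tree(node):
    """Check rules 1-3: leaves carry terminals, interior nodes apply a production."""
    if not node.children:
        # Rule 1: a leaf must not be labeled by a syntactic category.
        return node.label not in PRODUCTIONS
    # Rules 2 and 3: the label is a head, and the children's labels form a body.
    body = tuple(child.label for child in node.children)
    return (body in PRODUCTIONS.get(node.label, [])
            and all(is_parse_tree(child) for child in node.children))


def tree_yield(node):
    """Concatenate the leaf labels from left to right."""
    if not node.children:
        return node.label
    return "".join(tree_yield(child) for child in node.children)


def digit(d):
    """Two-node tree <D> over the terminal d."""
    return Node("<D>", [Node(d)])


# The parse tree for 3*(2+14), assembled by hand.
n3 = Node("<N>", [digit("3")])
n2 = Node("<N>", [digit("2")])
n14 = Node("<N>", [Node("<N>", [digit("1")]), digit("4")])
e3, e2, e14 = (Node("<E>", [n]) for n in (n3, n2, n14))
e_sum = Node("<E>", [e2, Node("+"), e14])
e_paren = Node("<E>", [Node("("), e_sum, Node(")")])
tree = Node("<E>", [e3, Node("*"), e_paren])

print(tree_yield(tree))     # 3*(2+14)
print(is_parse_tree(tree))  # True
```

A tree such as Node("<E>", [Node("3")]) fails the check, because the grammar has no production <E> → 3; a digit must first be derived through <D> and <N>.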

10.2 CONSTRUCTING A PARSE TREE

To see how a parse tree can be built, let us follow the construction of the parse tree for 3*(2+14) from Section 10.1. The grammar is reproduced for easy reference.

(1) <E> → <N>
(2) <E> → ( <E> )
(3) <E> → <E> + <E>
(4) <E> → <E> - <E>
(5) <E> → <E> * <E>
(6) <E> → <E> / <E>
(7) <D> → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
(8) <N> → <D>
(9) <N> → <N> <D>

1. First, construct a one-node tree for each terminal in the string.

    3    *    (    2    +    1    4    )

2. For the terminals 1, 2, 3, and 4, apply the productions (7) to get four two-node trees.

   <D>    <D>    <D>    <D>
    |      |      |      |
    2      3      1      4
   (a)    (b)    (c)    (d)

3. Apply production (8), or <N> → <D>, to the trees (a), (b), and (c) to obtain the following three trees:

   <N>    <N>    <N>
    |      |      |
   <D>    <D>    <D>
    |      |      |
    2      3      1
   (a)    (b)    (c)

4. Now apply production (9), or <N> → <N> <D>, to tree (c) of step 3 (for the digit 1) and tree (d) of step 2 (for the digit 4) to get the tree for 14.

      <N>
     /   \
   <N>   <D>
    |     |
   <D>    4
    |
    1

5. The three parse trees below are constructed by production (1), or <E> → <N>, from the trees for 3, 2, and 14.

   <E>    <E>       <E>
    |      |         |
   <N>    <N>       <N>
    |      |       /   \
   <D>    <D>    <N>   <D>
    |      |      |     |
    3      2     <D>    4
                  |
                  1
   (a)    (b)       (c)

6. Next, use production (3), or <E> → <E> + <E>, for (b) and (c) of step 5, and + of step 1, to construct a new parse tree with the yield 2+14.

         <E>
       /  |  \
   <E>    +    <E>
    |           |
   <N>         <N>
    |         /   \
   <D>      <N>   <D>
    |        |     |
    2       <D>    4
             |
             1

7. Applying production (2), or <E> → ( <E> ), to the resulting tree of step 6 and the parentheses of step 1, we have the parse tree with the yield (2+14).

             <E>
         /    |    \
    (        <E>        )
           /  |  \
       <E>    +    <E>
        |           |
       <N>         <N>
        |         /   \
       <D>      <N>   <D>
        |        |     |
        2       <D>    4
                 |
                 1

8. The overall parse tree for the string 3*(2+14), shown in Section 10.1, is produced by applying production (5), or <E> → <E> * <E>, to parse tree (a) of step 5, * of step 1, and the parse tree of step 7.

10.3 GLOSSARY

Parsing: Phân tích cú pháp. Also syntax analysis.
Parse tree: Cây phân tích cú pháp.
Syntax tree: Cây cú pháp. A compacted parse tree; also expression tree or operator tree.
Yield: Hoa lợi (của cây phân tích cú pháp).
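
Returning to Section 10.2, the eight construction steps can also be replayed in code. The following is only a minimal Python sketch under stated assumptions: the names Tree, GRAMMAR, apply_production, and tree_yield are hypothetical and do not come from the handout. Each call to apply_production refuses to build a node unless the labels of the subtrees, read from the left, form the body of some production for the given head.

```python
# Grammar of Section 10.2; the dictionary encoding is an assumption.
GRAMMAR = {
    "<E>": [("<N>",), ("(", "<E>", ")"),
            ("<E>", "+", "<E>"), ("<E>", "-", "<E>"),
            ("<E>", "*", "<E>"), ("<E>", "/", "<E>")],
    "<D>": [(d,) for d in "0123456789"],
    "<N>": [("<D>",), ("<N>", "<D>")],
}


class Tree:
    """A labeled node with an ordered list of subtrees (empty for a leaf)."""

    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)


def apply_production(head, *subtrees):
    """Join subtrees under a new node labeled head, if the grammar allows it."""
    body = tuple(t.label for t in subtrees)
    if body not in GRAMMAR.get(head, []):
        raise ValueError(f"no production {head} -> {' '.join(body)}")
    return Tree(head, subtrees)


def tree_yield(tree):
    """Concatenate the leaf labels from left to right."""
    if not tree.children:
        return tree.label
    return "".join(tree_yield(child) for child in tree.children)


# Step 1: a one-node tree for each terminal (all terminals here are distinct).
leaf = {ch: Tree(ch) for ch in "3*(2+14)"}
# Step 2: productions (7) give the <D> trees for 2, 3, 1, and 4.
d2, d3, d1, d4 = (apply_production("<D>", leaf[ch]) for ch in "2314")
# Step 3: production (8), <N> -> <D>, for 2, 3, and 1.
n2, n3, n1 = (apply_production("<N>", d) for d in (d2, d3, d1))
# Step 4: production (9), <N> -> <N> <D>, gives the tree for 14.
n14 = apply_production("<N>", n1, d4)
# Step 5: production (1), <E> -> <N>, for 3, 2, and 14.
e3, e2, e14 = (apply_production("<E>", n) for n in (n3, n2, n14))
# Step 6: production (3), <E> -> <E> + <E>, yields 2+14.
e_sum = apply_production("<E>", e2, leaf["+"], e14)
# Step 7: production (2), <E> -> ( <E> ), yields (2+14).
e_paren = apply_production("<E>", leaf["("], e_sum, leaf[")"])
# Step 8: production (5), <E> -> <E> * <E>, yields the whole string.
tree = apply_production("<E>", e3, leaf["*"], e_paren)

print(tree_yield(tree))  # 3*(2+14)
```

Swapping the arguments in step 4, as in apply_production("<N>", d4, n1), raises ValueError, because <N> → <D> <N> is not a production of the grammar.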