Syntax Analysis
Quan Thanh Tho (qttho@), Nguyen Hua Phung (phung@)
cse.hcmut.edu.vn
programming languages syntax
compilers
syntax errors
• The role of syntax analysis (parser)
• Language syntax specification
• Parsing Techniques
• Error Recovery
The role of syntax analysis
• Receive tokens from the lexical analyzer
• Verify whether the received tokens conform to the language grammar
• Generate a parsing representation (usually a parse tree)
• Handle syntax errors (report and recover)
Lexical Analyzer → token → Syntax Analyzer
• The role of syntax analysis (parser)
• Language syntax specification
– Syntax and Grammar
– Context-free Grammar
• Derivation
• Parse Tree
– Grammar Construction for Programming Language:
• Language construct definition
• Operator precedence and associativity
• Ambiguity
• Parsing Techniques
Syntax and Grammar
• Syntax (in the programming language sense):
– Defines the structure of a program
– Does not reflect the meaning (semantics) of the program
• Grammar:
– A rule-based formalism for specifying the syntax of a language
• Effectively support language modification
• Provide a fundamental basis to develop
Formal Definition of CFG
• G = (VN, VT, S, P)
• VN: finite set of nonterminal symbols
• VT: finite set of tokens (VT ∩ VN = ∅)
• S ∈ VN: start symbol
• P: finite set of rules (or productions) in BNF (Backus-Naur Form), of the form A → α, where A ∈ VN and α ∈ (VN ∪ VT)*
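The four components can be written down directly as data. The following is only a sketch in Python: the class name `Grammar`, its field names and `is_well_formed` are illustrative choices, not part of the course material, and the sample grammar is the small `exp`/`op` grammar assumed from the derivation examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    """G = (VN, VT, S, P); every name here is hypothetical."""
    nonterminals: frozenset   # VN
    terminals: frozenset      # VT (tokens)
    start: str                # S
    productions: tuple        # P, as pairs (head, body), body a tuple of symbols

    def is_well_formed(self):
        symbols = self.nonterminals | self.terminals
        return (
            not (self.nonterminals & self.terminals)      # VT ∩ VN = ∅
            and self.start in self.nonterminals           # S ∈ VN
            and all(
                head in self.nonterminals                 # A ∈ VN
                and all(sym in symbols for sym in body)   # α ∈ (VN ∪ VT)*
                for head, body in self.productions
            )
        )


# Assumed toy grammar: exp -> exp op exp | id, op -> + | *
toy = Grammar(
    nonterminals=frozenset({"exp", "op"}),
    terminals=frozenset({"id", "+", "*"}),
    start="exp",
    productions=(
        ("exp", ("exp", "op", "exp")),
        ("exp", ("id",)),
        ("op", ("+",)),
        ("op", ("*",)),
    ),
)
```

Calling `toy.is_well_formed()` checks the three side conditions of the definition; an empty body `()` would stand for an ε-production.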
Derivation: S ⇒+ α, where α consists of tokens only.
Sentential form: S ⇒* α ⇔ α is a sentential form
Sentence: S ⇒+ α is a derivation ⇔ α is a sentence
Example 2 (cont’d)
• exp ⇒ exp op exp
⇒ exp op exp op exp
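Leftmost/Rightmost Derivation
• There may be many derivations for a given sentence
• Leftmost derivation: at each step, the sentential form is further derived by replacing its leftmost nonterminal
• Rightmost derivation: at each step, the sentential form is further derived by replacing its rightmost nonterminal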
Example 3 – Leftmost Derivation
• exp ⇒ exp op exp
⇒ id op exp
⇒ id + exp
⇒ id + id
Example 3 – Rightmost Derivation
• exp ⇒ exp op exp
⇒ exp op id
⇒ exp + id
⇒ id + id
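The two derivations of Example 3 can be replayed mechanically. The sketch below is in Python and assumes the grammar exp → exp op exp | id, op → + | - | * | / (inferred from the examples, not stated in full on the slides); `GRAMMAR` and `derive` are illustrative names.

```python
# Assumed grammar behind Example 3 (hypothetical reconstruction).
GRAMMAR = {
    "exp": [["exp", "op", "exp"], ["id"]],
    "op": [["+"], ["-"], ["*"], ["/"]],
}


def derive(start, steps, leftmost=True):
    """Apply each right-hand side in `steps` to the leftmost (or rightmost)
    nonterminal of the current sentential form; return all forms produced."""
    form = [start]
    forms = [" ".join(form)]
    for rhs in steps:
        positions = [i for i, sym in enumerate(form) if sym in GRAMMAR]
        i = positions[0] if leftmost else positions[-1]
        if rhs not in GRAMMAR[form[i]]:
            raise ValueError(f"{form[i]} -> {' '.join(rhs)} is not a production")
        form = form[:i] + rhs + form[i + 1:]
        forms.append(" ".join(form))
    return forms


steps = [["exp", "op", "exp"], ["id"], ["+"], ["id"]]
left = derive("exp", steps, leftmost=True)
right = derive("exp", steps, leftmost=False)
```

The same list of right-hand sides yields both derivations; only the choice of which nonterminal to replace differs.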
Hands-on Exercise
• Find the leftmost derivation and rightmost
derivation of id+id*id+id
• Verify whether the sequence of tokens generated by the lexical analyzer is grammatically correct
• Carried out by finding a derivation
corresponding to the sequence
computer-understandable structure for further
Parse Tree
• Tree-based structure representing a derivation
– Root node ⇔ start symbol
– Interior node ⇔ nonterminal symbol
– Leaf node ⇔ token or nonterminal
– The children of a node, from left to right, form the right-hand side of a production whose left-hand side is the node.
– Parse tree is constructed based on the
Example 4
• exp ⇒ exp op exp
[Figure: the parse tree for this derivation, built step by step from the root exp]
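A parse tree can be held in a very small data structure. This Python sketch is illustrative only: `Node` and `frontier` are hypothetical names, and the tree is the one for id + id under the assumed grammar exp → exp op exp | id.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    symbol: str                                    # nonterminal or token
    children: list = field(default_factory=list)  # right-hand side, left to right


def frontier(node):
    """Read the leaves from left to right; for a complete tree this is the sentence."""
    if not node.children:
        return [node.symbol]
    leaves = []
    for child in node.children:
        leaves.extend(frontier(child))
    return leaves


# Parse tree of "id + id": the root applies exp -> exp op exp.
tree = Node("exp", [
    Node("exp", [Node("id")]),
    Node("op", [Node("+")]),
    Node("exp", [Node("id")]),
])
```

Each interior node together with its children corresponds to one production, which is the property stated on the Parse Tree slide.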
Hands-on Exercise
• Draw the corresponding parse tree for a derivation of id+id*id+id
Extended Backus-Naur Form
Language Construct Definition
• Program:
prog → (declaration)? statements
statements → statement statements
Classic Expression Grammar
exp → exp + term | exp - term | term
term → term * factor | term / factor | factor
factor → ( exp ) | ID | INT
• Why is this classic expression grammar better than the previously used one?
Precedence and Associativity
• When properly written, a grammar can
enforce operator precedence and
associativity as desired
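To make this concrete, here is a hedged Python sketch of an evaluator whose functions follow the three levels of the classic expression grammar (exp, term, factor). It is not the course's code: the names `tokenize` and `evaluate` are made up, integer literals stand in for ID and INT, `/` is taken to be integer division, and each left-recursive rule is realised as a loop, a common substitute that keeps left associativity.

```python
import re


def tokenize(text):
    # Integers plus the operator and parenthesis tokens of the grammar.
    return re.findall(r"\d+|[-+*/()]", text)


def evaluate(text):
    tokens = tokenize(text) + ["$"]   # "$" marks the end of input
    pos = 0

    def peek():
        return tokens[pos]

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():                     # factor -> ( exp ) | INT
        tok = advance()
        if tok == "(":
            value = exp()
            if advance() != ")":
                raise SyntaxError("expected )")
            return value
        if tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected token {tok}")

    def term():                       # term -> term * factor | term / factor | factor
        value = factor()
        while peek() in ("*", "/"):
            op, rhs = advance(), factor()
            value = value * rhs if op == "*" else value // rhs
        return value

    def exp():                        # exp -> exp + term | exp - term | term
        value = term()
        while peek() in ("+", "-"):
            op, rhs = advance(), term()
            value = value + rhs if op == "+" else value - rhs
        return value

    result = exp()
    if peek() != "$":
        raise SyntaxError(f"unexpected token {peek()}")
    return result
```

Because `*` and `/` are handled one level below `+` and `-`, they bind tighter; because each loop folds operands from the left, `8 - 3 - 2` is grouped as `(8 - 3) - 2`.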
Ambiguous Grammar
• A grammar is considered ambiguous if it allows more than one parse tree for some sentence
• A grammar can be rewritten to eliminate the ambiguity
Example 5
• The “Dangling-Else” Grammar
stmt → if exp then stmt | if exp then stmt else stmt | other
if a then if b then c else d
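The ambiguity can be checked by enumerating every parse of the sentence. The Python sketch below makes simplifying assumptions that are not in the slides: `exp` and `other` are each a single non-keyword token, and trees are plain tuples. `parse_stmt` and `all_parses` are illustrative names.

```python
KEYWORDS = {"if", "then", "else"}


def parse_stmt(tokens, i):
    """Return every (tree, next_position) for a stmt that starts at tokens[i]."""
    results = []
    if i < len(tokens) and tokens[i] == "if":
        # Assumption: the condition is exactly one token, followed by "then".
        if i + 2 < len(tokens) and tokens[i + 2] == "then":
            cond = tokens[i + 1]
            for then_tree, j in parse_stmt(tokens, i + 3):
                # stmt -> if exp then stmt
                results.append((("if", cond, then_tree), j))
                # stmt -> if exp then stmt else stmt
                if j < len(tokens) and tokens[j] == "else":
                    for else_tree, k in parse_stmt(tokens, j + 1):
                        results.append((("if-else", cond, then_tree, else_tree), k))
    elif i < len(tokens) and tokens[i] not in KEYWORDS:
        results.append((tokens[i], i + 1))    # stmt -> other
    return results


def all_parses(text):
    tokens = text.split()
    return [tree for tree, end in parse_stmt(tokens, 0) if end == len(tokens)]


trees = all_parses("if a then if b then c else d")
```

For this sentence `trees` holds two distinct trees: one attaches `else d` to the outer `if a`, the other to the inner `if b`.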
• The role of syntax analysis (parser)
• Language syntax specification
Parsing Technique
• Parsing: Find the corresponding derivation
of a sentence
• Derivation: Deriving sequence from the
• Find the forward sequence from the start symbol to the sentence: top-down parsing
Parsing Techniques (cont’d)
• Find the backward sequence from the sentence to the start symbol: bottom-up parsing
• In most cases, the compiler tries to find the leftmost derivation in top-down parsing and the rightmost derivation (traced in reverse) in bottom-up parsing
Graphical Point of View
• Top-down parsing draws the parse tree
from the root node to leaves
• Bottom-up parsing draws the parse tree
from leaves to root node
Top-down Parsing
• Start from the start symbol and find the leftmost derivation
• To find the leftmost derivation
– Find the leftmost nonterminal in the current
sentential form
– Replace the leftmost nonterminal by a string
inferred from a suitable production
Lookahead Problem
• Reconsider the sentence id+id
• Start symbol: exp
• First step: find the leftmost nonterminal → exp
• Replace exp by a new string:
exp → exp + term or exp → exp - term or exp → term
• Which alternative should be taken?
Lookahead Problem (cont’d)
• id+id
• first lookahead token: id
– No decision can be made with certainty: all three possible alternatives can derive a string starting with id
– look ahead one more token
Lookahead Problem (cont’d)
• id+id
• next lookahead token: +
– only two possibilities remain: had we taken exp → term, there would be no way to generate the + token from there
– keep looking ahead to choose between exp → exp + term and exp → exp - term
Lookahead Problem (cont’d)
• id+id
• next token looked-ahead: id
– no decision should be made
Lookahead Problem (cont’d)
• For computational reasons, most compilers look ahead only one token
• The basic expression grammar cannot be parsed completely with only one lookahead token ⇒ why?
Lookahead Problem Revisited
• Why can we still not make a decision despite having obtained a certain
Left Factoring
• A → Bα1 | Bα2 | … | Bαn | β1 | … | βn
• No decision can be made when the lookahead token is derivable from B
• exp → exp + term | exp - term | term
Left Factoring Elimination
• Intuitive idea: the confusion is due to the many Bs. Just try to merge these Bs into a single B
• Technical solution
A → Bα1 | Bα2 | … | Bαn | β1 | … | βn
⇔
A → BC | β1 | … | βn
C → α1 | α2 | … | αn
Example 6
exp → exp + term | exp - term | term
⇔
exp → exp exp_tail | term
exp_tail → + term | - term
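The transformation of Example 6 can be sketched as a small Python function. This is an illustration under assumptions, not the course's algorithm: `left_factor` is a made-up name, productions are lists of symbols, an empty list stands for ε, and only one shared first symbol is factored per call.

```python
def left_factor(head, alternatives, new_name):
    """Factor out the first symbol shared by several alternatives of `head`."""
    groups = {}
    for alt in alternatives:
        groups.setdefault(tuple(alt[:1]), []).append(alt)
    # First symbol that starts more than one alternative (skip the ε alternative).
    shared = next((key for key, alts in groups.items() if key and len(alts) > 1), None)
    if shared is None:
        return {head: alternatives}          # nothing to factor
    prefix = shared[0]
    tails = [alt[1:] for alt in groups[shared]]                    # the α parts
    rest = [alt for alt in alternatives if tuple(alt[:1]) != shared]  # the β parts
    return {head: [[prefix, new_name]] + rest, new_name: tails}


factored = left_factor(
    "exp",
    [["exp", "+", "term"], ["exp", "-", "term"], ["term"]],
    "exp_tail",
)
```

Applied to the expression rule, the result has `exp → exp exp_tail | term` and `exp_tail → + term | - term`, the grammar shown in Example 6.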
• Confusion occurs when the derivable
token is the very lookahead token
Left Recursion Elimination
• Intuitive idea: the confusion arises because Aα will eventually derive βiα (i = 1..n). No way to escape! → Solution: rewrite the grammar so that this derivation occurs directly in the productions
Left Recursion Elimination (cont’d)
A → Aα | β1 | … | βn
⇔
A → β1A’ | … | βnA’
A’ → αA’ | ε
• Why A’ → αA’? :- A ⇒ Aα ⇒ Aαα ⇒ Aααα
• Why A’ → ε? :- How to stop the loop?
Example 7
exp → exp exp_tail | term
exp_tail → + term | - term
⇒
exp → term exp’
exp’ → exp_tail exp’ | ε
exp_tail → + term | - term
⇒ (simplified)
exp → term exp’
exp’ → + term exp’ | - term exp’ | ε
Example 8
exp → exp + term | exp - term | term
⇒
exp → term exp’
exp’ → + term exp’ | - term exp’ | ε
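The rewriting used in Examples 7 and 8 can be sketched in Python for immediate left recursion. The function name `eliminate_left_recursion`, the list-of-symbols encoding and the use of an empty list for ε are assumptions made for this illustration.

```python
def eliminate_left_recursion(head, alternatives):
    """Rewrite A -> A α | β into A -> β A', A' -> α A' | ε (immediate recursion only)."""
    new_head = head + "'"
    # α parts: what follows the recursive occurrence of `head`.
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == head]
    # β parts: alternatives that do not start with `head`.
    others = [alt for alt in alternatives if not alt or alt[0] != head]
    if not recursive:
        return {head: alternatives}          # no left recursion to remove
    return {
        head: [beta + [new_head] for beta in others],
        new_head: [alpha + [new_head] for alpha in recursive] + [[]],  # [] is ε
    }


rewritten = eliminate_left_recursion(
    "exp",
    [["exp", "+", "term"], ["exp", "-", "term"], ["term"]],
)
```

For the expression rule this produces `exp → term exp'` and `exp' → + term exp' | - term exp' | ε`, matching Example 8. Indirect left recursion (through another nonterminal) is outside the scope of this sketch.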
Hands-on Exercise
• Rewrite the solution for eliminating left recursion in the general case
• A → Aα1 | Aα2 | … | Aαn | β1 | … | βn
Transforming Grammar for Top-down Parsing
• To perform top-down parsing:
– Check whether the grammar contains productions that need left factoring or are left-recursive
– If yes, eliminate them
– As a result, a transformed grammar is obtained
– Parse the sentence based on the transformed grammar
Hands-on Exercise
• Construct the transformed grammar for the basic expression grammar
Intuition of First Set
• Revisit the grammar (sentence: gchfc)
• Q: Why select S → Bc instead of S → Ab?
• A: Because B can derive a string beginning with g
Intuition of First Set (cont’d)
• Q: Which terminals can begin strings derived from A and B, respectively?
• A: Strings derived from A can begin with {c,d,h,i}, and B with {e,g}
• Notation: First(A) = {c,d,h,i}, First(B) = {e,g}
First Set
• First(α) is the set of all terminals that can begin strings derived from α
• If α ⇒* ε, then First(α) includes ε
• A is said to be nullable if A ⇒* ε
Compute First(α)
• If α is a terminal a, then First(α) = {a}
• If α is ε, then First(α) = {ε}
• If α is a nonterminal A, and A → β1 | … | βn, then
First(α) = First(β1) ∪ First(β2) ∪ … ∪ First(βn)
• If α = X1X2…Xn
First(α) = {}; i = 0
Repeat
i++
First(α) = First(α) ∪ (First(Xi) - {ε})
Until i = n or Xi is not nullable
If Xi is nullable for all i, then add ε to First(α)
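The rules above can be turned into a short Python sketch. Two things are assumptions here: the rules are applied by iterating to a fixed point (a standard way to cope with recursive grammars, not necessarily how the course implements it), and the grammar used is the transformed expression grammar that the deck lists in its later examples. `GRAMMAR`, `first_of` and `compute_first` are illustrative names; an empty list stands for ε.

```python
EPS = "ε"

GRAMMAR = {
    "exp": [["term", "exp'"]],
    "exp'": [["+", "term", "exp'"], ["-", "term", "exp'"], []],
    "term": [["factor", "term'"]],
    "term'": [["*", "factor", "term'"], ["/", "factor", "term'"], []],
    "factor": [["(", "exp", ")"], ["ID"], ["INT"]],
}


def first_of(symbols, first):
    """First of a string X1 X2 ... Xn, given the First sets known so far."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in first else {sym}  # a terminal begins with itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:        # Xi is not nullable: stop here
            return result
    result.add(EPS)                     # every Xi was nullable (or the string is empty)
    return result


def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:                      # repeat until no First set grows
        changed = False
        for head, alternatives in grammar.items():
            for alt in alternatives:
                new = first_of(alt, first)
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first


FIRST = compute_first(GRAMMAR)
```

With this grammar, `FIRST["term'"]` comes out as `{*, /, ε}`, the value quoted on the Follow-set slide.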
First(A) = First(Df) ∪ First(CA)
First(Df): First(D) = {h,i}, D is not nullable → First(Df) = {h,i}
First(CA) = {d,c}
Example 9 (cont’d)
• A → CAd | BCa
• B → b | ∈
• C → c | ∈
Introductory Example of Follow Set
exp → term exp’
exp’ → + term exp’ | - term exp’ | ε
term → factor term’
term’ → * factor term’ | / factor term’ | ε
First(term’) = {*,/,ε}
• id+id
exp ⇒ term exp’
⇒ factor term’ exp’
Follow Set
• Why is the First set sometimes not informative enough? → It cannot tell when we
• When αAβ ⇒ αβ (meaning A → ε is applied), the lookahead token should be in First(β)
• Follow(A) = the set of terminals that can appear immediately after A in some sentential form
Follow(A) = {x | S ⇒* αAxβ}
Compute Follow(A)
• Follow(A) only makes sense when A is a nonterminal
• If A is the start symbol, Follow(A) includes $
• Search all productions for occurrences of A on a right-hand side:
B → αAβ
– Add First(β) - {ε} to Follow(A)
– If β ⇒* ε, add Follow(B) to Follow(A) (why?)
Example 10
exp → term exp’
exp’ → + term exp’ | - term exp’ | ε
term → factor term’
term’ → * factor term’ | / factor term’ | ε
factor → (exp) | ID | INT
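A hedged Python sketch of the Follow computation on the grammar of Example 10 follows. As with First sets, iterating until nothing changes is an assumed implementation strategy, and all names (`GRAMMAR`, `first_of`, `compute_first`, `compute_follow`) are illustrative; an empty list stands for ε and `$` marks the end of input.

```python
EPS, END = "ε", "$"

GRAMMAR = {
    "exp": [["term", "exp'"]],
    "exp'": [["+", "term", "exp'"], ["-", "term", "exp'"], []],
    "term": [["factor", "term'"]],
    "term'": [["*", "factor", "term'"], ["/", "factor", "term'"], []],
    "factor": [["(", "exp", ")"], ["ID"], ["INT"]],
}


def first_of(symbols, first):
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in first else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for head, alternatives in grammar.items():
            for alt in alternatives:
                new = first_of(alt, first)
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first


def compute_follow(grammar, start):
    first = compute_first(grammar)
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)                       # the start symbol is followed by $
    changed = True
    while changed:
        changed = False
        for head, alternatives in grammar.items():
            for alt in alternatives:
                for i, sym in enumerate(alt):    # production head -> α sym β
                    if sym not in grammar:
                        continue                 # Follow is only defined for nonterminals
                    beta_first = first_of(alt[i + 1:], first)
                    new = beta_first - {EPS}     # add First(β) - {ε}
                    if EPS in beta_first:        # β ⇒* ε: add Follow(head)
                        new |= follow[head]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


FOLLOW = compute_follow(GRAMMAR, "exp")
```

`FOLLOW["term'"]` is the set that the Select-set example uses for the production term’ → ε.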
Hands-on Exercise
• Find Follow sets of all nonterminals of the transformed expression grammar
Idea of Select Set
• When should a production A → α be applied?
– The lookahead token is in First(α)
– The lookahead token is in Follow(A) if α ⇒* ε
Select(A → α) = (First(α) \ {ε}) ∪ Follow(A) if ε ∈ First(α)
Select(A → α) = First(α) otherwise
Example 11
exp → term exp’
exp’ → + term exp’ | - term exp’ | ε
term → factor term’
term’ → * factor term’ | / factor term’ | ε
factor → (exp) | ID | INT
• Select(term’ → * factor term’) = {*}
• Select(term’ → / factor term’) = {/}
• Select(term’ → ε) = {+,-,),$}
Never get confused anymore when selecting
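The Select definition translates almost literally into Python. In this sketch the First and Follow sets of the transformed expression grammar are written out by hand as the dictionaries `FIRST` and `FOLLOW` (an assumption made to keep the example short), and `select` and `is_ll1` are illustrative names. The check in `is_ll1` is the usual one: the Select sets of the alternatives of each nonterminal must not overlap.

```python
from itertools import combinations

EPS = "ε"

GRAMMAR = {
    "exp": [["term", "exp'"]],
    "exp'": [["+", "term", "exp'"], ["-", "term", "exp'"], []],
    "term": [["factor", "term'"]],
    "term'": [["*", "factor", "term'"], ["/", "factor", "term'"], []],
    "factor": [["(", "exp", ")"], ["ID"], ["INT"]],
}

# First and Follow sets for this grammar, entered by hand for the sketch.
FIRST = {
    "exp": {"(", "ID", "INT"}, "exp'": {"+", "-", EPS},
    "term": {"(", "ID", "INT"}, "term'": {"*", "/", EPS},
    "factor": {"(", "ID", "INT"},
}
FOLLOW = {
    "exp": {")", "$"}, "exp'": {")", "$"},
    "term": {"+", "-", ")", "$"}, "term'": {"+", "-", ")", "$"},
    "factor": {"*", "/", "+", "-", ")", "$"},
}


def first_of(symbols):
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})   # a terminal begins with itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def select(head, alt):
    """Select(head -> alt) as defined on the Select-set slide."""
    alt_first = first_of(alt)
    if EPS in alt_first:
        return (alt_first - {EPS}) | FOLLOW[head]
    return alt_first


def is_ll1(grammar):
    """True if no two alternatives of the same nonterminal share a lookahead token."""
    return all(
        not (select(head, a) & select(head, b))
        for head, alternatives in grammar.items()
        for a, b in combinations(alternatives, 2)
    )
```

With disjoint Select sets, one lookahead token is enough to pick a production, which is what the closing remark of Example 11 refers to.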
Top-down Recursive Parsing
• Scan the input repeatedly (recursively) to find the leftmost derivation
• Recursive-descent parsing (backtracking required)
• (Recursive) predictive parsing (no
backtracking required)
Recursive-descent parsing
• Locate the leftmost nonterminal in the current (potential) sentential form
• Find the first production whose left-hand side is the located nonterminal
• Apply the production to derive a new sentential form candidate
parsing the sentence
• This grammar is called LL(1):
– when scanning from Left to right to find the Leftmost derivation, 1 lookahead token is enough
LL(k) Grammar (cont’d)
• LL(k) grammar: replace 1 by k
• The classic expression grammar is not LL(k) for any k
• The transformed expression grammar is LL(1)
• Most programming language grammars are LL(1) (including those given in the assignments) when properly transformed
        case '$': break;
        default: error("waiting for c, d or $");
    }
}
Programming Issue (cont’d)
procedure parseA
{ // A → bb | aS
    switch (lookahead) {
        case 'b': match('b'); match('b'); break;
        case 'a': match('a'); parseS(); break;
        default: error("waiting for b or a");
    }
}
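The same one-procedure-per-nonterminal pattern can be shown end to end on a grammar that is fully given in the deck, the transformed expression grammar. The Python sketch below is an illustration, not the course's code: `Parser`, `ParseError` and `accepts` are made-up names, tokens are plain strings such as "ID" or "+", and the lookahead tests are the Select sets of each production.

```python
class ParseError(Exception):
    pass


class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]    # "$" marks the end of input
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.lookahead != expected:
            raise ParseError(f"waiting for {expected}, found {self.lookahead}")
        self.pos += 1

    def parse(self):
        self.exp()
        self.match("$")

    def exp(self):                            # exp -> term exp'
        self.term()
        self.exp_tail()

    def exp_tail(self):                       # exp' -> + term exp' | - term exp' | ε
        if self.lookahead in ("+", "-"):
            self.match(self.lookahead)
            self.term()
            self.exp_tail()
        elif self.lookahead not in (")", "$"):    # Select(exp' -> ε)
            raise ParseError("waiting for +, -, ) or $")

    def term(self):                           # term -> factor term'
        self.factor()
        self.term_tail()

    def term_tail(self):                      # term' -> * factor term' | / factor term' | ε
        if self.lookahead in ("*", "/"):
            self.match(self.lookahead)
            self.factor()
            self.term_tail()
        elif self.lookahead not in ("+", "-", ")", "$"):   # Select(term' -> ε)
            raise ParseError("waiting for an operator, ) or $")

    def factor(self):                         # factor -> ( exp ) | ID | INT
        if self.lookahead == "(":
            self.match("(")
            self.exp()
            self.match(")")
        elif self.lookahead in ("ID", "INT"):
            self.match(self.lookahead)
        else:
            raise ParseError("waiting for (, ID or INT")


def accepts(tokens):
    try:
        Parser(tokens).parse()
        return True
    except ParseError:
        return False
```

For instance, `accepts(["ID", "+", "ID", "*", "INT"])` succeeds without any backtracking, while `accepts(["ID", "+"])` fails in `factor` because `$` is not a token that can start a factor.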
Object-Oriented Programming
class S {
    void parse() throws SyntaxException {
        switch (lookahead) {
            case 'c': match('c');
                a = new A(); a.parse();
Object-Oriented Programming
class A {
    void parse() throws SyntaxException {
        switch (lookahead) {
            case 'b': match('b'); match('b'); break;
            case 'a': match('a');
Non-recursive Predictive Parsing
• Programming philosophy: how to avoid recursion?
• Answer: use array-based data and structured statements
– size(L1,…, Ln) = 1 + size(L2,…, Ln)
– size(L1,…,Ln) = for i:[1->n] k = k+1
• Non-recursive Predictive Parsing:
table-driven technique
Non-recursive Predictive Parsing (cont’d)
• Algorithm: Alg 4.3
• Parsing table: Alg 4.4
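As a rough illustration of the table-driven idea (not a transcription of the algorithms cited above), the Python sketch below replaces the recursion with an explicit stack. The table `TABLE` is filled in by hand from the Select sets of the transformed expression grammar; `TABLE`, `NONTERMINALS` and `parse` are hypothetical names, and an empty list stands for ε.

```python
STARTERS = ("(", "ID", "INT")          # tokens that can begin exp, term and factor
AFTER_EXP = (")", "$")                 # Select(exp' -> ε)
AFTER_TERM = ("+", "-", ")", "$")      # Select(term' -> ε)

# TABLE[(nonterminal, lookahead)] = right-hand side to expand with.
TABLE = {}
for tok in STARTERS:
    TABLE[("exp", tok)] = ["term", "exp'"]
    TABLE[("term", tok)] = ["factor", "term'"]
for op in ("+", "-"):
    TABLE[("exp'", op)] = [op, "term", "exp'"]
for op in ("*", "/"):
    TABLE[("term'", op)] = [op, "factor", "term'"]
for tok in AFTER_EXP:
    TABLE[("exp'", tok)] = []
for tok in AFTER_TERM:
    TABLE[("term'", tok)] = []
TABLE[("factor", "(")] = ["(", "exp", ")"]
TABLE[("factor", "ID")] = ["ID"]
TABLE[("factor", "INT")] = ["INT"]

NONTERMINALS = {nt for nt, _ in TABLE}


def parse(tokens, start="exp"):
    """Return True if `tokens` is a sentence; uses a stack instead of recursion."""
    tokens = list(tokens) + ["$"]
    stack = ["$", start]               # the top of the stack is the end of the list
    pos = 0
    while stack:
        top = stack.pop()
        look = tokens[pos]
        if top in NONTERMINALS:
            rhs = TABLE.get((top, look))
            if rhs is None:            # empty table entry: syntax error
                return False
            stack.extend(reversed(rhs))    # push so the leftmost symbol is on top
        elif top == look:              # terminal (or $) on top: it must match the input
            pos += 1
        else:
            return False
    return pos == len(tokens)
```

Each expansion step corresponds to one step of the leftmost derivation, so the stack always holds the unmatched remainder of the current sentential form.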