Lecture slides for the Compilers course, Chapter 3: Syntax Analysis

Trang 1

Syntax Analysis

Quan Thanh Tho (qttho@), Nguyen Hua Phung (phung@)

cse.hcmut.edu.vn

Trang 2

programming languages syntax

compilers

syntax errors

Trang 3

• The role of syntax analysis (parser)

• Language syntax specification

• Parsing Techniques

• Error Recovery

Trang 4

The role of syntax analysis

• Receive tokens from the lexical analyzer

• Verify whether the received tokens conform to the language grammar

• Generate a parsing representation (usually a parse tree)

• Handle syntax errors (report and recover)

Lexical Analyzer → token → Syntax Analyzer

Trang 5

• The role of syntax analysis (parser)

• Language syntax specification

– Syntax and Grammar

– Context-free Grammar

• Derivation

• Parse Tree

– Grammar Construction for Programming Language:

• Language construct definition

• Operators precedence and associativity

• Ambiguity

• Parsing Techniques

Trang 6

Syntax and Grammar

• Syntax (in the programming language sense):

– Defines the structure of a program

– Does not reflect the meaning (semantics) of the program

• Grammar:

– A rule-based formalism for specifying the syntax of a language

Trang 7

• Effectively support language modification

• Provide a fundamental basis to develop

Trang 9

Formal Definition of CFG

• G = (VN, VT, S, P)

• VN: finite set of nonterminal symbols

• VT: finite set of tokens (VT ∩ VN = ∅)

• S ∈ VN: start symbol

• P: finite set of rules (or productions) in BNF (Backus-Naur Form), each of the form A → α

Trang 11

Derivation: S ⇒+ α, where α consists of tokens only.

Sentential form: S ⇒* α ⇔ α is a sentential form

Sentence: S ⇒+ α is a derivation ⇔ α is a sentence

Trang 14

Example 2 (cont’d)

• exp ⇒ exp op exp

⇒ exp op exp op exp

Trang 15

Leftmost/ Rightmost Derivation

• There may be many derivations for a given sentence

• Leftmost derivation: each sentential form is further derived by replacing its leftmost nonterminal

• Rightmost derivation: each sentential form is further derived by replacing its rightmost nonterminal

Trang 16

Example 3 – Leftmost Derivation

• exp ⇒ exp op exp

⇒ id op exp

⇒ id + exp

⇒ id + id

Trang 17

Example 3 – Rightmost Derivation

• exp ⇒ exp op exp

⇒ exp op id

⇒ exp + id

⇒ id + id
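The two derivations of Example 3 can be replayed mechanically. The sketch below is only an illustration: the `PRODUCTIONS` table is an assumed toy grammar (exp → exp op exp | id, op → + | *) chosen to match the steps shown above, and `derive` is a hypothetical helper name. The same list of right-hand sides yields the leftmost or the rightmost derivation depending on which nonterminal is replaced at each step.

```python
# Assumed toy grammar matching the steps of Example 3 (not taken from the slides).
PRODUCTIONS = {
    "exp": [["exp", "op", "exp"], ["id"]],
    "op": [["+"], ["*"]],
}


def derive(start, steps, leftmost=True):
    """Replay a derivation and return every sentential form as a string.

    Each entry of `steps` is the right-hand side that replaces the leftmost
    (or rightmost) nonterminal of the current sentential form.
    """
    form = [start]
    history = [" ".join(form)]
    for rhs in steps:
        positions = [i for i, sym in enumerate(form) if sym in PRODUCTIONS]
        if not positions:
            raise ValueError("no nonterminal left to replace")
        i = positions[0] if leftmost else positions[-1]
        if rhs not in PRODUCTIONS[form[i]]:
            raise ValueError(f"{form[i]} -> {' '.join(rhs)} is not a production")
        form = form[:i] + rhs + form[i + 1:]
        history.append(" ".join(form))
    return history


STEPS = [["exp", "op", "exp"], ["id"], ["+"], ["id"]]
LEFT = derive("exp", STEPS, leftmost=True)
RIGHT = derive("exp", STEPS, leftmost=False)
print(" => ".join(LEFT))
print(" => ".join(RIGHT))
```

Both runs end in the same sentence; only the order in which the nonterminals are expanded differs.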

Trang 18

Hands-on Exercise

• Find the leftmost and rightmost derivations of id+id*id+id

Trang 19

• Verify that the sequence of tokens generated by the lexical analyzer is grammatically correct

• Carried out by finding a derivation corresponding to the sequence

computer-understandable structure for further

Trang 20

Parse Tree

• Tree-based structure representing a derivation

– Root node ⇔ start symbol

– Interior node ⇔ nonterminal symbol

– Leaf node ⇔ token or nonterminal

– The children of a node, read from left to right, form the right-hand side of a production whose left-hand side is the node.

– Parse tree is constructed based on the
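A parse tree with these properties can be represented by a very small data structure. The following is a minimal sketch, not the course's implementation: `Node` and `frontier` are hypothetical names, and the example tree is built by hand for the sentence id + id.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One parse-tree node: a grammar symbol plus its ordered children."""
    symbol: str
    children: list = field(default_factory=list)


def frontier(node):
    """Return the leaves from left to right (the derived sentential form)."""
    if not node.children:
        return [node.symbol]
    leaves = []
    for child in node.children:
        leaves.extend(frontier(child))
    return leaves


# Tree for exp => exp op exp => ... => id + id, built by hand.
TREE = Node("exp", [
    Node("exp", [Node("id")]),
    Node("op", [Node("+")]),
    Node("exp", [Node("id")]),
])
print(" ".join(frontier(TREE)))
```

The children of the root, read left to right, are exactly the right-hand side of exp → exp op exp, which is the property stated in the last bullet above.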

Trang 21

Example 4

• exp

exp

Trang 22

Example 4

• exp ⇒ exp op exp

exp

Trang 26

Hands-on Exercise

• Draw the corresponding parse tree for a derivation of id+id*id+id

Trang 27

Extended Backus-Naur Form

Trang 28

Language Construct Definition

• Program:

prog → (declaration)? statements

statements → statement statements

Trang 30

Classic Expression Grammar

exp → exp + term | exp - term | term

term → term * factor | term / factor | factor

factor → ( exp ) | ID | INT

Why is this classic expression grammar better than the previously used one?

Trang 38

Precedence and Associativity

• When properly written, a grammar can enforce the desired operator precedence and associativity
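To see how the layering exp / term / factor of the classic expression grammar fixes precedence and associativity, the sketch below builds a nested tuple for a numeric expression. It is an illustration under assumptions: the left-recursive rules are read as loops (term followed by any number of "+ term" or "- term"), which produces the same left-leaning grouping, and all function names are hypothetical.

```python
import re


def tokenize(text):
    """Split an arithmetic expression into integer and operator tokens."""
    return re.findall(r"\d+|[-+*/()]", text)


def parse_exp(tokens, pos=0):
    # exp -> exp + term | exp - term | term, read as: term ((+|-) term)*
    tree, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        op = tokens[pos]
        right, pos = parse_term(tokens, pos + 1)
        tree = (op, tree, right)  # fold to the left: left associativity
    return tree, pos


def parse_term(tokens, pos):
    # term -> term * factor | term / factor | factor
    tree, pos = parse_factor(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("*", "/"):
        op = tokens[pos]
        right, pos = parse_factor(tokens, pos + 1)
        tree = (op, tree, right)
    return tree, pos


def parse_factor(tokens, pos):
    # factor -> ( exp ) | INT   (ID is omitted in this sketch)
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    if tokens[pos] == "(":
        tree, pos = parse_exp(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        return tree, pos + 1
    if tokens[pos].isdigit():
        return int(tokens[pos]), pos + 1
    raise SyntaxError(f"unexpected token {tokens[pos]!r}")


def parse(text):
    tokens = tokenize(text)
    tree, pos = parse_exp(tokens)
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing input")
    return tree


print(parse("8-3-2"))
print(parse("2+3*4"))
```

Because * and / are only reachable through term, they bind tighter than + and -; because each loop folds its result to the left, 8-3-2 groups as (8-3)-2.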

Trang 40

Ambiguous Grammar

• A grammar is considered ambiguous if it can produce more than one parse tree for some sentence

• A grammar can often be rewritten to eliminate the ambiguity

Trang 41

Example 5

• The “Dangling-Else” Grammar

stmt → if exp then stmt | if exp then stmt else stmt | other

if a then if b then c else d
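The ambiguity can be checked by counting parse trees. The sketch below is a small illustration, assuming that exp and other are each a single non-keyword token; `count_parses` is a hypothetical helper that counts how many distinct parse trees the dangling-else grammar admits for a token sequence.

```python
from functools import lru_cache

KEYWORDS = {"if", "then", "else"}


def count_parses(tokens):
    """Count the parse trees for stmt over the whole token sequence."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def stmt(i, j):
        """Number of parse trees deriving tokens[i:j] from stmt."""
        total = 0
        # stmt -> other (assumed: any single non-keyword token)
        if j - i == 1 and tokens[i] not in KEYWORDS:
            total += 1
        # both if-forms start with: if exp then
        if (j - i >= 4 and tokens[i] == "if"
                and tokens[i + 1] not in KEYWORDS and tokens[i + 2] == "then"):
            # stmt -> if exp then stmt
            total += stmt(i + 3, j)
            # stmt -> if exp then stmt else stmt, trying every 'else' position
            for k in range(i + 4, j - 1):
                if tokens[k] == "else":
                    total += stmt(i + 3, k) * stmt(k + 1, j)
        return total

    return stmt(0, len(tokens))


print(count_parses("if a then if b then c else d".split()))
print(count_parses("if a then c else d".split()))
```

The first sentence has two parse trees (the else can attach to either if), while the second has only one.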

Trang 42

• The role of syntax analysis (parser)

• Language syntax specification

Trang 43

Parsing Technique

• Parsing: find the corresponding derivation of a sentence

• Derivation: Deriving sequence from the

• Find the forward sequence from the start symbol to the sentence: top-down parsing

Trang 44

Parsing Techniques (cont’d)

• Find the backward sequence from the sentence to the start symbol: bottom-up parsing

• In most cases, a top-down parser finds the leftmost derivation and a bottom-up parser finds the rightmost derivation (in reverse)

Trang 46

Graphical Point of View

• Top-down parsing draws the parse tree from the root node to the leaves

• Bottom-up parsing draws the parse tree from the leaves to the root node

Trang 47

Top-down Parsing

• Start from the start symbol and find the leftmost derivation

• To find the leftmost derivation:

– Find the leftmost nonterminal in the current sentential form

– Replace the leftmost nonterminal by the right-hand side of a suitable production

Trang 48

Lookahead Problem

• Reconsider the sentence id+id

• Start symbol: exp

• First step: find the leftmost nonterminal → exp

• Replace exp by a new string:

exp → exp + term or exp → exp - term or exp → term

• Which alternative should be taken?

Trang 49

Lookahead Problem (cont’d)

• id+id

• First lookahead token: id

– No decision can be made with certainty: all three alternatives can derive a string starting with id

– Look ahead one more token

Trang 50

Lookahead Problem (cont’d)

• id+id

• Next lookahead token: +

– Only two possibilities remain: had we taken exp → term, there would be no way to generate the + token from there

– Keep looking ahead to choose between exp → exp + term and exp → exp - term

Trang 51

Lookahead Problem (cont’d)

• id+id

• Next lookahead token: id

– No decision should be made

Trang 52

Lookahead Problem (cont’d)

• For computational reasons, most compilers look ahead only one token

• The basic expression grammar cannot be parsed completely with only one lookahead token ⇒ why?

Trang 54

Lookahead Problem Revisited

• Why can we still not make a decision even though we have obtained a certain

Trang 55

Left Factoring

• A → Bα1 | Bα2 | … | Bαn | β1 | … | βn

• A decision cannot be made when the lookahead token is derivable from B

• exp → exp + term | exp - term | term

Trang 57

Left Factoring Elimination

• Intuitive idea: the confusion is due to the many Bs. Just convert these Bs into a single B

• Technical solution

A → Bα1 | Bα2 | … | Bαn | β1 | … | βn

⇔

A → BC | β1 | … | βn

C → α1 | α2 | … | αn

Trang 58

Example 6

exp → exp + term | exp - term | term

⇔

exp → exp exp_tail | term

exp_tail → + term | - term
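The transformation of Example 6 can be sketched in a few lines. This is an illustration only: productions are assumed to be stored as lists of symbols, and `left_factor` with its parameters is a hypothetical helper that factors out one given leading symbol.

```python
def left_factor(head, alternatives, prefix, tail_name):
    """Factor the common leading symbol `prefix` out of `alternatives`.

    A -> B a1 | ... | B an | b1 | ... | bm   becomes
    A -> B C | b1 | ... | bm   and   C -> a1 | ... | an
    An empty list in the result stands for the empty string.
    """
    tails = [alt[1:] for alt in alternatives if alt[:1] == [prefix]]
    others = [alt for alt in alternatives if alt[:1] != [prefix]]
    if len(tails) < 2:
        # Nothing to factor: fewer than two alternatives share the prefix.
        return {head: alternatives}
    return {head: [[prefix, tail_name]] + others, tail_name: tails}


EXP = [["exp", "+", "term"], ["exp", "-", "term"], ["term"]]
FACTORED = left_factor("exp", EXP, prefix="exp", tail_name="exp_tail")
for lhs, alts in FACTORED.items():
    print(lhs, "->", " | ".join(" ".join(alt) for alt in alts))
```

The output corresponds to the right-hand grammar of Example 6. Note that the result is still left-recursive, so left factoring alone does not make the grammar suitable for top-down parsing.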

Trang 59

• Confusion occurs when the derivable

token is the very lookahead token

Trang 60

Left Recursion Elimination

• Intuitive idea: the confusion arises because Aα will eventually derive βiα (i = 1..n). No way to escape! → Solution: rewrite the grammar so that this derivation occurs directly in the productions

Trang 61

Left Recursion Elimination (cont’d)

A → Aα | β1 | … | βn

⇔

A → β1A' | … | βnA'

A' → αA' | ε

• Why A' → αA'? Because A ⇒ Aα ⇒ Aαα ⇒ Aααα ⇒ …

• Why A' → ε? It is how the loop stops.

Trang 62

Example 7

exp → exp exp_tail | term

exp_tail → + term | - term

⇒

exp → term exp'

exp' → exp_tail exp' | ε

exp_tail → + term | - term

⇒ (simplified)

exp → term exp'

exp' → + term exp' | - term exp' | ε

Trang 63

Example 8

exp → exp + term | exp - term | term

⇒

exp → term exp'

exp' → + term exp' | - term exp' | ε
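The rewriting used in Examples 7 and 8 can be expressed as a short function. The sketch below handles immediate left recursion only and assumes productions are lists of symbols with the empty list standing for ε; `eliminate_left_recursion` is a hypothetical name.

```python
def eliminate_left_recursion(head, alternatives):
    """Remove immediate left recursion from the productions of `head`.

    A -> A a1 | ... | A an | b1 | ... | bm   becomes
    A  -> b1 A' | ... | bm A'
    A' -> a1 A' | ... | an A' | ε        (ε is written as an empty list)
    """
    new_head = head + "'"
    alphas = [alt[1:] for alt in alternatives if alt[:1] == [head]]
    betas = [alt for alt in alternatives if alt[:1] != [head]]
    if not alphas:
        # No alternative starts with the head itself: nothing to do.
        return {head: alternatives}
    return {
        head: [beta + [new_head] for beta in betas],
        new_head: [alpha + [new_head] for alpha in alphas] + [[]],
    }


EXP = [["exp", "+", "term"], ["exp", "-", "term"], ["term"]]
RESULT = eliminate_left_recursion("exp", EXP)
for lhs, alts in RESULT.items():
    print(lhs, "->", " | ".join(" ".join(alt) or "ε" for alt in alts))
```

Applied to the classic exp rule, it reproduces the transformed productions of Example 8.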

Trang 64

Hands-on Exercise

• Rewrite the solution for eliminating left recursion in the general case

• A → Aα1 | Aα2 | … | Aαn | β1 | … | βn

Trang 65

Transforming Grammar for

Top-down Parsing

• To perform top-down parsing:

– Check whether the grammar contains productions with common left factors or left recursion

– If so, eliminate them

– The result is a transformed grammar

– Parse the sentence using the transformed grammar

Trang 66

Hands-on Exercise

• Construct the transformed grammar for the basic expression grammar

Trang 67

Intuition of First Set

• Revisit the grammar (sentence: gchfc )

• Q: Why select S → Bc instead of S → Ab?

• A: Because B can derive a string beginning with

Trang 68

Intuition of First Set (cont’d)

• Q: Which terminals can begin strings derived from A and B, respectively?

• A: Strings derived from A can begin with {c,d,h,i}, and those derived from B with {e,g}

• Notation: First(A) = {c,d,h,i}, First(B) = {e,g}

Trang 69

First Set

• First(α) is the set of all terminals that can begin strings derived from α

• If α ⇒* ε, then First(α) includes ε

• A is said to be nullable if A ⇒* ε

Trang 70

Compute First(α)

• If α is a terminal a, then First(α) = {a}

• If α is ε, then First(α) = {ε}

• If α is a nonterminal A, and A → β1 | … | βn, then

First(α) = First(β1) ∪ First(β2) ∪ … ∪ First(βn)

• If α = X1X2…Xn

First(α) = {}; i = 0

Repeat

i++

First(α) = First(α) ∪ (First(Xi) - {ε})

Until i = n or Xi is not nullable

If Xi is nullable for all i, then add ε to First(α)
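The rules above can be run to a fixed point. The sketch below is one possible implementation, not the course's reference code: it assumes the grammar is a dictionary from nonterminal to a list of alternatives (an empty list standing for ε) and uses the transformed expression grammar as input; `first_of_string` and `compute_first` are hypothetical names.

```python
EPS = "ε"

# Transformed expression grammar; [] stands for the ε alternative.
GRAMMAR = {
    "exp": [["term", "exp'"]],
    "exp'": [["+", "term", "exp'"], ["-", "term", "exp'"], []],
    "term": [["factor", "term'"]],
    "term'": [["*", "factor", "term'"], ["/", "factor", "term'"], []],
    "factor": [["(", "exp", ")"], ["ID"], ["INT"]],
}


def first_of_string(symbols, first):
    """First set of a string X1 X2 ... Xn, given First sets of nonterminals."""
    result = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})  # a terminal's First set is itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result  # Xi is not nullable: stop here
    result.add(EPS)  # every Xi was nullable (or the string is empty)
    return result


def compute_first(grammar):
    """Iterate the First rules until no set changes any more."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                new = first_of_string(alt, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


FIRST = compute_first(GRAMMAR)
for nt in GRAMMAR:
    print(f"First({nt}) = {sorted(FIRST[nt])}")
```

The repeated passes are needed because First(exp) depends on First(term), which in turn depends on First(factor).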

Trang 71

First(A) = First(Df) ∪ First(CA)

First(Df): First(D) = {h,i}, D is not nullable → First(Df) = {h,i}

First(CA) = {d,c}

Trang 72

Example 9 (cont’d)

• A → CAd | BCa

• B → b | ε

• C → c | ε

Trang 73

Introductory Example of Follow Set

exp → term exp'

exp' → + term exp' | - term exp' | ε

term → factor term'

term' → * factor term' | / factor term' | ε

First(term') = {*, /, ε}

• id+id

exp ⇒ term exp'

⇒ factor term' exp'

Trang 74

Follow Set

• Why is the First set sometimes not informative enough? → It cannot tell when we

• When αAβ ⇒ αβ (meaning A → ε) is applied, the lookahead token should be in First(β)

• Follow(A) = the set of terminals that can appear immediately after A in some sentential form

Follow(A) = {x | S ⇒* αAxβ}

Trang 75

Compute Follow(A)

• Follow(A) only makes sense when A is a nonterminal

• If A is the start symbol, Follow(A) includes $

• Search all productions for occurrences of the form B → αAβ

– Add First(β) - {ε} to Follow(A)

– If β ⇒* ε, add Follow(B) to Follow(A) (why?)

Trang 76

Example 10

exp → term exp'

exp' → + term exp' | - term exp' | ε

term → factor term'

term' → * factor term' | / factor term' | ε

factor → ( exp ) | ID | INT
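For the grammar of Example 10, the Follow rules can be iterated in the same fixed-point style as First. The sketch below is an assumption-laden illustration: grammar encoding, function names and the use of "$" as the end marker in a Python set are all choices made here, not taken from the slides.

```python
EPS = "ε"

GRAMMAR = {
    "exp": [["term", "exp'"]],
    "exp'": [["+", "term", "exp'"], ["-", "term", "exp'"], []],
    "term": [["factor", "term'"]],
    "term'": [["*", "factor", "term'"], ["/", "factor", "term'"], []],
    "factor": [["(", "exp", ")"], ["ID"], ["INT"]],
}


def first_of_string(symbols, first):
    """First set of a symbol string; terminals are their own First set."""
    result = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    return result | {EPS}


def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                new = first_of_string(alt, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(grammar, start, first):
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")  # end-of-input marker follows the start symbol
    changed = True
    while changed:
        changed = False
        for head, alternatives in grammar.items():
            for alt in alternatives:
                for i, sym in enumerate(alt):
                    if sym not in grammar:
                        continue  # Follow is defined for nonterminals only
                    beta_first = first_of_string(alt[i + 1:], first)
                    new = beta_first - {EPS}
                    if EPS in beta_first:  # β is nullable: inherit Follow(head)
                        new |= follow[head]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


FOLLOW = compute_follow(GRAMMAR, "exp", compute_first(GRAMMAR))
for nt in GRAMMAR:
    print(f"Follow({nt}) = {sorted(FOLLOW[nt])}")
```

The second rule (β nullable) is what propagates ")" and "$" from Follow(exp) down to Follow(term') and Follow(factor).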

Trang 77

Hands-on Exercise

• Find Follow sets of all nonterminals of the transformed expression grammar

Trang 78

Idea of Select Set

• When should a production A → α be applied?

– When the lookahead token is in First(α)

– When the lookahead token is in Follow(A), if α ⇒* ε

Select(A → α) = (First(α) \ {ε}) ∪ Follow(A) if ε ∈ First(α)

Select(A → α) = First(α) otherwise

Trang 79

Example 11

exp → term exp'

exp' → + term exp' | - term exp' | ε

term → factor term'

term' → * factor term' | / factor term' | ε

factor → ( exp ) | ID | INT

• Select(term' → * factor term') = {*}

• Select(term' → / factor term') = {/}

• Select(term' → ε) = {+, -, ), $}

Never get confused anymore when selecting
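The Select sets of Example 11 can be reproduced with a few lines. In the sketch below the First and Follow sets are written out by hand for the transformed expression grammar (an assumption of this illustration; they are not computed here), and `select` and `alternatives_are_disjoint` are hypothetical helper names.

```python
EPS = "ε"

GRAMMAR = {
    "exp": [["term", "exp'"]],
    "exp'": [["+", "term", "exp'"], ["-", "term", "exp'"], []],
    "term": [["factor", "term'"]],
    "term'": [["*", "factor", "term'"], ["/", "factor", "term'"], []],
    "factor": [["(", "exp", ")"], ["ID"], ["INT"]],
}

# First and Follow sets of the nonterminals, worked out by hand.
FIRST = {
    "exp": {"(", "ID", "INT"},
    "exp'": {"+", "-", EPS},
    "term": {"(", "ID", "INT"},
    "term'": {"*", "/", EPS},
    "factor": {"(", "ID", "INT"},
}
FOLLOW = {
    "exp": {")", "$"},
    "exp'": {")", "$"},
    "term": {"+", "-", ")", "$"},
    "term'": {"+", "-", ")", "$"},
    "factor": {"*", "/", "+", "-", ")", "$"},
}


def first_of_string(symbols):
    """First set of a symbol string; terminals are their own First set."""
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    return result | {EPS}


def select(head, alternative):
    """Lookahead tokens for which the production head -> alternative applies."""
    first = first_of_string(alternative)
    if EPS in first:
        return (first - {EPS}) | FOLLOW[head]
    return first


def alternatives_are_disjoint(grammar):
    """True if no two alternatives of a nonterminal share a Select token."""
    for head, alternatives in grammar.items():
        seen = set()
        for alt in alternatives:
            tokens = select(head, alt)
            if seen & tokens:
                return False
            seen |= tokens
    return True


print(sorted(select("term'", [])))
print(alternatives_are_disjoint(GRAMMAR))
```

When the Select sets of all alternatives of each nonterminal are pairwise disjoint, a single lookahead token always identifies the production to apply.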

Trang 80

Top-down Recursive Parsing

• Scan the input repeatedly (recursively) to find the leftmost derivation

• Recursive-descent parsing (backtracking required)

• (Recursive) predictive parsing (no

backtracking required)

Trang 81

Recursive-descent parsing

• Locate the leftmost nonterminal in the current (potential) sentential form

• Find the first production whose left-hand side is the located nonterminal

• Apply the production to derive a new sentential form candidate

parsing the sentence

Trang 86

• This grammar is called LL(1):

– when scanning from left to right to find the leftmost derivation, 1 lookahead token is enough

Trang 87

LL(k) Grammar (cont’d)

• LL(k) grammar: replace 1 by k

• The classic expression grammar is not LL(k) for any k

• The transformed expression grammar is LL(1)

• Most programming language grammars are LL(1) (including those given in the assignments) when properly transformed

Trang 88

case '$': break;

default: error("waiting for c, d or $");

}

}

Trang 89

Programming Issue (cont’d)

procedure parseA

{ // A → bb | aS

switch(lookahead) {

case 'b': match('b'); match('b'); break;

case 'a': match('a'); parseS(); break;

default: error("waiting for b or a");

}

}
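The same one-procedure-per-nonterminal pattern can be written in Python. The sketch below is an illustration, not the slides' program: it uses the transformed expression grammar instead of the S/A grammar (whose full productions are not shown here), and the class and method names are hypothetical. Each method inspects only the current lookahead token, so no backtracking is needed.

```python
class Parser:
    """Recursive predictive parser for the transformed expression grammar."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]  # "$" marks the end of input
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.lookahead != expected:
            raise SyntaxError(f"waiting for {expected!r}, found {self.lookahead!r}")
        self.pos += 1

    def parse(self):
        self.exp()
        self.match("$")

    def exp(self):  # exp -> term exp'
        self.term()
        self.exp_rest()

    def exp_rest(self):  # exp' -> + term exp' | - term exp' | ε
        if self.lookahead in ("+", "-"):
            self.match(self.lookahead)
            self.term()
            self.exp_rest()
        elif self.lookahead not in (")", "$"):  # ε only on Follow(exp')
            raise SyntaxError("waiting for +, -, ) or $")

    def term(self):  # term -> factor term'
        self.factor()
        self.term_rest()

    def term_rest(self):  # term' -> * factor term' | / factor term' | ε
        if self.lookahead in ("*", "/"):
            self.match(self.lookahead)
            self.factor()
            self.term_rest()
        elif self.lookahead not in ("+", "-", ")", "$"):  # Follow(term')
            raise SyntaxError("waiting for *, /, +, -, ) or $")

    def factor(self):  # factor -> ( exp ) | ID | INT
        if self.lookahead == "(":
            self.match("(")
            self.exp()
            self.match(")")
        elif self.lookahead in ("ID", "INT"):
            self.match(self.lookahead)
        else:
            raise SyntaxError("waiting for (, ID or INT")


def accepts(tokens):
    """Return True if the token sequence is a sentence of the grammar."""
    try:
        Parser(tokens).parse()
        return True
    except SyntaxError:
        return False


print(accepts(["ID", "+", "ID", "*", "INT"]))
print(accepts(["ID", "+"]))
```

The branch conditions of each method are exactly the Select sets of the corresponding productions.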

Trang 90

Object-Oriented Programming

class S {

void parse() throws SyntaxException {

switch(lookahead) { case 'c': match('c');

a = new A(); a.parse();

Trang 91

Object-Oriented Programming

class A {

void parse() throws SyntaxException {

switch(lookahead) { case 'b': match('b'); match('b'); break;

case 'a': match('a');

Trang 92

Non-recursive Predictive Parsing

• Programming philosophy: how to avoid recursion?

• Answer: use array-based data and structured statements

– Recursive: size(L1,…,Ln) = 1 + size(L2,…,Ln)

– Iterative: size(L1,…,Ln) = for i:[1->n] k = k+1

• Non-recursive predictive parsing: a table-driven technique

Trang 93

Non-recursive Predictive Parsing

(cont’d)

• Algorithm: Alg 4.3

• Parsing table: Alg 4.4
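A table-driven parser replaces the recursive calls by an explicit stack. The sketch below is a minimal illustration and not a transcription of Alg 4.3 or Alg 4.4: the table is filled in by hand from the Select sets of the transformed expression grammar, and `TABLE`, `NONTERMINALS` and `parse` are names assumed for this example.

```python
NONTERMINALS = {"exp", "exp'", "term", "term'", "factor"}

# Parsing table: (nonterminal, lookahead) -> right-hand side to push.
TABLE = {}
for tok in ("(", "ID", "INT"):
    TABLE["exp", tok] = ["term", "exp'"]
    TABLE["term", tok] = ["factor", "term'"]
for tok in (")", "$"):
    TABLE["exp'", tok] = []  # exp' -> ε
for tok in ("+", "-", ")", "$"):
    TABLE["term'", tok] = []  # term' -> ε
TABLE["exp'", "+"] = ["+", "term", "exp'"]
TABLE["exp'", "-"] = ["-", "term", "exp'"]
TABLE["term'", "*"] = ["*", "factor", "term'"]
TABLE["term'", "/"] = ["/", "factor", "term'"]
TABLE["factor", "("] = ["(", "exp", ")"]
TABLE["factor", "ID"] = ["ID"]
TABLE["factor", "INT"] = ["INT"]


def parse(tokens, start="exp"):
    """Return True if `tokens` is accepted, using a stack instead of recursion."""
    tokens = list(tokens) + ["$"]
    stack = ["$", start]  # bottom marker first, start symbol on top
    pos = 0
    while stack:
        top = stack.pop()
        lookahead = tokens[pos]
        if top in NONTERMINALS:
            rhs = TABLE.get((top, lookahead))
            if rhs is None:
                return False  # empty table entry: syntax error
            stack.extend(reversed(rhs))  # leftmost symbol ends up on top
        elif top == lookahead:
            pos += 1  # terminal (or "$") matches the input
        else:
            return False
    return pos == len(tokens)


print(parse(["ID", "+", "ID"]))
print(parse(["ID", "+"]))
```

Pushing the right-hand side in reverse keeps its leftmost symbol on top of the stack, so the parser still follows a leftmost derivation.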

Trang 94

grammar

Date posted: 23/10/2014, 17:33
