
Lecture slides for the Compilers course, Chapter 2: Design Pattern Visitor


Document information

Pages: 39
File size: 242.74 KB

Content

LEXICAL ANALYSIS
Phung Hua Nguyen
University of Technology
2006

Outline
• Introduction to Lexical Analysis
• Token specification
  – Language
  – Regular Expressions (REs)
• Token recognition
  – REs ⇒ NFA (Thompson’s construction, Algorithm 3.3)
  – NFA ⇒ DFA (subset construction, Algorithm 3.2)
  – DFA ⇒ minimal DFA (Algorithm 3.6)
• Programming

CSE - HCMUT Lexical Analysis

Introduction
• Read the input characters
• Produce as output a sequence of tokens
• Eliminate white space and comments
[Figure: source program, lexical analyzer, parser and symbol table; the edges between the lexical analyzer and the parser are labeled "token" and "get next token"]

Why?
• Simplify design
• Improve compiler efficiency
• Enhance compiler portability

Tokens, Patterns, Lexemes
Token      Sample lexemes          Informal description of pattern
const      const                   const
if         if                      if
relation   <, <=, ==, !=, >, >=    < or <= or == or != or > or >=
id         pi, count, x2           letter followed by letters or digits
num        3.14, 25, 6.02E3        any numeric constant
literal    “core dumped”           any characters between “ and ” except “

Outline
• Introduction √
• Token specification
  – Language
  – Regular Expressions (REs)
• Token recognition
  – REs ⇒ NFA (Thompson’s construction, Algorithm 3.3)
  – NFA ⇒ DFA (subset construction, Algorithm 3.2)
  – DFA ⇒ minimal DFA (Algorithm 3.6)
• Programming

Alphabet, Strings and Languages
• Alphabet ∑: any finite set of symbols
  – The Vietnamese alphabet {a, á, à, ả, ã, ạ, b, c, d, đ, …}
  – The binary alphabet {0, 1}
  – The ASCII alphabet
• String: a finite sequence of symbols drawn from ∑
  – Length |s| of a string s: the number of symbols in s
  – The empty string, denoted ε, |ε| = 0
• Language: any set of strings over ∑
  – its two special cases:
    • ∅: the empty set
    • {ε}

Examples of Languages
• ∑ = {a, á, à, ả, ã, ạ, b, c, d, đ, …}
  – Vietnamese language
• ∑ = {0, 1}
  – A string is an instruction
  – The set of Pentium instructions
• ∑ = the ASCII set
  – A string is a program
  – The set of C programs

Terms (Fig. 3.7)
Term: Definition
• prefix of s: a string obtained by removing 0 or more trailing symbols of s; e.g. ban is a prefix of banana
• suffix of s: a string formed by deleting 0 or more leading symbols of s; e.g. na is a suffix of banana
• substring of s: a string obtained by deleting a prefix and a suffix from s; e.g. nan is a substring of banana
• proper prefix, suffix or substring of s: any nonempty string x that is, respectively, a prefix, suffix or substring of s such that s ≠ x

String operations
• String concatenation
  – If x and y are strings, xy is the string formed by appending y to x. E.g.: x = hom, y = nay ⇒ xy = homnay
  – ε is the identity: εy = y; xε = x
• String exponentiation
  – s^0 = ε
  – s^i = s^(i-1) s
  E.g. s = 01: s^0 = ε, s^2 = 0101, s^3 = 010101

[...]

Transition table
State   a        b
0       {0,1}    {0}
1       -        {2}
2       -        {3}

Acceptance
• An NFA accepts an input string x iff there is some path in the transition graph from the start state to some accepting state such that the edge labels along this path spell out x
[Figure: a transition graph with states A and B and edges labeled 0 and 1, with traces for the inputs 01010 and 01011; the layout was lost in extraction]

Deterministic [...] is a special case of NFA in which
  1. no state has an ε-transition, and
  2. for each state s and input symbol a, there is at most one edge labeled a leaving s

Thompson’s construction of NFA from REs
• guided by the syntactic structure of the RE r
• For ε: i --ε--> f
• For a in ∑: i --a--> f

Thompson’s construction (cont’d)
• Suppose N(s) and N(t) are NFAs [...]
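The acceptance condition can be tried out on the transition table from the slides. The following is a minimal Python sketch that tracks every state the NFA could be in after each input symbol. The accepting state 3 and the names `NFA`, `START`, `ACCEPTING` and `accepts` are assumptions of this sketch and do not come from the slides. The table has no ε-edges, so no ε-closure step is needed here.

```python
# Transition table as read from the slides: state -> symbol -> set of next states.
NFA = {
    0: {"a": {0, 1}, "b": {0}},
    1: {"b": {2}},
    2: {"b": {3}},
}
START = 0
ACCEPTING = {3}  # assumption: state 3 is the only accepting state


def accepts(word):
    """Return True iff some path from START that spells `word` ends in an accepting state."""
    current = {START}
    for symbol in word:
        # Follow every edge labeled `symbol` out of every state we might be in.
        current = {t for s in current for t in NFA.get(s, {}).get(symbol, set())}
    return bool(current & ACCEPTING)


print(accepts("aabb"), accepts("abba"))
```

Under these assumptions the automaton accepts exactly the strings over {a, b} that end in abb, so the first call succeeds and the second fails.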
[...] strings of letters, including ε
[...] D)*: all strings of letters and digits beginning with a letter
[...]: all strings of one or more digits

Regular Expressions (REs) over Alphabet ∑
• Inductive base:
  1. ε is an RE, denoting the RL {ε}
  2. a is an RE, denoting the RL {a}
• Inductive step: Suppose r and s are REs, denoting the languages L(r) and L(s). Then
  3. (r)|(s) is an RE, denoting the RL L(r) [...]

Outline
[...]
  – Regular Expressions (REs)
• Token recognition
  – REs ⇒ NFA (Thompson’s construction, Algorithm 3.3)
  – NFA ⇒ DFA (subset construction, Algorithm 3.2)
  – DFA ⇒ minimal DFA (Algorithm 3.6)
• Programming

Overview
[Figure: RE → NFA (Algorithm 3.3), NFA → DFA (Algorithm 3.2), DFA → mDFA (Algorithm 3.6), plus a further edge labeled 3.5; the layout was lost in extraction]

Nondeterministic finite automata
• A nondeterministic finite automaton (NFA) is a mathematical model that consists [...]

[Figure residue from Thompson’s construction: composite NFAs built from N(s) and N(t), with start state i and accepting state f]
  – For (s), use N(s) itself

Outline
• Introduction
• Token specification
  – Language
  – Regular Expressions (REs)
• Token recognition
  – REs ⇒ NFA (Thompson’s construction)
  – NFA ⇒ DFA (subset construction)
  – DFA ⇒ minimal DFA (Algorithm 3.6)
• Programming

Subset construction
Operation: Description
[...]: Set of NFA states reachable from [...]

Subset construction (cont’d)
Let s0 be the start state of the NFA;
Initially, Dstates contains only the unmarked state ε-closure(s0);
while there is an unmarked state T in Dstates do begin
    mark T;
    for each input symbol a do begin
        U := ε-closure(move(T, a));
        if U is not in Dstates then
            add U as an unmarked state to Dstates;
        DTran[T, a] := U;
    end;
end;

DFA
• Let ([...]
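The subset construction pseudocode can be sketched in Python as follows. This is an illustration under stated assumptions, not the lecturer's implementation: the NFA is a dictionary mapping a state to its labeled edges, ε-edges use the empty string as their label, and the names `EPS`, `eps_closure`, `move` and `subset_construction` are chosen for this sketch. The example NFA is the three-row transition table shown earlier in the slides.

```python
EPS = ""  # label used for ε-edges in this sketch (an assumption)


def eps_closure(nfa, states):
    """All NFA states reachable from `states` through ε-edges alone."""
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in nfa.get(s, {}).get(EPS, ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def move(nfa, states, symbol):
    """NFA states reachable from `states` by one edge labeled `symbol`."""
    return {t for s in states for t in nfa.get(s, {}).get(symbol, ())}


def subset_construction(nfa, start, alphabet):
    """Return (d_start, dtran) with dtran[(T, a)] = U, mirroring DTran[T, a] := U."""
    d_start = eps_closure(nfa, {start})
    dstates, unmarked, dtran = {d_start}, [d_start], {}
    while unmarked:
        T = unmarked.pop()  # taking T off the worklist marks it
        for a in alphabet:
            U = eps_closure(nfa, move(nfa, T, a))  # may be empty: the dead state
            if U not in dstates:
                dstates.add(U)
                unmarked.append(U)
            dtran[(T, a)] = U
    return d_start, dtran


# Example: the NFA from the transition table in the slides (it has no ε-edges).
nfa = {0: {"a": {0, 1}, "b": {0}}, 1: {"b": {2}}, 2: {"b": {3}}}
d_start, dtran = subset_construction(nfa, 0, "ab")
for (T, a), U in sorted(dtran.items(), key=lambda kv: (sorted(kv[0][0]), kv[0][1])):
    print(sorted(T), a, "->", sorted(U))
```

Each DFA state is a `frozenset` of NFA states so that it can serve as a dictionary key. For this NFA the construction yields four DFA states: {0}, {0,1}, {0,2} and {0,3}.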
[...] ε-closure(s0)

Outline
• Introduction
• Token specification
  – Language
  – Regular Expressions (REs)
• Token recognition
  – REs ⇒ NFA (Thompson’s construction)
  – NFA ⇒ DFA (subset construction)
  – DFA ⇒ minimal DFA (Algorithm 3.6)
• Programming

Minimise a DFA
Initially, create two states:
  1. one is the set of all final states: F
  2. the other is the set of all [...]

[Figure: a DFA with states A, B, C, D, E and edges labeled a and b, redrawn after each split; the layout was lost in extraction]
Step 1: {A,B,C,D} {E}
  For a: {B,B,B,B}
  For b: {C,D,C,E} ⇒ split
Step 2: {A,B,C} {D} {E}
  For b: {C,D,C} ⇒ split
Step 3: {A,C} {B} {D} {E}
  For a: {B,B}
  For b: {C,C} ⇒ terminate

Outline
• Introduction
• Token specification
  – Language
  – Regular Expressions (REs)
• Token recognition
  – REs ⇒ NFA (Thompson’s construction)
[...]

[...] highest precedence
  – “|” has the lowest precedence
• Associativity:
  – all are left-associative
E.g.: (a)|((b)*(c)) ≡ a|b*c
⇒ Unnecessary parentheses can be removed

Example
• ∑ = {a, b}
  1. a|b denotes {a, b}
  2. (a|b)(a|b) denotes {aa, ab, ba, bb}
  3. a* denotes {ε, a, aa, aaa, aaaa, …}
  4. (a|b)* denotes ?
  5. a|a*b denotes ?

Notational Shorthands
• One or more instances [...]

[...]
        reload second half
        forward++
    } else if (forward at end of second half) {
        reload first half
        forward = 0
    } else
        terminate the analysis
}

Transition Diagrams
[Figure: transition diagrams for relop, whose accepting states return (relop, LE), (relop, NE) and (relop, LT), and for id, defined as letter(letter|digit)*, whose accepting state returns (id, lexeme); the layout was lost in extraction]
Transition diagram is a DFA in which there is no edge leaving [...]
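The partition steps in the "Minimise a DFA" walk-through can be replayed in code. The following is a minimal Python sketch of partition refinement, not the exact procedure of Algorithm 3.6: it splits each group on all input symbols at once instead of one symbol at a time. The transitions for A to D are read off the "For a" and "For b" rows of the slides; the transitions for E, the dictionary layout and the name `minimise` are assumptions of this sketch.

```python
def minimise(states, alphabet, delta, finals):
    """Refine the partition {non-final states, final states} until no group splits."""
    start_groups = (frozenset(states) - frozenset(finals), frozenset(finals))
    partition = [g for g in start_groups if g]
    while True:
        group_of = {s: i for i, g in enumerate(partition) for s in g}
        refined = []
        for g in partition:
            buckets = {}
            for s in sorted(g):
                # Two states stay together only if every symbol sends them to the same group.
                key = tuple(group_of[delta[s][a]] for a in alphabet)
                buckets.setdefault(key, set()).add(s)
            refined.extend(frozenset(b) for b in buckets.values())
        if len(refined) == len(partition):  # nothing was split: the partition is stable
            return set(refined)
        partition = refined


# DFA from the walk-through; E's outgoing edges are an assumption (they do not affect the result).
delta = {
    "A": {"a": "B", "b": "C"},
    "B": {"a": "B", "b": "D"},
    "C": {"a": "B", "b": "C"},
    "D": {"a": "B", "b": "E"},
    "E": {"a": "B", "b": "C"},
}
groups = minimise("ABCDE", "ab", delta, {"E"})
print(sorted(sorted(g) for g in groups))
```

The result agrees with Step 3 of the slides: A and C are merged, while B, D and E each remain a state of their own.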

Posted: 23/10/2014, 17:33
