
Chapter 4: Lexical and Syntax Analysis


Document information

Pages: 57
Size: 469.14 KB

Content

Chapter 4: Lexical and Syntax Analysis
ISBN 0-321-33025-0
Copyright © 2006 Addison-Wesley. All rights reserved.

Chapter Topics
• Introduction
• Lexical Analysis
• The Parsing Problem
• Recursive-Descent Parsing
• Bottom-Up Parsing

Introduction
• Language implementation systems must analyze source code, regardless of the specific implementation approach: compilation, pure interpretation, or a hybrid method
• Nearly all syntax analysis is based on a formal description of the syntax of the source language (a context-free grammar, CFG, or BNF)

Using BNF to Describe Syntax
• Provides a clear and concise syntax description
• The parser can be based directly on the BNF
• Parsers based on BNF are easy to maintain

Syntax Analysis
• The syntax analysis portion of a language processor nearly always consists of two parts:
  – A low-level part called a lexical analyzer (mathematically, a finite automaton based on a regular grammar)
  – A high-level part called a syntax analyzer, or parser (mathematically, a push-down automaton based on a context-free grammar, or BNF)

Reasons to Separate Lexical and Syntax Analysis
• Simplicity - less complex approaches can be used for lexical analysis; separating them simplifies the parser
• Efficiency - separation allows optimization of the lexical analyzer
• Portability - parts of the lexical analyzer may not be portable, but the parser is always portable

Lexical Analysis
• A lexical analyzer is a pattern matcher for character strings
• A lexical analyzer is a "front-end" for the parser
• It identifies substrings of the source program that belong together, called lexemes
  – Lexemes match a character pattern, which is associated with a lexical category called a token
  – sum is a lexeme; its token may be IDENT
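The pairing of lexemes with tokens described above can be sketched with a small regular-expression scanner. This is only an illustration under stated assumptions, not the chapter's implementation: the token names other than IDENT, the regular expressions, and the function name `tokenize` are all chosen here for the example.

```python
import re

# Hypothetical token categories (assumed for this sketch); each pairs a
# token name with the character pattern its lexemes must match.
TOKEN_SPEC = [
    ("INT_LIT", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("ASSIGN_OP", r"="),
    ("SUBTRACT_OP", r"-"),
    ("DIVISION_OP", r"/"),
    ("SEMICOLON", r";"),
    ("SKIP", r"\s+"),  # blanks, tabs, newlines: matched but not reported
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (token, lexeme) pairs for the given source string."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            # No pattern matches here: report a lexical error.
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


for token, lexeme in tokenize("sum = oldsum - value / 100;"):
    print(f"{token:<12} {lexeme}")
```

Each lexeme is reported with its category, so `sum` comes out as an IDENT and `100` as an INT_LIT. A production scanner would also record source locations and handle comments, which this sketch leaves out.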
Example

sum = oldsum - value / 100;

Token          Lexeme
IDENT          sum
ASSIGN_OP      =
IDENT          oldsum
SUBTRACT_OP    -
IDENT          value
DIVISION_OP    /
INT_LIT        100
SEMICOLON      ;

Lexical Analysis (cont.)
• The lexical analyzer is usually a function that is called by the parser when it needs the next token
• The lexical analysis process also:
  – Skips comments, tabs, newlines, and blanks
  – Inserts lexemes for user-defined names (strings, identifiers, numbers) into the symbol table
  – Saves source locations (file, line, column) for error messages
  – Detects and reports lexical errors in tokens, such as ill-formed floating-point literals, to the user

Lexical Analysis (cont.)
• Three main approaches to building a scanner:
  1. Write a formal description of the tokens and use a software tool that constructs lexical analyzers given such a description
  2. Design a state diagram that describes the token patterns and write a program that implements the diagram
  3. Design a state diagram that describes the token patterns and hand-construct a table-driven implementation of the state diagram

Example
• Consider the following simple grammar:
    E → E + T | T
    T → T * F | F
    F → (E) | id
• The sentential form E + T * id includes three RHSs: E + T, T, and id. Only one of these is the correct one to be rewritten
  – If the RHS E + T were chosen to be rewritten in this sentential form, the resulting sentential form would be E * id. But E * id is not a legal right sentential form for the given grammar

Definitions
• $\beta$ is the handle of the right sentential form $\gamma = \alpha\beta w$ if and only if $S \Rightarrow^{*}_{rm} \alpha A w \Rightarrow_{rm} \alpha\beta w$
• $\beta$ is a phrase of the right sentential form $\gamma = \alpha_1\beta\alpha_2$ if and only if $S \Rightarrow^{*} \alpha_1 A \alpha_2 \Rightarrow^{+} \alpha_1\beta\alpha_2$
• $\beta$ is a simple phrase of the right sentential form $\gamma = \alpha_1\beta\alpha_2$ if and only if $S \Rightarrow^{*} \alpha_1 A \alpha_2 \Rightarrow \alpha_1\beta\alpha_2$
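The definitions of phrase and simple phrase can be read directly off a parse tree: every internal node yields a phrase, and an internal node whose children are all leaves yields a simple phrase. The sketch below assumes a tuple-based tree representation and helper names (`leaf`, `yield_of`, `phrases`, `simple_phrases`) invented for this illustration, and applies them to the sentential form E + T * id from the grammar above.

```python
# A parse-tree node is (symbol, children); a leaf has an empty children list.
# This representation is an assumption made for the sketch.
def leaf(symbol):
    return (symbol, [])


def yield_of(node):
    """Left-to-right list of the leaf symbols below a node."""
    symbol, children = node
    if not children:
        return [symbol]
    return [s for child in children for s in yield_of(child)]


def phrases(node):
    """Yield of every internal node; each one is a phrase."""
    symbol, children = node
    if not children:
        return []
    result = [" ".join(yield_of(node))]
    for child in children:
        result.extend(phrases(child))
    return result


def simple_phrases(node):
    """Yields of internal nodes whose children are all leaves (one derivation step)."""
    symbol, children = node
    if not children:
        return []
    if all(not grandchildren for _, grandchildren in children):
        return [" ".join(yield_of(node))]
    return [p for child in children for p in simple_phrases(child)]


# Parse tree of E + T * id: E -> E + T, then T -> T * F, then F -> id.
tree = ("E", [
    leaf("E"),
    leaf("+"),
    ("T", [leaf("T"), leaf("*"), ("F", [leaf("id")])]),
])

print(phrases(tree))            # ['E + T * id', 'T * id', 'id']
print(simple_phrases(tree)[0])  # id  (leftmost simple phrase, i.e. the handle)
```

Because the traversal goes left to right, the first entry of `simple_phrases` is the leftmost simple phrase, which for a right sentential form is the handle.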
Example: Parse Tree of the Sentential Form E + T * id

[Figure: parse tree for E + T * id, built from E → E + T, then T → T * F, then F → id]

• The phrases of the sentential form E + T * id are E + T * id, T * id, and id
• The only simple phrase is id
• The handle of a rightmost sentential form is the leftmost simple phrase

Example: Consider the string id + id * id

[Figure: parse tree for id + id * id, with internal nodes numbered (1) to (8) in the order they are built]

E
(8) ⇒ E + T
(7) ⇒ E + T * F
(6) ⇒ E + T * id
(5) ⇒ E + F * id
(4) ⇒ E + id * id
(3) ⇒ T + id * id
(2) ⇒ F + id * id
(1) ⇒ id + id * id

The numbers give the order of the reductions performed by a bottom-up parser: step (1), F → id, is the first reduction and step (8), E → E + T, is the last.

Shift-Reduce Algorithms
• Reduce is the action of replacing the handle on the top of the parse stack with its corresponding LHS
• Shift is the action of moving the next token to the top of the parse stack

LR Parsers
• Many different bottom-up parsing algorithms have been devised. Most of these are variations of a process called LR parsing
  – L means it scans the input string left to right, and R means it produces a rightmost derivation (in reverse)
• The original LR algorithm was designed by Donald Knuth (1965). This algorithm is sometimes called canonical LR

Advantages of LR Parsers
• They will work for nearly all grammars that describe programming languages
• They work on a larger class of grammars than other bottom-up algorithms, but are as efficient as any other bottom-up parser
• They can detect syntax errors as soon as it is possible
• The LR class of grammars is a superset of the class parsable by LL parsers

Structure of an LR Parser

[Figure: structure of an LR parser: the input string ending in $, the parse stack S0 X1 S1 … Xm Sm with Sm on top, the LR parser driver, and the parsing table]

Configurations
• The contents of the parse stack for an LR parser have the following form, with the top of the stack on the right: $S_0 X_1 S_1 \ldots X_m S_m$, where the $S_i$ are state symbols and the $X_i$ are grammar symbols
• An LR parsing table has two parts:
  – The ACTION part has state symbols as its row labels and the terminal symbols as its column labels
  – The GOTO part has state symbols as its row labels and the nonterminal symbols as its column labels

Configurations (cont.)
• The input string has a '$' at its right end, which is used for normal termination of the parser
• An LR parser configuration is a pair of strings (stack, input), with the detailed form $(S_0 X_1 S_1 \ldots X_m S_m,\ a_i a_{i+1} \ldots a_n \$)$
• The initial configuration of an LR parser is $(S_0,\ a_1 a_2 \ldots a_n \$)$

The Parser Actions
• If ACTION[$S_m$, $a_i$] = shift S, the next configuration is $(S_0 X_1 S_1 X_2 S_2 \ldots X_m S_m\, a_i\, S,\ a_{i+1} \ldots a_n \$)$
• If ACTION[$S_m$, $a_i$] = reduce by $A \rightarrow \beta$, where $r = |\beta|$ and $S = \mathrm{GOTO}[S_{m-r}, A]$, the next configuration is $(S_0 X_1 S_1 X_2 S_2 \ldots X_{m-r} S_{m-r}\, A\, S,\ a_i a_{i+1} \ldots a_n \$)$; a reduce does not consume input
• If ACTION[$S_m$, $a_i$] = accept, the parse is complete and no errors were found
• If ACTION[$S_m$, $a_i$] = error, the parser calls an error-handling routine

Example: The Grammar for Arithmetic Expressions
1. E → E + T
2. E → T
3. T → T * F
4. T → F
5. F → (E)
6. F → id

LR Parsing Table
(Sn means shift and go to state n; Rn means reduce by production n)

        ACTION                                      GOTO
State   id     +      *      (      )      $        E    T    F
0       S5                   S4                     1    2    3
1              S6                          accept
2              R2     S7            R2     R2
3              R4     R4            R4     R4
4       S5                   S4                     8    2    3
5              R6     R6            R6     R6
6       S5                   S4                          9    3
7       S5                   S4                               10
8              S6                   S11
9              R1     S7            R1     R1
10             R3     R3            R3     R3
11             R5     R5            R5     R5

A Trace of a Parse of the String id * id + id

Stack         Input              Action
0             id * id + id $     Shift
0id5          * id + id $        Reduce by F → id
0F3           * id + id $        Reduce by T → F
0T2           * id + id $        Shift
0T2*7         id + id $          Shift
0T2*7id5      + id $             Reduce by F → id
0T2*7F10      + id $             Reduce by T → T * F
0T2           + id $             Reduce by E → T
0E1           + id $             Shift
0E1+6         id $               Shift
0E1+6id5      $                  Reduce by F → id
0E1+6F3       $                  Reduce by T → F
0E1+6T9       $                  Reduce by E → E + T
0E1           $                  Accept

Summary
• Syntax analysis is a common part of language implementation
• A lexical analyzer is a pattern matcher that isolates small-scale parts of a program
• A syntax analyzer (parser):
  – Detects syntax errors
  – Produces a parse tree
• A recursive-descent parser is an LL parser
• Parsing problem for bottom-up parsers: find the substring of the current sentential form that must be reduced (the handle)
• The LR family of shift-reduce parsers is the most common bottom-up parsing approach
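The shift, reduce, accept, and error actions of an LR parser can be sketched as a short table-driven loop. The sketch below is a simplified illustration, not the chapter's code: it keeps only state numbers on the stack (the grammar symbols are implied by the states), the function name `lr_parse` is hypothetical, and the ACTION and GOTO entries are my transcription of the usual parsing table for the six-production arithmetic grammar, so they should be checked against the table in the text.

```python
# Productions numbered 1-6: (LHS, length of RHS).
PRODUCTIONS = {
    1: ("E", 3),  # E -> E + T
    2: ("E", 1),  # E -> T
    3: ("T", 3),  # T -> T * F
    4: ("T", 1),  # T -> F
    5: ("F", 3),  # F -> ( E )
    6: ("F", 1),  # F -> id
}

# ACTION[state][terminal]: "Sn" = shift to state n, "Rn" = reduce by rule n.
ACTION = {
    0: {"id": "S5", "(": "S4"},
    1: {"+": "S6", "$": "accept"},
    2: {"+": "R2", "*": "S7", ")": "R2", "$": "R2"},
    3: {"+": "R4", "*": "R4", ")": "R4", "$": "R4"},
    4: {"id": "S5", "(": "S4"},
    5: {"+": "R6", "*": "R6", ")": "R6", "$": "R6"},
    6: {"id": "S5", "(": "S4"},
    7: {"id": "S5", "(": "S4"},
    8: {"+": "S6", ")": "S11"},
    9: {"+": "R1", "*": "S7", ")": "R1", "$": "R1"},
    10: {"+": "R3", "*": "R3", ")": "R3", "$": "R3"},
    11: {"+": "R5", "*": "R5", ")": "R5", "$": "R5"},
}

# GOTO[state][nonterminal]: state entered after a reduction.
GOTO = {
    0: {"E": 1, "T": 2, "F": 3},
    4: {"E": 8, "T": 2, "F": 3},
    6: {"T": 9, "F": 3},
    7: {"F": 10},
}


def lr_parse(tokens):
    """Parse a token list; return the production numbers in reduction order."""
    stack = [0]                    # state stack, initial state 0
    tokens = list(tokens) + ["$"]  # '$' marks the right end of the input
    pos = 0
    reductions = []
    while True:
        action = ACTION[stack[-1]].get(tokens[pos])
        if action is None:         # empty table entry: error action
            raise SyntaxError(f"unexpected {tokens[pos]!r} at position {pos}")
        if action == "accept":
            return reductions
        if action[0] == "S":       # shift: push the new state, consume the token
            stack.append(int(action[1:]))
            pos += 1
        else:                      # reduce: pop |RHS| states, then follow GOTO
            rule = int(action[1:])
            lhs, rhs_len = PRODUCTIONS[rule]
            del stack[-rhs_len:]
            stack.append(GOTO[stack[-1]][lhs])
            reductions.append(rule)


print(lr_parse(["id", "*", "id", "+", "id"]))  # [6, 4, 6, 3, 2, 6, 4, 1]
```

The returned list is the rightmost derivation in reverse: for id * id + id the parser reduces by F → id, T → F, F → id, T → T * F, E → T, F → id, T → F, and finally E → E + T.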

Date posted: 23/03/2022, 08:27
