Chapter 3
Describing Syntax and Semantics
ISBN 0-321-33025-0

Chapter 3 Topics
• Introduction
• The General Problem of Describing Syntax
• Formal Methods of Describing Syntax
• Attribute Grammars
• Describing the Meanings of Programs: Dynamic Semantics

Copyright © 2006 Addison-Wesley. All rights reserved.

Introduction
• Without a clear language definition, a language may be hard to learn and hard to implement, and any ambiguity in the specification may lead to dialect differences
• Most new programming languages are subjected to a period of scrutiny by potential users before their designs are completed
• Who must use language definitions?
  – Other language designers
  – Implementors
  – Programmers (the users of the language)
• The study of programming languages can be divided into examinations of syntax and semantics
  – Syntax: the form or structure of the expressions, statements, and program units
  – Semantics: the meaning of the expressions, statements, and program units
• Semantics should follow from syntax: the form of a statement should be clear and should imply what the statement does or how it should be used

Example
• Syntax example: a simple C if statement
  if (<expr>) <statement> else <statement>
• Semantics example: if the expression evaluates to true, execute the first (true) statement; otherwise, execute the second (false) statement

The General Problem of Describing Syntax
• A sentence is a string of characters over some alphabet
• A language is a set of sentences
• A lexeme is the lowest-level syntactic unit of a language (e.g., *, +, =, sum, begin)
• A token is a category of lexemes (e.g., identifier, number, operator, …)

Example
index = 2 * count + 10;

Lexeme    Token
index     identifier
=         equal_sign
2         int_literal
*         mult_op
count     identifier
+         plus_op
10        int_literal
;         semicolon
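Splitting a statement into lexemes and assigning each one a token is the job of a lexical analyzer. The sketch below is a minimal Python version applied to index = 2 * count + 10;. It borrows its token names from the table and pairs each with a regular expression. Those patterns, and the helper name tokenize, are illustrative assumptions, not a definitive lexer for C.

```python
import re
from typing import List, Tuple

# Hypothetical token specification: token names come from the table above,
# the regular expressions are assumptions chosen for this small example.
TOKEN_SPEC = [
    ("int_literal", r"\d+"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("equal_sign", r"="),
    ("mult_op", r"\*"),
    ("plus_op", r"\+"),
    ("semicolon", r";"),
    ("skip", r"\s+"),          # whitespace separates lexemes but is not one
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(source: str) -> List[Tuple[str, str]]:
    """Return (lexeme, token) pairs in the order they appear in source."""
    pairs = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        if m.lastgroup != "skip":
            pairs.append((m.group(), m.lastgroup))  # lastgroup names the token
        pos = m.end()
    return pairs


pairs = tokenize("index = 2 * count + 10;")
```

The call produces one (lexeme, token) pair per row of the table. Listing int_literal before identifier is harmless here because identifiers cannot begin with a digit.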
The Definition of Languages
• Languages can be formally defined in two distinct ways: by recognition and by generation
• Language recognizers
  – A recognition device reads input strings over the language's alphabet and decides whether each string belongs to the language
  – Example: the syntax analysis part of a compiler
• Language generators
  – A device that generates sentences of a language
  – One can determine whether the syntax of a particular sentence is correct by comparing it to the structure of the generator

Language Recognizers vs. Generators
• A language recognizer can only be used in trial-and-error mode (as a black box)
• The structure of a language generator is like an open book that people can easily read and understand

Denotational Semantics
• The most abstract semantic description method
• Based on recursive function theory
• The fundamental concepts:
  – Define a mathematical object for each language entity
  – Define a function that maps instances of the language entities onto instances of the corresponding mathematical objects
  – Because the objects are rigorously defined, they represent the exact meaning of their corresponding entities
  – Mathematical objects can be manipulated rigorously; programming language constructs cannot

Example: Binary Numbers
• Syntax of binary numbers:
  <bin_num> → '0'
            | '1'
            | <bin_num> '0'
            | <bin_num> '1'
• Parse tree:
  [Figure: parse tree of a binary number]
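The <bin_num> grammar is small enough to show the recognizer/generator distinction directly. The minimal Python sketch below uses hypothetical function names. The generator applies the four rules breadth-first to list sentences. The recognizer decides membership by undoing rules 3 and 4 (stripping a trailing digit) until rule 1 or rule 2 applies. Both are hand-written for this one grammar, not a general parsing technique.

```python
from itertools import islice
from typing import Iterator


def generate_bin_nums() -> Iterator[str]:
    """Generator: yield sentences of <bin_num>, shortest first."""
    frontier = ["0", "1"]                    # rules 1 and 2: <bin_num> -> '0' | '1'
    while True:
        next_frontier = []
        for sentence in frontier:
            yield sentence
            next_frontier += [sentence + "0", sentence + "1"]  # rules 3 and 4
        frontier = next_frontier


def recognize_bin_num(s: str) -> bool:
    """Recognizer: accept s only if the rules could have derived it."""
    if s in ("0", "1"):                      # rules 1 and 2
        return True
    if len(s) > 1 and s[-1] in "01":         # undo rule 3 or 4
        return recognize_bin_num(s[:-1])
    return False


first_six = list(islice(generate_bin_nums(), 6))   # ['0', '1', '00', '01', '10', '11']
```

The grammar permits leading zeros, so the generator emits strings such as "00". The recognizer only answers yes or no for a given string. In contrast, the generator exposes the grammar's structure, which reflects the open-book versus black-box contrast above.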
• The actual meaning is associated with each rule that has a single terminal symbol as its RHS (the first two grammar rules)
  – In this example, the meaning of a binary number will be its decimal equivalent
• The other two grammar rules are, in a sense, computational rules because they combine:
  – A terminal symbol, to which an object can be associated
  – A nonterminal, which can be expected to represent some construct
• Let the domain of semantic values of the objects be ℕ, the set of nonnegative integers
• The semantic function, named Mbin, maps the syntactic objects, as described in the grammar rules above, to the objects in ℕ
• The function Mbin is defined as follows:
  Mbin('0') = 0
  Mbin('1') = 1
  Mbin(<bin_num> '0') = 2 * Mbin(<bin_num>)
  Mbin(<bin_num> '1') = 2 * Mbin(<bin_num>) + 1
• The meanings can be attached to the nodes of the parse tree:
  [Figure: parse tree with the meaning of each node attached]

Example: Decimal Numbers
• Syntax of decimal numbers:
  <dec_num> → '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
            | <dec_num> ('0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9')
• The following denotational semantics description maps decimal numbers, as strings of symbols, into numeric values:
  Mdec('0') = 0, Mdec('1') = 1, …, Mdec('9') = 9
  Mdec(<dec_num> '0') = 10 * Mdec(<dec_num>)
  Mdec(<dec_num> '1') = 10 * Mdec(<dec_num>) + 1
  …
  Mdec(<dec_num> '9') = 10 * Mdec(<dec_num>) + 9

The State of a Program
• The denotational semantics of a program could be defined in terms of state changes in an ideal computer; operational semantics is also defined this way
• The difference between denotational and operational semantics:
  – In operational semantics, the state changes are defined by coded algorithms, written in some programming language
  – In denotational semantics, the state changes are defined by rigorous mathematical functions
• Let the state s of a program be represented as a set of ordered pairs:
  s = {<i1, v1>, <i2, v2>, …, <in, vn>}
  – Each ij is the name of a variable, and the associated vj is the current value of that variable
  – Any of the v's can have the special value undef, which indicates that its associated variable is currently undefined
• Let VARMAP be a function that, when given a variable name and a state, returns the current value of the variable:
  VARMAP(ij, s) = vj

Expressions
• Expressions are fundamental to most programming languages
• We assume that:
  – Expressions have no side effects
  – Expressions are decimal numbers, variables, or binary expressions having one arithmetic operator (+ or ×) and two operands, each of which can be an expression
• The BNF description of these expressions is:
  <expr> → <dec_num> | <var> | <expr> + <expr> | <expr> × <expr>
• The only error we consider in expressions is a variable having the value undef
• Expressions are mapped to values, not states
  – Let ℤ be the set of integers
  – Let error be the error value
  – ℤ ∪ {error} is the set of values to which an expression can evaluate

Mapping Function
In the definition below, <binary_expr> stands for either binary alternative, with components <left_expr>, <operator>, and <right_expr>.

Me(<expr>, s) = case <expr> of
  <dec_num> => Mdec(<dec_num>)
  <var> => if VARMAP(<var>, s) == undef
             then error
             else VARMAP(<var>, s)
  <binary_expr> => if (Me(<binary_expr>.<left_expr>, s) == error
                       OR Me(<binary_expr>.<right_expr>, s) == error)
                     then error
                   else if (<binary_expr>.<operator> == '+')
                     then Me(<binary_expr>.<left_expr>, s) + Me(<binary_expr>.<right_expr>, s)
                   else Me(<binary_expr>.<left_expr>, s) × Me(<binary_expr>.<right_expr>, s)

Assignment Statements
• An assignment statement is an expression evaluation plus the setting of the left-side variable to the expression's value
• Maps states to states (or to error)
• Mapping function:
  Ma(x = E, s) = if Me(E, s) == error
                   then error
                   else s' = {<i1, v1'>, <i2, v2'>, …, <in, vn'>}, where for j = 1, 2, …, n:
                     vj' = VARMAP(ij, s) if ij ≠ x
                     vj' = Me(E, s) if ij == x

Logical Pretest Loops
• The meaning of the loop is simply the values of the program variables after the statements in the loop have been executed the prescribed number of times
• In essence, the loop has been converted from iteration to recursion, where the recursive control is mathematically defined by other recursive state-mapping functions
• Recursion is easier to describe with mathematical rigour than iteration

Mapping Function
• We assume that there are two other existing mapping functions:
  – Msl: maps statement lists to states (or error)
  – Mb: maps Boolean expressions to Boolean values (or error)

  Ml(while B L, s) = if Mb(B, s) == error
                       then error
                     else if Mb(B, s) == false
                       then s
                     else if Msl(L, s) == error
                       then error
                     else Ml(while B L, Msl(L, s))

Evaluation of Denotational Semantics
• Can be used to prove the correctness of programs
• Provides a rigorous way to think about programs
• Can be an aid to language design
• Has been used in compiler generation systems
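The mapping functions in this section (Mbin, Mdec, VARMAP, Me, Ma, and Ml) translate almost line for line into recursive functions. The Python sketch below is a minimal interpreter built on encodings of our own choosing. States are dicts, expressions are nested tuples, None stands for undef, and the string "error" stands for the error value. The slides leave Mb and Msl unspecified, so this sketch gives Mb a hypothetical < comparison and lets Msl run a list of assignments. All function names are illustrative.

```python
from typing import Dict, List, Optional, Tuple

# Assumed encodings (illustrative only, not defined on the slides):
#   state s      -> dict from variable name to int, or UNDEF
#   expression   -> ("num", "42") | ("var", "x") | ("+", e1, e2) | ("*", e1, e2)
#   Boolean expr -> ("<", e1, e2), a hypothetical stand-in for Mb's input
#   statement    -> assignment (name, expr); a statement list is a list of them
UNDEF = None      # the special value undef
ERROR = "error"   # the error value

State = Dict[str, Optional[int]]
Assignment = Tuple[str, tuple]


def m_bin(bits: str) -> int:
    """Mbin: '0' -> 0, '1' -> 1, <bin_num> b -> 2 * Mbin(<bin_num>) + b."""
    last = {"0": 0, "1": 1}[bits[-1]]
    if len(bits) == 1:
        return last
    return 2 * m_bin(bits[:-1]) + last


def m_dec(digits: str) -> int:
    """Mdec: the same recursion as Mbin, in base 10."""
    last = "0123456789".index(digits[-1])
    if len(digits) == 1:
        return last
    return 10 * m_dec(digits[:-1]) + last


def varmap(name: str, s: State) -> Optional[int]:
    # Names absent from the dict are treated as undef (an assumption).
    return s.get(name, UNDEF)


def m_e(expr: tuple, s: State):
    """Me: maps an expression and a state to an int or ERROR."""
    kind = expr[0]
    if kind == "num":
        return m_dec(expr[1])
    if kind == "var":
        value = varmap(expr[1], s)
        return ERROR if value is UNDEF else value
    left, right = m_e(expr[1], s), m_e(expr[2], s)
    if left == ERROR or right == ERROR:
        return ERROR
    return left + right if kind == "+" else left * right


def m_a(x: str, e: tuple, s: State):
    """Ma: returns a new state s' (or ERROR); s itself is not modified."""
    value = m_e(e, s)
    if value == ERROR:
        return ERROR
    s_prime = dict(s)    # every pair whose name is not x is copied unchanged
    s_prime[x] = value   # x gets Me(E, s); unlike the slide, x is added if absent
    return s_prime


def m_b(b: tuple, s: State):
    """Hypothetical Mb for ("<", e1, e2): True/False, or ERROR."""
    left, right = m_e(b[1], s), m_e(b[2], s)
    if left == ERROR or right == ERROR:
        return ERROR
    return left < right


def m_sl(stmts: List[Assignment], s: State):
    """Hypothetical Msl: apply Ma to each assignment in order."""
    for x, e in stmts:
        s = m_a(x, e, s)
        if s == ERROR:
            return ERROR
    return s


def m_l(b: tuple, body: List[Assignment], s: State):
    """Ml(while B L, s): the loop expressed as recursion, as on the slide."""
    cond = m_b(b, s)
    if cond == ERROR:
        return ERROR
    if cond is False:
        return s
    s_next = m_sl(body, s)
    if s_next == ERROR:
        return ERROR
    return m_l(b, body, s_next)


# while (i < 5) { total = total + i; i = i + 1 }
start = {"i": 0, "total": 0}
loop_cond = ("<", ("var", "i"), ("num", "5"))
body = [("total", ("+", ("var", "total"), ("var", "i"))),
        ("i", ("+", ("var", "i"), ("num", "1")))]
final = m_l(loop_cond, body, start)
```

Running the loop yields {'i': 5, 'total': 10}, and start is left unchanged. This is because m_a builds a new dict, mirroring how Ma produces a new state s' rather than modifying s. The loop never iterates in m_l: each pass is a recursive call on the state returned by m_sl, which is the iteration-to-recursion conversion described above.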
Grammar and Rules
• A grammar is a finite nonempty set of rules
• A rule has a left-hand side (LHS) and a right-hand side (RHS), and consists of terminal (lexeme or token) and nonterminal symbols
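Because a grammar is just a finite set of LHS/RHS rules, it can be stored directly as data. The minimal Python sketch below assumes that nonterminals are marked by angle brackets and reuses the <bin_num> rules from the binary-number example. The hypothetical helper leftmost_derivation rewrites the leftmost nonterminal with a chosen rule at each step and records every intermediate string.

```python
from typing import List, Tuple

# A rule is an (LHS, RHS) pair; nonterminals are written in angle brackets
# (a convention assumed here, matching the BNF notation used in this chapter).
Rule = Tuple[str, Tuple[str, ...]]

BIN_NUM_RULES: List[Rule] = [
    ("<bin_num>", ("0",)),                # rule 0: <bin_num> -> '0'
    ("<bin_num>", ("1",)),                # rule 1: <bin_num> -> '1'
    ("<bin_num>", ("<bin_num>", "0")),    # rule 2: <bin_num> -> <bin_num> '0'
    ("<bin_num>", ("<bin_num>", "1")),    # rule 3: <bin_num> -> <bin_num> '1'
]


def is_nonterminal(symbol: str) -> bool:
    return symbol.startswith("<") and symbol.endswith(">")


def leftmost_derivation(rules: List[Rule], start: str, choices: List[int]) -> List[str]:
    """Rewrite the leftmost nonterminal with rules[i] for each i in choices."""
    form = [start]
    steps = [" ".join(form)]
    for i in choices:
        lhs, rhs = rules[i]
        idx = next((k for k, sym in enumerate(form) if is_nonterminal(sym)), None)
        if idx is None:
            raise ValueError("no nonterminal left to rewrite")
        if form[idx] != lhs:
            raise ValueError(f"rule {i} rewrites {lhs}, not {form[idx]}")
        form = form[:idx] + list(rhs) + form[idx + 1:]
        steps.append(" ".join(form))
    return steps


steps = leftmost_derivation(BIN_NUM_RULES, "<bin_num>", [2, 3, 1])
```

Choosing rules 2, 3, and 1 in turn derives the sentence 110 through the steps "<bin_num>", "<bin_num> 0", "<bin_num> 1 0", "1 1 0".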