Chapter 2. Lexical Structure

The lexical structure of a programming language is the set of elementary rules that specifies how you write programs in that language. It is the lowest-level syntax of a language; it specifies such things as what variable names look like, what characters are used for comments, and how one program statement is separated from the next. This short chapter documents the lexical structure of JavaScript.

2.1 Character Set

JavaScript programs are written using the Unicode character set. Unlike the 7-bit ASCII encoding, which is useful only for English, and the 8-bit ISO Latin-1 encoding, which is useful only for English and major Western European languages, the 16-bit Unicode encoding can represent virtually every written language in common use on the planet. This is an important feature for internationalization and is particularly important for programmers who do not speak English.

American and other English-speaking programmers typically write programs using a text editor that supports only the ASCII or Latin-1 character encodings, and thus they don't have easy access to the full Unicode character set. This is not a problem, however, because both the ASCII and Latin-1 encodings are subsets of Unicode, so any JavaScript program written using those character sets is perfectly valid. Programmers who are used to thinking of characters as 8-bit quantities may be disconcerted to know that JavaScript represents each character using 2 bytes, but this fact is actually transparent to the programmer and can simply be ignored.
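As a small illustration of the points above, the following sketch checks that an ASCII character keeps its ASCII code under Unicode, and that a non-ASCII character still counts as a single character from the program's point of view. The variable names are made up for this example, and the \u escape in the string literal assumes an interpreter with Unicode support.

```javascript
// Hypothetical sample strings, chosen only for illustration.
var ascii = "A";           // a plain ASCII character
var greek = "\u03c0";      // Greek small letter pi, written as a Unicode escape

// ASCII codes are also the Unicode codes for the same characters.
var asciiCode = ascii.charCodeAt(0);    // 65

// The 2-byte representation is invisible here: one character, length one.
var greekLength = greek.length;         // 1
var greekCode = greek.charCodeAt(0);    // 960, that is, hexadecimal 03c0
```

Nothing in the program has to account for the size of a character; the string methods work in characters, not bytes.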
Although the ECMAScript v3 standard allows Unicode characters anywhere in a JavaScript program, Versions 1 and 2 of the standard allow Unicode characters only in comments and quoted string literals; all other parts of an ECMAScript v1 program are restricted to the ASCII character set. Versions of JavaScript that predate ECMAScript standardization typically do not support Unicode at all.

2.2 Case Sensitivity

JavaScript is a case-sensitive language. This means that language keywords, variables, function names, and any other identifiers must always be typed with a consistent capitalization of letters. The while keyword, for example, must be typed "while", not "While" or "WHILE". Similarly, online, Online, OnLine, and ONLINE are four distinct variable names.

Note, however, that HTML is not case-sensitive. Because of its close association with client-side JavaScript, this difference can be confusing. Many JavaScript objects and properties have the same names as the HTML tags and attributes they represent. While these tags and attribute names can be typed in any case in HTML, in JavaScript they typically must be all lowercase. For example, the HTML onclick event handler attribute is commonly specified as onClick in HTML, but it must be referred to as onclick in JavaScript code.

While core JavaScript is entirely and exclusively case-sensitive, exceptions to this rule are allowed in client-side JavaScript. In Internet Explorer 3, for example, all client-side objects and properties were case-insensitive. This caused problematic incompatibilities with Netscape, however, so in Internet Explorer 4 and later, client-side objects and properties are case-sensitive.

2.3 Whitespace and Line Breaks

JavaScript ignores spaces, tabs, and newlines that appear between tokens in programs, except those that are part of string or regular expression literals. A token is a keyword, variable name, number, function name, or some other entity in which you would obviously not want to insert a space or a line break. If you place a space, tab, or newline within a token, you break it up into two tokens; thus, 123 is a single numeric token, but 12 3 is two separate tokens (and constitutes a syntax error, incidentally).

Because you can use spaces, tabs, and newlines freely in your programs (except in strings, regular expressions, and tokens), you are free to format and indent your program in a neat and consistent way that makes the code easy to read and understand. Note, however, that there is one minor restriction on the placement of line breaks; it is described in the following section.

2.4 Optional Semicolons

Simple statements in JavaScript are generally followed by semicolons (;), just as they are in C, C++, and Java. The semicolon serves to separate statements from each other. In JavaScript, however, you may omit the semicolon if each of your statements is placed on a separate line. For example, the following code could be written without semicolons:

    a = 3;
    b = 4;

But when formatted as follows, the first semicolon is required:

    a = 3; b = 4;

Omitting semicolons is not a good programming practice; you should get in the habit of using them.

Although JavaScript theoretically allows line breaks between any two tokens, the fact that JavaScript automatically inserts semicolons for you causes some exceptions to this rule. Loosely, if you break a line of code in such a way that the line before the break appears to be a complete statement, JavaScript may think you omitted the semicolon and insert one for you, altering your meaning. Some places you should look out for this are with the return, break, and continue statements (which are described in Chapter 6). For example, consider the following:

    return
    true;

JavaScript assumes you meant:

    return;
    true;

However, you probably meant:

    return true;

This is something to watch out for: this code does not cause a syntax error and will fail in a nonobvious way. A similar problem occurs if you write:

    break
    outerloop;

JavaScript inserts a semicolon after the break keyword, causing a syntax error when it tries to interpret the next line.
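The return case described above can be observed directly. The following is a minimal sketch; the function names brokenReturn and intendedReturn are hypothetical and exist only for this illustration.

```javascript
// Hypothetical functions illustrating automatic semicolon insertion.
function brokenReturn() {
  return        // a semicolon is inserted here, so nothing is returned
  true;         // parsed as a separate statement that is never reached
}

function intendedReturn() {
  return true;  // the value stays on the same line as the return keyword
}

var brokenResult = brokenReturn();      // undefined
var intendedResult = intendedReturn();  // true
```

Neither function causes a syntax error, which is why the first form tends to go unnoticed until the caller receives undefined instead of true.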
For similar reasons, the ++ and -- postfix operators (see Chapter 5) must always appear on the same line as the expressions to which they are applied.

2.5 Comments

JavaScript, like Java, supports both C++ and C-style comments. Any text between a // and the end of a line is treated as a comment and is ignored by JavaScript. Any text between the characters /* and */ is also treated as a comment. These C-style comments may span multiple lines but may not be nested. The following lines of code are all legal JavaScript comments:

    // This is a single-line comment.
    /* This is also a comment */ // and here is another comment.
    /*
     * This is yet another comment.
     * It has multiple lines.
     */

2.6 Literals

A literal is a data value that appears directly in a program. The following are all literals:

    12               // The number twelve
    1.2              // The number one point two
    "hello world"    // A string of text
    'Hi'             // Another string
    true             // A boolean value
    false            // The other boolean value
    /javascript/gi   // A "regular expression" literal (for pattern matching)
    null             // Absence of an object

In ECMAScript v3, expressions that serve as array and object literals are also supported. For example:

    { x:1, y:2 }     // An object initializer
    [1,2,3,4,5]      // An array initializer

Note that these array and object literals have been supported since JavaScript 1.2 but were not standardized until ECMAScript v3.

Literals are an important part of any programming language, as it is impossible to write a program without them. The various JavaScript literals are described in detail in Chapter 3.

2.7 Identifiers

An identifier is simply a name. In JavaScript, identifiers are used to name variables and functions and to provide labels for certain loops in JavaScript code. The rules for legal identifier names are the same in JavaScript as they are in Java and many other languages. The first character must be a letter, an underscore (_), or a dollar sign ($). [1] Subsequent characters may be any letter or digit or an underscore or dollar sign. (Numbers are not allowed as the first character so that JavaScript can easily distinguish identifiers from numbers.) These are all legal identifiers:

    i
    my_variable_name
    v13
    _dummy
    $str

[1] Note that dollar signs are not legal in identifiers prior to JavaScript 1.1. They are intended for use only by code-generation tools, so you should avoid using dollar signs in identifiers in the code you write yourself.

In ECMAScript v3, identifiers can contain letters and digits from the complete Unicode character set. Prior to this version of the standard, JavaScript identifiers are restricted to the ASCII character set. ECMAScript v3 also allows Unicode escape sequences to appear in identifiers. A Unicode escape is the characters \u followed by 4 hexadecimal digits that specify a 16-bit character encoding. For example, the identifier π could also be written as \u03c0. Although this is an awkward syntax, it makes it possible to translate JavaScript programs that contain Unicode characters into a form that allows them to be manipulated with text editors and other tools that do not support the full Unicode character set.

Finally, identifiers cannot be the same as any of the keywords used for other purposes in JavaScript. The next section lists the special names that are reserved in JavaScript.

2.8 Reserved Words

There are a number of reserved words in JavaScript. These are words that you cannot use as identifiers (variable names, function names, and loop labels) in your JavaScript programs. Table 2-1 lists the keywords standardized by ECMAScript v3. These words have special meaning to JavaScript; they are part of the language syntax itself.

Table 2-1. Reserved JavaScript keywords

    break      do        if          switch  typeof
    case       else      in          this    var
    catch      false     instanceof  throw   void
    continue   finally   new         true    while
    default    for       null        try     with
    delete     function  return

Table 2-2 lists other reserved keywords.
These words are not currently used in JavaScript, but they are reserved by ECMAScript v3 as possible future extensions to the language.

Table 2-2. Words reserved for ECMA extensions

    abstract  double   goto        native     static
    boolean   enum     implements  package    super
    byte      export   import      private    synchronized
    char      extends  int         protected  throws
    class     final    interface   public     transient
    const     float    long        short      volatile
    debugger

In addition to some of the formally reserved words just listed, current drafts of the ECMAScript v4 standard are contemplating the use of the keywords as, is, namespace, and use. Current JavaScript interpreters will not prevent you from using these four words as identifiers, but you should avoid them anyway.

You should also avoid using as identifiers the names of global variables and functions that are predefined by JavaScript. If you create variables or functions with these names, either you will get an error (if the property is read-only) or you will redefine the existing variable or function, something you should not do unless you know exactly what you're doing. Table 2-3 lists global variables and functions defined by the ECMAScript v3 standard. Specific implementations may define other global properties, and each specific JavaScript embedding (client-side, server-side, etc.) will have its own extensive list of global properties. [2]

[2] See the Window object in the client-side reference section of this book for a list of the additional global variables and functions defined by client-side JavaScript.

Table 2-3. Other identifiers to avoid

    arguments           encodeURI  Infinity  Object          String
    Array               Error      isFinite  parseFloat      SyntaxError
    Boolean             escape     isNaN     parseInt        TypeError
    Date                eval       Math      RangeError      undefined
    decodeURI           EvalError  NaN       ReferenceError  unescape
    decodeURIComponent  Function   Number    RegExp          URIError
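The identifier rules of Section 2.7 and the keyword restrictions of Table 2-1 can be explored with a rough check like the one below. This is only a sketch under stated assumptions: the helper name isUsableAsIdentifier is hypothetical, it expects a single word as input, and it relies on the Function constructor to ask the interpreter to parse a variable declaration. Words from Table 2-2 are deliberately left out of the examples, since their treatment may vary between interpreters.

```javascript
// Hypothetical helper: asks the interpreter whether "var <name>;" parses.
// Rough check only; it assumes the input is meant to be a single word.
function isUsableAsIdentifier(name) {
  // Reject input that could smuggle in extra statements or declarations.
  if (/[\s,;=\/]/.test(name)) {
    return false;
  }
  try {
    new Function("var " + name + ";");  // parsed but never called
    return true;
  } catch (e) {
    return false;                       // a syntax error means it is not usable
  }
}

var legalName = isUsableAsIdentifier("my_variable_name");  // true
var dollarName = isUsableAsIdentifier("$str");             // true
var escapedName = isUsableAsIdentifier("\\u03c0");         // true: Unicode escape
var keywordName = isUsableAsIdentifier("while");           // false: reserved keyword
var digitFirst = isUsableAsIdentifier("3d");               // false: starts with a digit
```

Note that a check like this says nothing about the names in Table 2-3: those are legal identifiers, and avoiding them is a matter of good practice rather than syntax.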