17.4 Getting Control—The Metacharacters

Regular expression metacharacters are characters that do not represent themselves. They are endowed with special powers that let you control the search pattern in some way (e.g., find the pattern only at the beginning or at the end of the line, or only if it starts with an uppercase or lowercase letter). Metacharacters lose their special meaning if preceded by a backslash. For example, the dot metacharacter represents any single character, but when preceded by a backslash it is just a dot, or period. A backslash preceding a metacharacter turns off the meaning of the metacharacter, but a backslash preceding an alphanumeric character in a regular expression creates a metasymbol. A metasymbol provides a shorter way to represent some of the regular expression metacharacters. For example, [0-9] represents a single digit in the range 0 through 9, and the metasymbol \d represents the same thing: [0-9] uses the bracketed character class, whereas \d is a metasymbol (see Table 17.6).

EXPLANATION
1 A new array object is created.
2 The string “apples pears,peaches:plums,oranges” is assigned to the variable called myString. The delimiters are a tab (between apples and pears), a comma, and a colon.
3 The regular expression /[\t:,]/ is assigned to the variable called regex.
4 The String object’s split() method splits up the string using a tab, colon, or comma as the delimiter. The delimiting characters are enclosed in square brackets, which in regular expression parlance is called a character class. (See the section “Getting Control—The Metacharacters” on page 733.) In simple terms, any one of the characters listed within the brackets is a delimiter in the string. The split() method searches for any one of these characters and splits the string accordingly, returning an array called splitArray.
5 Each of the array elements is displayed in the page. See Figure 17.10.

Figure 17.10 The string is split on tabs, colons, and commas.

EXAMPLE 17.10
/^a...c/

From the Library of WoweBook.Com
Chapter 17 • Regular Expressions and Pattern Matching

EXPLANATION
This regular expression contains metacharacters (see Table 17.6). The first one is a caret (^). The caret metacharacter matches only if the pattern is found at the beginning of the line (in JavaScript, the beginning of the string unless the m flag is used). The period (.) matches any single character, including whitespace, except a newline. This expression contains three periods, representing any three characters. To find a literal period, or any other metacharacter that should represent itself, precede it with a backslash to prevent interpretation. The expression reads: Search at the beginning of the line for an a, followed by any three single characters, followed by a c. It will match, for example, abbbc, a123c, a   c (an a, three spaces, and a c), aAx3c, and so on, but only if those patterns are found at the beginning of the line.

Table 17.6 Metacharacters and Metasymbols

Metacharacter/Metasymbol   What It Matches

Character Class: Single Characters and Digits
.                Matches any character except newline
[a-z0-9]         Matches any single character in set
[^a-z0-9]        Matches any single character not in set
\d               Matches one digit
\D               Matches a nondigit, same as [^0-9]
\w               Matches an alphanumeric (word) character
\W               Matches a nonalphanumeric (nonword) character

Character Class: Whitespace Characters
\0               Matches a null character
[\b]             Matches a backspace (inside brackets only; outside brackets, \b is a word boundary)
\f               Matches a formfeed
\n               Matches a newline
\r               Matches a return
\s               Matches a whitespace character: spaces, tabs, newlines, and so on
\S               Matches a nonwhitespace character
\t               Matches a tab

Character Class: Anchored Characters
^                Matches at the beginning of the line (of the string, unless the m flag is used)
$                Matches at the end of the line (of the string, unless the m flag is used)
\A               Matches the beginning of the string only (Perl; not supported in JavaScript)
\b               Matches a word boundary (when not inside [ ])
\B               Matches a nonword boundary
\G               Matches where previous m//g left off (Perl; not supported in JavaScript)
\Z               Matches the end of the string or line (Perl; not supported in JavaScript)
\z               Matches the end of string only (Perl; not supported in JavaScript)

Character Class: Repeated Characters
x?               Matches 0 or 1 of x
x*               Matches 0 or more of x
x+               Matches 1 or more of x
(xyz)+           Matches one or more patterns of xyz
x{m,n}           Matches at least m of x and no more than n of x

Character Class: Alternative Characters
was|were|will    Matches one of was, were, or will

Character Class: Remembered Characters
(string)         Used for backreferencing (see the section “Remembering or Capturing” on page 762)
\1 or $1         Matches first set of parentheses
\2 or $2         Matches second set of parentheses
\3 or $3         Matches third set of parentheses
(\1, \2, ... are used inside the pattern itself; $1, $2, ... are used in a replacement string passed to replace() or read as RegExp.$1, RegExp.$2, ...)

If you are searching for a particular character within a regular expression, you can use the dot metacharacter to represent a single character, or a character class that matches one character from a set of characters.
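To contrast the two single-character tools just described, here is a small sketch that runs the anchored dot pattern from Example 17.10 and the tab/colon/comma character class from the split() explanation at the start of this section. It assumes Node.js or a browser console, with console.log standing in for document.write; the sample strings are made up for illustration.

```javascript
// Sketch for Node.js or a browser console; console.log replaces document.write.

// Example 17.10's pattern: an "a" at the start of the string, then any
// three characters other than newline, then a "c".
const anchoredDots = /^a...c/;
const samples = ["abbbc", "a123c", "a   c", "aAx3c", "xabbbc", "a\n12c"];
for (const s of samples) {
  // JSON.stringify makes the spaces and the \n visible in the output.
  console.log(JSON.stringify(s), anchoredDots.test(s));
}
// The first four print true; "xabbbc" prints false (the a is not at the
// start) and "a\n12c" prints false (the dot does not match a newline).

// A backslash turns the dot back into a literal period.
console.log(/3\.14/.test("3x14")); // false: needs a real "."
console.log(/3.14/.test("3x14"));  // true: the dot accepts the "x"

// A character class matches exactly one character from a set; here it is
// the tab/colon/comma delimiter set from the split() explanation.
const pieces = "apples\tpears,peaches:plums,oranges".split(/[\t:,]/);
console.log(pieces); // apples, pears, peaches, plums, oranges
```

Without the m flag, ^ anchors at the start of the whole string; writing /^a...c/m would let it match at the start of every line instead.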
In addition to the dot and character class, JavaScript has added some backslashed symbols (called metasymbols) to represent single characters. See Table 17.7 for the single-character metacharacters, and Table 17.8 on page 742 for a list of metasymbols.

17.4.1 The Dot Metacharacter

The dot metacharacter matches any single character with the exception of the newline character. For example, the regular expression /a.b/ is matched if the string contains an a, followed by any one single character (except \n), followed by b, whereas the expression /.../ matches any string containing at least three consecutive characters other than newlines.

Table 17.6 Metacharacters and Metasymbols (continued)

New with JavaScript 1.5
(?:x)            Matches x but does not remember the match. These are called noncapturing parentheses. The matched substring cannot be recalled from the resulting array’s elements [1], ..., [n] or from the predefined RegExp object’s properties $1, ..., $9.
x(?=y)           Matches x only if x is followed by y. For example, /Jack(?=Sprat)/ matches Jack only if it is followed by Sprat. /Jack(?=Sprat|Frost)/ matches Jack only if it is followed by Sprat or Frost. However, neither Sprat nor Frost is part of the match results.
x(?!y)           Matches x only if x is not followed by y. For example, /\d+(?!\.)/ matches a number only if it is not followed by a decimal point. /\d+(?!\.)/.exec("3.141") matches 141, not the 3 that precedes the decimal point.

Table 17.7 Single-Character and Single-Digit Metacharacters

Metacharacter    What It Matches
.                Matches any character except newline.
[a-z0-9_]        Matches any single character in set.
[^a-z0-9_]       Matches any single character not in set.
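The three JavaScript 1.5 additions ((?:x), x(?=y), and x(?!y)) are easiest to understand by running them. This sketch assumes Node.js or a browser console; the Jack/Sprat/Frost and 3.141 inputs come from the table, while the "Ms. Smith" string is invented sample data.

```javascript
// Sketch for Node.js or a browser console.

// (?:x) groups without capturing. Compare with ordinary parentheses:
const capturing = /(Mr|Ms)\. (\w+)/.exec("Ms. Smith");
const noncapturing = /(?:Mr|Ms)\. (\w+)/.exec("Ms. Smith");
console.log(capturing[1], capturing[2]); // Ms Smith
console.log(noncapturing[1]);            // Smith (the title was not remembered)

// x(?=y): Jack matches only when Sprat or Frost follows,
// and the lookahead text is not part of the match.
const jack = /Jack(?=Sprat|Frost)/;
console.log(jack.exec("JackFrost")[0]); // Jack
console.log(jack.test("JackBlack"));    // false

// x(?!y): a run of digits not followed by a decimal point.
console.log(/\d+(?!\.)/.exec("3.141")[0]); // 141
```

Because the noncapturing group is not numbered, (\w+) becomes element [1] of the result array instead of element [2].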
EXAMPLE 17.11
<html>
<head><title>The dot Metacharacter</title></head>
<body>
<script type="text/javascript">
1  var textString = "Norma Jean";
2  var reg_expression = /N..ma/;
3  var result = reg_expression.test(textString);  // Returns true or false
   document.write(result + "<br />");
4  if (reg_expression.test(textString)) {  // if (result)
      document.write("<b>The reg_ex /N..ma/ matched the string \"" +
                     textString + "\".</b><br />");
   }
   else {
5     document.write("No Match!");
   }
</script>
</body>
</html>

EXPLANATION
1 The variable textString is assigned the string “Norma Jean”.
2 The regular expression /N..ma/ is assigned to the variable reg_expression. A match is found if the string being tested contains an uppercase N followed by any two single characters (each dot represents one character), then an m and an a. It would find Norma, No man, Normandy, and so on.
3 The test() method returns true if the string textString matches the regular expression and false if it doesn’t. The variable result contains either true or false.
4 If the string “Norma Jean” contains the regular expression pattern /N..ma/, test() returns true, and the output is sent to the screen as shown in Figure 17.11.
5 If the pattern is not found, No Match! is displayed on the page.

Figure 17.11 The string Norma Jean contains an N followed by any two characters and ma.

17.4.2 The Character Class

A character class represents one character from a set of characters. For example, [abc] matches either an a, b, or c; [a-z] matches one character in the range from a to z; and [0-9] matches one character in the range of digits from 0 to 9.
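A minimal sketch of that one-character rule, assuming Node.js or a browser console; the test strings are illustrative only.

```javascript
// Sketch for Node.js or a browser console: each bracketed class stands for
// exactly one character from its set.
console.log(/[abc]/.test("cab"));       // true: contains an a, b, or c
console.log(/[abc]/.test("xyz"));       // false
console.log(/b[aeiou]t/.test("bit"));   // true: one vowel between b and t
console.log(/b[aeiou]t/.test("boot"));  // false: two vowels, but the class matches one
console.log("Room 42".match(/[0-9]/g)); // each digit matched separately: "4", "2"
console.log(/[A-Z][a-z]/.test("iPod")); // true: "Po" is uppercase then lowercase
```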
If the character class contains a leading caret, ^, then the class represents any one character not in the set; thus, [^a-zA-Z] matches a single character not in the range from a to z or A to Z, and [^0-9] matches a single character that is not a digit between 0 and 9.

JavaScript provides additional symbols, called metasymbols, to represent a character class. The symbols \d and \D represent a single digit and a single nondigit, respectively, the same as [0-9] and [^0-9]; \w and \W represent a single word character and a single nonword character, respectively, the same as [A-Za-z_0-9] and [^A-Za-z_0-9].

EXAMPLE 17.12
<html>
<head><title>The Character Class</title></head>
<body>
<script type="text/javascript">
1  var reg_expression = /[A-Z][a-z]eve/;
2  var textString = prompt("Type a string of text", "");
3  var result = reg_expression.test(textString);  // Returns true or false
   document.write(result + "<br />");
   if (result) {
      document.write("<b>The reg_ex /[A-Z][a-z]eve/ matched the string \"" +
                     textString + "\".</b><br />");
   }
   else {
      alert("No Match!");
   }
</script>
</body>
</html>

EXPLANATION
1 The variable reg_expression is assigned a regular expression containing two bracketed character classes. It matches a string that contains an uppercase letter in the range A to Z, immediately followed by one lowercase letter in the range a to z, followed by eve.
2 The variable textString is assigned user input; in this example, Steven lives in Cleveland was entered.
3 The regular expression test() method returns true because Steven contains an uppercase letter, followed by a lowercase letter, and eve. Cleveland also matches the pattern. The variable result contains either true or false. See the output in Figures 17.12 and 17.13.

Figure 17.12 The user entered Steven lives in Cleveland: one uppercase letter [A-Z], followed by one lowercase letter [a-z], followed by eve.
This matches both Steven and Cleveland.

Figure 17.13 When the user entered Believe! (top), it didn’t match (bottom). Would it have matched if he or she had entered BeLieve? Why?

EXAMPLE 17.13
<html>
<head><title>The Character Class</title></head>
<body>
<script type="text/javascript">
   // Character class
1  var reg_expression = /[A-Za-z0-9_]/;  // A single alphanumeric
                                         // word character
2  var textString = prompt("Type a string of text", "");
3  var result = reg_expression.test(textString);  // Returns true or false
   document.write(result + "<br />");
   if (result) {
      document.write("<b>The reg_ex /[A-Za-z0-9_]/ matched the string \"" +
                     textString + "\".</b><br />");
   }
   else {
      alert("No Match!");
   }
</script>
</body>
</html>

EXPLANATION
1 A regular expression object containing the bracketed character class [A-Za-z0-9_] is assigned to the variable called reg_expression. This regular expression matches a string that contains at least one character from the class: a letter from A to Z or a to z, a digit from 0 to 9, or the underscore character, _.
2 User input is entered in the prompt dialog box and assigned to the variable textString. In this example the user entered Take 5.
3 The regular expression test() method returns true because the string Take 5 contains at least one alphanumeric character (see Figure 17.14).

Figure 17.14 User entered Take 5 (top). The string contained at least one alphanumeric character (bottom).

EXAMPLE 17.14
<html>
<head><title>The Character Class and Negation</title></head>
<body>
<script type="text/javascript">
   // Negation within a Character Class
1  var reg_expression = /[^0-9]/;
2  var textString = prompt("Type a string of text", "");
3  var result = reg_expression.test(textString);  // Returns true or false
   document.write(result + "<br />");
   if (result) {
      document.write("<b>The reg_ex /[^0-9]/ matched the string \"" +
                     textString + "\".</b><br />");
   }
   else {
      alert("No Match!");
   }
</script>
</body>
</html>

EXPLANATION
1 The caret inside a character class, when it is the first character after the opening bracket, creates a negation, meaning any character not in this range. This regular expression matches a string that contains at least one character that is not a digit between 0 and 9.
2 User input is assigned to the variable textString. In this example, abc was entered.
3 The regular expression test() method returns true because the string abc contains a character that is not in the range 0 to 9 (see Figure 17.15).

Figure 17.15 The user entered abc. It contains a character that is not in the range between 0 and 9.

17.4.3 Metasymbols

Metasymbols offer an alternative way to represent a character class. For example, instead of representing a digit as [0-9], you can represent it as \d, and the alternative for representing a nondigit, [^0-9], is \D. Metasymbols are easier to use and to type than the equivalent bracketed classes.
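As a rough check of the equivalences described in this section (\d for [0-9], \D for [^0-9], \w for [A-Za-z0-9_], \W for its negation), the sketch below compares each metasymbol with its bracketed class on a handful of ASCII characters, and also shows how the negated class from Example 17.14 behaves. It assumes Node.js or a browser console; the sample characters and strings are chosen for illustration and are not an exhaustive test.

```javascript
// Sketch for Node.js or a browser console.
// Each pair: [metasymbol, bracketed class it abbreviates].
const pairs = [
  [/\d/, /[0-9]/],
  [/\D/, /[^0-9]/],
  [/\w/, /[A-Za-z0-9_]/],
  [/\W/, /[^A-Za-z0-9_]/],
];
const probes = ["7", "x", "Z", "_", " ", "-", "!"];
// True when every metasymbol agrees with its class on every probe character.
const allAgree = pairs.every(([meta, cls]) =>
  probes.every((ch) => meta.test(ch) === cls.test(ch)));
console.log(allAgree); // true

// A negated class means "contains at least one nondigit",
// not "contains no digits".
console.log(/[^0-9]/.test("abc"));  // true
console.log(/[^0-9]/.test("abc1")); // true: a, b, and c are nondigits
console.log(/[^0-9]/.test("123"));  // false: every character is a digit

// With the g flag, \D strips every nondigit in one pass.
console.log("Tel: 555-1234".replace(/\D/g, "")); // 5551234
```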