Without word boundaries, the 0 in $30 was also matched. Why? Because there is no $ directly in front of it (the character before the 0 is 3). Enclosing the entire pattern within word boundaries solves this problem.

Summary

Looking ahead and behind provides greater control over what is returned when matches are made. The lookaround operations allow subexpressions to be used to specify the location of text to be matched but not consumed (matched, but not included in the matched text itself). Positive lookahead is defined using (?=), and negative lookahead is defined using (?!). Some regular expression implementations also support positive lookbehind using (?<=) and negative lookbehind using (?<!).

Lesson 10. Embedding Conditions

A powerful yet infrequently used feature of the regular expression language is the capability to embed conditional processing within an expression. This lesson explores this topic.

Why Embed Conditions?

(123)456-7890 and 123-456-7890 are both acceptable presentation formats for North American phone numbers. 1234567890, (123)-456-7890, and (123-456-7890 all contain the correct number of digits, but are badly formatted. How could you write a regular expression to match only the acceptable formats and not any others? This is not a trivial problem; consider this obvious solution:

123-456-7890
(123)456-7890
(123)-456-7890
(123-456-7890
1234567890
123 456 7890

\(?\d{3}\)?-?\d{3}-\d{4}

123-456-7890
(123)456-7890
(123)-456-7890
(123-456-7890
1234567890
123 456 7890

\(? matches an optional opening parenthesis (notice that ( must be escaped), \d{3} matches the first three digits, \)? matches an optional closing parenthesis, -? matches an optional hyphen, and \d{3}-\d{4} matches the remaining seven digits (separated by a hyphen).

The pattern correctly did not match the last two lines, but it did match the third and fourth, both of which are incorrect (the third contains both ) and -, and the fourth has an unmatched parenthesis). Replacing \)?-? with [\)-]?
will help eliminate the third line (by allowing only ) or -, but not both), but the fourth line is still a problem. The pattern needs to match ) only if there is an opening (. In truth, the pattern needs to match ) if there is an opening (; if not, it needs to match -. That type of pattern cannot be implemented without conditional processing.

Caution
Conditional processing is not supported by all regular expression implementations.

Using Conditions

Regular expression conditions are defined using ?. In fact, you have already seen a couple of very specific conditions:

? matches the previous character or expression if it exists.
?= and ?<= match text ahead or behind, if it exists.

Embedded condition syntax also uses ?, which is not surprising considering that the conditions that are embedded are the same two just listed:

Conditional processing based on a backreference.
Conditional processing based on lookaround.

Backreference Conditions

A backreference condition allows for an expression to be used only if a previous subexpression search was successful. If that sounds obscure, consider an example: You need to locate all <IMG> tags in your text; in addition, if any <IMG> tags are links (enclosed between <A> and </A> tags), you need to match the complete link tags as well.

The syntax for this type of condition is (?(backreference)true). The ? starts the condition, the backreference is specified within parentheses, and the expression to be evaluated only if the backreference is present immediately follows.

Now for the example:

<!-- Nav bar -->
<TD>
<A HREF="/home"><IMG SRC="/images/home.gif"></A>
<IMG SRC="/images/spacer.gif">
<A HREF="/search"><IMG SRC="/images/search.gif"></A>
<IMG SRC="/images/spacer.gif">
<A HREF="/help"><IMG SRC="/images/help.gif"></A>
</TD>

(<[Aa]\s+[^>]+>\s*)?<[Ii][Mm][Gg]\s+[^>]+>(?(1)\s*</[Aa]>)

<!-- Nav bar -->
<TD>
<A HREF="/home"><IMG SRC="/images/home.gif"></A>
<IMG SRC="/images/spacer.gif">
<A HREF="/search"><IMG SRC="/images/search.gif"></A>
<IMG SRC="/images/spacer.gif">
<A HREF="/help"><IMG SRC="/images/help.gif"></A>
</TD>

This pattern requires explanation. (<[Aa]\s+[^>]+>\s*)? matches an opening <A> or <a> tag (with any attributes that may be present), if present (the closing ? makes the expression optional). <[Ii][Mm][Gg]\s+[^>]+> then matches the <IMG> tag (regardless of case) with any of its attributes. (?(1)\s*</[Aa]>) starts off with a condition: ?(1) means execute what comes next only if backreference 1 (the opening <A> tag) exists (in other words, only if the first <A> match was successful). If (1) exists, then \s*</[Aa]> matches any trailing whitespace followed by the closing </A> tag.

Note
?(1) checks to see if backreference 1 exists. The backreference number (1 in this example) does not need to be escaped in conditions. So, ?(1) is correct, and ?(\1) is not (although some implementations will accept the latter, too).

The pattern just used executes an expression if a condition is met. Conditions can also have else expressions, which are executed only if the backreference does not exist (the condition is not met). The syntax for this form of condition is (?(backreference)true|false). This syntax accepts a condition, as well as the expressions to be executed if the condition is met or not met.

This syntax provides the solution for the phone number problem as shown here:

123-456-7890
(123)456-7890
(123)-456-7890
(123-456-7890
1234567890
123 456 7890

(\()?\d{3}(?(1)\)|-)\d{3}-\d{4}

123-456-7890
(123)456-7890
(123)-456-7890
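To experiment with the two backreference conditions shown in this lesson, here is a minimal sketch in Python, whose re module is one of the implementations that supports the (?(1)true|false) syntax. The names PHONE, IMG, samples, html, is_well_formed, and find_img_tags are hypothetical and chosen only for this illustration. Using fullmatch to test each sample as a whole string is also an assumption of this sketch, not something the lesson prescribes.

```python
import re

# Phone pattern from the lesson: group 1 captures an optional "(".
# (?(1)\)|-) then requires ")" if group 1 matched, and "-" otherwise.
PHONE = re.compile(r"(\()?\d{3}(?(1)\)|-)\d{3}-\d{4}")

# IMG pattern from the lesson: the closing </A> is required only
# if group 1 (the opening <A> tag) took part in the match.
IMG = re.compile(r"(<[Aa]\s+[^>]+>\s*)?<[Ii][Mm][Gg]\s+[^>]+>(?(1)\s*</[Aa]>)")

samples = [
    "123-456-7890",
    "(123)456-7890",
    "(123)-456-7890",
    "(123-456-7890",
    "1234567890",
    "123 456 7890",
]

html = """<!-- Nav bar -->
<TD>
<A HREF="/home"><IMG SRC="/images/home.gif"></A>
<IMG SRC="/images/spacer.gif">
<A HREF="/search"><IMG SRC="/images/search.gif"></A>
<IMG SRC="/images/spacer.gif">
<A HREF="/help"><IMG SRC="/images/help.gif"></A>
</TD>"""


def is_well_formed(number):
    """Return True if the whole string is an acceptable phone format."""
    return PHONE.fullmatch(number) is not None


def find_img_tags(text):
    """Return each <IMG> tag, including its enclosing link tags if any."""
    return [m.group(0) for m in IMG.finditer(text)]


for number in samples:
    print(number, is_well_formed(number))

for tag in find_img_tags(html):
    print(tag)
```

Only the first two samples are accepted as whole strings. If the pattern is instead applied with search, the fourth sample still produces a match on the substring 123-456-7890 that follows the stray parenthesis, so how the pattern is anchored matters as much as the condition itself.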