Professional Information Technology-Programming Book part 112 doc

    CFMX1: $899.00
    XTC99: $69.96
    Total items found: 4

    (?<=\$)[0-9.]+

    ABC01: $23.45
    HGG42: $5.31
    CFMX1: $899.00
    XTC99: $69.96
    Total items found: 4

That did the trick. (?<=\$) matches $, but does not consume it, and so only the prices (without the leading $ signs) are returned.

Compare the first and last expressions used in this example. \$[0-9.]+ matched $ followed by a dollar amount. (?<=\$)[0-9.]+ also matched $ followed by a dollar amount. The difference between the two is not in what they located while performing the search; it is in what they included in the results. The former located and included the $. The latter located $ so as to correctly find the prices, but did not include that $ in the matched results.

Lookahead patterns may be variable length; they may contain . and +, for example, so as to be highly dynamic. Lookbehind patterns, on the other hand, must generally be fixed length. This is a restriction imposed by almost all regular expression implementations.

Combining Lookahead and Lookbehind

Lookahead and lookbehind operations may be combined, as in the following example (the solution to the problem at the start of this lesson):

    <HEAD>
    <TITLE>Ben Forta's Homepage</TITLE>
    </HEAD>

    (?<=<[tT][iI][tT][lL][eE]>).*(?=</[tT][iI][tT][lL][eE]>)

    <HEAD>
    <TITLE>Ben Forta's Homepage</TITLE>
    </HEAD>

That worked. (?<=<[tT][iI][tT][lL][eE]>) is a lookbehind operation that matches (but does not consume) <TITLE>; (?=</[tT][iI][tT][lL][eE]>) similarly matches (but does not consume) </TITLE>. All that is returned is the title text (as that is all that was consumed).

Tip

In the preceding example, it may be worthwhile to escape the < (the first character being matched) to prevent ambiguity, so (?<=\< instead of (?<=<.

Negating Lookaround

As seen thus far, lookahead and lookbehind are usually used to match text, essentially to specify the location of text to be returned (by specifying the text before or after the desired match).
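The price and title examples above can be tried out with a short sketch. Python's re module is an assumption here, since the text names no particular engine, and the line breaks in the sample strings are likewise assumed. The variable names are illustrative only.

```python
import re

# Price list from the example above (line breaks are assumed).
text = """ABC01: $23.45
HGG42: $5.31
CFMX1: $899.00
XTC99: $69.96
Total items found: 4"""

# Without lookbehind, the $ is consumed and included in each match.
with_dollar = re.findall(r"\$[0-9.]+", text)

# With a positive lookbehind, the $ must be present but is not consumed.
without_dollar = re.findall(r"(?<=\$)[0-9.]+", text)

print(with_dollar)     # ['$23.45', '$5.31', '$899.00', '$69.96']
print(without_dollar)  # ['23.45', '5.31', '899.00', '69.96']

# Combined lookbehind and lookahead: only the title text is consumed.
html = "<HEAD>\n<TITLE>Ben Forta's Homepage</TITLE>\n</HEAD>"
title_pattern = r"(?<=<[tT][iI][tT][lL][eE]>).*(?=</[tT][iI][tT][lL][eE]>)"
title = re.search(title_pattern, html).group(0)

print(title)  # Ben Forta's Homepage
```

Both patterns locate the same positions; only the text returned differs. The lookbehind here is fixed length (seven characters), which is what Python's re module requires.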
Lookarounds of the kind used so far are known as positive lookahead and positive lookbehind. The term positive refers to the fact that they look for a match. A lesser-used form of lookaround is the negative lookaround. Negative lookahead looks ahead for text that does not match the specified pattern, and negative lookbehind similarly looks behind for text that does not match the specified pattern.

You might have expected to be able to use ^ to negate a lookaround, but no, the syntax is a little different. Lookaround operations are negated using ! (which replaces the =). Table 9.1 lists all the lookaround operations.

Table 9.1. Lookaround Operations

    Class    Description
    (?=)     Positive lookahead
    (?!)     Negative lookahead
    (?<=)    Positive lookbehind
    (?<!)    Negative lookbehind

Tip

Generally, any regular expression implementation that supports lookahead supports both positive and negative lookahead. Similarly, implementations that support lookbehind support both positive and negative lookbehind.

To demonstrate the difference between positive and negative lookbehind, here is an example. The following block of text contains numbers, both prices and quantities. First we'll just obtain the prices:

    I paid $30 for 100 apples, 50 oranges, and 60 pears. I saved $5 on this order.

    (?<=\$)\d+

    I paid $30 for 100 apples, 50 oranges, and 60 pears. I saved $5 on this order.

This is very similar to the example seen previously. \d+ matches numbers (one or more digits), and (?<=\$) looks behind to match (but not consume) the $ (escaped as \$). Therefore, the numbers in the two prices were matched, but not the quantities.

Now we'll do the opposite, locating just the quantities but not the prices:

    I paid $30 for 100 apples, 50 oranges, and 60 pears. I saved $5 on this order.

    \b(?<!\$)\d+\b

    I paid $30 for 100 apples, 50 oranges, and 60 pears. I saved $5 on this order.

Again, \d+ matched numbers, but this time only the quantities were matched and not the prices.
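The two lookbehind searches above can be reproduced with a minimal sketch, assuming Python's re module as the engine (the text does not name one). The variable names are illustrative only.

```python
import re

text = ("I paid $30 for 100 apples, 50 oranges, and 60 pears. "
        "I saved $5 on this order.")

# Positive lookbehind: digits that are immediately preceded by $.
prices = re.findall(r"(?<=\$)\d+", text)

# Negative lookbehind with word boundaries: whole numbers NOT preceded by $.
quantities = re.findall(r"\b(?<!\$)\d+\b", text)

print(prices)      # ['30', '5']
print(quantities)  # ['100', '50', '60']
```

The only difference between the two lookbehinds is the = versus the !; the word boundaries are an addition specific to the negative form.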
The expression (?<!\$) is a negative lookbehind that matches only when what precedes the numbers is not a $. Changing the = in the lookbehind to ! changes the pattern from positive to negative.

You may be wondering why the pattern in the negative lookbehind example defines word boundaries (using \b). To understand why this is necessary, here is the same example without those boundaries:

    I paid $30 for 100 apples, 50 oranges, and 60 pears. I saved $5 on this order.

    (?<!\$)\d+

    I paid $30 for 100 apples, 50 oranges, and 60 pears. I saved $5 on this order.
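Running the boundary-free pattern shows what goes wrong. This is a minimal sketch, again assuming Python's re module (the text does not name an engine); the variable names are illustrative only.

```python
import re

text = ("I paid $30 for 100 apples, 50 oranges, and 60 pears. "
        "I saved $5 on this order.")

# Negative lookbehind without word boundaries.
no_boundaries = re.findall(r"(?<!\$)\d+", text)

# The same lookbehind with word boundaries, for comparison.
with_boundaries = re.findall(r"\b(?<!\$)\d+\b", text)

print(no_boundaries)    # ['0', '100', '50', '60']
print(with_boundaries)  # ['100', '50', '60']
```

Without \b, the 0 in $30 is returned: the digit before it is 3, not $, so the negative lookbehind succeeds partway through the price. The 5 in $5 is still excluded because it directly follows the $.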

Posted: 07/07/2014, 03:20
