Professional Information Technology-Programming Book part 102 pdf

This pattern looks increasingly complex (it actually is not), so let's walk through it. \w+ matches one or more alphanumeric characters (including underscore) but not . (these are the valid characters with which to start an email address). After the initial valid characters, a . and additional characters may follow, although they need not be present. [\w.]* matches zero or more instances of . or alphanumeric characters, which is exactly what is needed.

Note
Think of * as the "make it optional" metacharacter. Unlike +, which requires at least one match, * matches any number of instances if present, but does not require that any be present.
* is a metacharacter. To match a literal * you'll need to escape it as \*.

Matching Zero or One Character

One other very useful metacharacter is ?. Like *, ? matches optional text (and so zero instances will match). But unlike *, ? matches only zero or one instance of a character (or set), never more than one. As such, ? is very useful for matching specific, single optional characters in a block of text. Consider the following example:

    The URL is http://www.forta.com/, to connect securely use https://www.forta.com/ instead.

    http://[\w./]+

    The URL is http://www.forta.com/, to connect securely use https://www.forta.com/ instead.

The pattern used to match a URL is http:// (which is literal text and therefore matches only itself) followed by [\w./]+, which matches one or more instances of a set that allows alphanumeric characters, ., and forward slash. This pattern matches only the first URL (the one that starts with http://) but not the second (the one that starts with https://). And s* (zero or more instances of s) would not be correct, because that would also allow httpsssss:// (which is definitely not valid).

The solution? Use s? as seen in the following example:

    The URL is http://www.forta.com/, to connect securely use https://www.forta.com/ instead.

    https?://[\w./]+

    The URL is http://www.forta.com/, to connect securely use https://www.forta.com/ instead.

The pattern here begins with https?://. The ? makes the preceding character (the s) optional: the pattern matches whether the s is absent or a single instance of it is present. In other words, https?:// matches both http:// and https:// (but nothing else).

Incidentally, using ? is the solution to a problem alluded to in the previous lesson. You looked at an example where \r\n was used to match an end of line, and I mentioned that on Unix or Linux systems you would need to use \n (without \r), and that an ideal solution would be to match an optional \r followed by \n. That example follows again, this time using a slightly modified regular expression:

    "101","Ben","Forta"
    "102","Jim","James"
    "103","Roberta","Robertson"
    "104","Bob","Bobson"

    [\r]?\n[\r]?\n

    "101","Ben","Forta"
    "102","Jim","James"
    "103","Roberta","Robertson"
    "104","Bob","Bobson"

[\r]?\n matches an optional single instance of \r followed by a required \n.

Tip
You'll notice that the regular expression here used [\r]? instead of \r?. [\r] defines a set containing a single metacharacter, a set of one, so [\r]? is functionally identical to \r?. [] is usually used to define a set of characters, but some developers like to use it even around single characters to prevent ambiguity (to make it stand out so that you know exactly what the following metacharacter applies to). If you are using both [] and ?, make sure to place the ? outside of the set. Therefore, http[s]?:// is correct, but http[s?]:// is not.

Tip
? is a metacharacter. To match a literal ? you'll need to escape it as \?.

Using Intervals

+, *, and ? are used to solve many problems with regular expressions, but sometimes they are not enough. Consider the following:

• + and * match an unlimited number of characters. They provide no way to set a maximum number of characters to match.
• The only minimums supported by +, *, and ? are zero or one.
  They provide no way to set an explicit minimum number of matches.
• There is also no way to specify an exact number of matches desired.

To solve these problems, and to provide a greater degree of control over repeating matches, regular expressions allow for the use of intervals. Intervals are specified between the { and } characters.

Note
{ and } are metacharacters and, as such, should be escaped using \ when needed as literal text. It is worth noting that many regular expression implementations seem able to process { and } correctly even if they are not escaped (they can determine when they are literal and when they are metacharacters). However, it is best not to rely on this behavior and to escape the characters when you need them as literals.

Exact Interval Matching

To specify an exact number of matches, you place that number between { and }. Therefore, {3} means match three instances of the previous character or set. If there are only two instances, the pattern will not match.

To demonstrate this, let's revisit the RGB example (used in Lessons 3 and 4). You will recall that RGB values are specified as three sets of hexadecimal numbers (each of two characters). The first pattern used to match an RGB value was the following:

    #[0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f]

In Lesson 4, you used a POSIX class and changed the pattern to

    #[[:xdigit:]][[:xdigit:]][[:xdigit:]][[:xdigit:]][[:xdigit:]][[:xdigit:]]

The problem with both patterns is that you had to repeat the exact character set (or class) six times. Here is the same example, this time using interval matching:

    <BODY BGCOLOR="#336633" TEXT="#FFFFFF" MARGINWIDTH="0" MARGINHEIGHT="0" TOPMARGIN="0" LEFTMARGIN="0">

    #[[:xdigit:]]{6}

    <BODY BGCOLOR="#336633" TEXT="#FFFFFF" ...

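The quantifiers covered in this part of the lesson (*, ?, and the exact interval {n}) can be tried out in most regex-capable languages. What follows is a minimal Python sketch using the standard re module; it is an illustration, not code from the book. It assumes Python's regex syntax, which has no POSIX classes, so the explicit set [0-9A-Fa-f] stands in for [[:xdigit:]]. The variable names and the two short Windows/Unix sample strings are made up for the example.

```python
import re

text = ("The URL is http://www.forta.com/, to connect securely "
        "use https://www.forta.com/ instead.")

# s? makes the s optional, so both http:// and https:// match.
urls = re.findall(r"https?://[\w./]+", text)
print(urls)  # ['http://www.forta.com/', 'https://www.forta.com/']

# s* would also accept any run of s characters; s? does not.
bogus = "httpsssss://www.forta.com/"
star_matches = re.search(r"https*://", bogus) is not None      # True
optional_matches = re.search(r"https?://", bogus) is not None  # False

# \r?\n matches both Windows (\r\n) and Unix (\n) line endings.
# These two sample strings are illustrative, not taken from the book.
windows = '"101","Ben","Forta"\r\n"102","Jim","James"\r\n'
unix = '"101","Ben","Forta"\n"102","Jim","James"\n'
line_end = re.compile(r"\r?\n")
windows_rows = line_end.split(windows)
unix_rows = line_end.split(unix)
print(windows_rows == unix_rows)  # True

# {6} requires exactly six hexadecimal digits after the #.
# Python's re has no [[:xdigit:]], so an explicit set stands in for it.
rgb = re.compile(r"#[0-9A-Fa-f]{6}")
html = ('<BODY BGCOLOR="#336633" TEXT="#FFFFFF" MARGINWIDTH="0" '
        'MARGINHEIGHT="0" TOPMARGIN="0" LEFTMARGIN="0">')
colors = rgb.findall(html)
print(colors)  # ['#336633', '#FFFFFF']
print(rgb.search("#12345"))  # None, because only five digits follow the #
```

In engines that do support POSIX classes, the last pattern could presumably be written as #[[:xdigit:]]{6}, exactly as in the text.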
Posted: 07/07/2014, 03:20