Figure 12.11. Splitting with multiple delimiters. Output from Example 12.10.
Example 12.11.
<?php
1 $alpha="SAN FRANCISCO";
2 $array=preg_split("//", $alpha, -1, PREG_SPLIT_NO_EMPTY);
echo "<h2>Splitting A Word into Letters</h2>";
3 print_r($array);
?>
Explanation
1  This is the string that will be split up.
2  By using an empty delimiter, preg_split() will split up the string by its individual characters. The PREG_SPLIT_NO_EMPTY flag causes the function to return an array without any empty elements.
3  The array of letters created by splitting on an empty delimiter is displayed as an array by the print_r() function, shown in Figure 12.12.
Figure 12.12. Splitting up a word with the preg_split() function. Output from Example 12.11.
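To see why Example 12.11 passes PREG_SPLIT_NO_EMPTY, it can help to run the same split with and without the flag. The following is a minimal sketch, not part of the original example; the variable names and the shorter sample word are chosen here for illustration.

```php
<?php
// Split a short word into letters using an empty pattern.
$word = "PHP";

// Without the flag, the empty pattern also matches at the very start and
// the very end of the string, so the result begins and ends with "".
$withEmpties = preg_split("//", $word);

// PREG_SPLIT_NO_EMPTY drops those empty strings from the result.
$lettersOnly = preg_split("//", $word, -1, PREG_SPLIT_NO_EMPTY);

print_r($withEmpties);  // 5 elements: "", "P", "H", "P", ""
print_r($lettersOnly);  // 3 elements: "P", "H", "P"
```

For plain single-byte strings, the built-in str_split() function gives the same list of letters without a regular expression.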
Example 12.12.
<?php
1 $alpha="PORT OF SAN FRANCISCO";
2 $array=preg_split("/\s/", $alpha, -1,
PREG_SPLIT_OFFSET_CAPTURE);
echo "<h2>Splitting A Word into Letters</h2>";
print_r($array);
?>
Explanation
1  This is the string we will be splitting on line 2.
2  The preg_split() function takes a number of arguments. In this example, the first argument is the delimiter. \s represents a whitespace character. The second argument is the string that is being split, $alpha. The third argument (normally omitted) is -1, stating that there is no limit to the number of array elements that can be created when splitting up this string. The PREG_SPLIT_OFFSET_CAPTURE flag says that for every array element created, the offset of where it occurred within the string will also be returned. You can see in the output of this example (Figure 12.13) that each substring is an array element, and its offset within the string is another array consisting of two elements, the array element (substring) and the offset position of where that substring was found in the original string.
Figure 12.13. Splitting up a string with the preg_split() function. Output from Example 12.12.
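The nested arrays returned with PREG_SPLIT_OFFSET_CAPTURE are easier to read when each pair is unpacked in a loop. This is a small sketch built on the string from Example 12.12; the loop, the variable names, and the printf() format are assumptions added for illustration, and the array destructuring in foreach requires PHP 7.1 or later.

```php
<?php
$alpha = "PORT OF SAN FRANCISCO";

// Each element of $pairs is a two-element array: [substring, offset].
$pairs = preg_split('/\s/', $alpha, -1, PREG_SPLIT_OFFSET_CAPTURE);

// Offsets found: PORT at 0, OF at 5, SAN at 8, FRANCISCO at 12.
foreach ($pairs as [$substring, $offset]) {
    printf("%-10s starts at offset %d\n", $substring, $offset);
}
```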
Other related PHP functions are spliti(), split(), implode(), and explode(). Note that split() and spliti() were deprecated in PHP 5.3 and removed in PHP 7. See Chapter 8, “Arrays,” for more on these.
The preg_grep() Function
Similar to the UNIX grep command, the preg_grep() function searches for a pattern, but it searches the elements of an array rather than a single string, and returns an array of the elements that match. You can also invert the search and get an array of all elements that do not contain the pattern (like UNIX grep -v) by using the PREG_GREP_INVERT flag.
Format
array preg_grep ( string pattern, array input [, int flags] )
Example:
$new_array = preg_grep("/ma/", array("normal", "mama", "man", "plan"));
// $new_array contains: normal, mama, man
$new_array = preg_grep("/ma/", array("normal", "mama", "man", "plan"), PREG_GREP_INVERT);
// $new_array contains: plan
Example 12.13.
<html><head><title>The preg_grep() Function</title></head>
<body bgcolor="lavender">
<font face="verdana" >
<b>
<h2>The preg_grep() Function</h2>
<font size="+1">
<pre>
<?php
1 $regex="/Pat/";
2 $search_array=array("Margaret","Patsy", "Patrick",
"Patricia", "Jim");
sort($search_array);
3 $newarray=preg_grep( $regex, $search_array );
4 print "Found ". count($newarray). " matches\n";
5 print_r($newarray);
6 $newarray=preg_grep($regex,$search_array,
PREG_GREP_INVERT);
print "Found ". count($newarray). " that didn't match\n";
print_r($newarray);
?>
</pre>
</font>
</b>
</font>
</body>
</html>
Explanation
1  The variable $regex is assigned the regular expression, /Pat/, that will be used later by preg_grep() as the search pattern.
2  This array will be used as the subject for the search with the preg_grep() function.
3  After the array has been sorted, the preg_grep() function will search for the pattern, /Pat/, in each element of the array, and return and assign the matched array elements to another array called $newarray.
4  The count() function returns the number of elements in the new array; that is, the number of elements where the pattern /Pat/ was found.
5  The found elements are displayed. Note that the index values have been preserved.
6  When the PREG_GREP_INVERT flag is specified, the preg_grep() function will return the elements of the original array that do not match the pattern, as shown in the output in Figure 12.14.
Figure 12.14. The preg_grep() function. Output from Example 12.13.
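Because preg_grep() preserves the original index values, the returned array may not start at 0. The sketch below reuses the names from Example 12.13 and shows one way to renumber the result with array_values(); the renumbering step and the variable names are additions made here for illustration, not part of the original example.

```php
<?php
$names = ["Margaret", "Patsy", "Patrick", "Patricia", "Jim"];
sort($names);  // Jim, Margaret, Patricia, Patrick, Patsy

// Matching elements keep their keys from the sorted array: 2, 3, and 4.
$matches = preg_grep("/Pat/", $names);

// The inverted search keeps the remaining keys: 0 and 1.
$others = preg_grep("/Pat/", $names, PREG_GREP_INVERT);

// array_values() renumbers from 0 when the original positions do not matter.
$reindexed = array_values($matches);

print_r($matches);
print_r($reindexed);
```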
12.2.3. Getting Control—The RegEx Metacharacters
Regular expression metacharacters are characters that do not represent themselves. They are endowed with special
powers to allow you to control the search pattern in some way (e.g., finding a pattern only at the beginning of the line,
or at the end of the line, or if it starts with an upper- or lowercase letter). Metacharacters will lose their special meaning
if preceded with a backslash. For example, the dot metacharacter represents any single character, but when preceded
with a backslash is just a dot or period.
If you see a backslash preceding a metacharacter, the backslash turns off the meaning of the metacharacter, but if you
see a backslash preceding an alphanumeric character in a regular expression, then the backslash is used to create a
metasymbol. A metasymbol provides a simpler way to write some of the regular expression metacharacters. For
example, [0-9] represents a single digit in the range between 0 and 9, and \d represents the same thing. [0-9] uses the
bracketed character class, whereas \d is a metasymbol (see Table 12.6).
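The two ideas in the preceding paragraphs, turning a metacharacter off with a backslash and writing a character class as a metasymbol, can be checked with a few calls. This is a minimal sketch with made-up sample strings; it assumes plain ASCII text and patterns without the u modifier, where \d and [0-9] select the same characters.

```php
<?php
// An unescaped dot matches any single character; an escaped dot is literal.
$anyChar    = preg_match('/3.14/', "3x14");   // 1: the dot matched "x"
$literalDot = preg_match('/3\.14/', "3x14");  // 0: "x" is not a period

// The metasymbol \d and the bracketed class [0-9] replace the same digits.
$viaSymbol = preg_replace('/\d/', '#', "Room 42");
$viaClass  = preg_replace('/[0-9]/', '#', "Room 42");

var_dump($anyChar, $literalDot, $viaSymbol === $viaClass);
```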
Table 12.6. Metacharacters

Metacharacter    What It Matches

Single characters and digits (for more, see “Matching Single Characters and Digits” on page 524)
.                Matches any character except a newline.
[a-z0-9]         Matches any single character in a set.
[^a-z0-9]        Matches any single character not in a set.

Single characters and digits: metasymbols (for more, see “Metasymbols” on page 530)
\d               Matches one digit.
\D               Matches a nondigit, same as [^0-9].
\w               Matches an alphanumeric (word) character.
\W               Matches a nonalphanumeric (nonword) character.

Whitespace characters
\s               Matches a whitespace character: a space, tab, or newline.
\S               Matches a nonwhitespace character.
\n               Matches a newline.
\r               Matches a return.
\t               Matches a tab.
\f               Matches a form feed.
\0               Matches a null character.

Anchored characters (for more, see “Anchoring Metacharacters” on page 520)
\b               Matches a word boundary.
\B               Matches a nonword boundary.
^                Matches to beginning of line.
$                Matches to end of line.
\A               Matches the beginning of the string only.
\Z               Matches the end of the string or line.

Repeated characters (for more, see “Metacharacters to Repeat Pattern Matches” on page 533)
x?               Matches 0 or 1 occurrences of the letter x.
x*               Matches 0 or more occurrences of the letter x.
x+               Matches 1 or more occurrences of the letter x.
x{m,n}           Matches at least m occurrences of the letter x, and no more than n occurrences of the letter x.

Grouped characters (for more, see “Grouping or Clustering” on page 544)
(xyz)+           Matches one or more patterns of xyz (e.g., xyzxyzxyz).

Alternative characters (for more, see “Metacharacters for Alternation” on page 543)
was|were|will    Matches one of was, were, or will.

Remembered characters (for more, see “Remembering or Capturing” on page 545)
(string)         Used for backreferencing.
\1 or $1         Matches first set of parentheses.
\2 or $2         Matches second set of parentheses.
\3 or $3         Matches third set of parentheses.
(?:x)            Matches x but does not remember the match. These are called noncapturing parentheses.

Positive lookahead and lookbehind (for more, see “Positive Lookahead” on page 550 and “Positive Lookbehind” on page 552)
x(?=y)           Matches x only if x is followed by y. For example, /Jack(?=Sprat)/ matches Jack only if it is
                 followed by Sprat. /Jack(?=Sprat|Frost)/ matches Jack only if it is followed by Sprat or Frost.
                 Neither Sprat nor Frost is kept as part of what was matched.
x(?!y)           Matches x only if x is not followed by y. For example, /\d+(?!\.)/ matches one or more
                 numbers only if they are not followed by a decimal point.

The following regular expression contains metacharacters:
/^a...c/

The first metacharacter is a caret (^). The caret anchors the match to the beginning of the line. The period (.) matches
any single character, including a space. This expression contains three periods, representing any three characters. To
match a literal period, or any other metacharacter as itself, precede the character with a backslash to prevent
interpretation.
The expression reads: Search at the beginning of the line for a letter a, followed by any three single characters,
followed by a letter c. It will match, for example, abbbc, a123c, a   c, aAx3c, and so on, but only if those patterns are
found at the beginning of the line.
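The reading of this expression can be tried against a handful of strings. The sketch below is an illustration only; the candidate list is invented here, and preg_grep() is used simply as a convenient way to test several strings at once.

```php
<?php
// Candidate strings: the first four start with a, any three characters, then c.
$candidates = ["abbbc", "a123c", "a   c", "aAx3c", "xabbbc", "abc"];

// "xabbbc" fails because the a is not at the beginning;
// "abc" fails because only one character sits between a and c.
$matched = preg_grep('/^a...c/', $candidates);

print_r($matched);
```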
In the following examples, we perform pattern matches, searches, and replacements based on the data from a text file
called data.txt. In the PHP program, the file will be opened and, within a while loop, each line will be read. The
functions discussed in the previous section will be used to find patterns within each line of the file. The regular
expressions will contain metacharacters, described in Table 12.6.
Anchoring Metacharacters
Often it is necessary to find a pattern only if it is found at the beginning or end of a line, word, or string. The
“anchoring” metacharacters (see Table 12.7) are based on a position just to the left or to the right of the character that is
being matched. Anchors are technically called zero-width assertions because they correspond to positions, not actual
characters in a string; for example, /^abc/ means find abc at the beginning of the line, where the ^ represents a
position, not an actual character.
Table 12.7. Anchors (Assertions)

Metacharacter    What It Matches
^                Matches to beginning of line or beginning of string.
$                Matches to end of line or end of string.
\A               Matches the beginning of a string.
\b               Matches a word boundary.
\B               Matches a nonword boundary.
\Z               Matches the end of a string.

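Because anchors match positions rather than characters, their effect is easiest to see by comparing results on the same text. The following sketch uses invented sample strings. It also relies on the m pattern modifier, which is not introduced in this section: by default ^ matches only at the start of the whole string, and with m it also matches just after each newline.

```php
<?php
$text = "first line\nsecond line";

// By default, ^ anchors only at the start of the whole string.
$default = preg_match('/^second/', $text);       // 0

// With the m modifier, ^ also matches right after a newline.
$multiline = preg_match('/^second/m', $text);    // 1

// \A always means the start of the string, even with the m modifier.
$startOnly = preg_match('/\Asecond/m', $text);   // 0

// \b marks a word boundary, so love must stand alone as a word.
$wholeWord = preg_match('/\blove\b/', "gloves and love");  // 1
$inside    = preg_match('/\blove\b/', "gloves");           // 0

var_dump($default, $multiline, $startOnly, $wholeWord, $inside);
```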
Beginning-of-Line Anchor
The ^ metacharacter is called the beginning-of-line anchor. When it is the first character in the regular expression, it
matches a pattern only if the pattern is found at the beginning of a line or string.
Example 12.14.
(The file data.txt Contents)
Mama Bear 702
Steve Blenheim 100
Betty Boop 200
Igor Chevsky 300
Norma Cord 400
Jon DeLoach 500
Karen Evich 600
BB Kingson 803
(The PHP Program)
<?php
1 $fh=fopen("data.txt", "r");
2 while( ! feof($fh)){
3 $text = fgets($fh);
4 if (preg_match("/^B/", $text)){
5 echo "$text";
}
}
?>
(Output)
Betty Boop 200
BB Kingson 803
Explanation
1  The file data.txt is opened for reading.
2  As long as the end of file has not been reached, the while loop will continue to execute.
3  For each iteration of the loop, the fgets() function reads in a line of text.
4  The preg_match() function will return 1 (treated as TRUE) if a pattern consisting of a string beginning with a B is matched.
5  The lines that matched are printed.
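The read loop in this example can also be written so that the result of fgets() itself ends the loop, which avoids a possible extra pass when feof() has not yet reported the end of the file. The sketch below is one way to do that, not the original author's code; linesMatching() is a hypothetical helper named here for illustration, and the temporary file stands in for data.txt.

```php
<?php
/**
 * Return the lines of a file that match a pattern.
 * linesMatching() is a hypothetical helper, not a PHP built-in.
 */
function linesMatching(string $path, string $regex): array
{
    $fh = fopen($path, "r");
    if ($fh === false) {
        return [];  // the file could not be opened
    }
    $found = [];
    // fgets() returns false at end of file, so no separate feof() test is needed.
    while (($text = fgets($fh)) !== false) {
        if (preg_match($regex, $text) === 1) {
            $found[] = rtrim($text, "\r\n");  // store the line without its line ending
        }
    }
    fclose($fh);
    return $found;
}

// Usage with a temporary file holding a few of the sample lines.
$path = tempnam(sys_get_temp_dir(), "data");
file_put_contents($path, "Mama Bear 702\nSteve Blenheim 100\nBetty Boop 200\nBB Kingson 803\n");
$bLines = linesMatching($path, "/^B/");
unlink($path);

print_r($bLines);
```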
End-of-Line Anchor
The end-of-line anchor, a dollar sign, is used to indicate the ending position in a line. The dollar sign must be the last
character in the pattern, just before the closing forward slash delimiter of the regular expression, or it no longer means
“end-of-line anchor.” [1]
[1] If you move files between Windows and UNIX, the end-of-line anchor might not work, because Windows text files end
each line with a carriage return before the newline. You can use programs such as dos2unix to address this problem.
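If converting the file is not an option, the line ending can also be handled inside the script. The sketch below shows two possible approaches on a line that ends with a carriage return and a newline; the sample line and variable names are assumptions made for illustration. With the usual PCRE setting, where only \n counts as a newline, the plain pattern /0$/ would typically not match such a line because the carriage return sits between the 0 and the end.

```php
<?php
// A line as fgets() might return it from a file with Windows line endings.
$windowsLine = "Steve Blenheim 100\r\n";

// Option 1: strip the line ending before matching.
$afterTrim = preg_match('/0$/', rtrim($windowsLine, "\r\n"));  // 1

// Option 2: allow an optional carriage return just before the end.
$tolerant = preg_match('/0\r?$/', $windowsLine);               // 1

var_dump($afterTrim, $tolerant);
```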
Example 12.15.
(The File data.txt Contents)
Mama Bear 702
Steve Blenheim 100
Betty Boop 200
Igor Chevsky 300
Norma Cord 400
Jon DeLoach 500
Karen Evich 600
BB Kingson 803
(The PHP Program)
<?php
1 $fh=fopen("data.txt", "r");
2 while( ! feof($fh)){
3 $text = fgets($fh);
4 if (preg_match("/0$/", $text)){
5 echo "$text";
}
}
?>
(Output)
Steve Blenheim 100
[...]

... in parentheses and prepended with ?=. The text in the lookahead is not captured as in the previous examples, but is only used as criteria for the search. For example, the regular expression "/Bob (?= Black|Jones)/" says search for Bob and look ahead to see if either Black or Jones is next, and if so, there is a match. The parentheses will not capture and create $1, and the values Black and Jones will ...

... provides additional metasymbols to represent a character class. The symbols \d and \D represent a single digit and a single nondigit, respectively (the same as [0-9] and [^0-9]); \w and \W represent a single word character and a single nonword character, respectively (the same as [A-Za-z_0-9] and [^A-Za-z_0-9]). If you are searching for a particular character within a regular expression, you can use the ...

... character class can also be represented as a range of characters by placing a dash between two characters, the first being the start of the range and the second the end of the range; for example, [0-9] represents one character in the range between 0 and 9 and [A-Za-z0-9] represents one alphanumeric character. If you want to represent a range between 10 and 13, the regular expression would be /1[0-3]/, not /[10-13]/ ...

... pattern that begins with an uppercase letter, followed by zero or more lowercase letters, a space, another uppercase letter, followed by zero or more lowercase letters (only lowercase), and a space. Because the last name DeLoach contains an uppercase letter D, followed by both upper- and lowercase letters, this line is not a match. The first ...

... uppercase letter, followed by zero or more lowercase letters, a space, another uppercase letter, followed by zero or more upper- and lowercase letters. In the previous example, DeLoach did not match because the last name contained a mix of upper- and lowercase letters. That problem was addressed in this example, by including [a-zA-Z] in the ...

... amount of occurrences; two numbers separated by a comma (e.g., {3,10}) represent an inclusive range; and a number followed by a comma (e.g., {4,}) represents a number of characters and any amount after that.

Table 12.11. Repeating Characters
Metacharacter    What It Does
a{10}            Matches exactly 10 occurrences of the letter a
a{3,5}           Matches between 3 and 5 occurrences of the letter a
a{6,}            ...

... preg_replace(), parenthesized patterns can be backreferenced by using a backslash and the number of the pattern; for example, the first parenthesized pattern is referenced as \1, the second as \2, the third as \3, up to \9. If enclosed in double quotes, the backreferences are referenced as \\1, \\2, \\3, and so on. Newer versions of PHP use $1, $2, $3, and so on, rather than backslashes without limit on the ...

... love, and would match glove, clove, or love, but not clover. /\blove\b/ matches a word beginning and ending with the pattern love, and would match only the word love.

Example 12.16.
(The File data.txt Contents)
Mama Bear 702
Steve Blenheim 100
Betty Boop 200
Igor Chevsky 300
Norma Cord 400
Jon DeLoach 500
Karen Evich 600
BB Kingson 803
(The PHP Script)
<?php ...

... or A to Z, and [^0-9] matches a single character not in the range between 0 and 9 (see Table 12.8).

Table 12.8. Character Classes
Metacharacter    What It Matches
[abc]            Matches an a or b or c
[a-z0-9_]        Matches any single character in a set
[^a-z0-9_]       Matches any single character not in a set

PHP provides additional metasymbols to represent a character class. The symbols \d and \D represent ...

... O’Reilley?

The + Metacharacter and Greed
The + metacharacter attaches itself to the preceding character and matches on one or more of that character.

Example 12.30.
(The File data.txt Contents)
Mama Bear 702
Steve Blenheim 100
Betty Boop 200
Igor Chevsky 300
Norma Cord 400
Jon DeLoach 500
Karen Evich 600
BB Kingson 803
(The PHP Script)
<?php
1 $fh=fopen("data.txt", ...