6. Click the Find First icon and then click the Find Next icon twice, observing each time which character sequence is or is not matched. Figure 8-7 shows the appearance after the Find Next icon has been clicked twice. With the modification of the regular expression, all three occurrences of the character sequence Andrew now match.

Figure 8-7

7. Now that you know that each occurrence of the relevant string matches, you can modify the regular expression to create two groups, between which you can insert the apostrophe that makes Andrew's possessive. Modify the regular expression in the Match tab to (Andrew)(s)(?=\b).
8. Using the Find First and Find Next icons, confirm that all three occurrences of the character sequence Andrews match the slightly modified pattern.
9. Click the Replace tab. In the lower pane on the Replace tab, type $1'$2.
10. On the Test tab, click the Replace All icon, and inspect the results in the lower pane on the Test tab. (You may need to adjust the window size to see all the results.) Figure 8-8 shows the appearance after this step.

207 Lookahead and Lookbehind 11_574892 ch08.qxd 1/7/05 11:02 PM Page 207

Figure 8-8

How It Works

The pattern Andrew((?=s )|(?=s\.)) matches the character sequence Andrew followed either by s and a space character or by s and a period character. In the first line, the character sequence Andrew is followed by s and a space character, so it satisfies the first lookahead constraint. Because Andrew is matched and the lookahead constraint is satisfied, there is a match for the whole regular expression. In the second line, the character sequence Andrew is followed by s and a period character, so the second lookahead constraint is satisfied. On the third line, the character sequence Andrew is followed by an s and then a question mark. Because the question mark appears in neither lookahead, no lookahead constraint is satisfied and there is no match on that line.
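The behavior described above can be reproduced outside RegexBuddy. The following is a minimal Python sketch, not the chapter's own code. The three sample lines are hypothetical stand-ins for the test file, chosen so that Andrews is followed by a space, a period, and a question mark in turn. The first pattern assumes that the second alternative tests for an s followed by a literal period (written s\. here), which is what the description of the second and third lines implies. The names NARROW, BROAD, GROUPED, matching_lines, and add_apostrophes are illustrative only.

```python
import re

# Hypothetical sample lines (the chapter's test file is not reproduced here):
# "Andrews" is followed by a space, a period, and a question mark in turn.
LINES = [
    "This is Andrews book.",
    "This book is Andrews.",
    "Is this book Andrews?",
]

# Assumed form of the first pattern: s + space, or s + literal period.
NARROW = re.compile(r"Andrew((?=s )|(?=s\.))")
# Modified pattern: s followed by a word boundary.
BROAD = re.compile(r"Andrew(?=s\b)")
# Two capturing groups; the lookahead itself captures nothing.
GROUPED = re.compile(r"(Andrew)(s)(?=\b)")


def matching_lines(pattern):
    """Return the 1-based numbers of the sample lines the pattern matches."""
    return [n for n, line in enumerate(LINES, start=1) if pattern.search(line)]


def add_apostrophes(text):
    """Insert an apostrophe between the two captured groups."""
    # Python writes backreferences as \1 and \2 rather than $1 and $2.
    return GROUPED.sub(r"\1'\2", text)


print(matching_lines(NARROW))   # lines matched by the first pattern
print(matching_lines(BROAD))    # lines matched by the word-boundary pattern
print([add_apostrophes(line) for line in LINES])
```

In every case the matched text is only Andrew, because a lookahead tests the following characters without consuming them.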
When the regular expression pattern is changed to Andrew(?=s\b), the lookahead constraint applied once Andrew is matched is an s followed by a word boundary. On all three lines there is a word boundary between the s of Andrews and the character that follows it. In Line 1, there is a word boundary before the space character. In Line 2, there is a word boundary before the period character. In Line 3, there is a word boundary before the question mark. So each occurrence of Andrew matches.

When the regular expression is modified to (Andrew)(s)(?=\b), you capture the character sequence Andrew in $1 and capture the s in $2. The lookahead does not capture any characters. So, to insert an apostrophe, you want $1 (Andrew) to be followed by an apostrophe, which is followed in turn by $2 (a lowercase s).

Lookbehind

Lookbehind tests whether a sequence of characters that is matched is preceded (positive lookbehind) or not preceded (negative lookbehind) by another sequence of characters.

For example, if you wanted to match the surname Jekyll only if it is preceded by the sequence of characters Dr. (an uppercase D, a lowercase r, a period, and a space character), you would use a pattern like this:

(?<=Dr. )Jekyll

The component (?<=Dr. ) indicates the sequence of characters that is tested for as a lookbehind, and the component Jekyll matches literally.

Positive Lookbehind

A positive lookbehind is a constraint on matching. Matching occurs only if the pattern to be matched is preceded by the pattern contained in the lookbehind assertion.

Try It Out Positive Lookbehind

1. Open the Komodo Regular Expression Toolkit, and delete any residual regular expression and sample text.
2. In the Enter a String to Match Against area, enter the test text, Mr. Hyde and Dr. Jekyll are characters in a famous novel.
3. In the Enter a Regular Expression area, enter the pattern (?<=Dr. )Jekyll.
4. Inspect the highlighted text in the String to Match Against area and the description of the results in the gray area below, Match succeeded: 0 groups. Figure 8-9 shows the appearance. Notice that the sequence of characters Jekyll is highlighted.
5. Edit the regular expression pattern to read (?<=Mr. )Jekyll.
6. Inspect the description of the results in the gray area, No matches found.
7. Edit the regular expression pattern to read ((?<=Mr. )|(?<=Mister ))Hyde. Ensure that there is a space character after the r of Mister. If that is omitted, there will be no match.

Figure 8-9

8. Inspect the description of the results in the gray area, Match succeeded: 1 group. Also notice that the character sequence Hyde is highlighted.
9. Edit the Mr. in the test text to read Mister.
10. Inspect the gray area again. Again, the description is Match succeeded: 1 group. Figure 8-10 shows the appearance.

Figure 8-10

How It Works

The following description of how the regular expression engine operates is a conceptual one and may not reflect the approach taken by any individual regular expression engine.

The text matched is the sequence of characters Jekyll.

Matching starts at the beginning of the test text. The character following the regular expression's position is checked to see whether it is an uppercase J. If so, that is matched, and an attempt is made to match the other characters making up the sequence of characters Jekyll. If any attempt to match fails, the whole pattern fails at that position, and the regular expression engine moves forward through the text, attempting again to match the character sequence Jekyll.

If a match is found for the character sequence Jekyll, the regular expression engine is at the position immediately before the J of Jekyll. It checks that the immediately preceding character is a space character. If so, it then tests whether the character before that is a period character. If so, it tests whether the character before that is a lowercase r. Finally, it tests whether the character before that is an uppercase D. Because matching of Jekyll was successful, and the constraint that the character sequence Jekyll be preceded by the character sequence Dr. (including a space character) was satisfied, the whole regular expression succeeds.

When you edit the pattern to read (?<=Mr. )Jekyll, the character sequence Jekyll is successfully matched as before. However, when the regular expression engine checks the characters that precede that character sequence, the constraint fails. Reading backward, the space character, the period character, and the lowercase r are all present, but the character before them is an uppercase D, not the uppercase M that the lookbehind requires. Because the lookbehind constraint is not satisfied, there is no match.

It is possible to express alternatives in lookbehind. The problem definition might read as follows:

Match the character sequence Hyde if it is preceded by EITHER the character sequence Mr. (including a final space character) OR by the character sequence Mister (including a final space character).

After changing the pattern to read ((?<=Mr. )|(?<=Mister ))Hyde, the regular expression engine attempts to match the character sequence Hyde. When it reaches the position immediately before the H of Hyde, it will successfully match that character sequence. It then must also satisfy the constraint on the sequence of characters that precedes Hyde.

The pattern ((?<=Mr. )|(?<=Mister ))Hyde uses parentheses to group two alternative patterns that must precede Hyde. The first option, specified by the pattern (?<=Mr. ), requires that the sequence of four characters M, r, a period, and a space character must precede Hyde. At Step 8, that four-character sequence matches. After the edit has been made to the test text, replacing Mr. with Mister, the other alternative comes into play.
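The positive lookbehind steps above can be checked with a short Python sketch. This is an illustration under stated assumptions, not the Komodo toolkit's behavior: it uses Python's re module, the test text from Step 2, and the patterns exactly as given in the steps (including the unescaped period, which strictly speaking matches any character). The helper name first_match and the variable names are hypothetical.

```python
import re

# Test text from Step 2 of the Try It Out.
TEXT = "Mr. Hyde and Dr. Jekyll are characters in a famous novel."

# Pattern from Step 7: two alternative lookbehinds grouped by parentheses.
EITHER = r"((?<=Mr. )|(?<=Mister ))Hyde"


def first_match(pattern, text=TEXT):
    """Return (matched text, start index) for the first match, or None."""
    m = re.search(pattern, text)
    return (m.group(0), m.start()) if m else None


# Jekyll is preceded by "Dr. ", so the lookbehind is satisfied.
print(first_match(r"(?<=Dr. )Jekyll"))
# Jekyll is not preceded by "Mr. ", so there is no match.
print(first_match(r"(?<=Mr. )Jekyll"))
# Hyde is preceded by "Mr. " in the original text ...
print(first_match(EITHER))
# ... and by "Mister " after the edit made in Step 9.
print(first_match(EITHER, TEXT.replace("Mr.", "Mister")))
```

The matched text never includes the title, because a lookbehind tests the preceding characters without making them part of the match. In Python's re module each lookbehind must match text of a fixed length, which the two separate lookbehinds here do (four and seven characters respectively).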
The pattern (?<=Mister ) requires that a seven-character sequence (Mister plus a space character) precedes Hyde.

The positioning of the lookbehind assertion is important, as you will see in the next example.

Try It Out Positioning of Positive Lookbehind

1. Open RegexBuddy, click the Match tab, and enter the regular expression (?<=like )SQL Server.
2. Click the Test tab, click the Open File icon, and open the Databases.txt file.
3. Click the Find First icon, and inspect the highlighted text in the pane in the Test tab, as shown in Figure 8-11.

Figure 8-11

4. Edit the regular expression in the Match tab so that it reads SQL Server(?<=like ).
5. Click the Find First icon in the Test tab. Confirm that there is now no highlighted text.
6. Edit the regular expression in the Match tab so that it reads SQL Server(?<=like SQL Server).
7. Click the Find First icon in the Test tab. Confirm that there is again a match in the test text, as shown in Figure 8-12.

How It Works

When the pattern is (?<=like )SQL Server, the lookbehind looks behind, starting from the position immediately before the S of SQL. Because the character sequence like SQL Server exists in the test text, there is a match.

When the pattern is SQL Server(?<=like ), the lookbehind starts from the position after the r of Server. Because that position is preceded by Server, not like, and the lookbehind is attempting to match the character sequence like followed by a space character, there is no match.

When the pattern is SQL Server(?<=like SQL Server), the lookbehind again starts from the position after the r of Server, but it now tests for the whole character sequence like SQL Server, which does precede that position, so there is a match.

Figure 8-12

Negative Lookbehind

Negative lookbehind is a constraint on matching. Matching occurs only if the pattern to be matched is not preceded by the pattern contained in the lookbehind assertion.

Try It Out Negative Lookbehind

Find occurrences of the character sequence SQL Server that are not preceded by the character sequence like followed by a space character.

1. Open RegexBuddy, click the Match tab, and enter the regular expression (?<!like )SQL Server.
2. Click the Test tab, click the Open File icon, and open the Databases.txt file.
3. Click the Find First icon, and inspect the highlighted text in the pane in the Test tab, as shown in Figure 8-13.
4. Look for other matches by clicking the Find Next icon several times. Note which occurrences of SQL Server match or don't match.

Figure 8-13

How It Works

When the regular expression engine matches the character sequence SQL Server, it checks whether the preceding characters correspond to the pattern specified in the lookbehind. The first occurrence of SQL Server is not preceded by the character sequence like followed by a space character. The negative lookbehind is, therefore, satisfied. Because the character sequence SQL Server matches and the negative lookbehind constraint is satisfied, the whole regular expression matches.

The only occurrence of the character sequence SQL Server that fails to match is the occurrence preceded by the word like. Because that occurrence is preceded by the character sequence like followed by a space character, it does not satisfy the constraint imposed by the negative lookbehind. Therefore, although the character sequence SQL Server matches, the failure to satisfy the lookbehind constraint means that the whole regular expression fails to match.

How to Match Positions

By combining lookahead and lookbehind, it is possible to match positions between characters. For example, suppose that you wanted to match a position immediately before the Andrew of the following sample text:

This is Andrews book.

You could state the problem definition as follows:

Match a position that is preceded by the character sequence is, followed by a space character, and that is followed by the character sequence Andrew.
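As a minimal Python sketch of this problem definition, assume that the preceded-by condition is written as a lookbehind and the followed-by condition as a lookahead, placed side by side. Neither assertion consumes characters, so the match is zero-width: its start and end are the same index. The names POSITION, matched_positions, and insert_at_positions are illustrative, and the "|" marker is just a visible stand-in for the cursor position.

```python
import re

SAMPLE = "This is Andrews book."

# Lookbehind for "is " immediately followed by a lookahead for "Andrew".
POSITION = re.compile(r"(?<=is )(?=Andrew)")


def matched_positions(text):
    """Return the (start, end) span of every zero-width match."""
    return [m.span() for m in POSITION.finditer(text)]


def insert_at_positions(text, marker):
    """Insert marker at each matched position; no characters are replaced."""
    return POSITION.sub(marker, text)


print(matched_positions(SAMPLE))
print(insert_at_positions(SAMPLE, "|"))
```

Note that the lookbehind on its own is also satisfied just after the word This, which ends in the characters is followed by a space. The lookahead fails there, because the next characters are is Andrews rather than Andrew, so only the position before the A is matched.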
You could match that position using the following pattern:

(?<=is )(?=Andrew)

Try It Out Matching a Position

1. Open RegexBuddy. On the Match tab, type the regular expression pattern (?<=is )(?=Andrew). If you used RegexBuddy for the replace example earlier in this chapter, delete the replacement text on the Replace tab.
2. On the Test tab, enter the sample text This is Andrews book.
3. Click the Find First icon, and inspect the information in the lower pane of the Test tab, as shown in Figure 8-14. On-screen, you can see the cursor blinking at the position immediately before the initial A of Andrews.

Figure 8-14

[...]

... unanticipated comma

How Much Should the Regular Expressions Do?

Most of the examples earlier in this book use a range of tools with regular expression functionality to apply regular expressions. That's great when teaching regular expressions, but when you use regular expressions as a developer, you will typically be using regular expressions inside code written...

... this book, you have in all likelihood already discovered that regular expressions can be hard to write. Regular expressions can also be hard to read, whether they were written by somebody else or by you, even a short time ago. When regular expressions are used in a project over a period of time, you come face to face with a third truism: Regular expressions are hard to maintain. The purpose of this chapter...

... a period character

You can do that using the following pattern: \w*(? .

. replacement text is 1,234, which is what you want. The regular expression pattern works for four-digit numbers.

Figure 8-15

4. Edit the test. pattern.
9 Sensitivity and Specificity of Regular Expressions

This chapter discusses the issues of sensitivity and specificity of regular expression.

. specified command.

Figure 9-2

224 Chapter 9 12_574892 ch09.qxd 1/7/05 11:01 PM Page 224

As the figure shows, all the valid email addresses (which are on lines 4, 5, 9, and 10) are selected. This gives