
Thinking in Java 3rd Edition part 7 pdf




DOCUMENT INFORMATION

Basic information

Format
Number of pages: 119
File size: 570.71 KB

Contents

684 Thinking in Java www.BruceEckel.com

static method you’ll usually use getClass(). You don’t need to use the current class as the node identifier, but that’s the usual practice.

Once you create the node, it’s available for either loading or reading data. This example loads the node with various types of items, and then gets the keys(). These come back as a String[], which you might not expect if you’re used to keys() in the collections library. Here, they’re converted to a List, which is used to produce an Iterator for printing the keys and values.

Notice the second argument to get(). This is the default value, which is produced if there isn’t any entry for that key value. While iterating through a set of keys, you always know there’s an entry, so using null as the default is safe, but normally you’ll be fetching a named key, as in:

  prefs.getInt("Companions", 0);

In the normal case you’ll want to provide a reasonable default value. In fact, a typical idiom is seen in the lines:

  int usageCount = prefs.getInt("UsageCount", 0);
  usageCount++;
  prefs.putInt("UsageCount", usageCount);

This way, the first time you run the program the UsageCount will be zero, but on subsequent invocations it will be nonzero.

When you run PreferencesDemo.java you’ll see that the UsageCount does indeed increment every time you run the program, but where is the data stored? No local file appears after the program is run the first time. The Preferences API uses appropriate system resources to accomplish its task, and these vary depending on the OS. In Windows, the registry is used (since it’s already a hierarchy of nodes with key-value pairs). But the whole point is that the information is magically stored for you, so that you don’t have to worry about how it works from one system to another.

There’s more to the Preferences API than shown here. Consult the JDK documentation, which is fairly understandable, for further details.
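A minimal runnable sketch of the read-with-default, increment, write-back idiom described above. It is not PreferencesDemo.java itself: the class UsageCounter, its bump() helper, and the node name "hypotheticalUsageCounterDemo" are hypothetical, chosen only so the demo cannot collide with real settings, and the node is removed again at the end.

```java
import java.util.prefs.BackingStoreException;
import java.util.prefs.Preferences;

class UsageCounter {
    // Reads the stored count (0 when the key is absent), increments it,
    // and writes it back: the idiom described in the text.
    static int bump(Preferences prefs) {
        int usageCount = prefs.getInt("UsageCount", 0);
        usageCount++;
        prefs.putInt("UsageCount", usageCount);
        return usageCount;
    }

    public static void main(String[] args) throws BackingStoreException {
        // Hypothetical single-level node used only for this demo.
        Preferences prefs =
            Preferences.userRoot().node("hypotheticalUsageCounterDemo");
        prefs.remove("UsageCount");       // start from a clean slate
        System.out.println(bump(prefs));  // 1: no entry yet, so the default 0 was read
        System.out.println(bump(prefs));  // 2: the previously stored value was read
        prefs.removeNode();               // delete the demo node again
    }
}
```

Passing the Preferences node into bump() keeps the idiom separate from where the node lives, so the same helper works with userNodeForPackage() or any other node.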
Chapter 12: The Java I/O System 685

Regular expressions

To finish this chapter, we’ll look at regular expressions, which were added in JDK 1.4 but have been integral to standard Unix utilities like sed and awk, and languages like Python and Perl (some would argue that they are the predominant reason for Perl’s success). Technically these are string manipulation tools (previously delegated to the String, StringBuffer, and StringTokenizer classes in Java), but they are typically used in conjunction with I/O, so it’s not too far-fetched to include them here.[5]

Regular expressions are powerful and flexible text processing tools. They allow you to specify, programmatically, complex patterns of text that can be discovered in an input string. Once you discover these patterns, you can then react to them any way you want. Although the syntax of regular expressions can be intimidating at first, they provide a compact and dynamic language that can be employed to solve all sorts of string processing, matching and selection, editing, and verification problems in a completely general way.

[5] A chapter dedicated to strings will have to wait until the 4th edition. Mike Shea contributed to this section.

Creating regular expressions

You can begin learning regular expressions with a useful subset of the possible constructs. A complete list of constructs for building regular expressions can be found in the JavaDocs for the Pattern class in package java.util.regex.

Characters
  B            The specific character B
  \xhh         Character with hex value 0xhh
  \uhhhh       The Unicode character with hex representation 0xhhhh
  \t           Tab
  \n           Newline
  \r           Carriage return
  \f           Formfeed
  \e           Escape

The power of regular expressions begins to appear when defining character classes. Here are some typical ways to create character classes, and some predefined classes:

Character Classes
  .              Any character (by default, not a line terminator)
  [abc]          Any of the characters a, b, or c (same as a|b|c)
  [^abc]         Any character except a, b, and c (negation)
  [a-zA-Z]       Any character a through z or A through Z (range)
  [abc[hij]]     Any of a, b, c, h, i, j (same as a|b|c|h|i|j) (union)
  [a-z&&[hij]]   Either h, i, or j (intersection)
  \s             A whitespace character (space, tab, newline, vertical tab, formfeed, carriage return)
  \S             A non-whitespace character ([^\s])
  \d             A numeric digit [0-9]
  \D             A non-digit [^0-9]
  \w             A word character [a-zA-Z_0-9]
  \W             A non-word character [^\w]

If you have any experience with regular expressions in other languages, you’ll immediately notice a difference in the way backslashes are handled. In other languages, “\\” means “I want to insert a plain old (literal) backslash in the regular expression. Don’t give it any special meaning.” In Java, “\\” means “I’m inserting a regular expression backslash, so the following character has special meaning.” This is because a Java regular expression is written as an ordinary string literal, and the compiler consumes one level of backslashes before the regular expression engine ever sees the string. For example, if you want to indicate one or more word characters, your regular expression string will be “\\w+”. If you want to insert a literal backslash, you say “\\\\”. However, things like newlines and tabs just use a single backslash: “\n\t”.

What’s shown here is only a sampling; you’ll want to have the java.util.regex.Pattern JDK documentation page bookmarked or on your “Start” menu so you can easily access all the possible regular expression patterns.

Logical Operators
  XY       X followed by Y
  X|Y      X or Y
  (X)      A capturing group. You can refer to the ith captured group later in the expression with \i

Boundary Matchers
  ^        Beginning of a line
  $        End of a line
  \b       Word boundary
  \B       Non-word boundary
  \G       End of the previous match

By default, ^ and $ match only at the beginning and end of the entire input; they match at the start and end of every line only in MULTILINE mode, described later.

As an example, each of the following represents a valid regular expression, and all will successfully match the character sequence "Rudolph":

  Rudolph
  [rR]udolph
  [rR][aeiou][a-z]ol.*
  R.*

Quantifiers

A quantifier describes the way that a pattern absorbs input text:

• Greedy: Quantifiers are greedy unless otherwise altered. A greedy expression consumes as much of the input as it can while still allowing the overall pattern to match. A typical cause of problems is assuming that your pattern will only match the first possible group of characters, when it’s actually greedy and will keep going.
• Reluctant: Specified with a question mark. Matches the minimum number of characters necessary to satisfy the pattern. Also called lazy, minimal matching, non-greedy, or ungreedy.
• Possessive: At the time of writing, only available in Java (not in other languages), and more advanced, so you probably won’t use it right away. As a regular expression is applied to a string, it generates many states so that it can backtrack if the match fails. Possessive quantifiers do not keep those intermediate states, preventing backtracking. They can be used to prevent a regular expression from running away and also to make it execute more efficiently.

  Greedy     Reluctant    Possessive    Matches
  X?         X??          X?+           X, one or none
  X*         X*?          X*+           X, zero or more
  X+         X+?          X++           X, one or more
  X{n}       X{n}?        X{n}+         X, exactly n times
  X{n,}      X{n,}?       X{n,}+        X, at least n times
  X{n,m}     X{n,m}?      X{n,m}+       X, at least n but not more than m times

You should be very aware that the expression ‘X’ will often need to be surrounded in parentheses for it to work the way you desire.
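To see the three kinds of quantifier side by side, here is a small sketch that applies each one to the same input. The class QuantifierDemo, its firstMatch() helper, and the sample input "<b>bold</b>" are hypothetical, picked only because the trailing ‘>’ makes the difference visible; the last two calls show how parentheses change what a quantifier applies to.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Returns the text of the first match of regex in input, or null if none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "<b>bold</b>";
        // Greedy: .+ grabs everything, then backs off just enough for '>'.
        System.out.println(firstMatch("<.+>", input));    // <b>bold</b>
        // Reluctant: .+? takes as little as possible, stopping at the first '>'.
        System.out.println(firstMatch("<.+?>", input));   // <b>
        // Possessive: .++ grabs everything and never gives any back,
        // so the trailing '>' can never match.
        System.out.println(firstMatch("<.++>", input));   // null
        // The quantifier binds to 'c' alone unless the group is parenthesized.
        System.out.println(firstMatch("abc+", "abccc"));    // abccc
        System.out.println(firstMatch("(abc)+", "abcabc")); // abcabc
    }
}
```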
For example:

  abc+

might seem like it would match the sequence ‘abc’ one or more times, and if you apply it to the input string ‘abcabcabc’ you will in fact get three matches. However, the expression actually says “match ‘ab’ followed by one or more occurrences of ‘c’.” To match the entire string ‘abc’ one or more times, you must say:

  (abc)+

You can easily be fooled when using regular expressions; it’s a new language, on top of Java.

CharSequence

JDK 1.4 defines a new interface called CharSequence, which establishes a definition of a character sequence abstracted from the String or StringBuffer classes:

  interface CharSequence {
    char charAt(int i);
    int length();
    CharSequence subSequence(int start, int end);
    String toString();
  }

The String, StringBuffer, and CharBuffer classes have been modified to implement this new CharSequence interface. Many regular expression operations take CharSequence arguments.

Pattern and Matcher

As a first example, the following class can be used to test regular expressions against an input string. The first argument is the input string to match against, followed by one or more regular expressions to be applied to the input. Under Unix/Linux, the regular expressions must be quoted on the command line.

This program can be useful in testing regular expressions as you construct them to see that they produce your intended matching behavior.

  //: c12:TestRegularExpression.java
  // Allows you to easily try out regular expressions.
  // {Args: abcabcabcdefabc "abc+" "(abc)+" "(abc){2,}" }
  import java.util.regex.*;

  public class TestRegularExpression {
    public static void main(String[] args) {
      if(args.length < 2) {
        System.out.println("Usage:\n" +
          "java TestRegularExpression " +
          "characterSequence regularExpression+");
        System.exit(0);
      }
      System.out.println("Input: \"" + args[0] + "\"");
      for(int i = 1; i < args.length; i++) {
        System.out.println(
          "Regular expression: \"" + args[i] + "\"");
        Pattern p = Pattern.compile(args[i]);
        Matcher m = p.matcher(args[0]);
        while(m.find()) {
          System.out.println("Match \"" + m.group() +
            "\" at positions " +
            m.start() + "-" + (m.end() - 1));
        }
      }
    }
  } ///:~

Regular expressions are implemented in Java through the Pattern and Matcher classes in the package java.util.regex. A Pattern object represents a compiled version of a regular expression. The static compile( ) method compiles a regular expression string into a Pattern object. As seen above, you can use the matcher( ) method and the input string to produce a Matcher object from the compiled Pattern object. Pattern also has a static boolean matches(String regex, CharSequence input) method for quickly checking whether regex matches the entire input, and a split( ) method that produces an array of String that has been broken around matches of the regex.

A Matcher object is generated by calling Pattern.matcher( ) with the input string as an argument. The Matcher object is then used to access the results, using methods to evaluate the success or failure of different types of matches:

  boolean matches()
  boolean lookingAt()
  boolean find()
  boolean find(int start)

The matches( ) method is successful if the pattern matches the entire input string, while lookingAt( ) is successful if the pattern matches starting at the beginning of the input, even when it doesn’t consume the whole input.
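To make the difference between the three match methods concrete, here is a small sketch. The class MatchModes, its helpers, the COMMA pattern, and the sample inputs are hypothetical. It also shows that the static Pattern.matches() behaves like matches(), requiring the entire input to match, and uses split() on an assumed comma-separated input.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class MatchModes {
    // Hypothetical separator: a comma followed by optional whitespace.
    static final Pattern COMMA = Pattern.compile(",\\s*");

    // Reports how matches(), lookingAt(), and find() each treat the same
    // pattern/input pair; a fresh Matcher is created for each call.
    static String describe(String regex, String input) {
        Pattern p = Pattern.compile(regex);
        return "matches=" + p.matcher(input).matches()
            + " lookingAt=" + p.matcher(input).lookingAt()
            + " find=" + p.matcher(input).find();
    }

    // Breaks the input around every match of COMMA.
    static String[] splitOnCommas(String input) {
        return COMMA.split(input);
    }

    public static void main(String[] args) {
        System.out.println(describe("\\d+", "123"));
        // matches=true lookingAt=true find=true
        System.out.println(describe("\\d+", "123abc"));
        // matches=false lookingAt=true find=true
        System.out.println(describe("\\d+", "abc123"));
        // matches=false lookingAt=false find=true

        // The static Pattern.matches() also requires the WHOLE input to match.
        System.out.println(Pattern.matches("\\d+", "abc123")); // false

        System.out.println(Arrays.toString(splitOnCommas("a, b,c"))); // [a, b, c]
    }
}
```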
find()

Matcher.find( ) can be used to discover multiple pattern matches in the CharSequence to which it is applied. For example:

  //: c12:FindDemo.java
  import java.util.regex.*;
  import com.bruceeckel.simpletest.*;
  import java.util.*;

  public class FindDemo {
    private static Test monitor = new Test();
    public static void main(String[] args) {
      Matcher m = Pattern.compile("\\w+")
        .matcher("Evening is full of the linnet's wings");
      while(m.find())
        System.out.println(m.group());
      int i = 0;
      while(m.find(i)) {
        System.out.print(m.group() + " ");
        i++;
      }
      monitor.expect(new String[] {
        "Evening",
        "is",
        "full",
        "of",
        "the",
        "linnet",
        "s",
        "wings",
        "Evening vening ening ning ing ng g is is s full " +
        "full ull ll l of of f the the he e linnet linnet " +
        "innet nnet net et t s s wings wings ings ngs gs s "
      });
    }
  } ///:~

The pattern “\\w+” indicates “one or more word characters,” so it will simply split the input up into words. find( ) is like an iterator, moving forward through the input string. However, the second version of find( ) can be given an integer argument that tells it the character position for the beginning of the search; this version resets the search position to the value of the argument, as you can see from the output.

Groups

Groups are regular expressions set off by parentheses that can be called up later with their group number. Group zero indicates the whole expression match, group one is the first parenthesized group, etc. Thus in

  A(B(C))D

there are three groups: group 0 is ABCD, group 1 is BC, and group 2 is C.

The Matcher object has methods to give you information about groups:

public int groupCount( ) returns the number of groups in this matcher’s pattern. Group zero is not included in this count.
public String group( ) returns group zero (the entire match) from the previous match operation (find( ), for example).
public String group(int i) returns the given group number during the previous match operation. If the match was successful but the group specified failed to match any part of the input string, then null is returned.
public int start(int group) returns the start index of the group found in the previous match operation.
public int end(int group) returns the index of the last character, plus one, of the group found in the previous match operation.

Here’s an example of regular expression groups:

  //: c12:Groups.java
  import java.util.regex.*;
  import com.bruceeckel.simpletest.*;

  public class Groups {
    private static Test monitor = new Test();
    static public final String poem =
      "Twas brillig, and the slithy toves\n" +
      "Did gyre and gimble in the wabe.\n" +
      "All mimsy were the borogoves,\n" +
      "And the mome raths outgrabe.\n\n" +
      "Beware the Jabberwock, my son,\n" +
      "The jaws that bite, the claws that catch.\n" +
      "Beware the Jubjub bird, and shun\n" +
      "The frumious Bandersnatch.";
    public static void main(String[] args) {
      Matcher m =
        Pattern.compile("(?m)(\\S+)\\s+((\\S+)\\s+(\\S+))$")
          .matcher(poem);
      while(m.find()) {
        for(int j = 0; j <= m.groupCount(); j++)
          System.out.print("[" + m.group(j) + "]");
        System.out.println();
      }
      monitor.expect(new String[]{
        "[the slithy toves]" +
        "[the][slithy toves][slithy][toves]",
        "[in the wabe.][in][the wabe.][the][wabe.]",
        "[were the borogoves,]" +
        "[were][the borogoves,][the][borogoves,]",
        "[mome raths outgrabe.]" +
        "[mome][raths outgrabe.][raths][outgrabe.]",
        "[Jabberwock, my son,]" +
        "[Jabberwock,][my son,][my][son,]",
        "[claws that catch.]" +
        "[claws][that catch.][that][catch.]",
        "[bird, and shun][bird,][and shun][and][shun]",
        "[The frumious Bandersnatch.][The]" +
        "[frumious Bandersnatch.][frumious][Bandersnatch.]"
      });
    }
  } ///:~

The poem is the first part of Lewis Carroll’s “Jabberwocky,” from Through the Looking Glass.
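As a side note to Groups.java, the group methods listed above can also be exercised on a much smaller input, which makes the null result for an unmatched optional group and the [start, end) convention easy to see. The class GroupInfo, its KEY_VALUE pattern, and the "key=value#tag" input format are all hypothetical, invented only for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupInfo {
    // Group 1 = key, group 2 = value, group 3 = optional "#tag".
    static final Pattern KEY_VALUE = Pattern.compile("(\\w+)=(\\w+)(#\\w+)?");

    // Describes each match: the key with its [start(1), end(1)) span, the
    // value, and the tag (group(3) is null when the optional group didn't match).
    static List<String> describe(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = KEY_VALUE.matcher(input);
        while (m.find()) {
            out.add(m.group(1) + "[" + m.start(1) + "," + m.end(1) + ")"
                + " value=" + m.group(2)
                + " tag=" + m.group(3));
        }
        return out;
    }

    public static void main(String[] args) {
        // groupCount() does not include group zero.
        System.out.println(KEY_VALUE.matcher("").groupCount()); // 3
        for (String line : describe("size=10 color=red#note"))
            System.out.println(line);
        // size[0,4) value=10 tag=null
        // color[8,13) value=red tag=#note
    }
}
```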
In Groups.java, the regular expression pattern has a number of parenthesized groups built from runs of one or more non-whitespace characters (‘\S+’) separated by runs of one or more whitespace characters (‘\s+’). The goal is to capture the last three words on each line; the end of a line is delimited by ‘$’. However, the normal behavior is to match ‘$’ with the end of the entire input sequence, so we must explicitly tell the regular expression to pay attention to newlines within the input. This is accomplished with the ‘(?m)’ pattern flag at the beginning of the sequence (pattern flags will be shown shortly).

[...]

...starting with # are ignored until the end of a line. Unix lines mode can also be enabled via the embedded flag expression...

In dotall mode, the expression ‘.’ matches any character, including a line terminator. By default the ‘.’ expression does not match line terminators.

Pattern.MULTILINE (?m)   In multiline mode the expressions ‘^’ and ‘$’ match the beginning and ending...

[...]

..."java", "Java", "JAVA", etc., and attempt a match for each line within a multiline set (matches starting at the beginning of the character sequence and following each line terminator within the character sequence). Note that the group( ) method only produces the matched portion.

split()

Splitting divides an input string into an array of String objects, delimited by the regular expression.

  String[]...

[...]

    ...main(String[] args) {
      Pattern p = Pattern.compile("^java",
        Pattern.CASE_INSENSITIVE|Pattern.MULTILINE);
      Matcher m = p.matcher(
        "java has regex\nJava has regex\n" +
        "JAVA has pretty good regular expressions\n" +
        "Regular expressions are in Java");
      while(m.find())
        System.out.println(m.group());
      monitor.expect(new String[] {
        "java",
        "Java",
        "JAVA"
      });
    }
  } ///:~

This creates a pattern which will match lines...
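The fragment above compiles the pattern ^java with CASE_INSENSITIVE|MULTILINE. As a hedged sketch (FlagDemo, its field names, and findAll() are hypothetical), the same two flags can also be written inline as the embedded expression (?im), and compiling without any flags shows what each flag contributes:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FlagDemo {
    static final String INPUT =
        "java has regex\nJava has regex\n" +
        "JAVA has pretty good regular expressions\n" +
        "Regular expressions are in Java";

    // Flags passed to compile() as constants.
    static final Pattern VIA_CONSTANTS =
        Pattern.compile("^java", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
    // Same flags as an embedded expression: i = CASE_INSENSITIVE, m = MULTILINE.
    static final Pattern VIA_EMBEDDED = Pattern.compile("(?im)^java");
    // No flags: '^' matches only at the very start of the input, and case matters.
    static final Pattern NO_FLAGS = Pattern.compile("^java");

    // Collects the text of every match p finds in input.
    static List<String> findAll(Pattern p, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find())
            found.add(m.group());
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findAll(VIA_CONSTANTS, INPUT)); // [java, Java, JAVA]
        System.out.println(findAll(VIA_EMBEDDED, INPUT));  // [java, Java, JAVA]
        System.out.println(findAll(NO_FLAGS, INPUT));      // [java]
    }
}
```

The embedded form is handy when the pattern string arrives from outside the program (a configuration file or the command line, as with TestRegularExpression.java), because there is no compile() call in which to pass flag constants.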
[...]

    ...String[] {
      "input 0: Java has regular expressions in 1.4",
      "m1.find() 'regular' start = 9 end = 16",
      "m1.find() 'ressions' start = 20 end = 28",
      "m2.find() 'Java has regular expressions in 1.4'" +
      " start = 0 end = 35",
      "m2.lookingAt() start = 0 end = 35",
      "m2.matches() start = 0 end = 35",
      "input 1: regular expressions now " +
      "expressing in Java",
      "m1.find() 'regular' start = 0 end = 7",
      "m1.find()...

[...]

Exercises

Solutions to selected exercises can be found in the electronic document The Thinking in Java Annotated Solution Guide, available for a small fee from www.BruceEckel.com.

1. Open a text file so that you can read the file one line at a time. Read each line as a String and place that String object into a LinkedList. Print all of the lines in the LinkedList in reverse order.
2. Modify Exercise 1 so...

[...]

    // ...the beginning of each
    // line with no spaces. Must enable MULTILINE mode:
    s = s.replaceAll("(?m)^ +", "");
    System.out.println(s);
    s = s.replaceFirst("[aeiou]", "(VOWEL1)");
    StringBuffer sbuf = new StringBuffer();
    Pattern p = Pattern.compile("[aeiou]");
    Matcher m = p.matcher(s);
    // Process the find information as you
    // perform the replacements:
    while(m.find())...

[...]

  ///:~

With regular expressions you can also split a string into parts using more complex patterns, something that’s much more difficult with StringTokenizer. It seems safe to say that regular expressions replace any tokenizing classes in earlier versions of Java.

You can learn much more about regular expressions in Mastering Regular Expressions, 2nd Edition...

[...]

  ...charseq)
  String[] split(CharSequence charseq, int limit)

This is a quick and handy way of breaking up input text over a common boundary:

  //: c12:SplitDemo.java
  import java.util.regex.*;
  import com.bruceeckel.simpletest.*;
  import java.util.*;

  public class SplitDemo {
    private static Test monitor = new Test();
    public static void main(String[] args) {
      String input =...

[...]

    ...String toString() {
      return "#" + getName() + ": " + countDown;
    }
    public void run() {
      while(true) {
        System.out.println(this);
        if(--countDown == 0) return;
        try {
          sleep(100);
        } catch (InterruptedException e) {
          throw new RuntimeException(e);
        }
      }
    }
    public static void main(String[] args)
    throws InterruptedException {
      for(int i = 0; i < 5; i++)
        new SleepingThread().join();...

[...]

...beginning and ending of a line, respectively. ‘^’ also matches the beginning of the input string, and ‘$’ also matches the end of the input string. By default these expressions only match at the beginning and the end of the entire input string.

Pattern.UNICODE_CASE (?u)   When this flag is specified, case-insensitive matching, when enabled by the CASE_INSENSITIVE flag, is done in a manner consistent with...
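The replacement fragment earlier in this section calls replaceAll(), then replaceFirst(), then loops over find() while building a StringBuffer. Below is a minimal self-contained sketch of those three techniques. ReplaceDemo and its helper names are hypothetical, and upper-casing vowels is only an illustrative replacement; the elided body of the original loop is not reconstructed here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceDemo {
    // Removes leading spaces from every line; (?m) makes '^' match at each line start.
    static String stripIndent(String s) {
        return s.replaceAll("(?m)^ +", "");
    }

    // Replaces only the first vowel with a fixed marker.
    static String markFirstVowel(String s) {
        return s.replaceFirst("[aeiou]", "(VOWEL1)");
    }

    // Upper-cases every vowel: appendReplacement() copies the text before each
    // match plus a replacement computed from that match, and appendTail()
    // copies whatever follows the last match.
    static String upcaseVowels(String s) {
        Matcher m = Pattern.compile("[aeiou]").matcher(s);
        StringBuffer sbuf = new StringBuffer();
        while (m.find())
            m.appendReplacement(sbuf, m.group().toUpperCase());
        m.appendTail(sbuf);
        return sbuf.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripIndent("  one\n    two"));       // "one" and "two", no indent
        System.out.println(markFirstVowel("regular"));           // r(VOWEL1)gular
        System.out.println(upcaseVowels("regular expressions")); // rEgUlAr ExprEssIOns
    }
}
```

appendReplacement() is the right tool when each replacement depends on the matched text; replaceAll() and replaceFirst() only substitute a fixed string (plus $n group references).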

Posted: 14/08/2014, 00:21
