Java I/O
Elliotte Rusty Harold
Publisher: O'Reilly
First Edition March 1999
ISBN: 1-56592-485-1, 596 pages

All of Java's Input/Output (I/O) facilities are based on streams, which provide simple ways to read and write data of different types. Java™ I/O tells you all you need to know about the four main categories of streams and uncovers less-known features to help make your I/O operations more efficient. Plus, it shows you how to control number formatting, use characters aside from the standard ASCII character set, and get a head start on writing truly multilingual software.

Table of Contents

Preface
    Correcting Misconceptions
    Organization of the Book
    Who You Are
    Versions
    Security Issues
    Conventions Used in This Book
    Request for Comments
    Acknowledgments

I: Basic I/O

1. Introducing I/O
    1.1 What Is a Stream?
    1.2 Numeric Data
    1.3 Character Data
    1.4 Readers and Writers
    1.5 The Ubiquitous IOException
    1.6 The Console: System.out, System.in, and System.err
    1.7 Security Checks on I/O

2. Output Streams
    2.1 The OutputStream Class
    2.2 Writing Bytes to Output Streams
    2.3 Writing Arrays of Bytes
    2.4 Flushing and Closing Output Streams
    2.5 Subclassing OutputStream
    2.6 A Graphical User Interface for Output Streams

3. Input Streams
    3.1 The InputStream Class
    3.2 The read() Method
    3.3 Reading Chunks of Data from a Stream
    3.4 Counting the Available Bytes
    3.5 Skipping Bytes
    3.6 Closing Input Streams
    3.7 Marking and Resetting
    3.8 Subclassing InputStream
    3.9 An Efficient Stream Copier

II: Data Sources

4. File Streams
    4.1 Reading Files
    4.2 Writing Files
    4.3 File Viewer, Part 1

5. Network Streams
    5.1 URLs
    5.2 URL Connections
    5.3 Sockets
    5.4 Server Sockets
    5.5 URLViewer

III: Filter Streams

6. Filter Streams
    6.1 The Filter Stream Classes
    6.2 The Filter Stream Subclasses
    6.3 Buffered Streams
    6.4 PushbackInputStream
    6.5 Print Streams
    6.6 Multitarget Output Streams
    6.7 File Viewer, Part 2

7. Data Streams
    7.1 The Data Stream Classes
    7.2 Reading and Writing Integers
    7.3 Reading and Writing Floating-Point Numbers
    7.4 Reading and Writing Booleans
    7.5 Reading Byte Arrays
    7.6 Reading and Writing Text
    7.7 Miscellaneous Methods
    7.8 Reading and Writing Little-Endian Numbers
    7.9 Thread Safety
    7.10 File Viewer, Part 3

8. Streams in Memory
    8.1 Sequence Input Streams
    8.2 Byte Array Streams
    8.3 Communicating Between Threads with Piped Streams

9. Compressing Streams
    9.1 Inflaters and Deflaters
    9.2 Compressing and Decompressing Streams
    9.3 Working with Zip Files
    9.4 Checksums
    9.5 JAR Files
    9.6 File Viewer, Part 4

10. Cryptographic Streams
    10.1 Hash Function Basics
    10.2 The MessageDigest Class
    10.3 Digest Streams
    10.4 Encryption Basics
    10.5 The Cipher Class
    10.6 Cipher Streams
    10.7 File Viewer, Part 5

IV: Advanced and Miscellaneous Topics

11. Object Serialization
    11.1 Reading and Writing Objects
    11.2 Object Streams
    11.3 How Object Serialization Works
    11.4 Performance
    11.5 The Serializable Interface
    11.6 The ObjectInput and ObjectOutput Interfaces
    11.7 Versioning
    11.8 Customizing the Serialization Format
    11.9 Resolving Classes
    11.10 Resolving Objects
    11.11 Validation
    11.12 Sealed Objects

12. Working with Files
    12.1 Understanding Files
    12.2 Directories and Paths
    12.3 The File Class
    12.4 Filename Filters
    12.5 File Filters
    12.6 File Descriptors
    12.7 Random-Access Files
    12.8 General Techniques for Cross-Platform File Access Code

13. File Dialogs and Choosers
    13.1 File Dialogs
    13.2 JFileChooser
    13.3 File Viewer, Part 6

14. Multilingual Character Sets and Unicode
    14.1 Unicode
    14.2 Displaying Unicode Text
    14.3 Unicode Escapes
    14.4 UTF-8
    14.5 The char Data Type
    14.6 Other Encodings
    14.7 Converting Between Byte Arrays and Strings

15. Readers and Writers
    15.1 The java.io.Writer Class
    15.2 The OutputStreamWriter Class
    15.3 The java.io.Reader Class
    15.4 The InputStreamReader Class
    15.5 Character Array Readers and Writers
    15.6 String Readers and Writers
    15.7 Reading and Writing Files
    15.8 Buffered Readers and Writers
    15.9 Print Writers
    15.10 Piped Readers and Writers
    15.11 Filtered Readers and Writers
    15.12 File Viewer Finis

16. Formatted I/O with java.text
    16.1 The Old Way
    16.2 Choosing a Locale
    16.3 Number Formats
    16.4 Specifying Width with FieldPosition
    16.5 Parsing Input
    16.6 Decimal Formats
    16.7 An Exponential Number Format

17. The Java Communications API
    17.1 The Architecture of the Java Communications API
    17.2 Identifying Ports
    17.3 Communicating with a Device on a Port
    17.4 Serial Ports
    17.5 Parallel Ports

V: Appendixes

A. Additional Resources
    A.1 Digital Think
    A.2 Design Patterns
    A.3 The java.io Package
    A.4 Network Programming
    A.5 Data Compression
    A.6 Encryption and Related Technology
    A.7 Object Serialization
    A.8 International Character Sets and Unicode
    A.9 Java Communications API
    A.10 Updates and Breaking News

B. Character Sets

Colophon

Dedication

To Lynn, the best aunt a boy could ask for.

Preface

In many ways this book is a prequel to my previous book, Java Network Programming (O'Reilly & Associates). When writing that book, I more or less assumed that readers were familiar with basic input and output in Java™: that they knew how to use input streams and output streams, convert bytes to characters, connect filter streams to each other, and so forth.
However, after that book was published, I began to notice that a lot of the questions I got from readers of the book and students in my classes weren't so much about network programming itself as they were about input and output (I/O in programmer vernacular). When Java 1.1 was released with a vastly expanded java.io package and many new I/O classes spread out across the rest of the class library, it became obvious that a book that specifically addressed I/O was required. This is that book.

Java I/O endeavors to show you how to really use Java's I/O classes, allowing you to quickly and easily write programs that accomplish many common tasks. Some of these include:

• Reading and writing files
• Communicating over network connections
• Filtering data
• Interpreting a wide variety of formats for integer and floating-point numbers
• Passing data between threads
• Encrypting and decrypting data
• Calculating digital signatures for streams
• Compressing and decompressing data
• Writing objects to streams
• Copying, moving, renaming, and getting information about files and directories
• Letting users choose files from a GUI interface
• Reading and writing non-English text in a variety of character sets
• Formatting integer and floating-point numbers as strings
• Talking directly to modems and other serial port devices
• Talking directly to printers and other parallel port devices

Java is the first language to provide a cross-platform I/O library that is powerful enough to handle all these diverse tasks. Java I/O is the first book to fully expose the power and sophistication of this library.

Correcting Misconceptions

Java is the first programming language with a modern, object-oriented approach to input and output. Java's I/O model is more powerful and more suited to real-world tasks than any other major language used today. Surprisingly, however, I/O in Java has a bad reputation.
It is widely believed (falsely) that Java I/O can't handle basic tasks that are easily accomplished in other languages like C, C++, and Pascal. In particular, it is commonly said that:

• I/O is too complex for introductory students; or, more specifically, there's no good way to read a number from the console.
• Java can't handle basic formatting tasks like printing with three decimal digits of precision.

This book will show you that not only can Java handle these two tasks with relative ease and grace; it can do anything C and C++ can do, and a whole lot more. Java's I/O capabilities not only match those of classic languages like C and Pascal, they vastly surpass them.

The most common complaint about Java I/O among students, teachers, authors of textbooks, and posters to comp.lang.java is that there's no simple way to read a number from the console (System.in). Many otherwise excellent introductory Java books repeat this canard. Some textbooks go to great lengths to reproduce the behavior they're accustomed to from C or Pascal, apparently so teachers don't have to significantly rewrite the tired Pascal exercises they've been using for the last 20 years. However, new books that aren't committed to the old ways of doing things generally use command-line arguments for basic exercises, then rapidly introduce the graphical user interfaces any real program is going to use anyway. Apple wisely abandoned the command-line interface back in 1984, and the rest of the world is slowly catching up. [1] Although System.in and System.out are certainly convenient for teaching and debugging, in 1999 no completed, cross-platform program should even assume the existence of a console for either input or output.

The second common complaint about Java I/O is that it can't handle formatted output; that is, that there's no equivalent of printf() in Java.
In a very narrow sense, this is true because Java does not support the variable length argument lists a function like printf() requires. Nonetheless, a number of misguided souls (your author not least among them) have at one time or another embarked on futile efforts to reproduce printf() in Java. This may have been necessary in Java 1.0, but as of Java 1.1, it's no longer needed. The java.text package, discussed in Chapter 16, provides complete support for formatting numbers. Furthermore, the java.text package goes way beyond the limited capabilities of printf(). It supports not only different precisions and widths, but also internationalization, currency formats, percentages, grouping symbols, and a lot more. It can easily be extended to handle Roman numerals, scientific or exponential notation, or any other number format you may require.

The underlying flaw in most people's analysis of Java I/O is that they've confused input and output with the formatting and interpreting of data. Java is the first major language to cleanly separate the classes that read and write bytes (primarily, various kinds of input streams and output streams) from the classes that interpret this data. You often need to format strings without necessarily writing them on the console. You may also need to write large chunks of data without worrying about what they represent. Traditional languages that connect formatting and interpretation to I/O and hard-wire a few specific formats are extremely difficult to extend to other formats. In essence, you have to give up and start from scratch every time you want to process a new format.

Furthermore, C's printf(), fprintf(), and sprintf() family only really works well on Unix (where, not coincidentally, C was invented). On other platforms, the underlying assumption that every target may be treated as a file fails, and these standard library functions must be replaced by other functions from the host API.
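The two complaints discussed above (reading a number from the console, and printing with three decimal digits of precision) can both be addressed with java.text.NumberFormat. The following is only a minimal sketch, not code from the book: the class name ConsoleNumberSketch and the helper methods readNumber and formatThreeDecimals are hypothetical names chosen for illustration, and a StringReader stands in for the console so the example is repeatable.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

// Hypothetical illustration class; none of these names come from the book.
class ConsoleNumberSketch {

    // Reads one line of text from the reader and interprets it as a number.
    // Reading (BufferedReader) and interpreting (NumberFormat) are separate steps.
    static double readNumber(Reader source, Locale locale)
            throws IOException, ParseException {
        BufferedReader in = new BufferedReader(source);
        String line = in.readLine();
        if (line == null) {
            throw new IOException("end of stream before a number was read");
        }
        return NumberFormat.getNumberInstance(locale).parse(line.trim()).doubleValue();
    }

    // Formats a value with exactly three digits after the decimal separator.
    static String formatThreeDecimals(double value, Locale locale) {
        NumberFormat format = NumberFormat.getNumberInstance(locale);
        format.setMinimumFractionDigits(3);
        format.setMaximumFractionDigits(3);
        return format.format(value);
    }

    public static void main(String[] args) throws Exception {
        // For a real console, pass new InputStreamReader(System.in) instead.
        double value = readNumber(new StringReader("2.5\n"), Locale.US);
        System.out.println(formatThreeDecimals(value, Locale.US));   // prints 2.500
        System.out.println(formatThreeDecimals(Math.PI, Locale.US)); // prints 3.142
    }
}
```

Passing the Locale explicitly keeps the output predictable; with the platform default locale the decimal separator and grouping symbols may differ.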
Java's clean separation between formatting and I/O allows you to create new formatting classes without throwing away the I/O classes, and to write new I/O classes while still using the old formatting classes. Formatting and interpreting strings are fundamentally different operations from moving bytes from one device to another. Java is the first major language to recognize and take advantage of this.

[1] MacOS X will reportedly add a real command-line shell to the Mac for the first time ever. Mainly, this is because MacOS X has Unix at its heart. However, Apple at least has the good taste to hide the shell so it won't confuse end users and tempt developers away from the righteous path of graphical user interfaces.

Organization of the Book

This book has 17 chapters that are divided into four parts, plus two appendixes.

Part I: Basic I/O

Chapter 1

Chapter 1 introduces the basic architecture and design of the java.io package, including the reader/stream dichotomy. Some basic preliminaries about the int, byte, and char data types are discussed. The IOException thrown by many I/O methods is introduced. The console is introduced, along with some stern warnings about its proper use. Finally, I offer a cautionary message about how the security manager can interfere with most kinds of I/O, sometimes in unexpected ways.

Chapter 2

Chapter 2 teaches you the basic methods of the java.io.OutputStream class you need to write data onto any output stream. You'll learn about the three overloaded versions of write(), as well as flush() and close(). You'll see several examples, including a simple subclass of OutputStream that acts like /dev/null and a TextArea component that gets its data from an output stream.

Chapter 3

The third chapter introduces the basic methods of the java.io.InputStream class you need to read data from a variety of sources. You'll learn about the three overloaded variants of the read() method and when to use each.
You'll see how to skip over data and check how much data is available, as well as how to place a bookmark in an input stream, then reset back to that point. You'll learn how and why to close input streams. This will all be drawn together with a StreamCopier program that copies data read from an input stream onto an output stream. This program will be used repeatedly over the next several chapters.

Part II: Data Sources

Chapter 4

The majority of I/O involves reading or writing files. Chapter 4 introduces the FileInputStream and FileOutputStream classes, concrete subclasses of InputStream and OutputStream that let you read and write files. These classes have all the usual methods of their superclasses, such as read(), write(), available(), flush(), and so on. Also in this chapter, development of a File Viewer program commences. You'll see how to inspect the raw bytes in a file in both decimal and hexadecimal format. This example will be progressively expanded throughout the rest of the book.

[...]

... specific exception that subclasses IOException. (However, methods usually only declare that they throw an IOException.) Here are the subclasses of IOException that you'll find in java.io:

    CharConversionException
    EOFException
    FileNotFoundException
    InterruptedIOException
    InvalidClassException
    InvalidObjectException
    NotActiveException
    NotSerializableException
    ObjectStreamException
    OptionalDataException
    StreamCorruptedException
    SyncFailedException
    UTFDataFormatException
    UnsupportedEncodingException
    WriteAbortedException

There are a number of IOException subclasses scattered around the other packages, particularly java.util.zip (ZipException) and java.net (BindException, ConnectException, MalformedURLException, NoRouteToHostException, ProtocolException, SocketException, UnknownHostException,... and UnknownServiceException).

The java.io.IOException class declares no public methods or fields of significance, just the usual two constructors you find in most exception classes:

    public IOException()
    public IOException(String message)

The first constructor creates an IOException with no detail message. The second provides more details about what went wrong. Of course, IOException has the usual methods...

    ... throws IOException
    public void write(byte[] data) throws IOException
    public void write(byte[] data, int offset, int length) throws IOException

The java.io.Writer class, therefore, declares these three write() methods:

    public void write(int i) throws IOException
    public void write(char[] data) throws IOException
    public abstract void write(char[] data, int offset, int length) throws IOException

As you can...

... character. ASCII characters 0-31 and character 127 are nonprinting control characters. Characters 32-47 are various punctuation and space characters. Characters 48-57 are the digits 0-9. Characters 58-64 are another group of punctuation characters. Characters 65-90 are the capital letters A-Z. Characters 91-96 are a few more punctuation marks. Characters 97-122 are the lowercase letters a-z. Finally, characters...

... characters. ISO Latin-1 uses them for various accented letters like ü needed for non-English languages written in a Roman script, additional punctuation marks and symbols like ©, and additional control characters. The upper, non-ASCII half of the ISO Latin-1 character set is shown in Table B.2. Latin-1 provides enough characters to write most Western European languages (again with the notable exception of Greek)...
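The three write() signatures quoted above for output streams can be exercised against an in-memory stream. This is a rough sketch under stated assumptions rather than an example from the book: the class name WriteOverloadsSketch and the method writeThreeWays are hypothetical, and a ByteArrayOutputStream is used only because its contents are easy to inspect.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; the names are not taken from the book.
class WriteOverloadsSketch {

    // Calls each of the three write() overloads once and returns what was written.
    static byte[] writeThreeWays() throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        OutputStream out = buffer;

        // write(int): only the low-order 8 bits of the int are written.
        out.write('J');

        // write(byte[]): writes the whole array.
        byte[] word = {'a', 'v', 'a'};
        out.write(word);

        // write(byte[], int, int): writes 4 bytes starting at index 1.
        byte[] padded = {'*', ' ', 'I', '/', 'O', '*'};
        out.write(padded, 1, 4);

        out.flush();
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String text = new String(writeThreeWays(), StandardCharsets.US_ASCII);
        System.out.println(text); // prints Java I/O
    }
}
```

Declaring the variable as OutputStream rather than ByteArrayOutputStream shows that the same three calls work on any output stream, whatever its destination.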
... major concern. Java 2's I/O classes are mostly identical to those in Java 1.1, with one noticeable exception. Java 2 does a much better (though still imperfect) job of abstracting out platform-dependent filesystem idiosyncrasies than does Java 1.1. Some (though not all) of these improvements are also available to Java 1.1 programmers working with Swing. I'll discuss both the Java 1.1 and Java 2 approaches...

... instance, Sun documents the write() method in java.io.OutputStream like this:

    public void write(byte b[]) throws IOException
    public void write(byte b[], int off, int len) throws IOException

I've rewritten that in this more intelligible form:

    public void write(byte[] data) throws IOException
    public void write(byte[] data, int offset, int length) throws IOException

These are exactly equivalent, however. Method...

... of an unsigned byte (0-255). This does not match any Java primitive data type. These ints are then converted into bytes internally.

For instance, according to the javadoc class library documentation, the read() method of java.io.InputStream returns "the next byte of data, or -1 if the end of the stream is reached." On a little thought, this sounds suspicious. How is a -1 that appears as part...

... or write to the file. The java.io.File class attempts to provide a platform-independent abstraction for common file operations and meta-information. Unfortunately, this class really shows its Unix roots. It works fine on Unix, reasonably well on Windows (with a few caveats), and fails miserably on the Macintosh. File manipulation is thus one of the real bugbears of cross-platform Java programming. Therefore, ...
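The read() convention described above (an int in the range 0-255 for data, or -1 at end of stream) can be illustrated with a short loop. This is only a sketch, assuming an in-memory source; the class name ReadLoopSketch and the method readUnsigned are hypothetical names introduced for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; the names are not taken from the book.
class ReadLoopSketch {

    // Reads every byte from the stream as an unsigned value (0-255).
    // The loop ends when read() returns -1, which can never be a data value.
    static List<Integer> readUnsigned(InputStream in) throws IOException {
        List<Integer> values = new ArrayList<>();
        int b;
        while ((b = in.read()) != -1) {
            values.add(b);
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        // As a signed Java byte, 0xFF is -1; read() reports it as 255 instead,
        // so it is not mistaken for the end-of-stream marker.
        byte[] raw = {(byte) 0xFF, 0, 65};
        System.out.println(readUnsigned(new ByteArrayInputStream(raw))); // prints [255, 0, 65]
    }
}
```

Casting a returned value back with (byte) recovers the signed form, so (byte) 255 is -1 again once the end-of-stream check has been made.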
Chapter 17

Chapter 17 introduces the Java Communications API, a standard extension available for Java 1.1 and later that allows Java applications and trusted applets to send and receive