Copyright
Preface
    Who Wrote Apache, and Why?
    The Demonstration Code
    Conventions Used in This Book
    Organization of This Book
    Acknowledgments
Chapter 1. Getting Started
    Section 1.1. What Does a Web Server Do?
    Section 1.2. How Apache Works
    Section 1.3. Apache and Networking
    Section 1.4. How HTTP Clients Work
    Section 1.5. What Happens at the Server End?
    Section 1.6. Planning the Apache Installation
    Section 1.7. Windows?
    Section 1.8. Which Apache?
    Section 1.9. Installing Apache
    Section 1.10. Building Apache 1.3.X Under Unix
    Section 1.11. New Features in Apache v2
    Section 1.12. Making and Installing Apache v2 Under Unix
    Section 1.13. Apache Under Windows
Chapter 2. Configuring Apache: The First Steps
    Section 2.1. What's Behind an Apache Web Site?
    Section 2.2. site.toddle
    Section 2.3. Setting Up a Unix Server
    Section 2.4. Setting Up a Win32 Server
    Section 2.5. Directives
    Section 2.6. Shared Objects
Chapter 3. Toward a Real Web Site
    Section 3.1. More and Better Web Sites: site.simple
    Section 3.2. Butterthlies, Inc., Gets Going
    Section 3.3. Block Directives
    Section 3.4. Other Directives
    Section 3.5. HTTP Response Headers
    Section 3.6. Restarts
    Section 3.7. .htaccess
    Section 3.8. CERN Metafiles
    Section 3.9. Expirations
Chapter 4. Virtual Hosts
    Section 4.1. Two Sites and Apache
    Section 4.2. Virtual Hosts
    Section 4.3. Two Copies of Apache
    Section 4.4. Dynamically Configured Virtual Hosting
Chapter 5. Authentication
    Section 5.1. Authentication Protocol
    Section 5.2. Authentication Directives
    Section 5.3. Passwords Under Unix
    Section 5.4. Passwords Under Win32
    Section 5.5. Passwords over the Web
    Section 5.6. From the Client's Point of View
    Section 5.7. CGI Scripts
    Section 5.8. Variations on a Theme
    Section 5.9. Order, Allow, and Deny
    Section 5.10. DBM Files on Unix
    Section 5.11. Digest Authentication
    Section 5.12. Anonymous Access
    Section 5.13. Experiments
    Section 5.14. Automatic User Information
    Section 5.15. Using .htaccess Files
    Section 5.16. Overrides
Chapter 6. Content Description and Modification
    Section 6.1. MIME Types
    Section 6.2. Content Negotiation
    Section 6.3. Language Negotiation
    Section 6.4. Type Maps
    Section 6.5. Browsers and HTTP 1.1
    Section 6.6. Filters
Chapter 7. Indexing
    Section 7.1. Making Better Indexes in Apache
    Section 7.2. Making Our Own Indexes
    Section 7.3. Imagemaps
    Section 7.4. Image Map Directives
Chapter 8. Redirection
    Section 8.1. Alias
    Section 8.2. Rewrite
    Section 8.3. Speling
Chapter 9. Proxying
    Section 9.1. Security
    Section 9.2. Proxy Directives
    Section 9.3. Apparent Bug
    Section 9.4. Performance
    Section 9.5. Setup
Chapter 10. Logging
    Section 10.1. Logging by Script and Database
    Section 10.2. Apache's Logging Facilities
    Section 10.3. Configuration Logging
    Section 10.4. Status
Chapter 11. Security
    Section 11.1. Internal and External Users
    Section 11.2. Binary Signatures, Virtual Cash
    Section 11.3. Certificates
    Section 11.4. Firewalls
    Section 11.5. Legal Issues
    Section 11.6. Secure Sockets Layer (SSL)
    Section 11.7. Apache's Security Precautions
    Section 11.8. SSL Directives
    Section 11.9. Cipher Suites
    Section 11.10. Security in Real Life
    Section 11.11. Future Directions
Chapter 12. Running a Big Web Site
    Section 12.1. Machine Setup
    Section 12.2. Server Security
    Section 12.3. Managing a Big Site
    Section 12.4. Supporting Software
    Section 12.5. Scalability
    Section 12.6. Load Balancing
Chapter 13. Building Applications
    Section 13.1. Web Sites as Applications
    Section 13.2. Providing Application Logic
    Section 13.3. XML, XSLT, and Web Applications
Chapter 14. Server-Side Includes
    Section 14.1. File Size
    Section 14.2. File Modification Time
    Section 14.3. Includes
    Section 14.4. Execute CGI
    Section 14.5. Echo
    Section 14.6. Apache v2: SSI Filters
Chapter 15. PHP
    Section 15.1. Installing PHP
    Section 15.2. Site.php
Chapter 16. CGI and Perl
    Section 16.1. The World of CGI
    Section 16.2. Telling Apache About the Script
    Section 16.3. Setting Environment Variables
    Section 16.4. Cookies
    Section 16.5. Script Directives
    Section 16.6. suEXEC on Unix
    Section 16.7. Handlers
    Section 16.8. Actions
    Section 16.9. Browsers
Chapter 17. mod_perl
    Section 17.1. How mod_perl Works
    Section 17.2. mod_perl Documentation
    Section 17.3. Installing mod_perl - The Simple Way
    Section 17.4. Modifying Your Scripts to Run Under mod_perl
    Section 17.5. Global Variables
    Section 17.6. Strict Pregame
    Section 17.7. Loading Changes
    Section 17.8. Opening and Closing Files
    Section 17.9. Configuring Apache to Use mod_perl
Chapter 18. mod_jserv and Tomcat
    Section 18.1. mod_jserv
    Section 18.2. Tomcat
    Section 18.3. Connecting Tomcat to Apache
Chapter 19. XML and Cocoon
    Section 19.1. XML
    Section 19.2. XML and Perl
    Section 19.3. Cocoon
    Section 19.4. Cocoon 1.8 and JServ
    Section 19.5. Cocoon 2.0.3 and Tomcat
    Section 19.6. Testing Cocoon
Chapter 20. The Apache API
    Section 20.1. Documentation
    Section 20.2. APR
    Section 20.3. Pools
    Section 20.4. Per-Server Configuration
    Section 20.5. Per-Directory Configuration
    Section 20.6. Per-Request Information
    Section 20.7. Access to Configuration and Request Information
    Section 20.8. Hooks, Optional Hooks, and Optional Functions
    Section 20.9. Filters, Buckets, and Bucket Brigades
    Section 20.10. Modules
Chapter 21. Writing Apache Modules
    Section 21.1. Overview
    Section 21.2. Status Codes
    Section 21.3. The Module Structure
    Section 21.4. A Complete Example
    Section 21.5. General Hints
    Section 21.6. Porting to Apache 2.0
Appendix A. The Apache 1.x API
    Section A.1. Pools
    Section A.2. Per-Server Configuration
    Section A.3. Per-Directory Configuration
    Section A.4. Per-Request Information
    Section A.5. Access to Configuration and Request Information
    Section A.6. Functions
Colophon
Index

Copyright

Copyright © O'Reilly & Associates, Inc. Printed in the United States of America. Published by O'Reilly & Associates, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472. O'Reilly & Associates books may be purchased for educational, business, or sales promotional use.
Online editions are also available for most titles (http://safari.oreilly.com). For more information, contact our corporate/institutional sales department: (800) 998-9938 or corporate@oreilly.com.

Nutshell Handbook, the Nutshell Handbook logo, and the O'Reilly logo are registered trademarks of O'Reilly & Associates, Inc. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O'Reilly & Associates, Inc. was aware of a trademark claim, the designations have been printed in caps or initial caps. The association between the image of an Appaloosa horse and the topic of Apache is a trademark of O'Reilly & Associates, Inc.

While every precaution has been taken in the preparation of this book, the publisher and authors assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

Preface

Apache: The Definitive Guide, Third Edition, is principally about the Apache web-server software. We explain what a web server is and how it works, but our assumption is that most of our readers have used the World Wide Web and understand in practical terms how it works, and that they are now thinking about running their own servers and sites.

This book takes the reader through the process of acquiring, compiling, installing, configuring, and modifying Apache. We exercise most of the package's functions by showing a set of example sites that take a reasonably typical web business (in our case, a postcard publisher) through a process of development and increasing complexity. However, we have deliberately tried to make each site as simple as possible, focusing on the particular feature being described. Each site is pretty well self-contained, so that the reader can refer to it while following the text without having to disentangle the meat from extraneous vegetables.
If desired, it is possible to install and run each site on a suitable system.

Perhaps it is worth saying what this book is not. It is not a manual, in the sense of formally documenting every command; such a manual exists on the Apache site and has been much improved with Versions 1.3 and 2.0. We assume that if you want to use Apache, you will download it and keep it at hand. Rather, if the manual is a road map that tells you how to get somewhere, this book tries to be a tourist guide that tells you why you might want to make the journey.

In passing, we do reproduce some sections of the web site manual simply to save the reader the trouble of looking up the formal definitions as she follows the argument. Occasionally, we found the manual text hard to follow and in those cases we have changed the wording slightly. We have also interspersed comments as seemed useful at the time.

This is not a book about HTML or creating web pages, or one about web security or even about running a web site. These are all complex subjects that should be either treated thoroughly or left alone. As a result, a webmaster's library might include books on the following topics:

• The Web and how it works
• HTML: formal definitions, what you can do with it
• How to decide what sort of web site you want, how to organize it, and how to protect it
• How to implement the site you want using one of the available servers (for instance, Apache)
• Handbooks on Java, Perl, and other languages
• Security

Apache: The Definitive Guide is just one of the six or so possible titles in the fourth category.

Apache is a versatile package and is becoming more versatile every day, so we have not tried to illustrate every possible combination of commands; that would require a book of a million pages or so. Rather, we have tried to suggest lines of development that a typical webmaster could follow once an understanding of the basic concepts is achieved.
We realized from our own experience that the hardest stage of learning how to use Apache in a real-life context is right at the beginning, where the novice webmaster often has to get Apache, a scripting language, and a database manager to collaborate. This can be very puzzling. In this new edition we have therefore included a good deal of new material which tries to take the reader up these conceptual precipices. Once the collaboration is working, development is much easier. These new chapters are not intended to be an experts' account of, say, the interaction between Apache, Perl, and MySQL, but a simple beginners' guide, explaining how to make these things work with Apache. In the process we make some comments, from our own experience, on the merits of the various software products from which the user has to choose.

As with the first and second editions, writing the book was something of a race with Apache's developers. We wanted to be ready as soon as Version 2 was stable, but not before the developers had finished adding new features.

In many of the examples that follow, the motivation for what we make Apache do is simple enough and requires little explanation (for example, the different index formats in Chapter 7). Elsewhere, we feel that the webmaster needs to be aware of wider issues (for instance, the security issues discussed in Chapter 11) before making sensible decisions about his site's configuration, and we have not hesitated to branch out to deal with them.

Who Wrote Apache, and Why?

Apache gets its name from the fact that it consists of some existing code plus some patches. The FAQ thinks that this is cute; others may think it's the sort of joke that gets programmers a bad name. A more responsible group thinks that Apache is an appropriate title because of the resourcefulness and adaptability of the American Indian tribe.

(FAQ is netspeak for Frequently Asked Questions. Most sites/subjects have an FAQ file that tells you what the thing is, why it is, and where it's going. It is perfectly reasonable for the newcomer to ask for the FAQ to look up anything new to her, and indeed this is a sensible thing to do, since it reduces the number of questions asked. Apache's FAQ can be found at http://www.apache.org/docs/FAQ.html.)

You have to understand that Apache is free to its users and is written by a team of volunteers who do not get paid for their work. Whether they decide to incorporate your or anyone else's ideas is entirely up to them. If you don't like what they do, feel free to collect a team and write your own web server or to adapt the existing Apache code, as many have.

The first web server was built by the British physicist Tim Berners-Lee at CERN, the European Centre for Nuclear Research at Geneva, Switzerland. The immediate ancestor of Apache was built by the U.S. government's NCSA, the National Center for Supercomputing Applications. Because this code was written with (American) taxpayers' money, it is available to all; you can, if you like, download the source code in C from http://www.ncsa.uiuc.edu, paying due attention to the license conditions.

There were those who thought that things could be done better, and in the FAQ for Apache (at http://www.apache.org), we read:

    Apache was originally based on code and ideas found in the most popular HTTP server of the time, NCSA httpd 1.3 (early 1995).

That phrase "of the time" is nice. It usually refers to good times back in the 1700s or the early days of technology in the 1900s. But here it means back in the deliquescent bogs of a few years ago!

While the Apache site is open to all, Apache is written by an invited group of (we hope) reasonably good programmers. One of the authors of this book, Ben, is a member of this group.

Why do they bother? Why do these programmers, who presumably could be well paid for doing something else, sit up nights to work on Apache for our benefit?
There is no such thing as a free lunch, so they do it for a number of typically human reasons. One might list, in no particular order:

• They want to do something more interesting than their day job, which might be writing stock control packages for BigBins, Inc.
• They want to be involved on the edge of what is happening. Working on a project like this is a pretty good way to keep up-to-date. After that comes consultancy on the next hot project.
• The more worldly ones might remember how, back in the old days of 1995, quite a lot of the people working on the web server at NCSA left for a thing called Netscape and became, in the passage of the age, zillionaires.
• It's fun. Developing good software is interesting and amusing, and you get to meet and work with other clever people.
• They are not doing the bit that programmers hate: explaining to end users why their treasure isn't working and trying to fix it in 10 minutes flat. If you want support on Apache, you have to consult one of several commercial organizations (see Appendix A), who, quite properly, want to be paid for doing the work everyone loathes.

The Demonstration Code

The code for the demonstration web sites referred to throughout the book is available at http://www.oreilly.com/catalog/apache3/. It contains the requisite README file with installation instructions and other useful information. The contents of the download are organized into two directories:

install/
    This directory contains scripts to install the sample sites:
    install
        Run this script to install the sites.
    install.conf
        Unix configuration file for install.
    installwin.conf
        Win32 configuration file for install.
sites/
    This directory contains the sample sites used in the book.

Conventions Used in This Book

This section covers the various conventions used in this book.
Typographic Conventions

Constant width
    Used for HTTP headers, status codes, MIME content types, directives in configuration files, commands, options/switches, functions, methods, variable names, and code within body text
Constant width bold
    Used in code segments to indicate input to be typed in by the user
Constant width italic
    Used for replaceable items in code and text

[...]

... gone well, you should look in /usr/local/sbin to find the new executables. Use the command ls -l to see the timestamps to make sure they came from the build you have just done (it is surprisingly easy to do several different builds in a row and get the files mixed up):

total 1054 -rwxr-xr-x -rwxr-xr-x -rwxr-xr-x -rwxr-xr-x -rwxr-xr-x -rw-r r rwxr-xr-x 1 1 1 1 1 1 1 root root root root root root root wheel...

... directories in /usr/src/apache. Go there, copy the .tar.gz or .tar.Z file, and uncompress the Z version or gunzip (or gzip -d) the gz version:

uncompress .tar.Z

or:

gzip -d .tar.gz

Make sure that the resulting file is called .tar, or tar may turn up its nose. If not, type:

mv .tar

Now unpack it:

% tar xvf .tar

Incidentally,...

... Server Do?
1.2 How Apache Works
1.3 Apache and Networking
1.4 How HTTP Clients Work
1.5 What Happens at the Server End?
1.6 Planning the Apache Installation
1.7 Windows?
1.8 Which Apache?
1.9 Installing Apache
1.10 Building Apache 1.3.X Under Unix
1.11 New Features in Apache v2
1.12 Making and Installing Apache v2 Under Unix
1.13 Apache Under Windows

Apache is the dominant web server on the Internet today,...

... by the first of the four numbers, but a shortage of addresses required a change to the use of subnet masks. These allow us to further subdivide the network by using more of the bits for the network number and less for the host number. Their correct use is rather technical, so we leave it to the routing experts. (You should not need to know the details of how this works in order to run a host, because the...
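To make the idea of a subnet mask a little more concrete, here is a minimal sketch in Python using the standard ipaddress module. The function name split_address and the sample address 192.168.123.37 are our own illustrative assumptions, not anything taken from Apache or from the text. It shows how giving more bits to the network number leaves fewer for the host number.

```python
import ipaddress


def split_address(address, prefix_length):
    """Split an IPv4 address into network number, netmask, and host number.

    `prefix_length` is how many leading bits belong to the network number.
    This helper is a hypothetical illustration, not part of any Apache tool.
    """
    iface = ipaddress.ip_interface(f"{address}/{prefix_length}")
    network = iface.network
    # The host number is whatever is left after masking off the network bits.
    host_number = int(iface.ip) & int(network.hostmask)
    return str(network.network_address), str(network.netmask), host_number


if __name__ == "__main__":
    # Same address, two different masks: a longer prefix means more
    # network bits and therefore fewer bits for the host number.
    for prefix in (24, 28):
        print(prefix, split_address("192.168.123.37", prefix))
```

With a 24-bit prefix the last byte of the address is the host number; with a 28-bit prefix only the last four bits are, so the same address falls into a smaller subnet.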
... were used by asking the operating system where the connection is connecting to. Apache then uses the IP address, the port number, and (in HTTP 1.1) the Host header to decide which virtual host is the target of this request. The virtual host then looks at the path, which was handed to it in the request, and reads that against its configuration to decide on the appropriate response, which it then returns. Most...

... the same interface. The interface in this context is actually the bit of software (the driver) that handles the physical connection (Ethernet card, serial port, etc.) to the outside. While writing this book, we accessed the practice sites through an Ethernet connection between a Windows 95 machine (the client) and a FreeBSD box (the server) running Apache. Our environment was very untypical, since the...

... they separate the HTTP header from its body. If the request were a POST, there would be data following. The server sends the response back and closes the connection. To see it in action, connect again to the Internet, get a command-line prompt, and type the following:

% telnet www.apache.org 80
> telnet www.apache.org 80
GET http://www.apache.org/foundation/contact.html HTTP/1.1
Host: www.apache.org

On...

... pair, and there is no port. What happens? The browser observes that the URL starts with http: and deduces that it should be using the HTTP protocol. The client then contacts a name server, which uses DNS to resolve www.apache.org to an IP address. At the time of writing, this was 63.251.56.142. One way to check the validity of a hostname is to go to the operating-system prompt[8] and type:

ping www.apache.org...
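The client-side steps described above (deduce the protocol from the URL, fall back to the default port, resolve the hostname, then send a request whose header ends in a blank line) can be sketched in a few lines of Python. This is only a rough illustration under our own assumptions: the function names plan_request, build_get, and fetch are hypothetical, the request uses the plain path form rather than the full URL typed in the telnet session, and a real browser does a great deal more.

```python
import socket
from urllib.parse import urlsplit

# Default port per scheme; only plain HTTP is handled in this sketch.
DEFAULT_PORTS = {"http": 80}


def plan_request(url):
    """Work out scheme, host, port, and path the way a simple client might."""
    parts = urlsplit(url)
    if parts.scheme not in DEFAULT_PORTS:
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    port = parts.port or DEFAULT_PORTS[parts.scheme]  # no port given: use 80
    path = parts.path or "/"                          # no path given: ask for /
    return parts.scheme, parts.hostname, port, path


def build_get(host, path):
    """Build a minimal HTTP/1.1 GET request.

    The two trailing empty strings produce the blank line that separates
    the header from the (here empty) body.
    """
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}", "Connection: close", "", ""]
    return "\r\n".join(lines).encode("ascii")


def fetch(url, timeout=10.0):
    """Resolve the host through DNS, connect, send the request, read the reply."""
    _, host, port, path = plan_request(url)
    address = socket.gethostbyname(host)  # the name-server lookup
    with socket.create_connection((address, port), timeout=timeout) as conn:
        conn.sendall(build_get(host, path))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


if __name__ == "__main__":
    # Needs a working Internet connection; prints only the start of the reply.
    print(fetch("http://www.apache.org/")[:300].decode("latin-1"))
```

Keeping the URL handling and the request building separate from the network call means the first two can be checked without being connected to anything.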
... appropriate action. The webmaster's main control over Apache is through the Config file. The webmaster has some 200 directives at her disposal, and most of this book is an account of what these directives do and how to use them to reasonable advantage. The webmaster also has a dozen flags she can use when Apache starts up. We've quoted most of the formal definitions of the directives directly from the Apache site...

... Server: Apache/2.0.32 (Unix)
Cache-Control: max-age=86400
Expires: Tue, 26 Feb 2002 15:03:19 GMT
Accept-Ranges: bytes
Content-Length: 4946
Content-Type: text/html

Contact Information The Apache...

... speaker, we rearranged the syntax a little. As they stand, they save the reader having to break off and go to the Apache site...

1.3 Apache and Networking

At its core, Apache is about communication. ... talk to each other over networks. The two protocols that give the suite its name are among the most important, but there are many others, and we shall meet some of them later. These protocols