HTTP Essentials
Protocols for Secure, Scaleable Web Sites
Stephen A. Thomas

Wiley Computer Publishing
John Wiley & Sons, Inc.
New York • Chichester • Weinheim • Brisbane • Singapore • Toronto

Publisher: Robert Ipsen
Editor: Margaret Eldridge
Managing Editor: Micheline Frederick
Text Design & Composition: Stephen Thomas

Designations used by companies to distinguish their products are often claimed as trademarks. In all instances where John Wiley & Sons, Inc., is aware of a claim, the product names appear in initial capital or all capital letters. Readers, however, should contact the appropriate companies for more complete information regarding trademarks and registration.

This book is printed on acid-free paper.

Copyright © 2001 by Stephen A. Thomas. All rights reserved.

Published by John Wiley & Sons, Inc.
Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4744. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 605 Third Avenue, New York, NY 10158-0012, (212) 850-6011, fax (212) 850-6008, email permreq@wiley.com.

This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold with the understanding that the publisher is not engaged in professional services.
If professional advice or other expert assistance is required, the services of a competent professional person should be sought.

Printed in the United States of America.

10 9 8 7 6 5 4 3 2 1

For the West Avenue Gang

CONTENTS

Chapter 1: Introduction
  1.1 HTTP and the World Wide Web
  1.2 Protocol Layers
  1.3 Uniform Resource Identifiers
  1.4 Organization of This Book
Chapter 2: HTTP Operation
  2.1 Clients and Servers
    2.1.1 Initiating Communication
    2.1.2 Connections
    2.1.3 Persistence
    2.1.4 Pipelining
  2.2 User Operations
    2.2.1 Web Page Retrieval – GET
    2.2.2 Web Forms – POST
    2.2.3 File Upload – PUT
    2.2.4 File Deletion – DELETE
  2.3 Behind the Scenes
    2.3.1 Capabilities – OPTIONS
    2.3.2 Status – HEAD
    2.3.3 Path – TRACE
  2.4 Cooperating Servers
    2.4.1 Virtual Hosts
    2.4.2 Redirection
    2.4.3 Proxies, Gateways, and Tunnels
    2.4.4 Cache Servers
    2.4.5 Counting and Limiting Page Views
  2.5 Cookies and State Maintenance
    2.5.1 Cookies
    2.5.2 Cookie Attributes
    2.5.3 Accepting Cookies
    2.5.4 Returning Cookies
Chapter 3: HTTP Messages
  3.1 The Structure of HTTP Messages
    3.1.1 HTTP Requests
    3.1.2 HTTP Responses
  3.2 Header Fields
    3.2.1 Accept
    3.2.2 Accept-Charset
    3.2.3 Accept-Encoding
    3.2.4 Accept-Language
    3.2.5 Accept-Ranges
    3.2.6 Age
    3.2.7 Allow
    3.2.8 Authentication-Info
    3.2.9 Authorization
    3.2.10 Cache-Control
    3.2.11 Connection
    3.2.12 Content-Encoding
    3.2.13 Content-Language
    3.2.14 Content-Length
    3.2.15 Content-Location
    3.2.16 Content-MD5
    3.2.17 Content-Range
    3.2.18 Content-Type
    3.2.19 Cookie
    3.2.20 Cookie2
    3.2.21 Date
    3.2.22 ETag
    3.2.23 Expect
    3.2.24 Expires
    3.2.25 From
    3.2.26 Host
    3.2.27 If-Match
    3.2.28 If-Modified-Since
    3.2.29 If-None-Match
    3.2.30 If-Range
    3.2.31 If-Unmodified-Since
    3.2.32 Last-Modified
    3.2.33 Location
    3.2.34 Max-Forwards
    3.2.35 Meter
    3.2.36 Pragma
    3.2.37 Proxy-Authenticate
    3.2.38 Proxy-Authorization
    3.2.39 Range
    3.2.40 Referer
    3.2.41 Retry-After
    3.2.42 Server
    3.2.43 Set-Cookie2
    3.2.44 TE
    3.2.45 Trailer
    3.2.46 Transfer-Encoding
    3.2.47 Upgrade
    3.2.48 User-Agent
    3.2.49 Vary
    3.2.50 Via
    3.2.51 Warning
    3.2.52 WWW-Authenticate
  3.3 Status Codes
    3.3.1 Informational (1xx)
    3.3.2 Successful (2xx)
    3.3.3 Redirection (3xx)
    3.3.4 Client Error (4xx)
    3.3.5 Server Error (5xx)
Chapter 4: Securing HTTP
  4.1 Web Authentication
    4.1.1 Basic Authentication
    4.1.2 Original Digest Authentication
    4.1.3 Improved Digest Authentication
    4.1.4 Protecting Against Replay Attacks
    4.1.5 Mutual Authentication
    4.1.6 Protection for Frequent Clients
    4.1.7 Integrity Protection
  4.2 Secure Sockets Layer
    4.2.1 SSL and Other Protocols
    4.2.2 Public Key Cryptography
    4.2.3 SSL Operation
  4.3 Transport Layer Security
    4.3.1 Differences from SSL
    4.3.2 Control of the Protocol
    4.3.3 Upgrading to TLS within an HTTP Session
  4.4 Secure HTTP
Chapter 5: Accelerating HTTP
  5.1 Load Balancing
    5.1.1 Locating Servers
    5.1.2 Distributing Requests
    5.1.3 Determining a Target Server
  5.2 Advanced Caching
    5.2.1 Caching Implementations
    5.2.2 Proxy Auto Configuration Scripts
    5.2.3 Web Proxy Auto-Discovery
    5.2.4 Web Cache Communication Protocol
    5.2.5 Network Element Control Protocol
    5.2.6 Internet Cache Protocol
    5.2.7 Hyper Text Caching Protocol
    5.2.8 Cache Array Routing Protocol
  5.3 Other Acceleration Techniques
    5.3.1 Specialized SSL Processing
    5.3.2 TCP Multiplexing
Appendix A: HTTP Versions
  A.1 HTTP’s Evolution
  A.2 HTTP Version Differences
  A.3 HTTP 1.1 Support
Appendix B: Building Bullet-Proof Web Sites
  B.1 The Internet Connection
    B.1.1 Redundant Links
    B.1.2 Multi-homing
    B.1.3 ...

... 197 ads, and 159 of them, over 80 percent, feature a World Wide Web address. Even more remarkably, only 121 (61 percent) list a telephone number. If advertisements are a reflection of society, then here in the United States, at least, the Web has become an indispensable part of our lives. This book is about what makes the Web tick. It explains the protocol that defines how Web browsers communicate with Web ...

... the GET operation, a server can return information itself as part of the response. For Web browsing, this information is typically a new Web page to display, often a page acknowledging the user’s input; in the case of a search form, the new Web page often shows the search results.

2.2.3 File Upload – PUT

The PUT operation also provides a way for clients to send information to servers. It is significantly ...

... implementation in one system (for example, a Web browser) is effectively communicating with the HTTP implementation in another (a Web server, perhaps). To see this process in more detail, let’s look at how an HTTP message makes its way from your Web browser to a Web server on the Internet. Figure 1.3 shows the first four steps in ...

[Figure: Communication System (Web Browser); Application (HTTP); TCP Transport ...]

... closing of the TCP connection

... on Web performance. That’s because complex Web pages consist of many separate objects, and the client must issue a separate HTTP request to retrieve each of those objects. The Web page of figure 2.3, for example, contains over 20 objects (the page itself, plus the individual graphic elements). With early versions of HTTP, Web browsers would have to establish ...

... the most basic HTTP operations.

2.2.1 Web Page Retrieval – GET

The simplest HTTP operation of all is GET. It is how a client retrieves an object from the server. On the Web, browsers request a page from a Web server with a GET. For example, clicking on the link in the middle of figure 2.7 will force the browser to issue a GET request to the server asking for the new Web page to display. As figure 2.8 shows, ...

... message includes the form’s data. In this example the POST data will include the search term (“HTTP”), the scope (All Fields), the results per page (25), and the link method (FTP).

... client, the POST operation provides a way for clients to send information to servers. Web browsers most commonly use POST operations to send forms to Web servers. Figure 2.9 shows an example of such a form. It is a Web page that allows ...

... communications; for that Berners-Lee and Cailliau designed the first version of HTTP. Since then, Web traffic has grown to dominate the Internet. By 1998, HTTP accounted for over 75 percent of the traffic on Internet backbones, dwarfing other protocols such as email, file transfer, and remote login. Today, at least in the common vernacular, the World Wide Web is the Internet. And the Web continues to ...

  ... Infrastructure
    B.2.1 Reliability through Mirrored Web Sites
    B.2.2 Local Load Balancing and Clustering
    B.2.3 Multi-Layer Security Architectures
  B.3 Applications
    B.3.1 Web Application Dynamics
    B.3.2 Application Servers
    B.3.3 Database Management Systems
    B.3.4 Application Security
    B.3.5 Platform Security
    ...

http://guest:secret@www.ietf.org:80/html.charters/wg-dir.html
http://guest:secret@www.ietf.org:80/html.charters/wg-dir.html?sess=1
http://guest:secret@www.ietf.org:80/html.charters/wg-dir.html?sess=1#Applications_Area

Table 1.1 Components of a Uniform Resource Identifier

Component   Use
protocol    Identifies the application protocol needed to access the resource, in this case HTTP.
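
The way a URI like the ones above breaks into components can be sketched in a few lines of Python. This is only an illustration using the standard library's urllib.parse module, not anything taken from the book. The function name uri_components and every dictionary key other than "protocol" are assumptions made for this sketch, since the excerpt of Table 1.1 shows only the protocol row (the standard library calls that component the "scheme").

```python
from urllib.parse import urlsplit, parse_qs


def uri_components(uri: str) -> dict:
    """Split a URI into named parts (hypothetical helper for illustration)."""
    parts = urlsplit(uri)
    return {
        "protocol": parts.scheme,        # "scheme" in urllib.parse terms
        "username": parts.username,      # key names below are assumptions
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,              # returned as an int, or None if absent
        "path": parts.path,
        "query": parse_qs(parts.query),  # maps each name to a list of values
        "fragment": parts.fragment,
    }


# The longest of the three example URIs shown above.
example = (
    "http://guest:secret@www.ietf.org:80"
    "/html.charters/wg-dir.html?sess=1#Applications_Area"
)

for name, value in uri_components(example).items():
    print(f"{name}: {value}")
```

Running the sketch prints one line per component, which makes it easy to see which part of the example URI each piece corresponds to.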