Chapter 2: Demystifying the Browser
Before you start writing your own web programs, you have to become
comfortable with the fact that your web browser is just another client. Lots
of complex things are happening: user interface processing, network
communication, operating system interaction, and HTML/graphics
rendering. But all of that is gravy; without actually negotiating with web
servers and retrieving documents via HTTP, the browser would be as useless
as a TV without a tuner.
HTTP may sound intimidating, but it isn't as bad as you might think. Like
most other Internet protocols, HTTP is text-based. If you were to look at the
communication between your web browser and a web server, you would see
text and lots of it. After a few minutes of sifting through it all, you'd find
out that HTTP isn't too hard to read. By the end of this chapter, you'll be able
to read HTTP and have a fairly good idea of what's going on during typical
everyday transactions over the Web.
The best way to understand how HTTP works is to see it in action. You
actually see it in action every day, with every click of a hyperlink; it's just
that the gory details are hidden from you. In this chapter, you'll see some
common web transactions: retrieving a page, submitting a form, and
publishing a web page. In each example, the HTTP for each transaction is
printed as well. From there, you'll be able to analyze and understand how
your actions with the browser are translated into HTTP. You'll learn a little
bit about how HTTP is spoken between a web client and server.
After you've seen bits and pieces of HTTP in this chapter, Chapter 3,
Learning HTTP, introduces HTTP in a more thorough manner. In Chapter 3,
you'll see all the different ways that a client can request something, and all
the ways a server can reply. In the end, you'll get a feel for what is possible
under HTTP.
Behind the Scenes of a Simple Document
Let's begin by visiting a hypothetical web server at
http://hypothetical.ora.com/. Its imaginary (and intentionally sparse) web
page appears in Figure 2-1.
Figure 2-1. A hypothetical web page
This is something you probably do every day: request a URL and then view
it in your browser. But what actually happened in order for this document to
appear in your browser?
The Browser's Request
Your browser first takes in a URL and parses it. In this example, the browser
is given the following URL:
http://hypothetical.ora.com/
The browser interprets the URL as follows:
http://
In the first part of the URL, you told the browser to use HTTP, the
Hypertext Transfer Protocol.
hypothetical.ora.com
In the next part, you told the browser to contact a computer over the
network with the hostname of hypothetical.ora.com.
/
Anything after the hostname is regarded as a document path. In this
example, the document path is /.
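The parsing step above can be sketched in a few lines of Python. This is only an illustration, not the browser's actual code: the function name split_url is made up for this example, and it assumes that a URL without an explicit port should fall back to port 80, the default for HTTP.

```python
from urllib.parse import urlsplit

DEFAULT_HTTP_PORT = 80  # assumed fallback when the URL names no port


def split_url(url: str) -> tuple[str, str, int, str]:
    """Return (scheme, hostname, port, path) for an http URL."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else DEFAULT_HTTP_PORT
    path = parts.path or "/"  # an empty path means the document root
    return parts.scheme, parts.hostname or "", port, path


print(split_url("http://hypothetical.ora.com/"))
# prints ('http', 'hypothetical.ora.com', 80, '/')
```

The three pieces returned here correspond to the three parts of the URL described above: the protocol, the hostname, and the document path.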
So the browser connects to hypothetical.ora.com using the HTTP protocol.
Since no port was specified, it assumes port 80, the default port for HTTP.
The message that the browser sends to the server at port 80 is:
GET / HTTP/1.0
Connection: Keep-Alive
User-Agent: Mozilla/3.0Gold (WinNT; I)
Host: hypothetical.ora.com
Accept: image/gif, image/x-xbitmap, image/jpeg,
image/pjpeg, */*
Let's look at what these lines are saying:
1. The first line of this request (GET / HTTP/1.0) requests a
document at / from the server. HTTP/1.0 is given as the version of the
HTTP protocol that the browser uses.
2. The second line tells the server to keep the TCP connection open until
explicitly told to disconnect. If this header is not provided, the server
has no obligation to stick around under HTTP 1.0, and disconnects
after responding to the client's request. The behavior of the client and
server depends on what version of HTTP is spoken. (See the discussion
of persistent connections in Chapter 3 for the full scoop.)
3. In the third line, beginning with the string User-Agent, the client
identifies itself as Mozilla (Netscape) version 3.0, running on
Windows NT.
4. The fourth line tells the server what the client thinks the server's
hostname is. Since the server may have multiple hostnames, the client
indicates which hostname was used. In this environment, a web server
can have a different document tree for each hostname it owns. If the
client hasn't specified the server's hostname, the server may be unable
to determine which document tree to use.
5. The fifth line tells the server what kind of documents are accepted by
the browser. This is discussed more in the section "Media Types" in
Chapter 3.
Together, these 5 lines constitute a request. Lines 2 through 5 are request
headers.
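To see how little there is to a request, here is a minimal Python sketch that assembles the same five lines as bytes. The helper name build_request is hypothetical. Two details come from the HTTP specification rather than from the listing above: each line ends with a carriage return and line feed, and an empty line marks the end of the headers.

```python
def build_request(path: str, headers: list[tuple[str, str]]) -> bytes:
    """Assemble an HTTP/1.0 GET request: request line, headers, blank line."""
    lines = [f"GET {path} HTTP/1.0"]
    lines += [f"{name}: {value}" for name, value in headers]
    # Every line ends with CRLF; the extra CRLF is the blank line that
    # tells the server the headers are complete.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


request = build_request("/", [
    ("Connection", "Keep-Alive"),
    ("User-Agent", "Mozilla/3.0Gold (WinNT; I)"),
    ("Host", "hypothetical.ora.com"),
    ("Accept", "image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*"),
])
print(request.decode("ascii"))
```

The header values are copied from the request shown earlier; a different client would send its own User-Agent and Accept values.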
The Server's Response
Given a request like the one previously shown, the server looks for the file
associated with "/" and returns it to the browser, preceding it with some
"header information":
HTTP/1.0 200 OK
Date: Fri, 04 Oct 1996 14:31:51 GMT
Server: Apache/1.1.1
Content-type: text/html
Content-length: 327
Last-modified: Fri, 04 Oct 1996 14:06:11 GMT

<title>Sample Homepage</title>
<img src="/images/oreilly_mast.gif">
<h1>Welcome</h1>
Hi there, this is a simple web page. Granted, it
may not be as elegant
as some other web pages you've seen on the net, but
there are
some common qualities:
<ul>
<li> An image,
<li> Text,
<li> and a <a href="/example2.html"> hyperlink
</a>
</ul>
If you look at this response, you'll see that it begins with a series of lines that
specify information about the document and about the server itself. Then
after a blank line, it returns the document. The series of lines before the first
blank line is called the response header, and the part after the first blank line
is called the body, entity, or entity-body. Let's look at the header
information:
1. The first line, HTTP/1.0 200 OK, tells the client what version of the
HTTP protocol the server uses. But more importantly, it says that the
document has been found and is going to be transmitted.
2. The second line indicates the current date on the server. The time is
expressed in Greenwich Mean Time (GMT).
3. The third line tells the client what kind of software the server is
running. In this case, the server is Apache version 1.1.1.
4. The fourth line (Content-type) tells the browser the type of the
document. In this case, it is HTML.
5. The fifth line tells the client how many bytes are in the entity body
that follows the headers. In this case, the entity body is 327 bytes
long.
6. The sixth line specifies the most recent modification time of the
document requested by the client. This modification time is often used
for caching purposes so a browser may not need to request the entire
HTML file again if its modification time doesn't change.
After all that, a blank line and the document text follow.
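The structure just described (a status line, header lines, a blank line, then the body) is simple enough to take apart by hand. The following Python sketch is one way to do it, under a few assumptions: the function name parse_response is invented for this example, lines are assumed to end in CRLF, and the sample response is a shortened stand-in for the one above, with its Content-length adjusted to match the shorter body.

```python
def parse_response(raw: bytes):
    """Split a raw HTTP response into status parts, headers, and body."""
    # The first blank line separates the response header from the entity body.
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lowercase.
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body


raw = (b"HTTP/1.0 200 OK\r\n"
       b"Server: Apache/1.1.1\r\n"
       b"Content-type: text/html\r\n"
       b"Content-length: 30\r\n"
       b"\r\n"
       b"<title>Sample Homepage</title>")

version, code, reason, headers, body = parse_response(raw)
print(version, code, reason)
print(headers["content-type"], len(body))
```

A real client has more to cope with, such as headers that span several lines, so treat this as a reading aid rather than a complete parser.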
Figure 2-2 shows the transaction.
Figure 2-2. A simple transaction
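The caching idea behind the Last-modified header can be sketched as a simple date comparison. This Python fragment is an assumption about how a client might make that check, not a description of any particular browser; the function name is_unchanged is made up, and it presumes the client kept the Last-modified value from an earlier response.

```python
from email.utils import parsedate_to_datetime


def is_unchanged(cached_last_modified: str, current_last_modified: str) -> bool:
    """Return True if the server's copy is no newer than the cached copy."""
    cached = parsedate_to_datetime(cached_last_modified)
    current = parsedate_to_datetime(current_last_modified)
    return current <= cached


# Same modification time as before: the cached copy can be reused.
print(is_unchanged("Fri, 04 Oct 1996 14:06:11 GMT",
                   "Fri, 04 Oct 1996 14:06:11 GMT"))
```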
Parsing the HTML
The document is in HTML (as promised in the Content-type line). The
browser retrieves the document and then formats it as needed; for example,
each <li> item between the <ul> and </ul> is printed as a bullet and
indented, the <img> tag displays a graphic on the screen, etc.
And while we're on the topic of the <img> tag, how did that graphic get on
the screen? While parsing the HTML file, the browser sees:
<img src="/images/oreilly_mast.gif">
and figures out that it needs the data for the image as well. Your browser
then sends a second request, such as this one, through its connection to the
web server:
GET /images/oreilly_mast.gif HTTP/1.0
Connection: Keep-Alive
User-Agent: Mozilla/3.0Gold (WinNT; I)
Host: hypothetical.ora.com
Accept: image/gif, image/x-xbitmap, image/jpeg,
image/pjpeg, */*
The server responds with:
HTTP/1.0 200 OK
Date: Fri, 04 Oct 1996 14:32:01 GMT
Server: Apache/1.1.1
Content-type: image/gif
Content-length: 9487
Last-modified: Tue, 31 Oct 1995 00:03:15 GMT

[data of GIF file]
Figure 2-3 shows the complete transaction, with the image requested as well
as the original document.
Figure 2-3. Simple transaction with embedded image
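How the browser discovers that it needs a second request can be sketched with Python's standard HTML parser. This is an illustration only: the class name ImageFinder is hypothetical, and a real browser does far more than collect image locations. The sketch scans the document for img tags and resolves each src value against the URL of the page it came from.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageFinder(HTMLParser):
    """Collect the absolute URL of every <img> tag in a document."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Resolve the path against the page's own URL.
                self.image_urls.append(urljoin(self.base_url, src))


finder = ImageFinder("http://hypothetical.ora.com/")
finder.feed('<title>Sample Homepage</title>\n'
            '<img src="/images/oreilly_mast.gif">\n'
            '<h1>Welcome</h1>')
print(finder.image_urls)
# prints ['http://hypothetical.ora.com/images/oreilly_mast.gif']
```

Each URL in the resulting list is something the browser would fetch with its own GET request, just like the one for the image above.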
There are a few differences between this request/response pair and the
previous one. Based on the <img> tag, the browser knows where the image
is stored on the server. From <img src="/images/oreilly_mast.gif">, the
browser requests a document at a different location than "/":
GET /images/oreilly_mast.gif HTTP/1.0
The server's response is basically the same, except that the content type is
different:
Content-type: image/gif
From the declared content type, the browser knows what kind of image it will
receive and can render it as required. The browser shouldn't guess the
content type based on the document... is up to the server to tell the client.
The important thing to note here is that the HTML formatting and image
rendering are done at the browser end. All the server does is return
documents; the browser is responsible for how they look to the user.
Clicking on a Hyperlink
When you click on a hyperlink, the client and server go through something
similar to what happened when we visited http://hypothetical.ora.com/...
example, when you click on the hyperlink from the previous example, the
browser looks at its associated HTML:
<a href="/example2.html"> hyperlink </a>
From there, it knows that the next location to retrieve is /example2.html.
The browser then sends the following to hypothetical.ora.com:
GET /example2.html HTTP/1.0
Connection: Keep-Alive
User-Agent: Mozilla/3.0Gold (WinNT; I)
Host: hypothetical.ora.com
Accept: ... image/pjpeg, */*
The server responds with:
HTTP/1.0 200 OK
Date: Fri, 04 Oct 1996 14:32:14 GMT
Server: Apache/1.1.1
Content-type: text/html
Content-length: 431
Last-modified: Thu, 03 Oct 1996 08:39:45 GMT

[HTML data]
And the browser displays the new HTML page on the user's screen.
Retrieving a Document Manually
Now that you see what a browser does, it's time for the most empowering
statement in this book: There's nothing in these transactions that you can't
do yourself. And you don't need to write a program; you can just do it by
hand, using the standard telnet client and a little knowledge of HTTP.
Telnet to www.ora.com at port 80. From a UNIX shell prompt:[1]
% telnet www.ora.com 80
Trying 198.112.208.23
Connected to www.ora.com
Escape character is '^]'
(The second argument for telnet specifies the port number to use. By default,
telnet uses port 23. Most web servers use port 80. If you are behind a
firewall, you may have problems accessing www.ora.com directly from your
machine. Replace www.ora.com with the hostname of a web server inside your
firewall for the same effect.)
Now type in a GET command[2] for the document root:
GET / HTTP/1.0
Press ENTER twice, and you receive what...
Associates Link: ; rev="Made"
When the document is finished, your shell prompt should return. The server
has closed the connection. Congratulations! What you've just done is simulate
the behavior of a web client.
Behind the Scenes of an HTML Form
You've probably seen fill-out forms on the Web, in which you enter
information into your browser and submit the form. Common uses for forms are
guestbooks, accessing databases, or specifying keywords for a search engine.
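The manual telnet session from "Retrieving a Document Manually" can also be scripted. The Python sketch below does what you did by hand: it opens a TCP connection, types a bare GET line followed by an empty line, and reads until the server closes the connection. The function names get_over_socket and fetch_raw are invented for this example, and the ten-second timeout is an arbitrary choice.

```python
import socket


def get_over_socket(sock: socket.socket, path: str = "/") -> bytes:
    """Send a bare HTTP/1.0 GET on a connected socket and read until EOF."""
    # The request line plus an empty line, like pressing ENTER twice in telnet.
    sock.sendall(f"GET {path} HTTP/1.0\r\n\r\n".encode("ascii"))
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:  # the server has closed the connection
            break
        chunks.append(data)
    return b"".join(chunks)


def fetch_raw(host: str, port: int = 80, path: str = "/") -> bytes:
    """Connect to host:port, as telnet would, and return the raw response."""
    with socket.create_connection((host, port), timeout=10) as sock:
        return get_over_socket(sock, path)

# Example (needs network access; the reply depends on the server):
# print(fetch_raw("www.ora.com").decode("iso-8859-1"))
```

What comes back is the same text you would see in the telnet window: the response header, a blank line, and the document. Many present-day servers expect a Host header or redirect plain HTTP requests, so the reply may look different from a 1996 transcript.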