Chapter 4: The Socket Library - P2

Now we wait for a response from the server. We read in the response and selectively echo it out, where we look at the $response, $header, and $data variables to see if the user is interested in looking at each part of the reply:

# get the HTTP response line
my $the_response=<F>;
print $the_response if ($all || defined $response);

# get the header data
while(<F>=~ m/^(\S+):\s+(.+)/) {
  print "$1: $2\n" if ($all || defined $header);
}

# get the entity body
if ($all || defined $data) {
  print while (<F>);
}

The full source code looks like this:

#!/usr/local/bin/perl -w

# socket based hypertext version of UNIX cat

use strict;
use Socket;          # include Socket module
require 'tcp.pl';    # file with open_TCP routine
require 'web.pl';    # file with parse_URL routine
use vars qw($opt_h $opt_H $opt_r $opt_d);
use Getopt::Std;

# parse command line arguments
getopts('hHrd');

# print out usage if needed
if (defined $opt_h || $#ARGV<0) { help(); }

# if it wasn't an option, it was a URL
while($_ = shift @ARGV) {
  hcat($_, $opt_r, $opt_H, $opt_d);
}

# Subroutine to print out usage information
sub usage {
  print "usage: $0 -rhHd URL(s)\n";
  print " -h help\n";
  print " -r print out response\n";
  print " -H print out header\n";
  print " -d print out data\n\n";
  exit(-1);
}

# Subroutine to print out help text along with usage information
sub help {
  print "Hypertext cat help\n\n";
  print "This program prints out documents on a remote web server.\n";
  print "By default, the response code, header, and data are printed\n";
  print "but can be selectively printed with the -r, -H, and -d options.\n\n";
  usage();
}

# Given a URL, print out the data there
sub hcat {

  # grab parameters
  my ($full_url, $response, $header, $data)=@_;

  # assume that response, header, and data will be printed
  my $all = !($response || $header || $data);

  # if the URL isn't a full URL, assume that it is a http request
  $full_url="http://$full_url" if ($full_url !~
    m/(\w+):\/\/([^\/:]+)(:\d*)?([^#]*)/);

  # break up URL into meaningful parts
  my @the_url = parse_URL($full_url);
  # an empty list means the URL could not be parsed
  if (!@the_url) {
    print "Please use fully qualified valid URL\n";
    exit(-1);
  }

  # we're only interested in HTTP URLs
  return if ($the_url[0] !~ m/http/i);

  # connect to server specified in 1st parameter
  if (!defined open_TCP('F', $the_url[1], $the_url[2])) {
    print "Error connecting to web server: $the_url[1]\n";
    exit(-1);
  }

  # request the path of the document to get
  print F "GET $the_url[3] HTTP/1.0\n";
  print F "Accept: */*\n";
  print F "User-Agent: hcat/1.0\n\n";

  # print out server's response.

  # get the HTTP response line
  my $the_response=<F>;
  print $the_response if ($all || defined $response);

  # get the header data
  while(<F>=~ m/^(\S+):\s+(.+)/) {
    print "$1: $2\n" if ($all || defined $header);
  }

  # get the entity body
  if ($all || defined $data) {
    print while (<F>);
  }

  # close the network connection
  close(F);
}

Shell Hypertext cat

With hcat, one can easily retrieve documents from remote web servers. But there are times when a client request needs to be more complex than hcat allows. To give the user more flexibility in sending client requests, we'll change hcat into shcat, a shell utility that accepts methods, headers, and entity-body data from standard input. With this program, you can write shell scripts that specify different methods and custom headers, and that submit form data.

All of this can be done by changing a few lines. In hcat, where you see this:

# request the path of the document to get
print F "GET $the_url[3] HTTP/1.0\n";
print F "Accept: */*\n";
print F "User-Agent: hcat/1.0\n\n";

replace it with this:

# copy STDIN to network connection
while (<STDIN>) {print F;}

and save the result as shcat. Now you can say whatever you want on shcat's STDIN, and it will forward it to the web server you specify. This allows you to do things like HTML form postings with POST, or a file upload with PUT, and selectively look at the results.
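Because shcat copies its standard input to the socket unchanged, whatever feeds it must produce a well-formed request: a request line, header lines, a blank line, and then any entity body. The Perl sketch below generates such text. The build_request subroutine is hypothetical (it is not part of hcat, shcat, tcp.pl, or web.pl), and it assumes the body is a byte string so that length() returns its size in bytes. It ends lines with CRLF, which is the line terminator HTTP defines, although many servers also accept the bare newlines that hcat sends.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of hcat or shcat): build the text that
# shcat would copy verbatim from STDIN to the web server.
sub build_request {
    my ($method, $path, $headers, $body) = @_;
    $body = '' unless defined $body;

    # request line, e.g. "PUT /~apm/hi.txt HTTP/1.0"; HTTP uses CRLF
    my $request = "$method $path HTTP/1.0\r\n";

    # caller-supplied headers, sorted so the output is repeatable
    foreach my $name (sort keys %$headers) {
        $request .= "$name: $headers->{$name}\r\n";
    }

    # derive Content-Length from the body so the two cannot disagree
    # (assumes $body is a byte string, not decoded characters)
    $request .= 'Content-Length: ' . length($body) . "\r\n" if length $body;

    # a blank line ends the headers; the entity body follows as-is
    return $request . "\r\n" . $body;
}

my $req = build_request('PUT', '/~apm/hi.txt',
    { 'User-Agent'   => 'shcat/1.0',
      'Accept'       => '*/*',
      'Content-Type' => 'text/plain' },
    'hi');
print $req;
```

Saved as, say, put_hi.pl (a hypothetical file name), its output can be piped into shcat the same way an echo command's output would be: perl put_hi.pl | shcat http://publish.ora.com/. Deriving Content-Length from the body means the header cannot fall out of step when the body is edited.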
At this point, it's really all up to you what you want to say, as long as it's HTTP compliant. Here's a UNIX shell script example that calls shcat to do a file upload:

#!/bin/ksh
echo "PUT /~apm/hi.txt HTTP/1.0
User-Agent: shcat/1.0
Accept: */*
Content-type: text/plain
Content-length: 2

hi" | shcat http://publish.ora.com/

Grep out URL References

When you need to quickly get a list of all the references in an HTML page, here's a utility you can use to fetch an HTML page from a server and print out the URLs referenced within the page. We've taken the hcat code and modified it a little. There's also another function that we added to parse out URLs from the HTML. Let's go over that first:

sub grab_urls {
  my($data, %tags) = @_;
  my @urls;
  [...]

[...] and using web client software. Most of these issues are automatically handled by LWP, but when programming directly with sockets, you have to take care of them yourself.

How does your client handle tag parameters? The decision to process or ignore extra tag parameters depends on the application of the web client. Some tag parameters change the tag's appearance by adjusting colors or sizes. Other tags are [...]

[...] by the server. In general, the client should be equipped to handle variations in metadata as they occur.

Does your client handle URL redirection? Does it need to? Perhaps the desired data still exists, but not at the location specified by your client. In the event of a redirection, will your client handle it? Does it examine the Location header? The answers to these questions depend on the purpose of the [...]

[...] client do when the server is down? When the server is down, there are several options. The most obvious option is for the client to attempt the HTTP request at a later time. Other options are to try an alternate server or abort the transaction. The programmer should give the user some configuration options about the client's actions.

What does your client do when the server response time is long? For simple [...]
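To make the URL-extraction step concrete, here is an illustrative Perl sketch that follows the description given for grab_urls: an outer match finds HTML tags, an inner match looks for a URL-bearing parameter such as SRC or HREF, and each match is pushed onto an array that is returned at the end. It is not the authors' implementation; extract_urls is a hypothetical name, and it assumes %tags maps each lowercase tag name to the parameter that holds the URL (for example a => 'href').

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sketch of the idea behind grab_urls (not the authors' code).
# %tags maps a lowercase tag name to the parameter that carries its URL.
sub extract_urls {
    my ($data, %tags) = @_;
    my @urls;

    # outer match: each opening tag, capturing its name and its parameters
    while ($data =~ m/<\s*(\w+)([^>]*)>/g) {
        my ($tag, $params) = (lc $1, $2);
        my $attr = $tags{$tag} or next;       # not a tag we care about

        # inner match: ATTR="value", ATTR='value', or unquoted ATTR=value
        if ($params =~ m/(?:^|\s)\Q$attr\E\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))/i) {
            my $url = defined $1 ? $1 : defined $2 ? $2 : $3;
            $url =~ s/[\r\n]//g;              # drop line breaks inside a URL
            push @urls, $url;
        }
    }
    return @urls;
}

my $html = '<A HREF="/index.html">home</A> <img src=logo.gif> '
         . "<a name='top'>top</a> <frame src='menu.html'>";
my @found = extract_urls($html, a => 'href', img => 'src', frame => 'src');
print "$_\n" foreach @found;
```

Like any regular-expression approach, this sketch can be fooled by a > inside a quoted value or by tags inside HTML comments; the HTML::Parser module on CPAN handles those cases properly.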
Does the client analyze the response line and headers? It is not advisable to write clients that skip over the HTTP response line and headers. While it may be easier to do so, it often comes back to haunt you later. For example, if the URL used by the client becomes obsolete or is changed, the client may interpret the entity-body incorrectly. Media types for the URL may change, and could be noticed in the [...] the client.

Does the client send authorization information when it shouldn't? Two or more separate organizations may have CGI programs on the same server. It is important for your client not to send authorization information unless it is requested. Otherwise, the client may expose its authentication to an outside organization. This opens up the user's account to outsiders.

What does your client do when the [...] updated. But other changes, like the general format of the HTML, may cause your current client to interpret important values incorrectly. Changes in data may be unpredictable. When your client doesn't understand the data, it is safer for the client not to assume anything, to abort its current operation, and to notify someone to look into it. The client may need to be updated to handle the changes at the server [...]

[...] parameter=" ">

The outer if statement looks for HTML tags, like [...]. The inner if statement looks for parameters to the tags, like SRC and HREF, followed by text. Upon finding a match, the referenced URL is pushed into an array, which is returned at the end of the function. We've saved this in web.pl, and will include it in the hgrepurl program with a require 'web.pl'.

The second major [...]

[...] it might send data over 20 characters. As the HTML standard evolves, your client may require some updating.

What does your client do when the server's expected HTML format changes?
Examine the data coming back from the server. After your client can handle the current data, think about possible changes that may occur in the data. Some changes won't affect your client's functionality. For example, textual [...]

[...] variable declarations in HTML forms. Your client may need to pay close attention to these tags. For example, if your client sends form data, it may want to check all the parameters. Otherwise, your client may send data that is inconsistent with what the HTML specified; e.g., an HTML form might specify that a variable's value may not exceed a length of 20 characters. If the client ignored this parameter, it might [...]

[...] long? For simple applications, it may be better to allow the user to interrupt the application. For user-friendly or unattended batch applications, it is desirable to time out the connection and notify the user.

What does your client do when the server has a higher version of HTTP? And what happens when the client doesn't understand the response? The most logical thing is to attempt to talk on a common [...]
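Several of the questions above, such as analyzing the response line and headers, following a redirection through the Location header, and coping with a server that speaks a newer version of HTTP, start with parsing the beginning of the response rather than skipping it. The following Perl sketch assumes the response is read line by line from a filehandle; read_response_head is a hypothetical helper, and an in-memory filehandle holding a made-up response stands in for the F socket so the example runs without a network connection.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read the status line and headers from a filehandle
# (a socket such as F in hcat, or an in-memory handle as below).
# Returns a hash reference, or nothing if the status line is malformed.
sub read_response_head {
    my ($fh) = @_;

    my $status_line = <$fh>;
    return unless defined $status_line;
    $status_line =~ s/\r?\n\z//;              # accept CRLF or bare LF

    # e.g. "HTTP/1.1 302 Found": version, 3-digit code, reason phrase
    my ($major, $minor, $code, $reason) =
        $status_line =~ m/^HTTP\/(\d+)\.(\d+)\s+(\d\d\d)\s*(.*)$/
        or return;

    my %headers;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;
        last if $line eq '';                  # blank line ends the headers
        # header names are case-insensitive, so store them lowercased
        $headers{lc $1} = $2 if $line =~ m/^([^:\s]+):\s*(.*)$/;
    }

    return { major => $major, minor => $minor, code => $code,
             reason => $reason, headers => \%headers };
}

# hypothetical sample response standing in for data read from a server
my $raw = "HTTP/1.1 302 Found\r\n"
        . "Location: http://www.example.com/new/\r\n"
        . "Content-Type: text/html\r\n"
        . "\r\n"
        . "<html>moved</html>\r\n";
open(my $fh, '<', \$raw) or die "cannot open in-memory handle: $!";

my $resp = read_response_head($fh) or die "malformed response\n";

# same major version: an HTTP/1.0 client can still read a 1.x reply
print "server speaks HTTP/$resp->{major}.$resp->{minor}\n";
warn "unfamiliar HTTP major version\n" if $resp->{major} != 1;

# 3xx: follow the Location header instead of treating the body as the document
if ($resp->{code} =~ /^3/ && exists $resp->{headers}{location}) {
    print "redirected to $resp->{headers}{location}\n";
}
```

Storing header names in lowercase reflects the fact that HTTP header names are case-insensitive, so a lookup for location works whether the server sent Location or LOCATION. Checking only the major version follows HTTP's versioning scheme, in which a higher minor number does not change how a message is parsed.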