
Web Server Programming, part 4


Document information

Number of pages: 63
File size: 773.65 KB

Content

● Read and process the input line by line.
● Change each line by removing all < > sequences using a substitution pattern match (note that an input like ‘<a href=link>Next</a>’ should leave the text ‘Next’).
● Output the updated text to the output file, omitting any lines that are now blank.

This program will not correctly process HTML tags that are opened on one line and closed on a subsequent line.

(4) Revise the program from Exercise 3 so that it reads the entire input file into memory and processes its contents as a single long string. HTML tags that span multiple lines should now be handled correctly.

(5) Write a program that uses regular expression pattern matching to find all HTML <a href= > link tags in an input HTML file and uses these data to construct a collection of the names of the different files referenced in links. These filenames are to be printed when the input data have all been processed.

(6) This example is for implementation on a Unix system where the “finger” command can be used to look up the name of a user whose login id is known.

Write a program, StudentList.pl, to help administrative staff who have to transcribe students’ assignment marks into a University database system. Marks are returned by tutors in files that list student user-ids and marks; for example:

  aa63 7
  am83 7.5
  bjr02 8
  cjw11 7.5

The database system does not display user-ids; instead it lists students by name, ordered by family name. The finger command can be used to find the name of the person with a given user-id. The following are typical outputs from finger:

  $ finger bm07
  Login name: bm07 In real life: Bradley Milner
  Directory: /home/ug/c/bm07 Shell: /share/bin/uow_sh
  Never logged in.

  $ finger rgc01
  Login name: rgc01 In real life: Robert George Composti
  Directory: /home/ug/u/rgc01 Shell: /share/bin/uow_sh
  Never logged in.
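The name in finger output of this shape follows the ‘In real life:’ label. The sketch below is in Python rather than the Perl the exercise asks for, and is only one way to pull the name out and move the family name to the front. The pattern, the helper name family_name_first and the assumption that the last word is the family name are illustrative choices, and finger output on other systems may be laid out differently.

```python
import re

# Sample text copied from the finger output shown above; real output
# varies between systems, so the pattern below is an assumption.
SAMPLE = """Login name: rgc01 In real life: Robert George Composti
Directory: /home/ug/u/rgc01 Shell: /share/bin/uow_sh
Never logged in."""

# Hypothetical pattern: capture the text after "In real life:" up to the
# end of that line, ignoring surrounding spaces and tabs.
NAME_RE = re.compile(r"In real life:[ \t]*(\S.*?)[ \t]*$", re.MULTILINE)


def family_name_first(finger_text):
    """Return 'Family Given ...', or None if no name field is found."""
    match = NAME_RE.search(finger_text)
    if match is None:
        return None
    parts = match.group(1).split()
    # Assumption: the last word is the family name (true of the samples).
    return " ".join([parts[-1]] + parts[:-1])


print(family_name_first(SAMPLE))
```

Returning None for unrecognised output lets the caller decide whether to skip the user-id or report it, which matters when a marks file contains an id that finger does not know.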
The StudentList.pl program is to read marks files supplied by tutors, convert user identifiers to names, and list names and marks in the correct order for transcription to the database.

For an input file:

  bm07 8
  rgc01 8.5
  rvi01 8

it should produce the output:

  Composti Robert George 8.5
  Iyer Ravichandran V 8
  Milner Bradley 8

More specifically, the StudentList.pl program:

● Reads input from a sequence of files listed on the command line.
● Reads file input line by line.
● Expects each line to consist of a student’s user identifier, followed by a tab character or some spaces, and a mark (a number with possibly a fractional part).
● Uses the system finger command (via ‘backticks’) to obtain a string containing all the identification data for a user identifier.
● Extracts and rearranges the name data.
● Generates a string containing family name, given names and marks, suitably formatted to guarantee alignment.
● Adds the string to a collection.
● When all inputs have been read, sorts the collection of strings and prints the sorted list.

You will need to create your own data file with a collection of user ids for some of the users on your Unix system.

Exercises 7–10 use Apache log files as input data. You should try to use log files from a local server. The web site associated with this book does have a couple of compressed files with logs from an Apache system; these files have around twenty thousand records each, which is a reasonable amount of data for analysis.
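Each record in such a log is a single line of space-separated fields, with the timestamp in square brackets and the request in double quotes. The following Python sketch (the exercises themselves call for Perl) shows one way to split a record with a single regular expression. The sample record is one of those listed in this section; the field names and the helper parse_record are illustrative assumptions, and records with malformed request strings will simply fail to match.

```python
import re

# One record taken from the sample log data in this section.
LINE = ('203.88.255.122 - iact417 [19/Jun/2001:08:06:06 +1000] '
        '"GET /subjects/iact417/tut4/g4_2/Image1.jpg HTTP/1.0" 200 8408')

# Field names are illustrative; the layout is host, identity, user,
# [timestamp], "command resource protocol", status, byte count.
LOG_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<resource>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_record(line):
    """Return a dict of fields, or None if the line does not match."""
    match = LOG_RE.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # A '-' byte count (for example on a 304 response) is treated as 0.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


record = parse_record(LINE)
# status // 100 gives the response class (2 for the 200 region, and so on).
print(record["protocol"], record["status"] // 100, record["size"])
```

Dividing the status code by 100 gives the response class directly, which is the grouping Exercise 7 asks for; counting records by protocol or by resource then only needs a dictionary keyed on the relevant field.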
Apache log files contain data like the following (line breaks have been inserted to fit the data to the page; each record is actually on a single line):

  203.132.227.144 - - [19/Jun/2001:00:43:46 +1000]
      "GET /current/subject_outlines/ HTTP/1.0" 304 -
  203.106.173.151 - - [19/Jun/2001:00:39:57 +1000]
      "POST /cgi-bin/labpref3 HTTP/1.1" 200 563
  203.88.255.122 - - [19/Jun/2001:08:06:03 +1000]
      "GET /subjects/iact417/tut4/g4_2/Image1.jpg HTTP/1.0" 401 397
  203.88.255.122 - iact417 [19/Jun/2001:08:06:06 +1000]
      "GET /subjects/iact417/tut4/g4_2/Image1.jpg HTTP/1.0" 200 8408

The first element is the client’s IP address; the second is almost always just a ‘-’ placeholder (it will contain the email address or user identifier in rare cases where the client has chosen to supply this information); the third field will be a user identity if one has been entered in response to an authorization challenge for a controlled realm; the fourth field is a timestamp (date, time, timezone); the fifth field is a quoted string with the ‘get’, ‘post’, ‘put’ or other command (command, resource name, protocol); the sixth field is the HTTP response code; and the final field is the number of bytes of content data contained in the response sent back to the client. Regular expression matches are required to pick up data such as the protocol level used in a request.

(7) Write a program that reads a log file such as described above and:

● Calculates the percentage of clients who are still submitting requests using the HTTP/1.0 protocol.
● Calculates the percentage of requests that were successful (response codes in the 200 region), resulted only in informational responses (response codes in the 300 region), had client errors (response codes in the 400 region) or caused server errors (response codes in the 500 region).

(8) Write a program that will produce a listing of filenames that probably occur as bad links in your web site.
This program is to:

● Process only ‘get’ requests that result in ‘404 file not found’ responses.
● Build a collection of the filenames and counts of the number of requests for each missing file.
● Print a sorted list of those files and their counts for those files where there are three or more requests; the missing files that are most often requested should be listed first (files with counts of 1 or 2 are usually due to users entering incorrectly spelled filenames).

(9) Write a program that:

● Processes only ‘get’ requests.
● Ignores requests for image files and icons (files with names ending .ico, .gif, .jpg, .jpeg, .bmp, .png and capitalized variants of these extensions).
● Keeps records of each distinct filename and associated request count.
● When all data have been read, produces a sorted list showing the twenty most frequently requested files and their request counts.

(10) (This program requires interaction with a Domain Name Server and so can only be done online. Depending on the data files used, it may result in a large number of requests to the DNS system, and so may run slowly.) Write a program that:

● Constructs a record of all distinct requestor IP addresses and the number of requests they submitted.
● Uses the gethostbyaddr function (DNS services invoked) to look up the names corresponding to each distinct address.
● Lists the names of the twenty clients that submit the most requests.
● Lists counts of total requests from each top-level domain.

Exercises 11 and 12 require a database and use of the DBI module along with the correct database drivers.

(11) Getting the mechanics to work! Create a database table equivalent to the following SQL table definition (your database system may require the use of some visual editor helper program to ‘design’ this table, or it may accept the SQL definition).
  create table demotable (
      id integer,
      name varchar(32),
      constraint demotable_pk primary key(id)
  );
  insert into demotable values (1, 'one');
  insert into demotable values (2, 'two');
  insert into demotable values (3, 'three');

Write a Perl program that:

● Connects to the database and displays the current contents (submit the query select * from demotable and print each row in the result set).
● Allows the user to insert a new record (attempt to pick up error messages if the user violates the primary key constraint).
● Allows the user to modify the name field associated with an existing record.
● Allows the user to identify a record that is to be deleted.

(12) Write a Perl program and define associated database tables for a system that will record member details for a group and allocate unique membership numbers.

The program should be able to:

● List details of all current group members.
● Add a new record with name and address details (fields for family name, given name, middle initial, address, city, zip-code, state); the record's primary key is a membership number that is automatically allocated.

(Automatic allocation of membership number is somewhat database-dependent. For example, Microsoft Access has ‘autonumber’ fields; an autonumber field will work fine as the primary key and automatically allocated membership number. Oracle has a somewhat more complex ‘sequence’ construct. One approach that always works is to have a separate table that is used to handle unique number generation for different applications. The rows in this table contain an application name and an integer for the next number that is to be allocated. This table is read using the application name as retrieval key, and then a new number value is written back. The number taken from the numbers table is then used as part of the entry in the membership table.
If you want to be really correct, a single database transaction should be used to group operations on both the numbers table and the membership table.)

Exercises 13–15 involve Perl CGI programs. It is easier if students can run their own Apache servers with their Perl programs in their server’s cgi-bin directory, and their HTML pages in its htdocs directory. When individual students launch their own Apache servers, the server and the spawned CGI programs all run with the student’s user-identifier and so their privileges are those appropriate to students.

There are quite a number of messy configuration issues to resolve if a single Apache server is to be shared by students. Typically, it is necessary to create individual subdirectories for each student within the htdocs directory of the shared Apache server; these subdirectories require permissions to hold CGI programs. Often it is necessary to resolve issues of ‘set user id’ programs, or use of Apache SUExec, so as to arrange that individual students’ CGI programs run with their author’s access privileges.

(13) Getting the mechanics to work! Write a Perl CGI ‘hello world’ program.

The implementation should involve a static HTML page, a Perl CGI program and a dynamically generated response page. The static HTML page displays a simple form that has an ‘Enter your name here’ input text box, and a submit button. Submission invokes a Perl program that generates a well-formatted HTML response page, containing a ‘Hello ’ greeting that echoes the name that was input in the form. The response page should also include a date stamp showing the time and date as recorded on the server host machine.

(14) Getting the mechanics to work II: adding the backend database. Write a Perl CGI database access system. The system is to allow viewing and updating of a simple data table, such as the demotable defined in Exercise 11.
The implementation should involve a static HTML page, a Perl CGI program, a database, and a dynamically generated response page. The static HTML page displays a simple form that allows the user to select listing, updating, deletion and insertion operations on the (number, name) demotable defined above. A radio button group can be used to define the required operation; input text fields will be needed for number and name (their values will only be used in a subset of the operations); and, of course, there will be a submit button. Submission invokes a Perl program that generates a well-formatted HTML response page containing a report appropriate to the operation selected – either a listing of all entries in the table, or a success or failure message for the operations that modify the table. (Depending on how the OS and database environments are set up, it may be necessary for the CGI program to explicitly define some environment variables that control database access. Details depend on the local system configuration.)

(15) Write a Perl CGI program that handles membership applications for some group.

The system will involve two static HTML pages, a couple of Perl CGI programs that generate dynamic response pages, and the same membership database as used in Exercise 12.

The first HTML page and CGI program should handle new applications. The form page should have input fields for member details (name, address). The CGI program adds a new record to the members data table and returns a ‘Welcome new member’ page that informs the new member of their allocated membership number.

The second HTML page and CGI program are for the group administrator. This form page should allow the administrator to view, update and delete records. The program should perform the required database operations and report on their outcome via the dynamically generated response page.
Deploy the programs on your Apache server so that the membership application component is generally accessible while the administrator control component is within a ‘controlled realm’ so that usage is subject to name/password checks.

Short answer questions

(1) Compose a pattern that will extract a (4-digit) phone number and an optional fax number from listings that include lines like:

  Phone: 6744 Fax: none
  Phone: 5433 Fax:
  Phone: 5344 Fax: +61 2 42345678

(2) Here is a short Perl program that can be used to test whether one has come up with the required regular expression to perform some matching and extraction task:

  #!/share/bin/perl
  #define the pattern in a doubly quoted string
  $pat = ;
  print "Target pattern is $pat \n";
  #let user enter data to test the pattern
  while(<STDIN>) {
      if( /$pat/ ) {
          print "\tInput matched\n";
          #uncomment if checking for a subexpression
          # print "\t\tFirst subexpression matched was $1\n";
      }
  }

For example, the pattern definition $pat = "^a.*z\$" would allow the program to match all input lines with first letter ‘a’ and last letter ‘z’.

Explain, with the help of examples, the input data that are being tested for by the patterns with the following definitions (where appropriate, identify matched sub-expressions):

  $pat = "=[0-9A-Fa-f]{32}\\s";
  $pat = "^[^:]+:([^:]+):";
  $pat = "(\\d\\d):(\\d\\d):";
  $pat = "\\W(thread|p_thread|pthread)\\W";

(Note that these patterns are being defined as doubly quoted strings, so ‘extra’ backslash characters are required as shown.)

Explorations

(1) Pick an application area, such as encryption, XML parsing, support for Simple Object Access Protocol (SOAP), CGI/HTML processing or similar; then go to http://www.cpan.org/ (home to Perl modules) and find out about Perl support in your chosen area. Download some representative modules and run small examples using them. Write a report on the potential for using Perl in the chosen area; illustrate your report with fragments from your test examples.
6 PHP4

PHP4 is another of the open source quick hacks that grew explosively and became a fairly well-established industry. It is essentially the open source competitor for Microsoft’s proprietary Active Server Page technology. The PHP4 system is simple to deploy (it integrates best with Apache, but can work with other web servers), and PHP4 scripting code is easy to write (it is probably the easiest to use of all the server-side technologies). It has grown into one of the most widely used server-side technologies. It is mainly deployed for small web sites – small companies, sites belonging to individuals and so forth – but there are some major commercial users.

This chapter starts with brief sections on the origins of PHP and on its syntax. A series of simple examples are then used to illustrate its application. More advanced examples follow that illustrate common web needs such as multi-page forms, file uploads, database access and graphical response pages. (I prefer PHP over all other alternatives when it comes to generating response pages that include graphical images that are based on user-submitted data.) The final section looks at different ways in which an application can maintain state information about clients in the stateless world of WWW and HTTP.

6.1 PHP4’s origins

Rasmus Lerdorf wrote the first version of PHP to manage his home page; hence the original PHP name, an acronym for Personal Home Pages. Lerdorf’s original 1994 implementation involved a group of Perl scripts; this system was expanded with a ‘Form Interpreter’ component to produce PHP/FI 2.0. In 1997, Zeev Suraski and Andi Gutmans constructed a parser for this steadily evolving scripting language, which led to the PHP3.0 implementation that was the first version to establish a really significant user population.
The current version, PHP4, has tidied up the language, improved the implementation and incorporated as standard many features that were only available as contributed add-on libraries in the earlier PHP3.

Why did PHP emerge and why did it grow? It faced established competitors. When the Common Gateway Interface (CGI) protocols were established, NCSA supplied function libraries for C/C++ that handled common tasks like the extraction of name/value pairs from a browser-supplied query string. Perl had proved popular, and many libraries were emerging to support Perl-based CGI systems.

Lerdorf provides his own explanation for PHP's popularity via a small example in which he compares the programs that are needed to echo ‘name’ and ‘age’ data as entered in a simple HTML form. His first version illustrates how he would write a C version:

  #include <stdio.h>
  #include <stdlib.h>
  #include <ctype.h>
  #include <string.h>

  #define ishex(x) (((x) >= '0' && (x) <= '9') || \
                    ((x) >= 'a' && (x) <= 'f') || \
                    ((x) >= 'A' && (x) <= 'F'))

  int htoi(char *s) {
      int value;
      char c;

      c = s[0];
      if(isupper(c)) c = tolower(c);
      value=(c >= '0' && c <= '9'?c-'0':c-'a'+10)*16;

      c = s[1];
      if(isupper(c)) c = tolower(c);
      value += c >= '0' && c <= '9'?c-'0':c-'a'+10;

      return(value);
  }

  void main(int argc, char *argv[]) {
      char *params, *data, *dest, *s, *tmp;
      char *name, *age;

      puts("Content-type: text/html\r\n");
      puts("<html><header><title>Form Example</title></header>");
      puts("<body><h1>Welcome</h1>");
      data = getenv("QUERY_STRING");
      if(data) {
          params = data;
          dest = data;
          /* In situ replacement of x-www-urlencoded string
             with decoded version. */
          while(*data) {
              /* Plus to space */
              if(*data=='+') *dest=' ';
              else if(*data == '%' && ishex(*(data+1))
                      && ishex(*(data+2))) {
                  /* Hex combination to character */
                  *dest = (char) htoi(data + 1);
                  data+=2;

[...]
... interpreters, and so use PHP as your main Unix/Linux scripting language). But typically PHP is integrated directly into the web server. The most common variant of the PHP interpreter is a module integrated into an Apache web server (there are also ISAPI and NSAPI modules for the IIS and Netscape iPlanet servers, and there is support for other less common systems). Integration saves the cost of the separate CGI process ...

... can always be faked, so such tags cannot be relied on. Apart from the risks associated with accepting anything from web world, there are other problems. The PHP script is run inside the web server and so is running with the user-id and permissions used for that server (typically, user ‘nobody’ or ‘www’). This user-id is likely to have restricted access to files, and this can ...

[Figure 6.3: File upload]

... the HTTP POST method.
● Also, all the general environment variables of the process running the web server.

When a form is submitted to a PHP script, the form’s variables are made available to the script by PHP. The exact mechanism depends on the configuration options used when setting up the PHP module for the web server. The form’s variables (its named input fields etc.) may be defined, automatically, as ...

... conducted by Netcraft and other web analysis companies; these data show the recent growth. Between January 2000 and September 2001, PHP grew in usage from about 400,000 of the IP addresses surveyed to over one million; in terms of domains, the growth was from approximately 1.2 million to around 7 million. Other surveys (from SecuritySpace) show that by September 2001 about 44 per cent of all Apache sites ...

  ... => "Core Java",
      "cost" => "34.95",
      "stars" => "****",
  );

The PHP interpreter has a large number of predefined variables that are available to any script that it runs. These variables do depend on your system configuration. The phpinfo() function can be used to get a printout of the variables that are defined in your system. The following may be included:

● SERVER_NAME: the name of the server host under which the ...

... rather than the Perl last and next statements.

6.2.4 Functions

There are thousands of predefined functions, but you can also define your own:

  function MyVeryOwnFunction($arg_1, $arg_2, ..., $arg_n) {
      echo "Running in my function.\n";
      $result = ...;
      return $result;
  }

PHP supports C++ style default arguments:

  function drawstring($str, $x = 0, $y = 0) { ... }

Arguments are normally passed by value, but you ...

... data, must be saved somewhere while other requests and responses pass between client browser and web server. One approach involves saving all previously entered data in each of the dynamically generated forms that are returned to a user. These previously entered data should be returned, hopefully unchanged, to the server along with any additional data entered in the new form. Such previously entered data are ...

[Form fields: Your name; address; Your age (Less than 14, 14-19, 20-25, 26-35, Over 35); Male, Female]

  ... "Crown Street Mall, Wollongong", "x" => 130, "y" => 310 );
  $mac2 = array (
      "address" => "Moombarra Street, Dapto",
      "x" => 51,
      "y" => 420
  );
  // Several similar definitions for other data objects
  // Definition ...

Date posted: 14/08/2014, 12:20
