© 2006 KDnuggets

152.152.98.11 - - [16/Nov/2005:16:32:50 -0500] "GET /jobs/ HTTP/1.1" 200 15140 "http://www.google.com/search?q=salary+for+data+mining&hl=en&lr=&start=10&sa=N" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
252.113.176.247 - - [16/Feb/2006:00:06:00 -0500] "GET / HTTP/1.1" 200 12453 "http://www.yisou.com/search?p=data+mining&source=toolbar_yassist_button&pid=400740_1006" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; MyIE2)"
252.113.176.247 - - [16/Feb/2006:00:06:00 -0500] "GET /kdr.css HTTP/1.1" 200 145 "http://www.kdnuggets.com/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; MyIE2)"
252.113.176.247 - - [16/Feb/2006:00:06:00 -0500] "GET /images/KDnuggets_logo.gif HTTP/1.1" 200 784 "http://www.kdnuggets.com/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; MyIE2)"

Perl for Web Log Analysis

Perl - introduction
  A full-featured, fast, and easy-to-use scripting language
  Very powerful pattern-matching facilities
  More powerful than gawk; very popular for web programming and CGI scripts
  Many Perl tutorials, e.g.
    learn.perl.org/
    www.perl.com/pub/a/2000/10/begperl1.html
    www.perlmonks.org/index.pl?node=Tutorials

Perl – historical note
  PERL stands for Practical Extraction and Report Language
  Developed by Larry Wall
  Perl 1.0 was released to usenet's alt.comp.sources in 1987
  Perl is one of the most popular web programming languages, due to powerful text manipulation and quick development
  Perl is widely known as "the duct-tape of the Internet"

Perl - running
  First Perl script (on Unix), file1.pl:
    #!/usr/local/bin/perl -w
    print "Hi there!\n";
  Note: On Windows, the first line is usually
    #!c:/Perl/bin/perl.exe -w
  Run it:
    % file1.pl
  Result:
    Hi there!

Perl for Windows
  ActivePerl – ready-to-install Perl distribution
  Runs on Windows, Linux, Mac OS, and other operating systems
  Free download: www.activestate.com/Products/ActivePerl/

Perl basics
  Two data types: numbers and strings
  Perl uses many special characters ($, @, %) as part of its syntax
  Perl variables:
    Scalars (simple variables, things) start with $, e.g. $count
    Arrays (lists) start with @, e.g. @array1
    Hashes (associative arrays) start with %
  Usual control structures
  A full introduction to Perl is beyond the scope of this module

What does this code do?
    @P=split//,".URRUU\c8R";@d=split//,"\nrekcah xinU / lreP rehtona tsuJ";sub p{
    @p{"r$p","u$p"}=(P,P);pipe"r$p","u$p";++$p; ($q*=2)+=$f=!fork;map{$P=$P[$f^ord
    ($p{$_})&6]; $p{$_}=/ ^$P/ix?$P:close$_}keys %p}p;p;p;p;p;map{$p{$_}=~/^[P.]/&&
    close$_}%p;wait until$?;map{/^r/&&<$_>}%p;$_=$d[$q];sleep rand(2)if/\S/;print
  Answer: We do NOT want to know!

The Tao of Coding
  Human time is MUCH more precious than computer time
  It is much better (and faster) to develop programs using methods that AVOID mistakes than to hunt for bugs in badly written programs

Perl style: understandability first
  Perl lets you write tricky programs that save a few lines of text
  AVOID this approach
  Use careful, step-by-step development
  Test after every step
  A good program should be easy to understand
  Only after you have an understandable program, and only if you need to, should you improve efficiency

Perl coding
  Variables can be declared implicitly by their first use, e.g.
    $oldvar=$nevar+27
  If $nevar was not declared before, it is created on the spot with no value and counts as zero in arithmetic
  Danger! This can lead to hard-to-find errors (what if the variable was misspelled and was supposed to be $newvar?)
  Much better to declare variables explicitly, e.g.
    my $newvar = 0;
  Enforced by the command use strict [...]...
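
The danger described on the "Perl coding" slide can be seen in a small experiment. The sketch below is an illustration only, not part of the original lecture code: it compiles the same two statements twice with eval, once with strict variable checking switched off and once with use strict, and reports what happens to the misspelled $nevar. The variable names $loose, $strict and $error are made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Case 1: strict 'vars' switched off. The misspelled $nevar is quietly
# created as an undefined package variable and counts as 0 in arithmetic,
# so the result is 27 instead of the intended 42.
my $loose = eval q{
    no strict 'vars';
    no warnings;
    $newvar = 15;
    $oldvar = $nevar + 27;      # typo: should have been $newvar
    $oldvar;
};

# Case 2: the same typo under "use strict". The code does not even compile;
# eval returns undef and the compiler message is left in $@.
my $strict = eval q{
    use strict;
    my $newvar = 15;
    my $oldvar = $nevar + 27;   # typo: should have been $newvar
    $oldvar;
};
my $error = $@;

print "without strict: oldvar = $loose\n";
print "with strict:    ", (defined $strict ? $strict : "not computed"), "\n";
print "compile error:  $error";
```

Without strict the script silently prints a wrong number; with strict Perl refuses to run the code and names the undeclared variable, which is usually the cheaper of the two failures to debug.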

Perl hash iteration
  Iteration over the entire hash:
    foreach $country (keys %capitals) {
      print "$country capital $capitals{$country}\n";
    }

Additional tools for Web log analysis
  Perl for web log analysis
    www.oreilly.com/catalog/perlwsmng/chapter/ch08.html
  Some web log analysis tools
    Analog     www.analog.cx/
    AWStats    awstats.sourceforge.net/
    Webalizer  www.mrunix.net/webalizer/
...

Sample log file
  We will again use the file d100.log: the first 100 lines from the Nov 16, 2005 KDnuggets log file
  We will give useful code examples
  You are encouraged to try the code examples in this lecture on this file
  You should get the same answers!

Perl for parsing a web log file
  Program 0: logparse0.pl - read and print log file
    #!c:/Perl/bin/perl.exe -w
    use strict;
    while (<>) {
      my ...

... in $4

Parsing log: Time Zone
  The time zone is given relative to GMT
  The time zone in the log file is that of the SERVER, not the visitor, so it is nearly always the same throughout the log, but it changes during daylight saving time
  In our test log file the time zone is -0500, the US Eastern time zone

Parsing log: Request
  Regular expression for parsing Request field: opening...
  ... expression in the first parentheses, etc.

Perl regex: match variables
  Note: The first line (the path to Perl) is probably different on your machine
    #!c:/Perl/bin/perl.exe -w
    use strict;
    my $cnt=0;
    while (<>) {
      my $line = $_;
      if ($line =~ /^(\S+) - - /) {
        my $ip = $1;
        print "ip $ip\n";
        $cnt++;
      } else {
        print "bad line $line\n";
      }
    }
    print " processed $cnt log lines\n";
  This program shows how to assign IP...

    ... line
      print $line;
    }

Perl regular expressions, 1
  Usage: $var =~ / regex /
    where regex is a regular expression
  E.g. $line =~ /google/ will match all lines containing "google"
  Note: / delimits the regular expression, so / can't be used inside it (unless escaped like this: \/ )

Perl log parsing, 1
  Check how many lines refer to google
    #!c:/Perl/bin/perl.exe -w
    use strict;
    my $cnt=0;
    ...
    ... "(GET|HEAD|POST|OPTIONS) (\S+) HTTP(\S+)" (\d\d\d) (\S+) "([^"]*)" "([^"]+)"/ ) { … }
  Full code is in program weblog_parse.pl

Perl arrays
  A Perl array is an ordered list of items
  Array names begin with @
  Array initialization:
    @days=("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")

Perl arrays, num of items
  When referring to a single array item, the name begins with "$"
  E.g. we print...

Perl array iteration
  Iterating over the entire array
    foreach $day (@days) {print $day,"\n" } ;
  is the same as
    for $n ($n=0; $n ...
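
The parsing slides above (match variables, the google check, hash and array iteration) can be pulled together in one short sketch. It is not the lecture's weblog_parse.pl, which is not shown here. The tail of the regular expression follows the fragment quoted on the slides; the head, `^(\S+) - - \[(\S+) ([+-]\d{4})\]`, is an assumption, as are the variable names and the use of an in-script array in place of d100.log. The first three test lines are copied from the log excerpt at the start of the slides; the fourth is a deliberately malformed line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Three sample lines from the slides plus one malformed line.
my @lines = (
    '152.152.98.11 - - [16/Nov/2005:16:32:50 -0500] "GET /jobs/ HTTP/1.1" 200 15140 "http://www.google.com/search?q=salary+for+data+mining&hl=en&lr=&start=10&sa=N" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"',
    '252.113.176.247 - - [16/Feb/2006:00:06:00 -0500] "GET / HTTP/1.1" 200 12453 "http://www.yisou.com/search?p=data+mining&source=toolbar_yassist_button&pid=400740_1006" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; MyIE2)"',
    '252.113.176.247 - - [16/Feb/2006:00:06:00 -0500] "GET /kdr.css HTTP/1.1" 200 145 "http://www.kdnuggets.com/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; MyIE2)"',
    'this is not a log line',
);

my %hits;          # number of requests per IP address
my $google = 0;    # lines whose referrer mentions google
my $bad    = 0;    # lines that do not match the pattern

foreach my $line (@lines) {
    # Groups: $1 ip, $2 date:time, $3 time zone, $4 method, $5 URL,
    # $6 protocol version, $7 status, $8 bytes, $9 referrer, $10 user agent.
    if ($line =~ /^(\S+) - - \[(\S+) ([+-]\d{4})\] "(GET|HEAD|POST|OPTIONS) (\S+) HTTP(\S+)" (\d\d\d) (\S+) "([^"]*)" "([^"]+)"/) {
        my ($ip, $referrer) = ($1, $9);
        $hits{$ip}++;
        $google++ if $referrer =~ /google/;
    } else {
        $bad++;
    }
}

# Iterate over the whole hash, sorted by key for a stable report.
foreach my $ip (sort keys %hits) {
    print "$ip $hits{$ip}\n";
}
print "google referrers: $google\n";
print "bad lines: $bad\n";
```

On these four lines the script counts one request from 152.152.98.11, two from 252.113.176.247, one google referrer and one bad line. To run it against a real log, replace the @lines loop with the `while (<>)` loop used in the slides and pass the log file name on the command line.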