
Google Hacking for Penetration Testers - part 21



DOCUMENT INFORMATION

Basic information

Format
Pages: 10
Size: 355.85 KB

Contents

Figure 5.20 The LinkedIn Profile of the Author of a Government Document

Can this process of grabbing documents and analyzing them be automated? Of course! As a start we can build a scraper that will find the URLs of Office documents (.doc, .ppt, .xls, .pps). We then need to download each document and push it through the meta information parser. Finally, we can extract the interesting bits and do some post processing on them. We already have a scraper (see the previous section), so we just need something that will extract the meta information from the file. Thomas Springer at ServerSniff.net was kind enough to provide me with the source of his document information script. After some slight changes it looks like this:

#!/usr/bin/perl
# File-analyzer 0.1, 07/08/2007, thomas springer
# stripped-down version
# slightly modified by roelof temmingh @ paterva.com
# this code is public domain - use at own risk
# this code is using phil harveys ExifTool - THANK YOU, PHIL!!!!
# http://www.ebv4linux.de/images/articles/Phil1.jpg

use strict;
use Image::ExifTool;

# passed parameter is a URL
my ($url)=@ARGV;

# get file and make a nice filename
my $file=get_page($url);
my $time=time;
my $frand=rand(10000);
my $fname="/tmp/".$time.$frand;

# write stuff to a file
open(FL, ">$fname");
print FL $file;
close(FL);

# Get EXIF-INFO
my $exifTool=new Image::ExifTool;
$exifTool->Options(FastScan => '1');
$exifTool->Options(Binary => '1');
$exifTool->Options(Unknown => '2');
$exifTool->Options(IgnoreMinorErrors => '1');
my $info = $exifTool->ImageInfo($fname); # feed standard info into a hash

# delete tempfile
unlink ("$fname");

my @names;
print "Author:".$$info{"Author"}."\n";
print "LastSaved:".$$info{"LastSavedBy"}."\n";
print "Creator:".$$info{"creator"}."\n";
print "Company:".$$info{"Company"}."\n";
print "Email:".$$info{"AuthorEmail"}."\n";
exit; # comment this out to see more fields

foreach (keys %$info){
  print "$_ = $$info{$_}\n";
}

sub get_page{
  my ($url)=@_;
  # use curl to get it - you might want to change this
  # 25 second timeout - also modify as you see fit
  my $res=`curl -s -m 25 $url`;
  return $res;
}

Save this script as docinfo.pl. You will notice that you'll need some Perl libraries to use it, specifically the Image::ExifTool library, which is used to get the metadata from the files. The script uses curl to download the pages from the server, so you'll need that as well. Curl is set to a 25-second timeout; on a slow link you might want to increase that. Let's see how this script works:

$ perl docinfo.pl http://www.elsevier.com/framework_support/permreq.doc
Author:Catherine Nielsen
LastSaved:Administrator
Creator:
Company:Elsevier Science
Email:

The script looks for five fields in a document: Author, LastSavedBy, Creator, Company, and AuthorEmail. There are many other fields that might be of interest (like the software used to create the document). On its own this script is only mildly interesting, but it really starts to become powerful when combined with a scraper and some post processing on the results.
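Image::ExifTool exposes far more tags than the five printed above; the script already contains a loop that dumps every tag, reached by commenting out the exit line. As a rough sketch (not part of the original script), the snippet below also prints the application that produced the document plus a few other tags when they are present. The tag names here are assumptions and vary by file type and producing application, so check the full dump to see what your documents really expose.

# Hypothetical addition to docinfo.pl, placed just above the exit line:
# print a few extra tags when they exist. Tag names are assumptions
# and differ between .doc, .xls and .ppt files.
foreach my $tag ("Software","Title","CreateDate","RevisionNumber"){
  print $tag.":".$$info{$tag}."\n" if defined $$info{$tag};
}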
Let's modify the existing scraper a bit to look like this:

#!/usr/bin/perl
use strict;

my ($domain,$num)=@ARGV;
my @types=("doc","xls","ppt","pps");
my $result;

foreach my $type (@types){
  $result=`curl -s -A moo "http://www.google.com/search?q=filetype:$type+site:$domain&hl=en&num=$num&filter=0"`;
  parse($result);
}

sub parse {
  ($result)=@_;
  my $start;
  my $end;
  my $token="<div class=g>";
  my $count=1;
  while (1){
    $start=index($result,$token,$start);
    $end=index($result,$token,$start+1);
    if ($start == -1 || $end == -1 || $start == $end){
      last;
    }
    my $snippet=substr($result,$start,$end-$start);
    my ($pos,$url) = cutter("<a href=\"","\"",0,$snippet);
    my ($pos,$heading) = cutter(">","</a>",$pos,$snippet);
    my ($pos,$summary) = cutter("<font size=-1>","<br>",$pos,$snippet);
    # remove <b> and </b>
    $heading=cleanB($heading);
    $url=cleanB($url);
    $summary=cleanB($summary);
    print $url."\n";
    $start=$end;
    $count++;
  }
}

sub cutter{
  my ($starttok,$endtok,$where,$str)=@_;
  my $startcut=index($str,$starttok,$where)+length($starttok);
  my $endcut=index($str,$endtok,$startcut+1);
  my $returner=substr($str,$startcut,$endcut-$startcut);
  my @res;
  push @res,$endcut;
  push @res,$returner;
  return @res;
}

sub cleanB{
  my ($str)=@_;
  $str=~s/<b>//g;
  $str=~s/<\/b>//g;
  return $str;
}

Save this script as scraper.pl. The scraper takes a domain and a number as parameters; the number is the number of results to return, but multiple page support is not included in the code. However, it's child's play to modify the script to scrape multiple pages from Google. Note that the scraper has been modified to look for some common Microsoft Office formats and will loop through them with a filetype:XX site:domain search term. Now all that is needed is something that will put everything together and do some post processing on the results. The code could look like this:

#!/usr/bin/perl
use strict;

my ($domain,$num)=@ARGV;
my %ALLEMAIL=();
my %ALLNAMES=();
my %ALLUNAME=();
my %ALLCOMP=();
my $scraper="scraper.pl";
my $docinfo="docinfo.pl";

print "Scraping please wait \n";
my @all_urls=`perl $scraper $domain $num`;
if ($#all_urls == -1 ){
  print "Sorry - no results!\n";
  exit;
}

my $count=0;
foreach my $url (@all_urls){
  print "$count / $#all_urls : Fetching $url";
  my @meta=`perl $docinfo $url`;
  foreach my $item (@meta){
    process($item);
  }
  $count++;
}

# show results
print "\nEmails:\n \n";
foreach my $item (keys %ALLEMAIL){
  print "$ALLEMAIL{$item}:\t$item";
}
print "\nNames (Person):\n \n";
foreach my $item (keys %ALLNAMES){
  print "$ALLNAMES{$item}:\t$item";
}
print "\nUsernames:\n \n";
foreach my $item (keys %ALLUNAME){
  print "$ALLUNAME{$item}:\t$item";
}
print "\nCompanies:\n \n";
foreach my $item (keys %ALLCOMP){
  print "$ALLCOMP{$item}:\t$item";
}

sub process {
  my ($passed)=@_;
  my ($type,$value)=split(/:/,$passed);
  $value=~tr/A-Z/a-z/;
  if (length($value)<=1) {return;}
  if ($value =~ /[a-zA-Z0-9]/){
    if ($type eq "Company"){$ALLCOMP{$value}++;}
    else {
      if (index($value,"\@")>2){$ALLEMAIL{$value}++; }
      elsif (index($value," ")>0){$ALLNAMES{$value}++; }
      else{$ALLUNAME{$value}++; }
    }
  }
}

Save this as combined.pl. The script first kicks off scraper.pl with the domain and the number of results that were passed to it as parameters.
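One loose end before looking at the combined script in action: the multiple-page support mentioned above really is a small change. A minimal sketch of the fetch loop in scraper.pl with paging added is shown below; it assumes Google's old start URL parameter with ten results per page, which is how the interface worked when the book was written and may well have changed since.

# Hypothetical paging variant of the fetch loop in scraper.pl.
# Assumes the classic "start" parameter; $num now means total results wanted.
foreach my $type (@types){
  for (my $offset=0; $offset<$num; $offset+=10){
    $result=`curl -s -A moo "http://www.google.com/search?q=filetype:$type+site:$domain&hl=en&num=10&start=$offset&filter=0"`;
    parse($result);
  }
}

In practice you would also want to break out of the inner loop as soon as a page comes back without any result blocks.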
The combined script captures the output of scraper.pl (a list of URLs) in an array, and then runs the docinfo.pl script against every URL. The output of docinfo.pl is sent for further processing, where some basic checking is done to see if each value is a company name, an e-mail address, a user name, or a person's name. These are stored in separate hash tables for later use. When everything is done, the script displays each collected piece of information and the number of times it occurred across all pages. Does it actually work? Have a look:

# perl combined.pl xxx.gov 10
Scraping please wait
0 / 35 : Fetching http://www.xxx.gov/8878main_C_PDP03.DOC
1 / 35 : Fetching http://***.xxx.gov/1329NEW.doc
2 / 35 : Fetching http://***.xxx.gov/LP_Evaluation.doc
3 / 35 : Fetching http://*******.xxx.gov/305.doc
<cut>

Emails:
1: ***zgpt@***.ksc.xxx.gov
1: ***ikrb@kscems.ksc.xxx.gov
1: ***ald.l.***mack@xxx.gov
1: ****ie.king@****.xxx.gov

Names (Person):
1: audrey sch***
1: corina mo****
1: frank ma****
2: eileen wa****
2: saic-odin-**** hq
1: chris wil****
1: nand lal****
1: susan ho****
2: john jaa****
1: dr. paul a. cu****
1: *** project/code 470
1: bill mah****
1: goddard, pwdo - bernadette fo****
1: joanne wo****
2: tom naro****
1: lucero ja****
1: jenny rumb****
1: blade ru****
1: lmit odi****
2: **** odin/osf seat
1: scott w. mci****
2: philip t. me****
1: annie ki****

Usernames:
1: cgro****
1: ****
1: gidel****
1: rdcho****
1: fbuchan****
2: sst****
1: rbene****
1: rpan****
2: l.j.klau****
1: gane****h
1: amh****
1: caroles****
2: mic****e
1: baltn****r
3: pcu****
1: md****
1: ****wxpadmin
1: mabis****
1: ebo****
2: grid****
1: bkst****
1: ***(at&l)

Companies:
1: shadow conservatory

[SNIP]

The list of companies has been chopped way down to protect the identity of the government agency in question, but the script seems to work well. It can easily be modified to scrape many more results (across many pages), extract more fields, and handle other file types. By the way, what the heck is the one unedited company known as the "Shadow Conservatory?"

Figure 5.21 Zero Results for "Shadow Conservatory"

The tool also works well for finding out whether a user name format is used, and if so, what it is. Consider this list of user names mined from somewhere:

Usernames:
1: 79241234
1: 78610276
1: 98229941
1: 86232477
2: 82733791
2: 02000537
1: 79704862
1: 73641355
2: 85700136

From the list it is clear that an eight-digit number is used as the user name. This information might be very useful in later stages of an attack.

Taking It One Step Further

Sometimes you end up in a situation where you want to hook the output of one search up as the input for another process. This process might be another search, or it might be something like looking up an e-mail address on a social network, converting a DNS name to a domain, resolving a DNS name, or verifying the existence of an e-mail account. How do I link two e-mail addresses together?
Consider Johnny's e-mail address johnny@ihackstuff.com and my previous e-mail address at SensePost, roelof@sensepost.com. To link these two addresses together we can start by searching for one of the e-mail addresses and extracting sites, e-mail addresses, and phone numbers from the results. Once we have these results we can do the same for the other e-mail address and then compare them to see if there are any common results (or nodes). In this case there are common nodes (see Figure 5.22).

Figure 5.22 Relating Two E-mail Addresses from Common Data Sources

If there are no matches, we can loop through all of the results of the first e-mail address, again extracting e-mail addresses, sites, and telephone numbers, and then repeat the process for the second address in the hope that there are common nodes.

What about more complex sequences that involve more than searching? Can you get the locations of the Pentagon's data centers by simply looking at public information? Consider Figure 5.23. What's happening here? While it looks seriously complex, it really isn't. The procedure to get to the locations shown in this figure is as follows:

Posted: 04/07/2014, 17:20
