Figure 12.7 Gooscan’s Usage

Gooscan’s most commonly used options are outlined in the included README file. Let’s take a look at how the various options work:

■ <-t target> (required argument) This is the Google appliance or server to scan. An IP address or host name can be used here. Caution: Entering www.google.com here violates Google’s terms of service and is neither recommended nor condoned by the author.

■ <-q query | -i query_file> (required argument) The query or query file to send. Gooscan can be used to send an individual query or a series of queries read from a file. The -q option takes one argument, which can be any valid Google query. For example, these are valid options:

    -q googledorks
    -q "microsoft sucks"
    -q "intitle:index.of secret"

■ [-i input_file] (optional argument) The -i option takes one argument: the name of a Gooscan data file. Using a data file allows you to perform multiple queries with Gooscan. See “Gooscan’s Data Files” later in this section for information about the included data files.

■ [-o output_file] (optional argument) Gooscan can create a nice HTML output file. This file includes links to the actual Google search results pages for each query.

■ [-p proxy:port] (optional argument) This is the address and port of an HTTP proxy server. Queries will be sent here and bounced off to the appliance indicated with the -t argument. The format can be similar to 10.1.1.150:80 or proxy.validcompany.com:8080.

■ [-v] (optional argument) Verbose mode. Every program needs a verbose mode, especially when the author sucks with a command-line debugger.

■ [-s site] (optional argument) This filters only results from a certain site, adding the site operator to each query Gooscan submits. This argument has absolutely no meaning when used against Google appliances, since Google appliances are already site filtered.
For example, consider the following Google queries:

    site:microsoft.com linux
    site:apple.com microsoft
    site:linux.org microsoft

With advance express permission from Google (you do have advance permission from Google, don’t you?) you could run the following with Gooscan to achieve the same results:

    $ ./gooscan -t www.google.com -s microsoft.com -q linux
    $ ./gooscan -t www.google.com -s apple.com -q microsoft
    $ ./gooscan -t www.google.com -s linux.org -q microsoft

The [-x] and [-d] options are used with the Google appliance. We don’t talk too much about the Google appliance in this book. Suffice it to say that the vast majority of the techniques that work against Google.com will work against a Google appliance as well.

Gooscan’s Data Files

Used in multiple-query mode, Gooscan reads queries from a data file. The format of the data files is as follows:

    search_type | search_string | count | description

search_type can be one of the following:

■ intitle Finds search_string in the title of the page. If requested on the command line, Gooscan will append the site query. Example:

    intitle|error||

This will find the word error in the title of a page.

■ inurl Finds search_string in the URL of the page. If requested on the command line, Gooscan will append the site query. Example:

    inurl|admin||

This will find the word admin in the URL of a page.

■ indexof Finds search_string in a directory listing. If requested on the command line, Gooscan will append the site query. Directory listings often will have the term index of in the title of the page. Gooscan will generate a Google query that looks something like this:

    intitle:index.of search_string

NOTE
When using the site switch, Gooscan automatically performs a generic search for directory listings. That query looks like this: intitle:index.of site:site_name.
If this generic query returns no results, Gooscan will skip any subsequent indexof searches. It is logical to skip specific indexof searches when the most generic indexof search returns nothing.

■ filetype Finds search_string as a filename, inserting the site query if requested on the command line. For example:

    filetype|cgi cgi||

This search will find files that have an extension of .cgi.

■ raw This search_type allows the user to build custom queries. The query is passed to Google unmodified, adding a site query if requested on the command line. For example:

    raw|filetype:xls email username password||

This example will find Excel spreadsheets with the words email, username, and password inside the document.

■ search_string The search_string is fairly straightforward. Any string is allowed here except the newline (\n) and pipe (|) characters. This string is URL-encoded before it is sent to Google: the A character is converted to %41, and so on. There are some exceptions; for example, spaces are converted to the + character.

■ count This field records the approximate number of hits found when a similar query is run against all of Google. Site is not applied. This value is somewhat arbitrary in that it is based on the rounded numbers supplied by Google and that this number can vary widely based on when and how the search is performed. Still, this number can provide a valuable watermark for sorting data files and creating custom data files. For example, zero count records could safely be eliminated before running a large search. (This field is currently not used by Gooscan.)

■ description This field describes the search type. Currently, only the filetype.gs data file populates this field. Keep reading for more information on the filetype.gs data file.
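To make the record format concrete, here is a minimal Python sketch of how one data-file line could be turned into a Google query. It is not Gooscan’s own code; it only follows the field descriptions above. The names parse_record, build_query, encode, and drop_zero_counts are ours, treating lines that begin with # as comments is an assumption, and the encoding is an approximation of the behavior described for search_string.

```python
from dataclasses import dataclass
from typing import List, Optional

# Text each search_type puts in front of search_string, per the list above.
PREFIXES = {
    "intitle": "intitle:",
    "inurl": "inurl:",
    "indexof": "intitle:index.of ",
    "filetype": "filetype:",
    "raw": "",  # raw queries are passed through unmodified
}


@dataclass
class Record:
    search_type: str
    search_string: str
    count: Optional[int]  # None when the count field is empty
    description: str


def parse_record(line: str) -> Optional[Record]:
    """Parse one 'search_type|search_string|count|description' line."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):  # assumption: '#' marks a comment
        return None
    fields = line.split("|", 3)
    if len(fields) != 4 or fields[0] not in PREFIXES:
        raise ValueError(f"malformed record: {line!r}")
    stype, sstring, count, desc = fields
    count = count.strip()
    return Record(stype, sstring, int(count) if count.isdigit() else None, desc)


def build_query(rec: Record, site: Optional[str] = None) -> str:
    """Build the Google query, appending the site operator if requested."""
    query = PREFIXES[rec.search_type] + rec.search_string
    if site:
        query += f" site:{site}"
    return query


def encode(query: str) -> str:
    """Spaces become '+'; every other byte becomes %XX in hexadecimal."""
    return "".join(
        "+" if b == 0x20 else f"%{b:02X}" for b in query.encode("utf-8")
    )


def drop_zero_counts(records: List[Record]) -> List[Record]:
    """Remove records whose count is exactly zero; keep unknown counts."""
    return [r for r in records if r.count != 0]
```

For example, parse_record("indexof|secret||") followed by build_query(rec, site="example.com") yields intitle:index.of secret site:example.com, where example.com is a placeholder site. Records with an empty count are kept by drop_zero_counts, since an empty field says nothing about how common the query is.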
Several data files are included with Gooscan, each with a distinct purpose:

■ gdork.gs This file includes excerpts from the Google Hacking Database (GHDB) hosted at http://johnny.ihackstuff.com. The GHDB is the Internet’s largest database of Google hacking queries, maintained by thousands of members who make up the Search Engine Hacking Forums, also hosted at http://johnny.ihackstuff.com. Updated many times a week, the GHDB currently sits at around 1500 unique queries.

■ filetype.gs This huge file contains every known filetype in existence, according to www.filext.com. By selecting interesting lines from this file, you can quickly determine the types of files that exist on a server that might warrant further investigation. We suggest creating a subset of this file for use in the field, with a Linux command such as:

    head -50 filetype.gs > short_filetype.gs

Do not run this file as is. It’s too big. With over 8,000 queries, this search would certainly take quite a while and burn precious resources on the target server. Instead, rely on the numbers in the count field to tell you how many (approximate) sites contain these files in Google, selecting only those that are the most common or relevant to your site. The filetype.gs file lists the most commonly found extensions at the top.

■ inurl.gs This very large data file contains strings from the most popular CGI scanners, which excel at locating programs on Web servers. Sorted by the approximate number of Google hits, this file lists the most common strings at the top, with very esoteric CGI vulnerability strings listed near the bottom. This data file locates the strings in the URL of a page. This is another file that shouldn’t be run in its entirety.

■ indexof.gs Nearly identical to the inurl.gs file, this data file finds the strings in a directory listing. Run portions of this file, not all of it!

Using Gooscan

Gooscan can be used in two distinct ways: single-query mode or multiple-query mode.
Single-query mode is little better than using Google’s Web search feature, with the exception that Gooscan will provide you with Google’s number of results in a more portable format.

As shown in Figure 12.8, a search for the term daemon9 returns 2440 results from all of Google. To narrow this search to a specific site, such as phrack.org, add the [-s] option. For example:

    gooscan -q "daemon9" -t www.google.com -s phrack.org

Figure 12.8 Gooscan’s Single-Query Mode

Notice that Gooscan presents a very lengthy disclaimer when you select www.google.com as the target server. This disclaimer is only presented when you submit a search that potentially violates Google’s TOS. The output from a standard Gooscan run is fairly paltry, listing only the number of hits from the Google search. You can apply the [-o] option to create a nicer HTML output format. To run the daemon9 query with nicer output, run:

    gooscan -q "daemon9" -t www.google.com -o daemon9.html

As shown in Figure 12.9, the HTML output lists the options that were applied to the Gooscan run, the date the scan was performed, a list of the queries, a link to the actual Google search, and the number of results.

Figure 12.9 Gooscan’s HTML Output in Single-Query Mode

The link in the HTML output points to Google. Clicking the link will perform the Google search for you. Don’t be too surprised if the numbers on Google’s page differ from what is shown in the Gooscan output; Google’s search results are sometimes only approximations.

Running Gooscan in multiple-query mode is a blatant violation of Google’s TOS but shouldn’t cause too much of a Google-stink if it’s done judiciously.
One way to keep Google on your good side is to respect the spirit of its TOS by sending small batches of queries and not pounding the server with huge data files. As shown in Figure 12.10, you can create a small data file using the head command. A command such as:

    head -5 data_files/gdork.gs > data_files/little_gdork.gs

will create a four-query data file, since the gdork.gs file has a commented header line.

Figure 12.10 Running Small Data Files Could Keep Google from Frowning at You

The output from the multiple-query run of Gooscan is still paltry, so let’s take a look at the HTML output shown in Figure 12.11.

Figure 12.11 Gooscan’s HTML Output in Multiple-Query Mode

Using Gooscan with the [-s] switch, we can narrow our results to one particular site, in this case http://johnny.ihackstuff.com, with a command such as:

    gooscan -t www.google.com -i data_files/little_gdork.gs -o ihackstuff.html -s johnny.ihackstuff.com

as shown in Figure 12.12. (Don’t worry, that Johnny guy won’t mind!)

Figure 12.12 A Site-Narrowed Gooscan Run

Most site-narrowed Gooscan runs should come back pretty clean, as this run did. If you see hits that look suspicious, click the link to see exactly what Google saw. Figure 12.13 shows the Google search in its entirety. In this case, we managed to locate the Google Hacking Database itself, which included a reference that matched our Google query. The other searches didn’t return any results, because they were a tad more specific than the Calamaris query, which didn’t search titles, URLs, filetypes, and the like.

In summary, Gooscan is a great tool for checking your Web site’s exposure, but it should be used cautiously since it does not use the Google API.
Break your scans into small batches, unless you (unwisely) like thumbing your nose at the Establishment.

Figure 12.13 Linking to Google’s Results from Gooscan

Windows Tools and the .NET Framework

The Windows tools we’ll look at all require the Microsoft .NET Framework, which can be located with a Google query of .NET framework download. The successful installation of the framework depends on a number of factors, but regardless of the version of Windows you’re running, assume that you must be current on all the latest service packs and updates. If Windows Update is available on your version of Windows, run it. The Internet Explorer upgrade, available from the Microsoft Web site (Google query: Internet Explorer upgrade), is the most common required update for successful installation of the .NET Framework. Before downloading and installing Athena or Wikto, make sure you’ve got the .NET Framework (versions 1.1 or 2.0 respectively) properly installed.

NOTE
The only way Google will explicitly allow you to automate your queries is via the Google Application Programming Interface. Some of the API tools covered in this book rely on the SOAP API, which Google discontinued in favor of the AJAX API. If you have an old SOAP API key, you’re in luck. That key will still work with API-based tools. However, if you don’t have a SOAP key, you should consider using SensePost’s Aura program (www.sensepost.com/research/aura) as an alternative to the old SOAP API.

Athena

Athena by Steve Lord (steve@buyukada.co.uk) is a Windows-based Google scanner that is not based on the Google API. As with Gooscan, the use of this tool is in violation of Google’s TOS, and as a result, Google can block your IP range from using its search engine.
Athena is potentially less intrusive than Gooscan, since Athena only allows you to perform one search at a time, but Google’s TOS is clear: no automated scanning is allowed. Just as we discussed with Gooscan, use any non-API tool judiciously. History suggests that if you’re nice to Google, Google will be nice to you.

Athena can be downloaded from http://snakeoillabs.com/. The download consists of a single MSI file. Assuming you’ve installed version 1.1 of the .NET Framework, the Athena installer is a simple wizard, much like most Windows-based software. Once installed and run, Athena presents the main screen, as shown in Figure 12.14.

Figure 12.14 Athena’s Main Screen

As shown, this screen resembles a simple Web browser. The Refine Search field allows you to enter or refine an existing query. The Search button is similar to Google’s Search button and executes a search, the results of which are shown in the browser window.

To perform basic searches with Athena, you need to load an XML file containing your desired search strings. Simply open the file from within Athena and all the searches will appear in the Select Query drop-down box. For example, loading the digicams XML file