Protecting Yourself from Google Hackers • Chapter 12

Figure 12.24 GSI Options Screen

Execution is simple as well. Fill in the name of the target website and click Start GSI. The results will be shown in a hierarchical format, as shown in Figure 12.25.

Figure 12.25 GSI Output

Notice that the results are presented in a hierarchical tree that represents the files and directories on the target site. Each link can be clicked to browse to the appropriate page. Alternatively, you can right-click within Firefox and select GSI. In this case, GSI will launch with the query filled in based on the selected text or, if no text is selected, with the name of the current website.

GSI has several options to select from, as shown in Table 12.2.

Table 12.2 GSI Options

GSI Option | Description
Recursive Search | If you choose to use a recursive search, GSI will use inurl searches. For example, if you run a Google Site Index on tankedgenius.com, GSI first sends the query site:tankedgenius.com. Suppose that query returns the result http://www.tankedgenius.com/blog/cp/index.html. If the recursion level is set to 1, GSI also sends the query site:tankedgenius.com inurl:blog and adds those results to the index. If the recursion level is set to 2, it also sends the query site:tankedgenius.com inurl:cp and adds those results.
Full website names | By default, GSI displays an indented site index with only the directory name showing for each link. If you prefer, you can set this option so that it shows the entire link.

NOTE
Due to the nature of the Google queries that GSI sends, GSI may get 403 errors from Google. These errors are normal when sending queries with multiple operators.

Advanced Dork

Advanced Dork is an extension for Firefox and Mozilla browsers that provides Google Advanced Operators for use directly from the right-click context menu.
Written by CP, the tool is available from https://addons.mozilla.org/en-US/firefox/addon/2144. As with all Firefox extensions, installation is a snap: simply click the link to the .xpi file from within Firefox and the installation will launch.

Advanced Dork is context sensitive: right-clicking will invoke Advanced Dork based on where the right-click was performed. For example, right-clicking on a link will invoke link-specific options, as shown in Figure 12.26.

Figure 12.26 Advanced Dork Link Context

Right-clicking on highlighted text will invoke the highlighted text search mode of Advanced Dork, as shown in Figure 12.27.

Figure 12.27

This mode will allow you to use the highlighted word in an intitle, inurl, intext, site, or ext search. Several awesome options are available in Advanced Dork, as shown in Figures 12.28 and 12.29.

Figure 12.28

Figure 12.29

Some of these options are explained in Table 12.3.

Table 12.3 Advanced Dork Options

Option | Description
Highlight Text Functions | Right-click to choose from over 15 advanced Google operators. This function can be disabled in the options menu.
Right-Click HTML Page Info | Right-click anywhere on a page with no text selected, and Advanced Dork will focus on the page’s HTML title and ALT tags for searching using the intitle and allintext operators, respectively. This function can be disabled in the options menu.
Right-Click Links | Right-click on a link to choose from site: links domain, link: this link, and cache: this link. Site: links domain will only search the domain name, not the full URL.
Right-Click URL Bar | Right-click the URL bar (address bar) and choose from site, inurl, link, and cache.
Inurl works with the highlighted portion of text only. Site will only search the domain name, not the full URL.

Advanced Dork is an amazing tool for any serious Google user. You should definitely add it to your arsenal.

Getting Help from Google

So far we’ve looked at various ways of checking your site for potential information leaks, but what can you do if you detect such leaks? First and foremost, you should remove the offending content from your site. This may be a fairly involved process, but to do it right, you should always figure out the source of the leak, to ensure that similar leaks don’t happen in the future. Information leaks don’t just happen; they are the result of some event that occurred. Figure out the event, resolve it, and you can begin to stem the problem at its source.

Google makes a great Web page available that helps answer some of the questions Webmasters ask most often. The “Google Information for Webmasters” page, located at www.google.com/webmasters, lists answers to many of them.

Solving the local problem is only half the battle. In some cases, Google has a cached copy of your information leak just waiting to be picked up by a Google hacker. There are two ways you can delete a cached version of a page. The first method involves the automatic URL removal system at http://www.google.com/webmasters/tools/removals. This page, shown in Figure 12.30, requires that you first verify your e-mail address. Although this appears to be a login for a Google account, Google accounts don’t seem to provide you access. In most cases, you will have to reregister, even if you have a Google account. The exception seems to be Google Groups accounts, which appear to allow access to this page without a problem.
Figure 12.30 Google’s Automatic URL Removal Tool

The URL removal tool will walk you through a series of questions that will verify your ownership of the content and determine what it is that you are trying to remove. Each of the options is fairly self-explanatory, but remember that the responsibility for content removal rests with you. You should ensure that your content is indeed removed from your site, and follow up the URL removal process with manual checks.

Summary

The subject of Web server security is too big for any one book. There are so many varied requirements combined with so many different types of Web server software, application software, and operating system software that no one book could do the topic justice. However, a few general principles can at least help you prevent the devastating effects a malicious Google hacker could inflict on a site you’re charged with protecting.

First, understand how the Web server software operates in the event of an unexpected condition. Directory listings, missing index files, and specific error messages can all open up avenues for offensive information gathering. Robots.txt files, simple password authentication, and effective use of META tags can help steer Web crawlers away from specific areas of your site. Although Web data is generally considered public, remember that Google hackers might take interest in your site if it appears as a result of a malicious Google search. Default pages, directories, and programs can serve as an indicator that there is a low level of technical know-how behind a site. Servers with this type of default information serve as targets for hackers. Get a handle on what, exactly, a search engine needs to know about your site to draw visitors without attracting undue attention as a result of too much exposure.
Use any of the available tools, such as Gooscan, Athena, Wikto, GSI, Google Rower, and Advanced Dork, to help you search Google for your site’s information leaks. If you locate a page that shouldn’t be public, use Google’s removal tools to flush the page from Google’s database.

Solutions Fast Track

A Good, Solid Security Policy

■ An enforceable, solid security policy should serve as the foundation of any security effort.
■ Without a policy, your safeguards could be inefficient or unenforceable.

Web Server Safeguards

■ Directory listings, error messages, and misconfigurations can provide too much information.
■ Robots.txt files and specialized META tags can help direct search engine crawlers away from specific pages or directories.
■ Password mechanisms, even basic ones, keep crawlers away from protected content.
■ Default pages and settings indicate that a server is not well maintained and can make that server a target.

Hacking Your Own Site

■ Use the site operator to browse the servers you’re charged with protecting. Keep an eye out for any pages that don’t belong.
■ Use a tool like Gooscan, Athena, GSI, Google Rower, or Advanced Dork to assess your exposure. These tools do not use the Google API, so be aware that any blatant abuse or excessive activity could get your IP range cut off from Google.
■ Use a tool like Wikto, which uses the Google API and should free you from fear of getting shut down.
■ Use the Google Hacking Database to monitor the latest Google hacking queries. Use the GHDB exports with tools like Gooscan, Athena, or Wikto.

Getting Help from Google

■ Use Google’s Webmaster page for information specifically geared toward Webmasters.
■ Use Google’s URL removal tools to get sensitive data out of Google’s databases.
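The robots.txt safeguard listed above can be checked before a crawler ever visits. The following is a minimal sketch, assuming Python and its standard urllib.robotparser module; the robots.txt content, the paths, and the helper name blocked_paths are hypothetical and chosen only for illustration. It reports which paths a compliant crawler such as Googlebot would be told to skip.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the directories are placeholders,
# not recommendations for any particular site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /backup/
"""


def blocked_paths(robots_txt, paths, user_agent="Googlebot"):
    """Return the paths that the given robots.txt text disallows for user_agent."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return [path for path in paths if not parser.can_fetch(user_agent, path)]


if __name__ == "__main__":
    candidates = ["/index.html", "/admin/login.php", "/backup/site.tar.gz"]
    for path in blocked_paths(ROBOTS_TXT, candidates):
        print("disallowed for crawlers:", path)
```

Keep in mind that robots.txt is only a request to well-behaved crawlers and is itself publicly readable, so a check like this complements password protection rather than replacing it.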
Links to Sites

■ http://johnny.ihackstuff.com The home of the Google Hacking Database (GHDB), the search engine hacking forums, the Gooscan tool, and the GHDB export files.
■ www.snakeoillabs.com Home of Athena.
■ http://www.seorank.com/robots-tutorial.htm A good tutorial on using the robots.txt file.
■ http://googleblog.blogspot.com/2007/02/robots-exclusion-protocol.html Information about Google’s Robots policy.
■ http://www.microsoft.com/technet/archive/security/chklist/iis5cl.mspx The IIS 5.0 Security Checklist.
■ http://technet2.microsoft.com/windowsserver/en/library/ace052a0-a713-423e-8e8c-4bf198f597b81033.mspx The IIS 6.0 Security Best Practices.
■ http://httpd.apache.org/docs/2.0/misc/security_tips.html Apache Security Tips document.
■ www.sensepost.com/research/aura Sensepost’s AURA, which simulates Google SOAP API calls.
■ http://www.tankedgenius.com Home of JeffBall and Cp’s GSI and Google Rower tools.
■ https://addons.mozilla.org/en-US/firefox/addon/2144 Home of Cp’s Advanced Dork.

Frequently Asked Questions

The following Frequently Asked Questions, answered by the authors of this book, are designed both to measure your understanding of the concepts presented in this chapter and to assist you with real-life implementation of these concepts. To have your questions about this chapter answered by the author, browse to www.syngress.com/solutions and click on the “Ask the Author” form.

Q: What is the no-cache pragma? Will it keep my pages from caching on Google’s servers?
A: The no-cache pragma is a META tag that can be entered into a document to instruct the browser not to load the page into the browser’s cache. This does not affect Google’s caching feature; it is strictly an instruction to a client’s browser. See www.htmlgoodies.com/beyond/nocache.html for more information.

Q: I’d like to know more about securing Web servers. Can you make any recommendations?
A:

Q: Can you provide any more details about securing IIS?
A: Microsoft makes available a very nice IIS Security Planning Tool. Try a Google search for IIS Security Planning Tool. Microsoft also makes available an IIS 5 security checklist; Google for IIS 5 services checklist. An excellent read pertaining to IIS 6 can be found with a query like “elements of IIS security”. Also, frequent the IIS Security Center. Try querying for IIS security center.

Q: Okay, enough about IIS. What about securing Apache servers?
A: Securityfocus.com has a great article, “Securing Apache: Step-by-Step,” available from www.securityfocus.com/infocus/1694.

Q: Which is the best tool for checking my Google exposure?
A: That’s a tough question, and the answer depends on your needs. The absolute most thorough way to check your Web site’s exposure is to use the site operator. A query such as site:gulftech.org will show you all the pages on gulftech.org that Google knows about. By looking at each and every page, you’ll absolutely know what Google has on you. Repeat this process once a week.

If this is too tedious, you’ll need to consider an automation tool. A step above the site technique is Athena. Athena reads the full contents of the GHDB and allows you to step through each query, applying a site value to each search. This allows you to step through the comprehensive list of “bad searches” to see if your site is affected. Athena does not use the Google API but is not automated in the truest sense of the word.

Gooscan is potentially the biggest Google automation offender when used improperly, since it is built on the GHDB and will crank through the entire GHDB in fairly short order.
It does not use the Google API, and Google will most certainly notice you using it in its wide-open configuration. This type of usage is not recommended, since Google could make for a nasty enemy, but when Gooscan is used with discretion and respect for the spirit of Google’s no-automation rule, it is a most thorough automated tool.
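The idea of applying a site value to each search, as the answer above describes for Athena, can be illustrated with a few lines of code. This is only a sketch under stated assumptions, not how Athena or Gooscan is actually implemented: the function names, the sample search strings, and the example.com domain are hypothetical, and the code builds ordinary Google search URLs without sending any requests.

```python
from urllib.parse import urlencode


def site_queries(domain, searches):
    """Prefix each search string with a site: restriction for one domain."""
    return [f"site:{domain} {search}" for search in searches]


def search_url(query):
    """Build an ordinary Google search URL for manual review in a browser."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    # Illustrative search strings only; they are not taken from a GHDB export.
    sample_searches = ['intitle:"index of"', "filetype:log"]
    for query in site_queries("example.com", sample_searches):
        print(search_url(query))
```

Printing the URLs and opening them by hand keeps a person in the loop, which is in keeping with the spirit of Google’s no-automation rule mentioned above.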