Table of Contents

Credits
Foreword
Preface

Chapter 1. Searching Google
  1. Setting Preferences
  2. Language Tools
  3. Anatomy of a Search Result
  4. Specialized Vocabularies: Slang and Terminology
  5. Getting Around the 10 Word Limit
  6. Word Order Matters
  7. Repetition Matters
  8. Mixing Syntaxes
  9. Hacking Google URLs
  10. Hacking Google Search Forms
  11. Date-Range Searching
  12. Understanding and Using Julian Dates
  13. Using Full-Word Wildcards
  14. inurl: Versus site:
  15. Checking Spelling
  16. Consulting the Dictionary
  17. Consulting the Phonebook
  18. Tracking Stocks
  19. Google Interface for Translators
  20. Searching Article Archives
  21. Finding Directories of Information
  22. Finding Technical Definitions
  23. Finding Weblog Commentary
  24. The Google Toolbar
  25. The Mozilla Google Toolbar
  26. The Quick Search Toolbar
  27. GAPIS
  28. Googling with Bookmarklets

Chapter 2. Google Special Services and Collections
  29. Google Directory
  30. Google Groups
  31. Google Images
  32. Google News
  33. Google Catalogs
  34. Froogle
  35. Google Labs

Chapter 3. Third-Party Google Services
  36. XooMLe: The Google API in Plain Old XML
  37. Google by Email
  38. Simplifying Google Groups URLs
  39. What Does Google Think Of...
  40. GooglePeople

Chapter 4. Non-API Google Applications
  41. Don't Try This at Home
  42. Building a Custom Date-Range Search Form
  43. Building Google Directory URLs
  44. Scraping Google Results
  45. Scraping Google AdWords
  46. Scraping Google Groups
  47. Scraping Google News
  48. Scraping Google Catalogs
  49. Scraping the Google Phonebook

Chapter 5. Introducing the Google Web API
  50. Programming the Google Web API with Perl
  51. Looping Around the 10-Result Limit
  52. The SOAP::Lite Perl Module
  53. Plain Old XML, a SOAP::Lite Alternative
  54. NoXML, Another SOAP::Lite Alternative
  55. Programming the Google Web API with PHP
  56. Programming the Google Web API with Java
  57. Programming the Google Web API with Python
  58. Programming the Google Web API with C# and .NET
  59. Programming the Google Web API with VB.NET

Chapter 6. Google Web API Applications
  60. Date-Range Searching with a Client-Side Application
  61. Adding a Little Google to Your Word
  62. Permuting a Query
  63. Tracking Result Counts over Time
  64. Visualizing Google Results
  65. Meandering Your Google Neighborhood
  66. Running a Google Popularity Contest
  67. Building a Google Box
  68. Capturing a Moment in Time
  69. Feeling Really Lucky
  70. Gleaning Phonebook Stats
  71. Performing Proximity Searches
  72. Blending the Google and Amazon Web Services
  73. Getting Random Results (On Purpose)
  74. Restricting Searches to Top-Level Results
  75. Searching for Special Characters
  76. Digging Deeper into Sites
  77. Summarizing Results by Domain
  78. Scraping Yahoo! Buzz for a Google Search
  79. Measuring Google Mindshare
  80. Comparing Google Results with Those of Other Search Engines
  81. SafeSearch Certifying URLs
  82. Syndicating Google Search Results
  83. Searching Google Topics
  84. Finding the Largest Page
  85. Instant Messaging Google

Chapter 7. Google Pranks and Games
  86. The No-Result Search (Prank)
  87. Google Whacking
  88. GooPoetry
  89. Creating Google Art
  90. Google Bounce
  91. Google Mirror
  92. Finding Recipes

Chapter 8. The Webmaster Side of Google
  93. A Webmaster's Introduction to Google
  94. Generating Google AdWords
  95. Inside the PageRank Algorithm
  96. 26 Steps to 15K a Day
  97. Being a Good Search Engine Citizen
  98. Cleaning Up for a Google Visit
  99. Getting the Most out of AdWords
  100. Removing Your Materials from Google Index

Foreword

When we started Google, it was hard to predict how big it would become. That our search engine would someday serve as a catalyst for so many important web developments was a distant dream. We are honored by the growing interest in Google and offer many thanks to those who created this book, the largest and most comprehensive report on Google search technology that has yet been published.

Search is an amazing field of study, because it offers infinite possibilities for how we might find and make information available to people. We join with the authors in encouraging readers to approach this book with a view toward discovering and creating new ways to search. Google's mission is to organize the world's information and make it universally accessible and useful, and we welcome any contribution you make toward achieving this goal.

Hacking is the creativity that fuels the Web. As software developers ourselves, we applaud this book for its adventurous spirit. We're adventurous, too, and were happy to discover that this book highlights many of the same experiments we conduct in our free time here at Google.

Google is constantly adapting its search algorithms to match the dynamic growth and changing nature of the Web. As you read, please keep in mind that the examples in this book are valid today but, as Google innovates and grows over time, may become obsolete. We encourage you to follow the latest developments and to participate in the ongoing discussions about search as facilitated by books such as this one.

Virtually every engineer at Google has used an O'Reilly publication to help them with their jobs. O'Reilly books are a staple of the Google engineering library, and we hope that Google Hacks will be as useful to others as the O'Reilly publications have been to Google.

With the largest collection of web documents in the world, Google is a reflection of the Web. The hacks in this book are not just about Google; they are also about unleashing the vast potential of the Web today and in the years to come.

Google Hacks is a great resource for search enthusiasts, and we hope you enjoy it as much as we did.

Thanks,
The Google Engineering Team
December 11, 2002
Mountain View, California

Preface

Search engines for large collections of data preceded the World Wide Web by decades.
There were those massive library catalogs, hand-typed with painstaking precision on index cards and eventually, to varying degrees, automated. There were the large data collections of professional information companies such as Dialog and LexisNexis. Then there are the still-extant private, expensive medical, real estate, and legal search services.

Those data collections were not always easy to search, but with a little finesse and a lot of patience, it was always possible to search them thoroughly. Information was grouped according to established ontologies, and data was preformatted according to particular guidelines.

Then came the Web.

Information on the Web, as anyone who's ever looked at half a dozen web pages knows, is not all formatted the same way. Nor is it necessarily particularly accurate. Nor up to date. Nor spellchecked. Nonetheless, search engines cropped up, trying to make sense of the rapidly increasing index of information online. Eventually, special syntaxes were added for searching common parts of the average web page (such as title or URL). Search engines evolved rapidly, trying to encompass all the nuances of the billions of documents online, and they still continue to evolve today.

Google™ threw its hat into the ring in 1998. The second incarnation of a search engine service known as BackRub, the name "Google" was a play on the word "googol," a one followed by a hundred zeros. From the beginning, Google was different from the other major search engines online (AltaVista, Excite, HotBot, and others).

Was it the technology? Partially. The relevance of Google's search results was outstanding and worthy of comment. But more than that, Google's focus and more human face made it stand out online.

With its friendly presentation and its constantly expanding set of options, it's no surprise that Google continues to get lots of fans. There are weblogs devoted to it. Search engine newsletters, such as ResearchBuzz, spend a lot of time covering Google.
Legions of devoted fans spend lots of time uncovering undocumented features, creating games (like Google whacking), and even coining new words (like "Googling," the practice of checking out a prospective date or hire via Google's search engine).

In April 2002, Google reached out to its fan base by offering the Google API. The Google API gives developers a legal way to access the Google search results with automated queries (any other way of accessing Google's search results with automated software is against Google's Terms of Service).

Why Google Hacks?

"Hacks" are generally considered to be "quick-n-dirty" solutions to programming problems or interesting techniques for getting a task done. But what does this kind of hacking have to do with Google?

Considering the size of the Google index, there are many times when you might want to do a particular kind of search and you get too many results for the search to be useful. Or you may want to do a search that the current Google interface does not support.

The idea of Google Hacks is not to give you some exhaustive manual of how every command in the Google syntax works, but rather to show you some tricks for making the best use of a search and to show applications of the Google API that perform searches you can't perform using the regular Google interface. In other words, hacks.

Dozens of programs and interfaces have sprung up from the Google API. Both games and serious applications using Google's database of web pages are available from everyone, from the serious programmer to the devoted fan (like me).

How This Book Is Organized

The combination of Google's API and over 3 billion pages of constantly shifting data can do strange things to your imagination and give you lots of new perspectives on how best to search.
This book goes beyond the instruction page to the idea of "hacks": tips, tricks, and techniques you can use to make your Google searching experience more fruitful, more fun, or (in a couple of cases) just more weird. This book is divided into several chapters:

Chapter 1
  This chapter describes the fundamentals of how Google's search properties work, with some tips for making the most of Google's syntaxes and specialty search offerings. Beyond the list of "this syntax means that," we'll take a look at how to eke every last bit of searching power out of each syntax, and how to mix syntaxes for some truly monster searches.

Chapter 2
  Google goes beyond web searching into several different arenas, including images, USENET, and news. Did you know that these collections have their own syntaxes? As you'll learn in this section, Google's equally adroit at helping you holiday shop or search for current events.

Chapter 3
  Not all the hacks are ones that you want to install on your desktop or web server. In this section, we'll take a look at third-party services that integrate the Google API with other applications or act as handy web tools, or even check Google by email!

Chapter 4
  Google's API doesn't search all Google properties, but sometimes it'd be real handy to take that search for phone numbers or news stories and save it to a file. This collection of scrapers shows you how.

Chapter 5
  We'll take a look under the hood at Google's API, considering several different languages and how Google works with each one. Hint: if you've always wanted to learn Perl but never knew what to "do with it," this is your section.

Chapter 6
  Once you've got an understanding of the Google API, you'll start thinking of all kinds of ways you can use it. Take inspiration from this collection of useful applications that use the Google API.

Chapter 7
  All work and no play makes for a dull web surfer. This collection of pranks and games turns Google into a poet, a mirror, and a master chef.
  Well, a chef anyway. Or at least someone who throws ingredients together.

Chapter 8
  [...]

... Esperanto and Klingon). Google offers several language tools, including one for translation and one for Google's interface. The interface option is much more extensive than the translation option, but the translation has a lot to offer.

2.1 Getting to the Language Tools

The language tools are available by clicking "Language Tools" on the front page or by going to http://www.google.com/language_tools?hl=en. The...

How does Google manage to have so many interface languages when they have so few translation languages? Because of the Google in Your Language program, which gathers volunteers from around the world to translate Google's interface. (You can get more information on that program at http://www.google.com/intl/en/language.html.) Finally, the Language Tools page contains a list of region-specific Google home...

... to when a page was created, but when it was indexed by Google. So a page created on February 2 and not indexed by Google until April 11 could be found with daterange: search on April 11. Remember also that Google reindexes pages. Whether the date range changes depends on whether the page content changed. For example, Google indexes a page on June 1. Google reindexes the page on August 13, but the page content...

... preferences on this page. Instead, alter language preferences as needed using the Google Language Tools. Between the simple search, advanced search, and preferences, you've got all the beginning tools necessary to build just the Google query to suit your particular purposes. Fair warning: if you have cookies turned off, setting preferences in Google isn't going to do you much good. You'll have to reset them every...
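
The daterange: fragment above says that dates refer to when Google indexed a page, and the contents list pairs date-range searching with Julian dates. As a rough sketch only, the conversion from a calendar date to a Julian day number can be written in Python as follows. The function names, the sample search terms, and the year 2002 are assumptions made for illustration; the excerpt does not give a year, and the daterange: operator is shown only as the book describes it, with no claim that Google still honors it.

```python
from datetime import date

# Offset between Python's proleptic Gregorian ordinal (date.toordinal())
# and the astronomers' Julian Day Number for the same civil date.
_JDN_OFFSET = 1721425


def julian_day(d: date) -> int:
    """Return the integer Julian Day Number for a calendar date."""
    return d.toordinal() + _JDN_OFFSET


def daterange_query(terms: str, start: date, end: date) -> str:
    """Build a query string of the form 'terms daterange:START-END'.

    Hypothetical helper: it only formats text and never contacts Google.
    """
    if end < start:
        raise ValueError("end date must not precede start date")
    return f"{terms} daterange:{julian_day(start)}-{julian_day(end)}"


if __name__ == "__main__":
    # A page indexed on April 11 (year assumed to be 2002) would match a
    # one-day range for that date, regardless of when it was created.
    indexed_on = date(2002, 4, 11)
    print(daterange_query("pinwheel flowers", indexed_on, indexed_on))
```

Using whole-number day values keeps the range endpoints unambiguous; the half-day offset that astronomical Julian dates carry at midnight is deliberately ignored in this sketch.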
... to contribute a hack for future titles, visit: http://hacks.oreilly.com

Chapter 1. Searching Google

  Section 1.1 Hacks #1-28
  Section 1.2 What Google Isn't
  Section 1.3 What Google Is
  Section 1.4 Google Basics
  Section 1.5 The Special Syntaxes
  Section 1.6 Advanced Search
  Hack 1 Setting Preferences
  Hack 2 Language Tools
  Hack 3 Anatomy of a Search Result
  Hack 4 Specialized Vocabularies: Slang and Terminology
  ...
  The Google Toolbar
  Hack 25 The Mozilla Google Toolbar
  Hack 26 The Quick Search Toolbar
  Hack 27 GAPIS
  Hack 28 Googling with Bookmarklets

1.1 Hacks #1-28

Google's front page is deceptively simple: a search form and a couple of buttons. Yet that basic interface, so alluring in its simplicity, belies the power of the Google engine underneath and the wealth of information at its disposal. And if you use Google's...

... your search across other Google searches, including Google Groups [Hack #30], Google Images [Hack #31], and the Google Directory. Beneath that you'll see a count for the number of results and how long the search took. Sometimes you'll see results/sites called out on colored backgrounds at the top or right of the results page. These are called "sponsored links" (read: advertisements). Google has a policy of...

... pinwheel flowers, Google wouldn't present the flowers category. Why are you seeing category results? After all, Google is a full-text search engine, isn't it? It's because Google has taken the information from the Open Directory Project (http://www.dmoz.org/) and crossed it with its own popularity rankings to make the Google Directory. When you see categories, you're seeing information from the Google Directory...
... search results that Google gives you, and it doesn't involve the basic search input or the advanced search page. It's the preferences page.

Hack 1 Setting Preferences

Customize the way you search Google.

Google's preferences provide a nice, easy way to set your searching preferences from this moment forward.

1.1 Language

You can set your Interface Language, affecting the language in which tips and messages...

O'Reilly & Associates, Inc.
1005 Gravenstein Hwy N
Sebastopol, CA 95472
(800) 998-9938 (in the U.S. or Canada)
(707) 829-0515 (international/local)
(707) 829-0104 (fax)

To ask technical questions or to comment on the book, send email to: bookquestions@oreilly.com

The web site for Google Hacks lists examples, errata, and plans for future editions. You can find this page at: http://www.oreilly.com/catalog/googlehks/

... OR "Green Bay")

This query searches for the word "snowmobile" or phrase "Green Bay" along with the word "snowblower."
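
The Boolean query described in the excerpt above (one required word plus a choice between a word and a quoted phrase) can be assembled and URL-encoded with a few lines of Python. This is a minimal sketch under stated assumptions: the query text is truncated in the source, so the spelling built here is one plausible reconstruction from its description, not the book's exact text. The helper names are hypothetical, and `q` is assumed to be the search-terms parameter; `hl=en` is the interface-language parameter that appears in the Language Tools URL quoted earlier.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def any_of(*alternatives: str) -> str:
    """Join alternatives with OR, quoting multi-word phrases."""
    quoted = [f'"{a}"' if " " in a else a for a in alternatives]
    return "(" + " OR ".join(quoted) + ")"


def google_search_url(query: str, interface_lang: str = "en") -> str:
    """Build a search URL; urlencode handles spaces, quotes, and parentheses."""
    params = {"q": query, "hl": interface_lang}
    return "https://www.google.com/search?" + urlencode(params)


if __name__ == "__main__":
    # Required word, followed by a choice of a word or a phrase.
    query = f"snowblower {any_of('snowmobile', 'Green Bay')}"
    url = google_search_url(query)
    print(query)
    print(url)
    # Decoding the URL again recovers the original query text.
    print(parse_qs(urlparse(url).query)["q"][0] == query)
```

Building the URL through `urlencode` rather than string concatenation avoids hand-escaping the quotation marks and parentheses that special syntaxes rely on.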