How MySQL Powers Web 2.0

An overview of how MySQL helps power Web 2.0 technologies and companies

A MySQL® Technical White Paper
August 2006

Copyright © 2006, MySQL AB
All logos are trademarks of their respective companies

Table of Contents

Executive Summary
What is Web 2.0?
  Characteristics and Core Competencies
  The Web is the Platform
  An Architecture of Participation
  O’Reilly’s Hierarchy of “Web 2.0-ness”
MySQL & Web 2.0 Applications
  Virtual/Online Communities & Worlds
    Linden Lab/Second Life and MySQL
    Additional Characteristics
  Web Syndication & Feeds
    FeedBurner and MySQL
    Additional Characteristics
  Blogs
    LiveJournal and MySQL
    Additional Characteristics
  Social Networking
    Mixi.jp and MySQL
    Additional Characteristics
  Wikis
    Wikipedia and MySQL
    Additional Characteristics
  Customized and Advanced Meta Search Engines
    Technorati and MySQL
    Additional Characteristics
  File, Image & Video Sharing
    Flickr and MySQL
    Additional Characteristics
  Online Gaming
    PokerRoom.com and MySQL
    Additional Characteristics
Technology Requirements of Web 2.0
  The Benefits of Community & Open Source
  Linux
  Apache
  MySQL
  PHP, Perl and Python
  Ruby on Rails
  Ajax
  memcached
How MySQL Powers Web 2.0
  “Fail Fast – Scale Fast”
  Scale Out vs. Scale Up
  MySQL Replication
  MySQL Cluster for High Availability
  MySQL Query Cache
  Pluggable Storage Engine Architecture
10 Reasons to Choose MySQL for Web 2.0 Applications
Conclusion
About MySQL
Additional MySQL & Web 2.0 Resources
  MySQL and Web 2.0 Portal
  Web 2.0 Articles
  White Papers
  Case Studies
  Press Releases, News and Events
  Live Webinars
  Webinars on Demand

Executive Summary

The Internet continues to evolve at an accelerated rate, with new technological innovations being introduced all the time.
These changes are manifesting themselves in hardware computing power, software versatility and networking speeds. They constantly force us to rethink not only how we currently use the web, but also what new possibilities lie in the future. The rapid pace at which these new technologies and services are being integrated into our lives is dramatically changing the way we communicate, socialize, share and locate information, entertain ourselves and shop for goods and services. It also creates unique opportunities (and challenges) for companies and organizations seeking to leverage these innovations.

This rapidly evolving landscape of “next generation” technologies and companies is being categorized as “Web 2.0”. Because these applications predominantly “live” online, they harness a strong collaborative and collective nature. Where the web was once a static and passively consumed experience, it is now dynamic, transactional and interactive; participation is not optional, it is mandatory.

The companies delivering these applications and services are taking advantage of lowered market entry points by making full use of open source software running on commodity off-the-shelf hardware. This has allowed Web 2.0 companies to meet their capacity and performance requirements incrementally over time. It is no surprise that a common characteristic of many Web 2.0 websites, applications and companies is their use of the LAMP (Linux, Apache, MySQL, PHP/Perl/Python) open source stack. This allows fast-growing sites to deliver performance, scalability and reliability to millions of users at a fraction of the cost of proprietary databases. MySQL enables up-and-coming Web 2.0 sites like Wikipedia, FeedBurner and digg, as well as established web properties like Craigslist, Google and Yahoo!, to scale out and meet the ever-increasing volume of users, transactions and data.
The information presented here will be valuable to entrepreneurs about to create their own Web 2.0 business, to existing web properties wishing to bring their applications to the next level, and to the large number of enterprises interested in leveraging Web 2.0 technologies. You will also gain an understanding of how MySQL can be used in conjunction with other open source components to deliver low-cost, reliable, scalable, high performance Web 2.0 applications.

What is Web 2.0?

Web 2.0 can generally be thought of as the technologies and web sites that leverage users and developers in a socially collaborative manner in order to rapidly develop data and applications with a high level of integration across platforms and other services.

The term “Web 2.0” was coined in 2004 during a brainstorming session between Tim O’Reilly of O’Reilly Media and MediaLive International, a company which puts on technology tradeshows. The term was originally intended as the name of an upcoming conference showcasing new web-based companies and technologies that had emerged after the dot-com bubble. The term “Web 2.0” has since been dismissed as a marketing buzzword, co-opted and validated several times over by various individuals and companies. It has typically been used as a way to describe the new technologies and companies that are revolutionizing the way we use and think about the World Wide Web.
Tim O’Reilly expands further on the definition of the term in his article “What is Web 2.0”, September 30, 2005:

“Web 2.0 applications are those that make the most of the intrinsic advantages of that platform: delivering software as a continually updated service that gets better the more people use it, consuming and remixing data from multiple sources, including individual users, while providing their own data and services in a form that allows remixing by others, creating network effects through an “architecture of participation” and going beyond the page metaphor of Web 1.0 to deliver rich user experiences.”

In the following sections we delve deeper into four ideas central to the discussion of Web 2.0, which O’Reilly and others have elaborated on since the initial emergence of the term. They include:

• Characteristics and Core Competencies of Web 2.0
• The Web is the Platform
• An Architecture of Participation
• Hierarchy of “Web 2.0-ness”

Characteristics and Core Competencies

O’Reilly names the seven characteristics and core competencies of Web 2.0 companies in his article, “What is Web 2.0”, September 30, 2005, and we elaborate on them in this section.

“They should be in the business of providing services not packaged software, while enabling cost effective scalability.”

A key component here is how frequently Web 2.0 companies leverage MySQL Replication in a “scale-out” configuration. Scale-out enables the use of low-cost commodity servers to increase database performance and scalability incrementally, at a fraction of the price of traditional “fork-lift” or “scale-up” methods. This capability is critical for companies that experience explosive growth and adoption in very short time frames.
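To make the scale-out idea concrete, the following is a minimal Python sketch of how an application might spread load across a replicated setup: writes go to a single primary server, while reads rotate across replicas that can be added one commodity machine at a time. The class name, host names and the list of "read" verbs are all assumptions made for illustration; this is not a MySQL API, and a real application would hold database connections rather than host name strings.

```python
import itertools

# Statement verbs treated as reads in this sketch (an assumption, not a MySQL rule).
READ_VERBS = {"SELECT", "SHOW"}


class ReplicatedPool:
    """Routes SQL statements across one primary and any number of replicas.

    Hypothetical helper: hosts are plain strings standing in for connections.
    """

    def __init__(self, primary, replicas=()):
        self.primary = primary
        self.replicas = list(replicas)
        self._reset_rotation()

    def _reset_rotation(self):
        # With no replicas yet, reads fall back to the primary.
        self._reads = itertools.cycle(self.replicas or [self.primary])

    def add_replica(self, host):
        """Scale out: add one more server to the read rotation."""
        self.replicas.append(host)
        self._reset_rotation()

    def route(self, sql):
        """Return the host that should run the statement."""
        words = sql.split()
        verb = words[0].upper() if words else ""
        if verb in READ_VERBS:
            return next(self._reads)  # reads rotate round-robin over replicas
        return self.primary           # everything else goes to the primary


if __name__ == "__main__":
    pool = ReplicatedPool("db-primary", ["db-replica-1", "db-replica-2"])
    for statement in ("SELECT * FROM posts",
                      "INSERT INTO posts VALUES (1)",
                      "SELECT * FROM users"):
        print(pool.route(statement), "<-", statement)
```

Because replication is asynchronous, a replica may briefly lag behind the primary, so a design like this typically sends reads that must see a just-completed write to the primary as well.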
“They should also exercise control over unique, difficult to replicate data sources which get richer the more individuals use and contribute to them.”

A site might include a database of customer reviews and recommendations on products. Imagine the difficulty of attempting to recreate all the unique, varied and unbiased opinions you may find on a particular product. Amazon’s customer reviews are an everyday example. Amazon “controls” the customer review data, which would be difficult to replicate without the same customer traffic and level of participation that Amazon’s customers engage in. A similar use can be found in eBay’s “Seller Ratings”. For many companies, this competency revolves around controlling data that is prohibitive for competitors to replicate, due either to the licensing costs charged by private data providers or to an inability to engage users to “create” the data.

“Trusting users as co-developers.”

This concept revolves around the idea that users actively assist, in various capacities, in the development of the application. To some degree, this is not a new concept. The open source community has relied on this model of active contribution since its inception. More specifically, the users and development community are active participants in developing, testing, requesting enhancements and reporting bugs. Even companies that offer applications which are not “open source” have employed this methodology. It can manifest itself in the accelerated introduction of new functionality, mimicking the open source community’s “release early, release often” development model. It can also be accomplished by monitoring usage patterns in order to learn which functionality is being used and creating value, and which is not.
Adopting the development state of being “in perpetual beta” allows a company to be more responsive in adopting new technologies and usage patterns. It also becomes more adaptable to changing business conditions. An example of an application still in “beta”, yet with a large user base contributing either actively or passively, is Google’s Gmail.

“Harnessing collective intelligence.”

“Collective intelligence” refers to the level of participation a website reaches when the users themselves actively decide what is important and valuable to them. Websites which offer product reviews allow users to rate, amongst themselves, which products present a good value and which do not. Other examples include Wikipedia, where the users are charged with creating and policing data on the system, and any site which leverages “tags” created by the users themselves to help aggregate and locate relevant data. The idea of collective intelligence can be applied to just about any system where the users have been empowered to decide what is important or accurate and what is not.

“Leveraging the long tail through customer self service.”

The term “long tail” was coined by Chris Anderson in a 2004 Wired Magazine article to describe certain business and economic models, such as those of Amazon.com or Netflix. His point was that customer self-service, combined with effective data management, can be leveraged to offer goods and services which appeal to users outside the mainstream. The belief is that the aggregate of all these non-mainstream users is much larger than the group of mainstream users. For example, online booksellers and DVD rental sites draw a significant portion of their revenue from titles that have long disappeared from the general public’s radar.
Another example can be seen in the growing popularity of music trading sites like LaLa.com, whose business revolves around bringing “like-minded” individuals together to trade music amongst themselves, regardless of whether their tastes fall well outside the mainstream. It could be argued that this is not necessarily a new concept: many traditional booksellers, video rental and music stores can attest that a significant portion of their business stems from albums which have long disappeared from the charts, movies which have not screened in a theatre for years or books by authors who have long since fallen out of favor with the general public. Web 2.0 companies have realized that their applications must be designed to serve not only popular tastes but also the interests of those on the fringes.

“Software above the level of a single device.”

Web 2.0 applications must strive to make data portable and accessible. Users are coming to expect that their data can be accessed and synchronized across many devices, such as MP3 players, PDAs, cell phones and kiosks, as well as more traditional computing platforms like workstations and laptops. This data must be indifferent to the hardware or operating system platform on which it is accessed.

“Lightweight user interfaces, development models, AND business models.”

On this point, the interfaces through which users access data must be “lightweight” and highly portable, but still capable of delivering a rich end user experience. Programming techniques and frameworks like Ajax and Ruby on Rails can be thought of in this respect. Using commodity off-the-shelf hardware and open source software, and leveraging development communities and users for testing, allows Web 2.0 companies to enter established markets or create new markets at lower costs.
It also allows their applications to be in “perpetual beta”, constantly adapting to the changing conditions of the marketplace and the needs of end users.

The Web is the Platform

The Web has become “the destination where it all happens”. The exchange and distribution of ideas, and how we socialize, conduct business, work and play, are increasingly finding their way onto the Web. In some cases it has become the primary way in which particular interactions are conducted. Of course, at the heart of these interactions are the people, applications and data that drive them.

Not many years ago it would have been hard to imagine the Web as a strategically important platform for many of the things that are now commonplace, like trading stocks, booking travel, conducting commerce, bartering for goods and services, finding new or old friends or even a potential life mate. This perception arose because the applications making use of the web as a platform were often sluggish, had few security controls, were graphically uninteresting or were held captive by the speed of the end user’s internet connection. Comparing these characteristics against the desktop applications of the day, it is no wonder some people found it hard to imagine that the web could ever be considered a viable alternative to the desktop.

Fast forward a few years and web applications are now beginning to provide end-user experiences that are close to, if not better than, those of desktop applications. Email is a good example of an application that was originally relegated to the desktop if you wanted any advanced features. It can now be accessed over the web from essentially anywhere in the world with an internet connection, with minimal loss in functionality compared with a desktop client. In some cases there is even increased functionality, such as portability, if we include accessing email over a PDA or cell phone.
Other gains include advanced search capabilities, contact sharing between devices, no need for local backups and almost unlimited “theoretical” storage capacity. A similar trend is well under way for spreadsheets, word processing and calendaring. The evolution we are witnessing is that of the web quickly becoming the next “desktop”, or more specifically, the next operating platform on which applications are being designed to run exclusively.

An Architecture of Participation

The concept of “an architecture of participation” is typically used to describe companies, technologies and projects intentionally designed for contribution from developer communities and individual users, with an emphasis on empowerment and openness. This concept is often closely linked to open source projects and companies. It is worth noting that being open source does not automatically mean a technology or company exhibits “an architecture of participation”. However, it is often much easier for open source companies and projects to do so, as they are likely to have a devoted and often vibrant developer community. Proprietary products often find it difficult to cultivate a participatory quality without heavy subsidization. This can be further complicated if the source code is closed or the exposed APIs are complex, making even peripheral contributions difficult.

A “release early and release often” development cycle, characteristic of open source software, is an excellent way to enlist a community of volunteers and parties with vested interests in the software to test and help debug code. New features are often introduced in strategic locations on a website or within an application to help ascertain their popularity or usability. This helps developers understand whether a feature should be more widely deployed and enhanced, or abandoned altogether.
“An architecture of participation” also relates to the idea of users creating meaningful and valuable data for themselves. Often the application simply provides the framework and tools to empower users in this capacity. Practical manifestations include seller ratings, user recommendations and restaurant reviews. Some examples of applications which typify this concept include:

• Feeds: Users and applications allow their content to be picked up for distribution to subscribers
• Blogs: Users create site content and drive traffic
• Social Networking: Users create site content and build a network through their social channels
• Wikis: Users contribute articles and manage the content for accuracy and relevance

O’Reilly’s Hierarchy of “Web 2.0-ness”

O’Reilly also articulated a “hierarchy” of degrees to which an application possesses or typifies Web 2.0 attributes. The levels can be found in his article, “Levels of the Game: The Hierarchy of Web 2.0 Applications”, July 17, 2006, and we elaborate on them here.

“Level 3: Could ONLY exist on the Web and draws its essential power from the network and the connections it makes possible between people and applications.”

These applications require the collective online activity of users in order to become more valuable. Examples include eBay’s “seller ratings”, Wikipedia’s articles and del.icio.us’s aggregation of tags and sharing of bookmarks. The larger the network of users who contribute to or depend on the data, the more valuable it becomes.

“Level 2: Could exist offline, but has unique advantages by being online.”

Examples can be found among photo sharing applications. Unlike desktop photo management applications like Adobe Photoshop Album or Google’s Picasa, online applications like Flickr gain unique advantages by being online.
Specifically, they let users share images publicly with other users, and those images can then be indexed and searched for online via tags and other metadata.

“Level 1: Can and does exist offline, but gains additional functionality by being online.”

This level can usually be assigned to productivity applications which sometimes benefit from collaboration. O’Reilly uses the example of Writely. He points out that this word processing application can be used offline when remarks or comments are not required from others (as is true in the vast majority of cases). But when collaborative editing and review are required, its online attributes make it much more efficient and effective than trying to reconcile the markups from multiple reviewers on the same document individually. The same idea can be extended to calendaring software: unless the calendar needs to be viewed, edited or shared by others, there is little advantage beyond the ability to access it online.

“Level 0: The application has primarily taken hold online, but it would work just as well offline if you had all the data in a local cache.”

This is of course prohibitive in many cases, either because of the amount of data required or because the data is licensed or proprietary. O’Reilly’s examples include MapQuest, Yahoo! Local and Google Maps.

MySQL & Web 2.0 Applications

In this section we explore some Web 2.0 applications and the companies delivering them. We also highlight how MySQL is being leveraged in each scenario.

Virtual/Online Communities & Worlds

A virtual/online community or world, sometimes referred to as a “virtual reality”, is a group of people who gather and communicate on the web, often through virtual identities. “Inhabiting” these worlds is typically done for recreational and entertainment purposes. The companies which host virtual communities and worlds make extensive use of Web 2.0 technologies and business models.
Communication features like instant messaging, forums and email are typically offered by these sites in order to foster communication amongst their members. Software tools which permit a high level of customization and personalization of these identities are also made available. These virtual communities depend upon a high level of social interaction and participation among their members, not only to function but also to remain dynamic and grow. Many communities are moderated by the host, especially those where children congregate; others leave the bulk of the moderating to the users themselves, empowering them to decide what is acceptable and worthwhile.

Popular examples of virtual/online communities include: Habbo Hotel, Neopets, Linden Lab (Second Life) and Cyworld.

Linden Lab/Second Life and MySQL

Second Life is a 3-D virtual world entirely built and owned by its residents. Since opening to the public in 2003, it has grown explosively and today is inhabited by 370,997 people from around the globe. The Second Life “world” is hosted on servers that are owned and maintained by Linden Lab. Second Life provides users with tools to interact with and modify the virtual world they inhabit. The vast majority of the content in the Second Life world is created by the users themselves. Because of the empowerment and high degree of participation amongst its users, the Second Life world comprises many rich and diverse cultures. It is also worth noting that Linden Lab actively encourages users to retain the intellectual property rights to any objects they create within the Second Life world.
Additional Characteristics

• Business is growing over 20% per month
• Expected to have anywhere from 1 to over 3 million users by 2007
• At peak times 5000-5500 users working in parallel
• Over 1100 servers and about 2000 CPUs to support the “virtual world”
• Actively replacing several proprietary portions of Second Life with open-source technologies
• MySQL used to manage user accounts, inventories and presence information
• High data storage and performance requirements
• Technology Stack: Debian Linux, Apache, MySQL, PHP/Perl/Python and Ruby on Rails

For more information about Linden Lab/Second Life and MySQL please see:

Interview with Ian Wilkes, Director of Operations at Linden Labs
http://dev.mysql.com/tech-resources/interviews/ian-wilkes-linden-lab.html

My Second Life Runs on MySQL: War Stories from the Metaverse
http://mysqluc.com/presentations/mysql06/wilkes_ian.pdf

Web Syndication & Feeds

A web “feed” is typically an XML-based document which contains content items, often summaries of stories or blog posts with web links to longer versions. News websites, blogs and podcasts are common sources for these feeds, but they are also used to deliver structured information like current weather data. RSS and Atom are currently the two main web feed formats.

Web syndication describes the act of making an information source, such as a blog, available for feed distribution. It is very similar to other syndicated media like television and radio programs or news stories distributed over “the wire”. Likewise, the contents of a web feed may be shared and posted by other web sites.

Users typically subscribe to feeds directly, using aggregators or feed readers which combine the contents of multiple web feeds for presentation. Subscribing to a feed is typically done by manually entering the URL of the feed or by clicking a link on the page.
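As a rough illustration of what a feed reader does with such a document, the Python sketch below parses a small RSS 2.0 feed with the standard library and combines several feeds into a single list. The sample feed, its example.com URLs and the helper names `parse_rss` and `aggregate` are invented for this sketch; real feeds carry more fields, and Atom feeds use a different element layout that this code does not handle.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 document; titles and URLs are placeholders.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>http://example.com/</link>
    <description>A made-up feed used only for this sketch.</description>
    <item>
      <title>First post</title>
      <link>http://example.com/posts/1</link>
      <description>Summary of the first post.</description>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/posts/2</link>
      <description>Summary of the second post.</description>
    </item>
  </channel>
</rss>"""


def parse_rss(xml_text):
    """Return (feed_title, items) for an RSS 2.0 document."""
    channel = ET.fromstring(xml_text).find("channel")
    if channel is None:
        raise ValueError("not an RSS 2.0 document: missing <channel>")
    items = [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "summary": item.findtext("description", default=""),
        }
        for item in channel.findall("item")
    ]
    return channel.findtext("title", default=""), items


def aggregate(feeds):
    """Combine several feed documents into one list, as a feed reader would."""
    combined = []
    for xml_text in feeds:
        feed_title, items = parse_rss(xml_text)
        for entry in items:
            # Tag each entry with the feed it came from.
            combined.append({"feed": feed_title, **entry})
    return combined


if __name__ == "__main__":
    for entry in aggregate([SAMPLE_RSS]):
        print(entry["feed"], "|", entry["title"], "|", entry["link"])
```

Each item carries only a summary and a link back to the full story, which is what lets an aggregator present many sources compactly.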
Because web feeds are designed to be machine-readable rather than human-readable, they can also be leveraged to automatically transfer information from one website to another, without any human intervention.

Popular examples of web syndication, feed management and feed aggregator sites and readers include: FeedBurner, digg, Feedster, MyYahoo! and Google Reader.

FeedBurner and MySQL

FeedBurner is the world’s largest feed management provider. Its Web-based services help bloggers, podcasters and commercial publishers promote, deliver and profit from their content on the Web. FeedBurner also offers the largest advertising network for feeds, bringing together an unprecedented caliber of content aggregated from the world’s leading media companies, A-list bloggers, blog networks and individual publishers.

Additional Characteristics

• Provides services for over 170,000 bloggers, podcasters and publishers
• Handles over 270,000 feeds
• 1 million hits per day
• 11 million subscribers in 190 countries
• MySQL Replication for Scale-Out leveraged for reads and snapshot backups
• Query Cache leveraged in this very high-read environment

For more information about FeedBurner and MySQL please see:

FeedBurner: Scalable Web Applications Using MySQL and Java
http://www.mysqluc.com/cs/mysqluc2006/view/e_sess/8099

Blogs

Weblogs, or blogs as they are more commonly known, are personal websites in a journal or diary format. Text, images, videos and files make up the majority of the content on blogs. They typically allow visitors to post comments and other messages in response to the blogger’s posts. “Pingbacks” and “trackbacks” can be leveraged so that conversations spanning several blogs can be easily navigated by readers attempting to follow an exchange.
It is vital for blog building applications and blog hosting sites that the database(s) they leverage are:

• Easy to Use: For administrators and end-users if they must interact directly with the database.
• Reliable: Many users may depend on the service to be available round the clock.

... information about Mixi.jp and MySQL please see:

Mixi Delivers Massive Scale-Out with MySQL
http://www.mysql.com/why-mysql/case-studies/mysql-cs-mixi

Mixi.jp: ...

... PokerRoom.com and MySQL please see:

PokerRoom.com Powers High Transaction Online Poker System with MySQL and HP
http://www.mysql.com/why-mysql/case-studies/mysql-hp-ongame-casestudy.pdf