Release 2.0
Issue 2.0.9, June 2008
http://r2.oreilly.com

Jesse Robbins of the O’Reilly Radar team puts it succinctly: “You only make money when your web site is up. The more available and faster your web site, the more revenue-generating pages a customer can view in the same amount of time, and the happier the customer will be.”
Jesse Robbins, from Velocity

ISSN 1935-9446
Published six times a year by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472
This newsletter covers the world of information technology and the Internet, and the business and societal issues they raise.

Contents
Achieving Velocity, by Jimmy Guterman
Velocity: Transforming Web Operations from Cost Center to Competitive Advantage. Operations is the “secret sauce” of successful web sites. By Allan Alter and the O’Reilly Radar Team
What is Web Operations?
Web Operations and Business
Web Operations and Performance: Business Principles
Case Study: Flickr
Case Study: iLike
Best Practices
Conclusion
Appendix: The Technologies of Web Operations
Calendar

Executive editor: Tim O’Reilly, tim@oreilly.com
Editor: Jimmy Guterman, jimmy@oreilly.com
Publisher: Sara Winge, sara@oreilly.com
Art director: Mark Paglietti, markp@oreilly.com
Copy editor: Steven Sloan
Contributing writers: Brady Forrest, Jerry Michalski, Sarah Milstein, Peter Morville, Nathan Torkington, David Weinberger

© 2008, O’Reilly Media, Inc. All rights reserved. No material in this publication may be reproduced without prior written permission; however, we gladly arrange for reprints, bulk orders, or site licenses. Individual subscriptions cost $495 per year.

Subscription information: Release 2.0, PO Box 17046, North Hollywood, CA 91615-9588, http://r2service.oreilly.com
Customer service: 1.800.889.8969, 1.707.827.7019, r2@oreilly.com

Jimmy Guterman is editor of Release 2.0 and editorial director of O’Reilly’s Radar group.

Achieving
Velocity

Operations is the “secret sauce” of successful web sites

Release 2.0.9 July 2008

In the early years of the commercial web, sites were organized like this:

[figure not reproduced]

These sites were neat, clean, manageable. In such an environment, with flat files, a hierarchical structure, and not that many customers, it was relatively easy to keep sites smooth and optimized.

That didn’t last long. Amazon, eBay, and other trailblazers showed that even websites that were neat, clean, and manageable from the point of view of the visitor had to be, in fact, quite complicated on the back end. Flat files couldn’t be the rule, as any large site needed to be built on a database, rather than HTML pages. And, since you’re asking for passwords and Social Security numbers, you’d better be rigorous about security. The iTunes Music Store doesn’t look to Apple developers the way it looks to customers.

Anyone running a large-scale web site has at least two great worries: performance and scalability. “Another site is just a click away” has been a web-business cliché since NCSA Mosaic was the browser of choice, but much evidence suggests that the quickest way to get customers off a website is to make it slow and unreliable.

In recent years, as many large sites have launched and some of them have prospered, the art and craft of web operations has become crucial to companies that want thriving digital businesses. Companies don’t just hire “an ops person” to think about such issues. At the leading digital enterprises, web operations is embedded throughout the enterprise. It’s the foundation on which the business runs. More and more, people will expect pages to load faster, sites to have higher uptime, and
companies to deliver more performance with fewer resources. The next cool web startup may well be the one that can quickly scale to serve a large and global audience.

In this special issue of Release 2.0, we look at the state of web operations, examine early signals of where it’s going, and present the industry’s best practices and most interesting players. Also available as a stand-alone O’Reilly Radar research report, this issue is a complement to O’Reilly’s inaugural Velocity conference (http://conferences.oreilly.com/velocity) for web performance and operations. Longtime IT analyst and reporter Allan Alter called on the deep experience and hard-won strategic insight of conference co-chairs Jesse Robbins and Steve Souders as he crafted the issue. With this report and conference, and in ongoing coverage on the Radar blog (http://radar.oreilly.com), we offer tools for making sure your site is one of the winners.

Velocity: Transforming Web Operations from Cost Center to Competitive Advantage
by Allan Alter and the O’Reilly Radar Team

When people say the Web has changed how we work, they tend to think about how people buy and sell products, collaborate and share information with coworkers, or all the new kinds of businesses that have emerged. What they often overlook is that the Web has changed something else that’s fundamental to every business: execution.

Execution, say Larry Bossidy and Ram Charan in their book by that name, is the “discipline of getting things done…the missing link between aspirations and results.” It’s understanding how to operate a business in an efficient, effective, and reliable way, knowing how to meet the expectations of your customers so your organization can meet the expectations of management and investors. In an online business (and nearly every business is an online business today), execution must include the
discipline of operating websites. Only, “include” doesn’t go far enough. An online business must think of the website as one of the most important parts of a company’s operations. It’s just that critical.

Customers don’t care about the operational, behind-the-scenes stuff that goes on, such as how many servers support a site, server automation, or HTML coding. But they do care whether the site keeps crashing, whether pages take a long time to download, or whether features on the site hang. That’s why smart executives at all Internet companies (in fact, at any organization that conducts business on the web) are recognizing that reliable web operations and fast website performance are essential. With the Web now the sales channel most frequently used by U.S. companies, serious money is at stake: Amazon, Google, and Microsoft have found that a lag of half a second or less can have a major impact on revenues and the number of searches. As Jesse Robbins, an O’Reilly Radar blogger and website availability expert, says, “You only make money when your website is up. The more available and the faster your website, the more revenue-generating pages a customer can view in the same amount of time, and the happier the customer will be.” (Robbins was responsible for website availability at Amazon.com, where his title was “Master of Disaster.”)

Still, most executives don’t fully understand all the potential business benefits of a high-performance, high-uptime website, or the hit their business can take if they neglect web operations. Nor do they know enough about the basic principles, technologies, and management practices that separate well-run sites from the also-rans. These principles sometimes require new ways of thinking, especially about how to approach website downtime and “failure.”

This report, written for business executives and managers at online businesses (or any company with a commercial website), provides a guide to understanding what web operations means, the business opportunities and
risks it presents, and the best practices for operating and managing a mission-critical site.

What is Web Operations?

“Operations” has long referred to the day-in, day-out processes of a business. Chief operating officers run their organizations’ day-to-day activities; in banking, “operations” refers to running branches and processing checks or transactions. Likewise, the IT profession has been using “operations” for decades to describe running and maintaining mainframes, servers, and data centers.

But the phrase “web operations” is much newer. Its earliest use dates back to 2003, when the phrase appeared as part of the name of the Internet Web Ops conference. You still won’t find many companies with a function known as the web operations department. Web operations remains an ill-defined and even controversial term: just as the phrase “classical music” means both music specifically from the era of Haydn and Mozart, and the entire European concert music tradition from Monteverdi to Stockhausen, web operations sometimes refers only to running and maintaining websites, and at other times serves as an umbrella term that also encompasses the field known as “web performance.” In this report, we’ll use the phrase web operations in its broader sense.

Job Titles in Web Operations: One indication of just how new the terms “web operations” and “web performance” still are: neither phrase appeared in online job postings for these positions between November 2007 and April 2008.

Operations = Availability

Theo Schlossnagle, an expert on building scalable, high performing websites and CEO of OmniTI Computer Consulting Inc., is not a fan of the phrase “web operations”; he prefers terms that give a tip of the hat to the technical skills required to run a website, such as “site architects” and “site reliability engineers.” But he
does have a clear definition of “web operations” in its narrow sense: it’s “how I put the website in place, how I keep it going, how I meet the demand (if demand rises above capacity), or not eat my shorts in costs (if demand is less than capacity). As business requirements change and mutate, it’s making sure what you have in place still works well.”

Web operations focuses on availability: keeping sites up and running. Availability includes reliability: the capability to consistently download not just web pages, but the features on those pages (e.g., search, video, account information, online purchasing, and chat). To achieve reliability, web operations personnel set up and maintain their sites’ hardware, software, storage, and network infrastructure. They also ensure scalability as demand for the site increases, by designing an infrastructure that can grow and by preparing for the future through capacity planning.

Availability also encompasses recovery: the ability to get a website back up should the site, or any individual feature, fail to be available. Part of the job of running a website is setting up redundant systems and even data centers that can take over when equipment fails, and creating plans to get the site back online when it crashes.

Performance = Response

While web operations and availability focus on the servers, web performance concentrates on what the user sees. “It’s how we deliver our product as quickly as possible and provide a good user experience,” is the succinct definition by Google technical staff member Steve Souders, author of High Performance Websites, creator of the YSlow tool for analyzing web performance, and Yahoo’s former Chief Performance Yahoo!
Web performance focuses primarily on response time: the length of time it takes to download a web page. Improving performance involves optimizing the files, instructions, and components that make up a web page. But the definition of performance includes efficiency, too: optimizing hardware, data centers, and networking to serve web pages at maximal speed with minimal resources.

Web Operations and Business

The difference between operations and performance is important to the people who do the work. But for others the distinctions often overlap (at some point, a slow page is equivalent to an unavailable page) and are ultimately not very important to the folks outside the data center. What users care about is their experience on the Web; what management cares about most is the bottom line.

“In business,” says Adam Jacob, Senior Partner of HJK Solutions and a specialist in establishing web operations for start-up firms, “‘operations’ is usually defined as the things you need to do to extract value from your resources. In the oil industry it’s what you do to extract oil from ground and put it in a barrel. It’s the same with web operations: it’s extracting the value you have in your website and what you are running on it.”

To extract that value, executives don’t need to know the intricacies of web technology. What they absolutely must understand are the three most important principles for a sound IT infrastructure. And the first of them, design for failure, requires a 180-degree change in how most IT organizations think about downtime.

Design for failure

Failure is inevitable: servers break down, bugs and network outages occur, people make mistakes. It’s better to assume that failure will happen, and
design the infrastructure so it recovers quickly when it does. “A system that can tolerate minor problems and still deliver a great customer experience is better than one that delivers great experience all the time except when the page is blocked out,” says Robbins.

As telephone companies and power grids do with their networks, Google, Amazon, and many of the most successful websites design their web infrastructure as a mesh of connected systems. When one system goes down, be it a web server, a database server, or some other piece of hardware, others pick up the load. John Allspaw, manager of the Operations Engineering Group at Flickr.com, recalls a bizarre request at one of his first jobs that brought home the point for him. “My boss said, ‘I want you to take a look at this diagram of networks and servers. In three months I will walk around and randomly unplug things to see if the network still works.’” A network that can survive a rambling, destructive force is a perfect allegory for how web operations should work, he says.

Design for scalability

In a scalable architecture, computing resources keep up with demand as it grows. A scalable system allows you to design or modify applications to run on multiple servers, then add servers to the ones already in place (horizontal scalability), or upgrade your current servers with more powerful and faster ones (vertical scalability). Horizontal scalability is generally cheaper and preferable; vertical scaling makes sense when rewriting code is too costly or, if horizontal scaling is already in place, when it makes more sense to replace old servers with fewer, more powerful, less-expensive-to-run servers.

By contrast, an example of an architecture that doesn’t scale is one that keeps a company’s data in a single database that can’t query other database servers. If the database hits its upper limit and it’s necessary for queries to go to multiple database servers, it’s hard to change the code.

For startup companies with a tiny operations staff, the
big scalability issue is surviving sudden spikes in traffic and avoiding long periods of downtime. That’s why Sendi Windaja, CTO of startup legal referral website Avvo.com, set up an automated web operations infrastructure from the start: one where the work of adding a new server to a network can be done automatically by the network, rather than manually by the operations staff. His one-and-a-half person operations staff can add a bare metal server, load the operating system and applications, and add it to the network in 30 minutes. Downtime to deploy a new software release is usually a regularly scheduled 10 seconds.

The Operations Advantage: new online businesses can spend far less time managing their website if they use server automation from the beginning.

As websites emerge from the startup phase, capacity planning and anticipating the stress points on systems become the main issues. Mainstream retailers and companies have the additional burden of integrating their website’s features with their existing IT infrastructure and systems, still a big challenge for store retailers or catalog firms.

Design for the browser

Important as it is, a scalable architecture isn’t sufficient for achieving a fast, bug-free user experience. The reason?
Steve Souders’ “Performance Golden Rule”: “Only 10–20% of the end user response time is spent downloading the HTML document. The other 80% to 90% is spent downloading all the components on the page.” In other words, when you’re waiting for a page to load, the holdup is not the basic HTML frame of the web page, but the time it takes the browser to download all of the scripts, images, DNS lookups, stylesheets, and redirects that make up a modern web page.

Souders has set out 14 rules for improving front-end performance in his book High Performance Websites: Essential Knowledge for Frontend Engineers (O’Reilly, 2007). Any website can shed 25 to 50 percent off its download time by following the applicable rules, he claims. A downloadable tool by Souders called “YSlow” (which runs on the Firebug for Firefox tool) analyzes whether a web page is following or breaking these rules, and then grades pages

[…] progressively more complex breakdowns in the infrastructure. Testing validates the system, but Robbins argues that’s not as important as creating the culture around failure so that when this happens, people know what to do. People need to be comfortable working under stress when it’s AM and they have to react to complex situations with possibly high stakes. The only way to do that in an organization is by regularly conducting very realistic exercises.

Hire the right people

It’s a challenge to find good people to fill web operations openings. The list of skills and traits needed is long, the career path is unsettled, training and education are hard to find, and the competition to hire good candidates is growing.

The requirements for web operations and performance experts cover a broad spectrum. Web operations administrators are expected to
do things that systems administrators don’t. They need to know databases and how to read and write code. Besides technical skills, they have to understand the business aspects of web operations, risk management, and Sarbanes-Oxley. “Most internet companies have a laundry list that’s unbelievably long. All the things that other people aren’t doing get lumped into web operations,” according to Schlossnagle. And since the role is still new and ill-defined, the set of skills can’t be learned in schools.

In the meantime, what guidelines should web operations managers follow when deciding whom to hire? There are several qualities that, according to Allspaw, make web operations engineers good ones: “One, of course, is technical skills working in a web ops environment. People who haven’t worked in a web ops environment certainly can learn them, but it’s a hard leap. Two, the ability to admit they don’t know the solution to a problem. A lot of web ops is planning for worst case scenarios. It’s better to say I don’t know what’s going on than making excuses or pretending like I know. They need to be able to admit to mistakes instantly so they can move on to fixing the mistake, for example, misconfiguring a server. Even the best web ops guys sometimes do dumb things.”

“Three, the ability to learn,” Allspaw continued. “Just because they don’t know technology X doesn’t mean they can’t learn it very quickly. If I had a choice between a very hungry and aggressive learner who doesn’t know many technologies vs. a person who is not a very aggressive learner but knows a lot, I’ll take the first guy. Four, someone who can act rationally and isn’t too excitable in an emergency or outage situation. Imagine a room full of people in their cubes, and suddenly the website goes down. All eyes are on those people to figure out what’s wrong and fix it. In those situations a web operations person needs to focus, they can’t be distracted.”

Hold vendors and outsourcers accountable

In an environment that’s
dominated by open source solutions, managing vendors and managers may not rise to the level of a major issue. But once companies sign on to a service provider, adopt proprietary software, or have an outsourcer write code, the usual guidelines for vendor management apply. Among the practices shown to be effective in a 2002 CIO Insight survey on vendor management: encourage competitive bids, require that vendors fill out requests for proposals, have skilled negotiators and legal specialists work out a contract, and strike deals at the end of the fiscal year. Once you’ve selected a vendor, use SLAs to ensure vendors deliver what is expected, and monitor vendor performance against performance metrics. The most important point, says Harden, is whenever you outsource, “don’t capitulate responsibility for anything to anyone. You still have to view it as part of your own business. Have the metrics to measure performance.”

Theo Schlossnagle has proposed a set of job descriptions for junior, mid-level and senior web operations roles on his blog at http://www.lethargy.org/%7Ejesus/archives/106A-job,-a-mission,-a-career-all-without-apath-or-a-name.html

Conclusion

If your company hasn’t implemented many of these best practices, there’s no time like today: the pressures to have a fast, scalable, and reliable website will only increase during the next three to five years. Online traffic will grow as more people in the U.S. and around the world get broadband access and use smart phones to go online. In America alone, there will be 15.8 million servers by 2010, triple the number at the turn of the century. We expect power costs to rise as the availability of electricity fails to keep pace with rising demand, driving companies to use the smallest amount of power to meet demand. Web
ops staff will also have their hands full trying to maintain speedy website performance. Websites will continue to compete by adding more rich content. More of them will contain mashups: single web pages with content from multiple sources. Unless they follow guidelines for fast web pages, mashups are “doomed to be slow pages and bad user experiences,” says Souders. For mainstream retailers selling via stores and call centers along with the Web, integrating online customer data with other sources of customer data will remain a challenge; “the only people who are doing it are doing so on a custom basis,” says Harden.

Fortunately, new technologies and tools will also be available to help, at least for managing servers and improving performance. More startups and established vendors are entering the utility computing field, and offering products that will provide better ways to manage virtualization. New storage technologies and methodologies like iSCSI (a way to use Internet protocol with SCSI drives) promise to provide more flexibility. The upcoming versions of Firefox and IE promise faster download speeds than current ones; the competition between browsers to run JavaScript faster is especially strong. More web developers and operations staff are adopting today’s tools, which in turn will improve or be replaced and supplemented with others. “Firebug and YSlow are getting more attention, but they are just the tip of the iceberg,” says Souders. “In a few years we’ll laugh at how infantile they were.” Isn’t that always how it is with technology?
Still, the determinant of business success or failure in web operations won’t be the technology, but the discipline and smarts to use it by web ops groups that are properly managed and supported.

Appendix: The Technologies of Web Operations

Executives don’t want to get down and dirty with the nuances of web technologies, nor do they need to; that’s best left to the technical staff. But execs can do a better job of providing oversight to their web operations groups, and ensuring the decisions they make will support and grow their business, if they have a basic working knowledge of the technologies and what they do. Ignorance will cause them to push their IT organization to make poor, short-sighted choices. Here, then, is an executive’s overview of web operations technologies, why they matter to their business, and the key issues they need to keep in mind as they and their staff make technology decisions.

Platforms and Platform Architectures

One of the basic decisions that every website makes at the start is “platform” or “platform architecture.” A platform can be defined as:

The underlying, fundamental technologies used to run websites. These “stacks” of technologies consist of the operating system, web server, database technology, and programming language. Microsoft’s proprietary platform and the open source “LAMP” stack are the two most commonly used stacks.

                      Microsoft        LAMP (open source)
Operating system      Windows/Vista    Linux
Web server            IIS              Apache
Database              SQL Server       MySQL
Programming language  ASP.NET          Perl/PHP/Python

A service for delivering web applications. These include social networks like Facebook which allow other companies to provide applications on their website. Services like Yahoo Data Database platform, which provides user authentication services to other websites, are also called platforms.

A service for
hosting websites. An example is “cloud computing” hosting services like Amazon Web Services, which allow companies to run their websites off of their servers.

Microsoft and LAMP

Choosing between Microsoft and LAMP isn’t a choice between fast or slow. “Choosing a platform has to do with the cost of doing business, the reliability of the system, who you can find to do the work, etc.,” says Schurman. Microsoft’s strengths, says Schurman, are that its products integrate well with each other and other products that run on that platform. In addition, Microsoft’s broad product line is supported and regularly upgraded by the vendor. Adds Jacob, “there’s a path [for integrating the technologies in the platform] and you buy into the path, and they can tell you what the path is.”

But the open source LAMP platform, used by Amazon, Google, Yahoo and Flickr (three of the largest web-native companies), is a powerful and popular alternative. Developers like both its price (free) and its flexibility: they can crack open the source code, and choose from many open source programming languages, rich internet applications, and tools. The disadvantage, says Jacob, is that users have to put these modules together themselves, or hire an outside firm to do it for them. And while “people around the world contribute to making that code together, the vast majority of those people are not paid to make it better. If there is a bug, you usually have to fix it yourself or wait until someone fixes it.”

Facebook and other social networks

To attract users, Facebook, MySpace, LinkedIn, and other social networks have enabled programmers to write applications that can run on their website. Months after Facebook opened up its application platform to outsiders in May 2007, over 10,000 applications were written by programmers seeking to exploit Facebook’s social network. The reason to choose a social network as a platform is business, not technology: the hope of generating income from the millions of people who use them,
usually through advertising. According to eMarketer, 37 percent of U.S. adult Internet users and 70 percent of online teens engage in online social networking every month. However, very few Facebook applications have truly taken off: only the top few have millions of active users.

SOURCE: http://radar.oreilly.com/archives/2008/05/facebook-app-categories.html

The cloud as platform

“Cloud” computing (also known as utility computing) is essentially website hosting on a massive host. Businesses like Amazon or IBM, with huge reserves of data center capacity, sell capacity to other companies along with services to run and manage websites. (Inside the cloud, there’s nothing cloudy at all: it’s still servers running software. That’s why technical people often dislike the term.) Startups have begun to use cloud computing in their early stages because it makes economic and technical sense. But at some point the economic benefits start to dwindle, and you have more control by hosting and managing your website on your own servers.

Languages, Tools and Frameworks

Developer websites are full of debates between partisans of the various competing technologies. We won’t go into all of that here (for more detailed information, turn to other O’Reilly reports), but here’s a quick guide to key technologies and services for a business audience, and their relevance for web operations.

Web programming platforms

These are a mix of programming, development tools and frameworks for developing internet applications. “They hand you a way to solve all the problems. Need to use a database? Here’s how to do that. Need to email?
Here’s how to do that,” says Jacob. Ruby on Rails is an open source framework for rapid web application development written in the Ruby programming language, while the older Java platform consists of a combination of software and specifications that provide a way to develop applications that run across different platforms. Visual Basic performs a similar role in the Microsoft environment.

Ruby on Rails is good at rapid programming and prototyping, says Allspaw, but like Jacob, he notes there are lingering questions out in the field as to whether systems built using Ruby on Rails can scale. Ruby on Rails can scale, Jacob asserts, but it’s sometimes more expensive to do so. The Java environment is considered more cumbersome and complicated, but it scales well and has traction in established companies, says Jacob. The choice of Java or Ruby on Rails comes down to speed, the skills of the people in your company, and confidence that your company can make Ruby on Rails-built web apps scale.

Programming languages

From a web operations point of view, the main issue when choosing a programming language is the ability to change the code after the application is written, an issue when it comes time to scale or troubleshoot. Explains Allspaw, one of the reasons people don’t write web apps in C is it’s a difficult code base for making frequent changes. Users of the LAMP stack have a choice of several suitable programming languages, most notably PHP, Perl or Python. They’ve been sprouting Rails-like frameworks to help developers: Python has Django and Perl has Catalyst. All of them, says Jacob, can build large scale, well-architected applications. The main reason to choose them will be which language your developers are familiar with, and their relative strengths or weaknesses for building certain types of applications.

Rich internet applications

Developers use different technologies and techniques to create the maneuverable images, slideshows, moving advertisements, and other interactive
website features known as Rich Internet Applications (RIAs). Among them are Ajax (a set of techniques using Asynchronous JavaScript and XML), JavaScript, Flash, and Silverlight. These applications can be large and take a long time to download, so websites with RIAs can take a big performance hit and frustrate users if they aren’t designed well. They also complicate the business of measuring website download times, because they start downloading after the rest of the page.

JavaScript libraries

As with programming languages, JavaScript has spawned libraries of reusable code modules programmers can use for common functions or tasks, like creating menus and windows or manipulating cookies. The purpose is to speed up development cycles rather than improve performance, according to Souders. There are several such libraries, with names like Dojo, jQuery, YUI, Prototype, Scriptaculous, and EXT. Which to choose? Souders says a friend at Google did a survey and found there’s not much difference in terms of features. But to the best of his knowledge, no one has done a very good study on how these libraries perform.

Development, debugging and profiling tools

Browsers and internet applications have their own tools for analyzing, debugging, and optimizing code, boosting performance by finding bugs or code that can slow down sites. Visual Studio provides a set of tools for Internet Explorer; other IE tools include HTTPWatch (a debugger) and Pagetest (a web page performance tool). Firebug (and the YSlow for Firebug web page performance tool) work in Firefox. Aptana provides tools for Ajax, and Venkman is a JavaScript debugger. Fiddler is able to log HTTP traffic in any browser. These are useful tools, but there’s a big problem: there’s not a good development environment that works cross-browser and cross-platform. And since most users use Microsoft IE, but most web developers use Firefox or Apple’s Safari browser, people
develop all this great code that doesn’t work or has bugs in IE.

Operations and Performance Services

Companies often use services to improve website performance. Two of the main kinds are content delivery networks like Akamai and Panther Express, and monitoring services like Gomez, Keynote and Webmetrics.

Content delivery networks

As a website attracts users from across America or around the world, one of the best ways to prevent response time deterioration is to move the servers closer to the users. A content delivery network (CDN) is the solution. With a collection of servers scattered around the world, containing a cache of identical content, “the user in Malaysia doesn’t have to pull the Flickr logo on their page from Texas; there’s a server closer to that person,” says Allspaw. Another benefit: if one data center goes down, others can pick up the load. While some large companies create their own CDN, many use a CDN service provider like Akamai Technologies, Panther Express, or Limelight. The performance gains can be significant: according to Souders, Yahoo’s shopping site saw an overall 20 percent cut in response time by moving the static components of its site to a CDN. The drawbacks: CDN response times can be affected by traffic coming and going to other websites managed by the CDN vendor. And if a CDN is having performance problems, so will your website.

Monitoring services

Most large companies monitor the performance of their website from within their own data center. That doesn’t help them measure performance from the user’s point of view, which is the view that really matters to an online business and its advertising clientele. Several major companies provide outside monitoring services aimed at capturing the customer’s experience, including Keynote and Gomez. According to Eric Goldsmith, these firms measure website performance at different geographic locations or over different network service providers. The results can be revealing: Goldsmith recalls that it was an “eye opener”
for AOL, which had its data centers on the East Coast, when it saw the difference in performance between the East and West coasts.

Optimization Tools, Techniques, and Technologies

While programming tools tend to fall under the area of development, operations staffs have their own toolkits for speeding up websites and improving reliability. Managers should make sure their own staff uses these tools.

Virtualization

Server and storage virtualization and consolidation have become one of the most significant trends in IT. Virtualization lets companies use fewer machines, saving money on maintenance, electricity and even real estate. The savings are proving to be so large that spending on server and storage virtualization will grow 20.1% in 2008, according to CIO Insight. The advantage for web operations: Virtualization “enables organizations to shift away from the limits of physical hardware and toward infrastructure as a utility computing resource,” says Robbins, enabling better use of existing resources and improving resilience to typical hardware failure.

Load optimization

As with virtualization, this technique lets companies make the most of their server resources; it also helps prevent server crashes. Schurman compares load optimization to a traffic cop in front of servers: If one machine in a data center is overloaded with requests and data, load balancing (which is done with network devices or software) distributes the load to other machines. The same can be done for entire data centers; an overloaded data center can move some of its workload to underutilized ones.

Image optimization

Image optimization is all about reducing the amount of data required to display an image. One way is to combine many images into one graphic file. Schurman recalls a time when he and his colleagues at Microsoft Live Search took 10 different images and combined them into one graphic, and found the download time for the
one combined graphic file was just 25 percent of that for the 10 separate images. Other image optimization tools can reduce the size of JPEG, GIF, and PNG files. (A list of optimization techniques can be found at http://developer.yahoo.com/performance/rules.html#opt_images)

Minification

In his book, Souders defines minification as the practice of removing unnecessary characters from code to reduce its size, thereby improving load times. Many websites using JavaScript, even very large ones, neglect to minify, according to Souders. Minification tools like Dojo Compressor and JSMin have been found to reduce the size of JavaScript files by up to 25%.

Emerging Performance-Improvement Techniques

Web ops managers are always looking for new ways to improve performance. Among those at the cutting edge:

Memcached

Memcached is a widely used high-performance distributed caching system that dramatically accelerates web application performance while decreasing database load. It works by caching the results of database queries in memory, spread across a number of servers. Some websites have used it to significantly reduce expensive database load, providing a major incentive for larger sites to adopt it. It is broadly supported by most languages and frameworks including PHP, Java, Python, Perl, Ruby, ASP.NET, Scheme, and others.

Implementing memcached for critical applications requires careful planning. Theo Schlossnagle says most people who use it are unaware it can introduce inaccuracies to the architecture. Take a financial news website: “The problem is the database stores the authoritative version of the article. Editors make the change, but the old copy (still in cache) is being served for five minutes. If your article is a stock quote, your application is now useless to your audience. You have to be careful.”

Server containers (a/k/a container farms)

These are server farms on a vast scale: Sun, Rackable, and other vendors are offering shipping containers packed with servers and the necessary cooling
systems; Dell and IBM are reported to be looking into them as well. Server containers can be placed near hydroelectric plants where energy is cheap, or used to reduce response times by placing them in spots where user traffic is highest, such as major cities. Microsoft plans to use server containers in its new $500 million facility in Northlake, Illinois, and Google has received a patent for such a “mobile data center.”

Calendar

A selection of significant public events over the next few months

June 23–24
Velocity Web Performance and Operations Conference (Burlingame, CA)
http://conferences.oreilly.com/velocity/
O’Reilly debuts a new conference dedicated to Web performance and operations. Are you building at Internet scale?

July 20–22
Ubuntu Live (Portland, OR)
http://www.ubuntulive.com/
Find out the latest about the most popular Linux distribution. And while you’re in Portland…

July 21–25
O’Reilly Open Source Convention (Portland, OR)
http://conferences.oreilly.com/oscon
Join more than 2,500 open source developers, experts, and gurus.

July 21–23
Brainstorm Tech (Half Moon Bay, CA)
http://www.timeinc.net/fortune/conferences/brainstormtech/tech_home.html
FORTUNE’s David Kirkpatrick brings together “an invited group of superb tech thinkers and leaders with smart people from other arenas for two days of intense and creative interaction.”

July 23
EconCeleb (Hollywood, CA)
http://www.contentnext.com/econceleb
Rafat Ali and the paidContent gang celebrate and dissect the economics of online celebrity content. Wear sunglasses.

July 24–26
Scratch@MIT (Cambridge, MA)
http://scratch.mit.edu/conference/
Educators, parents, and researchers catch up on the latest regarding the graphical-programming language for children and teens that has been called “the YouTube of interactive media.”

August 2–7
Black Hat USA (Las Vegas, NV)
http://blackhat.com/
Find out about the latest in security before it’s too late. Will what you learn in Vegas stay in
Vegas? And Defcon (http://defcon.org/) comes right after, August 8–10.

September 16–19
Web 2.0 Expo NY (New York, NY)
http://ny.web2expo.com/
Why should Silicon Valley have all the fun? The megaconference comes to the Javits Center.

October 21–23
Web 2.0 Expo Europe (Berlin, DE)
http://europe.web2expo.com/
The meme that rules us all moves to the old country.

October 23–25
Pop!Tech (Camden, ME)
http://poptech.org/
This year’s event pivots on the relationship between scarcity and abundance.

October 24–28
Singularity ’08 (be there, wherever you are)
http://Singularity08.com
It’s a three-day online conference. And, if the singularity (http://en.wikipedia.org/wiki/Technological_singularity) occurs by then, we’ll all be there anyway.

November 5–7
Web 2.0 Summit (San Francisco, CA)
http://web2summit.com/
Last year this conference examined the web’s edge. So what happens when we go over the edge?

Release 2.0 Subscription Form

Complete this form and join the other industry executives who rely on Release 2.0 to stay ahead of the headlines. You can also subscribe online at http://r2.oreilly.com

Your annual Release 2.0 subscription costs $495 and includes both the print and electronic versions of six every-other-month issues, access to the complete Release 1.0 and Release 2.0 archives, a discount on attendance to O’Reilly conferences, and full access to our website, http://r2.oreilly.com

name
title
company
address
city / state / zip / country
telephone
fax
email (personal email required for electronic access)
url

[ ] My colleagues should read Release 2.0, too!
Send me information about multiple copy subscriptions and electronic site licenses.

[ ] check enclosed
[ ] charge my (check one): [ ] american express  [ ] mastercard  [ ] visa

card number
expiration date
cvs code

name and billing address: [ ] same as above  [ ] see below

name
address
city / state / zip / country
signature

Please fax this form to 1.818.487.4501 or mail it to Release 2.0, P.O. Box 17046, North Hollywood, CA 91615-9588. Payment must be included with this form. Your satisfaction is guaranteed or your money back. If you have any questions, please call us at 1.800.889.8969 or 1.707.827.7019 or email us at r2@oreilly.com.