CHAPTER 25: CACHING FOR PERFORMANCE AND SCALE

Caching Software

Adequately covering even a portion of the caching software available from vendors and the open source communities is beyond the scope of this chapter. However, there are some points that should be covered to guide you in your search for the right caching software for your company's needs.

The first point is that you should thoroughly understand your application and user demands. Running a site with multiple gigabytes per second of traffic requires a much more robust, enterprise-class caching solution than does a small site serving 10MB per second of traffic. Are you projecting a doubling of requests, users, or traffic every month? Are you introducing a brand-new video product line that is going to completely change the type of and need for caching? These are the types of questions you need to ask yourself before you start shopping the Web for a solution, or you could easily fall into the trap of making your problem fit the solution.

The second point addresses the difference between add-on features and purpose-built solutions and is applicable to both hardware and software solutions. To understand the difference, let's discuss the life cycle of a typical technology product. A product usually starts out as a unique technology that sells and gains traction, or is adopted in the case of open source, as a result of its innovation and benefit within its target market. Over time, this product becomes less unique and eventually commoditized, meaning everyone sells essentially the same product, with the primary differentiation being price. High tech companies generally don't like selling commodity products because the profit margins continue to get squeezed each year. And open source communities are usually passionate about their software and want to see it continue to serve a purpose.
The way to prevent the margin squeeze, or the move into the history books, is to add features to the product. The more "value" the vendor adds, the more the vendor can keep the price high. The problem is that these add-on features are almost always inferior to purpose-built products designed to solve that one specific problem. An example of this can be seen in comparing the performance of mod_cache in Apache, an add-on feature, with that of the purpose-built product memcached. This is not to belittle or take away anything from Apache, which is a very common open source Web server developed and maintained by an open community of developers known as the Apache Software Foundation. The application is available for a wide variety of operating systems and has been the most popular Web server on the World Wide Web since 1996. The Apache module mod_cache implements an HTTP content cache that can be used to cache either local or proxied content. This module is one of hundreds available for Apache, and it absolutely serves a purpose, but when you need an object cache that is distributed and fault tolerant, there are better solutions such as memcached.

Application caches are extensive in their types, implementations, and configurations. You should first become familiar with the current and future requirements of your application. Then, you should make sure you understand the differences between add-on features and purpose-built solutions. With these two pieces of knowledge, you are ready to make a good decision when it comes to the ideal caching solution for your application.

Content Delivery Networks

The last type of caching that we are going to cover in this chapter is the content delivery network (CDN). This level of caching is used to push any of your content that is cacheable closer to the end user. The benefits of this include faster response times and fewer requests on your servers.
The implementation of a CDN varies, but most generically it can be thought of as a network of gateway caches located in many different geographical areas and residing on many different Internet peering networks. Many CDNs use the Internet as their backbone and offer their servers to host your content. Others, to provide higher availability and differentiate themselves, have built their own point-to-point networks between their hosting locations.

The advantages of CDNs are that they speed up response time, offload requests from your application's origin servers, and possibly lower delivery cost, although this is not always the case. The concept is that the total capacity of the CDN's strategically placed servers can yield a higher capacity and availability than the network backbone. The reason for this is that if there is a network constraint or bottleneck, the total throughput is limited. When these are eliminated by placing CDN servers on the edge of the network, the total capacity is increased and overall availability increases as well.

The way this works is that you alias your server to the CDN's domain by using a canonical name (CNAME) in your DNS entry. A sample entry might look like this:

ads.akfpartners.com CNAME ads.akfpartners.akfcdn.net

Here, our subdomain ads.akfpartners.com is an alias for ads.akfpartners.akfcdn.net, a name on our CDN, akfcdn.net. The CDN alias could then be requested by the application, and as long as the cache was valid, the content would be served from the CDN and not from the origin servers for our system. The CDN gateway servers would periodically make requests to our application origin servers to ensure that the data, content, or Web pages they have in cache are up-to-date. If the cache is out-of-date, the new content is distributed through the CDN to its edge servers.

Today, CDNs offer a wide variety of services in addition to the primary service of caching your content closer to the end user.
These services include DNS replacement; geo-load balancing, which is serving content to users based on their geographical location; and even application monitoring. All of these services are becoming more commoditized as more providers enter the market. In addition to commercial CDNs, more peer-to-peer (P2P) services are being utilized for content delivery to end users to minimize the bandwidth and server utilization from providers.

Conclusion

In this chapter, we started off by explaining the concept that the best way to handle large amounts of traffic is to avoid handling it in the first place. You can best do this by utilizing caching. In this manner, caching can be one of the best tools in your toolbox for ensuring scalability. We identified that there are numerous forms of caching already present in our environments, ranging from CPU caches to DNS caches to Web browser caches. In this chapter, we wanted to focus primarily on the three levels of caching that are most under your control from an architectural perspective. These are caching at the object, application, and content delivery network levels.

We started with a primer on caching in general and covered the tag-datum structure of caches and how they are similar to buffers. We also covered the terminology of cache-hit, cache-miss, and hit ratio. We discussed the refreshing methodologies of batch and upon cache-miss, as well as caching algorithms such as LRU and MRU. We finished the introductory section with a comparison of the write-through versus write-back methods of manipulating the data stored in cache.

The first type of cache that we discussed was the object cache. These are caches used to store objects to be reused by the application. Objects stored within the cache usually either come from a database or have been generated by the application. These objects are serialized to be placed into cache.
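The hit/miss terminology and the LRU policy recapped here can be sketched in a few lines. The following is our own minimal illustration, not code from the chapter:

```python
from collections import OrderedDict

class LRUCache:
    """Tiny LRU cache: evicts the least recently used tag when full."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # tag -> datum
        self.hits = self.misses = 0

    def get(self, tag):
        if tag in self.entries:
            self.hits += 1                    # cache-hit
            self.entries.move_to_end(tag)     # mark as most recently used
            return self.entries[tag]
        self.misses += 1                      # cache-miss: caller must fetch from source
        return None

    def put(self, tag, datum):
        self.entries[tag] = datum
        self.entries.move_to_end(tag)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry

cache = LRUCache(capacity=2)
cache.put("user:1", {"name": "Ann"})
cache.put("user:2", {"name": "Bob"})
cache.get("user:1")                    # hit
cache.put("user:3", {"name": "Cal"})   # evicts user:2, the least recently used
cache.get("user:2")                    # miss: it was evicted
hit_ratio = cache.hits / (cache.hits + cache.misses)   # 1 hit, 1 miss -> 0.5
```

Note that the get on "user:1" is what saves it from eviction; without it, "user:1" would have been the least recently used entry when "user:3" arrived.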
For object caches to be used, the application must be aware of them and have implemented methods to manipulate the cache. The database is the first place to look to offset load through the use of an object cache, because it is generally the slowest and most expensive of your application tiers; but the application tier is often a target as well.

The next type of cache that we discussed was the application cache. We covered two varieties of application caching: proxy caching and reverse proxy caching. The basic premise of application caching is that you desire to speed up performance or minimize the resources used. Proxy caching is used for a limited number of users requesting an unlimited number of Web pages. This type of caching is often employed by Internet service providers or on local area networks such as those in schools and corporations. The other type of application caching we covered was the reverse proxy cache. A reverse proxy cache is used for an unlimited number of users or requestors and a limited number of sites or applications. These are most often implemented by system owners in order to offload the requests on their application origin servers.

The last type of caching that we covered was the content delivery network (CDN). The general principle of this level of caching is to push content that is cacheable closer to the end user. The benefits include faster response times and fewer requests on the origin servers. CDNs are implemented as a network of gateway caches in different geographical areas utilizing different ISPs. No matter what type of service or application you provide, it is important to understand the various methods of caching so that you choose the right type of cache. There is almost always a caching type or level that makes sense with Web 2.0 and SaaS systems.

Key Points

• The most easily scalable traffic is the type that never touches the application because it is serviced by cache.
• There are many layers at which you can consider adding caching, each with pros and cons.
• Buffers are similar to caches and can be used for performance, such as when data must be reordered before writing to disk.
• The structure of a cache is very similar to data structures such as arrays with key-value pairs. In a cache, these tuples or entries are called tags and datum.
• A cache is used for the temporary storage of data that is likely to be accessed again, such as when the same data is read over and over without changing.
• When the requesting application or user finds the data it is asking for in the cache, this is called a cache-hit.
• When the data is not present in the cache, the application must go to the primary source to retrieve the data. Not finding the data in the cache is called a cache-miss.
• The ratio of hits to requests is called the cache ratio or hit ratio.
• The use of an object cache makes sense if you have a piece of data, either in the database or in the application server, that is accessed frequently but updated infrequently.
• The database is the first place to look to offset load because it is generally the slowest and most expensive of your application tiers.
• A reverse proxy cache is the opposite: it caches for an unlimited number of users or requestors and a limited number of sites or applications.
• Another term used for reverse proxy caches is gateway caches.
• Reverse proxy caches are most often implemented by system owners themselves in order to offload the requests on their Web servers.
• Many CDNs use the Internet as their backbone and offer their servers to host your content.
• Others, in order to provide higher availability and differentiate themselves, have built their own point-to-point networks between their hosting locations.
• The advantages of CDNs are that they lower delivery cost, speed up response time, and offload requests from your application's origin servers.

Chapter 26: Asynchronous Design for Scale

In all fighting, the direct method may be used for joining battle, but indirect methods will be needed in order to secure victory.
—Sun Tzu

This last chapter in Part III, Architecting Scalable Solutions, addresses a problem that is often overlooked when developing services or products—that is, overlooked until it becomes a noticeable and costly inhibitor to scaling. This problem is the use of synchronous calls in the application. We will explore the reasons that most developers overlook asynchronous calls as a scaling principle and how converting synchronous calls to asynchronous ones can greatly improve the scalability and availability of the system. We will also explore the use of state in applications, including why it is used, how it is often used, why it can be problematic, and how to make the best of it when necessary. Examining the need for state and eliminating it where possible will pay huge dividends within your architecture if it is not already a problem. If it already is a problem in your system, this chapter will give you some tools to fix it.

Synching Up on Synchronization

Let's start our discussion by covering some of the basics of synchronization, starting with a definition and some different types of synchronization methods. The process of synchronization refers to the use and coordination of simultaneously executed threads or processes that are part of an overall task. These processes must run in the correct order to avoid a race condition or erroneous results. Stated another way, synchronization is when two or more pieces of work must be done in a specific order to accomplish a task. An example is a login task.
First, the user's password must be encrypted; then it must be compared against the encrypted version in the database; then the session data must be updated, marking the user as authenticated; then the welcome page must be generated; and finally the welcome page must be presented. If any of those pieces of work are done out of order, the task of logging the user in fails to be accomplished.

There are many types of synchronization processes that take place in programming. One that all developers should be familiar with is the mutex, or mutual exclusion. Mutex refers to how global resources are protected from concurrently running processes to ensure that only one process is updating or accessing the resource at a time. This is often accomplished through semaphores, which are essentially fancy flags. Semaphores are variables or data types that mark or flag a resource as being in use or free. Another classic synchronization method is known as the thread join. A thread join is when a process is blocked from executing until a thread terminates. After the thread terminates, the other process is free to continue. An example would be a parent process, such as a "look up," that starts executing. The parent process kicks off a child process to retrieve the location of the data that it is going to look up, and this child thread is "joined." This means that the parent process cannot complete until the child process terminates.

Dining Philosophers Problem

This analogy is credited to Sir Charles Antony Richard Hoare (a.k.a. Tony Hoare), the person who invented the Quicksort algorithm. It is used as an illustrative example of resource contention and deadlock. The story goes that five philosophers were sitting around a table with a bowl of spaghetti in the middle. Each philosopher had a fork to his left, and therefore each also had one to his right. The philosophers could either think or eat, but not both.
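The book gives no code for these two primitives; as a minimal sketch of a mutex and a thread join using Python's threading module (names are our own):

```python
import threading

counter = 0
counter_mutex = threading.Lock()   # protects the shared (global) resource

def increment(times):
    global counter
    for _ in range(times):
        with counter_mutex:        # only one thread may update counter at a time
            counter += 1

# The parent kicks off child threads, then "joins" them: it is blocked
# until each child terminates, just like the look-up example above.
children = [threading.Thread(target=increment, args=(10_000,)) for _ in range(4)]
for t in children:
    t.start()
for t in children:
    t.join()                       # parent cannot continue until children finish

print(counter)                     # 40000: the mutex prevented lost updates
```

Without the lock, the four threads would interleave their read-modify-write sequences and the final count would usually fall short of 40,000.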
Additionally, in order to serve and eat the spaghetti, each philosopher required the use of two forks. Without any coordination, it is possible that all the philosophers pick up their forks simultaneously and therefore no one has the two forks needed to serve or eat. This analogy is used to show that without synchronization the five philosophers could remain stalled indefinitely and starve, just as five computer processes waiting for a resource could all enter a deadlocked state.

There are many ways to solve such a dilemma. One is to have a rule that each philosopher, upon reaching a deadlock state, puts down his fork, freeing up a resource, and thinks for a random time. If this solution sounds familiar, it might be because it is the basic idea behind retransmission in the Transmission Control Protocol (TCP). When no acknowledgement for data is received, a timer is started to wait for a retry. The amount of time is adjusted by the smoothed round-trip time algorithm and doubled after each unsuccessful retry.

As you might expect, there are many other types of synchronization processes and methods employed in programming. We're not presenting an exhaustive list but rather attempting to give you an overall understanding that synchronization is used throughout programming in many different ways. Eliminating synchronization is not possible, nor would it be advisable. It is, however, prudent to understand the purpose and cost of synchronization so that when you use it you do so wisely.

Synchronous Versus Asynchronous Calls

Now that we have a basic definition and some examples of synchronization, we can move on to a broader discussion of synchronous versus asynchronous calls within the application. Synchronous calls perform their action completely by the time the call returns.
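The put-down-and-retry rule can be sketched as follows; the fork numbering, timings, and one-meal stopping condition are our own assumptions, not the book's:

```python
import random
import threading
import time

forks = [threading.Lock() for _ in range(5)]   # one fork between each pair
meals = [0] * 5

def philosopher(i):
    left, right = forks[i], forks[(i + 1) % 5]
    while meals[i] == 0:
        with left:                              # pick up the left fork
            if right.acquire(blocking=False):   # try for the right fork
                meals[i] += 1                   # got both forks: eat
                right.release()
            # else: potential deadlock detected; fall through, which puts
            # the left fork down, then think for a random time and retry
        if meals[i] == 0:
            time.sleep(random.uniform(0, 0.01))

threads = [threading.Thread(target=philosopher, args=(i,)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(meals)   # every philosopher eventually eats: [1, 1, 1, 1, 1]
```

The non-blocking acquire plus the random think time is exactly the "back off and retry later" idea the TCP analogy describes: no thread ever holds one fork while waiting indefinitely for the other.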
If a method is called and control is given to that method to execute, the point in the application that made the call is not given control back until the method has completed its execution and returned, either successfully or with an error. In other words, synchronous methods are called, they execute, and when they finish, you get control back.

As an example of a synchronous method, let's look at a method called query_exec from AllScale's human resource management (HRM) service. This method is used to build and execute a dynamic database query. One step in the query_exec method is to establish a database connection. The query_exec method does not continue executing without explicit acknowledgement of the successful completion of this database connection task. Doing otherwise would be a waste of resources and time: if the database is not available, the application should not waste time creating the query and waiting for the database to become available. Indeed, if the database is not available, the team should reread Chapter 24, Splitting Databases for Scale, on how to scale the database so that there is improved availability. Nevertheless, this is an example of how synchronous calls work. The originating call is halted and not allowed to complete until the invoked process returns.

A nontechnical example of synchronicity is communication between two individuals, either face to face or over a phone line. If both individuals are engaged in meaningful conversation, there is not likely to be any other action going on. One individual cannot easily start another conversation with a second individual without first stopping the conversation with the first person. Phone lines are held open until one or both callers terminate the call.

Contrast the synchronous method or thread with an asynchronous method. With an asynchronous method call, the method is called to execute in a new thread, and it immediately returns control back to the thread that called it.
The design pattern that describes the asynchronous method call is known as asynchronous design, or asynchronous method invocation (AMI). The asynchronous call continues to execute in another thread and terminates either successfully or with an error, without further interaction with the initiating thread.

Let's turn back to our AllScale example with the query_exec method. After calling synchronously for the database connection, the method needs to prepare and execute the query. In the HRM system, AllScale has a monitoring framework that notes the duration and success of all queries by asynchronously calling the methods start_query_time and end_query_time. These methods store a system time in memory and wait for the end call to be placed in order to calculate duration. The duration is then stored in a monitoring database that can be queried to understand how well the system is performing in terms of query run time. Monitoring query performance is important, but not as important as actually servicing the users' requests. Therefore, the calls to the monitoring methods start_query_time and end_query_time are made asynchronously. If they succeed and return, great—AllScale's operations and engineering teams get the query time in the monitoring database. If the monitoring calls fail or get delayed for 20 seconds waiting on the monitoring database connection, they don't care. The user query continues on without any concern for the asynchronous calls.

Returning to our communication example, email is a great example of asynchronous communication. You write an email and send it, immediately moving on to another task, which may be another email, a round of golf, or whatever. When the response comes in, at an appropriate time, you read the response and potentially issue yet another email in response.
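The book does not show the code for query_exec or the monitoring hooks; a minimal sketch of this fire-and-forget pattern in Python (all names and timings are illustrative) might be:

```python
import threading
import time

def record_query_time(duration):
    # Stand-in for the monitoring write; imagine this blocking for
    # 20 seconds when the monitoring database is down.
    time.sleep(0.5)

def query_exec(sql):
    start = time.monotonic()
    result = ["row1", "row2"]      # stand-in for the real database work
    duration = time.monotonic() - start
    # Fire and forget: the user's query does not wait on monitoring.
    threading.Thread(target=record_query_time,
                     args=(duration,), daemon=True).start()
    return result

t0 = time.monotonic()
rows = query_exec("SELECT * FROM employees")
elapsed = time.monotonic() - t0    # returns immediately, well under 0.5s
```

Because the monitoring thread is detached, a stalled monitoring database delays only that thread; the caller gets its rows back regardless.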
The communication chain blocks neither the sender nor the receiver for anything but the time to process the communication and issue a response.

Scaling Synchronously or Asynchronously

Now we understand the difference between synchronous and asynchronous calls. Why does this matter? The answer lies in scalability. Synchronous calls, if used excessively or incorrectly, cause undue burden on the system and prevent it from scaling. Let's continue with our query_exec example, where we were trying to execute a user's query. Suppose we had implemented the two monitoring calls synchronously, using the rationale that (1) monitoring is important, (2) the monitoring methods are very quick, and (3) even if we slow down a user query, what's the worst that could happen? These are all good intentions, but they are wrong. As we stated earlier, monitoring is important, but it is not more important than returning a user's query. The monitoring methods might be very quick when the monitoring database is operational, but what happens when it has a hardware failure and is inaccessible? The monitoring queries back up, waiting to time out. This means the users' queries are blocked waiting for completion of the monitoring queries and are in turn backed up. While a user query is slowed down or temporarily halted waiting for a timeout, it is still taking up a database connection on the user database and still consuming memory on the application server trying to execute its thread. As more and more user threads stall waiting for their monitoring calls to time out, the user database might run out of connections, preventing other nonmonitored queries from executing, and the threads on the app servers get written to disk to free up memory, which causes swapping on the app servers.
This swapping in turn slows down all processing and may result in the TCP stack of the app server reaching some maximum limit and refusing subsequent connections. Ultimately, new user requests are not processed, and users sit waiting for browser or application timeouts. Your application or platform is essentially "down." As you can see, this ugly chain of events can quite easily occur because of a simple oversight about whether a call should be synchronous or asynchronous. The worst thing about this scenario is that the root cause can be elusive. As we step through the chain, it is relatively easy to follow; but when the symptoms are that your system's Web pages start loading slowly, and over the next 15 minutes this continues to get worse and worse until finally the entire system grinds to a halt, diagnosing the problem can be very difficult. Hopefully, you have sufficient monitoring in place to help you diagnose these types of problems, but these extended chains of events can be very daunting to unravel when your site is down and you are frantic to get it back into service.

Despite the fact that synchronous calls can be problematic if used incorrectly or excessively, method calls are very often made synchronously. Why is this? The answer is that synchronous calls are simpler than asynchronous calls. "But wait!" you say. "Yes, they are simpler, but often our methods require that the other methods they invoke complete successfully, so we can't put a bunch of asynchronous calls in our system." Ah, yes; good point. There are many times when you do need an invoked method to complete and you need to know its status in order to continue along your thread. We are not going to tell you that all synchronous calls are bad; in fact, many are necessary and make the developer's life a thousand times less complicated.
However, there are times when asynchronous calls can and should be used in place of synchronous calls, even when there is a dependency, as described earlier. If the main thread couldn't care less whether the invoked thread finishes, as with the monitoring calls, a simple asynchronous call is all that is required. If, however, you require some information from the invoked thread but you don't want to stop the primary thread from executing, there are ways to use callbacks to retrieve this information. An in-depth discussion of callbacks is beyond the scope of this chapter. An example of callback functionality is interrupt handlers in operating systems that report on hardware conditions.

Asynchronous Coordination

Asynchronous coordination and communication between the original method and the invoked method require a mechanism by which the original method determines when or if a called method has completed executing. Callbacks are methods passed as an argument to other methods and allow for the decoupling of different layers in the code.

[...]

intermediate information for another distributed process to compile. The input key might be the name of a document or, remembering that this is a document, the name of, or pointer to, a piece of a document. The value could be content consisting of all the words within the document itself. In our distributed inventory system, the key might be the inventory location and the value all of the names of inventory... parenthetically that this pseudocode could work for both the word count example (also given by Google) and the distributed parts inventory example. Only one or the other would exist in reality for your application, and you would eliminate the parentheses. The following input_key and input_values and output keys and values are presented in Figure 27.1. The first example is a set of phrases including the...
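One concrete form of callback-based coordination is a completion callback registered on a future. This Python sketch is our own illustration of the idea, not code from the book:

```python
from concurrent.futures import ThreadPoolExecutor

results = []

def on_done(future):
    # The callback: invoked when the asynchronous work completes,
    # without the primary thread ever blocking on the result.
    results.append(future.result())

def slow_lookup(x):
    return x * 2

with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(slow_lookup, 21)
    future.add_done_callback(on_done)   # register the callback
    # The primary thread is free to keep executing here.

print(results)   # [42] once the worker has finished
```

The primary thread hands over both the work and the follow-up action; the invoked thread reports back through the callback, much as a hardware interrupt handler reports a condition to the operating system.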
Figure 27.1 Input and Output Key-Value Pairs for Inventory in Different Locations

Note here how Map takes each of the documents and simply emits each word with a count of 1 as we move through the document. For the sake of speed, we had a separate Map process working on each of the documents. Figure 27.2 shows the output of this process. Again, we have taken each of our initial key-value pairs, with the key being the location of the inventory and the value being the individual components, with one listing for each occurrence of that component per location. The output is the name of the component and a value of 1 for each component listing. Again, we used separate Map processes. What is the value of such a construct? We can now feed these key-value pairs into a distributed process that will combine them and create

• Storage costs to store the data
• People and software to manage the storage
• Power and space to make the storage work
• Capital to ensure the proper power infrastructure
• Processing power to traverse the data
• Backup time and costs

Data isn't just about the physical storage, and sometimes the other costs identified here can even eclipse the actual cost of storage.

The Value of Data and the Cost-Value Dilemma

... available. Then, the scheduler would establish a synchronous stream of communication between itself and the mail server to pass all the information about the job and monitor the job while it completed. When all the mail servers were running under maximum capacity, and there were the proper number of schedulers for the number of mail servers, everything worked fine. When mail slowed down because of an excessive... problems of this nature. Depending on the monitoring system, it is likely that the first alert comes from the slowdown of the site and not the mail servers. If that occurs, it is natural that everyone starts looking at why the site is slowing down the mail servers instead of the other way around. These problems can take a while to unravel and decipher. Another reason to analyze and remove synchronous calls is the... adding servers. If a user arrives on one server for this request and on another server for the next request, how would each machine know the current state of the user?
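The Google-style pseudocode itself is cut off in this preview; the Map and Reduce steps described above can be sketched in Python as follows (the document contents and function names are our own illustration):

```python
from collections import defaultdict

def map_phase(documents):
    # Map: for every word in every document, emit the pair (word, 1).
    # In a real system a separate Map process would handle each document.
    for input_key, input_value in documents.items():
        for word in input_value.split():
            yield (word, 1)

def reduce_phase(pairs):
    # Reduce: combine the intermediate pairs by summing the counts
    # emitted for each key.
    totals = defaultdict(int)
    for output_key, output_value in pairs:
        totals[output_key] += output_value
    return dict(totals)

docs = {"doc1": "please move fast", "doc2": "please do not move"}
counts = reduce_phase(map_phase(docs))
print(counts["please"], counts["move"])   # 2 2
```

The same two functions work unchanged for the parts inventory example: make the input key a location, the input value the list of component names at that location, and the output becomes a count per component across all locations.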
If your application is split along the Y-axis and the login service is running in a completely different pool than the report service, how does each of these services know the state of the other? These are all questions that arise when trying... examples and alternatives for each. The overall lesson that this chapter should impart on the reader is that there are reasons that we see engineers use synchronous calls and write stateful applications, some due to carefully considered reasons and others because of the nature of modern computational theory and languages. The important point is that you should spend the time up front discussing these so... serializing the session data and then storing all of it in a cookie. This session data must be transferred back and forth, marshalled/unmarshalled, and manipulated by the application, which can add up to a lot of time. Remember that marshalling and unmarshalling are processes whereby an object is transformed into a data format suitable for transmitting or storing and then converted back again. Another
the "availability" of the string of lights was the product of the availability (1 minus the probability of failure) of all the lights. If any light had a 99.999% availability, or a 0.001% chance of
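This multiplication of availabilities is easy to check; the 100-light string below is our own worked example, not a figure from the book:

```python
def series_availability(per_component, count):
    # Components in series: the system works only if every component
    # works, so overall availability is the product of the individual
    # availabilities (each being 1 minus its probability of failure).
    return per_component ** count

# 100 lights, each 99.999% available (a 0.001% chance of failure)
string_availability = series_availability(0.99999, 100)
print(round(string_availability, 5))   # ~0.999: the string is only ~99.9% available
```

In other words, chaining one hundred "five nines" components in series costs you two nines: each synchronous dependency you add multiplies another availability factor into the product.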