CHAPTER 17 PERFORMANCE AND STRESS TESTING

Performance and Stress Testing for Scalability

We usually lead off our chapters with the rhetorical question of how a particular process could possibly have anything to do with scalability. This time, we've waited until we covered the processes in depth to have this discussion; hopefully, as a result, you can already start listing the reasons that performance testing and stress testing have a great place among the multitude of factors that affect scalability. The three areas that we are going to focus on in exploring the relationship are headroom, change control, and risk management.

As we discussed in Chapter 11, Determining Headroom for Applications, it is critical to scalability that you know where you are in terms of capacity for a particular service within your system. This knowledge lets you calculate how much time and growth you have left before you must scale. It is fundamental for planning headroom or infrastructure projects, splitting databases/applications, and making budgets. The way to ensure your calculations remain accurate is to conduct performance testing on all your releases to ensure you are not introducing unexpected load increases. It is not uncommon for an organization to set a maximum load increase allowed per release. As you become more sophisticated in capacity planning, you will come to see the load added by new features and functionality as a cost that must be accounted for in the cost/benefit analysis. Additionally, stress testing is necessary to ensure that the expected breakpoint or degradation curve is still at the same point as previously identified. It is possible to leave the normal usage load unchanged but decrease the total load capacity through new code paths or changes in logic.
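The last point can be made concrete with a deliberately simplified model, which is our assumption rather than the authors' method: if a synchronous service handles each request on one of a fixed pool of workers, Little's law caps its peak throughput at workers divided by per-request service time. Any latency added on the request path, such as the 90-millisecond lookup in the example that follows, lowers that ceiling even when normal load is unchanged. The pool size and baseline service time below are hypothetical.

```python
def peak_throughput(workers: int, service_time_s: float) -> float:
    """Upper bound on requests/second for a pool of synchronous workers.

    By Little's law (L = lambda * W), a pool that can hold at most `workers`
    in-flight requests, each taking `service_time_s` seconds, cannot sustain
    more than workers / service_time_s requests per second.
    """
    if workers <= 0 or service_time_s <= 0:
        raise ValueError("workers and service time must be positive")
    return workers / service_time_s


def capacity_loss(workers: int, base_s: float, added_s: float) -> float:
    """Fraction of peak capacity lost when every request becomes `added_s` slower."""
    before = peak_throughput(workers, base_s)
    after = peak_throughput(workers, base_s + added_s)
    return 1 - after / before


if __name__ == "__main__":
    # Hypothetical numbers: 100 workers, 200 ms baseline, +90 ms lookup.
    before = peak_throughput(100, 0.200)
    after = peak_throughput(100, 0.290)
    print(f"peak before: {before:.0f} req/s, after: {after:.0f} req/s")
    print(f"capacity lost: {capacity_loss(100, 0.200, 0.090):.1%}")
```

With these hypothetical figures the ceiling falls from about 500 to about 345 requests per second, roughly a 31% loss, which a stress test would reveal as an earlier breakpoint even though a single request only looks 90 ms slower.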
For instance, a 90-millisecond increase in a data structure lookup would likely go unnoticed in the total response time of a single user request. If this service is tied synchronously to other services, however, then as load builds, hundreds or thousands of 90-millisecond delays add up and decrease the peak capacity that the services can handle.

When we talk about change management, as defined in Chapter 10, Controlling Change in Production Environments, we mean not the lightweight change identification process suited to small startup companies but the fuller-featured process by which a company actively manages the changes that occur in its production environment. We defined change management as consisting of the following components: change proposal, change approval, change scheduling, change implementation and logging, change validation, and change efficacy review. Performance testing and stress testing augment this change management process by providing a practice implementation and, most importantly, a validation of the change. You would never expect to make a change, such as fixing a bug or providing a new piece of functionality, without verifying that it actually affected the system the way you intended. As part of performance and stress testing, we validate the expected results in a controlled environment prior to production. This is an additional step in ensuring that when the change is made in production, it will also work as it did during testing under varying loads.

The most significant factor that we should consider when relating performance testing and stress testing to scalability is the management of risk. As outlined in Chapter 16, Determining Risk, risk management is one of the most important processes when it comes to ensuring your systems will scale. The precursor to risk management is risk analysis, which attempts to calculate the amount of risk in various actions or components.
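As a minimal sketch of risk analysis, assume the common failure mode and effects analysis (FMEA) convention of a risk priority number: severity times likelihood times detectability, each scored 1 to 10, where a high detectability score means the failure is hard to catch before release. Chapter 16's own scoring scale may differ, and the failure mode and scores below are hypothetical. The sketch shows how load testing a change, which makes a slow-query failure much easier to detect, lowers its score.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FailureMode:
    """One row of a simplified FMEA table (hypothetical 1-10 scores)."""
    description: str
    severity: int       # impact on customers if the failure occurs
    likelihood: int     # chance that the failure occurs
    detectability: int  # 10 = very hard to detect before release, 1 = almost certain to be caught

    def risk_priority(self) -> int:
        """Conventional FMEA risk priority number: the product of the three scores."""
        for score in (self.severity, self.likelihood, self.detectability):
            if not 1 <= score <= 10:
                raise ValueError("scores must be between 1 and 10")
        return self.severity * self.likelihood * self.detectability


if __name__ == "__main__":
    slow_query = FailureMode("query time increases under load", 8, 5, 9)
    # Load testing does not change severity or likelihood here; it makes the
    # failure far more likely to be caught before it reaches production.
    after_load_test = replace(slow_query, detectability=2)
    print(slow_query.risk_priority(), after_load_test.risk_priority())
```

Treating testing as an improvement in detectability is one modeling choice; a team could instead lower the likelihood score once a test demonstrates acceptable behavior under load.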
Performance testing and stress testing are two methods that can significantly decrease the risk associated with a particular service change. For example, if we were using a failure mode and effects analysis tool and identified increased query time as a failure mode of a particular feature, the recommended mitigation could be to test this feature under actual load conditions, as with a performance test, to determine its actual behavior. This could also be done under extreme load conditions, as with a stress test, to observe behavior above normal conditions. Both would provide much more information about the actual performance of the feature and would therefore lower the amount of risk. These two testing processes are powerful tools when it comes to reducing, and thus managing, the amount of risk within a release or the overall system.

From these three areas (headroom, change control, and risk management), we can see the inherent relationship between the successful scalability of your system and the adoption of the performance and stress testing processes. As we cautioned previously in the discussion of the stress test, the creation of the test load is not easy, and if done poorly it can lead to erroneous data. However, this does not mean that it is not worth pursuing the understanding, implementation, and (ultimately) mastery of these processes.

Conclusion

In this chapter, we discussed the performance testing and stress testing processes in detail. We also discussed how these processes relate to the scalability of the system. For the performance testing process, we defined a seven-step process. The key to the process is to be methodical and scientific about the testing. For the stress testing process, we defined an eight-step process. These were the basic steps we felt necessary for a successful process; we suggested adding other steps as necessary to fit your organization.
We concluded this chapter with a discussion of how performance testing and stress testing fit with scalability. Based on the relationship between these testing processes and three factors already established as causal to scalability (headroom, change control, and risk management), we concluded that these processes are also directly responsible for scalability.

Key Points

• Performance testing covers a broad range of engineering evaluations where the emphasis is on the final measurable performance characteristic.
• The goal of performance testing is to identify, document, and where possible eliminate bottlenecks in the system.
• Load testing is a process used in performance testing.
• Load testing is the process of putting load or user demand on a system in order to measure its response and stability.
• The purpose of load testing is to verify that the application can meet a desired performance objective, often specified as a service level agreement (SLA).
• Load and performance testing are not substitutes for proper architecture.
• The seven steps of performance testing are as follows:
   1. Establish the criteria expected from the application.
   2. Establish the proper testing environment.
   3. Define the right test to perform.
   4. Execute the tests.
   5. Analyze the data.
   6. Report to the engineers.
   7. Repeat as necessary.
• Stress testing is a process that is used to determine an application's stability when subjected to above-normal loads.
• Stress testing, as opposed to load testing, goes well beyond normal traffic, often to the breaking point of the application, in order to observe its behavior.
• The eight steps of stress testing are as follows:
   1. Identify the objectives of the test.
   2. Choose the key services for testing.
   3. Determine how much load is required.
   4. Establish the proper test environment.
   5. Identify what must be monitored.
   6. Actually create the test load.
   7. Execute the tests.
   8. Analyze the data.
• Performance testing and stress testing impact scalability through the areas of headroom, change control, and risk management.

Chapter 18 Barrier Conditions and Rollback

"He will conquer who has learned the artifice of deviation. Such is the art of maneuvering." (Sun Tzu)

Whether you develop with an agile methodology, a classic waterfall methodology, or some hybrid, good processes for promoting systems into your production environment can protect you from significant failures, whereas poor processes may doom you to near-certain technical death. Checkpoints and barrier conditions within your product development life cycle can increase quality and reduce the cost of developing your product by detecting early when you are off course. But processes alone are not always enough. Even the best teams, with the best processes and great technology, make mistakes and incorrectly analyze the results of certain tests or reviews. If your platform implements a service, whether a Software as a Service play or a traditional back-office IT system, you need to be able to quickly roll back significant releases to keep scale-related events from creating availability incidents.

Developing effective go/no-go processes or barrier conditions, ideally within a fault-isolative infrastructure, and coupling them with a process and capability to roll back production changes, are necessary components of any highly available service and are critical to the success of your scalability goals. The companies focused most intensely on cost-effectively scaling their systems while guaranteeing high availability create several checkpoints in their development processes. These checkpoints are an attempt to guarantee the lowest probability of a scalability-related event and to minimize the impact of that event should it occur.
They also make sure that they can quickly get out of any event created through recent changes by ensuring that they can always roll back from any major change.

Barrier Conditions

You might read this heading and immediately assume that we are proposing that waterfall development cycles are the key to success within highly scalable environments. Very often, barrier conditions or entry and exit criteria are associated with the phases of waterfall development and are sometimes identified as a reason for the inflexibility of the waterfall development model. Our intent here is not to promote the waterfall methodology, but rather to discuss the need for standards and protective measures regardless of your approach to development. For the purposes of this discussion, assume that a barrier condition is a standard against which you measure success or failure within your development life cycle. Ideally, you want these conditions or checkpoints established within your cycle to help you decide whether you are indeed on the right path for the product or enhancements that you are developing. Remember our discussion of goals in Chapters 4, Leadership 101, and 5, Management 101, and the need to establish and measure these goals. Barrier conditions are static goals, checked at regular "heartbeats" within development, to ensure that what you are developing aligns with your vision and need. Barrier conditions for scalability might include desk checking a design against your architectural principles within an Architecture Review Board before the design is implemented, code reviewing the implementation to ensure it is consistent with the design, or performance testing an implementation within QA and then measuring the impact to scalability upon release to the production environment.
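One way to picture barrier conditions is as an ordered series of go/no-go checks that a change must pass before it moves forward. The sketch below is hypothetical: the check names mirror the examples just given, but the `Change` fields and pass rules are illustrative assumptions rather than a prescribed implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Change:
    """Hypothetical summary of a change moving through the life cycle."""
    name: str
    arb_approved: bool
    code_review_passed: bool
    perf_test_passed: bool


# Each barrier is (label, predicate). Order matters: earlier, cheaper checks first.
Barrier = Tuple[str, Callable[[Change], bool]]

BARRIERS: List[Barrier] = [
    ("Architecture Review Board", lambda c: c.arb_approved),
    ("Code review", lambda c: c.code_review_passed),
    ("Performance test in QA", lambda c: c.perf_test_passed),
]


def first_failed_barrier(change: Change, barriers: List[Barrier] = BARRIERS) -> Optional[str]:
    """Return the label of the first barrier the change fails, or None if all pass."""
    for label, passes in barriers:
        if not passes(change):
            return label
    return None


if __name__ == "__main__":
    change = Change("example-feature", arb_approved=True,
                    code_review_passed=True, perf_test_passed=False)
    failed = first_failed_barrier(change)
    print("go" if failed is None else f"no-go: stopped at {failed}")
```

Production measurement is left out of this list because it happens after release rather than before it; stopping at the first failure reflects the idea that a problem caught at an earlier barrier is cheaper to fix.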
Example Scalability Barrier Conditions

We often recommend that the following barrier conditions be inserted into your development methodology or life cycle. Each is intended to limit the probability of occurrence, and the resulting impact, of any scalability issues within your production environment:

1. Architecture Review Board. As described in Chapter 14, Architecture Review Board, the ARB exists to ensure that designs are consistent with architectural principles. Architectural principles, in turn, ideally address one or more key scalability tenets within your platform. The intent of this barrier is to ensure that time isn't wasted implementing or developing systems that are difficult or impossible to scale to your needs.
2. Code Reviews. Modifying what is hopefully an existing and robust code review process to include ensuring that architectural principles are followed within the implementation of the system in question is critical to ensuring that scalability problems in code are fixed before they are identified in QA and must be fixed later.
3. Performance Testing. As described in Chapter 17, Performance and Stress Testing, performance testing helps you identify potential issues of scale before introducing the system into a production environment and potentially impacting your customers with a scalability-related issue.
4. Production Monitoring and Measurement. Ideally, your system has been designed to be monitored, as discussed in Chapter 12, Exploring Architectural Principles. Even if it has not, capturing key performance data from the user, application, and system perspectives after release and comparing it to previous releases can help you identify potential scalability-related issues early, before they impact your customers.
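The fourth barrier condition above, comparing key performance data after a release with previous releases, can be sketched as a percentage-change check per metric. The metric names, their values, and the 10% tolerance are hypothetical; each team would choose its own critical measurements and thresholds.

```python
from typing import Dict, List


def regressions(previous: Dict[str, float], current: Dict[str, float],
                max_increase_pct: float = 10.0) -> List[str]:
    """List metrics whose value rose by more than `max_increase_pct` percent.

    Assumes every metric is "lower is better" (latency, CPU, queries per request).
    Metrics missing from the current release, or with a non-positive baseline,
    are skipped rather than guessed.
    """
    flagged = []
    for name, before in previous.items():
        if name not in current or before <= 0:
            continue
        change_pct = (current[name] - before) / before * 100
        if change_pct > max_increase_pct:
            flagged.append(f"{name}: +{change_pct:.1f}%")
    return flagged


if __name__ == "__main__":
    last_release = {"p95_latency_ms": 180.0, "cpu_pct": 42.0, "db_queries_per_req": 6.0}
    this_release = {"p95_latency_ms": 185.0, "cpu_pct": 51.0, "db_queries_per_req": 6.0}
    for line in regressions(last_release, this_release):
        print("investigate", line)
```

A flagged metric is a prompt to investigate, not an automatic verdict; an expected increase (for example, from a deliberately added feature) may be accepted after review.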
Your processes may include additional barrier conditions that you've found useful over time, but we consider these to be the bare minimum to help manage the risk of releasing systems that negatively impact customers due to scalability-related problems.

Barrier Conditions and Agile Development

In our practice, we have found that many of our clients have a mistaken perception that including or defining standards, constraints, or processes in agile processes is a violation of the agile mindset. The very notion that process runs counter to agile methodologies is flawed from the outset, as any agile method is itself a process. Most often, we find the Agile Manifesto quoted out of context as a reason for eschewing any process or standard.1 As a review, the Agile Manifesto states that agile methodologies value

• Individuals and interactions over processes and tools
• Working software over comprehensive documentation
• Customer collaboration over contract negotiation
• Responding to change over following a plan

Organizations often take "Individuals and interactions over processes and tools" out of context without reading the line that follows these bullets, which states, "That is, while there is value in the items on the right, we value the items on the left more."2 This line makes clear that processes add value, but that people and interactions should take precedence over them where we need to make choices. We absolutely agree with this approach and prefer to inject process into agile development, most often as barrier conditions to test for an appropriate level of quality, scalability, and availability, or to help ensure that engineers are properly evaluated and taught over time. Let's examine how some key barrier conditions enhance our agile method.

1. This information is from the Agile Manifesto at www.agilemanifesto.org.
2. Ibid.
We'll start with valuing working software over comprehensive documentation. None of the suggestions we've made, from ARB and code reviews to performance testing and production measurement, violates this rule. The barrier conditions represented by ARB and Joint Architecture Design (JAD) are used within agile methods to ensure that the product under development can scale appropriately. ARB and JAD can be performed orally in a group with limited documentation and are therefore consistent with the agile method. The inclusion of barrier conditions and standards to help ensure that systems and products work properly in production actually supports the development of working software. We have not defined comprehensive documentation as necessary in any of our proposed activities, although it is likely that the results of these activities will be logged somewhere. Remember, we are interested in improving our processes over time, so logging performance results, for instance, will help us determine how often we are making mistakes in our development process that result in failed performance tests in QA or scalability issues in production.

The processes we've suggested also do not in any way hinder customer collaboration or favor contract negotiation over customer collaboration. In fact, one might argue that they foster a better working relationship with the end customer, in that by inserting scalability barrier conditions you are actually looking out for your customer's needs. Your customer is not likely capable of performing the type of design evaluation, reviews, testing, or measuring that is necessary to determine whether your product will scale to its needs. Your customer does, however, expect that you are delivering a product or service that will meet not only its business objectives but its scalability needs as well.
Collaborating to develop tests and measurements that will help ensure that your product meets customer needs, and inserting those tests and measurements into your development process, is a great way to take care of your customers and create shareholder value.

Finally, the inclusion of the barrier conditions we've suggested helps us respond to change by helping us identify when that change is occurring. The failure of a barrier condition is an early alert to issues that we need to address immediately. Identifying in an ARB session that a component is incapable of being scaled horizontally (scale out, not up, from our recommended architectural principles) is a good indication of potential issues for our customer. Although we may make the executive decision to launch the feature, product, or service, we had better ensure that future agile cycles are used to fix the issue we've identified. However, if the need for scale is so dramatic that a failure to scale out will keep us from being successful, should we not respond immediately to that issue and fix it? Without such a process and series of checks, how would we ensure that we are meeting our customer's needs?

Hopefully, we've convinced you that adding criteria against which you can evaluate the success of your scalability objectives is a good idea within your agile implementation. If we haven't, please remember our "board of directors" test from Chapter 5, Management 101. Would you feel comfortable stating that you absolutely would not develop processes within your development life cycle to ensure that your products and services could scale? Imagine yourself saying, "In no way, shape, or form will we ever implement barrier conditions or criteria to ensure that we don't release products with scalability problems!" How long do you think you would have a job?
Cowboy Coding

Development without any process, without any plans, and without measurements to ensure that the results meet the needs of the business is what we often refer to as cowboy coding. The complete lack of process in cowboy-like environments is a significant barrier to success for any scalability initiative. Often, we find that teams attempt to claim that cowboy implementations are "agile." This simply isn't true. The agile methodology is a defined life cycle that is tailored to be adaptive to your needs over time, versus other models that tend to be more predictive. The absence of processes, as in any cowboy implementation, is neither adaptive nor predictive. Agile methodologies are not arguments against measurement or management. They are methodologies tuned to release small components or subsets of functionality quickly. They were developed to help control chaos by managing small, easily managed components rather than repeatedly failing at attempts to predict and control very large, complex projects.

Do not allow yourself or your team to fall prey to the misconception that agile methodologies should not be measured or managed. Using a metric such as velocity to improve the estimation ability of engineers, but not to beat them up over, is a fundamental part of the agile methodology. A lack of measurement dooms you to never improving, and a lack of management dooms you to getting lost en route to your goals and vision. Being a cowboy when it comes to designing highly scalable solutions is a sure way to get thrown off the bucking scalability bronco!

Barrier Conditions and Waterfall Development

The inclusion of barrier conditions within waterfall models is not a new concept. Most waterfall implementations include a concept of entry criteria and exit criteria for each phase of development. For instance, in a strict waterfall model, design may not start until the requirements phase is completed.
The exit criteria for the requirements phase in turn may include a signoff by key stakeholders, a review of requirements by the internal customer (or an external representative), and a review by the organizations responsible for producing those requirements. In modified, overlapping, or hybrid waterfall models, requirements may need to be complete for the systems to be developed first but may not be complete for the entire product or system. If prototyping is employed, those requirements may need to be mocked up in a prototype before major design starts.

For our purposes, we need only inject the four processes we identified earlier into the existing barrier conditions. The Architecture Review Board lines up nicely as an exit criterion for the design phase of our project. Code reviews, including a review for consistency with our architectural principles, might create exit criteria for our coding or implementation phase. Performance testing should be performed during the validation or testing phase, with the requirement that no more than a specific percentage change be present for any critical system resource. Defining and implementing production measurements should be an entry criterion for the maintenance phase, and any significant, unexpected increase in a measured area should trigger work either to reduce the impact of the implementation or to change the architecture to allow for more cost-effective scalability.

Barrier Conditions and Hybrid Models

Many companies have developed models that merge agile and waterfall methodologies, and some continue to follow the predecessor to agile methods known as rapid application development (RAD). For instance, some companies may be required to develop software consistent with contracts and predefined requirements, such as those that interact with governmental organizations.
These companies may wish to have some of the predictability of dates associated with a waterfall model, but desire to implement chunks of functionality quickly as in agile approaches. The question for these models is where to place the barrier conditions for the greatest benefit. To answer that question, we need to return to the objectives of the barrier conditions. Our intent with any barrier condition is to catch problems or issues early in development so that we reduce the amount of rework needed to meet our objectives. It costs us less in time and work, for instance, to catch a problem in our QA organization than in our production environment. Similarly, it costs us less to catch an issue in ARB than to allow it to be implemented and then caught in a code review. The answer to the question of where to place the barrier conditions, then, is to place them where they add the most value and incur the least cost to our processes. Code reviews should be placed at the completion of each coding cycle or at the completion of chunks of functionality. The architectural review should occur prior to the beginning of implementation, production metrics obviously need to be gathered within the production environment, and performance testing should happen prior to the release of a system into the production environment.

Rollback Capabilities

You might argue that an effective set of barrier conditions in your development process should obviate the need for being able to roll back major changes within your production environment. We can't really argue with that thought or approach, as technically it is correct. However, arguing against the capability to roll back is really an argument against having an insurance policy. You may believe, for instance, that you don't have a need for health insurance because you are a healthy individual and fairly wealthy.
Or you may argue against automobile insurance because you are, in the words of Dustin Hoffman in Rain Man, "an excellent driver." But what happens when you contract a treatable cancer and don't have the funds for the treatment, or someone runs into your vehicle and doesn't have liability insurance? If you are like most people, your view of whether you need (or needed) this insurance changes immediately when it would become useful. The same holds true when you find yourself in a situation where fixing forward is going to take quite a bit of time and have quite an adverse impact on your clients.

Rollback Window Requirements

Rollback requirements differ significantly by business. The question to ask yourself in establishing your specific rollback needs, at least from the perspective of scalability, is by when you will have enough information regarding performance to determine whether you need to undo your recent changes. For many companies, the bare minimum is for the rollback window to span the week's peak-utilization business day, so that you can have great confidence in the results of your analysis. This bare minimum may be enough for modifications to existing functionality, but when new functionality is added, it may not be enough. New functions or features often have adoption curves that take more than one day to get enough traffic through the feature to determine its resulting impact on system performance. The amount of data gathered over time within any new feature may also have an adverse performance impact and, as a result, negatively impact your scalability.

Let's return to Johnny Fixer and the HRM application at AllScale. Johnny's team has been busy implementing a "degrees of separation" feature in the resume-tracking portion of the system.
The idea is that the system will identify people within the company who either know a potential candidate personally or might know people who know the candidate, with the intent of enabling background checking through individuals' relationships. The feature takes as inputs all companies at which current employees have worked and the list of companies for any given candidate. Johnny's team initially figures that a linear search should be appropriate, as the list of potential companies and resulting overlaps are likely to be small. The new feature is released and starts to compute relationship maps over the course of the next few weeks. Initially, all goes well, and Johnny's team is happy with the results and the runtime of the application. However, as the list of candidates grows, so does the list of companies for which the candidates have worked. Additionally, given the growth of AllScale, the number of employees has grown, as have their first- and second-order relationship trees. Soon, many of the processes relying upon [...]

[...] unsupportive. For example, the tradeoff of Reduce the Quality Testing for the feature has a -9 score for Follow Established Processes because it clearly does not follow established processes of testing. After the matrix is filled out, Mike can perform the calculations. The formula is to multiply each score in the body of the matrix by the weight of each factor and then sum these products for each tradeoff [...]

[...] and cons, and then analyzing each one, is the second method of performing a tradeoff analysis. The third method of tradeoff analysis is a more formal process. In this process, you take the tradeoffs identified and add to them factors that are important in accomplishing the project. What you will have at the end of the analysis is a score that you can use to judge each tradeoff based on the most important [...]
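The decision-matrix calculation described in the fragment above (multiply each score by its factor's weight, then sum the products for each tradeoff) is small enough to sketch. Only the "Reduce the Quality Testing" tradeoff, the "Follow Established Processes" factor, and its -9 score come from the text; every other factor, weight, and score is a hypothetical placeholder.

```python
from typing import Dict

# Hypothetical factor weights (how much each factor matters to the project).
WEIGHTS: Dict[str, int] = {
    "Follow Established Processes": 3,  # weight is an assumption
    "Meet Launch Date": 5,              # hypothetical factor
    "Keep Hardware Cost Low": 2,        # hypothetical factor
}

# Scores per tradeoff and factor: negative = unsupportive, positive = supportive.
SCORES: Dict[str, Dict[str, int]] = {
    "Reduce the Quality Testing": {
        "Follow Established Processes": -9,  # the score given in the text
        "Meet Launch Date": 9,               # hypothetical
        "Keep Hardware Cost Low": 1,         # hypothetical
    },
    "Cut Feature Scope": {                   # hypothetical tradeoff
        "Follow Established Processes": 1,
        "Meet Launch Date": 3,
        "Keep Hardware Cost Low": 3,
    },
}


def weighted_total(scores: Dict[str, int], weights: Dict[str, int]) -> int:
    """Multiply each score by its factor's weight and sum the products."""
    return sum(score * weights[factor] for factor, score in scores.items())


if __name__ == "__main__":
    totals = {t: weighted_total(s, WEIGHTS) for t, s in SCORES.items()}
    # Highest total first: the tradeoff that best supports the weighted factors.
    for tradeoff, total in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{total:4d}  {tradeoff}")
```

The weights carry most of the judgment: changing how heavily the launch date counts relative to following established processes can reorder the tradeoffs, which is why agreeing on the weights up front matters.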
[...] in the proverbial bucket of costs associated with the feature, and the answer is that if you spend more time on the feature, you are much more likely to figure out ways to shrink the cost of new hardware, additional bandwidth, and all the other miscellaneous charges. Thus, there is automatically a tradeoff between the amount of time spent on something and the ultimate cost associated with it. For the [...]

[...] tradeoffs made during feature development.
• These tradeoffs made on individual features can affect the overall scalability of the entire system.
• Technologists and managers must understand and be able to make the right decisions in the classic tradeoff between speed, quality, and cost.
• There are at least three methods of performing a tradeoff analysis. These are gut feel, pro/con comparison, and decision [...]

[...] specialty vans for handicapped people from Ford vehicles. One hundred percent of your business is built around the Ford Econoline Van, and you can't easily retool your factory for another van given the degree of specialization, your tools, the types of parts necessary to perform your conversions, and your deep relationship with Ford. What do you do if Ford goes out of business? What happens if Ford stays [...]

[...] and kitchen appliances. For these components, architects often attempt to describe them in terms of "fit and finish," giving dimensions and design characteristics. These architects understand that if they architect something appropriately, they open up several opportunities for negotiations among competing providers of the aforementioned materials. These negotiations in turn help drive down the cost of [...]
[...] Architecture Design process and then reviewed at an Architecture Review Board, are destined to be of lower quality or higher cost, or possibly both. For the definition of scope, we will consider the number of product features being developed as well as the level of effort required for the development of each product feature. Often, the scope of a feature can be changed dramatically depending on the requirements [...]

[...] conduct a formal analysis of each hire that is a couple of percentage points over the budgeted salary; it is more likely that you are like other managers who have become used to conducting quick tradeoff analyses in their heads or relying on their "guts" to help them make the best decisions given the information that they have at the time. The second and more formal method of tradeoff analysis is the comparison [...]

[...] After the tradeoffs that are being considered have been identified and the pros and cons of each listed, Mike is ready to move to the next step. This step is to analyze the pros and cons to determine which ones outweigh the others for each tradeoff. Mike can do this by simply examining them or by allocating a score to them in terms of how bad or good they are. For instance, [...]

[...] scope, and cost. We prefer to use the traditional speed/cost/quality project triangle and define scope as the size of the triangle.

[Figure 19.1 Project Triangle: legs labeled Speed, Cost, and Quality; area labeled Scope]

This is represented in Figure 19.1, where the legs are speed, cost, and quality, whereas the area of the triangle is the scope of the project.
If the triangle is small, the scope of the project is small and thus the cost, time, and [...]
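Returning to Johnny Fixer's "degrees of separation" feature from the rollback discussion: in the worst case, his team's linear search does work proportional to the product of the two company lists. That is why it was fine at launch and degraded as AllScale and its candidate pool grew. The sketch below contrasts that approach with a hash-set lookup. The function names and data are hypothetical, and the hash set is our illustrative alternative; the excerpt is cut off before it says how Johnny's team actually fixed the problem.

```python
from typing import List, Set, Tuple


def overlap_linear(candidate: List[str], employees: List[str]) -> Tuple[Set[str], int]:
    """Linear search: scan the employee company list for each candidate company.

    Returns the shared companies and the number of comparisons made, which in
    the worst case grows as len(candidate) * len(employees).
    """
    shared: Set[str] = set()
    comparisons = 0
    for company in candidate:
        for other in employees:
            comparisons += 1
            if company == other:
                shared.add(company)
                break
    return shared, comparisons


def overlap_hashed(candidate: List[str], employees: List[str]) -> Set[str]:
    """Build a set once, then do average constant-time lookups per candidate company."""
    employee_companies = set(employees)
    return {company for company in candidate if company in employee_companies}


if __name__ == "__main__":
    employees = [f"company-{i}" for i in range(10_000)]
    candidate = ["company-9999", "company-5000", "startup-x"]
    shared, comparisons = overlap_linear(candidate, employees)
    assert shared == overlap_hashed(candidate, employees)
    print(sorted(shared), comparisons)
```

Both functions return the same overlap; the difference is that the linear version's comparison count climbs with every new employee and candidate company, while the hashed version touches each list roughly once.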