• Imagine finding an error, fixing it, then repeating the test that exposed the problem in the first place. This is a regression test. Added variations on the initial test, to make sure that the fix works, are also considered part of the regression test series. Under this usage, regression testing is done to make sure that a fix does what it's supposed to do. Some programming groups create a set of regression tests that includes every fixed bug ever reported by any customer. Every time the program is changed in any way, all old fixes are retested. This reflects the vulnerability of code fixes (which, unless they're well documented, often don't look "right" when you read the code) to later changes, especially by new programmers.

• Imagine making the same fix, and testing it, but then executing a standard series of tests to make sure that the change didn't disturb anything else. This too is called regression testing, but it tests the overall integrity of the program, not the success of software fixes. Stub and driver programs developed during incremental testing can be the basis of an automated regression test battery. Or you can create an automated regression suite of black box tests using a capture/replay program (discussed in Chapter 11, "Automated acceptance and regression tests").

Both types of tests should be executed whenever errors are fixed. Someone talking about regression testing after bug fixing often means both.

BLACK BOX TESTING

When coding is finished, the program goes to the Testing Group for further testing. You will find and report errors and get a new version for testing. It will have old errors that you didn't find before and it will have new errors. Martin & McClure (1983) summarize data collected by Boehm on the probability of bug fixes working:

• The probability of changing the program correctly on the first try is only 50% if the change involves ten or fewer source statements.
• The probability of changing the program correctly on the first try is only 20% if the change involves around 50 statements.

Not only can fixes fail; they can also have side effects. A change that corrects one error may produce another. Further, one bug can hide (or mask) another. The second doesn't show up until you get past the first one. Programmers often catch their initial failures to fix a problem. They miss side effects and masked bugs because they often skip regression testing.

Because you will not catch all the errors in your first wave(s) of tests, and because the bug fixes will cause new bugs, you should expect to test the program many times. While early in testing you might accept revised versions every few hours or days, it's common to test one version thoroughly before accepting the next for testing. A cycle of testing includes a thorough test of one version of the program, a summary report describing the problems found in that version, and a summary of all known problems. Project managers often try to schedule two cycles of testing: one to find all the bugs, the second to verify the fixes. Eight cycles is more likely. If you do less thorough testing per version, expect 20 or 30 (or more) cycles.

THE USUAL BLACK BOX SEQUENCE OF EVENTS

This section describes a sequence of events that is "usual" in the microcomputer community, once black box testing starts. The mainframe culture is different. Friends who work in banks tell us that they start designing and writing tests well before they start testing.
They tell us this earlier start is typical of mainframe testing even when the test effort is otherwise mediocre.

Test planning

The testing effort starts when you begin test planning and test case design. Depending on the thoroughness of the specifications and your schedule, you can start planning as soon as the requirements document is circulated. More likely, you will begin detailed planning and designing tests in the first cycle of testing. Chapter 7 discusses the design of individual tests and Chapter 12 discusses the overall test plan.

Acceptance testing

Each time you receive a new version of the program, check whether it's stable enough to be tested. If it crashes at the slightest provocation, don't waste your time on it. This first bit of testing is called acceptance or qualification testing. Try to standardize the acceptance test. Distribute copies of it to the programmers so they can run the test before submitting the program to you, avoiding embarrassing rejections. The acceptance test should be short. It should test mainstream functions with mainstream data. You should be able to easily defend the claim that a version of the program that fails this test is in miserable shape. Many companies partially automate their acceptance tests using black box automation software. Several packages are commercially available. (A simple home-grown sketch appears at the end of this section.)

Initial stability assessment

How reliable is the program? Will it take 4 cycles of testing or 24? You might be asked to assess stability for scheduling, to estimate the cost of sending it to an outside testing agency, or to estimate the publishability or supportability of a program your company is considering acquiring and distributing. You are not trying to find bugs per se at this point. You are trying to decide which areas of the program you trust least. If the program looks weak in an area that's hard to test, expect testing to take a long time. Checking the existing manual against the program is a good start. This covers the full range of the program's functions with easy examples. Try a few other tests that you might expect the program to fail. At the end of this initial evaluation, you should have a feel for how hard the program will be to test and how bug-ridden it is. We can't tell you how to translate this feeling into a numerical estimate of required person-hours, but a qualitative gauge is much better than nothing.

You should rarely spend more than a week on an initial stability estimate. If you can't test the manual in a week, use part of it. Make sure to include a review of each section of the manual. If the program is not trivial, and if it is not a new version of an old program that you've tested many times before, don't expect to be able to say much about the program in less than a week.

Function test, system test, verification, and validation

You verify a program by checking it against the most closely related design document(s) or specification(s). If there is an external specification, the function test verifies the program against it. You validate a program by checking it against the published user or system requirements. System testing and integrity testing (see below) are validation tests. Independent Verification and Validation (IV&V) is a popular buzzphrase referring to verification and validation testing done by an independent test agency.

The testing phase includes both function and system testing. If you have an external specification, testing the program against it is only part of your task. We discuss the questions you will raise during testing in the next major section of this chapter, "Some tests run during function and system testing." For a more complete discussion of verification and validation, see Andriole (1986) or the IEEE Standard for Software Verification and Validation Plans (ANSI/IEEE Standard 1012-1986).
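Returning to the acceptance (qualification) test described earlier in this section: below is a minimal home-grown sketch in Python. The executable name, command-line options, and expected output strings are all hypothetical stand-ins; the point is simply to run a handful of mainstream commands and reject a build that crashes or produces obviously wrong output. Programmers can run the same script before submitting a build.

```python
import subprocess

# Hypothetical mainstream checks: (arguments, text that must appear in the output)
SMOKE_CHECKS = [
    (["--version"], "MyProgram"),
    (["--open", "samples/mainstream.dat", "--summary"], "Records: 100"),
    (["--open", "samples/mainstream.dat", "--report", "totals"], "Grand total"),
]

def accept(build: str = "./myprogram") -> bool:
    """Return True if the build is stable enough to hand to the Testing Group."""
    for args, expected in SMOKE_CHECKS:
        result = subprocess.run([build, *args], capture_output=True,
                                text=True, timeout=120)
        if result.returncode != 0 or expected not in result.stdout:
            print(f"REJECT: {' '.join(args)} (exit {result.returncode})")
            return False
    print("ACCEPT: build passed the qualification checks")
    return True

if __name__ == "__main__":
    raise SystemExit(0 if accept() else 1)
```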
Beta testing

When the program and documentation seem stable, it's time to get user feedback. In a beta test, people who represent your market use the product in the same way(s) that they would if they bought the finished version, and give you their comments. Prudent beta testers will not rely on your product because you will warn them that this unfinished version may still have horrible bugs. Since they're not working full time with your product, they will not test it as thoroughly or as quickly as you would like. Expect a beta tester to take three weeks to work with the product for 20 hours. The 20 hours of work from a beta tester are not free. You or another tester will probably spend 4 to 8 hours recruiting, managing, nagging, and supporting each outside tester, plus additional time writing the beta test instructions and questionnaire.

Some people will use the beta test version of the product much more thoroughly. They will use it more extensively if:

• This is the only product of its type; they need it even if it is unreliable.
• You pay them enough. Typical payment is a free or deeply discounted copy of the product. This is enough if the purchase price is high for that tester. If you're testing a $500 database manager, many users would not consider a free copy of the program to be enough. If they use the program to keep important records and it crashes (as it probably will), it will cost them a lot more to re-enter the data.
• You give them a service guarantee. For example, you might promise that if the program crashes, you (someone in your company) will re-enter their data for free.

In Chapter 13, the section "Beta: Outside beta tests" discusses beta testing in much more detail.

Integrity and release testing

Even after you decide that the product is finished, problems are still possible. For example, many companies have sent out blank or virus-infected disks for duplication. In the release test, you gather all the things that will go to the customer or to a manufacturer, check that these are all the right things, copy them, and archive the copies. Then you release them. A release test of a set of disks might be as simple as a binary comparison between all files on these disks and those on the version you declared "good" during the final round of testing. Even if you make the release disks from the tested disks, do the file comparisons. It's cheap compared with the cost of shipping thousands of copies of the wrong disk.

We strongly recommend that you test the product for viruses as part of the release test. If you send out software in compressed format, test the compressed disks, but also install the program, run the program, reboot, and check whether your computer got a virus from the decompressed program. It's not yet clear whether your customers can sue your company, or for how much, if your software carries a virus, but it's not unlikely that your company would be dragged into court (see Chapter 14).
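The binary comparison described above is easy to script. The sketch below (Python; the directory names are hypothetical) hashes every file in the release image and compares it against the copy that was declared "good" during the final round of testing:

```python
import hashlib
from pathlib import Path

GOOD_DIR = Path("final_tested_build")   # the version declared "good"
RELEASE_DIR = Path("release_master")    # what is about to go to duplication

def checksums(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*")) if p.is_file()
    }

good, release = checksums(GOOD_DIR), checksums(RELEASE_DIR)

missing = sorted(good.keys() - release.keys())
extra = sorted(release.keys() - good.keys())
changed = sorted(name for name in good.keys() & release.keys()
                 if good[name] != release[name])

if missing or extra or changed:
    print("RELEASE TEST FAILED")
    print("missing:", missing, "\nextra:", extra, "\nchanged:", changed)
else:
    print("Release image is binary-identical to the tested version.")
```

A check like this takes minutes; shipping thousands of copies of the wrong disk costs far more.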
Integrity testing is a more thorough release test. It provides a last chance to rethink things before the product goes out the door. The integrity tester tries to anticipate every major criticism that will appear in product reviews or, for contract work, every major complaint the customer will raise over the next few months. The integrity tester should be a senior tester who wasn't involved in the development or testing of this product. He may work for an independent test agency. The integrity tester assumes that function and system testing were thorough. He does not deliberately set out to find errors. He may carefully compare the program, the user documentation, and the early requirements documents. He may also make comparisons with competing products. An integrity test should also include all marketing support materials. The product must live up to all claims made in the advertisements. Test the ad copy and sales materials before they are published. The test is best conducted by one person, not by a team. Budget two weeks for an integrity test of a moderately complex single-user program.

Final acceptance testing and certification

If your company developed the program on contract, the customer will run an acceptance test when you deliver it. In small projects, this test may be informal. For most projects, however, test details are agreed to in advance, in writing. Make sure the program passes the test before trying to deliver it to the customer. An acceptance test usually lasts less than a day. It is not a thorough system test. Beizer (1984) describes the preparation and execution of formal customer acceptance tests. Perry (1986) is, in effect, a customer's guide to creating acceptance tests. Consider using Perry (1986) to structure your negotiations with the customer when you jointly design the acceptance test.

Certification is done by a third party. The certifier might be an agent of the user or an independent test agency. A certification test can be brief, at the level of an acceptance test, or more thorough. Development contracts may require certification in place of acceptance testing. The contract should spell out the level of testing or inspection involved and any standards that must be met by the program, the development process, or the testing process. If your company is seeking some form of certification voluntarily, probably for marketing purposes, the amount of testing involved is negotiable.

SOME TESTS RUN DURING FUNCTION AND SYSTEM TESTING

Having defined function and system testing above, here are examples of tests that are run during the function or system testing phases.

Specification verification

Compare the program's behavior against every word in the external specification.

Correctness

Are the program's computations and its reports of them correct?

Usability

You can hire people who are like those who will use the product, and study how they work with it. A beta test is an attempt to run a usability test cheaply. However, since you don't see the problems as they arise, and you can't set the people's tasks, you won't learn as much from beta testing as you could from studying representative users in your laboratory.

Boundary conditions

Check the program's response to all extreme input values. Feed it data that force it to output extreme values.

Performance

This is black box performance testing. Identify tasks and measure how long it takes to do each. Get a good stopwatch.

State transitions

Does the program switch correctly from state to state? For example, if you can tell it to sort data, print them, then display a data entry screen, will it do these things in the correct order? Can you make it do them out of sequence? Can you make the program lose track of its current state? Finally, what does the program do with input while it's switching between states? If you start typing just as it stops printing and prepares to display the data entry screen, does the program crash?
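Questions like these can be turned into a scripted walk through the program's states. The sketch below is only a model (Python; the state names, the transition table, and the drive() function are hypothetical stand-ins for whatever mechanism actually drives the program under test). It checks that a mainstream command sequence lands in the expected states and that out-of-sequence commands are rejected rather than silently accepted:

```python
# Hypothetical model of the program's states and the commands that connect them.
LEGAL_TRANSITIONS = {
    ("main_menu", "sort"): "sorted_view",
    ("sorted_view", "print"): "printing",
    ("printing", "done"): "data_entry",
}

def drive(program_state: str, command: str) -> str:
    """Stand-in for the real driver that sends a command to the program
    and reports the state it ends up in."""
    return LEGAL_TRANSITIONS.get((program_state, command), "rejected")

def test_mainstream_sequence() -> None:
    state = "main_menu"
    for command, expected in [("sort", "sorted_view"),
                              ("print", "printing"),
                              ("done", "data_entry")]:
        state = drive(state, command)
        assert state == expected, f"{command!r} left the program in {state!r}"

def test_out_of_sequence_commands() -> None:
    # Printing or finishing before sorting should be rejected, not quietly accepted.
    assert drive("main_menu", "print") == "rejected"
    assert drive("main_menu", "done") == "rejected"

if __name__ == "__main__":
    test_mainstream_sequence()
    test_out_of_sequence_commands()
    print("State-transition checks passed.")
```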
Mainstream usage tests

Use the program the way you expect customers to use it. Do some real work with it. It's surprising how many errors show up in this type of test that didn't come up, or didn't seem important, when you did the more formal (e.g., boundary) tests.

Load: volume, stress, and storage tests

Load tests study the behavior of the program when it is working at its limits:

• Volume tests study the largest tasks the program can deal with. You might feed huge programs to a compiler and huge text files to a word processing program. Or you might feed an interactive program input quickly but steadily, to try to overflow the amount of data it can receive and hold in temporary storage. (Interactive programs often minimize their response times to keystrokes and mouse strokes by putting input in temporary storage until a break between bursts of input. Then they process and interpret the input until the next input event.) You should also feed programs with no executable code to the compiler and empty files to the word processor. (For some reason, these are not called volume tests.)
• Stress tests study the program's response to peak bursts of activity. For example, you might check a word processor's response when a person types 120 words per minute. If the amount of activity that the program should be able to handle has been specified, the stress test attempts to prove that the program fails at or below that level.
• Storage tests study how memory and space are used by the program, either in resident memory or on disk. If there are limits on these amounts, storage tests attempt to prove that the program will exceed them.

Background

In a multi-processing system, how well does the product do many tasks? The objective is to prove that the program fails when it tries to handle more than one task. For example, if it is a multi-user database, have many people use it at the same time, or write a program to simulate the inputs from many people. This is the background activity. Now start testing. What happens when two users try to work with the same data? What if you both try to write to the printer or disk simultaneously? See Beizer (1984) for further discussion.

Error recovery

Make as many different types of errors as you can. Try to get the program to issue every error message listed in the documentation's Error Messages appendix. (Also generate any messages that aren't listed in the documentation.) Error handling code is among the least tested, so these should be among your most fruitful tests.

Security

How easy would it be for an unauthorized user to gain access to this program? What could she do to your data if she did? See Beizer (1984) for thoughts on security testing and Fernandez et al. (1981) for a much broader discussion of security issues.

Compatibility and conversion

Compatibility testing checks that one product works with another. Two products might be called compatible if they can share the same data files or if they can simultaneously reside in the same computer's memory. Since there are many types of "compatibility," you must know which one is claimed before you can test for it. If two products are not directly compatible, your program might still be able to read another program's data files by using a two-step process. First, run a conversion program that rewrites the files in your program's format. Then your program reads those new files. The most common conversion problem is between two versions of the same program. An updated program must detect that the data are in the old version's format and either read and rewrite them or call a conversion utility to do this. Your program might also be able to rewrite files from its format into one compatible with another program.
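A conversion check of this kind can be scripted as a round trip. In the sketch below (Python; the program names, command-line flags, and sample file are hypothetical), the updated version must produce the same report from old-format data — whether it reads the old file directly or goes through an explicit conversion step — as the old version did:

```python
import subprocess
from pathlib import Path

OLD_PROGRAM = "./myprogram_v1"                      # hypothetical old version
NEW_PROGRAM = "./myprogram_v2"                      # hypothetical updated version
OLD_FILE = Path("samples/customers_old_format.dat")
CONVERTED = Path("samples/customers_converted.dat")

def report(program: str, data_file: Path) -> str:
    """Ask a program for a summary report of a data file and return the text."""
    result = subprocess.run([program, "--open", str(data_file), "--report"],
                            capture_output=True, text=True, timeout=120, check=True)
    return result.stdout

baseline = report(OLD_PROGRAM, OLD_FILE)            # what the data "really" say

# The updated program should read the old format directly...
assert report(NEW_PROGRAM, OLD_FILE) == baseline, "new version misreads old-format data"

# ...and an explicit conversion pass must not change the data either.
subprocess.run([NEW_PROGRAM, "--convert", str(OLD_FILE), "--out", str(CONVERTED)],
               check=True, timeout=120)
assert report(NEW_PROGRAM, CONVERTED) == baseline, "conversion utility corrupted the data"

print("Old-format data survive both direct reading and conversion.")
```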
Configuration

The program must work on a range of computers. Even if it only has to operate on one model of computer, two machines of that model will differ in their printers, other peripherals, memory, and internal logic cards. The goal of the configuration test is finding a hardware combination that should be, but is not, compatible with the program.

Installability and serviceability

An installation utility lets you customize the product to match your system configuration. Does the installation program work? Is it easy to use? How long does the average user take to install the product? How long does an expert take? If the program is installed by a service person or by any third party, installation is an issue within the larger scope of serviceability. The serviceability question is this: if the program does fail, how easily can a trained technician fix it or patch around it?

Quickies

The quicky is a show tool. Its goal is to cause a program to fail almost immediately. Quickies are "pulled" in front of an audience, such as visiting executives. If the test is successful, the people watching you will be impressed with how good a tester you are and how unstable the program is. You have no planning time for a quicky. When you get the program, you have to guess what might be wrong with it based on your experience with other programs written by the authors of this one, with other programs that run under the same operating system, etc. For example, try pressing <Enter> or moving and clicking the mouse while a program is loading from the hard disk. In general, try to provoke race conditions (see "Race conditions" in Chapter 4) or error recovery failures. Your tests should be unobtrusive. Ideally, no one looking over your shoulder would realize that you tried a test unless the program fails it.

MAINTENANCE

A large share of the money your company spends on this program will be spent changing it after it's completed. According to Martin & McClure's (1984) textbook:

• Maintenance accounts for almost 67% of the total cost of the software.
• 20% of the maintenance budget is spent fixing errors.
• 25% is spent adapting the program so that it works with new hardware or with new co-resident software.
• 6% is spent fixing the documentation.
• 4% is spent on performance improvements.
• 42% is spent making changes (enhancements) requested by users.

Most of the testing you will do during the maintenance phases should be similar to what you did during function and system testing. Ideally, you will have a battery of regression tests, many of them automated, that you can run every time the program changes (see the sketch below). Remember that maintenance changes are likely to have side effects. It is necessary to verify that the code as a whole works.
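Such a battery does not have to be elaborate. The sketch below (Python; the program name, the batch option, and the directory layout are hypothetical) replays every saved input file — ideally one per fixed bug — through the program and compares the output against results captured from a version that was declared good. Each time a bug is fixed, the input that exposed it is added to the inputs directory, so every old fix is retested whenever the program changes:

```python
import subprocess
from pathlib import Path

PROGRAM = ["./myprogram", "--batch"]        # hypothetical program under test
INPUTS = Path("regression/inputs")          # one saved input file per fixed bug
EXPECTED = Path("regression/expected")      # outputs captured from a "good" version

def run_case(input_file: Path) -> bytes:
    """Feed one saved input file to the program and capture its output."""
    with input_file.open("rb") as stdin:
        result = subprocess.run(PROGRAM, stdin=stdin, capture_output=True, timeout=60)
    return result.stdout

failures = []
for case in sorted(INPUTS.glob("*.txt")):
    expected = (EXPECTED / case.name).read_bytes()
    if run_case(case) != expected:
        failures.append(case.name)

if failures:
    print(f"{len(failures)} regression case(s) failed: {', '.join(failures)}")
    raise SystemExit(1)
print("All regression cases passed.")
```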
PORT TESTING

The port test is unique to maintenance. Use it when the program is modified to run on another (similar) operating system or computer. The product might be ported to many different types of computers; you have to check that it works on each. Here is our strategy for port testing (assuming that the port required relatively few and minor modifications):

• Overall functionality: Use your regression series. If you don't have one, create one that exercises each of the main functions using mainstream data or a few boundary data values. If a function doesn't port successfully, it will usually not work at all, so these tests don't have to be subtle. Ported software doesn't usually fail tests of general functionality, so don't waste your time executing lots of them.
• Keyboard handling: Two computers with proprietary keyboards probably use them slightly differently. Many errors are found here. Test the effect of pressing every key (shifted, altered, etc.) in many places.
• Terminal handling: The program may not work with terminals that are commonly used with the new computer. You must test the popular terminals even if the program works with ANSI Standard terminals, because the Standard doesn't include all the characters displayed on many "ANSI Standard" screens. Along with incompatible characters, look for problems in color, highlighting, underlining, cursor addressing (including horizontal and vertical scrolling), and the speed of screen updating.
• Sign-on screen, version and system identification: The program's version ID has changed. Is the new ID everywhere? Also, if the program names the computer or operating system at startup, does it name the right one?
• Disks: Disk capacities and formats differ across machines. Make sure the program works with files that are exactly 128, 256, 512, 1024, 2048, 4096, 8192, and 16,384 bytes long (see the sketch that follows this section). Try it with a huge drive too, if that is supported on the new system but wasn't available (or tested) in the original environment.
• Operating system error handling: If you fill the disk, does the operating system let your program handle the problem or does it halt your program and report a system-level error? If the old machine handled errors one way, the new one may handle them the other. How does your product insulate the user from bad operating system error handling and other system quirks?
• Installation: When you install the product, you tell it how much memory it can use, the type of printer and terminal, and other information about peripherals. The installation routines were probably the most heavily modified part of the product, so spend some time on them. Check their responses to all keystrokes, and their transitions across menus. Set up a few peripheral configurations to see if the product, after proper installation, works with them. Be particularly wary of configurations that were impossible (and so untestable) on the old system, such as huge amounts of available memory, huge hard drives, multi-tasking, or new types of printers.
• Compatibility: Suppose that on the original computer, your program was compatible with PROGRAM_X. If PROGRAM_X has also been ported to the new computer, is your ported program compatible with ported PROGRAM_X? Don't bet on it.
• Interface style: When you take a program from one graphical environment to another (Windows, Mac, AmigaDOS, Motif, etc.), different user interface conventions apply. Some people are adamant that the program behave as though it was designed for their computer from the start, without carrying in rules from some other environment.
• Other changes: Ask the programmers what other changes were made during porting, and why. Test to make sure that the changes are correct.

Expect the first port to a new platform to require a lot of testing time, maybe a quarter as long as the original testing, while you figure out what must be tested and what can be skipped. Tests to later platforms will probably go more quickly, now that you understand how the program will usually change.
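The exact-size file check from the "Disks" item above is simple to automate. Here is a minimal sketch (Python; the program name and command line are hypothetical) that creates files of each listed size and confirms that the ported program opens every one without reporting an error:

```python
import subprocess
import tempfile
from pathlib import Path

PROGRAM = ["./myprogram", "--open"]                       # hypothetical ported program
SIZES = [128, 256, 512, 1024, 2048, 4096, 8192, 16384]    # bytes

def check_file_sizes() -> list[int]:
    """Return the file sizes (in bytes) that the program failed to open."""
    failures = []
    with tempfile.TemporaryDirectory() as tmp:
        for size in SIZES:
            path = Path(tmp) / f"exact_{size}.dat"
            path.write_bytes(b"x" * size)                 # a file of exactly `size` bytes
            result = subprocess.run(PROGRAM + [str(path)],
                                    capture_output=True, timeout=60)
            if result.returncode != 0:
                failures.append(size)
    return failures

if __name__ == "__main__":
    bad = check_file_sizes()
    print("All sizes handled." if not bad else f"Failed at sizes: {bad}")
```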
SOFTWARE ERRORS

INTRODUCTION: THE REASON FOR THIS CHAPTER

Your primary task as a tester is to find and report errors. The purpose of your work is improvement of product quality. This brief chapter defines "quality" and "software error." Then, because it helps to know what you're looking for before hunting for it, we describe thirteen categories of software errors. The Appendix describes the error categories in more detail, and illustrates them with over 400 specific types of errors.

USEFUL READING

Deming (1982), Feigenbaum (1991), Ishikawa (1985), and Juran (1989) are well respected, well written books with thoughtful discussions of the meaning of quality.

QUALITY

Some businesses make customer-designed products on order. The customer brings a detailed specification that describes exactly what he wants and the company agrees to make it. In this case, quality means matching the customer's specification. Most software developers don't have such knowledgeable and precise customers. For them, the measure of their products' and services' quality is the satisfaction of their customers, not the match to a specification. If the customer doesn't like the end result, it doesn't matter if the product meets a specification, even if the customer agreed to the specification. For that customer, it's not good quality if he's not happy with it.

One aspect of quality is reliability. The more reliable the program, the less often it fails while the customer is trying to use it, and the less serious the consequences of any failures. This is very important, but testers who say that quality is reliability are mistaken. If the program can't do what the customer wants to do with it, the customer is unhappy. If the customer is not happy, the quality is not high. A program's quality depends on:

• the features that make the customer want to use the program, and
• the flaws that make the customer wish he'd bought something else.

Your main contribution as a tester is to improve customer satisfaction by reducing the number of flaws in the program. But a project manager who forces a particularly useful feature into the program at the last minute may also be improving the product's quality, even if the changed program is less reliable. Features and flaws both determine quality, not just one or the other. (For more discussion, read Juran, 1989.) The rest of this chapter is about the flaws. How will we know one when we find it?

[...] identification of source and version control problems is a Testing function; enforcement is not. Expanding a Testing Empire to encompass source and version control is asking for a license to get on people's nerves.

DOCUMENTATION

The documentation is not software, but it is part of the software product. Poor documentation can lead users to believe that the software is not working correctly. Detailed discussion [...]
[...] definition of software errors. We see these as just another group of errors, and you should too. It may be harder to convince a programmer that a user interface error is an error, or that it's important, or that testers have any right to tell him about it, but customers complain about serious human factors errors every bit as much as they complain about crashes.

CATEGORIES OF SOFTWARE ERRORS

We describe 13 major [...]

WHAT IS A SOFTWARE ERROR?

One common definition of a software error is a mismatch between the program and its specification. Don't use this definition. A mismatch between the program and its specification is an error in the program if and only if the specification exists and is correct. A program that follows a terrible specification perfectly is terrible, not perfect. Here are two better definitions:

• A software [...]

[...] documentation errors is beyond the scope of this book, but documentation testing is discussed in Chapter 10.

TESTING ERRORS

Last, but definitely not least: if a programmer makes one and a half mistakes per line of code, how many mistakes will you make per test? Errors made by the tester are among the most common errors discovered during testing. You don't want them to be the most common errors reported—you'd [...]

[...] deliberate bug-hiding?

• Some Testing Groups change the RESOLUTION CODE. We don't recommend this. It can cause loud arguments.
• Some Testing Groups reject Problem Reports that should be marked as Deferred but are marked As designed. They send the report back to the project manager and insist that he reclassify the RESOLUTION. Don't try this without solid management support.
• Many Testing Groups ignore this [...]

[...] principle that all Problem Reports must be reported. On occasion, you may be loaned to a programming team during their first stages of testing, well before official release of the code to the Testing Group. Many of the problems you'll find wouldn't survive into formal testing whether you were helping test or not. Normally, few bugs found at this stage of development are entered into the problem tracking [...] When the product is submitted for formal testing, enter reports of bugs that remain.

NUMBERED

Track Problem Reports numerically. Assign a unique number to each report. If you use a computerized database, the report number will serve as a key field. This is the one piece of information that always distinguishes one report from all the rest. It's best to have the computer assign report numbers.

SIMPLE

By [...]

[...] with strong preferences on both sides.)

PRIORITY

PRIORITY is assigned by the project manager, who typically uses a 5- or 10-item scale. The project manager asks programmers to fix bugs in priority order. The definition for each PRIORITY varies between companies. Here's a sample scale:

(1) Fix immediately—this is holding up other work.
(2) Fix as soon as possible.
(3) Must fix before the next milestone (alpha, [...]

[...] problem in the software, fill out a Problem Report form.

CONTENT OF THE PROBLEM REPORT

The type of information requested on Problem Report forms is much the same across companies; the organization and labeling varies. Figure 5.1 shows the layout of the form that we refer to throughout this book. The rest of this section examines the individual fields on the form.

PROBLEM REPORT NUMBER

Ideally, the computer fills [...]
[...] test. For example, the VERSION identifier might be 1.01m. The product will be advertised as RELEASE 1.01. The VERSION LETTER, m, indicates that this is the thirteenth draft of 1.01 created or released for testing. When the programmer can't reproduce a problem in the current version of the code, the VERSION identifier tells her what version the problem was found in. She can then go to this exact version.
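As a closing illustration of the computerized tracking just described — where the computer, not the tester, assigns the unique report number — here is a minimal sketch using Python's standard-library sqlite3 module. The table and field names echo the form fields mentioned in this excerpt (PROBLEM REPORT NUMBER, VERSION, PRIORITY, RESOLUTION), but the schema itself is hypothetical:

```python
import sqlite3

conn = sqlite3.connect("problem_reports.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS problem_report (
        report_number  INTEGER PRIMARY KEY AUTOINCREMENT,  -- assigned by the computer
        program        TEXT NOT NULL,
        version        TEXT NOT NULL,       -- e.g., '1.01m'
        summary        TEXT NOT NULL,
        priority       INTEGER,             -- project manager's 1-5 scale
        resolution     TEXT DEFAULT 'Open'  -- e.g., Open, Fixed, Deferred, As designed
    )
""")

def file_report(program: str, version: str, summary: str) -> int:
    """Insert a new Problem Report and return its unique report number."""
    cur = conn.execute(
        "INSERT INTO problem_report (program, version, summary) VALUES (?, ?, ?)",
        (program, version, summary))
    conn.commit()
    return cur.lastrowid

number = file_report("MyProgram", "1.01m", "Crashes when printing an empty file")
print(f"Filed Problem Report #{number}")
```

Because the database assigns report_number, it remains the one piece of information that always distinguishes one report from all the rest.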