Our strongest recommendation for improving legibility is to use a computerized problem tracking system (see Chapter 6). Make the computer type your reports.

NON-JUDGMENTAL

Nobody likes being told that what they did was wrong, wrong, wrong. As a tester, that's what you tell people every day. You can ensure your unpopularity by describing problems in a way that tells the programmer you think she is sloppy, stupid, or unprofessional. Even if you think she is, keep it out of the report. If the programmer considers you a jerk and your reports vindictive, she will want to ignore your reports and complain about you to her management.

Complaints about maliciously written Problem Reports can have serious consequences. First, they reduce your chances of raises and promotions, and may cost you your job. Some testers think their "straight" (nasty) reporting style is more courageous than foolish. But malice leads to a justifiable movement to censor Problem Reports. Because of censorship, only some reports reach the programmers, and censors don't just reject inappropriate wording. They also suppress reports of problems they consider too minor or that they decide will have political repercussions they don't care to face. Once censorship starts, some testers will stop reporting some classes of problems because they "know" that these reports will never make it past review anyway. Under these conditions, many fixable problems are never reported and never fixed.

Think twice, and twice again, before declaring war on programmers by expressing personal judgments in your reports. You will almost certainly lose that war. Even if you keep your job, you will create an adversarial relationship that will cost you reporting freedom. It will not improve product quality even if every judgment you express is correct.

We are not saying never express a judgment.
Occasionally, you may have to write a powerful, bluntly worded report to alert management to a serious problem that a programmer will not acknowledge or fix. Fine. Use your most effective tactics. But choose your battles carefully. Don't do this more than twice per year. If you feel that you have to engage in more mudslinging than that, circulate your resume. Either the company has no standards or your unhappiness in your environment is expressing itself in a very unhealthy way.

ANALYSIS OF A REPRODUCIBLE BUG

The rest of this chapter concentrates on reporting of coding errors rather than design issues. In this section, and the next, we assume that each bug is reproducible. We explain tactics for reproducing non-reproducible bugs shortly, in the section "Making a Bug Reproducible."

Reproducibility implies the following:

• You can describe how to get the program into a known state. Anyone familiar with the program can follow your description and get the program into that state.
• From that state, you can specify an exact series of steps that expose the problem.

To make your report more effective you should analyze the problem further. If the problem is complicated, either because it takes many steps to recreate or because the consequences are hard to describe, spend time with it. Simplify the report or break it into a series of reports.

The objectives of your analysis are:

• Find the most serious consequences of the problem.
• Find the simplest, shortest, and most general conditions that will trigger the bug.
• Find alternate paths to the same problem.
• Find related problems.

FINDING THE MOST SERIOUS CONSEQUENCES

Look for the most serious consequences of a bug in order to boost everyone's interest in fixing it. A problem that looks minor will more often be deferred.

For example, suppose a bug displays a little garbage text in a corner of the screen. This is minor but reportable.
It will probably be fixed, but against a deadline, this bug would not stop shipment of the program. Sometimes, onscreen garbage is merely an isolated problem (and the decision to leave it alone might be wise, especially just before release). Often though, it is the first symptom of a more serious underlying problem. If you keep working with the program, you might discover that it crashes almost immediately after displaying the garbage. This is the consequence you're looking for; it will get the screen garbage fixed.

When a program fails, it either:

• falls into a state the programmer didn't expect, or
• falls into error recovery routines.

If the state is unexpected, subsequent code makes incorrect assumptions about what has happened. Further errors are likely. As to error recovery routines, these are often the least tested parts of the program. They often have errors and are poorly designed. Typically, error routines contain more serious bugs than the one that led there.

When the program logs an error, displays garbage onscreen, or does anything else that the programmer didn't intend, always look for a follow-up bug.

FINDING THE SIMPLEST AND MOST GENERAL CONDITIONS

Some bugs show up at midnight every leap year, but never appear any other time. Some bugs won't show up unless you make a complex series of erroneous or unlikely responses. Bug fixing involves tradeoffs:

• If it takes minimal effort to understand and fix a problem, someone will fix it.
• If the fix requires (or looks like it will require) lots of time and effort, the programmers will be less willing to fix it.
• If the problem will arise during routine use of the program, management interest in the problem will increase.
• If it appears that almost no one will see the problem, interest will be low.

Finding simpler ways to reproduce a bug also makes the debugging programmer's task much easier and faster.
The fewer steps that it takes to reproduce a bug, the fewer places the programmer has to look (usually) in the code, and the more focused her search for the internal cause of the bug can be. The effort involved in fixing a bug includes finding the internal cause, changing the code to eliminate the cause, and testing the change. If you make it easier to find the cause and test the change, you reduce the effort required to fix the problem. Easy bugs get fixed even if they are minor.

FINDING ALTERNATE PATHS TO THE SAME PROBLEM

Sometimes it takes a lot to trigger a bug. No matter how deeply you analyze a problem, you still need many steps to reproduce it. Even if every step is likely in normal use of the program, a casual observer might believe that the problem is so complicated that few customers will see it.

You can counter this impression by showing that you can trigger the error in more than one way. Two different paths to the same bug are a more powerful danger signal than one. Two paths suggest that something is deeply wrong in the code even if each path involves a complicated series of steps.

Further, if you describe two paths to a bug, they probably have something in common. You might not see the commonality from the outside, but the programmer can look for code they both pass through.

It takes practice to develop judgment here. You must present different enough paths that the programmer won't dismiss them as alternative descriptions of the same bug, but the paths don't have to differ in every detail. Each path is valuable to the degree that it provides extra information.

FINDING RELATED PROBLEMS

Look for other places in the program where you can do something similar to what you did to expose this bug. You've got a reasonable chance of finding a similar error in this new code. Next, follow up that error and see what other trouble you can get into.

A bug is an opportunity.
It puts the program into an unusual state, and runs it through error recovery code that you would otherwise find hard to reach and test. Most bugs that you find under these conditions are worthwhile because some customers will find another way to reach the same error handling routines. Your investigation can avert a disaster.

Again you must develop judgment. You don't want to spend too much time looking for related problems. You may invest time in this most heavily after deferral of a bug that you know in your heart is going to cause customer grief.

TACTICS FOR ANALYZING A REPRODUCIBLE BUG

Here are a few tips for achieving the objectives laid out in the previous section.

LOOK FOR THE CRITICAL STEP

When you find a bug, you're looking at a symptom, not a cause. Program misbehavior is the result of an error in the code. You don't see the error because you don't read the code; you just see misbehavior. The underlying error (the mistake in the code) may have happened many steps ago: any of the steps involved in a bug could be the one that triggers the error. If you can isolate the triggering step, you can reproduce the bug more easily and the programmer can fix it more easily.

Look carefully for any hint of an error as you take each step. Often minor indicators are easily missed or ignored. Minor bugs might be the first symptoms of an error that will eventually manifest itself as the problem you're interested in. If they occur on the path to the problem you're analyzing, the odds are reasonable that they're related to it. Look for:

• Error messages: Check error messages against a list of the program's error messages and the events the programmer claims trigger them. Read the message, try to understand why it appears and when (what step or substep).
• Processing delays: If the program takes an unusually long time to display the next bit of text or to finish a calculation, it may be wildly executing totally unrelated routines.
The program may break out of this with inappropriately changed data or it may never return to its old state. When you type the next character, the program may think you're answering a different question (asked in an entirely different section of code) from the one showing onscreen. An unusual delay may be the only indicator that a program has just started to run amok.
• Blinking screen: You may be looking at error recovery when the screen is repainted or part of it flashes then reverts to normal. As part of its response to an error, the program makes sure that what shows on the screen accurately reflects its state and data. The repainting might work, but the rest of the error recovery code may foul up later.
• Jumping cursor: The cursor jumps to an unexpected place. Maybe it comes back (error recovery?) or maybe it stays there. If it stays, the program may have lost track of the cursor's location. Even if the cursor returns, if the program maintains internally distinct input and output cursors (many do), it may have lost one of them.
• Multiple cursors: There are two cursors on the screen when there should be only one. The program may be in a weird state or in a transition between states. (However, this may not be state-dependent. The program may just be misdriving the video hardware, perhaps because it's not updating redundant variables it uses to track the register status of the video card.)
• Misaligned text: Lines of text that are normally printed or displayed in a consistent pattern (e.g., all of them start in the leftmost column) are slightly misprinted. Maybe only one line is indented by one character. Maybe all the text is shifted, evenly or unevenly.
• Characters doubled or omitted: The computer prints out the word error as errrro. Maybe you've found a spelling mistake or maybe the program is having problems reading the data (the string "error") or communicating with the printer.
Some race conditions cause character skipping along with other less immediately visible problems.
• In-use light on when the device is not in use: Many disk drives and other peripherals have in-use lights. These show when the computer is reading or writing data to them. When a peripheral's light goes on unexpectedly, the program might be incorrectly reading or writing to memory locations allocated to these peripherals instead of the correct area in memory. Some languages (C, for example) make it especially easy to inadvertently address the wrong area of memory. The program may "save" data to locations reserved for disk control or have previously overwritten control code with data it thought it was saving elsewhere. When this happens you don't see the internal program being overwritten (which will result in horrible bugs when you try to use that part of the program), but you can see the I/O lights blink. This is a classic "wild pointer" bug.

MAXIMIZE THE VISIBILITY OF THE BEHAVIOR OF THE PROGRAM

The more aspects of program behavior you can make visible, the more things you can see going wrong and the more likely you'll be able to nail down the critical step.

If you know how to use a source code debugger, and have access to one, consider using it. Along with tracing the code path, some debuggers will report which process is active, how much memory or other resources it's using, how much of the stack is in use, and other internal information. The debugger can tell you that:

• A routine always exits leaving more data on the stack (a temporary, size-limited data storage area) than was there when it began. If this routine is called enough times, the stack will fill up and terrible things will happen.
• When one process receives a message from another, an operating system utility that controls message transfer gives the receiving process access to a new area of memory. The message is the data stored in this memory area.
When the process finishes with the message, it tells the operating system to take the memory area back. If the process never releases message memory, then as it receives more messages, eventually it gains control of all available memory. No more messages can be sent. The system grinds to a halt. The debugger can show you which process is accumulating memory, before the system crashes.

You can find much more with debuggers. The more you know about programming and the internals of the program you're testing, the more useful the debugger will be. But beware of spending too much time with the debugger: your task is black box testing, not looking at the code.

Another way to increase visibility is to print everything the computer displays onscreen and all changes to disk files. You can analyze these at your leisure.

If the screen display changes too rapidly for you to catch all the details, test on a slower computer. You'll be able to see more of the display as it changes. You have other ways to slow down the program. For example, on a multi-user system, get lots of activity going on other terminals.

ONCE YOU'VE FOUND THE CRITICAL STEP, VARY YOUR BEHAVIOR

You know that if you do A then B then C, the computer does something bad at C. You know the error is in B. Try A then B then D. How does the program foul up in D? Keep varying the next steps until you get sick of it or until you find at least one case that is serious (such as a system crash).

LOOK FOR FOLLOW-UP ERRORS

Even if you don't know the critical step, once you've found the bug, keep using the program for a bit. Do any other errors show up? Do this guardedly. All further problems may be consequences of the first one. They may not be reproducible after this one is fixed. On the other hand, once you find one error, don't assume that later ones are necessarily consequences of the first. You have to test them separately from a known clean state and through a path that doesn't trigger the initial problem.
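The last two tactics can be sketched as a small harness. This is only an illustration under stated assumptions: `ToyProgram`, its steps A through E, and its failure messages are hypothetical stand-ins for the program you are testing, not anything taken from a real product. The sketch restarts from a clean state for every trial, replays the steps up to and including the suspected critical step, then tries one different next step each time and records what happens.

```python
from typing import Callable, Dict, Sequence


class ToyProgram:
    """Hypothetical program under test. Step 'B' silently corrupts its state."""

    def __init__(self) -> None:
        self.corrupted = False

    def do(self, step: str) -> str:
        if step == "B":
            self.corrupted = True  # the critical step: nothing visible happens yet
            return "ok"
        if self.corrupted and step == "C":
            return "garbage on screen"
        if self.corrupted and step == "D":
            return "crash"
        return "ok"


def vary_next_step(prefix: Sequence[str],
                   candidates: Sequence[str],
                   fresh: Callable[[], ToyProgram] = ToyProgram) -> Dict[str, str]:
    """Replay `prefix` from a clean state, then try each candidate next step."""
    results: Dict[str, str] = {}
    for next_step in candidates:
        program = fresh()  # known clean state for every trial
        for step in prefix:
            program.do(step)
        results[next_step] = program.do(next_step)
    return results


if __name__ == "__main__":
    # After A then B, which follow-up step has the most serious consequence?
    print(vary_next_step(["A", "B"], ["C", "D", "E"]))
    # Same follow-ups on a path that avoids B: are they independent problems?
    print(vary_next_step(["A"], ["C", "D", "E"]))
```

In this toy case the second run shows no failures, which suggests that the garbage and the crash are both consequences of step B rather than separate bugs. With a real program, `fresh` would stand for whatever restart drill gives you a known clean state.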
PROGRESSIVELY OMIT OR VARY YOUR STEPS

If the problem is complex and involves many steps, what happens if you skip some or change them just a little? Does the bug stay there? Does it go away or turn into something else?

The more steps you can get rid of, the better. Test each to see if it is essential to reproducing the bug.

As to varying the steps, look for boundary conditions within a step. If the program displays three names per line, and you know it fails when it has exactly six, what happens if it has exactly three?

CHECK FOR THIS ERROR IN PREVIOUS PROGRAM VERSIONS

If the error isn't in the last version of the program you tested, the error was introduced as part of a change. This information can substantially narrow the programmer's search for the cause of the error. If possible, reload the old version and check for it. This will be most important at the end of a project.

LOOK FOR CONFIGURATION DEPENDENCE

Suppose your computer has two megabytes of memory. Can you reproduce the bug on one that has 640K or four megabytes? What if you add a network or window environment or TSR programs? If you've configured the program to work with two terminals, what happens if you change this to one or four? If the problem appears on a color monitor, what happens on a monochrome monitor? If program options are stored in a data file, what if you change some values? Chapter 8 discusses configuration issues.

MAKING A BUG REPRODUCIBLE

A bug is reproducible only if someone else can do what you say and get what you got. You must be able to explain how to put the computer into a known state, do a few steps that trigger the bug, and recognize it when it appears. Many bugs corrupt unexpected areas of memory, or change device states. To be sure that you aren't looking at a side effect of some previous bug, as part of your reproduction drill you will generally reboot the computer and reload the program before trying the steps you think are necessary to trigger the bug.
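The step-omission tactic described under "Progressively Omit or Vary Your Steps," combined with the clean-restart drill just described, might be sketched as follows. Everything here is an assumption made for illustration: `still_fails` is a hypothetical callback standing for "reboot, reload, replay these steps, and see whether the bug appears," and `toy_still_fails` is an invented bug that shows up whenever a file is opened at some point before printing.

```python
from typing import Callable, List, Sequence


def minimize_steps(steps: Sequence[str],
                   still_fails: Callable[[List[str]], bool]) -> List[str]:
    """Omit one step at a time; keep the omission if the bug still appears."""
    current = list(steps)
    if not still_fails(current):
        raise ValueError("the full sequence does not reproduce the bug")
    i = 0
    while i < len(current):
        candidate = current[:i] + current[i + 1:]
        if still_fails(candidate):
            current = candidate  # step i was not essential; drop it
        else:
            i += 1               # step i is essential; keep it and move on
    return current


def toy_still_fails(steps: List[str]) -> bool:
    """Hypothetical stand-in for a full clean-state replay of `steps`.

    Toy bug: appears when 'open file' occurs somewhere before 'print'.
    """
    if "open file" in steps and "print" in steps:
        return steps.index("open file") < steps.index("print")
    return False


if __name__ == "__main__":
    original = ["start", "open file", "edit", "save", "print", "quit"]
    print(minimize_steps(original, toy_still_fails))
```

Each call to `still_fails` represents one complete replay from a known state, so a long sequence costs many restarts. The sketch only omits steps; it does not vary them or probe boundary conditions within a step, which still need your judgment.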
Suppose you don't know how to reproduce a bug. You try to reproduce it and fail. You're not sure how you triggered the bug. What do you do?

First, write down everything you remember about what you did the first time. Note which things you're sure of, and which are good guesses. Note what else you did before starting on the series of steps that led to this bug. Include trivia. Now ask the question, "Why is this bug hard to reproduce?"

Many testers find it useful to videotape their steps. Many computers or video and sound cards provide output that can be recorded on video tape. This can save many hours of trying to remember individual steps, or it can be a serious time sink: approach it with caution. With a program prone to irreproducible problems, a record of last resort may be essential for tracing back through a particularly complex path. And a recording of a bug proves that the bug exists, even if you cannot reproduce it.

Other testers use capture programs to record all their keystrokes and mouse movements. These are also good tools to help you identify the things you did before running into the bug.

If retracing your steps still doesn't work, keep at it. There are no intermittent software errors. The problem may appear rarely, but each time the exact conditions are met, the same behavior will repeat. All bugs should be reproducible. There are many reasons that you might not be able to reproduce a bug immediately. Here are a few hypotheses to consider.

RACE CONDITIONS

Once you're used to conducting a test, you might run through its steps quickly. It's common (and good practice) to slow down when you find a bug. You did it fast the first time, now watch what you're doing carefully while you try it again. If you can't repeat the error, your problem may be timing: race conditions show up when you're trying to push the program to work faster than it can. Run the test again quickly, with the same rhythm you used the first time. Try this a few times before giving up.
Try slowing the computer down or testing on a slower machine.

FORGOTTEN DETAILS

If you're testing on the fly (i.e., without a test plan) and you find a problem that you can't repeat, you've probably forgotten something about what you did. It's easy to forget under these circumstances because you don't have a step-by-step plan of what you were going to do. Sometimes you may be pressing keys almost randomly.

If you are interrupted during a test, you may do something twice, or something apparently extraneous that should be harmless (for example, turn a terminal or printer on or off, or press a key then press <Delete>). Try to remember exactly what you did just before the interruption, what fidgeting you did during the interruption, and what you did just after you got back to work.

USER ERROR: YOU DIDN'T DO WHAT YOU THOUGHT YOU DID

This will often be the explanation for a "bug." As long as you don't repeat your error, you won't be able to recreate the bug. Even though this is a likely guess, accept it only when you run out of alternatives.

If you think that people will make this error frequently, and the program's response to it is unacceptable, report a problem with the program's error handling. Don't ignore your errors. Carefully examine what the computer does with them.

AN EFFECT OF THE BUG MAKES REPLICATION IMPOSSIBLE

Bugs can destroy files, write into invalid memory areas, disable interrupts, or close down I/O ports. When this occurs you can't reproduce a problem until you recover the files or restore the computer to its proper (or previous) state.

Here's an example of this type of problem. One of your customers sends you a letter of complaint and a floppy disk. To replicate the problem you start the program, load the disk, run the test and OOPS, the bug trashes the data files on the customer's disk. You've reproduced the problem once, but now until you get another copy of the disk from the customer, you'll never reproduce it again.
To avoid problems like this, make sure to back up data files before attempting to replicate a bug.

Never, never, never use the original source of the data. Always use copies.

THE BUG IS MEMORY-DEPENDENT

The program may fail only when a specific amount or type of memory is available. Another memory-specific condition may be that the total amount of available memory appears adequate, but it turns out to be too fragmented (spread across smaller blocks that are not contiguous).

A message box that displays the amount of free memory, perhaps also showing the sizes of the five largest blocks, can be extremely handy. You see how much memory is available at the start of a test, and so, how far to reduce available memory to truly reproduce a problem. Further, this helps you understand how much memory each operation uses, making it much easier to get the program back into the original memory state. (These memory dialogs are often put in for debugging purposes, accessed by a special key, but they are often left in programs for product support use later. They are very handy.)

THIS IS A FIRST-TIME-ONLY (INITIAL STATE) BUG

In the classic case, when you run the program for its first time, one of its first tasks is to initialize its configuration data file on disk. If you can get the program to do anything else before initialization, it will misbehave. As soon as initialization of the data file is complete, however, the program will work fine. This error will only be seen the very first time the program is run. Unfortunately, it might be seen by every person who buys the program when it is run for the first time.

As a variant of this problem, a program might not clean out the right parts of the computer's memory until after running for a while. Rather than finding 0's, the program might find what it thinks is data. What it has really found is junk left over from the last program that was running.
Once the program initializes this area of memory, you won't see the problem again until you reload the other programs into memory, then reload this on top of them.

The question to ask is how to get the computer, the program, and the data files into the state they were in before the program misbehaved. To answer this question perfectly you have to know all the changes the program makes and when it makes them. You probably don't know this (if you did, you could reproduce the bug), so returning everything to initial states won't be easy. If you suspect initialization problems, test from the initial state: turn off the computer and start over with a never-used copy of the program (make a supply of them).

BUG PREDICATED ON CORRUPTED DATA

The program might corrupt its own data, on disk or in memory, or you may have fed the program bad data. The program chokes on the data, or detects the error but stumbles in the error handler. In either case, the error you're seeing is one of error detection and recovery. To reproduce the error, you must give the program the same data again. This sounds obvious, but every tester misses this point sometime.

BUG IS A SIDE-EFFECT OF SOME OTHER PROBLEM

This is an error recovery failure. The program fails, then, in handling the error, the program fails again. Often the second failure is much worse than the first. In watching the spectacular crash caused by the second bug, you may not notice that tiny first glitch. Your objective, after you realize that there is a first bug, is to reproduce the first one. The second one reproduces easily after that.

INTERMITTENT HARDWARE FAILURE

Hardware failures are usually complete. Usually, for example, a memory chip will work or it won't. But heat build-up or power fluctuations may cause intermittent memory failures, or memory chips may work loose and make intermittent connection. Data or code in memory are only occasionally corrupted. If you think this is happening, check the power supply first.
Be reluctant to blame a bug on hardware. The problem is rarely in the hardware.

TIME DEPENDENCY

If the program keeps track of the time, it probably does special processing at midnight. A program that tracks the day may do special processing on New Year's and at the end of February in a leap year. The switch from December 31, 1999 to January 1, 2000, is being anticipated with dread because range checks, default date searches, and other assumptions built into many programs will fail.

Check the effect of crossing a day, week, month, year, leap year, or century boundary. Bugs that happen once a day or once a week may be due to this kind of problem.

RESOURCE DEPENDENCY

In a multi-processing system, two or more processes (programs) share the CPU, resources, and memory. While one process uses the printer, the other must wait. If one uses 90% of available memory, the other is restricted to the remaining 10%. The process must be able to recover from denial of resources. To replicate a failure of recovery, you have to replicate denial of the resource (memory, printer, video, communication link, etc.).

LONG FUSE

An error may not have an immediate impact. The error may have to be repeated dozens of times before the program is on the brink of collapse. At this point, almost anything will crash it. A totally unrelated bug-free subroutine might do the magic thing that crashes the program. You'll be tempted to blame this latecomer, not the routines that slowly corrupted the system.

As an example, many programs use a stack. A stack is an area of memory reserved for transient data. The program puts data onto the "top" of the stack and takes data off the top. The stack may be small. You can fill it up. Suppose the stack can handle 256 bytes of data and Subroutine A always puts 10 bytes of data onto it and leaves them there instead of cleaning up when it's done.
If no other routine takes those 10 bytes off the stack, then after you call Subroutine A 25 times, it has put 250 bytes of data onto the stack. There is only room for 6 more bytes. If Subroutine B, which has nothing to do with Subroutine A, tries to put 7 bytes of data onto the stack, the stack will overflow. Stack overflows often crash programs. You can call Subroutine B from now until the computer wears out; you will not repeat this error until you call A 25 times. When the routine you think is the culprit doesn't cause the system to fail, ask what routines preceded it. SPECIAL CASES IN THE CODE You don't know what the critical conditions are in the code. A cooperative programmer can save you hours or days of work trying to reproduce difficult bugs by suggesting follow-up tests. We list this last because you can alienate a good programmer by constantly pestering her about bugs you can't repeat. If you go to her with irreproducible bugs too often, she may well conclude that you are a sloppy tester and are wasting her time. [...]... customer complaints or magazine reviews The tracking system might be single-user or multi-user The typical single-user database sits on one computer in the Testing offices Everyone enters reports at this computer and runs reports from it Only testers have direct access to the computer, perhaps only some testers Problem Reports and summary status reports for each project are printed and circulated by one of... the problems Clearly, testers are not the only users of the tracking system The Testing Group maintains the system, but it belongs to the company as a whole, not to the Testing Group or to any individual tester THE LEAD TESTER The lead tester heads the testing effort for this project and is accountable for the quality of testing and problem reporting She may be the only tester allowed to close Problem... 
with development of microcomputer software packages written for retail sale, fix failure rates of 10% are very good That is, we are pleased with the attentiveness of the programmers we work with if we discover problems in only 10% of the bugs they send back to us as "fixed." We are annoyed but not outraged with failure rates as high as 25% As we noted in Chapter 3 ("Black box testing" ), much larger fix... debugging responsibilities from pro gramming to testing might also require an increase in testing staff, if the project is going to succeed - In some projects the testers are more skilled debuggers than the programmers or are more motivated to gather whatever information is necessary to demonstrate that a problem can be fixed It may be appropriate to drain testing resources in these cases, especially... will do rather than testing features individually or in controlled combinations This is healthy if it lasts for a few weeks, but if testers are frequently outperformed by someone else, review your testing strategy It seems ineffective • Does the pattern in the number of problems reported per week by each tester make sense? Usually a tester reports many design problems at the start of testing, then flurries... TINKERED WITH YOUR MACHINE This happens You do some testing, go to the washroom, and while you're away someone enters new data, tinkers with the program itself, or turns off the printer Maybe this is a practical joke Or maybe your manager just has to demonstrate this new program to a visitor and forgets to leave you a note Whenever you leave your computer or terminal logged on you risk returning to... 
summary reports about them A good system fosters accountability and communication about the bugs Unless the number of reports is trivial, you need an organized system Too many software groups still use pen-and-paper tracking procedures or computer- based systems that they consider awkward and primitive It's not so hard to build a good tracking system and it's worth it, even for small projects This chapter... is good feedback for the next release of the software But you are pushing your luck if you let late-joining testers write report after report demanding that an Amiga, Windows, or DeskMate product adopt Macintosh user interface standards • When published summary statistics showing the number of outstanding bugs include many that are fixed and waiting for retesting or are irreproducible or otherwise out... otherwise out of the programmers' and project manager's hands This unfairly underestimates the programmers' progress • When inaccurate summaries of the bug status are published For example, if 40 bugs are fixed and 40 new ones are reported, including 35 unrelated minor design issues, and the summary report notes say that most of the fixes appear to be generating new bugs, this is wrong The fixes are working... that writers and testers are in the same department In others, the writers have nothing to do with the database THE TEST MANAGER The test manager is accountable for the quality of the testing effort and for supervising the testing staff He reviews Problem Reports asking whether they suggest that a tester needs further training He also looks for communication or work-balancing problems between the test . single-user database sits on one computer in the Testing offices. Everyone enters reports at this computer and runs reports from it. Only testers have direct access to the computer, perhaps only some. Try this a few times before giving up. Try slowing the computer down or testing on a slower machine. 