Software Engineering For Students: A Programming Approach (Part 28)

Chapter 17 ■ Software robustness

Let us turn to examining how an exception is thrown, using the same example. In Java, the method parseInt can be written as follows:

    public int parseInt(String string) throws NumberFormatException {
        int number = 0;
        for (int i = 0; i < string.length(); i++) {
            char c = string.charAt(i);
            // any character that is not a decimal digit is illegal
            if (c < '0' || c > '9') throw new NumberFormatException();
            // shift the digits seen so far one place left and add the new digit
            number = number * 10 + (c - '0');
        }
        return number;
    }

You can see that in the heading of the method the exception that may be thrown is declared, along with the specification of any parameters and return value. If this method detects that any of the characters within the string are illegal, it executes a throw instruction. This immediately terminates the method and transfers control to a catch block designed to handle the exception. In our example, the catch block is within the method that calls parseInt. Alternatively the try-catch combination can be written within the same method as the throw statement. Or it can be written within any of the methods in the calling chain that led to calling parseInt. Thus the designer can choose an appropriate place in the software structure at which to carry out exception handling. The position in which the exception handler is written helps to determine both the action to be taken and what happens after it has dealt with the situation.

SELF-TEST QUESTION 17.7 The method parseInt does not throw an exception if the string is of zero length. Amend it so that it throws the same exception in this situation.

What happens after an exception has been handled? In the above example, the catch block ends with a return statement, which exits from the current method, actionPerformed, and returns control to its caller. This is the appropriate action in this case – the program is able to recover and continue in a useful way. In general the options are either to recover from the exception and continue or to allow the program to gracefully degrade.
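The choice of where to place the handler can be illustrated with a small sketch. Only parseInt comes from the text (made static here so that the sketch is self-contained); the class name ParseIntDemo and the methods readAge and doubled are hypothetical names invented for illustration. readAge handles the exception in the immediate caller and recovers with a fallback value, while doubled contains no handler, so the exception travels further up the calling chain.

```java
// Hypothetical demo class; only parseInt itself is taken from the text.
final class ParseIntDemo {

    // The method from the text: throws as soon as it meets a non-digit.
    static int parseInt(String string) throws NumberFormatException {
        int number = 0;
        for (int i = 0; i < string.length(); i++) {
            char c = string.charAt(i);
            if (c < '0' || c > '9') throw new NumberFormatException();
            number = number * 10 + (c - '0');
        }
        return number;
    }

    // Handler in the immediate caller: recover by returning a fallback value.
    static int readAge(String text, int fallback) {
        try {
            return parseInt(text);
        } catch (NumberFormatException e) {
            return fallback; // exit from this method and carry on
        }
    }

    // No handler here: an exception thrown by parseInt passes straight
    // through this method to whichever caller has a matching catch block.
    static int doubled(String text) {
        return 2 * parseInt(text);
    }

    public static void main(String[] args) {
        System.out.println(readAge("42", -1));   // legal string
        System.out.println(readAge("4x2", -1));  // illegal character, fallback used
        try {
            System.out.println(doubled("7a"));
        } catch (NumberFormatException e) {
            System.out.println("handled further up the calling chain");
        }
    }
}
```

Returning a fallback value is only one possible recovery; what counts as an appropriate action depends on the application.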
The Java language mechanism supports various actions:

■ handle the exception. Control flow then either continues on down the program or the method can be exited using a return statement.
■ ignore the exception. This is highly dangerous and always leads to tears, probably after the software has been put into use.
■ throw another exception. This passes the buck to another exception handler further up the call chain, which the designer considers to be a more appropriate place to handle the exception.

SELF-TEST QUESTION 17.8 What happens if the return statement is omitted in the above example of the exception handler?

In the above example, the application program itself detected the exception. Sometimes, however, it is the operating system or the hardware that detects an exception. An example is an attempt to divide by zero, which would typically be detected by the hardware. The hardware would alert the run-time system or operating system, which in turn would enter any exception handler associated with this exception.

The mechanism described above is the exception handling facility provided in Java. Similar mechanisms are provided in Ada and C++. In old software systems the simplest solution to handling exceptions was to resort to the use of a goto statement to transfer control out of the immediate locality and into a piece of coding designed to handle the situation. The use of a goto was particularly appealing when the unusual situation occurred deep within a set of method calls. The throw statement has been criticized as being a goto statement in disguise. The response is that throw is indeed a “structured goto”, but that its use is restricted to dealing with errors and therefore it cannot be used in an undisciplined way.

In summary, exception handlers allow software to cope with unusual, but anticipated, events. The software can take appropriate remedial action and continue with its tasks. Exception handlers therefore provide a mechanism for forward error recovery. In Java, the mechanism consists of three ingredients:

1. a try block, in which the program attempts to behave normally
2. the program throws an exception
3. a catch block handles the exceptional situation.

17.6 ● Recovery blocks

Recovery blocks are a way of structuring backward error recovery to cope with unanticipated faults. In backward error recovery, periodic dumps of the state of the system are made at recovery points. When a fault is detected, the system is restored to its state at the most recent recovery point. (The assumption is that this is a correct state of the system.) The system now continues on from the recovery point, using some alternative course of action so as to avoid the original problem. An analogy: if you trip on a banana skin and spill your coffee, you can make a fresh cup (restore the state of the system) and carry on (carefully avoiding the banana skin).

As shown in Figure 17.3, backward error recovery needs:

1. the primary software component that is normally expected to work
2. a check that it has worked correctly
3. an alternative piece of software that can be used in the event of the failure of the primary module.

We also need, of course, a mechanism for taking dumps of the system state and for restoring the system state. The recovery block notation embodies all of these features. Taking as an example a program that uses a method to sort some information, a fault tolerant fragment of program looks like this:

    ensure dataStillValid
        by superSort
        else by quickSort
        else by slowButSureSort
        else error

Here superSort is the primary component. When it has tried to sort the information, the method dataStillValid tests to see whether a failure occurred. If there was a fault, the state of the program is restored to what it was before the sort method was executed.
The alternative method quickSort is then executed. Should this now fail, a third alternative is provided. If this fails, there is no other alternative available, and the whole component has failed. This does not necessarily mean that the whole program will fail, as there may be other recovery blocks programmed by the user of this sort module.

What kinds of fault is this scheme designed to cope with? The recovery block mechanism is designed primarily to deal with unanticipated faults that arise from bugs (design faults) in the software. When a piece of software is complete, it is to be expected that there will be residual faults in it, but what cannot be anticipated is the whereabouts of the bugs.

[Figure 17.3 Components in a recovery block scheme: user, normal module, checking module, alternative module]

Recovery blocks will, however, also cope with hardware faults. For example, suppose that a fault develops in the region of main memory containing the primary sort method. The recovery block mechanism can then recover by switching over to an alternative method. There are stories that the developers of the recovery block mechanism at Newcastle University, England, used to invite visitors to remove memory boards from a live computer and observe that the computer continued apparently unaffected.

We now examine some of the other aspects of recovery blocks.

The acceptance test

You might think that acceptance tests would be cumbersome methods, incurring high overheads, but this need not be so. Consider for example a method to calculate a square root. A method to check the outcome, simply by multiplying the answer by itself, is short and fast. Often, however, an acceptance test cannot be completely foolproof – because of the performance overhead. Take the example of the sort method. The acceptance test could check that the information had been sorted, that is, is in sequence.
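Both acceptance tests mentioned above can be sketched in a few lines. The class name AcceptanceTests, the method names and the tolerance parameter are assumptions made for this illustration; the text does not prescribe how close the squared answer must be to the original number.

```java
// Hypothetical helper class; names and the tolerance are illustrative only.
final class AcceptanceTests {

    // Square root check: multiply the answer by itself and compare with x.
    // The relative tolerance is an assumption needed for floating point values.
    static boolean rootIsAcceptable(double x, double root, double tolerance) {
        double error = Math.abs(root * root - x);
        return root >= 0 && error <= tolerance * Math.max(1.0, Math.abs(x));
    }

    // Sort check: a single pass confirming that the items are in sequence.
    static boolean isInSequence(int[] data) {
        for (int i = 1; i < data.length; i++) {
            if (data[i - 1] > data[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(rootIsAcceptable(2.0, Math.sqrt(2.0), 1e-9));
        System.out.println(rootIsAcceptable(9.0, 2.0, 1e-9));
        System.out.println(isInSequence(new int[] {1, 2, 2, 5}));
        System.out.println(isInSequence(new int[] {3, 1, 2}));
    }
}
```

Each check costs far less than the computation it guards, which is what keeps the overhead of an acceptance test low.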
However, this does not guarantee that items have not been lost or created. An acceptance test, therefore, does not normally attempt to ensure the correctness of the software, but instead carries out a check to see whether the results are acceptably good.

Note that if a fault such as division by zero, a protection violation or an array subscript out of range occurs while one of the sort methods is being executed, then this also constitutes the result of a check on the behavior of the software. (These are checks carried out by the hardware or the run-time system.) Thus either software acceptance tests or hardware checks can trigger fault tolerance.

The alternatives

The software components provided as backups must accomplish the same end as the primary module. But they should achieve this by means of a different algorithm so that the same problem doesn’t arise. Ideally the alternatives should be developed by different programmers, so that they are not unwittingly sharing assumptions. The alternatives should also be less complex than the primary, so that they will be less likely to fail. For this reason they will probably be poorer in their performance (speed).

Another approach is to create alternatives that provide an increasingly degraded service. This allows the system to exhibit what is termed graceful degradation. As an example of graceful degradation, consider a steel rolling mill in which a computer controls a machine that chops off the required lengths of steel. Normally the computer employs a sophisticated algorithm to make optimum use of the steel, while satisfying customers’ orders. Should this algorithm fail, a simpler algorithm can be used that processes the orders strictly sequentially. This means that the system will keep going, albeit less efficiently.

Implementation

The language constructs of the recovery block mechanism hide the preservation of variables. The programmer does not need to explicitly declare which variables should be stored and when.
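Java has no recovery block construct, so the following is only a sketch of how the ensure ... by ... else by structure might be imitated by hand, making visible the saving and restoring that the notation would hide. The class RecoveryBlockSketch, the two small interfaces and the method ensure are hypothetical names; superSort is deliberately faulty so that the fallback path is exercised, and the library method Arrays.sort stands in for quickSort. As a simplification, the whole array is copied at the recovery point rather than only the values that change.

```java
import java.util.Arrays;

// Hypothetical imitation of a recovery block; not a language feature.
final class RecoveryBlockSketch {

    interface AcceptanceTest { boolean passes(int[] data); }
    interface Alternative { void run(int[] data); }

    // Tries each alternative in turn. Returns true as soon as one passes the
    // acceptance test, or false ("else error") when every alternative fails.
    static boolean ensure(int[] data, AcceptanceTest test, Alternative... alternatives) {
        int[] recoveryPoint = data.clone();          // dump of the state
        for (Alternative alternative : alternatives) {
            try {
                alternative.run(data);
                if (test.passes(data)) return true;
            } catch (RuntimeException e) {
                // a run-time check (e.g. a bad subscript) also counts as a failure
            }
            // restore the state before trying the next alternative
            System.arraycopy(recoveryPoint, 0, data, 0, data.length);
        }
        return false;
    }

    static boolean dataStillValid(int[] data) {
        for (int i = 1; i < data.length; i++) {
            if (data[i - 1] > data[i]) return false;
        }
        return true;
    }

    // Deliberately faulty primary: corrupts an item, then uses a bad subscript.
    static void superSort(int[] data) {
        data[0] = -1;
        data[data.length] = 0; // throws ArrayIndexOutOfBoundsException
    }

    // Simple insertion sort as the last, least sophisticated alternative.
    static void slowButSureSort(int[] data) {
        for (int i = 1; i < data.length; i++) {
            int item = data[i];
            int j = i - 1;
            while (j >= 0 && data[j] > item) {
                data[j + 1] = data[j];
                j--;
            }
            data[j + 1] = item;
        }
    }

    public static void main(String[] args) {
        int[] data = {5, 3, 9, 1};
        boolean ok = ensure(data,
                RecoveryBlockSketch::dataStillValid,
                RecoveryBlockSketch::superSort,
                Arrays::sort,
                RecoveryBlockSketch::slowButSureSort);
        System.out.println(ok + " " + Arrays.toString(data)); // prints "true [1, 3, 5, 9]"
    }
}
```

Because the state is restored after superSort fails, the corrupted first item does not reach the second alternative.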
The system must save values before any of the alternatives is executed, and restore them should any of the alternatives fail. Although this may seem a formidable task, only the values of variables that are changed need to be preserved, and the notation highlights which ones these are. Variables local to the alternatives need not be stored, nor need parameters passed by value. Only global variables that are changed need to be preserved. Nonetheless, storing data in this manner probably incurs too high an overhead if it is carried out solely by software. Studies indicate that, suitably implemented with hardware assistance, the speed overhead might be no more than about 15%.

No programming language has yet incorporated the recovery block notation. Even so, the idea provides a framework which can be used, in conjunction with any programming language, to structure fault tolerant software.

17.7 ● n-version programming

This form of programming means developing n versions of the same software component. For example, suppose a fly-by-wire airplane has a software component that decides how much the rudder should be moved in response to information about speed, pitch, throttle setting, etc. Three or more versions of the component are implemented and run concurrently. The outputs are compared by a voting module, the majority vote wins and is used to control the rudder (see Figure 17.4).

[Figure 17.4 Triple modular redundancy: the input data is passed to versions 1, 2 and 3, whose outputs go to a voting module that produces the output data]

It is important that the different versions of the component are developed by different teams, using different methods and (preferably) at different locations, so that a minimum of assumptions are shared by the developers. By this means, the modules will use different algorithms, have different mistakes and produce different outputs (if they do) under different circumstances. Thus the chances are that when one of the components fails and produces an incorrect result, the others will perform correctly and the faulty component will be outvoted by the majority.

Clearly the success of an n-programming scheme depends on the degree of independence of the different components. If the majority embody a similar design fault, they will fail together and the wrong decision will be the outcome. Independence is a bold assumption, and some studies have shown a tendency for different developers to commit the same mistakes, probably because of shared misunderstandings of the (same) specification.

The expense of n-programming is in the effort to develop n versions, plus the processing overhead of running the multiple versions. If hardware reliability is also an issue, as in fly-by-wire airplanes, each version runs on a separate (but identical) processor. The voting module is small and simple, consuming minimal developer and processor time. For obvious reasons, an even number of versions is not appropriate.

The main difference between the recovery block and the n-version schemes is that in the former the different versions are executed sequentially (if need be), while in the latter they run concurrently.

Is n-programming forward error recovery or is it backward error recovery? The answer is that, once an error is revealed, the correct behavior is immediately available and the system can continue forwards. So it is forward error recovery.

17.8 ● Assertions

Assertions are statements written into software that say what should be true of the data. Assertions have been used since the early days of programming as an aid to verifying the correctness of software. An assertion states what should always be true at a particular point in a program.

Assertions are usually placed:

■ at the entry to a method – called a precondition, it states what the relationship between the parameters should be
■ at the end of a method – called a postcondition, it states what the relationship between the parameters should be
■ within a loop – called a loop invariant, it states what is always true, before and after each loop iteration, however many iterations the loop has performed
■ at the head of a class – called a class invariant, it states what is always true before and after a call on any of the class’s public methods. The assertion states a relationship between the variables of an instance of the class.

An example should help to show how assertions can be used. Take the example of a class that implements a data structure called a stack. Items can be placed in the data structure by calling the public method push and removed by calling pop. Let us assume that the stack has a fixed length, described by a variable called capacity. Suppose the class uses a variable called count to record how many items are currently in the stack. Then we can make the following assertions at the level of the class. The class invariant is:

    assert count >= 0;
    assert capacity >= count;

These are statements which must always be true for the entire class, before or after any use is made of the class. We can also make assertions for the individual methods. Thus for method push, we can say as a postcondition:

    assert newCount == oldCount + 1;

For the method push, we can also state the following precondition:

    assert oldCount < capacity;

Note that truth of assertions does not guarantee that the software is working correctly. However, if the value of an assertion is false, then there certainly is a fault in the software.
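As a sketch of how these assertions might sit inside a Java class, the fragment below uses the names capacity and count from the text; the class name BoundedStack, the helper invariantHolds and the choice of int items are assumptions for illustration. Only push is shown. Note that Java evaluates assert statements only when assertions are enabled, for example by running with the -ea option.

```java
// Hypothetical fixed-capacity stack of ints carrying the assertions above.
final class BoundedStack {
    private final int[] items;
    private final int capacity;
    private int count = 0;

    BoundedStack(int capacity) {
        this.capacity = capacity;
        this.items = new int[capacity];
        assert invariantHolds();
    }

    // Class invariant from the text: count >= 0 and capacity >= count.
    private boolean invariantHolds() {
        return count >= 0 && capacity >= count;
    }

    void push(int item) {
        assert count < capacity : "precondition: the stack must not be full";
        int oldCount = count;
        items[count] = item;
        count++;
        assert count == oldCount + 1 : "postcondition: count has grown by one";
        assert invariantHolds();
    }

    int count() {
        return count;
    }

    public static void main(String[] args) {
        BoundedStack stack = new BoundedStack(2);
        stack.push(10);
        stack.push(20);
        System.out.println(stack.count()); // prints 2
        try {
            stack.push(30); // violates the precondition
        } catch (AssertionError | ArrayIndexOutOfBoundsException e) {
            // With -ea the precondition fails first; without it, the run-time
            // subscript check is what detects the fault.
            System.out.println("fault detected: " + e.getClass().getSimpleName());
        }
    }
}
```

The final push shows two layers of checking: the assertion when it is enabled, and the run-time system's subscript check when it is not.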
Note also that violation of a precondition means that there is a fault in the user of the method; a violation of a postcondition means a fault in the method itself.

There are two main ways to make use of assertions. One way is to write assertions as comments in a program, to assist in manual verification. On the other hand, as indicated by the notation used above, some programming languages (including Java) allow assertions to be written as part of the language – and their correctness is checked at run-time. If an assertion is found to be false, an exception is thrown. There is something of an argument about whether assertions should be used only during development, or whether they should also be enabled when the software is put into productive use.

SELF-TEST QUESTION 17.9 Write pre- and post-conditions for method pop.

17.9 ● Discussion

Fault tolerance in hardware has long been recognized – and accommodated. Electronic engineers have frequently incorporated redundancy, such as triple modular redundancy, within the design of circuits to provide for hardware failure. Fault tolerance in software has become more widely addressed in the design of computer systems as it has become recognized that it is almost impossible to produce correct software. Exception handling is now supported by all the mainstream software engineering languages – Ada, C++, Visual Basic, C# and Java. This means that designers can provide for failure in an organized manner, rather than in an ad hoc fashion. Particularly in safety-critical systems, either recovery blocks or n-programming is used to cope with design faults and enhance reliability.

Fault tolerance does, of course, cost money. It requires extra design and programming effort, extra memory and extra processing time to check for and handle exceptions. Some applications need greater attention to fault tolerance than others, and safety-critical systems are more likely to merit the extra attention of fault tolerance. However, even software packages that have no safety requirements often need fault tolerance of some kind. For example, we now expect a word processor to perform periodic and automatic saving of the current document, so that recovery can be performed in the event of power failure or software crash. End users are increasingly demanding that the software cleans up properly after failures, rather than leave them with a mess that they cannot salvage. Thus it is likely that ever-increasing attention will be paid to improving the fault tolerance of software.

Summary

Faults in computer systems are caused by hardware failure, software bugs and user error. Software fault tolerance is concerned with:

■ detecting faults
■ assessing damage
■ repairing the damage
■ continuing.

Of these, faults can be detected by both hardware and software. One hardware mechanism for fault detection is protection mechanisms, which have two roles:

1. they limit the spread of damage, thus easing the job of fault tolerance
2. they help find the cause of faults.

Faults can be classified in two categories – anticipated and unanticipated. Recovery mechanisms are of two types:

■ backward – the system returns to an earlier, safe state
■ forward – the system continues onwards from the error.

Anticipated faults can be dealt with by means of forward error recovery. Exception handlers are a convenient programming language facility for coping with these faults.

Unanticipated faults – such as software design faults – can be handled using either of:

■ recovery blocks, a backward error recovery mechanism
■ n-programming, a forward error recovery mechanism.

Assertions are a way of stating assumptions that should be valid when software executes. Automatic checking of assertions can assist debugging.

Exercises

17.1 For each of the computer systems detailed in Appendix A, list the faults that can arise, categorizing them into user errors, hardware faults and software faults. Decide whether each of the faults is anticipated or unanticipated. Suggest how the faults could be dealt with.

17.2 Explain the following terms, giving an example of each to illustrate your answer: fault tolerance, software fault tolerance, reliability, robustness, graceful degradation.

17.3 Consider a programming language with which you are familiar. In what ways can you deliberately (or inadvertently) write a program that will:
    1. crash
    2. access main memory in an undisciplined way
    3. access a file protected from you.
What damage is caused by these actions? How much damage is possible? Assuming you didn’t already know it, is it easy to diagnose the cause of the problem? Contemplate that if it is possible deliberately to penetrate a system, then it is certainly possible to do it by accident, thus jeopardizing the reliability and security of the system.

17.4 “Compile-time checking is better than run-time checking.” Discuss.

17.5 Compare and contrast exception handling with assertions.

17.6 The Java system throws an IndexOutOfBoundsException exception if a program attempts to access elements of an array that lie outside the valid range of subscripts. Write a method that calculates the total weekly rainfall, given an array of floating point numbers (values of the rainfall for each of seven days of the week) as its single parameter. The method should throw an exception of the same type if an array is too short. Write code to catch the exception.

17.7 Outline the structure of recovery block software to cope with the following situation. A fly-by-wire aircraft is controlled by software. A normal algorithm calculates the optimal speed and the appropriate control surface and engine settings. A safety module checks that the calculated values are within safe limits.
If they are not, it invokes an alternative module that calculates some safe values for the settings. If, again, this module fails to suggest safe values, the pilots are alerted and the aircraft reverts to manual control.

17.8 Compare and contrast the recovery block scheme with the n-programming scheme for fault tolerance. Include in your review an assessment of the development times and performance overheads associated with each scheme.

17.9 Searching a table for a desired object is a simple example of a situation in which it can be tempting to use a goto to escape from an unusual situation. Write a piece of program to search a table three ways:
    1. using goto
    2. using exceptions
    3. avoiding both of these.
Compare and contrast the three solutions.

17.10 Consider a program to make a copy of a disk file. Devise a structure for the program that uses exception handlers so that it copes with the following error situations:
    1. the file doesn’t exist (there is no file with the stated name)
    2. there is a hardware fault when reading information from the old file
    3. there is a hardware fault when writing to the new file.
Include in your considerations actions that the filing system (or operating system) needs to take.

17.11 Explain the difference between using a goto statement and using a throw statement. Discuss their relative advantages for dealing with exceptions.

17.12 “There is no such thing as an exceptional situation. The software should explicitly deal with all possible situations.” Discuss.

17.13 Some word processors provide an undo command. Suppose we interpret a user wanting to undo what they have done as a fault, what form of error recovery does the software provide and how is it implemented?

17.14 Examine the architecture and operating system of a computer for which you have documentation. Investigate what facilities are provided for detecting software and hardware faults.
17.15 Compare and contrast approaches to fault tolerance in software with approaches for hardware.

Answers to self-test questions

17.1
    1. unanticipated
    2. unanticipated
    3. unanticipated
    4. anticipated
    5. anticipated

17.2 stack overflow; use of a null pointer

17.3 The module could check that all the items in the new array are in order. (This is not foolproof because the new array could contain different data to the old.)

17.4 Pros: prevent the spread of damage, assist in diagnosing the cause. Cons: expensive hardware and software, reduction in performance (speed).
