REAL-TIME SYSTEMS DESIGN AND ANALYSIS (Part 8)

Other language constructs that may need to be considered include:

- Use of while loops versus for loops or do-while loops.
- When to "unroll" loops, that is, to replace the looping construct with repetitive code, thus saving the loop overhead and giving the compiler the opportunity to use faster, direct, or single-indirect-mode instructions (a brief sketch appears at the end of this section).
- Comparison of variable types and their uses (e.g., when to use a short integer in C versus a Boolean, when to use single-precision versus double-precision floating point, and so forth).
- Use of in-line expansion of code via macros versus procedure calls.

This is by no means an exhaustive list. While good compilers optimize their assembly language output and, in many cases, make the decisions just listed for the programmer, it is important to discover what that optimization is doing to produce the resultant code. For example, compiler output can be affected by optimization for speed, memory and register usage, jumps, and so on, which can lead to inefficient code, timing problems, or critical regions. Thus, real-time systems engineers must be masters of their compilers. That is, at all times the engineer must know what assembly language code will be output for a given high-order language statement. A full understanding of each compiler can only be accomplished by developing a set of test cases to exercise it. The conclusions suggested by these tests can be included in the set of coding standards to foster improved use of the language and, ultimately, improved system performance.

When building real-time systems, no matter which language, bear in mind these rules of thumb:

- Avoid recursion (and other nondeterministic constructs) where possible.
- Avoid unbounded while loops and other temporally unbounded structures.
- Avoid priority inversion situations.
- Avoid overengineering/gold-plating.
- Know your compiler!
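To illustrate the loop-unrolling tradeoff listed above, the following is a minimal, hypothetical C sketch; the routine names and the unrolling factor of four are illustrative assumptions, and whether unrolling actually pays off must be confirmed by inspecting the generated assembly and timing it on the target.

```c
#include <stddef.h>

/* Straightforward loop: incurs loop-control overhead on every iteration. */
void scale_plain(float *v, size_t n, float k)
{
    for (size_t i = 0; i < n; i++) {
        v[i] *= k;
    }
}

/* Manually unrolled by a factor of four: fewer branches and index updates,
   at the cost of larger code and a cleanup loop for the remainder. */
void scale_unrolled(float *v, size_t n, float k)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        v[i]     *= k;
        v[i + 1] *= k;
        v[i + 2] *= k;
        v[i + 3] *= k;
    }
    for (; i < n; i++) {      /* handle the remaining 0-3 elements */
        v[i] *= k;
    }
}
```

Many compilers will unroll the first form automatically at higher optimization levels, which is exactly why the text insists on examining the compiler's output rather than assuming either version is faster.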
6.6 CODING STANDARDS

Coding standards are different from language standards. A language standard, for example, ANSI C, embodies the syntactic rules of the language. A program violating those rules will be rejected by the compiler. Conversely, a coding standard is a set of stylistic conventions. Violating the conventions will not lead to compiler rejection. In another sense, compliance with language standards is mandatory, while compliance with coding standards is voluntary.

Adhering to language standards fosters portability across different compilers and, hence, hardware environments. Complying with coding standards will not foster portability, but rather, in many cases, readability and maintainability. Some even contend that the use of coding standards can increase reliability. Coding standards may also be used to foster improved performance by encouraging or mandating the use of language constructs that are known to generate more efficient code. Many agile methodologies, for example, eXtreme Programming, embrace coding standards.

Coding standards involve standardizing some or all of the following elements of programming language use:

- Header format.
- Frequency, length, and style of comments.
- Naming of classes, methods, procedures, variable names, data, file names, and so forth.
- Formatting of program source code, including use of white space and indentation.
- Size limitations on code units, including maximum and minimum lines of code, and number of methods.
- Rules about the choice of language construct to be used; for example, when to use case statements instead of nested if-then-else statements.

While it is unclear whether conforming to these rules fosters improvement in reliability, close adherence clearly can make programs easier to read and understand, and likely more reusable and maintainable.

There are many different standards for coding that are language independent or language specific. Coding standards can be team-wide, company-wide, or user-group specific (for example, the GNU software group has standards for C and C++), or customers can require conformance to a specific standard that they own. Still other standards have come into the public domain. One example is the Hungarian notation standard, named in honor of Charles Simonyi, who is credited with first promulgating its use. Hungarian notation is a public domain standard intended to be used with object-oriented languages, particularly C++. The standard uses a complex naming scheme to embed type information about the objects, methods, attributes, and variables in the name. Because the standard essentially provides a set of rules about naming variables, it can be and has been used with other languages, such as C++, Ada, Java, and even C. Another example is in Java, which, by convention, uses all uppercase for constants such as PI and E. Further, some classes use a trailing underscore to distinguish an attribute like x_ from a method like x().

One problem with standards like Hungarian notation is that they can create mangled variable names, in that they direct focus on how to name in Hungarian rather than on a meaningful name for the variable's use in the code. In other words, the desire to conform to the standard may not result in a particularly meaningful variable name. Another problem is that the very strength of a coding standard can be its own undoing. For example, in Hungarian notation, what if the type information embedded in the object name is, in fact, wrong? There is no way for a compiler to check this. There are commercial rules wizards, reminiscent of lint, that can be tuned to enforce coding standards, but they must be programmed to work in conjunction with the compiler.

Finally, adoption of coding standards is not recommended midproject. It is much easier to start conforming than to be required to change existing code to comply. The decision to use coding standards is an organizational one that requires significant forethought and debate.
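As a purely illustrative sketch of such naming conventions, a fragment of a Hungarian-style coding standard might look like the following in C; the specific prefixes, type names, and field names are assumptions for illustration only, not prescriptions from the text.

```c
#include <stdbool.h>

/* Hypothetical Hungarian-style prefixes; a real coding standard would
   define these exactly:
     n...  integer count      b...  Boolean flag
     d...  double             sz... zero-terminated string            */

#define MAX_NAME_LEN 32

typedef struct {
    int    nSampleCount;             /* number of samples collected    */
    double dSamplePeriodSec;         /* sampling period in seconds     */
    bool   bOverrun;                 /* set when a deadline was missed */
    char   szTaskName[MAX_NAME_LEN]; /* task name, zero terminated     */
} SensorTask;

/* Note how the type prefix can silently become wrong: if nSampleCount is
   later changed to a long, nothing forces the name to be updated, which
   is exactly the weakness discussed above. */
```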
6.7 EXERCISES

6.1 Which of the languages discussed in this chapter provide for some sort of goto statement? Does the goto statement affect performance? If so, how?

6.2 It can be argued that in some cases there exists an apparent conflict between good software engineering techniques and real-time performance. Consider the relative merits of recursive program design versus iterative techniques, and the use of global variables versus parameter lists. Using these topics and an appropriate programming language for examples, compare and contrast real-time performance versus good software engineering practices as you understand them.

6.3 What other compiler options are available for your compiler and what do they do?

6.4 In the object-oriented language of your choice, design and code an "image" class that might be useful across a wide range of projects. Be sure to follow the best principles of object-oriented design.

6.5 In a procedural language of your choice, develop an abstract data type called "image" with associated functions. Be sure to follow the principle of information hiding.

6.6 Write a set of coding standards for use with any of the real-time applications introduced in Chapter 1 for the programming language of your choice. Document the rationale for each provision of the coding standard.

6.7 Develop a set of tests to exercise a compiler to determine the best use of the language in a real-time processing environment. For example, your tests should determine such things as when to use case statements versus nested if-then-else statements; when to use integers versus Boolean variables for conditional branching; whether to use while or for loops, and when; and so on.

6.8 How can misuse or misunderstanding of a software technology impede a software project? Consider, for example, writing structured C code instead of classes in C++, or reinventing a tool for each project instead of using a standard one.

6.9 Compare how Ada95 and Java handle the goto statement. What does this indicate about the design principles or philosophy of each language?

6.10 Java has been compared to Ada95 in terms of hype and "unification." Defend or refute the arguments against this comparison.

6.11 Are there language features that are exclusive to C/C++? Do these features provide any advantage or disadvantage in embedded environments?

6.12 What programming restrictions should be used in a programming language to permit the analysis of real-time applications?


7 PERFORMANCE ANALYSIS AND OPTIMIZATION

(Some of this chapter has been adapted from Phillip A. Laplante, Software Engineering for Image Processing, CRC Press, Boca Raton, FL, 2003.)

7.1 THEORETICAL PRELIMINARIES

Of all the places where theory and practice never seem to coincide, none is more obvious than in performance analysis. For all the well-written and well-meaning research on real-time performance analysis, those who have built real systems know that practical reality has the annoying habit of getting in the way of theoretical results. Neat little formulas that ignore resource contention, assume theoretically artificial hardware, or assume zero context-switch time are good as abstract art, but of little practical use. These observations, however, do not mean that theoretical analysis is useless or that there are no useful theoretical results. It only means that there are far fewer realistic, cookbook approaches than might be desired.

7.1.1 NP-Completeness

The complexity class P is the class of problems that can be solved by an algorithm that runs in polynomial time on a deterministic machine. The complexity class NP is the class of problems that can be solved in polynomial time by a nondeterministic machine; equivalently, these are the problems for which a candidate solution can be verified to be correct by a polynomial-time algorithm, even though no polynomial-time deterministic algorithm for finding a solution may be known. A decision or recognition problem is NP-complete if it is in the class NP and all other problems in NP are polynomially transformable to it. A problem is NP-hard if all problems in NP are polynomially transformable to that problem, but it has not been shown that the problem is in the class NP.

The Boolean Satisfiability Problem, for example, which arose during requirements consistency checking in Chapter 4, is NP-complete. NP-complete problems tend to be those relating to resource allocation, which is exactly the situation that occurs in real-time scheduling. This fact does not bode well for the solution of real-time scheduling problems.
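To make the "easy to verify, hard to solve" distinction concrete, consider the following minimal, hypothetical C sketch (the data layout is an illustrative assumption): given a formula in conjunctive normal form and a candidate truth assignment, checking the assignment takes time linear in the size of the formula, whereas all known general methods for finding a satisfying assignment take exponential time in the worst case.

```c
#include <stdbool.h>
#include <stdlib.h>

/* A CNF formula: each clause is an array of literals, where literal +k
   means variable k and literal -k means the negation of variable k. */
typedef struct {
    int *lits;      /* literals in this clause          */
    int  n_lits;    /* number of literals in the clause */
} Clause;

/* Verify a candidate assignment in time linear in the number of literals;
   assignment[k] is the proposed truth value of variable k (1-based). */
bool satisfies(const Clause *clauses, int n_clauses, const bool *assignment)
{
    for (int c = 0; c < n_clauses; c++) {
        bool clause_true = false;
        for (int i = 0; i < clauses[c].n_lits; i++) {
            int  lit = clauses[c].lits[i];
            bool val = assignment[abs(lit)];
            if ((lit > 0 && val) || (lit < 0 && !val)) {
                clause_true = true;   /* one true literal satisfies the clause */
                break;
            }
        }
        if (!clause_true)
            return false;             /* one unsatisfied clause falsifies the formula */
    }
    return true;
}
```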
7.1.2 Challenges in Analyzing Real-Time Systems

The challenges in finding workable solutions for real-time scheduling problems can be seen in more than 30 years of real-time systems research. Unfortunately, most important problems in real-time scheduling either require excessive practical constraints to be solved or are NP-complete or NP-hard. Here is a sampling from the literature, as summarized in [Stankovic95]:

1. When there are mutual exclusion constraints, it is impossible to find a totally on-line optimal run-time scheduler.
2. The problem of deciding whether it is possible to schedule a set of periodic processes that use semaphores only to enforce mutual exclusion is NP-hard.
3. The multiprocessor scheduling problem with two processors, no resources, arbitrary partial-order relations, and every task having unit computation time is polynomial. A partial-order relation indicates that any process can call itself (reflexivity); that if process A calls process B, then the reverse is not possible (antisymmetry); and that if process A calls process B and process B calls process C, then process A can call process C (transitivity).
4. The multiprocessor scheduling problem with two processors, no resources, independent tasks, and arbitrary computation times is NP-complete.
5. The multiprocessor scheduling problem with two processors, no resources, independent tasks, arbitrary partial order, and task computation times of either 1 or 2 units of time is NP-complete.
6. The multiprocessor scheduling problem with two processors, one resource, a forest partial order (partial order on each processor), and each computation time of every task equal to 1 is NP-complete.
7. The multiprocessor scheduling problem with three or more processors, one resource, all independent tasks, and each computation time of every task equal to 1 is NP-complete.
8. Earliest deadline scheduling is not optimal in the multiprocessing case.
9. For two or more processors, no deadline scheduling algorithm can be optimal without complete a priori knowledge of deadlines, computation times, and task start times.

It turns out that most multiprocessor scheduling problems are in NP, but for deterministic scheduling this is not a major problem because a polynomial scheduling algorithm can be used to develop an optimal schedule if the specific problem is not NP-complete [Stankovic95]. In these cases, alternative, off-line heuristic search techniques can be used. These off-line techniques usually only need to find feasible schedules, not optimal ones. But this is what engineers do when workable theories do not exist: engineering judgment must prevail.

7.1.3 The Halting Problem

The Halting Problem, simply stated, asks: does there exist a computer program that takes an arbitrary program, P_i, and an arbitrary set of inputs, I_j, and determines whether or not P_i will halt on I_j (Figure 7.1)? The question of the existence of such an oracle is more than a theoretical exercise; it has important implications in the development of process monitors, program verification, and schedulability analysis. Unfortunately, such an oracle cannot be built [1]. Thus the Halting Problem is unsolvable. There are several ways to demonstrate this surprising fact.

Figure 7.1 A graphical depiction of the Halting Problem: an oracle accepts an arbitrary program P_i, as source code, and a set of inputs I_j, and produces a halt or no-halt decision.

[1] Strictly speaking, such an oracle can be built if it is restricted to a computer with fixed-size memory since, eventually, a maximum finite set of inputs would be reached, and hence the table (Table 7.1) could be completed.
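One informal way to see the difficulty, before the diagonal argument given next, is the classic self-reference paradox, sketched here as a hypothetical C fragment: if a perfect halts() oracle existed, it could be used to build a program whose behavior contradicts the oracle's own answer. The functions below are assumptions for the sake of argument, not real library routines.

```c
#include <stdbool.h>

/* Hypothetical oracle: returns true if the program whose source is given
   halts when run on the given input. The argument below shows that no
   such function can exist in general. */
bool halts(const char *program_source, const char *input);

/* A program that feeds its own source to the oracle and then does the
   opposite of whatever the oracle predicts. */
void contrary(const char *own_source)
{
    if (halts(own_source, own_source)) {
        for (;;) { /* loop forever */ }
    }
    /* otherwise halt immediately */
}

/* Running contrary on its own source is contradictory either way:
   if halts() says it halts, it loops forever; if halts() says it loops,
   it halts. Hence the assumed oracle cannot exist. */
```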
A more formal way uses Cantor's diagonal argument, first used to show that the real numbers are not countable. It should be clear that every possible program, in any computer language, can be encoded using a numbering scheme in which each program is represented as the binary expansion of its concatenated source-code bytes. The same encoding can be used for each input set. Then, if the proposed oracle could be built, its behavior could be described in tabular form as in Table 7.1. That is, for each program P_i and each input set I_j it would simply have to determine whether program P_i halts on I_j. Such an oracle would have to account for every conceivable program and input set. In Table 7.1, the ↑ symbol indicates that the program does not halt and the ↓ symbol indicates that the program halts on the corresponding input.

Table 7.1 Diagonalization argument to show that no oracle can be constructed to solve the Halting Problem. (The table lists programs P_1, P_2, ..., P_n against input sets I_1, I_2, ..., I_n; each entry is ↓ if the program halts on that input and ↑ if it does not, and a new program P* is constructed to differ from each P_i at the diagonal entry.)

However, the table is always incomplete, in that a new program P* can be found that differs from every other program in at least the entry on the diagonal. Even with the addition of the new program P*, the table cannot be completed, because yet another P* can be added that differs from every other program by the same construction.

To see the relevance of the Halting Problem to real-time systems, suppose a schedulability analyzer is to take an arbitrary program and the set of all possible inputs to that program and determine the best-, worst-, and average-case execution times for that program (Figure 7.2). A model of the underlying machine is also needed, but this can be incorporated as part of the input set. It is easy to see that this is a manifestation of the Halting Problem, since in order to determine the running time, the analyzer must know when (and hence, whether) the program stops. While it is true that, given a program in a specific language and a fixed set of inputs, the execution times can be found, such running times can be determined only through heuristic techniques that are not generalizable; that is, they could not work for an arbitrary and dynamic set of programs.

Figure 7.2 A schedulability analyzer whose behavior is related to the Halting Problem: it accepts program source code and a model of the target computer system, and produces best-, worst-, and average-case execution times.

The Halting Problem also has implications in process monitoring (for example, is a process deadlocked or simply waiting?) and in the theory of recursive programs (for example, will a recursive program ever finish calling itself?).
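As a concrete, hypothetical illustration of why general execution-time analysis runs into the Halting Problem, consider the following C fragment (an illustrative example, with names chosen here): its loop bound depends on the input in a way that no one currently knows how to bound for all inputs, so no general tool could report its worst-case execution time without, in effect, deciding whether it always halts.

```c
#include <stdint.h>

/* Iterate the Collatz rule until the value reaches 1 and return the number
   of steps taken. Whether this loop terminates for every positive starting
   value is a famous open question, so its worst-case execution time cannot
   be established by any known general method. (Possible overflow of 3n + 1
   is ignored for simplicity.) */
uint32_t collatz_steps(uint64_t n)
{
    uint32_t steps = 0;
    while (n != 1) {
        n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
        steps++;
    }
    return steps;
}
```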
7.1.4 Amdahl's Law

Amdahl's Law is a statement regarding the level of parallelization that can be achieved by a parallel computer [Amdahl67] [2]. Amdahl's Law states that, for a constant problem size, the incremental speedup gained by adding processing elements approaches zero as the number of processing elements grows. It expresses a limit of parallelism in terms of speedup as a software property, not a hardware one.

Formally, let n be the number of processors available for parallel processing, and let s be the fraction of the code that is of a serial nature only, that is, it cannot be parallelized. A simple reason why a portion of code cannot be parallelized would be a sequence of operations, each depending on the result of the previous operation. Clearly (1 - s) is the fraction of code that can be parallelized. The speedup is then the ratio of the execution time before allocating the parallelizable code to the n processors to the execution time afterwards. That is,

    Speedup = [s + (1 - s)] / [s + (1 - s)/n]
            = 1 / [s + (1 - s)/n]
            = n / [ns + (1 - s)]

Hence,

    Speedup = n / [1 + (n - 1)s]                                      (7.1)

Clearly, for s = 0 linear speedup can be obtained as a function of the number of processors, but for s > 0 perfect speedup is not possible because of the sequential component.

Amdahl's Law is frequently cited as an argument against parallel systems and massively parallel processors. For example, it is frequently suggested that "there will always be a part of the computation which is inherently sequential, [and that] no matter how much you speed up the remaining 90 percent, the computation as a whole will never speed up by more than a factor of 10. The processors working on the 90 percent that can be done in parallel will end up waiting for the single processor to finish the sequential 10 percent of the task" [Hillis98]. But the argument is flawed. One underlying assumption of Amdahl's Law is that the problem size is constant, so that at some point there is a diminishing margin of return for speeding up the computation. Problem sizes, however, tend to scale with the size of a parallel system: parallel systems with more processors are used to solve very large problems in science and mathematics.

Amdahl's Law stymied the field of parallel and massively parallel computers, posing a seemingly insoluble problem that limited the efficiency and application of parallelism to different problems. Skeptics of parallelism took Amdahl's Law as the insurmountable bottleneck to any kind of practical parallelism, which ultimately had an impact on real-time systems. However, later research provided new insights into Amdahl's Law and its relation to parallelism.

[2] Some of the following two sections have been adapted from Gilreath, W. and Laplante, P., Computer Architecture: A Minimalist Perspective, Kluwer Academic Publishers, Dordrecht, The Netherlands, 2003 [Gilreath03].

7.1.5 Gustafson's Law

Gustafson demonstrated with a 1024-processor system that the basic presumptions in Amdahl's Law are inappropriate for massive parallelism [Gustafson88]. Gustafson found the fixed-problem-size assumption inappropriate because, in practice, "the problem size scales with the number of processors, or, with a more powerful processor, the problem expands to make use of the increased facilities" [Gustafson88]. Gustafson's empirical results demonstrated that the parallel or vector part of a program scales with the problem size. Times for vector start-up, program loading, serial bottlenecks, and I/O, which make up the serial component of the run, do not grow with the problem size [Gustafson88]. Gustafson formulated that if s is the serial time and p = (1 - s) is the parallel time on a parallel system with n processors, then a serial processor would require the time

    s + p · n                                                         (7.2)

Comparing the plots of Equations 7.1 and 7.2 in Figure 7.3, it can be seen that Gustafson presents a much more optimistic picture of speedup due to parallelism than does Amdahl.
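The contrast between the two laws can be made concrete with a small computation. The following hypothetical C sketch (the sample values of s and n are arbitrary assumptions) evaluates Equation 7.1 alongside the scaled speedup implied by Equation 7.2, namely s + (1 - s)n, which equals n - (n - 1)s.

```c
#include <stdio.h>

/* Fixed-size speedup from Amdahl's Law, Equation 7.1. */
static double amdahl_speedup(double s, double n)
{
    return n / (1.0 + (n - 1.0) * s);
}

/* Scaled speedup implied by Gustafson's observation (Equation 7.2):
   the serial run would take s + (1 - s) * n, while the parallel run takes 1. */
static double gustafson_speedup(double s, double n)
{
    return s + (1.0 - s) * n;   /* equivalently n - (n - 1) * s */
}

int main(void)
{
    const double s = 0.10;      /* assume a 10% serial fraction */
    const double procs[] = { 2.0, 8.0, 32.0, 128.0, 1024.0 };
    const int count = sizeof procs / sizeof procs[0];

    printf("%8s %12s %14s\n", "n", "Amdahl", "Gustafson");
    for (int i = 0; i < count; i++) {
        printf("%8.0f %12.2f %14.2f\n", procs[i],
               amdahl_speedup(s, procs[i]), gustafson_speedup(s, procs[i]));
    }
    return 0;
}
```

With s = 0.10, Amdahl's Law caps the speedup near 10 no matter how large n becomes, while the scaled speedup keeps growing with n, which is the essence of Gustafson's more optimistic picture.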
Unlike the curve for Amdahl's Law, Gustafson's Law is a simple line, "one with a much more moderate slope: 1 - n. It is thus much easier to achieve parallel performance than is implied by Amdahl's paradigm" [Gustafson88].

A different perspective on the flaw in Amdahl's Law is the observation that "a more efficient way to use a parallel computer is to have each processor perform similar work, but on a different section of the data. Where large computations are concerned this method works surprisingly well" [Hillis98]. Doing the same task but on a different range of data circumvents an underlying presumption of Amdahl's Law, that is, "the assumption that a fixed portion of the computation must be sequential. This estimate sounds plausible, but it turns out not to be true of most computations" [Hillis98].
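The data-parallel style Hillis describes can be sketched in C with OpenMP, assuming an OpenMP-capable compiler (e.g., one accepting a flag such as -fopenmp); this is an illustrative sketch, not an example from the text. Each processor applies the same operation to its own slice of the array, so the serial fraction is limited to setup and the final combination of partial results.

```c
#include <stdio.h>

#define N 1000000

/* Each thread sums a different section of the same array; the only serial
   work is initialization and combining the per-thread partial sums, which
   OpenMP's reduction clause performs. Without OpenMP the code still runs,
   just sequentially. */
int main(void)
{
    static double data[N];
    double sum = 0.0;

    for (long i = 0; i < N; i++) {        /* serial setup */
        data[i] = (double)i * 0.5;
    }

    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < N; i++) {        /* identical work, different data */
        sum += data[i];
    }

    printf("sum = %f\n", sum);
    return 0;
}
```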
