Model-Based Design for Embedded Systems- P6 pptx

Nicolescu/Model-Based Design for Embedded Systems, 67842_C002, Finals, 2009-10-13

2 SystemC-Based Performance Analysis of Embedded Systems

Jürgen Schnerr, Oliver Bringmann, Matthias Krause, Alexander Viehl, and Wolfgang Rosenstiel

CONTENTS
2.1 Introduction
2.2 Performance Analysis of Distributed Embedded Systems
    2.2.1 Analytical Approaches
    2.2.2 Simulative Approaches
    2.2.3 Hybrid Approaches
2.3 Transaction-Level Modeling
    2.3.1 Accuracy and Speed Trade-Off during Refinement Process
        2.3.1.1 Communication Refinement
        2.3.1.2 Computation Refinement of Software Applications
2.4 Proposed Hybrid Approach for Accurate Software Timing Simulation
    2.4.1 Back-Annotation of WCET/BCET Values
    2.4.2 Annotation of SystemC Code
    2.4.3 Static Cycle Calculation of a Basic Block
    2.4.4 Modeling of Pipeline for a Basic Block
        2.4.4.1 Modeling with the Help of Reservation Tables
        2.4.4.2 Calculation of Pipeline Overlapping
    2.4.5 Dynamic Correction of Cycle Prediction
        2.4.5.1 Branch Prediction
        2.4.5.2 Instruction Cache
        2.4.5.3 Cache Model
        2.4.5.4 Cache Analysis Blocks
        2.4.5.5 Cycle Calculation Code
    2.4.6 Consideration of Task Switches
    2.4.7 Preemption of Software Tasks
2.5 Experimental Results
2.6 Outlook
2.7 Conclusions
References

This chapter presents a methodology for SystemC-based performance analysis of embedded systems. This methodology is based on a cycle-accurate simulation approach for the embedded software that also allows the integration of abstract SystemC models.
Compared to existing simulation-based approaches, a hybrid method is presented that resolves performance issues by combining the advantages of simulation-based and analytical approaches. In the first step, cycle-accurate static execution time analysis is applied at each basic block of a cross-compiled binary program using static processor models. After that, the determined timing information is back-annotated into SystemC for a fast simulation of all effects that cannot be resolved statically. This allows the consideration of data dependencies during runtime, and the incorporation of branch prediction and cache models by efficient source-code instrumentation. The major benefit of our approach is that the generated code can be executed very efficiently on the simulation host with approximately 90% of the speed of the untimed software without any code instrumentation.

2.1 Introduction

In the future, new system functionality will be realized less by the sum of single components than by the cooperation, interconnection, and distribution of these components, thereby leading to distributed embedded systems. Furthermore, new applications and innovations arise more and more from a distribution of functionality as well as from a combination of previously independent functions. Therefore, in the future, this distribution will play an important part in the increase of the product value.

The system responsibility of the supplier is also currently increasing. This is because the supplier is not only responsible for the designed subsystem, but additionally for the integration of the subsystem in the context of the entire system. This integration is becoming more complex: today, requirements of single components are validated; in the future, the requirements validation of the entire system has to be achieved with regard to the designed component.
What this means is that changes in the product area will lead to a paradigm shift in the design. Even in the design stage, the impact of a component on an entire system has to be considered. A comprehensive modeling of distributed systems, and an early analysis and simulation of the system integration have to be considered. Therefore, a methodical design process of distributed embedded systems has to be established, taking into account the timing behavior of the embedded software very early in the design process.

This methodical design process can be implemented by using a comprehensive modeling of distributed systems and by using a platform-independent development of the application software (UML [6], MATLAB®/Simulink® [24], and C++). What is also important is the early inclusion of the intended target platform in the model-based system design (UML), the mapping of function blocks on platform components, and the use of virtual prototypes for the abstract modeling of the target architecture.

An early evaluation of the target platform means that the application software can be evaluated while considering the target platform. Hence, an optimization of the target platform under consideration of the application software, performance requirements, power dissipation, and reliability can take place. An early analysis of the system integration is provided by an early verification and exposure of integration faults using virtual prototypes. After that, a seamless transition to the physical prototype can take place.

2.2 Performance Analysis of Distributed Embedded Systems

The main question of performance analysis of distributed embedded systems is: What is the global timing behavior of a system and how can it be determined?
The central issue is that computation has no timing behavior as long as the target platform is not known because the target platform has a major effect on timing. The specification, however, can contain global performance requirements. The fulfillment of these requirements depends on local timing behaviors of system parts. A solution for determining local timing properties is an early inclusion of the target architecture. Several analytical and simulative approaches for performance analysis have previously been proposed. In this chapter, a hybrid approach for performance analysis will be presented.

2.2.1 Analytical Approaches

Analytical approaches perform a formal analysis of pessimistic corner cases based on a system model. Corner cases are hard bounds of the temporal system behavior. The approaches can be divided into two categories: black-box approaches and white-box approaches. Furthermore, both approaches can be categorized depending on the level of system abstraction and with regard to the model of computation that is employed.

Black-box approaches consider functional system components as black boxes and abstract from their internal behavior. Black-box abstraction commonly uses a task model [33] with abstract task activation and event streams representing activation patterns [34] at the task level. Using event stream propagation, fixed points are calculated. For this, no modification of the event streams is necessary. Examples of black-box approaches are the real-time calculus (see Chapter 1 or [44]), the system-level composition by event stream propagation as it is used in SymTA/S (see Chapter 3 or [11]), the MAST framework [9], and the framework proposed by Pop et al. [31].

White-box approaches include an abstract control-flow representation of each process within the system model.
Then, a global performance and communication analysis considering (data-dependent) control structures of all processes can take place. For this analysis, an extraction of the control flow from the application software or from UML models [47] is required. Then, the environment can be modeled using event models or processes. Examples of white-box approaches are the communication dependency analysis [41], the control-flow-based extraction of hierarchical event streams [1], and timed automata [27].

Analytical approaches that rely only on best-case and worst-case timing estimates are very often too pessimistic; hence, risk estimation for concrete scenarios is difficult to carry out. Different probabilistic analytic approaches attempt to tackle this issue by considering probabilities of timing quantities in white-box system analysis.

Timed Petri nets [49] are able to represent the internal behavior of a system. Although stochastic extensions exist in the form of generalized stochastic Petri nets (GSPN) [23], these do not consider execution times of the actual system components. Furthermore, synchronization by communication and the specification of communication protocols have to be modeled explicitly and cannot be extracted from executable functional implementations of a design.

System-level performance and power estimation based on stochastic automata networks (SAN) are introduced in [22]. The system, including probabilities of execution times, is modeled explicitly in SAN. The actual execution behavior of the components related to timing and control flow of a functional implementation is not considered. Stochastic automata [3] extend the model of communicating I/O automata [42] by general probability distributions for verifying performance requirements of systems. The system and timing probabilities have to be modeled explicitly and no bottom-up evaluation of a functional system implementation is given.
2.2.2 Simulative Approaches

Simulative approaches perform a simulation of the entire communication infrastructure and the processing elements. If necessary, this simulation includes a hardware IP. Depending on the underlying model of computation, a network simulator such as OPNET [28], Simulink, or SystemC [14] can be employed to simulate a network between communicating C/C++ processes. Timing annotation of such a network simulation is possible, but the exact timing behavior of the software is missing.

To obtain this timing behavior, it is necessary to simulate the software execution on the target processor. For this simulation, the binary code for the target platform component is required. This binary code can run on an instruction set simulator (ISS). An ISS is an abstract model for executing instructions at the binary level and can be implemented either as an interpreter or as a binary code translator. It does not consider modeling of the bus behavior. The binary code translation can be realized in two different ways: either as a static or as a dynamic compilation, also called just-in-time (JIT) compilation [26]. An ISS is used in several commercial solutions, like the CoWare Processor Designer [5], CoMET from VaST Systems Technology [45], or Synopsys Virtual Platforms [43].

Furthermore, the binary code can be executed using a processor model that captures the complete processor (functional units, pipelines, caches, registers, counters, I/Os, etc.). Such a model can have several levels of accuracy. For example, it can be a transaction-level model or a register transfer model. Since our approach uses transaction-level modeling (TLM), we will describe the different levels of abstraction of TLM models in more detail in Section 2.3.
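As a rough illustration of the interpreter-style ISS mentioned above, the following sketch fetches, decodes, and executes a toy program one instruction at a time while counting cycles. The instruction set, the cycle costs, and all names (`Op`, `Insn`, `Cpu`, `run`, `sum_program`) are invented for this example; they do not correspond to any real processor or to the commercial tools cited in the text.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy instruction set (an assumption of this sketch).
enum class Op { LoadImm, Add, SubImm, BranchNotZero, Halt };

struct Insn {
    Op op;
    std::size_t a;   // destination register, or tested register for a branch
    std::int32_t b;  // immediate value, source register, or branch target
};

struct Cpu {
    std::array<std::int32_t, 4> reg{};  // four general-purpose registers
    std::size_t pc = 0;                 // index of the next instruction
    std::uint64_t cycles = 0;           // accumulated cycle count
};

// Interpreter loop. Assumed timing: 1 cycle per instruction, plus 1 extra
// cycle for a taken branch. Register indices are not range-checked.
inline Cpu run(const std::vector<Insn>& program) {
    Cpu cpu;
    while (cpu.pc < program.size()) {
        const Insn& i = program[cpu.pc];
        ++cpu.cycles;
        switch (i.op) {
        case Op::LoadImm: cpu.reg[i.a] = i.b; ++cpu.pc; break;
        case Op::Add:
            cpu.reg[i.a] += cpu.reg[static_cast<std::size_t>(i.b)];
            ++cpu.pc;
            break;
        case Op::SubImm: cpu.reg[i.a] -= i.b; ++cpu.pc; break;
        case Op::BranchNotZero:
            if (cpu.reg[i.a] != 0) {
                cpu.pc = static_cast<std::size_t>(i.b);
                ++cpu.cycles;  // taken-branch penalty (assumed)
            } else {
                ++cpu.pc;
            }
            break;
        case Op::Halt: return cpu;
        }
    }
    return cpu;
}

// Example program: r0 = 3 + 2 + 1, using r1 as a down-counting loop variable.
inline std::vector<Insn> sum_program() {
    return std::vector<Insn>{
        {Op::LoadImm, 0, 0},        // 0: r0 = 0
        {Op::LoadImm, 1, 3},        // 1: r1 = 3
        {Op::Add, 0, 1},            // 2: r0 += r1
        {Op::SubImm, 1, 1},         // 3: r1 -= 1
        {Op::BranchNotZero, 1, 2},  // 4: if (r1 != 0) goto 2
        {Op::Halt, 0, 0},           // 5: stop
    };
}
```

Calling `run(sum_program())` leaves 6 in register 0 after 14 cycles under the assumed costs (12 executed instructions plus 2 taken branches). Every instruction passes through the decode `switch` on each execution, which is the repeated interpretation overhead that binary translation and the hybrid approach of this chapter try to avoid.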
In addition to simulating the processor, peripheral components and custom hardware have to be simulated as well, either by a co-simulation with HDL (hardware description language) simulators or by using SystemC.

An abstract processor model with an integrated RTOS (real-time operating system) model using task scheduling was presented in [35]. Additionally, a processor model using neural networks for execution-cycle estimation was presented in [30]. A transaction-level approach for the performance evaluation of SoC (System-on-Chip) architectures was presented in [48]. This approach is trace-based and, therefore, cannot guarantee a sufficient path coverage of control-flow-dominated applications.

Furthermore, the integration of a so-called cycle-approximate retargetable processor model for software performance estimation at the transaction level was presented in [13]. The major drawback of this approach is that microarchitecture-dependent properties are measured on the target platform and are included probabilistically during execution. The comparably low deviation from on-board measurements of only 8% results from the fact that the reference measurements used the same examples and input data that the models were built from. It is likely that data-dependent effects will lead to larger accuracy errors.

2.2.3 Hybrid Approaches

Hybrid approaches combine the advantages of analytical and simulative approaches. A hybrid approach for combining simulation and formal analysis for tightening bounds of system-level performance analysis was presented in [20]. The objectives are to determine timing characteristics of nonformally specified components by simulation and to integrate simulation results into a framework for formal performance analysis. In comparison to the approach shown in [20], we focus on a fast timing simulation of the embedded software.
The results determined using our approach may be included in system-level performance methodologies with the benefit of high accuracy and large time savings in the simulation stage.

Analytic performance risk quantification based on profiled execution times is presented in [46]. The model is derived from physical implementations. Although it is able to represent the temporal behavior of communication, computation, and synchronization, data-dependent timing effects cannot be detected reliably.

A hybrid model for fast simulation that allows switching between native code execution and ISS-based simulation was presented in [17]. Another approach using a hybrid model was shown in [38] and [36]. This approach is based on the translation of an object code into an annotated binary code for the target processor. For the cycle-accurate execution of the annotated code on this processor, special hardware is needed.

2.3 Transaction-Level Modeling

TLM is a high-level approach to model systems where computation and communication between system modules are separated for each module of the proposed target architecture. Components that are described at different levels of abstraction can be integrated and exchanged in one common system model using standardized interfaces. Furthermore, an exploration and a refinement of components and their implementation in the global architecture can be performed.

Transaction-level models address the problem of designing increasingly complex systems by raising the level of design abstraction above the register transfer level (RTL). The Open SystemC Initiative (OSCI) Transaction-Level Working Group has defined different levels of abstraction. Of these abstraction levels, transaction-level models apply at the levels between the Algorithmic Level (AL) and the RTL.
These levels are introduced in [2] and are also briefly presented here.

• Algorithmic Level (AL): Purely behavioral, no architectural detail whatsoever.
• Untimed (UT) Modeling: A notion of simulation time is not required; each process runs up to the next explicit synchronization point before yielding.
• Loosely Timed (LT) Modeling: The simulation time is used, but processes are temporally decoupled from the simulation time. Each process keeps a tally of the time it consumes, and may yield because it reaches an explicit synchronization point or because it has consumed its time quantum.
• Approximately Timed (AT) Modeling: Processes run in lockstep with the SystemC simulation time. Delays of process interactions are annotated by using timeouts (wait) or timed event notifications.
• Register Transfer Level (RTL): Describes the registers and the combinational logic.

2.3.1 Accuracy and Speed Trade-Off during Refinement Process

The proposed approach allows for an early incorporation of the effects of the underlying target platform into the embedded software design. Platform architectures are not limited to single-core processors with simple communication architectures. The approach also applies to multi-core architectures and distributed embedded systems with complex network architectures, for instance, networks of interconnected electronic control units (ECUs) in the automotive domain. This flexibility requires a seamless refinement flow for the embedded software, beginning at the platform-independent software and going down to the platform-specific target software. By stepwise refinement of the system model, a design at lower levels of abstraction can be obtained, where the simulation is more accurate at the expense of increased simulation time.
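The difference between the loosely timed style and lockstep execution in the list of abstraction levels above can be made concrete without SystemC itself. The sketch below is a plain C++ stand-in: `Kernel`, `LooselyTimedProcess`, and `simulate` are hypothetical names written for this example and are not part of the SystemC or TLM libraries. The process tallies consumed time locally and hands control back to the kernel only when the tally reaches its quantum or at an explicit synchronization.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-in for a simulation kernel: tracks global time and how often a
// process yielded control (each yield costs simulator context switches).
struct Kernel {
    std::uint64_t now_ns = 0;
    std::uint64_t yields = 0;
};

// Loosely timed process: runs ahead of global time by a local offset.
class LooselyTimedProcess {
public:
    LooselyTimedProcess(Kernel& kernel, std::uint64_t quantum_ns)
        : kernel_(kernel), quantum_ns_(quantum_ns) {
        assert(quantum_ns > 0);
    }

    // Account for a computation or communication delay locally; yield only
    // once the locally consumed time reaches the quantum.
    void consume(std::uint64_t ns) {
        local_ns_ += ns;
        if (local_ns_ >= quantum_ns_) sync();
    }

    // Explicit synchronization point: publish the local tally to the kernel.
    void sync() {
        if (local_ns_ == 0) return;  // nothing to publish, no yield needed
        kernel_.now_ns += local_ns_;
        local_ns_ = 0;
        ++kernel_.yields;
    }

private:
    Kernel& kernel_;
    std::uint64_t quantum_ns_;
    std::uint64_t local_ns_ = 0;
};

// Runs one process over a list of delays. A quantum of 1 ns makes every
// delay synchronize immediately, which approximates lockstep execution.
inline Kernel simulate(const std::vector<std::uint64_t>& delays_ns,
                       std::uint64_t quantum_ns) {
    Kernel kernel;
    LooselyTimedProcess process(kernel, quantum_ns);
    for (std::uint64_t d : delays_ns) process.consume(d);
    process.sync();  // final explicit synchronization point
    return kernel;
}
```

For ten delays of 10 ns each, a quantum of 1 ns yields ten times while a quantum of 50 ns yields twice; both runs end at a global time of 100 ns. With several interacting processes, a larger quantum would trade this reduction in yields for timing error of up to one quantum at the interaction points, which is the accuracy and speed trade-off discussed in this section.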
Two different refinement strategies have to be distinguished: computation refinement and communication refinement. Computation refinement is especially applicable for single-processor embedded systems without a special focus on communication aspects. In this case, the complexity of executing a cross-compiled binary code may be acceptable. But with an increasing number of processing units and network complexity (e.g., hierarchical automotive networks consisting of FlexRay, CAN, LIN, and MOST buses), the simulation speed for analyzing the timing influences of the embedded software on the distributed system becomes unacceptable. This issue is addressed by a highly scalable performance simulation approach for networked embedded systems because the integration of the ISSs with a high simulation time into each processing element becomes obsolete. A decreasing simulation time is specifically enabled by keeping computation at a high level of abstraction whereas communication is refined to a lower level or vice versa. During the refinement flow, different levels of abstraction are traversed. This strategy is supported by the TLM in SystemC. More detailed information about the modeling and refinement of SystemC simulation models within the scope of the automotive embedded software and AUTOSAR [10] is presented in [19].

2.3.1.1 Communication Refinement

As shown in Figure 2.1, there exists a communication scheme at the UT level that is called point-to-point communication. The point-to-point communication can be timed or untimed. A timed representation means that an abstract timing behavior is provided by use of wait(T) statements, which are allowed to be introduced within the point-to-point communication. However, only certain cases can be considered during simulation. The consideration of all cases possibly results in an infinite or at least in an unacceptable simulation time.
This is a general problem of simulation, and only a formal analysis can solve this problem to cover each corner case of the system behavior. Such a method is also introduced in [39] and [40].

[FIGURE 2.1 The communication refinement flow. Stages shown along the refinement flow: untimed/timed p-2-p communication, untimed/timed structural communication, timing-approximate communication, and cycle-accurate communication (CAN); abstraction level labels: UT, UT/LT, AT. (From Krause, M. et al., Des. Automat. Embed. Syst., 10, 237, 2005. With permission.)]

The refinement from untimed modeling to loosely timed modeling introduces abstract or dedicated buses, respectively. The ports and interfaces of the untimed modeling remain and only the channel implementation is replaced. Figure 2.1 illustrates the communication refinement process for a CAN bus. Refinement from the TLM to the RTL description means replacing transactions by signals. This refinement technique is described in detail in [8].

2.3.1.2 Computation Refinement of Software Applications

Considering computation, the design is transformed to a structural representation by specifying the desired target architecture. Using untimed modeling, processes are still simulated as parallel processes by the SystemC simulation kernel. The most important impact on a software realization is the implemented scheduling of threads that are assigned to the same processing elements. The refinement from an unstructured to a structured execution order is done by introducing a scheduler model to the system description, or, for more detailed modeling, an abstract RTOS model. However, this requires the specification of preemption points. Together with such preemption points, the timing information of the runtime is annotated. This chapter presents an approach on how to obtain and integrate the accurate timing information.
Figure 2.2 illustrates the computation refinement process. Detailed information about refinement is presented in [18].

[FIGURE 2.2 The computation refinement flow. Stages shown along the refinement flow: untimed/timed parallel processes, untimed/timed scheduled processes, scheduled processes with approximate timing, and cycle-accurate computation (CPUs with RTOS, connected via CAN); abstraction level labels: UT, UT/LT, AT. (From Krause, M. et al., Des. Automat. Embed. Syst., 10, 238, 2005. With permission.)]

2.4 Proposed Hybrid Approach for Accurate Software Timing Simulation

In this section, a hybrid approach for the performance simulation of the embedded software [37] will be presented. Hybrid approaches consist of a combination of analytic and simulative approaches with the objective of gaining simulation speed while maintaining sufficient accuracy. Integration into a global refinement flow for the software down to the cycle-approximate level is made possible by the automated generation of the TLM interfaces.

The static worst-case/best-case execution time (WCET/BCET) analysis abstracts the influence of data dependencies on the software execution time. Because of this, the BCET/WCET analysis delivers very good results for entire basic blocks, but it is too pessimistic across the basic block boundaries. Furthermore, the effects of a concurrent cache usage of different applications on multi-core architectures lead to even wider bounds. An analytic solution for this issue is still unknown. The objective of the presented approach is the reduction of pessimism that is contained in the WCET/BCET boundaries.
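To make the source of this pessimism concrete, the following sketch instruments a small function with per-basic-block cycle counts and compares the count accumulated along the executed path with static bounds that must assume the cheapest or most expensive branch in every iteration. All cycle numbers and names (`kCyclesThen`, `sum_abs_annotated`, `static_wcet`, `static_bcet`) are made up for illustration; in the approach described here, such numbers would come from static analysis of the cross-compiled binary, which this sketch does not perform.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed per-basic-block cycle counts (hypothetical values).
constexpr std::uint64_t kCyclesEntry = 4;     // function entry block
constexpr std::uint64_t kCyclesLoopHead = 2;  // loop control, charged per iteration
constexpr std::uint64_t kCyclesThen = 9;      // block for negative inputs
constexpr std::uint64_t kCyclesElse = 3;      // block for non-negative inputs
constexpr std::uint64_t kCyclesExit = 2;      // function exit block

struct Timed {
    long result;           // functional result of the original code
    std::uint64_t cycles;  // cycles accumulated along the executed path
};

// Functional code (sum of absolute values) with one cycle annotation per
// basic block, so the count follows the data-dependent control flow.
inline Timed sum_abs_annotated(const std::vector<int>& values) {
    std::uint64_t cycles = kCyclesEntry;
    long sum = 0;
    for (int x : values) {
        cycles += kCyclesLoopHead;
        if (x < 0) {
            cycles += kCyclesThen;
            sum -= x;
        } else {
            cycles += kCyclesElse;
            sum += x;
        }
    }
    cycles += kCyclesExit;
    return {sum, cycles};
}

// Static bounds for n iterations: without knowledge of the input data, the
// analysis has to assume the same extreme branch in every iteration.
inline std::uint64_t static_wcet(std::size_t n) {
    return kCyclesEntry + n * (kCyclesLoopHead + std::max(kCyclesThen, kCyclesElse)) + kCyclesExit;
}

inline std::uint64_t static_bcet(std::size_t n) {
    return kCyclesEntry + n * (kCyclesLoopHead + std::min(kCyclesThen, kCyclesElse)) + kCyclesExit;
}
```

For the input `{1, -2, 3, 4}`, the annotated function reports 32 cycles, while the static bounds for four iterations are 26 (BCET) and 50 (WCET). The simulated value lies inside the static interval but is specific to the concrete input data, which is the kind of tightening the hybrid approach aims for.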
Simulative techniques that consider an application with concrete input data and the target architecture can be used to determine the timing behavior of the software on the underlying architecture. The proposed approach tries to prevent repeated time-consuming interpretation and repeated timing determination of all executed binary code instructions on the target architecture.

The hybrid approach provided in this chapter applies back-annotation of the WCET/BCET values. These values are determined statically at the basic block level using the binary code that was generated from the C source code. Additionally, the timing impact of data-dependent architectural properties such as branch prediction is also considered effectively. The tool that implements the proposed methodology generates the SystemC code. This code can be compiled for any host machine to be used for a target platform-independent simulation. Communication calls in the automatically created SystemC models are encapsulated in the TLM [7] communication primitives. In this way, a clean and standardized ability to integrate the timed embedded software in virtual SystemC prototypes is provided.

One major advantage of the presented methodology is in the area of multi-core processors with shared caches. Whereas static analysis has no knowledge of concurrent cache usage of different applications and the impact on execution time, the presented methodology is able to handle these issues. How this is done will be described in more detail in Section 2.4.6.

Another possibility would be a translation of the binary code into the annotated SystemC code. One of the main advantages of such an approach is that no source code is needed, as the binary code is used for determining cycle counts and for generating the SystemC code. Another advantage is that ...