Nicolescu/Model-Based Design for Embedded Systems 67842_C001 Finals Page 16 2009-10-1 16 Model-Based Design for Embedded Systems α u (Δ) = min{(α u ⊗β u ) β l , β u } α l (Δ) = min{(α l β u ) ⊗ β l , β l } β u (Δ) = max{ inf Δ≤λ {β u (λ) − α l (λ)},0} β l (Δ) = sup 0≤λ≤Δ {β l (λ) − α u (λ)} In the example system of Figure 1.2, the processing semantics of the tasks P 1 , P 2 , P 3 ,andP 4 can be modeled with abstract GPCs. Finally, let us consider a component that is used for event stream shaping. A greedy shaper component (GSC) with a shaping curve σ delays events of an input event stream such that the output event stream has σ as an upper arrival curve. Additionally, a greedy shaper guarantees that no events are delayed longer than necessary. Typically, greedy shapers are used to reshape bursty event streams and to reduce global buffer requirements. If the abstract input event stream of a GSC with the shaping curve, σ, is represented by the tuple of arrival curves, [α l , α u ], then the output of the GSC can be modeled as an abstract event stream with arrival curves: α u GSC = α u ⊗σα l GSC = α l ⊗(σσ) Note that a greedy shaper does not need any computation or communica- tion resources. Thus, the transfer function of an abstract GSC considers only the ingoing and the outgoing event stream, as well as the shaping curve, σ. More details about greedy shapers inthe context of MPA can be found in [19]. In the example system of Figure 1.2, the semantics of the shapers, S 1 and S 2 , can be modeled with abstract GSCs. 1.4.4 System Performance Model In order to analyze the performance of a distributed embedded platform, it is necessary to build a system performance model. This model has to repre- sent the hardware architecture of the platform. In particular, it has to reflect the mapping of tasks to computation or communication resources and the scheduling policies adopted by these resources. To obtain a performance model of a system, we first have to model the event streams that trigger the system, the computation and communication resources that are available, and the processing components. Then, we have to interconnect the arrival and service inputs and outputs of all these ele- ments so that the architecture of the system is correctly represented. Figure 1.7 depicts the MPA performance model for the example system described in Figure 1.2. Note that the outgoing abstract service stream of GPC 2 is used as the ingoing abstract service stream for GPC 4 , i.e., GPC 4 gets only the resources that are left by GPC 2 . This represents the fact that the two tasks share the same processor and are scheduled according to a Nicolescu/Model-Based Design for Embedded Systems 67842_C001 Finals Page 17 2009-10-1 Performance Prediction of Distributed Platforms 17 α HR α LR β 1 β 3 β 2 σ 1 σ 2 GPC 1 GPC 3 GPC 4 GSC 2 GPC 2 GSC 1 FIGURE 1.7 Performance model for the example system in Figure 1.2. pre-emptive fixed priority scheduling policy with GPC 2 having a higher pri- ority than GPC 4 . In general, scheduling policies for shared resources can be modeled by the way the abstract resources β are distributed among the different abstract tasks. For some scheduling policies, such as earliest deadline first (EDF) [16], TDMA [20], nonpreemptive fixed priority scheduling [4], various kinds of servers [16], or any hierarchical composition of these elementary policies, abstract components with appropriate transfer functions have been intro- duced. Figure 1.8 shows some examples of how to model different schedul- ing policies within the MPA framework. 1.4.5 Performance Analysis The performance model provides the basis for the performance analysis of a system. Several performance characteristics such as worst-case end-to- end delays of events or buffer requirements can be determined analytically within the MPA framework. The performance of each abstract component can be determined as a function of the ingoing arrival and service curves by the formulas of the RTC. For instance, the maximum delay, d max , experienced by an event of an event stream with arrival curves, [α l , α u ], that is processed by a GPC on a resource with service curves, [β l , β u ], is bounded by d max ≤ sup λ≥0 inf{τ ≥ 0: α u (λ) ≤ β l (λ + τ)} def = Del(α u , β l ) The maximum buffer space, b max , that is required to buffer an event stream with arrival curves, [α l , α u ], that is processed by a GPC on a resource with service curves, [β l , β u ], is bounded by b max ≤ sup λ≥0 {α u (λ) − β l (λ)} def = Buf(α u , β l ) Nicolescu/Model-Based Design for Embedded Systems 67842_C001 Finals Page 18 2009-10-1 18 Model-Based Design for Embedded Systems α A α A GPC GPC α B β β (a) α A α B β β (b) EDF α A α B β slot 1 β slot 2 (c) α A α B β β s 2 α B β α A α B α A α B β slot 1 β slot 2 α A α B β β s 2 β s 1 β s 1 (d) GPC TDMA Share Sum GPC GPC GPC FIGURE 1.8 Modeling scheduling policies in the MPA framework: (a) preemptive fixed priority, (b) EDF, (c) TDMA, and (d) generalized processor sharing. Figure 1.9 shows the graphical interpretation of the maximum delay expe- rienced by an event at a GPC and the maximum buffer requirement of the GPC: d max corresponds to the maximum horizontal distance between α u and β l ,andb max corresponds to the maximum vertical distance between α u and β l . In order to compute the end-to-end delay of an event stream over several consecutive GPCs, one can simply add the single delays at the various com- ponents. Besides this strictly modular approach, one can also use a holistic delay analysis that takes into consideration that in a chain of task the worst- case burst cannot appear simultaneously in all tasks. (This phenomenon is described as “pay burst only once” [7].) For such a task chain the total delay can be tightened to d max ≤ Del(α u , β l 1 ⊗β l 2 ⊗ ⊗β l n ) For an abstract GSC, the maximum delay and the maximum backlog are bounded by d max = Del(α u , σ) b max = Buf(α u , σ) Nicolescu/Model-Based Design for Embedded Systems 67842_C001 Finals Page 19 2009-10-1 Performance Prediction of Distributed Platforms 19 d max b max β l α u Δ FIGURE 1.9 Graphical interpretation of d max and b max . Let us come back to the example of Figure 1.2. By applying the above reasoning, the worst-case end-to-end delay for the packets of the two video streams can be analytically bounded by d HR ≤ Del(α u HR , β l 1 ⊗β l 3 ⊗σ 1 ) d LR ≤ Del(α u LR , β l 2 ⊗β l 3 ⊗σ 2 ) 1.4.6 Compact Representation of VCCs The performance analysis method presented above relies on computations on arrival and service curves. While the RTC provides compact mathemati- cal representations for the different operations on curves, their computation in practice is typically more involved. The main issue is that the VCCs are defined for the infinite range of positive real numbers. However, any com- putation on these curves requires a finite representation. To overcome this problem, we introduce a compact representation for special classes of VCCs. In particular, we consider piecewise linear VCCs that are finite, periodic, or mixed. • Finite piecewise linear VCCs consist of a finite set of linear segments. • Periodic piecewise linear VCCs consist of a finite set of linear segments that are repeated periodically with a constant offset between consecu- tive repetitions. • Mixed piecewise linear VCCs consist of a finite set of linear segments that are followed by a second finite set of linear segments that are repeated periodically, again with a constant offset between consecu- tive repetitions. Figure 1.10a through c shows examples of these three classes of curves. Many practically relevant arrival and service curves are piecewise linear. For example, if a stream consists of a discrete token, the corresponding Nicolescu/Model-Based Design for Embedded Systems 67842_C001 Finals Page 20 2009-10-1 20 Model-Based Design for Embedded Systems 8 6 4 2 0 0246810 Δ (ms)(a) (b) (c) Resource units 4 3 2 1 0 0 246810 Resource units Δ (ms) 8 6 4 2 0 0246810 Events Δ (ms) FIGURE 1.10 (a) A finite piecewise linear VCC, (b) a periodic piecewise linear VCC, and (c) a mixed piecewise linear VCC. arrival curve is an integer and can be represented as a piecewise constant function. For the service curves,one could use the samereasoning, as the basic resource units (number of clock cycles, number of bytes, etc.) are typically also atomic. However, these units are often too fine-grained for a practical analysis andhence it is preferable to use a continuous model. In most practical applications, the fluid resource availability is piecewise constant over time, that is, practically relevant service curves are also piecewise linear. Here, we want to note that there are also piecewise linear VCCs that are not covered by the three classes of curves that we have defined above. In particular, we have excluded irregular VCCs, that is, VCCs with an infinite number of linear segments that do not eventually show periodicity. However, most practically relevant timing specifications for event streams and availability specifications for resources can be captured by either finite, Nicolescu/Model-Based Design for Embedded Systems 67842_C001 Finals Page 21 2009-10-1 Performance Prediction of Distributed Platforms 21 periodic, or mixed piecewise linear VCCs. In addition, note that VCCs only describe bounds on token or resource streams, and, therefore, one can always safely approximate an irregular VCC to a mixed piecewise VCC. In the following, we describe how these three classes of curves can be rep- resented by means of a compact data structure. First, we note that a single lin- ear segment of a curve can be represented by a triple x, y,swith x ∈ R ≥0 and y, s ∈ R that specifies a straight line in the Cartesian coordinate system, which starts at the point (x, y) and has a slope s. Further, a piecewise linear VCC can be represented as a (finite or infinite) sequence x 1 , y 1 , s 1 , x 2 , y 2 , s 2 , of such triples with x i < x i+1 for all i. To obtain a curve defined by such a sequence, the single linear segments are simply extended with their slopes until the x-coordinate of the starting point of the next segment is reached. The key property of the three classes of VCCs defined above is that these VCCs can be represented with a finite number of segments, which is funda- mental for practical computations: Let ρ be a lower or an upper VCC belong- ing to a set of finite, periodic, or mixed VCCs. Then ρ can be represented with atuple ν ρ = Σ A , Σ P , p x , p y , x p0 , y p0 where Σ A is a sequence of linear segments describing a possibly existing irregular initial part of ρ Σ P is a sequence of linear segments describing a possibly existing regularly repeated part of ρ If Σ P is not an empty sequence, then the regular part of ρ is defined by the period p x and the vertical offset p y between two consecutive repetitions of Σ P , and the first occurrence of the regular sequence Σ P starts at (x p0 , y p0 ). In this compact representation, we call Σ A the aperiodic curve part and Σ P the periodic curve part. In the compact representation, a finite piecewise linear VCC has Σ P ={}, that is, it consists of only the aperiodic part, Σ A ,withx A,1 = 0. A periodic piecewise linear VCC can be described with Σ A ={}, x P,1 = 0, and x p0 = 0, that is, it has no aperiodic part. And finally, a mixed piecewise linear VCC is characterized by x A,1 = 0, x P,1 = 0, and x p0 > 0. As an example, consider the regular mixed piecewise linear VCC depicted in Figure 1.10c. Its compact representation according to the defi- nition above is given by the tuple ν C = 0, 1, 0, 0.2, 2,0, 0.4, 3, 0, 0.6,4, 0 , 0, 0, 0 ,2,1,2,5 The described compact representation of VCCs is used as a basis for prac- tical computations in the RTC framework. All the curve operators adopted in the RTC (minimum, maximum, convolutions, deconvolutions, etc.) are closed on the set of mixed piecewise linear VCCs. This means that the result of the operators, when applied to finite, periodic, or mixed piecewise linear Nicolescu/Model-Based Design for Embedded Systems 67842_C001 Finals Page 22 2009-10-1 22 Model-Based Design for Embedded Systems VCCs, is again a mixed piecewise linear VCC. Further details about the com- pact representation of VCCs and, in particular, on the computation of the operators can be found in [17]. 1.5 RTC Toolbox The framework for the MPA with the RTC that we have described in this chapter has been implemented in the RTC Toolbox for MATLAB R [21], which is available at http://www.mpa.ethz.ch/Rtctoolbox. The RTC Toolbox is a powerful instrument for system-level performance analysis of distributed embedded platforms. At its core, the toolbox provides a MATLAB type for the compact representation of VCCs (see details in Sec- tion 1.4) and an implementation of a set of the RTC curve operations. Built around this core, the RTC Toolbox provides libraries to perform the MPA, and to visualize VCCs and the related data. Figure 1.11 shows the underlying software architecture of the toolbox. The RTC toolbox internally consists of a kernel that is implemented in Java, and a set of MATLAB libraries that connect the Java kernel to the MAT- LAB command line interface. The kernel consists of classes for the compact representation of VCCs and classes that implement the RTC operators. These two principal components are supported by classes that provide various utilities. On top of these classes, the Java kernel provides APIs that provide methods to create compact VCCs, compute the RTC operations, and access parts of the utilities. MATLAB command line RTC toolbox RTC operators MPA library VCC library MATLAB/Java interface Min-plus/Max-plus algebra, utilities Compact representation of VCCs Java API FIGURE 1.11 Software architecture of the RTC toolbox. Nicolescu/Model-Based Design for Embedded Systems 67842_C001 Finals Page 23 2009-10-1 Performance Prediction of Distributed Platforms 23 The Java kernel is accessed from MATLAB via the MATLAB Java Inter- face. However, this access is completely hidden from the user who only uses the MATLAB functions provided by the RTC libraries. The MATLAB libraries of the RTC Toolbox provide functions to create VCCs, plot VCCs, and apply operators of the RTC on VCCs. From the point of view of the user, the VCCs are MATLAB data types, even if internally they are represented as Java objects. Similarly, the MATLAB functions for the RTC operators are wrapper functions for the corresponding methods that are implemented in the Java kernel. On top of the VCC and the RTC libraries, there is the MPA library. It provides a set of functions that facilitate the use of the RTC Toolbox for the MPA. In particular, it contains functions to create commonly used arrival and service curves, as well as functions to conveniently compute the outputs of the various abstract components of the MPA framework. 1.6 Extensions In the previous sections, we have introduced the basics of the MPA approach based on the RTC. Recently, several extensions have been developed to refine the analysis method. In [4], the existing methods for analyzing heterogeneous multiprocessor systems are extended to nonpreemptive scheduling policies. In this work, more complex task-activation schemes are investigated as well. In particular, components with multiple inputs and AND- or OR-activation semantics are introduced. The MPA approach also supports the modeling and analysis of systems with dynamic scheduling policies. In [16], a component for the modeling of the EDF scheduling is presented. This work also extends the ability of the MPA framework to model and analyze hierarchical scheduling policies by introducing appropriate server components. The TDMA policies have been modeled using the MPA as well [20]. In Section 1.4, we have briefly described the GSC. More details about traffic shaping in the context of multiprocessor embedded systems and the embedding of the GSC component into the MPA framework can be found in [19]. In many embedded systems, the events of an event stream can have various types and impose different workloads on the systems depending on their types. Abstract stream models for the characterization of streams with different event types are introduced in [18]. In order to get more accurate analysis results, these models permit to capture and exploit the knowledge aboutcorrelationsanddependenciesbetweendifferenteventtypesinastream. Further, in distributed embedded platforms, there often exist correlations in the workloads imposed by events of a given type on different system Nicolescu/Model-Based Design for Embedded Systems 67842_C001 Finals Page 24 2009-10-1 24 Model-Based Design for Embedded Systems components. In [22], a model is introduced to capture and characterize such workload correlations in the framework of the MPA. This work shows that the exploitation of workload correlations can lead to considerably improved analysis results. The theory of real-time interfaces is introduced in [14]. It connects the principles of the RTC and the interface-based embedded system design [2]. The real-time interfaces represent a powerful extension of the MPA frame- work. They permit an abstraction of the component behavior into interfaces. This means that a system designer does not need to understand the details of a component’s implementation, but only needs to know its interface in order to ensure that the component will work properly in the system. Before the introduction of the real-time interfaces, the MPA method was limited to the a posteriori analysis of component-based real-time system designs. With the real-time interfaces, it is possible to compose systems that are correct by construction. 1.7 Concluding Remarks In this chapter, we have introduced the reader to the system-level perfor- mance prediction of distributed embedded platforms in the early design stages. We have defined the problem and given a brief overview of approaches to performance analysis. Starting from a simple application scenario, we have presented a formal system description method in the time domain. We have described its use- fulness for the simulation of concrete system executions, but at the same time we have pointed out that the method is inappropriate for worst-case analy- sis, as in general it cannot guarantee the coverage of corner cases. Driven by the need to provide hard performance bounds for distributed embedded platforms, we have generalized the formalism to an abstraction in the time interval domain based on the VCCs and the RTC. We have pre- sented the essential models underlying the resulting framework for the MPA and we have demonstrated its application. Finally, we have described a com- pact representation of the VCCs that enables an efficient computation of RTC curve operations in practice, and we have presented the RTC Toolbox for MATLAB, the implementation of the MPA analysis framework. Acknowledgments The authors would like to thank Ernesto Wandeler for contributing to some part of this chapter and Nikolay Stoimenov for helpful comments on an ear- lier version. Nicolescu/Model-Based Design for Embedded Systems 67842_C001 Finals Page 25 2009-10-1 Performance Prediction of Distributed Platforms 25 References 1. S. Chakraborty, S. Künzli, and L. Thiele. A general framework for analysing system properties in platform-based embedded system designs. In Design Automation and Test in Europe (DATE), pp. 190–195, Munich, Germany, March 2003. IEEE Press. 2. L. de Alfaro and T. A. Henzinger. Interface theories for component-based design. In EMSOFT ’01: Proceedings of the First International Workshop on Embedded Software, pp. 148–165, London, U.K., 2001. Springer-Verlag. 3. M. G. Harbour, J. J. Gutiérrez García, J. C. Palencia Gutiérrez, and J. M. Drake Moyano. Mast: Modeling and analysis suite for real time appli- cations. In Proceedings of 13th Euromicro Conference on Real-Time Systems, pp. 125–134, Delft, the Netherlands, 2001. IEEE Computer Society. 4. W. Haid and L. Thiele. Complex task activation schemes in system level performance analysis. In 5th International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS’07), pp. 173–178, Salzburg, Austria, October 2007. 5. S. Künzli, F. Poletti, L. Benini, and L. Thiele. Combining simulation and formal methods for system-level performance analysis. In Design Automa- tion and Test in Europe (DATE), pp. 236–241, Munich, Germany, 2006. IEEE Computer Society. 6. K. Lahiri, A. Raghunathan, and S. Dey. System-level performance analy- sis for designing on-chip communication architectures. IEEE Transactions on CAD of Integrated Circuits and Systems, 20(6):768–783, 2001. 7. J Y. Le Boudec and P. Thiran. Network Calculus: A Theory of Deterministic Queuing Systems for the Internet. Springer-Verlag, New York, Inc., 2001. 8. A. Maxiaguine, S. Künzli, and L. Thiele. Workload characterization model for tasks with variable execution demand. In Design Automation and Test in Europe (DATE), pp. 1040–1045, Paris, France, February 2004. IEEE Computer Society. 9. The Open SystemC Initiative (OSCI). http://www.systemc.org. 10. J. C. Palencia Gutiérrez and M. G. Harbour. Schedulability analysis for tasks with static and dynamic offsets. In Proceedings of the 19th Real-Time Systems Symposium, Madrid, Spain, 1998. IEEE Computer Society. 11. T. Pop, P. Eles, and Z. Peng. Holistic scheduling and analysis of mixed time/event-triggered distributed embedded systems. In CODES ’02: . linear. For example, if a stream consists of a discrete token, the corresponding Nicolescu /Model-Based Design for Embedded Systems 67842_C001 Finals Page 20 2009-10-1 20 Model-Based Design for Embedded. sup λ≥0 {α u (λ) − β l (λ)} def = Buf(α u , β l ) Nicolescu /Model-Based Design for Embedded Systems 67842_C001 Finals Page 18 2009-10-1 18 Model-Based Design for Embedded Systems α A α A GPC GPC α B β β (a) α A α B β β (b) EDF α A α B β slot 1 β slot 2 (c) α A α B β β s 2 α B β α A α B α A α B β slot 1 β slot 2 α A α B β β s 2 β s 1 β s 1 (d) GPC TDMA Share Sum GPC GPC GPC FIGURE. Nicolescu /Model-Based Design for Embedded Systems 67842_C001 Finals Page 16 2009-10-1 16 Model-Based Design for Embedded Systems α u (Δ) = min{(α u ⊗β u )