Model-Based Design for Embedded Systems – P4

Nicolescu/Model-Based Design for Embedded Systems 67842_C001 Finals 2009-10-1

In addition, in many cases, the same simulation environment can be used for both function and performance verification. However, most simulation-based performance estimation methods suffer from insufficient corner-case coverage. This means that they are typically not able to provide worst-case performance guarantees. Moreover, accurate simulations are often computationally expensive.

In other works [5,6], hybrid performance estimation methods have been presented that combine simulation and analytic techniques. While these approaches considerably shorten simulation run-times, they still cannot guarantee full coverage of corner cases.

To determine guaranteed performance limits, analytic methods must be adopted. These methods provide hard performance bounds; however, they are typically not able to model complex interactions and state-dependent behaviors, which can result in pessimistic performance bounds.

Several models and methods for the analytic performance verification of distributed platforms have been presented so far. These approaches are based on essentially different abstraction concepts. The first idea was to extend well-known results of classical scheduling theory to distributed systems. This implies the consideration of communication delays, which cannot be neglected in a distributed system. Such a combined analysis of processor and bus scheduling is often referred to as holistic scheduling analysis. Rather than a specific performance analysis method, holistic scheduling is a collection of techniques for the analysis of distributed platforms, each of which is tailored toward a particular combination of an event stream model, a resource-sharing policy, and a communication arbitration scheme (see [10,11,15] as examples).
Several holistic analysis techniques are aggregated and implemented in the Modeling and Analysis Suite for Real-Time applications (MAST) [3], which is available as open-source software at http://mast.unican.es.

In [12], a more general approach to extending the concepts of classical scheduling theory to distributed systems was presented. In contrast to holistic approaches, which extend monoprocessor scheduling analysis to special classes of distributed systems, this compositional method applies existing analysis techniques in a modular manner: the single components of a distributed system are analyzed with classical algorithms, and the local results are propagated through the system by appropriate interfaces relying on a limited set of event stream models.

In this chapter, we will describe a different analytic and modular approach for performance prediction that does not rely on classical scheduling theory. The method uses real-time calculus (RTC) [13], which extends the basic concepts of network calculus [7]. The corresponding modular performance analysis (MPA) framework [1] analyzes the flow of event streams through a network of computation and communication resources.

1.2 Application Scenario

In this section, we introduce the reader to system-level performance analysis by means of a concrete application scenario from the area of video processing. Intentionally, this example is extremely simple in terms of the underlying hardware platform and the application model. On the other hand, it allows us to introduce the concepts that are necessary for a compositional performance analysis (see Section 1.4). The example system that we consider is a digital set-top box for the decoding of video streams. The architecture of the system is depicted in Figure 1.2.
The set-top box implements a picture-in-picture (PiP) application that decodes two concurrent MPEG-2 video streams and displays them on the same output device. The upper stream, V_HR, has a higher frame resolution and is displayed in full screen, whereas the lower stream, V_LR, has a lower frame resolution and is displayed in a smaller window at the bottom left edge of the screen. The MPEG-2 video decoding consists of the following tasks: variable length decoding (VLD), inverse quantization (IQ), inverse discrete cosine transformation (IDCT), and motion compensation (MC). In the considered set-top box, the decoding application is partitioned onto three processors: CPU1, CPU2, and CPU3. The tasks VLD and IQ are mapped onto CPU1 for the first video stream (process P1) and onto CPU2 for the second video stream (process P3). The tasks IDCT and MC are mapped onto CPU3 for both video streams (processes P2 and P4). A preemptive fixed-priority scheduler is adopted for the sharing of CPU3 between the two streams, with the upper stream having higher priority than the lower stream. This reflects the fact that the decoder gives a higher quality of service (QoS) to the stream with the higher frame resolution, V_HR.

As shown in the figure, the video streams arrive over a network and enter the system after some initial packet processing at the network interface. The inputs to P1 and P3 are compressed bitstreams, and their outputs are partially decoded macroblocks, which serve as inputs to P2 and P4. The fully decoded video streams are then fed into two traffic-shaping components, S1 and S2, respectively. This is necessary because the outputs of P2 and P4 are potentially bursty and need to be smoothed out in order to make sure that no packets are lost by the video interface, which cannot handle more than a certain packet rate per stream.
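The task-to-processor partitioning described above can be written down as a small configuration sketch. The dictionary layout and the helper below are hypothetical illustrations of the mapping, not part of any actual set-top box software; the priority numbers (lower value = higher priority) encode the preemptive fixed-priority sharing of CPU3.

```python
# Hypothetical encoding of the set-top box mapping: which decoding tasks of
# each stream run in which process, on which CPU, and with which priority
# on the shared CPU3 (lower number = higher priority).
MAPPING = {
    "P1": {"tasks": ("VLD", "IQ"),  "stream": "V_HR", "cpu": "CPU1"},
    "P3": {"tasks": ("VLD", "IQ"),  "stream": "V_LR", "cpu": "CPU2"},
    "P2": {"tasks": ("IDCT", "MC"), "stream": "V_HR", "cpu": "CPU3", "priority": 0},
    "P4": {"tasks": ("IDCT", "MC"), "stream": "V_LR", "cpu": "CPU3", "priority": 1},
}

def processes_on(cpu):
    """Return the processes mapped onto a CPU, highest priority first."""
    procs = [p for p, m in MAPPING.items() if m["cpu"] == cpu]
    return sorted(procs, key=lambda p: MAPPING[p].get("priority", 0))
```

For instance, `processes_on("CPU3")` yields `["P2", "P4"]`, reflecting that the high-resolution stream is served first on the shared processor.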
We assume that the arrival patterns of the two streams, V_HR and V_LR, from the network as well as the execution demands of the various tasks in the system are known. The performance characteristics that we want to analyze are the worst-case end-to-end delays for the two video streams from the input to the output of the set-top box. Moreover, we want to analyze the memory demand of the system in terms of the worst-case packet buffer occupation for the various tasks.

FIGURE 1.2 A PiP application decoding two MPEG-2 video streams on a multiprocessor architecture.

In Section 1.3, we will first formally describe the above system in the concrete time domain. In principle, this formalization could directly be used in order to perform a simulation; in our case, it will be the basis for the MPA described in Section 1.4.

1.3 Representation in the Time Domain

As can be seen from the example described in Section 1.2, the basic model of computation consists of component networks that can be described as a set of components communicating via infinite FIFO (first-in first-out) buffers denoted as channels. Components receive streams of tokens via their input channels, operate on the arriving tokens, and produce output tokens that are sent to the output channels. We also assume that the components need resources in order to actually perform operations. Figure 1.3 represents the simple component network corresponding to the video decoding example.
Examples of components are tasks that are executed on computing resources, or data communications via buses or interconnection networks. Therefore, the token streams that are present at the inputs or outputs of a component can be of different types; for example, they could represent simple events that trigger tasks in the corresponding computation component, or they could represent data packets that need to be communicated.

1.3.1 Arrival and Service Functions

In order to describe this model in greater detail, we will first describe streams in the concrete time domain. To this end, we define the concept of arrival functions: R(s, t) ∈ R≥0 denotes the amount of tokens that arrive in the time interval [s, t), for all time instances s, t ∈ R with s < t, and R(t, t) = 0. Depending on the interpretation of a token stream, an arrival function may be integer valued, i.e., R(s, t) ∈ Z≥0. In other words, R(s, t) "counts" the number of tokens in a time interval.

FIGURE 1.3 Component networks corresponding to the video decoding example in Section 1.2: (a) without resource interaction, and (b) with resource interaction.

Note that we are taking a very liberal definition of a token here: it just denotes the amount of data or events that arrive in a channel. Therefore, a token may represent bytes, events, or even demanded processing cycles. In the component network semantics, tokens are stored in channels that connect the inputs and outputs of components.
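As a concrete illustration of an integer-valued arrival function, the sketch below builds R(s, t) from a finite trace of token arrival times; the trace values are made up for the example.

```python
from bisect import bisect_left

def arrival_function(timestamps):
    """Build R(s, t): the number of tokens arriving in the half-open
    interval [s, t) for a finite trace of arrival times.
    R(t, t) = 0 holds by construction."""
    ts = sorted(timestamps)
    def R(s, t):
        assert s <= t
        # count tokens with s <= arrival time < t
        return bisect_left(ts, t) - bisect_left(ts, s)
    return R

R = arrival_function([0.5, 1.0, 1.0, 2.5])  # hypothetical trace
```

The half-open interval [s, t) makes adjacent windows add up exactly: R(s, t) = R(s, u) + R(u, t) for any s ≤ u ≤ t, which is what the buffer and component equations below rely on.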
Let us suppose that we have determined the arrival function R′(s, t) corresponding to a component output (which writes tokens into a channel) and the arrival function R(s, t) corresponding to a component input (which removes tokens from the channel); then we can easily determine the buffer fill level, B(t), of this channel at some time t:

B(t) = B(s) + R′(s, t) − R(s, t)

As described above, one of the major elements of the model is that components can only advance in their operation if resources are available. As resources are the first-class citizens of the performance analysis, we define the concept of service functions: C(s, t) ∈ R≥0 denotes the amount of available resources in the time interval [s, t), for all time instances s, t ∈ R with s < t, and C(t, t) = 0. Depending on the type of the underlying resource, C(s, t) may denote the accumulated time in which the resource is fully available for communication or computation, the amount of processing cycles, or the amount of information that can be communicated in [s, t).

1.3.2 Simple and Greedy Components

Using the above concept of arrival functions, we can describe a set of very simple components that only perform data conversions and synchronization.

• Tokenizer: A tokenizer receives fractional tokens at its input, which may correspond to a partially transmitted packet or a partially executed task. A discrete output token is only generated once the whole processing or communication of the predecessor component is finished. With the input and output arrival functions R(s, t) and R′(s, t), respectively, we obtain the transfer function R′(s, t) = ⌊R(s, t)⌋.

• Scaler: Sometimes, the units of arrival and service curves do not match. For example, the arrival function, R, describes a number of events, while the service function, C, describes resource units. Therefore, we need to introduce the concept of scaling: R′(s, t) = w · R(s, t), with a positive scaling factor, w.
For example, w may convert events into processor cycles (in the case of computing) or into a number of bytes (in the case of communication). A much more detailed view on workloads and their modeling can be found in [8], covering, for example, time-varying resource usage or upper and lower bounds (worst-case and best-case resource demands).

• AND and OR: As a last simple example, let us suppose a component that only produces output tokens if there are tokens on all inputs (AND). Then the relation between the arrival functions at the inputs, R1(s, t) and R2(s, t), and at the output, R′(s, t), is R′(s, t) = min{B1(s) + R1(s, t), B2(s) + R2(s, t)}, where B1(s) and B2(s) denote the buffer levels in the input channels at time s. If the component produces an output token for every token at any input (OR), we find R′(s, t) = R1(s, t) + R2(s, t).

The elementary components described above do not interact with the available resources at all. On the other hand, it would be highly desirable to express the fact that a component may need resources in order to operate on the available input tokens. A greedy processing component (GPC) takes an input arrival function, R(s, t), and produces an output arrival function, R′(s, t), by means of a service function, C(s, t). It is defined by the input/output relation

R′(s, t) = min{ inf_{s≤λ≤t} {R(s, λ) + B(s) + C(λ, t)}, C(s, t) }

where B(s) denotes the initial buffer level in the input channel. The service function of the remaining resource is given by

C′(s, t) = C(s, t) − R′(s, t)

The above definition can be related to the intuitive notion of a greedy component as follows: the output between some time λ and t cannot be larger than C(λ, t), and, therefore, R′(s, t) ≤ R′(s, λ) + C(λ, t), and also R′(s, t) ≤ C(s, t).
As the component cannot output more than what was available at the input, we also have R′(s, λ) ≤ R(s, λ) + B(s), and, therefore, R′(s, t) ≤ min{R(s, λ) + B(s) + C(λ, t), C(s, t)}. Let us suppose that there is some last time λ* before t when the buffer was empty. At λ*, we clearly have R′(s, λ*) = R(s, λ*) + B(s). In the interval from λ* to t, the buffer is never empty and all available resources are used to produce output tokens: R′(s, t) = R(s, λ*) + B(s) + C(λ*, t). If the buffer is never empty, we clearly have R′(s, t) = C(s, t), as all available resources are used to produce output tokens. As a result, we obtain the above input–output relation of a GPC.

Note that the above resource and timing semantics model almost all practically relevant processing and communication components (e.g., processors that operate on tasks and use queues to keep ready tasks, communication networks, and buses). As a result, we are not restricted to modeling the processing time as a fixed delay. The service function can be chosen to represent a resource that is available only in certain time intervals (e.g., time division multiple access [TDMA] scheduling), or that is the remaining service after a resource has performed other tasks (e.g., fixed-priority scheduling). Note that a scaler can be used to perform the appropriate conversions between token and resource units. Figure 1.4 depicts the concrete components considered so far. Further models of computation can be described as well, for example, (greedy) shapers that limit the amount of output tokens to a given shaping function, σ, according to R′(s, t) ≤ σ(t − s) (see Section 1.4 and also [19]).

FIGURE 1.4 Examples of component types as described in Section 1.3.2: tokenizer, scaler, AND, OR, GPC, and shaper.
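To make the GPC relation tangible, the sketch below evaluates it on discrete cumulative traces: A[t] is the number of tokens arrived in [0, t) and C[t] the resources available in [0, t), with an empty initial buffer, so R′(0, t) = min over 0 ≤ u ≤ t of (A[u] + C[t] − C[u]). The traces and rate numbers are invented for illustration; the usage part chains two GPCs through the remaining service, which is how fixed-priority sharing of one resource is modeled.

```python
def gpc_trace(A, C):
    """Concrete GPC on cumulative traces with an empty initial buffer:
    output O[t] = min over 0 <= u <= t of ( A[u] + C[t] - C[u] ).
    The cap O[t] <= C[t] is included via u = 0, since A[0] = 0.
    Returns the output trace and the remaining service C'[t] = C[t] - O[t]."""
    T = len(A)
    O = [min(A[u] + C[t] - C[u] for u in range(t + 1)) for t in range(T)]
    return O, [C[t] - O[t] for t in range(T)]

# Invented example: a resource delivering 3 units per time step, shared in
# fixed-priority fashion by a high-priority and a low-priority stream, each
# demanding 2 tokens per step (one resource unit per token assumed).
C3   = [0, 3, 6, 9, 12]
A_hi = [0, 2, 4, 6, 8]
A_lo = [0, 2, 4, 6, 8]

O_hi, C_rem = gpc_trace(A_hi, C3)     # high priority sees the full service
O_lo, _     = gpc_trace(A_lo, C_rem)  # low priority only sees the leftover
```

Here the high-priority stream is fully served (O_hi follows A_hi), while the low-priority stream is throttled to the leftover rate of one unit per step.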
1.3.3 Composition

The components shown in Figure 1.4 can now be combined to form a component network that describes not only the flow of tokens but also the interaction with the available resources. Figure 1.3b shows the component network that corresponds to the video decoding example, using the components introduced in Section 1.3.2. Note that the necessary scaler and tokenizer components are not shown for simplicity, but they are needed to relate the different units of tokens and resources, and to form tokens out of partially computed data.

For example, the input events described by the arrival function, R_LR, trigger the tasks in process P3, which runs on CPU2, whose availability is described by the service function, C2. The output drives the task in process P4, which runs on CPU3 at the second priority. This is modeled by feeding the GPC component with the remaining resources from process P2. We can conclude that the flow of event streams is modeled by connecting the "arrival" ports of the components, while the scheduling policy is modeled by connecting their "service" ports. Other scheduling policies, such as non-preemptive fixed priority, earliest deadline first, TDMA, generalized processor sharing, various servers, as well as any hierarchical composition of these policies, can be modeled as well (see Section 1.4).

1.4 Modular Performance Analysis with Real-Time Calculus

In the previous section, we presented the characterization of event and resource streams, and their transformation by elementary concrete processes. We denote these characterizations as concrete, as they represent components, event streams, and resource availabilities in the time domain and work on concrete stream instances only. However, event and resource streams can exhibit a large variability in their timing behavior because of nondeterminism and interference.
The designer of a real-time system has to provide performance guarantees that cover all possible behaviors of a distributed system and its environment. In this section, we introduce the abstraction of MPA with RTC [1] (MPA-RTC), which provides the means to capture all possible interactions of event and resource streams in a system, and permits the derivation of safe bounds on best-case and worst-case behaviors. This approach was first presented in [13] and has its roots in network calculus [7]. It permits the analysis of the flow of event streams through a network of heterogeneous computation and communication resources in an embedded platform, and the derivation of hard bounds on its performance.

1.4.1 Variability Characterization

In the MPA, the timing characterization of event streams and of the resource availability is based on the abstractions of arrival curves and service curves, respectively. Both models belong to the general class of variability characterization curves (VCCs), which allow one to precisely quantify the best-case and worst-case variabilities of wide-sense increasing functions [8]. For simplicity, in the rest of the chapter we will use the term VCC when we want to refer to either arrival or service curves.

In the MPA framework, an event stream is described by a tuple of arrival curves, α(Δ) = [αl(Δ), αu(Δ)], where αl : R≥0 → R≥0 denotes the lower arrival curve and αu : R≥0 → R≥0 the upper arrival curve of the event stream. We say that a tuple of arrival curves, α(Δ), conforms to an event stream described by the arrival function, R(s, t), denoted as α |= R, iff for all t > s we have αl(t − s) ≤ R(s, t) ≤ αu(t − s). In other words, there will be at least αl(Δ) events and at most αu(Δ) events in any time interval [s, t) with t − s = Δ.
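The conformance relation α |= R can be checked exhaustively on a finite trace. The sketch below does this on an integer time grid; the strictly periodic trace and its floor/ceiling bounds are made up for the example.

```python
def conforms(trace, alpha_l, alpha_u, horizon):
    """Check alpha |= R on an integer grid: for every window [s, t) with
    0 <= s < t <= horizon, the event count must satisfy
    alpha_l(t - s) <= R(s, t) <= alpha_u(t - s)."""
    def R(s, t):
        return sum(1 for e in trace if s <= e < t)
    return all(
        alpha_l(t - s) <= R(s, t) <= alpha_u(t - s)
        for s in range(horizon)
        for t in range(s + 1, horizon + 1)
    )

# A strictly periodic stream with period 2 has between floor(d/2) and
# ceil(d/2) events in any window of length d.
periodic = [0, 2, 4, 6, 8]
ok = conforms(periodic, lambda d: d // 2, lambda d: (d + 1) // 2, 9)
# A double event at time 0 violates the same upper curve.
bad = conforms([0, 0, 2], lambda d: 0, lambda d: (d + 1) // 2, 4)
```

Note the quadratic number of windows: this brute-force check is only feasible for short traces, which is precisely why the analysis works with the curves themselves rather than with traces.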
In contrast to arrival functions, which describe one concrete trace of an event stream, a tuple of arrival curves represents all possible traces of a stream. Figure 1.5a shows an example tuple of arrival curves. Note that any event stream can be modeled by an appropriate pair of arrival curves, which means that this abstraction substantially expands the modeling power of standard event arrival patterns such as sporadic, periodic, or periodic with jitter.

Similarly, the availability of a resource is described by a tuple of service curves, β(Δ) = [βl(Δ), βu(Δ)], where βl : R≥0 → R≥0 denotes the lower service curve and βu : R≥0 → R≥0 the upper service curve. Again, we say that a tuple of service curves, β(Δ), conforms to a resource described by the service function, C(s, t), denoted as β |= C, iff for all t > s we have βl(t − s) ≤ C(s, t) ≤ βu(t − s). Figure 1.5b shows an example tuple of service curves.

FIGURE 1.5 Examples of arrival curves (events over Δ) and service curves (cycles over Δ).

FIGURE 1.6 (a) Abstract and (b) concrete GPCs.

Note that, as defined above, the arrival curves are expressed in terms of events, while the service curves are expressed in terms of workload/service units. However, the component model described in Section 1.4.2 requires the arrival and service curves to be expressed in the same unit. The transformation of event-based curves into resource-based curves and vice versa is done by means of so-called workload curves, which are VCCs themselves. Basically, these curves define the minimum and maximum workloads imposed on a resource by a given number of consecutive events, i.e., they capture the variability in execution demands. More details about workload transformations can be found in [8].
In the simplest case of a constant workload w for all events, an event-based curve is transformed into a resource-based curve by simply scaling it by the factor w. This can be done by an appropriate scaler component, as described in Section 1.3.

1.4.2 Component Model

Distributed embedded systems typically consist of computation and communication elements that process incoming event streams and are mapped onto several different hardware resources. We denote such event-processing units as components. For instance, in the system depicted in Figure 1.2, we can identify six components: the four tasks, P1, P2, P3, and P4, as well as the two shaper components, S1 and S2.

In the MPA framework, an abstract component is a model of the processing semantics of a concrete component, for instance, an application task or a dedicated HW/SW unit. An abstract component models the execution of events by a computation or communication resource and can be seen as a transformer of abstract event and resource streams. As an example, Figure 1.6 shows an abstract and a concrete GPC.

Abstract components transform input VCCs into output VCCs, that is, they are characterized by a transfer function that relates input VCCs to output VCCs. We say that an abstract component conforms to a concrete component if the following holds: Given any set of input VCCs, let us choose an arbitrary trace of concrete component inputs (event and resource streams) that conforms to the input VCCs. Then, the resulting output streams must conform to the output VCCs as computed using the abstract transfer function. In other words, for any input that conforms to the corresponding input VCCs, the output must also conform to the corresponding output VCCs.
In the case of the GPC depicted in Figure 1.6, the transfer function Φ of the abstract component is specified by a set of functions that relate the incoming arrival and service curves to the outgoing arrival and service curves. In this case, we have Φ = [fα, fβ] with α′ = fα(α, β) and β′ = fβ(α, β).

1.4.3 Component Examples

In the following, we describe the abstract components of the MPA framework that correspond to the concrete components introduced in Section 1.3: scaler, tokenizer, OR, AND, GPC, and shaper. Using the above relation between concrete and abstract components, we can easily determine the transfer functions of the simple components tokenizer, scaler, and OR, which are depicted in Figure 1.4.

• Tokenizer: The tokenizer outputs only integer tokens and is characterized by R′(s, t) = ⌊R(s, t)⌋. Using the definition of arrival curves, we obtain as the abstract transfer function α′u(Δ) = ⌈αu(Δ)⌉ and α′l(Δ) = ⌊αl(Δ)⌋.

• Scaler: As R′(s, t) = w · R(s, t), we get α′u(Δ) = w · αu(Δ) and α′l(Δ) = w · αl(Δ).

• OR: The OR component produces an output for every token at any input: R′(s, t) = R1(s, t) + R2(s, t). Therefore, we find α′u(Δ) = α1u(Δ) + α2u(Δ) and α′l(Δ) = α1l(Δ) + α2l(Δ).

The derivation of the AND component is more complex, and its corresponding transfer functions can be found in [4,17].

As described in Section 1.3, a GPC models a task that is triggered by the events of the incoming event stream, which queue up in a FIFO buffer. The task processes the events in a greedy fashion while being restricted by the availability of resources. Such a behavior can be modeled with the following internal relations, which are proven in [17].*

* The deconvolutions in min-plus and max-plus algebra are defined as (f ⊘ g)(Δ) = sup_{λ≥0} {f(Δ + λ) − g(λ)} and (f ⊘̄ g)(Δ) = inf_{λ≥0} {f(Δ + λ) − g(λ)}, respectively. The convolution in min-plus algebra is defined as (f ⊗ g)(Δ) = inf_{0≤λ≤Δ} {f(Δ − λ) + g(λ)}.
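The min-plus operators defined in the footnote can be sketched for curves sampled at integer values of Δ (list index = Δ, finite horizon). Note that for the deconvolution, the supremum over λ ≥ 0 must necessarily be truncated to the samples available; the example curves are made up.

```python
def minplus_conv(f, g):
    """(f ⊗ g)(Δ) = inf over 0 <= λ <= Δ of { f(Δ - λ) + g(λ) }
    for curves sampled at integer Δ (index = Δ)."""
    n = min(len(f), len(g))
    return [min(f[d - l] + g[l] for l in range(d + 1)) for d in range(n)]

def minplus_deconv(f, g):
    """(f ⊘ g)(Δ) = sup over λ >= 0 of { f(Δ + λ) - g(λ) },
    with λ truncated to the finite horizon of the sampled curves."""
    n = min(len(f), len(g))
    return [max(f[d + l] - g[l] for l in range(n - d)) for d in range(n)]
```

For instance, convolving a rate-2 curve [0, 2, 4, 6] with an offset curve [1, 2, 3, 4] yields [1, 2, 3, 4]: the convolution picks, for each Δ, the cheapest split of the interval between the two curves.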

Table of Contents

  • Contents

  • Preface

  • Introduction

  • Contributors

  • Part I: Real-Time and Performance Analysis in Heterogeneous Embedded Systems

    • Chapter 1. Performance Prediction of Distributed Platforms

    • Chapter 2. SystemC-Based Performance Analysis of Embedded Systems

    • Chapter 3. Formal Performance Analysis for Real-Time Heterogeneous Embedded Systems

    • Chapter 4. Model-Based Framework for Schedulability Analysis Using UPPAAL 4.1

    • Chapter 5. Modeling and Analysis Framework for Embedded Systems

    • Chapter 6. TrueTime: Simulation Tool for Performance Analysis of Real-Time Embedded Systems

  • Part II: Design Tools and Methodology for Multiprocessor System-on-Chip

    • Chapter 7. MPSoC Platform Mapping Tools for Data-Dominated Applications

    • Chapter 8. Retargetable, Embedded Software Design Methodology for Multiprocessor-Embedded Systems

    • Chapter 9. Programming Models for MPSoC

    • Chapter 10. Platform-Based Design and Frameworks: Metropolis and Metro II

    • Chapter 11. Reconfigurable Multicore Architectures for Streaming Applications

    • Chapter 12. FPGA Platforms for Embedded Systems

  • Part III: Design Tools and Methodology for Multidomain Embedded Systems

    • Chapter 13. Modeling, Verification, and Testing Using Timed and Hybrid Automata

    • Chapter 14. Semantics of Domain-Specific Modeling Languages

    • Chapter 15. Multi-Viewpoint State Machines for Rich Component Models

    • Chapter 16. Generic Methodology for the Design of Continuous/Discrete Co-Simulation Tools
