Model-Based Design for Embedded Systems

Part II
Design Tools and Methodology for Multiprocessor System-on-Chip

MPSoC Platform Mapping Tools for Data-Dominated Applications

Pierre G. Paulin, Olivier Benny, Michel Langevin, Youcef Bouchebaba, Chuck Pilkington, Bruno Lavigueur, David Lo, Vincent Gagne, and Michel Metzger

CONTENTS
7.1 Introduction
7.1.1 Platform Programming Models
7.1.1.1 Explicit Capture of Parallelism
7.1.2 Characteristics of Parallel Multiprocessor SoC Platforms
7.2 MultiFlex Platform Mapping Technology Overview
7.2.1 Iterative Mapping Flow
7.2.2 Streaming Programming Model
7.3 MultiFlex Streaming Mapping Flow
7.3.1 Abstraction Levels
7.3.2 Application Functional Capture
7.3.3 Application Constraints
7.3.4 The High-Level Platform Specification
7.3.5 Intermediate Format
7.3.6 Model Assumptions and Distinctive Features
7.4 MultiFlex Streaming Mapping Tools
7.4.1 Task Assignment Tool
7.4.2 Task Refinement and Communication Generation Tools
7.4.3 Component Back-End Compilation
7.4.4 Runtime Support Components
7.5 Experimental Results
7.5.1 3G Application Mapping Experiments
7.5.2 Refinement and Simulation
7.6 Conclusions
7.6.1 Outlook
References

7.1 Introduction

The current deep submicron technology era, as it applies to low-cost, high-volume consumer digital convergence products, presents two opposing challenges: rising system-on-chip (SoC) platform development costs and shorter product market windows. Compounding the problem is the rate of change due to evolving specifications and the appearance of multiple standards that need to be incorporated into a single platform.

There are three main causes of the rising SoC platform development costs. The first is the continued rise in gate and memory count. Today’s SoCs can have over 100 million transistors, enough to
theoretically place the logic of over one thousand 32-bit RISC processors on a single die. Leveraging these capabilities is a major challenge. The second cause is the increased complexity of dealing with deep submicron effects. These include electro-migration, voltage drop, and on-chip variations, all of which dampen design productivity. In addition, rising mask set costs, currently over one million dollars, compound the problem and present a nearly insurmountable financial market entry barrier for smaller companies. The third cause is the rising embedded software development cost in current-generation SoCs, driven by an accelerated rate of new feature introduction. This is partly because of the convergence of the computing, consumer, and communications domains, which implies supporting a broader range of functionalities and standards for a wide set of geographic markets.

While the growth of hardware complexity in SoCs has tracked Moore’s law, with a resulting growth of 56% in transistor count per year, industry studies [22] show that the complexity of embedded S/W is rising at a staggering 140% per year. This software now represents over 50% of development costs in most SoCs and over 75% in emerging multiprocessor SoC (MP-SoC) platforms.

As a result, the significant investment needed to develop the platform, typically between 10M$ and 100M$ for today’s 65 nm platforms, makes it necessary to maximize the time-in-market for a given platform. On the other hand, consumer-led product cycles imply increasingly shorter time-to-market for the applications supported by the platform.

Finally, customers of a given SoC platform increasingly ask to add their own value-added features as a market differentiator. These features are not just superficial additions, such as human-interface and top-level control code. For example, a SoC platform customer may have proprietary multimedia-oriented enhancements that they want to include in the platform (e.g., image noise reduction, face recognition,
etc.). All of these factors lead to the need for a domain-specific flexible platform that can be reused across a wide range of application variants. In addition, time-to-market considerations mean that the platform must come with high-level application-to-platform mapping tools that increase developer productivity. Both of these requirements point in the direction of highly S/W programmable platform solutions. A wide range of general-purpose and domain-specific cores exists, and these cores come with powerful compilation, debug, and analysis tools. This makes them a key component of the flexible SoC of the future.

From the above market trends, it is clear that multiprocessor-based platforms will play a key role. Of course, this flexibility cannot be delivered at any cost or power. Typical power targets for SoCs used in battery-powered mobile multimedia products are a few hundred milliwatts [11]. This suggests the use of domain-optimized heterogeneous MP-SoC platforms that will embody a rich mix of general-purpose processor cores, domain- and application-specific processor cores, and H/W processing elements (PEs) to deliver a solution at a competitive cost and power.

A key question is therefore how to effectively exploit this type of platform. We need to tackle this challenge from three main directions:

• The development of high-level platform programming models
• The development of effective platform mapping technologies
• The design of parallel platforms that support the programming models and facilitate the development of the platform mapping tools

This chapter focuses primarily on the first two objectives.

7.1.1 Platform Programming Models

A SoC platform programming model is an abstraction of a heterogeneous system consisting of a range of loosely and tightly coupled processors, local and shared memory, communication channels, various hardware accelerators, and input/output (I/O). A platform programming
model must both hide and expose the functionalities offered by the platform. It must hide the heterogeneity of the underlying PEs, the heterogeneity of the tools used to program these PEs, and abstract the low-level communication mechanisms between the PEs, the storage elements, and I/O blocks. However, the programming model should also expose some top-level characteristics of the underlying platform. It needs to capture the type of high-level parallelism supported by the platform. This is because most platforms are designed to naturally support one main class of high-level programming models, for example, symmetric multiprocessing using shared memory, message-passing, or streaming. Moreover, in the domain of MP-SoCs, the programming model should not only abstract the programmable processors; it should also allow the exploitation of the abstract functionalities provided by all types of platform components, including H/W blocks, communication channels, storage components, and I/O.

Figure 7.1 illustrates the programming model as the boundary between the high-level application description and the underlying heterogeneous platform.

[Figure 7.1 diagram: Application (Control, Audio, Video) / Programming model / Platform (RISC, DSP, NoC, I/O, Mem, H/W)]
FIGURE 7.1 Application, platform, and programming model.

We believe that at least three classes of platform programming models are needed:

• A symmetric multiprocessor (SMP) model, in the spirit of Unix POSIX threads [15]. This programming model relies on symmetric processing resources that access a shared memory.
• A distributed client–server programming model, in the spirit of CORBA [16] or DCOM [17]. In this approach, applications are encapsulated into well-defined components with explicit interfaces. It relies on an abstract message-passing communication scheme where all communication between parallel application components is explicit.
• A dataflow-oriented streaming programming model, as illustrated by StreamIt [3] and Brooks [2].
As with the client–server model, this approach encapsulates applications into well-defined S/W components, but implements a dataflow-driven static or dynamic communication semantic. Control is typically fairly simple.

Table 7.1 summarizes the main advantages and drawbacks of these three programming models.

• In the SMP model, the application is organized as a set of processes that share a common operating system (OS) and memory. This model provides the support of current OSs and facilitates the use of legacy code. Moreover, some form of load balancing of resources is usually supported. However, data coherency has to be maintained, which typically involves expensive cache coherency hardware. In data-dominated applications, this programming model implies high data bandwidth for inter-processor communication unless data movement is controlled carefully. By definition, it is designed for symmetric systems and is hardly applicable to heterogeneous processing resources. In practical implementations of SMP platforms, scalability is limited to between two and eight processors.
• In the client–server model, the application is organized as a set of clients and servers; a client makes a service request to a server, which fulfills the request. Generally, an object request broker (ORB) acts as an agent between the client request and the completion of this request. This model is appropriate for heterogeneous systems and control-oriented applications, and it presents a good potential for scaling and load balancing. However, the client–server model requires data marshaling (the process of gathering data and transforming it into a standard format before it is transmitted over a network) so that the data can transcend network boundaries [8]. This generalization of the communication adds to the complexity of the supporting infrastructure and implies some performance overhead.
• In comparison with the client–server and SMP models, the streaming programming model provides poor support for control-oriented computation, and the relative timing of control and data is difficult to manage. However, this model is more suitable for data-oriented applications. The streaming model enables low-overhead communications and the reduction of data bandwidth. Moreover, communication and computation are orthogonal, and by analyzing the communication edges in a stream computation, it is possible to obtain precise estimates of the communication requirements for a given application. This greatly simplifies the analysis and mapping of applications onto parallel architectures [1].

TABLE 7.1 Programming Models for MPSoCs

Programming model: SMP
  Advantages:
    Natural support of current OS
    Legacy code support
    Load balancing
  Drawbacks:
    Need to maintain coherence of local, shared data
    High inter-processor data communication bandwidth
    Limited scalability
    No support for heterogeneous systems

Programming model: Client–server
  Advantages:
    Supports heterogeneous systems
    Potential for scaling and load balancing
    Good support for control-oriented applications
  Drawbacks:
    Marshaling problem
    Heavy infrastructure
    Lack of streamlining

Programming model: Streaming
  Advantages:
    Low overhead communications
    Reduced data bandwidth on communication channels
    Orthogonal communication and computation
    Easy to estimate the communication requirements of the application
  Drawbacks:
    Timing of control and data
    Poor support for control-oriented applications

In summary, there is a continuum of characteristics that need to be considered when moving between SMP on one end, client–server in the middle, and streaming on the other end. SMP is the preferred general-purpose model and is relatively user-friendly, but this ease of use comes at the expense of predictability, performance, and cost. At the opposite end of the continuum, streaming is a more constrained, predictable, and understandable model, but it is more specialized toward dataflow and requires more time to express and optimize. The client–server programming model is more general-purpose than streaming, and
expresses control applications better. However, automatic load balancing can imply high communication bandwidth between PEs.

Each of these programming models has its advantages and drawbacks, and we have found that, for the consumer-style multimedia and communications SoC platforms we have been working with, we need to use all three, sometimes making use of more than one for a single platform, often in a tightly coupled, interoperable fashion. Because of the tight constraints in the design of MP-SoCs, designers have to choose the appropriate programming model(s) in order to develop their applications on a particular platform or subsystem.

7.1.1.1 Explicit Capture of Parallelism

A key assumption made here, for all three programming models as we have defined them, is that the application developer is responsible for identifying and explicitly expressing parallelism. However, in our experience with domain-specific application code in communications, imaging, video, and audio, this is a reasonable assumption. Parallelism is tractable and well understood in many cases. Moreover, designers have been dealing with this type of parallelism in hardware-based platforms for many years. For an application such as an MPEG4 video encoder consisting of 10,000 lines of sequential C reference code, our experience has shown that the parallelization represents no more than one or two person-months of work (for a person already familiar with the application and the programming model).

7.1.2 Characteristics of Parallel Multiprocessor SoC Platforms

While our research work is focused primarily on the programming models and platform mapping tools, the characteristics of the target MP-SoC platform have a significant impact on the complexity of the mapping problem and the efficiency of the end results. From an idealistic mapping tools-only perspective, the MP-SoC platforms would embed a homogeneous set of general-purpose RISC-style processors. This is not
realistic for the foreseeable future [20]:

• Domain-specific cores such as DSPs offer 2X–4X performance in their domain of application via instruction specialization and wider instruction words. Adding SIMD-style word-level parallelism can increase performance by another factor of 2X–8X in certain cases.
• Configurable ASIPs (application-specific instruction-set processors) can offer 10X–100X performance improvements via application-specific instruction sets and tightly coupled H/W coprocessors.
• Hardware coprocessors can offer 100X or more performance advantages and/or significant power and area savings. They will remain essential for highly parallel, regular operations with high data rates, in particular for data processing operations that are fixed for an application domain (e.g., direct and inverse discrete cosine transforms, DCT and iDCT, used in video processing).
• Legacy code and general-purpose OS support will often dictate the host processor for the platform. The data representation used in this processor is not likely to be compatible with the parallel processor subsystems or the hardware coprocessors.
• Some application tasks will not be parallelizable; therefore, fast general-purpose cores will be necessary to support these.

As a result, we believe that a performance- and power-effective platform for consumer-dominated convergence products will be a heterogeneous composition of the following PE types:

• A medium- to high-performance, general-purpose RISC core, typically running a standard general-purpose OS. Increasingly, this host system will consist of a two- to four-core SMP cluster, as these appear in the marketplace. All the top-level control code will run here. Legacy code that is not performance critical will also run on this processor. Finally, customer-specific developments and controlled access to the domain-specific parallel subsystems will usually occur via this
general-purpose processor and OS pair.
• Domain-specific subsystems composed of mostly homogeneous, lightweight multiprocessor clusters. Although the clusters are homogeneous, the instruction set of their processors will typically be optimized toward a broad application domain (e.g., video codec, image quality improvement, wireless communications, and 3D graphics).
• Tightly coupled hardware PEs for domain-specific data processing functions.
• Domain-specific I/O blocks, which are becoming increasingly flexible.

7.2 MultiFlex Platform Mapping Technology Overview

This section introduces the MultiFlex technology, which supports the mapping of user-defined parallel applications, expressed in one or more programming models, onto an MP-SoC platform.

The support in MultiFlex of a lightweight SMP programming model was described in [12]. This uses a hardware-assisted concurrency engine to support small-grain parallelism dynamically.

In MultiFlex, the client–server programming model is referred to as “DSOC” (Distributed System Object Component), and was also described in [12]. This toolset supports static and dynamic load balancing and supports heterogeneous PEs with potentially different data representations. Dynamic load balancing is achieved using either a lightweight S/W-based kernel to dynamically schedule large-grain tasks, or a hardware-assisted ...
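
As a concrete illustration of the point made in Section 7.1.1, that the communication requirements of a streaming application can be estimated by analyzing its communication edges, the following minimal Python sketch sums the traffic that crosses PE boundaries for a given task-to-PE mapping. It is not part of MultiFlex. The task names, token sizes, token rates, PE names, and the helper functions edge_bandwidth and inter_pe_traffic are all hypothetical and chosen only for illustration. The sketch also assumes that every edge carries fixed-size tokens at a constant rate.

```python
from collections import defaultdict

# Hypothetical stream graph, one tuple per communication edge:
# (producer task, consumer task, bytes per token, tokens per second).
EDGES = [
    ("source", "filter", 1024, 1000),
    ("filter", "encode", 512, 1000),
    ("encode", "sink", 128, 1000),
]

# Hypothetical assignment of tasks to processing elements (PEs).
MAPPING = {"source": "pe0", "filter": "pe0", "encode": "pe1", "sink": "pe1"}


def edge_bandwidth(bytes_per_token, tokens_per_second):
    """Return the bandwidth of one edge in bytes per second."""
    return bytes_per_token * tokens_per_second


def inter_pe_traffic(edges, mapping):
    """Sum edge bandwidth for every ordered pair of distinct PEs.

    Edges whose producer and consumer share a PE are skipped, because
    their data never leaves that PE.
    """
    traffic = defaultdict(int)
    for producer, consumer, size, rate in edges:
        src, dst = mapping[producer], mapping[consumer]
        if src != dst:
            traffic[(src, dst)] += edge_bandwidth(size, rate)
    return dict(traffic)


# Only the filter -> encode edge crosses from pe0 to pe1 in this mapping.
TRAFFIC = inter_pe_traffic(EDGES, MAPPING)
```

Because the estimate depends only on the edges and the mapping, a mapping tool can evaluate many candidate assignments cheaply, without simulating the computation inside each task.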