WolfFront.fm Page i Wednesday, August 9, 2006 1:21 PM

In Praise of High-Performance Embedded Computing: Architectures, Applications, and Methodologies

High-Performance Embedded Computing is a timely addition to the literature of system design. The number of designs, as well as the number of installed embedded systems, vastly surpasses those of general purpose computing, yet there are very few books on embedded design. This book introduces a comprehensive set of topics ranging from design methodologies to metrics to optimization techniques for the critical embedded system resources of space, time, and energy. There is substantial coverage of the increasingly important design issues associated with multiprocessor systems. Wayne Wolf is a leading expert in embedded design. He has personally conducted research on many of the topics presented in the book, as well as practiced the design methodologies on the numerous embedded systems he has built. This book contains information valuable to the embedded system veteran as well as the novice designer.
—Daniel P. Siewiorek, Carnegie Mellon University

High-Performance Embedded Computing addresses high-end embedded computers—certainly an area where a skilled balance between hardware and software competencies is particularly important for practitioners, and arguably a research domain which will be at the heart of the most interesting methodological evolutions in the coming years. Focusing on best industrial practices and real-world examples and applications, Wayne Wolf presents in an organized and integrated way an impressive amount of leading-edge research approaches, many of which will most likely become key differentiators for winning designs in the coming decade. This is a timely book ideally suited both for practitioners and students in advanced embedded computer engineering courses, as well as researchers and scientists who want to get a snapshot of the important research taking place at the confluence of computer architecture and
electronic design automation.
—Paolo Ienne, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland

As processors continue to leap off our desks and embed themselves into our appliances, our cars, our phones, and perhaps soon our clothing and our wallets, it’s become clear that embedded computing is no longer a slow, boring sideshow in the architecture circus. It’s moved to the center ring. Wayne Wolf’s book pulls all the diverse hardware and software threads together into one solid text for the aspiring embedded systems builder.
—Rob A. Rutenbar, Carnegie Mellon University

Educators in all areas of computer systems and engineering should take a look at this book. Its contrasting perspective on performance, architecture, and design offer an enhanced comprehension of underlying concepts to students at all levels. In my opinion, it represents the shape of things to come for anyone seeking a career in “systems.”
—Steven Johnson, Indiana University

More and more embedded devices are available, as people now walk around with cell phones, PDAs, and MP3 players at their side. The design and constraints of these devices are much different than those of a generic computing system, such as a laptop or desktop PC. High-Performance Embedded Computing provides an abundance of information on these basic design topics while also covering newer areas of research, such as sensor networks and multiprocessors.
—Mitchell D. Theys, University of Illinois at Chicago

High-Performance Embedded Computing not only presents the state of the art in embedded computing augmented with a discussion of relevant example systems, it also features topics such as software/hardware co-design and multiprocessor architectures for embedded computing. This outstanding book is valuable reading for researchers, practitioners, and students.
—Andreas Polze, Hasso-Plattner-Institute, Universität Potsdam

Embedded computer systems are everywhere. This
state-of-the-art book brings together industry practices and the latest research in this arena. It provides an in-depth and comprehensive treatment of the fundamentals, advanced topics, contemporary issues, and real-world challenges in the design of high-performance embedded systems. High-Performance Embedded Computing will be extremely valuable to graduate students, researchers, and practicing professionals.
—Jie Hu, New Jersey Institute of Technology

High-Performance Embedded Computing

About the Author

Wayne Wolf is a professor of electrical engineering and associated faculty in computer science at Princeton University. Before joining Princeton, he was with AT&T Bell Laboratories in Murray Hill, New Jersey. He received his B.S., M.S., and Ph.D. in electrical engineering from Stanford University. He is well known for his research in the areas of hardware/software co-design, embedded computing, VLSI, and multimedia computing systems. He is a fellow of the IEEE and ACM and a member of the SPIE. He won the ASEE Frederick E. Terman Award in 2003. He was program chair of the First International Workshop on Hardware/Software Co-Design. Wayne was also program chair of the 1996 IEEE International Conference on Computer Design, the 2002 IEEE International Conference on Compilers, Architecture, and Synthesis for Embedded Systems, and the 2005 ACM EMSOFT Conference. He was on the first executive committee of the ACM Special Interest Group on Embedded Computing (SIGBED). He is the founding editor-in-chief of ACM Transactions on Embedded Computing Systems. He was editor-in-chief of IEEE Transactions on VLSI Systems (1999–2000) and was founding co-editor of the Kluwer journal Design Automation for Embedded Systems. He is also series editor of the Morgan Kaufmann Series in Systems on Silicon.

High-Performance Embedded Computing
Architectures, Applications, and Methodologies

Wayne Wolf
Princeton University

AMSTERDAM • BOSTON • HEIDELBERG • LONDON
NEW YORK • OXFORD • PARIS • SAN DIEGO
SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO

MORGAN KAUFMANN PUBLISHERS IS AN IMPRINT OF ELSEVIER

Publisher: Denise E. M. Penrose
Publishing Services Mgr: George Morrison
Developmental Editor: Nate McFadden
Editorial Assistant: Kimberlee Honjo
Cover Design: Dick Hanus
Cover Image: Corbus Design
Text Design: Rebecca Evans & Associates
Composition: Multiscience Press, Inc.
Technical Illustration: diacriTech
Proofreader: Jodie Allen
Indexer: Steve Rath
Printer: Maple-Vail Book Manufacturing Group

Morgan Kaufmann Publishers is an imprint of Elsevier.
500 Sansome Street, Suite 400, San Francisco, CA 94111

This book is printed on acid-free paper.

© 2007 by Elsevier Inc. All rights reserved.

Designations used by companies to distinguish their products are often claimed as trademarks or registered trademarks. In all instances in which Morgan Kaufmann Publishers is aware of a claim, the product names appear in initial capital or all capital letters. Readers, however, should contact the appropriate companies for more complete information regarding trademarks and registration.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means—electronic, mechanical, photocopying, scanning, or otherwise—without prior written permission of the publisher.

Permissions may be sought directly from Elsevier’s Science & Technology Rights Department in Oxford, UK: phone: (+44) 1865 843830, fax: (+44) 1865 853333, e-mail: permissions@elsevier.co.uk. You may also complete your request online via the Elsevier homepage (http://elsevier.com) by selecting “Customer Support” and then “Obtaining Permissions.”

Library of Congress Cataloging-in-Publication Data
Application submitted

ISBN 13: 978-0-12-369485-0
ISBN 10: 0-12-369485-X

For information on all
Morgan Kaufmann publications, visit our Web site at www.mkp.com or www.books.elsevier.com

Printed in the United States of America
06 07 08 09 10

To Nancy and Alec

Supplemental Materials

Resources for this book are available at textbooks.elsevier.com/012369485X

The instructor site, which is accessible to adopters who register at textbooks.elsevier.com, includes:
■ Instructor slides (in ppt format)
■ Figures from the text (in jpg and ppt formats)
■ Solutions to exercises (in pdf format)

The companion site (accessible to all readers) features:
■ Links to related resources on the Web
■ A list of errata

Wolf.book Page ix Tuesday, August 8, 2006 1:56 PM

Contents

Preface
Chapter 1 Embedded Computing
1.1 The Landscape of High-Performance Embedded Computing
1.2 Example Applications
1.2.1 Radio and Networking
1.2.2 Multimedia
1.2.3 Vehicle Control and Operation
1.2.4 Sensor Networks
1.3 Design Goals
1.4 Design Methodologies
1.4.1 Basic Design Methodologies
1.4.2 Embedded Systems Design Flows
1.4.3 Standards-Based Design Methodologies
1.4.4 Design Verification and Validation
1.4.5 A Methodology of Methodologies
1.4.6 Joint Algorithm and Architecture Development
1.5 Models of Computation
1.5.1 Why Study Models of Computation?
1.5.2 Finite versus Infinite State
1.5.3 Control Flow and Data Flow Models

Index

networks, operating systems, reliability, 47–51 security, Embedded file systems, 255–56 Embedded multiprocessors, 268, 269–75 constraints, 269 flexibility and efficiency, 275 performance and energy, 272–74 requirements, 270–71 specialization and, 274–75 See also Multiprocessors Energy-aware scheduling, 421 Energy model, 91–92 Energy/power consumption, 21 ENSEMBLE, 369 Error-correction codes, 7, 50 Esterel, 210–11 compiling, 210–11 defined, 210 example program, 210 Event-driven state machine, 207–8 Event function model, 354 Events, 20, 353 input, 354 models, 353–54 output, 354 output timing, 354–55 sporadic, 354 Faults actions after, 50 design, 48 operational, 48 physical, 48 sources, 48 Feasibility factor, 417 as constraint, 418 as objective genetic algorithms, 418 Field-based encoding, 106 Fine-grained scheduling, 371 Finite-state machines (FSMs), 35 asynchronous, 35 communication, 43–44 defined, 35 nondeterministic, 36–37 streams, 35–36 verification and, 36 Finite versus infinite state, 34–38 Firing rules, 214 First-come-first-served (FCFS) scheduling, 389–90 First-reaches table, 203 Flash-based memory, 256–57 NAND, 256 NOR, 256 virtual mapping, 257 wear leveling, 257 FlexRay, 316–24 active stars, 318–19 arbitration grid, 317 block diagram, 317 bus guardians, 316, 323 clock synchronization procedure, 324 communication cycle, 318 controller host interface, 323 defined, 316 dynamic segments, 321, 322 frame encoding, 320 frame fields, 320–21 frame format, 321 levels of abstraction, 318 macrotick, 316–17 microtick, 316 network stack, 318 physical layer, 319–20 static segments, 321–22 system startup, 322–23 timekeeping, 323 timing, 316–17 FlexWare compiler, 163, 164, 170 Floorplanning, 426 Flow control, interconnection networks, 296 Flow facts, 193, 194
Force-directed scheduling, 390–92 calculation, 391 defined, 390 distribution graph, 390, 391 forces, 391 predecessor/successor forces, 392 self forces, 392 See also Multiprocessor scheduling Forward channel, FTL-LITE, 258 General-purpose computing systems, 362 Genetic algorithms, 418 Giotto, 245–47 actuator ports, 245 defined, 245 execution cycle, 246–47 mode switch, 246 program implementation, 247 sensor ports, 245 switch frequency, 246 target mode, 246 Global Criticality/Local Phase (GCLP) algorithm, 408 Global optimizations, 174–76 Global slack, 406 Global time service, 364 GMRS, 375 GOPS, 417 Gordian knot, 407 Guards, 398 H.26x standard, 13 H.264 standard, 13, 302–4 Halting problem, 38 Hardware abstractions, 330–32 architectures, design methodologies, 25–26 event handlers, 20 resources, tabular representation, 396 Hardware abstraction layer (HAL), 367 Hardware radio, Hardware/software co-design, 26, 27, 32, 383–430 algorithms, 396–428 Annapolis Micro Systems WILDSTAR II Pro, 386–87 ARM Integrator logic module, 386 custom integrated circuit, 385 custom-printed circuit board, 384 high-level synthesis, 387–92 memory management, 425 memory systems, 422–25 as methodology, 384 multi-objective optimization, 416–21 PC-based system, 384 performance analysis, 387–96 platforms, 384–87 Xilinx Virtex-4 FX platform FPGA family, 385–86 Hardware/software co-simulation, 428–29 backplanes, 428 co-simulators, 428–29 with simulation backplane, 429 Hardware/software partitioning, 400 Hazard function, 49 Heterogeneously programmed systems, 34 Heterogeneous memory systems, 307–9 Heterogeneous multiprocessors, 274 in co-design, 385 problems, 329, 338 See also Multiprocessors Hierarchical co-synthesis, 414 High-level services, 58–60 High-level synthesis, 387–92 behavioral description, 387 control step, 389 cost reduction using, 406–7 critical-path scheduling, 390 for estimates, 411 FCFS scheduling, 389–90 force-directed
scheduling, 390–92 goals, 387 list scheduling, 390 path-based scheduling, 392 register-transfer implementation, 388 Hot swapping, 234 HP DesignJet printer, 308 Huffman coding, 11, 12 defined, 101 illustrated, 101 Hunter/Ready OS, 247 IBM CodePack, 100 IBM Coral tool, 327 Ideal parallelism, 171 Index rewriting, 172 Instruction caching, 193–95 Instruction formation, 145 Instruction issue width, 68 Instruction-level parallelism, 45, 84 Instructions in combination, 147 large, 148 models for, 157–60 performance from, 187 scheduling, 68 template generation, 145 template size versus utilization, 147 Instruction scheduling, 131, 163–66 Araujo and Malik algorithm, 166 constraint modeling, 164–65 defined, 157 Instruction selection, 157–60 defined, 156 as template matching, 158 See also Code generation Instruction set design space, 143 metrics, 144 search algorithms, 145 style, 68 synthesis, 143–50 Integer linear programming (ILP), 188 abstract interpretation and, 196 path analysis by, 188 Intel Flash File System, 258–59 mote, 18–19 XScale, 88 Interactive data language (IDL), 363 Interconnection networks, 289–304 application-specific, 295–96 area, 289 buses, 292–93 Clos, 294 crossbars, 293–94 energy consumption, 289 flow control, 296 H.264 decoder, 302–4 latency, 289 link characteristics, 290 mesh, 294–95 metrics, 289 models, 290–92 NetChip, 302 NoCs, 296–304 Nostrum, 297–98 Poisson model, 292 QNoC, 301–2 routing, 296 Sonics SiliconBackplane III, 304 SPIN, 298 throughput, 289 TI McBSP, 291 topologies, 289, 292–96 xpipes, 302 See also Multiprocessors Inter-event stream context, 356 Interface co-synthesis, 422 Intermodule connection graph, 343–44 Internet, Internet Protocol (IP), Internetworking standard, Interprocess communication (IPC) mechanisms, 254 Interprocessor communication modeling graph, 344, 345 Interrupt-oriented languages, 199–200 methodologies, 200 NDL, 199 video drivers, 199 Interrupt service routine
(ISR), 250 Interrupt service thread (IST), 250 Interval scheduling, 228 Intra-event stream context, 358 Iterative improvement schedulers, 224, 409–10 Java, 211–14 bytecodes, 211–12 JIT, 212–13 memory management, 213–14 mnemonics, 212 Java Virtual Machine (JVM), 211 JETTY, 311 Jini, 58 Jitter event model, 353–54 Joint Tactical Radio System (JTRS), 10 Journaling, 258 Journaling Flash File System (JFFS), 258 JPEG standard, 11 DCT and, 12 JPEG 2000, 12 Just-in-time (JIT) compilers, 212 Kahn processes, 40–41 defined, 40 illustrated, 41 network, 41, 214 Latency service, 364 L-block, 189 Least common multiple (LCM), 348 Least-laxity first (LLF) scheduling, 232 Lempel-Ziv coding, 116, 117 Lempel-Ziv-Welch (LZW) algorithm, 116 Lifetime cost, 22 Limited-precision arithmetic, 149–50 Linear-time temporal logic, 216, 259 Link access control (LAC), 10 LISA system, 140–42 hardware generation, 141–42 sample modeling code, 141 See also Configurable processors List scheduling, 227–28, 390 Load balancing, 360–61 algorithm, 361 defined, 360 Load threshold estimator, 358 Local slack, 406 Logical link control and adaptation protocol (L2CAP), 54, 55 Log-structured file systems, 257–58 Loop conflict factor, 181 fusion, 174 iterations, 191–92 nests, 171 padding, 172 permutation, 172, 173 splitting, 172 tiling, 172 unrolling, 172 Loop-carried dependencies, 171 Looped containers, 375 Loop transformations, 171–74 buffering and, 177 matrices and, 173 non-unimodular, 173–74 types of, 172 Low-density parity check (LDPC), Low-power bus encoding, 117–22 Low-power design, Lucent Daytona multiprocessor, 283–84 LUSTRE, 203–4 LYCOS, 404 BSBs, 406 partitioning, 406 profiling tools, 404 Macroblocks, 13 Macrotick, 316–17, 323 Mailboxes, 254 Main memory-oriented optimizations, 182–85 Markov models, 109–10 arithmetic coding and, 108–9 for conditional character probabilities, 110 defined, 108 of instructions, 110 uses, 109 Maximum distance
function, 354 Maximum mutator utilization (MMU), 213 Mealy machine, 35 Mean time to failure (MTTF), 48 MediaBench suite, 84–85, 128 Medium access control (MAC), 10 Memory area model, 91 arrays, 94–95 banked, 182, 184–85 block structure, 90 bottleneck, 170 caches, 95–98 cells, 90 component models, 89–95 consistency, 311 delay model, 91 energy model, 91–92 flash-based, 256–57 hierarchy, 89–99 layout transformations, 174 long-term, 403 multiport, 92 MXT system, 116 non-RAM components, 175 paged, 182 register files, 95, 96 scratch pad, 98–99, 180–82 short-term, 403 system optimizations, 170 Memory management embedded operating systems, 248–49 hardware/software co-design, 425 Windows CE, 348–49 Memory-oriented optimizations, 170–85 buffer, data transfer, and storage management, 176–78 cache-/scratch pad, 178–82 global, 174–76 loop transformations, 171–74 main, 182–85 Memory systems, 304–12 average access rate, 305 consistent parallel, 309–12 hardware/software co-design, 422–25 heterogeneous, 307–9 models, 306–7 multiple-bank, 304–5 parallel, 304–5 peak access rate, 305 power consumption, 308 power/energy, 308–9 real-time performance, 307–8 See also Multiprocessors Mentor Graphics Seamless system, 429 Mesh networks, 294–95 MESH simulator, 378 Message nodes, Metagenerators, 217 Metamodels, 216–17 Methodologies, 2–3 design, 3, 22–33 interrupt-oriented languages, 200 multiprocessor design, 276–77 standards-based design, 28–30 step implementation as tools, Metropolis, 215 Microarchitecture-modeling simulators, 131–32 Microtick, 316 Middleware, 361–75 defined, 362 group protocol, 55 MPI, 366 for multiparadigm scheduling, 372 resource allocation, 362 SoC, 366–70 Minimum distance function, 354 Mobile supercomputing, 272 Modal processes, 422 Model-integrated computing, 216 Models of computation, 33–46 control flow versus data flow, 34, 38–41 defined, 33 finite versus infinite state, 34–38 reasons for studying, 33–34
sequential versus parallelism, 34, 41–46 MOGAC, 419–20 components, 419 constraints, 420 genetic algorithm, 419–20 genetic model, 419 optimization procedure, 420 Moore machine, 35 Motion compensation, 13 estimation, 13, 14 vectors, 14 MP3 standard, 15 MPEG standards, 13, 14, 30 MPI (MultiProcessor Interface), 366 MultiFlex, 368–69 Multihop routing, 18 Multimedia algorithms, 85 applications, 11–15 Multi-objective optimization, 416–21 Multiparadigm scheduling, 371 Multiple-instruction, multiple-data (MIMD), 68 Multiple-instruction, single-data (MISD), 68 Multiplexers, 389 Multiport memory, 92 Multiprocessing accelerators and, 274 real time and, 274 uniprocessing versus, 274 Multiprocessors, 267–333 architectures, 279–88 ARM MPCore, 311–12 connectivity graph, 399 core-based strategy, 326–27 design methodologies, 326–32 design techniques, 275–79 embedded, 268, 269–75 generic, 268 heterogeneous, 274, 329, 338 interconnection networks, 289–304 Lucent Daytona, 283–84 memory systems, 304–12 modeling and simulation, 278–79 MPSoC, 279 PEs, 267, 288 Philips Nexperia, 281–83 Qualcomm MSM5100, 280–81 scheduling, 339 simulation as parallel computing, 278–79 specialization and, 274–75 STMicroelectronics Nomadik, 284–86 subsystems, 267 TI OMAP, 286–88 Multiprocessor scheduling, 342–58 AND activation, 355–56 communication and, 341 contextual analysis, 358 cyclic task dependencies, 358 data dependencies and, 347–48 delay estimation algorithm, 349–51 distributed software synthesis, 358–59 with dynamic tasks, 359–61 event model, 353–54 event-oriented analysis, 353 intermodule connection graph, 343–44 interprocessor communication modeling (IPC) graph, 344, 345 limited information, 340–41 models, 343 network flow, 343–44 NP-complete, 342–43 OR activation, 356–58 output event timing, 354–55 phase constraints, 351–53 preemption and, 347 static scheduling algorithm, 349 system timing analysis, 355 task activation, 355 See also
Scheduling Multiprocessor software, 337–79 design verification, 376–78 embedded, 337–39 master/slave, 340 middleware, 361–75 PE kernel, 340 quality-of-service (QoS), 370–75 real-time operating systems, 339–61 role, 339–42 Multiprocessor system-on-chip (MPSoC), 279 Multitasking caches and, 239 scratch pads and, 240–41 Multithreading, 68 Mutators, 213 MXT memory system, 116 Myopic algorithm, 360 NAND memory, 256 NDL, 199 NetChip, 302 Network design, 32 Networked consumer devices, 56–57 Network layer, Networks, ad hoc, aircraft, 324–25 interconnection, 289–304 personal area, 54 physically distributed, 312–25 wireless, Networks-on-chips (NoCs), 296–304 defined, 297 design, 300 energy modeling, 299–300 Nostrum, 297–98 OCCN, 300–301 QoS and, 375 QoS-sensitive design, 300 services, 370 SPIN, 298 Nimble, 426–27 Nonblocking communication, 45 Nonconvex operator graph, 147 NOR memory, 256 Nostrum network, 297–98 resources, 298 stack, 370 Notification service, 374–75 Object Constraint Language (OCL), 217 Object request broker (ORB), 363 OCCN, 300–301 Open Systems Interconnection (OSI) model, Operating systems, 4, 223–64 design, 247–59 embedded, 248–49 Hunter/Ready, 247 IPC mechanisms, 254 memory management, 248–49 multiprocessor, 339–61
overhead, 251–52 power management, 255 real-time (RTOSs), 223, 247–48 scheduling support, 253–54 simulation, 251 TI OMAP, 341–42 Operational faults, 48 OR activation, 356–58 illustrated, 357 jitter, 357–58 period, 356 Ordered Boolean decision diagrams (OBDDs), 36 Output line energy, 93 Paged addressing mechanisms, 183–84 Page description language, 308 Paged memories, 182 Parallel execution mechanisms, 77–86 processor resource utilization, 83–86 superscalar processors, 80 thread-level parallelism, 82–83 vector processors, 80–81 VLIW processors, 77–80 Parallelism, 41–46 architecture and, 42 communication and, 41–45 data-level, 45 defined, 41 ideal, 171 instruction-level, 45 Petri net, 42–43 subword, 81 task graphs, 42 task-level, 46 thread-level, 82–83 Parametric timing analysis, 192 Pareto optimality, 417 Path analysis, 186, 187, 188–90 cache behavior and, 189 by ILP, 188 user constraints, 189–90 Path-based estimation, 393, 394 Path-based scheduling, 392 Path ratio, 85 Paths cache behavior and, 189 crossing critical, 197 execution, 186 Path timing, 186, 190–97 abstract interpretation, 195–96 clustered analysis, 192–93 instruction caching, 193–95 loop iterations, 191–92 parametric analysis, 192 simulation-based analysis, 196–97 PC sampling, 130 PEAS III, 142–43 compiler generator, 163 defined, 142 hardware synthesis, 143 model of pipeline stage, 142 VHDL models, 143 Perceptual coding, 11 Performance average, 66 compression, 114 embedded microprocessors, 272–74 hardware/software co-design, 387–96 indices, 215 peak, 66 processor, 66–67 worst-case, 67 Periodic admissible sequential schedule (PASS), 200 Personal area networks, 54 Petri nets, 42–43, 242 behavior, 43 defined, 42 illustrated, 43 maximal expansion/cut-off markings, 243 Phase constraints, 351–53 Philips Nexperia, 281–83 Physical faults, 48 Physical layer Bluetooth, 54 defined, FlexRay, 319–20 Platform-based design, 26–28 defined, 26 illustrated, 27
phases, 28 platform programming, 28 two-stage process, 26 Platform-dependent characteristics, 277 Platform-independent measurements, 277 Platform-independent optimizations, 277 Poisson model, 292 Polyhedral reduced dependence graphs (PRDGs), 206 Polytope model, 172 Post-assembly optimizations, 167 Post-cache decompression, 106–7 Power attacks, 125 countermeasures, 126 defined, 53 See also Attacks Power management, 370 Power simulators, 131–32 Predecessor/successor forces, 392 Preferred lists, 361 Prefetching, 178 Presentation layer, Priority ceiling protocol, 233–34 Priority inheritance protocols, 233 Priority inversion, 176, 232–33 Priority schedulers, 225 dynamic priorities, 230 static priorities, 230 Priority service, 364 Procedure cache, 113 Procedure-splitting algorithm, 169 Processes completion time, 225 concurrent execution, 259 critical, 227 deadlock, 259, 260 defined, 224 execution, 225 execution time, 225 initiation time, 225 modal, 422 real-time scheduling, 224–41 response time, 225 scheduling, 68 slowdown factor, 235 specifications, 226 Processing elements (PEs), 205 defined, 267 design methodology, 288 kernel, 340 multiprocessor, 267, 288 See also Multiprocessors Process migration, 360 Processors comparing, 66–69 cost, 67 customization, 133 DSPs, 71–76 embedded, 68–69 energy, 67 evaluating, 66–67 general-purpose, 68–69 instruction issue width, 68 instruction set style, 68 memory hierarchy, 89–99 MIMD, 68 MISD, 68 performance, 66–67 power, 67 predictability, 67 resource utilization, 83–86 RISC, 67, 69–71 SIMD, 67 superscalar, 80 taxonomy, 67–68
vector, 80–82 VLIW, 77–80 Procrastination scheduling, 238–39 Product machines, 36 Programming environments, 169–70 models, 197–218 Program performance analysis, 185–97 average performance, 185 BCET, 185–86 challenges, 186 measures, 185 models, 187–88 WCET, 185–86 Programs, 155–219 code generation, 156–70 defined, 155 execution paths through, 186 flow statements, 188, 189 memory-oriented optimizations, 170–85 models of, 197–218, 259 representations, 397–98 Property managers, 374 Protocol data units, 301 Ptolemy II, 215 QNoC, 301–2, 375 Qualcomm MSM5100, 280–81 Quality descriptive language, 374 Quality-of-service (QoS), 370–75 attacks, 53 CORBA and, 374 management as control, 373 model, 371 NoCs and, 375 notification service, 374–75 resources, 371 services, 370–75 Quenya model, 404–5 Radio and networking application, 5–10 RATAN process model, 346 Rate-monotonic analysis (RMA), 230 Rate-monotonic scheduling (RMS), 230 critical instant, 231 priority assignment, 230–31 utilization, 232–33 Razor latch, 88, 89 Reachability, 36 Reactive systems, 198 Real-Time Connection Ordination Protocol, 365 Real-time daemon, 364 Real-time event service, 365 Real-time operating systems (RTOSs), 223, 247–48 interrupts and scheduling, 250 ISRs/ISTs, 250 multiprocessor, 339–61 structure, 250–51 See also Operating systems Real-time primary-backup, 365 Real-time process scheduling, 224–41 algorithms, 227–34 for dynamic voltage scaling, 234–39 performance estimation, 239–41 preliminaries, 224–26 Real-Time Specification for Java (RTSJ), 174–75 Reconfigurable systems CORDS, 426 co-synthesis for, 425–28 defined, 425 Nimble, 426–27 Redundant active stars, 319, 320 Reference implementation, 29 Register allocation, 160–63 cliques, 161–62 conflict graph, 160, 161 defined, 156 graph coloring, 162 illustrated, 160 See also Code generation Register files, 95, 96 defined, 95 parameters, 95 size, 96 VLIW, 162–63 Register liveness, 160
Register-transfer implementation, 332 Relative computational load, 404 Reliability, 46, 47–51 demand for, 47 function, 49 methods, 50 system design fundamentals, 48–51 Resource allocation middleware and, 362 multiprocessor software and, 339 Resources defined, 298 dependencies, 227 QoS, 370 utilization, 83–86 Response time theorem, 348–49 Reverse channel, Rewriting rules, 159 RFCOMM, 55 Ripple scheduling, 375 RISC processors, 69–71 architecture, 69 ARM family, 70 defined, 67 embedded versus, 69 MIPS architecture, 70 PowerPC family, 70 Routing interconnection networks, 296 store-and-forward, 296 virtual cut-through, 296 wormhole, 296 RT-CORBA, 363–65 RTM, 253–54 RTU, 253 SAFE-OPS, 124 Safety, 46 Sandblaster processor, 82–83 Scalar variable placement, 178–79 Schedulers constructive, 224, 227 dynamic, 225 instruction, 131 iterative improvement, 224 list, 227–28 priority, 225 Spring, 253 static, 224 Schedules defined, 224 failure rate, 252 SDF, 202 single-appearance, 201, 202 unrolled, 348 Scheduling AFAP, 392 caches and, 240 checkpoint-driven, 237–38 critical-path, 390 data dependencies and, 227 deadline-driven, 232 for DVS, 234–39 dynamic, 68, 224 dynamic task, 360 energy-aware, 421 FCFS, 389–90 fine-grained, 371–72 force-directed, 390–92 hard, 225 instructions, 68 interval, 228 languages and, 241–47 least-laxity first (LLF), 232 list, 390 multiparadigm, 371 multiprocessor, 339, 340–41, 342–58 OS support, 253–54 processor, 68
procrastination, 238–39 rate-monotonic (RMS), 230–32 real-time process, 224–41 ripple, 375 slack-based, 236–37 soft, 225 static, 68, 224 Scratch pads, 98–99 allocation algorithm, 183 evaluation, 182 management, 180–81 multitasking and, 240–41 performance comparison, 184 See also Memory Search metrics, 360 Security, 5, 46, 122–26 Self forces, 392 Self-programmable one-chip microcomputer (SPOM) architecture, 123 Sensor networks, 18–21 Intel mote, 18–19 TinyOS, 19–20 ZebraNet, 20–21 Serra system, 407 Service records, 55 Services ARMADA, 365 ENSEMBLE, 369 MultiFlex, 368–69 NoC, 370 Nostrum, 370 ORB, 363 QoS, 370–75 RT-CORBA, 363–65 SoC, 366–70 standards-based, 363–66 Session layer, Set-associative caches, 97 S-graphs, 241, 422 SHIM, 247 Short-term memory, 403 Side channel attacks, 123 SIGNAL, 204–5 Signal flow graph (SFG), 40 Signals analysis, 204–5 composition, 204 defined, 204 Simple event model, 353 Simple power analysis, 53 SimpleScalar, 131 Simulated annealing, 403 Simulation CPU, 126–32 direct execution, 130–31 multiprocessor, 277–79 operating systems, 251 Simulation-based timing analysis, 196 Simulators communicating, 278 heterogeneous, 279 MESH, 378 VastSystems CoMET, 376–77 Simulink, 217–18 Single-appearance schedules, 201, 202 Single-instruction, multiple data (SIMD), 67–68 Sink SCC, 346 Slack-based scheduling, 236–37 Slave threads, 403 Slowdown factor, 235 Smart cards, 123 SmartMIPS, 124 Snooping caches, 310 Software abstractions, 330–32 architectures, multiprocessor, 337–79 performance analysis, 32 tool generation, 32 verification, 32 Software-based decomposition, 111–13 Software-controlled radio, Software-defined radio (SDR), 6–7 ideal, tiers, ultimate, Software radio, Software thread integration (STI), 242 Sonics SiliconBackplane III, 304 Source SCC, 346 Sparse time model, 314 SPECInt benchmark set, 127–28 SpecSyn, 408–9 SPIN, 261, 298 Spiral model, 24–25 Spring scheduler, 253 Standards-based
design methodologies, 28–30 design tasks, 29–30 pros/cons, 28–29 reference implementation, 29 Standards-based services, 363–66 Starcore SC140 VLIW core, 80 Statecharts, 208–10 hierarchy tree, 209 interpretation, 209 STATEMATE, 208–9 variations, 208 verification, 209–10 STATEMATE, 208–9 Static buffers, 203 Static scheduling, 68 algorithms, 227 defined, 224 implementation, 227 multiprocessor, 349 See also Scheduling STMicroelectronics Nomadik multiprocessor, 284–86 Store-and-forward routing, 296 Streams, 35–36, 214 Strongly connected component (SCC), 344 sink, 346 source, 346 Subtasks, 224 Subword parallelism, 81 Superscalar processors, 80 Symmetric multiprocessing (SMP), 368–69 SymTA/S, 353 Synchronous data flow (SDF), 40, 200, 201 buffer management, 202 graphs, 200, 201 Synchronous languages, 198 as deterministic, 198 rules, 198 SystemC, 279 System-level design flow, 329–30 System-on-chip (SoC), 279 services, 366–70 template, 326 System timing analysis, 355 Task graphs, 42, 398 COSYN, 412 large, 411 Task Graphs for Free (TGFF), 398 Task-level parallelism, 46 Tasks activation, 355 assertion, 415 compare, 415 defined, 224 dynamic, 359–61 migration, 360 Technology library, 389 Template-driven synthesis algorithms, 400–407 COSYMA, 401 CoWare, 402–3 hardware/software partitioning, 400 LYCOS, 404 Quenya model, 404–5 Vulcan, 401 See also Co-synthesis algorithms Template-matching algorithm, 159 Temporal logic, 259–60 Tensilica Xpres compiler, 148 Tensilica Xtensa, 135–38 core customization, 135 defined, 135 features, 136 See also Configurable processors Testing, 31 Texas Instruments C5x DSP family, 72–74 C6x VLIW DSP, 79 C55x co-processor, 75–76 McBSP, 291 OMAP multiprocessor, 286–88, 341–42 Thread-level parallelism, 82–83 Thread pool, 363, 364 Threads CDGs, 242–45 defined, 224 integrating, 244 primary, 242 secondary, 242 slave, 403 time-loop, 403 Throughput factors, 417 Timed distributed method invocation
(TDMI), 364 Time-loop threads, 403 Time quantum, 224 Time-triggered architecture (TTA), 313–16 cliques, 316 communications controller, 315 communications network interface, 314, 315 defined, 313 host node, 315 interconnection topologies, 316 sparse time model, 314 timestamp, 313 topologies, 315 Timing accidents, 188 attacks, 53 path, 186, 190–97 penalty, 188 simulation-based analysis, 196 TinyOS, 19–20 Tokens, 42 Token-triggered threading, 83 Toshiba MeP core, 138–40 Total conflict factor, 181 Trace-based analysis, 129–30 Traffic models, 291 Transition variables, 120 Transmission start sequence, 320 Transport group protocols, 54 Transport layer, Triple modular redundancy, 50, 51 Turbo codes, Turing machine, 37–38 defined, 37 halting problem, 38 illustrated, 37 operating cycle, 38 Twig code generator, 158 Ultimate software radio, Unbuffered communication, 43 Unified Modeling Language (UML), 217 UNITY, 398 Unrolled schedules, 348 User-custom instruction (UCI), 140 Utilization CPU, 226 processor resource, 83–86 RMS, 231–32 Validation, 31 Variable-length codewords, 105–6 Variable-length coding, 11, 12 Variable lifetime chart, 160, 161 Variable-performance CPUs, 86–89 better-than-worst-case design, 88–89 dynamic voltage and frequency scaling (DVFS), 86–88 Variable-to-fixed encoding, 111, 112 VastSystems CoMET simulator, 376–77 Vector processing, 68 Vector processors, 80–81 Vehicle control/operation, 15–18 harnesses, 16 microprocessors, 16 safety-critical systems, 16 specialized networks, 16–17 X-by-wire, 17 See also Applications Verification, 31, 259–63 finite state and, 36 multiprocessor design, 376–78 software, 32 statecharts, 209–10 techniques, 31 Very long instruction word (VLIW) processors, 77–80 defined, 77 register files, 162–63 split register files, 78 Starcore SC140 VLIW core, 80 structure, 77 TI C6x DSP, 79 uses, 78 See also Processors Video cameras, computation in, 271 Video compression standards,
13 Video drivers, 199 Video encoding standards, 13 Virtual channel flow control, 296 Virtual components, 327 Virtual cut-through routing, 296 Virtual mapping, 257 Virtual Socket Interface Alliance (VSIA), 326 Virtual-to-real synthesis, 328 Voting schemes, 50 Vulcan, 401 Watchdog timers, 50–51 Waterfall model, 24 Wavelets, 12 Wear leveling, 257 WiFi, 56 Windows CE memory management, 248–49 scheduling and interrupts, 250–51 Windows Media Rights Manager, 59–60 Wireless co-synthesis, 421 data, Wolfe and Chanin architecture, 102 architecture performance, 103–4 efficiency comparison, 103 Word line energy, 92 Working-zone encoding, 119 Workload, 276 Wormhole routing, 296 Worst-case execution time (WCET), 185–86 Wrappers, 327–29 X-by-wire, 17 Xilinx Virtex-4 FX platform FPGA family, 385–86 Xpipes, 302 Yet Another Flash Filing System (YAFFS), 258 ZebraNet, 20–21