About the Authors
Andrew N Sloss
Andrew Sloss received a B.Sc. in Computer Science from the University of Herefordshire (UK) in 1992 and was certified as a Chartered Engineer by the British Computer Society (C.Eng, MBCS). He has worked in the computer industry for over 16 years and has been involved with the ARM processor since 1987. He has gained extensive experience developing a wide range of applications running on the ARM processor. He designed the first editing systems for both Chinese and Egyptian Hieroglyphics executing on the ARM2 and ARM3 processors for Emerald Publishing (UK). Andrew Sloss has worked at ARM Inc. for over six years. He is currently a Technical Sales Engineer advising and supporting companies developing new products. He works within the U.S. Sales Organization and is based in Los Gatos, California.
Dominic Symes
Dominic Symes is currently a software engineer at ARM Ltd. in Cambridge, England, where he has worked on ARM-based embedded software since 1995. He received his B.A. and D.Phil. in Mathematics from Oxford University. He first programmed the ARM in 1989 and is particularly interested in algorithms and optimization techniques. Before joining ARM, he wrote commercial and public domain ARM software.
Chris Wright
Chris Wright began his embedded systems career in the early 80s at Lockheed Advanced Marine Systems. While at Advanced Marine Systems he wrote small software control systems for use on the Intel 8051 family of microcontrollers. He has spent much of his career working at the Lockheed Palo Alto Research Laboratory and in a software development group at Dow Jones Telerate. Most recently, Chris Wright spent several years in the Customer Support group at ARM Inc., training and supporting partner companies developing new ARM-based products. Chris Wright is currently the Director of Customer Support at Ultimodule Inc. in Sunnyvale, California.
John Rayfield
John Rayfield, an independent consultant, was formerly Vice President of Marketing, U.S., at ARM. In this role he was responsible for setting ARM’s strategic marketing direction in the U.S., and identifying opportunities for new technologies to serve key market segments. John joined ARM in 1996 and held various roles within the company, including Director of Technical Marketing and R&D, which were focused around new product/technology development. Before joining ARM, John held several engineering and management roles in the field of digital signal processing, software, hardware, ASIC and system design. John holds an M.Sc. in Signal Processing from the University of Surrey (UK) and a B.Sc.Hons in Electronic Engineering from Brunel University (UK).
Designing and Optimizing
System Software
Andrew N Sloss
Dominic Symes
Chris Wright
With a contribution by John Rayfield
AMSTERDAM • BOSTON • HEIDELBERG • LONDON
NEW YORK • OXFORD • PARIS • SAN DIEGO
SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO
Publishing Services Manager Simon Crump
Project Manager Sarah M Hajduk
Developmental Editor Belinda Breyer
Editorial Assistant Summer Block
Cover Design Dick Hannus
Cover Image Red Wing No.6 by Charles Biederman
Collection Walker Art Center, Minneapolis. Gift of the artist through the Ford Foundation Purchase Program, 1964
Technical Illustration Dartmouth Publishing
Composition Cepha Imaging, Ltd.
Copyeditor Ken Dellapenta
Proofreader Jan Cocker
Indexer Ferreira Indexing
Interior printer The Maple-Vail Book Manufacturing Group
Cover printer Phoenix Color
Morgan Kaufmann Publishers is an imprint of Elsevier.
500 Sansome Street, Suite 400, San Francisco, CA 94111
This book is printed on acid-free paper.
© 2004 by Elsevier Inc. All rights reserved.
The programs, examples, and applications presented in this book and on the publisher’s Web site have been included for their instructional value. The publisher and the authors offer no warranty implied or express, including but not limited to implied warranties of fitness or merchantability for any particular purpose and do not accept any liability for any loss or damage arising from the use of any information in this book, or any error or omission in such information, or any incorrect use of these programs, procedures, and applications.
Designations used by companies to distinguish their products are often claimed as trademarks or registered trademarks. In all instances in which Morgan Kaufmann Publishers is aware of a claim, the product names appear in initial capital or all capital letters. Readers, however, should contact the appropriate companies for more complete information regarding trademarks and registration.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means—electronic, mechanical, photocopying, scanning, or otherwise—without prior written permission of the publisher.
Permissions may be sought directly from Elsevier’s Science & Technology Rights Department in Oxford, UK: phone: (+44) 1865 843830, fax: (+44) 1865 853333, e-mail: permissions@elsevier.com.uk. You may also complete your request on-line via the Elsevier homepage (http://elsevier.com) by selecting “Customer Support” and then “Obtaining Permissions.”
Library of Congress Cataloging-in-Publication Data
Sloss, Andrew N.
ARM system developer’s guide: designing and optimizing system software/Andrew N.
Sloss, Dominic Symes, Chris Wright.
p. cm.
Includes bibliographical references and index.
ISBN 1-55860-874-5 (alk. paper)
1. Computer software–Development. 2. RISC microprocessors. 3. Computer architecture. I. Symes, Dominic. II. Wright, Chris, 1953- III. Title.
QA76.76.D47S565 2004
005.1–dc22
2004040366 ISBN: 1-55860-874-5
For information on all Morgan Kaufmann publications,
visit our Web site at www.mkp.com.
Printed in the United States of America
Contents

About the Authors

Chapter 1
1.1 The RISC Design Philosophy
1.2 The ARM Design Philosophy
1.3 Embedded System Hardware
1.4 Embedded System Software
1.5 Summary

Chapter 2
2.1 Registers
2.2 Current Program Status Register
2.3 Pipeline
2.4 Exceptions, Interrupts, and the Vector Table
2.5 Core Extensions
2.6 Architecture Revisions
2.7 ARM Processor Families
2.8 Summary

Chapter 3
3.1 Data Processing Instructions
3.2 Branch Instructions
3.3 Load-Store Instructions
3.4 Software Interrupt Instruction
3.5 Program Status Register Instructions
3.6 Loading Constants
3.7 ARMv5E Extensions
3.8 Conditional Execution
3.9 Summary

Chapter 4
4.1 Thumb Register Usage
4.2 ARM-Thumb Interworking
4.3 Other Branch Instructions
4.4 Data Processing Instructions
4.5 Single-Register Load-Store Instructions
4.6 Multiple-Register Load-Store Instructions
4.7 Stack Instructions
4.8 Software Interrupt Instruction
4.9 Summary

Chapter 5
5.1 Overview of C Compilers and Optimization
5.2 Basic C Data Types
5.3 C Looping Structures
5.4 Register Allocation
5.5 Function Calls
5.6 Pointer Aliasing
5.7 Structure Arrangement
5.8 Bit-fields
5.9 Unaligned Data and Endianness
5.10 Division
5.11 Floating Point
5.12 Inline Functions and Inline Assembly
5.13 Portability Issues
5.14 Summary

Chapter 6
6.1 Writing Assembly Code
6.2 Profiling and Cycle Counting
6.3 Instruction Scheduling
6.4 Register Allocation
6.5 Conditional Execution
6.6 Looping Constructs
6.7 Bit Manipulation
6.8 Efficient Switches
6.9 Handling Unaligned Data
6.10 Summary

Chapter 7
7.1 Double-Precision Integer Multiplication
7.2 Integer Normalization and Count Leading Zeros
7.3 Division
7.4 Square Roots
7.5 Transcendental Functions: log, exp, sin, cos
7.6 Endian Reversal and Bit Operations
7.7 Saturated and Rounded Arithmetic
7.8 Random Number Generation
7.9 Summary

Chapter 8
8.1 Representing a Digital Signal
8.2 Introduction to DSP on the ARM
8.3 FIR filters
8.4 IIR Filters
8.5 The Discrete Fourier Transform
8.6 Summary

Chapter 9
9.1 Exception Handling
9.2 Interrupts
9.3 Interrupt Handling Schemes
9.4 Summary

Chapter 10
10.1 Firmware and Bootloader
10.2 Example: Sandstone
10.3 Summary

Chapter 11
11.1 Fundamental Components
11.2 Example: Simple Little Operating System
11.3 Summary

Chapter 12
12.1 The Memory Hierarchy and Cache Memory
12.2 Cache Architecture
12.3 Cache Policy
12.4 Coprocessor 15 and Caches
12.5 Flushing and Cleaning Cache Memory
12.6 Cache Lockdown
12.7 Caches and Software Performance
12.8 Summary

Chapter 13
13.1 Protected Regions
13.2 Initializing the MPU, Caches, and Write Buffer
13.3 Demonstration of an MPU system
13.4 Summary

Chapter 14
14.1 Moving from an MPU to an MMU
14.2 How Virtual Memory Works
14.3 Details of the ARM MMU
14.4 Page Tables
14.5 The Translation Lookaside Buffer
14.6 Domains and Memory Access Permission
14.7 The Caches and Write Buffer
14.8 Coprocessor 15 and MMU Configuration
14.9 The Fast Context Switch Extension
14.10 Demonstration: A Small Virtual Memory System
14.11 The Demonstration as mmuSLOS
14.12 Summary

Chapter 15 The Future of the Architecture
15.1 Advanced DSP and SIMD Support in ARMv6
15.2 System and Multiprocessor Support Additions to ARMv6
15.3 ARMv6 Implementations
15.4 Future Technologies beyond ARMv6
15.5 Summary

Appendix A
A.1 Using This Appendix
A.2 Syntax
A.3 Alphabetical List of ARM and Thumb Instructions
A.4 ARM Assembler Quick Reference
A.5 GNU Assembler Quick Reference

Appendix B
B.1 ARM Instruction Set Encodings
B.2 Thumb Instruction Set Encodings
B.3 Program Status Registers

Appendix C
C.1 ARM Naming Convention
C.2 Core and Architectures

Appendix D
D.1 Using the Instruction Cycle Timing Tables
D.2 ARM7TDMI Instruction Cycle Timings
D.3 ARM9TDMI Instruction Cycle Timings
D.4 StrongARM1 Instruction Cycle Timings
D.5 ARM9E Instruction Cycle Timings
D.6 ARM10E Instruction Cycle Timings
D.7 Intel XScale Instruction Cycle Timings
D.8 ARM11 Cycle Timings

and Reference)
E.4 Operating System References
Preface
Increasingly, embedded systems developers and system-on-chip designers select specific microprocessor cores and a family of tools, libraries, and off-the-shelf components to quickly develop new microprocessor-based products. A major player in this industry is ARM. Over the last 10 years, the ARM architecture has become the most pervasive 32-bit architecture in the world, with more than 2 billion ARM-based processors shipped at the time of this writing. ARM processors are embedded in products ranging from cell/mobile phones to automotive braking systems. A worldwide community of ARM partners and third-party vendors has developed among semiconductor and product design companies, including hardware engineers, system designers, and software developers. To date, no book has directly addressed their need to develop the system and software for an ARM-based embedded design. This text fills that gap.
Our goal has been to describe the operation of the ARM core from a product developer’s perspective with a clear emphasis on software. Because we have written this book specifically for engineers who are experienced with embedded systems development but who may be unfamiliar with the ARM architecture, we have assumed no previous ARM experience.
To help our readers become productive as quickly as possible, we have included a suite of ARM software examples that can be integrated into commercial products or used as templates for the quick creation of productive software. The examples are numbered so that readers can easily locate the source code on the publisher’s Web site. The examples are also valuable to people with ARM design experience who want to make the most efficient use of an ARM-based embedded system.
Organization of the Book
The book begins by briefly noting the ARM processor design philosophy and discussing how and why it differs from the traditional RISC philosophy. The first chapter also introduces a simple embedded system based on the ARM processor.
Chapter 2 digs more deeply into the hardware, focusing on the ARM processor core and presenting an overview of the ARM cores currently in the marketplace.
The ARM and Thumb instruction sets are the focus of Chapters 3 and 4, respectively, and form the fundamental basis for the rest of the book. Explanations of key instructions include complete examples, so these chapters also serve as a tutorial on the instruction sets.
Chapters 5 and 6 demonstrate how to write efficient code with scores of examples that we have developed while working with ARM customers. Chapter 5 teaches proven techniques and rules for writing C code that will compile efficiently on the ARM architecture, and it helps determine which code should be optimized. Chapter 6 details best practices for writing and optimizing ARM assembly code, which is critical for improving performance by reducing system power consumption and clock speed.
Because primitives are basic operations used in a wide range of algorithms, it’s worthwhile to learn how they can be optimized. Chapter 7 discusses how to optimize primitives for specific ARM processors. It presents optimized reference implementations of common primitives as well as of more complicated mathematical operations for those who wish to take a quick reference approach. We have also included the theory behind each implementation for those who wish to dig deeper.
Audio and video embedded systems applications are increasingly in demand. They require digital signal processing (DSP) capability that until recently would have been provided by a separate DSP processor. Now, however, the ARM architecture offers higher memory bandwidths and faster multiply accumulate operations, permitting a single ARM core design to support these applications. Chapter 8 examines how to maximize the performance of the ARM for digital processing applications and how to implement DSP algorithms.
At the heart of an embedded system lie the exception handlers. Efficient handlers can dramatically improve system performance. Chapter 9 covers the theory and practice of handling exceptions and interrupts on the ARM processor through a set of detailed examples.
Firmware, an important part of any embedded system, is described in Chapter 10 by means of a simple firmware package we designed, called Sandstone. The chapter also reviews popular industry firmware packages that are available for the ARM.
Chapter 11 demonstrates the implementation of embedded operating systems through an example operating system we designed, called Simple Little Operating System.
Chapters 12, 13, and 14 focus on memory issues. Chapter 12 examines the various cache technologies that surround the ARM cores, demonstrating routines for controlling the cache on specific cache-enabled ARM processors. Chapter 13 discusses the memory protection unit, and Chapter 14 discusses the memory management unit.
Finally, in Chapter 15, we consider the future of the ARM architecture, highlighting new directions in the instruction set and new technologies that ARM is implementing in the next few years.
The appendices provide detailed references on the instruction sets, cycle timing, and specific ARM products.
Examples on the Web
As we noted earlier, we have created an extensive set of tested practical examples to reinforce concepts and methods. These are available on the publisher’s Web site at
www.mkp.com/companions/1558608745.
First, of course, are our wives, Shau Chin Symes and Yulian Yang, and families who have been very supportive and have put up with us spending a large proportion of our home time on this project.
This book has taken many years to complete, and many people have contributed with encouragement and technical advice. We would like to personally thank all the people involved. Writing a technical book involves a lot of painstaking attention to detail, so a big thank you to all the reviewers who spent time and effort reading and providing feedback, a difficult activity that requires a special skill. Reviewers who worked with the publisher during the developmental process were Jim Turley (Silicon-Insider), Peter Maloy (CodeSprite), Chris Larsen, Peter Harrod (ARM, Ltd.), Gary Thomas (MLB Associates), Wayne Wolf (Princeton University), Scott Runner (Qualcomm, Inc.), Niall Murphy (PanelSoft), and Dominic Sweetman (Algorithmics, Ltd.).
A special thanks to Wilco Dijkstra, Edward Nevill, and David Seal for allowing us to include selected examples within the book. Thanks also to Rod Crawford, Andrew Cummins, Dave Flynn, Jamie Smith, William Rees, and Anne Rooney for helping throughout with advice. Thanks to the ARM Strategic Support Group (Howard Ho, John Archibald, Miguel Echavarria, Robert Allen, and Ian Field) for reading and providing quick local feedback.
We would like to thank John Rayfield for initiating this project and contributing Chapter 15. We would also like to thank David Brash for reviewing the manuscript and allowing us to include ARMv6 material in this book.
Lastly, we wish to thank Morgan Kaufmann Publishers, especially Denise Penrose and Belinda Breyer for their patience and advice throughout the project.
1.3.1 ARM Bus Technology
1.3.2 AMBA Bus Protocol
1.3.3 Memory
1.3.4 Peripherals
1.4.1 Initialization (Boot) Code
1.4.2 Operating System
1.4.3 Applications
ARM’s designers have come a long way from the first ARM1 prototype in 1985. Over one billion ARM processors had been shipped worldwide by the end of 2001. The ARM company bases their success on a simple and powerful original design, which continues to improve today through constant technical innovation. In fact, the ARM core is not a single core, but a whole family of designs sharing similar design principles and a common instruction set.
For example, one of ARM’s most successful cores is the ARM7TDMI. It provides up to 120 Dhrystone MIPS [1] and is known for its high code density and low power consumption, making it ideal for mobile embedded devices.
In this first chapter we discuss how the RISC (reduced instruction set computer) design philosophy was adapted by ARM to create a flexible embedded processor. We then introduce an example embedded device and discuss the typical hardware and software technologies that surround an ARM processor.
[1] Dhrystone MIPS version 2.1 is a small benchmarking program.
1.1 The RISC Design Philosophy
The ARM core uses a RISC architecture. RISC is a design philosophy aimed at delivering simple but powerful instructions that execute within a single cycle at a high clock speed. The RISC philosophy concentrates on reducing the complexity of instructions performed by the hardware because it is easier to provide greater flexibility and intelligence in software rather than hardware. As a result, a RISC design places greater demands on the compiler. In contrast, the traditional complex instruction set computer (CISC) relies more on the hardware for instruction functionality, and consequently the CISC instructions are more complicated. Figure 1.1 illustrates these major differences.
The RISC philosophy is implemented with four major design rules:
1. Instructions: RISC processors have a reduced number of instruction classes. These classes provide simple operations that can each execute in a single cycle. The compiler or programmer synthesizes complicated operations (for example, a divide operation) by combining several simple instructions. Each instruction is a fixed length to allow the pipeline to fetch future instructions before decoding the current instruction. In contrast, in CISC processors the instructions are often of variable size and take many cycles to execute.
2. Pipelines: The processing of instructions is broken down into smaller units that can be executed in parallel by pipelines. Ideally the pipeline advances by one step on each cycle for maximum throughput. Instructions can be decoded in one pipeline stage. There is no need for an instruction to be executed by a miniprogram called microcode as on CISC processors.
3. Registers: RISC machines have a large general-purpose register set. Any register can contain either data or an address. Registers act as the fast local memory store for all data processing operations. In contrast, CISC processors have dedicated registers for specific purposes.
4. Load-store architecture: The processor operates on data held in registers. Separate load and store instructions transfer data between the register bank and external memory. Memory accesses are costly, so separating memory accesses from data processing provides an advantage because you can use data items held in the register bank multiple times without needing multiple memory accesses. In contrast, with a CISC design the data processing operations can act on memory directly.
Figure 1.1 CISC vs. RISC. CISC emphasizes hardware complexity. RISC emphasizes compiler complexity.
These design rules allow a RISC processor to be simpler, and thus the core can operate at higher clock frequencies. In contrast, traditional CISC processors are more complex and operate at lower clock frequencies. Over the course of two decades, however, the distinction between RISC and CISC has blurred as CISC processors have implemented more RISC concepts.
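The first design rule says that complicated operations such as a divide are synthesized from simple instructions. A minimal sketch of that idea in C follows: an unsigned divide built only from shifts, compares, and subtracts, producing one quotient bit per loop iteration. The names `udiv_result` and `udiv32_sketch`, and the convention used for a zero divisor, are assumptions made for this illustration and do not come from the text; an optimized routine for a particular ARM core would look different.

```c
#include <stdint.h>

/* Hypothetical result type for this sketch: quotient and remainder. */
typedef struct {
    uint32_t quot;
    uint32_t rem;
} udiv_result;

/* Unsigned 32-bit divide using only shift, compare, and subtract.
   Assumption of this sketch: a zero divisor yields quot = 0, rem = n. */
udiv_result udiv32_sketch(uint32_t n, uint32_t d)
{
    udiv_result res = { 0u, 0u };
    int i;

    if (d == 0u) {
        res.rem = n;
        return res;
    }

    for (i = 31; i >= 0; i--) {
        /* Bring down the next bit of the numerator, most significant first. */
        res.rem = (res.rem << 1) | ((n >> i) & 1u);

        /* If the divisor fits, subtract it and set this quotient bit. */
        if (res.rem >= d) {
            res.rem -= d;
            res.quot |= (1u << i);
        }
    }
    return res;
}
```

For example, `udiv32_sketch(100, 7)` yields a quotient of 14 and a remainder of 2. Every step inside the loop is an operation that a RISC core offers as a simple instruction.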
1.2 The ARM Design Philosophy
There are a number of physical features that have driven the ARM processor design. First, portable embedded systems require some form of battery power. The ARM processor has been specifically designed to be small to reduce power consumption and extend battery operation, which is essential for applications such as mobile phones and personal digital assistants (PDAs).
High code density is another major requirement since embedded systems have limited memory due to cost and/or physical size restrictions. High code density is useful for applications that have limited on-board memory, such as mobile phones and mass storage devices.
In addition, embedded systems are price sensitive and use slow and low-cost memory devices. For high-volume applications like digital cameras, every cent has to be accounted for in the design. The ability to use low-cost memory devices produces substantial savings.
Another important requirement is to reduce the area of the die taken up by the embedded processor. For a single-chip solution, the smaller the area used by the embedded processor, the more available space for specialized peripherals. This in turn reduces the cost of the design and manufacturing since fewer discrete chips are required for the end product.
ARM has incorporated hardware debug technology within the processor so that software engineers can view what is happening while the processor is executing code. With greater visibility, software engineers can resolve issues faster, which has a direct effect on the time to market and reduces overall development costs.
The ARM core is not a pure RISC architecture because of the constraints of its primary application: the embedded system. In some sense, the strength of the ARM core is that it does not take the RISC concept too far. In today’s systems the key is not raw processor speed but total effective system performance and power consumption.
1.2.1 Instruction Set for Embedded Systems
The ARM instruction set differs from the pure RISC definition in several ways that make the ARM instruction set suitable for embedded applications:
■ Variable cycle execution for certain instructions: Not every ARM instruction executes in a single cycle. For example, load-store-multiple instructions vary in the number of execution cycles depending upon the number of registers being transferred. The transfer can occur on sequential memory addresses, which increases performance since sequential memory accesses are often faster than random accesses. Code density is also improved since multiple register transfers are common operations at the start and end of functions.
■ Inline barrel shifter leading to more complex instructions: The inline barrel shifter is a hardware component that preprocesses one of the input registers before it is used by an instruction. This expands the capability of many instructions to improve core performance and code density. We explain this feature in more detail in Chapters 2, 3, and 4.
■ Thumb 16-bit instruction set: ARM enhanced the processor core by adding a second 16-bit instruction set called Thumb that permits the ARM core to execute either 16- or 32-bit instructions. The 16-bit instructions improve code density by about 30% over 32-bit fixed-length instructions.
■ Conditional execution: An instruction is only executed when a specific condition has been satisfied. This feature improves performance and code density by reducing branch instructions.
■ Enhanced instructions: The enhanced digital signal processor (DSP) instructions were added to the standard ARM instruction set to support fast 16×16-bit multiplier operations and saturation. These instructions allow a faster-performing ARM processor in some cases to replace the traditional combinations of a processor plus a DSP.
These additional features have made the ARM processor one of the most commonly used 32-bit embedded processor cores. Many of the top semiconductor companies around the world produce products based around the ARM processor.
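To make the last item in the list above more concrete, the following sketch shows in portable C what a signed 16×16-bit multiply and a saturating 32-bit add compute. The function names `mul16x16` and `sat_add32` are hypothetical and chosen for this illustration only. Whether a given compiler maps such code onto the enhanced DSP instructions depends on the toolchain and the target core, so this is a description of the arithmetic, not of generated code.

```c
#include <stdint.h>

/* Signed 16x16-bit multiply producing a full 32-bit result.
   The product of two 16-bit values always fits in 32 bits. */
int32_t mul16x16(int16_t a, int16_t b)
{
    return (int32_t)a * (int32_t)b;
}

/* Saturating signed 32-bit add: instead of wrapping around on
   overflow, the result is clamped to the nearest representable value. */
int32_t sat_add32(int32_t a, int32_t b)
{
    /* Compute in 64 bits so the intermediate sum cannot overflow. */
    int64_t sum = (int64_t)a + (int64_t)b;

    if (sum > INT32_MAX) {
        return INT32_MAX;
    }
    if (sum < INT32_MIN) {
        return INT32_MIN;
    }
    return (int32_t)sum;
}
```

Saturation matters in signal processing because a clipped sample is usually far less objectionable than one that wraps from a large positive value to a large negative one.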
1.3 Embedded System Hardware
Embedded systems can control many different devices, from small sensors found on a production line, to the real-time control systems used on a NASA space probe. All these devices use a combination of software and hardware components. Each component is chosen for efficiency and, if applicable, is designed for future extension and expansion.
Figure 1.2 An example of an ARM-based embedded device, a microcontroller. [Block diagram; labeled blocks: ARM processor, AHB arbiter, interrupt controller, AHB-APB bridge, real-time clock, serial UARTs, memory controller, AHB-external bridge, Ethernet, counter/timers, ROM, SRAM, flash ROM, DRAM, external bus, Ethernet physical driver, console.]
Figure 1.2 shows a typical embedded device based on an ARM core. Each box represents a feature or function. The lines connecting the boxes are the buses carrying data. We can separate the device into four main hardware components:
■ The ARM processor controls the embedded device. Different versions of the ARM processor are available to suit the desired operating characteristics. An ARM processor comprises a core (the execution engine that processes instructions and manipulates data) plus the surrounding components that interface it with a bus. These components can include memory management and caches.
■ Controllers coordinate important functional blocks of the system. Two commonly found controllers are interrupt and memory controllers.
■ The peripherals provide all the input-output capability external to the chip and are responsible for the uniqueness of the embedded device.
■ A bus is used to communicate between different parts of the device.
1.3.1 ARM Bus Technology
Embedded systems use different bus technologies than those designed for x86 PCs. The most common PC bus technology, the Peripheral Component Interconnect (PCI) bus, connects such devices as video cards and hard disk controllers to the x86 processor bus. This type of technology is external or off-chip (i.e., the bus is designed to connect mechanically and electrically to devices external to the chip) and is built into the motherboard of a PC.
In contrast, embedded devices use an on-chip bus that is internal to the chip and that allows different peripheral devices to be interconnected with an ARM core.
There are two different classes of devices attached to the bus. The ARM processor core is a bus master, a logical device capable of initiating a data transfer with another device across the same bus. Peripherals tend to be bus slaves, logical devices capable only of responding to a transfer request from a bus master device.
A bus has two architecture levels. The first is a physical level that covers the electrical characteristics and bus width (16, 32, or 64 bits). The second level deals with protocol: the logical rules that govern the communication between the processor and a peripheral.
ARM is primarily a design company. It seldom implements the electrical characteristics of the bus, but it routinely specifies the bus protocol.
1.3.2 AMBA Bus Protocol
The Advanced Microcontroller Bus Architecture (AMBA) was introduced in 1996 and has been widely adopted as the on-chip bus architecture used for ARM processors. The first AMBA buses introduced were the Advanced System Bus (ASB) and the Advanced Peripheral Bus (APB). Later ARM introduced another bus design, called the Advanced High-performance Bus (AHB). Using AMBA, peripheral designers can reuse the same design on multiple projects. Because there are a large number of peripherals developed with an AMBA interface, hardware designers have a wide choice of tested and proven peripherals for use in a device. A peripheral can simply be bolted onto the on-chip bus without having to redesign an interface for each different processor architecture. This plug-and-play interface for hardware developers improves availability and time to market.
AHB provides higher data throughput than ASB because it is based on a centralized multiplexed bus scheme rather than the ASB bidirectional bus design. This change allows the AHB bus to run at higher clock speeds and to be the first ARM bus to support widths of 64 and 128 bits. ARM has introduced two variations on the AHB bus: Multi-layer AHB and AHB-Lite. In contrast to the original AHB, which allows a single bus master to be active on the bus at any time, the Multi-layer AHB bus allows multiple active bus masters. AHB-Lite is a subset of the AHB bus and it is limited to a single bus master. This bus was developed for designs that do not require the full features of the standard AHB bus.
AHB and Multi-layer AHB support the same protocol for master and slave but have different interconnects. The new interconnects in Multi-layer AHB are good for systems with multiple processors. They permit operations to occur in parallel and allow for higher throughput rates.
The example device shown in Figure 1.2 has three buses: an AHB bus for the high-performance peripherals, an APB bus for the slower peripherals, and a third bus for external peripherals, proprietary to this device. This external bus requires a specialized bridge to connect with the AHB bus.
1.3.3 Memory
An embedded system has to have some form of memory to store and execute code. You have to compare price, performance, and power consumption when deciding upon specific memory characteristics, such as hierarchy, width, and type. If memory has to run twice as fast to maintain a desired bandwidth, then the memory power requirement may be higher.
All computer systems have memory arranged in some form of hierarchy. Figure 1.2 shows a device that supports external off-chip memory. Internal to the processor there is an option of a cache (not shown in Figure 1.2) to improve memory performance.
Figure 1.3 shows the memory trade-offs: the fastest memory cache is physically located nearer the ARM processor core and the slowest secondary memory is set further away. Generally the closer memory is to the processor core, the more it costs and the smaller its capacity.
The cache is placed between main memory and the core. It is used to speed up data transfer between the processor and main memory. A cache provides an overall increase in performance but with a loss of predictable execution time. Although the cache increases the general performance of the system, it does not help real-time system response. Note that many small embedded systems do not require the performance benefits of a cache.
The main memory is large, around 256 KB to 256 MB (or even greater) depending on the application, and is generally stored in separate chips. Load and store instructions access the main memory unless the values have been stored in the cache for fast access. Secondary storage is the largest and slowest form of memory. Hard disk drives and CD-ROM drives are examples of secondary storage. These days secondary storage may vary from 600 MB to 60 GB.
Figure 1.3 Storage trade-offs. [Diagram placing cache, main memory, and secondary storage along a memory-size axis running from 1 MB to 1 GB.]
The memory width is the number of bits the memory returns on each access, typically 8, 16, 32, or 64 bits. The memory width has a direct effect on the overall performance and cost ratio.
If you have an uncached system using 32-bit ARM instructions and 16-bit-wide memory chips, then the processor will have to make two memory accesses per instruction, because each 32-bit instruction fetch requires two 16-bit loads. This obviously has the effect of reducing system performance, but the benefit is that 16-bit memory is less expensive.
In contrast, if the core executes 16-bit Thumb instructions, it will achieve better performance with a 16-bit memory. The higher performance is a result of the core making only a single fetch to memory to load an instruction. Hence, using Thumb instructions with 16-bit-wide memory devices provides both improved performance and reduced cost.
Table 1.1 summarizes theoretical cycle times on an ARM processor using different memory width devices.
a ROM to hold boot code
Table 1.1 Fetching instructions from memory

Instruction size   8-bit memory   16-bit memory   32-bit memory
ARM 32-bit         4 cycles       2 cycles        1 cycle
Thumb 16-bit       2 cycles       1 cycle         1 cycle
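
The entries in Table 1.1 are consistent with dividing the instruction width by the memory width and rounding up to at least one access. The following is a minimal sketch of that arithmetic, assuming one cycle per memory access and no cache; the function name fetch_cycles is our own and does not come from any ARM library.

```c
#include <assert.h>

/* Theoretical number of memory accesses (cycles) needed to fetch one
 * instruction, assuming one cycle per access and an uncached system.
 *   instr_bits: instruction width (32 for ARM, 16 for Thumb)
 *   mem_bits:   memory width (8, 16, or 32)                        */
static unsigned fetch_cycles(unsigned instr_bits, unsigned mem_bits)
{
    assert(mem_bits != 0);
    /* Round up, so a memory wider than the instruction still costs
     * one access rather than zero. */
    return (instr_bits + mem_bits - 1) / mem_bits;
}
```

For example, fetch_cycles(32, 16) gives the two accesses described above for ARM code in 16-bit memory, while fetch_cycles(16, 16) gives the single access for Thumb code.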
Flash ROM can be written to as well as read, but it is slow to write so you shouldn’t use it for holding dynamic data. Its main use is for holding the device firmware or storing long-term data that needs to be preserved after power is off. The erasing and writing of flash ROM are completely software controlled with no additional hardware circuitry required, which reduces the manufacturing costs. Flash ROM has become the most popular of the read-only memory types and is currently being used as an alternative for mass or secondary storage.
Dynamic random access memory (DRAM) is the most commonly used RAM for devices. It has the lowest cost per megabyte compared with other types of RAM. DRAM is dynamic: it needs to have its storage cells refreshed and given a new electronic charge every few milliseconds, so you need to set up a DRAM controller before using the memory.
Static random access memory (SRAM) is faster than the more traditional DRAM, but requires more silicon area. SRAM is static: the RAM does not require refreshing. The access time for SRAM is considerably shorter than the equivalent DRAM because SRAM does not require a pause between data accesses. Because of its higher cost, it is used mostly for smaller high-speed tasks, such as fast memory and caches.
Synchronous dynamic random access memory (SDRAM) is one of many subcategories of DRAM. It can run at much higher clock speeds than conventional memory. SDRAM synchronizes itself with the processor bus because it is clocked. Internally the data is fetched from memory cells, pipelined, and finally brought out on the bus in a burst. The old-style DRAM is asynchronous, so does not burst as efficiently as SDRAM.
All ARM peripherals are memory mapped: the programming interface is a set of memory-addressed registers. The address of these registers is an offset from a specific peripheral base address.
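
The base-plus-offset scheme can be sketched in C as follows. This is an illustration only: the register names and offsets (REG_DATA, REG_STATUS, REG_CONTROL) and the helper functions are hypothetical and do not describe any particular peripheral.

```c
#include <stdint.h>

/* Hypothetical register offsets for an imaginary peripheral. */
#define REG_DATA    0x00u
#define REG_STATUS  0x04u
#define REG_CONTROL 0x08u

/* Form a pointer to the 32-bit register at base + offset. The volatile
 * qualifier stops the compiler from caching or removing accesses,
 * because reading or writing a device register can have side effects. */
static volatile uint32_t *periph_reg(uintptr_t base, uint32_t offset)
{
    return (volatile uint32_t *)(base + offset);
}

/* Write a value to a peripheral register. */
static void reg_write(uintptr_t base, uint32_t offset, uint32_t value)
{
    *periph_reg(base, offset) = value;
}

/* Read the current value of a peripheral register. */
static uint32_t reg_read(uintptr_t base, uint32_t offset)
{
    return *periph_reg(base, offset);
}
```

On real hardware the base address comes from the device documentation. When trying the code on a desktop machine, the address of an ordinary uint32_t array can stand in for the peripheral.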
Controllers are specialized peripherals that implement higher levels of functionality within an embedded system. Two important types of controllers are memory controllers and interrupt controllers.
Memory controllers connect different types of memory to the processor bus. On power-up a memory controller is configured in hardware to allow certain memory devices to be active. These memory devices allow the initialization code to be executed. Some memory devices must be set up by software; for example, when using DRAM, you first have to set up the memory timings and refresh rate before it can be accessed.
1.3.4.2 Interrupt Controllers
When a peripheral or device requires attention, it raises an interrupt to the processor. An interrupt controller provides a programmable governing policy that allows software to determine which peripheral or device can interrupt the processor at any specific time by setting the appropriate bits in the interrupt controller registers.
There are two types of interrupt controller available for the ARM processor: the standard interrupt controller and the vector interrupt controller (VIC).
The standard interrupt controller sends an interrupt signal to the processor core when an external device requests servicing. It can be programmed to ignore or mask an individual device or set of devices. The interrupt handler determines which device requires servicing by reading a device bitmap register in the interrupt controller.
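
The behavior of a standard interrupt controller can be modeled in a few lines of C. The sketch below is a software model under our own assumptions: the two-register layout (status and enable) and all names are hypothetical, and real controllers differ in register layout and bit polarity.

```c
#include <stdint.h>

/* Hypothetical register layout for a simple interrupt controller. */
typedef struct {
    volatile uint32_t status;  /* bit n set: device n requests service  */
    volatile uint32_t enable;  /* bit n set: device n may interrupt     */
} irq_controller_t;

/* Unmask (on != 0) or mask (on == 0) one device, numbered 0..31. */
static void irq_set_enabled(irq_controller_t *ic, unsigned device, int on)
{
    if (on)
        ic->enable |= (UINT32_C(1) << device);
    else
        ic->enable &= ~(UINT32_C(1) << device);
}

/* Nonzero when the controller would signal the processor core. */
static int irq_asserted(const irq_controller_t *ic)
{
    return (ic->status & ic->enable) != 0;
}

/* First step of an interrupt handler: read the device bitmap and find
 * the lowest-numbered device that is both pending and enabled.
 * Returns -1 when no such device exists. */
static int irq_find_source(const irq_controller_t *ic)
{
    uint32_t pending = ic->status & ic->enable;
    for (int device = 0; device < 32; device++) {
        if (pending & (UINT32_C(1) << device))
            return device;
    }
    return -1;
}
```

Scanning from bit 0 upward gives lower-numbered devices an implicit priority. That is a choice made for this sketch, not a property of the controller described in the text.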
The VIC is more powerful than the standard interrupt controller because it prioritizes interrupts and simplifies the determination of which device caused the interrupt. After associating a priority and a handler address with each interrupt, the VIC only asserts an interrupt signal to the core if the priority of a new interrupt is higher than the currently executing interrupt handler. Depending on its type, the VIC will either call the standard interrupt exception handler, which can load the address of the handler for the device from the VIC, or cause the core to jump to the handler for the device directly.
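
The priority rule can be illustrated with a small software model. Everything here is hypothetical: the vic_t structure, the number of sources, the convention that a larger number means a more urgent interrupt, and the two placeholder handlers. The sketch also omits the step that restores the previous priority when a handler finishes.

```c
#include <stddef.h>

#define VIC_SOURCES 8      /* hypothetical number of interrupt sources */
#define VIC_IDLE    (-1)   /* no handler is currently running          */

typedef void (*irq_handler_t)(void);

typedef struct {
    int           priority[VIC_SOURCES]; /* assumption: larger = more urgent, >= 0 */
    irq_handler_t handler[VIC_SOURCES];  /* handler address for each source        */
    int           current_priority;      /* priority of the running handler        */
} vic_t;

/* Placeholder handlers used only to illustrate the model. */
static int uart_count, timer_count;
static void uart_isr(void)  { uart_count++; }
static void timer_isr(void) { timer_count++; }

/* Clear all priorities and handlers and mark the core as idle. */
static void vic_init(vic_t *vic)
{
    for (size_t i = 0; i < VIC_SOURCES; i++) {
        vic->priority[i] = 0;
        vic->handler[i] = NULL;
    }
    vic->current_priority = VIC_IDLE;
}

/* Associate a priority and a handler address with one source. */
static void vic_configure(vic_t *vic, size_t source, int priority,
                          irq_handler_t handler)
{
    if (source < VIC_SOURCES) {
        vic->priority[source] = priority;
        vic->handler[source] = handler;
    }
}

/* A source raises an interrupt. Returns the handler the core should run,
 * or NULL when the request does not outrank the running handler. */
static irq_handler_t vic_request(vic_t *vic, size_t source)
{
    if (source >= VIC_SOURCES || vic->handler[source] == NULL)
        return NULL;
    if (vic->priority[source] <= vic->current_priority)
        return NULL;
    vic->current_priority = vic->priority[source];
    return vic->handler[source];
}
```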
1.4 Embedded System Software
An embedded system needs software to drive it. Figure 1.4 shows four typical software components required to control an embedded device. Each software component in the stack uses a higher level of abstraction to separate the code from the hardware device.
The initialization code is the first code executed on the board and is specific to a particular target or group of targets. It sets up the minimum parts of the board before handing control over to the operating system.
Figure 1.4 Software abstraction layers executing on hardware (application, operating system, initialization code, and device drivers above the hardware device)
The operating system provides an infrastructure to control applications and manage hardware system resources. Many embedded systems do not require a full operating system but merely a simple task scheduler that is either event or poll driven.
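
As an illustration of how small such a scheduler can be, here is a sketch of one pass of a poll-driven scheduler. The task_t structure, the function names, and the example button task are all our own inventions for this sketch.

```c
#include <stddef.h>

typedef int  (*ready_fn)(void);  /* returns nonzero when the task has work */
typedef void (*task_fn)(void);   /* performs the task's work               */

typedef struct {
    ready_fn ready;
    task_fn  run;
} task_t;

/* Hypothetical example task: toggle an LED whenever a button was pressed. */
static int  button_pressed = 0;
static int  led_toggles = 0;
static int  button_ready(void) { return button_pressed; }
static void button_task(void)  { led_toggles++; button_pressed = 0; }

/* One pass of the scheduler: poll every task and run those that report
 * work to do. Returns the number of tasks that ran in this pass. */
static size_t scheduler_poll_once(const task_t *tasks, size_t count)
{
    size_t ran = 0;
    for (size_t i = 0; i < count; i++) {
        if (tasks[i].ready()) {
            tasks[i].run();
            ran++;
        }
    }
    return ran;
}
```

A complete system would call scheduler_poll_once in an endless loop. An event-driven variant would instead run tasks in response to interrupts.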
The device drivers are the third component shown in Figure 1.4. They provide a consistent software interface to the peripherals on the hardware device.
Finally, an application performs one of the tasks required for a device. For example, a mobile phone might have a diary application. There may be multiple applications running on the same device, controlled by the operating system.
The software components can run from ROM or RAM. ROM code that is fixed on the device (for example, the initialization code) is called firmware.
1.4.1 Initialization (Boot) Code
Initialization code (or boot code) takes the processor from the reset state to a state where the operating system can run. It usually configures the memory controller and processor caches and initializes some devices. In a simple system the operating system might be replaced by a simple scheduler or debug monitor.
The initialization code handles a number of administrative tasks prior to handing control over to an operating system image. We can group these different tasks into three phases: initial hardware configuration, diagnostics, and booting.
Initial hardware configuration involves setting up the target platform so it can boot an image. Although the target platform itself comes up in a standard configuration, this configuration normally requires modification to satisfy the requirements of the booted image. For example, the memory system normally requires reorganization of the memory map, as shown in Example 1.1.
Diagnostics are often embedded in the initialization code. Diagnostic code tests the system by exercising the hardware target to check if the target is in working order. It also tracks down standard system-related issues. This type of testing is important for manufacturing since it occurs after the software product is complete. The primary purpose of diagnostic code is fault identification and isolation.
Booting involves loading an image and handing control over to that image. The boot process itself can be complicated if the system must boot different operating systems or different versions of the same operating system.
Booting an image is the final phase, but first you must load the image. Loading an image involves anything from copying an entire program including code and data into RAM, to just copying a data area containing volatile variables into RAM. Once booted, the system hands over control by modifying the program counter to point into the start of the image.
Sometimes, to reduce the image size, an image is compressed. The image is then decompressed either when it is loaded or when control is handed over to it.
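
The loading step can be sketched as a copy followed by computing the address at which execution should begin. This is a simplified model that can be tried on a desktop machine: the image_t descriptor and load_image function are hypothetical, the image is uncompressed, and the sketch returns the entry address rather than actually changing the program counter.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical description of an image held in ROM or flash. */
typedef struct {
    const uint8_t *src;           /* where the image is stored            */
    size_t         size;          /* size of the image in bytes           */
    size_t         entry_offset;  /* offset of the first instruction      */
} image_t;

/* Copy the image into RAM at dest. Returns the address that the boot
 * code would place in the program counter, or NULL if the image does
 * not fit or the entry offset lies outside the image. */
static uint8_t *load_image(const image_t *img, uint8_t *dest, size_t dest_size)
{
    if (img->size > dest_size || img->entry_offset >= img->size)
        return NULL;
    memcpy(dest, img->src, img->size);
    return dest + img->entry_offset;
}
```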
Example 1.1 Initializing or organizing memory is an important part of the initialization code because many operating systems expect a known memory layout before they can start.

Figure 1.5 Memory remapping (memory map before and after remapping; the map includes a large contiguous block of DRAM)

Figure 1.5 shows memory before and after reorganization. It is common for ARM-based embedded systems to provide for memory remapping because it allows the system to start the initialization code from ROM at power-up. The initialization code then redefines or remaps the memory map to place RAM at address 0x00000000, an important step because then the exception vector table can be in RAM and thus can be reprogrammed. We will discuss the vector table in more detail in Section 2.4. ■
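
The effect of remapping on address decoding can be modeled with a small function. The layout below is purely an assumption for illustration (1 MB regions, with the second region at 0x10000000) and is not the layout shown in Figure 1.5.

```c
#include <stdint.h>

typedef enum { DEV_ROM, DEV_RAM, DEV_NONE } device_t;

#define REGION_SIZE 0x00100000u  /* hypothetical 1 MB regions              */
#define HIGH_BASE   0x10000000u  /* hypothetical base of the second region */

/* Report which device responds to addr. Before remapping, ROM sits at
 * address 0 so the core can boot from it. After remapping, RAM takes
 * its place and ROM moves to the high region. */
static device_t decode(uint32_t addr, int remapped)
{
    if (addr < REGION_SIZE)
        return remapped ? DEV_RAM : DEV_ROM;
    if (addr >= HIGH_BASE && addr - HIGH_BASE < REGION_SIZE)
        return remapped ? DEV_ROM : DEV_RAM;
    return DEV_NONE;
}
```

With this model, decode(0x00000000, 0) reports ROM and decode(0x00000000, 1) reports RAM, which is the change that makes the vector table writable.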
1.4.2 Operating System
The initialization process prepares the hardware for an operating system to take control. An operating system organizes the system resources: the peripherals, memory, and processing time. With an operating system controlling these resources, they can be efficiently used by different applications running within the operating system environment.
ARM processors support over 50 operating systems. We can divide operating systems into two main categories: real-time operating systems (RTOSs) and platform operating systems.
RTOSs provide guaranteed response times to events. Different operating systems have different amounts of control over the system response time. A hard real-time application requires a guaranteed response to work at all. In contrast, a soft real-time application requires a good response time, but the performance degrades more gracefully if the response time overruns. Systems running an RTOS generally do not have secondary storage.
Platform operating systems require a memory management unit to manage large, non-real-time applications and tend to have secondary storage. The Linux operating system is a typical example of a platform operating system.
These two categories of operating system are not mutually exclusive: there are operating systems that use an ARM core with a memory management unit and have real-time characteristics. ARM has developed a set of processor cores that specifically target each category.
1.4.3 Applications
The operating system schedules applications, code dedicated to handling a particular task. An application implements a processing task; the operating system controls the environment. An embedded system can have one active application or several applications running simultaneously.
ARM processors are found in numerous market segments, including networking, automotive, mobile and consumer devices, mass storage, and imaging. Within each segment ARM processors can be found in multiple applications.
For example, the ARM processor is found in networking applications like home gateways, DSL modems for high-speed Internet communication, and 802.11 wireless communication. The mobile device segment is the largest application area for ARM processors because of mobile phones. ARM processors are also found in mass storage devices such as hard drives and imaging products such as inkjet printers, applications that are cost sensitive and high volume.
In contrast, ARM processors are not found in applications that require leading-edge high performance. Because these applications tend to be low volume and high cost, ARM has decided not to focus designs on these types of applications.
1.5 Summary
Pure RISC is aimed at high performance, but ARM uses a modified RISC design philosophy that also targets good code density and low power consumption. An embedded system consists of a processor core surrounded by caches, memory, and peripherals. The system is controlled by operating system software that manages application tasks.
The key points in a RISC design philosophy are to improve performance by reducing the complexity of instructions, to speed up instruction processing by using a pipeline, to provide a large register set to store data near the core, and to use a load-store architecture.
The ARM design philosophy also incorporates some non-RISC ideas:
■ It allows variable cycle execution on certain instructions to save power, area, and code size.
■ It adds a barrel shifter to expand the capability of certain instructions.
■ It uses the Thumb 16-bit instruction set to improve code density.
■ It improves code density and performance by conditionally executing instructions.
■ It includes enhanced instructions to perform digital signal processing type functions.
An embedded system includes the following hardware components: ARM processors are found embedded in chips. Programmers access peripherals through memory-mapped registers. There is a special type of peripheral called a controller, which embedded systems use to configure higher-level functions such as memory and interrupts. The AMBA on-chip bus is used to connect the processor and peripherals together.
An embedded system also includes the following software components: Initialization code configures the hardware to a known state. Once configured, operating systems can be loaded and executed. Operating systems provide a common programming environment for the use of hardware resources and infrastructure. Device drivers provide a standard interface to peripherals. An application performs the task-specific duties of an embedded system.
2.2.3 State and Instruction Sets
2.2.4 Interrupt Masks
2.2.5 Condition Flags
2.2.6 Conditional Execution
2.3 Pipeline
2.3.1 Pipeline Executing Characteristics
2.4 Exceptions, Interrupts, and the Vector Table
Chapter
ARM Processor
Chapter 1 covered embedded systems with an ARM processor. In this chapter we will focus on the actual processor itself. First, we will provide an overview of the processor core and describe how data moves between its different parts. We will describe the programmer’s model from a software developer’s view of the ARM processor, which will show you the functions of the processor core and how different parts interact. We will also take a look at the core extensions that form an ARM processor. Core extensions speed up and organize main memory as well as extend the instruction set. We will then cover the revisions to the ARM core architecture by describing the ARM core naming conventions used to identify them and the chronological changes to the ARM instruction set architecture. The final section introduces the architecture implementations by subdividing them into specific ARM processor core families.
A programmer can think of an ARM core as functional units connected by data buses, as shown in Figure 2.1, where the arrows represent the flow of data, the lines represent the buses, and the boxes represent either an operation unit or a storage area. The figure shows not only the flow of data but also the abstract components that make up an ARM core.
Data enters the processor core through the Data bus. The data may be an instruction to execute or a data item. Figure 2.1 shows a Von Neumann implementation of the ARM: data items and instructions share the same bus. In contrast, Harvard implementations of the ARM use two different buses.
The instruction decoder translates instructions before they are executed. Each instruction executed belongs to a particular instruction set.
The ARM processor, like all RISC processors, uses a load-store architecture. This means it has two instruction types for transferring data in and out of the processor: load instructions copy data from memory to registers in the core, and conversely the store instructions copy data from registers to memory. There are no data processing instructions that directly manipulate data in memory. Thus, data processing is carried out solely in registers.

Figure 2.1 ARM core dataflow model

Data items are placed in the register file, a storage bank made up of 32-bit registers. Since the ARM core is a 32-bit processor, most instructions treat the registers as holding signed or unsigned 32-bit values. The sign extend hardware converts signed 8-bit and 16-bit numbers to 32-bit values as they are read from memory and placed in a register.
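
Sign extension copies the top bit of the narrow value into all the upper bits of the 32-bit result. The following C functions are a software sketch of that conversion. The function names are our own, and the arithmetic shown is one portable way to express it, not a description of the hardware circuit.

```c
#include <stdint.h>

/* Sign-extend an 8-bit value to 32 bits. Flipping the sign bit and then
 * subtracting its weight maps 0x00..0x7F to 0..127 and 0x80..0xFF to
 * -128..-1 without relying on implementation-defined conversions. */
static int32_t sign_extend8(uint8_t value)
{
    return (int32_t)(value ^ 0x80u) - 0x80;
}

/* Sign-extend a 16-bit value to 32 bits using the same technique. */
static int32_t sign_extend16(uint16_t value)
{
    return (int32_t)(value ^ 0x8000u) - 0x8000;
}
```

For instance, the byte 0xFF becomes the 32-bit value 0xFFFFFFFF (that is, -1), whereas 0x7F stays 0x0000007F.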
ARM instructions typically have two source registers, Rn and Rm, and a single result or destination register, Rd. Source operands are read from the register file using the internal buses A and B, respectively.
The ALU (arithmetic logic unit) or MAC (multiply-accumulate unit) takes the register values Rn and Rm from the A and B buses and computes a result. Data processing instructions write the result in Rd directly to the register file. Load and store instructions use the ALU to generate an address to be held in the address register and broadcast on the Address bus.
One important feature of the ARM is that register Rm alternatively can be preprocessed in the barrel shifter before it enters the ALU. Together the barrel shifter and ALU can calculate a wide range of expressions and addresses.
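
To make the combination concrete, the sketch below models a shift applied to Rm followed by an addition in the ALU. It is a simplified model under our own assumptions: the type and function names are hypothetical, and only shift amounts from 0 to 31 are handled, so it does not cover every shift form the processor supports.

```c
#include <stdint.h>

typedef enum { SHIFT_LSL, SHIFT_LSR, SHIFT_ASR, SHIFT_ROR } shift_t;

/* Model of the barrel shifter acting on Rm.
 * This sketch only handles shift amounts in the range 0..31. */
static uint32_t barrel_shift(uint32_t rm, shift_t type, unsigned amount)
{
    amount &= 31u;
    if (amount == 0)
        return rm;
    switch (type) {
    case SHIFT_LSL:  /* logical shift left: zeros enter from the right */
        return rm << amount;
    case SHIFT_LSR:  /* logical shift right: zeros enter from the left */
        return rm >> amount;
    case SHIFT_ASR:  /* arithmetic shift right: copies of the sign bit enter */
        return (rm >> amount) |
               ((rm & 0x80000000u) ? ~(0xFFFFFFFFu >> amount) : 0u);
    case SHIFT_ROR:  /* rotate right: bits leaving the right re-enter on the left */
        return (rm >> amount) | (rm << (32u - amount));
    }
    return rm;
}

/* Model of Rd = Rn + (Rm shifted): the shifter output feeds the ALU. */
static uint32_t alu_add_shifted(uint32_t rn, uint32_t rm,
                                shift_t type, unsigned amount)
{
    return rn + barrel_shift(rm, type, amount);
}
```

A typical use is array indexing: alu_add_shifted(base, index, SHIFT_LSL, 2) computes base + index * 4, the address of a 32-bit element, in a single step.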
After passing through the functional units, the result in Rd is written back to the register file using the Result bus. For load and store instructions the incrementer updates the address register before the core reads or writes the next register value from or to the next sequential memory location. The processor continues executing instructions until an exception or interrupt changes the normal execution flow.
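
The role of the incrementer during a multi-register transfer can be sketched as a loop that steps the address by one word per register. The function below is a software model only: load_multiple is a hypothetical name, memory is represented as an array of 32-bit words indexed from byte address 0, and the address is assumed to be word aligned.

```c
#include <stddef.h>
#include <stdint.h>

/* Load `count` consecutive words into regs[], starting at the given byte
 * address. After each transfer the address advances by 4, which is the
 * job the incrementer performs on the address register.
 * Returns the final value of the address register. */
static uint32_t load_multiple(const uint32_t *memory, uint32_t address,
                              uint32_t *regs, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        regs[i] = memory[address / 4u];  /* read from the current location  */
        address += 4u;                   /* step to the next sequential word */
    }
    return address;
}
```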
Now that you have an overview of the processor core we’ll take a more detailed look at some of the key components of the processor: the registers, the current program status register (cpsr), and the pipeline.
2.1 Registers
General-purpose registers hold either data or an address. They are identified with the letter r prefixed to the register number. For example, register 4 is given the label r4. Figure 2.2 shows the active registers available in user mode, a protected mode normally