Microcontrollers with C Cortex-M® and Beyond

Klaus Elk


Information provided in this book is derived from many sources, standards, and analyses - including subjective personal views. Any errors or omissions shall not imply any liability for direct or indirect consequences arising from the use of this information. The author makes no warranty for the correctness or for the use of this information, and assumes no liability for direct or indirect damages of any kind arising from technical interpretation or technical explanations in this book, for typographical or printing errors, or for any subsequent changes.

All rights reserved.

ISBN-13: 9798862003437

See also https://klauselk.com


Joining an embedded software project for the first time can be intimidating. As an embedded developer, you often have to start programming without hardware, and when the hardware finally arrives you cannot assume that it is working as it should. Typically, hardware is developed in parallel with your software. Your requirements for on-board resources - e.g., Flash and RAM - may be vague at the beginning. Other resources - such as display-type - might change as you go. Such changes may happen if a component becomes obsolete, and/or is replaced by something cheaper and “almost identical” - at least from the hardware designer’s point-of-view. Embedded programmers are faced with numerous CPU-types, board types and peripherals. These may have been selected in an earlier project - but

a bit scary. On top of this we have the application domain. Embedded systems are typically dedicated to very specific purposes. This means that there may be many users out there, already trained by previous systems - maybe from competitors - with very special needs that you may not yet understand. So why should you choose to go the embedded way? Because it is so satisfying to see an LED blink - or a step-motor step - or experience the sound from a phone in a prototype of a hearing aid. Embedded systems bridge the gap between the physical world and the art of programming. It is simply fascinating.


Klaus Elk graduated as Master of Science in electronics from the Danish Technical University in Copenhagen in 1984, with a thesis in digital signal processing. Since then, he has worked in the electronic industry within the domains of telecommunication, medical electronics and sound and vibration. He also holds a bachelor’s degree in marketing. Over a period of 10 years Klaus - besides his R&D job - taught at the Danish Technical University. The subjects were originally object-oriented programming (C++ and Java), later the Internet Protocol Stack. Today he is Senior R&D Manager in Platforms at GN Resound.

Acknowledgements

I would like to thank my manager Bjarke Blucher Brink for his positive attitude towards my writing. Also, a big thanks to past and present colleagues who have inspired me when I write my books. Finally, thanks to my family for their patience with my many hours by the PC.

Klaus Elk

Contents

3.2 A Generic Embedded System
3.3 CPU and MCU
3.4 Harvard and Von Neumann
3.5 Physical Memory Types
3.6 The Cortex Families
4.3 Fast Memory Access
4.4 The Build Process
5.2 Literal Pool
6.5 Compiler and Linker Options
6.6 IDEs and alternatives
7 Cortex-M Programmers Model
7.1 History of Cortex-M
11.1 Multiple Thread Solutions
11.2 Context Switching Between Threads
11.3 Storing and Retrieving State
11.4 Executing the Context Switch
12 Common Patterns
12.1 Layering
13.2 ITM and Printf
13.3 Messages Between Tasks
13.4 Interrupt with Semaphore
13.5 CMSIS-RTOS


Chapter 1

Introduction

The aim of this book is to give you - the reader - a better understanding of what goes on in an embedded system. Using a mix of theory and examples, we will work our way through complex subjects from different perspectives. Following software from C-source through the compiler and linker - into the memory - we will get a taste of assembly code. Studying how different types of memories are used in embedded systems, we will see how they are accessed and controlled by the CPU-core. Looking at the modern microcontroller, we will see it perform floating-point operations. Armed with the above knowledge we will see how operating systems work, and we will discuss important patterns for parallel processing.

will show you how to get to a point where you can run the samples. It is, however, also necessary to get a wider understanding of why we do what we do - and could we do it better or faster?

A development board is a PCB - Printed Circuit Board - from a chip-vendor. Originally these were created for use by professional developers - for prototyping and verification of key performance - before an important component like a CPU was selected. However, concepts like Arduino, Raspberry Pi and Beaglebone have been huge successes - reaching far beyond professional developers. I have no doubt that these concepts have meant an increased interest in the specific CPUs used in these designs - as well as a general interest in embedded systems. This has motivated other vendors of CPUs to create more accessible development boards - and even target some of these for schools and hobby-programmers. Hobby-programmer today - job-creator and customer tomorrow. This means that the last many years of hobby programming for PCs, Macs and phones have now found worthy competition in the embedded world.


A development board contains an interesting chip - in our case a microcontroller (MCU) - from the silicon vendor, but also glue-logic and peripherals that allow you to easily try out the MCU in question. There will also be some accessible software to help you in this investigation, and with the surge of hobbyists, interesting samples for some fun peripherals. We will also start with a sample and a development-board, but I hope that reading this book will give you a deeper understanding - not just of the “how” - but also the “why”.

My first book on embedded programming was “Embedded Software for the IoT”. The 3rd edition was published by De Gruyter in 2019. So why this new book? The IoT book mainly looks outwards - at connectivity, with focus on TCP/IP, Ethernet and Wi-Fi, security in transmissions and “Industry 4.0” subjects like process-control. The IoT book also looks at development processes and general software tools like git.

In contrast, the book you have in your hand looks more inwards - at the CPU architecture, interrupt frames, memory-spaces, floating-point, instruction sets etc. - all from a C-programmer’s point-of-view. This inward looking is indeed a huge subject. I narrow it down a bit by mainly using examples from the Arm Cortex-M range of Microcontrollers [1]. This focus makes sense because the Arm-based System-on-Chip (SoC) is everywhere. The SoCs are marketed by silicon vendors like NXP, Renesas, Texas Instruments, ST, Nordic and many others. Arm never makes a single chip - they sell IP - Intellectual Property. The silicon vendors add various peripherals and memory to the design, and then market this “IC combo”. They compete in the speed, flexibility, robustness, space-requirements, temperature-interval and power usage of the solution - as well as development tools and price.

[1] We will get back to the difference between CPUs and Microcontrollers.

The IoT-book started with Operating Systems - of which the most referenced one was Linux. In the Arm world this leads you to Cortex-A CPUs. In this book I am focusing on the Arm Cortex-M core - mostly related to bare-metal systems (those without an Operating System) because we can learn a lot this way. We will, however, also discuss real-time operating systems, as well as parallel programming and other more general subjects.

The aim of these investigations is not so much to show how to utilize various peripherals. Instead, we will dig into how code is generated, loaded and executed in C and a bit of assembly language. While doing this we will examine central concepts and components, and discuss important patterns in e.g., parallel programming.

I hope that this will give you a deeper understanding that you can take with you as new generations of boards and microcontrollers appear on the market. Thus, I consider this book to be more of an informal text-book, rather than a manual to any specific board.

An important aim of this book is to use free tools - just like in another earlier book “SQL-Server with C#”. This means using the open-source GNU toolchain - compiler, linker and debugger - as well as various helper-tools. It is also possible to find free IDEs - Integrated Development Environments - and libraries from some silicon vendors. Other vendors do not offer completely free tools. Instead, they offer limited versions that e.g., only handle memory up to a certain size - or time-limited versions. Finally, we have the very popular editor Visual Studio Code, working on all platforms (Windows, Linux and Mac) - also free. The hardware development boards are typically very inexpensive. I even found a few in my basement-collection that I was given freely at various trade-fairs before Covid changed the world for a time. Since the SW-tools from ST are free and unlimited, I decided to use these in most examples. As can be seen in Chapter 2, this led me to buy a low-cost Nucleo-64 board - also from ST. The Nucleo boards share many features as well as pinout, and if you would like to code-along, any Nucleo-64 board will do.

After covering the basic embedded system, we will look into some Parallel Design Patterns, and how they are supported in the modern SoCs that we are investigating. We will also look at the support in C - either built-in, or added via intrinsic functions.

You will find that much of the content of this book is generic to almost any microcontroller - sometimes even to all programming. However, as stated we will be looking mostly at Cortex-M. Many examples are performed on a Cortex-M4 - simply because this is big enough to contain important architectural building blocks like Floating-Point Unit, Memory Protection Unit and DSP. It is also used in countless designs. I will, however, also discuss similarities to - and differences from - other Cortex-M microcontrollers.

Along the way I take advantage of my many years in the embedded universe. I will draw parallels to older CPUs and MCUs. From my teaching days I know that anecdotes from “real life” make it easier to remember the theory. Understanding a bit of the history also makes it easier to understand how things work today.


Throughout the book I use capital “B” for Bytes and “b” for bits - when the short form is used.

At the back of the book you find the Index. Before this there are lists of Figures, Tables and Listings. Note that some smaller “code-snippets” are caption-less and not included as listings.


Chapter 2

The Tip of the Iceberg

Every C-programmer knows the classic “Hello World” program that you see in Listing 2.1. In this case it was written in, compiled by, and run from Visual Studio on Windows - in a project created seconds before. It looks very innocent, but if you have worked with C programs on the Desktop, you know that the project has tons of settings, and that the include statement pulls in selected parts of the standard C-library. You may also know that you can build Debug and Release versions from this code, and that you may compile to either 32-bit or 64-bit, and these days even to Linux.

The PC-tool, however, provides a fast-track to the common solution, allowing you to experience the feeling of success very fast. You now have something that you can incrementally extend. If you use version control, you may be able to go back to a working solution - should you run into trouble.

Listing 2.1: Hello World on PC

    #include <stdio.h>

    int main()
    {
        printf("Hello World!\n");
    }

The same concept exists in the embedded domain. We can get up and running fast on a specific development-board, and we can grow from there. PC-programmers might at one time or another be forced to think about Debug versus Release builds, and which optimizations they need to employ. The embedded platform choices are larger in numbers as well as in consequences. There are numerous CPU-cores, in countless boards. A given Cortex-M comes in many derivatives, with and without specific building blocks. The silicon vendor has added peripherals inside the chip (yes - we still call them peripherals), that you may need to support, and the hardware team may have added even more outside the chip. You probably want this support to continue to work - also if you later upgrade the microcontroller.

On top of this, at least someone in an embedded project needs to think about what happens before the first line in main(). How do we build? How is the memory organized? - and how do we guard against memory-overwrites and similar failures? How do we get debug printout? - and what about floating-point support? I don’t think many PC programmers have studied linker maps or compiler options, but in an embedded project this is crucial. In this chapter we are starting with the tip of the iceberg - Hello World - and we will spend the next couple of chapters looking under the surface of this iceberg.

It turns out that the Hello-World program of embedded development-boards is “Blinky”. This is a simple main-program with an endless loop that toggles an LED. Sometimes a timer interrupt is involved - sometimes not.

I bought one of ST’s smallest Nucleo-64 boards - STM32F334R8. This board is not new, and only costs around $10. It is very small, but it does contain a Cortex-M4 which has (almost) all the “grown-up” features we need to get started. Some important features for this book are the DSP-commands and the Single-Precision FPU - Floating-Point Unit. We will get back to these. For the first chapters the actual board is not so important - we basically just need to build for it. However, just to show how to get started I will also flash the sample and start the debugger.

Like many other development-boards, STM32F334R8 has the on-board circuits that allow us to connect directly to the USB of a PC - without an extra debugger-box on the wire. You would normally not have this in “real-life”, because we prefer to save the cost and the board-space when the device goes into mass-production. It is, however, really nice in our situation. See Figure 2.1.

I installed the fully-functional STM32CubeIDE free of charge (you do need to register) from ST’s website and fired it up. The initial configuration editor - STM32CubeMX - allowed me to select my STM32F334R8 board. I simply accepted the default configuration of pinouts, clocks, enabled peripherals etc. In Chapter 13 we dig deeper into STM32CubeMX and

Figure 2.1: Nucleo-64 Board with debug support. Courtesy of ST

Figure 2.2: Nucleo-64 Board Layout (topside). Courtesy of ST
Labelled parts: CN1 ST-LINK USB mini-B connector; CN2 ST-LINK/Nucleo selector; SWD connector; LD1 (Red/Green LED) COM; LD2 (Green LED); LD3 (Red LED); B1 USER button; B2 RESET button; JP6 IDD measurement; 32 KHz crystal; SB2 3.3 V regulator output; ARDUINO® connectors (including CN5 and CN9); ST morpho connectors (including CN10); U5 STM32 microcontroller.


Listing 2.2: Add two lines to blink

    while (1)
    {
      /* USER CODE END WHILE */
      HAL_GPIO_TogglePin(LD2_GPIO_Port, LD2_Pin);  /* toggle the green LED */
      HAL_Delay(1000);                             /* wait 1000 ms */
      /* USER CODE BEGIN 3 */
    }

In the auto-created main.c file, there was an empty endless loop, into which I wrote the two lines starting with “HAL” in Listing 2.2. This code basically toggles an LED once a second. Looking at Figure 2.2 (and Figure 13.4) we can see that “LD2” is the green LED available for user programs - the only one not related to power and not related to the debug circuit. Note that LD2 is defined in main.h as GPIO_PIN_5 on GPIOA - these values could also have been used. You can find a longer description of the setup on ST’s web if you look for “STM32StepByStep:Step2_Blink_LED”. You can also find full source-files on my website klauselk.com. Note that STM32CubeMX also allows you to start with one of many samples if you prefer.
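To get a feel for what a pin toggle amounts to, here is a minimal host-side sketch. It does not use ST's HAL: `mock_odr`, `MOCK_PIN_5`, `toggle_pin` and `pin_is_high` are hypothetical names invented for illustration, and the "register" is just a variable on the PC. The only thing carried over from the text is that LD2 sits on pin 5.

```c
#include <stdint.h>

/* Hypothetical stand-in for a GPIO output data register (not ST's HAL). */
static uint32_t mock_odr = 0;

/* Bit mask for pin 5, the pin that drives LD2 on this board. */
#define MOCK_PIN_5 (1u << 5)

/* Toggle the given pin(s) by XOR-ing the mask into the register. */
static void toggle_pin(uint32_t *odr, uint32_t pin_mask)
{
    *odr ^= pin_mask;
}

/* Report whether a pin is currently driven high (1) or low (0). */
static int pin_is_high(const uint32_t *odr, uint32_t pin_mask)
{
    return (*odr & pin_mask) != 0;
}
```

Calling `toggle_pin(&mock_odr, MOCK_PIN_5)` once sets bit 5, and calling it again clears it. On the real board the HAL call does the equivalent on the GPIO port's hardware registers.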

Debug the current project. The first time this must be done by selecting the project in the Explorer window, right-clicking and selecting “Debug”. In the following dialog, you can accept the defaults.

The IDE now told me that the development-board debugger-firmware was a bit old and would I like to update? Note that this is not about flashing the Cortex-M4, but the small debugging circuit on the small “peninsula” of the board that allows us to talk to it. A bit nervous I accepted - and it went well. I started the debugger again and now my newly built program was flashed into the Cortex-M4, and the built-in debugger paused at a breakpoint at the first instruction of my program. This is default behavior. I was now able to run or single step the program and watch the LED blink slowly. When I power-cycled the board (turned off the power and turned it on again by unplugging the cable and inserting it again), I could see that the downloaded firmware was still present and working. This proved that it indeed was flashed.

Let us briefly pause to enumerate all the components that together give us debugging, but also - a bit confusingly - individually sometimes are called a “debugger”:

demand single-stepping, see registers etc. This can be Eclipse-based or Visual Studio Code with an extension. It can even be one of many more narrow tools like Ozone.

orders from our user-client via a local TCP-connection and translates these to commands over the USB.

hardware we debug. In our case this is ST-Link - built into our board (looking like you could break it off). See Figure 2.1. ST-Link converts from data over USB to data over a simple Serial-Wire Debug - SWD - interface on the board. This is connected to our Cortex-M - but you can instead use a serial cable to another board, if you move a jumper [1].

commands from the above software.

[1] A jumper is a small plastic-coated metal-clip that you can slide down over two pins, and thus connect them.

Figure 2.3 shows the program at a breakpoint with the following windows manipulated to fit the paper:

When the debugging starts, the Eclipse-based IDE suggests a shift to

window changes from a File-Explorer to show the Stack. “Perspectives” are an Eclipse invention. Each perspective defines a specific collection of Views. The perspectives have a reasonable default. You can change this and save the new perspective, and you can go back to the default. We will discuss the stack in depth in Chapter 7, but will see it in use in many places. Note that we can see in the stack that we are at a breakpoint in line 99 of main.c. If the breakpoint was in a function called by a function etc., this window shows us exactly which function, at which line-number in which file, called the next level. This helps with a bit of overview when debugging.

The C-source is the main window in the normal (coding) perspective, and also when debugging.

This window I created via the top menu: Window - Show View - Disassembly. Note that when debugging, you can press a small icon in the menu - looking like an “i” with a right arrow. Hovering over this with the mouse it says “Instruction Stepping”. When this is selected, the single-step is no longer a C-line single-step, but an assembly-line single-step.

This view was also created via the top menu: Window - Show View - Build Analyzer. This shows us how much memory we have - and how much is used. The usage percentage is also shown with a bar, applying traffic-light colors to visualize memory usage.

For the rest of this book we will look mainly at text-file listings and some zoom-ins of single windows, as the full Eclipse screen is not very readable on paper.

[Figure 2.3: STM32CubeIDE screenshot of project FirstRun, halted at a breakpoint in main() at line 99 of main.c, with the Debug (stack), source, Disassembly and Build Analyzer views]

Build Analyzer memory regions shown in Figure 2.3:

    Region   Start address   End address   Size    Free
    CCMRAM   0x10000000      0x10001000    4 KB    4 KB
    RAM      0x20000000      0x20003000    12 KB   10.32 KB
    FLASH    0x08000000      0x08010000    64 KB   53.93 KB

Whenever you start a debug session, the Debug Console window at the bottom of the STM32CubeIDE quickly scrolls through some text. This first run is shown in Listing 2.3 (with a few blank lines deleted) and contains the following sections:

• Debugger server settings - e.g., the listening (TCP) port 61234 on the GDB-server. “SWD” is Serial-Wire Debug. It is an alternative to JTAG - SWD, however, uses fewer wires and is slower.

• Information about the board - once connected. In this case NUCLEO-F334R8, Cortex-M4, Flash-size etc. We see that we have an ST-Link device on-board that transforms the Serial-Wire Debug port connection to our USB-cable.

• Programming. The program is flashed at address 0x0800 0000. This is the standard start address for Flash on ST’s STM32 microcontrollers. We will dig much deeper into this. Finally, the flashed image is verified.

If you have problems starting the debugger, it is worthwhile to check the debugger text. You might want to save a “good version” of this for comparison before things turn sour.

Listing 2.3: Built-in Debugger Connection

    STMicroelectronics ST-LINK GDB server. Version 7.0.0
    Copyright (c) 2022, STMicroelectronics. All rights reserved.
    Starting server with the following options:
        Persistent Mode : Disabled

    Board        : NUCLEO-F334R8
    Voltage      : 3.26V
    SWD freq     : 4000 KHz
    Connect mode : Under Reset
    Reset mode   : Hardware reset
    Device ID    : 0x438
    Revision ID  : Rev Z
    Device name  : STM32F303x4-x6-x8/F328xx/F334xx
    Flash size   : 64 KBytes
    Device type  : MCU
    Device CPU   : Cortex-M4
    BL Version   : --

    Memory Programming
    Opening and parsing file: ST-LINK_GDB_server_a21428.srec
        File    : ST-LINK_GDB_server_a21428.srec
        Address : 0x08000000
    Erasing memory corresponding to segment 0:
    Erasing internal memory sectors [0 5]
    Download in Progress:
    File download complete
    Time elapsed during download operation: 00:00:00.623
    Verifying
    Download verified successfully


Chapter 3

An Embedded System

3.1 Introduction

Before digging into the embedded software, we will discuss the hardware in a generic embedded system. I will introduce a number of keywords and concepts. We will dig deeper into most of these along the way, but in order to understand some concepts, we need an idea about the others. Thus, it all becomes a bit iterative, and we use things that we later get more insight into.

Back in the eighties I was part of a project developing medical equipment for measuring Evoked Potentials. These are small electric signals in muscles and nerves - provoked by injecting currents through needles - and yes - it was a bit painful. The Evomatic 8000 used Intel 8086 - immortalized, in its 8088 variant, by the first IBM PC - for general control. However, we needed something much faster for Digital Signal Processing. For this we used Bit-Slice - based on the 29xx-series from AMD. The core of this was a 4-bit wide ALU - Arithmetic Logic Unit - that could input two 4-bit operands, and add/subtract, and/or/xor, and rotate/shift these to produce a 4-bit result. Several ALUs could be chained together to support the arithmetic operations as well as shifting or rotating the full chain. Using Carry-Look-Ahead and AND’ing of Zero flags etc., it was possible to put the 4-bit ALUs together like LEGO bricks. Hence, the “Bit-Slice” name. We accumulated data in 20-bits and thus needed five 4-bit ALUs. There was also a Sequencer which would decide the next instruction address from the current address, the chosen operation - e.g., Branch to address XX if Less Than - and the flags from the ALUs. We created our own instruction set where we dedicated e.g., two times four bits to select two register operands among 16 registers. Another field in the instruction-set spelled out the operation for the ALU. More fields were dedicated to addresses in RAM-memory and so on. Basically we allocated fixed parts of the instruction-word to specific purposes - like e.g., selecting registers. We had a very long instruction word from all these fields, and we created our own assembly language for this.
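The chaining of 4-bit slices described above can be sketched in C. This is only an illustration under stated assumptions: the function names are hypothetical, the carry is simply rippled from slice to slice (the real hardware used Carry-Look-Ahead to get the same result faster), and only addition is modelled. The combined zero flag is the AND of the per-slice zero flags, as in the text.

```c
#include <stdint.h>

/* One 4-bit ALU slice: adds two 4-bit operands plus carry-in.
   Returns the 4-bit sum and writes the carry-out. */
static uint8_t slice_add(uint8_t a, uint8_t b, uint8_t carry_in,
                         uint8_t *carry_out)
{
    uint8_t sum = (uint8_t)((a & 0xFu) + (b & 0xFu) + (carry_in & 1u));
    *carry_out = (uint8_t)(sum >> 4);   /* bit 4 is the carry */
    return (uint8_t)(sum & 0xFu);
}

#define NUM_SLICES 5   /* five 4-bit slices give a 20-bit data path */

/* Add two 20-bit values by passing the carry from slice to slice.
   zero_flag becomes the AND of the per-slice zero flags. */
static uint32_t bitslice_add20(uint32_t a, uint32_t b, int *zero_flag)
{
    uint32_t result = 0;
    uint8_t carry = 0;
    int all_zero = 1;

    for (int i = 0; i < NUM_SLICES; i++) {
        uint8_t nibble_a = (uint8_t)((a >> (4 * i)) & 0xFu);
        uint8_t nibble_b = (uint8_t)((b >> (4 * i)) & 0xFu);
        uint8_t nibble = slice_add(nibble_a, nibble_b, carry, &carry);
        all_zero = all_zero && (nibble == 0);
        result |= (uint32_t)nibble << (4 * i);
    }
    *zero_flag = all_zero;
    return result;   /* carry out of the top slice is dropped */
}
```

Adding 0x0FFFF and 1 shows the carry travelling through four slices into the fifth, giving 0x10000. Adding 0xFFFFF and 1 wraps to zero and sets the combined zero flag.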

Very importantly, we had a hardware multiplier that could multiply two 8-bit operands and produce a 16-bit result which was accumulated in the 20-bit wide accumulator - giving headroom for adding a lot of data before dividing by e.g., number of samples to obtain an average. This Multiply-Accumulate is a very common DSP-pattern that we will recognize again and again.

We did not have the advanced pipeline concepts that modern CPUs have, but I clearly remember how important it was to keep the multiplier busy all the time when I wrote a Radix-4 FFT, while in parallel moving data in and out of the registers. Using the fields of the wide instruction word felt very much like opening doors to let the bits run back and forth like mice in an organized maze. The DSP-parts of the system filled two Double Eurocards (6U rack-size - approx. 233 mm x 160 mm).
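The Multiply-Accumulate pattern mentioned above looks like this in plain C. It is a sketch, not the original microcode: the function names are made up for illustration, and since C has no 20-bit type the accumulator is a `uint32_t`.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiply-accumulate: sum of a[i] * b[i].
   Each 8-bit x 8-bit product fits in 16 bits; the wider accumulator
   provides headroom for summing many products. */
static uint32_t mac_u8(const uint8_t *a, const uint8_t *b, size_t n)
{
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        uint16_t product = (uint16_t)(a[i] * b[i]);  /* max 255 * 255 = 65025 */
        acc += product;
    }
    return acc;
}

/* Average of the products: divide once, after accumulating. */
static uint32_t mac_average_u8(const uint8_t *a, const uint8_t *b, size_t n)
{
    return (n == 0) ? 0 : mac_u8(a, b, n) / (uint32_t)n;
}
```

As a rough check on the headroom: 16 worst-case products give 16 x 65025 = 1040400, which still fits below 2^20 = 1048576, so a 20-bit accumulator can take at least 16 products before it can overflow.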

I learned a lot about how a CPU is created, as we basically built our own huge CPU - or rather DSP. Modern CPUs and MCUs are built around similar concepts, but are much faster, use much less energy, and particularly much less space. On top of this, the MCUs come stacked with peripherals. Nevertheless, many of the underlying principles are the same.

3.2 A Generic Embedded System

Figure 3.1 shows a generic programmer’s block-diagram of an MCU - Microcontroller. You will recognize several blocks from the Introduction. Please note that only selected data- and control-paths are shown. In the figure we have the following components:

The clock-system is extremely important - and often very complex. With everything built from basic flip-flops, we need clocks to drive signals through the gates. In this sense, clocks are connected everywhere. The MCU will typically have several clocks - often derived from the same “Clock-Base”. This base can be accurately set with an external crystal. If the crystal is faulty or not present, the clock-base is normally derived from an RC-circuit. You may programmatically decide which clock goes to which peripheral - and even if at times you want a given clock to pause. In modern electronics the power-consumption is almost a linear function of the clock. This means that if you can programmatically conclude that you do not need the service of a given peripheral or memory, and you can pause its clock, then there is power to save. This again will lead to a longer “run-time” when running on batteries.

Figure 3.1: Generic Microcontroller

Modern embedded systems support several Clock Domains. Within a given clock domain there can be several clocks, but they are all based on a common clock. In other words, you can get from one to the other by a simple multiplication and/or division. Different clock-domains typically originate from the need to communicate with other devices over interfaces like USB, SPI, CAN, S/PDIF etc. Here data may arrive at a clock-rate driven by the other side. Clocks in different domains do not relate simply to each other, and they will eventually

common origin. You will need advanced algorithms when data with a constant sample-rate is transferred between clock-domains. Figure 3.2 shows a connection from the clock to the interrupt-controller. This is relevant as you typically can set up different timers - and some may generate interrupts. Such a clock-interrupt may be used by an RTOS - Real-Time Operating System - kernel.
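Two ideas from the clock discussion above can be sketched on a PC: clocks within one domain are related by a multiplication and/or division, and a peripheral clock can be paused to save power. Everything named here is hypothetical - `clock_ratio_t`, `derived_clock_hz`, `mock_clk_enable` and `clock_gate` are illustration only and do not correspond to any vendor's registers or API.

```c
#include <stdint.h>

/* A clock derived from a common base: f = base * mul / div. */
typedef struct {
    uint32_t mul;   /* e.g. a multiplier stage */
    uint32_t div;   /* e.g. a prescaler        */
} clock_ratio_t;

static uint32_t derived_clock_hz(uint32_t base_hz, clock_ratio_t r)
{
    if (r.div == 0) {
        return 0;   /* treat a zero divider as "clock stopped" */
    }
    /* 64-bit intermediate avoids overflow in base * mul. */
    return (uint32_t)(((uint64_t)base_hz * r.mul) / r.div);
}

/* Hypothetical clock-enable register: one bit per peripheral clock. */
static uint32_t mock_clk_enable = 0;

/* Start (on != 0) or pause (on == 0) the clock of one peripheral. */
static void clock_gate(uint32_t periph_bit, int on)
{
    if (on) {
        mock_clk_enable |= periph_bit;
    } else {
        mock_clk_enable &= ~periph_bit;
    }
}
```

With an assumed 8 MHz base, a ratio of 9/1 gives 72 MHz and 9/2 gives 36 MHz; both stay in the same clock domain because they share the base. A clock arriving from the far end of, say, an S/PDIF link has no such ratio to the base and therefore belongs to another domain.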


The Arithmetic Logic Unit is the heart of the MCU. This is where normal integer calculations like addition and subtraction are done, but also logic operations - OR, AND, XOR etc. Integer multiplications and divisions are typically also performed here. The ALU creates the flags used for branching. This is discussed in Chapter 7. E.g., the

This contains the “sequencer” from the introduction, that handles branching in loops and function-calls. It is also a lot more. Modern cores have a lot of Microcode. This is proprietary secret low-level code. It handles internal protocols and executes the many built-in state-machines that decide what can and what cannot be done - and in which sequence.

CPUs and MCUs always have several registers. In the old 8086 family used in the original PC, each register had specific relation to specific assembly instructions. As an example, the CX register had to be used as loop-counter, to facilitate a Decrement-And-Branch-Until-Zero instruction. Today, instructions are typically more orthogonal. This means that many instructions will work with any register. Still some registers also have special meaning - like PC - Program Counter - and SP - Stack Pointer. As we will see when we look at assembly code, the Arm architectures - like many others - are of the type Load-And-Store. This means that all ALU operations - like add - work on registers. To work on data from memory or peripherals, such data must first be loaded into a register. The result of an ALU operation also ends in a register. A new instruction is then needed to store the register into memory. This means that a simple increment of a value in memory requires a read instruction, then an add instruction, and finally a write instruction. On the general level we call this a Read-Modify-Write operation. This is important in multithreaded programs, as we will discuss later. When doing function calls with a limited number of operands, most CPUs and MCUs will use registers for these, as this is faster than using the stack. All in all, this means that registers are incredibly central to Arm and many other CPUs and MCUs. The Cortex-M Registers are discussed in Chapter 7.
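The Read-Modify-Write sequence described above can be spelled out in C. This is a sketch with hypothetical names; the local variables play the role of registers. The second function does not use real threads - it simply writes out, by hand, one unlucky interleaving of two contexts to show why the pattern matters in multithreaded programs.

```c
#include <stdint.h>

/* A value living in "memory"; volatile forces real loads and stores. */
static volatile uint32_t counter = 0;

/* An increment spelled out as the three steps a load-store machine performs. */
static void increment_rmw(volatile uint32_t *addr)
{
    uint32_t reg = *addr;   /* load:  memory -> register              */
    reg = reg + 1;          /* add:   ALU works on the register only  */
    *addr = reg;            /* store: register -> memory              */
}

/* Two contexts interleave their steps on the same variable: both load
   the old value before either stores, so one update is lost. */
static uint32_t lost_update_demo(uint32_t start)
{
    volatile uint32_t shared = start;
    uint32_t reg_a = shared;    /* context A loads               */
    uint32_t reg_b = shared;    /* context B preempts and loads  */
    reg_b += 1;
    shared = reg_b;             /* B stores start + 1            */
    reg_a += 1;
    shared = reg_a;             /* A stores start + 1 again      */
    return shared;              /* start + 1, not start + 2      */
}
```

Two increments were intended, but `lost_update_demo(10)` ends at 11. Guarding the three steps so that they cannot be interrupted is what removes this problem.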

I just stated that in a load-and-store architecture all data needs to go through registers. There is however one exception - DMA - Direct Memory Access. In modern systems many data-flows may routinely run all the time. One example starts with an A/D-converter - aka ADC. This device will deliver samples at a constant rate - e.g., 48 kHz. In very primitive systems each sample generates an interrupt, that triggers an exception-handler which reads the sample into a buffer in RAM. We don’t want our advanced microcontroller interrupted for such mundane tasks all the time. Instead, we set up a DMA-transfer

buffer is e.g., half full (if we use an ADC). This will finally generate an interrupt, and we can use this to move the buffered data further along in our processing chain - while the other half of the buffer is filled by a new DMA-operation. There may also be interrupts from D/A-converters, as well as from UARTs, CAN-buses etc. In these scenarios, the peripheral end of the DMA will typically work on a single address, while the memory address is incremented by a fixed constant between each sample. DMA can also be from one memory location to another memory location (not shown in the figure). In this case the two ends of the DMA will use address-increments that are not necessarily the same. Note that even though DMA might save CPU-time, it may still occasionally block relevant buses. This will temporarily pause the CPU.
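The half-buffer scheme described above can be simulated on a PC. This is a sketch only: there is no real DMA controller here, `sim_dma_write` stands in for the hardware, and all names and the buffer length are assumptions chosen for illustration. The point is that the CPU is notified once per half buffer rather than once per sample.

```c
#include <stddef.h>
#include <stdint.h>

#define BUF_LEN  8              /* whole DMA buffer, in samples */
#define HALF_LEN (BUF_LEN / 2)

static uint16_t dma_buf[BUF_LEN];   /* filled by the (simulated) DMA      */
static uint32_t processed_sum;      /* stand-in for "further processing"  */
static unsigned half_events;        /* number of half/full notifications  */

/* Called when one half of the buffer is complete; the other half is
   meanwhile free for the DMA to fill. */
static void on_half_ready(const uint16_t *half)
{
    for (size_t i = 0; i < HALF_LEN; i++) {
        processed_sum += half[i];
    }
    half_events++;
}

/* Host-side simulation of a circular DMA: stores one sample per call and
   raises the "interrupt" at the half-way and end-of-buffer points. */
static void sim_dma_write(uint16_t sample)
{
    static size_t pos = 0;

    dma_buf[pos++] = sample;
    if (pos == HALF_LEN) {
        on_half_ready(&dma_buf[0]);         /* first half complete  */
    } else if (pos == BUF_LEN) {
        on_half_ready(&dma_buf[HALF_LEN]);  /* second half complete */
        pos = 0;                            /* wrap around          */
    }
}
```

Feeding eight samples into `sim_dma_write` produces two notifications, one per half. In a real system the handler must finish with its half before the DMA wraps around and starts overwriting it.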

An MCU can run without external memory and therefore needs

called Non-Volatile Memory - often abbreviated to NVM. This memory is usually Flash. It could, however, also be ROM - Read Only Memory. ROM is memory that is programmed - Masked - at the silicon factory and cannot be deleted. As Figure 3.2 shows, the flash may also contain constant data such as tables.

Again - an MCU is defined by having internal memory - also for data. RAM is volatile - it needs power to retain its content. Some MCUs allow us to copy performance-critical code to RAM, to speed it up. It is, however, important that this does not introduce a RAM-bottleneck when both program and data are fetched from the same place.

Even the simplest MCU will have circuitry allowing external events to interrupt the current program-flow and take care of urgent business. Some devices will allow Nested Interrupts while others may not. We have nested interrupts when the normal flow of execution has been interrupted by an external event, and then another event occurs, which has higher priority, so we interrupt the interrupt. This may continue for several levels. As a software-developer you need to take this into account. Without nesting, you may experience a high

we define interrupt-latency as the time it takes from when the external event happens until the MCU is chewing on the first instruction in the ISR - Interrupt Service Routine. You also need to be aware that each nested interrupt will cause the stack to grow further. Re-entrant interrupts would make the above even worse and are normally not enabled.
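The stack growth from nesting can be estimated with simple arithmetic. The sketch below is a rough model, not a vendor formula: the function name and all byte counts are assumptions, and `frame_size` stands for whatever context the hardware and the ISR prologue save at each level.

```c
#include <stddef.h>

/* Hypothetical worst-case estimate: every nesting level adds a saved
 * context (frame_size) plus that ISR's own stack usage. Sizes in bytes. */
size_t worst_case_stack(size_t main_usage,
                        size_t frame_size,
                        const size_t *isr_usage,
                        size_t nesting_levels)
{
    size_t total = main_usage;
    for (size_t i = 0; i < nesting_levels; i++)
        total += frame_size + isr_usage[i];
    return total;
}
```

As an example, with 256 bytes used by the main program, a 32-byte saved context and three nested ISRs using 64, 48 and 16 bytes, the estimate is 256 + (32 + 64) + (32 + 48) + (32 + 16) = 480 bytes.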

This can be anything. Typical peripherals are Ethernet and USB, UART - Universal Asynchronous Receiver/Transmitter for serial ports and

and DACs - Digital/Analog Converters. Another popular component is PWM - Pulse Width Modulation - which is a simple way to use a digital output to provide a semi-analog voltage. By changing the duty-cycle of a square-wave, you can generate an average voltage between the

motors. Typically, we think about an MCU with added peripherals, but there are also cases where the peripheral is the key component.

Low Energy - ICs. These contain an MCU - typically Cortex-M0 or Cortex-M4. You may use such a device as a BLE Gateway in a system where the built-in MCU may - or may not - be the only MCU/CPU in the system.
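Returning to PWM for a moment: the relation between duty-cycle and average voltage is easy to show in C. This is a sketch under the assumption that the output switches between 0 V and a known high level. The function name and the use of millivolts and percent are choices made for this example.

```c
#include <stdint.h>

/* Average output in millivolts for a PWM signal that switches between
 * 0 V and v_high_mv, with the duty-cycle in percent (0-100).
 * Values above 100 are clamped to 100. */
uint32_t pwm_average_mv(uint32_t v_high_mv, uint32_t duty_percent)
{
    if (duty_percent > 100u)
        duty_percent = 100u;
    return (v_high_mv * duty_percent) / 100u;
}
```

A 3.3 V output at 50 % duty-cycle thus averages 1650 mV, and at 25 % it averages 825 mV.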

used for testing the wiring between the ICs on a board in production, but is also used for debugging. In “the old days” it was normal to have an Emulator - a device that could be connected into the CPU-socket. It would act as the CPU, but also facilitate debugging - like single-stepping, breaking and even tracing. Many modern ICs have so many pins that this concept is not realistic anymore. Also, clock-speeds are so high today that you cannot send internal CPU-signals through a flat-cable without getting numerous bit-errors. Instead, most CPUs and Microcontrollers have internal logic for debugging and use the JTAG - or a Serial-Wire Debugger (SWD) - to allow a PC to debug. This is good for us, as Emulators used to be very costly.

Not shown in the figure - but also important:

External Flash and RAM serve the same purpose as the internal versions. They just give us more of the same - and they may be shared with other intelligent ICs. There may, however, be differences in access time - which translates to speed. Sometimes it is possible to go into a sleep-condition where external memory is powered down, while the internal keeps working. This can be problematic with RAM as it requires reloading if completely powered off.

Again, this can be more of the same. There is, however, functionality that you prefer externally. This can e.g. be components that consume a lot of power - like load-balancing of Lithium batteries, or voltage regulators. Power equals heat. It is also my experience that the best A/D-converters are “standalone” devices. There is so much digital stuff going on in a core that it is more or less impossible to avoid EMC-related noise in internal ADCs.

Figure 3.1 shows arrows leading to the internal memory and peripheral components. Similar connections lead to external components. These are called “Buses” because they represent a lot of Data and Address lines that travel the same path. In older literature you often see a data-bus and an address-bus - counting as two buses. However, all the buses we see in the Arm documentation contain both address and data lines, and typically also numerous control signals. Today such a set of address- and data-bus with control-signals is often just termed a bus.

An external bus is typically 16 or 32-bit wide, while internal buses may be up to 64-bit wide. There may be separate buses for various types of memory, or everybody may be on the same bus. The technology applied to these buses is a key performance parameter.


When we talk about internal peripherals and GPIO connecting to the outer world, it is extremely important to be aware of the microcontroller’s pin-multiplexing. There always seems to be an abundance of pins in the datasheet that you can use from the various peripherals. However, when you look closer, you see that all these “logic” pins share a much smaller number of “physical” pins. A part of the initialization is to select which functionality to connect to the outer world. This is multiplexing and it obviously limits your choices. Note that the microcontroller you are interested in may come in different packages. This can limit the pin-count and thereby make your choices even harder. Multiplexing is a fact with all microcontrollers - although some give you more flexibility than others.
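One way to get a feeling for the consequences of multiplexing is to check a planned pin-selection for clashes before the board is designed. The sketch below does this on a PC. The struct, the function name and the pin numbers in the usage note are invented for this example and do not come from any datasheet.

```c
#include <stdbool.h>
#include <stddef.h>

/* One "logic" pin: a peripheral signal and the "physical" pin it needs.
 * Signal names and pin numbers are made up for illustration. */
struct pin_choice {
    const char *signal;
    unsigned    physical_pin;
};

/* Returns true if two selected signals need the same physical pin. */
bool pinmux_has_conflict(const struct pin_choice *sel, size_t count)
{
    for (size_t i = 0; i < count; i++)
        for (size_t j = i + 1; j < count; j++)
            if (sel[i].physical_pin == sel[j].physical_pin)
                return true;
    return false;
}
```

If, say, a UART transmit signal and a timer channel are both only available on physical pin 9, the check reports a conflict and one of them must go - or you need another package.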

The short walk-through of Figure 3.1 opened up a lot of interesting subjects that we will deal with throughout the book. The left side of the figure hints about the Address Space. We will see more of this in Chapter 4. Many of the other blocks are discussed further in Chapter 7.

enough on-board memory to run meaningful programs without external memory. You also expect an MCU to contain some relevant peripherals.

All the above means that MCUs are typically used for very specific purposes in a confined space. This is basically the definition of an embedded system.

The reason we still have CPUs is that they typically are more flexible. They often support modern operating systems by having an MMU - Memory-Management-Unit - that allows the CPU to have a full-blown operating system with virtual memory and paging etc. - like Linux or Windows. The cost is that we need to supply the CPU with external memory. A CPU might, however, have some on-board peripherals and maybe a bit of memory as well, so the border between the definitions becomes blurred.

modern MCU. Onwards I will use the term “Core” for the inner part of a CPU or MCU - basically the stuff to the right of the memories in Figure 3.2.

As stated, these things overlap. A good example is that most CPUs will boot up and load the code from an SSD¹ into DRAM where it will execute it, while MCUs typically execute directly from Flash. However, you will find examples of the opposite. For now, we will assume that our embedded system executes directly from Flash - like a “true” MCU. We will discuss memory in more detail in Section 3.5.

Table 3.1: CPU versus MCU

¹ SSD is Solid-State Disk. It is Flash-based and has no moving parts.

In the previous section we touched upon the subject of buses - the address and data-lines that connect the memory to the CPU. This is relevant for classic CPUs as described above, as well as for the Core inside the MCU. Essentially, the Core will apply an address to the address bus. As soon as the zeros and ones on the address-wires are stable enough, the Core will either write or read data on the data bus. There is help from signals that control Read/Write as well as Chip Select. Chip Select uses the higher bits of the address to enable only the right memory chip(s), while the rest of the address lines go to the selected chip(s). This concept allows us to use memory chips that only cover a fraction of the total address-space. Many buses have more signals, but they are not relevant here.
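The Chip Select decoding described above can be mimicked in a few lines of C. This is only a sketch with assumed numbers: a 16-bit address bus where the top 2 bits select one of four chips, and the remaining 14 bits go to the selected chip. The function names are made up.

```c
#include <stdint.h>

#define ADDR_BITS   16u   /* assumed width of the address bus        */
#define SELECT_BITS 2u    /* assumed: top 2 bits pick one of 4 chips */
#define OFFSET_BITS (ADDR_BITS - SELECT_BITS)

/* Which chip does the address decoder enable? */
unsigned chip_select(uint16_t address)
{
    return (unsigned)(address >> OFFSET_BITS);
}

/* Which address does the selected chip see on its own address pins? */
uint16_t chip_offset(uint16_t address)
{
    return (uint16_t)(address & ((1u << OFFSET_BITS) - 1u));
}
```

With these numbers, address 0x8123 enables chip 2, which sees the local address 0x0123. Each chip thus only covers a quarter of the total address-space.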

A classic CPU like 8086 only had one set of data-lines (16-bit) and address-lines (20-bit). In fact, the cheaper - but software compatible - 8088 would multiplex the lower-byte address and data-lines so that the same wires first would be used for addressing, and then for data. This was slower, but cheaper.

However, both 8086 and 8088 would need a lot of clock cycles to first fetch an instruction, and then later one or even two operands from the memory - all in sequence. The concept of sharing memory and buses for instructions as well as for data is called Von Neumann.

Figure 3.2: Von Neumann and Harvard

The Von Neumann concept is especially annoying when doing Digital Signal Processing, where you often need to keep the central Multiply-

in a pre-defined constant and a data-value. This led to the Harvard architecture where you have one set of address and data buses for instructions, and another set for data. See Figure 3.2. This allows for parallel access to data and code.

DSPs favor the Harvard concept. If the DSP is running the code of a filter, it will typically have the filter-constants stored together with the instructions - sometimes even as part of the instructions. The first DSP I experienced - Texas Instruments TMS320C10 - was Harvard based and had a 16-bit instruction word. A Multiply-Accumulate instruction dedicated 13 bits of the 16 to a filter-constant. Thus, a FIR-filter of order 21 would contain 21 consecutive Multiply-Accumulate instructions in the assembly-source, with each 16-bit instruction containing a 13-bit constant. Data was fetched in parallel with the help of an auto-incrementing data-pointer. I used this DSP in my university graduation project in 1983. Later I worked with the famous Intel 8051 microcontroller, which also uses the Harvard architecture.

Apart from the added board-space and cost, Harvard has a serious drawback: If data and instructions are completely separate, then how do you update the firmware? When a new program is loaded it is seen as data by the old program - so how do we get it into the instruction storage? A solution often requires additional external hardware.
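Before moving on, the Multiply-Accumulate work of the FIR-filter mentioned above can be sketched in C. This is a plain loop and not the author's DSP code: where the DSP had one instruction per filter-constant, the loop here fetches both the constant and the data-value from arrays. The function name and the 16-bit types are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiply-Accumulate over `taps` filter-constants and `taps` samples.
 * Each round needs one constant and one data-value - the two fetches
 * that a Harvard architecture can do in parallel. */
int32_t fir_mac(const int16_t *coeff, const int16_t *samples, size_t taps)
{
    int32_t acc = 0;
    for (size_t i = 0; i < taps; i++)
        acc += (int32_t)coeff[i] * (int32_t)samples[i];
    return acc;
}
```

With the constants {1, 2, 3} and the samples {4, 5, 6} the accumulator ends at 1*4 + 2*5 + 3*6 = 32.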

In general, most designers prefer the flexibility of Von Neumann and the speed of Harvard. The most common solution in microcontrollers today is therefore to have one unified address space - with at least two (sets of) buses. This is often called “Modified Harvard”, but it might be more correct to call it Von Neumann on the architectural level and Harvard at the implementation level.

The early Intel CPUs had special “In” and “Out” instructions for manipulating external hardware - peripherals. This is basically abandoned, and peripherals are now mapped into the addressing space - still often called the memory space. This means that when you map the 4 GB of address space for a modern 32-bit MCU, you will find blocks for respectively RAM, Flash and peripherals. We will look at this in Section 4.1, where we will see that the overall unified memory map in the various Cortex-Ms is more or less a standard.
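For the C-programmer, memory-mapped peripherals mean that a register is accessed through a pointer, just like a variable. The sketch below shows the idea. Note the assumptions: on a real MCU the address comes from the datasheet, but here a plain variable stands in for the register so that the code also runs on a PC, and the register name and bit position are invented.

```c
#include <stdint.h>

/* Stand-in for a peripheral register. On real hardware GPIO_OUT would
 * be a fixed address cast to a volatile pointer. */
static volatile uint32_t fake_gpio_out;
#define GPIO_OUT (&fake_gpio_out)
#define LED_BIT  (1u << 5)   /* hypothetical bit position for an LED */

void led_on(void)  { *GPIO_OUT |= LED_BIT; }   /* set the bit   */
void led_off(void) { *GPIO_OUT &= ~LED_BIT; }  /* clear the bit */

uint32_t gpio_out_value(void) { return *GPIO_OUT; }
```

The `volatile` keyword tells the compiler that every access must really happen, as the hardware may react to - or change - the value behind the compiler's back. Also note that `|=` and `&=` are Read-Modify-Write operations.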

Table 3.2 shows some of many types of memory. The UV-erasable EPROM is kind of outdated. I specifically remember a product where we had a

we had to take all 24 EPROMs out of their sockets. Then we placed them in

in the PROM-burner - which had space for 8 EPROMs as far as I remember. Finally, we could put them back in the circuit.

At last came the event where we were to show the product - Counter-

guys, bringing a brand-new set of 24 EPROMs with me. We mounted them all, switched on the power - and nothing happened. Huge sigh from the assembled colleagues. “Time to try the spare” I said and whipped out 24 more EPROMs from my bag. They worked. I was a hero that day - although I was probably the one who had messed up the first set. Incidentally, the paperwork for the hardware was not in place, and on the way home I got picked in customs. I began to accept the fact that I would miss my flight. But then - just next to me - Diego Maradona got into a fistfight with a press-photographer. I silently gathered my stuff and walked on.


Back to Table 3.2. Of all the memory-types in the table, only the RAM-variants are volatile. Static RAM is simple to design into a system, as it

The upside of DRAM is the much higher density - and therefore MB per Dollar as well as per square-inch. SDRAM is essentially DRAM running at a speed that is synchronized to the CPU frequency. You can get Double Data Rate RAM that can be accessed on both flanks of the clock - DDRAM. These come in faster and faster generations - like DDR5.

When you buy a PC, you can basically buy the DRAM-type that you

high density for the many Gigabytes of RAM, and the DRAM-controller - normally built into the CPU - handles the refresh.

So why do we then see SRAM in MCUs? The thing is that if you leave SRAM with the voltage applied, but don’t use it, it uses almost no current. Since P = U * I, the power consumption of an idle SRAM is close to zero. If the MCU sleeps, the RAM power also goes down - without any fancy control.

Many CPUs and MCUs come with some ROM from the factory. This is typically some kind of bootloader - e.g., BOOTP - allowing you to boot from an Ethernet connection. It can also be code related to security, as well as guaranteed unique MAC-addresses and serial numbers. OTP - One-Time

chip-vendor’s serial number but that of the board.

Even though Flash can act much as EEPROM, you still see EEPROM being used. It can (still) survive more rewrites than Flash. EEPROM is also easy to write to and read from.

Flash is a complex subject. We have two kinds:

• NOR-Flash

This is relatively fast to read, but slow to erase and write, and you rarely see read-errors. This makes it ideal for reading with direct-addressing - and thus for in-place execution of code - directly from boot. The downside is the low density - it costs more per MB. The upside is a fast boot, a less-complex system and less need for RAM. If you think the above sounds like a typical microcontroller, you are right.

• NAND-Flash

This is relatively slow when reading, and read errors do occur regularly. The good thing is that it has a high density and thus can deliver many MB of program-space. If you use it as an SSD disk, the


Table 3.2: Physical Memory

Memory

Volatile Memory

SDRAM

SDRAM running faster

Programmable

Programmed in-circuit once

As in many other cases, we see new products that tend to lessen the gap between the technologies. For this reason, you should probably focus on the basic features like read/write/erase time and density for the given Flash and less on the technology used. Still, the above can explain why a given system executes in-place or is using code-shadowing.


3.6 The Cortex Families

After having discussed several generic concepts we will look at the Arm

The R stands for real-time. These devices offer high performance and the lowest latency. However, it’s not only real-time but also the safety features that define Cortex-R. A good example is the Hercules family from Texas Instruments. The high-end Hercules devices come with Cortex-R dual cores that can run in Lockstep² - meaning that they run the same program and an exception is raised if the Cores disagree. An important application area for Hercules and the likes is automotive and other transport where people’s lives may depend on correct execution. Arm-R also requires that the built-in MPU - Memory Protection Unit - is used. Where FreeRTOS would be a good RTOS choice on Cortex-M, you would typically choose its sibling SafeRTOS on a Cortex-R device.


Figure 3.3: Cortex-M4 in STM32F334R8 Courtesy of ST

from Cortex-A is that Cortex-M never has an MMU. Of the three profiles, Cortex-M is the best when it comes to low-power and keeping costs down. The NVIC - Nested Vectored Interrupt Controller - is common to all Cortex-M MCUs. Figure 3.3 shows Cortex-M as built into our sample - STM32F334R8³. Cortex-M devices cannot run Linux (at least not without cutting some serious corners). They can, however, run an RTOS like Arm Mbed and FreeRTOS. There are also many designs running “bare metal” on Cortex-M devices.

³ Except that STM32F334R8 has no MPU.

The Cortex profiles are created to be extended by silicon vendors - offering many ways for these vendors to stand out - but still assuring that the programming experience does not change too much when you move from one device to another. The first many Cortex-devices were all 32-bit cores and a major “common ground” for these is the 4 GB Memory Map which we will see in Section 4.1. We will dig further into the Cortex-M architectures in Chapter 7.


3.7 STM32F334R8 Architecture

It is time to show a concrete example of an MCU architecture. Let’s take a look at the STM32F334R8 MCU that we saw briefly in action in Chapter 2. We will focus mostly on “generic” stuff and not delve too far into specific peripherals. There is always the datasheet.

The overall block diagram is shown in Figure 3.4. The small box in the top left corner is the Cortex-M4 CPU-Core from Figure 3.3. The Cortex core is the main IP from Arm⁴, and ST has chosen to explicitly show the following blocks:

⁴ The buses outside the Core also come from Arm.

The Serial-Wire/JTAG is an interface for basic debugging - like updating firmware, single-stepping and using breakpoints. We have already seen it in action. It is very common.

The Trace-Port Interface Unit is a bridge to the internal debug-hardware that supports low-cost debug-tracing. The chip-vendor can also select other alternatives to this - e.g. something supporting their specific hardware and software. Tracing is a functionality that allows you to set a normal breakpoint, but when the hardware breaks you can see a “recording” of the instructions leading up to the breakpoint. Typically, there is not much memory for this, but then you may be able to apply a “filter” and only record specific instructions - like e.g. branches or reads and writes to a particular variable. The TPIU can run in a mode where it outputs data on a Serial-Wire Output - SWO to

from your program to a debugger at a relatively high speed. We try this in Chapter 13.

While Cortex-M0 through Cortex-M3 cannot support an FPU and Cortex-M7 always has an FPU - with double-precision as an option - the Cortex-M4 offers the silicon vendor the choice between nothing and a single-precision FPU. In the case of STM32F334, ST has included the latter. For the C-programmer this is a “float”. We will get back to this in Chapter 9.


• The basic Cortex-M4 Core

This can also to a degree be specified by the silicon vendor (here ST). As an example, little-endian was chosen by ST in this MCU (as is the common choice in Arm-based designs). We see ST’s view in Figure 3.3.

You will not be surprised to hear that the Nested Vectored Interrupt Controller allows the CPU to handle interrupts that interrupt lower priority interrupts - thus nesting interrupt service routines in each other.
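The nesting rule can be written as a one-line decision. The sketch below is a simplified model and not code for the NVIC itself: it relies on the Cortex-M convention that a numerically lower priority value means a more urgent interrupt, and it ignores details such as priority grouping. The function name is made up.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified preemption rule: a pending interrupt may interrupt the
 * running ISR only if it is strictly more urgent, i.e. it has a lower
 * priority number. Equal priorities do not nest. */
bool preempts(uint8_t pending_priority, uint8_t running_priority)
{
    return pending_priority < running_priority;
}
```

An interrupt with priority 1 will thus interrupt an ISR running at priority 3, while another priority 3 interrupt has to wait its turn.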

The fact that the Cortex-M4 takes little space in the diagram is not a reflection of its contribution to the total - neither in complexity nor in silicon space. It just means that the Cortex-M4 used here is pretty standard and is well documented by Arm, so “let’s not waste too much space in the diagram”. Some selected parts from the rest of the diagram:

3.7.1 Buses

Figure 3.4 shows several buses. We will focus on the buses out of the Cortex-M4 core, as they relate to the instructions. Instead of two Harvard buses to two separate spaces, we see three buses to the shared space:

Ibus

The Ibus is used to fetch instructions (code). This bus can read from the range 0x0000 0000 - 0x2000 0000. Figure 4.1 shows the memory map of the STM32F334R8. We see that the Ibus address range fits with the area noted as “CODE”. Using a pin on the chip - “Boot0” - and a configuration bit - “nBoot1” - the designer can choose to “alias” either onboard Flash, SRAM or “System Memory” into this space. “Alias” means that it will be accessible at both the original address range and starting from 0.

- When Boot0 is 0 we will see the 64k of Flash from address 0x0800 0000 aliased to address 0 and onwards. This is the normal scenario.

- When Boot0 is 1 and nBoot1 is 0, we will see SRAM from address 0x2000 0000 aliased to address 0 and onwards.

- When both bits are 1, we will see “System Memory” from address 0x1FFF D800 aliased to address 0 and onwards.
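The three boot cases can be summed up as a small decision function. This is only a model of the rules listed above - on the real chip the selection happens in hardware at reset. The function and macro names are made up, while the addresses are the ones given in the list.

```c
#include <stdbool.h>
#include <stdint.h>

#define ALIAS_FLASH_ADDR  0x08000000u  /* onboard Flash  */
#define ALIAS_SRAM_ADDR   0x20000000u  /* onboard SRAM   */
#define ALIAS_SYSMEM_ADDR 0x1FFFD800u  /* "System Memory" */

/* Returns the original address of the memory that is aliased to
 * address 0, given the Boot0 pin and the nBoot1 configuration bit. */
uint32_t boot_alias_source(bool boot0, bool nboot1)
{
    if (!boot0)
        return ALIAS_FLASH_ADDR;  /* normal scenario, nBoot1 ignored */
    return nboot1 ? ALIAS_SYSMEM_ADDR : ALIAS_SRAM_ADDR;
}
```

Note that when Boot0 is 0 the value of nBoot1 does not matter - we always get the Flash.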


Figure 3.4: STM32F334 Architecture Courtesy of ST.

Date posted: 20/08/2024, 11:24
