Page 1: Course contents
1 General introduction to microprocessor systems
Page 4:
• -15 V power supply
Page 5: Intel 8008
• First 8-bit processor (1972)
• Cost $500; at the time, a 4-bit processor cost $50
• Complete system had 2 Kbyte RAM
• 200 kHz clock frequency, 10 µm, 3500 transistors (TOR), 0.06 MIPS, 16 Kbyte addressable memory
• 18-pin package, multiplexed address and data bus
Page 6:
power supply, 6 KTOR, 0.64 MIPS
• 64 Kbyte address space (“as large as designers want”, EDN 1974)
• 10X the performance of the 8008
Page 7: Intel 8088
• 16-bit processor
• Introduced in 1979
• 3 µm, 5 - 8 MHz, 29 KTOR, 0.33 to 0.66 MIPS, 1 Mbyte addressable memory
• 10X the performance of the 8008
Page 9: Intel 80286
• Introduced: 1982
• 1.5 µm, 134 KTOR, 0.9 to 2.6 MIPS
• Clock frequency: 6 - 25 MHz
• 16MB addressable, 1GB virtual memory
• 3-6X the performance of the 8086
[Block diagram: 16-bit integer CPU with MMU; 24-bit address bus, 16-bit data bus]
Page 10:
• Software support and hardware protection for multitasking
[Block diagram: 32-bit integer CPU with MMU; 24-bit address bus, 16-bit data bus]
Page 11: Intel 80386dx
• Introduced: 1985
• Clock frequency: 16 - 40 MHz
• 4GB addressable memory, 64 TB virtual memory
• Software support and hardware protection for multitasking
[Block diagram: 32-bit integer CPU with MMU; 32-bit address bus, 32-bit data bus]
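The addressable-memory figures quoted on the preceding slides follow from the address bus width. The sketch below is a minimal illustration, assuming byte-addressed memory. The 24- and 32-bit widths are the ones shown in the block diagrams; the 16- and 20-bit widths are assumptions inferred from the quoted 64 Kbyte and 1 Mbyte figures, and the function names are illustrative only.

```python
def addressable_bytes(address_bits: int) -> int:
    """Number of bytes reachable with a byte-addressed bus of this width."""
    return 2 ** address_bits


def human(n: int) -> str:
    """Format a power-of-two byte count using the slides' Kbyte/Mbyte units."""
    for unit in ("byte", "Kbyte", "Mbyte", "Gbyte", "Tbyte"):
        if n < 1024:
            return f"{n} {unit}"
        n //= 1024
    return f"{n} Pbyte"


# 24 and 32 come from the block diagrams on the slides;
# 16 and 20 are assumptions inferred from the quoted memory sizes.
ADDRESS_BITS = {"64 Kbyte part": 16, "8088": 20, "80286": 24, "80386dx": 32}

for cpu, bits in ADDRESS_BITS.items():
    print(f"{cpu}: {bits} address bits -> {human(addressable_bytes(bits))}")
```

Each extra address line doubles the reachable memory, which is why going from 24 to 32 address bits moves the limit from 16 Mbyte to 4 Gbyte.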
Page 12:
• Software support and hardware protection for multitasking
• Support for parallel processing
• Cache required: external memory is not fast enough
[Block diagram: 32-bit integer CPU, 64-bit FPU, MMU, 8 Kbyte cache; 32-bit address bus, 32-bit data bus]
Page 13: Intel 80486sx
• Introduced: 1989
• 0.8 µm, 1.2 MTOR, 20 to 41 MIPS
• Clock frequency: 25 - 50 MHz
• Software support and hardware protection for multitasking
• Support for parallel processing
• Cache required: external memory is not fast enough
[Block diagram: 32-bit integer CPU, MMU, 8 Kbyte cache; 32-bit address bus, 32-bit data bus]
Page 14: Intel 80486dx2
• Introduced: 1992
• Clock frequency: internal: 50 - 66 MHz, external: 25 - 33 MHz
• Software support and hardware protection for multitasking
• Support for parallel processing
• Cache required: external memory is not fast enough
[Block diagram: 32-bit integer CPU, 64-bit FPU, MMU, 8 Kbyte cache; 32-bit address bus, 32-bit data bus]
Page 15: Intel Pentium
• Introduced: 1993
• From 0.8 µm, 3.1 MTOR up to 0.35 µm, 4.5 MTOR (incl. MMX)
• Clock frequency: internal: 60 - 166 MHz, external: 66 MHz
• Support for parallel processing: cache coherence protocol
• Superscalar: 5X the performance of the 33 MHz Intel486 DX
[Block diagram: two 32-bit pipelined integer CPUs, 64-bit FPU, static branch prediction unit, MMU, 8 Kbyte program cache, 8 Kbyte data cache; 32-bit address bus, 64-bit data bus]
Page 16: Intel Pentium Pro
• Introduced: 1995, 0.35 µm, 3.3 V, 5.5 MTOR, 35W, 387 pin
• Clock frequency: 150 - 200 MHz Internal, 60 - >100 MHz External
• Super scalar (4 Instr./cycle), super pipelined (12 stages)
• Support for symmetrical multiprocessing (≤4 CPU)
• MCM: 256-1024 Kbyte L2 4-way set associative cache
[Block diagram: dynamic branch prediction unit, MMU, instruction dispatch unit, 32-bit pipelined integer CPUs, 64-bit pipelined FPU, address generation unit, 8 Kbyte L1 program cache, 8 Kbyte L1 data cache, link to L2 cache; 36-bit address bus, 64-bit data bus with ECC]
Page 17: Intel Pentium II
• Introduced: 1997, 0.25 µm, 2.0 V, 9 MTOR, 43 W, 242 pin
• Clock frequency: 200 - 550 MHz internal, 100 - 225 MHz L2 cache, 66 - 100 MHz external
• Super scalar (4 Instr./cycle), super pipelined (12 stages)
• Support for symmetrical multiprocessing (≤8 CPU)
• Single Edge Contact Cartridge with Thermal Sensor: 256-1024 Kbyte L2 4-way set associative cache
[Block diagram: dynamic branch prediction unit, MMU, instruction dispatch unit, 64-bit pipelined FPUs, 32-bit pipelined integer CPUs, address generation unit, 16 Kbyte L1 program cache, 16 Kbyte L1 data cache, link to L2 cache with ECC; 36-bit address bus, 64-bit data bus with ECC]
Page 18: Intel Pentium III
• Introduced: 1999, 0.18 µm, 6LM, 1.8 V, 28 MTOR, 370 pin
• Clock frequency: 450 - 1130 MHz Internal, 100-133 MHz External
• Super scalar (4 Instr./cycle), super pipelined (12 stages)
• Support for symmetrical multiprocessing (≤2 CPU)
[Block diagram: dynamic branch prediction unit, MMU, instruction dispatch unit, 64-bit pipelined FPUs, 32-bit pipelined integer CPUs, address generation unit, 16 Kbyte L1 program cache, 16 Kbyte L1 data cache, 256 Kbyte L2 unified cache; 36-bit address bus, 64-bit data bus with ECC]
Page 19: Intel Pentium IV
• Introduced: 2002, 0.13 µm or 90 nm, 1.8 V, 55 MTOR
• Clock frequency: 1.4 to 3.8 GHz internal, 400 to 800 MHz external
• Super scalar (4 Instr./cycle), super pipelined (12 stages)
• Newer versions: Hyper threading
[Block diagram: dynamic branch prediction unit, MMU, instruction dispatch unit, 64-bit pipelined FPUs, 32-bit pipelined integer CPUs, address generation unit, 16 Kbyte L1 program cache, 16 Kbyte L1 data cache, 256/512/1024 Kbyte L2 cache; 36-bit address bus, 64-bit data bus with ECC]
Page 20: IA-64 (Itanium)
• Design started in 1994; first samples on the market in 2001
• 64-bit address space (4x10^9 Gbyte; we will never need that much…)
• 256 64-bit integer and 128 82-bit floating point registers; 64 branch target registers; 64 1-bit predicate registers
• 41 bit instruction word length
• 10-stage pipeline
• separate L1 data and program, 96 Kbyte L2 unified on-chip, 4 Mbyte L3 unified off-chip
Page 22: Trends for general purpose processors
• Higher clock frequencies: 4.7 -> 30 GHz
• Faster memory: 120 ns -> 50 ns
not proportional to clock frequency increase => use of caches and special DRAM memories (e.g. SDRAM)
• Limited by power dissipation => decreasing power supply voltage
• Parallel processing
• Memory with processor instead of processor with memory
Page 23: The future: general characteristics
[Table: roadmap figures, mostly lost in extraction; surviving values 1.5-1.5, 1.1-1.2, 1.0-0.9, 0.7-0.6, 0.5, 0.4 and the row label “Max power dissipation/chip”]
Will 22 nm be the end of the scaling race for CMOS?
Some believe 10 nm will be the end…
…thereafter, the semiconductor drive will be scattered (MEMS, sensors, magnetic, optic, polymer, bio, …)
Depending on application domain: besides and beyond silicon
Page 24: Besides and beyond silicon (e.g. polymer electronics)
Page 25: Besides and beyond silicon: applied to future ambient intelligent environments
Page 26: Besides and beyond silicon: applied to future ambient intelligent environments
© Emile Aarts, HomeLab, Philips
Page 27: Besides and beyond silicon: applied to ambient intelligent HomeLab (2002)
Page 28: The future: high performance (µP)
[Chart: on-chip global clock frequency (MHz); values 750, 1250, 2100, 3500, 6000, 10000, 16903]
• CTO Intel says in 2001
Page 29: The future: high performance (µP)
Page 30: Exponential growth for 3 decades!
This is called ‘Moore’s law’: the number of transistors doubles every 18 months
(Gordon Moore, co-founder of Intel Corp.)
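As a rough illustration of what a fixed doubling period implies, the sketch below computes the growth over three decades at 18 months per doubling. Going the other way, it computes the doubling period implied by two transistor counts quoted elsewhere in these slides (3500 for the 8008 in 1972, 55 million for the Pentium IV in 2002). The function names are illustrative, and the two endpoints are only one possible pair of data points, so the result is indicative at best.

```python
import math


def growth_factor(years: float, doubling_months: float) -> float:
    """Multiplicative growth after `years` with one doubling per `doubling_months`."""
    return 2.0 ** (years * 12.0 / doubling_months)


def implied_doubling_months(count_start: float, count_end: float, years: float) -> float:
    """Doubling period (months) that turns count_start into count_end in `years`."""
    doublings = math.log2(count_end / count_start)
    return years * 12.0 / doublings


# Three decades at 18 months per doubling means 20 doublings.
print(f"growth over 30 years: x{growth_factor(30, 18):,.0f}")

# Endpoints taken from the slides: 8008 (1972) and Pentium IV (2002).
months = implied_doubling_months(3_500, 55_000_000, 2002 - 1972)
print(f"implied doubling period: {months:.1f} months")
```

For this particular pair of data points the implied period comes out somewhat longer than 18 months; other endpoints would give other values.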
Page 31: Processor performance
• Smaller line size
More transistors => parallelism
1983: 1 instruction per 4 clock cycles
2002: 8 instructions per clock cycle
Smaller capacitors => faster
1983: 4 MHz
2002: 2800 MHz
Speed-up: 25000
• Enables new applications
UMTS with large rolled-up OLED screen enabling web-downloadable services (e.g. virtual meetings)
• Do we find applications that are demanding enough for next decade’s
processors?
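The speed-up figure on this slide can be reproduced from the four numbers it quotes. This is a minimal sketch, assuming throughput is simply clock frequency times instructions per cycle; the function name is illustrative.

```python
def mips(clock_mhz: float, instr_per_cycle: float) -> float:
    """Peak throughput in million instructions per second."""
    return clock_mhz * instr_per_cycle


mips_1983 = mips(4, 1 / 4)    # 1 instruction per 4 clock cycles at 4 MHz
mips_2002 = mips(2800, 8)     # 8 instructions per clock cycle at 2800 MHz
speedup = mips_2002 / mips_1983

print(f"1983: {mips_1983:g} MIPS, 2002: {mips_2002:g} MIPS, speed-up: {speedup:g}")
```

The product works out to 22400, so the slide's 25000 appears to be a rounded order-of-magnitude figure. The clock contributes a factor of 700 and the parallelism a factor of 32.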
Page 32: The future: DRAM
Page 33:
[Chart labels: Processor, Gap]
Page 34: Memory density
• Skills: center of gravity
USA: processors (Intel, Motorola, TI, …)
Japan: memory (NEC, Toshiba, …)
Future: IC = processor + memory
Where???
• Memory density grows faster than needs
1983: 512 Kbyte @ 64 Kbit/chip = 64 chips/PC
2001: 256 Mbyte @ 512 Mbit/chip = 4 chips/PC
Compensated if you sell at least 16 times more PCs…
… or if you find new applications (UMTS, car,…)
2010: 4 Gbyte @ 64 Gbit/chip = 0.5 chip/PC
No need for such a large memory chip…
… unless you find new applications (3D video…)
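The chips-per-PC figures above are a bytes-to-bits conversion followed by a division. The sketch below reproduces them, assuming binary prefixes (1 Kbyte = 1024 bytes) and a hypothetical helper name.

```python
K, M, G = 2 ** 10, 2 ** 20, 2 ** 30


def chips_per_pc(memory_bytes: int, chip_bits: int) -> float:
    """Number of memory chips needed for a PC's main memory."""
    return memory_bytes * 8 / chip_bits


chips_1983 = chips_per_pc(512 * K, 64 * K)    # 512 Kbyte from 64 Kbit chips
chips_2001 = chips_per_pc(256 * M, 512 * M)   # 256 Mbyte from 512 Mbit chips
chips_2010 = chips_per_pc(4 * G, 64 * G)      # 4 Gbyte from 64 Gbit chips

print(chips_1983, chips_2001, chips_2010)
# Ratio of chips per PC between 1983 and 2001: the factor by which PC sales
# must grow to keep chip volume constant.
print(chips_1983 / chips_2001)
```

The ratio between the 1983 and 2001 chip counts is where the "16 times more PCs" on the slide comes from.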
Page 35: Power consumption
[Chart: power density (W/cm²); axis ticks 1, 10, 100, 1K]
Processor architecture design driven by memory bottleneck & power problem!
Nevertheless, a ‘cooling tower’ is necessary!
Page 36: Power consumption
Cooling “tower”
Page 37: Power consumption
• Let us do a calculation:
How long could a GSM using a Pentium 3 (hardly powerful
enough…) last on a single battery charge?
Capacity of a battery:
600 mAh @ 4V = 2400 mWh
Power consumption Pentium 3: 45 W
One charge lasts for … 3 minutes!!!
• Let us turn the computation upside down:
We want a GSM to last for 240 hours on a single charge. How much power may be consumed by the processor?
Capacity of a battery:
600 mAh @ 4V = 2400 mWh
Power consumption processor: 10 mW
Possible via specialization to the application:
dedicated hardware…
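Both battery calculations on this slide can be checked in a few lines. This is a minimal sketch that, like the slide, treats battery energy as capacity times nominal voltage and ignores conversion losses and all consumers other than the processor; the function names are illustrative.

```python
def battery_energy_mwh(capacity_mah: float, voltage_v: float) -> float:
    """Stored energy in mWh, taken as capacity (mAh) times nominal voltage (V)."""
    return capacity_mah * voltage_v


def runtime_minutes(energy_mwh: float, power_mw: float) -> float:
    """How long the energy lasts at a constant power draw."""
    return energy_mwh / power_mw * 60.0


def power_budget_mw(energy_mwh: float, hours: float) -> float:
    """Constant power draw that empties the battery in exactly `hours`."""
    return energy_mwh / hours


energy = battery_energy_mwh(600, 4)                        # 2400 mWh
print(f"Pentium 3 at 45 W: {runtime_minutes(energy, 45_000):.1f} minutes")
print(f"Budget for 240 hours: {power_budget_mw(energy, 240):.0f} mW")
```

The gap between 45 W and 10 mW is a factor of 4500, which is the motivation for the dedicated hardware mentioned above.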
Page 38: Summary on technological trends
• Technologically speaking, we can have the same exponential evolution for
another decade
• This gives us at least 4 decades of exponential evolution, never seen in history
• End-user price stayed the same or even decreased
For 30 years, the price of a brand-new processor has been about 1000 USD
• So much for the good news…
Page 39:
[Chart: design complexity vs. design productivity; label “10%/year”]
Page 40: Design issues
• We can build exponentially complex circuits, but we cannot design them
Design of Pentium 4: 8 years, during the last 2 years with a team of 1000 people
Who can afford this???
Page 42: Introduction to microcontrollers
• Microcontroller = CPU + memory + peripheral interface blocks + functional blocks
Page 43:
[Block diagram: 32-bit integer CPU, programmable chip selects, interrupt logic, 2 timers, serial asynchronous interface, buffered I/O; 24-bit address bus, 16-bit data bus]
Page 44:
[Block diagram: 32-bit integer CPU, programmable chip selects, interrupt logic, timer PU, serial asynchronous interface, buffered I/O, parallel I/O, 2 Kbyte RAM; 24-bit address bus, 16-bit data bus]
Page 45:
[Block diagram: 32-bit integer CPU, programmable chip selects, interrupt logic, 2 timers, serial asynchronous interface, I/O, parallel I/O, 2-channel DMA controller; 32-bit address bus, 16-bit data bus]
Page 46: Motorola MC68F333
• 68020 processor
• 4 Kbyte SRAM, 64 Kbyte on chip flash EEPROM
• 8 channel 10-bit ADC, 16 channel 16-bit timer
• several interfaces
• Introduced in 1994
Page 47: Motorola MC68HC16
• 16-bit microprocessor
• Introduced in 1994
www.freescale.com
Page 49:
• CISC 8-bit processor
• Max clock 12 MHz, 1 instruction cycle = 1 µs
• 16-bit addressable memory: SROM, SRAM
• Internal RAM: 128 bytes
• Special Function Registers (SFR)
• Two 16-bit timers/counters
• 5 interrupt sources
• 4×8 GPIO
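The "12 MHz gives a 1 µs instruction cycle" figure is consistent with a core that spends 12 oscillator periods per machine cycle, as the classic 8051 core does. The sketch below makes that explicit. The divider of 12 and the assumption that a 16-bit timer counts one step per machine cycle are not stated on the slide, and the function names are illustrative.

```python
# Assumption (not on the slide): classic 8051 timing, 12 oscillator
# periods per machine cycle.
OSC_PERIODS_PER_MACHINE_CYCLE = 12


def machine_cycle_us(f_osc_mhz: float) -> float:
    """Duration of one machine cycle in microseconds."""
    return OSC_PERIODS_PER_MACHINE_CYCLE / f_osc_mhz


def max_timer_interval_ms(f_osc_mhz: float, timer_bits: int = 16) -> float:
    """Longest interval a timer can measure before overflow, assuming it
    counts one step per machine cycle."""
    return (2 ** timer_bits) * machine_cycle_us(f_osc_mhz) / 1000.0


print(f"machine cycle at 12 MHz: {machine_cycle_us(12):g} us")
print(f"16-bit timer overflow at 12 MHz: {max_timer_interval_ms(12):.3f} ms")
```

Longer delays than one timer overflow are usually built by counting overflows in software.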
Page 50:
8K Flash ROM for programming, 3 levels of memory lock
256 bytes internal RAM
• 89S8252 = 89S52 + 2K data flash ROM
• MCS-51 family: popular, developed by many manufacturers
Page 51:
Analog functions: 10-bit, 6-channel A/D converter and 2 PWM units
MP3 decoder (MPEG 1 & 2 Layer 3), 64 Kbytes Flash
MultiMediaCard™, DataFlash®, SmartMedia™, CompactFlash™
IDE interfaces, UART, SPI and Two-wire Interface (TWI), USB 1.1
Audio stereo DAC and 500 mW power amplifier
Page 52: Philips MCS-51
16K ROM, 512B RAM
UART, CAN bus controller
8×10-bit A/D, PWM outputs
8K ROM, 768B RAM, 512B EEPROM
PWM, watchdog, 8-bit ADC
UART, I2C, API
• eXtended Architecture, XAC37
16-bit processor, 32 MHz
32KB ROM, 1KB RAM
UART, SPI, CAN 2.0
• eXtended Architecture, XAH4
4 UARTs, DRAM controller
Page 54:
32 general purpose registers
• Low power consumption = Long battery life time
1.8 - 5.5 volts operation
Variety of operation modes, fast wake-up from low-power modes
Software controlled operation frequency
• High code density = Ideal for High-level Languages
Architecture designed for C, C-like addressing modes
16- and 32-bit arithmetic support
Linear address maps
• Outstanding memory technology
Self-programming Flash
EEPROM for parameter storage
SRAM
Page 55: Atmel AVR
• Automotive AVR: ATmega168 Automotive
16KB Flash Program Memory, 512B SRAM, 256B EEPROM
8 Channel 10-bit A/D-converter
DebugWIRE On-chip Debug System
16 MIPS throughput at 16 MHz
• CAN AVR AT90CAN128
CAN Controller V2.0A and V2.0B standard compliant
Perfectly suited for Industrial and Automotive applications
• megaAVR ATmega128
128KB ROM, 4KB SRAM, 4KB EEPROM
8 Channel 10-bit A/D-converter
JTAG interface for on-chip-debug
Up to 16 MIPS throughput at 16 MHz
2.7 - 5.5 Volt operation
• USB AVR AT90USB1286
Page 57:
• Mixed-Signal Controllers
8bit MCU core + Digital blocks + Analog blocks
Low voltage, low power: 1.5V-5.5V
• MCU core
CISC, clock 3-48 MHz, on-chip oscillator
4-20KB Flash (program + storage), 128B-2KB SRAM
Debugger core
• Digital blocks
8-32 bit counters/timers, PWM
Communication: I2C interface, UART
Logic: buffer, NOT gates, multiplexer
• Analog blocks
DAC 6-12 bit, ADC 6-12 bit
Amplifiers: power amp., differential amp.
Analog filters
DTMF generator, Analog multiplexer
Page 59: Trends for microcontrollers
• Standard CPU core surrounded by peripherals taken from a vast library
• A single architecture line forms a whole family
different memory & on-chip peripherals
for embedded applications
Deterministic behavior
no caches, no virtual memory, but on-chip RAM
no out-of-order execution
delayed branch prediction
Page 60: Trends for microcontrollers
• Word length as small as possible
Page 62: Texas Instruments TMS320C20x: low-end consumer, fixed point
• Series continued; typical applications: digital camera, feature phones, disk drives, point-of-sale terminal
• 40 MHz, 3.3-5 V, 3LM
• Available as core
[Block diagram: fixed-point MAC (16x16+32->32), loop controller, PROM, dual-access data RAM, I/O, selection of peripherals (serial comm., timers); two buses, one with 18-bit address and 16-bit data, one with 16-bit address and 16-bit data]
Page 63: Texas Instruments TMS320C24x: low-end consumer, fixed point
• Series continued; typical application: electrical motor control
• 50 MHz, 5 V
[Block diagram: fixed-point MAC (16x16+32->32), loop controller, PROM, dual-access data RAM, 8-output PWM, 8-channel A/D, CAN bus controller, watchdog, selection of peripherals (serial comm., timers); 16-bit address bus, 16-bit data bus]
Page 64:
[Block diagram: 32-bit floating-point adder, 32-bit floating-point multiplier, two ACUs, PRAM, XRAM, YRAM, I/O, peripherals (serial comm., timers, DMA); two buses, each with 24-bit address and 32-bit data]
Page 65: Texas Instruments TMS320C4x: floating point, message passing
• Series discontinued; typical applications: prototyping, radar
• 60 MHz, 5 V, 325 pin
• Superscalar; message passing multiprocessor
[Block diagram: 32-bit floating-point adder, 32-bit floating-point multiplier, two ACUs, loop controller, PRAM, 4 KByte XRAM, 4 KByte YRAM, serial link, timers, 12-channel DMA controller, 8-bit links at 20 MB/s; two buses, each with 32-bit address and 32-bit data]
Page 66: Texas Instruments TMS320C54xx: high-end consumer, fixed point
• Series continued; typical applications: GSM, set-top box, audio
• 1.8-5 V, max 160 MHz, 144 pin, 0.15 µm (1999), 0.32 mW/MIPS for the core
• Specialized on-chip unit: will occur more often in future
• e.g. C5420: dual core + 2x100 MW on-chip SRAM
e.g. C5402: $5 for 100 MIPS
[Block diagram: fixed-point ALU (32+32->40), fixed-point adder (32+32->40), fixed-point multiplier (17x17->34), Viterbi unit, two ACUs, loop controller, PROM, dual-access XRAM, YRAM, I/O, buffered serial links, timers, 6-channel DMA controller; two buses, one with 17-bit address and 32-bit data, one with 16-bit address and 16-bit data]
Page 67: Texas Instruments TMS320C5510: high-end consumer, fixed point
per unit and per cycle
[Block diagram: fixed-point ALU (32+32->40), fixed-point adder (32+32->40), two fixed-point multipliers (17x17->34), Viterbi unit, two ACUs, power management, PROM (32 KByte), P-cache (24 KByte), dual-access XRAM (256 Kbyte), YRAM (64 Kbyte), I/O, buffered serial links, timers, 6-channel DMA controller; two buses, one with 24-bit address and 32-bit data, one with 16-bit address and 16-bit data]
Page 68: Texas Instruments TMS320C8x: fixed point, video
• Series discontinued; typical applications: video phone, video conferencing, multimedia workstations
• Introduced: 1995, 50 MHz, 305 pin
• Multiprocessor-on-a-chip; sub-word SIMD for each DSP
[Block diagram: DSP processors 1 to 4, general-purpose RISC processor, transfer controller; external data/address bus, width shown as 32]
Page 69: Texas Instruments TMS320C6201: high end, fixed point
• Series continued; typical applications: modems, multimedia
• 1997, 0.25 µm, 5ML, 352 pin, 200 MHz, 2.5 V, 1.9 W, $85
• Superscalar (8 instr./cycle), 1600 MIPS
• VLIW: 256-bit instruction word
[Block diagram: two fixed-point multipliers (16x16->32), two fixed-point ALUs (32+32->40), two fixed-point ALU/branch units (32+32->40), two integer ACUs (32+32), four 16 KByte D-SRAM banks, 64 KByte P-SRAM/cache, JTAG, clock pump, 4-channel DMA, 2 serial ports, 2 timers, host interface (17-bit address, 16-bit data), external memory interface (23-bit address, 32-bit data) to external memory]
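The headline figures on the TMS320C6201 slide are tied together by simple arithmetic. This is a minimal sketch, assuming peak throughput is issue width times clock frequency and that the VLIW word is split evenly across the eight issue slots; the function names are illustrative.

```python
def peak_mips(instr_per_cycle: int, clock_mhz: float) -> float:
    """Peak throughput when every issue slot is filled on every cycle."""
    return instr_per_cycle * clock_mhz


def bits_per_instruction(word_bits: int, instr_per_word: int) -> int:
    """Width of one instruction inside a VLIW word with equal-size slots."""
    return word_bits // instr_per_word


def mw_per_mips(power_w: float, mips: float) -> float:
    """Power cost of throughput, in milliwatts per MIPS."""
    return power_w * 1000.0 / mips


peak = peak_mips(8, 200)                  # 8 instr./cycle at 200 MHz
print(f"peak throughput: {peak:g} MIPS")
print(f"bits per instruction: {bits_per_instruction(256, 8)}")
print(f"chip power per peak MIPS: {mw_per_mips(1.9, peak):.2f} mW/MIPS")
```

The 1600 MIPS figure is a peak value that is reached only if the compiler fills all eight slots on every cycle.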