
Design and Implementation of VLSI Systems, Lecture 05: Circuit Characterization and Performance Estimation


Trang 1

Design and Implementation

of VLSI Systems

Lecture 05

Thuan Nguyen Faculty of Electronics and Telecommunications,

University of Science, VNU HCMUS

Spring 2011


Trang 2

LECTURE 05: CIRCUIT CHARACTERIZATION & PERFORMANCE ESTIMATION

Trang 4

Critical paths are those which require attention to timing details

A timing analyzer is a design tool that automatically finds the slowest path in a logic design

Critical paths can be affected at several levels:

• The architecture/microarchitecture level

• The logic level

• The circuit level

Trang 5

DELAY DEFINITIONS

tpdr: rising propagation delay

• Max time: from input to rising output crossing VDD/2

tpdf: falling propagation delay

• Max time: from input to falling output crossing VDD/2

tpd: average propagation delay, tpd = (tpdr + tpdf)/2

tcdr: rising contamination (best-case) delay

• Min time: from input to rising output crossing VDD/2

tcdf: falling contamination (best-case) delay

• Min time: from input to falling output crossing VDD/2

tcd: average contamination delay, tcd = (tcdr + tcdf)/2

Trang 6

HOW TO CALCULATE DELAY? JUST RUN SPICE!

[Figure: SPICE transient waveforms of Vin and Vout, voltage (V) from 0.0 to 2.0 versus t (s) from 0 to 1 ns; measured tpdf = 66 ps, tpdr = 83 ps]

• Time consuming

• Not very useful for designers in evaluating different options and optimizing different parameters

• We need a simple way to estimate delay for "what if" scenarios

Trang 7

TRANSISTOR RESISTANCE

In the linear region

•Not accurate, but at least shows that the resistance is

Trang 8

SWITCH-LEVEL RC MODELS

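• A unit nMOS transistor has effective resistance R

• A unit pMOS transistor has greater effective resistance than an nMOS transistor of the same size due to the lower pMOS mobility

• A pMOS transistor of double-unit width has effective resistance R

• A transistor of k unit width has kC capacitance and R/k resistance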

Trang 9

CALCULATE K

Trang 10

EXAMPLE: 3-INPUT NAND GATE

Sketch a 3-input NAND with transistor widths chosen to achieve effective rise and fall resistances equal to a unit inverter (R)

[Figure: 3-input NAND schematic; each parallel pMOS has width 2 and each series nMOS has width 3]

• The capacitance consists of gate capacitance and source/drain diffusion capacitance: C = Cgate + Csource diffusion + Cdrain diffusion

• To keep estimation simple, assume Cgate = Cdiffusion

Trang 11

EXAMPLE: 3-INPUT NAND GATE

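[Figure: 3-input NAND (pMOS widths 2, nMOS widths 3) annotated with capacitances: 5C of gate capacitance on each input, 9C of diffusion capacitance on the output, and 3C on each internal node of the nMOS stack]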
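The widths and capacitances in this example can be cross-checked with a short sketch. It assumes, as in the switch-level model, that a unit nMOS has resistance R, that a pMOS needs double width for the same R, and that each unit of width contributes C of gate capacitance and C of diffusion capacitance per contacted terminal. The function name `nand_sizing` is made up for illustration.

```python
def nand_sizing(n):
    """Size an n-input NAND so rise and fall resistance both equal R.

    Assumptions: unit nMOS = R, unit pMOS = 2R, and each unit of width
    adds C of gate capacitance and C of contacted diffusion capacitance.
    Capacitances are returned in units of C.
    """
    nmos_width = n   # n nMOS in series: n * (R / n) = R
    pmos_width = 2   # worst case a single pMOS pulls up: 2R / 2 = R
    input_cap = nmos_width + pmos_width       # one nMOS gate + one pMOS gate
    output_cap = n * pmos_width + nmos_width  # n pMOS drains + top nMOS drain
    return {
        "nmos_width": nmos_width,
        "pmos_width": pmos_width,
        "input_cap": input_cap,
        "output_cap": output_cap,
    }


print(nand_sizing(3))
```

For n = 3 this gives 5C per input and 9C on the output, matching the annotated figure.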

Trang 12

ELMORE DELAY MODEL

 ON transistors look like resistors

 Elmore delay of RC ladder
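As a minimal sketch of the Elmore model for an RC ladder: the delay is the sum, over every node, of that node's capacitance times the total resistance between the source and the node. The function name and the list-based interface are illustrative choices, not part of the lecture.

```python
def elmore_delay(resistances, capacitances):
    """Elmore delay of an RC ladder.

    resistances[i] is the resistor feeding node i; capacitances[i] is the
    capacitance at node i. The delay is sum_i (R_1 + ... + R_i) * C_i.
    """
    if len(resistances) != len(capacitances):
        raise ValueError("need one capacitance per resistor")
    delay = 0.0
    upstream_r = 0.0
    for r, c in zip(resistances, capacitances):
        upstream_r += r          # total resistance from the source to this node
        delay += upstream_r * c  # this node's contribution
    return delay


# Two-section ladder with R = C = 1: 1*1 + (1+1)*1 = 3
print(elmore_delay([1.0, 1.0], [1.0, 1.0]))
```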

Trang 13

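COMPUTING THE RISE AND FALL DELAYS

• Estimate rising and falling propagation delays of a 2-input NAND driving h identical gates

[Figure: 2-input NAND with all transistor widths 2, driving h copies; 6C of diffusion capacitance on the output and 2C on the internal node]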

Trang 14

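[Figure: equivalent circuit for the rising transition; resistance R charges output Y, which carries (6+4h)C: 6C of diffusion plus 4hC of load. Inputs A and B and internal node x are labeled]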
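The equivalent circuit can be turned into numbers with the Elmore model. The sketch below assumes all four transistors have width 2 (so each nMOS contributes R/2 and a single conducting pMOS contributes R), 2C on the internal node x and (6+4h)C on the output Y, and it ignores the internal node on the rising transition. The function name is hypothetical.

```python
def nand2_delays(h, R=1.0, C=1.0):
    """Rising and falling propagation delay of a NAND2 driving h copies.

    Rising: one pMOS (resistance R) charges the output load.
    Falling: two series nMOS (R/2 each) discharge node x and the output.
    """
    load = (6 + 4 * h) * C
    t_pdr = R * load                                    # (6 + 4h) RC
    t_pdf = (R / 2) * (2 * C) + (R / 2 + R / 2) * load  # (7 + 4h) RC
    return t_pdr, t_pdf


print(nand2_delays(1))
```

With h = 1 the estimates are 10 RC rising and 11 RC falling; the extra RC on the falling edge comes from discharging the internal node.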

Trang 15

DIFFUSION CAPACITANCE

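• We assumed contacted diffusion on every source/drain

• Good layout minimizes diffusion area

• Ex: NAND3 layout shares one diffusion contact

[Figure: NAND3 (pMOS widths 2, nMOS widths 3) with a shared diffusion contact; output capacitance 7C, internal nodes 3C each. Layout labels: Isolated Contacted Diffusion, Merged]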

Trang 17


Trang 18

• Chip designers face a bewildering array of choices

  ◦ What is the best circuit topology for a function?

  ◦ How many stages of logic give least delay?

  ◦ How wide should the transistors be?

• Logical effort is a method to make these decisions

  ◦ Uses a simple model of delay

  ◦ Allows back-of-the-envelope calculations

  ◦ Helps make rapid comparisons between alternatives

Trang 19

Example: the Motoroil 68W86, an embedded automotive processor

Help Ben design the decoder for a register file

• Decoder specifications:

  ◦ 16 word register file

  ◦ Each bit presents load of 3 unit-sized transistors

Trang 20

DELAY COMPONENTS

Parasitic delay (due to the gate's own diffusion capacitance)

Trang 21

DELAY IN A LOGIC GATE

• Express delays in a process-independent unit: $d = d_{abs} / \tau$, with $\tau = 3RC$

  ◦ τ ≈ 3 ps in 65 nm process

  ◦ 60 ps in 0.6 μm process

• Delay has two components: d = f + p

• f: effort delay = gh (a.k.a. stage effort)

• g: logical effort

  ◦ Measures relative ability of gate to deliver current

  ◦ g ≡ 1 for inverter

• h: electrical effort = Cout / Cin

  ◦ Ratio of output to input capacitance

  ◦ Sometimes called fanout

• p: parasitic delay

  ◦ Represents delay of gate driving no load

  ◦ Set by internal parasitic capacitance
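A small sketch of the linear delay model d = gh + p, converted to absolute time with the τ values quoted above. The NAND2 parameters used here (g = 4/3, p = 2) are the usual textbook values and are an assumption, since this slide does not list them; the helper names are illustrative.

```python
TAU_PS = {"65 nm": 3.0, "0.6 um": 60.0}  # tau = 3RC, from the slide


def gate_delay(g, h, p):
    """Normalized gate delay d = g*h + p (in units of tau)."""
    return g * h + p


def absolute_delay_ps(d, process):
    """Convert a normalized delay to picoseconds for a given process."""
    return d * TAU_PS[process]


# Assumed NAND2 values: g = 4/3, p = 2; it drives 3x its own input capacitance.
d = gate_delay(g=4 / 3, h=3, p=2)
print(d, absolute_delay_ps(d, "65 nm"))
```

Under these assumptions the NAND2 delay is about 6τ, roughly 18 ps in the 65 nm process.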

Trang 22


Trang 23

COMPUTING LOGICAL EFFORT

• DEF: Logical effort is the ratio of the input capacitance of a gate to the input capacitance of an inverter delivering the same output current

• Measure from delay vs fanout plots

• Or estimate by counting transistor widths

[Figure: inverter, NAND2 and NOR2 schematics with inputs A, B and output Y, annotated with transistor widths 1, 2 and 4]
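Counting transistor widths can be sketched as below. The widths assumed for each gate (inverter 2/1, NAND2 2/2, NOR2 4/1 for pMOS/nMOS on one input) are the standard sizings that give unit-inverter drive; they are consistent with the widths in the figure but are stated here as an assumption.

```python
from fractions import Fraction

INVERTER_CIN = 3  # unit inverter: pMOS width 2 + nMOS width 1


def logical_effort(pmos_width, nmos_width):
    """Input capacitance of one gate input relative to a unit inverter."""
    return Fraction(pmos_width + nmos_width, INVERTER_CIN)


# (pMOS width, nMOS width) seen by a single input of each gate
gates = {"inverter": (2, 1), "NAND2": (2, 2), "NOR2": (4, 1)}

for name, (wp, wn) in gates.items():
    print(name, logical_effort(wp, wn))
```

This yields g = 1 for the inverter, 4/3 for NAND2 and 5/3 for NOR2.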

Trang 26

EXAMPLE: RING OSCILLATOR

• Estimate the frequency of an N-stage ring oscillator
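One way to work this estimate: each inverter in the ring drives one identical inverter, so g = 1 and h = 1, and assuming a parasitic delay p = 1 the stage delay is d = 2. A transition must travel around the ring twice per period, so the period is 2·N·d·τ. The stage count of 31 below is a made-up example, and τ = 3 ps is the 65 nm value quoted earlier in the lecture.

```python
def ring_oscillator_frequency(n_stages, tau_s, p_inv=1.0):
    """Estimate ring oscillator frequency in Hz.

    Each stage is an inverter driving an identical inverter:
    d = g*h + p = 1*1 + p_inv. The period is 2 * N * d * tau.
    """
    stage_delay = 1.0 * 1.0 + p_inv
    period = 2 * n_stages * stage_delay * tau_s
    return 1.0 / period


# Hypothetical 31-stage ring in a 65 nm process (tau = 3 ps)
f_hz = ring_oscillator_frequency(31, 3e-12)
print(f_hz / 1e9, "GHz")
```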

Trang 27

EXAMPLE: FO4 INVERTER

 Estimate the delay of a fanout-of-4 (FO4) inverter
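The FO4 estimate follows directly from d = gh + p with g = 1 and h = 4. The sketch assumes an inverter parasitic delay of 1 and reuses the τ values quoted in the lecture (3 ps at 65 nm, 60 ps at 0.6 μm).

```python
def fo4_delay(p_inv=1.0):
    """Normalized delay of an inverter driving four identical inverters."""
    g, h = 1.0, 4.0
    return g * h + p_inv


d_fo4 = fo4_delay()
print(d_fo4, "tau")                      # normalized delay
print(d_fo4 * 3.0, "ps in 65 nm")        # tau = 3 ps
print(d_fo4 * 60.0, "ps in 0.6 um")      # tau = 60 ps
```

So one FO4 delay is about 5τ under these assumptions.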

Trang 28

LIMITATIONS OF LINEAR DELAY MODEL

Trang 29

LIMITATIONS OF LINEAR DELAY MODEL

 Input Arrival Times

Trang 30

LIMITATIONS OF LINEAR DELAY MODEL

 Gate-Source Capacitance

Trang 31

LIMITATIONS OF LINEAR DELAY MODEL

 Bootstrapping

Trang 32

MULTISTAGE LOGIC NETWORKS

 Logical effort generalizes to multistage networks

Path Logical Effort: $G = \prod g_i$

Path Electrical Effort: $H = \frac{C_{out\text{-}path}}{C_{in\text{-}path}}$

Path Effort: $F = \prod f_i = \prod g_i h_i$

Trang 33

MULTISTAGE LOGIC NETWORKS

 Logical effort generalizes to multistage networks

Path Logical Effort

Path Electrical Effort

Trang 34

PATHS THAT BRANCH

• Can we write F = GH? No! Consider paths that branch:

Trang 35

BRANCHING EFFORT

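• Introduce branching effort

  ◦ Accounts for branching between stages in path

$b = \frac{C_{on\text{-}path} + C_{off\text{-}path}}{C_{on\text{-}path}}$, $B = \prod b_i$

• With branching, the path effort is $F = GBH$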

Trang 36

MULTISTAGE DELAYS

• Path Effort Delay: $D_F = \sum f_i$

• Path Parasitic Delay: $P = \sum p_i$

• Path Delay: $D = \sum d_i = D_F + P$

Trang 37

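DESIGNING FAST CIRCUITS

$D = \sum d_i = D_F + P$

• Delay is smallest when each stage bears same effort: $\hat{f} = g_i h_i = F^{1/N}$

• Thus the minimum delay of an N-stage path is $D = N F^{1/N} + P$

• This is a key result of logical effort

  ◦ Find fastest possible delay

  ◦ Doesn't require calculating gate sizes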

Trang 38

GATE SIZES

• How wide should the gates be for least delay?

$\hat{f} = gh = g \frac{C_{out}}{C_{in}} \Rightarrow C_{in_i} = \frac{g_i C_{out_i}}{\hat{f}}$

• Working backward, apply capacitance transformation to find input capacitance of each gate given load it drives

• Check work by verifying input cap spec is met

Trang 39

EXAMPLE: 3-STAGE PATH

• Select gate sizes x and y for least delay from A to B

[Figure: 3-stage path from A to B with input capacitance 8 and gate sizes x and y]

Trang 40

EXAMPLE: 3-STAGE PATH

[Figure: the same path with gate sizes x and y and output loads of 45]

Trang 41

EXAMPLE: 3-STAGE PATH

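y = 45 * (5/3) / 5 = 15

x = (15*2) * (5/3) / 5 = 10

[Figure: sized path from A to B with input capacitance 8 and transistor widths annotated; recovered labels: P: 4, N: 4, N: 3]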
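The two sizing lines apply the capacitance transformation Cin = g·Cout / f̂, working backward from the load. The sketch below repeats that arithmetic exactly. The stage effort f̂ = 5, the logical effort 5/3 and the branching factor 2 are read off the slide's own numbers; the gate types (NAND2, NAND3, NOR2), the first-stage logical effort 4/3 and the first-stage branching factor 3 are assumptions taken from the usual textbook version of this example.

```python
from fractions import Fraction

F_HAT = 5  # best stage effort implied by the slide's arithmetic


def input_cap(g, c_out, f_hat=F_HAT):
    """Capacitance transformation: C_in = g * C_out / f_hat."""
    return Fraction(g) * c_out / f_hat


g_nor2 = Fraction(5, 3)   # last stage (assumed NOR2)
g_nand3 = Fraction(5, 3)  # middle stage (assumed NAND3)
g_nand2 = Fraction(4, 3)  # first stage (assumed NAND2)

y = input_cap(g_nor2, 45)         # last stage drives the load of 45
x = input_cap(g_nand3, 2 * y)     # middle stage drives two gates of size y
c_in = input_cap(g_nand2, 3 * x)  # assumed: first stage drives three gates of size x

print(y, x, c_in)
```

Under these assumptions the first-stage input capacitance comes out to 8, which is the "check work" step: it matches the input capacitance shown at A.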

Trang 42

BEST NUMBER OF STAGES

• Minimizing number of stages is not always fastest

• Example: drive 64-bit datapath with unit inverter

N    f      D
2    8      18
3    4      15
4    2.8    15.3
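The table can be reproduced from D = N·F^(1/N) + P. The sketch assumes the path is a chain of N inverters (g = 1, p = 1 each) driving a load 64 times the input capacitance, so F = 64 and P = N; the function name is illustrative.

```python
def stage_table(path_effort, p_inv=1.0, max_stages=4):
    """Return (N, stage effort f, delay D) for inverter chains of 1..max_stages."""
    rows = []
    for n in range(1, max_stages + 1):
        f = path_effort ** (1.0 / n)  # equal effort per stage
        d = n * f + n * p_inv         # effort delay + parasitic delay
        rows.append((n, f, d))
    return rows


for n, f, d in stage_table(64):
    print(n, round(f, 1), round(d, 1))
```

A single stage would have f = 64 and D = 65, which is why adding stages helps up to N = 3 here.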

Trang 43

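• Consider adding inverters to end of path

  ◦ How many give least delay?

[Figure: logic block of n1 stages with path effort F, followed by N - n1 extra inverters]

$D = N F^{1/N} + \sum_{i=1}^{n_1} p_i + (N - n_1) p_{inv}$

$\frac{\partial D}{\partial N} = -F^{1/N} \ln F^{1/N} + F^{1/N} + p_{inv} = 0$

• Define best stage effort $\rho = F^{1/N}$:

$p_{inv} + \rho (1 - \ln \rho) = 0$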

Trang 44

BEST STAGE EFFORT

• The best stage effort ρ satisfies $p_{inv} + \rho (1 - \ln \rho) = 0$, which has no closed-form solution

• Neglecting parasitics ($p_{inv} = 0$), we find ρ = 2.718 (e)

• For $p_{inv} = 1$, solve numerically for ρ = 3.59
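The numerical solution can be sketched with plain bisection on the best-stage-effort condition p_inv + ρ(1 − ln ρ) = 0. The bracket [1, 20] is an arbitrary choice that works because the left-hand side is positive at ρ = 1 and decreases for ρ > 1; the function name is hypothetical.

```python
import math


def best_stage_effort(p_inv, tol=1e-9):
    """Solve p_inv + rho * (1 - ln(rho)) = 0 for rho by bisection."""

    def residual(rho):
        return p_inv + rho * (1.0 - math.log(rho))

    lo, hi = 1.0, 20.0  # residual(1) > 0 and residual(20) < 0 for small p_inv
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0:
            lo = mid  # root lies to the right
        else:
            hi = mid  # root lies to the left
    return 0.5 * (lo + hi)


print(best_stage_effort(0.0))  # close to e
print(best_stage_effort(1.0))  # close to 3.59
```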

Trang 45

[Figure: delay sensitivity plot, D(N)/D(N̂) versus N/N̂ from 0.5 to 2.0; marked delay ratios 1.15, 1.26 and 1.51, with ρ = 2.4 and ρ = 6 annotated]

Trang 46

EXAMPLE, REVISITED

The 68W86, an embedded automotive processor

Help Ben design the decoder for a register file

• Decoder specifications:

  ◦ 16 word register file

  ◦ Each bit presents load of 3 unit-sized transistors

How fast can decoder operate?

Trang 47

 Try a 3-stage design

Trang 48

GATE SIZES & DELAY

Trang 50

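Term                Stage                                               Path
number of stages    1                                                   N
logical effort      g                                                   $G = \prod g_i$
electrical effort   $h = C_{out}/C_{in}$                                $H = C_{out\text{-}path}/C_{in\text{-}path}$
branching effort    $b = (C_{on\text{-}path} + C_{off\text{-}path})/C_{on\text{-}path}$   $B = \prod b_i$
effort              $f = gh$                                            $F = GBH$
effort delay        f                                                   $D_F = \sum f_i$
parasitic delay     p                                                   $P = \sum p_i$
delay               $d = f + p$                                         $D = \sum d_i = D_F + P$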

Trang 51

METHOD OF LOGICAL EFFORT

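1) Compute path effort: $F = GBH$

2) Estimate best number of stages: $N = \log_4 F$

3) Sketch path with N stages

4) Estimate least delay: $D = N F^{1/N} + P$

5) Determine best stage effort: $\hat{f} = F^{1/N}$

6) Find gate sizes: $C_{in_i} = \frac{g_i C_{out_i}}{\hat{f}}$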

Trang 52

LIMITS OF LOGICAL EFFORT

 Need path to compute G

 But don’t know number of stages without G

 Simplistic delay model

 Neglects input rise time effects

 Interconnect

 Iteration required in designs with wire

 Not minimum area/power for constrained delay

Trang 53

 Logical effort is useful for thinking of delay in

circuits

 Numeric logical effort characterizes gates

 NANDs are faster than NORs in CMOS

 Paths are fastest when effort delays are ~4

 Path delay is weakly sensitive to stages, sizes

 But using fewer stages doesn’t mean faster paths

• Delay of path is about $\log_4 F$ FO4 inverter delays

 Inverters and NAND2 best for driving large caps

 Provides language for discussing fast circuits

 But requires practice to master

Trang 54

 Homework Assignment #4 View

 Submit your answer in the next week

Trang 56

POWER AND ENERGY

 Power is drawn from a voltage source attached to

the VDD pin(s) of a chip

Trang 57

POWER IN CIRCUIT ELEMENTS

2 0

Trang 58

CHARGING A CAPACITOR

• Energy stored in capacitor is $E_C = \frac{1}{2} C_L V_{DD}^2$

• Half the energy drawn from VDD is dissipated in the pMOS transistor as heat, other half stored in capacitor

Trang 59

SWITCHING WAVEFORMS

 Example: VDD = 1.0 V, CL = 150 fF, f = 1 GHz

Trang 60

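$P_{switching} = \frac{1}{T} \int_0^T i_{DD}(t) V_{DD} \, dt = \frac{V_{DD}}{T} \left[ T f_{sw} C V_{DD} \right] = C V_{DD}^2 f_{sw}$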
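Plugging the example values (VDD = 1.0 V, CL = 150 fF, f = 1 GHz) into the switching power expression gives a quick sanity check. The sketch includes an activity factor parameter, set to 1 here on the assumption that the node switches at the full frequency.

```python
def switching_power(c_load, vdd, freq, alpha=1.0):
    """Switching power in watts: P = alpha * C * VDD^2 * f."""
    return alpha * c_load * vdd ** 2 * freq


p_watts = switching_power(c_load=150e-15, vdd=1.0, freq=1e9)
print(p_watts * 1e6, "uW")
```

Under these assumptions the load dissipates about 150 μW.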

Trang 61

ACTIVITY FACTOR

• Suppose the system clock frequency = f

• Let fsw = αf, where α = activity factor

  ◦ If the signal is a clock, α = 1

  ◦ If the signal switches once per cycle, α = ½

• Dynamic power: $P_{switching} = \alpha C V_{DD}^2 f$

Trang 62

SHORT CIRCUIT CURRENT

• When transistors switch, both nMOS and pMOS networks may be momentarily ON at once

• Leads to a blip of "short circuit" current

• < 10% of dynamic power if rise/fall times are comparable for input and output

• We will generally ignore this component

Trang 63

POWER DISSIPATION SOURCES

 Ptotal = Pdynamic + Pstatic

 Dynamic power: Pdynamic = Pswitching + Pshortcircuit

 Switching load capacitances

Trang 64

DYNAMIC POWER BREAKUP

Total dynamic power:

• Interconnect 51%

• Gate 34%

• Diffusion 15%

Trang 65

DYNAMIC POWER EXAMPLE

 1 billion transistor chip

Neglect wire capacitance and short-circuit

current

Trang 66

6 mem

2 dynamic logic mem

Trang 67

DYNAMIC POWER REDUCTION

Trang 68

ACTIVITY FACTOR ESTIMATION

• Let $P_i$ = Prob(node i = 1)

  ◦ $\bar{P}_i = 1 - P_i$

• $\alpha_i = \bar{P}_i \cdot P_i$

• Completely random data has P = 0.5 and α = 0.25

• Data is often not completely random

  ◦ e.g. upper bits of 64-bit words representing bank account balances are usually 0

• Data propagating through ANDs and ORs has lower activity factor

  ◦ Depends on design, but typically α ≈ 0.1

Trang 69

SWITCHING PROBABILITY

Trang 70

 A 4-input AND is built out of two levels of gates

 Estimate the activity factor at each node if the

inputs have P = 0.5
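A sketch of this estimate, using α = P(1 − P) at each node. It assumes the inputs are independent and that the AND4 is built as two NAND2 gates feeding a NOR2 (one common two-level realization; the slide does not spell out the gate types). Exact fractions are used so the results are easy to compare by hand.

```python
from fractions import Fraction


def activity(p_one):
    """Activity factor of a node that is 1 with probability p_one."""
    return p_one * (1 - p_one)


def nand2_prob(pa, pb):
    """Probability that a NAND2 output is 1, for independent inputs."""
    return 1 - pa * pb


def nor2_prob(pa, pb):
    """Probability that a NOR2 output is 1, for independent inputs."""
    return (1 - pa) * (1 - pb)


p_in = Fraction(1, 2)
p_n1 = nand2_prob(p_in, p_in)  # first-level NAND2 on inputs A, B
p_n2 = nand2_prob(p_in, p_in)  # first-level NAND2 on inputs C, D
p_y = nor2_prob(p_n1, p_n2)    # second-level NOR2 gives A*B*C*D

for name, p in [("input", p_in), ("n1", p_n1), ("n2", p_n2), ("Y", p_y)]:
    print(name, p, activity(p))
```

Under these assumptions the internal nodes have P = 3/4 and α = 3/16, and the output has P = 1/16 and α = 15/256, well below the 1/4 of the random inputs.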

Trang 71

CLOCK GATING

 The best way to reduce the activity is to turn off

the clock to registers in unused blocks

 Saves clock activity ( a = 1)

 Eliminates all switching activity in the block

 Requires determining if block will be used

Trang 72

 Gate capacitance

 Fewer stages of logic

 Small gate sizes

 Wire capacitance

 Good floorplanning to keep communicating blocks

close to each other

 Drive long wires with inverters or buffers rather than

complex gates

Trang 73

VOLTAGE / FREQUENCY

 Run each block at the lowest possible voltage and

frequency that meets performance requirements

 Provide separate supplies to different blocks

 Level converters required when crossing

from low to high VDD domains

 Dynamic Voltage Scaling

 Adjust VDD and f according to

workload

Trang 74

STATIC POWER

 Static power is consumed even when chip is

quiescent

 Leakage draws power from nominally OFF devices

 Ratioed circuits burn power in fight between ON

transistors

Trang 76

STATIC POWER EXAMPLE

• Revisit power estimation for 1 billion transistor chip

 High Vt used in all memories and in 95% of logic gates

 Junction leakage negligible

Trang 79

STACK EFFECT

 Series OFF transistors have less leakage

 Vx > 0, so N2 has negative Vgs

 Leakage through 2-stack reduces ~10x

 Leakage through 3-stack reduces further

V V

Trang 80

LEAKAGE CONTROL

 Leakage and delay trade off

 Aim for low leakage in sleep and low delay in active

mode

 To reduce leakage:

• Increase Vt: multiple Vt

 Use low Vt only in critical circuits

 Increase Vs: stack effect

Input vector control in sleep

 Decrease Vb

Reverse body bias in sleep

 Or forward body bias in active mode

Trang 81

GATE LEAKAGE

 Extremely strong function of tox and Vgs

 Negligible for older processes

 Approaches subthreshold leakage at 65 nm and below

in some processes

 Control leakage in the process using tox > 10.5 Å

 High-k gate dielectrics help

 Some processes provide multiple tox

 e.g thicker oxide for 3.3 V I/O transistors

 Control leakage in circuits by limiting VDD

Trang 82

NAND3 LEAKAGE EXAMPLE

 100 nm process

Ign = 6.3 nA Igp = 0

Ioffn = 5.63 nA Ioffp = 9.3 nA

Trang 83

JUNCTION LEAKAGE

 From reverse-biased p-n junctions

 Between diffusion and substrate or well

 Ordinary diode leakage is negligible

significant

 Especially in high-Vt transistors where other leakage

is small

 Worst at Vdb = VDD

 Worst for Vgd = -VDD (or more negative)

Trang 84

POWER GATING

 Turn OFF power to blocks when they are idle to

save leakage

 Use virtual VDD (VDDV)

 Gate outputs to prevent

invalid logic levels to next block

 Voltage drop across sleep transistor degrades

performance during normal operation

 Size the transistor wide enough to minimize impact

 Switching wide sleep transistor costs dynamic

power

 Only justified when circuit sleeps long enough


Trang 85


Trang 86

TRANSISTORS + WIRES = CIRCUITS

Trang 87

HOW INTERCONNECTS CONTRIBUTE TO DELAY

 Interconnects have resistance, capacitance (and

inductance)

Interconnects increase circuit delay:

 The wire capacitance adds loading to each gate

• Long wires have significant resistance that further contributes to the delay

Interconnects increase dynamic power:

 Because of the wire capacitance

Trang 88

WIRE GEOMETRY

 Old processes had AR << 1

• Modern processes have AR ≈ 2

Trang 89

4.1 WIRE RESISTANCE

$R = \frac{\rho}{t} \frac{l}{w} = R_\square \frac{l}{w}$

• ρ = resistivity (Ω·m)

• R□ = sheet resistance (Ω/□)

  ◦ □ is a dimensionless unit(!)
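Wire resistance can be estimated by counting squares: the sheet resistance is ρ/t, and a wire of length l and width w contains l/w squares. The numbers in the example (0.05 Ω/□, a 1000 μm by 0.5 μm wire) are made up for illustration, not taken from the lecture.

```python
def sheet_resistance(resistivity_ohm_m, thickness_m):
    """Sheet resistance in ohms per square: R_sq = rho / t."""
    return resistivity_ohm_m / thickness_m


def wire_resistance(r_sheet, length, width):
    """Wire resistance: R = R_sq * (l / w); length and width share a unit."""
    return r_sheet * length / width


# Hypothetical wire: 0.05 ohm/sq, 1000 um long, 0.5 um wide -> 2000 squares
r_wire = wire_resistance(0.05, 1000.0, 0.5)
print(r_wire, "ohm")
```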

Trang 90

HOW DOES THE KIND OF METAL IMPACT

RESISTIVITY?

 Until 180 nm generation, most wires were

aluminum

 Modern processes often use copper

Trang 91

CONTACT AND VIA RESISTANCE

 Many small contacts for current crowding

around periphery

Trang 92

4.2 WIRE CAPACITANCE

 To neighbors

 To layers above and below

 Ctotal = Ctop + Cbot + 2Cadj

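[Figure: wire cross-section showing thickness t and the dielectric heights h1 and h2 to the adjacent layers]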

Trang 93

FACTORS IMPACTING THE CAPACITANCE

 Wires are not parallel plates, but obey trends

 Increasing area (W, t) increases capacitance

 Increasing distance (s, h) decreases capacitance

Trang 94

M2 CAPACITANCE DATA (180NM)

• Typical wires have ~0.2 fF/μm

  ◦ Compare to 2 fF/μm for gate capacitance

• Polysilicon has lower C but high R

[Figure: plot of M2 capacitance data]

Trang 95

GIVEN R AND C, HOW TO CALCULATE DELAY?

• Wires are a distributed system

• 3-segment π-model is accurate to 3% in simulation

[Figure: a wire of N segments and its lumped approximations: L-model (R, C), π-model (C/2, R, C/2) and T-model (R/2, C, R/2)]

Trang 96

INTERCONNECT DELAY: THE LUMPED CASE

0V

Trang 97

INTERCONNECT DELAY: IDEAL ANALYSIS

Ideally, the wire is modeled using the diffusion equation, which gives tpd ≈ 0.38RC

Trang 98

INTERCONNECT DELAY: DISTRIBUTED

r = resistance per unit length

c = capacitance per unit length
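To see how the distributed wire differs from a single lumped RC, the wire can be split into N equal segments and the Elmore delay summed over the resulting ladder. This is a sketch under the Elmore approximation: with one segment the estimate is rcL², and as N grows it approaches rcL²/2. The function name and the unit values are illustrative.

```python
def ladder_elmore_delay(r, c, length, n_segments):
    """Elmore delay of a wire modeled as n_segments RC sections.

    r and c are resistance and capacitance per unit length. Each section has
    resistance r*L/N in series and capacitance c*L/N at its far end.
    """
    seg_r = r * length / n_segments
    seg_c = c * length / n_segments
    # Node i sees i segment resistances upstream of its capacitance.
    return sum((i * seg_r) * seg_c for i in range(1, n_segments + 1))


for n in (1, 2, 10, 1000):
    print(n, ladder_elmore_delay(1.0, 1.0, 1.0, n))
```

The closed form for this ladder is rcL²(N + 1)/(2N), so the single-lump model overestimates the distributed delay by about a factor of two.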

Trang 99

DELAY CALCULATIONS

Assuming ideal wires:

Realistic wire modeling:

Trang 100

LAYER STACK

• AMI 0.6 μm process has 3 metal layers

Trang 101

4.3 INTERCONNECTS INTRODUCE CROSSTALK

• A capacitor does not like to change its voltage instantaneously

• A wire has high capacitance to its neighbor

  ◦ When the neighbor switches, the wire tends to switch too

  ◦ Called capacitive coupling or crosstalk

• Crosstalk has two harmful effects:

Trang 102

A. CROSSTALK IMPACTS DELAY

• Assume layers above and below on average are quiet

• Effective Cadj depends on behavior of neighbors

  ◦ Miller effect

[Figure: two adjacent wires coupled by Cadj, each with capacitance Cgnd to ground]
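The dependence on the neighbor's behavior is commonly captured with a Miller coupling factor (MCF) that scales Cadj: 0 when the neighbor switches in the same direction, 1 when it is quiet, and 2 when it switches in the opposite direction. This model is a standard one and is stated here as an assumption, since the slide's own table did not survive; the capacitance values in the example are made up.

```python
MILLER_COUPLING_FACTOR = {
    "same": 0,      # neighbor switches with the wire: no voltage change across Cadj
    "quiet": 1,     # neighbor holds its value: Cadj looks like a grounded capacitor
    "opposite": 2,  # neighbor switches the other way: voltage swing across Cadj doubles
}


def effective_capacitance(c_gnd, c_adj, neighbor):
    """Effective capacitance seen by a wire with a single adjacent neighbor."""
    return c_gnd + MILLER_COUPLING_FACTOR[neighbor] * c_adj


# Hypothetical wire with Cgnd = Cadj = 0.1 (any consistent capacitance unit)
for behavior in ("same", "quiet", "opposite"):
    print(behavior, effective_capacitance(0.1, 0.1, behavior))
```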

Trang 103

B. CROSSTALK ALSO CREATES NOISE

 Crosstalk causes noise on nonswitching wires

Trang 104

CROSSTALK NOISE EFFECTS

• Usually victim is driven by a gate that fights noise

• Victim driver is in linear region, aggressor in saturation

• If sizes are same, Raggressor = 2-4 × Rvictim

Trang 105

SIMULATING NOISE INDUCED BY COUPLING

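[Figure: simulated coupling noise, voltage from 0 to 1.8 versus t (ps) from 0 to 2000, showing the aggressor and the victim for different victim driver strengths]

• Victim (undriven): 50%

• Victim (half size driver): 16%

• Victim (equal size driver): 8%

• Victim (double size driver): 4%

• Static CMOS logic will eventually settle to the correct output even if disturbed by large noise spikes

• But glitches cause extra delay and power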

Trang 107

WIDTH, SPACING, LAYER, SHIELDING

• Widening a wire reduces resistance but increases capacitance (less than proportionally) → RC delay product improves

• Spacing reduces capacitance → improves RC delay

• Layers

• Coupling can be avoided if adjacent lines do not switch → shield critical nets with power or ground wires on one or both sides to eliminate coupling

Trang 108

C. REPEATER INSERTION

• R and C are proportional to l

• RC delay is proportional to l²

• Break long wires into N shorter segments

  ◦ Drive each one with a repeater or buffer

[Figure: long wire broken into segments, each driven by a buffer/repeater]

Two questions:

A. What is the position that minimizes the delay?

B. How many repeaters to insert to minimize the delay?
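Question B can be explored with a deliberately simplified model: each of the N segments has distributed wire delay 0.5·r·c·(l/N)² and each repeater adds a fixed delay t_buf. The model ignores the repeater's output resistance and input capacitance, so it is only a sketch of the trade-off, and all numbers below are made up. Setting the derivative of N·t_buf + 0.5·r·c·l²/N to zero gives N = l·sqrt(rc / (2·t_buf)).

```python
import math


def repeated_wire_delay(r, c, length, n_segments, t_buf):
    """Total delay of a wire split into n_segments equal repeated sections."""
    seg = length / n_segments
    return n_segments * (0.5 * r * c * seg ** 2 + t_buf)


def best_segment_count(r, c, length, t_buf):
    """Integer segment count that minimizes the simplified delay model."""
    n_real = length * math.sqrt(r * c / (2.0 * t_buf))
    candidates = {max(1, math.floor(n_real)), max(1, math.ceil(n_real))}
    return min(candidates,
               key=lambda n: repeated_wire_delay(r, c, length, n, t_buf))


# Hypothetical normalized values: r = c = 1 per unit length, l = 10, t_buf = 2
n_best = best_segment_count(1.0, 1.0, 10.0, 2.0)
print(n_best, repeated_wire_delay(1.0, 1.0, 10.0, n_best, 2.0))
print(1, repeated_wire_delay(1.0, 1.0, 10.0, 1, 2.0))
```

In this model equal-length segments are assumed, and the total delay grows linearly with l once repeaters are inserted instead of quadratically.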

Date posted: 29/07/2014, 16:21
