Design and Implementation
of VLSI Systems
Lecture 05
Thuan Nguyen Faculty of Electronics and Telecommunications,
University of Science, VNU HCMUS
Spring 2011
LECTURE 05: CIRCUIT CHARACTERIZATION &
Critical paths are those which require attention
to timing details
Timing analyzer is a design tool that
automatically finds the slowest path in a logic
The architecture/ microarchitecture level
The logic level
The circuit level
DELAY DEFINITIONS
tpdr: rising propagation delay
Max time: From input to rising output crossing VDD/2
tpdf: falling propagation delay
Max time: From input to falling output crossing VDD/2
tpd: average propagation delay tpd = (tpdr + tpdf)/2
tcdr: rising contamination (best-case) delay
Min time: From input to rising output crossing VDD/2
tcdf: falling contamination (best-case) delay
Min time: From input to falling output crossing VDD/2
tcd: average contamination delay tcd = (tcdr + tcdf)/2
HOW TO CALCULATE DELAY? JUST RUN SPICE!
[Figure: SPICE transient simulation of Vin and Vout (V) versus t (s), 0 to 1 ns; tpdf = 66 ps, tpdr = 83 ps]
• Time consuming
• Not very useful for designers in evaluating different options and optimizing different parameters
• We need a simple way to estimate delay for “what if” scenarios
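As a small illustration of the delay definitions, the sketch below averages the rising and falling delays quoted from the SPICE run (tpdr = 83 ps, tpdf = 66 ps). The function name is a hypothetical helper, not part of the lecture.

```python
def average_delay(t_rise: float, t_fall: float) -> float:
    """Average propagation delay: tpd = (tpdr + tpdf) / 2."""
    return (t_rise + t_fall) / 2


# Values read from the SPICE result on the slide, in seconds.
tpdr = 83e-12
tpdf = 66e-12

tpd = average_delay(tpdr, tpdf)
print(f"tpd = {tpd * 1e12:.1f} ps")
```

The same helper applies to the contamination delays tcdr and tcdf.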
TRANSISTOR RESISTANCE
In the linear region
•Not accurate, but at least shows that the resistance is
SWITCH-LEVEL RC MODELS
A unit nMOS transistor has effective resistance R
A pMOS transistor has higher resistance than an nMOS transistor of the same size due to the lower pMOS mobility; a pMOS transistor of double-unit width has effective resistance R
A transistor of k unit width has kC capacitance and R/k resistance
CALCULATE K
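EXAMPLE: 3-INPUT NAND GATE
Choose transistor widths to achieve effective rise and fall resistances equal to a unit inverter (R)
[Figure: 3-input NAND schematic; the three pMOS transistors have width 2 and the three nMOS transistors have width 3]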
The capacitance consists of gate capacitance and source/drain diffusion capacitance:
C = Cgate + Csource diffusion + Cdrain diffusion
To keep estimation simple, assume Cgate = Cdiffusion
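EXAMPLE: 3-INPUT NAND GATE
[Figure: NAND3 annotated with capacitances: 5C of gate capacitance on each input, 9C of diffusion capacitance on the output, and 3C on each diffusion of the internal series nodes]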
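The sizing and capacitance bookkeeping for the NAND example can be sketched as follows. This assumes the conventions used in the example (unit inverter with pMOS width 2 and nMOS width 1, Cgate = Cdiffusion = C per unit width, contacted diffusion on every source/drain); `nand_sizing` is a hypothetical helper name.

```python
def nand_sizing(k: int) -> dict:
    """Widths and capacitances (in units of C) for a k-input NAND.

    k series nMOS of width k give k * (R / k) = R pull-down resistance.
    The pMOS are in parallel; in the worst case only one is ON, so
    width 2 gives pull-up resistance R.
    """
    nmos_width = k
    pmos_width = 2
    return {
        "nmos_width": nmos_width,
        "pmos_width": pmos_width,
        # Each input drives one nMOS gate and one pMOS gate.
        "input_cap": nmos_width + pmos_width,
        # Output node: k pMOS drains plus the drain of the top nMOS.
        "output_diff_cap": k * pmos_width + nmos_width,
    }


nand3 = nand_sizing(3)
print(nand3)  # input_cap is 5 and output_diff_cap is 9 for k = 3
```

For k = 3 this reproduces the 5C per input and 9C on the output shown in the example.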
ELMORE DELAY MODEL
ON transistors look like resistors
Elmore delay of RC ladder
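A minimal sketch of the Elmore delay of an RC ladder, assuming the usual form tpd ≈ R1·C1 + (R1 + R2)·C2 + ... + (R1 + ... + RN)·CN: each node capacitance is multiplied by the total resistance between that node and the source. The function name and argument layout are illustrative choices.

```python
def elmore_delay(resistances, capacitances):
    """Elmore delay of an RC ladder.

    resistances[i] is the resistor feeding node i from the previous
    node (or from the source for i = 0); capacitances[i] is the
    capacitance to ground at node i.
    """
    delay = 0.0
    upstream_resistance = 0.0
    for r, c in zip(resistances, capacitances):
        upstream_resistance += r          # total R from source to this node
        delay += upstream_resistance * c  # this node's contribution
    return delay


# Two equal sections: R*C + (R + R)*C = 3 RC
print(elmore_delay([1.0, 1.0], [1.0, 1.0]))
```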
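COMPUTING THE RISE AND FALL DELAYS
Estimate rising and falling propagation delays of
a 2-input NAND driving h identical gates
[Figure: 2-input NAND (inputs A, B; all transistor widths 2) driving h copies; the output Y carries 6C of diffusion capacitance plus 4hC of load, and the internal node x carries 2C]
[Figure: equivalent circuit for the rising transition: resistance R charging (6+4h)C at Y]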
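The rising and falling estimates for this NAND2 can be worked out with the Elmore model. The sketch below assumes the capacitances from the figure (6C diffusion plus 4hC load on Y, 2C on the internal node x) and that each series nMOS of width 2 contributes R/2; delays are returned in units of RC.

```python
def nand2_delays(h: float):
    """Return (tpdr, tpdf) in units of RC for a NAND2 driving h copies."""
    c_out = 6 + 4 * h  # diffusion plus load capacitance on Y, in units of C
    c_x = 2            # internal node between the series nMOS, in units of C

    # Rising: one pMOS (resistance R) charges the output node.
    t_rise = 1.0 * c_out

    # Falling: two series nMOS (R/2 each) discharge x and Y.
    # Elmore: (R/2) * Cx + (R/2 + R/2) * Cout
    t_fall = 0.5 * c_x + (0.5 + 0.5) * c_out
    return t_rise, t_fall


for h in (1, 4):
    print(h, nand2_delays(h))
```

Under these assumptions tpdr = (6 + 4h)RC and tpdf = (7 + 4h)RC.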
DIFFUSION CAPACITANCE
[Figure: NAND3 schematic and layout annotated with diffusion capacitances (7C at the output; 3C and 2C elsewhere); diffusion regions labeled isolated, contacted, and merged]
We assumed contacted diffusion on every source/drain
Good layout minimizes diffusion area
Ex: NAND3 layout shares one diffusion contact
Chip designers face a bewildering array of choices
What is the best circuit topology for a function?
How many stages of logic give least delay?
How wide should the transistors be?
Logical effort is a method to make these decisions
Uses a simple model of delay
Allows back-of-the-envelope calculations
alternatives
Motoroil 68W86, an embedded automotive processor
Help Ben design the decoder for a register file
Decoder specifications:
16 word register file
Each bit presents load of 3 unit-sized transistors
DELAY COMPONENTS
Parasitic delay (due to the gate's own diffusion capacitance)
DELAY IN A LOGIC GATE
Delay has two components: d = f + p
f : effort delay = gh (a.k.a stage effort)
g : logical effort
Measures relative ability of gate to deliver
current
g = 1 for inverter
h : electrical effort = Cout / Cin
Ratio of output to input capacitance
Sometimes called fanout
p: parasitic delay
Represents delay of gate driving no load
Set by internal parasitic capacitance
Normalized delay: d = dabs / τ
τ = 3RC
≈ 3 ps in 65 nm process
≈ 60 ps in 0.6 µm process
[Figure: delay versus fanout plot; horizontal axis from 0 to 6]
COMPUTING LOGICAL EFFORT
DEF: Logical effort is the ratio of the input
capacitance of a gate to the input capacitance of
an inverter delivering the same output current
Measure from delay vs fanout plots
Or estimate by counting transistor widths
[Figure: inverter, NAND2, and NOR2 schematics (inputs A, B; output Y) annotated with transistor widths 1, 2, and 4]
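Counting transistor widths can be sketched as below. The sketch assumes the standard sizing for equal drive (unit inverter with pMOS 2 / nMOS 1, so input capacitance 3; NAND-n with series nMOS of width n and parallel pMOS of width 2; NOR-n with parallel nMOS of width 1 and series pMOS of width 2n). Function names are illustrative.

```python
from fractions import Fraction

INVERTER_INPUT_CAP = 3  # pMOS width 2 + nMOS width 1


def logical_effort_nand(n: int) -> Fraction:
    """Input capacitance (n + 2) relative to the unit inverter."""
    return Fraction(n + 2, INVERTER_INPUT_CAP)


def logical_effort_nor(n: int) -> Fraction:
    """Input capacitance (2n + 1) relative to the unit inverter."""
    return Fraction(2 * n + 1, INVERTER_INPUT_CAP)


print("NAND2:", logical_effort_nand(2))  # 4/3
print("NOR2: ", logical_effort_nor(2))   # 5/3
print("NAND3:", logical_effort_nand(3))  # 5/3
```

A 1-input NAND or NOR degenerates to the inverter, giving g = 1.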
EXAMPLE: RING OSCILLATOR
Estimate the frequency of an N-stage ring oscillator
EXAMPLE: FO4 INVERTER
Estimate the delay of a fanout-of-4 (FO4) inverter
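Both estimates follow from the linear delay model d = gh + p. The sketch below assumes an inverter with g = 1 and parasitic delay p = 1, and uses τ = 3 ps (the 65 nm value) for the absolute numbers; the 31-stage ring is an arbitrary illustrative choice.

```python
def stage_delay(g: float, h: float, p: float) -> float:
    """Normalized stage delay d = g*h + p (in units of tau)."""
    return g * h + p


def fo4_delay(tau: float) -> float:
    """Absolute delay of an inverter driving four copies of itself."""
    return stage_delay(g=1, h=4, p=1) * tau


def ring_oscillator_frequency(n_stages: int, tau: float) -> float:
    """Each inverter drives one identical inverter (h = 1).

    A transition must travel around the ring twice per period,
    so the period is 2 * N * d * tau.
    """
    d = stage_delay(g=1, h=1, p=1)
    period = 2 * n_stages * d * tau
    return 1.0 / period


TAU = 3e-12  # seconds, 65 nm process
print(f"FO4 delay: {fo4_delay(TAU) * 1e12:.0f} ps")
print(f"31-stage ring: {ring_oscillator_frequency(31, TAU) / 1e9:.2f} GHz")
```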
LIMITATIONS OF LINEAR DELAY MODEL
Input Arrival Times
Gate-Source Capacitance
Bootstrapping
MULTISTAGE LOGIC NETWORKS
Logical effort generalizes to multistage networks
Path Logical Effort: G = Π gi
Path Electrical Effort: H = Cout-path / Cin-path
Path Effort: F = Π fi = Π gi hi
Can we write F = GH?
PATHS THAT BRANCH
No! Consider paths that branch:
BRANCHING EFFORT
Introduce branching effort
Accounts for branching between stages in path
MULTISTAGE DELAYS
Path Effort Delay
Path Parasitic Delay
DESIGNING FAST CIRCUITS
Delay is smallest when each stage bears same effort
This is a key result of logical effort
Find fastest possible delay
Doesn’t require calculating gate sizes
GATE SIZES
How wide should the gates be for least delay?
f̂ = gh = g Cout / Cin, so Cin_i = gi Cout_i / f̂
Apply this capacitance transformation to find input capacitance of each
gate given load it drives
Check work by verifying input cap spec is met
EXAMPLE: 3-STAGE PATH
Select gate sizes x and y for least delay from A to B
[Figure: three-stage path from A (input capacitance 8) through gates of size x and y to B, with loads of 45]
y = 45 * (5/3) / 5 = 15
x = (15*2) * (5/3) / 5 = 10
[Figure: the sized path with transistor widths labeled (P: 4, N: 4, N: 3)]
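The backward sizing pass in this example can be sketched as follows. The second and third stages (g = 5/3, best stage effort 5, the factor of 2 in `15*2`, load 45, input capacitance 8) come from the worked numbers; the first-stage logical effort 4/3 and its branching factor 3 are assumptions chosen because they reproduce the input capacitance of 8.

```python
from fractions import Fraction

# (logical effort, number of identical loads driven), input to output.
STAGES = [
    (Fraction(4, 3), 3),  # ASSUMED first stage: drives 3 gates of size x
    (Fraction(5, 3), 2),  # size x: drives 2 gates of size y (the 15*2 term)
    (Fraction(5, 3), 1),  # size y: drives the load of 45
]
C_IN, C_LOAD = 8, 45


def path_effort(stages, c_in, c_load):
    """F = G * B * H for the path."""
    G, B = Fraction(1), 1
    for g, branch in stages:
        G *= g
        B *= branch
    return G * B * Fraction(c_load, c_in)


def input_caps(stages, c_load, stage_effort):
    """Work backward from the load: Cin = g * Cout / f_hat."""
    caps, c_out = [], Fraction(c_load)
    for g, branch in reversed(stages):
        c_in = g * (c_out * branch) / stage_effort
        caps.append(c_in)
        c_out = c_in
    return list(reversed(caps))


F = path_effort(STAGES, C_IN, C_LOAD)
caps = input_caps(STAGES, C_LOAD, stage_effort=5)
print("F =", F)               # 125, so the best stage effort is 125**(1/3) = 5
print("Cin per stage:", caps)  # first entry should match the input cap spec of 8
```

The first entry of `caps` doubles as the check that the input capacitance specification is met.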
BEST NUMBER OF STAGES
Minimizing number of stages is not always fastest
Example: drive 64-bit datapath with unit inverter
N    f     D
2    8     18
3    4     15
4    2.8   15.3
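The table entries can be reproduced with D = N·F^(1/N) + N·pinv. The sketch assumes a chain of inverters (g = 1, pinv = 1) with path effort F = 64; the function name is illustrative.

```python
def chain_delay(n_stages: int, path_effort: float, p_inv: float = 1.0):
    """Return (stage effort f, total delay D) for an N-inverter chain."""
    f = path_effort ** (1.0 / n_stages)
    return f, n_stages * f + n_stages * p_inv


for n in range(1, 6):
    f, d = chain_delay(n, 64)
    print(f"N = {n}: f = {f:5.2f}, D = {d:5.2f}")
```

The delay is smallest at N = 3 and rises only slightly at N = 4, which is why minimizing the stage count is not always fastest.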
Consider adding inverters to end of path
How many give least delay?
Define best stage effort
[Figure: logic block with n1 stages and path effort F, followed by N - n1 extra inverters]
BEST STAGE EFFORT
pinv + ρ(1 - ln ρ) = 0 has no closed-form solution
Neglecting parasitics (pinv = 0), we find ρ = 2.718 (e)
For pinv = 1, solve numerically for ρ = 3.59
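The numerical solution can be sketched with a simple bisection. This assumes the best stage effort ρ satisfies pinv + ρ(1 − ln ρ) = 0, which is the standard form of the condition and is consistent with the two values quoted (e for pinv = 0 and 3.59 for pinv = 1). The bracket [e, 20] is an illustrative choice.

```python
import math


def best_stage_effort(p_inv: float, hi: float = 20.0) -> float:
    """Solve p_inv + rho * (1 - ln(rho)) = 0 for rho >= e by bisection."""

    def residual(rho: float) -> float:
        return p_inv + rho * (1.0 - math.log(rho))

    lo = math.e  # residual(e) = p_inv >= 0; residual decreases for rho > 1
    while hi - lo > 1e-12:
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(f"p_inv = 0: rho = {best_stage_effort(0.0):.3f}")
print(f"p_inv = 1: rho = {best_stage_effort(1.0):.2f}")
```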
[Figure: relative delay versus N / N̂; marked delay values 1.15, 1.26, and 1.51; marked points ρ = 2.4 and ρ = 6]
EXAMPLE, REVISITED
68W86, an embedded automotive processor Help Ben
design the decoder for a register file
Decoder specifications:
16 word register file
Each bit presents load of 3 unit-sized transistors
How fast can decoder operate?
Try a 3-stage design
GATE SIZES & DELAY
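Term                Stage                                   Path
number of stages    1                                       N
logical effort      g                                       G = Π gi
electrical effort   h = Cout / Cin                          H = Cout-path / Cin-path
branching effort    b = (Con-path + Coff-path) / Con-path   B = Π bi
effort              f = gh                                  F = GBH
effort delay        f                                       DF = Σ fi
parasitic delay     p                                       P = Σ pi
delay               d = f + p                               D = Σ di = DF + P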
METHOD OF LOGICAL EFFORT
1) Compute path effort
2) Estimate best number of stages
3) Sketch path with N stages
4) Estimate least delay
5) Determine best stage effort
6) Find gate sizes: Cin_i = gi Cout_i / f̂
LIMITS OF LOGICAL EFFORT
Need path to compute G
But don’t know number of stages without G
Simplistic delay model
Neglects input rise time effects
Interconnect
Iteration required in designs with wire
Not minimum area/power for constrained delay
Logical effort is useful for thinking of delay in
circuits
Numeric logical effort characterizes gates
NANDs are faster than NORs in CMOS
Paths are fastest when effort delays are ~4
Path delay is weakly sensitive to stages, sizes
But using fewer stages doesn’t mean faster paths
Delay of path is about log4F FO4 inverter delays
Inverters and NAND2 best for driving large caps
Provides language for discussing fast circuits
But requires practice to master
Homework Assignment #4
Submit your answer next week
POWER AND ENERGY
Power is drawn from a voltage source attached to
the VDD pin(s) of a chip
POWER IN CIRCUIT ELEMENTS
CHARGING A CAPACITOR
Energy stored in capacitor is EC = ½ C VDD^2
Half the energy drawn from the supply is dissipated in the transistor as heat, other half stored in capacitor
SWITCHING WAVEFORMS
Example: VDD = 1.0 V, CL = 150 fF, f = 1 GHz
Pswitching = C VDD^2 fsw
ACTIVITY FACTOR
Suppose the system clock frequency = f
Let fsw = af, where a = activity factor
If the signal is a clock, a = 1
If the signal switches once per cycle, a = ½
Dynamic power: Pswitching = a C VDD^2 f
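The switching power can be evaluated for the example values given earlier (VDD = 1.0 V, CL = 150 fF, f = 1 GHz). The sketch assumes the standard expression Pswitching = a·C·VDD²·f; the activity factors used in the loop are illustrative.

```python
def switching_power(activity: float, c_load: float, vdd: float, f_clk: float) -> float:
    """Pswitching = a * C * VDD^2 * f, in watts."""
    return activity * c_load * vdd ** 2 * f_clk


C_L = 150e-15  # farads
VDD = 1.0      # volts
F_CLK = 1e9    # hertz

for a in (1.0, 0.5, 0.1):
    p = switching_power(a, C_L, VDD, F_CLK)
    print(f"a = {a}: P = {p * 1e6:.0f} uW")
```

With a = 1 (a clock node) the load dissipates 150 µW; a typical data node with a = 0.1 dissipates a tenth of that.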
SHORT CIRCUIT CURRENT
When transistors switch, both nMOS and pMOS networks may be momentarily ON at once
Leads to a blip of “short circuit” current
< 10% of dynamic power if rise/fall times are
comparable for input and output
We will generally ignore this component
POWER DISSIPATION SOURCES
Ptotal = Pdynamic + Pstatic
Dynamic power: Pdynamic = Pswitching + Pshortcircuit
Switching load capacitances
DYNAMIC POWER BREAKUP
Total dynamic power:
Interconnect 51%
Gate 34%
Diffusion 15%
DYNAMIC POWER EXAMPLE
1 billion transistor chip
Neglect wire capacitance and short-circuit
current
DYNAMIC POWER REDUCTION
ACTIVITY FACTOR ESTIMATION
Let Pi = Prob(node i = 1)
P̄i = 1 - Pi
ai = P̄i * Pi
Completely random data has P = 0.5 and a = 0.25
Data is often not completely random
e.g upper bits of 64-bit words representing bank
account balances are usually 0
lower activity factor
Depends on design, but typically a ≈ 0.1
SWITCHING PROBABILITY
A 4-input AND is built out of two levels of gates
Estimate the activity factor at each node if the
inputs have P = 0.5
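One way to work this estimate is sketched below. It assumes the two levels are a pair of 2-input NANDs feeding a 2-input NOR (the text does not say which gates are used), that the inputs are independent, and that the activity factor of a node is P(1 − P). Node names n1, n2, and y are illustrative.

```python
def activity(p_one: float) -> float:
    """Activity factor of a node that is 1 with probability p_one."""
    return p_one * (1.0 - p_one)


def nand2_prob(pa: float, pb: float) -> float:
    """Output is 0 only when both inputs are 1."""
    return 1.0 - pa * pb


def nor2_prob(pa: float, pb: float) -> float:
    """Output is 1 only when both inputs are 0."""
    return (1.0 - pa) * (1.0 - pb)


P_IN = 0.5
p_n1 = nand2_prob(P_IN, P_IN)  # first-level NAND outputs
p_n2 = nand2_prob(P_IN, P_IN)
p_y = nor2_prob(p_n1, p_n2)    # NOR of the two NAND outputs = 4-input AND

print(f"inputs: P = {P_IN}, a = {activity(P_IN)}")
print(f"n1, n2: P = {p_n1}, a = {activity(p_n1)}")
print(f"y:      P = {p_y}, a = {activity(p_y)}")
```

The output is 1 only for one of 16 input combinations, so its activity factor is well below the 0.25 of the random inputs.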
CLOCK GATING
The best way to reduce the activity is to turn off
the clock to registers in unused blocks
Saves clock activity ( a = 1)
Eliminates all switching activity in the block
Requires determining if block will be used
Gate capacitance
Fewer stages of logic
Small gate sizes
Wire capacitance
Good floorplanning to keep communicating blocks
close to each other
Drive long wires with inverters or buffers rather than
complex gates
VOLTAGE / FREQUENCY
Run each block at the lowest possible voltage and
frequency that meets performance requirements
Provide separate supplies to different blocks
Level converters required when crossing
from low to high VDD domains
Dynamic Voltage Scaling
Adjust VDD and f according to
workload
STATIC POWER
Static power is consumed even when chip is
quiescent
Leakage draws power from nominally OFF devices
Ratioed circuits burn power in fight between ON
transistors
STATIC POWER EXAMPLE
Revisit power estimation for 1 billion transistor chip
High Vt used in all memories and in 95% of logic gates
Junction leakage negligible
STACK EFFECT
Series OFF transistors have less leakage
Vx > 0, so N2 has negative Vgs
Leakage through 2-stack reduces ~10x
Leakage through 3-stack reduces further
LEAKAGE CONTROL
Leakage and delay trade off
Aim for low leakage in sleep and low delay in active
mode
To reduce leakage:
Increase Vt: multiple Vt
Use low Vt only in critical circuits
Increase Vs: stack effect
Input vector control in sleep
Decrease Vb
Reverse body bias in sleep
Or forward body bias in active mode
GATE LEAKAGE
Extremely strong function of tox and Vgs
Negligible for older processes
Approaches subthreshold leakage at 65 nm and below
in some processes
Control leakage in the process using tox > 10.5 Å
High-k gate dielectrics help
Some processes provide multiple tox
e.g thicker oxide for 3.3 V I/O transistors
Control leakage in circuits by limiting VDD
NAND3 LEAKAGE EXAMPLE
100 nm process
Ign = 6.3 nA Igp = 0
Ioffn = 5.63 nA Ioffp = 9.3 nA
JUNCTION LEAKAGE
From reverse-biased p-n junctions
Between diffusion and substrate or well
Ordinary diode leakage is negligible
significant
Especially in high-Vt transistors where other leakage
is small
Worst at Vdb = VDD
Worst for Vgd = -VDD (or more negative)
POWER GATING
Turn OFF power to blocks when they are idle to
save leakage
Use virtual VDD (VDDV)
Gate outputs to prevent
invalid logic levels to next block
Voltage drop across sleep transistor degrades
performance during normal operation
Size the transistor wide enough to minimize impact
Switching wide sleep transistor costs dynamic
power
Only justified when circuit sleeps long enough
TRANSISTORS + WIRES = CIRCUITS
HOW INTERCONNECTS CONTRIBUTE TO DELAY
Interconnects have resistance, capacitance (and
inductance)
Interconnects increase circuit delay:
The wire capacitance adds loading to each gate
Long wires have significant resistance that further contribute to the delay
Interconnects increase dynamic power:
Because of the wire capacitance
WIRE GEOMETRY
Old processes had AR << 1
Modern processes have AR ≈ 2
4.1 WIRE RESISTANCE
ρ = resistivity (Ω*m)
R□ = sheet resistance (Ω/□)
□ is a dimensionless unit(!)
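Wire resistance from resistivity and sheet resistance can be sketched as below, assuming the usual relations R□ = ρ / t and R = R□ · (l / w), where l / w is the number of squares. The numeric values are illustrative only and are not taken from the lecture.

```python
def sheet_resistance(resistivity: float, thickness: float) -> float:
    """R_square = rho / t, in ohms per square."""
    return resistivity / thickness


def wire_resistance(resistivity: float, length: float,
                    width: float, thickness: float) -> float:
    """R = R_square * (length / width), in ohms."""
    squares = length / width
    return sheet_resistance(resistivity, thickness) * squares


# Illustrative values (assumptions): rho = 2e-8 ohm*m, t = 0.5 um,
# w = 0.25 um, l = 1 mm.
r = wire_resistance(2e-8, length=1e-3, width=0.25e-6, thickness=0.5e-6)
print(f"R_square = {sheet_resistance(2e-8, 0.5e-6):.3f} ohm/sq, R = {r:.0f} ohm")
```

Because the number of squares is a ratio of two lengths, it carries no unit, which is why the square is dimensionless.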
HOW DOES THE KIND OF METAL IMPACT
RESISTIVITY?
Until 180 nm generation, most wires were
aluminum
Modern processes often use copper
CONTACT AND VIA RESISTANCE
Many small contacts for current crowding
around periphery
4.2 WIRE CAPACITANCE
To neighbors
To layers above and below
Ctotal = Ctop + Cbot + 2Cadj
[Figure: wire cross-section showing thickness t and the spacings h1 and h2 to the layers above and below]
FACTORS IMPACTING THE CAPACITANCE
Wires are not parallel plates, but obey trends
Increasing area (W, t) increases capacitance
Increasing distance (s, h) decreases capacitance
M2 CAPACITANCE DATA (180NM)
Typical wires have ~ 0.2 fF/µm
(Compare to 2 fF/µm for gate capacitance)
Polysilicon has lower C but high R
[Figure: M2 capacitance data plot; axis from 0 to 400]
GIVEN R AND C, HOW TO CALCULATE
Wires are a distributed system
3-segment π-model is accurate to 3% in simulation
[Figure: a wire split into N segments, each modeled as an L-model (R, C), a π-model (R, C/2, C/2), or a T-model (R/2, R/2, C)]
INTERCONNECT DELAY: THE LUMPED CASE
INTERCONNECT DELAY: IDEAL ANALYSIS
tpd~0.38RC
Ideally, modeling using diffusion equation;
INTERCONNECT DELAY: DISTRIBUTED
r = resistance per unit length
c = capacitance per unit length
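The lumped and distributed estimates can be compared with a short sketch. It assumes total R = r·l and C = c·l, the distributed result tpd ≈ 0.38RC quoted for the ideal analysis, and the standard 50% point of a single lumped RC stage, ln(2)·RC ≈ 0.69RC. The per-length values are illustrative.

```python
import math


def lumped_delay(r: float, c: float, length: float) -> float:
    """Single lumped RC stage: tpd = ln(2) * (r*l) * (c*l)."""
    return math.log(2) * (r * length) * (c * length)


def distributed_delay(r: float, c: float, length: float) -> float:
    """Distributed RC line: tpd ~ 0.38 * (r*l) * (c*l)."""
    return 0.38 * (r * length) * (c * length)


# Illustrative values (assumptions): r = 0.1 ohm/um, c = 0.2 fF/um, in SI units.
r_per_m = 1e5    # ohm / m
c_per_m = 2e-10  # F / m

for length in (1e-3, 2e-3):
    print(f"l = {length * 1e3:.0f} mm: "
          f"lumped = {lumped_delay(r_per_m, c_per_m, length) * 1e12:.1f} ps, "
          f"distributed = {distributed_delay(r_per_m, c_per_m, length) * 1e12:.1f} ps")
```

Doubling the length quadruples both estimates, since R and C each scale with l.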
DELAY CALCULATIONS
Assuming ideal wires:
Realistic wire modeling:
LAYER STACK
AMI 0.6 µm process has 3 metal layers
4.3 INTERCONNECTS INTRODUCE CROSSTALK
A capacitor does not like to change its voltage instantaneously
A wire has high capacitance to its neighbor
When the neighbor switches, the wire tends to switch too
Called capacitive coupling or crosstalk
Crosstalk has two harmful effects:
A. CROSSTALK IMPACTS DELAY
Assume layers above and below on average are quiet
Effective Cadj depends on behavior of neighbors
Miller effect
[Figure: two adjacent wires coupled by Cadj, each with capacitance Cgnd to ground]
B. CROSSTALK ALSO CREATES NOISE
Crosstalk causes noise on nonswitching wires
CROSSTALK NOISE EFFECTS
Usually victim is driven by a gate that fights noise
Victim driver is in linear region, aggressor in saturation
If sizes are same, Raggressor = 2-4 x Rvictim
SIMULATING NOISE INDUCED BY COUPLING
[Figure: simulated voltage versus t (ps) for the aggressor and for victims with different drivers]
Aggressor
Victim (undriven): 50%
Victim (half size driver): 16%
Victim (equal size driver): 8%
Victim (double size driver): 4%
if disturbed by large noise spikes
• But glitches cause extra delay and power
WIDTH, SPACING, LAYER, SHIELDING
• Widening a wire reduces resistance but increases capacitance (less than proportionally) → RC delay product improves
• Increasing spacing reduces capacitance → improves RC delay
• Layers
• Coupling can be avoided if adjacent lines do not switch → shield critical nets with power or ground wires on one or both sides to eliminate coupling
C. REPEATER INSERTION
R and C are proportional to l
RC delay is proportional to l^2
Break long wires into N shorter segments
Drive each one with a repeater or buffer
Two questions:
A What is the position that minimizes the delay?
B How many repeaters to insert to minimize the delay?
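The trade-off behind question B can be sketched with a deliberately simplified model: each of the N segments contributes a distributed wire delay of 0.38·r·c·(l/N)² plus a fixed repeater delay t_buf. The fixed t_buf, the equal spacing, and all numeric values are assumptions for illustration; a fuller model would also account for the repeater's drive resistance and input capacitance.

```python
def repeated_wire_delay(r: float, c: float, length: float,
                        n_segments: int, t_buf: float) -> float:
    """Total delay of a wire split into n equal segments, each with a repeater."""
    segment = length / n_segments
    return n_segments * (0.38 * r * c * segment ** 2 + t_buf)


def best_segment_count(r: float, c: float, length: float,
                       t_buf: float, max_segments: int = 100) -> int:
    """Segment count with the smallest total delay under this model."""
    return min(range(1, max_segments + 1),
               key=lambda n: repeated_wire_delay(r, c, length, n, t_buf))


# Illustrative values (assumptions): 10 mm wire, r = 1e5 ohm/m,
# c = 2e-10 F/m, 20 ps per repeater.
R_PER_M, C_PER_M, LENGTH, T_BUF = 1e5, 2e-10, 10e-3, 20e-12

best = best_segment_count(R_PER_M, C_PER_M, LENGTH, T_BUF)
for n in (1, best):
    delay = repeated_wire_delay(R_PER_M, C_PER_M, LENGTH, n, T_BUF)
    print(f"N = {n}: {delay * 1e12:.0f} ps")
```

The wire term falls as 1/N while the repeater term grows as N, so the total has a minimum at a finite number of segments.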