Reliable Architectures


Joel Emer
Computer Science and Artificial Intelligence Laboratory
Massachusetts Institute of Technology
6.823, L24, December 7, 2005

Strike Changes State of a Single Bit

Impact of Neutron Strike on a Si Device

[Figure: a neutron strike on a transistor device, with + and - charges released between source and drain]

• Strikes release electron & hole pairs that can be absorbed by source & drain to alter the state of the device
• Secondary source of upsets: alpha particles from packaging

Cosmic Rays Come From Deep Space

[Figure: particles labeled p and n descending to Earth's surface]

• Neutron flux is higher at higher altitudes
  – 3x - 5x increase in Denver at 5,000 feet
  – 100x increase in airplanes at 30,000+ feet

Physical Solutions are Hard

• Shielding?
  – No practical absorbent (e.g., approximately > 10 ft of concrete)
  – unlike alpha particles
• Technology solution: SOI?
  – Partially-depleted SOI of some help, effect on logic unclear
  – Fully-depleted SOI may help, but is challenging to manufacture
• Circuit level solution?
  – Radiation hardened circuits can provide 10x improvement with significant penalty in performance, area, cost
  – 2-4x improvement may be possible with less penalty

Triple Modular Redundancy (Von Neumann, 1956)

[Figure: three modules M feed a voter V, which produces the result]

• V does a majority vote on the results

Dual Modular Redundancy (e.g., Binac, Stratus)

[Figure: two modules M, each with an Error? output, feed C, which signals Mismatch?]

• Processing stops on mismatch
• Error signal is used to decide which processor is used to restore state to the other

Pair and Spare Lockstep (e.g., Tandem, 1975)

[Figure: a primary pair and a backup pair of modules M, each pair checked by C for Mismatch?]

• Primary creates periodic checkpoints
• Backup restarts from checkpoint on mismatch
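The voting and comparison steps in the redundancy schemes above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `majority_vote`, `dmr_mismatch`, and `flip_bit` are hypothetical, module outputs are modeled as plain integers, and a single-bit strike is modeled as an XOR on one module's output.

```python
from collections import Counter
from typing import Optional, Sequence


def flip_bit(value: int, bit: int) -> int:
    """Model a particle strike as a flip of one bit (illustrative assumption)."""
    return value ^ (1 << bit)


def majority_vote(results: Sequence[int]) -> Optional[int]:
    """Voter V: return the value a strict majority of modules agree on.

    Returns None when no value has a strict majority.
    """
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count > len(results) // 2 else None


def dmr_mismatch(a: int, b: int) -> bool:
    """Comparator C: report a mismatch between two module outputs."""
    return a != b


if __name__ == "__main__":
    good = 0b1011
    # TMR: one of three modules is struck; the vote masks the fault.
    print(majority_vote([good, flip_bit(good, 2), good]) == good)
    # DMR: the strike is detected, but the comparison alone cannot
    # say which of the two modules is wrong.
    print(dmr_mismatch(good, flip_bit(good, 2)))
```

The contrast is the point of the sketch: three copies let the voter mask a single fault, while two copies only detect it, which is why the dual scheme needs a separate error signal or a checkpoint to recover.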
Redundant Multithreading (e.g., Reinhardt, Mukherjee, 2000)

[Figure: a leading thread and a trailing thread each run the same sequence of X and W operations; C compares their W operations and signals Fault?]

• Writes are checked

Component Protection

[Figure: bit arrays protected by parity (Parity Error?) and by ECC]

• Fujitsu SPARC in 130 nm technology (ISSCC 2003)
  – 80% of 200k latches protected with parity
  – versus very few latches protected in commodity microprocessors

Computing AVF

• Approach is conservative
  – Assume every bit is ACE unless proven otherwise
• Data Analysis using a Performance Model
  – Prove that data held in a structure is un-ACE
• Timing Analysis using a Performance Model
  – Tracks the time this data spent in the structure

Dynamic Instruction Breakdown

  DYNAMICALLY DEAD   20%
  PERFORMANCE INST    1%
  ACE                46%
  PREDICATED FALSE    7%
  NOP                26%

Average across Spec2K slices

Mapping ACE & un-ACE Instructions to the Instruction Queue

[Figure: instruction queue entries labeled ACE Inst, NOP and Prefetch (architectural un-ACE), and Ex-ACE Inst, Wrong-Path Inst, and Idle (micro-architectural un-ACE)]

ACE Lifetime Analysis (1) (e.g., write-through data cache)

[Figure: timeline of a bit: Idle, Fill, Valid, Read, Valid, Read, Valid, Evict, Idle]

• Idle is unACE
• Assuming all time intervals are equal
• For 3/5 of the lifetime the bit is valid
• Gives a measure of the structure's utilization
  – Number of useful bits
  – Amount of time useful bits are resident in structure
  – Valid for a particular trace

ACE Lifetime Analysis (2) (e.g., write-through data cache)

[Figure: timeline of a bit in a write-through data cache: Idle, Fill, Read, Read, Evict, Idle]

• Valid is not necessarily ACE
• ACE % = AVF = 2/5 = 40%
• Example Lifetime Components
  – ACE: fill-to-read, read-to-read
  – unACE: idle, read-to-evict, write-to-evict

ACE Lifetime Analysis (3) (e.g., write-through data cache)

[Figure: timeline of a bit in a write-through data cache: Idle, Fill, Read, Read, Evict, Idle]

• Data ACEness is a function of instruction ACEness
• Second Read is by an unACE instruction
• AVF = 1/5 = 20%

Instruction Queue

  IDLE               31%
  Ex-ACE             10%
  ACE                29%
  NOP                15%
  PREDICATED FALSE    3%
  WRONG PATH          3%
  DYNAMICALLY DEAD    8%
  PERFORMANCE INST    1%

ACE percentage = AVF = 29%

Strike on a bit (e.g., in register file)

• Bit read?
  – no: benign fault, no error
  – yes: Bit has error protection?
• Bit has error protection?
  – detection & correction: no error
  – no protection: affects program outcome? yes: SDC; no: benign fault, no error
  – detection only: affects program outcome? yes: True DUE; no: False DUE

SDC = Silent Data Corruption, DUE = Detected Unrecoverable Error

DUE AVF of Instruction Queue with Parity

  True DUE AVF       29%
  Idle & Misc        38%
  Uncommitted         6%
  Dynamically Dead   11%
  Neutral            16%

False DUE AVF 33% (Uncommitted + Dynamically Dead + Neutral)

CPU2000, Asim, Simpoint, Itanium®2-like

Sources of False DUE in an Instruction Queue

• Instructions with uncommitted results
  – e.g., wrong-path, predicated-false
  – solution: π (possibly incorrect) bit till commit
• Instruction types neutral to errors
  – e.g., no-ops, prefetches, branch predict hints
  – solution: anti-π bit
• Dynamically dead instructions
  – instructions whose results will not be used in future
  – solution: π bit beyond commit

Coping with Wrong-Path Instructions (assume parity-protected instruction queue)

[Figure: pipeline of Fetch, Decode, IQ, RR, Execute, Commit with Instruction Cache (IC) and Data Cache; an inst in the IQ is struck (X) and the error is declared on issue]

• Problem: not enough information at issue

The π (Possibly Incorrect) Bit (assume parity-protected instruction queue)

[Figure: same pipeline; the error is posted in the π bit on issue, and the inst carries its π bit through RR and Execute to Commit]

• At commit point, declare error only if not wrong-path instruction and π bit is set

Anti-π bit: coping with No-ops (assume parity-protected instruction queue)

[Figure: same pipeline; insts in the IQ carry an anti-π bit, which neutralizes the π bit]

• On issue, if the anti-π bit is set, then do not set the π bit

π bit: avoiding False DUE on Dynamically Dead Instructions

[Figure: same pipeline; Inst i (write R1) carries its π bit past Commit into R1, and Inst i+n (read R1) picks up the π bit when it reads R1]

• Declare the error on reading R1, if π bit is set
• If R1 isn't read (i.e., dynamically dead), then no False DUE
• π bit can be used in caches & main memory …

% False DUE AVF Eliminated (PI = π)

  PI bit till I/O commit        12%
  PI bit till register commit   18%
  PI bit till store commit       8%
  PI bit till register read     14%
  anti-PI bit                   48%

CPU2000, Asim, Simpoint, Itanium®2-like

Practical to eliminate most of the False DUE AVF
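The π and anti-π rules above for a parity-protected instruction queue can be sketched as follows. This is a minimal illustration, not the hardware design: the `Inst` record and the functions `issue` and `commit_declares_error` are hypothetical names, and only the two rules stated in the slides are modeled (post the error in the π bit on issue unless the anti-π bit is set; at commit, declare the error only if the instruction is not wrong-path and its π bit is set).

```python
from dataclasses import dataclass


@dataclass
class Inst:
    """Hypothetical instruction-queue entry used only for this sketch."""
    parity_error: bool = False  # a strike was detected by parity in the IQ
    wrong_path: bool = False    # instruction will be squashed, never committed
    anti_pi: bool = False       # neutral type: no-op, prefetch, branch hint
    pi: bool = False            # "possibly incorrect" bit, carried to commit


def issue(inst: Inst) -> None:
    """On issue, post the error in the pi bit instead of declaring it.

    If the anti-pi bit is set, the pi bit is not set at all.
    """
    if inst.parity_error and not inst.anti_pi:
        inst.pi = True


def commit_declares_error(inst: Inst) -> bool:
    """At commit, declare an error only for a non-wrong-path inst with pi set."""
    return inst.pi and not inst.wrong_path


if __name__ == "__main__":
    cases = {
        "committed ACE inst": Inst(parity_error=True),
        "wrong-path inst": Inst(parity_error=True, wrong_path=True),
        "no-op (anti-pi)": Inst(parity_error=True, anti_pi=True),
    }
    for name, inst in cases.items():
        issue(inst)
        print(name, "->", "DUE" if commit_declares_error(inst) else "no error")
```

Deferring the decision from issue to commit is what removes the false DUE cases: at issue the queue cannot yet know whether the struck instruction will commit.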
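The AVF figures in the ACE lifetime analysis slides (2/5 and 1/5) follow from dividing the time a bit spends in ACE intervals by its total lifetime. The sketch below reproduces that arithmetic; the `Interval` record and the `avf` function are hypothetical names, and, as in the slides, every interval is assumed to last the same amount of time (one unit here).

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Interval:
    """One segment of a bit's lifetime (illustrative record)."""
    name: str
    cycles: int
    ace: bool  # True if a fault in this interval could affect the outcome


def avf(lifetime: Sequence[Interval]) -> float:
    """AVF = time spent in ACE intervals / total lifetime."""
    total = sum(iv.cycles for iv in lifetime)
    if total == 0:
        raise ValueError("lifetime must cover at least one cycle")
    return sum(iv.cycles for iv in lifetime if iv.ace) / total


# Lifetime analysis (2): both reads are by ACE instructions.
analysis_2 = [
    Interval("idle", 1, False),
    Interval("fill-to-read", 1, True),
    Interval("read-to-read", 1, True),
    Interval("read-to-evict", 1, False),
    Interval("idle", 1, False),
]

# Lifetime analysis (3): the second read is by an un-ACE instruction,
# so the read-to-read interval no longer counts as ACE.
analysis_3 = [
    Interval(iv.name, iv.cycles, iv.ace and iv.name != "read-to-read")
    for iv in analysis_2
]

if __name__ == "__main__":
    print(f"AVF (2) = {avf(analysis_2):.0%}")
    print(f"AVF (3) = {avf(analysis_3):.0%}")
```

In a real study the interval lengths would come from a performance-model trace rather than being fixed at one unit each.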
