Fault Tolerant Computer Architecture-P1 pptx


Document information

Fault Tolerant Computer Architecture

Synthesis Lectures on Computer Architecture

Editor: Mark D. Hill, University of Wisconsin, Madison

Synthesis Lectures on Computer Architecture publishes 50 to 150 page publications on topics pertaining to the science and art of designing, analyzing, selecting and interconnecting hardware components to create computers that meet functional, performance and cost goals.

Titles in this series:

    Fault Tolerant Computer Architecture. Daniel Sorin. 2009

    The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines. Luiz André Barroso and Urs Hölzle. 2009

    Computer Architecture Techniques for Power-Efficiency. Stefanos Kaxiras and Margaret Martonosi. 2008

    Chip Multiprocessor Architecture: Techniques to Improve Throughput and Latency. Kunle Olukotun, Lance Hammond, James Laudon. 2007

    Transactional Memory. James R. Larus, Ravi Rajwar. 2007

    Quantum Computing for Computer Architects. Tzvetan S. Metodi, Frederic T. Chong. 2006

Copyright © 2009 by Morgan & Claypool

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopy, recording, or any other, except for brief quotations in printed reviews, without the prior permission of the publisher.

Fault Tolerant Computer Architecture
Daniel Sorin
www.morganclaypool.com

ISBN: 9781598299533 (paperback)
ISBN: 9781598299540 (ebook)
DOI: 10.2200/S00192ED1V01Y200904CAC005

A Publication in the Morgan & Claypool Publishers series SYNTHESIS LECTURES ON COMPUTER ARCHITECTURE, Lecture #5
Series Editor: Mark D. Hill, University of Wisconsin, Madison
Series ISSN: 1935-3235 (print), 1935-3243 (electronic)

Fault Tolerant Computer Architecture
Daniel J. Sorin
Duke University

ABSTRACT

For many years, most computer architects have pursued one primary goal: performance. Architects have translated the ever-increasing abundance of ever-faster transistors provided by Moore’s law into remarkable increases in performance. Recently, however, the bounty provided by Moore’s law has been accompanied by several challenges that have arisen as devices have become smaller, including a decrease in dependability due to physical faults. In this book, we focus on the dependability challenge and the fault tolerance solutions that architects are developing to overcome it. The two main purposes of this book are to explore the key ideas in fault-tolerant computer architecture and to present the current state of the art (over approximately the past 10 years) in academia and industry.

KEYWORDS

fault tolerance (or fault tolerant), reliability, dependability, computer architecture, error detection, error recovery, fault diagnosis, self-repair, autonomous, dynamic verification

Dedication

“To Deborah, Jason, and Julie”

Acknowledgments

I would like to thank my family for their support while I was writing this lecture. I would also like to thank Mark Hill for inviting me to write this lecture and Mike Morgan for organizing the production of the lecture. Valuable feedback on early drafts of the lecture was provided by Babak Falsafi, Jude Rivers, and Mark Hill. I would also like to thank Lihao Xu for helping me with a question about error coding.

Contents

    1. Introduction
        1.1 Goals of this Book
        1.2 Faults, Errors, and Failures
            1.2.1 Masking
            1.2.2 Duration of Faults and Errors
            1.2.3 Underlying Physical Phenomena
        1.3 Trends Leading to Increased Fault Rates
            1.3.1 Smaller Devices and Hotter Chips
            1.3.2 More Devices per Processor
            1.3.3 More Complicated Designs
        1.4 Error Models
            1.4.1 Error Type
            1.4.2 Error Duration
            1.4.3 Number of Simultaneous Errors
        1.5 Fault Tolerance Metrics
            1.5.1 Availability
            1.5.2 Reliability
            1.5.3 Mean Time to Failure
            1.5.4 Mean Time Between Failures
            1.5.5 Failures in Time
            1.5.6 Architectural Vulnerability Factor
        1.6 The Rest of This Book
        1.7 References

    2. Error Detection
        2.1 General Concepts
            2.1.1 Physical Redundancy
            2.1.2 Temporal Redundancy
            2.1.3 Information Redundancy
            2.1.4 The End-to-End Argument
        2.2 Microprocessor Cores
            2.2.1 Functional Units
            2.2.2 Register Files
            2.2.3 Tightly Lockstepped Redundant Cores
            2.2.4 Redundant Multithreading Without Lockstepping
            2.2.5 Dynamic Verification of Invariants
            2.2.6 High-Level Anomaly Detection
            2.2.7 Using Software to Detect Hardware Errors
            2.2.8 Error Detection Tailored to Specific Fault Models
        2.3 Caches and Memory
            2.3.1 Error Code Implementation
            2.3.2 Beyond EDCs
            2.3.3 Detecting Errors in Content Addressable Memories
            2.3.4 Detecting Errors in Addressing
        2.4 Multiprocessor Memory Systems
            2.4.1 Dynamic Verification of Cache Coherence
            2.4.2 Dynamic Verification of Memory Consistency
            2.4.3 Interconnection Networks
        2.5 Conclusions
        2.6 References

    3. Error Recovery
        3.1 General Concepts
            3.1.1 Forward Error Recovery
            3.1.2 Backward Error Recovery
            3.1.3 Comparing the Performance of FER and BER
        3.2 Microprocessor Cores
            3.2.1 FER for Cores
            3.2.2 BER for Cores
        3.3 Single-Core Memory Systems
            3.3.1 FER for Caches and Memory
            3.3.2 BER for Caches and Memory
        3.4 Issues Unique to Multiprocessors

Date posted: 03/07/2014, 19:20

Table of contents

    Fault Tolerant Computer Architecture

    Synthesis Lectures on Computer Architecture

    1.1 GOALS OF THIS BOOK

    1.2 FAULTS, ERRORS, AND FAILURES

    1.2.2 Duration of Faults and Errors

    1.3 TRENDS LEADING TO INCREASED FAULT RATES

    1.3.1 Smaller Devices and Hotter Chips

    1.3.2 More Devices per Processor

    1.4.3 Number of Simultaneous Errors

    1.5.3 Mean Time to Failure
