RELIABILITY, MAINTAINABILITY AND RISK

Also by the same author
Reliability Engineering, Pitman, 1972
Maintainability Engineering, Pitman, 1973 (with A. H. Babb)
Statistics Workshop, Technis, 1974, 1991
Achieving Quality Software, Chapman & Hall, 1995
Quality Procedures for Hardware and Software, Elsevier, 1990 (with J. S. Edge)

Reliability, Maintainability and Risk
Practical methods for engineers
Sixth Edition
Dr David J Smith
BSc, PhD, CEng, FIEE, FIQA, HonFSaRS, MIGasE

OXFORD AUCKLAND BOSTON JOHANNESBURG MELBOURNE NEW DELHI

Butterworth-Heinemann
Linacre House, Jordan Hill, Oxford OX2 8DP
225 Wildwood Avenue, Woburn, MA 01801-2041
A division of Reed Educational and Professional Publishing Ltd
A member of the Reed Elsevier group plc

First published by Macmillan Education Ltd 1981
Second edition 1985
Third edition 1988
Fourth edition published by Butterworth-Heinemann Ltd 1993
Reprinted 1994, 1996
Fifth edition 1997
Reprinted with revisions 1999
Sixth edition 2001

© David J. Smith 1993, 1997, 2001

All rights reserved. No part of this publication may be reproduced in any material form (including photocopying or storing in any medium by electronic means and whether or not transiently or incidentally to some other use of this publication) without the written permission of the copyright holder except in accordance with the provisions of the Copyright, Designs and Patents Act 1988 or under the terms of a licence issued by the Copyright Licensing Agency Ltd, 90 Tottenham Court Road, London, England W1P 9HE. Applications for the copyright holder’s written permission to reproduce any part of this publication should be addressed to the publishers.

British Library Cataloguing in Publication Data
Smith, David J. (David John), 1943 June 22–
Reliability, maintainability and risk. – 6th ed.
1 Reliability (Engineering) 2 Risk assessment I Title
620'.00452

Library of Congress Cataloguing in Publication Data
Smith, David John, 1943–
Reliability, maintainability, and risk: practical methods for engineers/David J Smith. – 6th ed.
p. cm.
Includes bibliographical references and index.
ISBN 0 7506 5168 7
1 Reliability (Engineering) 2 Maintainability (Engineering) 3 Engineering design I Title.
TA169.S64 2001
620'.00452–dc21 00–049380

ISBN 0 7506 5168 7

Composition by Genesis Typesetting, Laser Quay, Rochester, Kent
Printed and bound in Great Britain by Antony Rowe, Chippenham, Wiltshire

Preface
Acknowledgements

Part One Understanding Reliability Parameters and Costs

1 The history of reliability and safety technology
  1.1 FAILURE DATA
  1.2 HAZARDOUS FAILURES
  1.3 RELIABILITY AND RISK PREDICTION
  1.4 ACHIEVING RELIABILITY AND SAFETY-INTEGRITY
  1.5 THE RAMS-CYCLE
  1.6 CONTRACTUAL PRESSURES
2 Understanding terms and jargon
  2.1 DEFINING FAILURE AND FAILURE MODES
  2.2 FAILURE RATE AND MEAN TIME BETWEEN FAILURES
  2.3 INTERRELATIONSHIPS OF TERMS
  2.4 THE BATHTUB DISTRIBUTION
  2.5 DOWN TIME AND REPAIR TIME
  2.6 AVAILABILITY
  2.7 HAZARD AND RISK-RELATED TERMS
  2.8 CHOOSING THE APPROPRIATE PARAMETER
  EXERCISES
3 A cost-effective approach to quality, reliability and safety
  3.1 THE COST OF QUALITY
  3.2 RELIABILITY AND COST
  3.3 COSTS AND SAFETY

Part Two Interpreting Failure Rates

4 Realistic failure rates and prediction confidence
  4.1 DATA ACCURACY
  4.2 SOURCES OF DATA
  4.3 DATA RANGES
  4.4 CONFIDENCE LIMITS OF PREDICTION
  4.5 OVERALL CONCLUSIONS
5 Interpreting data and demonstrating reliability
  5.1 THE FOUR CASES
  5.2 INFERENCE AND CONFIDENCE LEVELS
  5.3 THE CHI-SQUARE TEST
  5.4 DOUBLE-SIDED CONFIDENCE LIMITS
  5.5 SUMMARIZING THE CHI-SQUARE TEST
  5.6 RELIABILITY DEMONSTRATION
  5.7 SEQUENTIAL TESTING
  5.8 SETTING UP DEMONSTRATION TESTS
  EXERCISES
6 Variable failure rates and probability plotting
  6.1 THE WEIBULL DISTRIBUTION
  6.2 USING THE WEIBULL METHOD
  6.3 MORE COMPLEX CASES OF THE WEIBULL DISTRIBUTION
  6.4 CONTINUOUS PROCESSES
  EXERCISES

Part Three Predicting Reliability and Risk

7 Essential reliability theory
  7.1 WHY PREDICT RAMS?
  7.2 PROBABILITY THEORY
  7.3 RELIABILITY OF SERIES SYSTEMS
  7.4 REDUNDANCY RULES
  7.5 GENERAL FEATURES OF REDUNDANCY
  EXERCISES
8 Methods of modelling
  8.1 BLOCK DIAGRAM AND MARKOV ANALYSIS
  8.2 COMMON CAUSE (DEPENDENT) FAILURE
  8.3 FAULT TREE ANALYSIS
  8.4 EVENT TREE DIAGRAMS
9 Quantifying the reliability models
  9.1 THE RELIABILITY PREDICTION METHOD
  9.2 ALLOWING FOR DIAGNOSTIC INTERVALS
  9.3 FMEA (FAILURE MODE AND EFFECT ANALYSIS)
  9.4 HUMAN FACTORS
  9.5 SIMULATION
  9.6 COMPARING PREDICTIONS WITH TARGETS
  EXERCISES
10 Risk assessment (QRA)
  10.1 FREQUENCY AND CONSEQUENCE
  10.2 PERCEPTION OF RISK AND ALARP
  10.3 HAZARD IDENTIFICATION
  10.4 FACTORS TO QUANTIFY

Part Four Achieving Reliability and Maintainability

11 Design and assurance techniques
  11.1 SPECIFYING AND ALLOCATING THE REQUIREMENT
  11.2 STRESS ANALYSIS
  11.3 ENVIRONMENTAL STRESS PROTECTION
  11.4 FAILURE MECHANISMS
  11.5 COMPLEXITY AND PARTS
  11.6 BURN-IN AND SCREENING
  11.7 MAINTENANCE STRATEGIES
12 Design review and test
  12.1 REVIEW TECHNIQUES
  12.2 CATEGORIES OF TESTING
  12.3 RELIABILITY GROWTH MODELLING
  EXERCISES
13 Field data collection and feedback
  13.1 REASONS FOR DATA COLLECTION
  13.2 INFORMATION AND DIFFICULTIES
  13.3 TIMES TO FAILURE
  13.4 SPREADSHEETS AND DATABASES
  13.5 BEST PRACTICE AND RECOMMENDATIONS
  13.6 ANALYSIS AND PRESENTATION OF RESULTS
  13.7 EXAMPLES OF FAILURE REPORT FORMS
14 Factors influencing down time
  14.1 KEY DESIGN AREAS
  14.2 MAINTENANCE STRATEGIES AND HANDBOOKS
15 Predicting and demonstrating repair times
  15.1 PREDICTION METHODS
  15.2 DEMONSTRATION PLANS
16 Quantified reliability centred maintenance
  16.1 WHAT IS QRCM?
  16.2 THE QRCM DECISION PROCESS
  16.3 OPTIMUM REPLACEMENT (DISCARD)
  16.4 OPTIMUM SPARES
  16.5 OPTIMUM PROOF-TEST
  16.6 CONDITION MONITORING
17 Software quality/reliability
  17.1 PROGRAMMABLE DEVICES
  17.2 SOFTWARE FAILURES
  17.3 SOFTWARE FAILURE MODELLING
  17.4 SOFTWARE QUALITY ASSURANCE
  17.5 MODERN/FORMAL METHODS
  17.6 SOFTWARE CHECKLISTS

Part Five Legal, Management and Safety Considerations

18 Project management
  18.1 SETTING OBJECTIVES AND SPECIFICATIONS
  18.2 PLANNING, FEASIBILITY AND ALLOCATION
  18.3 PROGRAMME ACTIVITIES
  18.4 RESPONSIBILITIES
  18.5 STANDARDS AND GUIDANCE DOCUMENTS
19 Contract clauses and their pitfalls
  19.1 ESSENTIAL AREAS
  19.2 OTHER AREAS
  19.3 PITFALLS
  19.4 PENALTIES
  19.5 SUBCONTRACTED RELIABILITY ASSESSMENTS
  19.6 EXAMPLE
20 Product liability and safety legislation
  20.1 THE GENERAL SITUATION
  20.2 STRICT LIABILITY
  20.3 THE CONSUMER PROTECTION ACT 1987
  20.4 HEALTH AND SAFETY AT WORK ACT 1974
  20.5 INSURANCE AND PRODUCT RECALL
21 Major incident legislation
  21.1 HISTORY OF MAJOR INCIDENTS
  21.2 DEVELOPMENT OF MAJOR INCIDENT LEGISLATION
  21.3 CIMAH SAFETY REPORTS
  21.4 OFFSHORE SAFETY CASES
  21.5 PROBLEM AREAS
  21.6 THE COMAH DIRECTIVE (1999)
22 Integrity of safety-related systems
  22.1 SAFETY-RELATED OR SAFETY-CRITICAL?
  22.2 SAFETY-INTEGRITY LEVELS (SILs)
  22.3 PROGRAMMABLE ELECTRONIC SYSTEMS (PESs)
  22.4 CURRENT GUIDANCE
  22.5 ACCREDITATION AND CONFORMITY OF ASSESSMENT
23 A case study: The Datamet Project
  23.1 INTRODUCTION
  23.2 THE DATAMET CONCEPT
  23.3 FORMATION OF THE PROJECT GROUP
  23.4 RELIABILITY REQUIREMENTS
  23.5 FIRST DESIGN REVIEW
  23.6 DESIGN AND DEVELOPMENT
  23.7 SYNDICATE STUDY
  23.8 HINTS

Appendix 1 Glossary
  A1 TERMS RELATED TO FAILURE
  A2 RELIABILITY TERMS
  A3 MAINTAINABILITY TERMS
  A4 TERMS ASSOCIATED WITH SOFTWARE
  A5 TERMS RELATED TO SAFETY
  A6 MISCELLANEOUS TERMS
Appendix 2 Percentage points of the Chi-square distribution
Appendix 3 Microelectronics failure rates
Appendix 4 General failure rates
Appendix 5 Failure mode percentages
Appendix 6 Human error rates
Appendix 7 Fatality rates
Appendix 8 Answers to exercises
Appendix 9 Bibliography
  BOOKS
  OTHER PUBLICATIONS
  STANDARDS AND GUIDELINES
  JOURNALS
Appendix 10 Scoring criteria for BETAPLUS common cause model
  1 CHECKLIST AND SCORING FOR EQUIPMENT CONTAINING PROGRAMMABLE ELECTRONICS
  2 CHECKLIST AND SCORING FOR NON-PROGRAMMABLE EQUIPMENT
Appendix 11 Example of HAZOP
  EQUIPMENT DETAILS
  HAZOP WORKSHEETS
  POTENTIAL CONSEQUENCES
Appendix 12 HAZID checklist
Index

Preface

After three editions, Reliability, Maintainability in Perspective became Reliability, Maintainability and Risk and has now, after just 20 years, reached its 6th edition. In such a fast-moving subject, the time has come, yet again, to expand and update the material, particularly with the results of my recent studies into common cause failure and into the correlation between predicted and achieved field reliability...
the craftsman/manufacturer and less determined by the ‘combination’ of part reliabilities. Nevertheless, mass production of standard mechanical parts has been the case since early in this century. Under these circumstances defective items can be identified readily, by means of inspection and test, during the manufacturing process, and it is possible to control...

activities and a typical design-cycle. The top portion shows the specification and feasibility stages of design leading to conceptual engineering and then to detailed design.

Figure 1.2 RAMS-Cycle model

RAMS targets should be included in the requirements specification as project or contractual requirements which can include both assessment of the design and demonstration...

tender and other contractual documents. Mean Times Between Failure, repair times and availabilities, for both cost- and safety-related failure modes, are specified and quantified.

There are problems in such contractual relationships arising from:
  Ambiguity of definition
  Hidden statistical risks
  Inadequate coverage of the requirements
  Unrealistic requirements
  Unmeasurable...

(MDT, MTTR) There is frequently confusion between the two and it is important to understand the difference. Down time, or outage, is the period during which equipment is in the failed state. A formal

Table 2.2
  Decreasing failure rate
  Known as: Infant mortality, Burn-in, Early failures
  Usually related to manufacture and QA, e.g. welds, joints, connections, wraps, dirt,...
important that terms are fully understood before they are used and if this is achieved by defining them for specific situations, then so much the better. The danger in specifying that all terms shall be defined by a given published standard is that each person assumes that he or she knows the meaning of each term and these are not read or discussed until a dispute arises...

setting up a reliability model is far more valuable than the numerical outcome.

Figure 1.1

Figure 1.1 illustrates the problem of matching a reliability or risk prediction to the eventual field performance. In practice, prediction addresses the component-based ‘design reliability’, and it is necessary to take account of the additional factors when assessing the...

of failure rate and repair/down time which determines unavailability. The design and operating features which influence down time are also taken into account in this book.

Achieving reliability, safety and maintainability results from activities in three main areas:
1 Design:
  Reduction in complexity
  Duplication to provide fault tolerance
  Derating of stress factors
  Qualification testing and design review...

and standards
3 Field use:
  Adequate operating and maintenance instructions
  Feedback of field failure information
  Replacement and spares strategies (e.g. early replacement of items with a known wearout characteristic)

It is much more difficult, and expensive, to add reliability/safety after the design stage. The quantified parameters, dealt with in Chapter 2, must be part of the design specification and...
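One of the excerpts above notes that the combination of failure rate and repair/down time determines unavailability. As a hedged illustration only, the commonly used steady-state relationship for a single repairable item can be sketched in Python as below. The formula (unavailability = λ·MDT / (1 + λ·MDT), with MTBF taken as 1/λ) is a standard textbook result and is not quoted from the excerpt; the function names, units and sample figures are hypothetical.

```python
def unavailability(failure_rate: float, mean_down_time: float) -> float:
    """Steady-state unavailability of a single repairable item.

    Assumptions (not taken from the excerpt): constant failure rate,
    MTBF = 1 / failure_rate, and both arguments in consistent units
    (e.g. failures per hour and hours).
    """
    if failure_rate < 0 or mean_down_time < 0:
        raise ValueError("failure_rate and mean_down_time must be non-negative")
    lam_mdt = failure_rate * mean_down_time
    # MDT / (MTBF + MDT), rewritten in terms of the failure rate
    return lam_mdt / (1.0 + lam_mdt)


def availability(failure_rate: float, mean_down_time: float) -> float:
    """Steady-state availability, i.e. the complement of unavailability."""
    return 1.0 - unavailability(failure_rate, mean_down_time)


if __name__ == "__main__":
    # Hypothetical figures chosen purely for illustration.
    lam = 1e-4   # failures per hour
    mdt = 10.0   # hours of down time per failure
    print(f"unavailability = {unavailability(lam, mdt):.6f}")
    print(f"availability   = {availability(lam, mdt):.6f}")
```

When λ·MDT is small, the result is close to the simple product λ·MDT, which is why reducing either the failure rate or the down time lowers unavailability roughly in proportion.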
for permission to make use of examples from their guidance document (SR/24, Risk Assessment Techniques), ITT Europe for permission to reproduce their failure report form and the US Department of Defense for permission to quote from MIL Handbooks.

Part One Understanding Reliability Parameters and Costs

1 The history of reliability and safety technology

Safety/Reliability engineering has not developed as ...

... the field of hazard assessment.

1.3 RELIABILITY AND RISK PREDICTION

System modelling, by means of failure mode analysis and fault tree analysis methods, ...