Fault Tolerance: Principles and Practice

by Peter A. Lee, Thomas Anderson

Paperback | January 7, 2012

Pricing and Purchase Info

$115.43 online 
$137.95 list price save 16%

The production of a new version of any book is a daunting task, as many authors will recognise. In the field of computer science, the task is made even more daunting by the speed with which the subject and its supporting technology move forward. Since the publication of the first edition of this book in 1981 much research has been conducted, and many papers have been written, on the subject of fault tolerance. Our aim then was to present for the first time the principles of fault tolerance together with current practice to illustrate those principles. We believe that the principles have (so far) stood the test of time and are as appropriate today as they were in 1981. Much work on the practical applications of fault tolerance has been undertaken, and techniques have been developed for ever more complex situations, such as those required for distributed systems. Nevertheless, the basic principles remain the same.

Title: Fault Tolerance: Principles and Practice
Format: Paperback
Dimensions: 320 pages
Published: January 7, 2012
Publisher: Springer-Verlag/Sci-Tech/Trade
Language: English

The following ISBNs are associated with this title:

ISBN-10: 3709189926

ISBN-13: 9783709189924


Table of Contents

1 Introduction.- Fault Prevention and Fault Tolerance.- Anticipated and Unanticipated Faults.- Book Aim.- References.

2 System Structure and Dependability.- System Structure.- Systems.- System Model.- Software/Hardware Interaction.- Interpreter Model of Systems.- Component Model of Systems.- Measures and Mechanisms.- Atomic Actions.- System Dependability and Reliability.- Dependability.- Failure and Reliability.- System Specification.- Multiple Specifications.- Erroneous Transitions and States.- Component/Design Failures.- Errors and Faults.- Fault Classifications.- Summary.- References.

3 Fault Tolerance.- Fault Tolerance: How.- Principles of Fault Tolerance.- Redundancy.- Fault Tolerance: Where and How Much.- Quantitative Reliability Evaluation.- Hardware Reliability Models.- Software Reliability Models.- An Implementation Framework.- Exceptions and Exception Handling.- Classification of Exceptions.- Exception Handling in Software Systems.- Exception Propagation.- Summary of Exception Handling.- References.

4 Fault Tolerant Systems.- ESS No. 1A.- System Description.- Reliability Strategies.- SIFT and FTMP.- SIFT System Design.- SIFT Reliability Strategies.- FTMP System Design.- FTMP Reliability Strategies.- Tandem.- Tandem Reliability Strategies.- Stratus.- Stratus Reliability Strategies.- References.

5 Error Detection.- Measures for Error Detection.- Ideal Checks.- Types of Check.- Replication Checks.- Timing Checks.- Reversal Checks.- Coding Checks.- Reasonableness Checks.- Structural Checks.- Diagnostic Checks.- Mechanisms for Error Detection.- Structuring Error Detection in Systems.- References.

6 Damage Confinement and Assessment.- Damage Confinement.- Measures for Damage Confinement.- Measures for Damage Assessment.- Mechanisms for Damage Confinement.- Protection Mechanisms.- Mechanisms for Damage Assessment.- Summary.- References.

7 Error Recovery.- Concepts of Error Recovery.- State Restoration.- Forward and Backward Error Recovery.- Measures for Forward Error Recovery.- Backward Error Recovery.- Facilities for Backward Error Recovery.- Measures for Backward Error Recovery.- Mechanisms for Backward Error Recovery.- Checkpoints and Audit Trails.- The Recovery Cache.- Unrecoverable Components.- Recovery in Hierarchical Systems.- Recovery in Concurrent Systems.- Concurrent Processes.- Recovery for Competing Processes.- Recovery for Cooperating Processes.- Distributed Systems.- Recovery in Idealised Fault Tolerant Components.- Summary.- References.

8 Fault Treatment and Continued Service.- Fault Location.- System Repair.- Resuming Normal Service.- Idealised Fault Tolerant Components.- Summary.- References.

9 Software Fault Tolerance.- The Recovery Block Scheme.- Implementation of Recovery Blocks.- The Utility of Recovery Blocks.- Acceptance Tests.- Run-Time Overheads.- Experiments With Recovery Blocks.- Summary of Recovery Blocks.- The N-Version Programming Scheme.- Implementation of N-Version Programming.- Voting Check.- Experiments With N-Version Programming.- Summary of N-Version Programming.- Comparison with the Recovery Block Scheme.- Summary.- References.

10 Conclusion.- Methodology and Framework for Fault Tolerance.- Idealised Fault Tolerant Components.- Failure Exceptions.- Critical Components.- The Future.- References.

References.

Annotated Bibliography.- Multiple Sources.- Fault Tolerant Systems.- August Systems.- COMTRAC.- COPRA.- C.vmp.- ESS Systems (Bell Laboratories).- Fault Tolerant Multiprocessor (FTMP).- Fault Tolerant Spaceborne Computer (FTSC).- IBM 9020.- JPL-STAR Computer.- MARS.- Plessey System 250.- Pluribus.- PRIME.- Sequoia.- Software Implemented Fault Tolerance (SIFT).- Space Shuttle Computer Complex.- Stratus.- Tandem.- VOTRICS.- Software Fault Tolerance.- Multiple Source.- Recovery Blocks.- N-Version Programming.- Other Software Fault Tolerance Papers.- Exception Handling.