Fault Tolerant Systems PPT



Dependable computer systems are required in applications that involve human life or large economic stakes. In this course we study the theory and practice of designing such systems at both the hardware and software levels. We will cover the following topics.

    Dependability concepts: dependable systems; techniques for achieving dependability; faults, errors, and failures, and their classification; dependability measures and reliability calculation.
    Fault-tolerant strategies: fault detection, masking, containment, location, reconfiguration, and recovery.
    Fault-tolerant design techniques: hardware redundancy, software redundancy, time redundancy, and information redundancy.
    Fault tolerance in real-time systems: time-space tradeoff, imprecise computation, (m,k)-firm deadline model, fault-tolerant scheduling algorithms.
    Dependable communication: dependable channels, survivable networks, fault-tolerant routing.
    Fault tolerance in distributed systems: building blocks (consensus protocols, fault diagnosis, clock synchronization, stable storage and RAID architectures); checkpointing and recovery; atomic actions; data replication and resiliency.
    Dependability evaluation techniques and tools: fault trees, Markov chains, Petri nets; case studies.
Analysis of fault-tolerant hardware and software architectures.
Case studies of dependable systems.
Readings from state-of-the-art research material.
Dependability Concepts
Lecture 01
[ PPT ] [pdf]
Lecture 02
[ PPT ] [pdf]
Lecture 03
[ PPT ] [pdf]
Fault-Tolerant (FT) Design Techniques
Lecture 04
[ PPT ] [pdf]
Lecture 05
[ PPT ] [pdf]
Information Redundancy – self reading
Dependability Modeling
Reliability, MTTF, etc. PPT
Fault Tree Analysis
Petri Nets
FT in Distributed Systems
Stable storage — RAID PPT
Stable storage – advanced RAID PPT
Consensus PPT
Clock Synchronization PPT
System-level diagnosis PPT
Checkpoint and Rollback recovery PPT
Atomic actions — Lock & Commit Protocols PPT
Replica management protocols PPT
FT in Networks
Dependable communication – 1 PPT
Primary-backup path PPT
Fault Localization PPT
Dependability-Security PDF
FT in Real-Time Systems
Lecture 06
[ PPT ] [pdf]
Lecture 07
[ PPT ] [pdf]
Lecture 08
[ PPT ] [pdf]
Spring 2010 Student Presentations
Recovery-Oriented Computing – Peter Scott PDF
ZFS – a RAID based file system – Henri Bai http://hub.opensolaris.org/bin/view/Community+Group+zfs/docs
2-dimensional error coding – Long Chen PDF
Software based fault detection – Tim Prince PPT
Self Recovery of Server Programs – Chesta Dwivedi PPT
Dynamic Fault Trees – Ashok Aditya PPT
Device Failure Tolerance Using Software – Haribabu Narayanan PPT
FPGA Fault Tolerance – Matt Clausman PPT
Byzantine Storage – Debkanta Chakraborty PPT
Spring 2009 Student Presentations
Fault-Tolerant Internet Services — Indranil Roy PPT
Checkpoint Recovery in Petaflop systems — Paul Jennings PPT
Highly Available Systems – Case Study — Cory Kleinheksel PPT
Fault-Tolerant TCP Server — Preethika K. PPT
Fault-Tolerant CORBA (NVP implementation) — Indranil Roy PPT
Fault-Tolerant Multipath Routing – Ganesan Mani PPT
Petri Net modeling – Phased Mission – Siddharth Sridhar PPT
Spring 2007 Student Presentations
Energy-aware scheduling in weakly-hard real-time systems (Julie Rursch) PPT
Fault-Tolerance in Multiprocessor SoC (Premkumar) PPT
Fountain Codes (Long Long) PPT
Network Time Protocol (Lizandro) PPT
RAID architectures (Russell Graves) PDF
Spring 2006 Student Presentations
Architecture fault-tolerance (Viswanathan) PPT
Advanced Quorum protocols (Kamna) PPT
Fault-tolerant objects (Bebek) PPT
Hierarchical system-level Diagnosis (Qin Wen) PPT
Checkpointing in mobile systems (Ben) PPT
Dependability and Security (Srdjan) PPT
Decidability and Schedulability – Timed Automata () PPT
System-level diagnosis in adhoc networks (Kavitha) PPT1, PPT2
NOTE: You can print the handout slides from Microsoft PowerPoint.