Chapter 18 - Recovery and fault tolerance. This chapter discusses recovery and fault tolerance techniques used in a distributed operating system. Resiliency, which is a technique for minimizing the impact of a fault, is also discussed.
PROPRIETARY MATERIAL. © 2007 The McGraw-Hill Companies, Inc. All rights reserved. No part of this PowerPoint slide may be displayed, reproduced or distributed in any form or by any means, without the prior written permission of the publisher, or used beyond the limited distribution to teachers and educators permitted by McGraw-Hill for their individual course preparation. If you are a student using this PowerPoint slide, you are using it without permission.

Chapter 17: Distributed Control Algorithms
Dhamdhere, Operating Systems: A Concept-Based Approach, 2 ed. Copyright © 2008

OS control functions in a distributed environment
• Special features of distributed OS control functions
  – Mutual exclusion
    * Involves synchronization of processes in different computers
  – Deadlock handling
    * Deadlocks may involve use of resources in different computers
  – Scheduling
    * Perform load balancing to ensure uniform loading of computers
  – Termination detection
    * Check whether all processes of a computation, which may operate in different computers, have completed
  – Election
    * Elect a coordinator for a privileged function like resource allocation

Nature of a distributed control algorithm
• A distributed control function offers services to both system and user processes
  – It operates in parallel with its clients
• The following terminology is used to differentiate between the distributed control algorithm and its clients
  – Basic computation: Operation of a client
    * Interprocess messages used by it are called basic messages
  – Control computation: Operation of the control algorithm
    * Interprocess messages exchanged in the control computation are called control messages
  – Basic part and control part of a process
    * Participate in the basic and control computations, respectively

Basic and control parts of a process Pi
• The basic part of Pi interacts with basic parts of other processes through basic messages; analogously for the control part of Pi
• The control part provides services such as resource allocation to the basic part

Correctness of a distributed control algorithm
• Processes of a distributed control algorithm exchange control data and coordinate through control messages
  – New correctness issues arise because
    * Exchange of control messages incurs delays
    * Control data used in processes may become stale or may appear inconsistent
  – Hence correctness has two new facets
    * Liveness: The algorithm must eventually perform the correct action
    * Safety: The algorithm must not perform wrong actions

[Slide: Liveness and safety of distributed control algorithms]

Distributed mutual exclusion algorithms
• At any time, at most one process may be in a CS for a data item ds
  – Permission-based algorithms
    * A process seeks the permission of a set of processes and enters a CS only when all processes in the set have granted the permission
  – Token-based algorithms
    * A token represents a special privilege of some kind, e.g., privilege to enter a CS
    * Only a process possessing the token may perform a privileged operation
    * Processes send the token to one another to pass on the privilege

Ricart–Agrawala algorithm
• Steps of the algorithm
  1. A process wishing to enter a CS sends time-stamped requests to all other processes
  2. When a process receives a request
     (a) If it is not interested in entering a CS, it sends a ‘go ahead’ immediately
     (b) If it is also interested in entering a CS, it sends a ‘go ahead’ only if the received request’s time-stamp < its own time-stamp; otherwise, it defers the reply by adding the request to the pending list
     (c) If it is in a CS, it adds the request to the pending list
  3. When a process receives n – 1 ‘go ahead’ replies, it enters the CS
  4. When a process exits a CS, it sends ‘go ahead’ replies to each request in its pending list
• The algorithm requires 2 × (n – 1) messages per CS entry

[Figure: Basic and control actions in Ricart–Agrawala algorithm; labels refer to steps 1, 2(b), 3]

Maekawa algorithm
• Each process has a request set of processes; it seeks the permission of only the processes in the request set (Ri represents the request set of process Pi)
  – Correctness is ensured through the following rules:
    * For all Pi: Pi is included in Ri
    * For all Pi, Pj: Ri ∩ Rj is non-empty
  – The algorithm requires O(√n) messages per CS entry

Mitchell–Merritt algorithm for distributed deadlock detection
• It is an edge chasing algorithm: control messages are sent over WFG edges to detect cycles
  – A provision is made to ensure that the cycle has not been broken before it was detected
    * Each process is assigned a public label and a private label
      » The labels are identical when a process is created
      » The public label of a process changes when it gets blocked on a resource request
      » It also changes when it waits for a process having a larger public label
    * A wait-for edge with a specific relation between public and private labels of its source and destination processes indicates presence of a deadlock
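
As a rough illustration of the Ricart–Agrawala steps listed earlier, the following Python sketch models each process as an object and each message as a direct method call. The class name `Process`, the helper `make_system`, the two-phase `want`/`send_requests` split, and the tie-break on process id for equal time-stamps are all assumptions made for this sketch; the slides do not prescribe them. A request that cannot be answered at once is deferred to the pending list.

```python
class Process:
    """One participant; message delivery is modelled as a direct method call."""

    def __init__(self, pid):
        self.pid = pid
        self.peers = []        # all other processes
        self.state = "idle"    # "idle", "wanting" or "in_cs"
        self.stamp = None      # (time, pid) of the outstanding request
        self.go_aheads = 0
        self.pending = []      # requesters whose reply is deferred

    def want(self, time):
        # Record interest in the CS; ties on time are broken by pid (assumption).
        self.state, self.stamp, self.go_aheads = "wanting", (time, self.pid), 0

    def send_requests(self):
        # Step 1: send the time-stamped request to every other process.
        for peer in self.peers:
            peer.on_request(self)

    def on_request(self, sender):
        # Step 2: reply at once if idle, or if the sender's request is older.
        older = self.state == "wanting" and sender.stamp < self.stamp
        if self.state == "idle" or older:
            sender.on_go_ahead()
        else:
            self.pending.append(sender)

    def on_go_ahead(self):
        # Step 3: enter the CS after n - 1 'go ahead' replies.
        self.go_aheads += 1
        if self.go_aheads == len(self.peers):
            self.state = "in_cs"

    def exit_cs(self):
        # Step 4: answer every deferred request.
        self.state = "idle"
        waiting, self.pending = self.pending, []
        for peer in waiting:
            peer.on_go_ahead()


def make_system(n):
    """Create n processes that all know each other (hypothetical helper)."""
    procs = [Process(i) for i in range(1, n + 1)]
    for p in procs:
        p.peers = [q for q in procs if q is not p]
    return procs


p1, p2, p3 = make_system(3)
p1.want(time=5)
p2.want(time=3)
p1.send_requests()   # p2 defers (its own stamp is smaller), p3 replies
p2.send_requests()   # p1 and p3 both reply, so p2 enters first
print(p1.state, p2.state)
p2.exit_cs()         # releases the deferred reply to p1
print(p1.state, p2.state)
```

In this run the process with the smaller time-stamp enters first, and the other enters only after the first exits, which is the mutual exclusion property the slides state.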

Rules of Mitchell–Merritt algorithm
• The block rule changes the labels of a process when it blocks; z = inc(u, x), where inc generates a unique label larger than both u and x
• The transmit rule changes the public label of a process waiting for a process with a larger public label
• The out-edge of the process with the largest label in a cycle satisfies the detect condition

Distributed deadlock prevention
• Cycles are prevented as follows:
  – A pair (local time, node id) is used to time-stamp creation of a process
  – When process Pi requests a resource allocated to Pj, the time-stamps of Pi and Pj are used to decide whether Pi can wait for Pj
• Two approaches
  – Wait-or-die
    * Pi is allowed to wait if it is older than Pj; otherwise, it is killed
  – Wound-or-wait
    * Pi is allowed to wait if it is younger than Pj; otherwise, Pj is killed

Distributed scheduling algorithms
• Computational load in nodes is balanced through the technique of process migration
• [Figure: Process Pi is migrated from N1 to N2 during the interval t1–t2]
• It is preemptive migration

Distributed scheduling algorithms
• Issues in distributed scheduling
  – Kinds of process migration
    * Preemptive migration requires transfer of state, which is hard to implement
    * Non-preemptive migration is performed while creating a process, which avoids the need to transfer state
  – Identifying nodes for process migration by quantifying ‘load’
    * Heavily loaded nodes become sender nodes, lightly loaded nodes become receiver nodes
    * Use CPU utilization as the criterion: causes overhead
    * Use length of ready queue: easier to implement
  – Stability of a scheduling algorithm: An algorithm is unstable if, under some conditions, its overhead is unbounded
    * Excessive shuffling of processes between nodes causes instability

Distributed scheduling algorithms
• Three kinds of distributed scheduling algorithms
  – Sender initiated algorithms
    * Thresholds on load are used to identify senders and receivers
    * A sender node migrates a process non-preemptively at its creation
      » The sender node polls other nodes to identify a receiver node
      » Instability at high load: prevent by limiting the amount of polling
  – Receiver initiated algorithms
    * When a process completes, the node checks whether it has become a receiver and migrates a process preemptively to itself
    * No instability
      » At high load, senders are easy to find
      » At low load, CPU time is available to support high polling cost

Distributed scheduling algorithms
• Three kinds of distributed scheduling algorithms (contd)
  – Symmetrically initiated algorithms
    * Have features of both sender and receiver initiated algorithms
      » Behave like a sender initiated algorithm at low loads
      » Behave like a receiver initiated algorithm at high loads
  – Outline of a symmetrically initiated algorithm
    * Each node maintains lists of senders, receivers and ok nodes
    * A sender node polls nodes in the receivers list
      » If the node is a receiver node, a process is migrated
      » If the node is not a receiver, it is put into the appropriate list
    * Analogously, a receiver node polls nodes in the senders list

[Figure: Performance of distributed scheduling algorithms]
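
The sender initiated policy described in the scheduling slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions: load is taken to be the ready-queue length, a single threshold separates senders from receivers, nodes are polled in dictionary order, and the names `find_receiver`, `place_new_process`, `threshold` and `poll_limit` are hypothetical. The poll limit is the device the slides mention for bounding overhead at high load.

```python
def find_receiver(loads, sender, threshold, poll_limit):
    """Poll at most poll_limit other nodes; return the first receiver or None."""
    polled = 0
    for node, load in loads.items():
        if node == sender:
            continue
        if polled == poll_limit:
            break                      # bounded polling keeps overhead finite
        polled += 1
        if load < threshold:           # lightly loaded node: a receiver
            return node
    return None


def place_new_process(loads, origin, threshold=3, poll_limit=2):
    """Choose the node for a process created at origin and update its load.

    Migration is non-preemptive: the decision is taken at creation time,
    so no process state has to be transferred.
    """
    target = origin
    if loads[origin] >= threshold:     # origin is a sender
        receiver = find_receiver(loads, origin, threshold, poll_limit)
        if receiver is not None:
            target = receiver
    loads[target] += 1
    return target


loads = {"N1": 4, "N2": 5, "N3": 1}    # ready-queue lengths (made-up values)
print(place_new_process(loads, "N1"))                # polls N2, then N3
print(place_new_process(loads, "N1", poll_limit=1))  # polls N2 only, stays put
```

With a poll limit of 1 the sender gives up after one unsuccessful poll and keeps the process locally, which trades balance for bounded overhead.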

Distributed termination detection
• Processes of a distributed computation execute in different nodes of a distributed system
  – These processes perform work assigned to them
    * A process is active when it is performing work, and passive when it has no work
    * Work is assigned to a process through a message
      » Hence it may become active on receiving a message
  – The distributed termination condition (DTC) detects whether such a computation has completed. It consists of two parts
    * All processes of the computation are passive
    * No basic messages are in transit

Distributed termination detection
• A diffusion computation-based algorithm
  – The following assumptions are made
    * Processes are not created or destroyed dynamically during operation of the algorithm
    * Interprocess communication channels are FIFO
    * Processes communicate through synchronous communication, i.e., the process sending a message is blocked until a response is received

Distributed termination detection: A diffusion computation-based algorithm
  1. When a process becomes passive
     (a) It sends “shall I declare termination?” messages on all edges
     (b) After receiving replies to all messages: It declares termination if all replies are “yes”
  2. When a process receives an engaging query
     (a) It sends queries on all edges, except the one along which it received the engaging query
     (b) After receiving replies to all messages: It sends a “yes” reply to the process from which it received the engaging query if all received replies are “yes”; otherwise, it sends a “no” reply
  3. When a process receives a non-engaging query
     (a) It sends a “yes” reply
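
The diffusion computation-based query can be simulated sequentially to see how “yes” and “no” replies propagate. The sketch below is a simplification: it assumes that an active process answers an engaging query with “no” (the slides do not spell out this case), it treats the first query to reach a process as the engaging one, and it ignores basic messages in transit on the strength of the synchronous-communication assumption. The names `query` and `declares_termination` are hypothetical.

```python
def query(graph, active, node, parent, engaged):
    """Return True for a "yes" reply from node, False for a "no" reply."""
    if node in engaged:
        return True                    # non-engaging query: reply "yes"
    engaged.add(node)                  # this is the engaging query for node
    if node in active:
        return False                   # assumption: an active process says "no"
    # Send queries on all edges except the one the engaging query arrived on,
    # and wait for every reply before answering.
    replies = [query(graph, active, nb, node, engaged)
               for nb in graph[node] if nb != parent]
    return all(replies)


def declares_termination(graph, active, initiator):
    """Run the query started by a process that has just become passive."""
    return query(graph, active, initiator, None, set())


# Three fully connected processes (made-up example).
graph = {"P1": ["P2", "P3"], "P2": ["P1", "P3"], "P3": ["P1", "P2"]}
print(declares_termination(graph, active=set(), initiator="P1"))
print(declares_termination(graph, active={"P3"}, initiator="P1"))
```

When P3 is still active, its “no” reaches P1 through P2, so P1 does not declare termination even though the later, non-engaging query to P3 is answered “yes”.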

Election algorithms
• Algorithms for a ring topology
  – Algorithm 1:
    1. Process Pi initiates by sending an (“elect me”, Pi) message
    2. Every process Pj receiving an (“elect me”, Pi) message sends an (“elect me”, Pj) message and then forwards Pi’s message
    3. Pi receives back its own message after receiving the message of every other process; it elects the highest priority process as leader
       (a) It sends a “new coordinator” message to inform others
  – Algorithm 2: Refinement of Algorithm 1
    * In Step 2, Pj sends only one message: its own message if its priority is higher than Pi’s; otherwise, it sends Pi’s message
    * Only the highest priority process would get back its own message
    * Requires 3n – 1 messages instead of n² messages

Election algorithm
• Bully algorithm
  1. Initiator Pi sends (“elect me”, Pi) messages to all higher priority processes and starts a time-out interval T1
     (a) If a time-out occurs, it sends a “new coordinator” message to lower priority processes
     (b) If it receives a “don’t you dare” message from a higher priority process Pj, it starts another time-out interval T2
         (i) If a time-out occurs, it assumes that all higher priority processes have failed and starts a new election
  2. If a process Pj receives an “elect me” message from a lower priority process
     (a) It sends a “don’t you dare” message to its sender
     (b) It starts a new election by sending (“elect me”, Pj) messages

Resource allocation in a distributed system
  1. Pi requests the resource allocator for a specific resource
  2. The resource allocator consults the name server and finds the id of the resource
  3. The resource allocator informs the requester and the resource manager of the resource
  4. The requester accesses the resource
  5. At the end, the resource is released

Process migration
• Process migration is performed for load balancing
  – Difficulties
    * Process state is distributed in various data structures of the OS
    * Process ids may change due to migration
      » Process ids are used in interprocess communication
      » Solution: Use global process ids as in Sun cluster
    * Delivery of messages requires a special provision
      » A node receiving a message would redirect it if the destination process has migrated out of it; this residual state causes poor reliability
      » Alternatively, all processes may be informed when a process is migrated; this requires a complex protocol

[Figure: Raymond’s token-based algorithm. (a) A system (b) Abstract inverted tree for the system: P5 holds the token...]
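
The message redirection scheme mentioned under process migration, and the reliability problem of its residual state, can be illustrated with a small Python sketch. It assumes global process ids (so "P7" keeps its id across nodes) and a per-node forwarding table; the helpers `make_nodes`, `migrate` and `route` and the node names are hypothetical.

```python
def make_nodes(names):
    """Each node tracks its resident processes and its forwarding pointers."""
    return {name: {"resident": set(), "forward": {}} for name in names}


def migrate(nodes, pid, src, dst):
    """Move pid from src to dst, leaving a forwarding pointer behind at src."""
    nodes[src]["resident"].remove(pid)
    nodes[src]["forward"][pid] = dst   # residual state kept by the old node
    nodes[dst]["resident"].add(pid)
    nodes[dst]["forward"].pop(pid, None)


def route(nodes, pid, first_node):
    """Return the list of nodes a message visits before reaching pid."""
    path = [first_node]
    while pid not in nodes[path[-1]]["resident"]:
        nxt = nodes[path[-1]]["forward"].get(pid)
        if nxt is None or nxt in path:
            raise LookupError(f"{pid} unreachable from {first_node}")
        path.append(nxt)
    return path


nodes = make_nodes(["N1", "N2", "N3"])
nodes["N1"]["resident"].add("P7")
migrate(nodes, "P7", "N1", "N2")
migrate(nodes, "P7", "N2", "N3")
print(route(nodes, "P7", "N1"))        # message is redirected twice

nodes["N2"]["forward"].clear()         # N2 loses its residual state
try:
    route(nodes, "P7", "N1")
except LookupError as err:
    print(err)                         # delivery now fails
```

Every earlier host stays on the delivery path, so losing the forwarding state of any one of them makes the process unreachable for senders that still address the original node.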