The plane that healed itself

Armed with all of this knowledge about life cycles, supervision, and DeathWatch, it’s time to make our plane into a mythical super machine that can heal itself when things go wrong. To do this, we’ll reorganize things a little and supply a bit of self-typing to allow our plane to be constructed in a more flexible manner. This will become more important as we drive toward testing what we’ve written.

The goal

In Chapter 7, we learned about the hierarchical path structure of the actor system and how we can use the actorFor methods to look up actors within that system; we even used that mechanism for finding various actors in our plane. As we move forward, we’re also moving further into the real world of actor programming with Akka, and in the real world, things change. That’s not just a metaphorical statement: we’re actually going to change the structure of our plane, and in doing so, we’re going to break some of that hierarchy that we set up before.

Figure 8.7 · The structure of the plane we’re heading toward. [Diagram: the /user guardian supervises the Plane; the Plane supervises the Lead Flight Attendant (which supervises Flight Attendants 1 through 8), a Resume Supervisor (Altimeter, Control Surfaces, AutoPilot), and a Stop Supervisor (Pilot, CoPilot). Every supervising node uses a OneForOneStrategy, and the two dedicated supervisors isolate their children from their own restarts.]

Figure 8.7 shows the structure that we’re driving toward with our plane at this stage. Contrast this with Figure 7.3, which was rather shallow in nature.

With this new layout, we recognize that different actors require different types of supervision and we facilitate that by inserting dedicated supervisory nodes into the structure. We do this because, as we know, any given actor can only implement a single supervision strategy and the power of supervision comes from simplicity at any given level. If we have one node in the tree doing too much, we’ll put our system’s resiliency at risk. Our goal is to define small pockets of reason in the hierarchy that react well under failure.

Our supervision strategy for the plane as a whole will be:

• The plane itself will be supervised by the user guardian, which employs a OneForOneStrategy that restarts children (i.e., our plane) when a failure occurs. However, the plane isn’t going to have a problem that will cause its supervisor to restart it. We’ll talk about this later.

• The LeadFlightAttendant will be supervised by the plane’s OneForOneStrategy, which will restart the LeadFlightAttendant on failure. The LeadFlightAttendant will, in turn, stop all of its children (i.e., the FlightAttendants) and recreate them.

• We’ll be giving the instruments and controls a supervisor all their own. It’ll be a dedicated supervisor, providing no value to the behaviour of the plane other than taking care of its kids, letting them resume operation on failure.

• The pilots will be different than the rest. When a pilot dies, he’s dead. To facilitate that, we’ll create a dedicated “stop” supervisor, just like the “resume” supervisor we made for the instruments and controls.

The stop and resume supervisors will be part of the structure of our application, but they should also stay out of the way. The plane will get a couple of helper functions that it can use to look up its grandchildren. And the supervisors will ensure that, should they be restarted, their children stay alive.

An IsolatedLifeCycleSupervisor

The IsolatedLifeCycleSupervisor will provide us with a supervisor’s base functionality that lets children survive the supervisor’s own restarts, as well as provide some extra plumbing to make that happen.

import akka.actor.Actor

object IsolatedLifeCycleSupervisor {
  // Messages we use in case we want people to be
  // able to wait for us to finish starting
  case object WaitForStart
  case object Started
}

trait IsolatedLifeCycleSupervisor extends Actor {
  import IsolatedLifeCycleSupervisor._

  def receive = {
    // Signify that we've started
    case WaitForStart =>
      sender ! Started
    // We don't handle anything else, but we give a decent
    // error message stating the error
    case m =>
      throw new Exception(
        s"Don't call ${self.path.name} directly ($m).")
  }

  // To be implemented by subclass
  def childStarter(): Unit

  // Only start the children when we're started
  final override def preStart() { childStarter() }

  // Don't call preStart(), which would be the
  // default behaviour
  final override def postRestart(reason: Throwable) { }

  // Don't stop the children, which would be the
  // default behaviour
  final override def preRestart(reason: Throwable,
                                message: Option[Any]) { }
}

There are some things to note here:

• The receive method is implemented for us. If any message comes into our supervisor (other than a request to be told when it’s finished starting), then we’ll throw an exception. The supervisor is fault-tolerance plumbing and there’s no reason for anyone to talk to it. We assume that if someone does talk to it directly, then that’s a bug.

• We’ve ensured that most existing life cycle methods cannot be altered by derived types. This ensures that our isolation remains intact. The job of creating children is left to the derivations, but outside of preStart so that the isolator can control the life cycles on its own.

• The children of the isolated supervisor can still be restarted since they would be the survivors of the supervisor’s restart life cycle, and we know that survivors experience a restart. However, because they’ve survived, they haven’t died and their ActorRef remains unchanged.

– In fact, assuming the supervisor’s parent restarts its children, all you’d have to do to get the supervisor to experience a restart would be to send it any message other than WaitForStart. We aren’t going to do that.

Given the IsolatedLifeCycleSupervisor, we have a class from which we can derive simple supervisors tailored for the specific needs of the actors within our plane.

Creating the supervisors

We’ll go pretty deep here and create some types that will really help us generate supervisors quickly and easily. By defining some traits and some abstract classes, we can simplify instantiation of the supervisors a fair bit, thus making our code easier to read and understand.

import akka.actor.{AllForOneStrategy, OneForOneStrategy}
import akka.actor.SupervisorStrategy
import akka.actor.SupervisorStrategy.Decider
import scala.concurrent.duration.Duration

trait SupervisionStrategyFactory {
  def makeStrategy(maxNrRetries: Int,
    withinTimeRange: Duration)(decider: Decider): SupervisorStrategy
}

trait OneForOneStrategyFactory extends SupervisionStrategyFactory {
  def makeStrategy(maxNrRetries: Int,
    withinTimeRange: Duration)(decider: Decider): SupervisorStrategy =
      OneForOneStrategy(maxNrRetries, withinTimeRange)(decider)
}

trait AllForOneStrategyFactory extends SupervisionStrategyFactory {
  def makeStrategy(maxNrRetries: Int,
    withinTimeRange: Duration)(decider: Decider): SupervisorStrategy =
      AllForOneStrategy(maxNrRetries, withinTimeRange)(decider)
}

Above we have a factory declaration and two definitions that we can use to abstract away the concrete concepts of the OneForOneStrategy and the AllForOneStrategy, respectively. We will be mixing one of these factories into our supervisors as needed.
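To see the mix-in mechanics in isolation, here is a minimal sketch in plain Scala with no Akka involved. Every name in it (Strategy, OneForOne, AllForOne, StrategyFactory, ToySupervisor) is a toy stand-in invented for illustration, not part of Akka’s API. It shows how a self-type forces whoever instantiates the class to choose a factory.

```scala
// Toy stand-ins for Akka's strategy types (assumptions, not Akka's API).
sealed trait Strategy
case class OneForOne(maxRetries: Int) extends Strategy
case class AllForOne(maxRetries: Int) extends Strategy

trait StrategyFactory {
  def makeStrategy(maxRetries: Int): Strategy
}

trait OneForOneFactory extends StrategyFactory {
  def makeStrategy(maxRetries: Int): Strategy = OneForOne(maxRetries)
}

trait AllForOneFactory extends StrategyFactory {
  def makeStrategy(maxRetries: Int): Strategy = AllForOne(maxRetries)
}

// The self-type says: "whoever instantiates me must also mix in
// a StrategyFactory". The class itself never picks a concrete one.
abstract class ToySupervisor(maxRetries: Int = -1) { this: StrategyFactory =>
  val strategy: Strategy = makeStrategy(maxRetries)
}

object FactoryDemo {
  val one = new ToySupervisor() with OneForOneFactory
  val all = new ToySupervisor(3) with AllForOneFactory

  def main(args: Array[String]): Unit = {
    println(one.strategy) // OneForOne(-1)
    println(all.strategy) // AllForOne(3)
  }
}
```

Trying to write new ToySupervisor() {} without a factory mixed in fails to compile, which is exactly the guarantee we want from the self-type.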

Given these factories, we can now define the specific instances of the IsolatedLifeCycleSupervisor that we declared earlier:

abstract class IsolatedResumeSupervisor(
    maxNrRetries: Int = -1,
    withinTimeRange: Duration = Duration.Inf)
  extends IsolatedLifeCycleSupervisor {
  this: SupervisionStrategyFactory =>

  override val supervisorStrategy = makeStrategy(
    maxNrRetries, withinTimeRange) {
      case _: ActorInitializationException => Stop
      case _: ActorKilledException => Stop
      case _: Exception => Resume
      case _ => Escalate
    }
}

The resume supervisor declares the Decider to be one that resumes operation of the children in the case of any exception that is not ActorInitializationException or ActorKilledException. It also does not know what type of strategy will be in place, delegating this to a future trait mixin.

abstract class IsolatedStopSupervisor(
    maxNrRetries: Int = -1,
    withinTimeRange: Duration = Duration.Inf)
  extends IsolatedLifeCycleSupervisor {
  this: SupervisionStrategyFactory =>

  override val supervisorStrategy = makeStrategy(
    maxNrRetries, withinTimeRange) {
      case _: ActorInitializationException => Stop
      case _: ActorKilledException => Stop
      case _: Exception => Stop
      case _ => Escalate
    }
}

The stop supervisor is nearly identical to the resume supervisor, with the obvious difference that it stops children instead of resuming them.
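Both deciders depend on the order of their cases. In Akka, a Decider is a PartialFunction[Throwable, Directive], so the first matching case wins. The sketch below models that in plain Scala; the Directive objects and KilledException are toy stand-ins invented for illustration, so that it runs without Akka on the classpath.

```scala
// Toy directives standing in for Akka's Stop/Resume/Escalate (assumption).
sealed trait Directive
case object Stop extends Directive
case object Resume extends Directive
case object Escalate extends Directive

// Hypothetical failure type playing the role of ActorKilledException.
class KilledException(msg: String) extends RuntimeException(msg)

object DeciderDemo {
  type Decider = PartialFunction[Throwable, Directive]

  // Cases are tried top to bottom, so the specific exception
  // must come before the catch-all `Exception` case.
  val resumeDecider: Decider = {
    case _: KilledException => Stop
    case _: Exception       => Resume
    case _                  => Escalate
  }

  def main(args: Array[String]): Unit = {
    println(resumeDecider(new KilledException("poison")))     // Stop
    println(resumeDecider(new IllegalStateException("oops"))) // Resume
    println(resumeDecider(new Error("fatal")))                // Escalate
  }
}
```

If the Exception case were listed first, KilledException would never reach its own case and would be resumed instead of stopped.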

Now that we have these helpers in place, we can start refactoring our plane’s structure to fit the goal.

Our healing plane

Let’s break this down into pieces.

Construction components

The actor program’s hierarchical structure is a constant presence in your design. It’s a powerful structure, to be sure, but it’s also an imposing structure that can limit your options, if you use it inappropriately. We want to take advantage of the hierarchy where it can help us, but we also want to ensure that we don’t get so wrapped up in it that refactoring becomes more of a challenge than we’d like down the road. This will be a recurring theme as we move forward.

To begin, we need to apply a simple refactoring that allows us to provide construction of plane members using a variant of the cake pattern. This will allow us to escape the confines of the hierarchy that would otherwise be hard-coded.

trait PilotProvider {
  def newPilot: Actor = new Pilot
  def newCoPilot: Actor = new CoPilot
  def newAutopilot: Actor = new AutoPilot
}

Here, we see the trait that will provide us with different types of pilots when we need them. This trait can be overridden with derivations used in testing, or other cases, which keeps our design flexible. The plane can now take advantage of this provider, along with similar refactorings that we’ve applied to the LeadFlightAttendant and the altimeter:

class Plane extends Actor with ActorLogging {
  this: AltimeterProvider
        with PilotProvider
        with LeadFlightAttendantProvider =>

The refactorings to be applied to the LeadFlightAttendant and altimeter are left as exercises for you, should you choose to accept them.
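To make the testing benefit concrete, here is a small sketch of the same provider idea in plain Scala. The names (Crew, RealPilot, FakePilot, CrewProvider, ToyPlane) are hypothetical and the crew members are ordinary objects rather than actors, so treat it as an illustration of the pattern rather than the plane’s actual code.

```scala
// Toy crew types; in the plane these would be actors (assumption).
trait Crew { def name: String }
class RealPilot extends Crew { val name = "Pilot" }
class FakePilot extends Crew { val name = "FakePilot" }

// The provider owns construction, so callers never say `new RealPilot`.
trait CrewProvider {
  def newPilot: Crew = new RealPilot
}

// The self-type forces a CrewProvider to be mixed in at construction.
class ToyPlane { this: CrewProvider =>
  val pilot: Crew = newPilot
}

// A test-only provider that swaps in a different pilot.
trait FakeCrewProvider extends CrewProvider {
  override def newPilot: Crew = new FakePilot
}

object CakeDemo {
  val production = new ToyPlane with CrewProvider
  val underTest  = new ToyPlane with FakeCrewProvider

  def main(args: Array[String]): Unit = {
    println(production.pilot.name) // Pilot
    println(underTest.pilot.name)  // FakePilot
  }
}
```

ToyPlane never changes between the two instantiations; only the mixed-in provider does, which is what lets a test substitute its own crew.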

Building the basic hierarchy

Now that we’ve decoupled the construction of the plane’s elements from the elements themselves, we can revisit the creation of them and the insertion of supervisors into the hierarchy.

// There's going to be a couple of asks below and
// a timeout is necessary for that.
implicit val askTimeout = Timeout(1.second)

def startEquipment() {
  val controls = context.actorOf(
    Props(new IsolatedResumeSupervisor
          with OneForOneStrategyFactory {
      def childStarter() {
        val alt = context.actorOf(
          Props(newAltimeter), "Altimeter")
        // These children get implicitly added to the
        // hierarchy
        context.actorOf(Props(newAutopilot), "AutoPilot")
        context.actorOf(Props(new ControlSurfaces(alt)),
                        "ControlSurfaces")
      }
    }), "Equipment")
  Await.result(controls ? WaitForStart, 1.second)
}

The startEquipment method is in charge of creating the IsolatedResumeSupervisor, mixing in the OneForOneStrategyFactory and then defining the required childStarter method, demanded by the IsolatedLifeCycleSupervisor we created at the start. Once this method completes, we have the layout depicted in Figure 8.8.

Figure 8.8 · After startEquipment completes we have this sub-hierarchy. [Diagram: the Plane restarts its children; beneath it, the Resume Supervisor resumes its children (Altimeter, Control Surfaces, AutoPilot), which never restart, and keeps them alive after any restarts imposed by its supervisor (i.e., the Plane).]

And now the method that starts our people:

def startPeople() {
  val people = context.actorOf(
    Props(new IsolatedStopSupervisor
          with OneForOneStrategyFactory {
      def childStarter() {
        // These children get implicitly added to
        // the hierarchy
        context.actorOf(Props(newPilot), pilotName)
        context.actorOf(Props(newCoPilot), copilotName)
      }
    }), "Pilots")
  // Use the default strategy here, which
  // restarts indefinitely
  context.actorOf(Props(newFlightAttendant), attendantName)
  Await.result(people ? WaitForStart, 1.second)
}

Note the slight difference here: we’re starting the IsolatedStopSupervisor and adding the pilots to it, but we want the LeadFlightAttendant supervised directly by the plane, so we simply add it to the plane’s list of direct children as we had before. Once this method completes, we have the layout depicted in Figure 8.9.

Why the Await.result when starting supervisors?

“Doesn’t Await.result block the current thread, and isn’t blocking bad?” Saw that one, did ya? OK, blocking a thread is bad, and normally we shouldn’t be seeing calls to Await in our code, but we’ve also got to be pragmatic.

Akka is a concurrency toolkit, and starting actors is an asynchronous process. Our isolating supervisors are the actors we’re starting, and Akka guarantees that the ActorRef you get back is entirely valid and ready for business, but it doesn’t guarantee that the actor is completely started; starting it is an asynchronous process.

But remember that our goal isn’t to start the isolating supervisor, it’s to start the isolating supervisor’s children. One of the first things the plane is going to want to do is talk to these children, and they are not guaranteed to even exist by the time the plane wants to do that. In order to give the plane the deterministic startup that it would have gotten without the isolating supervisors, we add the call to Await.

Now, we could have come up with a wild asynchronous mechanism to ensure that we don’t need the call to Await, but that would be incredibly complicated. We’ve solved the problem with one line of code and it only happens when the plane starts (i.e., it’s an exceedingly rare event).

Yes, blocking can be bad but it definitely has its uses. Use it wisely and your code will thank you for it.
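The blocking call itself is easy to try outside of Akka. The following sketch uses only scala.concurrent; startAsync and the local Started object are hypothetical stand-ins for the supervisor’s asynchronous startup and its reply, and the 50 millisecond sleep merely simulates the work of creating children.

```scala
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object AwaitDemo {
  // Stand-in for the supervisor's "I have started" reply (assumption).
  case object Started

  // Simulates an asynchronous startup that signals when it is done.
  def startAsync(): Future[Started.type] = {
    val ready = Promise[Started.type]()
    Future {
      Thread.sleep(50)       // pretend to create the children
      ready.success(Started) // tell the waiter that we are done
    }
    ready.future
  }

  def main(args: Array[String]): Unit = {
    // Blocks the calling thread for at most one second; a
    // java.util.concurrent.TimeoutException is thrown if the
    // reply does not arrive in time.
    val result = Await.result(startAsync(), 1.second)
    println(result) // Started
  }
}
```

The timeout is the safety net: if startup never completes, the caller fails loudly after one second instead of hanging forever.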

There! You now have the hierarchy you’re interested in, which provides us with a level of resiliency on which we can build.

But... there’s a problem.

Things change

The introduction of our supervision nodes has broken some stuff. Look at what the pilot does when he receives the ReadyToGo message from the plane:

case ReadyToGo =>
  context.parent ! Plane.GiveMeControl
  copilot = context.actorFor("../" + copilotName)
  autopilot = context.actorFor("../AutoPilot")

Figure 8.9 · After startPeople completes we have this sub-hierarchy. [Diagram: the Plane restarts its children; the Lead Flight Attendant (with Flight Attendants 1 through 8 beneath it) restarts indefinitely; the Stop Supervisor stops its children (Pilot, CoPilot) on failure and keeps them alive after restarts imposed by its supervisor (i.e., the Plane).]

There’s a fair bit there that depended on the original structure of the actor hierarchy, before we started inserting things and moving them around, and most of that won’t work now. Who’s the parent of the pilot? Of course, it’s the IsolatedStopSupervisor, not the plane, and if we were to send it the Plane.GiveMeControl message, it would throw an exception.

This is where the lack of typing on the actor comes to bite us. The context.parent and context.actorFor return ActorRefs, not Planes or CoPilots or any other specific class that we might define, so the compiler won’t be able to catch this misstep. This is still a good thing in general, but in situations like these we need to be vigilant about our structure.

Tip

An actor hierarchy is just like an organizational hierarchy in a large corporation; don’t depend on it looking the same on Friday as it did on Wednesday. If your actors heavily depend on their hierarchical structure, then you’re going to find a world of pain when you want to refactor that structure.

That’s not meant to be a hard-and-fast rule, but merely a guideline. Actor programming, just like any other kind of programming with a flexible toolkit such as Akka, presents you with many varied situations. You may find that relying on the actor’s hierarchical structure is the most amazing choice imaginable in some cases, and in other cases, it paints you into a corner. However, as a general rule, we’ll find that keeping the hierarchy invisible leads to code that is more resilient in the face of change than it would be otherwise.

Note

It’s more reasonable for an actor to depend on the structure beneath it, since it is more influential in that structure than it is in the structure above it.

The structure above any actor is completely outside of its control and it’s more brittle for that actor to make any assumptions about those ancestors. The structure below is something that it can generally depend on much better, as is the case with our plane. The plane instantiates the IsolatedStopSupervisor and also specifies its direct children. While the children of any given actor could conceivably alter that structure in a way that exposes brittleness in the actors above it, this is much less likely and much more in your control as the coder of the actor in question.

Bypassing the structure

To reduce any ripple effect of changes to the actor system’s structure, we can employ a couple of facilities. The first is a comfortable friend, the dependency injector, and the other simply falls into the category of message passing.

Dependency injection works well when we have one-way dependencies and a uniform method of construction. For example, it’s easy to pass the notion of the “parent” to a “child” on that child’s construction. It doesn’t work so well when you’ve got a circular dependency between actors or a construction pattern that simply doesn’t lend itself well to a deterministic assignment (for example, it’s the actor itself that decides what it needs, not someone on the outside).

With message passing, we can do things such as:

• Ask a known actor to do something for us, which it will delegate to an actor that it already knows, such as the plane delegating to the LeadFlightAttendant. Essentially, we present the plane as a facade on top of the LeadFlightAttendant and hide the LeadFlightAttendant from the world.

• Request a reference to an existing actor, such as asking the plane for the ControlSurfaces.

• Get a reference to an actor without asking. We don’t have an example of this at the moment, so you might imagine the connection of a client to your application via WebSocket where that WebSocket is represented by an actor. Thus, your actor may receive a WebSocketConnected(webSocketActor) message indicating the event has occurred.

Injecting into the pilot

We can specify four elements of our plane nicely within class Pilot’s constructor:

class Pilot(plane: ActorRef,
            autopilot: ActorRef,
            var controls: ActorRef,
            altimeter: ActorRef) extends Actor {

This alteration to the Pilot constructor is simple dependency injection. Dependency injection, a very common idiom, is easy to do, read, and understand. Use it when you can.

• Passing in the plane eliminates the need to access context.parent directly, which frees up the plane to impose any intermediaries between itself and the pilot that it sees fit.

• The altimeter and the autopilot won’t change during the plane’s life cycle and can thus be passed in as well.

• We keep the ControlSurfaces as a var, since the pilot could give up control of the plane at a later time, but it’s the pilot that’s in charge of the plane initially, so we let the plane give him control at startup.

Given this, we can now make our changes to the plane, which will now construct the pilot correctly:

// Helps us look up Actors within the "Equipment" Supervisor
def actorForControls(name: String) =
  context.actorFor("Equipment/" + name)

def startPeople() {
  val plane = self
  // Note how we depend on the Actor structure beneath
  // us here by using actorFor(). This should be
  // resilient to change, since we'll probably be the
  // ones making the changes
  val controls = actorForControls("ControlSurfaces")
  val autopilot = actorForControls("AutoPilot")
  val altimeter = actorForControls("Altimeter")
  val people = context.actorOf(
    Props(new IsolatedStopSupervisor
          with OneForOneStrategyFactory {
      def childStarter() {
        // These children get implicitly added
        // to the hierarchy
        context.actorOf(
          Props(newCoPilot(plane, autopilot, altimeter)),
          copilotName)
        context.actorOf(
          Props(newPilot(plane, autopilot,
                         controls, altimeter)),
          pilotName)
      }
    }), "Pilots")
  // Use the default strategy here, which
  // restarts indefinitely
  context.actorOf(Props(newFlightAttendant), attendantName)
  Await.result(people ? WaitForStart, 1.second)
}

Of course, this requires changes to the definition of the PilotProvider that provides the newPilot method. It no longer takes zero arguments and we need to compensate for that, but those changes are simple enough that we won’t include them here.
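For readers who want a starting point, one plausible shape for the updated provider is sketched below. It is an assumption based on the constructor arguments used in startPeople, not the definitive version: the Pilot and CoPilot bodies are empty placeholders, the autopilot factory is left out, and the sketch assumes classic Akka actors are on the classpath.

```scala
import akka.actor.{Actor, ActorRef}

// Placeholder actors: only the constructor shapes matter here.
// Their behaviour is deliberately empty (assumption).
class Pilot(plane: ActorRef,
            autopilot: ActorRef,
            var controls: ActorRef,
            altimeter: ActorRef) extends Actor {
  def receive = Actor.emptyBehavior
}

class CoPilot(plane: ActorRef,
              autopilot: ActorRef,
              altimeter: ActorRef) extends Actor {
  def receive = Actor.emptyBehavior
}

// The provider now forwards the injected references to the
// constructors instead of building the pilots with no arguments.
trait PilotProvider {
  def newPilot(plane: ActorRef,
               autopilot: ActorRef,
               controls: ActorRef,
               altimeter: ActorRef): Actor =
    new Pilot(plane, autopilot, controls, altimeter)

  def newCoPilot(plane: ActorRef,
                 autopilot: ActorRef,
                 altimeter: ActorRef): Actor =
    new CoPilot(plane, autopilot, altimeter)
}
```

As before, these factory methods are only meant to be called inside Props(...), since Akka does not allow an actor to be instantiated with new outside of actor creation.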

Great! Now our pilot is isolated from the plane’s structure, except for access to its sibling actor, the copilot, as can be seen in the code that still remains in the ReadyToGo handler:

case ReadyToGo =>

copilot = context.actorFor("../" + copilotName)
