# The Tortoise and the Hare in TLA+

Problem: Determine if a linked list contains a cycle, using O(1) space.

Robert Floyd invented an algorithm for solving this problem in the late 60s, which is called “The Tortoise and the Hare”[1].

(This is supposedly a popular question to ask in technical interviews. I’m not a fan of expecting interviewees to re-invent the algorithms of famous computer scientists on the spot).

As an exercise, I implemented the algorithm in TLA+, using PlusCal.

The algorithm itself is very simple: the hardest part was deciding how to model linked lists in TLA+. We want to model the set of all linked lists, to express that algorithm should work for any element of the set. However, the model checker can only work with finite sets. The typical approach is to do something like “the set of all linked lists which contain up to N nodes”, and then run the checker against different values of N.

What I ended up doing was generating N nodes, giving each node a successor

(a next element, which could be the terminal value NIL), and then selecting

the start of the list from the set of nodes:

CONSTANTS N

ASSUME N \in Nat

Nodes == 1..N

NIL == CHOOSE NIL : NIL \notin Nodes

start \in Nodes
succ \in [Nodes -> Nodes \union {NIL}]


(Note: I originally defined NIL as a constant, but this is incorrect, see the comment by Ron Pressler for more details).
This is not the most efficient way from a model checker point of view, because the model checker will generate nodes that are irrelevant because they aren’t in the list. However, it does generate all possible linked lists up to length N.

Superficially, this doesn’t look like the pointer-and-structure linked list you’d see in C, but it behaves the same way at a high level. It’s possible to model a memory address space and pointers in TLA+, but I chose not to do so.

In addition, the nodes of a linked list typically have an associated value, but Floyd’s algorithm doesn’t use this value, so I didn’t model it.

Here’s my implementation of the algorithm:

EXTENDS Naturals

CONSTANTS N

ASSUME N \in Nat

Nodes == 1..N

NIL == CHOOSE NIL : NIL \notin Nodes

(*
--fair algorithm TortoiseAndHare

variables
start \in Nodes,
succ \in [Nodes -> Nodes \union {NIL}],
cycle, tortoise, hare, done;
begin
h0: tortoise := start;
hare := start;
done := FALSE;
h1: while ~done do
h2: tortoise := succ[tortoise];
hare := LET hare1 == succ[hare] IN
IF hare1 \in DOMAIN succ THEN succ[hare1] ELSE NIL;
h3: if tortoise = NIL \/ hare = NIL then
cycle := FALSE;
done := TRUE;
elsif tortoise = hare then
cycle := TRUE;
done := TRUE;
end if;
end while;

end algorithm
*)

I wanted to use the model checker to verify the the implementation was correct:

PartialCorrectness == pc="Done" => (cycle <=> HasCycle(start))


(See the Correctness wikipedia page for why this is called “partial correctness”).

To check correctness, I needed to implement my HasCycle operator (without resorting to Floyd’s algorithm). I used the transitive closure of the successor function for this, which is called TC here. If the transitive closure contains the pair (node, NIL), then the list that starts with node does not contain a cycle:

HasCycle(node) == LET R == {<<s, t>> \in Nodes \X (Nodes \union {NIL}): succ[s] = t }
IN <<node, NIL>> \notin TC(R)


The above definition is “cheating” a bit, since we’ve defined a cycle by what it isn’t. It also wouldn’t work if we allowed infinite lists, since those don’t have cycles nor do they have NIL nodes.

Here’s a better definition: a linked list that starts with node contains a cycle if there exists some node n that is reachable from node and from itself.

HasCycle2(node) ==
LET R == {<<s, t>> \in Nodes \X (Nodes \union {NIL}): succ[s] = t }
IN \E n \in Nodes : /\ <<node, n>> \in TC(R)
/\ <<n, n>> \in TC(R)


To implement the transitive closure in TLA+, I used an existing implementation

from the TLA+ repository itself:

TC(R) ==
LET Support(X) == {r[1] : r \in X} \cup {r[2] : r \in X}
S == Support(R)
RECURSIVE TCR(_)
TCR(T) == IF T = {}
THEN R
ELSE LET r == CHOOSE s \in T : TRUE
RR == TCR(T \ {r})
IN  RR \cup {<<s, t>> \in S \X S :
<<s, r>> \in RR /\ <<r, t>> \in RR}
IN  TCR(S)


The full model is the lorin/tla-tortoise-hare repo on GitHub.

1. Thanks to Reginald Braithwaite for the reference in his excellent book Javascript Allongé.  ↩

# Death of a pledge as systems failure

Caitlin Flanagan has written a fantastic and disturbing piece for the Atlantic entitled Death at a Penn State Fraternity.

This line really jumped out at me:

Fraternities do have a zero-tolerance policy regarding hazing. And that’s probably one of the reasons Tim Piazza is dead.

The official policy of the fraternities is that hazing is forbidden. Because this is the official policy, it is the individuals in a particular frat house that are held responsible if hazing happens, not the national fraternity organization.

This policy has had the effect of insulating the organizations from being liable, but it hasn’t stopped hazing from being widespread: according to Flanagan, 80% of fraternity members report being hazed.

Because individual fraternity members are the ones that are on the hook if something goes wrong during hazing, reporting an injury carries risk, which means the member must make a decision involving a tradeoff. In the case documented above, that tradeoff led to a nineteen year old dying of his injuries.

This example really reinforces ideas around systems thinking: the introduction of the zero-tolerance policy did not have the intended effect. Because the culture of hazing remains, the policy ended up making things worse.

# Antics, Drift and Chaos

Here’s the talk I gave at Strange Loop 2017.

# Assumption of rationality

Matthew Reed wrote a post about Lisa Servon’s book “The Unbanking of America”. This comment stood out for me (emphasis mine):

By treating her various sources as intelligent people responding rationally to their circumstances, rather than as helpless victims of evil predators, [Servon] was able to stitch together a pretty good argument for why people make the choices they make.

In its approach, it reminded me a little of Tressie McMillan Cottom’s “Lower Ed” or Matthew Desmond’s “Evicted.”  In their different ways, each book addresses a policy question that is usually framed in terms of smart, crafty, evil people taking advantage of clueless, ignorant, poor people, and blows up the assumption.  In no case are predators let off the hook, but the “prey” are actually (mostly) capable and intelligent people doing the best they can.  Understanding why this is the best they can do, and what would give them better options, leads to a very different set of prescriptions.

Sidney Dekker calls this perspective the local rationality principle. It assumes that people make decisions that are reasonable given the constraints that they are working within, even though from the outside those decisions appear misguided.

I find this assumption of rationality to be a useful frame for explaining individual behavior. It’s worth putting in the effort to identify why a particular decision would have seemed rational within the context in which it was made.

# A conjecture on why reliable systems fail

Even highly reliable systems go down occasionally. After having read over the details of several incidents, I’ve started to notice a pattern, which has led me to the following conjecture:

Once a system reaches a certain level of reliability, most major incidents will involve:

• A manual intervention that was intended to mitigate a minor incident, or
• Unexpected behavior of a subsystem whose primary purpose was to improve reliability

Here are three examples from Amazon’s post-mortem write-ups of major AWS outages:

The S3 outage on February 28, 2017 involved a manual intervention to debug an issue that was causing the S3 billing system to progress more slowly than expected.

The DynamoDB outage on September 20, 2015 (which also affected SQS, auto scaling, and CloudWatch) involved healthy storage servers taking themselves out of service by executing a distributed protocol that was (presumably) designed that way for fault tolerance.

The EBS outage on October 22, 2012 (which also affected EC2, RDS, and ELBs) involved a memory leak bug in an agent that monitors the health of EBS servers.

# Engineering research reveals wrongdoing

The New York Times has a story today, Inside VW’s Campaign of Trickery, about how Volkswagon conspired to hide their excessive diesel emissions from California regulators.

What was fascinating to me was that the emission violations were discovered by mechanical engineering researchers at West Virginia University, Dan Carder, Hemanth Kappanna, and Marc Besch (Kappanna and Besch were graduate students at the time).

The presence of high levels of lead in Flint, Michigan drinking water was also discovered by an engineering researcher: Marc Edwards, a civil engineering professor at Virginia Tech.

It’s a reminder that regulators alone aren’t sufficient to ensure safety, and that academic engineering research can have a real impact on society.

# Sketching on the back end

In his essay entitled, The Cognitive View: A Different Look at Software Design, Robert Glass made the following observation about how software engineers do design work:

What we see here is a mental process, a very rapid process, an iterative process, a process in fact of fast trial and error. The mind forms a solution to the problem, knowing that it will be inadequate because the mind is not yet able to fully grasp all the facets of the problem. That problem solution, the mind knows, must be in the form of a model, because it is going to be necessary to try sample input against the model, run a quick simulation (inside the mind) on the model using the sample input, and get sample outputs (still inside the mind) from that simulation.

The essence of design, then, is rapid modeling and simulation. And a key factor in design is the ability to propose solutions and allow them to fail!

Design work incorporates trial and error. We think up a potential solution, and then try to evaluate whether it’s a good fit for the problem.

But not all potential solutions are so easy to evaluate entirely inside one’s head. In fields where designers are working on physical designs or visual interfaces, they use sketches to help with early solution evaluation, sometimes called formative evaluation. Sketching user experiences is a great book on this topic.

For those of us working on the back-end, we don’t think of this early evaluation of solutions as involving “sketching”, but the metaphor fits. One of the things I find appealing about the Alloy modeling language is that it allows you to sketch out different data models and check their properties.

In a recent blog post, Michael Fogus describes working with clojure.spec as a form of sketching. He also makes the following observation about JSON.

While Lisp programmers have long know the utility of sketching their structs, I’m convinced that one of the primary reasons that JSON has taken over the world is that it provides JS-direct syntactic literals that can be easy typed up and manipulated with little fuss. JSON is a perfectly adequate tool for sketching for many programmers.

I believe UML failed because the designers did not understand the role that box-and-arrow diagrams played in software engineering design. They wanted to build a visual modeling language with well-defined semantics: we wanted to sketch.