# Lorin Hochstein

Ramblings about software, research and other things

## The Tortoise and the Hare in TLA+

Problem: Determine if a linked list contains a cycle, using O(1) space.

Robert Floyd invented an algorithm for solving this problem in the late 60s, which is called “The Tortoise and the Hare”[1].

(This is supposedly a popular question to ask in technical interviews. I’m not a fan of expecting interviewees to re-invent the algorithms of famous computer scientists on the spot).

As an exercise, I implemented the algorithm in TLA+, using PlusCal.

The algorithm itself is very simple: the hardest part was deciding how to model linked lists in TLA+. We want to model the set of all linked lists, to express that algorithm should work for any element of the set. However, the model checker can only work with finite sets. The typical approach is to do something like “the set of all linked lists which contain up to N nodes”, and then run the checker against different values of N.

What I ended up doing was generating N nodes, giving each node a successor
(a next element, which could be the terminal value NIL), and then selecting
the start of the list from the set of nodes:

CONSTANTS N, NIL

Nodes == 1..N

start \in Nodes
succ \in [Nodes -> Nodes \union {NIL}]


This is not the most efficient way from a model checker point of view, because the model checker will generate nodes that are irrelevant because they aren’t in the list. However, it does generate all possible linked lists up to length N.

Superficially, this doesn’t look like the pointer-and-structure linked list you’d see in C, but it behaves the same way at a high level. It’s possible to model a memory address space and pointers in TLA+, but I chose not to do so.

In addition, the nodes of a linked list typically have an associated value, but Floyd’s algorithm doesn’t use this value, so I didn’t model it.

Here’s my implementation of the algorithm:

EXTENDS Naturals

CONSTANTS N, NIL

Nodes == 1..N

(*
--fair algorithm TortoiseAndHare

variables
start \in Nodes,
succ \in [Nodes -> Nodes \union {NIL}],
cycle, tortoise, hare, done;
begin
h0: tortoise := start;
hare := start;
done := FALSE;
h1: while ~done do
h2: tortoise := succ[tortoise];
hare := LET hare1 == succ[hare] IN
IF hare1 \in DOMAIN succ THEN succ[hare1] ELSE NIL;
h3: if tortoise = NIL \/ hare = NIL then
cycle := FALSE;
done := TRUE;
elsif tortoise = hare then
cycle := TRUE;
done := TRUE;
end if;
end while;

end algorithm
*)

I wanted to use the model checker to verify the the implementation was correct:

PartialCorrectness == pc="Done" => (cycle <=> HasCycle(start))


(See the Correctness wikipedia page for why this is called “partial correctness”).

To check correctness, I needed to implement my HasCycle operator (without resorting to Floyd’s algorithm). I used the transitive closure of the successor function for this, which is called TC here. If the transitive closure contains the pair (node, NIL), then the list that starts with node does not contain a cycle:

HasCycle(node) == LET R == {<<s, t>> \in Nodes \X (Nodes \union {NIL}): succ[s] = t }
IN <<node, NIL>> \notin TC(R)


To implement the transitive closure in TLA+, I used an existing implementation
from the TLA+ repository itself:

TC(R) ==
LET Support(X) == {r[1] : r \in X} \cup {r[2] : r \in X}
S == Support(R)
RECURSIVE TCR(_)
TCR(T) == IF T = {}
THEN R
ELSE LET r == CHOOSE s \in T : TRUE
RR == TCR(T \ {r})
IN  RR \cup {<<s, t>> \in S \X S :
<<s, r>> \in RR /\ <<r, t>> \in RR}
IN  TCR(S)


The full model is the lorin/tla-tortoise-hare repo on GitHub.

1. Thanks to Reginald Braithwaite for the reference in his excellent book Javascript Allongé.  ↩

Written by Lorin

October 16, 2017 at 12:00 am

Posted in software

Tagged with

## Sketching on the back end

In his essay entitled, The Cognitive View: A Different Look at Software Design, Robert Glass made the following observation about how software engineers do design work:

What we see here is a mental process, a very rapid process, an iterative process, a process in fact of fast trial and error. The mind forms a solution to the problem, knowing that it will be inadequate because the mind is not yet able to fully grasp all the facets of the problem. That problem solution, the mind knows, must be in the form of a model, because it is going to be necessary to try sample input against the model, run a quick simulation (inside the mind) on the model using the sample input, and get sample outputs (still inside the mind) from that simulation.

The essence of design, then, is rapid modeling and simulation. And a key factor in design is the ability to propose solutions and allow them to fail!

Design work incorporates trial and error. We think up a potential solution, and then try to evaluate whether it’s a good fit for the problem.

But not all potential solutions are so easy to evaluate entirely inside one’s head. In fields where designers are working on physical designs or visual interfaces, they use sketches to help with early solution evaluation, sometimes called formative evaluation. Sketching user experiences is a great book on this topic.

For those of us working on the back-end, we don’t think of this early evaluation of solutions as involving “sketching”, but the metaphor fits. One of the things I find appealing about the Alloy modeling language is that it allows you to sketch out different data models and check their properties.

In a recent blog post, Michael Fogus describes working with clojure.spec as a form of sketching. He also makes the following observation about JSON.

While Lisp programmers have long know the utility of sketching their structs, I’m convinced that one of the primary reasons that JSON has taken over the world is that it provides JS-direct syntactic literals that can be easy typed up and manipulated with little fuss. JSON is a perfectly adequate tool for sketching for many programmers.

I believe UML failed because the designers did not understand the role that box-and-arrow diagrams played in software engineering design. They wanted to build a visual modeling language with well-defined semantics: we wanted to sketch.

Written by Lorin

February 20, 2017 at 2:54 am

Posted in software

## I’d rather read code than tests

But the benefits [of test-driven development] go beyond that. If you want to know how to call a certain API, there is a test that does it. If you want to know how to create a certain object, there is a test that does it. Anything you want to know about the existing system, there is a test that demonstrates it. The tests are like little design documents, little coding examples, that describe how the system works and how to use it.

Have you ever integrated a third party library into your project? You got a big manual full of nice documentation. At the end there was a thin appendix of examples. Which of the two did you read? The examples of course! That’s what the unit tests are! They are the most useful part of the documentation. They are the living examples of how to use the code. They are design documents that are hideously detailed, utterly unambiguous, so formal that they execute, and they cannot get out of sync with the production code.

TheThreeRulesOfTdd, “Uncle” Bob Martin

I sat down today with a co-worker today to discuss a pull request I had submitted earlier in the day.

“Yeah”, he replied, to which he amended “…well, I didn’t read your tests.”

I wasn’t really surprised at that, since (a) it’s the non-test code that I’m worried about, and (b) I often don’t read tests in pull requests either.

Still, I paused for a moment to discuss this with him, because I remembered Uncle Bob’s paean to unit tests as documentation for code. My colleague admitted that he didn’t read tests because reading the tests required more effort than reading the code.

On further reflection, I realized that I, too, never look at tests when I’m trying to understand an unfamiliar codebase. I just read the code directly.

What’s going on here? My theory is: unit tests are too noisy to be useful for understanding.

For APIs, yes, examples are wonderful: if I need to write code that will call into somebody else’s API, I can never get enough examples. But for most of the code that I read, I’m not trying to understand how to call into it like a library, I’m trying to understand its role in the context of a larger system.

For the code I typically work with, the unit tests require complex setup and assertion sections. This increases the cognitive load on the reader: why expend the extra effort to understand the test when I can read the source directly?

There are techniques for making tests more readable: factor your tests well, use a single assertion per test, use tools to improve readability like Rspec and Hamcrest. Still,  it always feels like it isn’t worth the effort to try and understand the tests. After all, I can always just read the code.

Written by Lorin

January 26, 2017 at 1:51 am

Posted in software

Tagged with

## How open source beat agile

How does code land in your master branch? Do your team members commit directly to master, or do you merge in pull requests?

In 2008, the repository hosting company known as GitHub was founded. GitHub quickly became the dominant player in the world of open source project hosting. GitHub’s pull request model of contributing to a project became so popular that pull request is now a generic expression.

Two years after GitHub was founded, Jez Humble and David Farley published Continuous Delivery, a book about automated deployment. One of the central ideas of continuous delivery is that all team members should commit directly and frequently to the mainline branch in the version control repository (master in Git, “trunk” in Subversion, Git’s predecessor). According to proponents of continuous delivery, working in long-lived feature branches that are occasionally merged to mainline should be avoided.

Continuous Delivery was an enormously influential book. As an example of its influence, Netflix’s deployment system, Spinnaker, bills itself as a “continuous delivery platform”. And yet, I’ve never worked on a project where the developers committed directly to mainline. The “review and merge pull request” model has dominated my professional career. The problem is that it isn’t clear how to integrate the continuous delivery approach with code review[1]. I’d wager that the majority of developers who support the idea of continuous delivery are also merging feature branches via pull request rather than committing directly to mainline.

While we pay lip service to doing continuous delivery, we’ve made the choice to go the pull request route. What’s going on here?

I think this illustrates a larger tension between the open source way of doing software development and the agile way of doing software development, where we end up speaking the language of agile but adopting the processes of open source. To understand why agile and open source would be in tension with one another, it’s instructive to examine where the agile movement came from.

Agile was born out of the world of software consulting; the substring “consult” has twelve matches on the Authors page of the Agile Manifesto. The kinds of software projects that consultants work on tend to have three properties: they are co-located, synchronous and have a well-defined customer. A good illustration of this is Extreme Programming (XP), the ur-agile process, which requires co-located synchronous development for pair programming, and having a well-defined customer or doing release planning.

By contrast, open source software projects are distributed, asynchronous, and don’t have well-defined customers. It’s natural, then, that open source processes would be different than agile ones. The continuous delivery approach of committing to mainline doesn’t make sense in an open source project, since most contributors don’t have permission to commit to mainline.

Most of the projects I’ve worked on have looked more like the agile context than the open source project, but we still do things the open source way.

When we talk about agile concepts such as continuous delivery, pair programming, and test-driven development, we refer to them as “processes”, but they’re really skills. You can’t just pick up a skill and be proficient immediately.  One way to improve a skill is the apprentice model: to observe how an expert applies that skill in context.

In open source, the entire process is visible, by definition. We can observe how projects do code reviews, integration tests, new feature planning, and so on. In the world where agile came from, we can’t. We have access to books and talks by consultants, but we don’t get to see how they applied their skills to solve specific difficulties they encountered with agile.

Consider Gherkin-based specifications. Gherkin is popular in agile, but it isn’t used in open source projects, because open source projects don’t use specifications. Without being able to observe how experts write Gherkin-based specs, it’s difficult for a novice to assess whether the approach is worth pursuing[2]. The only time I’ve ever seen Gherkin on projects is when I’ve written it myself.

Because open source succeeds in making work visible where agile fails, developers have more opportunities to become better open source developers than agile developers.

1. Yes, yes, if you do pair programming, then you are doing a kind of continuous code review that is compatible with continuous delivery. But not all teams want to do pair programming, and in that case you still only have a single reviewer.  ↩
2. The biggest disappointment in the otherwise great book Specification by Example by Gojko Adzic is the lack of examples!  ↩

Written by Lorin

January 21, 2017 at 7:42 pm

Posted in software

Tagged with

## Software development as engineering

Glenn Vanderburg argues eloquently that software development is a genuine engineering discipline.

Written by Lorin

June 14, 2016 at 12:11 am

Posted in software

## Operations engineering

Operations Engineering is the application of software engineering practices and principles to achieve and sustain operational excellence.

The quote above is from a re:Invent talk given by Josh Evans at Netflix. The phrasing appeals to me because it explicitly links operations and software engineering. I also recommend the talk if you’re interested in the topic of operations engineering at Netflix. (For context: Josh is my manager’s manager’s manager).

Written by Lorin

October 27, 2015 at 1:23 am

Posted in netflix, software