I’m a fan of the formal specification language TLA+. With TLA+, you can build models of programs or systems, which helps to reason about their behavior.

TLA+ is particularly useful for reasoning about the behavior of multithread programs and distributed systems. By requiring you to specify behavior explicitly, it forces you to think about interleaving of events that you might not otherwise have considered.

The user base of TLA+ is quite small. I think one of the reasons that TLA+ isn’t very popular is that it’s difficult to learn. I think there are at least three concepts you need for TLA+ that give new users trouble:

- The universe as a state machine
- Modeling programs and systems with math
- Mathematical syntax

## The universe as state machine

TLA+ uses a *state machine* model. It treats the universe as a collection of variables whose values vary over time.

A state machine in sense that TLA+ uses it is similar, but not exactly the same, as the *finite state machines* that software engineers are used to. In particular:

- A state machine in the TLA+ sense can have an infinite number of states.
- When software engineers think about state machines, they think about a specific object or component being implemented as a finite state machine. In TLA+,
*everything*is modeled as a state machine.

The state machine view of systems will feel familiar if you have a background in physics, because physicists use the same approach for system modeling: they define a state variable that evolves over time. If you squint, a TLA+ specification looks identical to a system of first-order differential equations, and associated boundary conditions. But, for the average software engineer, the notion of an entire system as an evolving state variable is a new way of thinking.

The state machine approach requires a set of concepts that you need to understand. In particular, you need to understand *behaviors*, which requires that you understand *states*, *steps*, and *actions*. Steps can *stutter*, and actions may or may not be *enabled*. For example, here’s the definition of “enabled” (I’m writing this from memory):

An action

aisenabledfor a statesif there exists a statetsuch thatais true for the step s→t.

It took me a *long* time to internalize these concepts to the point where I could just write that out without consulting a source. For a newcomer, who wants to get up and running as quickly as possible, each new concept that requires effort to understand decreases the likelihood of adoption.

## Modeling programs and systems with math

One of the commonalities across engineering disciplines is that they all work with mathematical models. These models are abstractions, objects that are simplified versions of the artifacts that we intend to build. That’s one of the thing that attracts me about TLA+, it’s modeling for software engineering.

A mechanical engineer is never going to confuse the finite element model they have constructed on a computer with the physical artifact that they are building. Unfortunately, we software engineers aren’t so lucky. Our models superficially resemble the artifacts we build (a TLA+ model and a computer program both look like source code!). But models aren’t programs: a model is a completely different beast, and that trips people up.

Here’s a metaphor: You can think of writing a program as akin to painting, in that both are additive work: You start with nothing and you do work by adding content (statements to your program, paint to a canvas).

The simplest program, equivalent to an empty canvas, is one that doesn’t do anything at all. On Unix systems, there’s a program called `true` which does nothing but terminate successfully. You can implement this in shell as an empty file. (Incidentally, AT&T has copyrighted this implementation).

By contrast, when you implement a model, you do the work by adding constraints on the behavior of the state variables. It’s more like sculpting, where you start with everything, and then you chip away at it until you end up with what you want.

The simplest model, the one with no constraints at all, allows *all possible behaviors*. Where the simplest computer program does nothing, the simplest model does (really *allows*) **everything.** The work of modeling is adding constraints to the possible behaviors such that the model only describes the behaviors we are interested in.

When we write ordinary programs, the only kind of mistake we can really make is a bug, writing a program that doesn’t do what it’s supposed to. When we write a model of a program, we can also make that kind of mistake. But, we can make another kind of mistake, where *our model allows some behavior that would never actually happen in the real world, or isn’t even physically possible in the real world*.

Engineers and physicists understand this kind of mistake, where a mathematical model permits a behavior that isn’t possible in the real world. For example, electrical engineers talk about *causal* filters, which are filters whose outputs only depend on the past and present, not the future. You might ask why you even need a word for this, since it’s not possible to build a non-causal physical device. But it’s possible, and even useful, to describe non-causal filters mathematically. And, indeed, it turns out that filters that perfectly block out a range of frequencies, are not causal.

For a new TLA+ user who doesn’t understand the distinction between models and programs, this kind of mistake is inconceivable, since it can’t happen when writing a regular program. Creating non-causal specifications (the software folks use the term “machine-closed” instead of “causal”) is not a typical error for new users, but underspecifying the behavior some variable of interest is very common.

## Mathematical syntax

Many elements of TLA+ are taken directly from mathematics and logic. For software engineers used to programming language syntax, these can be confusing at first. If you haven’t studied predicate logic before, the universal (∀) and extensional (∃) quantifiers will be new.

I don’t think TLA+’s syntax, by itself, is a significant obstacle to adoption: software engineers pick up new languages with unfamiliar syntax all of the time. The real difficulty is in understanding TLA+’s notion of a state machine, and that modeling is describing a computer program as permitted behaviors of a state machine. The new syntax is just one more hurdle.

Alright, fine, what’s the best way to start learning about TLA+?

Practical TLA+ by Hillel Wayne: https://www.apress.com/us/book/9781484238288

Hillel also has this website: https://learntla.com

Thank you!