The seductiveness of single-metric decisions

Making decisions is hard.

One technique to help with making a decision is to compute a single metric for each of the options being considered, and then compare the values. A common choice of metric is dollars or ROI (return on investment, a unitless ratio of dollars to dollars). Are you trying to decide between two internal software development projects? Estimate the ROI for each one and pick the larger one. OKRs (objectives and key results) and error budgets are two other examples of using individual metrics to drive decisions like “where should we focus our effort now?” or “can we push this new feature to production?”

A single-metric-based approach has the virtue of simplifying the final stage in the decision-making process: we simply compare two numbers (either two metrics or a metric against a threshold) in order to make our decision. Yes, it requires mapping the different factors under consideration onto the metric, but it’s tractable, right?

The problem is that the process of mapping the relevant factors into the single metric always involves subjective judgments that ultimately discard information. For example, for ROI calculations, consider the work involved in identifying the various kinds of costs and benefits and mapping them into dollars. Information that should be taken into account when making the final decision vanishes in this process as these factors get mapped into an anemic scalar value.
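As a rough illustration of how the mapping discards information, here is a minimal Python sketch. Everything in it is hypothetical: the `Option` class, its fields, the project names, and the dollar figures are made up for the example, and ROI is assumed to be computed as net gain divided by cost.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    cost_dollars: float
    benefit_dollars: float
    benefit_confidence: str  # qualitative judgment; not captured by ROI
    who_benefits: str        # likewise dropped once we collapse to a scalar

    def roi(self) -> float:
        # Assumed definition: net gain divided by cost (a unitless ratio).
        return (self.benefit_dollars - self.cost_dollars) / self.cost_dollars


# Made-up numbers, chosen so the two options collapse to the same scalar.
options = [
    Option("Project A", 100_000, 150_000, "high", "on-call engineers"),
    Option("Project B", 200_000, 300_000, "low", "new customers"),
]

rois = {o.name: o.roi() for o in options}
print(rois)
```

Both options come out with the same ROI, even though they differ in size, in how certain the benefit is, and in who receives it. None of those differences survive the trip into the single number.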

The problem here isn’t the use of metrics. Rather, it’s the temptation to squeeze all of the relevant information into a form that is representable in a single metric. A single metric frees the decision maker from having to make a subjective judgment that involves very different-looking factors. That’s a hard thing to do, and it can make people uncomfortable.

W. Edwards Deming was famous for railing against numerical targets. Note that he wasn’t opposed to metrics. (He advocated for the value of professional statisticians and control charts.) Rather, he was opposed to decisions that were made based on single metrics. Here are some quotes from his book Out of the Crisis on this topic:

Focus on outcome (management by numbers, MBO, work standards, meet specifications, zero defects, appraisal of performance) must be abolished, leadership put in place.

Eliminate management by objective. Eliminate management by numbers, numerical goals. Substitute leadership.

[M]anagement by numerical goal is an attempt to manage without knowledge of what to do, and in fact is usually management by fear.

Deming uses the term “leadership” as the alternative to the decision-by-single-metric approach. I interpret that term as the ability of a manager to synthesize information from multiple sources in order to make a decision holistically. It’s a lot harder than mapping all of the factors into a single metric. But nobody ever said being an effective leader is easy.

Engineering research reveals wrongdoing

The New York Times has a story today, Inside VW’s Campaign of Trickery, about how Volkswagen conspired to hide its excessive diesel emissions from California regulators.

What was fascinating to me was that the emission violations were discovered by mechanical engineering researchers at West Virginia University, Dan Carder, Hemanth Kappanna, and Marc Besch (Kappanna and Besch were graduate students at the time).

The presence of high levels of lead in Flint, Michigan drinking water was also discovered by an engineering researcher: Marc Edwards, a civil engineering professor at Virginia Tech.

It’s a reminder that regulators alone aren’t sufficient to ensure safety, and that academic engineering research can have a real impact on society.

Personal productivity tools

Productivity tools have always held a special fascination for me. I also tend to futz around with multiple tools, trying to find the perfect match for my workflow. My toolset has been pretty stable for several months now. Here’s what I’m currently using.

OmniFocus

I’ve been a fan of Getting Things Done for a long time. Of the various GTD-supporting tools I’ve found, I like OmniFocus the best. Useful features include:

  • Syncs well between laptop and phone
  • Easy to add to the inbox via keyboard shortcut in OS X
  • Easy to add to the inbox in the iOS app
  • Integrates with reminders on iOS, which means I can say to my watch “Remind me to do X” and “do X” ends up in my OmniFocus inbox
  • Per-project support for “serial tasks” (only one next action) and “parallel tasks” (multiple next actions). I use this all of the time.
  • I can put projects “on hold” and they don’t show up in the current context. In particular, I have an on-hold “Someday” task which acts as a catch-all for things I don’t want to forget but that I don’t plan on doing in the near term.

My contexts are:

  • office
  • online
  • home
  • phone
  • work
  • waiting

VoodooPad

VoodooPad is a personal wiki. I mainly use it for two things: context for each project I’m working on (e.g., pastes of recent error messages), and reference pages for things like URLs and commonly used code snippets or commands that I often forget.

I like how it’s free-form, and not just plain text. This means I can paste in images that are rendered inline, and I can render code and terminal output in fixed-width font, and my notes in variable-width font.

That being said, what I’d really like is a content system that lets me organize both by topic and by date. VoodooPad only organizes by topic, but it’s the closest I’ve been able to find.

Emergent Task Planner

I use a notebook called the Emergent Task Planner to structure my day. I write down tasks that I’d like to accomplish that day and schedule them in chunks of time. I often don’t follow the specific schedule, but I find it helps to take some time to think about what I’m going to try to accomplish, and to explicitly schedule time for checking email so I’m less tempted to do that while working.

Ubiquitous capture tools

Getting Things Done has a notion of “ubiquitous capture”: being able to quickly capture content that you can come back to later. In addition to OmniFocus, I use a few other tools for ubiquitous capture:

Index cards

I keep a stack of index cards in my back pocket, held together with a binder clip, along with a Fisher Space Pen. It’s often faster to scribble on an index card than to take out my phone. This was inspired by Merlin Mann’s Hipster PDA.

CiteULike

When I encounter a book or academic paper I’d like to read, I clip it to CiteULike.

Instapaper

If I encounter an essay on the web I don’t have time to read, I use Instapaper to capture it for later. It has great Kindle support: every week it automatically emails the content to my Kindle Paperwhite.

Pinboard

I use Pinboard to bookmark reference material. I was a Delicious user for a long time, but Pinboard’s UX is so much better that I’m happy to pay them for it rather than use Delicious for free.

When software takes a human life

A Tesla driver was killed in a car crash while the Autopilot system was engaged. According to the news report:

Joshua D. Brown, of Canton, Ohio, died in the accident May 7 in Williston, Florida, when his car’s cameras failed to distinguish the white side of a turning tractor-trailer from a brightly lit sky and didn’t automatically activate its brakes, according to government records obtained Thursday.

These types of automotive systems are completely outside my area of expertise. That being said, I imagine that validating this type of control system that relies on complex sensor data must be incredibly challenging. The input space is mind-bogglingly huge, so how do you catch these kinds of corner cases in testing?

The failure here is not due to a “bug” (or “defect” in academic software engineering jargon) in the traditional sense of the term. Yet there clearly was a defect in this system, and the result was a human fatality.

I was also struck by this line:

Harley [an analyst at Kelley Blue Book] called the death unfortunate, but said that more deaths can be expected as the autonomous technology is refined.

I wonder if future deaths will lead to additional regulations on how software engineering work is done in domains like this.

Head banging odds ratio

Here’s an idea for a software engineering empirical study. My first thought was to use this to compare the productivity of web frameworks (e.g., Django, Rails, …), but really it could be used for any software development framework or language.

Pick a random sample of, say, Django developers and Rails developers. Send participants text messages at random times during the week (ask them in advance which range of times it’s OK to text them). The text message says:

Are you currently programming in the (Django|Rails) framework and banging your head against the wall?

  • If yes, respond “1”
  • If currently programming but not banging your head against the wall, respond “2”
  • If not currently programming, respond “3”

At the end of the study, compute the ratio of “1” to “2” responses for each framework. That gives each framework’s odds of “banging head against the wall : not banging head against the wall”. Dividing one framework’s odds by the other’s gives the odds ratio.
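The arithmetic at the end of the study can be sketched in a few lines of Python. This is only an illustration: the function names and the response counts are made up, and it assumes each response is recorded as the string “1”, “2”, or “3”. The per-framework odds are the count of “1” responses divided by the count of “2” responses, and the odds ratio is one framework’s odds divided by the other’s.

```python
from collections import Counter


def head_banging_odds(responses):
    """Odds of head-banging among responses given while programming.

    "3" responses (not programming) are ignored.
    """
    counts = Counter(responses)
    banging = counts["1"]
    not_banging = counts["2"]
    if not_banging == 0:
        raise ValueError("no '2' responses; odds are undefined")
    return banging / not_banging


def odds_ratio(responses_a, responses_b):
    """Odds for framework A divided by odds for framework B."""
    odds_b = head_banging_odds(responses_b)
    if odds_b == 0:
        raise ValueError("framework B has no '1' responses; ratio is undefined")
    return head_banging_odds(responses_a) / odds_b


# Hypothetical response data, not results from any real study.
django = ["1"] * 12 + ["2"] * 48 + ["3"] * 40
rails = ["1"] * 20 + ["2"] * 40 + ["3"] * 40

django_odds = head_banging_odds(django)  # 12 / 48
rails_odds = head_banging_odds(rails)    # 20 / 40
ratio = odds_ratio(django, rails)        # Django odds relative to Rails odds

print(django_odds, rails_odds, ratio)
```

With these made-up counts, a ratio below 1 would mean the Django respondents reported head-banging at lower odds than the Rails respondents. A real study would also want a confidence interval around the ratio.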