Outside the Boxplot

What if the audience is right?

2026-04-10T00:00:00Z

Half-formed thought: when a chart is repeatedly misread by its audience, we tend to blame the audience.

But what if the audience is actually reading the chart correctly — and the chart is just wrong?

The chart was made with one framing. The reader brings a different one. Both are legitimate. The question is whose job it is to close that gap.

I think it’s the maker’s job. Always. The reader is the only party in that exchange who hasn’t made any promises.

Pie Charts

2026-04-09T00:00:00Z

Pie charts fail when there are too many slices, when the slices are close in size, or when the point is to compare values precisely. In those cases, a bar chart is strictly better.

Pie charts work when there are two or three slices and the point is to show part-to-whole at a glance. “More than half” is a thing a pie chart communicates instantly. It is not a thing a bar chart communicates as well.

The problem isn’t the chart type. It’s using it outside its competence.
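The rule of thumb can be written down as a small check. This is a sketch, not a standard. The three-slice ceiling comes from the post. The 10-percentage-point gap used to decide when slices are "close in size" is an arbitrary assumption, and so is the function name `pie_chart_fits`.

```python
def pie_chart_fits(values: list[float], part_to_whole: bool,
                   max_slices: int = 3, min_share_gap: float = 0.10) -> bool:
    """Rough encoding of the rule above: a part-to-whole message, few slices,
    and slices that differ enough in size to tell apart by angle."""
    if not part_to_whole or not 2 <= len(values) <= max_slices:
        return False
    total = sum(values)
    if total <= 0:
        return False
    shares = sorted(v / total for v in values)
    # Adjacent slices closer than min_share_gap are hard to compare by eye,
    # which is the case where a bar chart does better.
    return all(b - a >= min_share_gap for a, b in zip(shares, shares[1:]))


print(pie_chart_fits([62, 38], part_to_whole=True))              # True
print(pie_chart_fits([52, 48], part_to_whole=True))              # False: too close in size
print(pie_chart_fits([30, 25, 20, 15, 10], part_to_whole=True))  # False: too many slices
```

The `part_to_whole` flag is deliberately a human judgment passed in, not something the function infers. Whether the point is "more than half" is an editorial decision, not a property of the numbers.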

A Note on Scale

2026-04-08T00:00:00Z

Truncating the y-axis to make a small change look dramatic is dishonest. Starting the y-axis at zero when the data lives between 94.2 and 94.8 is also dishonest: it squashes real variation into a near-flat line.

Neither rule is absolute. Context determines which distortion is worse. The only honest move is to say what you did and why.
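A quick way to see the zero-baseline distortion is to ask what fraction of the axis height the data's own range occupies. The helper below is a hypothetical sketch. The 94.0 to 95.0 "fitted" axis is just one padded choice I picked for comparison, not a recommendation.

```python
def data_share_of_axis(data_min: float, data_max: float,
                       axis_min: float, axis_max: float) -> float:
    """Fraction of the axis height that the data's own range occupies."""
    if axis_max <= axis_min:
        raise ValueError("axis_max must be greater than axis_min")
    return (data_max - data_min) / (axis_max - axis_min)


# The example from the post: values between 94.2 and 94.8.
zero_based = data_share_of_axis(94.2, 94.8, axis_min=0.0, axis_max=94.8)
fitted = data_share_of_axis(94.2, 94.8, axis_min=94.0, axis_max=95.0)  # padding is an assumption

print(f"zero-based axis: data fills {zero_based:.2%} of the height")
print(f"fitted axis:     data fills {fitted:.0%} of the height")
```

The zero-based axis gives the whole 0.6-point range about 0.63% of the height. On a 400-pixel-tall chart, that is roughly 2.5 pixels of vertical movement. The fitted axis gives it 60%. Neither number settles which axis is right, but stating it is one concrete way to "say what you did and why."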

The Legend Is a Last Resort

2026-04-01T00:00:00Z

A legend requires the reader to do a lookup (see color, find color in legend, read label, return to chart) for every series they want to identify. That’s a round trip. In a chart with twelve data series, it’s twelve round trips.

Direct labeling eliminates the round trip entirely. The label is next to the thing it describes. The eye doesn’t have to travel.

When legends make sense

There are cases where legends are the right call. Dense small-multiple grids, where labeling each panel would create noise. Maps with many categories and limited space. Sparklines. Situations where the category names would physically overlap the data no matter where you placed them.

But these are real constraints, not defaults. Most charts that use legends don’t have those constraints. They use legends because the charting tool defaults to one, and nobody questioned it.

The design work direct labeling requires

Direct labeling is harder to implement than a legend, which is part of why it’s underused. You have to decide where the labels live — typically at the end of a line, or at the largest/most recent data point. You have to handle overlapping labels. You may need to abbreviate.

That friction is doing design work. It forces you to think about which series matter and how much space each deserves. A chart that’s too crowded to label directly is often too crowded, full stop.

The rule

Use a legend only when direct labeling would genuinely make the chart harder to read. In every other case, move the label to the data.

The Annotation Is the Argument

2026-03-15T00:00:00Z

When we strip annotations from data visualizations in the name of “clean design,” we often strip the argument too.

The numbers on the axis tell you what happened. The annotation tells you “so what.” A line going up is just a line going up until someone writes “pandemic begins” next to the inflection point.

I’ve been thinking about this while reviewing a lot of dashboard work lately. Beautiful, minimal, empty. Charts that describe but don’t communicate. The designer removed every annotation to achieve visual quiet. What they achieved instead was visual ambiguity.

The annotation is not a decoration. It is the editorial voice of the chart. Removing it doesn’t make the data speak for itself — data never speaks for itself. It just makes the chart mute.

What annotations actually do

They close the gap between what you see and what you should conclude. Every annotation is a small act of interpretation. When you skip it, you push that work onto the reader — which sounds generous but is usually just unhelpful.

The chart that needs no annotations is the chart where the story is already obvious. That chart is rarer than people think.

The Visual Vocabulary Gap

2026-03-02T00:00:00Z

Alberto Cairo’s How Charts Lie is the best practical book on visualization literacy I’ve read. Not because it teaches you to make better charts — it doesn’t, really — but because it gives you a rigorous account of what it means for a chart to mislead, intentionally or not.

Its treatment of the vocabulary gap is worth the price of the book. Cairo’s argument: visualization literacy is not evenly distributed, and designers who forget this make charts that communicate only to people who already know the answer.

The bar chart is nearly universal. The scatter plot is common. The choropleth map is familiar. After that, things get complicated fast. Box plots, violin plots, beeswarms, parallel coordinates — these require explicit instruction to read correctly. Treating them as self-explanatory because you can read them is a category error.

The book doesn’t argue for dumbing down. It argues for knowing your audience and designing accordingly. A chart for an academic journal audience is allowed to assume more. A chart for a general news audience is not.

Worth reading alongside Cole Nussbaumer Knaflic’s Storytelling with Data, which makes a complementary but different argument about why simplicity is a design virtue rather than a constraint.

Small Multiples, Underused

2026-02-28T00:00:00Z

Edward Tufte’s defense of small multiples has stayed with me longer than almost anything else I’ve read about visualization.

The idea: instead of one complex chart trying to show many variables at once, show many simple charts showing one variable each, arranged so the eye can compare across panels. Let spatial position do the comparative work.

This is still underused. Most dashboards reach for color coding and legends when they should reach for faceting. A 3×3 grid of line charts, each showing one country, beats a single spaghetti chart with nine colored lines almost every time.
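Faceting is mostly a data-shaping step: split long-form rows into one panel per category, give each panel a grid cell, and use one y-range for all of them so heights are comparable across panels. Below is a plotting-library-agnostic sketch. The `(panel_key, x, y)` row shape and the name `facet_grid` are assumptions. The shared range is my addition, not something the post spells out, and is one common way to let "spatial position do the comparative work."

```python
from collections import defaultdict


def facet_grid(rows, ncols=3):
    """Split long-form (panel_key, x, y) rows into one panel per key.

    Returns each panel's points, its (row, col) cell in the grid, and one
    y-range shared by every panel so that heights compare across the grid.
    """
    panels = defaultdict(list)
    for key, x, y in rows:
        panels[key].append((x, y))
    if not panels:
        raise ValueError("no rows to facet")
    keys = sorted(panels)
    cells = {key: divmod(i, ncols) for i, key in enumerate(keys)}
    all_y = [y for points in panels.values() for _, y in points]
    shared_ylim = (min(all_y), max(all_y))
    return dict(panels), cells, shared_ylim


# Hypothetical data: nine countries, three years each.
rows = [(f"country_{i}", 2020 + t, i + t) for i in range(9) for t in range(3)]
panels, cells, ylim = facet_grid(rows, ncols=3)
print(cells["country_0"], cells["country_4"], cells["country_8"])  # (0, 0) (1, 1) (2, 2)
print(ylim)  # (0, 10)
```

If you draw with matplotlib, `plt.subplots(3, 3, sharey=True)` gives a grid whose panels share a y-axis, and the cells above map directly onto it.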

Worth revisiting: The Visual Display of Quantitative Information, chapter 8 (“Data Density and Small Multiples”). Still the clearest treatment of this I know.

Why Every Data Team Should Hire a Copy Editor Before They Hire Another Data Engineer

2026-02-10T00:00:00Z

The average data team has many people who can build a pipeline and almost no one whose job is to make the output legible.

This is a strange inversion. The pipeline exists to produce insight. The insight is useless if it cannot be communicated. Yet the team is structured as if the hard part is the infrastructure.

A copy editor — or more broadly, someone who thinks professionally about language, clarity, and reader experience — would do more to improve the impact of most data teams than another engineer would. They would fix the chart titles. They would challenge the jargon. They would ask “what does this mean?” until the answer was short enough to be useful.

This is a deliberately overstated position. Take it as a provocation.

The Data-Context Problem

2026-01-12T00:00:00Z

Every number is a fraction of something unstated.

A 12% conversion rate is excellent or catastrophic depending on the industry, the product, the point in the funnel, and the quarter you’re comparing it to. Without that context, 12% is noise wearing a number’s clothing.

This is the data-context problem: the meaning of a datum is not intrinsic to the datum. It lives in the relationship between the number and everything surrounding it — the baseline, the comparison, the definition of terms, the source of measurement, the reason it was measured at all.

Why this matters in practice

Most data communication failures are context failures, not data failures. The numbers are right. The context is missing.

I see three common patterns:

The naked metric. A single number presented without a reference point. Revenue grew 8%. Compared to what? Last quarter? Last year? Competitor average? Industry benchmark? The number tells you nothing alone.

The unanchored trend. A line chart showing movement over time, with no annotation explaining what happened. The line goes down in March. Why? Was that expected? Is it a problem? The reader is left to guess.

The undefined denominator. “40% of customers…” 40% of which customers? Active users? All registered accounts? People who bought in the last 30 days? The denominator defines the claim. Leave it out and the number is underdetermined.
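To see how much the denominator carries, hold the numerator fixed and swap the base. The records below are invented toy data. The field names and the `share` helper are hypothetical. The helper's only job is to refuse to print a percentage without naming what it is a percentage of.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    active: bool           # hypothetical: logged in during the period
    bought_last_30d: bool  # made a purchase in the last 30 days
    used_feature: bool     # the behavior the headline number is about


def share(customers, numerator, denominator, label):
    """State a percentage together with the population it is a share of."""
    base = [c for c in customers if denominator(c)]
    if not base:
        raise ValueError(f"empty denominator: {label}")
    hits = [c for c in base if numerator(c)]
    pct = 100 * len(hits) / len(base)
    return f"{pct:.0f}% of {label} ({len(hits)} of {len(base)})"


def used_feature(c):
    return c.used_feature


# Toy data, invented for illustration: ten registered accounts.
# (List repetition reuses objects; harmless here because nothing is mutated.)
customers = (
    [Customer(True, True, True)] * 2       # active, recent buyers, used the feature
    + [Customer(True, True, False)] * 2    # active, recent buyers, did not
    + [Customer(True, False, False)]       # active, no recent purchase
    + [Customer(False, False, False)] * 5  # inactive
)

for label, denominator in [
    ("all registered accounts", lambda c: True),
    ("active users", lambda c: c.active),
    ("customers who bought in the last 30 days", lambda c: c.bought_last_30d),
]:
    print(share(customers, used_feature, denominator, label))
```

The same two customers are 20% of all registered accounts, 40% of active users, and 50% of recent buyers. "40% of customers" is only a claim once you say which of those you mean.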

Context is not a footnote

The instinct is to treat context as supplementary — something you add after the visualization is done, if you have time. A footnote. A tooltip. A caption buried below the chart.

This is backwards. Context is structural. It should be decided before the visualization is designed, because it determines what the visualization needs to show.

The question isn’t “what data do I have?” It’s “what does the reader need to know to interpret this correctly?” Work backwards from that, and context stops being an afterthought.

A working definition

For practical purposes, I think about context as four things:

Reference points — what should this number be compared to?
Definitions — how is the metric calculated, and what is excluded?
Causation hints — what explanations are worth surfacing?
Confidence signals — how reliable is this number, and at what granularity?

A visualization that addresses all four is doing real communicative work. One that addresses none of them is just a picture of a spreadsheet.
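One way to make the four-part definition operational is to carry the context alongside the number, so a missing piece is visible before anything is drawn. The sketch below is hypothetical. The class name, the field names, and the example values are mine, not a real schema. It simply maps the four kinds of context onto fields and reports which are empty.

```python
from dataclasses import dataclass, field


@dataclass
class ContextualMetric:
    """A number plus the four kinds of context described above.

    Field names are illustrative; any real schema would differ.
    """
    name: str
    value: float
    reference_points: dict[str, float] = field(default_factory=dict)  # what to compare against
    definition: str = ""   # how it is calculated and what is excluded
    causation_hints: list[str] = field(default_factory=list)  # explanations worth surfacing
    confidence: str = ""   # how reliable, and at what granularity

    def missing_context(self) -> list[str]:
        """Name each kind of context that has not been supplied."""
        present = {
            "reference points": bool(self.reference_points),
            "definitions": bool(self.definition),
            "causation hints": bool(self.causation_hints),
            "confidence signals": bool(self.confidence),
        }
        return [kind for kind, ok in present.items() if not ok]


naked = ContextualMetric("conversion rate (%)", 12.0)
print(naked.missing_context())  # all four kinds are missing

anchored = ContextualMetric(
    "checkout conversion rate (%)", 12.0,
    reference_points={"same quarter last year": 9.0},                      # hypothetical
    definition="orders / sessions that reached checkout; bots excluded",  # hypothetical
    causation_hints=["new payment option launched mid-quarter"],          # hypothetical
    confidence="weekly figures stable to about ±0.5 pts",                  # hypothetical
)
print(anchored.missing_context())  # []
```

A check like this can't tell you whether the context is good, only whether it exists. That is still enough to catch the naked metric before it ships.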

Formatting Guide

2025-12-01T00:00:00Z

This post exists as a formatting reference. It exercises every markdown element the site supports so you can see how each one renders before using it in real writing.

Headings

The post title renders as an h1. Use h2 for major sections and h3 for sub-sections within them. Avoid deeper nesting — if you need h4, the section probably wants to be its own post.

This is an h3

It renders smaller and lighter than an h2, suitable for a subdivision within a section.

Prose and emphasis

Plain paragraph text is set in IBM Plex Serif at 17px with a comfortable line height. You can use italic for titles of works, technical terms on first use, or light stress. Use bold for genuinely important terms or key phrases — not for general emphasis.

Avoid using bold or italic as decoration. If everything is emphasized, nothing is.

Links

Links are rendered in the accent green and underline on hover. Internal links use relative paths. External links should go to the actual source — like this one.

Lists

Unordered list — for items without a meaningful order:

First item in the list
Second item, which is a bit longer to show how the text wraps when it runs past the end of the line and continues on the next
Third item
Fourth item

Ordered list — when sequence matters:

Start with the claim
Provide the evidence
Acknowledge the strongest counterargument
Restate the claim with the counterargument absorbed

Blockquotes

Use blockquotes for direct quotations or for passages you want to set apart from the main text.

The greatest value of a picture is when it forces us to notice what we never expected to see.

— John Tukey

Blockquotes are set in italic with a green left border. They are for quotations, not for calling out your own text — that’s what prose structure is for.

Code

Inline code uses a monospace face with a light background: published_date, filterByCategory(), --accent.

Code blocks are for multi-line examples:

title: My Post Title
excerpt: A short description of the post.
category: median
published_date: 2026-01-15

The site does not load a syntax highlighting library, so code blocks render in plain monospace without color.

Horizontal rules

The --- separator creates a full-width rule. Use it to mark a major thematic break: a scene change, a shift in register, or a transition between a setup and a payoff. Don’t use it as a substitute for a heading.

Badge types for reference

Posts are tagged with one of four types, each named after a box plot element:

Median — finished, definitive essays
Box — dense, substantial analysis (this post)
Whisker — recommendations and pointers outward
Outlier — rough ideas, seeds, short provocations
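A hypothetical pre-publish check can tie the front-matter example to the badge list. The site's build code isn't shown here, so everything below is an assumption. The required keys are inferred from the example block. The front matter is treated as simple key: value lines, where a real pipeline would more likely use a YAML parser. Category matching is case-insensitive by my choice.

```python
from datetime import date

BADGE_TYPES = {"median", "box", "whisker", "outlier"}
REQUIRED_KEYS = ("title", "excerpt", "category", "published_date")  # inferred from the example


def parse_front_matter(text: str) -> dict[str, str]:
    """Parse simple 'key: value' lines; splits on the first colon only."""
    fields = {}
    for line in text.strip().splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a key: value line: {line!r}")
        fields[key.strip()] = value.strip()
    return fields


def validate(fields: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the post looks publishable."""
    problems = [f"missing {key}" for key in REQUIRED_KEYS if key not in fields]
    category = fields.get("category")
    if category is not None and category.lower() not in BADGE_TYPES:
        problems.append(f"unknown category {category!r}")
    if "published_date" in fields:
        try:
            date.fromisoformat(fields["published_date"])
        except ValueError:
            problems.append("published_date is not YYYY-MM-DD")
    return problems


example = """
title: My Post Title
excerpt: A short description of the post.
category: median
published_date: 2026-01-15
"""
print(validate(parse_front_matter(example)))  # []
```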