<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Outside the Boxplot</title>
  <subtitle>A digital garden for data communicators — thoughts on data communication, information design, and related topics.</subtitle>
  <link href="http://localhost:8080/feed.xml" rel="self"/>
  <link href="http://localhost:8080/"/>
  <id>http://localhost:8080/</id>
  <author>
    <name>Outside the Boxplot</name>
  </author>
  
  
  <updated>2026-04-10T00:00:00Z</updated>
  
  
  <entry>
    <title>What if the audience is right?</title>
    <link href="http://localhost:8080/what-if-the-audience-is-right/"/>
    <id>http://localhost:8080/what-if-the-audience-is-right/</id>
    <published>2026-04-10T00:00:00Z</published>
    <updated>2026-04-10T00:00:00Z</updated>
    <summary>A rough thought on whose responsibility it is when a chart gets misread.</summary>
    <content type="html"><![CDATA[&lt;p&gt;Half-formed thought: when a chart is repeatedly misread by its audience, we tend to blame the audience.&lt;/p&gt;
&lt;p&gt;But what if the audience is actually reading the chart correctly — and the chart is just wrong?&lt;/p&gt;
&lt;p&gt;The chart was made with one framing. The reader brings a different one. Both are legitimate. The question is whose job it is to close that gap.&lt;/p&gt;
&lt;p&gt;I think it’s the maker’s job. Always. The reader is the only party in that exchange who hasn’t made any promises.&lt;/p&gt;
]]></content>
    <category term="outlier"/>
  </entry>
  
  <entry>
    <title>Pie Charts</title>
    <link href="http://localhost:8080/pie-charts/"/>
    <id>http://localhost:8080/pie-charts/</id>
    <published>2026-04-09T00:00:00Z</published>
    <updated>2026-04-09T00:00:00Z</updated>
    <summary>They are not as bad as people say. They are bad in a specific, avoidable way.</summary>
    <content type="html"><![CDATA[&lt;p&gt;Pie charts fail when there are too many slices, when the slices are close in size, or when the point is to compare values precisely. In those cases, a bar chart is strictly better.&lt;/p&gt;
&lt;p&gt;Pie charts work when there are two or three slices and the point is to show part-to-whole at a glance. “More than half” is a thing a pie chart communicates instantly. It is not a thing a bar chart communicates as well.&lt;/p&gt;
&lt;p&gt;The problem isn’t the chart type. It’s using it outside its competence.&lt;/p&gt;
]]></content>
    <category term="outlier"/>
  </entry>
  
  <entry>
    <title>A Note on Scale</title>
    <link href="http://localhost:8080/a-note-on-scale/"/>
    <id>http://localhost:8080/a-note-on-scale/</id>
    <published>2026-04-08T00:00:00Z</published>
    <updated>2026-04-08T00:00:00Z</updated>
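    <summary>Truncating the y-axis can mislead, and so can forcing it to start at zero. The honest move is to say what you did and why.</summary>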
    <content type="html"><![CDATA[&lt;p&gt;Truncating the y-axis to make a small change look dramatic is dishonest. Starting the y-axis at zero when the data lives between 94.2 and 94.8 is also dishonest: it compresses real variation into a near-flat line.&lt;/p&gt;
&lt;p&gt;Neither rule is absolute. Context determines which distortion is worse. The only honest move is to say what you did and why.&lt;/p&gt;
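&lt;p&gt;To see how large each distortion is, it can help to compute what share of the axis height the data actually occupies. The sketch below is illustrative only: the function name &lt;code&gt;visible_fraction&lt;/code&gt; and both axis ranges are assumptions, not taken from any particular charting tool.&lt;/p&gt;

```python
def visible_fraction(data_min, data_max, axis_min, axis_max):
    """Share of the axis height spanned by the data's own range."""
    if axis_max <= axis_min:
        raise ValueError("axis_max must be greater than axis_min")
    return (data_max - data_min) / (axis_max - axis_min)


# Data range from the example above: values between 94.2 and 94.8.
# Both axis ranges are assumed for illustration.
zero_based = visible_fraction(94.2, 94.8, axis_min=0.0, axis_max=100.0)
tight = visible_fraction(94.2, 94.8, axis_min=94.0, axis_max=95.0)

print(f"zero-based axis: {zero_based:.1%} of the height")
print(f"tight axis:      {tight:.1%} of the height")
```

&lt;p&gt;On a 0 to 100 axis the variation fills under 1% of the height; on a 94 to 95 axis it fills about 60%. Neither number says which choice is right, but stating it makes the trade-off explicit.&lt;/p&gt;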
]]></content>
    <category term="outlier"/>
  </entry>
  
  <entry>
    <title>The Legend Is a Last Resort</title>
    <link href="http://localhost:8080/the-legend-is-a-last-resort/"/>
    <id>http://localhost:8080/the-legend-is-a-last-resort/</id>
    <published>2026-04-01T00:00:00Z</published>
    <updated>2026-04-01T00:00:00Z</updated>
    <summary>A legend makes the reader do extra work on every single data point. Direct labeling is almost always the better trade.</summary>
    <content type="html"><![CDATA[&lt;p&gt;A legend requires the reader to do a lookup (see color, find color in legend, read label, return to chart) for every series they want to identify. That’s a round trip. In a chart with twelve data series, it’s twelve round trips.&lt;/p&gt;
&lt;p&gt;Direct labeling eliminates the round trip entirely. The label is next to the thing it describes. The eye doesn’t have to travel.&lt;/p&gt;
&lt;h2&gt;When legends make sense&lt;/h2&gt;
&lt;p&gt;There are cases where legends are the right call. Dense small-multiple grids, where labeling each panel would create noise. Maps with many categories and limited space. Sparklines. Situations where the category names would physically overlap the data no matter where you placed them.&lt;/p&gt;
&lt;p&gt;But these are real constraints, not defaults. Most charts that use legends don’t have those constraints. They use legends because the charting tool defaults to one, and nobody questioned it.&lt;/p&gt;
&lt;h2&gt;The design work direct labeling requires&lt;/h2&gt;
&lt;p&gt;Direct labeling is harder to implement than a legend, which is part of why it’s underused. You have to decide where the labels live: typically at the end of a line, or at the largest or most recent data point. You have to handle overlapping labels. You may need to abbreviate.&lt;/p&gt;
&lt;p&gt;That friction is doing design work. It forces you to think about which series matter and how much space each deserves. A chart that’s too crowded to label directly is often too crowded, full stop.&lt;/p&gt;
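&lt;p&gt;One simple way to handle overlapping end-of-line labels is a single upward pass that pushes each label clear of the one below it. This is a minimal sketch, not a full layout algorithm: &lt;code&gt;spread_labels&lt;/code&gt;, &lt;code&gt;min_gap&lt;/code&gt;, and the sample series are hypothetical names, and positions are in whatever units the y-axis uses.&lt;/p&gt;

```python
from typing import Dict


def spread_labels(targets: Dict[str, float], min_gap: float) -> Dict[str, float]:
    """Nudge label positions upward so neighbors sit at least min_gap apart.

    targets maps each series name to the y-value where its line ends.
    """
    placed: Dict[str, float] = {}
    previous = None
    # Work from the lowest label up, so each label only has to clear the one below it.
    for name, y in sorted(targets.items(), key=lambda item: item[1]):
        if previous is not None and y - previous < min_gap:
            y = previous + min_gap
        placed[name] = y
        previous = y
    return placed


# Hypothetical line endings: two series finish almost on top of each other.
line_ends = {"North": 10.0, "South": 10.5, "West": 20.0}
labels = spread_labels(line_ends, min_gap=2.0)
print(labels)
```

&lt;p&gt;Labels only ever move up in this version, so a crowded cluster drifts away from its lines. If the nudges get large, that is the signal from the paragraph above: the chart may simply have too many series.&lt;/p&gt;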
&lt;h2&gt;The rule&lt;/h2&gt;
&lt;p&gt;Use a legend only when direct labeling would genuinely make the chart harder to read. In every other case, move the label to the data.&lt;/p&gt;
]]></content>
    <category term="median"/>
  </entry>
  
  <entry>
    <title>The Annotation Is the Argument</title>
    <link href="http://localhost:8080/the-annotation-is-the-argument/"/>
    <id>http://localhost:8080/the-annotation-is-the-argument/</id>
    <published>2026-03-15T00:00:00Z</published>
    <updated>2026-03-15T00:00:00Z</updated>
    <summary>A chart without annotations is a question without an answer. The label is where the reasoning lives.</summary>
    <content type="html"><![CDATA[&lt;p&gt;When we strip annotations from data visualizations in the name of “clean design,” we often strip the argument too.&lt;/p&gt;
&lt;p&gt;The numbers on the axis tell you &lt;em&gt;what&lt;/em&gt; happened. The annotation tells you &lt;em&gt;so what&lt;/em&gt;. A line going up is just a line going up until someone writes “pandemic begins” next to the inflection point.&lt;/p&gt;
&lt;p&gt;I’ve been thinking about this while reviewing a lot of dashboard work lately. Beautiful, minimal, empty. Charts that describe but don’t communicate. The designer removed every annotation to achieve visual quiet. What they achieved instead was visual ambiguity.&lt;/p&gt;
&lt;p&gt;The annotation is not a decoration. It is the editorial voice of the chart. Removing it doesn’t make the data speak for itself — data never speaks for itself. It just makes the chart mute.&lt;/p&gt;
&lt;h2&gt;What annotations actually do&lt;/h2&gt;
&lt;p&gt;They close the gap between &lt;em&gt;what you see&lt;/em&gt; and &lt;em&gt;what you should conclude&lt;/em&gt;. Every annotation is a small act of interpretation. When you skip it, you push that work onto the reader — which sounds generous but is usually just unhelpful.&lt;/p&gt;
&lt;p&gt;The chart that needs no annotations is the chart where the story is already obvious. That chart is rarer than people think.&lt;/p&gt;
]]></content>
    <category term="median"/>
  </entry>
  
  <entry>
    <title>The Visual Vocabulary Gap</title>
    <link href="http://localhost:8080/the-visual-vocabulary-gap/"/>
    <id>http://localhost:8080/the-visual-vocabulary-gap/</id>
    <published>2026-03-02T00:00:00Z</published>
    <updated>2026-03-02T00:00:00Z</updated>
    <summary>Most audiences can read a bar chart. Far fewer can read a violin plot. The gap between what analysts reach for and what audiences can parse is larger than we admit.</summary>
    <content type="html"><![CDATA[&lt;p&gt;Alberto Cairo’s &lt;em&gt;How Charts Lie&lt;/em&gt; is the best practical book on visualization literacy I’ve read. Not because it teaches you to make better charts — it doesn’t, really — but because it gives you a rigorous account of what it means for a chart to mislead, intentionally or not.&lt;/p&gt;
&lt;p&gt;The chapter on the vocabulary gap is worth the price of the book. Cairo’s argument: visualization literacy is not evenly distributed, and designers who forget this make charts that communicate only to people who already know the answer.&lt;/p&gt;
&lt;p&gt;The bar chart is nearly universal. The scatter plot is common. The choropleth map is familiar. After that, things get complicated fast. Box plots, violin plots, beeswarms, parallel coordinates — these require explicit instruction to read correctly. Treating them as self-explanatory because &lt;em&gt;you&lt;/em&gt; can read them is a category error.&lt;/p&gt;
&lt;p&gt;The book doesn’t argue for dumbing down. It argues for knowing your audience and designing accordingly. A chart for an academic journal audience is allowed to assume more. A chart for a general news audience is not.&lt;/p&gt;
&lt;p&gt;Worth reading alongside Cole Nussbaumer Knaflic’s &lt;em&gt;Storytelling with Data&lt;/em&gt;, which makes a complementary but different argument about why simplicity is a design virtue rather than a constraint.&lt;/p&gt;
]]></content>
    <category term="whisker"/>
  </entry>
  
  <entry>
    <title>Small Multiples, Underused</title>
    <link href="http://localhost:8080/small-multiples-underused/"/>
    <id>http://localhost:8080/small-multiples-underused/</id>
    <published>2026-02-28T00:00:00Z</published>
    <updated>2026-02-28T00:00:00Z</updated>
    <summary>Tufte&#39;s case for showing many simple charts instead of one complex one holds up better than almost anything else he wrote.</summary>
    <content type="html"><![CDATA[&lt;p&gt;Edward Tufte’s defense of small multiples has stayed with me longer than almost anything else I’ve read about visualization.&lt;/p&gt;
&lt;p&gt;The idea: instead of one complex chart trying to show many variables at once, show many simple charts showing one variable each, arranged so the eye can compare across panels. Let spatial position do the comparative work.&lt;/p&gt;
&lt;p&gt;This is still underused. Most dashboards reach for color coding and legends when they should reach for faceting. A 3×3 grid of line charts, each showing one country, beats a single spaghetti chart with nine colored lines almost every time.&lt;/p&gt;
&lt;p&gt;Worth revisiting: &lt;em&gt;Envisioning Information&lt;/em&gt;, chapter 4 (“Small Multiples”). Still the clearest treatment of this I know.&lt;/p&gt;
]]></content>
    <category term="whisker"/>
  </entry>
  
  <entry>
    <title>Why Every Data Team Should Hire a Copy Editor Before They Hire Another Data Engineer</title>
    <link href="http://localhost:8080/why-every-data-team-needs-a-copy-editor/"/>
    <id>http://localhost:8080/why-every-data-team-needs-a-copy-editor/</id>
    <published>2026-02-10T00:00:00Z</published>
    <updated>2026-02-10T00:00:00Z</updated>
    <summary>The bottleneck in most data organizations is not computation. It is communication. The ratio of engineers to writers reflects a misdiagnosis.</summary>
    <content type="html"><![CDATA[&lt;p&gt;The average data team has many people who can build a pipeline and almost no one whose job is to make the output legible.&lt;/p&gt;
&lt;p&gt;This is a strange inversion. The pipeline exists to produce insight. The insight is useless if it cannot be communicated. Yet the team is structured as if the hard part is the infrastructure.&lt;/p&gt;
&lt;p&gt;A copy editor — or more broadly, someone who thinks professionally about language, clarity, and reader experience — would do more to improve the impact of most data teams than another engineer would. They would fix the chart titles. They would challenge the jargon. They would ask “what does this mean?” until the answer was short enough to be useful.&lt;/p&gt;
&lt;p&gt;This is a deliberately overstated position. Take it as a provocation.&lt;/p&gt;
]]></content>
    <category term="outlier"/>
  </entry>
  
  <entry>
    <title>The Data-Context Problem</title>
    <link href="http://localhost:8080/the-data-context-problem/"/>
    <id>http://localhost:8080/the-data-context-problem/</id>
    <published>2026-01-12T00:00:00Z</published>
    <updated>2026-01-12T00:00:00Z</updated>
    <summary>Numbers don&#39;t carry their own meaning. Context is not decoration — it is the substrate that makes the number interpretable at all.</summary>
    <content type="html"><![CDATA[&lt;p&gt;Every number is a fraction of something unstated.&lt;/p&gt;
&lt;p&gt;A 12% conversion rate is excellent or catastrophic depending on the industry, the product, the point in the funnel, and the quarter you’re comparing it to. Without that context, 12% is noise wearing a number’s clothing.&lt;/p&gt;
&lt;p&gt;This is the data-context problem: the meaning of a datum is not intrinsic to the datum. It lives in the relationship between the number and everything surrounding it — the baseline, the comparison, the definition of terms, the source of measurement, the reason it was measured at all.&lt;/p&gt;
&lt;h2&gt;Why this matters in practice&lt;/h2&gt;
&lt;p&gt;Most data communication failures are context failures, not data failures. The numbers are right. The context is missing.&lt;/p&gt;
&lt;p&gt;I see three common patterns:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The naked metric.&lt;/strong&gt; A single number presented without a reference point. Revenue grew 8%. Compared to what? Last quarter? Last year? Competitor average? Industry benchmark? The number tells you nothing alone.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The unanchored trend.&lt;/strong&gt; A line chart showing movement over time, with no annotation explaining what happened. The line goes down in March. Why? Was that expected? Is it a problem? The reader is left to guess.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The undefined denominator.&lt;/strong&gt; “40% of customers…” 40% of which customers? Active users? All registered accounts? People who bought in the last 30 days? The denominator defines the claim. Leave it out and the number is underdetermined.&lt;/p&gt;
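&lt;p&gt;The denominator problem is easy to show with arithmetic. In this sketch every count is invented for illustration, and it assumes for simplicity that the same 400 customers fall inside each population; the helper name &lt;code&gt;share&lt;/code&gt; is likewise hypothetical.&lt;/p&gt;

```python
# Hypothetical counts, invented for illustration only.
customers_who_renewed = 400
denominators = {
    "all registered accounts": 4000,
    "active users": 1000,
    "bought in the last 30 days": 500,
}


def share(numerator, denominator):
    """Return numerator / denominator as a percentage."""
    return 100.0 * numerator / denominator


# Same numerator, three different denominators, three different claims.
shares = {
    label: share(customers_who_renewed, count)
    for label, count in denominators.items()
}
for label, pct in shares.items():
    print(f"{pct:.0f}% of {label}")
```

&lt;p&gt;The same 400 customers come out as 10%, 40%, or 80% depending on which population sits underneath them, which is why the denominator has to be named alongside the number.&lt;/p&gt;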
&lt;h2&gt;Context is not a footnote&lt;/h2&gt;
&lt;p&gt;The instinct is to treat context as supplementary — something you add after the visualization is done, if you have time. A footnote. A tooltip. A caption buried below the chart.&lt;/p&gt;
&lt;p&gt;This is backwards. Context is structural. It should be decided before the visualization is designed, because it determines what the visualization needs to show.&lt;/p&gt;
&lt;p&gt;The question isn’t “what data do I have?” It’s “what does the reader need to know to interpret this correctly?” Work backwards from that, and context stops being an afterthought.&lt;/p&gt;
&lt;h2&gt;A working definition&lt;/h2&gt;
&lt;p&gt;For practical purposes, I think about context as four things:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Reference points&lt;/strong&gt; — what should this number be compared to?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Definitions&lt;/strong&gt; — how is the metric calculated, and what is excluded?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Causation hints&lt;/strong&gt; — what explanations are worth surfacing?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Confidence signals&lt;/strong&gt; — how reliable is this number, and at what granularity?&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;A visualization that addresses all four is doing real communicative work. One that addresses none of them is just a picture of a spreadsheet.&lt;/p&gt;
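&lt;p&gt;One way to make the four-part definition operational is to treat it as a checklist attached to each metric. The sketch below is an assumption about how that might look, not an established tool: &lt;code&gt;MetricContext&lt;/code&gt;, its field names, and the sample values are all hypothetical.&lt;/p&gt;

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class MetricContext:
    """The four kinds of context listed above; None means not yet addressed."""

    reference_point: Optional[str] = None
    definition: Optional[str] = None
    causation_hint: Optional[str] = None
    confidence_signal: Optional[str] = None

    def missing(self) -> List[str]:
        """Names of the context fields that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Hypothetical metric with only two of the four questions answered.
revenue = MetricContext(
    reference_point="vs. same quarter last year",
    definition="net revenue, refunds excluded",
)
print(revenue.missing())
```

&lt;p&gt;Running a check like this before designing the chart, rather than after, matches the argument above that context is structural: whatever comes back as missing is something the visualization still has to show.&lt;/p&gt;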
]]></content>
    <category term="box"/>
  </entry>
  
  <entry>
    <title>Formatting Guide</title>
    <link href="http://localhost:8080/formatting-guide/"/>
    <id>http://localhost:8080/formatting-guide/</id>
    <published>2025-12-01T00:00:00Z</published>
    <updated>2025-12-01T00:00:00Z</updated>
    <summary>A reference post demonstrating every supported formatting element — headings, lists, blockquotes, code, and more.</summary>
    <content type="html"><![CDATA[&lt;p&gt;This post exists as a formatting reference. It exercises every markdown element the site supports so you can see how each one renders before using it in real writing.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Headings&lt;/h2&gt;
&lt;p&gt;The post title renders as an &lt;code&gt;h1&lt;/code&gt;. Use &lt;code&gt;h2&lt;/code&gt; for major sections and &lt;code&gt;h3&lt;/code&gt; for sub-sections within them. Avoid deeper nesting — if you need &lt;code&gt;h4&lt;/code&gt;, the section probably wants to be its own post.&lt;/p&gt;
&lt;h3&gt;This is an h3&lt;/h3&gt;
&lt;p&gt;It renders smaller and lighter than an h2, suitable for a subdivision within a section.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Prose and emphasis&lt;/h2&gt;
&lt;p&gt;Plain paragraph text is set in IBM Plex Serif at 17px with a comfortable line height. You can use &lt;em&gt;italic&lt;/em&gt; for titles of works, technical terms on first use, or light stress. Use &lt;strong&gt;bold&lt;/strong&gt; for genuinely important terms or key phrases — not for general emphasis.&lt;/p&gt;
&lt;p&gt;Avoid using bold or italic as decoration. If everything is emphasized, nothing is.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Links&lt;/h2&gt;
&lt;p&gt;Links are &lt;a href=&quot;http://localhost:8080/&quot;&gt;rendered in the accent green&lt;/a&gt; and underline on hover. Internal links use relative paths. External links should go to the actual source — &lt;a href=&quot;http://localhost:8080/&quot;&gt;like this one&lt;/a&gt;.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Lists&lt;/h2&gt;
&lt;p&gt;Unordered list — for items without a meaningful order:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;First item in the list&lt;/li&gt;
&lt;li&gt;Second item, which is a bit longer to show how the text wraps when it runs past the end of the line and continues on the next&lt;/li&gt;
&lt;li&gt;Third item&lt;/li&gt;
&lt;li&gt;Fourth item&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Ordered list — when sequence matters:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Start with the claim&lt;/li&gt;
&lt;li&gt;Provide the evidence&lt;/li&gt;
&lt;li&gt;Acknowledge the strongest counterargument&lt;/li&gt;
&lt;li&gt;Restate the claim with the counterargument absorbed&lt;/li&gt;
&lt;/ol&gt;
&lt;hr /&gt;
&lt;h2&gt;Blockquotes&lt;/h2&gt;
&lt;p&gt;Use blockquotes for direct quotations or for passages you want to set apart from the main text.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The greatest value of a picture is when it forces us to notice what we never expected to see.&lt;/p&gt;
&lt;p&gt;— John Tukey&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Blockquotes are set in italic with a green left border. They are for quotations, not for calling out your own text — that’s what prose structure is for.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Code&lt;/h2&gt;
&lt;p&gt;Inline code uses a monospace face with a light background: &lt;code&gt;published_date&lt;/code&gt;, &lt;code&gt;filterByCategory()&lt;/code&gt;, &lt;code&gt;--accent&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Code blocks are for multi-line examples:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;title: My Post Title
excerpt: A short description of the post.
category: median
published_date: 2026-01-15
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The site does not load a syntax highlighting library, so code blocks render in plain monospace without color.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Horizontal rules&lt;/h2&gt;
&lt;p&gt;The &lt;code&gt;---&lt;/code&gt; separator above and in this section creates a full-width rule. Use it to mark a major thematic break — a scene change, a shift in register, or a transition between a setup and a payoff. Don’t use it as a substitute for a heading.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Badge types for reference&lt;/h2&gt;
&lt;p&gt;Posts are tagged with one of four types, each named after a box plot element:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Median&lt;/strong&gt; — finished, definitive essays&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Box&lt;/strong&gt; — dense, substantial analysis (this post)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Whisker&lt;/strong&gt; — recommendations and pointers outward&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Outlier&lt;/strong&gt; — rough ideas, seeds, short provocations&lt;/li&gt;
&lt;/ul&gt;
]]></content>
    <category term="box"/>
  </entry>
  
</feed>
