Covid’s Pareto distribution

Last post, I was speculating. Now with real data!

At the bottom of my last post about a December 2019 Covid case in Portland, Oregon, I attempted to reconcile two facts:

● there is a growing number of reports from people who have tested positive for antibodies and are convinced they had Covid-19 in December 2019 or January 2020;

● there is data from viral genome sequencing that shows most SARS-CoV-2 cases on the U.S. West Coast descend from one copy of the virus introduced near Seattle from Wuhan on 15 January, 2020.

By way of explanation, I used the analogy of a forest fire for the epidemic:

  • The wind carries glowing embers (the virus) ahead of the blaze (the epidemic)
  • Only a few of those embers land on spots with enough connected dry brush to get new fire going
  • Other embers, indeed most of them, may make small fires that flare up for a moment, but those flare-ups run out of local fuel and die out
  • Since this action is going on at the leading edge of the big fire and is soon subsumed by the moving conflagration, we may not notice this mechanism at work
  • Depending on how fuel is distributed in the unburned part of the forest, it would not be unusual for the effect of each ember to follow a Pareto distribution.
  • Indeed, there is a classic toy Fire model (kids can run it in NetLogo) that clearly demonstrates a phase transition from “fizzle out” to “full fire” when fuel density crosses a threshold of roughly 59%
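
For readers who want to poke at the fire analogy themselves, here is a minimal sketch in Python. It is loosely modelled on the toy fire model and is not the NetLogo code: the function name, the 100-by-100 grid, the fixed random seed, and the rule that fire spreads only to the four adjacent cells are all my assumptions.

```python
import random
from collections import deque


def fire_reaches_far_edge(density, size=100, seed=0):
    """Hypothetical helper: burn a random forest from its left edge.

    Returns (reached_right_edge, fraction_of_trees_burned).
    """
    rng = random.Random(seed)
    # Each cell holds a tree with probability `density`.
    trees = [[rng.random() < density for _ in range(size)] for _ in range(size)]

    # Ignite every tree in the leftmost column.
    burning = deque((row, 0) for row in range(size) if trees[row][0])
    burned = set(burning)

    # Breadth-first spread to the four adjacent cells (assumed rule).
    while burning:
        row, col = burning.popleft()
        for d_row, d_col in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            r, c = row + d_row, col + d_col
            if (0 <= r < size and 0 <= c < size
                    and trees[r][c] and (r, c) not in burned):
                burned.add((r, c))
                burning.append((r, c))

    reached = any(col == size - 1 for _, col in burned)
    total_trees = sum(map(sum, trees))
    fraction = len(burned) / total_trees if total_trees else 0.0
    return reached, fraction


if __name__ == "__main__":
    for density in (0.40, 0.50, 0.55, 0.60, 0.65, 0.80):
        reached, fraction = fire_reaches_far_edge(density)
        print(f"density {density:.2f}: reached far edge={reached}, "
              f"burned {fraction:.1%} of trees")
```

Sweeping the density from sparse to dense should show the switch from fires that die near the ignition line to fires that cross the whole forest. Individual runs near the threshold are noisy, so try a few seeds before reading much into any single result.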

On Tuesday, I noted this article by Kai Kupferschmidt in Science Magazine: “Why do some COVID-19 patients infect many others, whereas most don’t spread the virus at all?”

First, I highly recommend it.

Kai’s article mentions a study by Gwenan Knight and colleagues at the London School of Hygiene & Tropical Medicine (LSHTM) published as “What settings have been linked to SARS-CoV-2 transmission clusters?” on Wellcome Open Research.

The authors did a search of the scientific literature and media articles detailing clusters of SARS-CoV-2 transmission and extracted the data into a Google Sheets file.

From there, it was a few steps to the chart above.

This is bad news for people who like to play with simple models where they can change R_0. In the real world, the majority of people do not transmit. The most common number of people a case infects (the mode, as opposed to the average that R_0 describes) may well be zero.

To build a realistic model, in addition to R, you need k, a dispersion factor, which describes how much a disease clusters. The lower k is, the more transmission comes from a small number of people. Kai quotes Adam Kucharski of LSHTM estimating that k for COVID-19 is as low as 0.1. “Probably about 10% of cases lead to 80% of the spread,” Kucharski says.
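
To see how R and k work together, here is a rough sketch that treats the number of people each case infects as a negative binomial distribution with mean R and dispersion k, which is a common modelling choice for this kind of clustering. The value R = 2.5 is my assumption for illustration only; k = 0.1 is the Kucharski estimate quoted above. The function names are hypothetical.

```python
def offspring_pmf(r, k, max_cases=2000):
    """P(X = x) for x = 0..max_cases under a negative binomial
    with mean r and dispersion k, built with the standard recurrence."""
    q = r / (r + k)
    pmf = [(k / (r + k)) ** k]          # P(X = 0)
    for x in range(1, max_cases + 1):
        pmf.append(pmf[-1] * (x - 1 + k) / x * q)
    return pmf


def share_from_top(pmf, top_fraction):
    """Share of all transmission caused by the most infectious
    `top_fraction` of cases."""
    mean = sum(x * p for x, p in enumerate(pmf))
    remaining = top_fraction
    transmitted = 0.0
    # Walk down from the biggest spreaders until the quota is filled.
    for x in range(len(pmf) - 1, -1, -1):
        take = min(pmf[x], remaining)
        transmitted += x * take
        remaining -= take
        if remaining <= 0:
            break
    return transmitted / mean


# Assumed inputs: R = 2.5 is illustrative; k = 0.1 is the quoted estimate.
pmf = offspring_pmf(r=2.5, k=0.1)
p_zero = pmf[0]
top10_share = share_from_top(pmf, 0.10)

if __name__ == "__main__":
    print(f"cases that infect nobody: {p_zero:.0%}")
    print(f"share of spread from the top 10% of cases: {top10_share:.0%}")
```

Under these assumptions, zero is the single most likely outcome for any one case, and a small minority of cases accounts for most of the spread. Raising k toward 1 or beyond evens things out, which is one way to get a feel for what the parameter does.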

I believe k was something my old Professor Bradford DeLong was on the brink of discovering in this post on his blog.

80–10 is not the 80–20 rule, but it’s close enough.

You can follow the COVID aB Tracking blog on Twitter @Will_Bates_sci

Will Bates writes about science, technology, and business. His journalism has appeared in the New York Times, the Wall Street Journal, and numerous magazines.