[conspire] And why open source math in science papers is a good thing

Rick Moen rick at linuxmafia.com
Fri Aug 7 16:56:40 PDT 2020


Quoting Dire Red (deirdre at deirdre.net):

> https://smw.ch/article/doi/smw.2020.20336
> 
>> In the fitting procedure used in the manuscript Fig1c_Rscript.R
>> (available at https://github.com/ehylau/COVID-19
>> <https://github.com/ehylau/COVID-19>), the following condition is used
>> in the return line of the likelihood function:
>> return(-sum(lli[!is.infinite(lli)]))
>>
>> This condition will erroneously drop any data-point that has a
>> probability of zero (and hence a log-probability of −∞) under the
>> current model parameters. As the optimisation is initiated with a
>> shift value of 2.5 days, two data-points (54 and 68) are dropped from
>> the beginning of the fit procedure. This then leads to an erroneous
>> maximum likelihood infectiousness profile, which is displayed in
>> figure 1C of the original manuscript [1
>> <https://smw.ch/article/doi/smw.2020.20336#5573e39b6600496d40f493d00ec7658479a19607>].

> paper tl;dr: infectivity date isn’t 2-3 days prior to symptom start,
> it’s 4-5.

Which means that contact-tracing has been missing a significant fraction
of cases, by not tracing enough days back.

Explaining the paper requires delving into one of those topics most 
people get completely wrong: probability math.  It's squarely in my
wheelhouse (BA in mathematics with concentration in statistics), but 
I'm really rusty.  Let's have a go:

When medical authorities confirm a COVID-19 case, one vital task is to
trace back and contact everyone that person has spent significant time
near for some number of days into the past.  How many days?  Ah, that's
the math problem.  First, one must guesstimate how many days before
onset of symptoms a typical COVID-19 patient became significantly
infectious.

Scientists collected a big load of data about time of symptom onset and
time of contact between persons.  Based on that, they fitted a curve to
the data of infectivity plotted against days before and days after
symptom onset.  They then tested that curve against actual case
information to see how much variance there is (which is where
probability enters the picture).  They compared this work against the
widely accepted mid-April paper on the same subject that everyone's been
relying on (https://pubmed.ncbi.nlm.nih.gov/32296168/), and found that
_those_ authors made an error in the probability calculation (the one
that had established a rough figure of 2.3 days of infectiousness before
onset of symptoms).  Basically, the earlier authors' fitting code
silently dropped two data points, so their infectiousness profile graph
was overly optimistic by a couple of days.  With the revised curve, it
follows that contact-tracers will need to search contacts back at least
4 days before symptom onset, not two, to catch 90% of presymptomatic
infections.
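
To make the dropped-data-point problem concrete, here is a toy sketch in
Python (not the authors' R script).  Everything in it is an assumption
for illustration: the made-up data, the shifted-exponential density
standing in for the paper's fitted distribution, and the function names.
It only shows the mechanism: a data point that is impossible under the
current shift gets a log-probability of -infinity, and filtering those
out makes a too-small shift look like a good fit.

```python
import math

def log_density(t, shift, rate=0.5):
    """Log-density of a toy shifted exponential (an assumed stand-in model).

    Times earlier than -shift have probability zero, so log-probability -inf.
    """
    x = t + shift
    if x < 0:
        return -math.inf
    return math.log(rate) - rate * x

def nll_filtered(data, shift):
    """Negative log-likelihood that drops infinite terms, mimicking
    the quoted R line: return(-sum(lli[!is.infinite(lli)]))"""
    lli = [log_density(t, shift) for t in data]
    return -sum(v for v in lli if not math.isinf(v))

def nll_strict(data, shift):
    """Negative log-likelihood that keeps every data point."""
    lli = [log_density(t, shift) for t in data]
    return -sum(lli)

# Made-up exposure times in days relative to symptom onset (hypothetical).
data = [-4.0, -3.0, -1.0, 0.0, 1.0, 2.0]

# With shift = 2.5, the points at -4 and -3 are impossible under the model.
print("filtered, shift 2.5:", nll_filtered(data, 2.5))  # finite: 2 points dropped
print("strict,   shift 2.5:", nll_strict(data, 2.5))    # infinite: shift ruled out
print("strict,   shift 4.5:", nll_strict(data, 4.5))    # finite: all 6 points fit
```

In this toy setup the filtered version reports a lower (better-looking)
score at shift 2.5 than the honest version does at shift 4.5, simply
because it is scoring four points instead of six.  An optimiser
minimising the filtered score has no reason to move the shift earlier,
which, as I read the quoted correction, is the kind of thing that went
wrong in the original fit.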

The new paper's authors also found a small error in data-weighting in
the earlier paper (but this error did no significant harm).

(I hope that is correct and makes sense, but as I said I'm really rusty
at this.)



