My sister has COVID-19 but she's not being counted

The numbers are junk and we keep pretending they are not.

Apr 03, 2020

Statisticians have invented many wonderful mathematical methods for estimation and inference.

However, no statistical method can fix bias in the data. As they say, “junk in, junk out”.

If there is bias in your data, it is better to do nothing than to do analysis and report the results. This is especially true if decision-makers rely on your expertise and the mathematical sophistication of the method as proof of the soundness of the analysis.

However, “expert” data modelers seem not to have gotten the message. They make forecasts on crap data.

Katherine Milkman @katy_milkman

Overconfidence is a pernicious bias, even in experts. It's astounding how few experts' confidence intervals included the correct estimate of #COVID19 infections in the US by 3/29 when forecasting for just two weeks in the future. (of course, non-expert estimates are even worse)

Last week, my sister got diagnosed with COVID-19. She can’t get tested.

Robert Osazuwa Ness @osazuwa

My sister @lydabean was diagnosed with #covid19 by an ER doctor, but can't get an actual test. Think about that when you hear reporting on "confirmed cases". Symptomatic people with actual diagnoses are not counted as "confirmed".

ktsm.com: El Paso woman gets coronavirus diagnosis but can’t get tested. EL PASO, Texas (KTSM) An El Paso mother of three says she and her family began self-isolation back on March 15th out of an abundance of caution. Garcia family. “We took it very seriously in m…

My sister’s scenario is mind-boggling to me. She is fully symptomatic, literally has a COVID-19 diagnosis, as well as chronic asthma, and yet they will not waste a precious test on her. She will not be counted in her state’s statistics.

I find myself wondering, what is the point of testing then? We don’t use it to detect and quarantine the sick. We don’t use it to get a representative percentage of infected individuals in the population.

We just use tests to produce junk data, which drives junk journalism and junk policy-making.

This has been one of my main takeaways so far in this crisis.

Robert Osazuwa Ness @osazuwa

Journalists would rather report highly problematic statistics (COVID19 infection numbers despite lack of tests and highly biased selection of who gets tested, bogus Chinese numbers, etc), than report no numbers. The latter is far worse for decision-making.

And don’t get me started on China…

For example, why do journalists treat Chinese numbers as though they are real despite the evidence that they are not? CNN now treats statements coming from the CCP with less skepticism than statements from the White House. I suppose the former is just less clumsy about its fake news.

Robert Osazuwa Ness @osazuwa

Shocker. If you work in #journalism or #DataScience and report official Chinese numbers or reports results of analyses based on their numbers, then you are part of their global propaganda apparatus (and your not even getting paid for it...) https://t.co/tC44q7wzvM

Financial analyst friends in China taught me that they never look at reported numbers from the government or Chinese corporations. They always model with some sort of proxy variable.

Robert Osazuwa Ness @osazuwa

Quants dont believe a Chinese corporate reports, they count shipping containers leaving out of the factory. They don't believe the occupancy numbers of a development, they count the lights on at night. Don't believe their #COVID19 numbers, count the urns.

vice.com: Wuhan’s Crematoriums Are Filling Thousands of Urns With Coronavirus Remains Each Day. Wuhan residents are waiting for hours in line to pick up the remains of their loved ones.

If you want to get a true impression of the numbers in China, just look at the lines of bereaved in front of the crematoriums, waiting to collect family members’ remains. Do the math. How many ovens? How many corpses can they process a day? How many crematoriums in one city?

Spoiler alert: the math works out to way more than the number of deaths China is reporting.
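The back-of-the-envelope calculation above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function names are made up for this example, and every input value below is a round placeholder, not a reported figure for any real city. The baseline parameter (deaths that would have happened anyway) is an assumption added here so the estimate is not inflated by normal mortality.

```python
def daily_cremation_capacity(crematoriums, ovens_per_crematorium,
                             hours_per_day, hours_per_cremation):
    """Upper bound on how many bodies a city's crematoriums can process per day."""
    cremations_per_oven = hours_per_day / hours_per_cremation
    return crematoriums * ovens_per_crematorium * cremations_per_oven


def implied_excess_deaths(processed_per_day, days, baseline_deaths_per_day=0.0):
    """Deaths above the normal baseline implied by sustained crematorium activity."""
    excess_per_day = max(processed_per_day - baseline_deaths_per_day, 0.0)
    return excess_per_day * days


# PLACEHOLDER inputs, chosen only to make the arithmetic easy to follow.
capacity = daily_cremation_capacity(
    crematoriums=5,            # hypothetical
    ovens_per_crematorium=10,  # hypothetical
    hours_per_day=20,          # hypothetical
    hours_per_cremation=1,     # hypothetical
)
excess = implied_excess_deaths(capacity, days=10, baseline_deaths_per_day=100)
print(f"capacity per day: {capacity:.0f}, implied excess deaths: {excess:.0f}")
```

Plug in your own estimates for each input and compare the result with the official count for the same period.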

This is not to say we shouldn’t model

Here is an interview on probabilistic modeling I did this week with Thomas Wiecki, VP of Data Science at Quantopian.

Thomas Wiecki @twiecki

Intro to #Bayesian modeling using #COVID19 data: informative priors, hierarchical models, uncertainty in forecasts, model iteration (exponential, logistic, SIR models) youtube.com/watch?v=6ppOWv… with @osazuwa. Includes a strong disclaimer ;) #PyMC3 #NotAnEpidemiologist

youtube.com: Bayesian modeling of COVID-19 spread (1/5). Thomas Wiecki, VP of Data Science at Quantopian, introduces Bayesian modeling of COVID-19 spread using probabilistic programming language PyMC. Part 1 of 5. ...

A data scientist can take an epidemic model and train it on all that biased data. That is still useful, because it takes the domain knowledge of experts, encodes it into a computable artifact, and calibrates it on data. This gives us something we can work with.

Most importantly, these models can be used for sensitivity analysis, where you can test how sensitive your model’s forecasts are to bias in the data.

Robert Osazuwa Ness @osazuwa

@FreakX19 @twiecki Also, you can do sensitivity analysis on this manner of analysis. "What if there were more cases than the data suggests? Uh oh, look at how the forecast explodes if the estimate of infected increases by only X%!" That is extremely useful to decision-making.
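As a rough sketch of that kind of sensitivity analysis, the snippet below runs a simple discrete-time SIR model and asks how the two-week forecast changes if the true number of infected is some multiple of the confirmed count. This is not the PyMC3 model from the interview. The function names, the transmission and recovery rates, the population size, and the under-count multipliers are all assumptions picked for illustration.

```python
def sir_forecast(population, infected_now, beta, gamma, days):
    """Active infections after `days` daily steps of a discrete-time SIR model."""
    s = float(population - infected_now)  # susceptible
    i = float(infected_now)               # currently infected
    for _ in range(days):
        new_infections = beta * s * i / population
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
    return i


def sensitivity_to_undercount(confirmed, multipliers, **model_kwargs):
    """Forecast for each assumed ratio of true infections to confirmed cases."""
    return {
        m: sir_forecast(infected_now=confirmed * m, **model_kwargs)
        for m in multipliers
    }


# All values below are hypothetical, not fitted to any real data.
forecasts = sensitivity_to_undercount(
    confirmed=100,
    multipliers=[1, 2, 5, 10],
    population=1_000_000,
    beta=0.3,   # assumed daily transmission rate
    gamma=0.1,  # assumed daily recovery rate
    days=14,
)
for multiplier, infected in forecasts.items():
    print(f"true cases = {multiplier:>2}x confirmed -> "
          f"forecast active infections in 14 days: {infected:,.0f}")
```

If the forecast swings widely across plausible multipliers, the headline number depends more on the under-count assumption than on the model, and that is worth telling decision-makers.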

Altdeep.ai Newsletter
