Data Models

Let’s take a deeper look at the COVID-19 data for Canada.

  • What is the story?
    • Outbreaks come and go in waves.
    • Each wave has a peak and a duration.
    • Experts tell us that as a pathogen matures it often becomes less lethal.
    • Let’s start with the simplest model, the straight line: y = ax + b.
  • If this were a stock chart, your broker would tell you to sell it now.
  • Charles, that is way too simple.
  • OK. Let’s look at some expert models from IHME.
  • We often hear that growth of a new variant is exponential.
  • That can be confirmed by looking at the start of every wave. Each wave starts out as an exponential and then tops out. A better model for the next wave would be a Gaussian, the bell curve loved by statisticians everywhere.
  • Here are the “expert” projections from IHME. The dotted lines for future scenarios are shown on the right with a shaded area of uncertainty.
  • The projection is for the doom to peak on Feb 23 at 0.26 deaths per 100,000. Without getting too far into the mathematics, we can see that the width of the projection roughly matches the width of previous outbreaks, and the height is a reasonable extrapolation from the first two peaks.
  • The model on the left is a 7-day rolling average, which is used to “clean up” the picture of the past. I have already said I prefer to look at the raw numbers, which allow me to estimate the uncertainty in the data as reported.
  • Clearly, the further apart we draw the black lines, the less confidence we have in the data. The model in the previous figure appears very certain and precise. Right: the truth is likely in there somewhere.
  • Notice how the uncertainty, the shaded portion of the projection, is much greater than the area between the black lines. The future is always uncertain but so are the past and the present. It is to the credit of IHME that they acknowledge the uncertainty in their projections if not in the reported data.
  • The green line is the optimist’s projection, the “republican” view of the data. It is just as uncertain as the “expert” projections and could have been shown as a box. It says the death rate will be zero on Feb 23.
  • IHME has another model, not shown, for Total deaths. We know this is a model because it gives us numbers about double those “reported”. What we do not know is exactly what fudge factor was used to transform reported data into “Total” data. We used to call these factors SWAG, like the green projection.
  • Lest I appear too cynical about the science, here is a wonderful video about the models we use to predict the high and low tides. These models are a beautiful use of mathematics to accurately predict a very complex phenomenon.
  • “All models are wrong, but some are useful.” - George Box
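For anyone who wants to play with the simple models mentioned above, here is a minimal Python sketch. It is my own illustration and has nothing to do with how IHME builds its projections. The function names (fit_line, fit_exponential, gaussian_wave) are made up for this example, and the numbers are synthetic, not real Canadian data. It fits the straight line y = ax + b by least squares, fits an exponential by running the same line fit on the logarithm of the counts, and defines a bell curve for the shape of a whole wave.

```python
import math


def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b. Returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def fit_exponential(xs, ys):
    """Fit y = y0 * exp(r*x) by fitting a line to log(y). Returns (y0, r).

    Only valid when every y is strictly positive.
    """
    r, log_y0 = fit_line(xs, [math.log(y) for y in ys])
    return math.exp(log_y0), r


def gaussian_wave(day, height, peak_day, width):
    """Bell-curve model of one wave: 'height' at 'peak_day', spread set by 'width'."""
    return height * math.exp(-((day - peak_day) ** 2) / (2 * width ** 2))


# Synthetic illustration data (NOT real figures): counts that double every 3 days.
days = list(range(10))
counts = [5.0 * 2 ** (d / 3) for d in days]

slope, intercept = fit_line(days, counts)   # the "too simple" straight line
y0, rate = fit_exponential(days, counts)    # the start-of-wave model
doubling_time = math.log(2) / rate          # days for the count to double

print(f"line: y = {slope:.2f}x + {intercept:.2f}")
print(f"exponential: y0 = {y0:.2f}, doubling time = {doubling_time:.2f} days")
```

The exponential fit only describes the start of a wave. Once the curve tops out, something like gaussian_wave is the better shape, and its three parameters (height, peak day, width) are exactly the quantities discussed in the projection above.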

How do you recognize a useful model? Simple. Good models make accurate, verifiable predictions. Some examples are tide tables, Google Maps trip times, and the sunrise and sunset times from my home automation setup, Domoticz.

A model in predictive form tells you the probability that an event will happen. The weatherman says forty percent chance of rain. You look out the window and the sun is shining. Oops. Weather forecasts are actually pretty good, mostly because they are so easy to verify and a lot of serious work has gone into their verification and calibration.
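Verifying a probability forecast can be sketched in a few lines of Python. This is my own illustration, not how any weather service actually does it. The function names and the forecast record are invented for the example. The idea: a single sunny day cannot prove a forty percent forecast wrong, but over many days it should rain on about forty percent of the days that carried that forecast. The Brier score (the mean squared difference between the forecast probability and what happened) sums this up in one number, where lower is better.

```python
from collections import defaultdict


def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


def calibration_table(forecasts, outcomes):
    """Map each forecast probability to how often the event actually happened."""
    buckets = defaultdict(list)
    for p, o in zip(forecasts, outcomes):
        buckets[p].append(o)
    return {p: sum(hits) / len(hits) for p, hits in sorted(buckets.items())}


# Invented record (NOT real forecasts): probability of rain, then 1 = it rained.
forecasts = [0.4] * 5 + [0.9] * 5
outcomes = [1, 0, 0, 1, 0] + [1, 1, 1, 1, 0]

print("Brier score:", round(brier_score(forecasts, outcomes), 3))
for p, freq in calibration_table(forecasts, outcomes).items():
    print(f"forecast {p:.0%} -> it rained {freq:.0%} of the time")
```

A well-calibrated forecaster shows observed frequencies close to the stated probabilities in every row of the table.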

I’ll be back on Feb 23rd to see which model was closest to the truth.
