We now turn to uncertainty quantification for these models. Each model provides estimates of uncertainty: the IHME, YYG and LANL forecasts give 95% PIs, while the UT model provides 90% PIs for predictions made prior to the forecast date of May 16. To translate the UT model's 90% PIs into 95% PIs, we take the logs of the prediction and of the 90% PI limits, calculate the difference between the log of the prediction and the log of each 90% limit, and multiply this difference by a factor of 1.96/1.64 (approximately the ratio of the standard normal quantiles corresponding to two-sided 95% and 90% intervals). The rescaled limits give 95% PIs on the log scale, which we then transform back to the original scale.
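A minimal sketch of this rescaling is given below, assuming positive point predictions and 90% limits; the function and variable names are ours for illustration, not part of the UT model's output format.

```python
import numpy as np

def rescale_90_to_95(pred, lo90, hi90, z95=1.96, z90=1.64):
    """Approximate 95% PI limits from 90% limits.

    Works on the log scale: the log-distance from the point prediction
    to each 90% limit is stretched by z95 / z90, then the rescaled
    limits are transformed back to the original scale.
    """
    log_pred = np.log(pred)
    factor = z95 / z90
    lo95 = np.exp(log_pred - factor * (log_pred - np.log(lo90)))
    hi95 = np.exp(log_pred + factor * (np.log(hi90) - log_pred))
    return lo95, hi95

# Hypothetical example: a forecast of 500 daily deaths with a 90% PI of
# (400, 650) widens to an approximate 95% PI of about (383, 684).
lo, hi = rescale_90_to_95(500.0, 400.0, 650.0)
```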
Figure 4 presents plots of the 95% PIs for various predictions made by the models, together with the training ground truth. Following [18], we define the k-step-ahead prediction and PI for a particular date to be the prediction and accompanying PI made k days in advance of that date. For example, for June 3, the 1-step-ahead prediction and PI are those made on the forecast date of June 2, while the 2-step-ahead prediction and PI for June 3 are those made on the forecast date of June 1, and so on. The columns of Fig. 4 correspond to the step-ahead horizon, ranging from 1 to 7 in panel 4a and from 8 to 14 in panel 4b, while the rows correspond to the different models.
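To make the coverage percentages reported below concrete, the following sketch selects the k-step-ahead predictions for a given horizon k and returns the fraction of observed daily deaths falling outside the 95% PIs. The column names are assumptions for illustration, not the models' published file formats.

```python
import pandas as pd

def outside_pi_rate(df: pd.DataFrame, k: int) -> float:
    """Fraction of observed daily deaths outside the 95% PI for
    k-step-ahead predictions.

    Assumed columns (illustrative only):
      forecast_date, target_date : datetime64[ns]
      lower95, upper95           : 95% PI limits for daily deaths
      observed                   : ground-truth daily deaths on target_date
    """
    horizon = (df["target_date"] - df["forecast_date"]).dt.days
    k_ahead = df.loc[horizon == k]
    outside = (k_ahead["observed"] < k_ahead["lower95"]) | (
        k_ahead["observed"] > k_ahead["upper95"]
    )
    return float(outside.mean())
```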
Figure 4 shows a number of interesting features. First, as documented in [18], the IHME model underwent a number of dramatic changes in the calculation of the prediction and the corresponding PIs. The original IHME model underestimates uncertainty: for predictions (1- to 14-step-ahead) made over the period March 24 to March 31, 45.7% of the observed daily deaths fall outside the 95% PIs. The IHME model was revised on April 2, with no predictions made on April 1 and April 2. In the revised model, for forecasts from April 3 to May 3, the uncertainty bounds are enlarged and most of the observed daily deaths (74.0%) fall within the 95% PIs, which is not surprising given that the PIs are on the order of 300 to 2000 daily deaths. Yet, even with this major revision, the claimed nominal coverage of 95% well exceeds the actual coverage. On May 4, the IHME model underwent another major revision, and the uncertainty was again dramatically reduced, with the result that 47.4% of the actual daily deaths fall outside the 95% PIs, well beyond the nominal 5% rate. It is concerning, nevertheless, that the coverage of the IHME PIs appears to improve with the forecast horizon: for both the original model and the latest IHME update, more observed values fall within the 95% PIs for the 7-step-ahead predictions than for the 1-step-ahead predictions.
Second, Fig. 4 shows that the YYG model does not perform well in terms of uncertainty quantification, as many more actual deaths lie outside the 95% PIs than would be expected. Taken across the entire time period, the proportion of actual deaths lying outside the 95% PIs is 31.1%. We note, however, that this percentage improves over time. From the forecast date of May 1 onwards, the fraction of actual deaths lying outside the 95% PIs ranges from 28.6% for the 1-step-ahead prediction to 13.6% for the 14-step-ahead prediction, compared with 44.8% for the 1-step-ahead prediction and 48.3% for the 14-step-ahead prediction before this date.
Similarly, neither the UT nor the LANL forecasts achieve observed coverage that consistently matches the nominal 95% level, as shown in Fig. 5 (first plot in the top panel). For the UT model, the fraction of actual deaths lying outside the 95% PIs ranges from 14.6% for the 1-step-ahead prediction to 67.5% for the 14-step-ahead prediction. The corresponding figures for the LANL model are 40.0% for the 1-step-ahead prediction and 9.1% for the 14-step-ahead prediction.
The second, third and fourth plots in the top panel of Fig. 5 (one for each version of the ground truth) present this information from a slightly different perspective. From this viewpoint, for the 5- to 14-step-ahead predictions, LANL has the observed coverage closest to the nominal 95% level, although its short-term predictions tend to overestimate the ground truth. The PIs for the UT model deteriorate as predictions extend further into the future, with the model tending to underestimate the daily number of deaths. The remaining two models, YYG and IHME, provide PIs for daily deaths that systematically miss the nominal 95% coverage level, irrespective of the forecast horizon.
Turning to the last panel in Fig. 5, with the exceptions of IHME evaluated on JHURD and LANL evaluated on the NYT ground truth, no model has more than 30% of its daily death predictions falling within 10% of the ground truth, and the UT model has virtually no predictions within a 10% bound of the ground truth at longer horizons. When the models are evaluated against their own training ground truth, only 10.2% of the predictions fall within 10% of it, irrespective of the forecast horizon.
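The "within 10%" criterion is a relative-error threshold. As a minimal sketch, assuming it means an absolute relative error of at most 0.10 with respect to the ground-truth daily deaths (and excluding days with zero observed deaths so the ratio is defined), the proportion can be computed as follows; the function name and inputs are illustrative.

```python
import numpy as np

def within_10pct_rate(predicted, observed) -> float:
    """Fraction of point predictions within 10% of the ground truth.

    `predicted` and `observed` are aligned sequences of daily death
    counts; days with zero observed deaths are excluded so that the
    relative error remains defined.
    """
    pred = np.asarray(predicted, dtype=float)
    obs = np.asarray(observed, dtype=float)
    mask = obs != 0
    rel_err = np.abs(pred[mask] - obs[mask]) / obs[mask]
    return float((rel_err <= 0.10).mean())
```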