Abstract
This study treats COVID-19 testing as a nonlinear sampling problem. It aims to uncover how the true infection count in the population depends on testing metrics such as testing volume and positivity rate. Using an artificial neural network, we explore the relationship among daily confirmed case counts, testing data, population statistics, and the actual daily case count. The trained network is evaluated in in-sample, out-of-sample, and several hypothetical scenarios. A substantial focus of this paper is estimating the daily true case count, which serves as the output set of our training process. To achieve this, we implement a regularized backcasting technique that uses death counts and the infection fatality ratio (IFR), because death statistics and serological surveys (which provide the IFR) are the more reliable COVID-19 data sources. Accounting for the impact of factors such as age distribution, vaccination, and emerging variants on the IFR time series is a pivotal aspect of our analysis. We expect our study to improve understanding of the genuine implications of the COVID-19 pandemic and thereby benefit mitigation strategies.
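A minimal sketch of regularized backcasting from deaths and the IFR follows. It is not the authors' implementation. For illustration, it assumes that expected daily deaths are the IFR-weighted convolution of daily true infections with an infection-to-death delay distribution. It inverts that relation with Tikhonov-regularized least squares and a second-difference (smoothness) penalty. The function names `death_convolution_matrix` and `backcast_infections`, the delay distribution, and the penalty weight `lam` are all hypothetical.

```python
import numpy as np


def death_convolution_matrix(ifr, delay_pmf):
    """Hypothetical forward model: A[t, s] = delay_pmf[t - s] * ifr[s]."""
    ifr = np.asarray(ifr, dtype=float)
    n = ifr.size
    A = np.zeros((n, n))
    for s in range(n):
        for k, p in enumerate(delay_pmf):
            if s + k < n:
                # Infections on day s produce expected deaths on day s + k.
                A[s + k, s] = p * ifr[s]
    return A


def backcast_infections(deaths, ifr, delay_pmf, lam=1e-4):
    """Regularized backcasting sketch: daily deaths -> daily true infections.

    Solves min_x ||A x - deaths||^2 + lam_eff * ||D2 x||^2, where D2 is the
    second-difference operator. lam_eff rescales the dimensionless `lam`
    by a trace ratio so that it does not depend on the magnitude of the IFR.
    """
    deaths = np.asarray(deaths, dtype=float)
    ifr = np.broadcast_to(np.asarray(ifr, dtype=float), deaths.shape)
    A = death_convolution_matrix(ifr, delay_pmf)
    n = deaths.size
    D2 = np.diff(np.eye(n), n=2, axis=0)  # (n - 2) x n, rows of [1, -2, 1]
    AtA, DtD = A.T @ A, D2.T @ D2
    lam_eff = lam * np.trace(AtA) / np.trace(DtD)
    x = np.linalg.solve(AtA + lam_eff * DtD, A.T @ deaths)
    return np.clip(x, 0.0, None)  # infection counts cannot be negative


# Synthetic example: a smooth epidemic wave, a constant 1% IFR, and a made-up
# gamma-shaped delay over days 0-39 (mean of roughly 15 days).
days = np.arange(120)
true_infections = 100 + 1000 * np.exp(-((days - 60) / 15.0) ** 2)
k = np.arange(40)
delay = k**2 * np.exp(-k / 5.0)
delay /= delay.sum()
ifr = np.full(days.size, 0.01)
deaths = death_convolution_matrix(ifr, delay) @ true_infections
estimate = backcast_infections(deaths, ifr, delay)
```

The penalty trades fidelity to the death series against smoothness of the recovered curve. Infections in the last few days of the series have not yet produced deaths, so the estimate there comes mostly from the smoothness penalty.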
Data Availability
The datasets generated and/or analysed during the current study are available from the corresponding author on reasonable request.
References
World Health Organization (2023) WHO Coronavirus (COVID-19) Dashboard. https://covid19.who.int/ Accessed 2023-06-10
Harvey WT, Carabelli AM, Jackson B, Gupta RK, Thomson EC, Harrison EM, Ludden C, Reeve R, Rambaut A, COVID-19 Genomics UK (COG-UK) Consortium (2021) SARS-CoV-2 variants, spike mutations and immune escape. Nature Reviews Microbiology 19(7):409–424
Wu SL, Mertens AN, Crider YS, Nguyen A, Pokpongkiat NN, Djajadi S, Seth A, Hsiang MS, Colford JM Jr, Reingold A (2020) Substantial underestimation of SARS-CoV-2 infection in the United States. Nature Communications 11(1):4507
COVID-19 Forecasting Team (2022) Variation in the COVID-19 infection-fatality ratio by age, time, and geography during the pre-vaccine era: a systematic analysis. The Lancet 399(10334):1469–1488. https://doi.org/10.1016/S0140-6736(21)02867-1
Brazeau NF, Verity R, Jenks S, Fu H, Whittaker C, Winskill P, Dorigatti I, Walker PG, Riley S, Schnekenberg RP (2022) Estimating the COVID-19 infection fatality ratio accounting for seroreversion using statistical modelling. Communications Medicine 2(1):54
Meyerowitz-Katz G, Merone L (2020) A systematic review and meta-analysis of published research data on COVID-19 infection fatality rates. International Journal of Infectious Diseases 101:138–148
Barber RM, Sorensen RJ, Pigott DM, Bisignano C, Carter A, Amlag JO, Collins JK, Abbafati C, Adolph C, Allorant A (2022) Estimating global, regional, and national daily and cumulative infections with SARS-CoV-2 through Nov 14, 2021: a statistical analysis. The Lancet 399(10344):2351–2380
Hortaçsu A, Liu J, Schwieg T (2021) Estimating the fraction of unreported infections in epidemics with a known epicenter: An application to COVID-19. Journal of Econometrics 220(1):106–129
Chen Z, Feng L, Lay HA Jr, Furati K, Khaliq A (2022) SEIR model with unreported infected population and dynamic parameters for the spread of COVID-19. Mathematics and Computers in Simulation 198:31–46
Albani V, Loria J, Massad E, Zubelli J (2021) COVID-19 underreporting and its impact on vaccination strategies. BMC Infectious Diseases 21:1–13
Tang S, Cao Y (2023) A phenomenological neural network powered by the national wastewater surveillance system for estimation of silent COVID-19 infections. Science of The Total Environment 902:166024
Guo Q, He Z (2021) Prediction of the confirmed cases and deaths of global COVID-19 using artificial intelligence. Environmental Science and Pollution Research 28:11672–11682
Vaid S, Cakan C, Bhandari M (2020) Using machine learning to estimate unobserved COVID-19 infections in North America. The Journal of Bone and Joint Surgery. American Volume
Dairi A, Harrou F, Zeroual A, Hittawe MM, Sun Y (2021) Comparative study of machine learning methods for COVID-19 transmission forecasting. Journal of Biomedical Informatics 118:103791
Kamalov F, Rajab K, Cherukuri AK, Elnagar A, Safaraliev M (2022) Deep learning for COVID-19 forecasting: State-of-the-art review. Neurocomputing 511:142–154
Rahimi I, Chen F, Gandomi AH (2023) A review on COVID-19 forecasting models. Neural Computing and Applications 35(33):23671–23681
He S, Peng Y, Sun K (2020) SEIR modeling of the COVID-19 and its dynamics. Nonlinear Dynamics 101:1667–1680
Perc M, Gorišek Miksić N, Slavinec M, Stožer A (2020) Forecasting COVID-19. Frontiers in Physics 8:127
Namasudra S, Dhamodharavadhani S, Rathipriya R (2021) Nonlinear neural network based forecasting model for predicting COVID-19 cases. Neural Processing Letters, 1–21
Dutta R, Das N, Majumder M, Jana B (2023) Aspect based sentiment analysis using multi-criteria decision-making and deep learning under COVID-19 pandemic in India. CAAI Transactions on Intelligence Technology 8(1):219–234
Chimmula VKR, Zhang L (2020) Time series forecasting of COVID-19 transmission in Canada using LSTM networks. Chaos, Solitons & Fractals 135:109864
Watson GL, Xiong D, Zhang L, Zoller JA, Shamshoian J, Sundin P, Bufford T, Rimoin AW, Suchard MA, Ramirez CM (2021) Pandemic velocity: Forecasting COVID-19 in the US with a machine learning & Bayesian time series compartmental model. PLoS Computational Biology 17(3):1008837
Kevrekidis GA, Rapti Z, Drossinos Y, Kevrekidis PG, Barmann MA, Chen Q-Y, Cuevas-Maraver J (2022) Backcasting COVID-19: a physics-informed estimate for early case incidence. Royal Society Open Science 9(12):220329
Phipps SJ, Grafton RQ, Kompas T (2020) Robust estimates of the true (population) infection rate for COVID-19: a backcasting approach. Royal Society Open Science 7(11):200909. https://doi.org/10.1098/rsos.200909
Miller AC, Hannah LA, Futoma J, Foti NJ, Fox EB, D’Amour A, Sandler M, Saurous RA, Lewnard JA (2022) Statistical deconvolution for inference of infection time series. Epidemiology 33(4):470–479. https://doi.org/10.1097/EDE.0000000000001495
Jahja M, Chin A, Tibshirani RJ (2022) Real-time estimation of COVID-19 infections: Deconvolution and sensor fusion. Statistical Science 37(2):207–228. https://doi.org/10.1214/22-STS856
Sarría-Santamera A, Abdukadyrov N, Glushkova N, Russell Peck D, Colet P, Yeskendir A, Asúnsolo A, Ortega MA (2022) Towards an accurate estimation of COVID-19 cases in Kazakhstan: Back-casting and capture-recapture approaches. Medicina 58(2):253
Irons NJ, Raftery AE (2021) Estimating SARS-CoV-2 infections from deaths, confirmed cases, tests, and random surveys. Proceedings of the National Academy of Sciences 118(31):2103272118. https://doi.org/10.1073/pnas.2103272118
Raissi M, Perdikaris P, Karniadakis GE (2019) Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics 378:686–707
Center JHCR (2023) COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University. https://github.com/CSSEGISandData/COVID-19 Accessed 2023-06-10
Kidger P, Lyons T (2020) Universal approximation with deep narrow networks. In: Conference on Learning Theory, pp. 2306–2327. PMLR
Maiorov V, Pinkus A (1999) Lower bounds for approximation by MLP neural networks. Neurocomputing 25(1–3):81–91
Zhai J, Dobson M, Li Y (2022) A deep learning method for solving Fokker–Planck equations. In: Mathematical and Scientific Machine Learning, pp. 568–597. PMLR
Kingma DP, Ba J (2014) Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980
Cleveland WS, Devlin SJ (1988) Locally weighted regression: an approach to regression analysis by local fitting. Journal of the American Statistical Association 83(403):596–610
Flaxman S, Mishra S, Gandy A, Unwin H, Coupland H, Mellan T, Zhu H, Berah T, Eaton J, Perez Guzman P, et al (2020) Report 13: Estimating the number of infections and the impact of non-pharmaceutical interventions on COVID-19 in 11 European countries
Disease Control C (2023a) Prevention: COVID-19 Weekly Cases and Deaths per 100,000 Population by Age, Race/Ethnicity, and Sex. https://covid.cdc.gov/covid-data-tracker/#demographicsovertime Accessed 2023-06-10
Akima H (1970) A new method of interpolation and smooth curve fitting based on local procedures. Journal of the ACM (JACM) 17(4):589–602
Akima H (1974) A method of bivariate interpolation and smooth surface fitting based on local procedures. Communications of the ACM 17(1):18–20
Easton DM, Hirsch HR (2008) For prediction of elder survival by a Gompertz model, number dead is preferable to number alive. Age 30:311–317
Disease Control C (2023b) Prevention: COVID-19 Vaccination Age and Sex Trends in the United States, National and Jurisdictional. https://data.cdc.gov/Vaccinations/COVID-19-Vaccination-Age-and-Sex-Trends-in-the-Uni/5i5k-6cmh Accessed 2023-06-10
Lewnard JA, Hong VX, Patel MM, Kahn R, Lipsitch M, Tartof SY (2022) Clinical outcomes associated with SARS-CoV-2 Omicron (B.1.1.529) variant and BA.1/BA.1.1 or BA.2 subvariant infection in southern California. Nature Medicine 28(9):1933–1943
Ulloa AC, Buchan SA, Daneman N, Brown KA (2022) Estimates of SARS-CoV-2 Omicron variant severity in Ontario, Canada. JAMA 327(13):1286–1288
Ward IL, Bermingham C, Ayoubkhani D, Gethings OJ, Pouwels KB, Yates T, Khunti K, Hippisley-Cox J, Banerjee A, Walker AS, et al (2022) Risk of COVID-19 related deaths for SARS-CoV-2 Omicron (B.1.1.529) compared with Delta (B.1.617.2): retrospective cohort study. BMJ 378
Nyberg T, Ferguson NM, Nash SG, Webster HH, Flaxman S, Andrews N, Hinsley W, Bernal JL, Kall M, Bhatt S (2022) Comparative analysis of the risks of hospitalisation and death associated with SARS-CoV-2 Omicron (B.1.1.529) and Delta (B.1.617.2) variants in England: a cohort study. The Lancet 399(10332):1303–1312
Disease Control C (2023c) Prevention: COVID data tracker: Variant Proportion. https://covid.cdc.gov/covid-data-tracker/#variant-proportions Accessed 2023-06-10
Disease Control C (2023d) Prevention: Rates of COVID-19 Cases and Deaths by Vaccination Status. https://data.cdc.gov/Public-Health-Surveillance/Rates-of-COVID-19-Cases-or-Deaths-by-Age-Group-and/54ys-qyzm Accessed 2023-06-10
Scheiner S, Ukaj N, Hellmich C (2020) Mathematical modeling of COVID-19 fatality trends: Death kinetics law versus infection-to-death delay rule. Chaos, Solitons & Fractals 136:109891
Feng Z, Xu D, Zhao H (2007) Epidemiological models with non-exponentially distributed disease stages and applications to disease control. Bulletin of Mathematical Biology 69(5):1511–1536
Ghosh S, Volpert V, Banerjee M (2022) An epidemic model with time-distributed recovery and death rates. Bulletin of Mathematical Biology 84(8):78
Shah S, Gwee SXW, Ng JQX, Lau N, Koh J, Pang J (2022) Wastewater surveillance to infer COVID-19 transmission: A systematic review. Science of The Total Environment 804:150060
Daughton CG (2020) Wastewater surveillance for population-wide COVID-19: The present and future. Science of the Total Environment 736:139631
Acknowledgements
We would like to thank REU students Ziyan Zhao for collecting vaccination data and Jessica Hu for collecting age group case data.
Contributions
Yao Li and Ning Jiang are partially supported by NSF grants DMS-1813246 and DMS-2108628. Charles Kolozsvary is partially supported by the REU component of NSF grants DMS-1813246 and DMS-2108628.
Appendix A: Additional Data About COVID-19
In this appendix we present figures showing the raw data, processed data, and intermediate results used to generate the training set. Data for selected states have already been shown in the main text. This appendix includes:
1. Time series of IFR for all 50 states plus Washington DC
2. Time series of recovered true cases and the undercounting factor for all 50 states plus Washington DC
3. Raw and smoothed daily confirmed case and death counts for all 50 states plus Washington DC
4. Time series of case rate per age group in all regions of the United States
5. Time series of vaccination rate for all age groups in all 50 states plus Washington DC
6. Incidence rate ratios of COVID-19 cases and deaths for vaccinated and unvaccinated groups
7. Time series of testing volume for all 50 states plus Washington DC
A.1 Time Series of State IFR
Time series of the IFR for 10 selected states are presented in the main text. Figures 13 and 14 show the IFR time series for all 50 states plus Washington DC after accounting for age-group case rates, vaccination, and variants.
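One simple way to fold variants into an IFR time series is to scale a baseline IFR by the share-weighted relative severity of the circulating variants. The daily shares could come, for example, from the CDC variant-proportion data. This is a sketch under stated assumptions, not the paper's procedure. The function name `variant_ifr_multiplier` and the relative-severity values in the example are hypothetical and made up for illustration.

```python
import numpy as np


def variant_ifr_multiplier(proportions, relative_severity):
    """Share-weighted relative IFR of circulating variants (hypothetical sketch).

    proportions: array of shape (days, variants); each row should sum to 1.
    relative_severity: array of shape (variants,), each variant's IFR relative
        to a reference variant (illustrative values only).
    """
    p = np.asarray(proportions, dtype=float)
    r = np.asarray(relative_severity, dtype=float)
    if not np.allclose(p.sum(axis=1), 1.0):
        raise ValueError("variant proportions must sum to 1 on each day")
    return p @ r


# Example: a three-day switch from a reference variant to a milder one.
baseline_ifr = 0.005
shares = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
adjusted_ifr = baseline_ifr * variant_ifr_multiplier(shares, [1.0, 0.4])
```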
A.2 Time Series of State Recovered True Cases
Time series of the recovered true cases and the undercounting factor for 10 selected states are shown in the main text. Figures 15 and 16 show these data for all 50 states plus Washington DC.
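The sketch below assumes that the undercounting factor is the ratio of recovered true cases to confirmed cases; the exact definition is given in the main text. It computes that ratio over sliding 7-day windows to damp day-of-week reporting noise. `undercounting_factor` and its `window` parameter are illustrative names.

```python
import numpy as np


def undercounting_factor(true_cases, confirmed_cases, window=7):
    """Ratio of true to confirmed cases over sliding windows (hypothetical sketch).

    Returns an array of length len(cases) - window + 1. Windows with no
    confirmed cases yield NaN instead of a division-by-zero value.
    """
    kernel = np.ones(window)
    true_sum = np.convolve(np.asarray(true_cases, dtype=float), kernel, mode="valid")
    conf_sum = np.convolve(np.asarray(confirmed_cases, dtype=float), kernel, mode="valid")
    factor = np.full(true_sum.shape, np.nan)
    mask = conf_sum > 0
    factor[mask] = true_sum[mask] / conf_sum[mask]
    return factor


# Example: 10 true infections per confirmed pair of cases gives a factor of 5.
factor = undercounting_factor([10] * 14, [2] * 14)
```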
A.3 State Confirmed Cases and Deaths
Figures 17 and 18 show the raw daily case count and \(100 \times\) the daily death count for all 50 states plus Washington DC. The data come from the JHU COVID-19 database (Center 2023). Figures 19 and 20 show the processed daily case and death counts after correcting for data dumps and holiday reporting gaps.
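The corrections for data dumps and holidays are not detailed in this section. The sketch below shows one common approach, assumed here for illustration. When one or more days report zero and the next day reports the backlog, it spreads that backlog evenly over the gap and the reporting day. It then applies a centered 7-day moving average, a simpler smoother than locally weighted regression (Cleveland and Devlin 1988). Both function names are hypothetical.

```python
import numpy as np


def redistribute_backlog(counts):
    """Spread each post-gap report evenly over the preceding run of zero reports."""
    x = np.asarray(counts, dtype=float).copy()
    n = x.size
    i = 0
    while i < n:
        if x[i] != 0:
            i += 1
            continue
        j = i
        while j < n and x[j] == 0:  # find the end of the zero-report run
            j += 1
        if j < n:  # assume the next report contains the whole backlog
            x[i:j + 1] = x[j] / (j + 1 - i)
        i = j + 1
    return x


def centered_moving_average(x, window=7):
    """Centered moving average; the edges use the shorter window available."""
    x = np.asarray(x, dtype=float)
    half = window // 2
    out = np.empty_like(x)
    for t in range(x.size):
        lo, hi = max(0, t - half), min(x.size, t + half + 1)
        out[t] = x[lo:hi].mean()
    return out


# Example: a two-day holiday gap followed by a catch-up report.
fixed = redistribute_backlog([10, 0, 0, 30, 10])
smoothed = centered_moving_average(fixed)
```

Trailing zeros with no later report are left unchanged, and the total count is preserved whenever a backlog is redistributed.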
A.4 Case Rate per Age Group
Figure 21 shows the time series of the case rate for each age group in all 10 HHS regions, as provided by the CDC (Disease Control 2023a). The HHS regions used by the CDC are described in Table 1.
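Age-group case rates matter because the IFR varies strongly with age. The minimal sketch below assumes that the age distribution of confirmed cases approximates that of infections. It converts per-100,000 case rates into each group's share of cases, then averages age-specific IFRs with those weights. The function name `age_weighted_ifr` and the example inputs are illustrative, not data from this study.

```python
import numpy as np


def age_weighted_ifr(case_rate_per_100k, population, age_ifr):
    """Population IFR as a case-weighted mean of age-specific IFRs (sketch).

    case_rate_per_100k: shape (days, groups); population: shape (groups,);
    age_ifr: shape (groups,). Assumes cases proxy infections and that every
    day has at least one reported case.
    """
    rates = np.asarray(case_rate_per_100k, dtype=float)
    pop = np.asarray(population, dtype=float)
    cases = rates * pop / 1e5  # implied case counts per age group
    shares = cases / cases.sum(axis=1, keepdims=True)  # daily age mix of cases
    return shares @ np.asarray(age_ifr, dtype=float)


# Example: two equal-sized groups; on day 2 only the older group reports cases.
daily_ifr = age_weighted_ifr([[100, 100], [0, 100]], [1e5, 1e5], [0.001, 0.01])
```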
A.5 State Vaccination Rate
Figures 22 and 23 give the time series of the vaccination rate for each age group older than 18 years in all 50 states plus Washington DC. These data were obtained from the CDC (Disease Control 2023b).
A.6 Incidence Rate Ratio (IRR) of Vaccinated and Unvaccinated Groups
The incidence rate ratios of COVID-19 infection and death for each age group are given in Fig. 24. These data were obtained from the CDC website (Disease Control 2023d). Death data for the younger age groups are not included because weekly death counts in the vaccinated young groups are too small, and in many weeks zero. The ratio of the IFR of the unvaccinated group to that of the vaccinated group, for the three older age groups, is shown in Fig. 24 (right).
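Assume that confirmed cases are ascertained at the same rate in both groups. Then the death IRR equals the case IRR multiplied by the IFR ratio, so \(\mathrm{IFR}_u/\mathrm{IFR}_v = \mathrm{IRR}_{\text{death}}/\mathrm{IRR}_{\text{case}}\). The sketch below uses that identity, then blends the unvaccinated and vaccinated IFRs by each group's share of infections. With vaccination coverage \(c\), the vaccinated share is \(c/(c + (1-c)\,\mathrm{IRR}_{\text{case}})\). The function names and inputs are hypothetical, and the paper may combine these quantities differently.

```python
import numpy as np


def ifr_ratio(case_irr, death_irr):
    """IFR_unvaccinated / IFR_vaccinated, assuming equal case ascertainment."""
    return np.asarray(death_irr, dtype=float) / np.asarray(case_irr, dtype=float)


def vaccination_adjusted_ifr(ifr_unvax, coverage, case_irr, death_irr):
    """Blend unvaccinated and vaccinated IFRs by each group's share of infections."""
    c = np.asarray(coverage, dtype=float)
    case_irr = np.asarray(case_irr, dtype=float)
    # Vaccinated share of infections, given coverage c and the case IRR.
    share_vax = c / (c + (1.0 - c) * case_irr)
    r = ifr_ratio(case_irr, death_irr)
    return np.asarray(ifr_unvax, dtype=float) * ((1.0 - share_vax) + share_vax / r)


# Example with made-up inputs: case IRR 4, death IRR 20, so the IFR ratio is 5.
blended = vaccination_adjusted_ifr(0.01, [0.0, 0.5], 4.0, 20.0)
```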
A.7 State Testing Volume
Figures 25 and 26 give the time series of smoothed COVID-19 testing volume in all 50 states plus Washington DC. These data come from the Coronavirus Resource Center of Johns Hopkins University (Center 2023).
A.8 Training Set Data Distribution
Figure 27 displays the distribution of the data in the training data set.
About this article
Cite this article
Jiang, N., Kolozsvary, C. & Li, Y. Artificial Neural Network Prediction of COVID-19 Daily Infection Count. Bull Math Biol 86, 49 (2024). https://doi.org/10.1007/s11538-024-01275-3
DOI: https://doi.org/10.1007/s11538-024-01275-3