Panel Associations Between Newly Dead, Healed, Recovered, and Confirmed Cases During COVID-19 Pandemic

Background: Knowledge of the associations among newly recovered cases (NR), newly healed cases (NH), newly confirmed cases (NC), and newly dead cases (ND) can help to monitor, evaluate, predict, control, and curb the spread of coronavirus disease 2019 (COVID-19). This study aimed to explore the panel associations of ND, NH, and NR with NC.
Methods: Data from the China Data Lab in Harvard Dataverse for China (January 15, 2020 to January 14, 2021), the United States of America (the USA, January 21, 2020 to April 5, 2021), and the World (January 22, 2020 to March 20, 2021) were analyzed. The main variables included in the present analysis were ND, NH, NR, and NC. Pooled regression, stacked within-transformed linear regression, quantile regression for panel data, random-effects negative binomial regression, and random-effects Poisson regression were conducted to reflect the associations of ND, NH, and NR with NC. Event study analyses were performed to explore how the key events influenced NC.
Results: Descriptive analyses showed that the mean ND/NC ratio for China was higher than those for the USA and the World. The tentative analyses indicated significant relationships among ND, NH, NR, and NC for China, the USA, and the World. Panel regressions confirmed the associations of ND, NH, and NR with NC for China, the USA, and the World. The panel event study showed that key events influenced NC in the USA and the World more strongly than in China.
Conclusion: The findings of this study confirmed the panel associations of ND, NH, and NR with NC in the three datasets. The efficiencies of various COVID-19 control strategies across the globe were compared on the basis of the regression outcomes. Future research could explore the mechanisms underlying the panel associations.


Introduction
Despite travel restrictions [1] and limitations [2], coronavirus disease 2019 (COVID-19) has spread rapidly across the globe, as documented in multiple studies. For instance, a longitudinal analysis concluded that the impact of COVID-19 could migrate between vulnerable counties [3]. Another theoretical study demonstrated that large-scale spatial transmission of COVID-19 was caused by a relatively high per-capita rate of transmission [4]. To tackle the spread of COVID-19, a growing number of countries initiated practical strategies (in-house isolation, quarantine, and promotion of general awareness about transmission routes) against further contagion [5]. Nevertheless, the situation deteriorated rapidly, with an increasing number of newly confirmed cases (NC) [6], especially in western countries. In particular, an empirical law of COVID-19 spread attracted academic attention [7]. Although NC across countries has been reported [8], national gaps in newly recovered cases (NR), newly healed cases (NH), and newly dead cases (ND) were seldom documented in the academic literature. Statistical analyses with micro and macro data of the COVID-19 pandemic can help evaluate the relevant control interventions.
To date, analytical methods for assessing the control efficiency of the COVID-19 pandemic, in terms of the epidemic evolution of total infections, have been limited and biased. Notably, trend forecasting with publicly available micro epidemiological data has been the mainstream approach in the field of COVID-19 control. For example, multiple studies forecast the trend of COVID-19 spread in China [9][10][11]. Moreover, the temporal dynamics of the COVID-19 epidemic were reported in parts of the World including Huangshi city, China [12], South Korea [13], the UK and Sweden [14], Pakistan [15], and Wuhan, China [16]. Survival durations, including the average lag between NC and ND [17], lethal duration [18], and COVID-19 duration [19], were employed to reflect the evolution of the COVID-19 pandemic. However, forecast and trend methods often considered only change over time and neglected the relationships among ND, NH, NR, and NC. Additionally, purely mathematical work underlined the prediction errors caused by large uncertainties [20]. Studies that omit regional, national, and global variables may therefore yield incomplete or biased findings.
To date, the analytical tools in published studies have been of limited use for reflecting the associations of ND, NH, and NR with NC. For example, a substantial body of time series models and simulations employed temporal factors rather than spatial and locational factors [21][22][23][24][25][26]. Several simulations reported the time trends of ND, NH, NR, and NC, but were limited in studying locational differences [27][28][29]. Thus, time series studies and simulations led to partial and biased research outcomes. Even more importantly, the panel associations of ND, NH, and NR with NC were not analyzed.
Furthermore, policy interventions were not considered in the existing studies. From December 12, 2019 onward, a series of daily policies and regulations were released by the Chinese government, global organizations, and western countries and documented in China Data Lab [30]. With publicly available data on the COVID-19 pandemic for both the USA and Italy, a study reported that future NC, ND, and NR could be reasonably predicted [31]. Thus, policy-driven trends in NC, which indirectly reflect national efforts against the COVID-19 pandemic, were often neglected.
The progress of the COVID-19 crisis was formally characterized by ND, NH, NR, and NC. Thus, this study drew on publicly available longitudinal datasets to explore the panel associations of ND, NH, and NR with NC. Under the assumptions of the panel models, pooled regression, stacked within-transformed linear regression, quantile regression for panel data, random-effects negative binomial regression, and random-effects Poisson regression were conducted to reflect the associations of interest regarding China, the USA, and the World. Subsequently, a panel event study was performed to reflect the trends of NC. Finally, the epidemic control performance was analyzed, assessed, and compared on the basis of the empirical outcomes.

Data Sources and Selection
Daily cases in China included the numbers of NH, NC, and ND at the province level, available from January 15, 2020 to January 14, 2021 [32]. Daily cases in the USA included the numbers of ND and NC at the state level, available from January 21, 2020 to April 5, 2021 [33]. Daily cases in the World (outside Antarctica, China, the USA, and MS Zaandam) included ND, NR, and NC at the country level, available from January 22, 2020 to March 20, 2021 [34]. The dataset of China contained information on 31 province-level units. The dataset of the USA contained information on 51 state-level units. The dataset of the World contained information on 192 countries and regions. The geographical divisions can be found in the Appendix. No data cleaning was performed on the raw data available at Harvard Dataverse.

Front-and-Back Plots
Before the statistical strategies were designed, the relationships between NC and ND, between NH and NC, and between NH and ND regarding China, the relationship between ND and NC regarding the USA, and the relationships between NC and ND, between NC and NR, and between ND and NR regarding the World were depicted by front-and-back plots in Figs. 1, 2, 3, 4, 5, 6 and 7 [35]. Given the sparse distributions in Figs. 1, 2 and 3 and the asymptotic normality in Figs. 4, 5, 6 and 7, several linear and nonlinear panel regression models were considered as potential analytical methods, to accommodate cases in which normality assumptions were violated.

Tentative Analyses
Tentative analysis of the relationships among ND, NH, NR, and NC was performed with a one-stop solution for robust inference with multiway clustering (Stata package vcemway) [36]. In the sample, the identification code and the day were the clustering variables of interest. This study extended the ordinary least squares regression to incorporate random effects at the individual level. The following analyses estimated the resulting random-effects model and adjusted its standard errors for two-way clustering in identification code and day. Compared with ordinary least squares regression and one-way clustering approaches, two-way clustering can lead to more conservative inferences.

Figure: Relationship between NR and NC regarding the World
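For orientation, the two-way cluster-robust variance that estimators of this kind typically compute can be written in the standard form due to Cameron, Gelbach, and Miller. This is a sketch under assumptions: the symbols $G$ (identification codes), $H$ (days), $X$, and $\hat{u}$ are notation introduced here for illustration, and any small-sample corrections that vcemway may apply are not reproduced.

```latex
% Two-way clustering: add the two one-way variances, subtract the
% variance clustered on the intersection (id-by-day cells).
\hat{V}_{\text{two-way}} = \hat{V}_{G} + \hat{V}_{H} - \hat{V}_{G \cap H},
\qquad
\hat{V}_{C} = (X'X)^{-1}
\Big( \sum_{c \in C} X_c' \hat{u}_c \hat{u}_c' X_c \Big)
(X'X)^{-1}, \quad C \in \{G,\; H,\; G \cap H\}.
```

Each term is an ordinary one-way cluster-robust variance, so the two-way estimate reduces to one-way clustering when either dimension contributes no within-cluster correlation.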

Panel Analyses
The main associations of interest in this study were panel associations of ND and NH with NC regarding China, panel associations between ND and NC regarding the USA, and panel associations of ND and NR with NC regarding the World. In the pooled regression analysis, the regions of China (Central China, Western China, Northeast China, and East China), the USA (New England, Mid-Atlantic Region, the South Region, Mid-West Region, the Southwest Region, and the West Region), and the World (Africa, Asia, Europe, North America, Oceania, and South America) were also considered as covariates.
The count data of ND, NH, NR, and NC tended to follow Poisson or negative binomial distributions. In this large sample, the distributions were approximately normal. Regarding the associations, the feasible panel models could therefore be linear or nonlinear. When NR, NH, NC, and ND were treated as count data, random-effects negative binomial regression and random-effects Poisson regression could be employed to reflect the associations of interest in nonlinear models. When ND, NH, NR, and NC were treated as continuous variables, pooled regression, stacked within-transformed linear regression, and quantile regression for panel data could be employed to explore the associations of interest in linear models.
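A common way to choose between the two count distributions is to compare the variance with the mean: a Poisson variable has variance equal to its mean, whereas the negative binomial allows the variance to exceed it. The following is a minimal sketch of that check, not the procedure used in this study; the function name and the toy counts are hypothetical.

```python
from statistics import mean, pvariance


def dispersion_ratio(counts):
    """Return variance / mean for a sequence of non-negative counts.

    A ratio near 1 is compatible with a Poisson model; a ratio well
    above 1 (overdispersion) points toward a negative binomial model.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("mean is zero; dispersion ratio is undefined")
    return pvariance(counts) / m


# Hypothetical daily NC counts for one unit (illustrative values only,
# not taken from the datasets analysed in this study).
toy_nc = [0, 1, 0, 3, 12, 2, 0, 25, 4, 1]
ratio = dispersion_ratio(toy_nc)
print(f"variance/mean = {ratio:.2f}")
```

This is only a descriptive screen; formal tests of overdispersion would be run within the fitted regression model.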
Regarding China, the panel associations of ND and NH with NC could be found by using regression model (1):
$$\mathrm{NC} \sim \beta_0 + \beta_1 \mathrm{NH} + \beta_2 \mathrm{ND} + \mu_1 \qquad (1)$$
Regarding the USA, the panel association between ND and NC could be found by using regression model (2):
$$\mathrm{NC} \sim \beta_0 + \beta_1 \mathrm{ND} + \mu_2 \qquad (2)$$
Regarding the World, the panel associations of ND and NR with NC could be found by using regression model (3):
$$\mathrm{NC} \sim \beta_0 + \beta_1 \mathrm{NR} + \beta_2 \mathrm{ND} + \mu_3 \qquad (3)$$
Here, $\beta_0$ was the constant, $\beta_1$ and $\beta_2$ were coefficients, and $\mu_1$, $\mu_2$, and $\mu_3$ were random errors. When the optimization iterations were reported as not concave, the corresponding estimates were omitted.
Because this study aimed to explore the associations of interest rather than to conduct a dynamic system analysis, regressions with quadratic or cubic terms or interactions were considered unnecessary.
Pooled regressions are usually carried out to analyze available time series of cross-sections. The main advantage of pooled regression is the ability to measure different factors at the region level and aggregate results at the national level. The main disadvantage is that it can overestimate or underestimate the impact in individual regions.
Stacked within-transformed linear regression analysis was performed with the Stata program xtstackreg [37]. Regarding suitability and applicability, stacked within-transformed linear regression accommodated fixed-effects estimation, applied a degrees-of-freedom adjustment, and allowed for factor variables in dependent variables. In the regressions for China, the USA, and the World, all region-level units were entered. After estimation, only some of the geographical covariates remained in the regression outcomes. The main advantage of stacked within-transformed linear regression is the ability to generate predictions from a "stacked" ensemble of models, including LASSO regression, k-nearest neighbors, random forest, and gradient boosting. This technique produces superior estimates with larger samples.
Quantile regression for panel data was performed with the Stata program qregpd with Nelder-Mead optimization [38]. Quantile regression for panel data addresses a fundamental problem posed by alternative fixed-effect quantile estimators: inclusion of individual fixed effects alters the interpretation of the estimated coefficient on the treatment variable. Compared with standard mean regression models, quantile regression models are more robust and flexible, which can help to account for unobserved heterogeneity and heterogeneous covariate effects. According to Powell (2015), a quantile regression estimator can be used to evaluate the impacts of exogenous and endogenous treatment variables on an outcome distribution in samples with small T [39]. Simultaneously, random-effects negative binomial regression and random-effects Poisson regression were conducted.

Fig. 7 Relationship between ND and NR regarding the World
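The fixed-effects estimation mentioned above rests on the within transformation: each variable is demeaned by unit before ordinary least squares is applied, which removes unit-specific intercepts. The sketch below illustrates the idea for a single regressor only. It is not the xtstackreg implementation, and the function name and toy panel are hypothetical.

```python
from collections import defaultdict


def within_slope(units, x, y):
    """Single-regressor within (fixed-effects) slope.

    Demeans x and y inside each unit, then computes the OLS slope
    of demeaned y on demeaned x.
    """
    # Accumulate per-unit sums and counts: [sum_x, sum_y, n].
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for u, xi, yi in zip(units, x, y):
        s = sums[u]
        s[0] += xi
        s[1] += yi
        s[2] += 1

    num = 0.0
    den = 0.0
    for u, xi, yi in zip(units, x, y):
        sx, sy, n = sums[u]
        dx = xi - sx / n  # deviation of x from its unit mean
        dy = yi - sy / n  # deviation of y from its unit mean
        num += dx * dy
        den += dx * dx
    if den == 0:
        raise ValueError("no within-unit variation in x")
    return num / den


# Toy panel (illustrative values): two units with different intercepts
# (10 and 50) but a common slope of 2 linking ND to NC.
units = ["A", "A", "A", "B", "B", "B"]
nd = [1.0, 2.0, 3.0, 1.0, 2.0, 4.0]
nc = [10 + 2 * v for v in nd[:3]] + [50 + 2 * v for v in nd[3:]]
beta = within_slope(units, nd, nc)
print(beta)
```

Because the unit intercepts are differenced out, the estimate recovers the common slope even though the two units sit at very different levels of NC.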

Panel Event Study
This study included panel models for the associations of interest and prediction models for the effects of key events. A panel event study implemented with the program "eventdd" in Stata [40] was employed to analyze how the key events influenced NC. With a difference-in-differences style model, a series of lag and lead coefficients and confidence intervals (CIs) were estimated and plotted. In this context, three key events were adopted as treatments for China, the USA, and the World (outside Antarctica, China, the USA, and MS Zaandam), respectively. On February 5, 2020, China released tax exemption and loan policies to strengthen coronavirus containment (http://en.nhc.gov.cn/2020-02/06/c_76511.htm). Coronavirus Guidelines for America was issued on March 16, 2020 in the USA (https://www.whitehouse.gov/briefings-statements/coronavirus-guidelines-america/). On March 11, 2020, WHO characterized COVID-19 as a pandemic (https://www.who.int/emergencies/diseases/novel-coronavirus-2019/events-as-they-happen).
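The lag and lead coefficients in an event study are attached to indicators of time relative to the event. As a minimal sketch of how such indicators can be built, the code below bins days relative to the event date and omits one baseline period. The window of three leads and three lags, the baseline of -1, and the function names are assumptions for illustration and do not reproduce the options of eventdd.

```python
from datetime import date


def event_time(day, event_day, max_lead=3, max_lag=3):
    """Days relative to the event, with the endpoints binned."""
    k = (day - event_day).days
    return max(-max_lead, min(max_lag, k))


def event_dummies(day, event_day, max_lead=3, max_lag=3, baseline=-1):
    """One 0/1 indicator per relative period, omitting the baseline."""
    k = event_time(day, event_day, max_lead, max_lag)
    return {
        j: int(j == k)
        for j in range(-max_lead, max_lag + 1)
        if j != baseline
    }


# Date on which WHO characterized COVID-19 as a pandemic (from the text).
event = date(2020, 3, 11)

# Two days after the event: only the indicator for lag 2 is switched on.
row = event_dummies(date(2020, 3, 13), event)
print(row)
```

Observations in the omitted baseline period have all indicators equal to zero, so every estimated lead and lag coefficient is interpreted relative to that period.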
All analyses were performed with Stata (Version 14 and 16, Stata Corporation, College Station, TX, USA).

Tentative Analyses
In Table 2, NC was significantly predicted by ND and NH regarding China. Simultaneously, NC was significantly predicted by ND regarding the USA. NC was significantly predicted by ND and NR regarding the World.

Pooled Analyses
In

Panel Regressions
Before random-effects Poisson regression and random-effects negative binomial regression were conducted, the 66 negative values of NC (< 0) were treated as missing values. The estimation results presented in Table 4 indicated that ND and regions had significant effects on NC regarding China.
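Count models require non-negative outcomes, which is why negative daily values must be handled before estimation. The source does not state why negative values occur; retrospective corrections to cumulative totals are one possible cause, mentioned here only as an assumption. A minimal sketch of the masking step, with a hypothetical function name and toy values:

```python
def mask_negative(counts):
    """Replace negative counts with None (missing); keep everything else."""
    return [None if (c is not None and c < 0) else c for c in counts]


# Toy daily NC series (illustrative values only).
toy = [5, -2, 0, 13, -1]
cleaned = mask_negative(toy)

# Number of values that were set to missing by the masking step.
n_masked = sum(1 for a, b in zip(toy, cleaned) if a is not None and b is None)
print(cleaned, n_masked)
```

Zeros are kept because they are valid counts; only strictly negative values are set to missing.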
The estimation results presented in Table 5 indicated that ND had significant effects on NC in stacked within-transformed linear regression, quantile regression for panel data, random-effects Poisson regression, and random-effects negative binomial regression regarding the USA. Moreover, regions had significant effects on NC in random-effects Poisson regression regarding the USA. The estimation results presented in Table 6 indicated that ND and NR had significant effects on NC in stacked within-transformed linear regression and random-effects Poisson regression regarding the World. Moreover, regions had significant effects on NC in random-effects Poisson regression regarding the World.

Main Outcomes
This study employed publicly available daily datasets covering China, the USA, and the World (outside Antarctica, China, the USA, and MS Zaandam) and obtained the associations of ND, NR, and NH with NC regarding China, the USA, and the World, respectively. In the panel event study, curved lines showed that key events significantly influenced NC regarding the USA and the World, whereas a nearly straight line showed that key events had almost no significant influence on NC regarding China.
Congruent with a prior study [41], this study confirmed the effects of control measures. The regression outcomes provided coarse estimates for comparing the control performance of the COVID-19 pandemic. This study was in line with early simulation outcomes, which found that the NH rates were approximately linear increasing functions and the ND rates were small constants [42]. This could be partially explained by an early study which indicated that socio-economic determinants and city sizes had high impacts on the change of COVID-19 transmission in China [43]. Because the mean NH/NC ratio (China) exceeded the mean NR/NC ratio (the World) and the mean NH/ND ratio (China) exceeded the mean NR/ND ratio (the World), the practical performance of COVID-19 control in China was seemingly better than that in the other countries. Some Chinese scholars agreed with this judgment [44,45].
With regard to methodologies, the findings of the panel event study were in line with prior studies. For example, an exploratory data analysis with visualizations had been made to understand the numbers of NR, NC, and ND in China [46]. An 82-day (January 21 to April 12, 2020) forecast of COVID-19 infections and deaths placed the COVID-19 peak in the USA around July 14, 2020 [47]. This study was in line with another study which revealed that the effect of NC on ND was heterogeneous across provinces in China [48]. Furthermore, up to February 5, 2020, the number of NC showed a trend of "rapid increase before slowing down" [49]. Another forecast showed that the cumulative number of cases for Italy, the UK, and the USA corresponded to a diminishing average daily rate from April 22 to May 22, 2020 [50].
Changes in COVID-19 ND, NH, NR, and NC in various regions could be influenced by lifestyle, environmental factors, regulations, and progressing stages. Regarding lifestyle, changes in social distancing [51], increases in space-time clusters [52], and different sets of neighborhood characteristics [53] could be identified as risk factors for ND and NC during the COVID-19 pandemic. As to environmental factors, a study indicated that temperature and the columnar density of total atmospheric ozone had a strong association with the tendency of COVID-19 spread in almost all states in the USA [54]. As for regulations, mainly mobility restrictions and other non-pharmacological interventions, ill-prepared work [55], facemask shortages [56], poor traveller screening [57], forgone care [58], and population migration [59] could lead to ineffective prevention and control of COVID-19. Regarding progressing stages, changes in COVID-19 ND, NH, NR, and NC might be caused by epidemic progression patterns that differ across countries. Studies of the phases of the COVID-19 epidemic documented four phases in the 61 most affected countries [60], three or four phases in Wuhan City, Hubei Province, and China [61], and five stages in China's non-Hubei provinces [62].
There were small curves in the point estimation regarding China and wide range of trajectories regarding the USA. This could be partially explained by several studies. For example, a study showed rapid nucleation and diffusion in January 2020 followed by rapid NC decrease in February in China, while the USA showed a wide range of trajectories, with an abrupt transition from slow NC increase in January and February, to rapid geographic dispersion shortly before mobility reductions occurred in March [63]. Regarding the epidemic trends of national and state regional administrative units, a study from July 27, 2020, to January 22, 2021 indicated the turning point of the early epidemic in the USA was predicted to occur in September [64]. Another model inferred that the inflection point of the epidemic across China would be mid-February, and the end of the epidemic would be in late March [65].

Strengths and Weaknesses
Regarding data sources, this study employed three datasets. The large sample size increased the precision of the study. Additionally, the study period of more than 1 year could provide reliable results regarding epidemic control and daily changes in the prevalence of COVID-19 conditions. Regarding statistical methods, this study adopted several advanced panel regression methods. In particular, an event study with a difference-in-differences design was used to analyze the role of key events. Compared with the other studies [66][67][68][69], the results from this study were more accurate, realistic, appropriate, and suitable for long time-series outbreak data. Another advantage of this study was its consideration of key events.
There were several limitations. First, several variables, including demographics, financial support, and international aid, were not taken into account. For example, a study in South Korea found that sex, region, and infection reasons affected both NR and ND [13]. Second, this study did not adopt newly designed methods conceived by the author to analyze the law of spread and transmission of COVID-19. Changes in case definitions, which allowed detection of more cases as knowledge increased in China, affected inferences on the transmission dynamics of COVID-19 [70]. Finally, this study considered only one key event per dataset rather than varying treatment times and durations [71].

Conclusion
Using panel analysis and data collected in China province-level units, the USA state-level units, and the World country-level units (outside Antarctica, China, the USA, and MS Zaandam), regressions confirmed the positive panel associations between NH, ND, and NC regarding China, between ND and NC regarding the USA, and between NR, ND, and NC regarding the World. The panel event study showed that key events influenced NC regarding the World and the USA more forcefully and less steadily than regarding China. Future work building on the current study should examine the mechanisms underlying the panel associations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.