Development of a risk score to identify patients at high risk for a severe course of COVID-19

Aim We aimed to develop a risk score to calculate a person’s individual risk for a severe COVID-19 course (POINTED score) to support prioritization of especially vulnerable patients for a (booster) vaccination.
Subject and methods This cohort study was based on German claims data and included 623,363 individuals with a COVID-19 diagnosis in 2020. The outcome was COVID-19 related treatment in an intensive care unit, mechanical ventilation, or death after a COVID-19 infection. Data were split into a training and a test sample. Poisson regression models with robust standard errors including 35 predefined risk factors were calculated. Coefficients were rescaled with a min–max normalization to derive numeric score values between 0 and 20 for each risk factor. The score’s discriminatory ability was evaluated by calculating the area under the curve (AUC).
Results Besides age, Down syndrome and hematologic cancer with therapy, immunosuppressive therapy, and other neurological conditions were the risk factors with the highest risk for a severe COVID-19 course. The AUC of the POINTED score was 0.889, indicating very good predictive validity.
Conclusion The POINTED score is a valid tool to calculate a person’s risk for a severe COVID-19 course.
Supplementary Information The online version contains supplementary material available at 10.1007/s10389-023-01884-7.


Introduction
In the current COVID-19 pandemic, vaccinations are an essential tool to prevent severe disease courses of a SARS-CoV-2 infection and protect public health. Many countries, including Germany, prioritized elderly people, residents and personnel of long-term care facilities, healthcare workers, social care personnel, and people with certain comorbidities for a COVID-19 vaccination at the beginning of the vaccination campaign due to limited vaccine availability, and later on for booster vaccinations (BMG 2021; STIKO 2021). The aim of the German vaccination campaign against COVID-19 was to minimize the number of severe disease courses and COVID-19 associated mortality. Further aims were to secure the operability of the health care system and to protect individuals at high risk of infection due to their profession (Waize et al. 2021). In Europe, vaccination campaigns against COVID-19 are advanced, but complete vaccination coverage including a booster dose varies between countries (ECDC 2022). As of 20 February 2022, 56.3% of the German population had received two doses plus a booster immunization against COVID-19 (BMG 2022). In February 2022, the German Standing Committee on Vaccination (STIKO [Ständige Impfkommission]) recommended a second booster vaccination for (i) people 70 years and older, (ii) residents and personnel of long-term care facilities, (iii) people with an immunodeficiency from the age of five, and (iv) employees of medical facilities, especially those with direct patient contact (STIKO 2022).
The relevance of different comorbidities as risk factors for a severe disease course of COVID-19 has been well described (Dreher et al. 2020; Gagiannis et al. 2020; Grunert et al. 2020; Härter et al. 2020; Monika et al. 2020; Nachtigall et al. 2020; Rößler et al. 2021). Relevant comorbidities include but are not limited to autoimmune diseases, (hemato-) oncological conditions as well as cardiovascular diseases such as heart failure and coronary heart disease. Limited evidence was available regarding the effect of risk factors in different age groups (Treskova-Schwarzbach et al. 2021) and whether a cumulative effect of multiple risk factors in a person is relevant regarding a person's risk for a severe disease course. Hence, a potentially cumulative effect of the simultaneous presence of multiple risk factors was not incorporated in the vaccination recommendations in Germany. We aimed to develop and validate a risk score to calculate a person's individual risk for a severe COVID-19 course, which accounts for the presence of multiple risk factors at the same time and incorporates differential effects of underlying comorbidities in different age groups, using German claims data. This score (POINTED score) can be used to identify people who would most benefit from additional (personal) protective measures against COVID-19 and a fourth vaccination to prevent severe courses of disease.

Data base
This cohort study was based on nationwide claims data of individuals under statutory health insurance (SHI) in Germany. Data from two German Local Health Care Funds [AOK PLUS Sachsen and AOK Bayern], the BARMER, the DAK-Gesundheit, and the Techniker Krankenkasse (TK), as well as the research database of the Institute of Applied Health Research Berlin, which includes anonymized claims data of company health insurances [Betriebskrankenkasse (BKK)], were used for this analysis. Claims data from the AOK PLUS Sachsen and from DAK-Gesundheit were analyzed at the Center for Evidence-Based Healthcare (ZEGV) at the TU Dresden and Vandage GmbH, respectively. In total, the data included information on approximately 38 million persons, which corresponds to approximately 46% of the German population.
In addition to sociodemographic information (age and sex) and vital status (i.e., date of death), German claims data contain information about performed ambulatory services (according to "Einheitlicher Bewertungsmassstab," EBM), diagnoses documented in the ambulatory and hospital setting (according to the International Statistical Classification of Diseases and Related Health Problems, German Modification, ICD-10-GM) and procedures conducted (according to the "Operationen- und Prozedurenschluessel," OPS; German modification of the International Classification of Procedures in Medicine, ICPM) as well as drug prescription data (according to the German Anatomical Therapeutic Chemical (ATC) Classification). Longitudinally linked data from the years 2019 and 2020 were used for this study. Due to German data protection law, pooling of individual-level data was not feasible. Hence, six harmonized health insurance data sets were analyzed separately by authorized institutes or the healthcare research department within the respective health insurance.

Study population
Adult patients with a confirmed ambulatory or hospital COVID-19 diagnosis (ICD-10 U07.1!; laboratory confirmed SARS-CoV-2 virus) between 27 January 2020 and 31 December 2020 were included in the analysis. COVID-19 patients had to be continuously enrolled in the SHI from the beginning of 2019 up to the date of the COVID-19 infection and from their COVID-19 infection until death or 31 December 2020, whichever came first. The beginning of the COVID-19 infection was determined based on COVID-19 related ambulatory and hospital services. Claims data from the year 2019 were used to assess risk factors for a severe course of COVID-19.

Outcome
The outcome severe COVID-19 course was defined as COVID-19 related treatment in an intensive care unit, mechanical ventilation, or death after a COVID-19 infection. Intensive care treatment and mechanical ventilation had to occur within a hospitalization for which COVID-19 was documented as a discharge diagnosis. Deaths occurring within 30 days after a COVID-19 infection, during a COVID-19 related hospitalization, or within 14 days following such hospitalizations were defined as COVID-19 related. An overview of the procedure codes used to identify intensive care treatment and mechanical ventilation is provided in the supplementary material S1.

Risk factors
Based on an umbrella review by Treskova-Schwarzbach et al. (2021) and the recommendations of the STIKO, we defined 35 conditions that were associated with a severe course of COVID-19. The conditions were defined using ICD-10 codes documented at hospital discharge or by ambulatory physicians and psychotherapists in the year prior to the COVID-19 infection. Ambulatory diagnoses had to be documented in at least two quarters in 2019. Furthermore, prescriptions for specific therapies were required to validate ambulatory diagnoses of asthma, coronary heart disease, COPD, depression, diabetes, hematologic, metastatic, and solid cancer with therapy, heart failure, hypertension, and severe psychiatric diseases. The definition of risk factors is available from the corresponding author upon request.

Statistical modeling
Statistics for the entire COVID-19 cohort across all data sites were summarized descriptively. To develop the score, the total cohort of COVID-19 patients in 2020 selected at each data site was split into a training and a test data set. The training data sets, used to develop the score, included a random 90% sample of the total study population at the respective data site. The test data sets included the remaining 10% of the study population and were used to assess the performance of the developed score.
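The split described above can be sketched as follows. This is a minimal illustration, assuming a simple random shuffle of patient identifiers at one data site; the function name `split_train_test`, the seed, and the toy identifiers are hypothetical and not taken from the study's R code.

```python
import random


def split_train_test(patient_ids, test_fraction=0.10, seed=0):
    """Randomly split patient identifiers into a training and a test set."""
    ids = list(patient_ids)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    n_test = round(len(ids) * test_fraction)
    # The first n_test shuffled IDs form the test set, the rest the training set.
    return ids[n_test:], ids[:n_test]


# Toy cohort of 1000 hypothetical patient IDs at a single data site.
train_ids, test_ids = split_train_test(range(1000))
print(len(train_ids), len(test_ids))
```

Because individual-level data could not be pooled, a split of this kind would have to be run separately at each data site.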
To evaluate whether effect modification of certain risk factors by age was present, age-stratified Poisson regression models with robust standard errors were estimated using the training data sets (Zou 2004). The models were estimated with the Fisher scoring algorithm. Poisson regression yields consistent estimators of model coefficients irrespective of the distribution of the outcome (Gourieroux et al. 1984). The following age groups were chosen for age stratification: 18 to 64 years, 65 to 79 years, and 80 years and older. The regression results estimated at the individual data sites were pooled with a meta-analysis using the metagen routine in the R package meta (Schwarzer 2021). The pooled age-stratified results were reviewed by an expert panel of physicians to decide which age and risk factor interactions to include in the final model.
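The pooling step can be illustrated with a short sketch of generic inverse-variance weighting, the principle underlying routines such as metagen. The text does not state whether a common-effect or a random-effects model was used; the sketch assumes the simpler common-effect case. The function name and the site-level numbers are hypothetical.

```python
import math


def pool_common_effect(estimates, std_errors):
    """Inverse-variance weighted pooling of site-level coefficients."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se


# Hypothetical log risk ratios for one risk factor from three data sites.
site_log_rr = [0.50, 0.70, 0.60]
site_se = [0.10, 0.20, 0.10]

pooled, pooled_se = pool_common_effect(site_log_rr, site_se)
pooled_rr = math.exp(pooled)  # back-transform to the risk ratio scale
print(round(pooled, 3), round(pooled_se, 3), round(pooled_rr, 2))
```

Sites with smaller standard errors receive more weight, so only aggregate coefficients and their standard errors need to leave each data site.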
The models with the selected interaction terms were fitted at each data site and pooled again using meta-analysis. These final coefficients were rescaled with a min-max normalization on the scale 0 to 20 (Patro and Sahu 2015). Negative coefficients were set to zero.
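A minimal sketch of the rescaling step is given below. It assumes that negative coefficients are set to zero first and that the result is rounded to whole points; the text does not specify the order of operations or the rounding rule, and the coefficient names and values are hypothetical.

```python
def rescale_to_points(coefficients, upper=20):
    """Min-max normalize coefficients to the range 0..upper as integer points."""
    # Assumption: negative coefficients are clipped to zero before scaling.
    clipped = {name: max(value, 0.0) for name, value in coefficients.items()}
    lowest = min(clipped.values())
    highest = max(clipped.values())
    if highest == lowest:
        return {name: 0 for name in clipped}
    return {
        name: round((value - lowest) / (highest - lowest) * upper)
        for name, value in clipped.items()
    }


# Hypothetical pooled model coefficients (log risk ratios).
example_coefficients = {
    "factor_a": -0.2,
    "factor_b": 0.5,
    "factor_c": 1.0,
    "factor_d": 2.0,
}
points = rescale_to_points(example_coefficients)
print(points)
```

With these inputs the largest coefficient maps to 20 points and the clipped negative coefficient to 0 points.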
To evaluate the score's discriminatory ability, the test data sets at each data site were used to determine the area under the curve (AUC) of the receiver operating characteristics (ROC) and the Youden index, the point of the ROC curve with the highest combined true positive rate (TPR) and true negative rate (TNR), i.e., the largest difference between the TPR and the false positive rate (FPR). The performance of the pooled score model was evaluated separately at each site. Aggregated statistics about the grouped age and score distribution and prediction performance (area under the curve) were pooled at ZEGV. The performance of the grouped risk score model was compared to the performance of a score model based on grouped age only.
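The cut-off selection can be sketched as follows. Youden's J statistic is defined as TPR minus FPR (equivalently sensitivity plus specificity minus 1), and the chosen cut-off is the one that maximizes J. The function name and the toy scores and outcomes are hypothetical; patients are classified as high risk when their score is at or above the cut-off.

```python
def youden_cutoff(scores, outcomes):
    """Return (J, cutoff, TPR, FPR) for the cut-off that maximizes Youden's J."""
    positives = sum(outcomes)
    negatives = len(outcomes) - positives
    best = None
    for cutoff in sorted(set(scores)):
        # Patients with a score at or above the cut-off are flagged as high risk.
        tp = sum(1 for s, y in zip(scores, outcomes) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, outcomes) if s >= cutoff and y == 0)
        tpr = tp / positives
        fpr = fp / negatives
        j = tpr - fpr
        if best is None or j > best[0]:
            best = (j, cutoff, tpr, fpr)
    return best


# Toy data: risk scores and outcomes (1 = severe course) for eight patients.
toy_scores = [5, 8, 12, 15, 21, 22, 25, 31]
toy_outcomes = [0, 0, 0, 0, 1, 0, 1, 1]

j, cutoff, tpr, fpr = youden_cutoff(toy_scores, toy_outcomes)
print(cutoff, round(j, 2), round(tpr, 2), round(fpr, 2))
```

In the toy data, a cut-off of 21 flags all three severe courses and one of five non-severe courses.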

Results
A total of 623,363 patients with a confirmed COVID-19 infection between 27 January 2020 and 31 December 2020 were included in the analysis at all participating data sites. Approximately 42% were male and 22% were above the age of 64. Table 1 shows descriptive baseline characteristics for the training and the test data set. The most common risk factors included hypertension (21.68%), depression (7.09%), and chronic renal failure (7.09%) in the training data set. Due to data protection reasons, conditions that occurred in fewer than five patients in one of the included data sets could not be reported. Approximately 5% (n = 3297) of COVID-19 patients experienced a severe course of the COVID-19 disease (intensive care treatment, mechanical ventilation, or death after COVID-19 infection). There was no evidence for systematic differences regarding the assessed baseline characteristics or outcome frequency between the training and test data sets.
The min-max normalized POINTED score values as well as the risk ratios derived from the pooled Poisson regression results are displayed in Table 2. Due to the reference category "18 to 24 years," the exponentiated coefficients for the age groups appear high compared to the disease/no disease ratios. The factors with the highest risk for a severe course of COVID-19 besides age included Down syndrome (6 points) and hematologic cancer with therapy in patients between 18 and 64 years (5 points), immunosuppressive therapy in patients between 18 and 64 years (4 points), and other neurological conditions in that same age group (4 points). No excess risk was found for ulcerative colitis, Crohn's disease, rheumatic diseases, and dialysis in patients over the age of 64 years and depression in patients 80 years and older. A patient's individual risk score can be calculated by adding the points for age, sex, and all disease categories that apply to this patient. The total score for an exemplary 66-year-old (16 points), male (3 points) patient with COPD (1 point) and heart failure (1 point) is 21 points.
Factors that increase the risk of a severe COVID-19 disease course were more prevalent in the higher age groups (see Table 1). In the training data set, 1.1% of COVID-19 patients between 18 and 64 years received an immunosuppressive therapy compared to 2.9% of COVID-19 patients 65 years and older. However, the effect of certain risk factors was more pronounced in the younger age groups. The risk score for patients under immunosuppressive therapy between 18 and 64 years is 4, while the score is 2 in patients 65 years and older. The same was observed for type I and II diabetes. The additional risk for a severe disease course of COVID-19 is highest in diabetic patients between 18 and 64 years and lower in the older age groups, while the disease is more common in the older age groups.
To quantify the additional value of incorporating comorbidities in the score compared to a score based on age and sex only, we compared the mean score in each age group based on the age/sex group scores with the mean score based on age/sex group and disease scores (see Fig. 1A). The POINTED risk score (orange) is higher than the score using only the estimated points for age and sex (blue). Especially in the higher age groups, incorporating the added risk of certain diseases leads to higher average POINTED scores.
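The additive calculation can be sketched in a few lines. The point values below are only those quoted in the worked example above (66-year-old, male, COPD, heart failure); the dictionary keys and the function name are hypothetical labels, and the full point table is given in Table 2.

```python
# Point values taken from the worked example in the text; keys are illustrative.
EXAMPLE_POINTS = {
    "age 66 years": 16,
    "male sex": 3,
    "COPD": 1,
    "heart failure": 1,
}


def total_score(characteristics, points):
    """Sum the points of all characteristics that apply to a patient."""
    return sum(points[name] for name in characteristics)


patient = ["age 66 years", "male sex", "COPD", "heart failure"]
score = total_score(patient, EXAMPLE_POINTS)
print(score)  # 16 + 3 + 1 + 1 = 21
```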

Validation of the score
The AUC for the prediction of severe COVID-19 courses was similar when using the developed POINTED risk score (0.889) or a score based on age alone (0.870) (see Fig. 1B).
The distribution of the severe courses of COVID-19 by age group and groups of the POINTED risk score is depicted in Table 3. Only 0.24% of all COVID-19 patients in the age group 18 to 39 years developed a severe course of COVID-19, compared to 24.7% of those aged 80 years or older. In patients with a POINTED risk score below 10, 0.26% experienced a severe COVID-19 disease course compared to 43.6% of the patients with a score of 30 and more (Table 3).
Based on the Youden index, two cut-off points were defined: 65 years and older (TPR 0.823; true negative rate, TNR, 0.812) for the age score and 20 points and more (TPR 0.833; TNR 0.832) for the POINTED risk score including sex and comorbidities. Due to the relative rareness of the outcome of about 5%, the age cut-off corresponds to a positive predictive value (PPV) of 0.195 (2707/13,862; Table 3), i.e., about one fifth of the patients who have been identified to be at high risk for a severe course of COVID-19 because they are 65 years or older experienced the outcome. The PPV increases to 0.217 (2747/12,668) when patients with a POINTED risk score of 20 or more points are prioritized for a vaccination.
Notes to the tables: * Due to sample sizes below five in at least one of the six data sets, this number is not reported. ** Due to sample sizes below five in one of the disease-age strata, a total is not reported. 1 Proportions are relative to the total number of patients with COVID-19 in the data set or in the respective age group.
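The positive predictive values quoted in this section follow directly from the reported counts, as the short check below shows. The counts are taken from the text; the function and variable names are illustrative.

```python
def positive_predictive_value(true_positives, flagged):
    """Share of patients flagged as high risk who experienced the outcome."""
    return true_positives / flagged


# Counts as reported in the text for the two cut-offs.
ppv_age = positive_predictive_value(2707, 13862)      # age 65 years and older
ppv_pointed = positive_predictive_value(2747, 12668)  # POINTED score of 20 or more

print(round(ppv_age, 3), round(ppv_pointed, 3))
```

The POINTED cut-off flags fewer patients (12,668 versus 13,862) while capturing more severe courses (2747 versus 2707), which is why its PPV is higher.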

Discussion
We derived and internally validated a risk score (POINTED score) for a severe COVID-19 disease course in a population of 623,363 COVID-19 patients in Germany which aimed to optimize prioritization for a COVID-19 vaccination by considering the potentially cumulative effect of different comorbidities. The score adequately predicted a severe course of COVID-19 (AUC 0.889) in a validation cohort of 62,336 German COVID-19 patients.
Using the presented methodology, individuals can be prioritized for vaccination in descending order of the estimated risk score per person. The additive score allows for the consideration of multiple risk factors, since individuals with multiple low-risk conditions might have the same risk for a severe COVID-19 course as patients with one major risk factor only. The POINTED score performed only slightly better than a model based on age only in predicting a severe course of COVID-19. This underlines that a risk stratification by age alone is also a feasible way for prioritization. However, younger patients with major chronic diseases benefit from the POINTED risk score as they would qualify for an earlier vaccination than when only age is used for prioritization. On the other hand, older patients with an age below 80 and no or only minor documented chronic diseases would eventually have to wait longer. Furthermore, the score can be used for prioritization within age groups. Jucknewitz et al. also identified and quantified risk factors for a severe course of COVID-19 (Jucknewitz et al. 2022). The authors included prediction variables at a more granular level and refrained from grouping ICD-10, ATC, or procedure codes into predefined potential risk factors to avoid the loss of information. In contrast, we chose to define a set of risk factors, which had been shown to be associated with a higher risk for a severe course of COVID-19 (Treskova-Schwarzbach et al. 2021). This allows for an easier interpretation and application of the results by medical professionals.
In contrast to earlier work (Wende et al. 2022), we decided to develop a score in a cohort of COVID-19 patients instead of the general population. Hence, our results are not confounded by different probabilities of contracting the disease within the populations with risk factors (e.g., strict self-isolation) as we only assess the impact of risk factors on a severe course of COVID-19 once infected. For identifying people who would benefit most from a vaccination in the overall population, we think that this is the adequate approach. In line with our results, a British study using a database comprising general practices in England with linkage to COVID-19 test results, Hospital Episode Statistics, and death registry found that Down syndrome and dementia significantly increased the risk for a severe course of COVID-19 (Clift et al. 2020). Of all considered factors, only asthma in the age group 65+ was associated with a significantly lower likelihood of a severe course of COVID-19. This might be due to the specific medication not only controlling the chronic disease but also being beneficial during an acute COVID-19 infection (Izquierdo et al. 2021).

Limitations
Most importantly, the presented score was developed in an unvaccinated population and the risk of a severe course of COVID-19 is lower in vaccinated individuals. However, as risk factors for a severe disease course are similar in vaccinated and unvaccinated individuals, we consider the developed score to be also valid in a vaccinated population (Antonelli et al. 2022; Yek et al. 2022). Accordingly, the STIKO recommendation for a fourth vaccination for especially vulnerable or exposed groups is also based on the previously established risk factors for a severe course of COVID-19 (STIKO 2022). Due to data protection regulations, we had to use meta-analytic methods to pool the results of the individual data holders. This causes a loss in efficiency compared to direct estimation.
Furthermore, German claims data, especially data from the outpatient setting, are only available with a time delay. Hence, for this paper we could only include data of COVID-19 patients until 31 December 2020. Consequently, no COVID-19 infections with variants such as Delta or Omicron were included in the analysis. However, if the risk factors for a severe course of COVID-19 were similar between the variants, the results of this study are still applicable.
The developed score will be most feasible for application in populations with a similar burden of disease of the considered risk factors. However, using the described methodology the score can be used or rapidly adapted to specific populations given that an adequate population-based database for the calculation of the score is available. The methodological approach is transferable to other situations where the cumulative effect of multiple risk factors is to be estimated for a risk ranking in a defined patient population.

Conclusion
The presented POINTED score offers an opportunity for physicians and all healthcare decision makers (e.g., health insurance companies, German National Association of Statutory Health Insurance Physicians) to calculate a person's individual risk for a severe course of COVID-19. This supports the prioritization of especially vulnerable patients for booster vaccinations or other protective public health measures to prevent severe courses of disease.
Authors' contributions All authors contributed to the study conception and design. DW, MB, FL, OW, MR, MS, LR, and ON prepared the data and analyzed the models on each dataset. FT performed the synthesis of the results. JW and JS supervised the project in which this manuscript is embedded. JJ wrote the first draft of the manuscript. All authors commented on previous versions of the manuscript.
Funding Open Access funding enabled and organized by Projekt DEAL. FT, LR, JJ, JS, JW, MR, and MS report institutional funding for parts of this project from the German BMBF (grant number: 01KX2021).
Data availability As described in the methods section, German data protection laws do not allow a pooling of claims data from different statutory health insurances without prior regulatory approval, which can take up to nine months. Considering the dynamic situation during the pandemic, we chose to pool aggregate data instead. The raw data used in this study cannot be made available in the manuscript, the supplemental files, or in a public repository due to German data protection laws (Bundesdatenschutzgesetz). The aggregated data is stored on a secure drive at ZEGV.

Code availability
The R code of the analysis can be made available upon request by the corresponding author.

Declarations
Ethics approval The ethics committee of TU Dresden approved this study (approval number: BO-EK (COVID)-482102021).

Consent for publication Not applicable.
Conflict of interest FT, JJ, JS, JW, MR, MS, and ON report institutional funding for parts of this project from the German BMBF. Unrelated to this study, JS reports grants for investigator-initiated research from the German GBA, the BMG, BMBF, EU, Federal State of Saxony, Novartis, Sanofi, ALK, and Pfizer. He also participated in advisory board meetings for Sanofi, Lilly, and ALK. MB reports payment for data analysis which is presented in this paper from DAK-Gesundheit. Unrelated to this study, MB reports grants from German GBA, Pfizer, and Sanofi Pasteur and consulting fees from Janssen-Cilag. He participated in an advisory board for GSK. SB is Head of Analytics and Data Science at AOK PLUS, Dresden, Germany. The other authors declare that they have no competing interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.