Background

One of the most interesting aspects of the COVID-19 pandemic is that a variable percentage of patients (from 2 to 69%) may test positive again after hospital discharge, or even several weeks after clinical recovery [1]. There are multiple reasons why a positive result for SARS-CoV-2, usually ascertained by reverse transcriptase-polymerase chain reaction (RT-PCR), may be detected again, including reinfection, disease reactivation, prolonged viral shedding or false positive results [1,2,3].

The Centers for Disease Control and Prevention (CDC) suggest that an epidemiological and clinical reinfection case may be suspected in a person with at least one detection of SARS-CoV-2 RNA more than 90 days after the first detection (with or without symptoms), or in persons with COVID-19-like symptoms and detection of SARS-CoV-2 RNA between 45 and 89 days from the first SARS-CoV-2 infection, with evidence of close contact with a confirmed case and without evidence of another cause of infection [4]. For confirmation of reinfection, viral genotype assays of the first and second specimens are required [5]. However, the lack of protective immunity due to scarce development or rapid decay of antibodies could lead to a new infection with the same species and strain (recurrence) [6]. The Collaborative COvid RECurrences (COCOREC) study proposed that a COVID-19 recurrence may be considered if the second episode occurs within 21 days following a symptom-free period and alternative etiologies are excluded [7]. In addition, the possibility of reactivation of a latent infection, or relapse, has been considered as a potential consequence of the presence of non-replicative viral traces up to a maximum of 6 weeks after the onset of symptoms [8, 9]. Similarly, the presence of viral RNA in nasopharyngeal swabs for a prolonged period after infection has been observed in numerous patients (persistent positivity) [10].

Since the beginning of the pandemic, several authors have reported the possibility of reinfection by SARS-CoV-2 or reactivation of a latent infection [8]. These studies highlighted possible reinfection and reactivation of SARS-CoV-2, calling for urgent attention from researchers, as well as public health policymakers.

Big Data analysis has been widely used for pandemic monitoring, from predicting new trends in the spread of infection to periodically updating the epidemiological situation in order to assist institutional governance in decision-making on the allocation of health resources [11]. The term Big Data indicates not only a large amount of data but also a more complex concept, described by the 5 Vs: (i) velocity (i.e. the speed of data acquisition, processing and manipulation); (ii) volume (i.e. the large amount of information available); (iii) variety (data from different sources and in different formats); (iv) veracity (quality of data, free of errors); and (v) value (possibly bringing benefits and producing knowledge) [12]. The availability of large amounts of data, in association with adequate informatics tools, enables the application of Big Data analyses to laboratory data, thereby producing useful information for the study, control and monitoring of SARS-CoV-2 infection rates [11]. Therefore, this approach of managing and analysing a large database was chosen for the current study.

Thousands of naso- and oro- pharyngeal swabs have been performed at the Department of Laboratory Medicine of the Azienda USL of Modena since the early stages of the pandemic, and a large amount of data has been collected. We applied big data analysis to estimate the real incidence of reinfection by SARS-CoV-2 in the population of the Province of Modena, to understand the weight of protective immunity in this pandemic. Also, we performed a review of current systematic reviews reporting reinfection with SARS-CoV-2, to summarize the current knowledge/findings about reinfection.

Methods

Analysis of laboratory results

The laboratory of the Department of Laboratory Medicine of the Azienda USL of Modena has a database which collects about 15 million results per year. These are tests carried out in all the laboratories in the same province, serving a population of about 700,000 residents. The database is continuously updated every day, enabling users to know at any moment how many exams are booked, how many are in progress and how many have been reported. Users can also check this same information for the previous days and months.

For this study, we retrospectively analyzed the results of the oro- or naso-pharyngeal swabs performed for the determination of SARS-CoV-2 RNA in samples collected over a period of 18 months (1st January 2021–30th June 2021), in subjects for whom molecular testing was requested for the diagnosis of SARS-CoV-2 infection. We identified and selected subjects with a positive result, a subsequent negative result, and then another positive result. Reinfection was defined as a second positive result > 90 days from the initial positive result [4]. To identify possible reinfections among enrolled subjects, the data analysis was extended up to 22nd February 2022. Data regarding the vaccination status of positive subjects were also acquired. Subjects were grouped according to age categories: 0–14, 15–29, 30–49, 50–69 and \(\ge\) 70 years old.
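The selection rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the query actually run on the laboratory database: the function name `find_reinfection`, the `"POS"`/`"NEG"` result labels and the tuple layout of the test records are assumptions introduced here.

```python
from datetime import date, timedelta

def find_reinfection(tests, min_gap_days=90):
    """Return (first_positive_date, second_positive_date) or None.

    `tests` is a list of (date, result) tuples for one subject, where
    result is "POS" or "NEG" (hypothetical labels). A reinfection needs
    a positive, a later negative, and then a positive more than
    `min_gap_days` days after the first positive.
    """
    first_positive = None
    seen_negative = False
    for day, result in sorted(tests, key=lambda t: t[0]):
        if first_positive is None:
            if result == "POS":
                first_positive = day
        elif result == "NEG":
            seen_negative = True
        elif seen_negative and (day - first_positive).days > min_gap_days:
            return first_positive, day
    return None

# Hypothetical subject: positive, negative, then positive 142 days later.
start = date(2021, 1, 10)
history = [
    (start, "POS"),
    (start + timedelta(days=20), "NEG"),
    (start + timedelta(days=142), "POS"),
]
print(find_reinfection(history))
```

A second positive at exactly 90 days does not qualify under this sketch, because the definition uses a strict "> 90 days" threshold.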

Total RNA was extracted from the clinical samples using a commercial RNA-extraction kit and was reverse transcribed. The cDNA was then amplified by real-time qualitative PCR, using a commercial kit (Alinity m SARS-CoV-2 Assay—Abbott Molecular). The procedures and interpretation of the results were carried out in compliance with the manufacturers' instructions.

For this retrospective and observational study, the sample size was not calculated a priori, but all data collected in the database up to 22nd February 2022 were considered. Due to the lack of informed consent and the impossibility of its subsequent acquisition, data were pseudonymized. One researcher extracted data from the database using specific search queries without including any personal identification codes. A second researcher analyzed the extracted data. No researcher using the data was able to trace the identity of the subjects analyzed.

We calculated the frequency of reinfection of SARS-CoV-2 over the study period. We also stratified data into subgroups based on subjects’ vaccination status. The reinfection rate was determined by dividing the number of reinfected subjects by the total number of initially positive subjects.
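As a minimal sketch of this calculation, the rate is a simple proportion, which can also be computed per subgroup. The function names and the subgroup labels and counts in `example_counts` are hypothetical and serve only to illustrate the arithmetic.

```python
def reinfection_rate(reinfected, initially_positive):
    """Proportion of initially positive subjects who were reinfected."""
    if initially_positive <= 0:
        raise ValueError("initially_positive must be greater than zero")
    return reinfected / initially_positive

def rates_by_group(counts):
    """Map each group to its rate.

    `counts` is a dict of group -> (reinfected, initially_positive).
    """
    return {group: reinfection_rate(r, n) for group, (r, n) in counts.items()}

# Hypothetical subgroup counts, for illustration only.
example_counts = {"group_a": (10, 200), "group_b": (3, 300)}
for group, rate in rates_by_group(example_counts).items():
    print(f"{group}: {100 * rate:.1f}%")
```

Expressing the result as a percentage only requires multiplying the returned proportion by 100.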

Review of systematic reviews

We conducted an overview of systematic reviews (SRs) according to the Cochrane methodological guidance [13], reporting the findings according to applicable items from the PRISMA statement [14].

This overview included any type of SR that reported the number of reinfections, defined as a positive RT-PCR test carried out > 90 days after the initial test in subjects who had been considered cured, or a positive RT-PCR test carried out > 45 days after the initial positive test, accompanied by compatible symptoms or epidemiological exposure [4]. Further inclusion criteria specified that SRs: (i) provided specific eligibility criteria that allowed the clinical question to be defined; (ii) described a search strategy; (iii) reported sufficient details for included studies; (iv) reported the definition of reinfection; and (v) reported the pooled estimate or the exact number of reinfections. Eligibility was not restricted by language, patient age or study setting. We excluded any SRs that did not report sufficient information about reinfections, were narrative reviews, evaluated sample specimens or included animal models.

To identify all SRs of interest, we searched the clinical queries of MedLine, including appropriate filters for SRs conducted on COVID-19. The search strategy included the following keywords: “reinfection”, “COVID-19”, “SARS-CoV-2”. Further, the reference lists of potentially eligible SRs were also screened. We limited the search to studies published between 2020 and 2022. The literature search was conducted by one investigator in February 2022. One author screened the titles and abstracts of SRs retrieved from the database searches and selected the studies for inclusion according to eligibility criteria. A second author checked the selection. Disagreements were resolved by consensus.

From each included SR, one author extracted the necessary data, and a second author validated the data collected. The following information was recorded: (1) characteristics of the SR (authors, year, country); (2) definition of reinfection; (3) methodological design of included studies (e.g., case report, cohort); (4) characteristics of participants (e.g., sample size, age, gender, symptoms); (5) investigated outcomes (number of reinfections, time interval between first and second infection). Disagreements between reviewers were resolved by consensus.

One author assessed the methodological quality of the included SRs, using the AMSTAR 2 tool [15]. A second author checked the evaluation. AMSTAR 2 does not generate an overall score, and we report an overall rating of confidence in the results of the SRs as follows: high (zero non-critical weaknesses), moderate (≥ 1 non-critical weakness), low (1 critical flaw with or without non-critical weaknesses), very low (> 1 critical flaw with or without non-critical weaknesses). We synthesised data into tables, including SR characteristics and summarised narrative findings, according to quality and outcomes of interest for the current overview.
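The rating rule stated above can be written as a small decision function. This sketch encodes the thresholds exactly as given in this paragraph, which may differ from those in the official AMSTAR 2 guidance; the function name and its two integer arguments are assumptions made for illustration.

```python
def overall_confidence(critical_flaws, non_critical_weaknesses):
    """Overall confidence rating, following the rule stated in the text.

    high:      no critical flaw and zero non-critical weaknesses
    moderate:  no critical flaw and at least one non-critical weakness
    low:       exactly one critical flaw
    very low:  more than one critical flaw
    """
    if critical_flaws < 0 or non_critical_weaknesses < 0:
        raise ValueError("counts must be non-negative")
    if critical_flaws > 1:
        return "very low"
    if critical_flaws == 1:
        return "low"
    if non_critical_weaknesses >= 1:
        return "moderate"
    return "high"

# Hypothetical review with two non-critical weaknesses and no critical flaw.
print(overall_confidence(critical_flaws=0, non_critical_weaknesses=2))
```

Critical flaws are checked first because, under the stated rule, they determine the rating regardless of the number of non-critical weaknesses.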

Results

Analysis of laboratory data

During the 18-month study period, 178,948 subjects (25.5% of all assisted subjects in the Province of Modena) underwent a molecular test for SARS-CoV-2 RNA. Of these, 20% (n = 35,743/178,948) had received at least one dose of a SARS-CoV-2 vaccine by 30th June, especially subjects over the age of 70. During the study period, vaccines were not available for children < 14 years old (Table 1).

Table 1 Subjects tested, infected and reinfected, according to age group

Overall, 20% (n = 35,692/178,948) of tested subjects had a positive PCR result for SARS-CoV-2, with the highest proportion in the 50–69 age group (23%, n = 9,488/40,813). Among all positive subjects, 5% (n = 1,794/35,692) were vaccinated. Likewise, among vaccinated subjects, 5% (n = 1,794/35,743) had a positive molecular test result (Table 1).

Of the subjects with an initial infection, 3.5% (n = 1,258/35,692) were reinfected, with the highest incidence among children (Table 1, Fig. 1). Among re-positive subjects, 66.2% (n = 833/1,258) were vaccinated.

Fig. 1 Incidence of reinfection by age

At the closure of the extended data analysis for reinfections (22nd February 2022), 79% of tested subjects (141,936/178,948) were vaccinated. Reinfection rates for vaccinated and non-vaccinated subjects were 0.59% (833/141,936) and 1.15% (425/37,012), respectively (Fig. 2).

Fig. 2 Incidence of reinfection by vaccination status
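The rates by vaccination status can be reproduced from the counts reported above. The short check below is a sketch using only those counts; the variable names are ours. Note that the denominators here are all tested subjects in each vaccination group, not only the initially positive ones.

```python
# Counts reported in the text for the extended analysis.
tested_total = 178_948
vaccinated_tested = 141_936
reinfected = {"vaccinated": 833, "non_vaccinated": 425}

# The non-vaccinated denominator is obtained by subtraction.
denominators = {
    "vaccinated": vaccinated_tested,
    "non_vaccinated": tested_total - vaccinated_tested,
}

# Reinfection rate per group, as a percentage rounded to two decimals.
rates_pct = {
    group: round(100 * reinfected[group] / denominators[group], 2)
    for group in reinfected
}

for group, pct in rates_pct.items():
    print(f"{group}: {reinfected[group]}/{denominators[group]} = {pct}%")
```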

The mean time between the initial and second positive molecular results was 313 days in subjects aged 0–14 years, 309, 310 and 302 days in those aged 15–29, 30–49 and 50–69 years, respectively, and 248 days in subjects aged ≥ 70 years.

Synthesis of systematic reviews

The literature search identified 22 references. Of these, 3 were excluded because they did not meet the inclusion criteria. The remaining 19 references were considered eligible for inclusion and details were obtained from full texts. A further 10 texts were excluded, leaving a total of 9 SRs [16,17,18,19,20,21,22,23,24] selected for this overview (Fig. 3). All included publications reported the database and data search, the description of inclusion criteria and the confirmation of reinfection by RT-PCR. Three SRs defined reinfection following CDC criteria, three reported a general definition, one used the RKI definition, and two did not report any definition (Table 2).

Fig. 3 PRISMA flow diagram of the selection process for systematic reviews

Table 2 Characteristics of systematic reviews included in the overview

The methodological quality was judged moderate for five SRs, low for three SRs and very low for one SR.

The selected SRs reported few cases of reinfection in studies published up to August 2021. The reporting of this information was heterogeneous across the included SRs. The number of cases ranged from 35 [17] to 260 [20]. Abrokwa et al. [16] reported that the rate of re-positive results ranged from 0 to 50%, whereas Mao et al. [21] showed a pooled reinfection rate of 0.65%. Sotoodeh Ghorbani et al. [23] estimated reinfection to be 3 per 1000 patients (95% CI 0.8–5), and Chivese et al. [18] reported that patients previously infected by SARS-CoV-2 had an 81% reduction in the odds of reinfection (Table 2). Two SRs reported that protection against reinfection was 87% [21] and 90.4% [19], respectively. The authors agreed that the protective effect of prior SARS-CoV-2 infection against reinfection is high and similar to the protective effect of vaccination.

Four SRs [17, 20, 22, 24] reported information about the severity of symptoms in reinfection compared with initial infection; two reported similar severity [17, 24], whereas Lo Muzio et al. [20] reported less severe symptoms with reinfection. Massachi et al. [22] reported the same percentage of patients experiencing either greater or milder symptoms with reinfection.

Discussion

The available literature on SARS-CoV-2 reinfection risk suggests that reinfection, although rare, is possible, but available estimates vary considerably and are mostly based on case reports/series and some cohort studies. We retrospectively analyzed a large repository of complete data for a single province, with the assistance of big data analysis, identifying more than 35 thousand subjects with an initial positive result for SARS-CoV-2, registered over an 18-month period, and with an extended analysis of at least 6 months of follow-up. We report 1,258 cases of SARS-CoV-2 reinfection (3.5%), compared with a total of 1,705 cases of reinfection reported among the SRs included in our overview. Despite the inherent limitations of big data analysis, our observation suggests that its application to a complete database of reported infections for a single population can more accurately estimate true rates of SARS-CoV-2 reinfection.

Multiple questions regarding reinfection with SARS-CoV-2 remain open. What is the pathophysiological mechanism of reinfection, which subjects have a higher risk of reinfection, and what is the clinical burden for reinfected patients? Reinfection with SARS-CoV-2 can be mainly attributed to two phenomena: decay of the immune response and viral mutations that favor the appearance of new variants. The decay of immunity or the failure of naturally acquired immunity may result in reinfection with the same virus strain [25], whilst viral mutations may make subjects vulnerable to reinfection [26, 27]. New virus variants could evade immune responses acquired by subjects infected with previous variants or reduce the capacity for neutralization by polyclonal antibodies [26, 28].

To understand the causes of reinfection, it is necessary to know the nature, duration and kinetics of the anti-SARS-CoV-2 antibody response, as well as the variables associated with the virus itself. Based on the reinfection cases reported so far, the possible causes of reinfection include the decay of antibody titres, exposure to a higher dose of the virus, immunological comorbidities in some patients and the spread of more virulent viruses carrying new genome mutations that allow them to evade the host’s immune response [29]. Viral variants arise from nucleotide changes in the viral genome during replication, which can confer advantages in viral replication, transmission and immune evasion. Most of the mutations occur in the Spike (S) protein, especially in the Receptor-Binding Domain (RBD), and could affect the entry of the virus into host cells and the efficacy of vaccines and neutralizing antibodies. Since the beginning of the pandemic, several viral variants have succeeded one another. The CDC distinguishes between variants of concern (VOC) and variants of interest (VOI) based on their ability to cause severe disease, their infectivity, or a reduced antibody response. In particular, a VOC is associated with higher transmissibility, more severe disease, and escape from natural and vaccine-induced immunity. The Alpha (B.1.1.7) variant has 23 mutations and is associated with increased transmissibility, hospitalizations, mortality rates and burden on health care systems. The Beta (B.1.351) variant has mutations in the S, N, E and ORF proteins. In particular, the N501Y mutation in the RBD confers an increased binding affinity for the ACE2 receptor, and the E484K mutation is associated with reduced vaccine efficacy and increased immune escape. Similarly, the Gamma (P.1) variant contains further mutations in the S protein, including E484K, which is associated with immune evasion and a higher risk of reinfection. The Delta (B.1.617.2) variant is more virulent than other variants and is associated with severe disease, hospitalization and resistance to preventive measures. Lastly, the Omicron (B.1.1.529) variant has 32 mutations in the S protein and is associated with higher virulence and an increased risk of reinfection. The VOCs share common mutations in the S protein. The first, N501Y, is found in the ACE2 binding site of the RBD and is common to the Alpha, Beta, Gamma and Omicron variants. The second and third, E484K/Q/A and K417T/N, are present in the Beta, Gamma and Omicron variants. The fourth, L452R, is unique to the Delta variant [30].

The efficacy of current vaccines is due to their ability to stimulate the production of neutralizing antibodies against the S protein of the SARS-CoV-2 wild-type strain. The advent of new viral variants that evade the immunity acquired from previous infection could reduce the efficacy of vaccines and cause reinfection, with heterogeneous clinical severity and great difficulties for healthcare systems.

This issue suggests the need to increase current knowledge about the degree of protection provided against SARS-CoV-2, in order to guide the development of vaccines and the creation and implementation of appropriate interventional strategies.

Current evidence confirms that patients infected by SARS-CoV-2 produce antibodies against the Spike and N proteins within 30 days of infection [31], but the mechanisms mediating immunity are not fully understood. Infection by SARS-CoV-2 activates T and B cells, leading to the production of neutralizing proteins that inhibit viral infectivity through various mechanisms of action, including blocking the binding of the Spike protein to the ACE2 receptor [32]. IgM appears quickly but has a very short half-life. Specific IgG develops a few days after IgM and can be detected in serum about 7–14 days from symptom onset [33]. A recent systematic review [34] reported differences in the presence of antibodies during the first infection (56%) and reinfection (63%), suggesting that waning antibodies could place individuals at risk of reinfection. The presence of antibodies could play a protective role, but it does not specifically prevent reinfection [35]. Furthermore, it has also been suggested that a previous COVID-19 infection may not confer total immunity, paving the way for a potential second infection by a different variant, with the second infection being potentially more severe than the first [24].

Currently, the rates of reinfection reported in SRs are discordant (ranging from 0 to 50%), which could partially be explained by the heterogeneous definitions of reinfection adopted. There is still no universal agreement on the time interval between positive results for SARS-CoV-2 that defines a reinfection, although the definition provided by the CDC is the most widely accepted [4]. Further, most SRs mainly include case series or case reports, with limited examples of reinfection. Our big data analysis was conducted in a unique environment of complete data stored in a single warehouse including all SARS-CoV-2 PCR testing in a single province, analyzed according to the most commonly accepted definition of reinfection in the literature. With the collection of a large number of reinfected cases, possible causes, important information for the discrimination of reinfection from recurrence, and the definition of subjects with a higher risk of reinfection can be evaluated.

It has been pointed out that the severity of reinfection depends on the individual immune response, as well as both the viral load and the SARS-CoV-2 variants causing the reinfection [36]. A reinfection can then be of the same or greater intensity, and it is probable that it is mostly due to a new species of coronavirus [37]. Garduno-Orte et al. [38] described 4 cases of reinfected healthcare workers, showing that in two cases the reinfection resulted in more severe disease. Likewise, Massachi et al. [22] reported that 41% of reinfected patients experienced a greater symptom burden than during the initial infection. Conversely, Wang et al. [24] reported that 69% had similar severity, 19% had worse symptoms, and 12% had milder symptoms with a second episode. Our study does not include information regarding disease severity, which precludes any examination of the clinical and social implications of SARS-CoV-2 reinfection.

Our study does, however, include subjects’ vaccination status, which allows important considerations about the risk of reinfection in relation to natural immunity and vaccines. In this study, the rate of reinfection among vaccinated subjects was lower than that observed among non-vaccinated subjects. If antibody decay is associated with susceptibility to reinfection, we may observe further reinfections over the next months. Likewise, if the vaccine-induced immune response decays in the same way as the natural immune response, the need for booster immunization will have to be re-evaluated to maintain ongoing protection.

Our work is an example of the application of big data analysis in a laboratory setting, which enables realistic estimates of the incidence of reinfection, the identification of factors affecting reinfection, such as virus strains or patient immune characteristics, and an assessment of the role of the vaccine in this pandemic. The Big Data concept refers to a complex analysis of a huge set of data, which requires the use of dedicated analytical and statistical approaches. The method uses advanced computational methods to extract information from datasets and build new association models. There are multiple sources of data (administrative databases, electronic health records, epidemiological studies), so it is important to develop an appropriate integration and analysis system to translate the information from analysed data into appropriate clinical decisions.

Growing data availability and greater analytical capacity can improve results not only in the economic and financial fields, but also in public health, by supporting diagnostic pathways, developing prognostic predictive models of disease and personalizing therapeutic regimens, and can also find application in prevention initiatives [39, 40]. The application of Big Data analysis in healthcare has numerous advantages, as it enables: (i) the integration of different datasets and the construction of different algorithms and more complex learning models to find new genetic, biological and clinical associations [41]; (ii) direct analysis of data from an entire population, which overcomes the limitations of statistical approaches that infer population characteristics from a representative sample (although randomized controlled trials remain the "gold standard" for studying treatment efficacy); and (iii) the observation of the effects of long-term treatments. However, this approach has limitations due to the high variability of data and data collection methodologies. These limits can be addressed by the use of adequate computation systems, thereby helping to reduce bias and make data more functional.

The interpretation of our results should take into account some limitations, namely the lack of information about symptoms, the immunological status of the subjects analyzed and the viral strain causing infection and reinfection. The definition of reinfection used in this analysis, a positive result more than 90 days after an initial infection, cannot exclude a possible reactivation of a latent infection. Further, the true rate of infection is assumed to be underestimated, as many asymptomatic subjects are not tested for viral RNA and, among those tested, genomic sequencing is not always performed, which makes it very difficult to identify the precise variants causing infection and reinfection. According to official data from the Istituto Superiore di Sanità, the most common variant in Italy between January and June 2021 was the Alpha variant (88.1%), followed by the Gamma variant (7.3%), whereas between January and March 2022 it was the Omicron variant (98.3%) (https://www.epicentro.iss.it/coronavirus/sars-cov-2-monitoraggio-varianti-rapporti-periodici); the re-positive results observed in our study are therefore probably reinfections. A further factor to consider in the evaluation of reinfection is a potential false negative molecular result at discharge, with a subsequent positive result being due to persistent infection [42].

Our big data analysis of a complete population confirms an overall risk of reinfection by SARS-CoV-2 of 3.5%, with unvaccinated and younger subjects being more susceptible to reinfection. More data will become available over time, and big data analysis will enable their timely integration into targeted strategies to control and prevent reinfection, increasing value in the patient's care pathway and supporting healthcare systems. In the meantime, social distancing, the use of masks and hand hygiene remain the main preventative measures against primary infection and reinfection with SARS-CoV-2. A standardized approach to identifying and reporting reinfection cases is necessary.