Introduction

The newly emerged virus SARS-CoV-2 was initially reported in China in December 2019 [1]. On February 20, 2020, the first major COVID-19 outbreak in Europe was detected in the Lombardy region, Italy [2]. On March 11, 2020, WHO declared the SARS-CoV-2 outbreak a pandemic [3]. As of August 30, 2023, the WHO reports a total of 770,085,713 confirmed cases and 6,956,173 confirmed deaths worldwide [4]; while the Italian NHS reported 26,175,146 confirmed cases, and 190,644 deaths [5]. During the first pandemic year, the temporal course of the epidemic in Italy was characterized by 3 distinct phases: the first epidemic wave from March to June 2020, followed by a summer period with a relatively low incidence, and a second wave that started in September and peaked in November 2020 [4].

Evidence shows that males, aged over 65 and smoking patients might face a greater risk of developing more critical or lethal conditions if infected with SARS-CoV2. Comorbidities, such as hypertension, diabetes, cardiovascular disease or respiratory diseases, could also greatly affect the prognosis of COVID-19 patients [6]. A study carried out in a large community cohort has also shown associations between adverse lifestyles and a higher risk of COVID-19 [7]. However, it is now well known that lifestyle plays a mediating role in the relationship between socioeconomic position (SEP) and health [8]. Literature reports that the wide inequalities seen in infection, hospitalization and mortality rates between population groups were mostly driven by social factors overlaid on biological risks [9, 10]. Multiple mechanisms explain the increased impact of COVID-19 among people with a higher level of socio-economic disadvantage, but in summary, unfavourable social determinant of health (SDH) and associated higher rates of chronic disease [11] increased their risk of poor outcomes from COVID-19 and poorer access to health services for treatment and vaccination [12]. While existing predisposing susceptibility to COVID-19 is a product of pre-existing SDH, there is growing evidence that the ability of disadvantaged groups to adhere to public health and social measures that reduce viral transmission and to deal with the aftermath of the pandemic, is also negatively affected by unfavourable SDH [13]. Difficult living and working conditions make adherence to preventive measures more difficult for disadvantaged populations, thus increasing their exposure to the risk of infection [14]. For instance, most deprived individuals usually carry out manual labour professions or work in the informal sector and therefore, resulting in limited opportunities for working from home. Health and socioeconomic inequality mutually influence each other by triggering and feeding vicious circles [12]. The economic fallouts from the pandemic are hitting disadvantaged population groups harder. The pandemic itself has accentuated these already existing social and health inequalities, widening the gap among individuals with different SEP. This is a crucial point for the present but also for the future. Assessment and mitigation of SDH cannot be neglected in a pandemic response and prevention programme.

The available Italian deprivation index (DI) is a multidimensional measure of the disadvantage in the ownership of both social and material resources among residents in each census sections, which are comparable to neighbourhoods as described elsewhere [15, 16]. Mateo-Urdiales et al. investigated the association between deprivation and COVID-19 outcomes in Italy during pre-lockdown, lockdown and post-lockdown periods using the Italian DI as a contextual measure of deprivation. No differences in case-hospitalisation and case-fatality according to deprivation were observed [17]. Similarly, in a study on the city of Barcelona, the increase in hospitalisation and mortality rates was not significant among people living in areas characterised by higher geographical DI [18]. However, the use of geographical DI measures as a proxy for the level of individual social disadvantage is subject to potential ecological bias that can arise from attributing a collective measure to an individual.

The overall objective of this research was to compare the geographic and individual DI in assessing the associations between individuals' SEP and risk of Sars-CoV-2 infection and disease severity in the Apulia region from February to December 2020.

Methods

Study setting

The Italian’Servizio Sanitario Nazionale’ (SSN) was introduced in 1978 to ensure that healthcare is accessible to all Italian citizens without socio-economic barriers, according to a principle of horizontal equity [19]. The system is organized into three levels: national, regional, and local. The national level is responsible for establishing the general objectives and fundamental principles of the NHS. The nineteen regions and two autonomous provinces (R&AP) are then responsible for organizing and delivering health care [20]. In this scenario, through ministerial decrees, the Ministry of Health has taken the lead in the fight against the COVID-19 epidemic. Then, the R&AP were in charge of organizing and implementing the monitoring and prevention strategy at the local level based on national guidelines. This study is carried out in the Apulia region (Dimension: 19.540 km2), which is a region in southern Italy as shown in red in Additional file 1: Figure S1. As of December 31, 2019, the population of the Apulia region was 3,953,305. The region is administratively divided into six provinces, each with its own LHA. In 2019, 51.4% of the population in Apulia were females, and 3.4% were foreigners. The average age of the population was 45.4 years, with individuals aged over 65 accounting for 23.1% of the entire population. Additionally, 40.4% of the population had at least one chronic disease, with the most common conditions being hypertension (18.8%), arthrosis and arthritis (17.0%), allergic diseases (11.8%), and osteoporosis (9.4%) [21]. In Apulia region, the hospital network comprises 33 public and 26 private institutions, providing a total of 11,565 beds for ordinary inpatient care, equating to 2.93 beds per 1000 people.

Pandemic stages and preventive measures

In Italy, after the detection of the first locally acquired Covid-19 case in Lombardy on February 20, 2020, the number of cases increased greatly in the following weeks, although unevenly among regions, forcing the government to adopt unprecedented restrictive measures. In particular, during the 1st year of the pandemic, the following phases can be defined: 1. first comprehensive national lockdown from 9 March to 3 May (closure of schools and most workplaces, and the implementation of quarantines, border closings, and restriction on public gatherings) 2., 22] gradual reopening phase from 4 May to 14 June (restrictions were gradually rolled back) 3. , 23] Few restrictions from 15 June to 7 October (Covid-19 incidence remained low) 4. , 24] new restrictions from 8 October to 5 November [25], and 5. lockdowns on a regional basis from 6 November until the end of 2020 (closure of regional borders) [26]. During the first epidemic wave, the number of diagnostic tests (Polymerase Chain reaction—PCR) available was limited, while availability increased considerably during the second wave [27].

Study design and data sources

This is a retrospective cohort study conducted by analysing and merging the following electronic health records from the Apulia regional health-care information systems: (i) laboratory registry of individuals tested for Sars-CoV-2 infection; (ii) registry of COVID-19 confirmed cases; (iii) healthcare workers’ registry; (iv) 2011 census dataset. Figure 1. explains how the final dataset was obtained. In the study, a COVID-19 confirmed case was defined as an individual who resulted positive to a PCR test. The study period goes from 20 February to 31 December 2020. The cohort in this study includes all the individuals tested for SARS-CoV-2 infection during the study period in the Apulia region, in which a total of 3.95 million inhabitants were registered as of December 31, 2019 [28]. Since the census data date back to 2011, we decided to exclude persons under 35 years of age, for whom part of the census data (i.e. education) may have changed. Individuals tested from 20 February 2020 to 31 May 2020 were included in the first epidemic wave, while those tested from 16 September 2020 to 31 December 2020 were included in the second epidemic wave [4].

Fig. 1
figure 1

Data linkage to obtain the study cohort

The laboratory registry tracked: identification code; age; sex; province of residence; date of the test; PCR test result. The COVID-19 confirmed cases registry reported the following information: identification code, COVID-19 related hospitalization; admission to the intensive care unit (ICU) due to COVID-19 and death due to COVID-19. From the census variables available, the following were selected to perform this study: identification code; education; the number of family members; citizenship; family type; and employment status.

Charlson comorbidities index, extensively described previously [29], and the geographical DI [15, 16] were added to the characteristics of individuals in the final dataset. The Italian geographical DI is the sum of the z score of five simple indicators: x1: % of population among 15 and 60 years old with education equal to or less than elementary school; x2: % of the active population unemployed or seeking their first job; x3: % of housing occupied for rent; x4: % of single-parent families (and consisting of a single-family unit) with children under 18 years old; x5: population density (occupants per 100 m2).

Through the use of the healthcare workers’ registries, healthcare workers were removed from the dataset as their risk of acquisition of infection was strongly related to their professional exposure to the virus.

The merging of the data information system has been carried out by the regional health agency through a numeric identification code. All the health data used in the study were anonymous.

Definition of the individual deprivation index through the census data

For the definition of the individual DI, polychoric principal component analysis (PCA) was employed, and four census variables were used: citizenship, family type, employment status, and education. Table 1 reports the value attributed to each individual variable category.

Table 1 Census variable used to define the individual DI

Statistical analysis

Generalized linear logistic regression models (GLM) were used to test associations between COVID-19 outcomes (tested positive, being hospitalised due to COVID-19, being admitted in ICU due to COVID-19 and death due to COVID-19) and geographical or individual DI. Age, sex and Charlson comorbidity index were included in the model as covariates. Multilevel logistic models were used to test associations between COVID-19 outcomes and PCA individual DI, geographical DI, and their interaction. Census sections were used as clusters for the model, each of which was assigned a geographical DI. Age, sex and Charlson comorbidity index were included in the model as covariates. To build the model, we followed the following steps also explained in Additional file 1: Figure S2: Preliminary step: Data preparation (centring of variables); Step1: Construction of an empty model to assess the variation of log-odds from cluster to cluster. This model did not contain the covariates. Step2: Assessment of the variation of lower-level effects from cluster to cluster. In order to perform the assessment, first, a constrained intermediate model was run, secondly, an augmented intermediate model was run and finally both were compared by performing a likelihood- ratio test. Both models contained the covariates, geographical and PCA individual DI. The augmented intermediate included also the residual term associated with the PCA individual DI, thereby estimating the random slope. Step3: Construction of a final model adding the cross-level interaction. The methodology is described in detail in the work of Morselli et al. [30]. When the variation of log-odds from cluster to cluster was not significant, the results of one-level regression analyses were considered.

The statistical analysis has been carried out separately for the first epidemic wave and the second epidemic wave.

R (version 4.2.0) was used for all statistical analysis and a p-value of 0.05 was applied for testing statistical significance.

Results

In the study period 129,779 individuals aged more then 34 (49.1% female and 50.9% male) were tested for COVID-19: 7,160 (5.6%) during the first wave, 20,523 (15.9%) during the summer and 102,096 (78.4%) during the second wave (Additional file 1: Figure S3). In our study population the median age was 60. 24,466 (18.9%) individuals lived in a census area whose geographical DI was 1, 23,594 (18.2%) in a census area whose geographical DI was 2, 25,508 (19.6%) in a census area whose geographical DI was 3, 27,676 (21.3%) in a census area whose geographical DI was 4, 28,497 (20.5%) in a census area whose geographical DI was 5. Table 2 describes study population characteristics by epidemic wave.

Table 2 Description of the study population by wave of epidemics. Proportion calculated only on complete data

PCA Individual deprivation index

The census variables selected to build the individual DI were available for a subpopulation of 129,779 individuals. We have retained two principal components in our PCA: the variance explained by the principal component 1 was 34.1% and the variance explained by the principal component 2 was 25.5% (Additional file 1: Figure S4). The coefficients forming these components can be found in Additional file 1: Table S1.

Outcomes of interest by type of deprivation index

Figure 2 shows the counts of positive tests, COVID-19 hospitalisations, intensive care unit admissions and deaths among the DI groups per 100,000 tests or positive tests during the first or second wave respectively.

Fig. 2
figure 2

Counts per 100,000 of positive tests, hospitalizations, ICU admissions, and deaths per DI level. A: 1st wave (20 February 2020 to 31 May 2020). B: 2nd wave (16 September 2020 to 31 December 2020)

1st epidemic wave

According to the results of the logistic GLMs, during the first wave, the risk of testing positive for Sars-CoV-2 infection was not significantly different in people with a level of geographical deprivation higher than 1 when compared with the less deprived population group. The models that used PCA individual ID reported that in the more deprived population, the odds of testing positive were lower compared those estimated in the less deprived population, with ORs lower than one. Considering both DI indicators, the ORs of being hospitalised, admitted to the ICU or dying when positive did not exhibit significant differences between the subgroups of the population with the highest DI and those with the lowest DI. (Fig. 3, Table 3).

Fig. 3
figure 3

Adjusted ORs per increase in the geographical, PCA individual DI for the counts of positive tests, hospitalisations, ICU admissions, and deaths. *IDI: individual deprivation index

Table 3 Associations between geographical and individual DI and risk of Sars-CoV-2 infection and disease severity

2nd epidemic wave

According to the results of the logistic GLMs, during the 2nd epidemic wave the risk of testing positive for Sars-CoV-2 infection was significantly higher in people with a level of deprivation higher than 1 when compared with the less deprived population group, with generally homogeneous results disregarding the DI used in the model. For what concern the odds of being hospitalised and dying if positive, considering the geographical DI, the subgroup with deprivation level 5 has a higher OR of being hospitalised (OR: 1.23; 95% CI 1.12–1.35) and dying (OR: 1.34; 95% CI 1.05–1.58) if positive than the subgroup with deprivation level 1. Calculating GLMs with the PCA individual DI, the OR of being hospitalised and dying if positive increased with increasing individual deprivation level also in population subgroups with a DI less than 5. The models that used individual ID reported that as deprivation increased, the risk of hospitalisation and death increased, which was not evident in the models that used geographic ID. The OR of being admitted to the ICU when positive in the subgroups of the population with the highest DI was not significantly different from the subgroup of the population with the lowest DI, irrespective of the type of DI used (Fig. 3, Table 3).

Multilevel logistic modelling

1st epidemic wave

The proportion of the between-cluster variation (ICC) in the probability of being positive if tested during the first wave was 9.9 (Table 4). This indicates that 9.9% of the chances was explained by between the census section differences. The deviance of the augmented intermediated model was significantly lower than the deviance of the constrained model. Table 4 reports multilevel logistic regression results. The model showed that higher level PCA individual DI were significantly associated with lower probability of being positive if tested. Additional file 1: Figure S5, Panel A shows the interaction between geographical ad individual DI in the prospective prediction of being positive if tested. Although the association between geographical and individual DI was not significant, the coefficient (ß) was negative. The proportion of the between-cluster variation in the probability of being hospitalised, admitted to ICU and dying if positive during the first wave was respectively 5.4, 4.9 and 2.1. Therefore, the results for the one-level regression model were considered.

Table 4 Results of multilevel logistic regression or GLM models including geographical and individual DI interaction

2nd epidemic wave

The proportion of the between-cluster variation in the probability of being positive if tested during the first wave was 15.2. This indicates that 15.2% of the chances was explained by between the census section differences. The deviance of the augmented intermediated model was significantly lower than the deviance of the constrained model. The model shows that both higher level PCA individual DI or geographical DI were significantly associated with higher probability of being positive if tested (Table 4). Additional file 1: Figure S5, Panel B shows the interaction between geographical ad individual DI in the prospective prediction of being positive if tested. Although the association between geographical and individual DI was not significant, the coefficient (ß) was positive. The proportion of the between-cluster variation in the probability of being hospitalised, admitted to ICU and dying if positive during the first wave was respectively 1.3, 2.4 and 1.9. Therefore, the results for the one-level regression model were considered.

Discussion

This study reports results from a large population-wide cohort of people tested for COVID-19 in the Apulia region, Italy, during the first and second wave of the pandemic in 2020. To our knowledge, this was the first study to investigate the role of individual DI on COVID-19 outcomes in Italy. Previous studies were limited to the use of an area-based DI or the individual component of socioeconomic deprivation as a proxy of individual deprivation [17, 31,32,33]. Although for the first wave, the geographic DI and the individual DI calculated in this study show no differences in the non-significance of the association with COVID-19 outcomes, the results were different for the second wave.

According to our findings, the association between the risk of testing positive for Sars-CoV-2 and the level of socioeconomic deprivation in the Apulia region changed between the first and second waves. According the analysis done with the PCA individual DI, during the first wave, there was a significant inversely proportional trend between the DI and the risk of testing positive. As deprivation increased, the risk of being positive when tested decreased. During the second wave, individuals with a higher level of socio-economic deprivation had a higher statistically significant probability of testing positive. This may be explained by the fact that during the early stages of the epidemic outbreak, with affected geographical areas still circumscribed, most of the cases in the Apulia region were due to returning residents [34]. Indeed, the gradual implementation of control measures in Italy sparked substantial movements of people travelling from northern regions at the epicentre of the epidemic toward other regions, such as Apulia. Then, the swift extension of lockdown to the entire country [22] mitigated the impact of these COVID-19 seeding events and the epidemic in Apulia was successfully contained. For these reasons, during the first wave in Apulia, the epidemic probably spread mainly among individuals who had the financial means to travel or who were economic migrants in Lombardy or northern regions, areas of intense economic activities. During the first wave, individuals with a higher level of socio-economic deprivation did not have a statistically significant higher probability of being hospitalised or dying if positive. These results were in line with those reported by Furtunato et al. [35]. While during the second wave, individuals with a higher level of socio-economic deprivation had a higher statistically significant higher probability of being hospitalised or dying if positive.

Concerning the second wave, the two DI showed a clear association between the probability of testing positive and the level of deprivation. However, the geographic DI showed a stronger trend. The results of this study showed that most people with a higher individual DI live in the most deprived census areas. Since people with a higher individual DI were more often in jobs that were less amenable to remote working and so they benefited less from lockdown restrictions than those able to work from home; they had a higher likelihood of acquiring the infection [36]. Moreover, they generally live in higher-density environments where family members could be infected secondarily [37]. This may have meant that in more deprived areas the virus circulated more widely than in less deprived areas. As individual DI increased, so did the probability of hospitalisation and death. The same trend was not observed for geographical DI, which showed a significantly increased probability of being hospitalised or dying if positive only for the highest level of deprivation. In accordance with our results, Mateo-Urdiales and colleagues, using the same geographical DI used in this study, but at the municipality level, found a higher incidence of cases in the most deprived municipalities compared with the least deprived ones and no differences in case-hospitalisation and case-fatality according to deprivation were observed in any period under study. The same result has also been reported by other studies using geographical measures of deprivation conducted in Spain [38, 39]. While many factors could explain this finding, an alternative could be that hospitalisation and death cases were similar across areas with different levels of deprivation in a well-developed universal healthcare system, such as Italy and Spain.

The use of individual DI enabled to understand that actually the most disadvantaged people had a higher risk of hospitalisation and death, regardless of the area in which they lived. To avoid exacerbating existing social inequalities and marginalisation, it is essential to be able to monitor them. According to our results, using geographical DI as a proxy for individual DI may lead to inaccurate assessments. However, the geographic DI can have a relevance in providing insight into the distribution of infection within different neighbourhoods, accounting for their heterogeneity. This may have important implications for public health action planning.

Understanding the relationship between deprivation and COVID-19 outcomes is multifaceted and complex. This is why the secondary objective of this manuscript was to test the hypotheses about how individual and geographical DI interact to predict COVID-19 outcomes. According to the results of the multilevel logistic and GLM models, there was no association between COVID-19 outcomes and the interaction between PCA individual DI and geographical DI. Although this may seem at odds with the robust evidence in the literature demonstrating an interaction between socioeconomic status in one's neighbourhood, individual deprivation and health, this evidence is mainly based on health outcomes of non-communicable chronic diseases [40,41,42,43].

The COVID-19 pandemic has been described as a syndemic pandemic [44]. Originating in anthropology, a syndemic describes a set of closely intertwined and mutual enhancing health problems that significantly affect the overall health status of a population within the context of a perpetuating configuration of noxious social conditions [45]. Deprivation—which is an area measure of poverty, low income, and a reflection of the wider social determinants of health (such as housing, working conditions, unemployment, health-care access, etc.)— results in multiple, interacting, and additive adverse risk factors for COVID-19 outcomes. These can be summarised by way of four inter-related pathways: unequal exposure, unequal transmission, unequal vulnerability, and unequal susceptibility [46]. Living in a more deprived neighbourhood may increase the likelihood of exposure and transmission. Although it has been proven that individuals living in more deprived neighbourhoods have worse profiles on many components of subjective health, more risk factors and higher morbidity and mortality rates than their counterparts living in less deprived neighbourhoods, the level of neighbourhood deprivation does not influence with a higher extent the health of those with low individual DI compared to those whit a higher individual DI [43]. This could be the reason why not all individuals living in the most deprived neighbourhoods have a higher vulnerability and susceptibility to COVID-19 and would explain why the analyses in this study give different results when using geographical DI than when using individual DI.

This study had some limitations. Firstly, policies for the execution of the diagnostic test changed during the period under study. During the first wave, the diagnostic capacity was limited, and the number of positives could be under-reported, whereas for the second wave data could represent laboratory-confirmed COVID-19 cases who sought care. Secondly, the death status only reflects deaths occurring in individuals diagnosed with COVID-19, but this number may be underestimated due to non-diagnosis. Third, the Charlson comorbidities index was utilized as a comorbidity index, though it may not be the most suitable or up-to-date choice for studies related to SARS-CoV-2. Unfortunately, we could not independently calculate a comorbidity index due to a lack of available data. Finally, the DI is a composite measure of deprivation, and it is difficult to know which of its constituent factors are driving associations. There are other potential variables, such as income and homeownership, that could enhance the deprivation index, but we were unable to assess them due to information unavailability.

Conclusion

Historically, pandemics have been experienced unequally with higher rates of infection and mortality among the most disadvantaged communities [47]. Evidence from a variety of countries suggests that these inequalities are being mirrored in the COVID-19 pandemic [36, 48] and this study also adds to this evidence. The causal forces at work at the interface of disease epidemics and social inequality are complex: they are operant at the level of individuals, neighbourhoods, and local communities. The results of this study remind us to be cautious about using geographical DI as a proxy for the level of individual social disadvantage because of the inevitable potential ecological bias that can result from attributing a collective measure to an individual and may lead to inaccurate assessments.

The geographical DI is often used due to a lack of individual data. However, on the determinants of health and health inequalities, monitoring has to have a central focus. Health inequalities monitoring provides evidence on who is being left behind and informs equity-oriented policies, programmes and practices [49]. The emergence of novel data sources and monitoring approaches for public health surveillance is triggered by technological advances in data retrieval and data analysis. Unfortunately, the information richness resulting from the data science revolution is not exploited to better understand health inequities and inform the development and implementation of services and policies that tackle inequities in health [50, 51].

Reducing these inequalities—and those that may result from future pandemics—requires long-term action to improve equity in health and wealth. Future research and data collection should focus on improving surveillance systems, for example by integrating individual measures of inequalities into national health information systems. Understanding the social nature of emerging infectious disease pandemics will ultimately help reduce the burden of disease.