Introduction

The Organisation for Economic Co-operation and Development (OECD) defines an index as “a composite indicator that is formed when individual indicators are compiled into a single index, based on an underlying model of the multi-dimensional concept that is being measured” (OECD, 2004). A health index, which is a composite of several health indicators, can be used to measure the health of a community (Kaltenthaler et al., 2004). A population health index provides a summary measure of a certain health characteristic at the population level (Ashraf et al., 2019). Because a composite index summarises complex or multi-dimensional indicators into a single, broader concept (Saisana & Tarantola, 2002), it is easier to interpret than several separate indicators. Composite indices can be used to examine health disparities across geographical regions, population groups and time.

Many health indices have been developed, mainly in developed countries (See Ashraf et al., 2019; Kaltenthaler et al., 2004, WHO, 2018 for a review of literature on health indices). A composite health index developed for use in a developed country may not be applicable in a developing country such as India due to the complexity and diversity of factors influencing health and healthcare. Such an application may also lead to the ‘risk of exporting failure’ (Miranda & Zaman, 2010). For example, in countries like India, constraints on access to healthcare and nutrition are more pronounced, poverty and inequality are widespread, and healthcare infrastructure and services are often inadequate (Balarajan et al., 2011; Goli & Arokiasamy, 2014; Goli et al., 2013; Kumar et al., 2013; Pathak et al., 2010; Reddy, 2012). Despite substantial progress made in improving population health over the past four decades—as reflected by reductions in infant mortality rate (IMR), crude death rate (CDR), maternal mortality ratio (MMR) and improvements in life expectancy—the distribution of these gains remains uneven among states and socioeconomic groups (Dandona et al., 2020; Meh et al., 2022; Rai et al., 2012). Hence, it is important to develop a health index tailored for India, allowing a nuanced understanding of health disparities across various social, economic, and geographical regions.

In India, efforts have been made to develop health indices for the entire country and for various groups (Sehgal et al., 2023). For example, previous attempts have focussed on specific aspects, such as measuring health resources (Sekhar et al., 1991), monitoring child health status (Satyanarayana et al., 1995), explaining variation in poverty, health, nutritional status and standard of living (Antony & Visweswara, 2007), conducting block-level comparisons and analysing financial allocations (Doke, 2018) or examining inequities in health coverage (Prinja et al., 2017). However, none of the health indices developed in India provides a comprehensive overview of health. With few exceptions, such as Prinja et al. (2017), Doke (2018) and Sharma et al. (2019), most studies relied on state-level data, masking intrastate variations. Given the substantial disparities in health across India (Sivagurunathan, Umadevi et al., 2015; Bora & Saikia, 2018; Guilmoto et al., 2018; UNICEF, 2019), any index constructed at an administrative unit above the district level risks masking areas of disparity and diverse health care needs. Recognising the significance of local geography in shaping public policy (Kim, Pathak et al., 2019), there is a need to measure the health index at lower administrative units such as districts. This approach aligns with evidence indicating that health inequalities are more pronounced and persistent in smaller geographical areas than in larger ones (Krieger, Chen et al., 2002). Despite the importance of social determinants for various health measures, such as morbidity or duration of illness, only the composite index of health by Antony & Visweswara (2007) incorporated social determinants. Furthermore, there has been limited discussion of the rationale behind the choice of indicators for creating health indices, with indicators often selected arbitrarily and without proper justification (Sehgal et al., 2023).

This study aims to develop and validate the Health Index for India (India Health Index) by using data from the most recent round of the nationally representative survey, the National Family Health Survey (NFHS-5), and employing Principal Component Analysis. Additionally, we examine the spatial variation in the India Health Index across all 707 districts of the country. By using the same dataset to measure health indicators, we ensure that inferences can be drawn confidently, minimizing the impact of variations in results attributable to differences in data quality and methodologies.

Data and methods

Data

Drawing from the latest round of the NFHS-5 survey, conducted in 2019–2021, we analysed household data at the district level. Our choice of district as the analytical unit stems from the substantial disparities in health outcomes within states in India (IIPS, 2017; Singh et al., 2011). Moreover, in India, policies are put into action by the deputy commissioner, who is the Executive Head of the district, working hand in hand with the local Member of the State Legislative Assembly (Swaminathan, Kim et al., 2019), which underscores the importance of the district level in assessing the necessity and impact of health interventions. The NFHS-5 was purposefully designed to provide comprehensive data on all the indicators required for this research, spanning all 707 districts in India. It included a sizeable representative sample comprising 636,699 households, 724,115 women, and 101,839 men. Further details about the survey can be obtained from the website of the National Family Health Survey, India (IIPS, 2021).

Selection of indicators

The conceptual framework of this study is guided by the World Health Organisation’s Social Determinants of Health (SDOH) approach (Solar & Irwin, 2007; World Health Organisation, 2009). The selection of indicators within this framework was guided by several criteria: data availability at the district level from the same source to ensure reliability and to minimize potential data quality issues; relevance to population health within the Indian context (i.e., a clearly defined relationship of the indicator, either positive or negative, with the India Health Index); variability among districts for effective comparison; and inclusion of indicators suitable for longitudinal analysis, facilitating the monitoring and evaluation of changes in population health nationwide. Additionally, indicators representing various stages of human life, including childhood, adolescence, and adulthood, were incorporated (refer to Table 1).

Table 1 Domains, indicators, and corresponding life stages used for developing India Health Index at district level

The selected indicators were divided into six domains to encompass a broad spectrum of variables, including socio-cultural factors, population health status—both protective and risk factors—household environment, and health system and policy. Table 1 offers a comprehensive description of each indicator incorporated into the final index, along with their anticipated association with the overall India Health Index.

The first domain, labelled Socio-cultural, encompasses variables acting as proxies for socio-economic factors including the percentage of women with 10 or more years of schooling, the sex ratio of the total population (females per 1000 males), and the percentage of women aged 15–19 years who were already mothers or pregnant at the time of the survey. Empirical evidence supports the importance of socio-economic factors, material well-being and access to healthcare services in shaping population health outcomes (Goli & Arokiasamy, 2014; Goli et al., 2013; Karlsson et al., 2020).

The second domain, referred to as Health status, comprises indicators such as the percentage of children with diarrhoea, the percentage of children with symptoms of acute respiratory infection (ARI), and the percentage of women and men with very high blood sugar levels and mildly elevated blood pressure. Diabetes and hypertension are significant global public health concerns, contributing to high mortality and morbidity associated with non-communicable diseases (WHO, 2015). India, in particular, must address these conditions due to their increasing prevalence (Swami et al., 2015). This is crucial for preventing and managing non-communicable diseases (Geldsetzer et al., 2018). Given the high incidence of cardiovascular disease and diabetes among Indians, hypertension serves as one of the modifiable risk factors contributing to a substantial burden of chronic diseases in India (Devi et al., 2013; Geldsetzer et al., 2018; Gupta et al., 2019). Therefore, there is a need to customise the India Health Index to the specific health situation in India and include indicators linked to hypertension and diabetes.

The third domain, denoted as Health determinants, includes the risk factors such as stunting, underweight, anaemia in children, women whose Body Mass Index (BMI) is below the normal range; women who are overweight or obese (BMI ≥ 25.0 kg/m2); non-pregnant women aged 15–49 years who are anaemic; and men and women aged 15 years and above who use any kind of tobacco. The concurrent presence of underweight and overweight—termed the double burden of malnutrition (DBM)—is acknowledged as a substantial public health challenge in developing countries, such as India (Biswas et al., 2020). Both underweight and overweight conditions are linked to diverse health conditions, leading to disabilities and mortality, particularly impacting women of childbearing age. These outcomes involve an increased likelihood of premature deliveries, still births, neonatal mortality and maternal mortality (Biswas et al., 2020; Kamal et al., 2015). Similarly, the prevalence of anaemia among women persists as a significant public health concern across the majority of WHO member states, carrying both immediate and long-term health implications (Chaparro & Suchdev, 2019). Globally, more than one in three pregnant women experience iron deficiency anaemia (Lewkowitz & Tuuli, 2019; Means, 2020). Additionally, tobacco use is linked to a substantial global mortality burden (World Health Organization, 2008) and premature deaths, particularly notable in India (Jha et al., 2008). With a global declining trend in tobacco consumption, including in India, over the last two decades (Lahoti & Dixit, 2021; Suliankatchi Abdulkader et al., 2019; World Health Organization, 2021), women and youth have become a prime target for tobacco industry promotions (Bansal et al., 2005; Richmond, 1997). Child immunization (Hasan et al., 2020) and breastfed at 4 months are important indicators of child health status constituting the fourth domain, Health determinants- Protective factors.

The fifth domain, Household environment, encompasses indicators such as drinking water, toilet facilities and cooking fuel. Recent evidence from the Global Burden of Disease highlights the significant association between unsafe drinking water and inadequate sanitation and increased morbidity and mortality worldwide (Wolf et al., 2023). Research indicates that access to clean drinking water and proper sanitation facilities is linked to reduced childhood diarrhea, decreased instances of intestinal infections that cause malnutrition and stunted growth, and lower mortality rates among children under five (Chakrabarti et al., 2020; Lakshminarayanan & Jayalakshmy, 2015; Mara et al., 2010; Spears et al., 2013). Additionally, utilizing clean cooking fuel helps prevent deaths from indoor air pollution, a significant contributor to pneumonia in Low or Middle-Income Countries (WHO, 2014).

The final domain pertains to Health system and policy, including factors that represent various facets of access to maternal and child health services. These factors align with the health systems and policy assessment framework that conceptualizes the World Health Organization (WHO) health systems and policy building blocks (Singh, Huicho et al., 2016). They include aspects such as maternal health during pregnancy (e.g., mothers who consumed iron folic acids for 180 days or more when they were pregnant), childbirth (e.g., percentage of c-section deliveries), and postpartum care (e.g., percentage of mothers receiving postnatal care from a health professional within 2 days of delivery, health workers discussing family planning with female non-users of contraception). Antenatal care is pivotal in promoting maternal and child health by providing preventive and promotive health care services (Campbell & Graham, 2006). Postnatal care holds particular significance for maternal and child health, given that a significant proportion of neonatal deaths occur during this period (World Health Organization, 2013). Metrics such as c-section delivery and births attended by skilled birth attendants are vital health system indicators, reflecting the availability and management of obstetric complications (Montgomery et al., 2014). Health workers have played a critical role in increasing contraceptive use and awareness in many developing countries, including India (Kumar et al., 2020; Phillips & Hossain, 2003). It is important to recognize that the health system and policy indicators used in the construction of IHI serve as indicators of healthcare system performance. For example, the availability, accessibility, and utilization of prenatal care services within the healthcare system indicate the system's capacity to provide preventive care and address maternal health needs before childbirth. 
Similarly, the quality of obstetric care during childbirth, including the presence of skilled birth attendants, and availability of obstetric services, reflects the effectiveness of the healthcare system in managing childbirth complications and ensuring safe deliveries. Postpartum care is essential for monitoring maternal health and addressing any lingering health issues after childbirth. The provision of postpartum care services signifies the healthcare system's commitment to comprehensive maternal care beyond delivery. Therefore, they can be classified as health system indicators that can provide insights into the effectiveness of healthcare delivery, access to services, and the overall health policy environment (World Health Organization, 2010, 2016).

Methods

In addition to selecting indicators based on their widely recognized relevance and theoretical considerations, the variability of indicators among districts also played a crucial role. For instance, an indicator that was either universally present in all districts (~ 100%), or completely absent (resulting in a zero-standard deviation) would fail to differentiate between districts, thus providing minimal contribution to distinguishing the India Health Index. Moreover, in cases where indicators showed a high level of correlation (greater than 0.90), only one of them was retained for analysis. All indicators were presumed to have a hypothesized relationship with population health, and if the relationship did not align in the same direction, adjustments were made accordingly. For example, a higher prevalence of obesity was indicative of poorer health, and if this trend did not align, as seen in the case of neonatal tetanus immunization, the reciprocal of the indicator was used.

The indicators were log-transformed to address skewness in the values. Subsequently, standardized values for these indicators were calculated, with each variable centred to zero and scaled to unit variance. Principal Component Analysis (PCA) (Abdi & Williams, 2010; Jolliffe, 2002) was employed to combine the selected indicators, resulting in the creation of a comprehensive, multi-dimensional India Health Index (IHI). This facilitated the ranking of districts according to varying levels of population health. Details of the methodology can be accessed elsewhere (Sehgal et al., 2024).
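The preprocessing step can be illustrated with a minimal sketch. It assumes a natural logarithm with a small offset (the `offset` parameter is hypothetical; the text does not state how zero values were handled) and the population standard deviation. The district values are invented for illustration only.

```python
import numpy as np

def log_standardise(values, offset=1.0):
    """Log-transform each indicator column, then centre to 0 and scale to unit variance."""
    # `offset` is an assumption that keeps log() defined when an indicator equals 0.
    logged = np.log(np.asarray(values, dtype=float) + offset)
    mean = logged.mean(axis=0)
    std = logged.std(axis=0)  # population SD; zero-variance indicators must be dropped first
    return (logged - mean) / std

# Hypothetical values: 5 districts (rows) x 2 indicators (columns), in percent.
toy = np.array([[41.0, 7.0], [60.0, 3.0], [25.0, 12.0], [80.0, 1.0], [35.0, 9.0]])
z = log_standardise(toy)
```

After this step every column of `z` has mean zero and unit variance, so indicators measured on different scales contribute comparably to the analysis.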

PCA involved several steps. First, it included generation of a covariance matrix to identify correlations among variables. Then, by analysing the eigenvectors and eigenvalues of the covariance matrix, the principal components were identified. The eigenvalues indicated how much each of the principal components contributed to the dataset’s variability. By inspecting the scree plot based on these eigenvalues (the point where the slope of the curve levelled off) and by using the variance explained (at least 70% of variance explained), the decision on which principal components to retain was made. Subsequently, a score derived from the selected principal components was utilized to determine the rank of each district. This ranking, in turn, established the districts' positions on the IHI.
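The steps above can be sketched as follows. This is an illustration under stated assumptions, not the SAS procedure used in the study: the data are random, and the way the retained components are combined into one score (weighting by each component's share of variance) is an assumption, since the text only says that a score was derived from the selected components.

```python
import numpy as np

def pca_retain(z, min_cum_var=0.70):
    """Return variance shares, number of retained components, and component scores."""
    cov = np.cov(z, rowvar=False)              # covariance matrix of the indicators
    eigvals, eigvecs = np.linalg.eigh(cov)     # symmetric input; eigenvalues ascending
    order = np.argsort(eigvals)[::-1]          # reorder by decreasing eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()        # share of variance per component
    cumulative = np.cumsum(explained)
    # Smallest k whose cumulative share reaches the threshold.
    k = min(int(np.searchsorted(cumulative, min_cum_var)) + 1, len(eigvals))
    scores = z @ eigvecs[:, :k]                # district scores on retained components
    return explained, k, scores

# Hypothetical standardised data: 50 districts x 6 indicators.
rng = np.random.default_rng(0)
raw = rng.standard_normal((50, 6))
z = (raw - raw.mean(axis=0)) / raw.std(axis=0)

explained, k, scores = pca_retain(z)
# Assumption: combine retained components weighted by their variance share.
composite = scores @ explained[:k]
ranks = composite.argsort().argsort() + 1      # rank 1 = lowest composite score
```

Eigenvector signs are arbitrary, so in practice the orientation of the composite score would need to be checked against the indicators before interpreting a low rank as better health.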

Internal consistency of the India Health Index

The internal consistency of the India Health Index was tested using Cronbach's coefficient alpha. The closer the coefficient is to one, the stronger the evidence that the variables are homogeneous. A Cronbach's alpha of ≥ 0.70 is considered highly reliable (Nunnally & Bernstein, 1994).
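For reference, Cronbach's alpha for k indicators is k/(k − 1) × (1 − sum of indicator variances / variance of the row totals). A minimal sketch with invented data follows; the use of sample variances is an assumption.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a table with one row per district and one column per indicator."""
    k = len(rows[0])
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])   # variance of the row totals
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical data: three perfectly correlated indicators for four districts.
rows = [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]
alpha = cronbach_alpha(rows)
```

Because the three toy indicators move together exactly, alpha reaches its upper bound of 1; less homogeneous indicators give lower values.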

External validity of the India Health Index

In the absence of any existing district-level population health index for all the districts of India, assessing external validity posed a challenge. To address this, we tested the external validity of the India Health Index by correlating it with two key health indicators at the state level: the under-five mortality rate (U5MR) and the Subnational Human Development Index (SHDI). This involved deriving the state-level ranking of the India Health Index by averaging its values across districts within each state. Data on U5MR were sourced from NFHS-5 for this analysis. The SHDI offers a subnational version of the Human Development Index (HDI) (Smits & Permanyer, 2019; Permanyer & Smits, 2020). The SHDI comprises three underlying sub-indices (for education, health, and standard of living), covering two education indicators (mean years of schooling of adults aged 25+ and expected years of schooling of children aged 6), along with one each for health (life expectancy at birth) and standard of living (Gross National Income per capita, PPP, 2011 US$).
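The aggregation and correlation can be sketched as below. The state labels and values are invented, and the use of Pearson's coefficient is an assumption, because the text does not name the correlation measure.

```python
from collections import defaultdict
from math import sqrt

def state_means(district_rows):
    """Average a district-level value within each state."""
    groups = defaultdict(list)
    for state, value in district_rows:
        groups[state].append(value)
    return {state: sum(vals) / len(vals) for state, vals in groups.items()}

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical district-level IHI ranks and state-level U5MR values.
districts = [("A", 10), ("A", 30), ("B", 400), ("B", 600), ("C", 200)]
u5mr = {"A": 10.0, "B": 50.0, "C": 30.0}

ihi_by_state = state_means(districts)
states = sorted(ihi_by_state)
r = pearson_r([ihi_by_state[s] for s in states], [u5mr[s] for s in states])
```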

All analyses were conducted using statistical software SAS release: 9.04.01M7P08062020.

Spatial analysis of the India Health Index

To examine spatial dependence and clustering of IHI, Moran’s I and Univariate Local Indicator of Spatial Association (LISA) cluster maps were produced. While Moran’s I index quantifies the extent of autocorrelation among the index and its spatial neighbours, LISA measures the correlation of IHI values at the district level around a specific location (Anselin & Bera, 1998). Spatial clustering (or spatial autocorrelation) at the district level was calculated using a spatial weight matrix, which captured the spatial proximity between each pair of districts. The spatial weight matrix was generated using GeoDa software, utilizing the queen contiguity method of order one. This method identifies the spatial proximity between each pair of districts in the dataset (i.e., districts sharing borders or corners) (Getis & Ord, 1992). Moran’s I value serves as a global indicator of spatial autocorrelation. Moran’s I value ranges from −1 (indicating perfect dispersion) to + 1 (perfect correlation), with positive values suggesting spatial clustering of similar observations and negative values indicating clustering of different observations (i.e. more dispersed distribution). A Moran's I value of zero suggests a random spatial pattern with no spatial autocorrelation (Moran, 1950).
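A minimal sketch of the global Moran's I statistic follows. The neighbour lists stand in for a queen contiguity matrix, which in practice is built from district boundaries (for example in GeoDa); the four districts and their values are invented, and row-standardised weights are an assumption.

```python
def morans_i(values, neighbours):
    """Global Moran's I with row-standardised contiguity weights."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = 0.0   # sum over i, j of w_ij * dev_i * dev_j
    s0 = 0.0      # sum of all weights
    for i, nbrs in neighbours.items():
        w = 1.0 / len(nbrs)       # each row of weights sums to 1
        for j in nbrs:
            cross += w * dev[i] * dev[j]
            s0 += w
    return (n / s0) * cross / sum(d * d for d in dev)

# Hypothetical layout: four districts in a line, 0-1-2-3.
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
values = [1.0, 1.0, 5.0, 5.0]
moran = morans_i(values, neighbours)
```

Here similar values sit next to each other, so the statistic is positive (0.5 for this toy layout). Districts without any neighbour would need separate handling, which this sketch omits.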

Following this, we estimated the univariate Local Indicator of Spatial Association (LISA) to explore the spatial clusters/outliers present in the data (Anselin & Bera, 1998; Clark & Evans, 1954). The following patterns were generated and presented on the maps:

  • Hotspots (bright red): districts with a high district-level IHI value sharing boundaries with districts with a high IHI value (high–high).

  • Cold spots (bright blue): districts with a low district-level IHI value sharing boundaries with districts with a low IHI value (low–low).

  • Spatial outliers (light red): districts with a high district-level IHI value sharing boundaries with districts with a low IHI value (high–low).

  • Spatial outliers (light blue): districts with a low district-level IHI value sharing boundaries with districts with a high IHI value (low–high).
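The four patterns listed above can be reproduced with a short sketch of the local Moran statistic. The neighbour lists and values are invented, and the permutation test that decides which districts are significant enough to be coloured on a LISA map is deliberately left out.

```python
from math import sqrt

def lisa_quadrants(values, neighbours):
    """Label each district by the sign of its own z-score and of its neighbours' mean z-score."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / n)
    z = [(v - mean) / sd for v in values]
    result = []
    for i in range(n):
        nbrs = neighbours[i]
        lag = sum(z[j] for j in nbrs) / len(nbrs)   # row-standardised spatial lag
        local_i = z[i] * lag                        # local Moran statistic
        own = "high" if z[i] >= 0 else "low"
        around = "high" if lag >= 0 else "low"
        result.append((f"{own}-{around}", local_i))
    return result

# Hypothetical layout: four districts in a line, 0-1-2-3.
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
values = [1.0, 2.0, 8.0, 9.0]
labels = [label for label, _ in lisa_quadrants(values, neighbours)]
```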

Results

Descriptive statistics of variables used in the construction of India Health Index

Descriptive statistics of the indicators included in the India Health Index are provided in Supplementary Table 1. On average, 41% of women had 10 or more years of schooling. However, the indicator values varied greatly between districts. Sex ratio of the total population (females per 1000 males) was at 929. Early pregnancy was reported by 7%.

The overall prevalence of diarrhoeal disease and acute respiratory infection among children aged below 5 years was 7% and 3% respectively. The overall prevalence of high blood sugar and blood pressure was 6% and 12% respectively for women. Men exhibited slightly higher percentages than women, with 7% and 16% for high blood sugar and blood pressure, respectively.

Considering nutritional status, 36%, 32% and 67% of children were stunted, underweight and anaemic, respectively. Also, 19%, 24%, and 57% of women had low BMI, were obese, and were anaemic, respectively. About 9% of women and 38% of men used tobacco. About 42% of children under age 3 were breastfed within 1 h of birth, while 84% of children aged 12 to 23 months were fully immunised.

Most of the households had an improved source of drinking water (96%) and toilet facility (70%). Around 59% of the houses used clean cooking fuel. Approximately 41% households had a member with health insurance. Most child births (79%) received care from trained medical professional within 2 days of delivery and a large proportion of deliveries (89%) took place in a health care institution. Only a small percentage of pregnant women had iron folic acid for 180 days (26%). Approximately 67% women had used a method of family planning, a small percentage of women non-users of contraceptives reported meeting a health worker for family planning (24%) and 22% of births were by c-section.

Correlations among the 29 indicators included in the India Health Index are shown in Supplementary Fig. 1. A positive correlation implies that the two variables in a pair increase and decrease together. A negative correlation indicates that the two variables vary in opposite directions, i.e., an increase (or decrease) in one variable tends to accompany a decrease (or increase) in the other.

The correlations between the indicators are mainly weak or modest and in the expected direction. Stunting is highly correlated with underweight (r = 0.72). Women’s education had a moderate negative correlation with stunting (r = −0.55) and underweight (r = −0.54), and a positive correlation with toilet facility (r = 0.57), clean cooking fuel (r = 0.68), and institutional delivery (r = 0.52). At the other end of the scale, sex ratio, ARI, breastfeeding, immunisation, drinking water and health insurance were either not significantly correlated with any other indicator or only weakly correlated with other indicators such as diarrhoea, toilet facility, and cooking fuel.

As expected, obesity was positively correlated with the prevalence of high blood sugar (r = 0.66) and high blood pressure (r = 0.36) in women. Blood sugar in women was strongly correlated with blood sugar in men (r = 0.88), and blood pressure in women was strongly correlated with blood pressure in men (r = 0.83). Post-natal care was positively correlated with institutional birth (r = 0.83), the consumption of iron tablets during pregnancy (r = 0.69), family planning (r = 0.47) and c-section delivery (r = 0.58).

Principal component analysis (PCA) & computing the India Health Index

Using all 29 indicators, PCA was carried out to determine weights for each indicator and to summarize the indicators into a single score from which the rank of each district was computed. By inspecting the scree plot (Supplementary Fig. 2), we determined the point where the slope of the curve clearly levelled off (the ‘elbow’), and by using the variance explained (at least 70% of cumulative variance explained) (Supplementary Table 2), we determined the number of components to retain. On this basis, nine components, which cumulatively explained at least 70% of the variance, were extracted for analysis. The first nine principal components explained nearly 72% of the total variation in the data (Supplementary Table 2). The first principal component explained nearly 27% of the variation, the second nearly 11%, and the third nearly 8%. The fourth and fifth components accounted for about 6% and 5% of the variance, respectively; the sixth and seventh explained nearly 4% each; and the eighth and ninth explained around 3% each.

The Eigenvector matrix in Supplementary Table 3, presents Eigenvector/weights assigned to each indicator in each of the principal components. A positive value of Eigenvector/weights indicates that a variable and a principal component are positively correlated, that is, an increase in one result in an increase in the other. Negative value of the Eigenvector/weight indicates a negative association. Large (either positive or negative) weight indicate that a variable has a strong effect on that principal component. The weights of each indicator vary on each principal component.

As shown in Supplementary Table 3, the first principal component assigns large weights (> 0.20, based on the eigenvector values in Supplementary Table 3). Positive weights are assigned to women’s education, proportion of children with stunting, underweight, tobacco use in men, improved toilet facility, cooking fuel, institutional births, consumption of iron folic acid for 180 days during pregnancy and c-section, and negative weights to blood sugar and obesity in women. The second principal component assigns large positive weights to underweight, anaemic children and women, low BMI and toilet facility, and a large negative weight to postnatal care. The third principal component assigns large positive weights to prevalence of diarrhoea, acute respiratory infection, obesity, health insurance, postnatal care, consumption of iron folic acid for 180 days during pregnancy, any method of family planning (FP) and health worker FP, and a large negative weight to drinking water. The fourth principal component assigns large positive weights to early pregnancy, blood sugar in women and men, women’s use of tobacco and any method of FP, and negative weights to breastfeeding and anaemic children. The fifth principal component assigns large positive weights to blood pressure in men and women, low BMI, breastfeeding, immunization and health insurance, and a negative weight to drinking water.

The sixth principal component assigned its largest positive weights to diarrhoea, ARI, blood pressure in women and men, anaemic children and women, women who use tobacco and toilet facility, and negative weights to any family planning method. The seventh principal component assigned its largest positive weights to sex ratio, stunting, health insurance, diarrhoea and acute respiratory infection, and negative weights to diarrhoea, acute respiratory infection and immunization. In contrast, the eighth principal component assigned its largest positive weights to sex ratio, diarrhoea and acute respiratory infection, and negative weights to anaemia in women. Lastly, the ninth principal component assigned its largest positive weights to sex ratio and breastfeeding, and negative weights to stunting and health insurance.

The districts were grouped into three categories based on the score created for each district after combining all the seven scores as described in the methodology. The three groups were India Health Index rank < 235, India Health Index rank 235–470, and India Health Index rank > 470, denoting the districts that lead, are intermediate and lag, respectively. Thus, a lower value denotes better population health and a lower rank.
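The grouping rule can be written as a small function. The cut-offs are taken from the text above; the function name and category labels are hypothetical.

```python
def ihi_category(rank):
    """Classify a district by its India Health Index rank (1 = best population health)."""
    if rank < 235:
        return "lead"
    if rank <= 470:
        return "intermediate"
    return "lag"

# Example ranks spanning the three groups among 707 districts.
categories = [ihi_category(r) for r in (2, 235, 470, 707)]
```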

Internal consistency

The computed Cronbach coefficient alpha was 0.71, which suggested internal consistency and appropriateness of the PCA methodology (Nunnally & Bernstein, 1994).

External validation

In the absence of a district-level measure for validation of IHI, we used U5MR, a popular state-level indicator for measuring child health and Subnational Human Development Index (SHDI). Table 2 presents the India Health Index rank, the coefficient of variation of the IHI for each state, SHDI, and the U5MR. The IHI correlation with U5MR was 0.74. This figure indicates a positive relationship of IHI with U5MR, supporting an indication of construct validity. A scatter plot (Supplementary Fig. 3) helps visualize this moderately strong positive relationship between the IHI and U5MR. We also correlated the India Health Index with the SHDI. The IHI correlation with SHDI was 0.87, presenting a high positive correlation with a popular index of overall development.

Table 2 Mean India Health Index (arranged in ascending order state level IHI), and state level subnational human development index (SHDI), and under five mortality rate (U5MR)

India Health Index: district and state level variations

Wide variation in IHI between states and within several states (i.e., district level) can be noted in the box plots in Fig. 1 and the Coefficient of Variation (CV) in Fig. 2. In a box plot, the horizontal line drawn through the box is at the median. The whiskers start from each quartile to the minimum or maximum value of the IHI for each state, the length of the box shows the width of the range of IHI rank within the state. Of all the states and union territories (Fig. 3), Goa from the West and Kerala, Puducherry, Andaman & Nicobar, Lakshadweep, and Tamil Nadu from the South and Himachal Pradesh, Punjab, Uttarakhand, Haryana in the North depict the best population health as indicated by their lower rank and lower score. Bihar, Jharkhand, Uttar Pradesh from the North, Tripura, Assam from the Northeast and West Bengal from the East depict worse population health. Taking Table 2 and Figs. 4 and 5 together, one can conclude that on the one hand, districts in some states such as Kerala have low IHI ranks (with a minimum IHI rank of 2 to a maximum of 34), and Tamil Nadu (with minimum IHI rank 3 to maximum of 231) indicating better health but have a moderately high CV of 68 in Kerala and 87 in Tamil Nadu showing large inter-district variation. On the other hand, districts in states such as Bihar and Jharkhand have high IHI ranks, indicating poorer health. The IHI ranks range from a minimum of 522 to a maximum of 707, and a low CV of 7 in Bihar, and a minimum of 358 to a maximum of 700 with a low CV of 15 in Jharkhand, meaning all the districts are clustered together in the poor spectrum of health (Table 2 and Fig. 2).
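The coefficient of variation used above is the standard deviation expressed as a percentage of the mean. A minimal sketch follows; the use of the sample standard deviation is an assumption, and the ranks are invented.

```python
from statistics import mean, stdev

def coefficient_of_variation(ranks):
    """Coefficient of variation in percent: 100 * sample SD / mean."""
    return 100.0 * stdev(ranks) / mean(ranks)

# Hypothetical district ranks within one state.
cv = coefficient_of_variation([10, 20, 30])
```

A state whose districts all rank close together has a small CV even if every district ranks poorly, which is the pattern described for Bihar.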

Fig. 1 Box plot showing variation in India Health Index (IHI) between and within states/UTs

Fig. 2 Plot of mean India Health Index and coefficient of variation (CV) for states

Fig. 3 Map showing districts of India classified by ranking of India Health Index

Fig. 4 Scatter plot of Moran’s I for district level ranking on India Health Index

Fig. 5 Map based on LISA showing clustering of high-high and low-low districts for ranking on India Health Index

Table 3 compares the states by the number of districts classified as lead, intermediary, or lagged on the IHI. This table demonstrates within-state disparities in population health. In some states, 100% of districts fall under the lead category, with an IHI value below 235. These states include Andaman & Nicobar, Chandigarh, Goa, Kerala, Lakshadweep, NCT of Delhi, Manipur, Puducherry, Sikkim, and Tamil Nadu. In progressing states like Ladakh, 100% of districts have an IHI value between 235 and 470. At the other extreme, in states like Bihar, 100% of districts are lagged, with an IHI value above 470 (Fig. 3). Fifty-two percent of districts in the lead category come from Tamil Nadu (14%), Haryana, Karnataka, and Punjab (nearly 8% each), Kerala (6%), and NCT of Delhi and Telangana (nearly 4% each), while Uttar Pradesh (17%), Bihar (16%), Madhya Pradesh (10%) and Jharkhand (9%) make up approximately 52% of districts in the lagged category.
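The grouping described above can be sketched as a simple threshold rule on the district value, using the cut-offs of 235 and 470 given in the text. The function names are hypothetical, and assigning the boundary values 235 and 470 to the intermediate group is an assumption, since the text does not say how ties at the cut-offs are handled.

```python
from collections import Counter

CATEGORIES = ("lead", "intermediate", "lagged")


def classify_district(value, lower=235, upper=470):
    """Map a district's IHI value to one of the three categories."""
    if value < lower:
        return "lead"
    if value <= upper:  # assumption: both cut-offs fall in the middle group
        return "intermediate"
    return "lagged"


def category_shares(values):
    """Percentage of a state's districts that fall in each category."""
    counts = Counter(classify_district(v) for v in values)
    return {c: 100 * counts[c] / len(values) for c in CATEGORIES}


# Made-up values for a state whose districts all lie above the upper cut-off.
print(category_shares([522, 600, 707]))
```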

Table 3 India Health Index ranking at district level grouped into leading, intermediate, and lagging categories

The Moran’s I index value (Fig. 4) was 0.715, indicating strong spatial autocorrelation in the IHI across the districts of India. Figure 5 illustrates the spatial distribution of the IHI across the 707 districts of India surveyed in NFHS-5. We observed hotspot clustering of IHI in most districts in Uttar Pradesh, all districts in Bihar and Jharkhand, as well as a few districts in Assam, Chhattisgarh, Maharashtra, and Gujarat. Conversely, cold spot clustering was predominantly observed in the southern part of India, and some northern states such as Haryana, Punjab, Delhi, and Himachal Pradesh.
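For readers unfamiliar with the statistic, global Moran's I can be written as I = (n / S0) × Σᵢⱼ wᵢⱼ zᵢ zⱼ / Σᵢ zᵢ², where z is the deviation of each district from the mean, wᵢⱼ is the spatial weight between districts i and j, and S0 is the sum of all weights. The sketch below is a toy illustration with four hypothetical districts on a line and binary adjacency weights. The weights scheme used in the study is not stated in this section, so this choice is an assumption.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and an n x n weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all spatial weights
    cross = sum(
        weights[i][j] * dev[i] * dev[j]
        for i in range(n)
        for j in range(n)
    )
    return (n / s0) * cross / sum(d * d for d in dev)


# Hypothetical chain of four districts: each borders only its neighbours in line.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

clustered = morans_i([1, 2, 3, 4], chain)    # similar values sit next to each other
alternating = morans_i([1, 4, 1, 4], chain)  # neighbours are always dissimilar

print(f"clustered: {clustered:.3f}, alternating: {alternating:.3f}")
```

Positive values indicate that neighbouring districts tend to have similar values, and negative values indicate that they tend to differ.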

Predicting India Health Index

The correlation matrix in Supplementary Table 4 shows that the key indicators are related to the IHI both positively and negatively. A strong positive correlation with the IHI was noted for a few indicators in each domain. Women’s education, from the socio-cultural domain, had a strong correlation (r = 0.80); from the health determining (risk factor) domain, stunting (r = 0.72), underweight (r = 0.73) and low BMI (r = 0.64) had strong positive correlations; from the household environment domain, toilet facilities and clean cooking fuel had moderate correlations (r = 0.65 and r = 0.57, respectively). From the health system and policy domain, Iron 180 days and c-section had moderate correlations (r = 0.54 and r = 0.57, respectively). There is a moderately negative correlation between IHI and blood pressure in women (r = −0.54), obesity in women (r = −0.70) and blood pressure in men (r = −0.56). However, socio-cultural and health status indicators such as sex ratio, diarrhoea and ARI have a weak correlation with IHI (r = 0.06, r = 0.38 and r = 0.29, respectively). Breastfeeding and immunization (r = 0.35 and r = 0.24, respectively) also show a weak correlation with IHI.
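Since both positive and negative coefficients are reported, it may help to recall that Pearson's correlation coefficient carries a sign and lies between −1 and 1, whereas its square is never negative. The sketch below uses made-up numbers and assumes that the reported coefficients are Pearson correlations, which this section does not state explicitly.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical district values: one indicator rises with the index, one falls.
index_rank = [1, 2, 3, 4]
rising_indicator = [2, 4, 6, 8]
falling_indicator = [8, 6, 4, 2]

r_pos = pearson_r(index_rank, rising_indicator)
r_neg = pearson_r(index_rank, falling_indicator)

print(f"r = {r_pos:.2f}, r squared = {r_pos ** 2:.2f}")
print(f"r = {r_neg:.2f}, r squared = {r_neg ** 2:.2f}")
```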

Discussion

This study presents the development and validation of a comprehensive India Health Index, computed at the district level. Our focus extended to examining the spatial pattern of health inequalities using this index. Previous efforts to create composite health indices have been limited in scope, addressing specific health dimensions such as health resources, child health status, and health coverage (see references in the Introduction). None have provided a comprehensive overview of health. Moreover, most relied on state-level data, obscuring within-state disparities (Sehgal et al., 2023). Addressing these gaps, the India Health Index is comprehensive and multidimensional, covering indicators across various life stages and a wide spectrum of factors including sociocultural influences, health determinants and dimensions associated with health systems and policies. To overcome the limitations of previous approaches in giving equal weights, we employed Principal Component Analysis (PCA) to assign weights to individual indicators, thus ensuring a more nuanced approach. Furthermore, by utilizing a single, nationally representative, publicly available dataset, we minimized the influence of variations in results due to differences in data quality and methodologies. To the best of our knowledge, this kind of work has never been undertaken in India or globally. The study's key findings are outlined below.

The Cronbach's alpha coefficient of 0.71 indicates internal consistency. The IHI's external validity was established through its correlation with U5MR and SHDI, which were 0.74 and 0.87 respectively. While U5MR focuses on mortality within a specific age group, SHDI serves as a broader indicator of overall quality of life (Aimbetova et al., 2022). The strong correlation with these external indices suggests that the IHI rankings align well with other widely used indexing tools. For instance, Lakshadweep, Kerala, and Goa hold the top three ranks according to IHI, while Puducherry, Kerala, and Goa hold the top three ranks, respectively, in terms of U5MR. Kerala and Goa hold the top two ranks for SHDI, followed by Chandigarh and Delhi. The small variation in rankings between IHI and SHDI stems from SHDI's broader focus on education, health, and standard of living.
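As a reminder of what the internal-consistency statistic measures, Cronbach's alpha for k indicators is α = k / (k − 1) × (1 − Σ item variances / variance of the summed score). The following is a minimal sketch on made-up scores for three hypothetical indicators across five districts. It is not the study's computation, and the use of sample variances is an assumption.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha; `items` is a list of k columns of n scores each."""
    k = len(items)
    item_variance_sum = sum(variance(column) for column in items)
    totals = [sum(row) for row in zip(*items)]  # summed score per district
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical scores: three indicators (columns) for five districts.
indicators = [
    [1, 2, 3, 4, 5],
    [2, 3, 4, 5, 6],
    [1, 3, 3, 5, 5],
]

alpha = cronbach_alpha(indicators)
print(f"alpha = {alpha:.3f}")
```

Alpha approaches 1 when the indicators move together across districts and falls towards 0 when they vary independently.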

We found substantial disparities in population health outcomes both between states and across districts within states. Our index findings highlight that districts in Kerala and Tamil Nadu in the southern region, and Delhi in the north, generally exhibit higher levels of population health, suggesting better health outcomes for their populations. Conversely, districts in the eastern regions, namely Bihar, Uttar Pradesh, and Jharkhand, demonstrate poorer health outcomes. Interestingly, even in states with strong overall public health records like Kerala and Tamil Nadu, significant inter-district variations persist. The high coefficient of variation (CV) values in these states (68 for Kerala and 87 for Tamil Nadu) underscore the pronounced disparities in health within leading states. In contrast, states like Bihar and Jharkhand show less variation, but unfortunately, many districts in these states are categorized as "lagged," indicating substantial challenges in these areas. Due to limited prior research, making direct comparisons with other studies on this topic proves challenging.

The research also highlights a clear spatial pattern of health outcomes across the districts of India. The Moran’s I statistic for district-level health was 0.72, which confirms spatial clustering among districts in India. This clustering aids in identifying high-high and low-low districts in terms of health status. These clusters reveal a distinct pattern, with high-high clusters primarily situated in the eastern and some in the western regions of India, while low-low clusters are predominantly found in the south and a few in the north. These findings underscore the significant influence of district-level neighbourhood effects on health outcomes throughout India (Khan & Mohanty, 2018; Puri et al., 2020; Singh & Masquelier, 2018; Srivastava et al., 2020). The clustering of high-high (lagging) districts may stem from inadequacies in health infrastructure coupled with disparities in service delivery, financing, and financial risk protection in India (Balarajan et al., 2011). Additionally, the healthcare systems in lagging states are often more vulnerable compared to the other parts of the country (Bajpai, 2014; John, 2005; Somvanshi, 2018; Swamy, 2014).
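The high-high and low-low labels can be illustrated with a simplified sketch of the Moran scatter plot quadrants that underlie a LISA map. Each district is compared with the mean, and so is the average of its neighbours. The districts, ranks and neighbour lists below are hypothetical, and the significance testing that a full LISA analysis applies (typically by permutation) is deliberately omitted. Because a high rank denotes poorer health, high-high here marks clusters of poorly ranked districts.

```python
def moran_quadrants(values, neighbours):
    """Label each unit by its own deviation and its neighbours' mean deviation."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    labels = []
    for i in range(n):
        ids = neighbours[i]
        # Spatial lag: average deviation of the neighbouring units.
        lag = sum(dev[j] for j in ids) / len(ids) if ids else 0.0
        own = "high" if dev[i] > 0 else "low"
        around = "high" if lag > 0 else "low"
        labels.append(f"{own}-{around}")
    return labels


# Six hypothetical districts in a line; ranks are made up (1 = best health).
ranks = [650, 680, 700, 20, 30, 10]
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

quadrants = moran_quadrants(ranks, neighbours)
print(quadrants)
```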

The finding that all districts in Bihar belong to the "lagged" category is of concern and underscores the critical need for tailored interventions and allocation of resources in this state to improve population health outcomes. The Government of India's efforts through the National Health Mission’s (NHM) flagship programmes have strengthened the healthcare system by ensuring universal access to fair, affordable, and high-quality healthcare services (Ministry of Health & Family Welfare, 2019). However, achieving health equity in India remains a significant challenge. The launch of the Ayushman Bharat scheme in 2018 aimed to address significant disparities in healthcare access and quality across the country. It aligns with the National Health Policy 2017. With a focus on achieving Universal Health Coverage (UHC), this initiative has expanded healthcare accessibility and provided financial protection. It has also transformed healthcare infrastructure and offered relief during the COVID-19 crisis (Grewal et al., 2023). However, critical issues persist, including supply–demand gaps and the need for increased government expenditure. Additionally, there are challenges related to access and quality in rural healthcare centers (Grewal et al., 2023). These issues require urgent attention and resolution. By identifying districts with varying health outcomes, this study provides policymakers and health authorities with an opportunity to strategically allocate resources and implement existing and new strategies to address the specific challenges faced by each region.

The National Family Health Survey offers extensive data coverage at sub-national levels and across time periods, facilitating the examination of spatial disparities and temporal patterns within the India Health Index. This creates avenues for developing similar indices in regions where data from the Demographic and Health Survey (equivalent to India's National Family Health Surveys) are accessible. This is particularly significant considering the resource-intensive nature of primary data collection required for compiling composite indices. Access to publicly available, nationally representative data sources enables other countries to utilize the India Health Index as a foundation for constructing their health indices, fostering cross-country comparisons.

This study has certain limitations that indicate avenues for future research. First, the index offers a relative measure of health inequality among districts but does not provide data on absolute levels of health status. Consequently, while the index facilitates district comparison and ranking, it may not fully illustrate the distinct health challenges encountered by each district in absolute terms. Second, we acknowledge that the NFHS dataset may not fully encompass the demographic diversity of households across India, particularly in terms of representing elderly couple households or nuclear families. However, it is essential to recognize that no single dataset can perfectly encapsulate the complexity and diversity of a nation as vast and varied as India. Future research should ensure a more comprehensive representation of India's diverse demographics. Third, our analyses were unable to include certain variables such as geriatric health outcomes, mental health, disabilities, women’s role in the decision-making process and distance to the health facility, due to data unavailability in NFHS at the district level. Therefore, there is a need to broaden the database to encompass these additional variables. Fourth, the study's reliance on data from the National Family Health Survey (NFHS-5) presents potential limitations concerning self-reported data and the presence of recall or reporting bias. These limitations could impact the precision of the indicators used in the index. Fifth, some districts in India are large in population size and area, potentially containing pockets with significantly divergent health outcomes. Developing the index at a smaller geographic level, such as the municipal or block level, could offer a more nuanced understanding of local disparities and improve the effectiveness of targeted interventions.

Sixth, we acknowledge that all the indicators included in our analysis can vary significantly between rural and urban areas within a district, and this distinction is crucial for a more nuanced understanding of health disparities. We carefully considered the trade-off between using a single dataset (NFHS) to ensure consistency and minimize variations in results, versus disaggregating the data and constructing separate health indices for rural and urban areas within each district using different datasets. Given our priority to maintain data consistency and reliability, we opted for the former approach, which did not explicitly address urban–rural health disparities. Nonetheless, we recommend future research to incorporate this distinction into the analysis by disaggregating the data and constructing separate health indices for rural and urban areas within each district.

Lastly, the study's conclusions are derived from data collected during a specific period (NFHS-5), and population health outcomes may change over time owing to changes in policies, programs, and socioeconomic factors. Regular updates of the index using recent data are necessary to monitor progress and identify emerging health outcomes. Furthermore, longitudinal studies and qualitative research to supplement our findings will provide a more comprehensive understanding of the dynamics influencing health outcomes at the district level. Future iterations of the index would help build trends in ranks that reflect changes based on the health system and policy interventions.

Despite these limitations, the current study has important strengths. This paper significantly advances the field of health inequalities. It leverages publicly available data, considers various life stages and covers numerous factors affecting health. Additionally, it uses the PCA technique for combining the indicators into the creation of a comprehensive India Health Index. Rigorous research on health indices in India is scarce, and to the best of our knowledge, this study is the first one on the development and validation of a comprehensive and multidimensional India Health Index. This study also lays a foundation for the application, comparison, and further refinement of health indices in other developing countries where similar data sets are available. Our focus on analysing spatial patterns in health represents another significant contribution to the literature, potentially advancing research on geographical disparities in health. This study clearly illustrates the spatial diversity of health across Indian districts. The insights gained could inform public health planning, targeting underlying factors linked to health outcomes in India. This would help allocate health resources and interventions at the district level within the hotspot clusters, taking into account locally significant determinants for areas with poor health indicators. Continuous monitoring, refinement, validation, and targeted interventions based on the index have the potential to advance equity in population health throughout India.