FormalPara Key Points for Decision Makers

There is an urgent need for healthcare cost data in India to inform priority setting, insurance reimbursement rates and budgeting.

A statistical cost function is used to estimate costs for settings where there currently are no data and provides a set of state-level unit costs.

The analysis shows the variability in healthcare costs across different settings and demonstrates the usefulness of such an approach in the absence of national cost datasets.

1 Background

The costs of healthcare service delivery underpin many important policy decisions—from questions of affordability, to making choices between different technologies and innovations, to setting prices of health services. Many countries, and in particular, low- and middle-income countries, suffer from a lack of information in this area, creating an information vacuum that leads to opaque policy decisions and cost escalation in health services [1, 2]. As the modus operandi of public health systems involves the purchasing of services by the state using public money, and as international pressure to move towards systems of Universal Health Care (UHC) coverage grows, governments without cost information are increasingly vulnerable in their ability to engage in strategic purchasing of health services. Cost information allows governments to be better-equipped as price setters and allows for informed decisions around the allocation of resources between different healthcare services and technologies to ensure value for money.

The challenges arising from limited healthcare cost information are magnified in the highly complex and fragmented Indian health system, where public services are purchased and delivered by a mix of both private and state providers. India operates under a model of fiscal federalism, where health is primarily the responsibility of state governments. There are a multitude of state insurance schemes in operation across the country that have been introduced to tackle the inequitable access to healthcare [3]. At the central level, the first national government tax-funded scheme for the poor and vulnerable was initiated in 2008. In 2018, the national government then launched the Ayushman Bharat-Prime Minister’s Jan Arogya Yojana (AB-PMJAY), a tax-funded national insurance programme that aims to subsume earlier national and state schemes and to provide healthcare cover of 500,000 Indian Rupees (INR) for over 500 million poor beneficiaries [4]. Despite these different schemes, to date, state governments have not used systematic, evidence-based approaches to setting prices [5]. Until recently, the process of setting prices under AB-PMJAY and other insurance schemes has been somewhat haphazard, relying on surveys of insurance claims data and interviews with experts. The government explained the severity of the problem in 2018: “we don’t have costing studies in India” [6]. While the latter statement exaggerates the issue, there are limited studies, the majority of which focussed on a single disease, technology or site. Data on private expenditures are regularly collected by the National Sample Survey Office, but this is limited to patient expenditures on healthcare and cannot be broken down fully by disease or condition [7, 8]. Insurance claims data from 22 different government-funded schemes have also been collated into a single database; however, these estimates reflect prices agreed by tender and do not represent the cost of production [7, 9]. More recently, the availability of production cost data for the public sector has begun to grow with individual costing studies that have been carried out to explore the cost-effectiveness of different technologies as well as a series of primary costing studies [10,11,12,13,14,15,16,17,18,19]. However, the paucity of cost information remains a significant fact within the Indian health system [7, 20].

Compounding the general lack of cost information, a further challenge in the practical application of cost data for price setting and health technology assessment in India is the vast heterogeneity in costs of service delivery across different types of providers, levels of the system, states and geographical settings. Previously published cost studies have highlighted the variation in healthcare costs within the Indian health system [10,11,12,13, 20, 21]. This variation in unit costs arises as a result of supply factors due to different methods of production, ownership (public or private), differences in prices and wage rates as well as demand factors such as epidemiology, population density and socio-economic status of the local population [22,23,24]. Moreover, India’s size and federal governance structure with different modes of financing, delivery and purchasing of healthcare have given rise to the view that there is scope for ‘differential pricing’ across settings when setting reimbursement rates.

With sufficient data, it is possible to explore the determinants of cost using a statistical cost function and thereby enable the prediction of costs for different settings. The advantage of using a cost function approach is the ability to interpret the coefficients on key variables such as scale, geography and other factors and explore the impact of these factors that might drive cost variations [25,26,27,28]. At the global level, the World Health Organization (WHO) has used such an approach to predict country specific unit costs [29, 30]. In India, efforts, including those by the present authors, towards estimating costs and understanding the cost structures of healthcare facilities have begun [11,12,13,14,15,16,17,18, 21]. While these first studies have provided cost data for different levels of the health system, they have been undertaken in a limited number of facilities and states. Nonetheless, these data form the richest and most accurate source of cost data for publicly delivered health services in India to date and can be used to understand better what drives cost differences between facilities using statistical cost function analyses. Using these data and an average cost function approach, the aim of this paper is to develop models to identify the key drivers of unit costs of inpatient (IP) and outpatient (OP) services at primary and secondary level public health facilities in India for use in healthcare decision-making and to demonstrate the use of such a model in the prediction of unit costs for each Indian state.

2 Methodology

2.1 Development of the Models

The aims of the models are to predict unit costs of healthcare services for each state in India, at different service volumes by nature of service delivery (IP and OP care), and identify the degree to which other factors influence average costs. Unit costs are dependent on the total costs of input resources consumed (numerator) and output in terms of services provided (denominator). Total costs can broadly be divided into the ‘hotel’ components, i.e. human workforce, capital resources (like building and equipment) and overheads, and those associated with specific treatments, i.e. the costs of medicine, consumables and diagnostics, and are dependent on the beneficiaries. Hotel costs generally comprise 70–80% of hospital costs [13, 19, 31]. Other costs, i.e. those of medicines, consumables and diagnostics, are subject to different types of market influences than hotel type services, with patients typically contributing to a large share of these inputs in India [32]. These costs tend to be variable so that the average cost remains constant as volume of services changes. Hotel costs are more fixed in nature, with average costs varying with scale as well as other characteristics that might shape the production of a particular service. The cost function presented here is therefore designed to predict the hotel costs component of the average cost.

Our starting point is the WHO CHOICE refined model for predicting national-level unit costs (version 2) 2017/18 [30]. This model regresses unit costs against a set of explanatory variables such that:

$$\ln ({\text{UC}}) = \alpha_{0} + \alpha_{i} \mathop \sum \limits_{i = 1}^{n} \ln X_{i} + e_{i} \quad i = 1 \ldots n,$$

where UC is unit cost, \(X_{i}\) are the explanatory variables, \(\alpha_{0}\) and \(\alpha_{1 \ldots n}\) are the estimated parameters, and e is the error term. Natural logs of both the unit cost and independent variables are used because of the inherently skewed nature of cost data. In addition, the equation can then be estimated using ordinary least squares methods, and the coefficients can be interpreted as elasticities [29].

An average cost function approach was chosen. In this approach, the unit cost is regressed against a set of independent variables, selected based on theory and previous empirical findings. Resource inputs and their prices are included to identify the influence of each, e.g. is it the price or quantity of doctors that explains more of the unit cost? Further, explanatory factors such as volume of services, geographical location, capacity utilisation, quality and the nature of the healthcare market can also be included [23, 25, 33,34,35]. As average cost is largely explained by volume of activity, the model needs to include output as an independent variable to isolate and identify the influence of each of the other variables. Careful attention then needs to be paid to potential multi-collinearity. Sensitivity analyses were run to verify the inclusion of output as an independent variable and the choice of average cost over total cost function.

Within a hospital the standard set of outputs are the OP visits, IP admissions or IP days. To estimate unit costs for a particular service within a large production unit, a regression model can be run separately for each different service based on the assumption that the production relationship between different services at any single facility is constant. In this case, separate models are needed for IP care and OP care.

2.2 Data

Cost data are taken from the cost data surveys of 81 public sector facilities across six states in India: Punjab, Haryana, Tamil Nadu, Odisha, Himachal Pradesh and Kerala [11,12,13, 19]. The cost dataset comprised healthcare facility costs at district hospitals (DHs), community healthcare centres (CHC) and primary healthcare centres (PHC) (the sampling methods are described in Appendix 1; see the electronic supplementary material). Economic cost data were collected from a provider perspective using a mixed methodology. The first round of data collection took place in 2013; the second round took place in 2016. The costing methodology followed standard principles [36, 37], and the standardised methodology has been published elsewhere [11, 13, 19]. Costs from the first round were adjusted to 2015–2016 prices, in line with the second round, using a correction factor of 1.46 (source: 1 In line with service provision at the different health system levels, CHCs and DHs were included in the IP care model analysis. For the OP care model, the analysis included DHs, CHCs and PHCs.

Two adjustments were made to the cost dataset to address data issues. First, for the OP model estimation, 14 PHCs reported a value of zero for the number of beds. As number of beds was used as the measure of capital stock, and due to the need to take logs, we would have had to exclude these facilities from the OP model. To address this, we assumed the value 1 for the number of beds—as the lowest level of capital stock—for these facilities. Second, one CHC reported no admissions in the reference year and therefore was excluded from the IP analysis and the OP analysis runs where the admissions variable was required (models including capacity utilisation). No ethical clearance was required for this analysis of secondary data.

2.3 Model Specification

A range of different variables were considered for inclusion in the models based on a combination of theoretical considerations and previous cost models (for example, see [22, 23, 30, 31, 38, 39]). To identify the best specification, we used a combination of guided stepwise linear regression informed by Table 1 and the perspective of a potential user of the model. Potential users attempting to generate a facility- or state-specific unit cost could include policy makers at the central or state level, academics and health technology assessment professionals. As data availability is limited in India, in particular, access to data on wages and human resources and even facility infrastructure variables, it was important to factor data accessibility into the decision around which variables to include in the final model. The variables considered included the outputs of each service category, prices and quantities of labour, proxy of capital and any other significant inputs into production, as well as supply-side factors that might lead to cost variation [22]. Importantly, the models needed to be constructed using readily accessible data to enable a user to predict costs for their settings.

Table 1 Proposed variables for input into the cost function

Following Serje et al. four categories of labour should be considered [40]: professional, technical/auxiliary, clerks/secretaries and physical labourers. However, our data only allowed distinction between three categories: doctors, medical support staff (nurses/technicians/pharmacists) and other support staff. The capital stock can be represented by the number of beds at the facility for both IP and OP services. Number of beds gives an indication of the relative capital stock even where IP beds are equal to zero. Outputs used are hospitalisations for the IP model and OP visits for the OP model.

To account for the influence of supply-side factors, we included the level of health facility, capacity utilisation and state-level per capita health expenditure as well as two proxy variables to represent quality and the nature of the healthcare market. These proxy variables are the state health index, which reflects a combination of health outcomes, governance and inputs to the health sector [41], and the government “Aspirational District Programme” index, a composite index of socio-economic progress, health and education sector performance and basic infrastructure indicators [42]. The full set of variables considered for the models is listed and defined in Table 2. The best-fit models were then selected based on both statistical and theoretical considerations.

Table 2 List and definition of variables considered for each of the models

2.4 Model Selection

A best-fit model was then chosen for each of the OP and IP cost functions for the prediction of unit costs at DHs, CHCs and PHCs for all states in India. Base models for prediction were selected based on the trade-off between the data requirements and high predictive power of the model in terms of adjusted R-squared. First, models with multi-collinearity [variance influence factor (VIF) scores > 10] were excluded [44]. Subsequently models were listed in order of the adjusted R-squared. Starting with the model with the highest score, the models were assessed for their suitability to be used in the state-level predictions, according to data availability and ability to interpret the coefficient on the independent variable. The first model to fit these criteria was selected for the state-level cost predictions.

2.5 Cost Predictions

The mean values and the 2.5% and 97.5% uncertainty limits for IP admissions and OP visits at DHs and CHCs were estimated for all states in India using the best-fit models. State-level data were identified for each of the variables using national-level data sources including the National Health Profile, the National Sample Survey Office (NSSO) “Health in India” report (71st round) and, where data were unavailable at the national level, state-specific health system websites [8, 45]. There were no centrally compiled data on hospitalisations and OP visits as reported by facilities. These state-level variables were therefore estimated from NSSO household survey data on hospitalisations and seeking medical care in the public sector combined with reported institutional deliveries and maternal and child healthcare visits. The summary data on hospitalisations and OP visits and the methodology for their estimation is provided in supplementary appendices 2 and 3 (see the electronic supplementary material).

The predicted unit cost generated from the models is a median value due to the log transformations. This was adjusted to the arithmetic mean by applying a smearing factor [30, 46]. Uncertainty intervals around the predicted values were also constructed by first generating a random sample of 1000 based on the mean and standard error of each of the coefficients in the model and extracting the values at the 2.5th and 97.5th percentiles. Versions of the models will further be made available online for individual users to estimate unit costs for their own settings (see

2.6 Model Validation

The cost prediction models were validated by using data and results from preliminary findings of an additional costing study funded by the Government of IndiaFootnote 2 [47]. Actual costs and the explanatory variables were extracted from the primary data collected from the sampled health facilities for this study. The explanatory variables were used to predict the facility unit cost estimates based on the state prediction models. The predicted cost estimates were then compared with the actual unit cost estimates generated as part of the national costing study. The study results are yet unpublished, so identifiers of the health facilities were kept confidential.

3 Results

3.1 Model Outcomes

Descriptive statistics for each model are reported in Table 3. The mean unit cost for an IP hospital admission across the sample was 2563 INR, and the mean number of hospitalisations per facility in a year was 10,467. The mean cost of an OP consultation was 113 INR, with an average of 76,129 consultations in a year. DHs represented 37.5% of the IP sample (DHs and CHCs only) and 19% of the OP sample (which also includes PHCs).

Table 3 Descriptive statistics of the cost data used in the estimation of the inpatient and outpatient cost models

Table 4a and b present the results of the best-fit models for the IP and OP sample. The supplementary material reports on all the model runs that were carried out (Appendix 4; see the electronic supplementary material). The majority of the models had good predictive power with an adjusted R-squared greater than 0.7 both with and without inclusion of price and labour variables. The coefficients made theoretical and logical sense with relationships in the hypothesised direction and most were significant. Capacity utilisation was excluded from the reported models due to the presence of multi-collinearity. When output was excluded, the adjusted R-squared was found to be lower than when it was included, and a model run with total cost as the dependent variable yielded similar results to the average cost function (see Appendix 6). In the case of the IP cost estimation, the models that included information on quantity and price of labour performed better than those without—with adjusted R-squared scores all greater than 0.9. For the OP model, models that included the labour variables suffered from multi-collinearity. The models that included district ranking as a variable performed better than those that did not, but due to the difficulty of interpreting the coefficient on a ranking and the impracticality of using this for state-level estimates, these models were not considered for state-level estimations. The output variable has the greatest influence on the unit cost for both IP and OP care. Capital, in the form of number of beds, and the facility level were both important and significant predictors of unit cost. The state health index was significant for some models, but had a relatively smaller influence on the overall unit cost.

Table 4 Results of the unit cost function estimation

3.2 Cost Predictions

Model C (see Table 4a and b) was used to predict the state-level unit costs. Model C was chosen as it fits both the criterion of access to data on each variable at the health facility or state/district health departments and statistically had a good R-squared value. The state-level estimates for IP admissions and OP visits at DHs and CHCs are shown in Fig. 1 and Appendix 5 (see the electronic supplementary material, supplementary tables A5.1–A5.4). The mean cost of an IP admission at a DH was 2502 INR, ranging from 1028 (313–3429) INR in West Bengal to 4499 (1451–14,159) INR in Meghalaya. Inpatient admissions at the CHC level had a mean cost of 1601 INR and ranged from 412 (148–1151) INR in Bihar to 3677 (1359–10,055) INR in Goa. There is a wide range in the uncertainty intervals. For OP care, at the DH level, the mean cost of an OP visit was 224 INR, ranging from 91 (44–196) INR in Odisha to 657 (339–1337) INR in Manipur. The mean OP visits cost across CHCs and PHCs was 214 INR, ranging from 96 (50–187) INR in Odisha to 429 (217–844) INR in Mizoram.

Fig. 1
figure 1

State-level inpatient admission and outpatient visit unit cost predictions, Indian Rupees (INR)

In the validation exercise, the actual unit costs from the sampled facilities all fall within the uncertainty intervals for the unit costs predicted by the models. The difference between the actual and predicted costs ranged from − 2 to 62% in the IP model and − 90% to 30% for the OP model (see Table 5).

Table 5 Results of the validation exercise: actual versus predicted unit costs for eight facilities

4 Discussion

This paper has described the development of a set of models to predict the mean hotel costs of IP admissions and OP visits in Indian healthcare facilities. The models have been used to generate state-level estimates of the cost of IP admissions and OP visits in India. This is the first time cost estimates for IP admissions and OP visits across all the states have been estimated based on standardised data, and no other standardised data on economic costs exist in this form. The models and the cost estimates are a valuable resource for health policy-makers and planners, and can also be utilised to inform important policy-relevant research. In the absence of healthcare cost information in India, these models can inform multiple high-level policy decisions regarding the design and shape of health benefits packages and insurance schemes, including price setting, health technology assessment, budgeting and forecasting. They provide a method with which to estimate facility-level unit costs for both IP and OP care without the need to undertake a full costing exercise, saving both time and money, and avoiding the need to use an inaccurate national average.

The variation in cost estimates across the states, predicted by the models, confirms the need for state-level information in estimating costs. But the differences in unit costs can be driven by a number of factors, including different levels of service quality and input mix, case mix, price and wage levels and/or the nature of the referral systems [23]. Due to the nature of the models developed here, hospitalisation or OP visit rate have the strongest influence on the unit cost. As a result, unit costs in the states where the DHs have a higher patient load generally have lower costs. In contrast, the states with low population density (such as Nagaland and Mizoram) consistently have higher unit costs across the different sets of predictions. It is also notable that the states that score highest on the state health index (Mizoram, Kerala, Punjab and Tamil Nadu) all have IP unit costs that fall in the higher cost half of states. Those states that score lower on the state health index (Odisha, Rajasthan, Bihar and Nagaland) tend to have lower unit costs.

As well as a wide range in the unit costs across the states, there is some uncertainty in the point estimates seen in the uncertainty intervals. Some of this uncertainty is driven by the choice of model for the state-level predictions. Facility-level unit cost predictions are likely to need a better specified model that captures individual facility characteristics, including management, quality of care, wage rates and labour time. The model runs presented in Appendix 4 (see electronic supplementary material) confirm that state-level models could be improved. However, for this analysis, it was not possible to use models with the highest adjusted R-squared score (e.g. model A in Table 4a and b), as quality data on the average price of labour and the average quantity of human resources at facilities in each state were not available. In addition, the models have been developed using data from only six states and include only public facilities. Cost data from more states would help strengthen and further validate the model predictions for each state. The government funded costing study currently underway [47] has potential to expand the data set and include the private sectors and tertiary providers in an increased number of states.

Further to these limitations, the critical variables of hospital admissions and OP visits were estimated using household reported data. Ideally, facility-reported admissions and visit data from the Health Management Information System (HMIS) would have been used, but these were not accessible at the central level. There is also state variation in reporting and quality of data on infrastructure. In addition, differences in patient characteristics would further strengthen the estimates, but these are not consistently available either. To refine the modelling approach, improving the accessibility and quality of HMIS data is vital. To address this uncertainty, a validation procedure was carried out in which model predictions were compared with actual unit costs for individual sites. While there were some differences between the actual and predicted costs, the actual unit costs for the selected facilities fall within the upper and lower bounds of the predicted costs for the same sites. This suggests that the results from the models are relatively robust, but that larger samples of cost data are needed to narrow the uncertainty intervals. More and better data on health facility costs, state-level health system infrastructure and service provision, in particular, facility-level average admission and OP visit rates, will increase the predictive power of the models.

The variation in costs across the states suggests that costs have the potential to be more closely aligned to each other. In other words, there is scope for improving efficiency and allocation of resources. Ideally unit costs would be aligned around those states that are more efficient at producing quality healthcare services that are equitably distributed, and therefore these estimates need to be used with caution. However, the level of efficiency with respect to health outcomes or quality health services of these facilities is currently unknown, suggesting further research to identify the most efficient states or facilities would be highly beneficial. The state-level estimates also provide a way to assist in the prediction of costs of healthcare interventions or treatment of conditions and can assist with setting ‘differential prices’ for insurance reimbursement. By characterising the relationship between healthcare costs between different states, it is possible to use this relationship to predict costs or set prices for specific interventions. For example, as the model predicts that DH IP costs are 1.5 times higher in Kerala than Odisha; then if cost data are available for Kerala, one approach would be to assume that an IP procedure such as a caesarean section will be 1.5 times the cost of a caesarean section in Odisha. Again, further research is required to validate this as a robust method for setting differential prices for treatment of individual conditions.

In using statistical methods to predict costs, the analysis confirms that unit costs in India are not explained by scale of activity (admissions/OP visits) and capacity (number of beds) alone and that a modelling approach such as this is a better method to obtain state-level average costs estimates. While limitations do persist and therefore the results and, in particular, point estimates need to be used with caution, this first attempt at modelling facility costs in India represents a step forward in providing a more evidence-based approach to cost estimation.

5 Conclusion

There exists a dearth of cost information within India. We describe here a novel means of estimating cost functions within the Indian healthcare system, which can provide a robust means of filing information gaps in the costing evidence base. An adaptation of the WHO method for estimating unit costs was applied at a country level to generate unit costs estimates for different states in India. The models estimated were statistically robust and found significant variation in unit costs between the states and levels of the health system. Further cost and health system data are still needed to improve the cost evidence base in India. Despite this, the cost function approach is a useful tool for the Indian context, and can be further developed as the evidence base expands at the district level and below, as well as for tertiary level and private sector facilities.