The results and the scientific value of health care efficiency analysis depend crucially on the appropriateness of the inputs and outputs considered in the analysis. We start our discussion by formulating ideal-typical concepts (as defined by Max Weber) of inputs and outputs. These ideal-typical concepts are then contrasted with the available inputs and outputs in the OECD database.
Ideal-Typical Input Measures
Ideally, the inputs should reflect all of the resources used by the health care system to improve the health status of the population. The inputs can be broadly classified into labor (in its different forms), materials, and the services provided by the capital stock in use. In classical economics, the total input is considered the sum of the direct labor input, the indirect labor input associated with the material consumed during the process of producing health care, and the indirect labor input associated with the portion of the capital stock that is consumed during the production process. In contrast to the classical use of labor coefficients to transform the quantities of all the different inputs into amounts of labor, other weighting schemes can be applied when aggregating the inputs.
The direct labor inputs can be broadly categorized into services by doctors, services by therapists (non-doctoral), services by nurses and care workers, services by pharmacists, and services by administrative staff. The material inputs can be further categorized according to commodity group into chemical goods, care materials, etc. The fraction of the capital stock consumed may be disaggregated into buildings, machinery (e.g., for drug production), and equipment (e.g., beds, MRIs, etc.; see Fig. 1).
Readers familiar with input–output tables could conceptualize the health system as one specific sector, which would provide the necessary information as quantities with different units of measurement. For instance, this information could be expressed in labor values, based on employment data for the relevant sectors (ignoring the issue of nonhomogeneous labor for the moment).
Ideal-Typical Output Measures
Starting with the idea of describing the health sector as an individual sector in the input–output analysis, the output would be the aggregated value of health goods and services produced, thereby reflecting the individual valuation of the output from the buyers’ (patients’) perspective. But, as most of the products of the health system are not “sold” at market prices, this concept of subjective valuation of the output is not feasible.
Going back to an ideal-typical concept of health output, three quantities may best capture the output of the health sector: the amount of pain relief, the number of additional (quality-adjusted) life years, and the increase in well-being (e.g., achieved through the use of prostheses) caused by treatment within the health system. These initially purely theoretical concepts could perhaps be evaluated by summing the values of each measure in the various subsectors: hospitals, medical practices, pharmacies, etc. Alternatively, the aggregation could be performed by summing across patient categories (e.g., acute patients, patients with chronic conditions, etc.) or population subgroups (e.g., grouped by age and sex). Hence, ideal-typical outputs could be the number of extra pain-free hours (weighted by objective pain measures) due to treatment, the sum of the extra (quality-adjusted) life years (QALYs) gained by undergoing treatment within the health system, and the increase in well-being caused by treatment (see Fig. 2).
Obviously, the causal relation between treatments and measures often cannot be verified in practice. Furthermore, the intersubjectively comparable measures of pain relief that would need to be aggregated are purely theoretical, as is the calculation of additional (quality-adjusted) life years or the increase in well-being. Nevertheless, we consider this ideal-typical reasoning to be very important when judging the adequacy of available output indicators.
The OECD Database
The OECD collects health-related data on OECD countries from various data sources such as the Eurostat database, The World Bank, the Global Health Observatory data repository of the World Health Organization (WHO), and other OECD databases (OECD.Stat). These data are complemented by national databases (e.g., statistical offices, federal ministries).
The OECD health data provide information on health status (e.g., mortality, morbidity), nonmedical determinants of health (e.g., tobacco, alcohol, food consumption, obesity), health care resources (e.g., health employment and education, physical and technical resources), health care utilization (e.g., consultations, immunization, screening, hospital stay, surgical procedures, waiting times), health care quality indicators, the pharmaceutical market, expenditure on health, social protection, demographic factors (e.g., population, fertility, labor force), and economic factors (e.g., GDP, wages) (see [15] for example).
Differences in the data sources, definitions, and methodologies used for variable construction affect comparability across countries. For instance, when comparing the number of practicing physicians, some countries include doctors working in administration, management, academic, and research positions, while others only include doctors who provide care directly to patients. Similarly, when comparing the number of nurses, some countries also include midwives (considered to be specialist nurses) in their figures, while others do not. When comparing surgeries, it is important to realize that classification and registration practices differ among countries. For example, some countries group data on total hip replacements and partial replacements together, while others only provide total hip replacement data. Countries also apply different rules for registering technical equipment (e.g., MRI units or CT scanners): most OECD member states include all such equipment, regardless of whether it is used in a hospital, but some countries only include equipment used in hospitals (see [16]).
Moreover, a lack of standardization in health interview surveys (differences in wording and in response categories) complicates between-country comparisons. For example, the perceived health status can be retrieved from The European Union Survey on Income and Living Conditions (EU-SILC), but in this case there is no measurement standardization across OECD countries. Similarly, countries may differ in the data they provide on health-related behavior such as tobacco and alcohol consumption (see [16]).
The main source of data on infant mortality in most European countries is the Eurostat database. Some of the between-country variation in infant mortality rates is probably due to differences in registering practices for premature infants. The methodology used to estimate life expectancy, data on which are again mainly retrieved from the Eurostat database, may also differ between countries. Life expectancy at birth for the whole population is calculated by the OECD Secretariat (see [16]).
These limitations in between-country comparability should be kept in mind when interpreting the empirical results. Moreover, we face the problem of missing data in the OECD database, which limits empirical analyses dramatically. For our analysis, we used the OECD health data for 2012. In the rather rare cases in which entries were missing for 2012 but available for 2011, we used the 2011 values to substitute for the missing 2012 values. Only six countries could be included in all five of the partial analyses, so the number of available scores for partial analyses varied between 1 and 5 for the 34 countries.
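The substitution rule for missing 2012 entries can be sketched as follows. This is only an illustration of the rule described above, not the code used for the analysis; the function name, the use of `None` for a missing entry, and the numbers are assumptions.

```python
def fill_from_previous_year(values_2012, values_2011):
    """Return the 2012 values, falling back to 2011 where 2012 is missing.

    Both arguments map a country label to a number or to None (missing).
    A country missing in both years stays missing.
    """
    filled = {}
    for country in values_2012.keys() | values_2011.keys():
        value = values_2012.get(country)
        if value is None:
            value = values_2011.get(country)
        filled[country] = value
    return filled


# Hypothetical illustrative numbers, not OECD data.
nurses_2012 = {"A": 9.1, "B": None, "C": None}
nurses_2011 = {"A": 9.0, "B": 7.4, "C": None}
filled = fill_from_previous_year(nurses_2012, nurses_2011)
```

In this toy case, country B takes its 2011 value, while country C remains missing and would drop out of any partial analysis that needs this indicator.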
Available Input and Output Measures and their Adequacy
We scanned the complete available OECD health database to find the most adequate input and output indicators. Not unexpectedly, the available indicators do not remotely coincide with the ideal-typical measures discussed above. Note that we disregarded from the start all indicators with fewer than 20 valid observations (i.e., with values from fewer than 20 countries).
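As a rough sketch of this screening step, the following keeps only indicators with values for at least 20 countries. The data layout (indicator name mapped to a dictionary of country values, with `None` for missing) and all names are assumptions made for illustration.

```python
MIN_VALID = 20  # threshold stated in the text: at least 20 countries with values


def indicators_with_enough_data(table, min_valid=MIN_VALID):
    """Return the sorted names of indicators with at least min_valid non-missing values.

    table maps an indicator name to a dict {country: value or None}.
    """
    return sorted(
        name
        for name, column in table.items()
        if sum(value is not None for value in column.values()) >= min_valid
    )


# Hypothetical toy table for 34 countries; "rare_indicator" is an invented name.
countries = [f"c{k}" for k in range(34)]
table = {
    "physicians": {c: (3.0 if k < 25 else None) for k, c in enumerate(countries)},
    "rare_indicator": {c: (1.0 if k < 19 else None) for k, c in enumerate(countries)},
}
kept = indicators_with_enough_data(table)
```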
Inputs
In the literature, health care expenditure is often used as an input measure. Unfortunately, however, some of this expenditure depends on the country’s wage levels (for nurses, physicians, etc.) while the rest does not (e.g., expenditure on imported magnetic resonance imaging machines). Moreover, disentangling the effects of private and public spending on health outcomes is a challenging task (see [17]), and it is unclear how variations in the public financing of health care influence a country’s health outcomes (see [6]). Not all of the public expenditure on health has a measurable effect on health outcomes (e.g., the effect of the expenditure on tooth cleaning). Furthermore, we face the problem of differentiating between health expenditure and consumption (e.g., designer and basic spectacles).
Nonfinancial health care resources often used in the literature include practicing physicians and nurses and inpatient beds per 1000 population. Moreover, to take into account health care technology, technical equipment such as MRI units is considered medical input (see [1]). Medical services such as surgeries and transplantations are not usually considered in research, but are assumed to be highly relevant to health care efficiency.
Due to issues with availability and missing data, we cannot cover all of the aspects of the ideal-typical inputs mentioned above. We categorize the inputs used into the three broad categories of basic medical inputs, intermediate medical inputs (accomplishments, provided services), and financial inputs:
1. Basic medical inputs
   - hosp_beds: total hospital beds per 1000 population
   - physicians: practicing physicians per 1000 population
   - nurses: practicing nurses per 1000 population.
2. Intermediate medical inputs (used as outputs)
   - cataract: cataract surgery, total number of procedures per 100,000 population
   - bypass: coronary artery bypass graft, inpatient cases per 100,000 population
   - kidney: transplantation of kidney, total number of procedures per 100,000 population.
3. Financial inputs
Outputs
Unfortunately, the ideal (or almost ideal)-typical output measures that have already been mentioned are not available in health data sets. Life expectancy and infant mortality are often used as health output indicators, but these outputs are, of course, incomplete measures of health status since they do not reflect the quality of life of the living. Life expectancy depends strongly on social, cultural, and environmental factors (especially at the time of birth). Infant mortality strongly depends on health-system and hygiene standards but focuses on a very specific aspect of health. In addition, life expectancy and mortality rates are probably affected by country-specific rates of accidents, homicides, and suicides. It is difficult to define health care outcomes in the same way for all countries using a single factor, because different countries have different policy priorities and aims (see [17]). Research shows that there are strong differences in the extent to which the health system influences performance measures, ranging from large effects on measures such as waiting time to very small effects on mortality, which is strongly influenced by factors that are not within the purview of the health care system (see [18]). High infant mortality rates (e.g., in the USA) may be due to factors such as poverty rather than low health care system efficiency (see [1]). To account for mortality rates that are less strongly affected by these factors, we include 30-day mortality after ischemic stroke and acute myocardial infarction in our analyses.
We categorize the output indicators into two groups: specific outputs and unspecific outputs. While recognizing that this distinction is somewhat fuzzy, we regard the specific outputs as more closely related to the effects of the health system than the unspecific outputs. For instance, infant mortality depends crucially on the hygiene standards applied and the medical care provided during birth. This is not true of general life expectancy, which is a weighted average of the number of deaths in a given year across all ages. Therefore, the living conditions and health provision across a period of about 100 years determine the actual measure of life expectancy. This means that life expectancy can, at best, be only partially attributed to the present state of the health system. We use the following variables for our analyses:
1. Specific outputs
   - infmort: infant mortality, deaths per 1000 live births
   - ischemicstroke30: 30-day mortality after admission to hospital for ischemic stroke per 100 patients (based on admission data).
   - amim30: 30-day mortality after admission to hospital for acute myocardial infarction (AMI) per 100 patients (based on admission data).
2. Unspecific outputs
Economic Conditions and Lifestyle
It is not easy to correlate health status with the activities performed by health care systems. Other non-health variables lie outside the range of influence of the health care system but can affect health, such as environmental conditions (e.g., temperature, pollution), socioeconomic factors (e.g., education, income, employment), and health-related behaviors and lifestyle (e.g., alcohol and tobacco consumption, hygiene) of the population. But these factors are not routinely measured and they are highly correlated with each other. For instance, many behavioral factors are correlated with educational attainment (see [3]) and measures of income. Therefore, we face the problem of using a combination of macro (e.g., education, employment) and intermediate (e.g., smoking, alcohol) factors in the DEA model, which could lead to biased results (see [11]). We categorize the analyzed variables into economic conditions and lifestyle inputs:
1. Economic conditions
   - gdp_ppp: GDP per capita in US dollars, based on PPP
   - gini: Gini coefficient of income distribution (disposable income after taxes and transfers).
2. Lifestyle input
   - alcohol: alcohol consumption in liters per capita, population aged 15+
   - tobacco: % of population aged 15+ who are daily smokers
   - obesity: % of population aged 15+.
Methodology
Unfortunately, a large number of values are missing from the OECD health data. An all-encompassing analysis would only be possible for an extremely small number of countries. Thus, we instead focus on several aspects of health systems and conduct five partial analyses, as displayed in Fig. 3. Note that the analyses differ in the number of countries they consider. Moreover, we assume that separate analyses of specific aspects of health care provision are useful for providing detailed information on how a country’s health care system works.
The first analysis (analysis I) focuses on the efficiency of surgery provision. Outputs are cataract and bypass surgeries and kidney transplantations (per 100,000 population). We use the numbers of physicians, nurses, and beds (per 1000 population) as medical inputs.
In the second analysis (analysis II), we focus on the efficiency of mortality prevention, defined by infant mortality, mortality 30 days after stroke, and mortality 30 days after infarct. Medical inputs are again the numbers of physicians, nurses, and beds (per 1000 population).
The third analysis (analysis III) concentrates on the effects of lifestyle on life expectancy from birth. Alcohol consumption, smoking, and obesity are used as lifestyle inputs.
The focus of the fourth analysis (analysis IV) is on the effects of income and health expenditure on life expectancy from birth. Income is measured as the gross domestic product per capita (GDP/Pop.). Expenditure is measured as the health expenditure per capita (HExp/Pop.).
In the fifth analysis (analysis V), we concentrate on the effects of relative expenditure, measured as the ratio of health expenditure to GDP (HExp/GDP), and inequality, measured as the Gini coefficient, on life expectancy from birth.
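The five partial analyses can be summarized as a small configuration, together with a helper that lists the countries with complete data for a given analysis. This is a sketch only: `life_exp`, `hexp_pc`, and `hexp_gdp` are hypothetical variable names (the text does not give names for life expectancy or the expenditure measures), and mapping the income measure of analysis IV to `gdp_ppp` is an assumption.

```python
# Inputs and outputs of the five partial analyses as described in the text.
# life_exp, hexp_pc, and hexp_gdp are hypothetical names introduced here.
ANALYSES = {
    "I": {"inputs": ["physicians", "nurses", "hosp_beds"],
          "outputs": ["cataract", "bypass", "kidney"]},
    "II": {"inputs": ["physicians", "nurses", "hosp_beds"],
           "outputs": ["infmort", "ischemicstroke30", "amim30"]},
    "III": {"inputs": ["alcohol", "tobacco", "obesity"],
            "outputs": ["life_exp"]},
    "IV": {"inputs": ["gdp_ppp", "hexp_pc"],
           "outputs": ["life_exp"]},
    "V": {"inputs": ["hexp_gdp", "gini"],
          "outputs": ["life_exp"]},
}


def countries_with_complete_data(data, spec):
    """Return the sorted countries that have a value for every variable in spec.

    data maps a country label to a dict {variable: value or None}.
    """
    needed = spec["inputs"] + spec["outputs"]
    return sorted(
        country
        for country, row in data.items()
        if all(row.get(variable) is not None for variable in needed)
    )


# Hypothetical toy data, not OECD values.
toy = {
    "A": {"physicians": 3.1, "nurses": 9.0, "hosp_beds": 4.2,
          "cataract": 800.0, "bypass": 40.0, "kidney": 3.0},
    "B": {"physicians": 2.5, "nurses": 7.0, "hosp_beds": 3.0,
          "cataract": 650.0, "bypass": 35.0, "kidney": None},
}
usable_for_analysis_I = countries_with_complete_data(toy, ANALYSES["I"])
```

Because each analysis needs a different set of variables, the set of usable countries differs from one analysis to the next.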
We use an input-oriented data envelopment analysis (DEA), as described in Sect. 2, with constant returns to scale. The idea of DEA is that price vectors are selected for each unit that maximize efficiency given a set of constraints, i.e., DEA provides the “optimal” price vectors \(u^{*}\) and \(v^{*}\) for all n health systems. \(u^{*}\) and \(v^{*}\) vary strongly between units, and in many cases several input and output prices are zero. Even when DEA was first applied by Charnes et al. [19], some DMUs obtained zero weights, and all but one of the several inputs of some efficient units had zero weights.
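For orientation, the standard multiplier form of the input-oriented DEA model with constant returns to scale can be written as below for the evaluated unit 0, with output weights \(u_{r}\) and input weights \(v_{i}\). This is the textbook formulation; the notation in Sect. 2 may differ, and the indices j (units), r (outputs), and i (inputs) are assumed here.

$$\begin{aligned} \max _{u,v}\quad&\sum _{r}u_{r}y_{r0}\\ \text {s.t.}\quad&\sum _{i}v_{i}x_{i0}=1,\\&\sum _{r}u_{r}y_{rj}-\sum _{i}v_{i}x_{ij}\le 0,\quad j=1,\ldots ,n,\\&u_{r}\ge 0,\quad v_{i}\ge 0. \end{aligned}$$

The nonnegativity constraints allow individual weights to be exactly zero at the optimum, which is the source of the zero prices mentioned above.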
Charnes et al. [20] question restrictions on the firm’s individual optimal weights. Allen et al. [21] provide a discussion of value judgements of weights obtained by DEA and a priori weight restrictions by the researcher, and regard the existence of zero weights as a conceptual problem with DEA. Due to the nature of value judgements, they understandably conclude that “There is no all purpose method for translating value judgements into restrictions on DEA weights” (p. 30). To circumvent the introduction of value judgements, we decided to use means of DEA weights as an alternative to original DEA scores.
In detail, we use DEA weights averaged over all countries to aggregate inputs and outputs, and we construct an alternative relative productivity measure: the ratio of a country’s productivity to the productivity of the most productive country. We obtain average “optimal” input and output price vectors \(\bar{u}^{*}\) and \(\bar{v}^{*}\) based on the n individual input and output price vectors. We calculate aggregates of the inputs and outputs based on \(\bar{u}^{*}\) and \(\bar{v}^{*}\) as follows:
$$\begin{aligned} {\text {Productivity}}=\frac{{\text {output}}}{{\text {input}}}=\frac{\sum _{r}y_{r}\bar{u}_{r}^{*}}{\sum _{i} x_{i}\bar{v}_{i}^{*}}. \end{aligned}$$
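The averaging and the relative productivity measure can be sketched as follows, assuming that the n unit-specific DEA weight vectors have already been obtained from some DEA solver (not shown). Function names and the toy numbers are illustrative assumptions.

```python
def mean_weights(weight_vectors):
    """Average the unit-specific weight vectors component by component."""
    n = len(weight_vectors)
    return [sum(column) / n for column in zip(*weight_vectors)]


def productivity(outputs, inputs, u_bar, v_bar):
    """Weighted output aggregate divided by weighted input aggregate."""
    aggregate_output = sum(y * u for y, u in zip(outputs, u_bar))
    aggregate_input = sum(x * v for x, v in zip(inputs, v_bar))
    return aggregate_output / aggregate_input


def relative_productivity(Y, X, U, V):
    """Productivity of each unit relative to the most productive unit.

    Y, X: per-unit output and input vectors.
    U, V: per-unit DEA output and input weight vectors.
    """
    u_bar = mean_weights(U)
    v_bar = mean_weights(V)
    values = [productivity(y, x, u_bar, v_bar) for y, x in zip(Y, X)]
    best = max(values)
    return [value / best for value in values]


# Hypothetical toy example: three units, one output, two inputs.
Y = [[4.0], [2.0], [3.0]]
X = [[10.0, 5.0], [10.0, 5.0], [20.0, 5.0]]
U = [[1.0], [0.5], [1.5]]                  # unit-specific output weights
V = [[0.2, 0.0], [0.0, 0.4], [0.1, 0.2]]   # unit-specific input weights
scores = relative_productivity(Y, X, U, V)
```

Since all units are aggregated with the same averaged weights, zero weights of individual units no longer remove a variable from a unit's score, and the most productive unit obtains a score of 1 by construction.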
The impact of varying the weights of the health system outputs is discussed in, for example, Lauer et al. [22]. They criticize the use of fixed output weights for all of the countries considered in the WHO health-system performance rankings. Fixed weights do not account for the development status or cultural traditions of a particular country, or for the different political objectives of different countries. This issue becomes more important as the range of income levels across the countries considered increases. Therefore, this issue is of less concern when investigating the (relatively similar) OECD countries than when considering the broader set of countries included in the WHO database. Hauck and Street [23] try to avoid the drawbacks of aggregating multiple objectives into one single index by applying a multivariate multilevel model.
To carry out the DEA, we have to transform some of our variables. The output (input) should be \(\ge \)0, and the output (input) should be “good” instead of “bad.” For instance, Cheng and Zervopoulos [24] use a generalized directional distance function to handle desirable and undesirable outputs. Seiford and Zhu [25] propose a linear monotonically decreasing transformation for undesirable outputs and output-decreasing inputs. In our analysis, mortality rates and lifestyle inputs such as the amount of alcohol must be transformed. Let \(x\) denote the original variable (e.g., mortality rate), \(\tilde{x}\) the scaled variable (with a mean of 0 and a standard deviation of 1), and \(x^{*}\) the transformed variable:
$$\begin{aligned} \tilde{x}&=\frac{x-\text {mean}(x)}{\text {sd}(x)}\\ x^{*}&=-\tilde{x}-\min (-\tilde{x})+1. \end{aligned}$$
Note that \(x^{*}\) has a minimum value of 1 for the health system with the smallest output (i.e., the highest mortality rate) and a standard deviation of 1. The Gini coefficient is transformed into \(1-{\rm Gini}\), as \(1-{\rm Gini}\) is regarded as a sensible equality measure.
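A minimal sketch of the two transformations follows. The text does not state whether the sample or the population standard deviation is used; the sample version (`statistics.stdev`) is an assumption here, as are the function names and the toy values.

```python
import statistics


def transform_bad_to_good(x):
    """Standardize x, flip its sign, and shift so that the minimum equals 1."""
    mean = statistics.mean(x)
    sd = statistics.stdev(x)  # assumption: sample standard deviation
    scaled = [(value - mean) / sd for value in x]
    flipped = [-s for s in scaled]
    shift = min(flipped)
    return [f - shift + 1 for f in flipped]


def equality_from_gini(gini):
    """Turn the Gini coefficient into the equality measure 1 - Gini."""
    return [1 - g for g in gini]


# Hypothetical mortality rates for three health systems, not OECD data.
mortality = [2.0, 4.0, 6.0]
transformed = transform_bad_to_good(mortality)
```

In the toy example, the system with the highest mortality rate receives the minimum transformed value of 1, and lower mortality rates map to larger values.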