1 Introduction

Previous studies (Kittelsen et al. 2009; Kittelsen et al. 2008; Linna et al. 2006, 2010) have found persistent evidence that the somatic hospitals in Finland have a significantly higher average productivity level than hospitals in the other major Nordic countries (Sweden, Denmark and Norway). These results indicate that there could be significant gains from learning from the Finnish example, especially in the other Nordic countries, but potentially also in other similar countries. The policy implications could, however, be very different depending on the source of the productivity differences. This paper extends earlier work by (1) decomposing the productivity differences into those that stem from technical efficiency, scale efficiency and differences in the possibility set (the technology) between periods and countries, and (2) exploring the statistical associations between technical efficiency and various hospital-level indicators such as case-mix, outpatient share and status as a university or capital city hospital. Finally, (3) we examine the robustness of the results to the choice of method.

International comparisons of productivity and efficiency of hospitals are few, primarily because of the difficulty of getting comparable data on output (Derveaux et al. 2004; Linna et al. 2006; Medin et al. 2013; Mobley and Magnussen 1998; Steinmann et al. 2004; Varabyova and Schreyögg 2013). Such analyses often find quite substantial differences in performance between countries. Differences may be due to dissimilar hospital structures and financing schemes, e.g. whether hospitals exploit economies of scale, have an optimal level of specialisation, or face high-powered incentive schemes that would encourage efficient production. Differences may also result from methodological problems. Cross-national analyses are often based on data sets that are comparable only to a limited extent, in the sense that inputs and outputs are defined and measured differently across countries. Our comparison gains validity from the existence of a Nordic standard for diagnosis related groups (DRGs) (Linna and Virtanen 2011). As described in the data section, the structure of the hospital sectors is broadly similar in the Nordic countries and the main differences are handled by assuming country specific production frontiers and variables in the analysis. It is, however, well known that the way we measure hospital performance may influence the empirical efficiency measures (Halsteinli et al. 2010; Magnussen 1996). In this article we will therefore use both the non-parametric data envelopment analysis (DEA) method and the stochastic frontier analysis (SFA) method, and provide evidence of the robustness of our results.

2 Methods

2.1 Efficiency and productivity

Efficiency and productivity are often used interchangeably. In our terminology productivity denotes the ratio of outputs to inputs, while efficiency is a relative measure comparing actual to optimal productivity. Since productivity is a ratio, it is by definition a concept that is homogeneous of degree zero in inputs and outputs, i.e. a constant returns to scale (CRS) concept. This does not imply that the underlying technology is CRS. Indeed, the technology may well exhibit variable returns to scale (VRS), and equally efficient units may well have different productivity depending on their scale of operation, as well as other differences in their production possibility sets.

Most productivity indexes rely on prices to weight several inputs and/or outputs, but building on Malmquist (1953), Caves et al. (1982) recognised that (lacking prices) one can instead use properties of the production function, i.e. rates of transformation and substitution along the frontier of the production possibility set, for an implicit weighting of inputs and outputs. We will use the term technical productivity to denote such a ratio of outputs to inputs where the weights are not input and output prices but are instead derived from the estimated technologies.

This analysis departs from Farrell (1957) who defined (the input-oriented) technical efficiency as:

$$E_{i}^{{T^{tc} }} = {\text{Min}}\left\{ {\theta \left| {(\theta {\mathbf{x}}_{i} ,{\mathbf{y}}_{i} ) \in T^{tc} } \right.} \right\} \tag{1}$$

where \(({\mathbf{x}}_{i} ,{\mathbf{y}}_{i} )\) is the input/output vector for an observation i, and \(T^{tc}\) is the technology or production possibility set for year t and country c. For an input/output vector \(({\mathbf{x}},{\mathbf{y}})\) to be part of the production possibility set, we need to be able to produce \({\mathbf{y}}\) using \({\mathbf{x}}\). As shown in Färe and Lovell (1978), this is equivalent to the inverse of the Shephard (1970) input distance function.

If there are variable returns to scale, Farrell’s measure of technical efficiency depends on the size of the observation, so that we can account for (dis)economies of scale. The measure of technical productivity can, following Førsund and Hjalmarsson (1987), be defined by rescaling inputs and outputs:

$$E_{i}^{{\lambda T^{tc} }} = \mathop {\text{Min}}\limits_{\theta ,\lambda } \left\{ {\theta \left| {(\theta {\mathbf{x}}_{i} ,{\mathbf{y}}_{i} ) \in \lambda T^{tc} } \right.} \right\}, \tag{2}$$

where the convex cone of the technology, \(\lambda T^{tc}\), contains all input–output combinations that are a proportionate rescaling of a feasible point in the technology set \(T^{tc}\). While this is formally identical to a “CRS technical efficiency” measure, our definition here is instead that the reference surface is a homogeneous envelopment of the underlying technology. This is the same assumption normally used in Malmquist indices of productivity change, see e.g. Grifell-Tatjé and Lovell (1995).
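The two measures in (1) and (2) can be illustrated with a minimal sketch for the special case of one input and one output, where no linear programming solver is needed. The function names and the data are hypothetical and are not taken from the study; with several outputs, as in the actual analysis, each score would instead be the solution of a linear programme.

```python
from itertools import combinations

def technical_productivity(units, i):
    """E^{lambda T}: productivity of unit i relative to the highest
    output/input ratio among the reference units (cone reference)."""
    x_i, y_i = units[i]
    best = max(y / x for x, y in units)
    return (y_i / x_i) / best

def technical_efficiency(units, i):
    """E^{T}: smallest input that can produce y_i inside the convex,
    free-disposal hull of the reference units, divided by actual input."""
    x_i, y_i = units[i]
    # Single units that already produce at least y_i.
    candidates = [x for x, y in units if y >= y_i]
    # Convex combinations of two units that produce exactly y_i.
    for a, b in combinations(units, 2):
        (y_lo, x_lo), (y_hi, x_hi) = sorted([(a[1], a[0]), (b[1], b[0])])
        if y_lo < y_i < y_hi:
            w = (y_i - y_lo) / (y_hi - y_lo)
            candidates.append(x_lo + w * (x_hi - x_lo))
    return min(candidates) / x_i

# Hypothetical (input, output) pairs, not data from the study.
units = [(2.0, 1.0), (4.0, 4.0), (10.0, 6.0), (6.0, 3.0)]
te = technical_efficiency(units, 3)    # measure (1), VRS reference
tp = technical_productivity(units, 3)  # measure (2), cone reference
se = tp / te                           # implied scale efficiency
print(round(te, 3), round(tp, 3), round(se, 3))
```

For the last unit in this made-up sample the technical efficiency is about 0.556 and the technical productivity is 0.5, so the ratio of the two, about 0.9, is the part of the productivity shortfall attributable to scale.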

Furthermore, it is not necessary to assume that the technologies of different countries and time periods are identical in order to compare productivity, as long as one has a common reference set. It is common to use a specific (base) time period as a reference, as in Berg et al. (1992):

$$M_{ij}^{tc} = \frac{{E_{i}^{{\lambda T^{tc} }} }}{{E_{j}^{{\lambda T^{tc} }} }}, \tag{3}$$

which compares the productivity of two observations i and j using a fixed time period t as the reference, even if the observations i and j are from different time periods. A widespread alternative method is to construct geometric averages of indices based on consecutive time periods, as in Färe et al. (1994), which avoids the arbitrary choice of reference period t, but instead introduces a circularity problem. The approach followed here is instead to use information from all time periods for the country specific productivity reference:

$$T^{c} = \mathop {\text{Env} }\limits_{t} \left( {T^{tc} } \right) \tag{4}$$

where Env() is the convex envelopment of the time specific technologies. Furthermore, to compare the productivity across countries we will need the envelopment of all time and country specific technologies:

$$\overline{T} = \mathop {\text{Env} }\limits_{c} \left( {T^{c} } \right) \tag{5}$$

The reference sets (4) and (5) are not themselves technologies, only envelopments of technologies, as are the convex cones (rescaled sets) \(\lambda T^{c} ,\lambda \bar{T}\). Analogous to (2), it is then possible to define the productivity levels relative to the country specific references and the pooled references as \(E_{i}^{{\lambda T^{c} }}\) and \(E_{i}^{{\lambda \bar{T}}}\) respectively.

The country c specific Malmquist index of productivity change over time can then be defined as

$$M_{ij}^{c} = \frac{{E_{i}^{{\lambda T^{c} }} }}{{E_{j}^{{\lambda T^{c} }} }}, \tag{6}$$

which normally is reported for two observations i and j of the same unit at two points in time. In this analysis we are primarily concerned with comparing observations from different units in different countries, and there is no natural pairing of i and j. Edvardsen and Førsund (2003) develop and report geometric means of Malmquist indices between a unit in one country and all units in another country. We will instead take a simpler approach and report the productivity and efficiency levels of each unit and their country means.

2.2 Decomposition

As discussed e.g. in Fried et al. (2008), the Malmquist index can be decomposed in various ways, where the original decomposition is into frontier shift and efficiency change. When working in productivity and efficiency levels, the starting point is instead the decomposition of technical productivity into technical efficiency and scale efficiency:

$$E_{i}^{{\lambda T^{tc} }} = E_{i}^{{T^{tc} }} \frac{{E_{i}^{{\lambda T^{tc} }} }}{{E_{i}^{{T^{tc} }} }} = (TP_{i} = TE_{i} *SE_{i} ), \tag{7}$$

where the parenthesis denotes the conventional way of writing the technical productivity (TP) as the product of technical efficiency (\(TE_{i} = E_{i}^{{T^{tc} }}\)) and scale efficiency (\(SE_{i} = \frac{{E_{i}^{{\lambda T^{tc} }} }}{{E_{i}^{{T^{tc} }} }}\)). By including the possibility of comparing productivity across both time and countries, this decomposition naturally expands into:

$$E_{i}^{{\lambda \bar{T}}} = E_{i}^{{T^{tc} }} \frac{{E_{i}^{{\lambda T^{tc} }} }}{{E_{i}^{{T^{tc} }} }}\frac{{E_{i}^{{\lambda T^{c} }} }}{{E_{i}^{{\lambda T^{tc} }} }}\frac{{E_{i}^{{\lambda \bar{T}}} }}{{E_{i}^{{\lambda T^{c} }} }} = (TTP_{i} = TE_{i} *SE_{i} *PP_{i} *CP_{i} ), \tag{8}$$

where we have decomposed the now total technical productivity (TTP) into technical efficiency (\(TE_{i} = E_{i}^{{T^{tc} }}\)), scale efficiency (\(SE_{i} = \frac{{E_{i}^{{\lambda T^{tc} }} }}{{E_{i}^{{T^{tc} }} }}\)), period productivity (\(PP_{i} = \frac{{E_{i}^{{\lambda T^{c} }} }}{{E_{i}^{{\lambda T^{tc} }} }}\)) and country productivity (\(CP_{i} = \frac{{E_{i}^{{\lambda \bar{T}}} }}{{E_{i}^{{\lambda T^{c} }} }}\)). Each of these is specific to the observation i.
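The four-way decomposition is an identity: the ratios telescope, so the product of the components always returns the score against the pooled cone. A minimal sketch, with hypothetical function and argument names and made-up scores, shows the bookkeeping:

```python
def decompose(e_t_tc, e_cone_tc, e_cone_c, e_cone_pooled):
    """Split total technical productivity into its four components.

    Arguments are the Farrell scores of one observation measured against
    T^{tc}, lambda T^{tc}, lambda T^{c} and the pooled cone lambda T-bar.
    """
    te = e_t_tc                    # technical efficiency
    se = e_cone_tc / e_t_tc        # scale efficiency
    pp = e_cone_c / e_cone_tc      # period productivity
    cp = e_cone_pooled / e_cone_c  # country productivity
    return {"TE": te, "SE": se, "PP": pp, "CP": cp,
            "TTP": te * se * pp * cp}

# Illustrative scores only; they are not estimates from the paper.
parts = decompose(e_t_tc=0.90, e_cone_tc=0.81, e_cone_c=0.78,
                  e_cone_pooled=0.60)
print({k: round(v, 3) for k, v in parts.items()})
```

Because each reference set contains the previous one, the four input scores are weakly decreasing from left to right, and every component lies between zero and one.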

Note that dividing this decomposition for two observations of one unit at different points in time, and ignoring the country productivity, one gets the common Malmquist decomposition of technical efficiency change, scale efficiency change and frontier change. As with the Malmquist index, the decomposition is not easily extended to comparisons between countries, as there is no natural pairing of observations. Asmild and Tam (2007) develop a global index of frontier shifts which they note would be useful for international comparisons, but do not extend this to a full decomposition.

These concepts are illustrated in Fig. 1, where we ignore the time dimension and concentrate on country differences. For an observation A in country 1 with a production possibility set bounded by the production function Frontier 1, we can define the technical efficiency by (1) above as the ratio BC/BA of necessary inputs to actual inputs for a given output. The productivity of A is the slope of the diagonal OA, but we can normalise this in (2) by comparing it to the maximal productivity given by the slope of the diagonal OD. The technical productivity of A is then the ratio BD/BA. Using the definition implicit in (7), scale efficiency is BD/BC. Assume that country 2 has a production possibility set bounded by Frontier 2, and that the maximal productivity of country 2 given by the slope OE is also the maximal for all countries, i.e. bounding the convex cone of all possibility sets \(\lambda \bar{T}\). This slope OE will serve as the reference for the total technical productivity in (8), which for observation A is given by BE/BA. The country productivity for observation A is then the ratio BE/BD.

Fig. 1

The components of hospital total technical productivity in input–output space. For observation A in country 1, Total technical productivity (TTP) = BE/BA, Technical efficiency (TE) = BC/BA, Technical productivity (TP) = BD/BA, Scale efficiency (SE) = BD/BC and Country productivity (CP) = BE/BD
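As a purely illustrative numerical reading of Fig. 1, assume the hypothetical horizontal distances BA = 10, BC = 9, BD = 8 and BE = 6 (these are not values from the study). The components for observation A would then be:

$$TE = \frac{BC}{BA} = 0.90,\quad TP = \frac{BD}{BA} = 0.80,\quad SE = \frac{BD}{BC} \approx 0.89,\quad CP = \frac{BE}{BD} = 0.75,\quad TTP = TE \cdot SE \cdot CP = \frac{BE}{BA} = 0.60$$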

With only one input and one output as in Fig. 1, one country will define the reference and all observations in each country will have the same country productivity. With two outputs as in Fig. 2, the convex cone of each country’s frontier \(\lambda T^{c}\) can be drawn as the curved lines for a given level of the single input. The convex cone of all the country frontiers \(\lambda \bar{T}\) is represented by the dashed line which serves as the reference for total technical productivity defined in (8). If the country frontiers cross as in this example, the country productivities will depend on the output mix of the observation.

Fig. 2

The components of hospital total technical productivity in output–output space. For observation A in country 1, Total technical productivity (TTP) = OA/OD, Technical efficiency (TE) = OA/OC, and Country productivity (CP) = OC/OD

2.3 Cost efficiency and productivity

Finally, note that since we have only one input in our data, cost minimization for a given input price is formally equivalent to input minimization. Thus cost efficiency, which is defined as the ratio of necessary costs to actual costs, is also equivalent to technical efficiency. The decomposition of productivity and the Malmquist index is most often shown in terms of technical efficiency and technical productivity but could easily have been developed in terms of cost efficiency and cost productivity. Note that in the general multi-input case the numbers will differ in technical and cost productivity decompositions, but in our one-input case, the actual numbers will be identical. Thus, we may view the terms technical efficiency and cost efficiency as equivalent in discussing the results in this analysis.

2.4 Estimation method

The DEA and SFA methodologies build upon the same basic production theory. In both cases one estimates the production frontier (the boundary of the production possibility set or technology) or the dual formulation in the cost frontier, but the methods are quite different in their approach to estimating the frontiers and in the measures that are easily calculated and therefore commonly reported in the literature (Coelli et al. 2005; Fried et al. 2008). While the major strengths of DEA have been the lack of strong assumptions beyond those basic in theory (free disposal and convexity) and the fact that the frontier fits closely around the data, SFA has had a superior ability to handle the presence of measurement error and to perform statistical inference. The latter shortcoming of DEA has been alleviated somewhat by the bootstrapping techniques introduced by Simar and Wilson (1998, 2000).

In our data there are good reasons to choose either method. While the presence of measurement error is probably limited for those activities that are actually measured, there is a strong case for omitted variable (i.e. quality) bias that may be more severe in DEA. The DEA method can easily estimate the country specific frontiers without strong assumptions, thereby making country differences dependent on the input–output mix, while the SFA formulation generally introduces a constant difference between country frontiers. The presence of country dummies in SFA implies, however, that information from other countries is used to increase the precision of the estimates and therefore the power of the statistical tests.

In the DEA analysis the frontiers have been estimated using the homogeneous bootstrapping algorithm from Simar and Wilson (1998), while the second stage analysis of the statistical association of technical efficiency and the environmental variables has been conducted using ordinary least squares (OLS) regressions. The SFA analysis has used the simultaneous estimation of the frontier component and the (in)efficiency component proposed in Battese and Coelli (1995).

2.5 Data

Data has been collected for inputs and outputs of all public sector acute somatic hospitals. The hospital structure of the four Nordic countries is broadly similar. The structure consists of mostly publicly financed and governed somatic hospitals with only a very few commercial hospitals, almost no specialization in medicine, surgery, cancer care etc., and no specialization to cater for specific groups such as veterans/military, children’s hospitals etc. Only in Finland are there a number of Health Centres with inpatient beds that serve less severe patients, and these are excluded from our analysis, as are the few commercial hospitals. Some non-profit private hospitals that are under contract with the public sector are included, however. The data includes almost the whole population of somatic hospitals in the Nordic countries, which due to a natural geographic monopoly usually serve a catchment area covering all residents. Differences in patient mix will mainly reflect demographic differences across the geographic areas, factors that are partly included in the second stage regression.

While the hospital sectors in all four countries are based on public ownership and tax-based financing, there are administrative and incentive differences. In Norway, all hospitals are state-owned, but the provision of hospital services is delegated to five (reduced to four during 2007) regional health enterprises (RHF). Each of these owns between four and thirteen health enterprises (HF) which are the administrative units of hospital production, but a number of the health enterprises are multi-location institutions and the extent of integration between the actual physical hospitals varies considerably. In Denmark and Sweden hospitals are owned by the intermediate government level regions or counties (“regioner” and “landsting”), but single-location hospitals are still mainly separate institutions. The Finnish hospital sector is owned by health districts that are federations of municipalities. Norway and some counties in Sweden use partial activity based financing (ABF) based on the DRG-system, but with most of the payment made by block grants. In Denmark ABF was used only to a limited extent during the period. The Finnish hospital districts use various case-based classification systems (including DRGs) as a method of collecting payments from municipalities, but the Finnish payment system does not create incentives similar to those of the ABF used in other countries (Kautiainen et al. 2011). However, since hospitals can be described by the same input–output vectors, the productivity of the hospitals in our sample should be comparable even though they may not face the same production possibility sets.

Inputs are measured as operating costs, which for reasons of data availability are exclusive of capital costs. It was not possible to get ethical permission for the use of data for 2007 in Sweden. The Swedish data is further limited by the lack of cost information at the hospital level, necessitating the use of the administrative county (“landsting”) level as the unit of observation, each encompassing from one to five physical hospitals. The difference in level of observational unit between the countries (counties, health enterprises or hospitals) is one of the reasons why we estimate different technologies or production possibility sets in each country.

Since we do not have data on teaching and research output, the associated costs are also excluded. Costs are initially measured in nominal prices in each country’s national currency, but to estimate productivity and efficiency one needs a comparable measure of “real costs” that is corrected for differences in input prices.

To harmonize the cost level between the four countries over time we have constructed wage indices for physicians, nurses and four other groups of hospital staff, as well as one for “other resources”. This removes a major source of nominal cost and productivity differences between the countries, a difference that cannot be influenced by the hospitals themselves, nor by the hospital sector as a whole. The wage indices are based on official wage data and include all personnel costs, i.e. pension costs and indirect labour taxes (Kittelsen et al. 2009). The index for “other resources” is the purchasing power parity corrected GDP price index from OECD. The indices are weighted together with Norwegian cost shares in 2007. Thus we construct a Paasche-index using Norway in 2007 as reference point. Note that this represents an approximation: the index will only hold exactly if the relative use of inputs is constant over time and country.
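The deflation step can be sketched as a share-weighted average of input prices relative to the reference point, which is then divided into nominal costs. The sketch below is an assumption about the mechanics rather than the authors' exact procedure: the function names, the collapsed four-category breakdown, the cost shares and the relative prices are all hypothetical, and prices are assumed to be already expressed relative to Norway in 2007.

```python
def input_price_index(cost_shares, relative_prices):
    """Fixed-weight index: share-weighted mean of input prices measured
    relative to the reference point (here assumed to be Norway 2007 = 1)."""
    assert abs(sum(cost_shares.values()) - 1.0) < 1e-9, "shares must sum to 1"
    return sum(cost_shares[k] * relative_prices[k] for k in cost_shares)

def real_cost(nominal_cost, index):
    """Nominal cost expressed in reference-point prices."""
    return nominal_cost / index

# Hypothetical reference cost shares and relative prices for one
# country-year; none of these numbers come from the study.
shares = {"physicians": 0.2, "nurses": 0.3,
          "other_staff": 0.2, "other_resources": 0.3}
prices = {"physicians": 0.80, "nurses": 0.70,
          "other_staff": 0.75, "other_resources": 0.90}

index = input_price_index(shares, prices)
deflated = real_cost(79.0, index)
print(round(index, 2), round(deflated, 1))
```

Because the weights are fixed at the reference shares, the index is exact only if the input mix is the same everywhere, which is the approximation noted in the text.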

Outputs are measured by using the Nordic version of the diagnosis related groups (DRGs). Each hospital discharge is assigned to one of about 500 DRGs on the basis of diagnosis and procedure codes. When activity is measured by DRG-points, discharges are weighted by a factor that is an estimate of the average cost of patients in that DRG. Thus the weighting is implicitly by patient severity or complexity as reflected in average costs. We define three broad output categories: inpatient care, day care and outpatient visits. Within each category patients are weighted with the Norwegian cost weights from 2007, where the weights are calculated from accounting data from a sample of major Norwegian hospitals. Outpatient visits were not weighted. Considerable work has gone into reducing problems associated with differences in coding practice, including moving patients between DRGs, eliminating double counting etc. The problem of DRG-creep, where hospitals that face strong incentives upcode from simple to more severe DRGs based on the number of co-morbidities, has been reduced by aggregating these groups. In the DEA analysis this had the effect of reducing the mean productivity level of Norwegian hospitals by 2 percentage points while the other countries were not affected, presumably because activity based financing is a more entrenched feature in Norway.
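The construction of the three outputs can be sketched as follows: inpatient and day care discharges are summed in DRG points, while outpatient visits are simply counted. The record layout, the function name, the DRG codes and the weights below are hypothetical placeholders, not the Nordic DRG weights used in the study.

```python
from collections import defaultdict

def aggregate_outputs(discharges, drg_weights):
    """Sum activity into the three output categories.

    discharges: iterable of (category, drg_code) records.
    Inpatient and day care records add their DRG cost weight;
    outpatient visits are unweighted and add 1 each.
    """
    totals = defaultdict(float)
    for category, drg in discharges:
        if category == "outpatient":
            totals[category] += 1.0
        else:
            totals[category] += drg_weights[drg]
    return dict(totals)

# Made-up weights and records for a single hospital-year.
weights = {"DRG_A": 1.2, "DRG_B": 0.8, "DRG_C": 0.3}
records = [("inpatient", "DRG_A"), ("inpatient", "DRG_B"),
           ("day_care", "DRG_C"),
           ("outpatient", None), ("outpatient", None)]

outputs = aggregate_outputs(records, weights)
print(outputs)
```

Aggregating DRGs that differ only by the number of co-morbidities, as described above, would amount to mapping those codes to a shared weight before this summation.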

In addition to the single input and the three outputs, we have collected data for some characteristics that vary between hospitals within each country or over time, and that may be associated with efficiency. These include dummies for university hospital status which may capture any scope effects of teaching and research. These must be effects beyond the costs attributed to these activities which are already deducted from the cost variable, but the sign of the effect on productivity would depend on whether there are economies or diseconomies of scope between patient treatment and teaching and research. University hospitals may also have a more severe mix of patients within each DRG-group, which may bias estimated productivity downwards. The main case-mix effect should presumably already be captured by the DRG weighting scheme. University hospitals are located in major cities. We also include a dummy for capital city hospitals, which may have a less favourable patient mix due to the socio-economic composition of the catchment area, so that one would expect the capital city hospitals to have lower productivity. However, university and capital city hospitals could also have lower costs due to shorter travelling times and a greater potential for day patient or outpatient treatment, so the net effect is not obvious. Although all hospitals are located in towns, the university and capital city dummies should capture the main differences that may be due to urban or rural catchment areas.

The case-mix index (CMI) is calculated as the average DRG-weight per patient, and may again capture patient severity if the average severity within each DRG-group is correlated with the average severity as measured by the DRG-system itself, in which case one should expect a high CMI to be correlated with low productivity. The length of stay (LOS) deviation variable is calculated as the DRG-weighted average LOS in each DRG for each hospital divided by the average LOS in each DRG across the whole sample (i.e. expected LOS). Again this could capture differences in severity within each DRG group, but may also indicate excessive, and therefore inefficient, LOS. Finally, the outpatient share is an indicator of differences in treatment practices across hospitals, where a high outpatient share may indicate lower costs per discharge. These variables are collectively termed “environmental variables”, although they are not always strictly exogenous to the hospital.
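The two indicators can be sketched as below. The CMI follows directly from the definition; for the LOS deviation the text admits more than one reading, and the sketch assumes one of them, namely the DRG-weighted sum of actual stays divided by the DRG-weighted sum of the sample-average stays for the same discharges. Function names, record layout and all numbers are hypothetical.

```python
def case_mix_index(discharges, drg_weights):
    """Average DRG weight per discharge."""
    return sum(drg_weights[drg] for drg, _ in discharges) / len(discharges)

def los_deviation(discharges, drg_weights, expected_los):
    """Assumed reading: weighted actual LOS over weighted expected LOS,
    where expected LOS is the sample-wide mean stay in each DRG."""
    actual = sum(drg_weights[drg] * los for drg, los in discharges)
    expected = sum(drg_weights[drg] * expected_los[drg]
                   for drg, _ in discharges)
    return actual / expected

# Made-up weights, sample-mean stays and one hospital's discharges,
# each recorded as (DRG code, length of stay in days).
weights = {"DRG_A": 1.0, "DRG_B": 2.0}
sample_mean_los = {"DRG_A": 2.0, "DRG_B": 5.0}
hospital = [("DRG_A", 2.0), ("DRG_A", 4.0), ("DRG_B", 5.0)]

cmi = case_mix_index(hospital, weights)
dev = los_deviation(hospital, weights, sample_mean_los)
print(round(cmi, 3), round(dev, 3))
```

A deviation above one, as in this made-up case, means stays are longer than expected given the hospital's DRG mix, which could reflect either heavier patients within each DRG or inefficiently long stays.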

In earlier studies, the extent of activity based financing (ABF) has been an important explanatory variable, but in the period covered by our dataset there has been too little variation in ABF within each country. If a variable is determined by, or highly correlated with, the country, then it is not possible to statistically separate its effect from other country specific fixed effects. This also holds for structural variables such as ownership structure, financing system etc. Travelling time to hospital can be an important cost driver but is not included here due to lack of data. Finally, no indicators of the quality of treatment have been available for this analysis.

Table 1 shows the distribution of hospitals between countries and summary statistics for the variables in the analyses. When interpreting the size of the Swedish observations, remember that these are not physical hospitals but the larger administrative “Landsting” units. To a lesser extent, the Norwegian observations of health enterprises can also encompass several physical hospitals.

Table 1 Descriptive statistics. Observation means and SD

3 Results

3.1 DEA results

In the DEA analysis, the total technical productivity level is calculated with reference to a homogeneous frontier estimated from the pooled set of observations for all countries and periods. Figure 3 shows that the considerable productivity superiority of the Finnish hospitals found in previous studies is also present and highly significant in this dataset. The other Nordic countries are in some periods significantly different from each other, but in general have a similar productivity level.

Fig. 3

DEA bootstrapped productivity estimates by country and year with common reference frontier. Mean of observations and 95 % CI

Figure 3 also shows a slight time trend towards declining productivity. However, the DEA bootstrap tests did not reject a hypothesis of constant technology across time periods. This implies that we can ignore the time dimension and report the simpler three-way decomposition

$$E_{i}^{{\lambda \bar{T}}} = E_{i}^{{T^{c} }} \frac{{E_{i}^{{\lambda T^{c} }} }}{{E_{i}^{{T^{c} }} }}\frac{{E_{i}^{{\lambda \bar{T}}} }}{{E_{i}^{{\lambda T^{c} }} }} = (TTP_{i} = TE_{i} *SE_{i} *CP_{i} ). \tag{9}$$

The productivity estimates for the individual observations are shown in Fig. 4. The hypothetical full productivity frontier is represented by productivity equal to 1.0, but since these numbers are bootstrapped estimates no observation is on the frontier. Clearly, the Finnish productivity level is consistently higher, with all Finnish observations doing better than most observations in Denmark and Norway and almost all in Sweden. Confidence intervals are quite narrow so this is a robust result. In all countries one can see that smaller units tend to be more productive, while comparisons between countries are confounded by the fact that the Swedish units are not hospitals but observations on the administrative “Landsting” level.

Fig. 4

Hecksher–Salter diagram of DEA bootstrapped total technical productivity estimates with pooled common reference frontier. Height of each bar is productivity estimate for each observation with 95 % CI, and width is proportional to the observation size measured by real costs

Table 2 reports the mean country productivity results and their decomposition. The first line reports the productivity of each country’s hospital sector relative to the envelopment of the bootstrapped estimates of the country-specific production possibility sets, i.e. an estimate largely based on pooling the best hospitals. While Finland has an average productivity of around 80 % measured relative to the pooled frontier, the decomposition reveals that this is wholly due to lack of scale efficiency and technical efficiency, which are at around 90 % each. The country productivity mean is almost precisely 100 %, which means that the Finnish hospitals alone define the pooled reference frontier.

Table 2 Mean bootstrapped productivity in each country as measured against the pooled reference frontier in DEA

For Sweden and Norway the picture is quite different; here the country productivity is the major component in the lack of total productivity. In fact, the cost efficiency and scale efficiency components are quite similar for Finland, Norway and Sweden. This implies that the hospitals in each country have a similar dispersion from the best to the worst performers both in terms of technical and scale efficiencies, but that the best performing hospitals in Norway and Sweden are significantly less productive than the best performers in Finland.

Denmark is in between, with significantly higher country productivity than Sweden and Norway, but still lagging far behind Finland. On the other hand, Denmark has clearly the lowest technical efficiency level of the Nordic countries, which means that the dispersion behind the frontier is largest in Denmark.

Table 2 also reports the scale elasticities in the last line. Scale properties can be different across geographical units, as also found in a study on hospitals in two Canadian provinces by Asmild et al. (2013). Since the DEA numbers are based on separate frontier estimates for each country, the fact that the units are of a different nature represents no theoretical problem but must be reflected in the interpretation of the results. For Finland, Denmark and Norway, where the units are hospitals or low-level health enterprises, the scale elasticities below 1 indicate decreasing returns to scale on average, a result that is often found in estimates of hospital scale properties. Thus, optimal size is smaller than the median size. For Sweden, however, the scale elasticity is larger than one, although only just significantly. Thus, even though the units of observation are clearly larger in Sweden, the optimal size is even larger. The natural interpretation of this paradox is that while the optimal size of a hospital is quite small, the optimal size of an administrative region (or purchaser), such as the Swedish Landsting, is quite large. Of course, other national differences that are not captured by our variables may also explain this result.

3.2 SFA results

The testing tree for the SFA model is shown in Table 3. The formulation by Battese and Coelli (1995) implies that factors that determine the position of the frontier function in the deterministic part of the equation are estimated simultaneously with the variables in the “explanation” of the inefficiency term. Right hand side variables can potentially enter both components.

Table 3 Simplified test tree in the SFA analysis

Clearly, the strongest result is that country dummies should enter the frontier term. This implies that there are highly significant fixed country effects that are not explained by any of our other variables, and that by the assumptions of the model specification the country dummy should primarily shift the frontier term. The functional form of the inefficiency term is not easily tested but the exponential distribution is the one that fits the data most closely. The functional form of the frontier function itself is, however, testable, and the simple Cobb-Douglas form is rejected in favour of the flexible Translog form. The time period dummies are also rejected in both terms, which means that the period can be ignored as in the DEA case.

The normalized marginal effects are shown in Table 4 together with the corresponding DEA results. The full estimation results for the preferred model are included in Appendix 2. The normalization in Table 4 is done so that a positive coefficient shows the percentage point increase in the productivity level (or decrease in costs) stemming from a one per cent increase in the explanatory variable. The frontier and efficiency terms are shown in separate columns. For the DEA results, the marginal effects are dependent on the input–output mix, and the numbers shown are for the average Norwegian observation.

Table 4 Marginal normalized effects on productivity in SFA and DEA, 95 % CI

The results are generally very robust across methods. Finnish hospitals are markedly more productive than those in the other countries. The Swedish and Norwegian frontiers are not significantly different from each other, while the Danish frontier lies between the Finnish and the Swedish/Norwegian frontiers. In the efficiency term, the only significant country effect is that the Danish hospitals are less efficient. Of the environmental variables, the outpatient share has a significant positive effect on productivity, while the LOS deviation has a weaker negative effect. The case-mix index and the dummies for university and capital city hospitals have no significant effect on costs. There seems to be no sign that the central hospitals have a more costly case mix than is accounted for by the DRG system.

4 Conclusion

International comparisons can reveal more about the cost and productivity structure of a sector such as the somatic hospitals than a country-specific study alone. In addition to an increase in the number of observations and therefore in the degrees of freedom, one gets more variation in explanatory variables and stronger possibilities for exploring causal mechanisms. This study has found evidence of a positive association between efficiency and outpatient share, a negative association with the LOS deviation, and no association with the case-mix index or university and capital city dummies. We have further found evidence of decreasing returns to scale at the hospital level, with a possibility of increasing returns to scale at the administrative or purchaser level. There is also evidence of cost/technical inefficiency, particularly in Denmark.

As so often, the strongest results are not what we can explain, but what we cannot explain. There is strong evidence, independent of method, that there are large country specific differences that are not correlated with any of our other variables. Finland is consistently more productive than the other Nordic countries. There are systematic differences between countries that do not vary between hospitals within each country. Without observations from more countries, or more variables that vary over time or across hospitals within each country, such mechanisms cannot be revealed by statistical methods.

On the other hand, qualitative information can suggest some plausible, if speculative, explanations. Interestingly, the stronger incentives that are supposed to be provided by ABF in Norway and some counties of Sweden do not seem to increase productivity. These data are from before the financial crisis, but Finland was still suffering the after-effects of a local recession following the collapse of the Soviet Union, with increased budget restraint in the public sector. Based on interviews at eight hospitals in the Nordic countries (Kalseth et al. 2011), one possible reason for the good Finnish results is the good coordination between somatic hospitals and primary care, including the inpatient departments of health centres. This coordination is primarily due to the common ownership by the municipalities of both hospitals and primary care institutions.Footnote 7 Finland also had a smaller number of personnel as well as better organization of work and teamwork between different personnel groups inside hospitals (Kalseth et al. 2011). However, these findings are still preliminary. An important research and policy question is whether the higher productivity in Finland is related to differences in quality.

Our claim is that the country productivity differences are consistent with possible differences in system characteristics that vary systematically between countries. Such characteristics may include the financing structure, ownership structure, regulation framework, quality differences, standards, education, professional interest groups, work culture, etc. Some of these characteristics, such as quality, may also vary between hospitals in each country and should be the subject of further research.

Differences in estimated country productivity are also consistent with differences in data definitions, but the analysis in Kalseth et al. (2011) does not support this. In summary, these country effects are essentially not caused by factors that individual hospitals can change to become more efficient, but rather by factors that must be tackled by the relevant organizations and authorities at the national level.