Dynamics of productivity in higher education: cross-European evidence based on bootstrapped Malmquist indices

This study examines patterns of productivity change in a large set of 266 public higher education institutions (HEIs) in 7 European countries across the time period 2001–2005. We adopt consistent bootstrap estimation procedures to obtain confidence intervals for Malmquist indices of HEI productivity and their components. Consequently, we are able to assess the statistical significance of changes in HEI productivity, efficiency and technology. Our results suggest that, assessed vis-à-vis a common ‘European’ frontier, HEI productivity rose on average by 4 % annually. Statistically significant changes in productivity were registered in 90 % of observations on the institutions in our sample, but statistically significant annual improvements in overall productivity took place in only 56 % of cases. There are considerable national differences, with German, Italian and Swiss HEIs performing better in terms of productivity change than HEIs from the other countries examined.


Introduction
In recent years, the higher education sector has been subject to formal quantitative research that has mainly covered such topics as the estimation of rates of return to higher education, the academic labour market, institutional behaviour, and higher education as an industry. Approaches to higher education institutions (HEIs) have been changing: apart from their obvious role in human capital and knowledge creation, critical analysis of their productivity and efficiency has started to gain importance. In particular, due to changing demographic trends and competition for students, as well as growing financial constraints on public entities, public HEIs are under constant pressure to improve their performance. On top of this, competition between universities has been growing steadily, and European HEIs are still struggling to catch up with American institutions.
An examination of the existing literature leads us to conclude that there are several gaps in analyses of higher education productivity that need to be filled. First, most attention has thus far been devoted to productivity levels, that is, to evaluating how productive universities are at a given point in time (among others: Glass et al. 1995; Johnes 2006a, b; Bonaccorsi et al. 2007). However, such efficiency scores, apart from providing a tool for comparing productivity between units (and hence serving as yet another university ranking system), say nothing about changes in productivity over time (whether universities manage to improve their performance, stagnate or regress). Second, due to the problems commonly associated with gathering comparable data for HEIs from multiple countries, assessments of productivity change over time have usually been conducted with units from only one country (or, exceptionally, as in Agasisti and Johnes 2009 or Agasisti and Pérez-Esparrells 2010, two countries). In fact, Agasisti and Pérez-Esparrells (2010) state: 'Future research can extend this study. For instance, a wider comparison among universities from different European countries could be useful for policy purposes' (p. 102). From the policy perspective, a comparative cross-European analysis of HEI productivity is of major importance, especially in light of the integration of European higher education systems under the Bologna process.
Finally, certain higher education studies assessing productivity changes over time (Flegg et al. 2004; Johnes 2008; Worthington and Lee 2008; Agasisti and Johnes 2009; Agasisti and Pérez-Esparrells 2010) have relied on Malmquist indices whose results have not been statistically verified. In other words, these authors simply state that productivity (and efficiency and technology, if the indices are decomposed) in selected HEIs has increased or decreased, but no formal tool has been applied to check whether the estimates are sensitive to random variation in the data. Traditional Malmquist methodology, based on distance measures estimated through data envelopment analysis (DEA), a non-stochastic procedure, provides no insight into the statistical significance of its results. There are, however, tools based on resampling (bootstrap) methods that allow us to correct this weakness.
Hence, the main limitations of the existing literature are that (i) little is known about productivity changes across universities from several countries analyzed within a common methodological framework, and (ii) methodological issues concerning the significance of the results obtained with Malmquist indices have not been appropriately addressed.
A distinctive feature of our dataset is its panel dimension, which allows us to go beyond studies that only compare efficiency scores across higher education units (usually from just one country). We have gathered comparable statistics on the inputs and outputs of 266 public HEIs from seven European countries (Austria, Finland, Germany, Italy, Poland, the United Kingdom and Switzerland) over the period 2001–2005. Moreover, the bootstrap estimation procedure adopted (Simar and Wilson 1999) corrects the basic and possibly biased information given by Malmquist indices of productivity, providing us with confidence intervals for these indices and their components. As a result, we can verify whether the changes in the productivity of European HEIs indicated by Malmquist indices are statistically significant (i.e. whether they reflect a real change in the productivity of a given HEI or are just an outcome of sampling noise). Thus, by addressing the two limitations described above and by combining an original and extensive set of micro data on HEIs from several European countries with a consistent bootstrap methodology, this study presents an important extension of the existing literature.
The remainder of the paper is organized as follows. In Sect. 2, we present the methodology applied in our analysis (in particular, ways of assessing the statistical significance of Malmquist indices) and concisely describe the studies most closely related to our research. In Sect. 3, we first present our data and then show the statistically significant results of a cross-European assessment of productivity, efficiency and technology changes in 266 HEIs. Conclusions follow.
2 Theoretical and empirical background

2.1 Changes in productivity over time: Malmquist indices and their statistical significance

Higher education institutions are not classical firms whose aim is profit maximization; public HEIs, in particular, are by definition non-profit organizations. Hence, we cannot assess their productivity using the methods typically applied to companies that produce goods or services and generate profit. Moreover, the functioning of HEIs is characterized by an interplay between multiple inputs and outputs. Universities use inputs such as human resources (staff), students and financial resources, and 'produce' at least two outputs, reflecting their teaching and research missions.¹ Consequently, analysis of HEI productivity dynamics must take these features into account. Tools based on DEA have proven very useful in capturing multiple inputs and outputs simultaneously within a non-parametric treatment of efficiency frontiers. We focus on changes in the productivity of European public HEIs, where productivity is understood not in absolute terms but as performance relative to the best-practice technology (represented by a frontier function). The aim is not to identify levels of productivity, as previous studies have done (e.g. Glass et al. 1995; Johnes 2006a, b), but to study the dynamics of productivity. Thus, below, we do not focus on the formal derivation of DEA relative productivity scores,² but show the methods applied to assess changes in productivity in the higher education sector. To measure productivity change between two periods, we adopt the output-based Malmquist productivity index developed by Färe et al. (1992, 1994, 1997), which itself draws on the measurement of efficiency in Farrell (1957) and of productivity in Caves et al. (1982). The output-oriented model aims to maximize output while using no more than the observed amounts of inputs.³ Hence, the question to be answered is: by how much can output quantities be proportionally augmented without changing input quantities? In the context of HEI efficiency, output-oriented models are usually used because the quantity and quality of inputs, such as student entrants, are assumed to be fixed exogenously, and universities can hardly influence their number or characteristics, at least in the short term.

We compute Malmquist indices⁴ based on DEA scores, which allow us to measure the total factor productivity (TFP)⁵ change of single HEIs between two data points:

$$m_{i,(t,t+1)} = \left[ \frac{d_i^{t}(x_{t+1}, y_{t+1})}{d_i^{t}(x_t, y_t)} \times \frac{d_i^{t+1}(x_{t+1}, y_{t+1})}{d_i^{t+1}(x_t, y_t)} \right]^{1/2} \qquad (1)$$

where i = 1,…,N denotes the DMU⁶ (in our case, an HEI) being evaluated, x refers to inputs and y to outputs, and m is the productivity of the most recent production point $(x_{t+1}, y_{t+1})$, using period t + 1 technology, relative to the earlier production point $(x_t, y_t)$, using period t technology.⁷ Output distance functions are denoted as d.⁸ Under output orientation, a value of $m_{i,(t,t+1)}$ greater than one indicates positive TFP growth in HEI i from period t to period t + 1, while a value smaller than one indicates TFP decline. For example, $m_{i,(2002,2003)} = 1.14$ would signify a 14 % improvement in the TFP of HEI i between 2002 and 2003. If $m_{i,(t,t+1)}$ equals unity, no change in the TFP of HEI i occurred between the two data points. To distinguish between the two basic mechanisms driving TFP growth, we adopt the Malmquist decomposition proposed by Färe et al. (1992):

$$m_{i,(t,t+1)} = e_{i,(t,t+1)} \times s_{i,(t,t+1)} = \frac{d_i^{t+1}(x_{t+1}, y_{t+1})}{d_i^{t}(x_t, y_t)} \times \left[ \frac{d_i^{t}(x_{t+1}, y_{t+1})}{d_i^{t+1}(x_{t+1}, y_{t+1})} \times \frac{d_i^{t}(x_t, y_t)}{d_i^{t+1}(x_t, y_t)} \right]^{1/2} \qquad (2)$$

where technical efficiency change⁹ (e) reflects changes in the relative efficiency of unit i (e.g. universities getting closer to or further away from the efficiency frontier), while technological change (s) measures the shift in the production frontier itself and reflects effects that concern the higher education system as a whole.
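To make the index and its decomposition concrete, the sketch below computes m, e and s for a deliberately simplified case: one input and one output under constant returns to scale, where the output distance function reduces to a ratio of productivities and no linear program is needed. This simplification is an assumption for illustration only; the benchmark model in this paper has three inputs and two outputs and was computed with FEAR. The function names and the (staff, graduates) figures are hypothetical.

```python
def output_distance(x, y, frontier):
    """CRS output distance function for a single input x and a single output y.

    frontier: list of (input, output) pairs defining the reference technology.
    With one input and one output under constant returns to scale, the frontier is
    the ray through the unit with the highest output/input ratio, so
    d(x, y) = (y / x) / max_j (y_j / x_j). d <= 1 for points of the reference
    set itself; cross-period values may exceed 1.
    """
    best_ratio = max(y_j / x_j for x_j, y_j in frontier)
    return (y / x) / best_ratio


def malmquist(x_t, y_t, x_t1, y_t1, frontier_t, frontier_t1):
    """Return (m, e, s) for one unit between periods t and t+1 (Färe et al. 1992 form)."""
    d_t_at_t = output_distance(x_t, y_t, frontier_t)        # d^t(x_t, y_t)
    d_t_at_t1 = output_distance(x_t1, y_t1, frontier_t)     # d^t(x_{t+1}, y_{t+1})
    d_t1_at_t = output_distance(x_t, y_t, frontier_t1)      # d^{t+1}(x_t, y_t)
    d_t1_at_t1 = output_distance(x_t1, y_t1, frontier_t1)   # d^{t+1}(x_{t+1}, y_{t+1})

    # Geometric mean of the period-t and period-(t+1) indices.
    m = ((d_t_at_t1 / d_t_at_t) * (d_t1_at_t1 / d_t1_at_t)) ** 0.5
    e = d_t1_at_t1 / d_t_at_t                                        # efficiency change
    s = ((d_t_at_t1 / d_t1_at_t1) * (d_t_at_t / d_t1_at_t)) ** 0.5   # frontier shift
    return m, e, s


# Hypothetical example: (staff, graduates) for two HEIs in two consecutive years.
frontier_2002 = [(10, 50), (20, 80)]
frontier_2003 = [(10, 60), (20, 90)]
m, e, s = malmquist(20, 80, 20, 90, frontier_2002, frontier_2003)
print(round(m, 4), round(e, 4), round(s, 4))  # 1.125 0.9375 1.2
```

In this one-input, one-output case m simply equals the ratio of the unit's productivities (90/20 over 80/20 = 1.125), split into a fall in relative efficiency (e = 0.9375) and an outward frontier shift (s = 1.2), with m = e × s.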
Values of $e_{i,(t,t+1)}$ greater (lower) than unity indicate improvements (decreases) in technical efficiency between t and t + 1. Similarly, values of $s_{i,(t,t+1)}$ greater (lower) than unity indicate technological progress (regress) between t and t + 1. The value of m equals 1 if the net effect of changes in technical efficiency and frontier shifts is null. The problem with the approach described above is that the frontier needed to calculate the distance functions is estimated from the data, so the resulting changes in m may simply reflect sampling noise. Hence, we adopt a particular way of measuring productivity change: following Simar and Wilson (1999), we use a bootstrap procedure to obtain bias-corrected estimates of the Malmquist indices (and of their components, as in Eq. 2) together with their confidence intervals. This procedure is based on bootstrap DEA¹⁰ (relying on replication of the data-generating process) and allows us to (i) verify whether correcting for the bias in the non-parametric distance function estimates (and thus in the Malmquist index estimates) is desirable, and (ii) check whether the changes in productivity indicated by the Malmquist indices and their components are statistically significant.¹¹

3 In contrast, the objective of the input-oriented model is to minimize inputs while producing at least the given output levels.
4 The Malmquist indices and their decomposition in our paper were computed using the FEAR software package for frontier analysis with R (Wilson 2008).
5 Färe et al. (1992) assume that production technology exhibits constant returns to scale, which implies that the Malmquist index can be interpreted as an index of total factor productivity. Allowing for variable returns to scale (convex hull or free-disposal hull) means that solutions to the programming problems can be unattainable for some observations and, in addition, that in such cases the Malmquist index cannot be interpreted as an indicator of TFP.
6 Decision making unit (the expression commonly used in DEA). In our case, each HEI is a DMU.
7 Here, the Malmquist index m is defined as the geometric mean of two indices: the first with period t as the reference technology, the second with period t + 1 as the reference technology. These two indices are equivalent only if the technology is Hicks output neutral (Coelli et al. 2005, p. 291). The geometric mean is used to avoid an arbitrary choice of period t or t + 1 technology as the reference.
8 The values of the distance functions that appear in the Malmquist index are unobserved and must be estimated from the data. Due to space limits, we do not discuss all the steps in the derivation of distance measures. For a concise description of the formal procedure, see Coelli et al. (2005), pp. 291–294.
9 Technical efficiency change e can be further decomposed into 'scale efficiency change' and 'pure efficiency change' (Färe et al. 1994). The results are available from the authors upon request.
10 Bootstrapping was developed by Efron (1982) and Efron and Tibshirani (1993) for cases where little or nothing is known about the underlying data-generating process for a sample of observations. The data-generating process can be estimated empirically by resampling the original data to generate a set of bootstrap pseudosamples and then applying the original estimators to these pseudosamples. Bootstrapped DEA was introduced by Ferrier and Hirschberg (1997) and Wilson (1998, 2000), who demonstrated how to construct confidence intervals for DEA efficiency scores in order to overcome the main weakness of basic DEA, namely the sensitivity of the results to the sample composition.

In line with Simar and Wilson (1999), we first compute a set of bootstrap estimates of the Malmquist index for each HEI i, $\{\hat m^{(b)}_{i,(t,t+1)}\}$ for b = 1,…,B, where B is the total number of replications performed with pseudosamples drawn from the 'original' dataset. The bootstrap estimate of the bias of the 'original' (non-bootstrapped) estimator $\hat m_{i,(t,t+1)}$ is then calculated as:

$$\widehat{\mathrm{bias}}_B\big(\hat m_{i,(t,t+1)}\big) = B^{-1}\sum_{b=1}^{B} \hat m^{(b)}_{i,(t,t+1)} - \hat m_{i,(t,t+1)} \qquad (3)$$

and the bias-corrected estimate as:

$$\hat m^{corr}_{i,(t,t+1)} = \hat m_{i,(t,t+1)} - \widehat{\mathrm{bias}}_B\big(\hat m_{i,(t,t+1)}\big) = 2\hat m_{i,(t,t+1)} - B^{-1}\sum_{b=1}^{B} \hat m^{(b)}_{i,(t,t+1)} \qquad (4)$$

The choice between the 'original' estimate of the Malmquist index and its bias-corrected version is based on a comparison of the mean square errors (MSEs) of the two, since the latter may plausibly have a higher MSE (Efron and Tibshirani 1993).¹² Finally, to assess whether productivity change is meaningful in the statistical sense, a (1 − α) × 100 % confidence interval is obtained from the bootstrap procedure as:

$$\hat m_{i,(t,t+1)} + \hat l_{m,\alpha} \le m_{i,(t,t+1)} \le \hat m_{i,(t,t+1)} + \hat u_{m,\alpha} \qquad (5)$$

where $\hat l_{m,\alpha}$ and $\hat u_{m,\alpha}$ are the bootstrap estimates defining the lower and upper bounds of the confidence interval for the Malmquist index, and α (e.g. 10, 5 or 1 %) determines the size of the interval. Following Simar and Wilson (1999), the estimated Malmquist index is said to be significantly different from unity (and the productivity change is thus statistically significant) if the interval defined in Eq. 5 does not include unity.
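The post-processing described above (the bootstrap bias estimate, the bias-corrected estimate, the MSE-based choice of footnote 12 and the interval of Eq. 5) can be sketched in Python as follows. This is a minimal illustration, not the FEAR implementation used in the paper: it assumes that the B bootstrap replicates for one HEI and one pair of years are already available and were produced by a consistent (smoothed) DEA bootstrap, since naively resampling DEA scores is known to be inconsistent. The names `quantile` and `bootstrap_summary` are hypothetical, and the interval uses the 'basic' construction in which the lower and upper bounds are the original estimate minus the upper and lower empirical quantiles of the differences between the replicates and the original estimate.

```python
import math
import statistics


def quantile(values, q):
    """Empirical q-quantile with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def bootstrap_summary(m_hat, boot, alpha=0.05):
    """Post-process bootstrap replicates of one Malmquist index.

    m_hat : 'original' (non-bootstrapped) estimate for one HEI and one pair of years
    boot  : replicates m_hat^(b), b = 1..B. ASSUMPTION: these come from a consistent
            smoothed DEA bootstrap; generating them is not shown here.
    """
    bias = statistics.fmean(boot) - m_hat      # bootstrap bias estimate
    m_corr = m_hat - bias                      # bias-corrected estimate
    s2 = statistics.variance(boot)             # sample variance of the replicates
    # Footnote-12 rule: keep the original unless the bias is large relative to s.
    use_original = s2 > bias ** 2 / 3
    # Approximate the distribution of (m_hat - m) by that of (m_hat^(b) - m_hat);
    # the bounds m_hat + l and m_hat + u correspond to Eq. 5.
    diffs = [mb - m_hat for mb in boot]
    lower = m_hat - quantile(diffs, 1 - alpha / 2)
    upper = m_hat - quantile(diffs, alpha / 2)
    return {
        "bias": bias,
        "m_corr": m_corr,
        "use_original": use_original,
        "ci": (lower, upper),
        "significant": not (lower <= 1.0 <= upper),  # interval excludes unity
    }


# Hypothetical replicates spread evenly around an original estimate of 1.05.
result = bootstrap_summary(1.05, [1.00 + 0.001 * k for k in range(101)])
print(result["ci"], result["significant"])
```

The same function applies unchanged to the components e and s, since their intervals are built in the same way.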
An analogous approach applies to all components of the Malmquist index (e and s), so that we also obtain bias-corrected estimates $\hat e^{corr}_{i,(t,t+1)}$ and $\hat s^{corr}_{i,(t,t+1)}$, as well as confidence intervals for e and s, allowing us to verify their statistical significance.

Related empirical evidence in the context of higher education
So far, probably due to problems with obtaining multiperiod micro-level data on the performance of single universities, few authors have applied Malmquist indices to HEIs, and they have usually focused on institutions from one country. In a multi-country setting, the index requires the same set of inputs and outputs for all HEIs in the sample and, additionally, the presence of the same units and variables across time; unbalanced panels with changing sets of HEIs or inputs/outputs are not allowed. Flegg et al. (2004) apply the Malmquist approach to a sample of 45 British universities for the period 1980/1981–1992/1993. Their results show that in these years TFP increased by 51.5 %, but that most of this rise was caused by an outward shift of the efficiency frontier (technological change) and not by the movement of universities towards the frontier (efficiency change). Johnes (2008) derives Malmquist indices for 112 English HEIs over the period 1996/1997–2004/2005 and finds an average increase in TFP of around 1 % per year (the decomposition shows average annual technological change of approximately 6 %, accompanied by a decrease in efficiency of 5 % per year). Worthington and Lee (2008) analyze 35 Australian universities (1998–2003) and find average productivity growth of circa 3 %, largely due to technological progress rather than technical efficiency change. All in all, the existing evidence, based on British and Australian experience, suggests a predominant role for technological change, rather than efficiency change, in driving overall TFP growth in HEIs. It should be noted, however, that the HEIs from the countries analyzed so far were characterized by high levels of efficiency (high DEA efficiency scores) at the outset.
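As a quick consistency check on the Johnes (2008) figures quoted above, decomposition (2) implies m = e × s. Treating the reported rounded annual averages as exact (an approximation on our part, since an average of products need not equal the product of averages):

```latex
m \approx e \times s = 0.95 \times 1.06 = 1.007
\quad\Rightarrow\quad \text{TFP growth} \approx 0.7\,\%\ \text{per year}
```

which is in line with the reported average TFP increase of around 1 % per year.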
11 Bootstrap methods can also be applied in the context of the so-called 'two-stage' DEA procedure, in which the estimated efficiency measures are regressed on environmental variables in the second stage. Simar and Wilson (2007) define a statistical model in which truncated regression yields consistent estimates, and develop a bootstrap approach that provides valid inference in the second-stage regression. As demonstrated in Simar and Wilson (2011), bootstrap methods, in contrast with second-stage OLS estimates, provide a feasible means of inference in the second stage. In the higher education context, the two-stage approach with bootstrapped truncated regression as in Simar and Wilson (2007) is adopted by Wolszczak-Derlacz and Parteka (2011).
12 Given the sample variance $s^2_{(b)i}$ of the bootstrap values $\{\hat m^{(b)}_{i,(t,t+1)}\}$ for b = 1,…,B, and assuming that the estimated MSE of $\hat m^{corr}_{i,(t,t+1)}$ is $4s^2_{(b)i}$ (Simar and Wilson 1999: 463), it can be shown that the 'original' estimate $\hat m_{i,(t,t+1)}$, rather than the bias-corrected estimate $\hat m^{corr}_{i,(t,t+1)}$, should be used if $s^2_{(b)i} > \tfrac{1}{3}\big[\widehat{\mathrm{bias}}_B(\hat m_{i,(t,t+1)})\big]^2$, where $\widehat{\mathrm{bias}}_B$ is the bootstrap bias estimate defined above.

There are only two published papers (that we are aware of) comparing changes in the productivity and efficiency of HEIs from more than one country: Agasisti and Johnes (2009) and Agasisti and Pérez-Esparrells (2010). Agasisti and Johnes (2009) employ Malmquist indices to analyze 127 English and 57 Italian public universities over the short period 2002/2003–2004/2005. In line with the findings of the abovementioned authors, their results confirm that English HEIs did not realize gains in technical efficiency, but rather registered changes in productivity due to frontier shifts. By contrast, Italian HEIs, typically less efficient at the outset than English ones, became more technically efficient with respect to the frontier.
This is an important result, suggesting that HEIs from countries further away from the common 'European' higher education efficiency frontier can experience 'catching-up' effects, while those that are already highly efficient push the frontier itself outward.
Agasisti and Pérez-Esparrells (2010) adopt a similar setting, computing (in addition to DEA scores) Malmquist indices for 57 Italian and 46 Spanish public institutions, again over a relatively short time span (between the academic years 2000/2001 and 2004/2005). They find that Italian universities experienced important improvements in productivity, mainly due to improvements in 'technology' (the authors argue that the change resulted from major reforms in the curriculum organization of the Italian higher education system), while Spanish universities registered much smaller improvements in overall productivity, driven by changes in efficiency.
However, despite the great advantages of cross-country evidence, none of these papers assesses the statistical significance of its results. Consequently, we cannot exclude the possibility that the reported changes merely reflect sampling noise.
Bootstrapped DEA techniques have been used in economic analyses of productivity levels in many different sectors, including higher education (e.g. Johnes 2006a, b). By contrast, bootstrapped Malmquist methods have been applied to the analysis of productivity change less frequently,¹³ and, in particular, none of the papers we are aware of has used a consistent bootstrap methodology to compute Malmquist indices for the higher education sector. Hence, the 'original' estimates of the distance functions and Malmquist indices for the universities analyzed so far have not been corrected for finite-sample bias, and their main weakness remains that their statistical significance is unknown. In this paper, we address these issues.

Data and panel composition
Our analysis draws on a university-level database containing information on the outputs and inputs of 266 public HEIs from a set of European Union (Austria, Finland, Germany, Italy, Poland and the UK) and non-EU (Switzerland) countries for which it was possible to gather comparable micro data. We draw on a balanced panel containing statistics for individual European HEIs for the years 2001–2005.¹⁴ Even though the data come from numerous sources, particular attention has been given to ensuring the maximum comparability of the crucial variables across countries, in accordance with the Frascati manual (OECD 2002); for details, see the data appendix (Table 6). Table 7 in the Appendix gives the number of HEIs from each country (due to space limits, a detailed list of all the universities covered by our study is available upon request). To the best of our knowledge, this is the most comprehensive balanced micro-level panel of European HEIs from several countries that has been used for Malmquist analysis of productivity change.¹⁵ Moreover, advanced analysis of productivity trends in universities from the new EU member states has so far been lacking. By contrast, along with universities from six western European countries, we also include HEIs from Poland in our analysis.¹⁶ Our dataset contains only public HEIs, because several statistics, crucially those concerning funding, are often not available for private HEIs. Additionally, we decided to concentrate only on the university sector: in binary higher education systems, we excluded from our sample applied science institutes/schools (such as German or Austrian Fachhochschulen and applied science HEIs in Finland and Switzerland), whose research activity is only marginal. We also excluded special purpose units specializing in one discipline only (e.g. medicine, arts, sports) and distance learning universities, as these were not considered comparable with 'traditional' universities. Finally, units whose publication records (used as a measure of one of the outputs) were scant, incomplete or identified via ambiguous affiliations¹⁷ were not taken into consideration.

13 Bootstrapped Malmquist indices have been used to study productivity changes in, inter alia, the farming sector (Balcome and Davidova 2008) and the banking (Assaf et al. 2010), insurance (Mahlberg and Url 2010) and airline industries (Assaf 2011).
14 Data on HEIs from several countries are available for more years (e.g. 1995–2008 for Poland), but computing Malmquist indices based on a frontier common to all countries requires the same set of units across time. For example, the necessary data on Italian HEIs are not available prior to 2001.
15 The Eumida project (see Daraio et al. 2011 for details) collected data on 488 universities from 11 European countries: Finland, France, Germany, Hungary, Italy, the Netherlands, Norway, Portugal, Spain, Switzerland and the UK. However, to the best of our knowledge, its panel dataset was not balanced and, more importantly from our point of view, no study of productivity change using a consistent bootstrapped Malmquist methodology has been performed with Eumida statistics. Its data are not publicly available.
16 For a study on the scientific productivity of Polish HEIs compared with HEIs from other, more developed European countries, based on a dataset similar to the one used in the present study, see Wolszczak-Derlacz and Parteka (2010).
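The balanced-panel requirement stated above (the same HEIs observed, with every DEA variable available, in each year from 2001 to 2005) amounts to a simple filter. A minimal sketch, assuming a hypothetical record layout (dicts with `hei`, `year` and one key per variable, with `None` marking a missing statistic); the function name `balanced_panel` is illustrative and this is not the authors' data pipeline.

```python
from collections import defaultdict

YEARS = range(2001, 2006)  # 2001-2005, as in the study


def balanced_panel(records, years=YEARS):
    """Keep only HEIs observed, with no missing variable, in every year.

    records: list of dicts with hypothetical keys 'hei', 'year' and the DEA
    variables; a value of None marks a missing statistic.
    """
    complete_years = defaultdict(set)
    for r in records:
        if all(v is not None for v in r.values()):
            complete_years[r["hei"]].add(r["year"])
    required = set(years)
    keep = {hei for hei, ys in complete_years.items() if required <= ys}
    return [r for r in records if r["hei"] in keep and r["year"] in required]


# Hypothetical records: A is complete, B lacks 2005, C has a missing value in 2003.
recs = (
    [{"hei": "A", "year": y, "graduates": 100} for y in YEARS]
    + [{"hei": "B", "year": y, "graduates": 50} for y in range(2001, 2005)]
    + [{"hei": "C", "year": y, "graduates": None if y == 2003 else 10} for y in YEARS]
)
print(sorted({r["hei"] for r in balanced_panel(recs)}))  # ['A']
```

Dropping any HEI with a gap, rather than imputing it, mirrors the requirement that the common frontier be built from the same set of units in every year.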
The calculation of Malmquist indices required the estimation of distance functions. We first used a bootstrapped DEA method based on annual observations of 266 European HEIs, which produce two outputs from three inputs. Given the dual mission of HEIs (teaching and research),¹⁸ we considered two outputs: teaching output (measured by the number of graduates) and research output (quantified by bibliometric indicators based on an analysis of publication records, as in, among others, Creamer 1999; Dundar and Lewis 1998). While comparing the number of graduates (in total, without distinguishing between types of studies) across HEIs was quite straightforward,¹⁹ ensuring the cross-country comparability of research output posed a challenge. Different countries adopt specific measures of research production (such as research funds, publication records, patents and applications). We therefore relied on uniform bibliometric data from Thomson Reuters' ISI Web of Science database (part of the ISI Web of Knowledge²⁰), which lists publications from quality journals (with a positive impact factor) in the majority of scientific fields.²¹ We counted all publications (scientific articles, proceedings papers, meeting abstracts, reviews, letters, notes) published in a given year with at least one author declaring an institutional affiliation with the HEI.²² Concerning inputs, our dataset contains information on the number of students, total academic staff and total real revenues. Revenues were converted from national currency units into Euro PPS²³ (using exchange rates from Eurostat) to account for cross-country differences in price levels and in the purchasing power of the money at HEIs' disposal.
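The counting rule for research output (every document published in a given year with at least one author affiliated with the HEI, a paper co-authored by several people from the same institution being counted only once, as noted in footnote 22) can be sketched as follows. The input layout (`id`, `year`, `affiliations`) and the function name are hypothetical stand-ins for Web of Science records; affiliation matching, which the text notes can be ambiguous, is assumed to be already resolved.

```python
from collections import defaultdict


def count_publications(papers):
    """Publications per (HEI, year), each paper counted at most once per HEI.

    papers: iterable of dicts with hypothetical keys 'id', 'year' and
    'affiliations' (one entry per author; several authors may share an HEI).
    """
    seen = defaultdict(set)
    for p in papers:
        for hei in set(p["affiliations"]):  # de-duplicate co-authors from the same HEI
            seen[(hei, p["year"])].add(p["id"])
    return {key: len(ids) for key, ids in seen.items()}


papers = [
    {"id": 1, "year": 2003, "affiliations": ["Uni A", "Uni A", "Uni B"]},
    {"id": 2, "year": 2003, "affiliations": ["Uni A"]},
    {"id": 3, "year": 2004, "affiliations": ["Uni B"]},
]
print(count_publications(papers))
```

Note that a paper shared by two different HEIs counts once for each of them, since the rule is applied per institution.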
As for data sources (Table 8 in the Appendix), the availability and coverage of university-level data differ from country to country. The most comprehensive databases on HEIs exist in Finland, the UK and Italy, where freely available online platforms give access to a broad range of non-confidential statistics. For Swiss, Austrian and German HEIs, data were kindly provided by the staff of each country's central statistical office. In the case of Poland, unfortunately, micro-data on HEIs (even public ones) are practically unavailable for research purposes. There is no online platform containing such data, and only a few statistics are available in paper publications issued by the Polish Ministry of Science and Higher Education (MNiSW) and the Polish Central Statistical Office (GUS); part of the data used was obtained through direct contact with the statistical offices holding them.²⁴

Our benchmark Malmquist analysis is based on DEA performed with three inputs and two outputs, where DMUs are compared with respect to the common European frontier. As a robustness check, we consider alternative DEA specifications: a two-input, two-output version of the DEA model (without students as an input)²⁵ and estimates based on average values of inputs and outputs.²⁶ Finally, to check for cross-country heterogeneity, we perform an additional analysis in which country-specific frontiers are estimated and productivity change is measured with respect to units from the same country.

Benchmark results
In the benchmark estimation, we considered productivity change with respect to a common frontier: all 266 HEIs were treated jointly, and the frontier was estimated using annual information on the whole sample of European universities. Consequently, changes in productivity were relative to the European efficiency frontier in public higher education (relative in the sense that they were computed with reference to other universities in the group). Later on, we take into account cross-country specificity (see Sect. 3.3).

17 For example, we excluded from our analysis the University of London, which, as a confederal organization, is composed of several colleges. It was not possible to identify publication records for the University of London because we could not be sure whether the university's academic staff gave the names of their colleges or 'University of London' as their affiliation.
18 The 'third mission' was not considered due to the methodological problems linked to its measurement and a lack of relevant data.
19 See the data appendix (Table 6).
20 www.apps.isiknowledge.com.
21 Web of Science covers nearly 12,000 international and regional journals and book series in every area of the natural sciences, social sciences, and arts and humanities. In 2009, for instance, it also covered over 110,000 conference proceedings. Alternative sources, such as Scopus, could have been used, but we had access to Thomson Reuters' services only.
22 Note that papers co-authored by persons affiliated with the same institution were counted only once.
23 Purchasing power standard.
24 Detailed information is available from the authors upon request.
25 This reduction in the number of inputs addresses possible correlation between students and other inputs. We thank an anonymous referee for pointing this out.
26 Present outputs can depend not only on present inputs but also on their past values. We thank an anonymous referee for pointing this out.
We first calculated 'original' (non-bootstrapped) estimates of the Malmquist indices and their components. We then applied the bootstrap method described above (maintaining the assumptions of constant returns to scale and output orientation), setting the number of bootstrap replications to B = 2,000. Comparing the MSEs of the bias-corrected and 'original' estimates of the Malmquist indices, we found that in the vast majority of cases bias correction increased the MSE (for details, see Table 9 in the Appendix); Simar and Wilson (1999) obtained analogous results. Consequently, like these authors, we do not report bias-corrected estimates, but rely on the 'original' estimates of m (TFP change) and of its components e and s from decomposition (2): $\hat m$, $\hat e$ and $\hat s$. Table 10 shows summary statistics of the variables used in the DEA model, while summary statistics of both the 'original' and the bias-corrected estimates of the indices are reported in Table 11 in the Appendix; the difference between the two is negligible (the correlation coefficients between the 'original' and bias-corrected series range from 0.97 for s to 0.99 for m). However, we do use the estimated bootstrap confidence intervals to assess whether changes in productivity, efficiency and technology are meaningful in a statistical sense. The full set of results for all HEIs is available upon request; here, we present the key findings.
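The comparison of 'original' and bias-corrected estimates summarised above (the share of cases in which bias correction raises the estimated MSE, and the correlation between the two series) can be sketched as follows. It takes MSE(corrected) ≈ 4s² as in footnote 12 and, as an assumption on our part, MSE(original) ≈ s² + bias². The function name and input layout are hypothetical, and the replicates are again assumed to come from a consistent DEA bootstrap.

```python
import math
import statistics


def compare_original_and_corrected(m_orig, boot_reps):
    """Summarise how much bias correction matters across many HEI-year cases.

    m_orig    : list of 'original' Malmquist estimates, one per case
    boot_reps : list of bootstrap replicate lists, aligned with m_orig
    Returns (Pearson correlation between original and bias-corrected estimates,
             share of cases where correction raises the estimated MSE).
    """
    m_corr, worse = [], 0
    for m_hat, boot in zip(m_orig, boot_reps):
        bias = statistics.fmean(boot) - m_hat
        s2 = statistics.variance(boot)
        m_corr.append(m_hat - bias)
        if 4 * s2 > s2 + bias ** 2:  # estimated MSE(corrected) > MSE(original)
            worse += 1
    mx, my = statistics.fmean(m_orig), statistics.fmean(m_corr)
    cov = sum((a - mx) * (b - my) for a, b in zip(m_orig, m_corr))
    sx = math.sqrt(sum((a - mx) ** 2 for a in m_orig))
    sy = math.sqrt(sum((b - my) ** 2 for b in m_corr))
    return cov / (sx * sy), worse / len(m_orig)


# Hypothetical cases with unbiased, moderately noisy replicates.
m_orig = [1.0, 1.1, 0.9]
reps = [[m - 0.01, m, m + 0.01] for m in m_orig]
print(compare_original_and_corrected(m_orig, reps))
```

With little bias and non-negligible bootstrap variance, correction only adds noise, which is the situation the text reports for most observations.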
In Table 1, we compare all the results (N = 1,064) with the statistically significant ones (at the 5 % significance level).²⁷ In particular, we show the number of cases in our panel in which the estimated Malmquist index was significantly different from unity, $N(\hat m^{**})$, and the average value of these significant indices, $\bar m^{**}$, comparing it with the average value of all the indices, $\bar m$. Finally, we report the number of cases with statistically significant increases in TFP, $N(\hat m^{**} > 1)$, and the percentage of cases in which statistically significant annual improvements in productivity were registered. The same exercise has been done for the estimates of e and s.
The calculation of confidence intervals shows that, at the standard 5 % level of significance, 963 of the 1,064 annual estimates of TFP growth between 2001 and 2005 were statistically different from unity. Thus, statistically significant changes in productivity were registered in 90 % of the observations on the HEIs in our sample. Taking into account only statistically significant estimates of m, the HEIs in our sample registered an average increase in productivity of around 4.5 % annually between 2001 and 2005 (the average value of all Malmquist indices, significant or not, corresponds to 4.1 %). Counting the cases in which m was significant and greater than one, denoted in Table 1 as %(m** > 1), we can conclude that statistically significant annual improvements in overall productivity took place in 56 % of cases.
Comparing the statistically significant estimates of e and s, the two basic components of m, average efficiency improved by 5.7 %, while technology shifted up by 4.6 %. 28 If all the estimates were considered, these values would be lower (3.2 % and 1.2 %, respectively). Hence, accounting for statistical significance matters for the conclusions drawn. Looking at the number of cases with significant improvements in efficiency and technology, it is evident that …
28 … (2), m should be a product of e and s, this need not necessarily be the case when we take into account only statistically significant changes in m, e and s (e.g. a given HEI can register a significant change in overall productivity and efficiency, but not a significant change in technology).
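Decomposition (2) is not reproduced in this section. Assuming it follows the standard output-oriented decomposition of the Malmquist index into efficiency change and technical change, the identity m = e·s can be checked numerically. The sketch below simplifies the paper's three-input, two-output DEA model to a hypothetical one-input, one-output technology under constant returns to scale. In that case the output distance function reduces to a unit's output/input ratio divided by the best ratio on the frontier, so no linear programme is needed. All data are made up.

```python
import math


def distance(unit, frontier):
    """Output distance function under CRS with one input and one output.

    unit     -- (x, y) input/output pair being evaluated
    frontier -- list of (x, y) pairs that span the reference technology
    Values above 1 are possible when a unit is evaluated against a frontier
    from a different period.
    """
    x, y = unit
    best = max(yj / xj for xj, yj in frontier)
    return (y / x) / best


def malmquist(unit_t, unit_t1, frontier_t, frontier_t1):
    """Return (m, e, s) for one unit between periods t and t+1."""
    a = distance(unit_t, frontier_t)    # D^t(x_t, y_t)
    b = distance(unit_t1, frontier_t)   # D^t(x_t+1, y_t+1)
    c = distance(unit_t, frontier_t1)   # D^t+1(x_t, y_t)
    d = distance(unit_t1, frontier_t1)  # D^t+1(x_t+1, y_t+1)
    m = math.sqrt((b / a) * (d / c))    # productivity (TFP) change
    e = d / a                           # efficiency change (catching up)
    s = math.sqrt((b / d) * (a / c))    # technology change (frontier shift)
    return m, e, s


# Hypothetical frontiers: best output/input ratio 1.0 in t and 1.2 in t+1.
frontier_t = [(10, 10), (10, 8), (10, 5)]
frontier_t1 = [(10, 12), (10, 9), (10, 6)]
m, e, s = malmquist((10, 8), (10, 9), frontier_t, frontier_t1)
```

Here the unit's productivity rises by 12.5 % (m = 1.125), but because the frontier shifts up by 20 % (s = 1.2), its efficiency falls (e = 0.9375). The product e·s equals m exactly for the point estimates; the identity need not hold once only the statistically significant components are retained.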

Robustness checks and extensions of the basic model
In order to check the robustness of our findings, we ask whether the way the productivity frontier is defined in the DEA estimation matters for the conclusions drawn, and we therefore consider alternative DEA model formulations with modified sets of inputs and outputs. Firstly, we consider a DEA model restricted to two inputs (total staff and total revenues) and two outputs (teaching output, measured by graduates, and research output, measured by publications). Such a formulation addresses the difficulty in modelling the students-graduates productivity relationship 29 and corrects for any correlation between students and other inputs (such as teaching staff and funding).
Secondly, we perform a Malmquist analysis based on a DEA model with input and output data expressed as time averages. 30 This permits us to correct for random time variation in the data, as well as for a possible relationship between past inputs and present outputs. We consider a DEA model with three inputs and two outputs, as in the benchmark estimation, but based on two-year moving averages of all inputs and outputs (t1 = 2001-2002, t2 = 2002-2003, t3 = 2003-2004, t4 = 2004-2005). We then obtain Malmquist indices based on these averaged data, which reflect productivity changes between the periods t1/t2, t2/t3 and t3/t4.
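The time-averaged variants amount to replacing each annual input and output series by window means before the DEA step. The minimal Python sketch below assumes that each variable for one HEI is stored as a hypothetical year-to-value mapping. It builds both the 2-year moving-average windows described above and the 3-year windows T1 and T2 used in the two-period extension. The helper name and the data are illustrative.

```python
def window_means(series, windows):
    """Average an annual series over inclusive (first_year, last_year) windows.

    series  -- dict mapping year -> value of one input or output for one HEI
    windows -- list of (first_year, last_year) pairs, both ends included
    """
    means = []
    for first, last in windows:
        values = [series[year] for year in range(first, last + 1)]
        means.append(sum(values) / len(values))
    return means


# Windows described in the text.
TWO_YEAR = [(2001, 2002), (2002, 2003), (2003, 2004), (2004, 2005)]  # t1..t4
THREE_YEAR = [(2001, 2003), (2003, 2005)]                            # T1, T2

# Hypothetical annual graduate counts for one HEI.
graduates = {2001: 100, 2002: 110, 2003: 120, 2004: 130, 2005: 140}
two_year_avg = window_means(graduates, TWO_YEAR)      # [105.0, 115.0, 125.0, 135.0]
three_year_avg = window_means(graduates, THREE_YEAR)  # [110.0, 130.0]
```

The Malmquist indices are then computed between consecutive windows (t1/t2, t2/t3, t3/t4, or T1/T2) in the same way as with annual data.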
The results concerning TFP growth in European HEIs obtained with the alternative DEA formulations are very similar to the benchmark ones (we compare them in Table 2), and the correlations between the estimates obtained with the different models are fairly high. 31 The estimated annual TFP change indicated by the Malmquist index deviates from the benchmark result (4 %) by at most approximately 0.6 p.p.
As an extension to the basic analysis of annual changes in productivity, efficiency and technology, we also apply a Malmquist analysis to only two periods: in this case the DEA model is estimated with 3-year averages (T1 = 2001-2003 and T2 = 2003-2005), so that the Malmquist index obtained can be interpreted as the average productivity change between T1 and T2 and fully corrects for time variation in the original annual data on inputs and outputs. The main results based on these averaged data are reported in Table 3 and can be compared with the evidence on annual changes reported in Table 1. On average, productivity in European HEIs rose by approximately 9 % between the initial period T1 and the final period T2 (m̄ = 8.9 % and m̄** = 9.6 %). This result is in line with the evidence on annual change reported in Table 1 (where m̄ = 4 % and m̄** = 4.5 %), because the input and output values for T1 and T2 are in fact averages centred on 2002 and 2004, two years apart. Consequently, the estimates of TFP growth obtained with 3-year averages should be approximately twice as large as those obtained with annual data, and this is indeed the case. The only difference is that over the longer time horizon the proportion of HEIs registering statistically significant improvements in productivity is larger than in the case of annual changes (72 % versus 56 %, respectively).

29 We are aware of the fact that the basic model only partially captures the relationship between student cohorts (as inputs) and graduates (as outputs). Data are annual, so the input measuring the total number of students corresponds to students attending the university at any level in the present year. At the same time, we do not expect the number of graduates this year to depend on the number of first-year students this year. Unfortunately, data on students broken down by year of study are not available for most countries in the sample. However, one might expect the proportion of first-year students to the total number of students in a given university to be fairly stable, so that the basic DEA model employing the total number of students as one of the inputs and the total number of graduates as one of the outputs approximates productivity in the teaching process well. We thank a referee for raising this point.

30 We thank a referee for this suggestion.

31 The correlation coefficient between different estimates of m ranges between 0.62 and 0.97.
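The factor of two in the comparison between the 3-year-average and the annual estimates follows from compounding, because the two windows are centred on 2002 and 2004. Writing g for the average annual rate of TFP growth, and assuming for illustration that growth is roughly constant over the period:

```latex
\bar m_{T_1 \to T_2} \approx (1+g)^2 = 1 + 2g + g^2 \approx 1 + 2g,
\qquad
(1.04)^2 = 1.0816, \quad (1.045)^2 \approx 1.0920 .
```

With g = 4 % and g = 4.5 %, compounding gives changes of about 8.2 % and 9.2 %, close to the reported 8.9 % (m̄) and 9.6 % (m̄**).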

Malmquist indices: accounting for cross-country heterogeneity
An important property of our dataset is its panel dimension, which allows us to check for country-specific trends in productivity, efficiency and technology change. In Table 4, we report the country averages of m, e and s (for all estimates and for the statistically significant ones only) and the percentage of cases with statistically significant annual improvements in productivity, efficiency and technology. In most cases (with the exception of technology change in Poland), accounting for statistical significance alters the average values of the estimated indices only negligibly, so in interpreting the results we limit ourselves to the significant ones. The average statistically meaningful TFP change indicated by the Malmquist index ranges from 0.98 (a TFP decline of 2 % annually) in Austrian HEIs to 1.09 (TFP growth of 9 % annually) in Switzerland, where the average efficiency change was also the highest (rising by 19 % annually). Only Austrian HEIs registered a decline in average efficiency, by 4 % (ē** = 0.96).
In all of the countries examined, the percentage of cases with HEIs registering statistically significant improvements in TFP was larger than the percentage of cases with statistically significant efficiency growth, which, in turn, was higher than the percentage of cases with statistically significant positive changes in technology. For instance, 69 % of the 216 annual observations on Italian HEIs (54 university units observed across four time periods) registered statistically significant TFP growth; 45 % were characterized by statistically significant improvements in efficiency, but only 9 % by statistically significant improvements in technology. 32 Among the seven European countries analyzed, Italy had the highest percentage of public HEIs with significant TFP growth and significant efficiency improvement.
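The by-country figures combine a grouping step with the same significance rule as the panel-wide counts. The Python sketch below assumes hypothetical records of the form (country, m̂, lower bound, upper bound), one per HEI-year (for Italy, 54 HEIs × 4 periods = 216 records). For each country it returns the number of records, the mean of the significant indices and the percentage of records with significant improvements. Names and data are illustrative.

```python
import statistics
from collections import defaultdict


def by_country(records):
    """Per-country summary of annual Malmquist estimates.

    records -- iterable of (country, m_hat, ci_lower, ci_upper), one per HEI-year.
    Returns {country: (n_records, mean_of_significant_m, pct_sig_improvements)}.
    """
    groups = defaultdict(list)
    for country, m, lo, hi in records:
        groups[country].append((m, lo, hi))
    summary = {}
    for country, rows in groups.items():
        # Significant: the confidence interval excludes unity.
        sig = [m for m, lo, hi in rows if lo > 1.0 or hi < 1.0]
        improve = sum(1 for m in sig if m > 1.0)
        mean_sig = statistics.fmean(sig) if sig else float("nan")
        summary[country] = (len(rows), mean_sig, 100.0 * improve / len(rows))
    return summary


# Hypothetical records for two countries.
records = [
    ("IT", 1.10, 1.04, 1.16), ("IT", 1.06, 1.01, 1.11), ("IT", 1.00, 0.96, 1.04),
    ("AT", 0.96, 0.92, 0.99), ("AT", 1.02, 0.98, 1.06),
]
result = by_country(records)
```

A mean significant index of 1.08 reads as TFP growth of 8 % a year and 0.96 as a decline of 4 %, which is how the country averages in Table 4 are interpreted.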
The time dimension can also be important when analyzing the productivity changes of HEIs in European countries. We isolated only the HEIs with Malmquist indices statistically significantly different from unity (at the 5 % level), that is, either higher (statistically significant productivity increases) or lower (statistically significant productivity decreases). For these, we calculated the average TFP change across the HEIs in each country in each time period. Figure 1 shows the average significant change in TFP in public European HEIs, by country and year (detailed data are reported in Table 12 in the Appendix). It turns out that if we take into account exclusively those universities that really (in a statistical sense) registered a change in productivity, German, Italian and Swiss HEIs on average performed better (with consistent rises in TFP) than HEIs in other countries. Due to space limits, we are not able to report all the Malmquist indices for every HEI and year analyzed. However, we count the European universities that registered consistent statistically significant improvements in TFP (that is, Malmquist indices significantly larger than unity in all of the time periods). Of the 266 universities in our sample, this was the case for only 28 units: two HEIs from Finland, eight from Germany, fourteen from Italy, one from Poland, two from Switzerland and one from the UK. 33

32 … (Agasisti and Pérez-Esparrells 2010), in contradiction to our findings, conclude that productivity growth in Italian HEIs was due to major technological change (frontier shift). However, they do not count annual changes, as we do, but the overall change between 2000/2001 and 2004/2005. Secondly, our estimate concerning British HEIs (a 2 % annual rise in TFP) is higher than the one obtained by Johnes (2008), who finds a 1 % increase in TFP per year over the period 1996/1997-2004/2005, but in line with Johnes (2008), as well as with Agasisti and Johnes (2009) and Flegg et al. (2004), we also find that frontier shifts play the dominant role in driving productivity gains in UK universities.

J Prod Anal (2013) 40:67-82 75

Table 3 Results: trends in productivity (m), efficiency (e) and technology (s) in 266 European HEIs, change between T1: 2001-2003 and T2: 2003-2005

Table 4 Results by country (1): annual changes in productivity (m), efficiency (e) and technology (s): mean values (all and statistically significant) and percentage of cases with statistically significant improvements; CRS

Table 5 Results by country (2): changes in productivity (m), efficiency (e) and technology (s) based on 3-year averages (T1 = 2001-2003 and T2 = 2003-2005) and alternative frontier definitions (E: European frontier; C: country-specific frontier): mean values of all indices; CRS

Finally, in order to check whether the definition of the frontier matters for country-specific conclusions, we consider two alternative applications of the DEA model: the first based on a pooled sample (266 HEIs) and thus reflecting the general 'European frontier'; the second based on separate DEA models for each country (where each HEI was evaluated with respect to the units from the same country, e.g. comparing the performance of Italian HEIs with other Italian HEIs). This exercise could be done for five of the seven countries in our sample; in the cases of Austria and Switzerland the number of decision-making units is not sufficient to estimate the frontier and ensure a reasonable level of discrimination. 34 A comparison between the two approaches to frontier definition can be particularly informative for efficiency change, as it shows whether universities were getting closer to (or further away from) the overall 'European' efficiency frontier or their national frontiers (influenced by country-specific educational policies etc.). 35 Table 5 summarizes the results by country, based on 3-year averaged data and corresponding to the two alternative definitions of the frontier (E: European frontier; C: country-specific frontier). The values reported correspond to mean TFP, efficiency and technology change in HEIs from single countries between the periods T1: 2001-2003 and T2: 2003-2005. It turns out that frontier definition is less important for the measurement of general productivity change indicated by the Malmquist index (which remains the main interest of our analysis) than for its components. The correlation between the series of m̄ obtained with the European frontier (E) and that obtained with the country-specific frontier (C) equals 0.99, and their average values are very similar. Italian and German HEIs registered the biggest TFP change between T1 (2001-2003) and T2 (2003-2005) (by 17 and 11 %, respectively). However, the channels through which productivity changes materialize differ depending on the frontier formulation. This observation leads us to the issue of frontier definition in DEA/Malmquist studies performed for units from different countries. 36

As far as the common European frontier (E) is concerned, HEIs from all countries obtained an average rise in productivity (m̄ > 1), which in the cases of German, Italian and Polish HEIs was mainly due to an increase in their relative efficiency (movement towards the European frontier, a catching-up effect), while in the case of Finnish and British universities productivity growth was achieved more through technology change.
Using the country-specific frontier model (C), universities from all countries again registered average improvements in their productivity. It is notable, however, that this time the rise in TFP was mostly due to shifts of their country-specific frontiers (as indicated by ŝ, the technology change estimate in the country-specific frontier setting). Only for Polish HEIs do we obtain a value of ē greater than 1 in the country-specific approach, meaning that units from Poland were catching up not only with the European frontier but also with the national one. By contrast, German HEIs on average caught up with the European frontier (ē = 1.081 in the European frontier setting) but fell further behind the German efficiency frontier (ē = 0.98 in the country-specific frontier setting). This means that the German higher education frontier was rising more quickly than the overall European one. Similar patterns emerge when analyzing the Italian case. Consequently, the choice of benchmark against which we assess the efficiency performance of universities makes a difference. This is an important result, and we include it among the guidelines for future research that we propose together with our conclusions.
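The simplest possible case shows why the choice of frontier moves e and s much more than m. In a hypothetical one-input, one-output technology under constant returns to scale, each distance function divides a unit's output/input ratio by the best ratio on the frontier. The frontier terms therefore cancel in m, which collapses to the change in the unit's own ratio, while the split between e and s depends on which frontier is used. The sketch below compares a pooled ('European') frontier with a national one using made-up data. In the paper's three-input, two-output model this invariance of m holds only approximately, which is consistent with the 0.99 correlation reported above.

```python
import math


def distance(unit, frontier):
    """CRS output distance function, one input and one output (simplified case)."""
    x, y = unit
    return (y / x) / max(yj / xj for xj, yj in frontier)


def malmquist(unit_t, unit_t1, frontier_t, frontier_t1):
    """Return (m, e, s): TFP change, efficiency change, technology change."""
    a = distance(unit_t, frontier_t)
    b = distance(unit_t1, frontier_t)
    c = distance(unit_t, frontier_t1)
    d = distance(unit_t1, frontier_t1)
    return math.sqrt((b / a) * (d / c)), d / a, math.sqrt((b / d) * (a / c))


# Hypothetical (input, output) data: country A has two HEIs, country B one.
country_a_t, country_a_t1 = [(10, 8), (10, 6)], [(10, 9), (10, 7)]
country_b_t, country_b_t1 = [(10, 10)], [(10, 10.5)]

unit_t, unit_t1 = country_a_t[0], country_a_t1[0]  # best HEI in country A

# 'C': frontier built only from country A; 'E': pooled frontier.
m_c, e_c, s_c = malmquist(unit_t, unit_t1, country_a_t, country_a_t1)
m_e, e_e, s_e = malmquist(unit_t, unit_t1, country_a_t + country_b_t,
                          country_a_t1 + country_b_t1)
```

Measured against its national frontier, this HEI shows no catching up (e = 1), and its whole 12.5 % TFP gain appears as a frontier shift (s = 1.125). Against the pooled frontier, the same gain splits into catching up (e ≈ 1.071) and a smaller shift (s = 1.05). This pattern is similar in direction to the German and Italian cases described above.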

Conclusions and suggestions for future research
Despite increasing pressure on public universities to constantly optimize results using limited resources, changes in university productivity have only been marginally analyzed, usually with respect to HEIs from just one or at most two countries. Cross-country multi-period analysis of productivity trends is demanding, as it requires the collection of micro data for the same units over multiple time periods. In the case of universities from several European countries, it has proven to be a rather challenging, albeit feasible, piece of research. Our paper contributes to the existing literature by presenting productivity changes (along with efficiency and technology trends) in 266 public HEIs from seven European countries for the years 2001-2005 (analyzed mainly annually, but also in terms of time averages).

34 … (Boussofiane et al. 1991; Dyson et al. 2001). In our case, the use of a DEA model with three inputs and two outputs requires a set of at least 12 units for each country.

35 As an alternative, another possible method for comparing country frontiers is the metafrontier approach, where the metafrontier function is defined as "an overarching function that encompasses the deterministic components of the stochastic frontier production functions for the units that operate under the different technologies involved" (Battese et al. 2008). Such a methodology has been applied mainly in the context of regional variation in the data. In our case, such a European metafrontier would envelop the country-specific productivity frontiers for HEIs from the separate countries in the sample. More on the application of the metafrontier method to the decomposition of the Malmquist index can be found in Oh and Lee (2010). We thank a referee for drawing our attention to this issue.

36 We thank a referee for pointing out this issue.
Moreover, we have applied important methodological improvements, providing consistent estimates of Malmquist indices, along with their confidence intervals, based on a bootstrap method. Consequently, our conclusions are based on statistically significant results that take sampling noise into account and, hence, are statistically robust.
These robust results indicate that, of the 1,064 annual estimates of TFP growth in the European HEIs analysed, 963 (90 %) were statistically different from unity (at the standard 5 % level of significance), so in the large majority of cases HEIs registered statistically significant changes in productivity. Between 2001 and 2005, the HEIs in our sample registered an average increase in productivity of around 4.5 % per year (taking into account only statistically significant estimates), and efficiency change predominated over technology improvements. However, the methodology adopted permits us to state that only approximately half of the cases were characterized by statistically significant annual improvements in overall productivity. In the other cases, either the TFP of HEIs declined or their Malmquist indices were not significantly different from unity (neither improvement nor regress).
Our study benefits from being based on panel data, with information on the productivity performance of universities from several European countries. Consequently, we have been able to analyze in detail the cross-country variation in productivity changes typical of universities from different systems of higher education. The average TFP index ranged from 0.98 (a TFP decline of 2 % annually) in Austria to 1.09 (TFP growth of 9 % annually) in Switzerland. There is also considerable inter-country variation in the proportion of universities that registered statistically significant improvements in productivity. For instance, two-thirds of Italian HEIs registered statistically significant TFP improvements (the best score across the seven countries), while this was the case for less than half (46 %) of British universities.
With regard to the time dimension, we have been able to check which universities registered TFP growth in every time period between 2001 and 2005. On average, German, Italian and Swiss HEIs, whose TFP rose consistently, performed better than HEIs from the other countries. Looking at individual universities, in our basic analysis evaluating HEIs vis-à-vis a common European productivity frontier, we found that only 28 European universities (out of 266) registered statistically significant improvements in productivity in all of the years between 2001 and 2005.
We have extended our analysis by comparing the results obtained with alternative datasets, changing the set of inputs and outputs in the DEA estimation and employing alternative definitions of the productivity frontier ('European' and 'country-specific'). Our basic finding of approximately 4 % annual productivity growth in European HEIs is robust to changes in the formulation of the DEA model used for the Malmquist index calculations. Frontier definition matters little for the measurement of general productivity change (the Malmquist index remains fairly stable) but proves relevant when comparing efficiency and technology developments. A joint treatment of universities with respect to a common productivity frontier is appropriate if the researcher is interested in comparing HEIs as units competing jointly within the European system of higher education, as we were. Assessing HEIs against other units from the same country tells us more about the movement of national higher education frontiers. Consequently, the alternative frontier measurements show that, depending on the research question formulated at the outset, researchers should consider whether to account for the heterogeneity of higher education systems across countries.

… (2004, p. 34), as academic staff we consider "personnel whose primary assignment is instruction, research or public service; personnel who hold an academic rank with such titles as professor, associate professor, assistant professor, instructor, lecturer, or the equivalent of any of these academic ranks; personnel with other titles if their principal activity is instruction or research."

Source: own elaboration, based on the three-input, two-output model.