The origins and development of statistical approaches in non-parametric frontier models: a survey of the first two decades of scholarly literature (1998–2020)

This paper surveys the increasing use of statistical approaches in non-parametric efficiency studies. Data Envelopment Analysis (DEA) and Free Disposal Hull (FDH) are recognized as standard non-parametric methods developed in the field of operations research. Kneip et al. (Econom Theory, 14:783–793, 1998) and Park et al. (Econom Theory, 16:855–877, 2000) develop statistical properties of the variable returns-to-scale (VRS) version of DEA estimators and FDH estimators, respectively. Simar & Wilson (Manag Sci 44, 49–61, 1998) show that conventional bootstrap methods cannot provide valid inference in the context of DEA or FDH estimators and introduce a smoothed bootstrap for use with DEA or FDH efficiency estimators. By doing so, they address the main drawback of non-parametric models as being deterministic and without a statistical interpretation. Since then, many articles have applied this innovative approach to examine efficiency and productivity in various fields while providing confidence interval estimates to gauge uncertainty. Despite this increasing research attention and significant theoretical and methodological developments in its first two decades, a specific and comprehensive bibliometric analysis of the bootstrap DEA/FDH literature and subsequent statistical approaches is still missing. This paper thus aims to provide an extensive overview of the key articles and their impact in the field. Specifically, in addition to summary statistics such as citations, the most influential academic journals and authorship network analysis, we review the methodological developments as well as the pertinent software applications.


Introduction
Data Envelopment Analysis (DEA) is a linear programming technique measuring the relative efficiency of decision making units (DMUs), as introduced by Charnes et al. (1978). While Charnes and colleagues' article remains the most cited paper in the field of operations research (Laengle et al., 2017), there have been significant developments in both theory and applications of this non-parametric technique (Emrouznejad & Yang, 2018). Free Disposal Hull (FDH), introduced by Deprins et al. (2006), is the second most used non-parametric method; it requires a mixed integer programming formulation and relaxes the convexity assumption of DEA. The popularity of non-parametric techniques such as DEA and FDH is due to their advantage of not requiring any assumption on the functional form of the production frontier. However, conventional non-parametric methods are criticised as being deterministic, meaning that no noise is considered and all deviations from the frontier are assumed to be inefficiency. Simar and Wilson (1998) propose a smooth bootstrap to deal with boundary issues arising from the "deterministic" nature of DEA estimators. Consequently, it can be argued that non-parametric approaches have a statistical basis and hence that one of their main criticisms is no longer valid.
Most previous DEA/FDH literature reviews are solely methodological or surveys of applications (e.g., Hatami-Marbini et al., 2011; Kao, 2014; Witte & López-Torres, 2017). There are also several bibliometric analyses, as the following examples show. Lampe and Hilgers (2015) provide a bibliometric analysis of DEA and Stochastic Frontier Analysis (SFA) and discuss the methodological trends with each technique. Emrouznejad and Yang (2018) report a broad list of DEA related articles between 1978 and 2016 and provide some summary statistics including the most utilised journals, authorship, and keyword analysis. Olesen and Petersen (2016) provide a review of stochastic DEA. Stochastic DEA is defined as "an efficiency analysis using non-parametric convex hull/convex cone reference technologies based on either statistical axioms or distributional assumptions that allow for a random (estimator of the) reference technology" (Olesen & Petersen, 2016, p.3). They identify three main directions for stochastic DEA in the literature. First, methods which regard given input and output variables as a sample from a large population. The Simar and Wilson (1998) approach is grouped in this category. Second, methods which are able to handle random noise, like SFA. Banker and Maindiratta (1992), the pioneers of this direction, show how to interpret DEA residuals derived from a maximum likelihood function. Third, methods which are based on having information on random disturbances in the form of knowledge of the distributions involved. Chance Constrained DEA is an example of such an approach (see Cooper et al., 1998; Olesen & Petersen, 1995). Among these three directions, our study focuses only on the first one, as the most popular statistical approach in the literature.
Despite the recent prevalence of using statistical approaches in non-parametric efficiency studies, no previous study has focused on analysing a bibliography of this growing field of research. We address this gap by reviewing the statistical approaches introduced in nonparametric frontier analysis and identifying the most influential methodological papers over the past two decades. We then provide a bibliometric analysis for the most cited papers, including summary statistics of their citations, leading academic journals, keywords, and authorship network analysis.
The remainder of this paper is organised as follows. In the next section we briefly review the statistical approaches in the context of non-parametric models. Section 3 describes the data and methodology used for the bibliometric analysis. Section 4 identifies the most influential papers based on citations. Section 5 provides a bibliometric analysis of the leading methodological papers, Simar and Wilson (1998) and Simar and Wilson (2007). Section 6 reviews available software applications. Section 7 provides a conclusion.

Non-parametric frontier models and statistical approaches
The popularity of non-parametric techniques such as DEA and FDH is due to their advantage of not requiring parametric assumptions on the functional form of the production frontier as well as their ability to accommodate both multiple inputs and multiple outputs. However, conventional non-parametric methods were initially criticised as (i) being deterministic (meaning that no noise is considered, and all deviations from the frontier are assumed to result from inefficiency), and hence sensitive to outliers; and (ii) lacking any statistical interpretation. With regard to (i), a number of methods for detecting outliers that might distort DEA or FDH efficiency measurements have been developed, e.g., Wilson (1993, 1995), Simar (2003), and Porembski et al. (2005). Researchers can use these methods to detect outliers and then decide what to do. As discussed by Wilson (1993), an "outlier" is an atypical observation. When a researcher finds an atypical observation, more work is needed to determine whether the observation results from an error. Furthermore, Cazals et al. (2002) and Aragon et al. (2005) propose probabilistic approaches which provide robust estimators in the presence of outliers. With regard to (ii), Kneip et al. (1998) give a statistical model and use the assumptions of the model to establish statistical consistency and the rate of convergence of the variable returns-to-scale (VRS) version of DEA estimators. Meanwhile, Simar and Wilson (1998) show that conventional, "naive" bootstrap methods cannot provide valid inference when used with DEA or FDH estimators, and propose a smooth bootstrap to deal with boundary issues arising from the "deterministic" nature of DEA estimators.

Fig. 1 Statistical approaches in non-parametric efficiency analysis
Another large strand of the efficiency literature involves estimation of efficiency, and then regression of the efficiency estimates on some additional, environmental variables in a second stage. As of 14 February 2022, a search on Google Scholar using the keywords "efficiency," "regression" and "second stage" found approximately 229,000 hits. These studies use a variety of parametric (and occasionally, nonparametric) models for the second-stage regressions. Figure 1 illustrates the main statistical approaches in examining efficiency using non-parametric models since 1998.
In this section, we focus on DEA, as most efficiency studies employ this non-parametric method. Radial DEA efficiency scores can be obtained using input- or output-oriented approaches, as well as under the VRS assumption. Let X be a set of p inputs and Y be a set of q outputs. The efficiency score of a DMU operating at level (x, y), in an input-oriented framework with the assumption of variable returns to scale, can be estimated using
\[
\hat{\theta}(x, y) = \min\Big\{\theta > 0 \;\Big|\; y \le \sum_{i=1}^{n} \lambda_i Y_i,\; \theta x \ge \sum_{i=1}^{n} \lambda_i X_i,\; \sum_{i=1}^{n} \lambda_i = 1,\; \lambda_i \ge 0,\; i = 1, \ldots, n\Big\} \tag{1}
\]
where $\hat{\theta}(x, y)$ is the estimate of the true unknown efficiency score $\theta(x, y)$, $\lambda$ refers to the vector of constants, and n is the number of DMUs. We use (1) and provide a summary of the methods illustrated in Fig. 1 in the following sections.
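As a minimal sketch of how such a radial score is computed, the following Python function evaluates the input-oriented FDH score, i.e. the non-convex counterpart of the DEA score discussed above, which needs only enumeration rather than a linear program. The function name `fdh_input_efficiency` and the list-of-lists data layout are illustrative assumptions, not part of the surveyed literature.

```python
def fdh_input_efficiency(x, y, X, Y):
    """Input-oriented FDH efficiency of the point (x, y).

    X and Y are lists of input and output vectors of the observed DMUs.
    Returns float('inf') when no observed DMU produces at least y.
    """
    best = float("inf")
    for Xi, Yi in zip(X, Y):
        # DMU i is a feasible peer only if it produces at least y in every output.
        if all(yi >= yk for yi, yk in zip(Yi, y)):
            # Smallest theta such that theta * x still covers the inputs of DMU i.
            ratio = max(xi / xk for xi, xk in zip(Xi, x))
            best = min(best, ratio)
    return best


if __name__ == "__main__":
    # Hypothetical toy data: one input, one output, three DMUs.
    X = [[2.0], [4.0], [3.0]]
    Y = [[1.0], [2.0], [1.0]]
    for xi, yi in zip(X, Y):
        print(xi, yi, fdh_input_efficiency(xi, yi, X, Y))
```

A score of 1 marks a DMU on the estimated frontier; a score below 1 gives the proportion to which its inputs could be contracted while remaining dominated by an observed peer.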

Bootstrap DEA and FDH
It is a tautology that statistical inference is impossible without a statistical model and some knowledge of the distribution of an estimator of the feature about which one wishes to make inference. The limiting distribution of the FDH estimator was unknown until the work of Park et al. (2000), and the limiting distribution of the VRS-DEA estimator was unknown until it was established by Kneip et al. (2008). Simar (1996) emphasises the importance of a valid bootstrap data generating process in the statistical analysis of DEA models. Simar and Wilson (1998) proposed estimating the distributions of nonparametric, VRS-DEA efficiency estimators using a smooth bootstrap method, thereby allowing inference about inefficiency. The idea was extended in Simar and Wilson (2000a), where the distribution of efficiency was allowed to vary throughout the production set. Neither Simar and Wilson (1998) nor Simar and Wilson (2000a) offer proofs of validity of their suggested bootstrap methods since the limiting distribution of the VRS-DEA estimator remained unknown until eight years later. Nonetheless, simulation results provided in both papers seem to suggest that the proposed methods "work," at least within the context of the simulations of the papers.
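Specifically, Simar and Wilson (1998) employ the bootstrap technique introduced by Efron (1992) to measure the sensitivity of the efficiency scores of DEA/FDH models to sampling variation. The homogeneous bootstrap procedure suggested by Simar and Wilson (1998) to estimate the bias-corrected efficiency score of a given DMU is as follows:
[1] Compute the technical efficiency using (1) for all DMUs to generate $\hat{\theta}_1, \ldots, \hat{\theta}_n$.
[2] Repeat the following five steps B times (B is a large number, say 2000) to provide a set of estimates $\hat{\theta}^*_b(x, y)$, $b = 1, \ldots, B$, for a given DMU operating at level (x, y).
[2-1] Draw with replacement from $\hat{\theta}_1, \ldots, \hat{\theta}_n$ to generate $\beta^*_{1,b}, \ldots, \beta^*_{n,b}$.
[2-2] Smooth the sampled values using
\[
\tilde{\theta}^*_{i,b} =
\begin{cases}
\beta^*_{i,b} + h\,\varepsilon^*_{i,b} & \text{if } \beta^*_{i,b} + h\,\varepsilon^*_{i,b} \le 1,\\
2 - \beta^*_{i,b} - h\,\varepsilon^*_{i,b} & \text{otherwise,}
\end{cases}
\]
where h is the bandwidth and $\varepsilon^*_{i,b}$ is a random error drawn from the standard normal distribution. The bandwidth can be estimated using the likelihood cross validation method suggested by Daraio and Simar (2007a).
[2-3] Correct the variance of the generated bootstrap sample by
\[
\theta^*_{i,b} = \bar{\beta}^*_b + \frac{\tilde{\theta}^*_{i,b} - \bar{\beta}^*_b}{\sqrt{1 + h^2 / \hat{\sigma}^2_{\theta}}},
\]
where $\bar{\beta}^*_b = \sum_{i=1}^{n} \beta^*_{i,b} / n$ and $\hat{\sigma}^2_{\theta}$ refers to the sample variance of $\hat{\theta}_1, \ldots, \hat{\theta}_n$.
[2-4] Generate the pseudo-sample $(X^*_{i,b}, Y_i)$, $i = 1, \ldots, n$, with $X^*_{i,b} = (\hat{\theta}_i / \theta^*_{i,b})\, X_i$.
[2-5] Compute the bootstrap estimate $\hat{\theta}^*_b(x, y)$ by solving (1) with the pseudo-sample as the reference set.
[3] Calculate the bias-corrected efficiency score using
\[
\hat{\hat{\theta}}(x, y) = 2\,\hat{\theta}(x, y) - \frac{1}{B} \sum_{b=1}^{B} \hat{\theta}^*_b(x, y).
\]
Then, as explained in detail by Simar and Wilson (2000b), the bootstrap estimates are utilised to construct the confidence intervals. Simar and Wilson (1999) further extend the bootstrapping idea to examine whether Malmquist indices of productivity are significant in a statistical sense. In addition, Simar and Wilson (2000a) propose a general bootstrapping approach which allows for heterogeneity in the structure of efficiency. The double-smooth and subsampling bootstrap methods described by Kneip et al. (2008) were the first to be proved to provide asymptotically valid inference about inefficiency estimated by VRS-DEA estimators.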
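The resampling, smoothing and variance-correction steps of the homogeneous bootstrap, together with the final bias correction, can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the bandwidth h is taken as given rather than selected by cross validation, the boundary is handled by reflection at 1, and the re-estimation of efficiency on the pseudo-sample is left out.

```python
import math
import random
import statistics


def smoothed_resample(theta_hat, h, rng):
    """Return one smoothed, variance-corrected resample of efficiency scores."""
    n = len(theta_hat)
    sigma2 = statistics.variance(theta_hat)  # sample variance of the estimates
    if sigma2 == 0:
        raise ValueError("identical efficiency scores cannot be smoothed")
    # Naive resample with replacement from the estimated scores.
    beta = [rng.choice(theta_hat) for _ in range(n)]
    beta_bar = sum(beta) / n
    # Add kernel noise and reflect any value that crosses the boundary at 1.
    smooth = []
    for b in beta:
        v = b + h * rng.gauss(0.0, 1.0)
        smooth.append(v if v <= 1.0 else 2.0 - v)
    # Shrink towards the resample mean to undo the variance added by the noise.
    scale = math.sqrt(1.0 + h * h / sigma2)
    return [beta_bar + (t - beta_bar) / scale for t in smooth]


def bias_corrected(theta_hat_xy, boot_estimates):
    """Bias-corrected score: original estimate minus the bootstrap bias."""
    return 2.0 * theta_hat_xy - sum(boot_estimates) / len(boot_estimates)


if __name__ == "__main__":
    rng = random.Random(42)
    scores = [0.60, 0.75, 0.90, 1.00, 0.82]  # hypothetical efficiency estimates
    print(smoothed_resample(scores, h=0.05, rng=rng))
    print(bias_corrected(0.80, [0.85, 0.90]))
```

The sketch does not guard against resampled values at or below zero, which can occur with a large bandwidth; a full implementation would also need to treat that case.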
As discussed by Olesen and Petersen (2016), although the homogeneous bootstrap works well with relatively few observations in a moderate input output dimension, the assumption of identical random inefficiency distribution discredits the efficiency results in the eyes of DMUs evaluated. For example, the DMUs that are doing things differently (outliers) may not be presented in a fair way in such performance evaluations due to the focus on general tendency in statistical approaches (see Olesen and Petersen (2016) for detailed discussions).
Simar and Wilson (2011a) suggest a subsampling bootstrap approach for non-parametric models which does not require multivariate kernel smoothing. The subsampling approach is easier to implement and only requires drawing bootstrap pseudo-samples of size m out of n (the original sample size). Simar and Wilson (2011a) also provide a data-based algorithm for selecting m. The m out of n bootstrap method to estimate the confidence interval for $\theta(x, y)$, with p inputs and q outputs, is as follows:
[1] Draw a subsample of m observations out of n (m < n).
[2] Compute the technical efficiency based on the subsample using (1).
[3] Repeat steps [1] and [2] B times.
[4] Construct a (1 − α) confidence interval for $\theta(x, y)$ from the B bootstrap estimates.
The latent variable problem discussed by Simar and Wilson (2007) extends to other situations where researchers might want to make inference. For example, if the true, underlying inefficiencies were observed, making inference about expected (or mean) inefficiency would be simple, relying on standard results such as the Lindeberg-Feller Central Limit Theorem (CLT). However, true inefficiency is not observed and must be estimated. Several applied studies have used inefficiency estimates obtained from DEA or FDH estimators to estimate confidence intervals for inefficiency using standard CLT results. However, Kneip et al. (2015) show that when using sample means of inefficiencies estimated by DEA or FDH estimators, the usual CLTs are valid only in special cases. In particular, under constant returns-to-scale (CRS) and using means of (CRS) DEA estimators, standard CLTs remain valid only if the number of dimensions d (i.e., the number of inputs plus the number of outputs) is no larger than 3. Under VRS, standard CLT results hold for means of DEA estimates only if d ≤ 2, and when FDH estimators are used, standard CLT results never hold. Kneip et al. (2015) provide new CLTs for making inference about mean efficiency using either FDH, VRS-DEA or CRS-DEA estimators. Using these results, researchers can now estimate valid confidence intervals for mean inefficiency.
Kneip et al. (2016) use the new CLTs proved by Kneip et al. (2015) to develop tests of differences in mean inefficiency across groups of producers as well as a test of convexity of the production set versus non-convexity and a test of CRS versus VRS. Given that FDH, VRS-DEA and CRS-DEA estimators require different assumptions regarding the shape of the frontier, the latter two tests are useful for deciding which estimator to use in a particular application. While Kneip et al. (2016) provide asymptotically normal test statistics for each test, bootstrapping is still needed for the tests of convexity and returns to scale. Both tests require randomly splitting the initial sample into two independent subsamples to implement valid tests. While the results of Kneip et al. (2016) are valid for any particular random split of the sample, different results are obtained from different splits of the same sample. Simar and Wilson (2020) provide a bootstrap method to combine information across multiple, random splits of a given sample, thereby removing the ambiguity resulting from a single random split of the sample.
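The drawing and re-estimation loop of the m out of n approach described earlier in this section can be sketched as follows. Several choices here are assumptions made for illustration only: subsamples are drawn without replacement, the FDH estimator stands in for the DEA program so that no linear programming solver is needed, subsamples containing no feasible peer are skipped, and the function names are hypothetical. The construction of the confidence interval from the subsample estimates is not shown.

```python
import random


def fdh_input_efficiency(x, y, X, Y):
    """Input-oriented FDH score of (x, y); inf if no DMU produces at least y."""
    best = float("inf")
    for Xi, Yi in zip(X, Y):
        if all(yi >= yk for yi, yk in zip(Yi, y)):
            best = min(best, max(xi / xk for xi, xk in zip(Xi, x)))
    return best


def subsample_estimates(x, y, X, Y, m, B, seed=0):
    """Efficiency of (x, y) re-estimated on B random subsamples of size m < n."""
    rng = random.Random(seed)
    n = len(X)
    estimates = []
    for _ in range(B):
        idx = rng.sample(range(n), m)  # m distinct observations out of n
        est = fdh_input_efficiency(
            x, y, [X[i] for i in idx], [Y[i] for i in idx]
        )
        if est != float("inf"):  # skip subsamples without any feasible peer
            estimates.append(est)
    return estimates


if __name__ == "__main__":
    # Hypothetical toy data: one input, one output, four DMUs.
    X = [[2.0], [4.0], [3.0], [5.0]]
    Y = [[1.0], [2.0], [1.0], [2.0]]
    print(subsample_estimates([3.0], [1.0], X, Y, m=2, B=10))
```

Because each subsample is a subset of the original data, every subsample estimate is at least as large as the full-sample estimate, which is what makes the spread of these estimates informative about the sampling variation of the frontier.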

Robust approach
As discussed, traditional non-parametric techniques such as DEA and FDH are sensitive to outliers. This drawback can be addressed by employing the order-m and order-α quantile frontiers introduced by Cazals et al. (2002) and Aragon et al. (2005), respectively. These two robust approaches are briefly reviewed in the following. Cazals et al. (2002) initially introduce the order-m partial frontier based on a probabilistic formulation. Accordingly, the order-m frontier is defined as the expected minimum use of inputs among a sample of m DMUs drawn from the population generating more than a given level of output. Due to its sampling effect, the order-m approach provides less-extreme benchmarks than the usual non-parametric estimators. Daraio and Simar (2005) extend the idea of Cazals et al. (2002) to a multivariate case and introduce external environmental factors in modelling the FDH estimators. Daraio and Simar (2007b) further extend the early works on the order-m partial frontier to a fully non-parametric methodology with the convexity assumption. The summary of the order-m algorithm to calculate the efficiency score $\hat{\theta}_m(x, y)$ is as follows:

Order-m frontier
[1] Repeat the next two steps for b = 1, . . . , B (B is a large number, say 1000).
[1-1] For a given y, generate the sample $(X_{1,b}, \ldots, X_{m,b})$ by drawing a sample of size m with replacement among the $X_i$ such that $Y_i \ge y$.
[1-2] Compute $\tilde{\theta}^{b}_m(x, y)$ by solving the linear program in (1) with the drawn sample as the reference set.
[2] Calculate the order-m efficiency score using
\[
\hat{\theta}_m(x, y) = \frac{1}{B} \sum_{b=1}^{B} \tilde{\theta}^{b}_m(x, y).
\]
Daraio and Simar (2007b) also introduce a conditional measure of efficiency by including environmental factors in the probabilistic formulation of the production function. Accordingly, the joint distribution of (X, Y) conditional on environmental factors, denoted by Z, defines the production process. The summary of the conditional order-m algorithm to calculate the efficiency score $\hat{\theta}_m(x, y|z)$ is as follows:
[1] Repeat the next two steps for b = 1, . . . , B (B is a large number, say 1000).
[1-1] For a given y, generate the sample $(X_{1,b}, \ldots, X_{m,b})$ by drawing a sample of size m with replacement among the $X_i$ such that $Y_i \ge y$, with a probability proportional to $K\big((z - Z_i)/h\big)$, where h is the chosen bandwidth for the kernel K(.) with bounded support (see Bȃdin et al. (2010)).
[1-2] Compute $\tilde{\theta}^{b}_m(x, y|z)$ by solving the linear program in (1) with the drawn sample as the reference set.
[2] Calculate the conditional order-m efficiency score using
\[
\hat{\theta}_m(x, y|z) = \frac{1}{B} \sum_{b=1}^{B} \tilde{\theta}^{b}_m(x, y|z).
\]
The order-m approach has also been adapted from radial models to directional distances. For information about the difference between radial and directional distance models readers are referred to Chambers et al. (1998) and Färe and Grosskopf (2006).
Aragon et al. (2005) introduce a robust α-quantile approach based on the quantiles of a univariate distribution function to deal with outliers in non-parametric efficiency estimation. The frontier of order-α quantile is defined as the input level not exceeded by (1 − α) × 100 percent of DMUs from the population of peers generating more than a given level of output. In contrast to order-m, the order-α approach has better robustness properties, as trimming is continuous (α ∈ (0, 1]) in terms of the order-α quantile. The idea is further extended to a full multivariate setup by Daouia and Simar (2007). The summary of the order-α algorithm with no convexity assumption to calculate the efficiency score $\hat{\theta}_\alpha(x, y)$ is as follows:

Order-α quantile partial frontier
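[1] Define $M_y = \sum_{i=1}^{n} \mathbb{1}(Y_i \ge y)$, the number of observations producing at least y.
[2] Denote by $\mathcal{X}^{y}_{(j)}$ the jth order statistic of $\mathcal{X}_i = \max_{k=1,\ldots,p} X_i^{k} / x^{k}$ over the observations such that $Y_i \ge y$, for $j = 1, \ldots, M_y$.
[3] Sort as: $\mathcal{X}^{y}_{(1)} \le \mathcal{X}^{y}_{(2)} \le \cdots \le \mathcal{X}^{y}_{(M_y)}$.
[4] Calculate the order-α efficiency score using
\[
\hat{\theta}_\alpha(x, y) =
\begin{cases}
\mathcal{X}^{y}_{((1-\alpha) M_y)} & \text{if } (1-\alpha) M_y \text{ is a positive integer},\\
\mathcal{X}^{y}_{(\lfloor (1-\alpha) M_y \rfloor + 1)} & \text{otherwise.}
\end{cases}
\]
Recently, Daouia et al. (2017) propose an alternative way for the full multivariate environment based on a directional distance estimator of order-α.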
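The two partial-frontier estimators reviewed in this section can be sketched together in Python. The sketch assumes the non-convex (FDH-type) version of order-m, in which the minimum radial ratio over the drawn sample replaces a linear program, and the unconditional case only (no environmental variables Z). The function names and the toy data are hypothetical.

```python
import math
import random


def input_ratios(x, y, X, Y):
    """Radial ratios max_k X_i^k / x^k for the DMUs producing at least y."""
    return [
        max(a / b for a, b in zip(Xi, x))
        for Xi, Yi in zip(X, Y)
        if all(c >= d for c, d in zip(Yi, y))
    ]


def order_m_efficiency(x, y, X, Y, m, B=1000, seed=0):
    """Monte Carlo order-m score: mean over B draws of the best of m peers."""
    rng = random.Random(seed)
    ratios = input_ratios(x, y, X, Y)
    if not ratios:
        raise ValueError("no observed DMU produces at least y")
    total = 0.0
    for _ in range(B):
        # Draw m peers with replacement among those with Y_i >= y.
        total += min(rng.choice(ratios) for _ in range(m))
    return total / B


def order_alpha_efficiency(x, y, X, Y, alpha):
    """Order-alpha score as an order statistic of the sorted radial ratios."""
    ratios = sorted(input_ratios(x, y, X, Y))
    if not ratios:
        raise ValueError("no observed DMU produces at least y")
    k = (1.0 - alpha) * len(ratios)
    if k >= 1 and abs(k - round(k)) < 1e-9:
        j = int(round(k))       # (1 - alpha) * M_y is a positive integer
    else:
        j = math.floor(k) + 1   # otherwise take the next order statistic
    return ratios[j - 1]


if __name__ == "__main__":
    # Hypothetical toy data: one input, one output, four DMUs.
    X = [[2.0], [4.0], [3.0], [5.0]]
    Y = [[1.0], [1.0], [1.0], [1.0]]
    print(order_m_efficiency([4.0], [1.0], X, Y, m=2, B=500))
    print(order_alpha_efficiency([4.0], [1.0], X, Y, alpha=0.5))
```

With α = 1 the order-α score coincides with the FDH score, and the order-m score approaches it as m grows, which illustrates why both can be read as trimmed versions of the full frontier.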

Environmental variables
In addition to measuring the level of efficiency of DMUs, researchers and policy makers are interested in examining the impact of environmental factors on the production process. It is assumed that these environmental factors are exogenous and are not under the control of management. Two recent approaches in the literature to deal with environmental factors are briefly discussed in the following.

Two-stage regression approach
A large part of the literature focuses on two-stage models where the level of efficiency is estimated in the first stage, and the estimated efficiency scores are then regressed on a number of covariates in the second stage. However, as discussed by Simar and Wilson (2007), the majority of these studies provide invalid inferences due to the existence of unknown serial correlation among the estimated efficiency scores. Simar and Wilson (2007) examine this genre and provide important sets of results. They give the first (and, as far as we know, the only) coherent statistical model encompassing both the first stage (where efficiency is estimated) and the second stage (where efficiency estimates from the first stage are regressed on environmental variables). Simar and Wilson (2007) show that the second-stage regression is a latent variable problem: one would like to regress inefficiency on the environmental variables, but inefficiency is unobserved and hence inefficiency estimates must be used. Moreover, the second-stage regression is shown to be a truncated regression, as opposed to the censored (Tobit) regressions or linear regressions that are sometimes used in applied papers.
Specifically, Simar and Wilson (2007) propose bootstrap procedures to address the aforementioned drawbacks using a bootstrap truncated regression given as
\[
\hat{\theta}_i = Z_i \beta + \varepsilon_i, \qquad i = 1, \ldots, n,
\]
where $\hat{\theta}_i$ is the estimated efficiency score, $Z_i$ is a vector of environmental variables, $\beta$ is the coefficient vector and $\varepsilon_i$ is the stochastic error term. The summary of the double bootstrap procedure is as follows:
[1] Compute the efficiency scores using (1) for all DMUs to generate $\hat{\theta}_1, \ldots, \hat{\theta}_n$.
[2] Use the maximum likelihood method to estimate the truncated regression of $\hat{\theta}_i$ on $Z_i$ to obtain $\hat{\beta}$ and $\hat{\sigma}_\varepsilon$.
[3] Repeat the next four steps $B_1$ times to obtain the bootstrap estimates $\hat{\theta}^*_{i,b}$, $b = 1, \ldots, B_1$.
[3-1] Draw $\varepsilon_{i,b}$ from a normal distribution with mean zero and standard deviation $\hat{\sigma}_\varepsilon$, with left truncation at $-Z_i \hat{\beta}$ and right truncation at $1 - Z_i \hat{\beta}$, for $i = 1, \ldots, n$.
[3-2] Compute $\theta^*_{i,b} = Z_i \hat{\beta} + \varepsilon_{i,b}$ for $i = 1, \ldots, n$.
[3-3] Generate the pseudo-sample $(X^*_{i,b}, Y_i)$ with $X^*_{i,b} = (\hat{\theta}_i / \theta^*_{i,b})\, X_i$.
[3-4] Compute $\hat{\theta}^*_{i,b}$ using (1) with the pseudo-sample as the reference set.
[4] Calculate the bias-corrected efficiency scores for $i = 1, \ldots, n$ using
\[
\hat{\hat{\theta}}_i = 2\,\hat{\theta}_i - \frac{1}{B_1} \sum_{b=1}^{B_1} \hat{\theta}^*_{i,b}.
\]
[5] Use the maximum likelihood method to estimate the truncated regression of $\hat{\hat{\theta}}_i$ on $Z_i$ to obtain $\hat{\hat{\beta}}$ and $\hat{\hat{\sigma}}$.
Despite its prevalence, this method requires a restrictive assumption that the production frontier is separable from the impact of Z. This assumption can be examined using a test procedure provided by Daraio et al. (2010). Simar and Wilson (2007) show that a second-stage regression of efficiency estimates on some environmental variables can only be meaningful if the environmental variables have no relation to nor influence on the shape of the frontier. Simar and Wilson (2007) refer to this as a "separability" condition, meaning that the environmental variables are not included in the set of variables that influence the shape of the frontier, and note that this is a strong condition that should be tested. Testing the separability condition requires additional theoretical results, in particular those of Kneip et al. (2015). Daraio et al. (2018) provide a test of the separability condition, building on the work of Kneip et al. (2015, 2016).
For cases where the separability condition is satisfied, Simar and Wilson (2007) provide bootstrap methods for making inference in second-stage regressions. The latent variable problem in the second-stage regression complicates inference; in particular, Simar and Wilson (2007) show that conventional inference (e.g., involving inverting the negative Hessian of the log-likelihood for the second-stage regression) cannot provide valid, meaningful inference. As further explained by Simar and Wilson (2011b) and Kneip et al. (2015), the inefficiency estimates used in the second stage are biased, creating problems beyond those described by Simar and Wilson (2007).
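The truncated error draw in the double bootstrap procedure can be sketched in Python with the inverse-CDF method. This is an illustrative sketch under stated assumptions: the fitted value $Z_i \hat{\beta}$ is passed in as a single number `z_beta`, the function name is hypothetical, and the maximum likelihood estimation of the truncated regression itself is not shown.

```python
import random
from statistics import NormalDist


def draw_truncated_error(z_beta, sigma, rng):
    """Draw eps from N(0, sigma^2) truncated to [-z_beta, 1 - z_beta].

    The truncation keeps the pseudo score z_beta + eps inside [0, 1].
    """
    dist = NormalDist(0.0, sigma)
    lo = dist.cdf(-z_beta)        # probability mass below the left bound
    hi = dist.cdf(1.0 - z_beta)   # probability mass below the right bound
    # Uniform draw on [lo, hi], mapped back through the inverse CDF.
    u = lo + (hi - lo) * rng.random()
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep inv_cdf away from 0 and 1
    eps = dist.inv_cdf(u)
    # Guard against rounding just outside the truncation bounds.
    return min(max(eps, -z_beta), 1.0 - z_beta)


if __name__ == "__main__":
    rng = random.Random(7)
    z_beta, sigma = 0.7, 0.2  # hypothetical fitted value and error scale
    pseudo = [z_beta + draw_truncated_error(z_beta, sigma, rng) for _ in range(5)]
    print(pseudo)
```

Inverse-CDF sampling is used here rather than rejection sampling because it needs exactly one uniform draw per error, even when the truncation interval carries little probability mass.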
In contrast, some studies (e.g., Banker & Natarajan, 2008; Banker et al., 2019; McDonald, 2009) highlight the limitations of the second-stage bootstrap approach suggested by Simar and Wilson, and argue that a two-stage method followed by ordinary least squares (OLS) regression provides consistent estimators of the impact of environmental variables. For example, Banker et al. (2019) argue that the effectiveness of the bootstrap approach critically relies on the assumed data generating process and show that the bootstrap approach does not provide correct inferences in the presence of stochastic noise. Using extensive simulations, they assert that the OLS second-stage model significantly outperforms the complex Simar-Wilson approach.

Conditional efficiency approach
Daraio and Simar (2005, 2007a, 2007b) describe how the comparison between conditional and unconditional efficiency measures can be used to examine the impact of environmental factors on the production process. They define the following ratio for such analysis:
\[
R(x, y|z) = \frac{\hat{\theta}(x, y|z)}{\hat{\theta}(x, y)} \tag{8}
\]
where $\hat{\theta}(x, y)$ is the unconditional efficiency score that can be simply obtained using (1) and $\hat{\theta}(x, y|z)$ is the conditional efficiency score that can be estimated by solving
\[
\hat{\theta}(x, y|z) = \min\Big\{\theta > 0 \;\Big|\; y \le \sum_{i:\,|Z_i - z| \le h} \lambda_i Y_i,\; \theta x \ge \sum_{i:\,|Z_i - z| \le h} \lambda_i X_i,\; \sum_{i:\,|Z_i - z| \le h} \lambda_i = 1,\; \lambda_i \ge 0\Big\} \tag{9}
\]
where h is the local bandwidth, which can be computed according to a data driven method suggested by Daraio and Simar (2005) or the least squares cross-validation procedure proposed in Bȃdin et al. (2010). Daraio and Simar (2005) also demonstrate how a smoothed non-parametric regression and a scatter diagram of R(x, y|z) against a univariate Z can be used to describe the impact of environmental variables on efficiency. Bȃdin et al. (2012) propose a location-scale non-parametric regression to purify the conditional efficiency scores $\hat{\theta}(X, Y|Z = z)$ from the impact of Z as follows:
\[
\hat{\theta}(X, Y|Z = z) = \mu(z) + \sigma(z)\,\varepsilon
\]
where $\mu(z)$ and $\sigma(z)$ are the conditional location and scale of the efficiency score given Z = z, E(ε|Z = z) = 0, and V(ε|Z = z) = 1. Analysing the residual ε reveals the unexplained part of the conditional efficiency score. While a large ε indicates poor managerial performance, a small ε indicates a good level of performance. Daraio and Simar (2014) show how this approach can be adapted in a directional distance setting.
Bȃdin et al. (2014) further develop the idea of Bȃdin et al. (2012) and suggest a bootstrap algorithm to provide confidence intervals for the local impact of the external factors on the ratio of conditional to unconditional efficiency scores as follows:
[1] Compute the n unconditional and conditional efficiency scores using (1) and (9), respectively.
[2] Compute the n ratios $\hat{R}(X_i, Y_i|Z_i)$ using (8).
[3] Select a fixed grid of values for Z, say $z_1, \ldots, z_k$.
[4] Use the following non-parametric regression to calculate $\hat{\tau}^{z_j}_n$ for j = 1, . . . , k:
\[
\hat{\tau}^{z_j}_n = \sum_{i=1}^{n} W_n(Z_i, z_j, h_z)\, \hat{R}(X_i, Y_i|Z_i) \tag{11}
\]
where $W_n(Z_i, z_j, h_z)$ are the Nadaraya-Watson kernel weights, given by
\[
W_n(Z_i, z_j, h_z) = \frac{K\big((Z_i - z_j)/h_z\big)}{\sum_{l=1}^{n} K\big((Z_l - z_j)/h_z\big)},
\]
and $h_z$ is the bandwidth selected by the least-squares cross validation method suggested in Bȃdin et al. (2010).
[5] Repeat the next three steps for b = 1, . . . , B (B is large, say B = 1000) for a given value of m < n. Note that the choice of m is based on the data driven method described in Simar and Wilson (2011a).
[5-1] Draw a random subsample of size m from the original observations.
[5-2] Use (1), (9) and (8) to compute the m ratios for the subsample. Note that the bandwidth $h^{*,b}_{m,i}$ for calculating the conditional efficiency needs to be obtained by rescaling the bandwidth $h_{n,i}$ used to calculate the conditional efficiency in [1] to the subsample size m; the rescaling depends on r, the number of environmental variables.
[5-3] Estimate $\hat{\tau}^{*,b,z_j}_m$ at the fixed point $z_j$ for j = 1, . . . , k using (11). Note that the bandwidth needs to be rescaled to the appropriate size, starting from the bandwidth $h_z$ used in [4].
[6] For each j = 1, . . . , k, compute the α/2 and 1 − α/2 quantiles of the B bootstrapped values of $\hat{\tau}^{*,b,z_j}_m - \hat{\tau}^{z_j}_n$.
Due to the flexibility of the directional distance approach, recent studies focus more on adapting the algorithms in this section to directional models. Recently, Daraio et al. (2020) provide a fast and efficient computation of directional distance estimators for the full frontier, robust versions, and conditional efficiency estimates on environmental factors, along with their MATLAB codes.

et al., 2017; Merigó et al., 2018; Pritchard, 1969). There is a wide range of bibliometric indicators, including the number of papers, citations, co-authorships, and keyword frequency. Of the varied approaches to bibliometric analysis, the study of citations is prominent and is receiving increased attention (Kaffash & Marra, 2017; Lampe & Hilgers, 2015). Citation analysis provides critical information about emerging knowledge trends in a discipline (Kaffash & Marra, 2017).
This study focuses on the bibliometric analysis of the use of the bootstrap approach in non-parametric efficiency analysis and the statistical advancements developed since then. To conduct this bibliometric analysis, we first identify the methodological papers in the field which are found in a recent comprehensive survey conducted by Simar and Wilson (2015). Those papers with an average of at least 5 citations per year are then selected as possessing influential methodological content. We also identify the most influential methodological papers by choosing the papers with at least 50 citations per year. Using a range of bibliometric indicators, including cites per paper, cites per year, productive journals, frequent keywords and the networks of co-authors and countries that contributed to both the development and the application of statistical approaches in non-parametric efficiency estimation, we identify important trends. This analysis reveals useful information about successful applications of new methods, past trends, and the future outlook.
Specifically, we provide graphical visualisation of the relevant bibliometric materials by using the visualisation of similarities (VOS) viewer software (Van Eck & Waltman, 2010). Recently, the VOS viewer has been used as a powerful visualization tool in bibliometric analysis of various research fields (e.g., Baier-Fuentes et al., 2019; Laengle et al., 2017; Merigó et al., 2018; Muhuri et al., 2019; Türkeli et al., 2018; Yeung et al., 2017). This software provides a network representation of bibliometric indicators such as co-authorship (Peters & Van Raan, 1991) and co-occurrence of keywords (Callon et al., 1983). Co-authorship counts the number of co-authored publications between two authors, and co-occurrence counts the number of publications in which two keywords are used together (Van Eck & Waltman, 2020).
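The two counting rules just defined amount to the same pairwise tally applied to author lists or keyword lists. As a minimal sketch, assuming each publication is represented simply as a list of names (a hypothetical data layout, not the export format of any particular database), the counts can be obtained as follows. This illustrates plain pair counting only and is not a description of the VOS viewer's internal implementation.

```python
from collections import Counter
from itertools import combinations


def pair_counts(records):
    """Count, for every pair of items, the records in which both appear.

    With author lists this gives co-authorship counts; with keyword lists
    it gives keyword co-occurrence counts.
    """
    counts = Counter()
    for items in records:
        # Sort and de-duplicate so each unordered pair is counted once per record.
        for a, b in combinations(sorted(set(items)), 2):
            counts[(a, b)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical author lists of three publications.
    papers = [["A", "B", "C"], ["A", "B"], ["C"]]
    for pair, count in pair_counts(papers).most_common():
        print(pair, count)
```

The resulting pair counts are the edge weights of the network maps, and the number of records per author or keyword determines the size of each node.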
The Scopus database owned by Elsevier Ltd is used to extract the relevant data and information including citation counts and journal ranking. It is argued that Scopus covers a greater number of journals compared to the main alternative, the Web of Science (WoS) (Baier-Fuentes et al., 2019;Mongeon & Paul-Hus, 2016). Consequently, we choose Scopus over the WoS, as it covers a greater number of publications cited our selected methodological papers and provides more flexible output formats for our bibliometric analysis. The data was collected during 2021. The covered search period was from 1998, when the first article published by Simar and Wilson (1998), to the end of 2020. Figure 2 illustrates the influential methodological papers that introduce or extend statistical approaches in the context of non-parametric models. The left vertical axis indicates the number of total citations while the right axis represents the average citation per year. As can be seen, the two most cited papers are Simar and Wilson (2007) and Simar and Wilson (1998) with 1586 and 1098 total citations, respectively. Figure 2 also reveals the average number of citations per year for these two papers are significantly higher than the other methodological papers. Therefore, for this bibliometric analysis we mainly focus on these two most influential articles, analysing the trend of their citations, and other relevant bibliometric indicators.  Figure 3 shows the trend of citations of those most influential methodological papers. As can be seen, the number of citations of Simar and Wilson (1998) is less than 20 per annum until 2007, however, a surge in citations is evident after 2008. It is likely a lack of knowledge about the method and user-friendly software applications are responsible for this10-year gap between the introduction of the method and its widespread application. On the contrary, Simar and Wilson (2007) was highly cited in a short span of time. 
For example, it was cited 93 times in 2010 only three years after its publication. This high citation level may be a result of the release of the FEAR package in R by Wilson (2008), combined with a decade effort by Simar and Wilson to promote their bootstrap approach. Figure 3 also details that while the number of citations of Simar and Wilson (1998) is relatively flat after its peak in 2013, the number of citations for Simar and Wilson (2007) continues its ascent reaching 238 citations in 2020. Table 1 provides the list of top fifteen authors citing Simar and Wilson's papers published in 1998 or 2007. As is described, most authors are domiciled in European countries. Barros 6 as the most productive author has 50 published papers and is the listed first author in 27 papers. His prevalence suggests that he has played a significant role in both the promotion and application of the bootstrapping method. His research in the context of non-parametric  Simar and Wilson (1998) and Simar and Wilson (2007) efficiency analysis sees him collaborate with 31 individual co-authors. Collaborations with Wanke and Assaf (3rd and 7th authors in Table 1) are the highest, with 18 and 13 co-authored papers, respectively. Not surprisingly, Simar is the second productive author with 47 articles. Simar has 21 individual co-authors, and he authors with Wilson (5th author in Table 1) with 23 co-authored papers. The third most productive author is Wanke with 42 papers and 39 individual co-authors. To provide a comprehensive picture of co-authorship connections, we analyse co-authorship network of key researchers. Figure 4 reveals the co-authorship network of researchers citing Simar and Wilson (1998) or Simar and Wilson (2007). Further, it highlights how the productive authors listed in Table  1 are connected and disseminated their knowledge to other researchers worldwide. 7 To draw Fig. 
4, the co-authorship network was restricted to researchers with at least 3 articles, resulting in 466 authors. Removing authors with limited connections resulted in 236 authors in the final network map, as shown in Fig. 4. Note that the size of each circle reflects the number of documents published by that author. As can be seen, key authors form the clusters (shown in different colours) and/or act as a bridge between two or more clusters. For example, while Barros is the key person in the orange cluster, he connects to Wanke, who acts as a bridge between the orange and green clusters. Figure 4 can also provide useful information on how a new method or approach in efficiency analysis is disseminated among researchers in different countries or institutions. Interestingly, Fig. 4 shows that Simar and Wilson's co-authorship cluster (shown in dark blue) is relatively smaller than some other clusters and that they have worked directly with a limited number of co-authors. However, further investigation reveals that Simar has run many workshops and seminars and has trained European researchers. This suggests that lectures and seminar involvement may be a useful approach to introducing new methods in efficiency analysis. In addition, direct training via PhD candidatures or post-doctoral fellowships also seems effective. For example, Fig. 4 shows that Zelenyuk (twelfth in Table 1) played a key role as a bridge between the purple cluster and several other clusters. Zelenyuk was a postdoctoral research fellow at the Catholic University of Leuven, working with Simar between September 2004 and August 2005.

Figure 5 presents the co-authorship network map of countries. This network is based on the 70 countries with at least 3 documents (the total number of countries was 114). Croatia has also been removed from the map, as it has no link with other countries. As can be seen, the USA and UK have the highest number of documents and, interestingly, many developing countries appear in Fig.
5, indicative of the global application of the bootstrapping method. Figure 5 also shows how researchers from different countries are connected and form clusters. For example, the link between Belgium and Italy demonstrates the close relationship between researchers in these countries, as two of the fifteen highly productive authors are based in Italy (as shown in Table 1). Further, our investigation reveals that Daraio (14th in Table 1) is a former PhD candidate of Simar, again highlighting the importance of direct connection with and training of PhD candidates, which in turn disseminates techniques and develops new knowledge.

Table 2 provides the list of academic journals with at least ten articles citing Simar and Wilson (1998) or Simar and Wilson (2007). It reveals that the European Journal of Operational Research, Journal of Productivity Analysis and Applied Economics, with 110, 89 and 53 articles, respectively, are the three most utilised journals. This result is consistent with Fig. 2, which shows that the European Journal of Operational Research and Journal of Productivity Analysis had the highest number of methodological papers on statistical developments in the context of non-parametric methods. Note that the percentage and cumulative percentage columns are based on the total of 2,306 documents, including journal and conference publications, that cite Simar and Wilson (1998) or Simar and Wilson (2007). As shown in Table 2, the 35 listed journals cover more than a third of the total documents. The list also highlights the industries in which the bootstrap approach has been employed, including health, air transport, energy, environment, and banking.

Figure 6 shows the most frequently used keywords in papers citing Simar and Wilson (1998) or Simar and Wilson (2007). Choosing keywords with at least 5 recurrences resulted in 846 keywords. Removing common keywords, including DEA, efficiency and bootstrapping, improved the clarity of the keyword co-occurrence network. As shown in Fig.
6, technical efficiency and productivity are commonly used keywords located at the centre of the network. A continuum of colours is used to indicate the average year of publication for each keyword. Accordingly, a shift from traditional topics such as banking, air transportation and health to new areas such as environmental efficiency, water treatment and sustainability can be seen.

A keyword co-occurrence network map can highlight the most common research areas in different countries or regions. We highlight the relevant parts of Fig. 6 for the top five countries in the appendix to provide a clear picture of the link between each country and its keywords. Specifically, Figs. 7, 8, 9, 10 and 11 illustrate the keyword networks of China, Italy, Spain, Brazil, and the United States, respectively. As can be seen, these figures show both the average year of publication and the research areas. For example, comparing Figs. 7 and 11 shows that China, as a keyword, has been used in more recent studies than the United States. In other words, efficiency studies on China have attracted more recent attention. The comparison also reveals that more topics are relevant to China than to other countries. Specifically, efficiency studies related to China have addressed a wide range of research topics, especially recent research areas (coloured yellow) such as eco-efficiency, sustainability, and wastewater treatment. Figs. 7, 8, 9, 10 and 11 also help us identify the areas where frontier methods have been applied in each country. For example, Fig. 7 shows that in China the focus of efficiency analysis has been on areas such as airline, airport, agriculture, energy efficiency, health care, retailing, sustainability and environmental efficiency. Similarly, Figs. 8, 9, 10 and 11 reveal the research areas in the other countries, respectively, as follows. In Italy, they include airline, airport, health care, hospitals, local government, tourism, and water industry.
Efficiency studies conducted in Spain are mainly on agriculture, education, energy, food industry, hospital, local government, sustainability, port operation and water industry. Studies in Brazil also focus on areas such as airport, agriculture, electricity, energy, local government, port, public health, and railway transport. Finally, efficiency analysis in the United States covers areas such as airline, airport, electricity, energy, government, health care, higher education, hospital, power plants, sustainability, utility sector and water. As can be seen, across all five countries, health-related studies are the most common research area, followed by airline, airport, and environment.
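
The thresholding steps described above for the keyword network (keep keywords with a minimum number of recurrences, drop overly common terms, then count how often the remaining keywords appear together) can be illustrated with a minimal Python sketch. This is not the software used to produce the figures in this paper; the function name `cooccurrence_network`, its parameters and the toy keyword lists are hypothetical and chosen only for illustration. The same counting logic would apply to a co-authorship network, with author names in place of keywords and a threshold of 3 documents.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(keyword_lists, min_occurrences=5, stopwords=()):
    """Build node and edge counts for a keyword co-occurrence network.

    keyword_lists: one list of keywords per citing document (hypothetical input).
    min_occurrences: keep only keywords found in at least this many documents.
    stopwords: overly common keywords to remove (e.g. "DEA", "efficiency").
    """
    stop = {s.strip().lower() for s in stopwords}
    # Normalise case, drop empty strings, de-duplicate within a document,
    # and remove the common keywords.
    cleaned = [
        sorted({k.strip().lower() for k in kws if k.strip()} - stop)
        for kws in keyword_lists
    ]
    # Number of documents in which each keyword appears.
    counts = Counter(k for kws in cleaned for k in kws)
    kept = {k for k, n in counts.items() if n >= min_occurrences}
    # Number of documents in which each pair of retained keywords co-occurs.
    edges = Counter()
    for kws in cleaned:
        retained = [k for k in kws if k in kept]
        edges.update(combinations(retained, 2))
    nodes = {k: counts[k] for k in kept}
    return nodes, edges


# Toy data, invented for illustration only.
papers = [
    ["DEA", "Technical efficiency", "Banking"],
    ["DEA", "technical efficiency", "Productivity"],
    ["Bootstrapping", "Technical efficiency", "Productivity"],
    ["Efficiency", "Banking"],
]
nodes, edges = cooccurrence_network(
    papers,
    min_occurrences=2,  # the analysis above uses 5; 2 suits the toy data
    stopwords=("DEA", "efficiency", "bootstrapping"),
)
print(nodes)
print(dict(edges))
```

In such a sketch, the node counts would drive the circle sizes and the edge counts the link strengths of a network map; clustering and colouring by average publication year are separate steps not shown here.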

Software applications
Daraio et al. (2019) provide a comprehensive review of software options for conducting efficiency and productivity analysis. Given that our emphasis is on the bootstrap approach, we focus only on those software applications that are capable of handling the bootstrap method in the non-parametric efficiency context. Owing to the increasing use of the bootstrapping approach, more software applications and packages have recently been developed, mainly by academics in the field. While some packages, such as FEAR designed by Wilson (2008), have been developed specifically to run bootstrap DEA models, some existing DEA software applications have added bootstrapping as a new option. Table 3 provides a list of common software applications and packages capable of running the bootstrap methods suggested by Simar and Wilson (1998) and/or Simar and Wilson (2007).

*The name of each software package, together with "bootstrap", "Simar" and "Wilson", was used to search for the number of documents mentioning the package. This search was conducted using Google Scholar on 14 February 2022.

As shown in Table 3, there exists a wide range of platforms, from MS Excel to R, which are used to develop packages capable of running the bootstrap method. R is a popular, free, open-source statistical software environment that is widely used by academics to create novel statistical packages. There are more bootstrap packages in R than on other platforms, and the FEAR package is by far the most used program.

Conclusions
Simar and Wilson (1998) propose a bootstrap DEA approach which addresses one of the main shortcomings of non-parametric efficiency analysis methods such as DEA and FDH. The method provides statistical properties of efficiency estimates, including bias and confidence intervals. In the last two decades, statistical approaches, including the bootstrap method, have been widely used in DEA efficiency estimation. Simar and Wilson (1998) and Simar and Wilson (2007) have been highly cited since their publication; yet, despite the widespread popularity of their method, to the best of our knowledge, no comprehensive review or bibliometric analysis has been undertaken. Addressing this gap, this study not only reviews and summarises the influential statistical approaches, but also provides a bibliometric analysis of the two most influential papers in the field, Simar and Wilson (2007) and Simar and Wilson (1998), with 1586 and 1098 citations, respectively, by 2020.

We utilised a range of bibliometric indicators, including citations per paper, citations per year, productive journals, frequent keywords and networks of co-authors and countries, revealing useful information about the most influential methods and the potential factors driving their success. For example, we identified the most productive authors, highlighting their role in disseminating the bootstrap approach. Further, we revealed the most common research areas in different countries, and their trends, using keyword co-occurrence network analysis. Our results indicate how future methodological advancements can be highlighted, promoted and effectively disseminated. Finally, we revealed the important role that workshops, seminars and lectures, co-authorship, and user-friendly software applications play in enhancing the use of methodological advancements in the field.