Introduction

The oil industry plays a key role in the economy. Fluctuations in the price of oil greatly influence economic variables (Cuñado and Perez de Gracia, 2003). The price of energy is an important component of the cost of many goods and services; as the current scenario shows, rises in energy prices bring about fast growth in inflation rates (International Monetary Fund, 2022; ECB, 2018). Growing awareness of climate change, greenhouse gas (GHG) emissions, and sustainability has also fostered the interest of policymakers and citizens in this industry, whose environmental impact is substantial (Mariano & La Rovere, 2017).

One widespread approach to assess the performance of an industry is to explore its efficiency. There are two main tools to carry out this analysis: stochastic frontier analysis (SFA) and data envelopment analysis (DEA). Both take as their starting point data on inputs and outputs for a group of decision-making units (DMUs), typically companies. SFA, pioneered by Aigner et al. (1977) and Meeusen and Broeck (1977), takes a parametric, statistical approach: it estimates a production function and captures inefficiency in the error term. DEA, proposed by Charnes et al. (1978), is instead a non-parametric technique which constructs a best practice frontier by linear programming, computes the relative distance of DMUs to the frontier, and generates an efficiency score for every DMU in the sample.

In recent decades DEA has been employed extensively and successfully for the assessment of many industries, including energy (see Emrouznejad & Yang, 2018, for a review). Here the literature has focused primarily on the analysis of electricity and energy saving; the exploration of other aspects, such as the oil industry, has been sparser in comparison despite its relevance (Sueyoshi et al., 2017; Zhou et al., 2008).

This paper contributes to filling this void in the literature by analyzing the efficiency of oil firms operating in Europe over 2010–2019. In particular, this exploration has the following goals. First, it aims to shed more light on the recent evolution of efficiency in this industry. Second, it intends to identify managerial and macroeconomic aspects associated with efficiency. Third, it seeks to provide practical recommendations for firm managers, stakeholders, and policymakers related to this industry.

We pursue our investigation empirically by computing efficiency scores for a large sample of European oil companies using a DEA baseline model. Additionally, we explore the statistical association between efficiency and various managerial and macroeconomic variables, such as size, activity, environmental policies, financial management, human resource management, oil price, and economic activity, among others.

In accord with part of the literature (Bang et al., 2019; da Silva et al., 2019; Lee et al., 2009; McDonald, 2009; Miao et al., 2020; Sağlam, 2018, among others), we have divided the empirical analysis into two stages: in the first one we compute efficiency scores; in the second stage, we explore the correlation between the efficiency scores estimated in the first stage and different sets of variables.

We have combined several DEA tools for this investigation. We have worked with the baseline DEA model in the first stage and the Tobit specification in the second. Since these methodologies are not free of limitations, we have complemented this exercise with another one, grounded upon the Simar and Wilson methodology (1998; 2000; 2007) and performed by means of bootstrap. This procedure provides a natural robustness test for our findings with baseline DEA and Tobit. By and large, the main messages obtained with the first approach carry over when the Simar-Wilson methodology is employed.

One of the aspects of firm management we are interested in is the environmental concern. One potential strategy to explore this issue is to employ DEA models with desirable and undesirable outputs (non-polluting and polluting, respectively). A relevant line of research pioneered by Sueyoshi and Goto (2012a, b), among others, has coined and assessed the distinction between operational and environmental efficiency.Footnote 1

Our approach in this paper, however, is largely determined by data. We have access to a rich dataset of microdata which enables us to explore and compare the performance of around 300 companies; as far as we know, however, detailed data on environmental aspects of companies on such a large scale are not yet available, either in the dataset we are using or in other compilations. A thorough review of environmental reporting by companies has enabled us to gather environmental data, but only for a reduced subsample of the original sample. Hence, the computation of environmental efficiency along the lines of Sueyoshi and Goto (2012a) is not feasible for our complete sample. For the purpose of this paper, we have opted to exploit our primary dataset as much as possible and focus on operational efficiency. Nonetheless, we complement the information for the large sample with the environmental data we collected and employ this combination in the second stage of our empirical analysis, as detailed below.

This paper contributes to the literature in several ways. First, it provides an exploration of a sample of European oil companies; this is a geographical area which has not been covered enough by the DEA literature so far, despite its large theoretical and practical interest. As the “Related literature and theoretical framework” section discusses in more detail, there are empirical analyses for US, Asian, and Middle East companies and for general samples of firms worldwide. There are almost no studies, however, for samples of European companies.Footnote 2 We think it is important to fill this void and focus on firms operating in Europe for a number of reasons. From a theoretical standpoint, Europe is an appealing area for comparative economic analyses because it combines a substantial degree of economic convergence and homogeneity with idiosyncratic features and diversity across regions and countries. These characteristics imply that, while the performance of companies operating in Europe is comparable, it also displays significant differences which the researcher may exploit to identify trends and patterns.

There are other, more practical reasons which justify the investigation of the European oil industry. Europe is an important agent in the worldwide oil sector. In 2019, the EU represented 27% of global imports of crude oil and 30% of global imports of oil products (International Energy Agency, 2022).Footnote 3 At the macroeconomic level, this high dependence on other areas poses major risks to the European population if geopolitical tensions arise, as we are currently seeing with regard to Russian oil. At the microeconomic level, moreover, there is some concern about the performance of the European oil sector in recent years, since policy reports have detected low utilization rates, high operating costs, and overcapacity (Lukach et al., 2015). It is true that there has been some progress in the recent past in the correction of these shortcomings, but the improvement has not been sufficient. Increasing competition from other areas (notably Asia) is jeopardizing the survival of European firms (Nivard & Kreijkes, 2017). The dependence of Europe on external energy sources (a concern for the European authorities) could be reduced if the European oil industry reached higher levels of efficiency and competitiveness.

Another contribution of our paper is related to the data we use. We have gathered a rich dataset covering around 300 firms over a 10-year period. This allows us to work with a sizeable number of observations and explore in detail the association of efficiency with size, activity, country, region, and other aspects, obtaining interesting insights for managers, potential investors, and other stakeholders of oil companies. Third, our analysis provides specific policy recommendations which can guide the industrial policy strategy regarding this sector, primarily in Europe but also in other geographical areas.

Our investigation is grounded upon the hypothesis that the level of efficiency in European firms is not high, since policy reports have already alerted about various dysfunctions in the European oil sector. Another hypothesis is that there must be some association between efficiency and size, because of the presence of increasing returns to scale linked to the heavy requirements of technology and capital for this industry. We also believe that firms present features which are linked to efficiency and can be approximated by means of variables; thus, the design and estimation of appropriate models may help identify the sign and magnitude of the association of these variables with efficiency.

The structure of this paper is as follows: The “Related literature and theoretical framework” section summarizes the related literature and our theoretical framework. The “Methodology” section describes our methodology. The “Data and variables” section discusses our data and variables. The “Empirical results” section details our baseline empirical exercise, and “Sensitivity analysis” carries out a robustness test. The “Concluding remarks and policy recommendations” section concludes and offers policy recommendations.

Related literature and theoretical framework

Recent contributions have analyzed the performance of efficiency in oil companies. By and large, the most common tool for this analysis has been DEA, with the number of employees and a proxy for assets as inputs and revenue or production as output.

Table 1 compiles a set of recent papers in this literature. Some of these papers focus on the performance of oil firms from a particular country such as China (Hanrui & Xun, 2011; Song et al., 2015), India (Vikas & Bansal, 2019), the USA (Atris & Goto, 2019; Mekaroonreung & Johnson, 2010; Sueyoshi and Wang, 2014; 2018), and Indonesia (Putra & Adinugraha, 2018). Others, instead, pursue an exploration of OECD countries (Lim & Lee, 2020) or the whole world (Eller et al., 2011; Ohene-Asare et al., 2017; Sun et al., 2017). In general, efficiency in emerging countries grows over time (Hanrui & Xun, 2011), whereas it exhibits a decreasing trend in developed countries (Lim & Lee, 2020).

Table 1 Efficiency in oil companies, selected papers

Most of the literature goes beyond the mere discussion of efficiency trends and explores their potential drivers along different dimensions, usually by means of regressions where the dependent variable is efficiency and the regressors are variables capturing the potential drivers. Managerial practices and other variables capturing aspects which are internal to companies are important candidates in this regard.

Company size may be associated with efficiency when the production function does not exhibit constant returns to scale. In fact, the oil sector is heavily dependent on fixed assets (plants, machinery, equipment) and state-of-the-art technology, which can be best capitalized when the number of units produced is large, entailing in turn increasing returns to scale (Romer, 1990).Footnote 4

This aspect has also been addressed by the empirical literature on this industry. Sueyoshi and Wang (2014) suggest that integrated companies operating along the entire supply chain outperform independent companies because integration facilitates economies of scale. Atris and Goto (2019) argue that large companies outperform small ones in terms of efficiency. Vikas and Bansal (2019) detect inefficiencies of scale in a significant number of firms included in a sample of Indian oil and gas listed companies.

It is also possible that there are competitive advantages associated with niches, thus benefiting small firms. This kind of pattern has also been detected in other industries, where the efficiency of very big and extremely small firms dominates the rest, due in turn to the coexistence of increasing returns to scale and niche specialization (Díaz & Sanchez-Robles, 2020, 2022). Some papers suggest this possibility. For example, Ismail et al. (2013) document for their sample that large and small oil firms outperform medium-sized companies in terms of efficiency. Mekaroonreung and Johnson (2010) find that very specialized, small firms outperform their counterparts in a sample of US oil companies.

The literature has also documented the impact of labor costs and human resource management on the performance of the firms in this sector. According to Lukach et al. (2015), operating costs are higher in European companies than in those of other regions and exhibit an increasing trend, partly because of personnel expenses. Al-Najjar and Al-Jaybajy (2012) find that an excessive number of employees damages efficiency in a sample of oil firms from Iraq.

Financial stability has also been found to be a factor associated with efficiency (Díaz & Sanchez-Robles, 2020, 2022). The financial structure is usually crucial for capital-intensive companies and may have an important effect on their performance. On a priori grounds, it can be argued that more leveraged balance sheets have a positive impact on the firm, since external finance is normally cheaper than the return requested by shareholders. The opposite link may also exist if information asymmetries prevail in financial markets, since under this hypothesis, banks usually charge a higher interest rate premium to more indebted firms, which are harder to monitor and riskier (Bernanke & Gertler, 1995). It is, thus, an empirical issue which depends on the degree of competitiveness of financial markets and must also be elucidated for each sector.

Another managerial aspect we are interested in is the relationship between efficiency and environmental awareness. A priori, it is not clear whether there is a positive or negative correlation between efficiency and environmental concern. Both scenarios are feasible from a theoretical point of view. If firms must incur large outlays to operate sustainably, a trade-off may exist between efficiency and environmental concern.Footnote 5 If state-of-the-art, more advanced technologies simultaneously allow a better utilization of inputs and a reduction of undesirable outputs, then efficiency and environmental concern may be positively associated. Empirical evidence in this regard is mixed. Boyd and McClelland (1999) and Berman and Bui (2001) document a positive relationship between efficiency and environmental awareness for US firms. Ismail et al. (2013) arrive at the same conclusion using a sample made up of firms worldwide.

Instead, Song et al. (2015) report a negative association for 20 listed Chinese firms. Mekaroonreung and Johnson (2010) analyzed the efficiency of 113 US oil refineries in 2006 and 2007. They show that environmental regulations have damaged company efficiency, although this impact is smaller for efficient firms. Sueyoshi and Wang (2018) find that the environmental pressure exerted by regulations and stakeholders has damaged efficiency in a sample of 30 US firms.Footnote 6

The literature has also explored the association of external variables with efficiency. Key variables for the oil industry are the oil price, which proxies for supply, and the economic activity, which captures the demand side of the market. The sign of the impact of oil price on efficiency is not straightforward. Growing oil prices should positively affect oil production (Friedman, 1992); low prices, however, could also increase output if companies try to offset this factor by expanding production (Mohaddes & Pesaran, 2016). In this regard, Sueyoshi and Wang (2018) detect a positive association between oil prices and efficiency while Putra and Adinugraha (2018) do not.

It is also possible that efficiency is affected by factors associated with the country of origin of companies (through features like government policies, the institutional environment, and the education of the labor force, among others). Putra and Adinugraha (2018), for example, document a negative correlation between government intervention and efficiency for their sample.

The discussion of the literature suggests, therefore, an association between efficiency on the one hand and aspects such as size, human resource management, financial management, environmental policies, country characteristics, and oil price on the other. We explore in more detail these associations in the second stage of our empirical exercise by means of the estimation of several models relating efficiency to variables capturing these aspects.

Methodology

Our methodology in this paper is empirical and carried out in two stages. Figure 1 provides a visual summary of the methodology. In the first stage, we employ DEA to compute efficiency scores from data for each year and company in the sample. In the second stage, we construct a panel; then, we use regression analysis and the efficiency scores estimated in the first stage to examine, qualitatively and quantitatively, several variables potentially associated with efficiency. This section discusses this methodology in more detail, distinguishing between the first and second stages.

Fig. 1 The methodology in this paper. A summary. Note: The Figure illustrates our methodology in this paper. It is divided in two stages. In the first stage, we use data on inputs and outputs to get efficiency scores per company and year in the sample. The tool to perform this computation is DEA. In the second stage, we estimate regression models using the efficiency scores from stage 1 as the dependent variable and data on various variables which capture internal and external aspects of companies as regressors. Regression models are estimated according to different procedures (Tobit, Simar-Wilson) and provide quantitative and qualitative results

First stage

In the first stage, we employ data on inputs and outputs for the firms in our sample. We get an efficiency score per company and year. For this computation, we use DEA, a popular tool in operations research, whose main features we describe next.

Essentially, DEA constructs a technological frontier from data about inputs and outputs of companies. This frontier can be understood as the geometrical locus of production plans in an input/output space.Footnote 7 The best performers in the group lie on the frontier and register the maximum efficiency score by definition. DEA computes efficiency scores for the rest of the companies (placed inside the frontier) according to their distance to the best performers.

In terms of the solution strategy, DEA solves a linear programming problem, searching for a vector in the feasible set satisfying the optimality criteria. Both the objective function and the constraints are linear equations.Footnote 8 Constraints specify technical and economic conditions, typically in the form of total available quantities of inputs (in the output-oriented model), desired levels of output (in the input-oriented model), and non-negativity of variables representing quantities. The search for the solution requires the use of algorithms when the number of outputs and inputs (and hence the dimensionality of the problem) gets large.Footnote 9

The theoretical setup for DEA is grounded on the following, very general assumptions (Simar & Wilson, 1998)Footnote 10:

  1. There are n DMUs indexed by j (j = 1, …, n), and a technology Γ transforming inputs into outputs:

    $$\Gamma =\left\{\left(x,y\right): y\;\mathrm{can\;be\;produced\;by}\;x \right\}$$
    (1)

    where xj is a vector of m inputs and yj is a vector of s outputs for DMU j.

    $$\begin{array}{c}{x}_{j}= \left({x}_{1j }, \dots , {x}_{mj}\right) \in {R}^{m}\\ {y}_{j}= \left({y}_{1j }, \dots , {y}_{sj} \right) \in {R}^{s}\end{array}$$
    (2)

    The feasible set in this problem is the production possibility frontier P(x) or, alternatively, the input requirement set L(y):

    $$P\left(x\right)\equiv \left\{y: \left(x, y\right)\in\Gamma \right\}$$
    (3)
    $$L\left(y\right)\equiv \left\{x: \left(x, y\right)\in\Gamma \right\}$$
    (4)

    where P(x) is closed, convex, and bounded for all x \(\in {R}^{m}\) and L(y) is closed and convex for all y \(\in {R}^{s}.\)

  2. Inputs and outputs are freely disposable, i.e., technology is monotonic.

  3. Given a vector of control variables z, which are potentially associated with efficiency, there exists a density function f(x,y|z), strictly positive and continuous.Footnote 11

  4. Within the production possibility frontier, technology is differentiable.

Since DEA is a non-parametric technique, there is no need to assume a specific functional form (such as Cobb–Douglas or Translog) for the technology mapping inputs to outputs. The only crucial technological feature to be specified is whether returns to scale are constant or variable.

In this paper, we work with the input-oriented setup of the problem because it is, in our view, intuitively more appealing and closer to the actual practice in firms and other institutions.Footnote 12 For the specific case of oil companies, the assumption of variable (increasing) returns to scale seems more realistic: the installation, maintenance, and upgrading of the necessary technology, infrastructure, and equipment entail large fixed costs which bring about decreasing average costs in production.

Define θj as the input-oriented efficiency score for DMU j, with \(0<{\theta }_{j}\le 1.\) Intuitively, θj informs of the reduction in inputs which DMUj should carry out in order to become efficient, relative to the best performers in the sample. In this setting, the best performers attain an efficiency score of 1 by construction. To compute an efficiency score for every DMU in the sample means solving Eq. (5) for each unit:

$$\mathrm{min}\left\{\theta >0: \theta x\ge {\sum }_{j=1}^{n}{\lambda }_{j}{x}_{j}; y\le {\sum }_{j=1}^{n}{\lambda }_{j}{y}_{j}, {\sum }_{j=1}^{n}{\lambda }_{j}=1, { \lambda }_{j}\ge 0, j=1, \dots , n\right\}$$
(5)

where λ stands for the set of multipliers in the linear combinations of the DMUs’ inputs and outputs, i.e., the weight of each DMU within the peer group of DMUs.Footnote 13 The constraint \({\sum }_{j=1}^{n}{\lambda }_{j}=1\) is the convexity condition associated with the variable returns to scale assumption.
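As an illustration of how Eq. (5) can be solved in practice, the following is a minimal sketch in Python. It is not the implementation used in this paper: the function name `dea_input_vrs`, the use of SciPy's `linprog` solver, and the four-unit toy dataset are all assumptions introduced here for exposition. The decision vector stacks θ and the multipliers λ, and one linear program is solved per DMU.

```python
import numpy as np
from scipy.optimize import linprog


def dea_input_vrs(X, Y):
    """Input-oriented DEA scores under variable returns to scale (Eq. 5).

    X: (n, m) array of inputs, Y: (n, s) array of outputs, one row per DMU.
    Returns an array with one efficiency score per DMU.
    Hypothetical helper written for illustration only.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n, m = X.shape
    s = Y.shape[1]
    scores = np.empty(n)
    # Decision vector: [theta, lambda_1, ..., lambda_n]; the objective is theta.
    c = np.r_[1.0, np.zeros(n)]
    # Convexity condition: the lambdas sum to one (variable returns to scale).
    A_eq = np.r_[0.0, np.ones(n)].reshape(1, -1)
    b_eq = [1.0]
    bounds = [(0, None)] * (n + 1)
    for o in range(n):
        # Inputs: sum_j lambda_j * x_j - theta * x_o <= 0
        A_in = np.hstack([-X[o].reshape(m, 1), X.T])
        # Outputs: -sum_j lambda_j * y_j <= -y_o
        A_out = np.hstack([np.zeros((s, 1)), -Y.T])
        A_ub = np.vstack([A_in, A_out])
        b_ub = np.r_[np.zeros(m), -Y[o]]
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                      bounds=bounds, method="highs")
        scores[o] = res.fun
    return scores


# Toy data (hypothetical): two inputs, one output, four DMUs.
X = np.array([[2.0, 4.0], [4.0, 2.0], [4.0, 4.0], [8.0, 8.0]])
Y = np.ones((4, 1))
scores = dea_input_vrs(X, Y)  # approximately [1.0, 1.0, 0.75, 0.375]
```

In the toy data, the first two units lie on the frontier, while the third could produce the same output with 75% of its inputs by imitating an equal-weight combination of the first two. Dropping the equality constraint on the multipliers would yield the constant returns to scale version of the problem.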

In the particular case of this paper, this first stage provides a set of efficiency scores for each company and year in our sample, which plays an important part in the second stage of the analysis.

Second stage

In the second stage, we design a statistical model to explore the correlation between efficiency scores from the first stage, on the one hand, and different aspects of firm management and the macroeconomic and institutional setup where companies operate, on the other. The literature has already successfully employed this method (Bang et al., 2019; da Silva et al., 2019; Lee et al., 2009; McDonald, 2009; Miao et al., 2020; Sağlam, 2018, among others).

More formally, we estimate Eq. (6).

$$\widehat{\theta }_{it}={z}_{it} \beta +{\epsilon }_{it}$$
(6)

The dependent variable \(\widehat{\theta }\) in Eq. (6) is made up of the efficiency scores constructed in stage 1 for each firm and year within the framework of baseline DEA and discussed in the previous subsection. The controls in z are internal and external variables, as described in “Data and variables.” Since the dependent variable and most of the regressors exhibit time and cross-section dimensions, we organize our data in a panel, with i indexing firms and t indexing time.Footnote 14 This practice allows us to exploit heterogeneity across firms and over time and to increase the degrees of freedom in our estimation. β is a vector of parameters to be estimated, and εit is the error term.

We have chosen a Tobit model for the functional form of Eq. (6); it seems appropriate since efficiency scores are censored by construction at the maximum value of efficiency, 1. Nonetheless, this assumption will be relaxed in “Second stage results” below. Parameters are estimated by maximum likelihood, as in previous research on DEA for oil companies (Dalei & Joshi, 2020; Putra & Adinugraha, 2018). The model has been specified with random effects because fixed effects yield inconsistent estimates in non-linear models (Greene, 2004). Moreover, in fixed effects models, covariates with little or no variability over time, such as many of our categorical variables, are not identified.

The model may be characterized more precisely by Eq. (7).

$$\begin{array}{c}{\widehat{\theta }}_{it}^{*}= {z}_{it}\beta +{u}_{i}+{e}_{it}\\ \begin{array}{ccc}{\widehat{\theta }}_{it }=1\ if\ {\widehat{\theta } }_{it}^{*}\ge 1;& {\widehat{\theta }}_{it }= {\widehat{\theta }}_{it}^{*}\ if\ 0\le {\widehat{\theta }}_{it}^{*}\le 1;& {\widehat{\theta }}_{it }=0\ if\ {\widehat{\theta }}_{it}^{*}\le 0\end{array}\\ \begin{array}{c}\begin{array}{cc}{u}_{i}\sim N\left(0, {\sigma }_{u}^{2}\right);& {e}_{it}\sim N\left(0, {\sigma }_{e}^{2}\right)\end{array}\\ \begin{array}{cc}i=1, 2, \dots , n;& t=2010, 2011, \dots , 2019\end{array}\end{array}\end{array}$$
(7)

where \({\widehat{\theta }}_{it}^{*}\) is the latent or unobservable efficiency, \({\widehat{\theta }}_{it}\) is the observable efficiency, ui is the panel level variance component, and eit is the overall variance component. In general, the unobservable and the observable efficiency coincide except in the case of the most efficient DMUs, where efficiency does not surpass the threshold of 1 by construction.
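To make the censoring mechanism in Eq. (7) concrete, the sketch below writes the log-likelihood of a two-limit Tobit model and maximizes it on synthetic data. It is a deliberate simplification and not the estimator used in this paper: it is a pooled model that omits the firm-level random effect ui, the function names `tobit_negloglik` and `fit_pooled_tobit` are hypothetical, and the simulated coefficients (0.6, 0.3) and standard deviation (0.2) are arbitrary values chosen for illustration.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm


def tobit_negloglik(params, y, Z, lower=0.0, upper=1.0):
    """Negative log-likelihood of a pooled Tobit censored at lower and upper.

    params holds the coefficients followed by log(sigma).
    Simplification: no random effect u_i, unlike Eq. (7).
    """
    beta, log_sigma = params[:-1], params[-1]
    sigma = np.exp(log_sigma)  # parameterized in logs so sigma stays positive
    mu = Z @ beta
    at_lower = y <= lower
    at_upper = y >= upper
    inside = ~(at_lower | at_upper)
    ll = np.zeros(y.shape[0])
    # Censored observations contribute the probability mass beyond the limit.
    ll[at_lower] = norm.logcdf((lower - mu[at_lower]) / sigma)
    ll[at_upper] = norm.logsf((upper - mu[at_upper]) / sigma)
    # Uncensored observations contribute the normal density.
    ll[inside] = norm.logpdf((y[inside] - mu[inside]) / sigma) - log_sigma
    return -ll.sum()


def fit_pooled_tobit(y, Z):
    """Maximum likelihood fit, started from OLS estimates."""
    y = np.asarray(y, dtype=float)
    Z = np.asarray(Z, dtype=float)
    beta0 = np.linalg.lstsq(Z, y, rcond=None)[0]
    resid = y - Z @ beta0
    start = np.r_[beta0, np.log(resid.std() + 1e-8)]
    res = minimize(tobit_negloglik, start, args=(y, Z), method="BFGS")
    return res.x[:-1], np.exp(res.x[-1])


# Synthetic example (all numbers are illustrative assumptions).
rng = np.random.default_rng(0)
n_obs = 2000
Z = np.column_stack([np.ones(n_obs), rng.normal(size=n_obs)])
latent = Z @ np.array([0.6, 0.3]) + rng.normal(scale=0.2, size=n_obs)
y = np.clip(latent, 0.0, 1.0)  # observed score: latent score censored to [0, 1]
beta_hat, sigma_hat = fit_pooled_tobit(y, Z)
```

The estimates `beta_hat` and `sigma_hat` should lie close to the values used in the simulation. A random-effects version would additionally integrate the likelihood over the distribution of ui for each firm, which is typically done by numerical quadrature in standard econometric software.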

Heteroskedasticity is a frequent problem in panel data estimations. In order to handle this issue, we have performed the estimations with observed information matrix (OIM) corrected standard errors. Regressors have been included in the equation sequentially to reduce the risk of multicollinearity.

Data and variables

Data

Our sample is composed of around 300 companies operating in Europe. Most of them are European firms; there are also affiliates of non-European multinationals working on European soil. Our time horizon is 2010–2019.

We have constructed a rich and detailed dataset with yearly information at the firm level. We have organized data in variables, which can be classified into internal and external (see Appendix 1 Table 14 for details of all the variables defined). Internal variables refer to key aspects of the business which can be controlled by the company (number of employees, total assets, turnover, size, activity, financial management, human resources management, environmental policies). External variables proxy for relevant features of the macroeconomic and institutional environment where firms operate. Internal variables vary over two dimensions, company and time (we get one observation per company and year). External variables are the same for all (or a subset of) companies; some change on a year-by-year basis (such as the oil price); others are constant over time (such as country of origin).

Internal variables

Internal economic variables have been constructed from the Amadeus database (Van Dijk, 2021), a rich and detailed collection of microdata disaggregated at the company level on a yearly basis. We use a set of internal variables in stage 1 and another set of internal variables in stage 2.

In the first stage of our empirical exercise, and in order to compute efficiency scores, we need to employ internal variables which proxy for inputs and output for each firm and year. We approximate the input labor with the total number of employees per company and year. The input capital is approximated by the quantity of total assets, in euros, also per company and year. In turn, output is proxied by turnover (i.e., operational revenues) in euros.Footnote 15 The use of these variables is in accord with the literature in the area (Ismail et al., 2013; Song et al., 2015; Sueyoshi & Wang, 2014, among others).

Figure 2 provides a first synthetic approximation to the evolution of these variables over time for the firms in our sample (the figure is constructed with averages over companies). As Fig. 2a shows, the average number of employees diminishes abruptly until 2015 and is fairly stable thereafter. Average real assets (Fig. 2b) also exhibit a decreasing profile over most of the time horizon but grow slightly in 2018. Real turnover (Fig. 2c) increases at the beginning of the period, decreases between 2012 and 2016, and partially recovers at the end. The figure points to a mixed performance of the industry over the time horizon considered, with large oscillations in resources and production.

Fig. 2 a Average employees, b real assets, and c real turnover, 2010–2019. Note: Average real turnover and average real assets in thousand euros. They have been computed averaging over the companies in our sample. Source: Amadeus

In the second stage of the empirical exercise, we work with other internal variables which capture different aspects of firm performance and may be potentially associated with efficiency. Some of them have been already explored by the literature, as discussed in “Related literature and theoretical framework”. Most of these variables have been compiled or constructed from the data available in Amadeus.

One important aspect is firm size. To construct size indicators, first we have distributed the oil companies in our sample into five size categories: very big, big, medium, small, and very small. These categories are determined by the 90th, 75th, 50th, and 25th percentiles of real turnover (defined as nominal turnover over the Harmonized Index of Consumer Prices, HICP, for the EU).

The size categories are as follows:

  • Very big: for those companies whose real turnover is higher than the 90th percentile of real turnover in the sample.

  • Big: if real turnover is less than or equal to the 90th percentile and higher than the 75th percentile of real turnover in the sample.

  • Medium: if real turnover is less than or equal to the 75th percentile and higher than the 50th percentile of real turnover in the sample.

  • Small: if real turnover is less than or equal to the 50th percentile and higher than the 25th percentile of real turnover in the sample.

  • Very small: if real turnover is less than or equal to the 25th percentile of real turnover in the sample.

Next, we have created dummy (1,0) variables for each of those size categories. The dummy very big takes the value 1 for company i in time t if the real turnover of company i in time t is higher than the 90th percentile of average real turnover in the sample; otherwise, it takes the value 0. We have proceeded similarly for the big, medium, small, and very small categories.
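The classification rule and the associated dummies can be sketched as follows. This is an illustrative sketch under stated assumptions, not the code used to build our dataset: the helper names `size_category` and `size_dummies` are hypothetical, percentiles are computed over all observations pooled together, and NumPy's default (linearly interpolated) percentile definition is assumed.

```python
import numpy as np

# Ordered from smallest to largest size class.
LABELS = np.array(["very small", "small", "medium", "big", "very big"])


def size_category(real_turnover):
    """Assign each observation to a size class using sample percentiles."""
    rt = np.asarray(real_turnover, dtype=float)
    cutoffs = np.percentile(rt, [25, 50, 75, 90])
    # right=True gives class k when cutoffs[k-1] < value <= cutoffs[k],
    # so values at or below the 25th percentile fall in class 0 and
    # values strictly above the 90th percentile fall in class 4.
    idx = np.digitize(rt, cutoffs, right=True)
    return LABELS[idx]


def size_dummies(categories):
    """Build one (1, 0) dummy array per size class."""
    return {lab: (categories == lab).astype(int) for lab in LABELS}


# Hypothetical real turnover figures: the integers 1 to 100.
real_turnover = np.arange(1, 101, dtype=float)
categories = size_category(real_turnover)
dummies = size_dummies(categories)
```

With these hypothetical figures, the five classes contain 25, 25, 25, 15, and 10 observations, respectively, from very small to very big.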

Another dimension we want to explore is activity. The firms in the sample belong to a sector with NACE code 19, Manufacture of coke and refined petroleum products; this sector is classified in two subsectors: 19.1 (Manufacture of coke oven products) and 19.2 (Manufacture of refined petroleum products).

Ninety-four percent of the firms in our sample are refineries, and 6% are coke plants. We have captured the main activity of firms with another dummy variable, refineries, defined as 1 if a particular firm i is a refinery in time t and 0 otherwise.

We have defined two more internal economic variables. The variable employee cost captures the total cost of the payroll over the number of employees. It is a proxy of the unit cost of labor. The solvency ratio is defined as current assets/current liabilities. It measures the firm’s capacity to confront short-term financial obligations. These two variables inform about the human resource and financial management of the firm, respectively.
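In symbols, and using notation introduced here only for illustration (W for the total cost of the payroll, N for the number of employees, CA for current assets, and CL for current liabilities; these symbols are assumptions and do not appear elsewhere in the paper), the two ratios for firm i in year t can be written as:

$$\mathrm{employee\;cost}_{it}=\frac{{W}_{it}}{{N}_{it}},\qquad \mathrm{solvency\;ratio}_{it}=\frac{{CA}_{it}}{{CL}_{it}}$$

A solvency ratio above 1 means that current assets exceed current liabilities.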

Environmental aspects

The reduction and control of emissions and residuals have become an important element of the production processes of firms operating with fossil energies. In recent years, many companies in the sector have started to define and monitor environmental goals and to adopt less polluting practices. To explore the connection between environmental awareness and efficiency, we have gathered additional data and constructed two indicators.

Unfortunately, although sustainability reporting has improved considerably over the last decade, it is still largely in its infancy. Unlike economic and financial variables, there is not yet a unified source of detailed, quantitative information about environmental aspects, disaggregated by company. To circumvent this problem, we have collected and compiled a data set on environmental aspects manually, checking the annual sustainability reports of groups and companies individually for each year in the sample. Ismail et al. (2013) and Bang et al. (2019) also construct indicators from sustainability reports. Furthermore, most groups do not publish environmental information disaggregated at the plant or affiliate level. Instead, they prepare reports at the group level and disclose greenhouse gas figures for the whole group. We assume that affiliates and individual firms follow the directives and policies stated for the whole group, and thus, that the paths of emission reductions for the whole group and for the individual affiliates exhibit the same trend over time.

Detailed and timely environmental reporting may be considered a proxy of sound managerial green practices. We have defined a variable, sustainability reporting, reflecting the commitment of a company to the pursuit of sustainable goals; it is a categorical variable equaling 1 for company i in year t if its group has published a sustainability report in t and zero otherwise.Footnote 16

Many companies follow the guidelines of the EU and the GRIFootnote 17 and control the emissions of CO2 and other substances related to the GHG effect. The path of reduction in GHG emissions is a result of the measures implemented to achieve environmental efficiency. More commitment to the environment results in more investment aimed toward less polluting technologies. We have proxied the degree of commitment to clean technologies using the information conveyed by the data of scope 1 GHG emissions, taken again manually from yearly sustainability reports.Footnote 18 We focus on scope 1 because those are the emissions that the firm can manage more directly. In addition, they are the lion’s share of total emissions of the firms in our sample. Usually, CO2 is the main component. Sueyoshi and Goto (2012b) use CO2 as the undesirable input in their analysis of 19 oil firms.

We have defined a categorical variable, GHG reduction, equaling 1 for company i in year t if the decrease in GHG scope 1 emissions with respect to the previous year is higher than or equal to 1.5% (the average yearly change in GHG in our sample), and 0 otherwise. More formally, if we define “GHG emissionsi,t” as the variable capturing the scope 1 emissions for firm i and year t, then

$${\mathrm{GHG\;reduction}}_{i,\;t}=\left\{\begin{array}{c}1\;if\;\frac{{\mathrm{GHG\;emissions}}_{i,\;t}-{\mathrm{GHG\;emissions}}_{i,\;t-1}}{{\mathrm{GHG\;emissions}}_{i,\;t-1}}\le -0.015\\ 0\;otherwise\end{array}\right.$$

where GHG emissions for each firm and year are proxied by GHG emissions for its group and year.
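The definition above can be sketched in a few lines of code. The function name `ghg_reduction`, the dictionary layout, and the emission figures are hypothetical; the threshold of −0.015 is the one stated in the formula. The dummy is left undefined for years without an observation in the previous year, which is an assumption of this sketch.

```python
def ghg_reduction(emissions_by_year, threshold=-0.015):
    """Return {year: 1 or 0} following the GHG reduction definition.

    emissions_by_year maps a year to the group's scope 1 emissions.
    A year is skipped when the previous year is not observed.
    """
    dummy = {}
    for year in sorted(emissions_by_year):
        previous = emissions_by_year.get(year - 1)
        if previous is None:
            continue
        change = (emissions_by_year[year] - previous) / previous
        dummy[year] = int(change <= threshold)
    return dummy


# Hypothetical group-level scope 1 emissions (arbitrary units).
group_emissions = {2010: 100.0, 2011: 98.0, 2012: 97.5, 2013: 99.0}
flags = ghg_reduction(group_emissions)
# 2011: -2.0% -> 1; 2012: about -0.5% -> 0; 2013: increase -> 0
```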

Among the most successful firms in our sample in terms of GHG reductions, there are not only large firms such as ENI or BP but also smaller ones like Neste, INA, or Petrogal.

External variables

Several external variables help us assess the impact of the external economic and institutional environment. The first external variable is oil price, defined as the price in dollars per barrel of Brent-Europe crude. This information comes from Federal Reserve Bank of St. Louis (2021). To express this variable in real terms, nominal prices have been deflated with the Harmonized Index of Consumer Prices (HICP) for EU 27, from Eurostat. Economic activity is proxied by the growth rate of real GDP for EU 27. It has also been deflated with the HICP. Data on GDP come from Eurostat.
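As a rough illustration of these two transformations, the sketch below deflates a nominal series with a price index and computes a yearly growth rate. It assumes that the index equals 100 in its base year; the function names and all figures are hypothetical and are not taken from the data sources cited above.

```python
def deflate(nominal, price_index, base=100.0):
    """Express a nominal series in real terms: real = nominal * base / index."""
    return {year: nominal[year] * base / price_index[year] for year in nominal}


def growth_rate(series):
    """Yearly growth rate, (x_t - x_{t-1}) / x_{t-1}, where both years exist."""
    return {
        year: (series[year] - series[year - 1]) / series[year - 1]
        for year in series
        if year - 1 in series
    }


# Hypothetical figures for illustration only.
nominal_price = {2015: 50.0, 2016: 44.0}   # dollars per barrel
hicp = {2015: 100.0, 2016: 110.0}          # index, base year = 100
real_price = deflate(nominal_price, hicp)

nominal_gdp = {2015: 1000.0, 2016: 1133.0}
real_gdp = deflate(nominal_gdp, hicp)
real_gdp_growth = growth_rate(real_gdp)
```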

Figure 3a shows the performance of the real oil price, suggesting that real oil prices have indeed fluctuated substantially over the time horizon considered. Figure 3b illustrates the trajectory of economic activity over our time horizon.

Fig. 3 Real oil price and real GDP growth, 2010–2019. Notes: (a) Real oil price: dollars per barrel of Brent Europe, deflated with HICP for EU 27. (b) Real GDP growth: rate of growth of real GDP for EU 27, deflated with HICP for EU 27

Moreover, efficiency might also differ by country and by larger geographical areas (such as Western versus Eastern Europe) because of different degrees of technological diffusion, institutional environments, education of the labor force, and economic policy, among others. In order to test for these potential influences, we have defined other dummy variables which control for the country and geographical area of origin of companies. For example, the dummy Germany takes the value 1 for the companies operating in Germany and 0 for the rest. Likewise, the dummy Western Europe is 1 for companies in the area and 0 elsewhere.

Empirical results

As already mentioned, we have organized our empirical analysis in two stages. In the first stage, we compute efficiency scores for each firm and year in our sample. In the second stage, we explore the association between the efficiency scores obtained in the first stage and several candidate variables potentially correlated with them.

First stage: efficiency, levels, and evolution

Average efficiency: levels and trends

Banker (1993) showed that, under fairly general conditions (deviations from the frontier being independent and identically distributed), the expected value of the baseline DEA estimator in the single-output case converges to the true value in large samples. Korostelëv et al. (1995) relaxed the conditions for convergence for the single-output case and showed that, if pairs (x,y) have a strictly positive density, the baseline DEA estimator is the maximum likelihood estimator, although the rate of convergence is slow.

Our sample is relatively large (almost 300 firms and 2500 observations), and thus, it is reasonable to assume that baseline DEA provides consistent efficiency scores. Nonetheless, in the next section, we check the robustness of these results by computing as well bootstrap estimators (Daraio & Simar, 2007; Simar & Wilson, 1998, 2000, 2007).

We denote the efficiency score obtained with the baseline radial DEA procedure by \(\widehat{\theta }\). Table 2 provides summary statistics for \(\widehat{\theta }\). Its mean across firms in our sample over the period 2010–2019 is 0.27, with a standard deviation of 0.24. The median is 0.19. This suggests a modest level of mean efficiency in the sample: firms could reduce their input consumption by 73% on average with respect to the best performers. This result is in accord with the excess capacity and low utilization rate documented by Lukach et al. (2015).

Table 2 Summary statistics, baseline DEA score, 2010–2019

Figure 4 displays the evolution of average efficiency over time. Average efficiency exhibits a decreasing trend until 2013 and grows thereafter, although it shrinks again in 2018.

Fig. 4 Average efficiency over time, 2010–2019

Top performers according to efficiency

Table 3 details the top performers for the entire period, defined as companies with an average efficiency for 2010–2019 of 0.9 or more. There are three companies achieving the maximum efficiency, 1, for the whole period: Eni, Total Belgium, and Total Raffinage France. Other firms with excellent results are Tamoil, Waxoil, and Gilops, which register an average efficiency over the period larger than 0.98. Gunvor Deutschland and Repsol achieve a mean efficiency exceeding 0.95. Cepsa closes the list of top performers. There is some country heterogeneity among the top performers: Eni, Tamoil, and Waxoil are Italian, whereas Gilops is Belgian and Repsol and Cepsa are Spanish.

Table 3 Top performers, baseline DEA efficiency, 2010–2019

The best performing companies in terms of efficiency are rather stable over time; there is little internal mobility in the sample as far as efficiency is concerned.

Average efficiency and internal variables

It may be useful to conduct a first exploratory analysis of the connection of efficiency scores with various variables. Moreover, this exploration provides grounds for the second stage, where the evaluation will be carried out more thoroughly.

Company size

Size may be a key determinant of the performance of a company, especially if the production function does not exhibit constant returns to scale, as is the case in capital-intensive industries such as this one. In order to explore the potential connection between efficiency and size, we have distributed the oil companies in our sample into five categories. These categories are determined by the 90th, 75th, 50th, and 25th percentiles of real turnover, as detailed above.

Table 4 displays summary statistics of efficiency for each category of size. The last column reports the p value of the Krishnamoorthy and Yu (2004) test of equality of means for each category and the rest of the sample.Footnote 19 This test should be interpreted with caution and only as a guide, though, since it assumes normality.

Table 4 Efficiency by size, 2010–2019

According to Table 4, efficiency is not homogeneous across different sizes. The highest average efficiency is reported by very big companies, whose efficiency (0.6) greatly exceeds the global average efficiency (0.27). Very small companies register an average efficiency of 0.32, larger than the global average. Medium firms exhibit the lowest value of average efficiency; the average for big and small firms is also small and below the global average efficiency. The null hypothesis of equality of means is rejected for all sizes.

These results are similar to those of Ismail et al. (2013), who argue that in their sample efficiency is higher in very large and very small firms. They also agree with Mekaroonreung and Johnson (2010), who find in a sample of US oil companies that very specialized, small firms perform better than the rest of the sample. Other industries, such as biopharmaceuticals, display this behavior as well (Díaz & Sanchez-Robles, 2020, 2022). This pattern is consistent with the coexistence of increasing returns to scale and niche advantages due to specialization in the production technology. The standard deviation of efficiency by size category is larger for the very big and very small companies, suggesting a higher level of heterogeneity in those categories.

Figure 5 shows the evolution of efficiency over time by size categories. Efficiency in very big firms has been rather consistently larger and steady over the period, although with a dip in 2015. Efficiency in big, medium, and small firms has also been quite stable. In very small firms, instead, efficiency has been more volatile over time. This last point also agrees with Mekaroonreung and Johnson (2010), who state that small firms are more vulnerable to oscillations in oil prices.

Fig. 5 Efficiency by size, 2010–2019

Activity

Refineries exhibit a higher value of average efficiency, ten points above coke plants (Table 5), although this result should be taken with caution due to the very unequal number of firms in the two categories in our sample. As above, the last column of the table displays the p value of the Krishnamoorthy and Yu (2004) test of equality of means for both subsectors; the null hypothesis of equal means can be rejected at the 1% significance level.

Table 5 Efficiency by main activity, 2010–2019

Environmental commitment

We have compared the average efficiency for the companies exhibiting a higher level of environmental commitment, as captured by the two variables discussed above, and for the rest. Table 6 shows the main results. Average efficiency is 0.47 for the firms regularly elaborating and publishing their sustainability report, while it is 0.25 for those who do not. The difference is significant at conventional values. The situation is very similar for the firms reducing scope 1 emissions in a proportion equal to or larger than 1.5% per year (0.48 versus 0.27). This suggests tentatively that greater environmental awareness can be associated with larger efficiency scores in oil companies. The standard deviations of efficiency in the environmentally conscious groups (measured by GHG reduction and environmental reporting) are greater than those of the non-environmentally conscious group.

Table 6 Efficiency and environmental commitment, 2010–2019

Average efficiency and external variables

Geographical variables

We have explored the differences between firms from Western Europe (74% of the sample) and Eastern Europe (26%).Footnote 20 Average efficiency is one point higher in Western European countries (0.28) than in Eastern Europe (0.27). The p value of the Krishnamoorthy and Yu (2004) test of equality of means is 0.26, which is not significant. Therefore, we cannot reject the hypothesis that the average efficiency in both areas is the same. Dispersion is larger for Western European countries. These figures suggest that the gap in efficiency between Western and Eastern Europe in this sector is very small or negligible, implying that technological diffusion in Europe is rather advanced (Table 7).

Table 7 Efficiency by geographical area and country, 2010–2019

Italy has the largest number of observations in the sample (1176 out of 2484), 47.34% of the total. Efficiency is slightly lower in Italy than in the whole sample (0.26 versus 0.27). The p value of the test of equality of means is significant at the 1% level, suggesting that the averages are indeed different. Within the Western countries, the best performers in terms of average efficiency are Spanish (0.47) and Belgian (0.40) firms.Footnote 21 This fact can be traced back to the existence of two very efficient multinational plants in Belgium, Total and Esso. In turn, there are two solid oil companies (Repsol and Cepsa) operating in Spain.

Regarding Eastern Europe, average efficiency is slightly higher than the global mean in Romania (0.31), which historically has been an important oil producer since the beginning of this industry, in the mid-nineteenth century. Average efficiency is somewhat lower than the global mean for Ukraine (0.24). Although there are differences, by and large efficiency is rather homogeneous across countries in Europe. This is reasonable because of the traditional internationalization of oil companies, entailing FDI flows among countries, which has favored convergence in technology and processes. The gradual accession of many countries in the sample to the EU has sped up this trend.Footnote 22

Second stage: econometric analysis of variables potentially correlated with efficiency

In this section, we discuss the main results from the estimation of several econometric models designed to explore the correlation between efficiency, on the one hand, and different aspects of firm management and the macroeconomic and institutional setup where companies operate, on the other.

The basis for this exploration is the estimation of Eq. (8).

$$\widehat{\theta }_{it}={z}_{it} \beta +{\epsilon }_{it}$$
(8)

where \(\widehat{\theta }\) represents the efficiency scores constructed in stage 1 for each firm and year within the framework of baseline DEA and discussed in the previous subsection. The regressors in z are internal and external variables, as described above.

In order to correct for heteroskedasticity, we have performed the estimations with observed information matrix (OIM) corrected standard errors. Regressors have been included in the equation sequentially to reduce the risk of multicollinearity. For example, the first column of Table 8 displays the results from the estimation of the following model:

$$\mathrm{Efficiency\;scores}={\beta }_{0}\times \mathrm{intercept}+{\beta }_{1}\times \mathrm{refineries}+{\beta }_{2}\times \mathrm{year\;2010}+{\beta }_{3}\times \mathrm{year\;2013}+\mathrm{error\;term}$$

Table 8 Efficiency and internal variables

Internal variables

Table 8 summarizes the results from estimating different versions of the baseline Eq. (8). Covariates capture internal company features. One potentially important internal feature is activity. The dummy variable refineries is positive and significant at the 90% level in model 1, backing up the initial evidence described in “Average efficiency and internal variables,” whereby average efficiency is larger for refineries than for coke plants.

As discussed, size appears to be associated with efficiency in our sample. To test this issue further, we have included in the estimation the dummy variables for each of the five size categories defined in “Methodology.” The results in models 2–6 provide interesting insights in this regard. The dummy very big in model 2 is positive and significant at the 99% level, showing that firms registering real turnover larger than the 90th percentile are more efficient, ceteris paribus. The dummy big is also positive (model 3) but not significant. The dummy capturing the medium size is negative but not significant (model 4). As conveyed by model 5, small firms exhibit lower levels of efficiency, ceteris paribus, according to the negative sign of the corresponding dummy, significant at 99%. Finally, the dummy very small displays a positive and significant association with efficiency (model 6). These results confirm those of “Average efficiency and internal variables” above and are consistent with other contributions, such as Ismail et al. (2013) and Mekaroonreung and Johnson (2010). They are also reasonable from an economic point of view. Large firms can exploit scale economies (associated with large investments in technology, machinery, and equipment) and reach higher levels of productivity, whereas medium and small firms cannot (Lim & Lee, 2020). Very small firms may profit from their size and enjoy niche advantages associated with specialization (Mekaroonreung & Johnson, 2010).

As suggested by models 2–6, companies achieving a reduction of greenhouse emissions of 1.5% or more in the period analyzed are more efficient. The point estimate is fairly stable in all estimations and significant at 95% (except in model 2). Companies with a more transparent reporting policy about sustainable goals and achievements do not seem to register higher levels of efficiency, ceteris paribus: The variable sustainability reporting is positive but not significant. Finally, the dummies for years 2010 and 2013 are positive and negative, respectively.

Human resource and financial management are key internal aspects of firms since they handle the inputs labor and capital, respectively. They have proved to be associated with efficiency in other sectors (Díaz & Sanchez-Robles, 2020, 2022).

The financial structure of companies has been captured with the first difference of the solvency ratio, defined as current assets/current liabilities. The human resource management of companies has been proxied by the unit cost of employees, defined as total costs in employees over number of employees. In order to circumvent potential endogeneity issues, we have worked with the first lag of this variable.Footnote 23
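The two transformations mentioned here, the first difference and the first lag, can be sketched as follows for the yearly series of a single firm. The function names and the figures are hypothetical; the ratios follow the definitions given in the text (current assets over current liabilities, and total employee costs over number of employees).

```python
def solvency_ratio(current_assets, current_liabilities):
    """Current assets divided by current liabilities."""
    return current_assets / current_liabilities


def unit_employee_cost(total_employee_cost, employees):
    """Total employee costs divided by the number of employees."""
    return total_employee_cost / employees


def first_difference(series):
    """x_t - x_{t-1} for every year whose previous year is observed."""
    return {t: series[t] - series[t - 1] for t in series if t - 1 in series}


def first_lag(series):
    """x_{t-1} assigned to year t, for every year whose previous year is observed."""
    return {t: series[t - 1] for t in series if t - 1 in series}


# Hypothetical yearly figures for one firm.
solvency = {
    2010: solvency_ratio(120.0, 100.0),
    2011: solvency_ratio(150.0, 100.0),
    2012: solvency_ratio(140.0, 100.0),
}
cost = {
    2010: unit_employee_cost(5000.0, 100),
    2011: unit_employee_cost(5400.0, 100),
    2012: unit_employee_cost(5500.0, 100),
}
d_solvency = first_difference(solvency)   # enters the regression in year t
lagged_cost = first_lag(cost)             # year t uses the value from t-1
```

Both transformations drop the first year of each firm, so the estimation sample is somewhat smaller than the full panel.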

Table 9 displays some results from estimations including these variables. Increases in the level of solvency, as captured by mounting ratios of current assets to current liabilities, are positively associated with efficiency. The point estimate is stable across estimations and significant at the 99% level. The variable is robust to the introduction of most regressors capturing size, which maintain their sign with respect to Table 8. Now, however, the dummy for big firms is positive and significant (model 9), and the dummy for medium-sized firms is negative and significant (model 10).

Table 9 Efficiency, human resources, and financial structure

The variable capturing unit employee costs is not significant when considered for the whole sample. If the sample is divided by size, it is positive and non-significant for the very big and big firms (model 13). It is negatively and significantly associated with efficiency for medium, small, and very small firms, however, suggesting that an increase in employee costs reduces efficiency for these companies (model 14). This is in accord with Al-Najjar and Al-Jaybajy (2012), who report that one of the reasons for inefficiency in their sample of Iraqi oil companies is the excess of workforce and its underutilization. Lukach et al. (2015) also attribute part of the loss of competitiveness of oil companies operating in Europe to rising human resource costs.

External variables

The external economic environment where firms operate influences their performance. In order to control for this aspect, we have included in the estimations an indicator of the dynamism of economic activity, the rate of growth of real GDP for the EU 27. Some papers have uncovered a positive association between oil prices and efficiency, and thus, this is another potential external variable (Sueyoshi & Wang, 2018). Finally, we have included several country and area dummies as covariates in our baseline equations.

Table 10 summarizes the main results in this regard. Real GDP growth displays a positive and significant association with efficiency (models 15–21), suggesting a procyclical behavior of efficiency.Footnote 24 This effect, which has also been detected in other industries, is reasonable, since a strong level of activity reduces slackness in the use of resources (Díaz & Sanchez-Robles, 2020, 2022).

Table 10 Efficiency and external variables

Models 15–21 include the real price of Brent oil as a regressor. The point estimate is positive and significant, suggesting that periods of escalating oil prices are correlated with higher levels of efficiency. Two macroeconomic effects may be intertwined here. On the one hand, both GDP growth and oil prices display positive partial correlations with efficiency. On the other hand, the partial correlation of growth and real oil prices is −0.5289, significant at 99%. It can be argued, then, that high oil prices foster efficiency directly but indirectly reduce growth and hence ultimately jeopardize efficiency. The direct effect of oil prices on efficiency partially offsets their negative impact through the indirect channel of slower growth.
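For readers less familiar with partial correlations, the following sketch computes the first-order partial correlation between two series x and y after removing the linear influence of a third series z, using the textbook formula based on pairwise Pearson correlations. Which variables were held fixed in the figure reported above is not stated in the text, so the conditioning variable and all numbers here are purely hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def partial_corr(x, y, z):
    """Correlation between x and y, controlling linearly for z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Toy series, invented for illustration only.
growth = [2.0, 1.0, -0.5, 0.5, 1.5, 2.5]
oil_price = [60.0, 80.0, 100.0, 90.0, 70.0, 50.0]
control = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]   # hypothetical conditioning variable
r_partial = partial_corr(growth, oil_price, control)
```

When z is uncorrelated with both x and y, the partial correlation coincides with the ordinary Pearson correlation.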

We have explored the link between efficiency and the social and cultural features of the country where the affiliate or firm is located by including area and country dummies. We have constructed an additional dummy, Western Europe, equaling 1 if the firm is located in a country belonging to this area, 0 otherwise. This dummy may shed some light on the degree of technological diffusion in the industry. For example, if firms in Western Europe are systematically more efficient than companies in the East, this will suggest that Eastern European countries have not totally converged in technology with their western counterparts and that technological diffusion has been incomplete. The dummy for Western Europe is positive but not significantly correlated with efficiency (model 20), suggesting that Western European firms do not systematically allocate their resources more efficiently than those in Eastern Europe and thus that technological catch-up is almost total.

The dummy for Italy is negative and significant (model 20); the categorical variable for Belgium, instead, is positively and significantly correlated with efficiency, implying that firms in that country are ceteris paribus more efficient (model 21). The variables capturing size exhibit signs and point estimates similar to those reported before.

Sensitivity analysis

Efficiency scores

We have performed a sensitivity analysis of our results in two main dimensions. First, we have checked the adequacy of the efficiency scores computed in the framework of baseline DEA; second, we have explored the robustness of the correlation between the efficiency scores and the control variables employed in the second stage.

The baseline DEA framework does not consider explicitly the possibility of measurement errors or sample bias in the data. Typically, the efficient frontier and the underlying data generating process (DGP) of the efficiency scores are unknown. In a series of influential papers, Simar and Wilson (1998, 2000, 2007) and Daraio and Simar (2007) design some bootstrapping tools which, by means of repeated sampling, provide approximations to the unknown distribution of the DGP of efficiency and enable the computation of bias-corrected scores. This methodology can also be used to compute standard errors and confidence intervals of the efficiency scores at a specific significance level. The size of our sample suggests that the scores computed within the baseline DEA framework may be regarded as consistent, according to Korostelëv et al. (1995). Nonetheless, we have compared them with a new set of efficiency scores, \(\widehat{{\theta }_{b}}\), estimated by means of the Simar and Wilson (1998) bootstrap methodology (henceforth SW). We have used the same data and variables as in our exercise employing baseline DEA and set up the computation under variable returns to scale and input orientation as well.Footnote 25
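The bias-correction step of this methodology can be sketched as follows. The sketch assumes that the bootstrap replicates of a DMU's score have already been generated (the smoothed resampling that produces them is not shown), estimates the bias as the mean of the replicates minus the baseline score, and subtracts it from the baseline score. The function name and the numbers are hypothetical.

```python
from statistics import mean


def bias_corrected_score(theta_hat, bootstrap_scores):
    """Subtract the bootstrap estimate of the bias from the baseline DEA score.

    bias = mean(bootstrap replicates) - theta_hat
    corrected = theta_hat - bias = 2 * theta_hat - mean(bootstrap replicates)
    """
    bias = mean(bootstrap_scores) - theta_hat
    return theta_hat - bias


# Hypothetical baseline score and bootstrap replicates for one DMU.
theta_hat = 0.30
replicates = [0.32, 0.35, 0.33, 0.36, 0.34]
theta_b = bias_corrected_score(theta_hat, replicates)  # 2 * 0.30 - 0.34
```

In this illustration the replicates lie above the baseline score, so the corrected score is lower than the baseline one.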

Table 11 compares some descriptive statistics of the efficiency scores obtained by the baseline DEA and the bootstrap models, \(\widehat{\theta }\) and \(\widehat{{\theta }_{b}}\), respectively. \(\widehat{{\theta }_{b}^{U}}\) and \(\widehat{{\theta }_{b}^{L}}\) are the upper and lower bounds of the 95% confidence interval. Means and standard deviations computed by bootstrap methods are slightly lower. This is reasonable for two main reasons: first, because bootstrap repeated sampling acts as a smoothing procedure which tends to give less weight to observations with extreme values and more to those which are close to others (and hence are more likely to be included several times in different resamplings); second, because the baseline DEA tends to compute upward biased efficiency scores. In any case, the means of \(\widehat{\theta }\) and \(\widehat{{\theta }_{b}}\) are rather close. The difference between them, 0.05, is below the figure reported in other studies, such as López-Penabad et al. (2020). Hanrui and Xun (2011) find a difference of 0.02 between the original DEA and the Bootstrap DEA scores, which is not far from our result.

Table 11 Descriptive statistics, DEA and bootstrap efficiency

The mean of the DEA estimator is above the upper bound of the 95% confidence interval by just one point (0.27 versus 0.26). The Mann–Whitney test cannot reject the null hypothesis that the distributions of the DEA scores and the upper bound of the 95% confidence interval are equal (p value 0.43).
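The logic of the Mann–Whitney comparison can be illustrated with a small sketch. It computes the U statistic by counting, over all pairs, how often a value from the first sample exceeds a value from the second (ties count one half), and obtains a two-sided p value from the large-sample normal approximation without tie or continuity corrections. This is a simplified illustration, not necessarily the exact variant used for the figure above; the function names and scores are hypothetical.

```python
import math


def mann_whitney_u(x, y):
    """U statistic for sample x: pairs with x_i > y_j, ties counted as 0.5."""
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


def normal_approx_p_value(u, n_x, n_y):
    """Two-sided p value from the normal approximation to the distribution of U."""
    mean_u = n_x * n_y / 2.0
    sd_u = math.sqrt(n_x * n_y * (n_x + n_y + 1) / 12.0)
    z = (u - mean_u) / sd_u
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical scores for a handful of DMUs.
dea_scores = [0.19, 0.27, 0.45, 1.00, 0.12, 0.33]
upper_bound_scores = [0.18, 0.26, 0.44, 0.97, 0.12, 0.31]
u_stat = mann_whitney_u(dea_scores, upper_bound_scores)
p_value = normal_approx_p_value(u_stat, len(dea_scores), len(upper_bound_scores))
```

A large p value means that the null hypothesis of equal distributions cannot be rejected, which is the outcome reported in the text.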

Since the distributions for the efficiency scores are typically skewed, it seems reasonable to compare the medians as well. According to Table 11, the medians for \(\widehat{\theta }\) and \(\widehat{{\theta }_{b}^{U}}\) are quite similar. In fact, the median efficiency computed by DEA lies within the 95% confidence interval of the median of the bootstrap efficiency.

We can conclude that, while there is not total convergence between the DEA and the bootstrap estimators, the DEA estimator does converge in distribution to the upper bound of the 95% confidence interval estimator. Moreover, medians do converge. This suggests that the set of baseline DEA efficiency scores is a reasonable approximation to the true scores for our sample.

The last column reports maximum values. As the table conveys, the baseline model places some DMUs on the frontier, with efficiency of 1, while the bootstrap method does not: in the SW framework DMUs may theoretically approach an efficiency score of 1, but this occurs with 0 probability.

Table 12 reports partial correlations between the estimators. The correlation coefficient between the mean efficiencies of the DEA and bootstrap models is high, 0.95. The correlation between the DEA scores and the upper bound of the confidence interval is 0.97, larger than the correlation between \(\widehat{\theta }\) and \(\widehat{{\theta }_{b}^{L}}\). The higher correlation between \(\widehat{\theta }\) and \(\widehat{{\theta }_{b}^{U}}\) is to be expected due to the upward bias of baseline DEA scores. Again, we attribute the small discrepancy between them to the different treatment of efficient DMUs in the two methodologies. It is well known that the bootstrap estimator does not perform equally well in the proximity of the efficient frontier (Simar & Wilson, 1998). Nonetheless, we intend to explore this issue more thoroughly in future research.

Table 12 Partial correlations

Figure 6 in Appendix 2 compares the two distributions of efficiency scores as computed by the DEA and the bootstrap methodology. The interquartile range is somewhat larger for the baseline model. This is also reasonable since the SW estimator does not assign scores of 1 in practice, while the baseline does. Figure 7 in Appendix 3 displays the evolution of the means of both sets of efficiency scores over time and shows that the time pattern is almost identical.

By and large, and although we have not found convergence in means, we may conclude that there is a remarkable level of similarity between the results from the baseline DEA and the bootstrap estimators, which backs up the results discussed in “Empirical results.”

Second stage results

Another drawback of baseline DEA is the potential presence of serial correlation in the efficiency scores. Simar and Wilson (2007) design a strategy in order to handle this issue. They propose a double bootstrap procedure which provides consistent results. Basically, the first bootstrap computes the bias-corrected efficiency scores, as discussed in “Efficiency scores.” The second bootstrap obtains a set of estimates of the first and second moments of the parameters of interest in a truncated regression of efficiency scores on the control variables of the following form, in matrix notation:

$$\widehat{{\theta }_{b}}=z\delta +\xi$$
(9)

where the error term ξ corresponds to a normal distribution with left truncation. Maximum likelihood estimation in this setting provides consistent estimators of δ. A common assumption about the DGP in Eq. (7) or (9) is that the error term follows a censored distribution, since efficiency scores cannot exceed 1 by construction, and hence, the appropriate specification for the second stage is a Tobit model. This was our hypothesis in “Second stage: econometric analysis of variables potentially correlated with efficiency.” Simar and Wilson (1998, 2007) argue, however, that the correct DGP is not censored but truncated because the upper limit of 1 is a true feature of the distribution and not an artifact of the computation procedure.

This is a controversial point. Some researchers still prefer the Tobit model (Greene, 2003). McDonald (2009) argues that the dilemma between the censored and the truncated models is primarily methodological and conceptual, and that in practical applications the Tobit model provides more robust results. The clarification of this issue is beyond the scope of this paper. In any case, we choose to check the robustness of the second stage results to the assumption of a Tobit versus a truncated distribution by using the SW approach as well.

Notice that there are three main differences between the baseline DEA (Eq. 7) and the SW (Eq. 9) approaches: (i) the dependent variable, constructed by the DEA baseline estimation in the first case and the bootstrap replications in the second; (ii) the distribution of the error term, censored in Eq. (7) and truncated in Eq. (9); and (iii) the correction for serial correlation, implemented in Eq. (9) but not in Eq. (7).
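The difference in point (ii) can be made concrete by comparing the log-likelihood contribution of a single observation under the two assumptions. The sketch below is a simplified illustration with a normal error: the censored (Tobit-type) version treats scores at the upper limit as a probability mass, whereas the truncated version renormalizes the density over the admissible interval. The function names, the default upper limit of 1, and the generic truncation bounds are assumptions of this sketch; which bound is binding depends on how the efficiency score is defined.

```python
import math


def norm_pdf(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def norm_cdf(u):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-u / math.sqrt(2.0))


def censored_loglik(theta, mu, sigma, upper=1.0):
    """Tobit-type contribution with right-censoring at `upper`.

    Observations at the limit contribute P(latent >= upper);
    interior observations contribute the normal density.
    """
    if theta >= upper:
        return math.log(norm_cdf((mu - upper) / sigma))
    return math.log(norm_pdf((theta - mu) / sigma)) - math.log(sigma)


def truncated_loglik(theta, mu, sigma, lower, upper):
    """Contribution under a normal density truncated to [lower, upper]."""
    if not lower <= theta <= upper:
        return float("-inf")
    mass = norm_cdf((upper - mu) / sigma) - norm_cdf((lower - mu) / sigma)
    return (
        math.log(norm_pdf((theta - mu) / sigma))
        - math.log(sigma)
        - math.log(mass)
    )


# Hypothetical score, fitted value z*delta (mu), and error scale (sigma).
ll_censored = censored_loglik(0.4, 0.3, 0.2)
ll_truncated = truncated_loglik(0.4, 0.3, 0.2, lower=0.0, upper=1.0)
```

For an interior observation the truncated contribution exceeds the censored one because the density is rescaled by the probability mass of the admissible interval; summing these contributions over observations and maximizing over the parameters would give the respective maximum likelihood estimates.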

Table 13 displays the main results obtained with the SW procedure. Qualitative results are roughly the same in the Tobit and in the SW models, although the SW methodology seems to provide more efficient estimations (Simar & Wilson, 2007). By and large, signs and orders of magnitude are similar in both cases.

Table 13 Second stage results, Simar and Wilson model

There are a few differences, though. The variable big is positive in the Tobit estimations, non-significant in two cases (models 3 and 16; Tables 8 and 10) and significant in another (model 9; Table 9); it is negative and significant at the 5% level in the SW specification (model 23). Since the mean efficiency for big firms is 0.23, below the global mean (Table 4), we conclude that in this case the SW result is more plausible. Another difference concerns the external variables capturing general conditions for oil firms, namely growth and oil prices. They are positive but non-significant in the SW specification (model 20), in line with Putra and Adinugraha (2018).

By and large, the Simar-Wilson methodology provides similar results to those obtained with the baseline DEA and the Tobit estimation and presented in “Empirical results.” This, in our view, provides robustness both to the efficiency scores computed in the first stage and to the association between efficiency and other variables found in the second stage.

Concluding remarks and policy recommendations

This paper analyzes the level and evolution of efficiency in oil companies operating in Europe. It also explores variables potentially associated with efficiency, both internal and external. Our sample comprises almost 300 firms over the period 2010–2019. The main insights from our empirical exercise are as follows:

  1. The average level of efficiency in the sample is relatively low, 0.27, and has exhibited a decreasing trend over the period 2010–2019. This is in accord with our first hypothesis. Top performers (Guvnor, Neste, Total, ENI) are quite stable over time.

  2. Efficiency is positively associated with size, activity, financial stability, controlled labor costs, and environmental commitment. Very large firms (with turnover above the 90th percentile of the distribution) exhibit higher levels of efficiency, ceteris paribus, suggesting the presence of scale economies in the industry. Increasing returns are plausible given the technology, machinery, and equipment the industry requires, which entail considerable volumes of assets and fixed costs for companies. Very small firms, however, also perform well in terms of efficiency, implying the presence of niche competitive advantages associated with specialization. While we hypothesized beforehand the presence of increasing returns, the sound performance of very small firms, which suggests niche advantages, was unexpected. Mounting employee costs jeopardize efficiency, but only for medium, small, and very small firms, and not for the firms in the top quartile.

  3. From the point of view of environmental variables, large reductions in greenhouse gas emissions (i.e., above the sample mean) are positively correlated with efficiency. These results suggest that there is no trade-off between environmental and operational efficiency; instead, the two reinforce each other. Our evidence, however, does not suggest that the mere disclosure of information about sustainability goals and performance (i.e., the publication of a sustainability report by the company) is enough to impact efficiency significantly.

  4. In terms of macroeconomic variables, results suggest that efficiency in oil companies is procyclical; it is positively correlated with oil prices as well, although these two findings are not robust. According to our empirical analysis, the technological catch-up of Eastern European firms with their Western European counterparts seems to be almost complete. There are no substantial differences in efficiency across the majority of countries.

  5. We have complemented the results from the DEA baseline with those obtained with the Simar-Wilson methodology in both stages. The means of the two sets of efficiency scores do not fully converge, but the medians fall in the same 95% confidence interval. The two sets of efficiency scores display a high correlation (95–97%) and a similar time pattern. The basic messages of the baseline second stage carry over to the Simar-Wilson methodology, which nonetheless provides more efficient estimates.

These findings have practical implications. Companies with low levels of efficiency are not sustainable in the long run and require careful analysis so that the best strategy for the future is designed and implemented. This strategy will not be the same for all companies. Some of them may still be able to recover through investment, rationalization, and modernization; in other cases, survival will not be possible. Moreover, stakeholders and firm managers should keep in mind that, according to our results, the consolidation of the industry by means of mergers and acquisitions is feasible in the future. This process would bring companies closer to their optimal size, where increasing returns can be exploited (Lim & Lee, 2020). Medium-sized and small firms with poor performance are particularly at risk, especially if their managers do not strive to reduce inefficiencies and boost productivity.

Our analysis provides some insights which may be useful to guide industrial policy for this sector; this discussion is especially relevant now in Europe because of the launch of the Next Generation Plan, which provides funds for the transformation of the economy.

Our exercise has shown that average efficiency in the European oil sector is modest and declining; low levels of efficiency and overcapacity, in turn, are damaging the competitiveness of this industry. One avenue for the correction of these shortcomings is to allow the gradual reallocation of resources from low-efficiency to high-efficiency companies. Barriers to exit (especially from local authorities) which hamper the closure of inefficient plants for political reasons should be gradually removed (Nivard & Kreijkes, 2017). Consolidation of this industry through mergers and acquisitions is also desirable and should not be prevented by policymakers. Legislation which entails higher bureaucratic, fiscal, and labor costs for large firms and hinders company growth should be discouraged as well (Hsieh & Klenow, 2014). These processes could increase global productivity in this sector and the rest of the economy and foster economic growth (Acemoglu et al., 2018; Aghion & Howitt, 1992; Hsieh & Klenow, 2009; Lentz & Mortensen, 2008). By contrast, measures like indiscriminate subsidies for firms in the sector could keep underperformers artificially active and perpetuate inefficiencies.

In parallel, the modest levels of efficiency in the industry suggest that increases in oil prices should not be automatically passed through to fuels and other consumer products. Such increases can be absorbed by producers by means of efficiency gains achieved through rationalization, better resource reallocation, and process innovation. This implies that policies intended to subsidize the price of oil products to final consumers may not be the best solution and should be carefully considered before implementation.

Finally, policymakers should carefully assess the upsides and downsides of new and existing regulations for this industry, especially those concerning the labor market, since excessive employee costs seem to be one reason underlying poor efficiency. In particular, European authorities should avoid measures which introduce further rigidity in labor markets.

The industrial policy for this sector, in any case, must be very prudently designed and implemented. Measures should be conceived prioritizing the long run over the short run and economic goals over political goals.

Because of the crucial role of the oil sector in the supply chain of most products and services, an increase in the efficiency of this industry will spill over to the rest of the economy, favoring its smooth operation. The restructuring and consolidation of the sector may also be advisable from the macroeconomic point of view. A more solid and productive oil industry may help decrease Europe's dependence on oil imports, something particularly desirable in a scenario of geopolitical risks.

The absence of environmental data for a part of our sample is the main limitation of our paper. In future research, we intend to circumvent this issue and further explore the association between efficiency, environmental commitment, and other indicators of firm management. One promising avenue is to compare European and non-European firms, assessing the impact of environmental regulations in each case.