1 Introduction

Stock performance and portfolio evaluation and optimization are critical issues in financial economics (Grinblatt & Titman, 1989; Peñaranda, 2016; Hodoshima, 2019). Markowitz (1952) argued that such evaluation must account for the returns and risks associated with stocks or portfolios. Investors and portfolio managers, however, may find it challenging to deal with multiple return and risk levels (and for multiple stocks). Based on the capital asset pricing model (CAPM) and the equilibrium theory, Sharpe (1966, 1994) proposed that one can reduce this problem to a two-dimensional setting. This has resulted in a notable rise in research related to financial markets, encompassing stock markets (Cantaluppi & Hug, 2000; Bailey and Lopez de Prado, 2012; Peñaranda, 2016; Hodoshima, 2019) and, more recently, the cryptocurrency markets (Liu, 2019; Kumaran, 2022; Letho et al., 2022).

Stock performance is associated not only with the firm’s financial performance, where returns and risks play important roles (Dutta et al., 2012; Dang & Nguyen, 2020; Atukalp, 2021; Mirzaei et al., 2022), but also with non-financial factors. In recent years, research has demonstrated that the environmental, social, and governance (ESG) performance of firms can have both direct and indirect impacts on stock returns (Edmans, 2011; Friede et al., 2015; Okafor et al., 2021; Whelan & Atz, 2021). With increasing awareness and concerns surrounding environmental and climate issues (Ngo et al., 2022a), ESG has become increasingly important for both businesses and investors (Gillan et al., 2021; Edmans, 2022). While studies on ESG and stock performance have traditionally used a qualitative approach, such as negative screening (Amel-Zadeh & Serafeim, 2018), there have been recent attempts to use quantitative methods to assess the ESG performance of firms and its impact on stock performance, including Qi and Li (2020), Pedersen et al. (2021), and Cesarone et al. (2022), among others. Pedersen et al. (2021) argued that ESG could also be included in the objective function of the optimization, in which there is a trade-off between SR and ESG. In other words, if two investors have the same risk aversion but different ESG preferences, the one concerned more about ESG would choose a portfolio with higher ESG but lower SR. In contrast, if the two have the same ESG preference, the investor with a lower risk aversion should select a high-SR but low-ESG portfolio (Pedersen et al., 2021). In this sense, Pedersen et al. (2021) extended the two-dimensional setting of SR (i.e., returns vs. risks) into a two-dimensional setting of SR vs. ESG. Such a combination of the two elements helps create an ESG-efficient frontier.

However, the idea of an efficient frontier is not new and can be traced back to the production possibility frontier and the efficient production function (Farrell, 1957). In particular, the non-parametric data envelopment analysis (DEA) is widely used as a flexible method to deal with the multiple inputs/outputs settings of the financial sector (see, for example, the reviews of Liu et al., 2013; Emrouznejad and Yang, 2018). Specifically, DEA is a method of operations research that looks at maximizing the outputs, given a set of fixed inputs, or minimizing the use of inputs, given a certain level of outputs (Coelli et al., 2005; Cooper et al., 2006). Previous DEA studies, however, did not attempt to either (i) incorporate ESG into the DEA examination of portfolio selection or (ii) employ SR as a single index in DEA. Additionally, we use several recent DEA developments to improve the robustness of our results. In particular, we use the base point slacks-based measures (SBM) proposed by Tone et al. (2020) to consider both input minimization and output maximization, as well as negative data (e.g., SR can be negative). Moreover, we use the double-frontier approach (Wang & Chin, 2009; Azizi, 2014) instead of a single DEA frontier in our analysis to overcome the sensitivity issue of DEA. We further follow Simar and Wilson (2007) and use a double bootstrap technique to better examine the determinants of efficiency (e.g., market capitalization and institutions), given the unknown distribution of our data. To the best of our knowledge, this is the first study to implement this strategy; we coin this method the ESG-SR double-frontier double-bootstrap (ESG-SR DFDB) approach.

Our empirical application of the ESG-SR DFDB uses data from the global energy industry. It is argued that the ESG risks in this industry are higher than in others (Behl et al., 2021); we therefore believe that this sample should better reflect the trade-off between ESG and SR and can thus appropriately illustrate the ESG-efficient frontier. Our analysis suggests that only around 11% of our sample firms perform well in the multi-dimensional ESG-efficient frontier and can be the ‘fund of funds’ that investors could select (Vidal-García et al., 2018). More importantly, our findings show that a firm’s characteristics (e.g., market capitalization, size, and corporate governance) and a country’s institutions (e.g., voice and accountability) can influence the ESG-SR efficient frontier.

Our contribution to the literature is, therefore, threefold. First, we propose a multi-dimensional approach to examine the trade-off between ESG (i.e., sustainability) and Sharpe ratio (i.e., risks and returns), while previous studies (e.g., Bailey and Lopez de Prado, 2012; Kumar et al., 2016; Pedersen et al., 2021) are limited to a two-dimensional perspective. Second, we employ an advanced DEA model to deal simultaneously with negative data and the input minimization and output maximization (through the base point SBM approach), the sensitivity issue (through the double-frontier approach), and the unknown distribution of the DEA efficiency scores as well as the endogeneity issue of DEA’s determinants (through the double bootstrap approach). Third, we are the first to empirically examine the ESG-SR efficiency of the global energy sector, given the high ESG risk in this industry.

The rest of the study is organized as follows. Section 2 reviews the relevant literature on portfolio optimization, Sharpe ratio (SR), ESG, and, more recently, the ESG frontier. Section 3 briefly explains the technical aspects of the ESG-SR DFDB approach proposed in this study. Section 4 describes the data and reports and discusses the empirical findings. Section 5 concludes and suggests future research directions.

2 Literature review

2.1 Portfolio optimization and the Sharpe ratio

Stock performance and portfolio evaluation/optimization are fundamental issues in financial economics (Grinblatt & Titman, 1989; Sharpe, 1994; Peñaranda, 2016; Hodoshima, 2019). Sharpe (1966) reduced the CAPM problem to a two-dimensional setting of risk versus returns, which makes it easier for investors to optimize their portfolios. In short, it is argued that because of the trade-off between risk and return, one could evaluate a portfolio’s performance by comparing the difference in returns between the portfolio and a risk-free asset, such as the treasury bill. Accordingly, the higher the (positive) difference, the better the portfolio performance (Sharpe, 1994). Combining such portfolios can form a two-dimensional SR frontier (Cantaluppi & Hug, 2000; Vidal-García et al., 2018).

Grinblatt and Titman (1989) explored several criticisms regarding previous measures of portfolio performance and optimization, including the Sharpe ratio (SR), in terms of identifying an appropriate benchmark portfolio, the probability of risk overestimation, and the incapability of informed investors to generate positive risk-adjusted returns. They concluded that the unconditional mean-variance efficient portfolio of assets considered tradable by investors, i.e., the unconditional Sharpe ratio (SR), can provide correct insights about the portfolio’s performance (Grinblatt & Titman, 1989). Using US stock data, Peñaranda (2016) further developed two types of portfolio optimization based on the conditional asset pricing models and the SR: the first type maximizes the unconditional Sharpe ratio of excess returns, while the second one maximizes the conditional Sharpe ratio options. A recent study by Hodoshima (2019) used both the SR and the inner rate of risk aversion (IRRA) to evaluate the stock performance of a selection of US stocks. While the two measures account for risk and return simultaneously, the SR accounts for risk only with the standard deviation; thus, it is less sensitive to losses than gains compared to the IRRA. The author consequently suggested that developments of the SR to incorporate more dimensions/aspects are needed (Hodoshima, 2019).

Stock performance depends not only on the firm’s financial performance, where returns and risks play important roles (as in the Sharpe ratio), but also on other factors (Dutta et al., 2012; Atukalp, 2021; Chen et al., 2021; Mirzaei et al., 2022). A study by Dutta et al. (2012), using Indian data, identified determinants of stock performance using various financial ratios. The logistic regression results indicated that eight financial ratios provide a good prediction of outperforming stocks based on their rate of return with 74.6% accuracy. Their study, however, did not consider macroeconomic factors. Rjoub et al. (2017) further considered both micro- and macroeconomic factors affecting bank stock prices in Turkey. More specifically, their findings suggested that money supply, interest rate, and economic crisis significantly affect bank stock prices. This implies that investors should account for both firm-specific information and macroeconomic factors when making investment decisions. When accounting for the impact of the financial crisis, Dang and Nguyen (2020) also demonstrated that ex-ante liquidity risk may intensify the price reduction of stocks during the crisis period. However, Atukalp (2021) used different methods (e.g., CRITIC method, TOPSIS method, and Spearman’s rank correlation) on Turkish deposit banks and found no relationship between the financial performance rankings and the stock return rankings.

Furthermore, several studies examined the determinants of stock performance when considering firm efficiency derived from efficiency frontier techniques. One may argue that efficiency measures better explain stock returns compared to conventional accounting ratios (Beccalli et al., 2006; García-Herrero et al., 2009). Indeed, efficiency frontier approaches have become one of the common methods in examining firms’ efficiency in finance for banks, stock markets, mutual funds, pension funds, and insurance (Boubaker et al., 2018, 2021; Vidal-García et al., 2018; Le et al., 2021). Using European data, Beccalli et al. (2006) concluded that changes in banks’ stock prices reflect changes in efficiency derived from both parametric and non-parametric approaches. Their findings emphasized that the stock performance of cost-efficient banks is greater than that of less efficient peers. In the same vein, Kirkwood and Nahm (2006) found that Australian stock returns are significantly associated with changes in profit efficiency, and this relationship is more pronounced in the case of regional banks. Similar results when observing other efficiency perspectives (e.g., technical efficiency, allocative efficiency, scale efficiency, and productivity) are also obtained by Sufian and Majid (2009) in China and Erdem and Erdem (2008) and Saadet and Adnan (2011) in Turkey.

On the other hand, Ioannidis et al. (2008) suggested that bank stock returns in Asia and Latin America are affected by changes in profit efficiency but not by changes in cost efficiency. Their findings reemphasized that profit efficiency is seemingly more powerful in explaining stock returns than traditional accounting profit measures. When considering the impact of the COVID‒19 pandemic, Mirzaei et al. (2022) pointed out that risk-adjusted efficiency scores derived from a non-parametric approach can explain the stock returns of Islamic banks but not those of their conventional counterparts. The results still hold when using alternative efficiency models and different measures of stock returns.

2.2 ESG and stock performance

In recent years, research has shown that environmental, social, and governance (ESG) performance can, directly and indirectly, influence stock returns (Friede et al., 2015; Okafor et al., 2021; Whelan & Atz, 2021). For instance, Friede et al. (2015) reviewed more than 1800 studies (1970‒2015) on the relationship between ESG and financial performance, of which about 1400 had involved the environmental component E of ESG. They found that, on average, nearly half of them identified a positive relationship between ESG and financial performance. For post-2015 studies, Whelan and Atz (2021) showed that only a few of them found a negative correlation between ESG and financial performance. In particular, Hong and Kacperczyk (2009) found that sin stocks (e.g., alcohol and tobacco), a weak proxy for the social component S of ESG, have higher premiums than their counterparts, whilst Edmans (2011) argued that stocks with higher employee satisfaction, a stronger proxy for S, can also generate positive abnormal returns. In addition, studies such as Gompers et al. (2003) and Peiris and Evans (2010) showed that stocks with good governance (component G of ESG) outperform the ones with low governance.

Traditionally, stock performance and portfolio selection/optimization involve ESG under a qualitative approach, e.g., negative screening (Amel-Zadeh & Serafeim, 2018). Recently, there have been several attempts to quantitatively utilize SR in assessing stock performance regarding the ESG (performance) of the firms, including Qi and Li (2020), Pedersen et al. (2021), and Cesarone et al. (2022), among others. For instance, Qi and Li (2020) used ESG as additional constraints for their Markowitz-based portfolio optimization. By comparing the performance of sustainable-investment mutual funds and conventional mutual funds using monthly data from 27 component stocks from the Dow Jones Industrial Average Index (from January 1, 2004, to December 31, 2013), Qi and Li (2020) argued that sustainable investors can still obtain a maximum SR like that of conventional portfolios even when ESG constraints are imposed (although the portfolio weights can differ). Pedersen et al. (2021) extended the two-dimensional setting of SR (i.e., returns vs. risks) into the two-dimensional setting of SR vs. ESG.

2.3 The ESG-based and SR-based efficient frontier

Pedersen et al. (2021) argued that ESG could also be included in the objective function of the optimization, in which there is a trade-off between SR and ESG, i.e., a two-dimensional setting. Such a trade-off has also been discussed in Herzel et al. (2012), Burchi (2019), and Burchi and Włodarczyk (2022), among others. Those studies argued that if two investors exhibit the same level of risk aversion but different ESG preferences, the one caring more about ESG is expected to choose a portfolio with higher ESG but lower SR. However, if the two have the same ESG preference, the investor with lower risk aversion is expected to select a high-SR but low-ESG portfolio. Such a combination of the two elements helps create a two-dimensional ESG frontier. In this sense, the ESG-frontier consists of all portfolios with the highest SR given each level of ESG. Therefore, the ESG-frontier reflects the investment opportunities when investors care about risks, returns, and ESG simultaneously (Pedersen et al., 2021).

The idea of an (efficient) frontier is not new and can be traced back to the production possibility frontier and the efficient production function (Ngo & Tsui, 2021). For instance, Farrell (1957) proposed that one can envelop all firms (or decision-making units, DMUs) being examined to form the best-practice efficient frontier. Such an idea has been well developed in terms of the parametric and non-parametric approaches, as well as in other hybrid forms. In particular, the non-parametric data envelopment analysis (DEA) is a flexible method that can deal with multiple inputs/outputs settings without requiring an a priori production function (Nguyen et al., 2019; Le et al., 2021). Such an advantage allows DEA to be widely applied in the financial sector (see, for example, the reviews of Liu et al., 2013; Emrouznejad and Yang, 2018).

It is noted that the idea of using DEA to solve the portfolio optimization problem has been proposed by Murthi et al. (1997) and Basso and Funari (2001), among others. These studies treat the expected returns as an output while the risks (e.g., the portfolio standard deviation or the square root of the half-variance) are the inputs of their DEA model. Therefore, when accounting for the trade-off between risks and returns, the portfolios with the lowest trade-off form an SR efficient frontier. In this sense, these studies examine an indirect measure of SR under two (independent) components of risks and returns, not as a single (and whole) SR index. Basso and Funari (2001), however, noticed that for the case of only one input (i.e., risk) and one output (i.e., return), their DEA results coincide with the SR index up to a normalization multiplier, as DEA scores are bounded within the [0,1] interval whilst SR is not. This approach has since been used by Galagedera and Silvapulle (2002), Liu et al. (2015), Vidal-García et al. (2018), and Galagedera (2019), among others.

All in all, the SR frontier (Murthi et al., 1997; Basso & Funari, 2001; Vidal-García et al., 2018) measures the trade-off between risks and returns, while the ESG-frontier (Pedersen et al., 2021) measures the trade-off between SR and ESG. In this paper, we extend the ESG-frontier approach to account for the trade-off between SR and ESG under a multi-dimensional ESG-SR frontier setting of DEA. Such a DEA model can incorporate all ESG components (E, S, and G) and SR (as a single risk-return optimal index) in its estimations. Thus, it could provide a more insightful analysis of the multi-dimensional performance of the examined portfolios.

3 Methodology

This section presents the basic methods to estimate the Sharpe ratio (SR), the ESG-SR efficiency scores (through the double-frontier and base point SBM approaches), and the determinants of those efficiencies (through the double-bootstrap approach). We thus term our model the ESG-SR DFDB approach. Specifically, our model can deal with several issues of DEA, including the non-oriented and negative data (via base point SBM), the sensitivity (via double-frontier), the exogenous factors of efficiency, and robustness improvement (via double-bootstrap).

3.1 Estimating the Sharpe ratio

The Sharpe ratio (SR) was initially introduced as a reward-to-variability ratio (Sharpe, 1966) to compare the performance of mutual funds. While the ex-ante version of SR can be used to forecast portfolios’ performance (Sharpe, 1994; Beller et al., 1998; McLeod & van Vuuren, 2004), the ranking and, consequently, portfolio selection (i.e., portfolio optimization) can rely on information provided by the ex-post SR (Friede et al., 2015; Guidolin et al., 2018; Theron & van Vuuren, 2018). Specifically, given a set of \(N\) portfolios, the ex-post \({SR}_{i}\) of portfolio \(i\) (\(i=1,2,\ldots ,N\)) can be computed as the ratio between its average excess return over the benchmark and its standard deviation (Sharpe, 1994; McLeod & van Vuuren, 2004; Agarwal & Lorig, 2020):

$$ {SR}_{i}=\frac{\bar{R}_{i}-{R}_{f}}{{\sigma }_{i}} $$
(1)

where \(\bar{R}_{i}\) is the average of the historical return series of portfolio \(i\); \({R}_{f}\) is the return of a benchmark portfolio, usually a risk-free asset such as the short-term treasury bill rate (Ziemba et al., 1974; Li et al., 2022); and \({\sigma }_{i}\) is the standard deviation of the historical return series of portfolio \(i\). Specifically, according to Eq. (1), when the average return of portfolio \(i\) equals that of the benchmark portfolio (i.e., \(\bar{R}_{i}={R}_{f}\)), we have \({SR}_{i}=0\). On the other hand, we have a positive \({SR}_{i}\) if \(\bar{R}_{i}>{R}_{f}\), or a negative \({SR}_{i}\) if \(\bar{R}_{i}<{R}_{f}\). Since higher SR is better, investors are expected to optimize their portfolios by choosing the one with the highest value of \({SR}_{i}\). Because it is computed from the whole historical return series, the ex-post SR serves as a static indicator for the last year of the examined period. For instance, if we have data on the treasury bill rates and annual returns for a company such as Apple for the 2001‒2020 period (i.e., time-series), then Eq. (1) can be re-written as in Eq. (2) to calculate the SR of Apple for the year 2020 (i.e., static).

$$ \begin{aligned} & SR_{Apple}^{2020} = \frac{\bar{R}_{Apple} - R_{\text{T-bill}}}{\sigma_{Apple}} \\ & \text{with}\; \bar{R}_{Apple} = \frac{R_{Apple}^{2001} + R_{Apple}^{2002} + \cdots + R_{Apple}^{2020}}{20\;(\text{years})} \\ & \sigma_{Apple} = \sqrt{\frac{\sum_{t=2001}^{2020} \left(R_{Apple}^{t} - \bar{R}_{Apple}\right)^{2}}{20\;(\text{years})}} \\ \end{aligned} $$
(2)

This study uses Eq. (2) to estimate the SR of global energy firms in 2019, using their historical daily-returns time-series data from 02/01/2015 to 31/12/2019. The descriptions of our data are presented in Sect. 4.1.
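As an illustration of Eqs. (1) and (2), the ex-post SR can be sketched in a few lines of Python. This is a minimal sketch, not the estimation code used in the study: the function name `ex_post_sharpe` and the sample returns are hypothetical, and the standard deviation uses the divisor \(T\) (the number of observations), as written in Eq. (2).

```python
import statistics


def ex_post_sharpe(returns, risk_free):
    """Ex-post Sharpe ratio following Eq. (1): mean excess return divided by
    the standard deviation of the historical return series."""
    mean_return = statistics.fmean(returns)
    # Population standard deviation (divisor T), matching Eq. (2).
    sigma = statistics.pstdev(returns)
    if sigma == 0:
        raise ValueError("standard deviation is zero; SR is undefined")
    return (mean_return - risk_free) / sigma


# Hypothetical return series and benchmark rate (not data from the study).
sample_returns = [0.10, -0.05, 0.20, 0.15]
sr_positive = ex_post_sharpe(sample_returns, risk_free=0.02)  # mean > benchmark
sr_negative = ex_post_sharpe(sample_returns, risk_free=0.12)  # mean < benchmark
```

With these hypothetical numbers the first call gives a positive SR and the second a negative one, mirroring the sign discussion that accompanies Eq. (1).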

3.2 Estimating the ESG-SR frontier: the double-frontier and base point SBM approaches

DEA estimates the efficiency of a set of homogeneous decision-making units (DMUs) in terms of (technically) transforming their inputs into outputs. Consider a set of \(n\) DMUs, each of which utilizes \(k\) inputs \({x}_{i}\) (\(i=1,2,\ldots ,k\)) to produce \(m\) outputs \({y}_{r}\) (\(r=1,2,\ldots ,m\)). Following Charnes et al. (1978), DEA can be used to estimate the best-frontier efficiency of the \({j}_{0}\)‒th DMU as:

$$ \begin{aligned} & EF_{j_{0}}^{B} = \max_{u,v} \frac{\sum_{r=1}^{m} u_{r} y_{rj_{0}}}{\sum_{i=1}^{k} v_{i} x_{ij_{0}}} \\ & \text{subject to} \\ & \frac{\sum_{r=1}^{m} u_{r} y_{rj}}{\sum_{i=1}^{k} v_{i} x_{ij}} \le 1, \quad j = 1,2,\ldots ,n \\ & u_{r}, v_{i} \ge \varepsilon , \quad \forall i,r \\ \end{aligned} $$
(3)

where \({EF}_{{j}_{0}}^{B}\) is the efficiency score of DMU \({j}_{0}\) (\(j=1,2,\ldots ,n\)), \({v}_{i}\) and \({u}_{r}\) are the optimal weights assigned to the relevant inputs and outputs of this DMU, and \(\varepsilon\) is a non-Archimedean value designed to enforce positivity on the weights.

Specifically, we employ the SR index as the single output of our DEA model. As discussed before, portfolio optimization involves the selection of the maximum SR among different portfolios, while DEA also considers the outputs to be maximized (output-oriented) or, at least, stay the same (input-oriented). It is, therefore, natural to consider SR as a DEA output, although we need to follow Tone et al. (2020) to deal with negative SR values. In contrast, the trade-off relationship between ESG and SR (Pedersen et al., 2021) indicates that lower ESG is better, suggesting that ESG should be the inputs. To extend the work of Pedersen et al. (2021) from a two-dimensional setting (i.e., SR vs. ESG) into a multi-dimensional setting and taking advantage of DEA, we use all components E, S, and G as the three inputs. It is noted, however, that the ESG components are evaluated under the assumption that the higher the values of E, S, and G, the better. We, therefore, use their reciprocal values instead of the original ones in our analysis to reflect the assumption of DEA that the lower the inputs, the better.
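The reciprocal transformation described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `esg_to_inputs` and the scores are hypothetical, and the scores are assumed to be strictly positive so that the reciprocals are defined.

```python
def esg_to_inputs(e_score, s_score, g_score):
    """Map E, S, and G scores (higher is better) to DEA inputs (lower is
    better) by taking reciprocals. Assumes strictly positive scores."""
    scores = (e_score, s_score, g_score)
    if any(score <= 0 for score in scores):
        raise ValueError("reciprocal inputs require strictly positive scores")
    return tuple(1.0 / score for score in scores)


# Hypothetical scores for two firms that differ only in the E component.
inputs_high_e = esg_to_inputs(80.0, 50.0, 40.0)
inputs_low_e = esg_to_inputs(20.0, 50.0, 40.0)
```

The firm with the better E score ends up with the smaller first input, which is the direction a DEA model rewards.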

Equation (3) implies that the higher the value of \({EF}_{{j}_{0}}^{B}\), the better the performance of the portfolios, with \({EF}_{{j}_{0}}^{B}=1\) indicating the most efficient ones, i.e., the portfolios with the lowest ESG-SR trade-off form the best-frontier. It is also possible, however, to measure the performance of those portfolios using a worst-frontier approach (Paradi et al., 2004; Wang et al., 2007; Azizi, 2014), where the less efficient portfolios form the worst-frontier. Accordingly, the worst-frontier efficiency score, as in Eq. (4), still implies that the higher the value of the scores, the better the performance of the portfolios, now with \({EF}_{{j}_{0}}^{W}=1\) indicating the less efficient ones, i.e., portfolios with the highest ESG-SR trade-off.

$$ \begin{aligned} & EF_{j_{0}}^{W} = \min_{u,v} \frac{\sum_{r=1}^{m} u_{r} y_{rj_{0}}}{\sum_{i=1}^{k} v_{i} x_{ij_{0}}} \\ & \text{subject to} \\ & \frac{\sum_{r=1}^{m} u_{r} y_{rj}}{\sum_{i=1}^{k} v_{i} x_{ij}} \ge 1, \quad j = 1,2,\ldots ,n \\ & u_{r}, v_{i} \ge \varepsilon , \quad \forall i,r \\ \end{aligned} $$
(4)

As discussed earlier, there are negative data in SR that traditional DEA models could not handle. More importantly, Pedersen et al. (2021) implicitly show that investors prefer portfolios with both high ESG and high SR. Therefore, we employ the base point SBM model (Tone et al., 2020) in the best- and worst-frontiers estimations instead of the traditional ones. Consequently, Eqs. (3) and (4) can be rewritten as in Eqs. (5) and (6), respectively:

$$ \begin{aligned} & EF_{j_{0}}^{B} = \min \frac{1 - \frac{1}{k}\sum_{i=1}^{k} s_{i}^{-}/x_{ij_{0}}}{1 + \frac{1}{m}\sum_{r=1}^{m} s_{r}^{+}/y_{rj_{0}}} \\ & \text{subject to} \\ & \sum_{j=1}^{n} \lambda_{j} x_{ij} + s_{i}^{-} = x_{ij_{0}}, \quad i = 1,2,\ldots ,k \\ & \sum_{j=1}^{n} \lambda_{j} y_{rj} - s_{r}^{+} = y_{rj_{0}}, \quad r = 1,2,\ldots ,m \\ & \lambda_{j}, s_{i}^{-}, s_{r}^{+} \ge 0, \quad j = 1,2,\ldots ,n \\ \end{aligned} $$
(5)

and

$$ \begin{aligned} & EF_{j_{0}}^{W} = \max \frac{1 + \frac{1}{m}\sum_{r=1}^{m} s_{r}^{+}/y_{rj_{0}}}{1 - \frac{1}{k}\sum_{i=1}^{k} s_{i}^{-}/x_{ij_{0}}} \\ & \text{subject to} \\ & \sum_{j=1}^{n} \lambda_{j} x_{ij} - s_{i}^{-} = x_{ij_{0}}, \quad i = 1,2,\ldots ,k \\ & \sum_{j=1}^{n} \lambda_{j} y_{rj} + s_{r}^{+} = y_{rj_{0}}, \quad r = 1,2,\ldots ,m \\ & \lambda_{j}, s_{i}^{-}, s_{r}^{+} \ge 0, \quad j = 1,2,\ldots ,n \\ \end{aligned} $$
(6)
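To make the slacks-based objective of Eq. (5) concrete, the sketch below only evaluates that ratio for a given set of slacks; it does not solve the optimization problem. The function name `sbm_best_objective` and all numbers are hypothetical. The sketch also assumes strictly positive inputs and outputs, so the base point adjustment that Tone et al. (2020) introduce for negative data is not reproduced here.

```python
def sbm_best_objective(inputs, outputs, input_slacks, output_slacks):
    """Evaluate the objective of Eq. (5) for given slacks (no optimization).

    Assumes strictly positive inputs and outputs; the base point adjustment
    for negative data is outside the scope of this sketch.
    """
    k, m = len(inputs), len(outputs)
    # Average proportional input reduction implied by the input slacks.
    numerator = 1.0 - sum(s / x for s, x in zip(input_slacks, inputs)) / k
    # Average proportional output expansion implied by the output slacks.
    denominator = 1.0 + sum(s / y for s, y in zip(output_slacks, outputs)) / m
    return numerator / denominator


# Hypothetical DMU with three (reciprocal ESG) inputs and one output (SR).
dmu_inputs = (0.02, 0.025, 0.05)
dmu_outputs = (0.8,)
score_with_slacks = sbm_best_objective(
    dmu_inputs, dmu_outputs, input_slacks=(0.004, 0.0, 0.01), output_slacks=(0.2,)
)
score_no_slacks = sbm_best_objective(
    dmu_inputs, dmu_outputs, input_slacks=(0.0, 0.0, 0.0), output_slacks=(0.0,)
)
```

A DMU with zero slacks scores exactly 1, while any input excess or output shortfall pushes the ratio below 1, which is why the measure is non-oriented.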

Wang et al. (2007), Badiezadeh et al. (2018), and Cui et al. (2022), among others, further argued that DEA’s overall performance should be based on the information provided by both the best and worst frontiers. Such a DEA double-frontier approach can overcome the sensitivity issue of DEA (Hughes & Yaisawarng, 2004; Tortosa-Ausina et al., 2008) and has been applied in manufacturing systems (Wang & Chin, 2009), supply chains (Badiezadeh et al., 2018), and aviation (Cui et al., 2022), but not in finance/portfolio optimization. Although there are several ways to aggregate the two efficiency scores \({EF}_{{j}_{0}}^{B}\) and \({EF}_{{j}_{0}}^{W}\) (Wang & Chin, 2009; Cui et al., 2022; Mai et al., 2023), we follow the popular approach of Wang et al. (2007) to compute the (overall) ESG-SR double-frontier efficiency score of the examined portfolios as the geometric mean of the two as in Eq. (7), whereas Fig. 1 illustrates the principles of the double-frontier approach.

$${EF}_{{j}_{0}}=\sqrt{{EF}_{{j}_{0}}^{B}\times {EF}_{{j}_{0}}^{W}}$$
(7)
Fig. 1 DEA double-frontier for a one input/output setting
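For the one input/output setting of Fig. 1, the ratio models in Eqs. (3), (4), and (7) can be sketched without a solver. With a single input and a single positive output under constant returns to scale, the best-frontier score reduces to a DMU's output/input ratio divided by the largest ratio in the sample, and the worst-frontier score to the same ratio divided by the smallest one. The sketch below relies on that simplification, ignores the \(\varepsilon\) bound on the weights, and uses the hypothetical name `double_frontier_scores` with made-up data; it is not the base point SBM model of Eqs. (5) and (6).

```python
import math


def double_frontier_scores(inputs, outputs):
    """Best-frontier, worst-frontier, and geometric-mean scores for a
    one-input/one-output setting with strictly positive data."""
    ratios = [y / x for x, y in zip(inputs, outputs)]
    best_ratio, worst_ratio = max(ratios), min(ratios)
    results = []
    for ratio in ratios:
        ef_best = ratio / best_ratio    # in (0, 1]; 1 = on the best frontier
        ef_worst = ratio / worst_ratio  # >= 1; 1 = on the worst frontier
        ef_overall = math.sqrt(ef_best * ef_worst)  # Eq. (7)
        results.append((ef_best, ef_worst, ef_overall))
    return results


# Hypothetical data: three DMUs with one input and one output each.
scores = double_frontier_scores(inputs=[2.0, 4.0, 5.0], outputs=[4.0, 4.0, 2.5])
# scores == [(1.0, 4.0, 2.0), (0.5, 2.0, 1.0), (0.25, 1.0, 0.5)]
```

In this example the first DMU defines the best frontier and the third defines the worst frontier, and the geometric mean preserves the ranking implied by both.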

3.3 Determinants of the ESG-SR efficiency under double-bootstrapping

DEA is based on the assumption that all firms or DMUs included are homogeneous, i.e., they are similar in terms of using inputs to produce outputs. However, the performance of the firms could still be affected by their internal operating environment (e.g., ownership, corporate governance) as well as their external operating environment (e.g., economic development, institutions). Most DEA studies extend beyond the estimation of the DEA frontier by utilizing a second-stage regression to examine those internal and external factors (Dao et al., 2021; Ngo & Tsui, 2021; Mirzaei et al., 2022). Simar and Wilson (2007) further argued that one should follow a bootstrap approach to overcome the multicollinearity problem between those factors and DEA input/output variables. We, therefore, follow Simar and Wilson (2011), Ngo and Tian (2020), and Le et al. (2022), among others, and apply a bootstrap DEA technique to investigate the key determinants of our ESG-SR efficiency.

More specifically, we follow Crespi and Migliavacca (2020) and Crace and Gehman (2022), among others, to select the internal and external variables of the ESG-SR efficient frontier. For firm-level internal factors, there is evidence that ESG performance depends on firm size (Rahman et al., 2011; Elsakit & Worthington, 2014; Sharma et al., 2020; El Khoury et al., 2021), board characteristics (Reverte, 2009; Khan, 2010), investments in research and development (R&D) (Xu et al., 2021; Dicuonzo et al., 2022; Ngo et al., 2022b), and market capitalization (Kiymaz, 2019; Crespi & Migliavacca, 2020). For country-level external factors, the important variables include economic development (El Khoury et al., 2021) and governance institutions of the country where the firm operates (Hooper et al., 2009; Crespi & Migliavacca, 2020). Our regression model, therefore, can be expressed as follows.

$$ \begin{aligned} EF = & \, \beta_{0} + \beta_{1} MKTCAP + \beta_{2} FIRM\_SIZE + \beta_{3} FIRM\_AUDIT \\ & + \beta_{4} BOARD\_SIZE + \beta_{5} BOARD\_FEMALE + \beta_{6} RD \\ & + \beta_{7} GDPCAP + \beta_{8} INF + \beta_{9} INSTITUTIONS + \epsilon \\ \end{aligned} $$
(8)

where EF is the efficiency score of the firms/portfolios derived from Eq. (7); MKTCAP represents the firm’s market capitalization; FIRM_SIZE represents the total assets of the firm; FIRM_AUDIT represents the role of the audit committee of the firm; BOARD_SIZE represents the number of board members of the firm; BOARD_FEMALE is the percentage of females on the firm’s board; RD measures the ratio of the firm’s R&D expenses to total revenues; GDPCAP and INF are the GDP per capita and inflation of the country where the firm operates, respectively; and INSTITUTIONS is a set of governance variables including Control of corruption (CC), Government effectiveness (GE), Political stability and Absence of violence/terrorism (PV), Regulatory quality (RQ), the Rule of law (RL), and Voice and Accountability (VA). The error term \(\epsilon\) represents the measurement error of our estimation. The monetary variables MKTCAP and GDPCAP are in constant 2015 US dollars and have been transformed into their logarithmic values.

Based on the double-bootstrap DEA algorithm (Simar & Wilson, 2007), our proposed ESG-SR DFDB approach can be briefly expressed as follows:

  • Step 1 Calculate the ESG-SR double-frontier efficiency score \({EF}_{j}\) for all firms/portfolios involved using Eq. (5)‒(7).

  • Step 2 Use truncated regression to estimate Eq. (8) and reach the vector of estimated coefficients \(\widehat{\beta }\) corresponding to an estimated standard deviation \({\widehat{\sigma }}_{\epsilon }\) for the error term \(\epsilon\).

  • Step 3 Loop over the next three steps (3.1‒3.3) \({L}_{1}\) times to obtain \(n\) sets of bootstrap estimates \({\mathfrak{B}}_{l}={\left\{{EF}_{jl}\right\}}_{l=1}^{{L}_{1}}\):

    • Step 3.1 For each \(i=1,\ldots ,n\), draw \({\epsilon }_{i}\) from the N(0, \({{\widehat{\sigma }}_{\epsilon }}^{2}\)) distribution with left-truncation regarding Eq. (8).

    • Step 3.2 For each \(i=1,\ldots ,n\), compute the efficiency score \({EF}_{i}^{*}\) using Eq. (8).

    • Step 3.3 For each firm, set \({SR}_{j}^{*}= {SR}_{j}\times \left({EF}_{j}/{EF}_{i}^{*}\right)\) and re-calculate its efficiency score \({EF}_{ji}^{*}\) using Eq. (5)‒(7).

  • Step 4 For each \(i=1,\ldots ,n\), compute the bias-corrected estimator \({\widehat{\widehat{EF}}}_{j}={EF}_{j}-\left[\text{E}\left({EF}_{ji}^{*}\right)-{EF}_{i}^{*}\right]\).

  • Step 5 Use truncated regression to estimate Eq. (8), this time using \({\widehat{\widehat{EF}}}_{j}\) as the dependent variable and arrive at the vector of estimated coefficients \(\widehat{\widehat{\beta }}\) corresponding to an estimated standard deviation \({\widehat{\widehat{\sigma }}}_{\epsilon }\) for the error term \(\epsilon\).

  • Step 6 Loop over the next three steps (6.1‒6.3) \({L}_{2}\) times to obtain \(n\) sets of bootstrap estimates \({\mathcal{C}}_{l}={\left\{{EF}_{jl}^{}\right\}}_{l=1}^{{L}_{2}}\):

    • Step 6.1 For each \(i=1,\dots,n\), draw \({\epsilon }_{i}\) from the N(0, \({{\widehat{\widehat{\sigma }}}_{\epsilon }}^{2}\)) distribution with left-truncation regarding Eq. (8).

    • Step 6.2 For each \(i=1,\dots,n\), compute the efficiency score \({\widehat{EF}}_{i}^{*}\) using Eq. (8).

    • Step 6.3 For each firm, set \({\widehat{SR}}_{j}^{*}= {SR}_{j}\times \left({EF}_{j}/{\widehat{EF}}_{i}^{*}\right)\) and re-calculate its efficiency score \({\widehat{EF}}_{ji}^{*}\) using Eq. (5)‒(7).

  • Step 7 Use the bootstrap values in \(\mathcal{C}\) and the original estimates \(\widehat{\widehat{\beta }}\), \({\widehat{\widehat{\sigma }}}_{\epsilon }\) to construct estimated confidence intervals for each element of \(\beta\) in Eq. (8). They are the (double-bootstrap) bias-corrected associations between the independent variables (e.g., MKTCAP or INSTITUTIONS) and the ESG-SR double-frontier efficiency scores.
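For illustration, the inner loop of Steps 3.1‒3.3 (and, analogously, 6.1‒6.3) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation used in this study: the function names, the `fitted` argument (the regression part of Eq. (8) evaluated for each firm), and the lower bound `floor` on a simulated efficiency score are all hypothetical, and the DEA re-estimation via Eq. (5)‒(7) at the end of Step 3.3 is deliberately left out.

```python
import random
from statistics import NormalDist


def draw_left_truncated_normal(sigma, lower, rng):
    """Draw eps from N(0, sigma^2) conditional on eps >= lower (sigma > 0)."""
    dist = NormalDist(0.0, sigma)
    p_low = dist.cdf(lower)                    # probability mass below the cut
    u = p_low + (1.0 - p_low) * rng.random()   # uniform on [p_low, 1)
    u = min(max(u, 1e-12), 1.0 - 1e-12)        # keep inv_cdf inside (0, 1)
    return max(dist.inv_cdf(u), lower)         # guard against rounding below the cut


def bootstrap_replicate(ef, sr, fitted, sigma, floor, rng):
    """One pass of Steps 3.1-3.3 over all firms, without the DEA re-estimation.

    ef, sr : original efficiency scores and Sharpe ratios, one per firm
    fitted : hypothetical fitted values from the second-stage regression
    floor  : assumed (hypothetical) positive lower bound on a simulated score
    """
    sr_star = []
    for ef_j, sr_j, fit_j in zip(ef, sr, fitted):
        eps = draw_left_truncated_normal(sigma, floor - fit_j, rng)  # Step 3.1
        ef_star = max(fit_j + eps, floor)                            # Step 3.2
        sr_star.append(sr_j * ef_j / ef_star)                        # Step 3.3
    return sr_star


# Illustrative inputs only; none of these numbers come from the paper.
rng = random.Random(0)
sr_star = bootstrap_replicate(
    ef=[2.0, 1.5], sr=[0.8, -0.3], fitted=[1.8, 1.6],
    sigma=0.5, floor=0.1, rng=rng,
)
```

Inverse-CDF sampling is used here because it needs no rejection loop; the rescaled Sharpe ratios in `sr_star` would then be passed back to the DEA model to obtain the bootstrap efficiency scores.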

4 Empirical results and discussion

4.1 Data

There is an increasing trend in examining the nexus between ESG and stock performance in the energy industry, especially with a focus on advanced economies such as the US, the UK, and European countries (Behl et al., 2021). The energy industry attracts more interest than other industries because ESG risks are perceived to be more significant in this sector (Behl et al., 2021). For instance, Makridou et al. (2016) argued that the energy industry is not only crucial for the development of an economy (and other sectors) but is also intensively involved in natural resource consumption and environmental degradation. Kumar et al. (2016) also found that during 2014‒2015, the US energy sector had the highest stock return volatility compared to other industries, such as automobiles, utilities, or transportation. It is consequently argued that considering ESG factors when investing in the energy industry could help significantly reduce potential risks in this sector (Kumar et al., 2016). Therefore, we select the energy industry as the empirical sample for our analysis, as we believe that this sector should best reflect the trade-off between ESG and SR and can thus appropriately illustrate the ESG-SR efficient (double) frontier. DEA has previously been used to examine the efficiency and performance of the energy sector (e.g., Sueyoshi and Goto, 2017; Sueyoshi et al., 2017), but those studies neither focus on the ESG-SR trade-off nor use the DFDB approach adopted in this study.

We started by collecting the ESG scores and their components (E, S, and G), together with more than 460,000 daily prices (from 02/01/2015 to 31/12/2019), of 371 global energy firms from the Eikon Refinitiv database (Thomson Reuters Eikon, 2022). As discussed in Sect. 3.1, we use Eq. (2) to derive the SR for those firms in 2019 (i.e., the final year of our data), with the risk-free asset proxied by the 3-month treasury bill rates, following Ziemba et al. (1974), Kamil et al. (2006), and Li et al. (2022), among others. We obtained SR data for all 371 firms in 2019 and then matched them with the ESG data. After filtering out missing observations, our 2019 DEA dataset (including data on E, S, G, and their reciprocals as inputs, and SR as output) consists of 334 energy firms. It covers 45 countries (see also “Appendix 1”), of which 26 are advanced economies (including Japan, Germany, the UK, and the US) and 19 are emerging ones (e.g., China, Russia, and India). Although only one African country (South Africa) is included, the sample covers seven countries in the Americas, 20 in Europe, and 17 in Asia and the Pacific. It is thus a good representation of the global energy industry for our empirical analysis.
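As an illustration of how an SR can be obtained from a daily price series, a minimal Python sketch is given below. Since Eq. (2) is defined in Sect. 3.1, the details here are assumptions rather than the exact specification used in this study: returns are simple (not logarithmic), the risk-free rate is a constant per-period value, no annualization is applied, and the function name `sharpe_ratio` is hypothetical.

```python
from statistics import mean, stdev


def sharpe_ratio(prices, risk_free_per_period):
    """Mean excess return divided by the sample standard deviation of returns."""
    # Simple period-on-period returns (assumption: not log returns).
    returns = [p1 / p0 - 1.0 for p0, p1 in zip(prices, prices[1:])]
    excess = [r - risk_free_per_period for r in returns]
    return mean(excess) / stdev(returns)


# Illustrative prices only; they are not taken from the dataset.
prices = [100.0, 110.0, 99.0, 108.9]
sr_low_rf = sharpe_ratio(prices, 0.0)    # mean return above the risk-free rate
sr_high_rf = sharpe_ratio(prices, 0.05)  # mean return below the risk-free rate
```

With the higher risk-free rate, the mean return falls below it and the resulting SR is negative, which is the case that motivates the handling of negative data in the DEA model.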

For our second-stage double-bootstrap regression analysis (Simar & Wilson, 2011; Le et al., 2022), firm-level data (e.g., MKTCAP, FIRM_SIZE, and BOARD_SIZE) are also extracted from Eikon Refinitiv (Thomson Reuters Eikon, 2022), while country-level data (including GDPCAP, INF, and the six INSTITUTIONS indices) are from the World Development Indicators (WDI) database (World Bank, 2020). Consequently, we ended up with cross-sectional data for 334 firms in 2019; the descriptive statistics of our data are reported in Table 1.

Table 1 shows that the SR of the stocks of our 334 energy firms in 2019 ranged from ‒1.306 to 1.459. As previously discussed, a negative SR means the expected return of the stock is lower than that of the risk-free asset (i.e., \({\bar{R}}_{i}<{R}_{f}\) as in Eq. (1)), suggesting that investors would prefer buying a risk-free asset (e.g., T-bills) to investing in such a stock. In addition, the wide ranges of E, S, and G suggest that the examined firms performed differently in each ESG aspect. We thus argue that it is difficult for investors to select their best portfolios based on a two-dimensional comparison alone (e.g., SR vs. ESG, SR vs. E, SR vs. S, and SR vs. G).

Table 1 Descriptive statistics of the dataset

Table 2A reports more details on single-dimensional performance. If an investor uses SR alone, they will find that more than 52% of the sample stocks can be considered ‘good stocks,’ following the categorization of Thomson Reuters Eikon (2022). However, if they rely only on ESG performance, they may find that about 77% of stocks can be regarded as ‘green stocks.’ When we extend this to a two-dimensional framework (e.g., SR vs. ESG, SR vs. E, SR vs. S, and SR vs. G), Table 2B shows that the results are difficult to interpret and inconclusive. For example, ‘good and green’ stocks account for about 37% of the sample under SR vs. E but about 44% under SR vs. G. Such inconclusive findings justify our argument that one needs to examine the multi-dimensional relationship between E, S, G, and SR using the ESG-efficient frontier derived from the ESG-SR-SBM DEA approach.
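The kind of two-dimensional tabulation summarized in Table 2B can be sketched as follows. This is only an illustrative sketch: the cut-offs `sr_cut` and `score_cut`, the labels ‘poor’ and ‘brown’, and the function name are hypothetical placeholders and do not reproduce the categorization of Thomson Reuters Eikon (2022).

```python
from collections import Counter


def cross_tab_shares(stocks, sr_cut, score_cut):
    """Share of stocks in each cell of a 2x2 (SR class, ESG class) table.

    stocks : list of (sr, score) pairs, where score is ESG or one of E, S, G
    """
    counts = Counter()
    for sr, score in stocks:
        sr_class = "good" if sr > sr_cut else "poor"
        esg_class = "green" if score > score_cut else "brown"
        counts[(sr_class, esg_class)] += 1
    total = len(stocks)
    return {cell: n / total for cell, n in counts.items()}


# Illustrative data and thresholds only; they are not taken from the sample.
stocks = [(0.5, 70.0), (0.2, 30.0), (-0.4, 80.0), (-1.0, 10.0)]
shares = cross_tab_shares(stocks, sr_cut=0.0, score_cut=50.0)
```

Repeating the tabulation with E, S, or G in place of the overall ESG score yields a different ‘good and green’ share each time, which is why the pairwise view is hard to reconcile into a single ranking.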

Table 2 Data analytics on ESG and SR

4.2 Empirical results

We first summarise the double-frontier ESG-SR efficiency of the global energy industry (in 2019) in Fig. 2. Specifically, Fig. 2 reports the initial results of average ESG-SR efficiency for the energy industry of each sampled country. The highest efficiency was found among Finnish firms (EF = 6.273) and the lowest among Israeli firms (EF = 0.397), yielding a global average (in)efficiency of 2.273 (or 0.440 efficient, compared to the highest level of 1.000 or 100% efficient). In general, we found that in 2019, the energy industries of 29 countries (i.e., 64.4% of the sample) performed above the global average level. Countries that underperformed in 2019 include China, the US, Pakistan, Singapore, and Israel. Consequently, it is suggested that responsible investors should not optimize their portfolios by purchasing stocks from energy firms in those countries because of their ESG-SR trade-off. For instance, the average Israeli firm (i.e., the worst performer) has only a moderate Social value of 24.70 and low values of Environment (2.02) and Governance (4.59); more importantly, its SR was very low (−0.27). In contrast, the average Finnish firm (i.e., the top performer) has E, S, G, and SR values of 70.65, 77.13, 88.46, and 1.14, respectively. Such findings can therefore provide important information for the investment decisions of responsible and other investors.
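The two summary figures quoted above can be checked with a few lines of arithmetic. The reciprocal reading of the (in)efficiency score is an inference from the reported numbers (it reproduces 0.440), not a formula stated in this section, and the variable names are hypothetical.

```python
global_avg_inefficiency = 2.273   # reported global average EF
countries_above_average = 29      # reported number of countries above the average
countries_in_sample = 45          # number of countries reported in Sect. 4.1

# Assumed relation: efficiency is the reciprocal of the (in)efficiency score.
efficiency = 1.0 / global_avg_inefficiency
share_above = countries_above_average / countries_in_sample

print(f"{efficiency:.3f}")   # 0.440
print(f"{share_above:.1%}")  # 64.4%
```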

Fig. 2 Average ESG-SR double-frontier efficiency scores (by country)

Table 3 looks closer at the firm level, where we report the top-10 and bottom-10 energy firms in terms of their ESG-SR double-frontier efficiency scores. As expected, the top performers are better in both ESG and SR, while the bottom ones are worse in both. We also observe that the rankings of those firms are consistent in both EFB and EFW and that the overall ESG-SR double-frontier efficiency scores (EF) are a comprehensive aggregation of the two. Consequently, energy firms in Finland, India, and Russia are the natural candidates for responsible investors seeking to optimize their ESG-SR investment portfolios.

Table 3 The Top-10 and Bottom-10 energy firms in terms of ESG-SR efficiency

\(EF\): the overall double-frontier efficiency score derived from Eq. (7).

To further examine the factors that can influence the EF scores of the sample firms, the double-bootstrap approach (Simar & Wilson, 2007) is employed, as described in Sect. 3.3, with \({L}_{1}={L}_{2}=2000\) bootstrap replications. The results in Table 4 show that the firm’s market capitalization (MKTCAP), board characteristics (BOARD_SIZE and BOARD_FEMALE), and the rule of law (RL) are important factors that can improve the ESG-SR efficiency. We, however, could not find any statistical evidence of the impacts of other factors, such as GDP per capita and firm size, on such efficiency.
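To illustrate how bootstrap replications translate into a statement about statistical evidence, the sketch below computes a plain equal-tailed percentile interval for one coefficient and checks whether it excludes zero. This is a simpler stand-in, named as such, for the interval construction of Simar & Wilson (2007) used in Step 7; the function names and the synthetic draws are hypothetical.

```python
import math


def percentile_interval(draws, alpha=0.05):
    """Equal-tailed (1 - alpha) percentile interval from bootstrap draws."""
    ordered = sorted(draws)
    n = len(ordered)
    lo_idx = math.floor((alpha / 2) * (n - 1))
    hi_idx = min(math.ceil((1 - alpha / 2) * (n - 1)), n - 1)
    return ordered[lo_idx], ordered[hi_idx]


def is_significant(draws, alpha=0.05):
    """Flag a coefficient whose interval lies entirely on one side of zero."""
    low, high = percentile_interval(draws, alpha)
    return low > 0 or high < 0


# Synthetic draws for illustration only (2000 values, matching L2 = 2000).
positive_draws = [i / 100 for i in range(1, 2001)]
mixed_draws = [x - 10.0 for x in positive_draws]

positive_flag = is_significant(positive_draws)  # interval entirely above zero
mixed_flag = is_significant(mixed_draws)        # interval straddles zero
```

Rounding the lower index down and the upper index up makes the interval slightly conservative, which seems the safer choice when the number of replications is finite.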

Table 4 The determinants of the ESG-SR efficiency

This table provides the estimation results of the double bootstrap analysis on the determinants of the ESG-SR double-frontier efficiency of the 334 sample firms. MKTCAP represents the firm’s market capitalization (in logarithm); FIRM_SIZE represents the firms’ total assets (in logarithm); FIRM_AUDIT is a dummy variable that equals one if the firm has an audit committee, and 0 otherwise; BOARD_SIZE represents the number of board members of the firm (in logarithm); BOARD_FEMALE is the percentage of females in the firm’s board (in logarithm); RD measures the firm’s R&D expenses to total revenues (in logarithm); GDPCAP is the GDP per capita of the country where the firm operates (in logarithm); INF is the inflation rate of the country where the firm operates (in logarithm); CC: Control of corruption; GE: Government effectiveness; PV: Political stability and Absence of violence/terrorism; RQ: Regulatory quality; RL: The rule of law; and VA: Voice and Accountability.

Table 4 suggests that a firm can benefit from its market development to achieve higher ESG-SR efficiency. This is because large firms with high market capitalization have more resources and can invest more in environmental (E) and social (S) strategies to improve their ESG (Kiymaz, 2019; Crespi & Migliavacca, 2020; Janicka & Sajnóg, 2022), while they also have better governance (G) ratings (Halbritter & Dorfleitner, 2015; Al Amosh et al., 2022). Meanwhile, there is evidence that board size positively impacts ESG (Reverte, 2009; Khan, 2010; Husted & Sousa-Filho, 2019), while the participation of females on the firm’s board has a positive impact on the ESG and, thus, the ESG-SR performance of the firm (Velte, 2016; Birindelli et al., 2018). Khan and Baker (2022) argued that female board members might be more concerned about ESG issues than their male counterparts, so they may be more active in monitoring the ESG activities of the firm. Birindelli et al. (2018) further found a U-shaped relationship between the proportion of female board members and ESG, in which the positive impact exists only if this proportion is less than or equal to 32%. In our sample, the average value of BOARD_FEMALE is 17.97%, which satisfies this condition.

Among the INSTITUTIONS variables, only the Rule of Law (RL) is found to have a statistically significant impact on ESG-SR efficiency. RL reflects firms’ perceptions of the country’s legal framework and laws, while the other variables (e.g., GE or PV) reflect perceptions of the country’s business environment (World Bank, 2020). For instance, GE concerns the quality of public services and of policy implementation, while PV concerns the likelihood of political instability and/or politically motivated violence (World Bank, 2020). These are conditions for creating a suitable environment for business development. In this sense, RL is more relevant to firms’ compliance in ESG activities and reporting, and thus, it is reasonable to see that a higher RL can lead to a higher ESG-SR performance of the global energy firms. For instance, the shareholder theory (Friedman, 1970) argues that firms are unwilling to engage in ESG activities or ESG disclosures because these are not at the core of their businesses. Rules and laws can, therefore, influence the firms’ incentives, making them more involved in ESG activities (Baldini et al., 2018; Sharma et al., 2020; Al Amosh et al., 2022).

5 Conclusions

It is important to evaluate and optimize investment portfolios while accounting for the trade-off between risks and returns (i.e., the Sharpe ratio) associated with the examined stocks or portfolios. In recent years, research has shown that environmental, social, and governance (ESG) performance can influence stock returns both directly and indirectly. Therefore, such a trade-off has evolved from a two-dimensional perspective (i.e., risks versus returns) into a multi-dimensional one (e.g., risks versus returns versus ESG). This study is the first to examine this setting in the global energy industry.

More specifically, we propose a multi-dimensional ESG-efficient frontier approach to examine the trade-off between ESG (and its components) and the Sharpe ratio (i.e., risks and returns). To simultaneously account for the issues of negative data, input/output optimization, estimation sensitivity, and endogeneity in Data Envelopment Analysis (DEA), we employed a slacks-based measure (SBM) ESG-SR double-frontier double-bootstrap (ESG-SR DFDB) approach to evaluate the overall performance of energy firms, as well as to examine the determinants of this performance. Since ESG risks in the energy industry are higher than in other industries, this sample can best reflect the above multi-dimensional trade-off.

Our empirical results show that only around 11% of our sample firms perform well on the multi-dimensional ESG-SR efficient frontier. At the country level, the 2019 average (in)efficiency of the global energy industry was 2.273 (or 0.440 efficient, compared to the highest level of 1.000 or 100% efficient). Consequently, responsible investors (those concerned about both the ESG and SR performance of the firm) should optimize their portfolios based on energy firms operating in Finland or Russia, but not those from China, the US, and Israel. Besides the firm’s input/output utilization (e.g., the average Finnish firm has higher E, S, G, and SR values than the average Israeli one), we also found that firm-level characteristics (e.g., market capitalization and board characteristics) and country-level characteristics (e.g., the rule of law) can have positive impacts on ESG-SR performance. Such findings, therefore, are essential not only to (responsible) investors but also to managers and policymakers in those firms/countries. For instance, improving corporate governance can be an important source of the firm’s ESG-SR efficiency. Meanwhile, the government’s (minimum) role in monitoring the market through laws and regulations is justified.

Future research should extend our model to other sectors, such as financial or manufacturing, to confirm our findings. More importantly, newer estimation techniques, including network DEA, Malmquist DEA, or stochastic DEA (Tone & Tsutsui, 2009; Kerstens & Woestyne, 2014; Sensoy et al., 2019; Matsumoto et al., 2020; Tsionas, 2021) in the first stage and artificial intelligence or machine learning (Anouze & Bou-Hamad, 2021; Nandy & Singh, 2021; Zhu, 2022) in the second stage of the analysis, could also help improve the robustness and predictive power of the model. Extensions in data and variables (e.g., with alternative proxies of E, S, and G) can also strengthen our study.