1 Introduction

In recent years there has been a steady increase in the number of worldwide mutual funds investing in assets that meet environmental, social, and governance (ESG) criteria (Bauer et al. 2007; Cortez et al. 2009; Geczy et al. 2021), which reflects the investors’ increasing awareness towards these issues (El Ghoul et al. 2023; Renneboog et al. 2008a). The prominence of investor awareness was particularly heightened in the aftermath of the 2007–08 financial crisis, since there was a significant rise in assets under management (AUM) for funds investing according to ESG criteria (El Ghoul et al. 2023). Karoui and Nguyen (2022) stated that crises in the financial markets, the existence of financial scandals and the challenges of the climate crisis led investors to invest more in socially responsible investments. Becker et al. (2022) showed that after the publication of the Sustainable Finance Disclosure Regulation (SFDR), the EU funds’ sustainability ratings increased, whereas funds with better ESG labels received greater net inflows.

Awareness of ESG issues can be captured by the number of signatories to the United Nations Principles for Responsible Investment (PRI), which indicates the number of organizations that consider ESG criteria in their investment decision-making processes, whereas the corresponding AUM demonstrates the size of those investments. In this context, considering the relationship between ESG awareness and funds’ AUM, the findings from the PRI report indicate that since its inception in 2006, both the number of signatories and the AUM have grown rapidly, which implies that awareness of ESG issues led to increasing adoption of responsible investment practices. Specifically, from 734 signatories and US$21 trillion AUM in 2010, there were more than 4902 signatories and more than US$121.3 trillion AUM by 31 March, 2022 (Avramov et al. 2022; PRI Annual Report). With respect to investments in socially responsible funds, the 2020 Global Sustainable Investment Review revealed that assets in socially responsible funds globally rose from US$30.6 trillion in 2018 to US$35.3 trillion in 2020, whilst during the same period the funds’ assets with ESG integration rose by about 44%, from US$17.5 trillion in 2018 to US$25.2 trillion in 2020.

The global economic turmoil caused by the COVID-19 pandemic has had profound financial consequences, leading investors to seek safe-haven investment opportunities (Ji et al. 2020; Tampakoudis et al. 2022). During periods of financial crises, such as the one associated with COVID-19, investors adjust their investment preferences and reallocate their portfolios accordingly (Himanshu et al. 2021). In times of financial instability, investors tend to adopt investment strategies that minimize risk, while enhancing investment performance (Díaz et al. 2022). Thus, the COVID-19 pandemic has heightened the significance of Corporate Social Responsibility/Environmental, Social, and Governance (CSR/ESG) issues (Bae et al. 2021). Indeed, Fang and Parida (2022) argue that since the COVID-19 market crash investors have considered socially responsible investments (SRIs) a necessity and have prioritized such investments. Examining mutual fund flows during such crises thus becomes paramount in assessing financial fragility (Falato et al. 2021).

When considering the ESG characteristics in the selection of mutual funds, it is important to recognize the existence of SRI funds. These funds are distinct in their objective of simultaneously pursuing financial returns and social objectives (Renneboog et al. 2011). Despite the significant integration of ESG criteria into investment processes, investors continue to seek further guidelines regarding portfolio selection using ESG criteria. However, there remains a lack of consensus regarding the impact of ESG factors on financial performance (Pedersen et al. 2021).

In fact, prior empirical studies on the effect of ESG performance on the financial performance of mutual funds provide inconclusive evidence. Thus, while some studies demonstrate an insignificant association between the level of funds’ ESG score and their financial performance (Bauer et al. 2007; Ghoul and Karoui 2022), there are other studies that report a negative association between funds’ ESG and financial performance (Chang et al. 2012; El Ghoul et al. 2023; El Ghoul and Karoui 2017). There are also studies that provide evidence for a positive association between mutual funds’ ESG and financial performance (Abate et al. 2021; Fang and Parida 2022; Gil-Bazo et al. 2010).

Even though prior studies investigate the nexus between overall ESG score and financial performance, the effect of ESG controversies scores on financial performance remains relatively unexplored, and these scores have drawn limited research attention (Galletta and Mazzù 2023; Treepongkaruna et al. 2022). ESG controversies arise from ESG-related corporate scandals; the ESG controversies score therefore reflects a firm's exposure to such controversies and to adverse events reported in global media. Galletta and Mazzù (2023) used the ESG controversies score as a proxy of the market’s perception of the firms’ compliance with ESG criteria, whereas Aouadi and Marsat (2018) used this score to assess concerns related to ESG issues. We respond to relevant calls (see Luo et al. 2022) by considering the impact of funds’ ESG score on their performance and further evaluate whether investing in socially responsible funds is associated with performance improvements amid the COVID-19 pandemic. This paper investigates the financial performance of worldwide mutual funds concerning the role of their ESG performance. Specifically, we evaluate mutual fund performance using the methodology of Data Envelopment Analysis (DEA) and we compare the efficiency of mutual funds that meet high ESG standards against those with lower ESG standards.

In this context, we aim to provide answers for the following research questions: (a) Do funds’ ESG controversies scores significantly affect their financial performance? (b) Do mutual funds with higher ESG controversies scores provide better financial performance and to what extent? and (c) Is the association between ESG controversies scores and mutual fund performance affected by the fund’s geographical investment area?

Our paper introduces several novel aspects compared to previous studies in the field, making a significant contribution to the understanding of mutual fund performance evaluation. Firstly, instead of relying on more conventional measures, we employ the methodology of DEA to assess the financial performance of mutual funds. This approach offers distinct methodological advantages over traditional measures and has been gaining attention in recent years.

In line with the work of Abate et al. (2021), the application of DEA in evaluating mutual fund performance provides several notable benefits. Firstly, DEA allows for the comparison of the performance (efficiency level) of the entire sample of funds simultaneously, without the need for a benchmark. This is particularly valuable when dealing with diverse portfolios and varying investment strategies, as it provides a comprehensive assessment of each fund's efficiency relative to its peers. Secondly, DEA is a flexible method of operational research that can incorporate multiple inputs and outputs. This flexibility enables a more comprehensive analysis of the factors influencing mutual fund performance and allows for a more nuanced evaluation of efficiency. Lastly, DEA enables the evaluation of the overall performance of funds while considering the marginal contribution of each input. This aspect provides insights into the relative importance of different inputs in driving fund efficiency and helps identify areas for improvement.

Secondly, in contrast to categorizing mutual funds into socially responsible funds based on various criteria (such as investing in sin stocks versus non-sin stocks), we employ a fund-level social responsibility score, aligning with recent literature (Borgers et al. 2015; El Ghoul et al. 2023; El Ghoul and Karoui 2017; Hartzmark and Sussman 2019; Muñoz et al. 2022). This approach allows us to decompose fund portfolios into socially responsible and conventional (non-socially responsible) funds. Specifically, for each fund, we calculate an ESG ratio using the ESG scores provided by Refinitiv for the individual securities within each mutual fund.
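To make the idea of a fund-level score concrete, the following minimal sketch aggregates security-level ESG scores into one number per fund. It assumes a holdings-weighted average over the securities that have a score; the exact aggregation rule, the function name `fund_esg_score`, and the tickers and values are illustrative assumptions, not details taken from the paper or from Refinitiv.

```python
from typing import Dict


def fund_esg_score(weights: Dict[str, float], esg: Dict[str, float]) -> float:
    """Holdings-weighted average ESG score over the securities that have a score.

    Assumption: securities without a score are dropped and the remaining
    portfolio weights are rescaled to sum to one.
    """
    covered = {sec: w for sec, w in weights.items() if sec in esg}
    total = sum(covered.values())
    if total == 0:
        raise ValueError("no holding has an ESG score")
    return sum(w * esg[sec] for sec, w in covered.items()) / total


# Hypothetical fund with three holdings, one of which has no ESG score.
holdings = {"AAA": 0.5, "BBB": 0.3, "CCC": 0.2}
security_scores = {"AAA": 80.0, "BBB": 60.0}
score = fund_esg_score(holdings, security_scores)
```

The same aggregation could be applied to security-level ESG controversies scores to obtain a fund-level controversies measure.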

Lastly, our paper evaluates the impact of ESG on mutual fund financial performance using an alternative indicator for fund-level ESG performance that incorporates the effects of negative events related to firms' corporate social responsibility. In line with recent studies examining the financial consequences of ESG performance (Agnese et al. 2022; Aouadi and Marsat 2018; Dorfleitner et al. 2021; Schiemann and Tietmeyer 2022; Treepongkaruna et al. 2022), we incorporate the ratio of ESG controversies score to capture the level of a fund's social responsibility by considering the presence of negative ESG events. ESG controversies arise when negative externalities, such as pollution, are associated with a project (Renneboog et al. 2008a), and therefore, ESG controversies scores serve as a measure for ESG-based scandals (Dorfleitner et al. 2021).

We employ a sample that contains 17,961 global mutual funds with available financial characteristics for the later phase of the COVID-19 pandemic. We demonstrate that ESG characteristics affect the performance of mutual funds during the COVID-19 pandemic. Specifically, there is a statistically significant difference in mutual fund performance between mutual funds with high ESG scores and those with low ESG scores. This outperformance of funds with higher social responsibility remains unaltered even after controlling for the funds’ investment area.

The rest of the paper is organized as follows. Section 2 provides the theoretical framework and develops the research hypotheses that are under examination. Section 3 describes the empirical methods and specifies the DEA models that are applied for the estimation of mutual fund performance. Section 4 reports the empirical results, whilst Sect. 5 provides the main conclusions of this study.

2 Literature review and hypothesis development

2.1 Theoretical framework

The existing body of literature offers multiple perspectives contributing to the ongoing debate regarding the performance of funds incorporating socially responsible criteria and their alignment with wealth maximization objectives. The association between ESG performance and financial performance can be explained by two opposing theoretical viewpoints.

Some studies highlight the potential negative impact of ESG performance on mutual fund performance (El Ghoul and Karoui 2017). Investing in firms with high ESG scores may result in a reduction in investment opportunities and an increase in monitoring costs. The underperformance of SRI funds compared to their conventional counterparts can be explained by their exclusion of investments that do not align with ESG objectives, thereby missing out on potential financial opportunities (Renneboog et al. 2008b).

In essence, investing based on ESG criteria restricts investment choices, as socially responsible investments constitute a subset of the overall universe of investment opportunities. This implies that ethical investors may have to forego some of the benefits of diversification. According to portfolio theory, the introduction of additional constraints, such as ESG criteria, into the asset selection process is likely to hinder optimal portfolio construction. Specifically, incorporating these constraints reduces the pool of available investments, which in turn affects potential diversification compared to unconstrained portfolios. The altered benefits from diversification may result in lower returns for constrained portfolios relative to unconstrained portfolios (Cortez et al. 2009). Additionally, the need for screenings and rankings to assess the level of investment social responsibility incurs costs that can affect the net returns of investments (Bauer et al. 2007).

On the other hand, there are also arguments in favor of socially responsible investments which highlight the long-run benefits (i.e., improvements in economic performance) that can be derived from investing in firms which follow a strategy that enhances their corporate social responsibility. Specifically, a higher level of corporate social responsibility reflects higher quality of corporate management, which in turn may lead to comparative advantages relative to firms with lower levels of corporate social responsibility (Cortez et al. 2009). Thus, mutual funds investing in stocks with high ESG performance can generate higher levels of performance. According to this view, it is likely that the additional costs due to ESG screenings may be overcome by ruling out unprofitable firms through the screening process (El Ghoul et al. 2023). This outperformance of SRI funds compared to their conventional peers can be explained by the fact that fund managers who seek firms with high ESG scores are in fact likely to incorporate firms with strong financial fundamentals or firms with more sustainable performance into their portfolios, which may ultimately result in better performance (El Ghoul and Karoui 2017).

2.2 Hypothesis development

Considering the effect of CSR scores on stock returns during the 2007–08 financial crisis, Lins et al. (2017) evaluated 1673 U.S. non-financial firms and provided evidence that firms with higher CSR scores presented significantly higher returns compared to firms with lower CSR scores. Leite and Cortez (2015), for a sample of French funds investing in Europe, provided evidence that SRI funds underperformed conventional funds during non-crisis periods, whereas SRI funds matched the performance of their conventional peers during periods of market downturns.

With respect to the association between SRI fund flows and their performance, Klinkowska and Zhao (2023) used a sample of U.S. SRI funds over the period December 1999 to March 2021 and showed that SRI funds with greater performance subsequently attracted more investments. A convex relationship was identified between fund performance and flows for retail funds, while no such convexity was observed for institutional funds. A relationship was also observed between SRI mutual funds and firms’ ESG scores. Peng et al. (2023), using a sample of listed A-share firms in China over the period 2010–2020, showed that firms with greater ESG scores were more likely to attract investments from SRI funds in the following year, whilst the investments from SRI mutual funds had a positive impact on their investee firms’ ESG scores in the following year.

The COVID-19 pandemic had several consequences for, among others, the financial markets (Bai et al. 2023; Berkman and Malloch 2023), the banking sector (Feyen et al. 2021; Igan et al. 2023; Silva et al. 2023), and the mutual fund industry (Allen et al. 2023; Jacob et al. 2023; Zhang et al. 2023). Considering the effects of the COVID-19 pandemic on mutual funds, Allen et al. (2023) showed that prime institutional investors of money market funds responded to the intensity of the pandemic, whereas retail investors did not have a significant response. In addition, in response to COVID-19, fund managers decreased the weighted average life of funds and increased the funds’ daily liquidity. According to Jacob et al. (2023), during the peak of the COVID-19 pandemic, equity funds preferred firms with lower levels of risk, higher expected growth, and larger size. They revealed that the COVID-19 pandemic affected the mutual funds’ asset allocation strategy and the funds’ investment preferences. Moreover, Zhang et al. (2023), using a sample of Chinese equity mutual funds, showed that there was a positive relationship between funds’ ESG scores and their downside risk; however, this positive relationship weakened during the COVID-19 pandemic.

A strand of recent studies demonstrated that ESG criteria constituted significant determinants of financial performance during the COVID-19 pandemic. Albuquerque et al. (2020) used a sample of 2171 U.S. listed firms and showed that firms with greater environmental and social ratings presented significantly higher returns compared to those with lower environmental and social ratings. Moreover, Broadstock et al. (2021), analyzing a sample of Chinese CSI300 stocks during the COVID-19 pandemic, revealed that portfolios with higher ESG performance demonstrated superior performance compared to portfolios with lower ESG performance. Additionally, these high ESG portfolios exhibited a reduction in risk during the COVID-19 crisis. Engelhardt et al. (2021), using a sample of 1452 firms from 16 European countries, found that European firms with higher ESG ratings were associated with significantly higher market-adjusted abnormal stock returns during the crisis of the COVID-19 pandemic. Omura et al. (2021) analyzed whether socially responsible investing matters in times of economic downturns using a sample of ESG ETFs. They concluded that responsible investments outperformed conventional investments during the COVID-19 pandemic.

On the other hand, Tampakoudis et al. (2021), using a sample of M&As in the U.S., provided evidence that ESG scores had a negative impact on acquirer performance, and this negative effect was stronger during the COVID-19 pandemic. In addition, Bae et al. (2021) using a sample of 1750 U.S. firms evaluated whether there was an association between CSR and stock returns during the pandemic induced turmoil and they argued that stock returns were unaffected by the level of CSR during the crash period. Similarly, Demers et al. (2021) used a sample of 1652 U.S. firms with available data from the onset of the pandemic and the full year ending in December 2020. They found that ESG scores had an insignificant impact on stock returns during COVID-19; however, the firms’ stock of investments in internally generated intangible assets constituted a factor that was positively associated with stock returns. In this context, amid periods of financial crises, such as the pandemic-driven crisis, investing with ESG criteria may lead to better financial performance outcomes.

The extant literature provides mixed results considering the association between ESG scores and mutual fund performance. The first strand of the relevant research demonstrates that ESG scores have an insignificant impact on mutual fund performance, which implies that the financial performance of SRI mutual funds is similar to the performance of more conventional funds. This result may imply that the benefits of SRI equal the increases in monitoring cost due to processes of social screening (Ghoul and Karoui 2022).

Bello (2005) used a sample of 42 socially responsible and 84 conventional funds over the period 1994–2004 and provided evidence that, after adjusting for the level of portfolio diversification, both categories of funds had insignificant differences in terms of performance. Moreover, Bauer et al. (2005) evaluated the performance between conventional and ethical funds using an international sample of 103 ethical and 4,384 conventional mutual funds over the period 1990–2001. After controlling for investment style, they demonstrated insignificant differences in risk-adjusted returns, using the Carhart four-factor model, between the two subsamples. Similarly, Kreander et al. (2005) evaluated 60 funds from four European countries (U.K., Sweden, Germany, Netherlands) during the period 1995–2001 and found insignificant differences in performance between ethical and non-ethical funds. The insignificant difference in performance between ethical and conventional mutual funds was also supported by Bauer et al. (2007) who evaluated Canadian mutual funds from 1994 to 2003. Renneboog et al. (2008b) used a sample of mutual funds domiciled in 17 countries over the period 1991–2003 and they also found that, in most countries, the differences in risk-adjusted returns between SRI funds and their conventional peers were insignificantly different from zero. Klinkowska and Zhao (2023) found insignificant differences between the performance of SRI funds and their conventional peers for funds run by the same management companies with similar investment objectives. However, this result differed when they compared SRI and conventional funds with similar characteristics but without the restriction of the same manager.

The second group of prior studies provides evidence that the ESG performance of mutual funds is negatively associated with their financial performance. El Ghoul and Karoui (2017) evaluated the impact of CSR on the performance of mutual funds and fund flows using a sample of 2168 U.S. domestic equity funds over the period 2003–2011. They found that mutual funds with high CSR scores experienced weaker performance but comparable persistence in flows, relative to funds with low CSR scores. Moreover, El Ghoul et al. (2023), using a sample of 2255 U.S. domestic equity funds over the period 2010–2021, found that socially responsible funds presented relative underperformance compared to non-socially responsible funds, which, however, was relatively small. Specifically, they documented lower raw returns, lower risk-adjusted return, and lower Sharpe ratios for socially responsible funds.

Finally, the third strand of literature demonstrates that mutual funds with higher ESG scores perform better than those with lower ESG scores. Gil-Bazo et al. (2010) evaluated the performance of 86 SRI equity mutual funds and 1,761 conventional funds in the U.S. over the period 1997–2005 and found that SRI funds experienced higher after-fee performance compared to their conventional peers. However, the outperformance was exclusively driven by SRI funds that were managed by companies with a specialization in SRI. Steen et al. (2020) investigated the association between funds’ ESG rating and performance using a sample of 146 equity funds domiciled in Norway over the period 2014–2018. Even though insignificant results were derived from the total sample, by focusing on European categorized funds, they found that mutual funds classified into top ESG quantiles are associated with significantly higher returns. Abate et al. (2021), using a sample of 634 European equity mutual funds over the period 2014–2019, applied the DEA methodology to investigate the effect of the level of sustainability on mutual fund performance. They found that funds with higher ESG ratings presented higher efficiency than those with lower ESG ratings. With regard to mutual fund performance, Fang and Parida (2022) used a sample of equity mutual funds from September, 2018 to June, 2021 and documented that mutual funds with high sustainability generated higher returns compared to funds with low sustainability. This outperformance significantly increased during the COVID-19 crash as well as during the post-crash period.

Bilbao-Terol et al. (2023) investigated the overall efficiency of mutual funds using a sample of 144 French mutual funds including 31 funds that were labeled as socially responsible. They used the DEA methodology to estimate the funds’ financial, corporate sustainability, and overall efficiency. The results indicated that the proportion of overall-efficient funds was greater in the sample of socially responsible funds compared to the overall-efficient proportion in the sample of conventional funds. Tampakoudis et al. (2023) evaluated the effect of funds’ ESG scores on their financial performance during the COVID-19 pandemic using the methodology of DEA. Using a sample of 9864 mutual funds worldwide, they revealed that non-equity mutual funds generated significantly higher financial efficiency compared to equity mutual funds. In addition, mutual funds’ high ESG scores were associated with significantly greater performance compared to mutual funds with low or medium ESG scores. Considering first that ESG performance affects financial performance during the COVID-19 pandemic and second that mutual funds with higher ESG scores can achieve better financial performance amid the COVID-19 pandemic, the following hypotheses are proposed:

H1a

ESG performance constitutes a significant determinant of financial performance amid the COVID-19 pandemic

H1b

The performance of mutual funds is higher for funds with greater ESG performance amid the COVID-19 pandemic

The majority of prior research on the performance of mutual funds focuses on evidence derived from the U.S. and/or the U.K., whereas limited attention has been paid to the performance of mutual funds investing in other markets. Moreover, prior studies mainly focus on the region of fund domiciliation rather than on the geographical focus of such investments. Considering the investment strategy of a sample of European funds, Kreander et al. (2005) argued that through international diversification ethical funds investing at international level were able to overcome performance constraints. Matallín-Sáez et al. (2019) analyzed a sample of 3920 worldwide equity mutual funds over the period 2000–2018 and evaluated the impact of four attributes of social responsibility on the funds’ performance, considering the role of the funds’ investment area. They found that the performance of funds with a higher level of social responsibility was worse than the performance of funds with a lower level of social responsibility, and this result was irrespective of the funds’ geographical area of investment. However, this underperformance of socially responsible funds is insignificant for funds investing in Europe. Abate et al. (2021) used a sample of equity funds domiciled in Europe and documented that the financial efficiency of mutual funds with high ESG level is higher than the efficiency of funds with lower ESG level, irrespective of the area of investing focus. Specifically, high-ESG funds investing in European, global or other equity markets presented superior efficiency compared to low-ESG funds investing in these locations. To evaluate whether the effect of ESG scores on mutual fund performance remains unaltered even after accounting for differences in performance due to differences in the funds’ geographical investment area, we also tested the following hypothesis.

H2

Mutual funds with higher ESG performance have superior efficiency regardless of their geographical investment area.

3 Methodology

3.1 Introduction to DEA

This section provides the theoretical and mathematical framework of the analysis. To this end, we introduce DEA, a non-parametric technique based on Linear Programming (LP). The first DEA models were introduced by Charnes et al. (1978), who assumed an underlying production function where each unit \(j\) (hereafter Decision Making Unit or DMU) consumes \(m\) inputs (\(x_{i,j}\), \(i = 1,..,m\)) to produce \(s\) outputs (\(y_{r,j}\), \(r = 1,..,s\)). In the simplest case of one input and one output, efficiency is defined as the ratio of output produced to input consumed, \(eff = \frac{y}{x}\). However, real-life applications involve multiple inputs and outputs, so efficiency is calculated as the ratio of weighted outputs to weighted inputs, \(\theta_{j} = \frac{{\sum\nolimits_{r = 1}^{s} {u_{r} \cdot y_{r,j} } }}{{\sum\nolimits_{i = 1}^{m} {v_{i} \cdot x_{i,j} } }}\), where \(u_{r}\) and \(v_{i}\) are the output and input weights, respectively. To assess the performance of each DMU under multiple inputs and outputs, the LP models solved (Banker 1984) are shown in Table 1.

Table 1 Multiplier (a) and output (b) envelopment DEA models

The model presented in Table 1(a) is a multiplier input-oriented model, whereas the one presented in Table 1(b) is the envelopment model, which is the dual of the multiplier LP model. The technology assumed in the model presented in Table 1(b) is Constant Returns to Scale (CRS); with the addition of the constraint \(\sum\nolimits_{j = 1}^{n} {\lambda_{j} } = 1\), the technology assumed is Variable Returns to Scale (VRS). The LP models presented in Table 1 are solved for each DMU under consideration. Each efficiency score is obtained as \(\theta^{*}\) in the case of an input-oriented DEA model.
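The ratio definition of efficiency in this section can be illustrated with a short sketch. It evaluates the weighted-output to weighted-input ratio for a given set of weights and, for the one-input, one-output CRS case, scales each unit's ratio by the best ratio in the sample so that the best unit scores 1. The function names and the data are hypothetical; in a full DEA application the weights are not fixed in advance but chosen separately for each DMU by solving the LP.

```python
from typing import List, Sequence


def ratio_efficiency(u: Sequence[float], v: Sequence[float],
                     y: Sequence[float], x: Sequence[float]) -> float:
    """Weighted outputs divided by weighted inputs for one DMU and given weights."""
    weighted_output = sum(ur * yr for ur, yr in zip(u, y))
    weighted_input = sum(vi * xi for vi, xi in zip(v, x))
    return weighted_output / weighted_input


def single_ratio_scores(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """One input, one output, CRS: each unit's y/x relative to the best y/x."""
    ratios = [yj / xj for xj, yj in zip(x, y)]
    best = max(ratios)
    return [r / best for r in ratios]


# Hypothetical data: three DMUs with one input and one output each.
inputs = [2.0, 4.0, 5.0]
outputs = [1.0, 4.0, 2.5]
scores = single_ratio_scores(inputs, outputs)  # second DMU defines the frontier
```

In this toy sample the second unit has the highest output per unit of input, so it receives a score of 1 and the other two are measured against it.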

3.2 Evaluation of mutual fund performance

As DEA has been applied in a wide range of disciplines and scientific areas, variations of the initial models have been proposed taking into account the evolution of DMUs over time, including multi-objective analysis, multi-criteria decision analysis and so on. One such special case is that of financial analysis with emphasis on mutual funds.

In the framework of mutual funds, a model focused on the performance of each unit under investigation has been proposed entitled DEA Portfolio Efficiency Index (DPEI), initially developed by Murthi et al. (1997) (presented in model 1). The aim of the model is to maximize the return of each mutual fund under investigation taking as inputs the value of transaction cost \(i\) and the risk for each mutual fund \(j\).

$$\begin{gathered} \max \, \frac{{R_{o} }}{{\sum\nolimits_{i = 1}^{I} {w_{i} \cdot x_{i,o} } + v \cdot \sigma_{o} }} \hfill \\ s.t. \hfill \\ \frac{{R_{j} }}{{\sum\nolimits_{i = 1}^{I} {w_{i} \cdot x_{i,j} } + v \cdot \sigma_{j} }} \le 1, \, j = 1,..,n \hfill \\ \end{gathered}$$
(1)

To this end, a special case of DEA model has been proposed by Basso and Funari (2001, 2014) extending the initial performance index for mutual funds (presented in model 2).

$$\begin{gathered} \min \, \sum\limits_{i = 1}^{m} {v_{i} \cdot x_{i,o} } + \sum\limits_{k = 1}^{p} {w_{k} \cdot \sigma_{k,o} } \hfill \\ s.t. \hfill \\ \sum\limits_{r = 1}^{s} {u_{r} \cdot y_{r,j} } - \left( {\sum\limits_{i = 1}^{m} {v_{i} \cdot x_{i,j} } + \sum\limits_{k = 1}^{p} {w_{k} \cdot \sigma_{k,j} } } \right) \le 0, \, j = 1,..,n \hfill \\ \sum\limits_{r = 1}^{s} {u_{r} \cdot y_{r,o} } = 1 \hfill \\ u_{r} \ge \varepsilon \hfill \\ v_{i} \ge \varepsilon \hfill \\ w_{k} \ge \varepsilon \hfill \\ \end{gathered}$$
(2)

The aim of the model is to minimize the weighted inputs, which are of two types: those connected with the costs and fees of each mutual fund under investigation and those associated with risk (e.g., annualized standard deviation).

The dual envelopment model of (2) is presented below (model 3). The inputs are both the ones associated with the cost/fees of the mutual fund as well as the ones associated with risks and the outputs are the ones associated with return. Model (3) as presented is CRS and with the addition of constraint \(\left[ {\sum\nolimits_{j = 1}^{n} {\lambda_{j} = 1} } \right]\) the technology is VRS.

$$\begin{gathered} \max \, \varphi \hfill \\ s.t. \hfill \\ \sum\limits_{j = 1}^{n} {\lambda_{j} \cdot } x_{i,j} \le x_{i,o} , \, i = 1,..,m \hfill \\ \sum\limits_{j = 1}^{n} {\lambda_{j} \cdot } y_{r,j} \ge \varphi \cdot y_{r,o} , \, r = 1,..,s \hfill \\ \left[ {\sum\limits_{j = 1}^{n} {\lambda_{j} = 1} } \right] \hfill \\ \lambda_{j} \ge 0, \, j = 1,..,n \hfill \\ \end{gathered}$$
(3)
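The difference between the CRS and VRS versions of model (3) can be seen in a small sketch restricted to one input (for example, a risk measure) and one output (for example, return). In that special case no LP solver is needed: under CRS the score is the best output-to-input ratio divided by the unit's own ratio, and under VRS an optimal solution combines at most two peers, so single units and pairs can be enumerated. The function names and data are hypothetical, and strictly positive inputs and outputs are assumed; the general multi-input, multi-output case requires solving the LP.

```python
from typing import Sequence


def phi_crs(x: Sequence[float], y: Sequence[float], o: int) -> float:
    """Output-oriented score under CRS: best y/x in the sample over unit o's y/x."""
    best_ratio = max(yj / xj for xj, yj in zip(x, y))
    return best_ratio * x[o] / y[o]


def phi_vrs(x: Sequence[float], y: Sequence[float], o: int) -> float:
    """Output-oriented score under VRS (the lambdas sum to one).

    With one input constraint plus the convexity constraint, an optimal
    solution has at most two positive lambdas, so enumeration is exact.
    """
    n = len(x)
    # Single peers that use no more input than unit o (unit o always qualifies).
    best = max(y[j] for j in range(n) if x[j] <= x[o])
    # Convex combinations of two peers whose combined input equals x[o].
    for j in range(n):
        for k in range(n):
            if x[j] < x[o] < x[k]:
                lam = (x[k] - x[o]) / (x[k] - x[j])  # weight on peer j
                best = max(best, lam * y[j] + (1 - lam) * y[k])
    return best / y[o]


# Hypothetical data: four funds, input = risk, output = return.
risk = [2.0, 4.0, 6.0, 4.0]
ret = [1.0, 4.0, 5.0, 2.0]
crs = [phi_crs(risk, ret, o) for o in range(4)]
vrs = [phi_vrs(risk, ret, o) for o in range(4)]
```

A value of \(\varphi = 1\) places a fund on the frontier, and larger values indicate the proportional output expansion that would be needed to reach it. In the toy data the first fund lies on the VRS frontier but not on the CRS frontier, which is the kind of discrepancy that the choice of technology has to account for.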

3.3 Hypothesis testing

Once the efficiency scores are obtained with the use of DEA model (3), the question which arises is whether the model constructed is the optimal one in terms of the technology assumed (CRS, VRS) or whether any other combination of inputs or outputs could lead to better performance. Furthermore, in many cases, efficiency scores are analyzed against several control variables, which are not used in the calculation for efficiency, but nevertheless affect the efficiency results in an indirect way. In this section two types of hypothesis testing will be presented.

3.4 Hypothesis testing for model construction and underlying technology

Let us assume the case where DEA Model (3) is constructed in two ways; with VRS or CRS technology. If \(T_{CRS}\) and \(T_{VRS}\) are CRS and VRS technologies respectively, while \(TE_{CRS}\) and \(TE_{VRS}\) are the efficiency scores under the corresponding technologies, then the hypotheses formulated are shown below:

$$\begin{gathered} H_{0} : TE_{CRS} = TE_{VRS} \hfill \\ H_{1} : TE_{CRS} \ne TE_{VRS} \hfill \\ \end{gathered}$$

The calculation formulas for the test statistic are shown in Table 2.

Table 2 Test statistic calculation formulas based on the various distributions

3.5 Hypothesis testing for control variables

In the international literature, mutual fund performance is analyzed against control variables (as a means of second-stage analysis). To this end, the performance as calculated with the DPEI model as well as the bootstrap method is examined in terms of the levels of the control variables. A series of research questions arise from the literature and are examined in this paper.

3.5.1 H1a and H1b hypothesis formulation

According to the literature (Belghitar et al. 2017; Henke 2016; Nofsinger and Varma 2014; Soler-Domínguez et al. 2021), funds with higher ESG scores outperform those with lower ESG scores due to ESG risk mitigation.

In this case, the hypothesis which is assumed and should be confirmed by statistical analysis is the following:

$$\begin{gathered} H_{0} : \overline{\varphi }_{ESG = Q1} = \overline{\varphi }_{ESG = Q4} \hfill \\ H_{1} : \overline{\varphi }_{ESG = Q1} < \overline{\varphi }_{ESG = Q4} \hfill \\ \end{gathered}$$

In this case, the hypotheses test whether the efficiency scores of mutual funds belonging to higher quartiles of the ESG controversies score are greater than those of mutual funds belonging to lower quartiles.

3.5.2 H2 hypothesis formulation

Even if geographical location and mutual fund performance may seem unrelated at first glance, the relevant literature has shown that mutual funds based in Europe are associated with low performance, mainly because of their low Sharpe ratio values (Abate et al. 2021).

3.6 DEA model construction

Once the model to be used has been selected and established, the inputs and outputs of the study are analyzed. The aim of the proposed model is to evaluate the performance of mutual funds under the prism of ESG controversies scores in the COVID-19 era. Therefore, the inputs and outputs selected approximate the behavior of mutual funds and will be analyzed against the control variables.

On the input side, the variables selected are the standard deviation of the return of each mutual fund \(j\), which is calculated by \({\sigma }_{j}= \sqrt{Var({R}_{j})}\), and the total expenses ratio (hereafter Expenses). These measures approximate the risk and cost side of mutual funds.

On the output side, the variables selected are the Treynor index of each mutual fund \(j\) which is calculated as \({Treynor}_{j}= ({R}_{j}- {R}_{f})/{\beta }_{j}\) and Information ratio of each mutual fund \(j\) which is calculated as \({Information}_{j}=\frac{{R}_{j}- {R}^{b}}{Tracking\;Error}\). These measures approximate the return side of the mutual funds.
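The input and output measures defined above can be computed from return series as in the following minimal sketch. The function name `fund_measures` and the sample returns are hypothetical. Estimating \(\beta_j\) as the covariance with the benchmark divided by the benchmark variance, and the tracking error as the standard deviation of the active return, are common conventions assumed here rather than details stated in the text.

```python
import numpy as np


def fund_measures(r, r_bench, r_free):
    """Compute sigma, Treynor index and Information ratio for one fund."""
    r = np.asarray(r, dtype=float)
    r_bench = np.asarray(r_bench, dtype=float)
    # Risk input: standard deviation of the fund's returns.
    sigma = r.std(ddof=1)
    # Assumed beta estimate: cov(fund, benchmark) / var(benchmark).
    beta = np.cov(r, r_bench, ddof=1)[0, 1] / r_bench.var(ddof=1)
    # Treynor index: excess return over the risk-free rate per unit of beta.
    treynor = (r.mean() - r_free) / beta
    # Assumed tracking error: standard deviation of the active return.
    active = r - r_bench
    info_ratio = active.mean() / active.std(ddof=1)
    return {"sigma": sigma, "beta": beta,
            "treynor": treynor, "info_ratio": info_ratio}


# Hypothetical periodic returns for one fund and its benchmark.
fund = [0.02, 0.04, 0.00, 0.06]
bench = [0.01, 0.02, 0.00, 0.03]
measures = fund_measures(fund, bench, r_free=0.01)
```

Because the hypothetical fund returns are exactly twice the benchmark returns, the estimated beta is 2, which makes the example easy to verify by hand.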

The model, denoted as model (4), is constructed to address the objective of maximizing efficiency, represented by variable \(\varphi\). This model is designed to be solved for each DMU, with the specific DMU under investigation indicated by the index o.

In the model, variable \(\lambda_{j}\) represents the proximity or similarity of the DMU under investigation to other DMUs in the reference set. The model aims to find the optimal solution for φ, which maximizes the efficiency of the DMU.

To measure the deviation between the left-hand side and right-hand side of the constraints, slack variables (\(s_{Expenses}^{ - } ,s_{\sigma }^{ - } ,s_{Sharpe}^{ + } ,s_{InfRatio}^{ + }\)) are introduced. These slack variables quantify any remaining input excess or output shortfall of the DMU relative to its reference point on the frontier.

The Technical Efficiency of each DMU is expressed as \(1/\varphi^{*}\), where φ* represents the optimal value of the variable φ obtained by solving the model. A DMU is considered fully efficient if φ* = 1 and all slacks are equal to zero, indicating that the DMU is operating at the highest level of efficiency with no excess inputs or output shortfalls.

$$\begin{gathered} \max \, \varphi + \varepsilon \cdot \left( {s_{Expenses}^{ - } + s_{\sigma }^{ - } + s_{Sharpe}^{ + } + s_{InfRatio}^{ + } } \right) \hfill \\ \, s.t. \hfill \\ \, \sum\limits_{j = 1}^{n} {\lambda_{j} \cdot } Expenses_{j} + s_{Expenses}^{ - } = Expenses_{o} \hfill \\ \, \sum\limits_{j = 1}^{n} {\lambda_{j} } \cdot \sigma_{j} + s_{\sigma }^{ - } = \sigma_{o} \hfill \\ \, \sum\limits_{j = 1}^{n} {\lambda_{j} } \cdot Sharpe_{j} - s_{Sharpe}^{ + } = \varphi \cdot Sharpe_{o} \hfill \\ \, \sum\limits_{j = 1}^{n} {\lambda_{j} } \cdot InfRatio_{j} - s_{InfRatio}^{ + } = \varphi \cdot InfRatio_{o} \hfill \\ \, \sum\limits_{j = 1}^{n} {\lambda_{j} = 1} \hfill \\ \, \lambda_{j} \ge 0, \, j = 1,..,n \hfill \\ \, s_{Expenses}^{ - } ,s_{\sigma }^{ - } \ge 0 \hfill \\ \, s_{Sharpe}^{ + } ,s_{InfRatio}^{ + } \ge 0 \hfill \\ \end{gathered}$$
(4)

In the objective function of model (4), \(\varepsilon\) is a very small positive number.

In model (4), the output \(InfRatio\) is a ratio; therefore, the convexity assumption of the Production Possibility Set (PPS) is not met. Relevant research has highlighted this important issue for the PPS of DEA when the inputs or outputs are ratios (Emrouznejad and Amin 2009).

One possible solution to this problem is to reformulate the output-oriented model by decomposing the ratio into its numerator and denominator as extra constraints. Therefore, if the numerator of the information ratio is defined as \(InfRatio_{j} \_num = R_{j} - R^{b}\) and the denominator of the information ratio is defined as \(InfRatio_{j} \_den = Tracking \, error_{j}\), then model (4) is rewritten as follows.

$$\begin{gathered} \max \, \varphi \hfill \\ \, s.t. \hfill \\ \, \sum\limits_{j = 1}^{n} {\lambda_{j} \cdot } Expenses_{j} \le Expenses_{o} \hfill \\ \, \sum\limits_{j = 1}^{n} {\lambda_{j} } \cdot \sigma_{j} \le \sigma_{o} \hfill \\ \, \sum\limits_{j = 1}^{n} {\lambda_{j} } \cdot Sharpe_{j} \ge \varphi \cdot Sharpe_{o} \hfill \\ \, \sum\limits_{j = 1}^{n} {\lambda_{j} } \cdot InfRatio_{j} \_num - InfRatio_{o} \cdot \varphi \cdot \sum\limits_{j = 1}^{n} {\lambda_{j} } \cdot InfRatio_{j} \_den \ge 0 \hfill \\ \, \sum\limits_{j = 1}^{n} {\lambda_{j} = 1} \hfill \\ \, \lambda_{j} \ge 0, \, j = 1,..,n \hfill \\ \end{gathered}$$
(5)

However, Hanafizadeh et al. (2014) proposed a potential solution to address the convexity issue in DEA models by suggesting that, if the DMUs are approximately of the same size, the problem of violating the convexity axiom can be eliminated. In this paper, the DMUs under investigation are approximately of the same size (Fig. 1).

Fig. 1 Graphical demonstration of the proposed model

The proposed model is solved, and efficiency scores are evaluated for the following instances:

  1. For all of the DMUs (mutual funds) (hereafter set \(J\))

  2. For the mutual funds belonging to ESG quartile Q1 (hereafter set \(Q1\subset J\))

  3. For the mutual funds belonging to ESG quartile Q4 (hereafter set \(Q4\subset J\))

The results derived for each of the instances presented above and the data will be analyzed with respect to the control variables of the second stage analysis. The results of the analysis will be presented in the following sections, where we will also draw valuable insights linking our findings with the relevant literature.
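The three instances listed above can be formed from a fund-level table as in the following sketch. The column names, fund labels, and scores are hypothetical, and `pandas.qcut` is used here as one possible way of assigning quartiles; the text does not state how the quartiles were computed.

```python
import pandas as pd

# Hypothetical fund-level data with an ESG controversies score per fund.
funds = pd.DataFrame({
    "fund": list("ABCDEFGH"),
    "esg_controversies": [10, 20, 35, 50, 60, 75, 90, 100],
})

# Assign each fund to an ESG quartile (Q1 = lowest scores, Q4 = highest).
funds["esg_quartile"] = pd.qcut(
    funds["esg_controversies"], q=4, labels=["Q1", "Q2", "Q3", "Q4"]
)

# The three instances for which the model is solved.
J = funds                                   # all DMUs
Q1 = funds[funds["esg_quartile"] == "Q1"]   # lowest ESG quartile
Q4 = funds[funds["esg_quartile"] == "Q4"]   # highest ESG quartile
```

With real data, many funds may share the same score, in which case the quartile edges can coincide and `qcut` raises an error unless duplicate edges are handled explicitly.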

4 Results

4.1 Descriptive statistics

The descriptive statistics of the examined dataset are presented first. Table 3 provides an overview of the descriptive statistics for the inputs and outputs of the study. We used the Refinitiv database to identify mutual funds with available ESG scores. The final sample consists of 17,961 global mutual funds (DMUs) that were active during the later phase of the COVID-19 pandemic, and we retrieved static data for a one-year period as of July 2022. To evaluate the effect of ESG performance on mutual fund performance, we used the ESG controversies score, which takes into account the firms’ exposure to ESG-related corporate scandals. The ESG controversies score is estimated using 23 ESG controversy topics. Higher values of the ESG controversies score imply a lower level of controversies; an ESG controversies score of 100 thus indicates no controversies for the selected company and therefore higher overall combined ESG performance. Thus, higher ESG controversies scores indicate that the firms raise fewer concerns related to corporate sustainability (Park 2023).

Table 3 Descriptive statistics and central tendency measures of the inputs and outputs

Crucial information when dealing with DEA models is provided by the correlation within the inputs and outputs, and between inputs and outputs of the study. As seen in Fig. 2, the correlation between inputs and outputs, and within the inputs and outputs remains at low levels indicating no or low correlation.
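A correlation check of this kind can be sketched as follows. The data values are hypothetical, and the Pearson coefficient is assumed, since the text does not specify which correlation measure underlies the correlogram.

```python
import numpy as np

# Hypothetical values for six funds.
# Columns: sigma, expenses, treynor, info_ratio.
data = np.array([
    [0.18, 1.2, 0.05, 0.30],
    [0.22, 0.9, 0.02, -0.10],
    [0.15, 1.5, 0.07, 0.45],
    [0.25, 1.1, 0.01, 0.05],
    [0.20, 0.7, 0.04, 0.20],
    [0.17, 1.3, 0.03, -0.25],
])
labels = ["sigma", "expenses", "treynor", "info_ratio"]

# Pairwise Pearson correlations between the columns (variables).
corr = np.corrcoef(data, rowvar=False)

# Largest absolute off-diagonal correlation, as a quick screening value.
off_diag = corr[~np.eye(len(labels), dtype=bool)]
max_abs_corr = np.abs(off_diag).max()
```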

Fig. 2 Correlogram of inputs and outputs of the DEA model

In Fig. 3, boxplots of the inputs and outputs are shown for each ESG quartile.

Fig. 3 Boxplot for Inputs and Outputs for each ESG Quartile; A σ, B Expenses, C Treynor Ratio, D Information Ratio

Figure 3 presents the analysis of input and output variables for each ESG quartile. From Fig. 3A it can be observed that the higher the ESG quartile of a mutual fund, the higher the average annualized standard deviation. Regarding the expenses ratio, as seen in Fig. 3B, the average expenses ratio is higher for higher ESG quartiles, with less dispersion than in lower ESG quartiles. On the output side, in Fig. 3C, D, the averages of the Treynor and Information ratios appear to be very close across ESG quartiles.

Another point of interest which is taken into consideration in this analysis is the mutual funds’ geographical investment area. Four geographical location groupings were considered, namely Europe, Global, United States and Rest of the world.

In Fig. 4, each input and output is presented against the geographical area of investment. Regarding risk, approximated by the annualized standard deviation, the highest mean value of σ is observed for mutual funds in Europe. This finding can be attributed to the heavy impact of COVID-19 on the economies of European Union countries (Fig. 4A). Regarding the total expenses ratio (Fig. 4B), the lowest average is observed for mutual funds whose geographical investment focus is the U.S. Regarding the outputs of the model (Treynor and Information ratio), the average values for the mutual funds examined seem to be quite close to each other, albeit with large outliers in almost all geographical location levels.

Fig. 4 Boxplot for Inputs and Outputs for the Geographical location of Mutual Funds examined; A σ, B Expenses, C Treynor ratio, D Information ratio

Having analyzed all the control variables of the study, in the next subsections the results of the DPEI model will be analyzed along with the results from the hypothesis testing as formulated in the methodology section.

4.2 Efficiency scores for mutual funds

In this section, the efficiency scores for the DPEI model are presented. Since the main aim of the paper is to explore the behavior of mutual funds under external (control) variables, chief among which is the ESG controversies score quartile, the efficiency scores are analyzed for the whole dataset of 17,961 mutual funds, for the mutual funds which belong to ESG quartile Q4, and for the mutual funds which belong to Q1. This segregation of mutual funds is performed mainly to analyze in depth the connection between efficiency scores and ESG controversies scores.

The way that the data is obtained and analyzed is graphically presented in Fig. 5.

Fig. 5 Graphical illustration of the DPEI model solved for each instance indicating the obtained efficiency φ and the corresponding Technical Efficiency (1/φ)

To explore the relationship between ESG scores and technical efficiency, we divide the mutual funds into different strata based on their ESG scores. Specifically, we calculate the technical efficiency of mutual funds for all DMUs in our sample. We then separate the DMUs into two distinct groups: those whose ESG scores fall within the low quartile (Q1) and those whose ESG scores fall within the high quartile (Q4).

This approach allows us to investigate whether mutual funds in the higher quartile of ESG scores exhibit superior performance compared to those in the lower quartile. By focusing on these specific strata of observations, we aim to capture any potential differences in technical efficiency between the two groups.

The rationale behind this stratification is to shed light on the impact of ESG scores on mutual fund performance. We hypothesize that mutual funds with higher ESG scores, represented by those in the Q4 quartile, may demonstrate superior technical efficiency compared to funds with lower ESG scores, represented by those in the Q1 quartile. This analysis will help us assess whether there is a correlation between higher ESG scores and improved technical efficiency in the context of mutual funds.

By conducting this stratified analysis, we aim to provide valuable insights into the relationship between ESG scores and technical efficiency within the mutual fund industry. The findings from this investigation will contribute to the existing literature by offering a deeper understanding of the potential performance advantages associated with higher ESG scores.

4.3 Efficiency scores analysis: all mutual funds

Firstly, the analysis starts with every DMU (mutual fund) of the dataset. The results of the efficiency scores are presented in Figs. 6, 7, 8 and 9.

Fig. 6 Density plot and Boxplot of Technical Efficiency (1/φ) scores of all mutual funds \(\left( {j \in J} \right)\)

Fig. 7 Density plot of Technical Efficiency (1/φ) scores of all mutual funds \(\left( {j \in J} \right)\), for scores of mutual funds of ESG quartile Q1 \(\left( {Q_{1} \subset J} \right)\) and for scores of mutual funds of ESG quartile Q4 \(\left( {Q_{4} \subset J} \right)\)

Fig. 8 Cumulative Density Function Technical Efficiency (1/φ) for scores of mutual funds of ESG quartile Q1 \(\left( {Q_{1} \subset J} \right)\) and for scores of mutual funds of ESG quartile Q4 \(\left( {Q_{4} \subset J} \right)\)

Fig. 9 Interaction plot of the average Technical Efficiency for each level of geographical location of each mutual fund for Q1 and Q4 ESG quartiles

In Fig. 6, the results are presented for Technical Efficiency scores, defined as the inverse of the optimal value \(\varphi_{j}^{*}\) (\(TE_{j} = 1/\varphi_{j}^{*}\)), obtained by solving model (4) for each DMU. More specifically, the density plot (Fig. 6A) indicates that the model has high discrimination power, since the majority of DMUs have technical efficiency less than 1 and the efficiency scores are dispersed across the range from 0 to 1. This finding is supported by the boxplot and jitter plot presented in Fig. 6B; the average technical efficiency is 9.2%.

4.4 Efficiency score analysis: linking the results

The results of the DPEI model are shown for all DMUs (all mutual funds in our case), for those mutual funds whose ESG controversies score falls within the 1st quartile (Q1), and for those mutual funds whose ESG controversies score falls within the 4th quartile (Q4). The aim is to derive insights from the model and link these results with the background theory and practice. Figure 7 presents two types of information: the density plot of efficiency scores as derived for each run (all DMUs or \(j \in J\), \(Q_{1} \subset J\), \(Q_{4} \subset J\)) and a boxplot for each combination examined. In the density plot, the density of all DMUs is the same as in Fig. 6; however, it has been plotted together with the density plots of the efficiency scores of ESG quartiles Q1 and Q4 to analyze the behavior in each of the instances.

The efficiency scores of mutual funds belonging to ESG quartile Q4 (green line) are concentrated at higher efficiency values. In contrast, the efficiency scores of mutual funds belonging to ESG quartile Q1 (red line) are concentrated at lower efficiency values. On an aggregate level, the boxplot of Fig. 7 indicates that the average efficiency obtained when the DPEI model is solved for all DMUs is lower than the average efficiency obtained when it is solved for DMUs belonging to ESG quartile Q1. The highest average efficiency scores are those of funds belonging to ESG quartile Q4. This finding is in alignment with the relevant literature, which indicates that mutual funds with high ESG controversies scores perform better than those with lower ESG controversies scores (in our case proxied by ESG quartiles) due to the dispersion of risk.

The efficiency scores as calculated for mutual funds belonging to Q1 and Q4 ESG quartiles are also shown in the Cumulative Density Function (CDF) analyzing their Technical Efficiencies.

In Fig. 8 the CDF of Technical Efficiencies for each of the two instances (Q1, Q4) is shown. As can be seen, mutual funds belonging to ESG Quartile Q4 exhibit higher overall Technical Efficiency (one by one comparison) with respect to the mutual funds belonging to ESG Quartile Q1. This indicates that mutual funds with higher ESG scores perform better in comparison to the ones with lower ESG scores, which means that the overall risk and expenses ratio are lower, while the outputs (Sharpe and Information ratio) are higher.

4.5 Hypothesis testing

4.5.1 Results for H1a & H1b

Efficiency scores for each instance of ESG quartiles have been calculated and, in this section, a hypothesis test is performed to examine the relationship between the average values of the two sets of efficiency scores.

To do so, the Kolmogorov–Smirnov test is performed assuming no prior distribution for \(t\left( \cdot \right)\) (Table 2).

The hypothesis which is formulated does not concern two different types of underlying production technologies (VRS and CRS), but two levels of inputs and outputs as per an external control variable. It is the following:

$$\begin{gathered} H_{0} : \overline{TE}_{ESG = Q1} = \overline{TE}_{ESG = Q4} \hfill \\ H_{1} : \overline{TE}_{ESG = Q1} < \overline{TE}_{ESG = Q4} \hfill \\ \end{gathered}$$

The results from the non-parametric test (Kolmogorov–Smirnov) indicate that \(D_{n} = 0.26\) (\(D_{n}\) is defined as the supremum of \(\left| {F_{n} \left( X \right) - F\left( X \right)} \right|\), where \(F_{n} \left( X \right)\) is the empirical distribution function for n observations \(X\)) with \(p = 0.00 < 0.01\); thus, the results are statistically significant at the 1% level. This indicates that the null hypothesis is rejected in favor of the alternative, implying that the Technical Efficiency of mutual funds with a higher ESG controversies score (approximated by the ESG quartile) is greater than that of mutual funds with a lower ESG controversies score.
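A test of this kind can be sketched with a two-sample Kolmogorov–Smirnov test on the Technical Efficiency scores of the two quartiles. The score vectors below are synthetic and purely illustrative, and the use of SciPy's `ks_2samp` with a one-sided alternative is an assumption of this sketch, not a description of the software used in the study.

```python
import numpy as np
from scipy.stats import ks_2samp

# Synthetic Technical Efficiency scores for the two ESG quartiles.
te_q1 = np.linspace(0.01, 0.30, 50)  # lower ESG quartile
te_q4 = np.linspace(0.10, 0.60, 50)  # higher ESG quartile

# alternative="greater" tests whether the empirical CDF of the first
# sample lies above that of the second somewhere, i.e. whether Q1
# scores tend to be smaller than Q4 scores.
result = ks_2samp(te_q1, te_q4, alternative="greater")
d_stat, p_value = result.statistic, result.pvalue

# Reject H0 at the 1% level when the p-value is below 0.01.
reject_h0 = p_value < 0.01
```

The one-sided alternative mirrors the direction of \(H_{1}\); a two-sided test would instead compare the largest absolute gap between the two empirical distribution functions.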

4.5.2 Results for H2

The mutual funds’ geographical investment area plays a significant role in mutual fund performance, mainly because of the factors that have an impact on the local economy. Therefore, the efficiency scores as derived from the DPEI model are analyzed against the categories of the geographical investment area of the mutual fund. To test the hypothesis and answer this research question, analysis of variance (ANOVA) is applied, since the geographical location variable contains more than two levels.
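As a simplified illustration, a one-way ANOVA of Technical Efficiency across the four geographical groupings can be sketched as follows. The scores are invented for the example, and the sketch covers only the geographical factor; a model that also includes the ESG quartile and an interaction term requires a two-way ANOVA, which this sketch does not implement.

```python
from scipy.stats import f_oneway

# Invented Technical Efficiency scores per geographical investment area.
te_by_area = {
    "Europe": [0.08, 0.09, 0.10, 0.09],
    "United States": [0.14, 0.15, 0.16, 0.15],
    "Global": [0.10, 0.11, 0.12, 0.11],
    "Rest of the world": [0.09, 0.10, 0.11, 0.10],
}

# One-way ANOVA: H0 states that all group means are equal.
anova = f_oneway(*te_by_area.values())
f_stat, p_value_anova = anova.statistic, anova.pvalue

# A small p-value suggests that at least one group mean differs.
means_differ = p_value_anova < 0.01
```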

This research question is also examined in conjunction with the mutual funds’ ESG quartile (Q1 or Q4). This type of analysis breaks down complex combinations into usable results that can inform investment decisions. Results are shown in Table 4. They demonstrate that efficiency scores differ in a statistically significant way across the levels of the geographical location variable, the ESG quartile, and their interaction (geographical location and ESG quartile). Results show (Fig. 9) that the average efficiency of mutual funds of a higher ESG quartile (Q4) is significantly higher than that of funds of a lower ESG quartile (Q1). Also, taking into consideration the mutual funds’ area of investing focus, the results for mutual funds with low values of ESG score (Q1) show that almost all funds seem to perform the same, with funds investing in Europe performing better than the rest. The results for mutual funds with high values of ESG score (Q4) demonstrate that the average Technical Efficiency is higher for mutual funds investing in the U.S. than for funds investing in Europe. This is associated with the low performance of European mutual funds in the Sharpe Ratio. The finding provides an answer to the research question and is consistent with the relevant literature.

Table 4 ANOVA results for research question 2

5 Conclusions

The COVID-19 pandemic has significantly disrupted economies worldwide, impacting investment in mutual funds. The unprecedented nature of the pandemic has compelled investors to assess financial products not only based on traditional financial metrics but also on non-financial indicators, such as ESG scores. These scores evaluate a company's or a financial institution's performance in terms of environmental, social, and governance factors. This shift towards ESG-focused investing has gained significant attention in recent years due to increased awareness of environmental and social issues in the business world. However, it is important to recognize that ESG scores and the financial performance of mutual funds are not completely unrelated.

This paper examined the impact of ESG controversies scores, which serve as a measure of funds' ESG performance, on the performance of global mutual funds during the later phase of the COVID-19 pandemic. Data Envelopment Analysis (DEA) models were utilized to evaluate mutual fund performance, and various statistical tests were employed to compare the performance between mutual funds with high ESG performance and those with low ESG performance.

The financial performance of mutual funds was assessed using the DEA method, examining inputs related to expenses and risk, and outputs related to returns for mutual funds worldwide. In the analysis, efficiency results derived from DEA were examined in relation to the characteristics of mutual funds, with a particular focus on their ESG scores. ESG scores were proxied by quartiles to facilitate result interpretation.

Statistical methods were applied to evaluate the validity of hypotheses formulated in the relevant literature. This secondary analysis, using DEA results, strengthens previously published findings, indicating that mutual funds with higher ESG scores (belonging to quartile Q4) outperform those with lower ESG scores (belonging to quartile Q1). This finding holds true across different countries/geographical locations, leading to a definitive overall conclusion. Therefore, stakeholders and investors can utilize the findings of this study as follows. The study highlights the importance of considering ESG factors in investment decision-making, particularly during environmental crises. Investing in mutual funds with higher ESG controversies scores may enhance financial performance. Furthermore, mutual fund managers can incorporate ESG criteria into their investment selection strategy without sacrificing financial performance. The results affirm that ESG controversies scores, which capture a fund's ESG performance, impact its financial efficiency, with funds exhibiting higher ESG controversies scores performing better than those with lower scores. As a result, mutual fund managers may incorporate securities with high ESG performance into their portfolios to achieve improved financial performance.

This study provides robust evidence that the ESG performance of mutual funds, specifically with respect to ESG controversies, is a crucial factor that influences their financial efficiency during the COVID-19 pandemic. Therefore, during periods of market crises associated with heightened public environmental awareness, such as the COVID-19 pandemic, analyzing criteria related to ESG controversies and investing in funds with higher ESG controversies scores can be an efficient strategy for maximizing financial performance.