1 Introduction

The Paris Agreement and the 2030 Agenda for Sustainable Development show how complex the environmental, social and governance (ESG) challenges are (Friede 2019).

Over the past decades, investors' attention to ESG factors has grown, and ESG investing has become a recurring research theme across disciplines (Abate et al. 2021). While several scholars claim that ESG investing means giving up part of the profits for ethical reasons (Bauer et al. 2007; Bauer and Smeets 2015; Fich et al. 2015; Arouri and Pijourlet 2017), others argue that much more lies behind such choices (Nilsson 2009; Krosinsky and Robins 2012; Hemerijck 2018; Kaufer and Steponaitis 2019).

Traditional investors choose their investments by adopting economic–financial criteria, such as returns, investment duration, risk aversion, risk premium, liquidity and so on. By contrast, ESG investors combine the economic performance with measurable environmental and social factors (Revelli 2017; Benlemlih and Bitar 2018; Lapanan 2018; Oikonomou et al. 2018; Chatzitheodorou et al. 2019; Rossi et al. 2019; Gomes 2020). The latter “recognize that the generation of long-term sustainable returns is dependent on stable, well-functioning, and well-governed social, environmental, and economic systems” (University of Cambridge 2022).

Despite these differences, both traditional and ESG investors assess market risk by considering a mix of economic and socio-political factors, or aggregate events, that may cause periods of financial turmoil and deep shocks to stock markets (Zigrand 2014; Berk and DeMarzo 2018; Szczygielski et al. 2021; Duttilo et al. 2021). The dot-com bubble burst (2000–2002), the global financial crisis (2007–2008), the European sovereign debt crisis (2010–2012) and the COVID-19 pandemic are examples of turmoil periods.

This work studies the effect of turmoil periods on the performance and volatility of several Dow Jones Sustainability Indices (DJSIs) and compares them with their respective market benchmarks (traditional indices).

For this purpose, two different tools have been employed: a finite mixture of two generalized normal distributions (MGND) and the exponential GARCH-in-mean (EGARCH-M) model with exogenous dummy variables.

The MGND model is a flexible tool able to capture important stylized facts of financial returns, e.g. excess kurtosis and skewness (Wen et al. 2020). It was applied to fit the return distribution of the MSCI All-Country World Equity Index (MSCIW). This index represents the performance of large- and mid-cap stocks of 23 developed and 24 emerging markets. Financial market turmoil periods were objectively detected by applying the Naïve Bayes’ classifier to the mixture model results.

The EGARCH-M model is commonly used to model and forecast financial returns and their volatility (Hoti et al. 2007). It was applied to several ESG and non-ESG indices, with an exogenous dummy variable denoting the turmoil periods detected by the MGND model.

The analysis provides insights into potential differences between ESG indices and their traditional market benchmarks by investigating the impact of stable and turmoil periods on the conditional mean and volatility, as well as other aspects of financial returns such as the risk premium, the leverage effect and volatility persistence.

The indices cover four markets: Global, the US, Europe (EU) and emerging markets (EM). Accordingly, the following Global and regional DJSIs are selected: the Dow Jones Sustainability World Index, the Dow Jones Sustainability US Composite Index, the Dow Jones Sustainability Europe Index and the Dow Jones Sustainability Emerging Markets Index. Likewise, their respective traditional market benchmarks are collected: the Dow Jones Global Index, the Dow Jones Industrial Average Index, the Dow Jones Europe Index and the Dow Jones Emerging Markets Index. The study period runs from 4 January 2016 to 30 September 2022.

The rest of the paper is organized as follows. Section 2 includes the literature review and the contribution of the present work. Section 3 describes the methodology and the data used. Section 4 illustrates the results of the analysis. Finally, Sect. 5 provides some conclusions.

2 Literature review

2.1 Previous research

A considerable body of research analyses ESG equity indices by comparing their financial performance with traditional equity benchmarks, using many different methods, techniques, periods and variables. However, there is no consensus on whether ESG portfolios are less volatile than market benchmark portfolios (Ouchen 2022). Cunha et al. (2020) distinguished three kinds of results on the financial performance of ESG equity indices: positive, neutral and mixed. The latter refers to studies that "found one or more positive, neutral or negative return-risk performance findings". To highlight the different methods and techniques used in this field, the existing literature is grouped here into three main methodological frameworks: portfolio performance measures, Markov-switching (MS) models and GARCH models.

The first methodological framework comprises works that applied portfolio performance measures such as the Sharpe ratio, the Treynor ratio and Jensen's alpha (Schröder 2007; Collison et al. 2008; Consolandi et al. 2009; Belghitar et al. 2014; Lean and Nguyen 2014; Cunha et al. 2020).

The second methodological framework stands out for the use of MS models to analyse the performance of ESG indices (Shunsuke et al. 2012; Ouchen 2022).

In the last methodological framework, GARCH models are employed to analyse the conditional mean and volatility of daily returns on ESG indices. Usually the conditional analysis is supported by the unconditional analysis which uses different methodologies such as the portfolio performance measures.

Employing GARCH models, Hoti et al. (2007) analysed the conditional volatility of several sustainability and ethical indices: Ethibel, ASPI Eurozone, Calvert Social Index, Ethical Index and FTSE4Good (Global, the USA, the UK and Europe). Their results showed differences in short- and long-run volatility persistence and in the leverage effect. Lean and Nguyen (2014) analysed the performance and volatility of DJSIs for the Global and three regional markets over the period 2004–2013. Their conditional analysis, performed via the EGARCH model, showed that the 2008 global financial crisis had a substantial impact on both the return and the volatility of sustainable investments. Similarly, Ang (2015) explored the behaviour of the DJSI Korea and found that both the return and the volatility of the Korean ESG portfolio were less affected by the 2008 crash. Using mean–variance testing and GARCH models, Sudha (2015) compared the performance and volatility of the S&P ESG India Index with two market benchmarks, the Nifty and the S&P CNX 500. Although volatility clustering characterized all three indices, the S&P ESG India Index was found to be less volatile than the Nifty. Jain et al. (2019) explored whether ESG investments offer better financial returns than market benchmarks in developed and emerging markets. Applying GARCH models, they concluded that the US large-cap ESG index (TRESGUS) provided the highest return at a suitable level of risk. Sabbaghi (2022) analysed the impact of good and bad news on the volatility of ESG firms using MSCI indices and the GARCH framework. The results showed a leverage effect for ESG firms, i.e. bad news increases volatility by a larger amount than good news.

2.2 Our contribution to existing literature

This work belongs to the third methodological framework. To the best of our knowledge, it is the first to employ mixtures of GND to objectively detect financial market turmoil periods and then use them as a dummy variable in the EGARCH(1,1)-M model. Common approaches to assess the impact of a given period on returns and volatility instead rely on subjectively identified dummy variables (Shehzad et al. 2020; Bora and Basistha 2021; Duttilo et al. 2021), sub-periods (Lean and Nguyen 2014; Ang 2015; Han et al. 2019) or Markov-switching models (Shunsuke et al. 2012; Ouchen 2022). In particular, this study addresses the following questions. Is each ESG index less or more volatile than its market benchmark, and do the results differ across markets? Do turmoil periods have a significant impact on ESG indices, and does this impact differ from that on traditional indices? Which market has been most affected by the turmoil periods? Do the ESG indices have a significant risk premium and leverage effect? Do ESG and traditional indices have the same volatility persistence?

3 Data and methodology

3.1 Data

The MSCIW index is selected to detect stable and turmoil periods. This global traditional index provides a good representation of the global market and regional ones by the performance of large- and mid-cap stocks of 23 developed and 24 emerging markets.

The DJSI Index Family tracks the stock performance of leading sustainability-driven companies that stand out in terms of economic, environmental and social criteria (Hoti et al. 2007; S&P Global 2022). Launched in 1999, it was the first global sustainability benchmark. The DJSI Index Family is a "best-in-class" benchmark because only the top-ranked companies in the S&P Global ESG Scores are selected for inclusion (S&P Global 2022). Its composition is reviewed annually and rebalanced quarterly.

In this study, daily closing prices of the traditional and ESG indices listed in Table 1 were collected for the period from 4 January 2016 to 30 September 2022. To focus on the COVID-19 pandemic, the period excludes the global financial crisis (2007–2009) and the sovereign debt crisis (2010–2011).

Table 1 Selected traditional and ESG indices
Table 2 Estimated basic statistics of traditional and ESG indices

Next, the daily returns of all equity indices under study were calculated with the natural log difference approach (Wen et al. 2020; Shehzad et al. 2020; Duttilo et al. 2021; Ouchen 2022)

$$\begin{aligned} r_{t}=\ln \left( \frac{P_{t}}{P_{t-1}}\right) \times 100, \end{aligned}$$
(1)

where

  • \(r_{t}\) is the daily percentage return on equity index at time t;

  • \(P_{t}\) is the daily closing price of equity index at time t;

  • \(P_{t-1}\) is the daily closing price of equity index at time \(t-1\).
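As a minimal sketch of Eq. (1), assuming plain Python with the standard library only, percentage log returns can be computed from a list of closing prices as follows. The helper name and the prices are hypothetical placeholders, not index data.

```python
import math


def log_returns_pct(prices):
    """Daily percentage log returns, r_t = ln(P_t / P_{t-1}) * 100, as in Eq. (1)."""
    if any(p <= 0 for p in prices):
        raise ValueError("prices must be strictly positive")
    # One return per consecutive pair of closing prices: n prices give n - 1 returns.
    return [math.log(curr / prev) * 100 for prev, curr in zip(prices, prices[1:])]


closes = [100.0, 101.0, 99.5, 99.5]  # hypothetical closing prices
print(log_returns_pct(closes))
```

Because log returns are additive, the returns over a window sum to 100 ln(P_end / P_start), which is a convenient check on the computation.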

3.2 Finite mixtures of GND

Mixtures of distributions are widely used to fit the empirical distribution of daily returns and to capture stylized facts such as excess kurtosis and skewness. Several works (Behr and Pötter 2009; Bellalah and Lavielle 2002; Kon 1984; Han et al. 2019) applied mixtures of two or three Gaussians to capture the excess kurtosis and the positive or negative skewness of the daily return distributions of common stocks and indices. However, mixtures of Gaussians impose a priori constraints on the form of the return distribution. Thanks to the flexibility provided by the additional shape parameter \(\nu\), the finite mixture of GND overcomes this issue. Wen et al. (2020) explored the univariate mixture of GND and proposed an expectation conditional maximization (ECM) algorithm for parameter estimation. They also estimated a two-component mixture of GND and a two-component mixture of Gaussians on the daily returns of the S&P 500 and the Shanghai Stock Exchange Composite Index, finding that the mixture of GND describes the excess kurtosis and skewness of daily returns better than the mixture of Gaussians.

A random variable R is said to have the generalized normal distribution with parameters \(\mu\) (location), \(\delta\) (scale) and \(\nu\) (shape) if its probability density function (p.d.f.) is given by

$$\begin{aligned} f(r_t|\mu ,\delta ,\nu )=\frac{\nu }{2\delta \Gamma (1/\nu )}\exp \left\{ -\left| \frac{r_t-\mu }{\delta }\right| ^\nu \right\} , \end{aligned}$$
(2)

with \(\Gamma (1/\nu )=\int _0^\infty t^{1/\nu -1}e^{-t}\,dt\), \(-\infty<r_t<\infty\), \(-\infty<\mu <\infty\), \(\delta >0\), \(\nu >0\).

Because the shape parameter \(\nu\) determines the tails of the distribution, the GND is a flexible tool that covers a large class of statistical distributions (Nadarajah 2005; Wen et al. 2020); for example, with \(\nu =1\) and \(\nu =2\), the GND becomes a Laplace and a normal distribution, respectively.
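The density of Eq. (2) and its two best-known special cases, \(\nu =1\) (Laplace) and \(\nu =2\) (normal), can be checked numerically. The sketch below is a hypothetical standard-library helper, not code from the paper; it confirms that \(\nu =2\) gives a normal density with standard deviation \(\delta /\sqrt{2}\) and \(\nu =1\) a Laplace density with scale \(\delta\).

```python
import math


def gnd_pdf(r, mu, delta, nu):
    """Generalized normal density of Eq. (2): location mu, scale delta > 0, shape nu > 0."""
    if delta <= 0 or nu <= 0:
        raise ValueError("delta and nu must be positive")
    norm_const = nu / (2.0 * delta * math.gamma(1.0 / nu))
    return norm_const * math.exp(-abs((r - mu) / delta) ** nu)


def normal_pdf(r, mu, sigma):
    return math.exp(-0.5 * ((r - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def laplace_pdf(r, mu, b):
    return math.exp(-abs(r - mu) / b) / (2.0 * b)


# nu = 2: normal with standard deviation delta / sqrt(2); nu = 1: Laplace with scale delta.
for x in (-2.0, 0.0, 0.7, 3.0):
    assert math.isclose(gnd_pdf(x, 0.0, 1.5, 2.0), normal_pdf(x, 0.0, 1.5 / math.sqrt(2.0)))
    assert math.isclose(gnd_pdf(x, 0.0, 1.5, 1.0), laplace_pdf(x, 0.0, 1.5))
```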

The p.d.f. of the univariate mixture of GND is given by:

$$\begin{aligned} \begin{aligned} f(r_t|\theta )&=\sum _{k=1}^K\pi _kp(r_t|\mu _k, \delta _k, \nu _k) \\&=\sum _{k=1}^K\frac{\pi _k\nu _k}{2\delta _k\Gamma (1/\nu _k)}\exp \left\{ -\left| \frac{r_t-\mu _k}{\delta _k}\right| ^{\nu _{k}}\right\} , \end{aligned} \end{aligned}$$
(3)

where \(\theta =(\pi _k, \mu _k, \delta _k, \nu _k), \mu _k \in \textrm{R}\), \(\delta _k>0\), \(\nu _k>0\), \(0<\pi _k<1\) and \(\sum _{k=1}^K\pi _k=1\). If the random variable R has the p.d.f. as in Eq. (3), then the variance of R is given by

$$\begin{aligned} \sigma ^2=\sum _{k=1}^K\pi _k[\mu ^2_k+\sigma ^2_k]-\bar{\mu }^2, \qquad \bar{\mu }=\sum _{k=1}^K\pi _k\mu _k, \end{aligned}$$
(4)

where

$$\begin{aligned} \sigma ^2_k=\delta _k^2\Gamma (3/\nu _k)/\Gamma (1/\nu _k) \end{aligned}$$
(5)

is the variance of the component k.
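Eqs. (4) and (5) translate directly into code. In the sketch below (hypothetical helper names, standard-library Python), the mixture variance is computed as the second raw moment \(\sum _k\pi _k(\mu _k^2+\sigma _k^2)\) minus the squared mixture mean \(\sum _k\pi _k\mu _k\); the inputs are the MSCIW estimates of \(\pi _k\), \(\mu _k\) and \(\sigma _k\) reported in Tables 5 and 6 (Sect. 4.2).

```python
import math


def gnd_variance(delta, nu):
    """Component variance, Eq. (5): delta^2 * Gamma(3/nu) / Gamma(1/nu)."""
    return delta ** 2 * math.gamma(3.0 / nu) / math.gamma(1.0 / nu)


def mixture_mean_variance(weights, mus, variances):
    """Mean and variance of a finite mixture from component weights, means and variances."""
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("mixture weights must sum to 1")
    mean = sum(w * m for w, m in zip(weights, mus))
    # Second raw moment: each component contributes its variance plus its squared mean.
    second_moment = sum(w * (m ** 2 + v) for w, m, v in zip(weights, mus, variances))
    return mean, second_moment - mean ** 2


# MSCIW estimates from Tables 5 and 6 (component standard deviations squared).
weights = (0.7280, 0.2720)
mus = (0.1244, -0.1197)
variances = (0.6978 ** 2, 1.4300 ** 2)
mean, var = mixture_mean_variance(weights, mus, variances)
print(round(mean, 4), round(var, 4), round(math.sqrt(var), 4))
```

Because the mixture mean here is small, subtracting its square changes the variance only slightly, but the correction matters whenever the component locations are far from zero.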

This model nests several distributions as sub-models (Wen et al. 2020), depending on the values of the shape parameters \(\nu _k\). For example, for \(K=2\), the univariate mixture of GND reduces to:

  • The mixture of Gaussians when \(\nu _1=\nu _2=2\);

  • The mixture of Laplace distributions when \(\nu _1=\nu _2=1\);

  • The mixture of Gaussian and Laplace distributions when \(\nu _1=2\) and \(\nu _2=1\);

  • The mixture of Gaussian and GND distributions when \(\nu _1=2\) and \(\nu _2>0\);

  • The mixture of Laplace and GND distributions when \(\nu _1=1\) and \(\nu _2>0\).

In brief, the mixture of GND does not impose a priori constraints on the shape of each mixture component.

In this work, a two-component mixture of GND (\(K=2\)) was estimated on the MSCIW index returns via the ECM algorithm (Wen et al. 2020) to objectively detect turmoil periods. Specifically, two distinct stock market states were assumed: stable and turbulent. The former is predominant and less volatile than the latter, which is characterized by extreme behaviour (Kim and White 2004). A daily return belongs to the stable period if it is assigned to the stable component, i.e. the mixture component with the highest shape parameter; similarly, it belongs to the turmoil period if it is assigned to the turmoil component, i.e. the mixture component with the lowest shape parameter. For a given scale \(\delta _k\), a smaller shape parameter means a thicker tail (higher excess kurtosis) and hence a higher standard deviation, while a higher shape parameter means a thinner tail (milder kurtosis) and hence a lower standard deviation (Eq. 5). Shifts from the stable to the turmoil component are assumed to be due to exogenous market events, i.e. "time-ordered shifts" (Kon 1984).

Kon (1984) assigned daily returns to a specific mixture component through the Naïve Bayes' classifier. This classification rule was originally proposed for mixtures of linear models by Kon and Lau (1979) and applied by Christie (1983) and Kon (1983). According to Kon (1984), the procedure "may be particularly useful for efficient markets tests when the estimated data partition can be associated with corresponding public announcements or information signals in accounting numbers released prior to the event". In other words, it can help detect important market events and test hypotheses.

The two-component mixture of GND was exploited to classify the daily returns based on the Naïve Bayes’ classification rule that assigns each return to the class with the highest posterior probability (Frühwirth-Schnatter 2006). The simple classification rule is defined as follows:

$$\begin{aligned} \hat{k}_t=\mathop {\arg \max }_{k} \, \pi _k p(r_t|\theta _k), \end{aligned}$$
(6)

where the selected mixture component \(k\) is the one with the largest posterior probability of having generated the return \(r_t\). Assigning each observation to a single class in this way is a hard assignment based on the posterior class probabilities, which themselves constitute a "soft assignment" (Bishop 2006). The resulting decoded variable is an indicator of market turmoil, so that financial market turmoil periods are objectively detected.
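A minimal sketch of the classification rule in Eq. (6) follows, assuming the component parameters are already estimated (the ECM estimation step is not shown). It uses the MSCIW estimates of \(\pi _k\), \(\mu _k\) and \(\nu _k\) reported in Table 5; since the scale estimates \(\delta _k\) are not reproduced in the text, they are backed out here from the component standard deviations in Table 6 by inverting Eq. (5). The helper names and the returns being classified are hypothetical.

```python
import math


def gnd_pdf(r, mu, delta, nu):
    return nu / (2.0 * delta * math.gamma(1.0 / nu)) * math.exp(-abs((r - mu) / delta) ** nu)


def scale_from_sd(sd, nu):
    """Invert Eq. (5): delta = sd * sqrt(Gamma(1/nu) / Gamma(3/nu))."""
    return sd * math.sqrt(math.gamma(1.0 / nu) / math.gamma(3.0 / nu))


def turmoil_dummy(returns, weights, mus, deltas, nus):
    """TURMOIL_t = 1 when the MAP component (Eq. 6) is the one with the lowest shape parameter."""
    turmoil_k = min(range(len(nus)), key=lambda k: nus[k])  # thickest-tailed component
    dummy = []
    for r in returns:
        # Unnormalized posteriors pi_k * p(r | theta_k); the common denominator cancels in argmax.
        scores = [w * gnd_pdf(r, m, d, v) for w, m, d, v in zip(weights, mus, deltas, nus)]
        k_hat = max(range(len(scores)), key=lambda k: scores[k])
        dummy.append(1 if k_hat == turmoil_k else 0)
    return dummy


weights = (0.7280, 0.2720)
mus = (0.1244, -0.1197)
nus = (1.1019, 0.6977)
sds = (0.6978, 1.4300)
deltas = tuple(scale_from_sd(s, v) for s, v in zip(sds, nus))
print(turmoil_dummy([0.1, -5.0, 3.0], weights, mus, deltas, nus))
```

Because the rule treats each day independently, it ignores the temporal dependence of returns, which is the limitation acknowledged in Sect. 5.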

3.3 Exponential GARCH-in-mean model with exogenous dummy variables

The autoregressive conditional heteroscedastic (ARCH) model introduced by Engle (1982) gave rise to a vast literature and a wide variety of models. To capture the risk premium, Engle et al. (1987) extended the ARCH model "to allow the conditional variance to be a determinant of the mean", obtaining the ARCH-in-mean (ARCH-M) model. Subsequently, Nelson (1991) introduced the exponential GARCH (EGARCH) model to overcome some limitations of ARCH-type models, and the EGARCH-M model extends it with an in-mean term. Both are asymmetric models able to capture the leverage effect, an important stylized fact of financial time series: negative returns increase volatility by a larger amount than positive returns of the same magnitude (Francq and Zakoian 2019).

In this study, the conditional mean and volatility equations have been modelled with the EGARCH-M model. In both equations, an exogenous dummy variable was included to take into account the state of the financial market at time t. Specifically, the dummy variable \({TURMOIL}_t\) assumes the value of 1 during turmoil periods; otherwise, it is equal to 0, i.e. during stability periods. In this way, it is possible to describe the impact of stability and turmoil periods on conditional mean and volatility of equity indices. The EGARCH(1,1)-in-mean model with exogenous dummy variables is specified as follows:

Conditional mean equation

$$\begin{aligned} r_t=\mu +m_1{TURMOIL}_t+\phi _1r_{t-1}+\lambda h_t+\epsilon _t, \end{aligned}$$
(7)

Conditional volatility equation

$$\begin{aligned} \begin{aligned}&\ln (h^2_{t})=\omega +v_1{TURMOIL}_t+\alpha _1 z_{t-1}+\gamma _1(|z_{t-1}|-E[ |z_{t-1}|])+\beta _1\ln (h^2_{t-1})\\&\text {where}\hspace{0.5cm} z_{t}=\frac{\epsilon _t}{\sqrt{h^2_t}} \sim \text {Skewed-GND}(0,1,\nu ,s). \end{aligned} \end{aligned}$$
(8)

In Eq. (7), \(r_t\) and \(\epsilon _t\) denote the return and the error term of the equity index at time t, respectively, and \(\mu\) is the constant term. The coefficient \(m_1\) measures the impact of turmoil periods on the conditional mean: if \(m_1\) is negative and statistically significant, turmoil periods reduce the conditional mean. To capture the autocorrelation of returns (i.e. the linear relationship between lagged values of the return series), the conditional mean equation includes a stationary first-order autoregressive term AR(1) (\(|\phi _1|<1\)), as in Hoti et al. (2007). The coefficient \(\phi _1\) measures the link between \(r_t\) and \(r_{t-1}\). Following Engle et al. (1987), the conditional mean equation also includes the risk premium coefficient \(\lambda\): if \(\lambda >0\) and statistically significant, returns are positively related to their conditional standard deviation \(h_t\).

In the conditional volatility equation (8), \(\ln (h^2_t)\) denotes the natural logarithm of the conditional variance and \(\omega\) is the constant term; \(\alpha _1\) captures the sign effect, \(\gamma _1\) the size effect and \(\beta _1\) the GARCH effect, i.e. volatility persistence. \(E[|z_{t-1}|]\) is the expected value of the absolute standardized residual. Because the equation is written for \(\ln (h^2_t)\), the conditional variance is positive without constraints on the coefficients. The coefficient \(v_1\) measures the impact of turmoil periods on the conditional volatility: if \(v_1\) is positive and statistically significant, turmoil periods increase the conditional volatility. To better describe the leptokurtosis and fat tails of returns, the standardized residuals \(z_t\) were modelled with a skewed-GND distribution with mean 0, variance 1, shape parameter \(\nu\) and skewness parameter s.
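To make the recursion in Eqs. (7) and (8) concrete, the sketch below filters the conditional standard deviation \(h_t\) and the standardized residuals \(z_t\) for given parameter values; it does not estimate them (the paper estimates the model with rugarch). Several simplifying assumptions are made and labelled in the code: the parameter values and returns are hypothetical, \(E|z|\) is computed for a symmetric unit-variance GND (the skewness parameter s is ignored), \(r_{t-1}\) is set to 0 for the first observation, and \(\ln h^2\) starts from its stable-period level \(\omega /(1-\beta _1)\).

```python
import math


def expected_abs_z(nu):
    """E|z| for a zero-mean, unit-variance symmetric GND (assumption: skewness ignored)."""
    delta = math.sqrt(math.gamma(1.0 / nu) / math.gamma(3.0 / nu))  # scale giving variance 1
    return delta * math.gamma(2.0 / nu) / math.gamma(1.0 / nu)


def egarch_m_filter(returns, turmoil, p, nu):
    """Conditional sd h_t and standardized residuals z_t implied by Eqs. (7)-(8).

    p holds mu, m1, phi1, lam, omega, v1, alpha1, gamma1, beta1 (with |beta1| < 1).
    """
    e_abs = expected_abs_z(nu)
    log_h2_prev = p["omega"] / (1.0 - p["beta1"])  # assumption: start at stable-period level
    z_prev = None  # no lagged shock before the first observation
    r_prev = 0.0   # assumption: r_{t-1} = 0 for the first observation
    hs, zs = [], []
    for r, d in zip(returns, turmoil):
        # Sign (alpha1) and size (gamma1) effects of the previous standardized shock.
        news = 0.0 if z_prev is None else p["alpha1"] * z_prev + p["gamma1"] * (abs(z_prev) - e_abs)
        log_h2 = p["omega"] + p["v1"] * d + news + p["beta1"] * log_h2_prev  # Eq. (8)
        h = math.exp(0.5 * log_h2)  # conditional standard deviation
        eps = r - p["mu"] - p["m1"] * d - p["phi1"] * r_prev - p["lam"] * h  # Eq. (7)
        z = eps / h
        hs.append(h)
        zs.append(z)
        log_h2_prev, z_prev, r_prev = log_h2, z, r
    return hs, zs


# Hypothetical parameter values and returns, for illustration only.
params = {"mu": 0.05, "m1": -0.3, "phi1": 0.05, "lam": 0.1, "omega": -0.02,
          "v1": 0.2, "alpha1": -0.1, "gamma1": 0.15, "beta1": 0.97}
rets = [0.2, -0.5, 1.1, -2.0, 0.3, -3.0, 0.4, 0.1]
calm = [0] * len(rets)
shock = [0, 0, 0, 0, 0, 1, 0, 0]  # turmoil dummy switched on at t = 5
h_calm, _ = egarch_m_filter(rets, calm, params, nu=1.3)
h_shock, _ = egarch_m_filter(rets, shock, params, nu=1.3)
print(h_shock[5] / h_calm[5], math.exp(params["v1"] / 2))
```

Because \(TURMOIL_t\) enters Eq. (8) additively on the log scale, switching the dummy on multiplies \(h_t\) by \(e^{v_1/2}\) on that day, holding the past fixed, which is what the last line prints.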

The EGARCH(1,1)-M model with exogenous dummy variables was estimated for all indices under study with the rugarch package for R (Ghalanos 2022).

Table 3 Statistical tests results
Table 4 AIC, BIC and log-likelihood (LL) of estimated models on MSCIW index

4 Results

4.1 Results of exploratory data analysis

Table 2 presents the basic statistics of traditional and ESG indices. The mean of traditional and ESG indices is almost the same for Global and the US markets, while the mean of ESG indices is slightly higher than the mean of traditional indices for EU and EM markets. More importantly, in the EU market, the traditional index turns out to have a higher standard deviation than its ESG counterpart. In the other markets, traditional and ESG indices have approximately the same standard deviation. All indices show negative skewness and excess kurtosis. In general, ESG indices show lower skewness and kurtosis than non-ESG indices with only two exceptions. In the EU market, the level of skewness is approximately the same, while the ESG index has a higher kurtosis and skewness than the traditional one in the EM market.

Table 5 Estimated parameters of the two-component mixture of GND on MSCIW index
Table 6 Estimated standard deviation and excess kurtosis of the two-component mixture of GND on MSCIW index

Table 3 reports the results of the preliminary hypothesis tests. According to the Jarque–Bera (JB) test, the daily returns are not normally distributed. The augmented Dickey–Fuller (ADF) and Phillips–Perron (PP) tests reject the null hypothesis of a unit root at the 1% significance level, so all return series (the first differences of the log prices) are stationary. The ARCH-LM test rejects the null hypothesis of no ARCH effects at the 1% significance level, confirming the presence of ARCH effects, i.e. conditional heteroscedasticity.
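The standard JB statistic combines the sample skewness S and excess kurtosis K as \(JB=\frac{n}{6}\left( S^2+K^2/4\right)\), which is asymptotically \(\chi ^2_2\) under normality. The standard-library sketch below (hypothetical helper name, central moments with divisor n) also returns the p value, using the closed form \(e^{-x/2}\) of the \(\chi ^2_2\) survival function. The sample returns are hypothetical.

```python
import math


def jarque_bera(x):
    """Jarque-Bera statistic and its asymptotic p value (chi-square with 2 df)."""
    n = len(x)
    mean = sum(x) / n
    # Central moments with divisor n, as in the usual definition of the statistic.
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skew ** 2 + excess_kurt ** 2 / 4.0)
    # The chi-square(2) survival function has the closed form exp(-x / 2).
    return jb, math.exp(-jb / 2.0)


# Hypothetical daily returns (in %), for illustration only.
sample = [0.4, -0.2, 0.1, -3.1, 0.3, 0.2, -0.1, 2.4, 0.0, -0.5]
print(jarque_bera(sample))
```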

4.2 Turmoil period identification

Figure 1 shows that the two-component mixture of GND describes well the heavy-tailed and leptokurtic behaviour of the daily returns of the MSCIW index. It also confirms the negative skewness, visible as a longer left tail.

Fig. 1 Estimated density of daily returns on MSCIW index
Fig. 2 Daily returns on MSCIW index and turmoil periods (grey vertical lines)
Fig. 3 Impact of turmoil periods on the conditional mean (\(m_1\) coefficient) by index type and market
Fig. 4 Impact of turmoil periods on the conditional volatility (\(v_1\) coefficient) by index type and market
Fig. 5 Estimated conditional volatility by index type and market
Fig. 6 Distribution of the estimated conditional volatility by index type and market

The Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) were used to compare goodness of fit, penalized for model complexity, across five candidate distributions: the Gaussian distribution, the GND, the mixture of two GND, the mixture of two Gaussian distributions and the mixture of a Gaussian and a Laplace distribution. The last two are nested in the finite mixture of GND for specific values of the shape parameters \(\nu _k\) (Sect. 3.2). Table 4 shows that both the AIC and BIC favour the two-component mixture of GND for the daily return distribution of MSCIW.
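For reference, AIC = 2k − 2LL and BIC = k ln n − 2LL, where k is the number of free parameters, n the number of observations and LL the maximized log-likelihood; lower values are preferred. The sketch below records k for the five candidates (for the mixtures, K = 2 with one free mixing weight) and computes the smallest log-likelihood gain that justifies extra parameters. No log-likelihoods from Table 4 are used, and n = 1700 is only a hypothetical round number of trading days.

```python
import math

# Free parameters of each candidate; for K = 2 mixtures one weight is free since pi_1 + pi_2 = 1.
N_PARAMS = {
    "Gaussian": 2,                         # mu, sigma
    "GND": 3,                              # mu, delta, nu
    "mixture of two Gaussians": 5,         # 2 means, 2 sds, 1 weight
    "mixture of Gaussian and Laplace": 5,  # 2 locations, 2 scales, 1 weight (shapes fixed)
    "mixture of two GND": 7,               # 2 locations, 2 scales, 2 shapes, 1 weight
}


def aic(loglik, k):
    return 2.0 * k - 2.0 * loglik


def bic(loglik, k, n):
    return k * math.log(n) - 2.0 * loglik


def min_loglik_gain(k_big, k_small, n=None):
    """Log-likelihood increase the larger model needs to win under AIC (n=None) or BIC."""
    extra = k_big - k_small
    return extra if n is None else 0.5 * extra * math.log(n)


# Two extra shape parameters versus a 5-parameter mixture, for a hypothetical n = 1700.
k_gnd, k_gauss = N_PARAMS["mixture of two GND"], N_PARAMS["mixture of two Gaussians"]
print(min_loglik_gain(k_gnd, k_gauss), round(min_loglik_gain(k_gnd, k_gauss, n=1700), 2))
```

In words: the two free shape parameters must raise the log-likelihood by more than 2 under AIC and by more than ln n under BIC, so agreement of both criteria, as in Table 4, is the stronger of the two signals.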

As mentioned in Sect. 3.2, the stable and turmoil components can be identified from the shape parameter \(\nu\). The estimated coefficients of the two-component mixture of GND (Eq. 3) in Table 5 lead to several observations. Firstly, the stable component (\(\pi _1=0.7280\)) is predominant compared with the turmoil component (\(\pi _2=0.2720\)). Secondly, the estimated mixture is asymmetric, since the component locations differ in sign: \(\mu _1=0.1244>\mu _2=-0.1197\). Thirdly, the tails of the stable component are intermediate between those of the Laplace and normal distributions, since \(1<\nu _1=1.1019<2\), whereas the tails of the turmoil component are heavier than those of the Laplace distribution, since \(\nu _2=0.6977<1\). For a given scale, a smaller \(\nu _k\) means a thicker tail and hence a higher standard deviation (Eq. 5); conversely, a higher \(\nu _k\) means a thinner tail and a lower standard deviation. Table 6 shows that the standard deviation of the stable component (\(\sigma _1=0.6978\)) is lower than that of the turmoil component (\(\sigma _2=1.4300\)). The excess kurtosis of the turmoil component (\(\kappa _2=8.1379\)) is higher than that of the stable component (\(\kappa _1=2.2652\)), which has a "mild kurtosis". Together, the two components capture the much larger excess kurtosis of the daily returns (\(\kappa =13.0891\)).
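The component kurtosis values in Table 6 can be cross-checked with the standard GND result \(\kappa _k=\Gamma (5/\nu _k)\Gamma (1/\nu _k)/\Gamma (3/\nu _k)^2-3\), which is not stated in the text and is supplied here as background. A short sketch with a hypothetical helper name:

```python
import math


def gnd_excess_kurtosis(nu):
    """Excess kurtosis of a GND with shape nu; it does not depend on mu or delta."""
    return math.gamma(5.0 / nu) * math.gamma(1.0 / nu) / math.gamma(3.0 / nu) ** 2 - 3.0


# Estimated shapes of the MSCIW mixture components (Table 5).
for label, nu in (("stable", 1.1019), ("turmoil", 0.6977)):
    print(label, round(gnd_excess_kurtosis(nu), 4))
```

With the rounded shapes of Table 5, this gives values within about 0.01 of \(\kappa _1\) and \(\kappa _2\) in Table 6, consistent with the kurtosis of each component depending on its shape parameter alone.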

Table 7 EGARCH: estimated coefficients of the conditional mean equation
Table 8 EGARCH: estimated coefficients of the conditional volatility equation
Table 9 EGARCH: diagnostic test results

Figure 2 shows the daily returns of the MSCIW index, with grey vertical lines marking the turmoil periods detected by the two-component mixture of GND and the Naïve Bayes' classifier. The COVID-19 pandemic stands out as the most turbulent period in terms of both duration and return fluctuations.

4.3 Estimated exponential GARCH-in-mean model with exogenous dummy variables

4.3.1 Results of conditional mean equation

Table 7 reports the estimated coefficients of the conditional mean equation. The constant \(\mu\) is negative and statistically significant for all indices except the DJSEMUP index. Turmoil periods reduced the conditional mean, as the coefficient \(m_1\) is negative and statistically significant for all indices. Figure 3 summarizes the impact of turmoil periods on the conditional mean by index type and market. The impact is largest in the US market, followed by the Global, EU and EM markets. There are no marked differences between ESG and non-ESG indices, except in the EU market, where the conditional mean of the traditional index is more affected than that of the ESG index.

The AR(1) coefficient \(\phi _1\) is statistically significant for all indices except for the DJUS. In addition, it is positive for the Global and EM indices but negative for the US and EU indices. These different signs may be related to the spurious consequences of the non-synchronous trading among the index’s component stocks (Campbell et al. 1997; Koutmos 1997; Basher et al. 2007).

Moreover, the results reveal a statistically significant coefficient of the risk premium for all indices. The ESG indices have a higher risk premium than traditional indices in the EU and EM markets.

4.3.2 Results of conditional volatility equation

Table 8 reports the estimated coefficients of the conditional volatility equation. The constant \(\omega\) is negative and statistically significant for all indices. Turmoil periods increased the conditional volatility, as the coefficient \(v_1\) is positive and statistically significant for all indices. According to Fig. 4, the magnitude of the impact varies by index type and market: it is largest in the US market, followed by the Global, EU and EM markets. The impact on conditional volatility is more pronounced for the traditional indices in the Global and EM markets.

The coefficients \(\alpha _1\) and \(\beta _1\) are statistically significant for all indices, indicating ARCH and GARCH effects. In addition, the conditional volatility exhibits a leverage effect, indicated by the negative sign of \(\alpha _1\) together with a statistically significant \(\gamma _1\). The leverage effect is weaker for the ESG indices than for their market benchmarks in the Global and EU markets, whereas the US and EM markets show the reverse pattern.
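Leaving aside new shocks and the turmoil dummy, Eq. (8) is a first-order autoregression in \(\ln (h_t^2)\) with coefficient \(\beta _1\), so the effect of a one-off volatility shock decays as \(\beta _1^j\) after j days. Persistence can therefore be summarized by a half-life, \(\ln 0.5/\ln \beta _1\). The \(\beta _1\) values in the sketch are hypothetical, chosen only to show how quickly the half-life grows as \(\beta _1\) approaches 1; they are not the estimates in Table 8.

```python
import math


def volatility_half_life(beta1):
    """Days for a shock to ln(h_t^2) in Eq. (8) to lose half its effect (0 < beta1 < 1)."""
    if not 0.0 < beta1 < 1.0:
        raise ValueError("the half-life is defined only for 0 < beta1 < 1")
    return math.log(0.5) / math.log(beta1)


# Hypothetical persistence values, for illustration only.
for beta1 in (0.90, 0.95, 0.98, 0.99):
    print(beta1, round(volatility_half_life(beta1), 1))
```

Since \(\ln h_t = \frac{1}{2}\ln (h_t^2)\), the same half-life applies to the log of the conditional standard deviation.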

Volatility shocks are highly persistent ("long memory"), as the estimated \(\beta _1\) is close to 1 for all indices. For risk assessment, however, the estimated conditional volatility and its distribution must also be considered. Figure 5 shows the estimated volatility by index type and market, and a graphical examination suggests the following considerations.

Global: The difference in estimated volatility between the ESG and traditional indices is slight. During the COVID-19 financial crisis, the traditional index exhibits a higher peak of estimated volatility (8.64) than the ESG index (7.02).

The US: The estimated volatility of the ESG and traditional indices is approximately the same. During the COVID-19 financial crisis, the ESG and traditional indices reach practically the same peak of estimated volatility (11.84 and 11.82, respectively).

Europe: The estimated volatility of the traditional index is higher than that of the ESG index. During the COVID-19 financial crisis, the traditional index exhibits a higher peak of estimated volatility (7.17) than the ESG index (6.38). In addition, the series show two other pronounced volatility peaks: the first during the Brexit vote (June 2016) and the second during the Russia–Ukraine war (February 2022).

Emerging Markets: The estimated volatility of the ESG and traditional indices is approximately the same for almost the entire study period. During the COVID-19 pandemic, the ESG index exhibits a slightly higher peak of estimated volatility (5.72) than the traditional index (5.35).

These findings are confirmed by the distribution of the estimated volatility in Fig. 6: for both index types, stable periods are characterized by low volatility and turmoil periods by high volatility, in line with the assumptions in Sect. 3.2. Specifically, the median volatility of the two index types is approximately the same during stable periods (dashed lines), whereas during turmoil periods (solid lines) the median volatility of the traditional indices is higher than that of the ESG indices (1.43 versus 1.34).
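The regime medians summarized in Fig. 6 amount to splitting the estimated \(h_t\) series by the turmoil dummy. A minimal sketch, with a hypothetical helper name and made-up inputs rather than the estimated series:

```python
import statistics


def median_vol_by_regime(h, turmoil):
    """Median conditional volatility on stable (TURMOIL_t = 0) and turmoil (TURMOIL_t = 1) days."""
    if len(h) != len(turmoil):
        raise ValueError("h and turmoil must have the same length")
    stable = [v for v, d in zip(h, turmoil) if d == 0]
    turbulent = [v for v, d in zip(h, turmoil) if d == 1]
    return statistics.median(stable), statistics.median(turbulent)


# Hypothetical estimated volatilities and turmoil dummy, for illustration only.
h = [0.8, 0.9, 1.5, 1.0, 2.0]
dummy = [0, 0, 1, 0, 1]
print(median_vol_by_regime(h, dummy))
```

The median is preferred to the mean here because a few extreme days, such as those of March 2020, would otherwise dominate the turmoil-period summary.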

4.4 Diagnostic test results

The skewed-GND distribution suitably captured the leptokurtosis and the skewness of the standardized error terms (Eq. 8) because the coefficients \(\nu\) and s are statistically significant for all indices (Table 8).

To check for remaining autocorrelation and heteroscedasticity in the standardized residuals, the weighted Ljung–Box and weighted ARCH-LM tests were performed. Table 9 shows that all test statistics have p values \(>0.05\), so there is no evidence of serial correlation or heteroscedasticity in the standardized residuals.
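Table 9 reports weighted versions of the Ljung–Box and ARCH-LM tests. As a simpler point of reference, and explicitly not the weighted statistic used in the paper, the classic Ljung–Box statistic \(Q=n(n+2)\sum _{k=1}^{m}\hat{\rho }_k^2/(n-k)\) can be sketched as follows; large values relative to \(\chi ^2\) critical values (3.841 for one degree of freedom at the 5% level) indicate autocorrelation. Applying it to squared residuals gives an informal check for leftover ARCH effects. The helper name and residuals are hypothetical.

```python
def ljung_box_q(x, max_lag):
    """Unweighted Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(x)
    if not 0 < max_lag < n:
        raise ValueError("need 0 < max_lag < len(x)")
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    q = 0.0
    for k in range(1, max_lag + 1):
        # Sample autocorrelation at lag k.
        rho_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        q += rho_k ** 2 / (n - k)
    return n * (n + 2) * q


# Hypothetical standardized residuals; their squares check for remaining ARCH effects.
z = [0.3, -1.2, 0.8, 0.1, -0.4, 1.5, -0.9, 0.2, 0.0, -0.6]
print(round(ljung_box_q(z, 2), 3), round(ljung_box_q([v * v for v in z], 2), 3))
```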

5 Conclusions

This study adds to the literature as it investigates the reactions of ESG and traditional indices to turmoil periods.

Firstly, a finite mixture of two GND was estimated on the MSCIW index returns. Consistent with the initial hypotheses (Sect. 3.2), the two-component mixture of GND describes well the heavy-tailed and leptokurtic behaviour of the daily returns. Similarly to Kim and White (2004) and Wen et al. (2020), the estimated mixture contains a "predominant" stable component and a "relatively rare" turmoil component. The former has a high shape parameter, implying a thinner tail and hence a lower standard deviation, while the latter has a small shape parameter, implying a thicker tail and hence a higher standard deviation. The mixture model results were then used to objectively detect financial market turmoil periods.

Secondly, the turmoil periods were included as an exogenous dummy variable in the EGARCH-in-mean model to capture their impact on the conditional mean and conditional volatility of several ESG and non-ESG Dow Jones indices. The results show that the return–risk performance, as well as the impact of turmoil periods on return and volatility, varies between ESG and traditional indices and across markets. As in Hoti et al. (2007), Lean and Nguyen (2014) and Ang (2015), the return–risk findings are therefore mixed in the sense of Cunha et al. (2020).

While the European ESG index turns out to be less volatile than its traditional market benchmark, the estimated conditional volatility of ESG and traditional indices is approximately the same in the other markets.

The impact of turmoil periods on the conditional mean and volatility follows a similar pattern for both index types: the most affected market is the US, followed by the Global, EU and EM markets. However, the conditional mean of the ESG index is less affected in the EU market, and the ESG indices are more resilient in terms of conditional volatility in the Global and EM markets.

Other key aspects concern the risk premium and the leverage effect. ESG indices have a higher risk premium than traditional indices in the EU and EM markets. In line with Sabbaghi (2022), the results confirm the presence of a leverage effect for ESG indices. The leverage effect is weaker for ESG indices than for their market benchmarks, except in the US and EM markets, which show the reverse pattern.

These findings are relevant to equity portfolio managers, investors, policy-makers, academics and anyone who wishes to encourage ESG investments. To contain risk during turmoil periods, portfolio managers may apply rebalancing strategies that place a higher weight on the less volatile asset. They may also draw on these results when diversifying risk, taking into account the impact of turmoil periods, the risk premium and the leverage effect (Markowitz 1952). Lastly, corporate executives may use them to benchmark their own results against peers and to track news.

It is important to note that the results obtained depend on the definition of the turmoil period (Sect. 4.2). For example, considering market specific turmoil periods, larger differences between ESG and non-ESG indices might be observed. Additionally, the Naïve Bayes’ classification rule makes the assignment without considering the temporal dependence of the returns. It would be interesting to explore a dynamic time-varying estimation of the mixture model.

To conclude, another extension of this work could be the study of the evolution of correlations among ESG and traditional indices focusing on the impact of turmoil periods and spillover effects among markets.