1 Introduction

In recent years, ESG (Environmental, Social and Governance) analysis has become an important part of the investment process given the increasing attention being paid to the sustainability and societal impact of investing in a company or business. In contrast to traditional stock indices, ESG ones are based on social responsibility criteria to screen and select their components. According to the MSCI (Morgan Stanley Capital International) 2021 Global Institutional Investor survey (a survey of 200 asset owner institutions with assets totalling approximately $18 trillion) over three-quarters (77%) of investors increased ESG investments ‘significantly’ or ‘moderately’ in 2020, with this figure rising to 90% for the largest institutions (over $200 billion of assets). The corresponding percentages were 79% for the Asia–Pacific region, 78% for the US and 68% for the EMEA (Europe, Middle East and Africa) group of countries. Also, over $19 billion flowed into ESG ETFs (Exchange Traded Funds) in 2020 (up from $8 billion in 2019), bringing the total to over $40 billion.

The increasing role of ESG investment has spawned a new literature analysing whether or not ESG indices outperform conventional ones (Gladish et al. 2013; Durán-Santomil et al. 2019), and affect the performance of financial companies (Junkus and Berry 2015) or the degree of market efficiency (Mynhardt et al. 2017). In general, socially responsible companies provide more transparent reporting; this implies higher costs for the collection, compilation, disclosure, publication and verification of information according to ESG criteria, and should also result in lower information asymmetry and greater market efficiency; however, this might not be the case if reporting regulations are not sufficiently stringent.

This paper aims to shed new light on these issues by comparing two sets of 12 ESG and conventional MSCI indices to establish whether or not there are differences in their stochastic behaviour, and whether their properties are the same for different groups of countries. For this purpose two different long-memory methods, specifically R/S (Rescaled Range) analysis and fractional integration, are applied to MSCI data spanning the period 2007–2020. The present study is therefore much more comprehensive than previous ones, such as Mynhardt et al. (2017), which focused on a smaller subset of indices and only carried out R/S analysis. Evidence of greater efficiency of the ESG indices would provide an additional reason for socially responsible investing, whilst a higher degree of predictability would provide opportunities to market participants to make abnormal profits by means of appropriately designed trading strategies.

The layout of the paper is the following. Section 2 provides a brief review of the relevant literature. Section 3 describes the data and outlines the empirical methodology. Section 4 presents the empirical results. Section 5 provides some concluding remarks.

2 Literature review

The PRI (Principles for Responsible Investment), which is a UN-supported network of investors whose aim is to promote sustainable investment, was the first to define ESG criteria. On the basis of these criteria a total score is calculated for each company, which reflects the level of corporate social responsibility (CSR) and determines the weight of the company in the ESG index. ESG data are used to compare the performance of conventional versus socially responsible indices and mutual funds. Statman (2000) found that ESG indices outperform conventional ones such as the S&P 500. Cortez et al. (2009) showed that they perform better in the European markets than in the US ones. Lopez et al. (2007) compared the financial performance of companies with socially responsible investment (SRI) with that of traditional ones and found differences in the Dow Jones Sustainability Indices (DJSI) and Dow Jones Global Indices (DJGI) dynamics due to these companies’ CSR practices.

Durán-Santomil et al. (2019) reported that mutual funds investing in companies with higher ESG scores have a better performance, whilst Managi et al. (2012) and Gladish et al. (2013) found no evidence that they outperform their conventional peers. Leite and Cortez (2013) confirmed that differences between SRI funds and conventional ones are not statistically significant.

El Ghoul and Karoui (2016) concluded that high-CSR funds are outperformed by low-CSR ones as their investors derive utility from non-performance attributes. Cortez and Leite (2015) argued that in general ESG indices underperform during normal periods, whilst during turmoil periods such as the 2007 global financial crisis (GFC) they outperform conventional ones because they play an ‘insurance role’ (Varma and Nofsinger 2014; Becchetti et al. 2015). Abidin and Gan (2017), Junkus and Berry (2015), Rehman et al. (2016) and Schröder (2004) showed that the performance of SRI mutual funds and indices is generally not significantly different from that of conventional ones. Rehman et al. (2021) reported that in the case of the BRICS (Brazil, Russia, India, China and South Africa) countries ESG and conventional indices influence each other. Jain et al. (2019) argued that sustainable indices and conventional ones are substitutes. As for the effects of the COVID-19 pandemic, no statistically significant differences have been detected for the returns of ESG indices compared to traditional ones (Chiappini et al. 2021; Umar et al. 2020).

The mixed results of the studies discussed above can be attributed to differences in model specifications, sample periods, benchmarks etc. (Junkus and Berry 2015). The heterogeneity of sustainable investment in terms of its performance provides an opportunity to reduce risk by diversifying across regions (Cunha et al. 2020); this type of investment is not necessarily penalising for investors who could switch to it without incurring losses (Tripathi and Kaur 2020; Sharma et al. 2021).

Very few studies focus on the issue of persistence of ESG indices vis-à-vis conventional ones. In particular, Mynhardt et al. (2017) examined the persistence of the DJSI, S&P500 Environmental & Socially Responsible Index, FTSE4Good Global Index, MSCI World ESG Index, NASDAQ OMX CRD Global Sustainability Index, and their traditional equivalents with R/S analysis; they found that generally SRI indices exhibit lower efficiency than traditional ones. The only previous study to use fractional integration techniques is due to de Dios-Alija et al. (2021), who analysed the Dow Jones, Eurostoxx, and Hang Seng monthly and weekly sustainable and traditional indices; high levels of persistence were observed in all cases and no differences were detected across markets. Persistence is a measure of market efficiency as discussed by Mandelbrot (1972) and Peters (1991, 1994). Previous studies analysing it for various financial markets also include Greene and Fielitz (1977), Lo (1991), Jacobsen (1995), Costa and Vasconcelos (2003), Onali and Goddard (2011), and Caporale et al. (2016).

3 Data and methodology

We analyse two sets of 12 ESG and conventional daily indices from the MSCI (Morgan Stanley Capital International) website https://www.msci.com/. The sample period goes from 1 October 2007 to 31 December 2020 (with the only exception of the MSCI BRIC ESG series which starts on 12 July 2013). Specifically, the following (both ESG and conventional) MSCI indices are examined: US, UK, Japan, India, China, South Africa, Emerging Markets (including 27 emerging markets such as Argentina, Brazil, Egypt, Malaysia, Mexico, etc.), BRICS (Brazil, Russia, India, China, South Africa), World (including 23 developed markets, such as the US, Japan, UK, France, etc.), Europe (including 15 European developed markets such as Germany, Italy, Netherlands, the UK, etc.), Pacific (including 5 developed markets in the Pacific region, specifically Japan, Hong Kong, Australia, Singapore, and New Zealand), EAFE (a broad market index of stocks from Europe, Australasia, and the Far East which includes more than 900 stocks from 21 countries).

To measure the degree of persistence of these series two different methods are applied, namely R/S (Rescaled Range) analysis and fractional integration. The former is based on the Hurst exponent H, which is the estimated slope coefficient in the following equation: log (R/S) = log (c) + H log (n) (Hurst 1951). For each sub-period the range R (the difference between the maximum and minimum value of the accumulated deviations of returns from their mean within the sub-period), the standard deviation S and their ratio R/S are calculated, and the ratios are averaged across sub-periods of the same length. The length of the sub-period is then increased and the calculations repeated. Each sub-period length is thus characterised by its average value of R/S. A least squares regression is run on these values to obtain an estimate of the slope coefficient, which, as already mentioned, is known as the Hurst exponent. More details are provided below.

  1.

    One starts with a time series of length M and transforms it into one of length N = M − 1 using logs and converting stock prices into stock returns:

    $$N_{t}=\log\left(\frac{Y_{t+1}}{Y_{t}}\right),\quad t=1,2,3,\ldots,(M-1)$$
    (1)
  2.

    One divides this period into A contiguous sub-periods of length n, such that An = N, and identifies each sub-period as \(I_a\), for a = 1, 2, 3, ..., A. Each element of \(I_a\) is denoted by \(N_{k,a}\), with k = 1, 2, 3, ..., n. For each \(I_a\) the average sub-period return \({e}_{a}\) is defined as:

    $${e}_{a}=\frac{1}{n}{\sum }_{k=1}^{n}{N}_{k,a}$$
    (2)
  3.

    The accumulated deviations \(X_{k,a}\) from the average \({e}_{a}\) for each sub-period \(I_a\) are defined as:

    $${X}_{k,a}=\sum_{i=1}^{k}({N}_{i,a}-{e}_{a}).$$
    (3)

    The range is defined as the maximum \(X_{k,a}\) minus the minimum \(X_{k,a}\) within each sub-period \(I_a\):

    $$R_{Ia}=\max(X_{k,a})-\min(X_{k,a}),\quad 1\le k\le n.$$
    (4)
  4.

    The standard deviation \({S}_{Ia}\) is calculated for each sub-period \(I_a\):

    $$S_{Ia}=\left(\frac{1}{n}\sum_{k=1}^{n}(N_{k,a}-e_{a})^{2}\right)^{0.5}.$$
    (5)
  5.

    Each range \(R_{Ia}\) is normalised by dividing it by the corresponding \(S_{Ia}\). Therefore, the re-normalised scale during each sub-period \(I_a\) is \(R_{Ia}/S_{Ia}\). Step 2 above yields A adjacent sub-periods of length n. Thus, the average R/S for length n is defined as:

    $$(R/S)_{n}=\frac{1}{A}\sum_{a=1}^{A}\left(R_{Ia}/S_{Ia}\right).$$
    (6)
  6.

    The length n is increased to the next higher value for which (M − 1)/n is an integer. In our case, we use n-indices that include the initial and end points of the time series, and Steps 1–6 are repeated until n = (M − 1)/2.

  7.

    Next one can use least squares to estimate the equation log (R/S) = log (c) + H log (n). The slope coefficient in this regression is an estimate of the Hurst exponent H. To assess the statistical significance of the estimated Hurst exponents, p-values and 95% confidence intervals can also be computed in the standard way in the context of regression analysis. Note that the Hurst exponent lies in the interval [0, 1]. On the basis of the H values three categories can be identified: the series are anti-persistent, and returns are negatively correlated, if 0 ≤ H < 0.5; the series are random, returns are uncorrelated, and there is no memory in the series if H = 0.5; the series are persistent, returns are positively correlated, and there is memory in price dynamics if 0.5 < H ≤ 1.
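The seven steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used to obtain the results reported below: the function names, the minimum sub-period length `min_n = 8` and the simulated price series are assumptions introduced here, and the admissible values of n are taken to be the divisors of N up to N/2.

```python
import math
import random


def log_returns(prices):
    """Step 1: turn M index values into N = M - 1 log returns."""
    return [math.log(prices[t + 1] / prices[t]) for t in range(len(prices) - 1)]


def average_rs(returns, n):
    """Steps 2-5: mean of R/S over the A = N // n contiguous sub-periods of length n."""
    ratios = []
    for a in range(len(returns) // n):
        block = returns[a * n:(a + 1) * n]
        e_a = sum(block) / n                                    # Eq. (2)
        running, cumulative = 0.0, []
        for value in block:                                     # Eq. (3)
            running += value - e_a
            cumulative.append(running)
        r = max(cumulative) - min(cumulative)                   # Eq. (4)
        s = math.sqrt(sum((v - e_a) ** 2 for v in block) / n)   # Eq. (5)
        if s > 0:                                               # skip constant sub-periods
            ratios.append(r / s)
    return sum(ratios) / len(ratios)                            # Eq. (6)


def hurst_exponent(returns, min_n=8):
    """Steps 6-7: OLS slope of log(R/S) on log(n) over the divisors n of N.

    min_n is an assumption of this sketch, not a value taken from the paper.
    """
    N = len(returns)
    lengths = [n for n in range(min_n, N // 2 + 1) if N % n == 0]
    if len(lengths) < 2:
        raise ValueError("need at least two admissible sub-period lengths")
    x = [math.log(n) for n in lengths]
    y = [math.log(average_rs(returns, n)) for n in lengths]
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


if __name__ == "__main__":
    rng = random.Random(0)
    # Simulated random-walk prices: a placeholder, not MSCI data.
    prices = [100.0]
    for _ in range(2048):
        prices.append(prices[-1] * math.exp(rng.gauss(0.0, 0.01)))
    print(round(hurst_exponent(log_returns(prices)), 3))
```

For uncorrelated returns the estimate should lie in the neighbourhood of 0.5, although R/S estimates are known to be biased upwards when short sub-periods are included.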

Both static and dynamic R/S analyses are carried out. In the former case the Hurst exponent is calculated using the whole data set. In the latter a sliding-window approach is used and a series of Hurst exponents, one for each window, is obtained. The procedure is the following: having computed the first value of the Hurst exponent (for example, for the date 01.04.2004 using data for the period from 01.01.2004 to 31.03.2004), each of the following ones is calculated by shifting forward the ‘data window’, where the size of the shift depends on the number of observations and a sufficient number of estimates (namely > 100 following the literature) is required to analyse the time-varying behaviour of the Hurst exponent. For example, if the shift equals 10, the second value is calculated for 10.04.2004 and characterises the market over the period 10.01.2004 till 09.04.2004, and so on.
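The sliding-window procedure can be illustrated as follows. Again this is a hedged sketch rather than the implementation behind the figures in Appendix 1: `hurst_rs`, `rolling_hurst`, the window of 120 observations and the shift of 10 are hypothetical choices made for the example, and the sub-period lengths are assumed to be the divisors of the window size.

```python
import math
import random


def hurst_rs(returns, min_n=8):
    """Compact R/S estimate of H; sub-period lengths are the divisors of len(returns)."""
    N = len(returns)
    xs, ys = [], []
    for n in range(min_n, N // 2 + 1):
        if N % n:
            continue
        ratios = []
        for a in range(N // n):
            block = returns[a * n:(a + 1) * n]
            mean = sum(block) / n
            running, lo, hi, ss = 0.0, math.inf, -math.inf, 0.0
            for value in block:
                running += value - mean          # accumulated deviation
                lo, hi = min(lo, running), max(hi, running)
                ss += (value - mean) ** 2
            if ss > 0:
                ratios.append((hi - lo) / math.sqrt(ss / n))
        if ratios:
            xs.append(math.log(n))
            ys.append(math.log(sum(ratios) / len(ratios)))
    if len(xs) < 2:
        raise ValueError("window too short for an R/S regression")
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def rolling_hurst(returns, window, step):
    """One Hurst estimate per data window, shifting the window forward by `step`."""
    starts = range(0, len(returns) - window + 1, step)
    return [hurst_rs(returns[s:s + window]) for s in starts]


if __name__ == "__main__":
    rng = random.Random(0)
    simulated = [rng.gauss(0.0, 0.01) for _ in range(1500)]  # placeholder returns
    series = rolling_hurst(simulated, window=120, step=10)
    print(len(series), "window estimates")
```

The shift controls the number of estimates: with a fixed sample, a smaller shift yields more (and more strongly overlapping) windows, which is how the requirement of more than 100 estimates can be met.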

The second method employs fractional integration techniques to estimate the differencing parameter d as a measure of persistence; note that this is related to the Hurst exponent described above through the relationship H = d + 0.5. Further, R/S analysis is applied to the return series (the first differences of the logged indices), while I(d) models are estimated for the logged indices themselves, in which case the relationship becomes H = (d – 1) + 0.5 = d – 0.5. We consider processes of the form:

$$(1-B)^{d}x_{t}=u_{t},\quad t=1,2,\ldots,$$
(7)

where B is the backshift operator (\(Bx_{t}=x_{t-1}\)); \(u_t\) is an I(0) process (which may incorporate weak autocorrelation of the AR(MA) form) and \(x_t\) represents the errors of a regression model of the form:

$$y_{t}=\beta_{0}+\beta_{1}t+x_{t};\quad t=1,2,\ldots,$$
(8)

where \(y_t\) stands for the log of the stock index in each case, \(\beta_0\) and \(\beta_1\) denote an unknown constant and the coefficient on a linear time trend t, and the regression errors \(x_t\) are I(d). Note that under the Efficient Market Hypothesis the value of d in (7) should be equal to 1 and \(u_t\) should be a white noise process. We use parametric and semiparametric methods, assuming in turn uncorrelated (white noise) and autocorrelated errors as in Bloomfield (1973). To be more specific, in the former case the model including Eqs. (7) and (8) is fully parameterised, with the assumption being made that \(u_t\) in (7) is a white noise process; in the latter no structure is imposed on \(u_t\), which is allowed to be weakly autocorrelated as in Bloomfield (1973). His model can be characterised as semiparametric because it does not have an explicit specification but can be described simply by its spectral density function, whose logged form approximates that of autoregressive models. In both cases, we use the Whittle estimator of d in the frequency domain (Dahlhaus 1989; Robinson 1994, 1995), as described, for example, in Gil-Alana and Robinson (1997).
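To make the operator in Eq. (7) concrete, the sketch below expands (1 − B)^d as a binomial series and applies it to a short series. It illustrates the filter only and is not the Whittle estimator used in the paper; the function names and the truncation of pre-sample values to zero are assumptions of this example.

```python
def frac_diff_weights(d, size):
    """First `size` coefficients of the binomial expansion of (1 - B)^d.

    pi_0 = 1 and pi_j = pi_{j-1} * (j - 1 - d) / j for j >= 1.
    """
    weights = [1.0]
    for j in range(1, size):
        weights.append(weights[-1] * (j - 1 - d) / j)
    return weights


def frac_diff(x, d):
    """Apply (1 - B)^d to x, treating pre-sample values as zero (truncated filter)."""
    w = frac_diff_weights(d, len(x))
    return [sum(w[j] * x[t - j] for j in range(t + 1)) for t in range(len(x))]


def hurst_from_d(d, on_log_levels=True):
    """H = d - 0.5 for logged indices and H = d + 0.5 for returns, as stated in the text."""
    return d - 0.5 if on_log_levels else d + 0.5


if __name__ == "__main__":
    print(frac_diff_weights(0.5, 3))           # coefficients on x_t, x_{t-1}, x_{t-2}
    print(frac_diff([1.0, 3.0, 6.0, 10.0], 1))  # d = 1 reduces to first differences
```

With d = 1 the weights collapse to (1, −1, 0, ...), so the filter reduces to first differencing; for non-integer d the weights decay slowly, which is what generates long memory.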

4 Empirical results

The results of the static R/S analysis for the ESG and conventional MSCI indices are reported in Table 1.

Table 1 Static Hurst exponent calculations for the ESG and conventional MSCI indices (R/S analysis)

As can be seen, all calculated p-values are below 0.05, which implies that the estimated Hurst exponents are statistically significant, and in most cases there are no significant differences between the two types of indices. It is noteworthy that the estimates are generally higher for the emerging markets considered, which suggests that these are less efficient than the developed ones (in line with previous evidence); the general consensus is that such markets are characterised by greater information asymmetry, lower liquidity, and fewer market participants.

The next step is the dynamic R/S analysis, which provides information about changes in persistence over time. The results are plotted in Appendix 1, Figures 1–12. Visual inspection suggests that persistence is time-varying and that its dynamic behaviour is very similar for the ESG and conventional indices. This is confirmed by the results reported in Table 2: with very few exceptions (the BRICS and India) the correlations between the Hurst exponents obtained for the two types of indices are very high.

Table 2 Correlation analysis of Hurst exponent dynamics for the ESG and conventional indices

As an additional check we also carry out t-tests to see whether or not there are any statistically significant differences between the ESG and conventional indices in terms of Hurst exponent dynamics. The results are presented in Appendix 2 (Tables 9, 10 and 11). The null hypothesis of no difference is rejected only in the case of India. To sum up, the R/S analysis implies that persistence and its dynamics are essentially the same for the two sets of indices. However, persistence tends to be higher in emerging as opposed to developed markets, which indicates that the former are less efficient, a common finding in the literature.
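The two checks described above (correlation of the Hurst exponent series and a t-test for differences between them) can be sketched as follows. The text does not state which form of t-test is used, so the paired version below is an assumption made for illustration, and the two short input series in the usage example are made-up numbers, not estimates from this study.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equally long series of Hurst estimates."""
    a_bar, b_bar = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b))
    var_a = sum((x - a_bar) ** 2 for x in a)
    var_b = sum((y - b_bar) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def paired_t(a, b):
    """Paired t statistic for H0: mean difference between the two series is zero.

    Assumption: a paired design (same windows for both indices); the statistic
    has len(a) - 1 degrees of freedom under the usual normality assumptions.
    """
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in diffs) / (n - 1))
    return mean / (sd / math.sqrt(n))


if __name__ == "__main__":
    # Hypothetical window-by-window Hurst estimates for one market.
    esg = [0.55, 0.58, 0.61, 0.57, 0.60]
    conventional = [0.54, 0.59, 0.60, 0.56, 0.62]
    print(round(pearson(esg, conventional), 3), round(paired_t(esg, conventional), 3))
```

Because consecutive windows overlap, the Hurst estimates are serially dependent, so the resulting t statistic should be read as indicative rather than exact.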

More evidence is obtained using I(d) techniques. Specifically, we estimate the model given by Eqs. (7) and (8) and report the results for the two cases of white noise and autocorrelated errors in Tables 3 and 4 for the ESG indices and in Tables 5 and 6 for the conventional ones; in particular, these tables display the estimates of d as well as their 95% confidence intervals, obtained as the values of \(d_o\) for which the null hypothesis of d = \(d_o\) cannot be rejected at the 5% level with the tests of Robinson (1994). In each case we display the estimates of d for three standard model specifications, namely: i) no deterministic terms (i.e., β0 = β1 = 0 in (8)), ii) an intercept only (β1 = 0), and iii) an intercept and a linear time trend. The values in bold are those from the preferred specifications selected on the basis of the statistical significance of the regressors.

Table 3 Estimates of d based on white noise errors – ESG indices
Table 4 Estimates of d based on autocorrelated errors – ESG indices
Table 5 Estimates of d based on white noise errors – conventional indices
Table 6 Estimates of d based on autocorrelated errors – conventional indices

Starting with the ESG indices, under the assumption of white noise errors we find a significant time trend (with a positive coefficient, not reported) in the case of China, Japan and the US, whilst in the remaining cases neither an intercept nor a trend is required. Long memory (d > 0) characterises the BRICS, EAFE, Emerging Markets, and World indices; evidence of short memory (d = 0) is found for China, Europe, India and South Africa, while anti-persistence (d < 0) is detected for Japan, Pacific, the UK and the US.

When allowing for autocorrelation, the time trend is significant only in the case of the BRICS and China. There is not a single case of long memory; I(0) or short memory is found for the BRICS, EAFE, the Emerging Markets, India, Pacific, US and World indices, while for the remaining series (China, Europe, Japan, South Africa and the UK) d is significantly smaller than 0, which amounts to anti-persistence.

Next, we analyse the conventional indices. With white noise errors (Table 5), the time trend is significant for the US and Japan, while in the remaining cases no deterministic terms are required. As for the estimated values of d, anti-persistence (i.e. d < 0) is found in the case of the US, UK, Japan and the Pacific; evidence of short memory or I(0) behaviour is obtained for Europe, China and South Africa, and long memory (i.e., d > 0) is detected in the case of the India, World, Emerging Markets, EAFE and BRICS indices.

Under the assumption of correlated errors the time trend is only significant for the World index, whilst in the remaining cases both the intercept and the time trend are insignificant. Anti-persistence is found in the case of the UK, China, Japan, South Africa, the World, EAFE, Europe and the Pacific, and short memory (d = 0) for the US, India and the BRICS, thus long memory is not found in any single case.

Tables 7 and 8 provide a synoptic view respectively of the estimates of the differencing parameter d and of the findings concerning the presence of anti-persistence (AP, i.e., a statistically significant coefficient d < 0 at the 95% level, marked with * in Table 7), short memory (SM, d = 0) and long memory (LM, i.e., a statistically significant coefficient d > 0 at the 95% level, marked with + in Table 7) on the basis of the estimated values of d.
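The classification rule just described can be written down explicitly. The sketch below assumes that each estimate of d comes with the bounds of its 95% confidence interval; the function name and the intervals in the usage example are hypothetical and are not taken from Tables 3 to 6.

```python
def classify_memory(lower, upper):
    """Label a series from the 95% confidence interval [lower, upper] of d.

    AP: the whole interval is below 0 (d significantly negative).
    LM: the whole interval is above 0 (d significantly positive).
    SM: the interval contains 0, so d = 0 cannot be rejected.
    """
    if upper < 0:
        return "AP"
    if lower > 0:
        return "LM"
    return "SM"


if __name__ == "__main__":
    # Hypothetical intervals used only to exercise the three branches.
    examples = {"A": (-0.12, -0.03), "B": (-0.04, 0.05), "C": (0.02, 0.09)}
    for name, (lo, hi) in examples.items():
        print(name, classify_memory(lo, hi))
```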

Table 7 Summary of the estimates of the differencing parameter d
Table 8 Summary of the results based on the estimates of d: anti-persistence (AP), short memory (SM) and long memory (LM)

As can be seen, with white noise errors, there are differences between the two sets of indices only in the case of India and the BRICS, where short memory (d = 0) characterises the ESG indices and long memory (d > 0) the conventional ones, and also in the case of South Africa, where the ESG index exhibits anti-persistence and the conventional one short memory instead. By contrast, when allowing for autocorrelation, differences are found in the case of the World, EAFE and Pacific indices, the ESG ones being characterised by short memory (d = 0) and the conventional ones by anti-persistence (d < 0).

In general, the fractional integration results confirm those based on the R/S analysis, namely there are no significant differences in terms of the degree of persistence between the two sets of indices. Further, higher persistence is found for emerging markets than for developed ones, the former appearing to be less efficient. These findings imply that trading and investment strategies based on the ESG indices are not more profitable, though there might be scope for abnormal profits in the case of the less efficient emerging markets (the BRICS in particular).

Possible explanations for these results include different types of “camouflage” or “washing” (see Gray 2006), namely the misrepresentation of a company’s ESG record by exaggerating its environmental credentials (“green washing”), overstating the impact of an investment on labour or human rights (“social washing”), creating the false impression of being LGBT (lesbian, gay, bisexual, and transgender) friendly (“pink washing”), signing up for the UN compact and using the UN logo to shift attention from controversial business practices (“blue washing”), or highlighting progress towards some Sustainable Development Goals (SDG) whilst hiding some questionable business practices in the pursuit of profit (“SDG washing”). In all such cases companies, despite their alleged ESG credentials, behave in the same way as conventional, profit-seeking ones and thus it is not surprising that the statistical properties of their stocks and the corresponding indices should be the same.

In practice it is often difficult to identify “washing” given the existing regulations on ESG reporting; for instance, only on 10 March 2021 was the EU Regulation 2019/2088 proposed by the European Council on 27 November 2019 approved by the European Parliament; this is an attempt to create a classification of green (sustainable) activities and regulate their disclosure. It is noteworthy that the BRICS countries are leaders in implementing ESG reporting practices. In 2020, they were among the top 20 countries in terms of ESG reporting regulations and the share of companies reporting on sustainability (India: 18 regulations, 98% of reporting companies; Brazil: 18 and 85% respectively; China: 15 and 78% respectively; see KPMG 2020; Van der Lugt et al. 2020). For example, in India, all listed companies are required to disclose sustainability information in annual reports; Brazil has introduced 'report or explain' requirements related to the SDGs, and in China even state-owned companies disclose information on ESG criteria and SDGs (13th Five Year Plan; Van der Lugt et al. 2020).

5 Conclusions

This paper uses R/S analysis and fractional integration techniques to examine the persistence of two sets of 12 ESG and conventional stock price indices from the MSCI database over the period 2007–2020 for a large number of both developed and emerging markets. As ESG indices include companies with higher transparency, one would expect lower information asymmetry and thus higher market efficiency in their case than for standard stock indices.

The R/S results imply that there are no significant differences between the two types of indices in terms of the degree of persistence and its dynamic behaviour. However, higher persistence is found for the emerging markets examined (especially the BRICS), which are less efficient and thus offer more opportunities for profitable trading strategies. The fractional integration analysis yields the same conclusions, namely with a few exceptions the two sets of indices exhibit very similar behaviour.

These findings can be rationalised by noting that, in the absence of stringent reporting regulations, several companies simply pretend to comply with ESG criteria while in actual fact their investment decisions are not affected by those (a phenomenon which is known as “washing” in its various forms); thus it is not surprising that their stocks should have the same persistence properties as those of conventional ones.