Data
Because of its transparent scoring methodology, we choose Thomson Reuters (Footnote 1), the world’s largest ESG rating database, as our data source (see, e.g., Cheng et al. 2014; Durand and Jacqueminet 2015). Our dataset includes all Thomson Reuters scores (referred to below as TR scores), controversies scores and combined scores for the European market, the US market and the global market (which includes the US and European markets) over the period under review from 2002 to 2018. These three scores are the starting point for further calculations and are explained in more detail below.
First, the controversies score, which is part of Thomson Reuters’ latest scoring methodology, adds a new dimension to previous approaches by capturing negative media stories from global media sources. This score is a percentile ranking that takes into account ESG-related scandals that occur during a company’s fiscal year and fall under any of the controversy topics. The rating methodology covers 23 ESG controversy topics such as “controversies privacy” or “business ethics controversies” (see Thomson Reuters 2019). The score is also benchmarked against the respective industry group.
Thus, if a scandal occurs, it has a negative impact on the evaluation of the company involved. Ongoing legislation disputes, lawsuits and fines may also affect subsequent years and may still be visible in later controversies ratings. The score is calculated as follows:
$$\begin{aligned} \text{score} = \frac{\#\,\text{comp. with a worse value} + \frac{\#\,\text{comp. with the same value, including the current one}}{2}}{\#\,\text{comp. with a value}} \end{aligned}$$
(1)
In brief: the fewer scandals affect a company, the higher its score (Footnote 2).
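To make Eq. (1) concrete, the following minimal Python sketch computes the percentile score of one company within its benchmark group. It is an illustration only and not Thomson Reuters' implementation: the function name `percentile_score`, the use of the negated number of controversies as the raw value (so that a larger value is better) and the example figures are assumptions made here.

```python
def percentile_score(values, index):
    """Percentile score of values[index] within its benchmark group, following Eq. (1).

    Assumption: a larger raw value is better, so "worse" means strictly smaller.
    """
    own = values[index]
    n_worse = sum(1 for v in values if v < own)   # companies with a worse value
    n_same = sum(1 for v in values if v == own)   # same value, including the current one
    return (n_worse + n_same / 2) / len(values)


# Hypothetical industry group: number of controversies per company in one fiscal year.
controversies = [0, 0, 0, 2, 5]
raw_values = [-c for c in controversies]          # fewer controversies -> higher raw value

scores = [percentile_score(raw_values, i) for i in range(len(raw_values))]
print(scores)
```

In this made-up group, the three companies without controversies share the highest score, and the company with five controversies receives the lowest one.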
The TR score evaluates a company’s environmental, social and corporate governance (ESG) performance across ten main categories based on publicly available company-reported data. Each of these categories (for instance, resource use, innovation and emissions in the environmental pillar, human rights and workforce in the social pillar and management in the corporate governance pillar) receives an individually calculated category score and a related category weighting within its associated pillar. These data result in three so-called pillar scores, one for each ESG pillar. To calculate the overall ESG score, these pillar scores are aggregated (Footnote 3) and, in the last step, the TR score is ranked by percentile and benchmarked against the industry. The TR score thus offers an easy way to implement a best-in-class approach (see Thomson Reuters 2019).
Next, the combined score comprises both the TR score and the controversies score and thus offers a broad assessment based on performance-related ESG data and on controversies collected from worldwide media sources (see Thomson Reuters 2019). If the controversies score is greater than or equal to 50, it has no impact, and the combined score equals the TR score. If the controversies score is below 50 but the TR score is less than the controversies score, the combined score also equals the TR score. Only if the controversies score is below 50 and the TR score is greater than the controversies score does the combined score equal the average of both scores (Footnote 4).
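The case distinction for the combined score can be summarised in a few lines of Python. This is a sketch of the rule as described in the text, not the provider's code; the function name `combined_score` is hypothetical, and the treatment of exactly equal scores is an assumption (it is harmless, since the average then equals the TR score anyway).

```python
def combined_score(tr_score, controversies_score):
    """Combined score from the TR score and the controversies score (both 0-100)."""
    if controversies_score >= 50:
        # Controversies score of 50 or more has no impact.
        return tr_score
    if tr_score <= controversies_score:
        # TR score not above the (low) controversies score: keep the TR score.
        return tr_score
    # Controversies score below 50 and below the TR score: average both scores.
    return (tr_score + controversies_score) / 2


print(combined_score(70, 80))  # controversies score >= 50
print(combined_score(30, 40))  # TR score below controversies score
print(combined_score(70, 30))  # TR score above a controversies score below 50
```

Because the average is only taken when it lies below the TR score, the combined score can never exceed the TR score under this rule.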
In order to determine our data universe, we only consider companies for which all three ratings are available. Moreover, penny stocks are excluded. As a result, we obtain a monthly dataset with over 529,000 observations in total and an average of approximately 2500 companies per month during our period of 2002–2018 (192 months); more precisely, the number of companies ranges between 900 and 4700 at each point in time. For all observed companies, we have a comparable dataset of the three ratings (TR, combined and controversies). Table 1 shows the descriptive statistics of our data universe.
Table 1 Descriptive statistics
Concerning the TR rating, the mean value of the rating universe corresponds almost exactly to 50 with a standard deviation of approximately 17. The controversies score is approximately the same as the TR score in terms of mean value and standard deviation. As can be expected with regard to the calculation, the combined score has a lower mean value than the TR and controversies score with a standard deviation of 15.
Regarding the correlations between the three scores, it is noteworthy that the correlation between the controversies score and the TR score is negative (−0.3107). Thus, companies with a high TR score tend to have a low controversies score.
One explanation for this may be that companies that tend to have high ESG scores are affected more greatly by controversies, as reflected by the saying “the higher you fly, the harder you fall”.
Furthermore, as would be expected from its composition, the combined score is positively correlated with both the TR score (0.7774) and the controversies score (0.3077).
The analysis in this paper is carried out from the perspective of a US investor, so all data are converted into US dollars. The total returns and market capitalizations of the companies considered are obtained from Thomson Reuters Eikon. Delisted or insolvent companies are included until the last available rating or financial information. Thus, our results are not affected by potential survivorship bias. For more detailed insights, descriptive statistics for the European and US markets are displayed in Table 2. For the European market, we consider over 158,000 observations based on an average of approximately 820 companies (between 400 and 1000); for the US market, our data consist of over 191,000 observations with an average of approximately 1000 companies (between 400 and 2300).
Table 2 Descriptive statistics for the European and US market
Methodology
As a first step, we construct several portfolios by sorting stocks according to each score. To calculate the monthly returns, we select the best-rated and the worst-rated stocks, respectively, and combine each group in a portfolio, one for each of the three scores. Following this procedure, we consider a best-only and a worst-only strategy as well as a best-minus-worst strategy, which is long in the best-rated companies and short in the worst-rated ones. As a next step, we consider three different weighting approaches for constructing the portfolios. We include the common value-weighted and equally weighted strategies as well as a rank-weighted strategy, which we present in detail below in the section “A different approach: rank-weighted portfolios”.
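The sorting procedure can be illustrated with a short Python sketch for a single month and a single score. Several details are assumptions made only for this illustration: the selection cutoff `fraction` (the text does not state here how many stocks count as best or worst), the dictionary keys `score`, `ret` and `mcap`, and the function names.

```python
def portfolio_return(stocks, weights):
    """Weighted return of a portfolio; the weights are normalised to sum to one."""
    total = sum(weights)
    return sum(w / total * s["ret"] for s, w in zip(stocks, weights))


def best_minus_worst(stocks, fraction=0.2, value_weighted=False):
    """Return (best, worst, best-minus-worst) returns for one month.

    `fraction` is a hypothetical cutoff for the best-rated and worst-rated groups.
    """
    ranked = sorted(stocks, key=lambda s: s["score"], reverse=True)
    k = max(1, int(len(ranked) * fraction))       # number of stocks per group
    best, worst = ranked[:k], ranked[-k:]

    def weights(group):
        # Value weighting uses market capitalisation; otherwise equal weights.
        return [s["mcap"] if value_weighted else 1.0 for s in group]

    r_best = portfolio_return(best, weights(best))
    r_worst = portfolio_return(worst, weights(worst))
    return r_best, r_worst, r_best - r_worst      # long best, short worst


# Made-up data for four stocks in one month.
stocks = [
    {"score": 90, "ret": 0.02, "mcap": 300.0},
    {"score": 70, "ret": 0.01, "mcap": 100.0},
    {"score": 40, "ret": 0.00, "mcap": 100.0},
    {"score": 10, "ret": -0.01, "mcap": 100.0},
]
print(best_minus_worst(stocks, fraction=0.5))
print(best_minus_worst(stocks, fraction=0.5, value_weighted=True))
```

Repeating this for every month and every score would give the return series that enter the performance regressions.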
We obtain nine stock portfolios (Footnote 5) for each of the value-weighted, equally weighted and rank-weighted strategies (the latter is examined in the “Rank-weighted portfolios” section) in each of the European, US and global markets, that is, 27 portfolios per market in total. In order to determine the performance of our portfolios, we apply the Fama and French (2015) five-factor model, which is based on the regression:
$$\begin{aligned} \begin{aligned} R_{it} - R_{Ft}&= a_i + b_i (R_{Mt} - R_{Ft}) + s_i SMB_t + h_i HML_t \\&\quad + r_i RMW_t + c_i CMA_t + e_{it}. \end{aligned} \end{aligned}$$
(2)
In this model, \(R_{it}\) is the return of portfolio i in period t and \(R_{Ft}\) is the risk-free return. \(R_{Mt}\) denotes the return of the market portfolio, \(SMB_t\) represents the small-minus-big factor (returns of small stocks minus returns of big stocks) and \(HML_t\) is the return difference between companies with a high and a low book-to-market ratio. The factor \(RMW_t\) is the difference between the returns of stocks with robust and weak profitability. \(CMA_t\) is the return of conservative (i.e., low-investment) minus aggressive (i.e., high-investment) stocks. Moreover, \(b_i, s_i, h_i, r_i\), and \(c_i\) are the regression coefficients estimated by OLS, \(a_i\) is the intercept and \(e_{it}\) denotes a (zero-mean) residual.
Since a Breusch and Pagan (1979) test applied to all portfolios indicates that the residuals of the regressions are subject to heteroskedasticity, and a Godfrey (1978) and Breusch (1978) test as well as a Durbin and Watson (1971) test show autocorrelation for most of the models, we use the approach of Newey and West (1987) to calculate standard errors.
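As a sketch of what this correction involves, one common textbook form of the Newey and West (1987) covariance estimator for regression (2) is given below. The notation is introduced here for illustration and is not taken from the paper: \(x_t\) collects the constant and the five factors in month t, X stacks the \(x_t^{\top}\) over the T months, \(\hat{\beta }_i\) is the vector of OLS estimates, \(\hat{e}_{it}\) are the OLS residuals, and L is the lag truncation, whose value the text does not specify.

$$\begin{aligned} \hat{\Gamma }_l&= \frac{1}{T} \sum _{t=l+1}^{T} \hat{e}_{it}\, \hat{e}_{i,t-l}\, x_t x_{t-l}^{\top }, \\ \hat{S}&= \hat{\Gamma }_0 + \sum _{l=1}^{L} \left( 1 - \frac{l}{L+1}\right) \left( \hat{\Gamma }_l + \hat{\Gamma }_l^{\top }\right) , \\ \widehat{\mathrm {Var}}(\hat{\beta }_i)&= T\, (X^{\top } X)^{-1}\, \hat{S}\, (X^{\top } X)^{-1}. \end{aligned}$$

The standard errors would then be the square roots of the diagonal elements of \(\widehat{\mathrm {Var}}(\hat{\beta }_i)\). With \(L = 0\), the expression reduces to the usual heteroskedasticity-robust estimator, so the additional terms are what account for autocorrelation in the residuals.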
A different approach: rank-weighted portfolios
Besides equally weighted and value-weighted portfolios, we also consider a new portfolio composition strategy, following an approach similar to Frazzini and Pedersen (2014). It reflects the great importance of ESG ratings for investors who may wish to account for differences in the scores through corresponding weights. Consequently, we build portfolio weights based on the respective score ranks. Our new approach rewards better scores with higher weights in the best portfolios and, vice versa, assigns higher weights to worse scores in the worst portfolios. As a result, the best portfolios constructed this way have, by construction, a higher ESG rating than the value-weighted or equally weighted strategies, whereas the worst portfolios have lower ratings. First, we determine the best and worst stocks. Next, we rank the companies in ascending and descending order. In the best portfolios, the company with the highest score receives the (numerically) highest rank. In contrast, the company with the worst score receives the highest rank in the worst portfolios. To calculate the weight \(w_{t}(c,t)\) of a company \(c \in C_t \subseteq C\), where C is the set of all companies within the respective data and \(C_t\) is the set of all companies within the portfolio at time t, we use
$$\begin{aligned} w_{t}:C_t \times T&\longrightarrow [0,1] \\ (c,t)&\longmapsto w_{t}(c,t) = \frac{(N_t-Rk_{t}(c))+1}{\sum _{\tilde{c} \in C_t} Rk_{t}(\tilde{c})} \end{aligned}$$
and for each \(t \in T\) there holds
$$\begin{aligned} \sum _{\tilde{c} \in C_t} w_{t}(\tilde{c},t) = 1, \end{aligned}$$
where \(Rk_{t}(c)\) denotes the rank of company c at time t and \(N_{t} =|C_t |\) the cardinality of the portfolio selection at t in the monthly period under review. If a company \(\hat{c} \in C\backslash C_t\) is not part of the portfolio selection at time t, its weight is by definition
$$\begin{aligned} w_{t}(\hat{c},t)\,{{:}{=}}\,0. \end{aligned}$$
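A small Python sketch may help to see how the rank weights behave. It applies the weight formula as written above for the companies selected at one point in time. Two points are assumptions of this sketch rather than statements from the text: rank 1 is given to the company that is meant to carry the largest weight (the best score in a best portfolio, the worst score in a worst portfolio), since the numerator \(N_t - Rk_t(c) + 1\) decreases in the rank, and ties are broken by input order. The function name `rank_weights` is hypothetical.

```python
def rank_weights(scores, best=True):
    """Rank-based weights for the companies selected into a portfolio at time t.

    scores: dict mapping company -> score for the selected companies only.
    best:   True for a best portfolio, False for a worst portfolio.
    """
    # Assumption: rank 1 goes to the company that should get the largest weight.
    ordered = sorted(scores, key=scores.get, reverse=best)
    n = len(ordered)
    rank_sum = n * (n + 1) // 2                   # sum of the ranks 1..n
    return {c: (n - rk + 1) / rank_sum for rk, c in enumerate(ordered, start=1)}


selected = {"A": 90, "B": 80, "C": 70}            # made-up scores
w_best = rank_weights(selected, best=True)
w_worst = rank_weights(selected, best=False)
print(w_best)
print(w_worst)
print(w_best.get("D", 0.0))                       # company outside the selection: weight 0
```

The weights sum to one because the numerators \(N_t - Rk_t(c) + 1\) run through the same values \(1, \dots , N_t\) as the ranks in the denominator.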