This study investigates the sustainability disclosure effects of the introduction of the Companies Act 2006 Regulations 2013 in the United Kingdom. The regulation mandates the disclosure of information on greenhouse gas emissions, gender distribution and human rights issues. We examine two research questions: first, whether firms increased disclosure on the mandated topics after the regulation became effective relative to a control group, and second, whether a potential increase in disclosure is moderated by firms’ reporting incentives, namely, firms’ capital market visibility, growth orientation, governance structure, prior voluntary sustainability disclosure levels and critical media coverage. Our sample consists of the FTSE-350 firms and a matched control group of US firms. We use textual analysis to assess the disclosure of the mandated sustainability topics in firms’ annual reports. Specifically, we examine two types of disclosure, namely, the disclosure of the mandated key performance indicators and the narrative disclosure. Our results reveal a significant increase for both types of disclosure relative to the control group. Overall, this treatment effect tends to be smaller for firms with higher reporting incentives, i. e., reporting incentives mitigate the regulatory effect. Taken together, our results suggest that both standards and reporting incentives shape firms’ sustainability disclosure level.
The addressees of the directive are specified as follows: “[…] the obligation to disclose a non-financial statement should apply only to those large undertakings which are public-interest entities and to those public-interest entities which are parent undertakings of a large group, in each case having an average number of employees in excess of 500, in the case of a group on a consolidated basis.” (EU Directive, recital 14.) In addition, the guidelines on non-financial information provide further guidance. “While the disclosure requirements concerning non-financial information apply to large public-interest entities with more than 500 employees, the disclosure requirements concerning board diversity apply only to large listed companies.” (Guidelines on non-financial reporting n.d., footnote 1.)
Compared to other sustainability disclosure regulations (e. g., the Grenelle I and Grenelle II in France, which mandate the disclosure of 42 sustainability-related performance indicators in firms’ annual reports), we consider the disclosure requirements of the SR Regulations as “modest”.
In the US, the Securities and Exchange Commission (SEC) is currently discussing a concept that would require the disclosure of public policy and sustainability matters.
Small companies are exempt from creating a strategic report, and medium-sized companies do not need to comply with reporting on sustainability indicators, unless they are listed. According to Article 465 of the Companies Act, a company qualifies as medium-sized if two of the following criteria are met in two consecutive years: (1) Turnover not more than £ 25.9 million, (2) balance sheet not more than £ 12.9 million, and (3) not more than 250 employees.
The switchover to IFRS is probably the largest change in an entire set of accounting standards to date and thus serves as a research setting for a vast number of studies on the consequences of financial disclosure regulation.
In addition, other streams of research focus on restatements of accounting errors (Cao et al. 2012; DeFond and Jiambalvo 1991; Palmrose and Scholz 2004) and earnings management in general (for a review see Healy and Wahlen 1999). We do not explicitly account for this literature, as these studies typically focus on influencing/misleading stakeholders through financial disclosures.
If a reporting regulation is only vaguely phrased, it is more difficult to determine non-compliance.
Matching on the level of sustainability disclosure (e. g., proxied by the Bloomberg ESG disclosure score) is not feasible since on average, the US firms have remarkably lower disclosure levels.
Files that cannot be processed in the textual analysis are copy-protected PDF files.
In addition, we manually adjust the reporting year if a firm’s fiscal year ends before July. We do not adjust the reporting year if the fiscal year ends in August or September, which might slightly bias our findings.
Thus, the main effect of post is not included in the regression since it is captured by the year-fixed effects.
The GRI was founded in Boston in 1997 as a non-governmental organization aiming to develop a sustainability reporting standard. In 2000, the GRI launched the first version of its sustainability reporting guidelines (G1). In 2016, the latest version of the guidelines—the GRI standards—was released.
The higher prevalence of self-constructed indices and hand-collected data in the sustainability disclosure literature most likely results from a lack of sufficient databases for the measurement of sustainability disclosure.
These pre-processing procedures include the elimination of line breaks, tabulators, unicode-wide characters and blanks that occur several times in sequence. We then split the text into single words (tokens) and eliminate all single characters and stop words. Stop words are words that appear frequently throughout a text but convey only minimal meaning (for instance, “a”, “the”, and “of”). For the identification of stop words, we rely on a list provided by McDonald (2017). In addition, we eliminate the names of the sample firms. Finally, we lemmatize the tokens using the wordnet-lemmatizer.
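The pre-processing steps described in this note can be sketched roughly as follows. This is a minimal illustration, not the study's implementation: the function name `preprocess`, the short `STOP_WORDS` set (a stand-in for the McDonald (2017) list), the lowercasing step and the pluggable `lemmatize` argument are all assumptions made for the example. A WordNet-based lemmatizer could be passed in place of the identity default.

```python
import re

# Illustrative stand-in only; the study relies on the list by McDonald (2017).
STOP_WORDS = {"a", "an", "the", "of", "and", "in", "to"}


def preprocess(text, firm_names=(), lemmatize=lambda token: token):
    """Return cleaned tokens for one report (hypothetical helper).

    `lemmatize` defaults to the identity function; a WordNet-based
    lemmatizer could be supplied instead (assumption, not shown here).
    """
    # Remove the names of the sample firms (case-insensitive).
    for name in firm_names:
        text = re.sub(re.escape(name), " ", text, flags=re.IGNORECASE)
    # Collapse line breaks, tabs and repeated blanks into single spaces.
    # Lowercasing is an assumption; the note does not mention it.
    text = re.sub(r"\s+", " ", text).strip().lower()
    # Split into tokens; characters outside a-z and 0-9 are dropped.
    tokens = re.findall(r"[a-z0-9]+", text)
    # Eliminate single characters and stop words, then lemmatize.
    return [lemmatize(t) for t in tokens if len(t) > 1 and t not in STOP_WORDS]


print(preprocess("The Board of\n\tAcme plc   reports a\ngender split",
                 firm_names=["Acme plc"]))
```

Passing the lemmatizer as an argument keeps the sketch free of external dependencies while leaving room for the lemmatization step named in the note.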
The search queries were defined in an iterative process. In this process, we realized that the occurrence of a numeric expression is essential in identifying the mandated key performance indicators. In addition, we realized that including the words “sex”, “gender” and “woman” in the search queries improves the identification of information that is presented in tables. Similarly, including a wildcard before and after ‘CO2’ captures expressions such as “CO2e” (CO2 equivalents) and “tCO2” (tons of CO2).
Thus, the search terms must appear in close proximity, separated by no more than three words (two words in the case of greenhouse gas), excluding stop words.
Thus, the number of words that appear before and after the search terms is dependent on the number of words between the search terms.
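A rough sketch of such a proximity search with fixed-size word windows is given below. It is an illustration under stated assumptions, not the procedure used in the study: the helper names `matches` and `find_windows`, the default values and, in particular, the rule that the spare words are split evenly before and after the matched terms are hypothetical.

```python
from fnmatch import fnmatchcase


def matches(token, pattern):
    """Case-insensitive match; '*' in the pattern acts as a wildcard."""
    return fnmatchcase(token.lower(), pattern.lower())


def find_windows(tokens, pattern_a, pattern_b, max_gap=3, window=10):
    """Return word windows around co-occurrences of two search terms.

    Two terms co-occur if at most `max_gap` tokens lie between them.
    The words left over after covering both terms and the gap are split
    evenly before and after the match (an assumed splitting rule).
    """
    windows = []
    for i, tok in enumerate(tokens):
        if not matches(tok, pattern_a):
            continue
        lo = max(0, i - max_gap - 1)
        hi = min(len(tokens), i + max_gap + 2)
        for j in range(lo, hi):
            if j != i and matches(tokens[j], pattern_b):
                first, last = min(i, j), max(i, j)
                spare = max(0, window - (last - first + 1))
                start = max(0, first - spare // 2)
                end = min(len(tokens), last + 1 + (spare - spare // 2))
                windows.append(tokens[start:end])
    return windows


tokens = ("company reduced total greenhouse gas emission by five "
          "percent during reporting year").split()
print(find_windows(tokens, "greenhouse", "gas", max_gap=2))
print(matches("tCO2e", "*CO2*"))
```

Because the window size is fixed, a larger gap between the two search terms leaves fewer words for the context before and after them, which is the dependence described in the note.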
The search query is composed of the following logical expression for environmental topics: ((ECOLOGY or EMISSION or WATER or ENVIRONMENTAL or OIL or WASTE or (PALM and OIL) or (NUCLEAR and POWER) or ENERGY) and (LEAK or CONTROVERSY or DAMAGE or CRITICISM or RECALL or VIOLATION or CRISIS)) or POLLUTION or (LAND and CONTAMINATION) or (OIL and SPILL) or (WASTE and DISCHARGE) or (TOXIC and WASTE) or CONTAMINATION or ASBESTOS.
The search query is composed of the following logical expression for social and human rights topics: ((POOR or UNSAFE or UNFAIR) and (WORK or WORKING or EMPLOYMENT)) or (CHILD and LABOR) or (WORKER and DEATH) or (SEXUAL and EXPLOITATION) or (LAND and GRAB) or (((TRADE and UNION) or WORKER or WORK or LABOR or (HUMAN and RIGHT)) and (ABUSE or DISCRIMINATION or SUPPRESSION or REPRESSION or VIOLENCE or CRITICISM or CONTROVERSY or DEATH or VIOLATION)).
Note that for reasons of convenience, the values of the cosine similarity are multiplied by 100.
Note that the variables that proxy for firm-level reporting incentives are transformed based on a median split of the sample for each year.
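A year-by-year median split of this kind could look roughly like the sketch below. The function name `median_split`, the row layout, the `_high` suffix and the choice of a strict "greater than the median" rule are assumptions for illustration; the note does not specify how observations at the median are treated.

```python
from collections import defaultdict
from statistics import median


def median_split(rows, variable):
    """Add `<variable>_high`: 1 if the value exceeds that year's sample median.

    The strict inequality is an assumption made for this sketch.
    """
    by_year = defaultdict(list)
    for row in rows:
        by_year[row["year"]].append(row[variable])
    medians = {year: median(values) for year, values in by_year.items()}
    return [
        {**row, variable + "_high": int(row[variable] > medians[row["year"]])}
        for row in rows
    ]


# Hypothetical firm-year observations.
sample = [
    {"firm": "A", "year": 2012, "analysts": 2},
    {"firm": "B", "year": 2012, "analysts": 10},
    {"firm": "C", "year": 2012, "analysts": 6},
    {"firm": "A", "year": 2013, "analysts": 4},
    {"firm": "B", "year": 2013, "analysts": 8},
]
for row in median_split(sample, "analysts"):
    print(row)
```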
The maximum of 24,241 articles refers to BP in 2010.
Note that despite winsorization, the maximum value for roa equals 3.58.
Note that the frequency refers to the occurrence of the words in the topic vocabularies, not in the annual reports. Because of the construction of the windows, the same word might appear more than once in the vocabulary if the word window is composed of more than one search term.
Except for ghg_narrative and analysts, hr_narrative and growth, hr_narrative and governance, ghg_KPI and prior_discl, and hr_narrative and prior_discl.
For growth, the triple interaction is positive. For media, the triple interaction is not significant.
(β3 + β5) is positive and significant for all disclosure measures and (β3 + β4) is positive and significant for some disclosure measures.
More precisely, (β3 + β4) is positive and significant for ghg_KPI and gender_KPI and (β3 + β5) is positive and significant for gender_KPI, ghg_narrative and gender_narrative.
For a thorough debate on the relationship between sustainability performance and sustainability disclosure, see Hummel and Schlick (2016).
By allowing for the occurrence of “no” and “not” in the five-word windows, one may argue that we might capture statements such as “The company emits 0 ton CO2”.
For instance, the dictionary provided by Pencle and Mălăescu (2016) includes 319 words for the employee dimension, 451 words for the environmental dimension, and 297 words for the human rights dimension.
Examples include the words “balancing”, “certification”, “agent”, “award”, “died”, “election”, “law”, “outsourcing”, “personal”, “person” or “worker” in the human rights dimension and the words “country”, “innovation”, “reasonable”, “science”, “suitable”, and “voluntary” in the environmental dimension.
Typical US words are, for instance, “EPA” and “environmental protection agency” in the environmental dimension, “African American” in the employee and human rights dimension, and “first nation” in the human rights dimension.
Specifically, the use of word counts implies that each word receives the same weight, although adjustments based on how unusual the word is typically enhance the validity of the measure (Loughran and McDonald 2016).
Positive and negative words are defined according to a word list provided by Loughran and McDonald (2011).
Higher values thus reflect better readability of the text. The measures are calculated based on the average number of words per sentence (w), the percentage of complex words relative to all words (p) and the average number of syllables per word (s):
Fog Index = 0.4 * (w + p); Flesch-Kincaid = 11.8s + 0.39w − 15.59; Flesch Reading Ease = 206.8 − 1.015w − 84.6s.
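As a small worked illustration, the three formulas can be evaluated directly from w, p and s. The function name `readability` and the input values are hypothetical; the coefficients are copied from the formulas as stated in this note.

```python
def readability(w, p, s):
    """Evaluate the three readability formulas.

    w: average number of words per sentence
    p: percentage of complex words relative to all words
    s: average number of syllables per word
    """
    return {
        "fog_index": 0.4 * (w + p),
        "flesch_kincaid": 11.8 * s + 0.39 * w - 15.59,
        "flesch_reading_ease": 206.8 - 1.015 * w - 84.6 * s,
    }


# Hypothetical text: 20 words per sentence, 15% complex words,
# 1.5 syllables per word.
for name, value in readability(w=20, p=15, s=1.5).items():
    print(name, round(value, 2))
```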
Note that we transform numbers separated by “,” or “.” into a single token. Nevertheless, our measure is noisy since we cannot exclude page numbers, chapter numbers and figure numbers.
Note that Hoberg and Maksimovic (2015) simply use a vector of word counts (i. e., the term frequency) instead of the tf-idf. In contrast, the tf-idf incorporates a term weighting procedure (i. e., the inverse document frequency) and adjusts a word’s weight based on how (un)usual the word is. It thus reflects the importance of a word in a specific document relative to the importance of that word in the entire corpus. The more unusual the word, the higher the weight (Loughran and McDonald 2016).
Albertini, E. 2014. A descriptive analysis of environmental disclosure: a longitudinal study of French companies. Journal of Business Ethics 121:233–254.
Ball, R., S. Kothari, and A. Robin. 2000. The effect of international institutional factors on properties of accounting earnings. Journal of Accounting and Economics 29(1):1–51.
Ball, R., A. Robin, and J.S. Wu. 2003. Incentives versus standards: properties of accounting income in four East Asian countries. Journal of Accounting and Economics 36:235–270.
Bassemir, M. 2018. Why do private firms adopt IFRS? Accounting and Business Research 48(3):237–263.
Bassemir, M., and Z. Novotny-Farkas. 2018. IFRS adoption, reporting incentives and financial reporting quality in private firms. Journal of Business Finance & Accounting 45(7–8):759–796.
Bebbington, J., E.A. Kirk, and C. Larrinaga. 2012. The production of normativity: a comparison of reporting regimes in Spain and the UK. Accounting, Organizations and Society 37(2):78–94.
Biddle, G., G. Hilary, and R. Verdi. 2009. How does financial reporting quality relate to investment efficiency? Journal of Accounting and Economics 48:112–131.
Botosan, C.A. 1997. Disclosure level and the cost of equity capital. The Accounting Review 72:323–349.
Brammer, S., and S. Pavelin. 2004. Voluntary social disclosures by large UK companies. Business Ethics: A European Review 13(2–3):86–99.
Brown, N., and C. Deegan. 1998. The public disclosure of environmental performance information—a dual test of media agenda setting theory and legitimacy theory. Accounting and Business Research 29(1):21–41.
Brüggemann, U., J.-M. Hitz, and T. Sellhorn. 2013. Intended and unintended consequences of mandatory IFRS adoption: a review of extant evidence and suggestions for future research. European Accounting Review 22(1):1–37.
Burgstahler, D.C., L. Hail, and C. Leuz. 2006. The importance of reporting incentives: earnings management in European private and public firms. The Accounting Review 81(5):983–1016.
Bushman, R.M., and J.D. Piotroski. 2006. Financial reporting incentives for conservative accounting: the influence of legal and political institutions. Journal of Accounting and Economics 42:107–148.
Cahan, S.F., C. De Villiers, D.C. Jeter, V. Naiker, and C.J. Van Staden. 2016. Are CSR disclosures value relevant? Cross-country evidence. European Accounting Review 25(3):579–611.
Cao, Y., L.A. Myers, and T.C. Omer. 2012. Does company reputation matter for financial reporting quality? Evidence from restatements. Contemporary Accounting Research 29(3):956–990.
Chauvey, J.-N., S. Giordano-Spring, C.H. Cho, and D.M. Patten. 2015. The normativity and legitimacy of CSR disclosure: evidence from France. Journal of Business Ethics 130(4):789–803.
Cho, C.H., and D.M. Patten. 2007. The role of environmental disclosures as tools of legitimacy: a research note. Accounting, Organizations and Society 32(7–8):639–647.
Cho, C.H., R.P. Guidry, A.M. Hageman, and D.M. Patten. 2012. Do actions speak louder than words? An empirical investigation of corporate environmental reputation. Accounting, Organizations and Society 37(1):14–25.
Christensen, H.B., E. Lee, M. Walker, and C. Zeng. 2015. Incentives or standards: What determines accounting quality changes around IFRS adoption? European Accounting Review 24(1):31–61.
Clarkson, P.M., Y. Li, G.D. Richardson, and F.P. Vasvari. 2008. Revisiting the relation between environmental performance and environmental disclosure: an empirical analysis. Accounting, Organizations and Society 33(4–5):303–327.
Clarkson, P.M., M.B. Overell, and L. Chapple. 2011. Environmental reporting and its relation to corporate environmental performance. Abacus 47(1):27–60.
Cormier, D., M. Magnan, and B. van Velthoven. 2005. Environmental disclosure quality in large German companies: economic incentives, public pressures or institutional conditions? European Accounting Review 14(1):3–39.
Costa, E., and M. Agostini. 2016. Mandatory disclosure about environmental and employee matters in the reports of Italian-listed corporate groups. Social and Environmental Accountability Journal 46(1):10–33.
Daske, H., L. Hail, C. Leuz, and R. Verdi. 2008. Mandatory IFRS reporting around the world: early evidence on the economic consequences. Journal of Accounting Research 46(5):1085–1142.
Daske, H., L. Hail, C. Leuz, and R. Verdi. 2013. Adopting a label: heterogeneity in the economic consequences around IAS/IFRS adoptions. Journal of Accounting Research 51(3):495–547.
DeFond, M.L., and J. Jiambalvo. 1991. Incidence and circumstances of accounting errors. The Accounting Review 66:643–655.
De Franco, G., O. Hope, D. Vyas, and Y. Zhou. 2015. Analyst report readability. Contemporary Accounting Research 32:76–104.
Delbard, O. 2008. CSR legislation in France and the European regulatory paradox—an analysis of EU CSR policy and sustainability reporting practice. Corporate Governance: The International Journal of Business in Society 8(4):397–405.
Dhaliwal, D., O.Z. Li, A. Tsang, and Y.G. Yang. 2014. Corporate social responsibility disclosure and the cost of equity capital: the roles of stakeholder orientation and financial transparency. Journal of Accounting and Public Policy 33(4):328–355.
Dhaliwal, D.S., S. Radhakrishnan, A. Tsang, and Y.G. Yang. 2012. Nonfinancial disclosure and analyst forecast accuracy: international evidence on corporate social responsibility disclosure. The Accounting Review 87(3):723–759.
Directive 2003/51/EC. https://eur-lex.europa.eu/legal-content/EN/TXT/?qid=1550243120753&uri=CELEX:32003L0051. Accessed on 15/02/2019.
Directive 2014/95/EU. https://eur-lex.europa.eu/legal-content/EN/TXT/?qid=1550242961128&uri=CELEX:32014L0095. Accessed on 15/02/2019.
Eng, L.L., and Y.T. Mak. 2003. Corporate governance and voluntary disclosure. Journal of Accounting and Public Policy 22(4):325–345.
Ernstberger, J., and M. Grüning. 2013. How do firm- and country-level governance mechanisms affect firms’ disclosure? Journal of Accounting and Public Policy 32:50–67.
Fallan, E., and L. Fallan. 2009. Voluntarism versus regulation—lessons from public disclosure of environmental performance information in Norwegian companies. Journal of Accounting & Organizational Change 5(4):472–489.
Glaum, M., and D.L. Street. 2003. Compliance with the disclosure requirements of Germany’s new market: IAS versus US GAAP. Journal of International Financial Management & Accounting 14(1):64–100.
Glaum, M., P. Schmidt, D.L. Street, and S. Vogel. 2013. Compliance with IFRS 3- and IAS 36-required disclosures across 17 European countries: company- and country-level determinants. Accounting and Business Research 43(3):163–204.
Gow, I.D., D.F. Larcker, and P.C. Reiss. 2016. Causal inference in accounting research. Journal of Accounting Research 54(2):477–523.
Guidelines on non-financial reporting. https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:52017XC0705&from=EN. Accessed on 15/02/2019.
Hahn, R., and M. Kühnen. 2013. Determinants of sustainability reporting: a review of results, trends, theory, and opportunities in an expanding field of research. Journal of Cleaner Production 59:5–21.
Hail, L. 2002. The impact of voluntary corporate disclosures on the ex-ante cost of capital for Swiss firms. European Accounting Review 11(4):741–773.
Healy, P.M., and J.M. Wahlen. 1999. A review of the earnings management literature and its implications for standard setting. Accounting Horizons 13(4):365–383.
Hoberg, G., and V. Maksimovic. 2015. Redefining financial constraints: a text-based analysis. Review of Financial Studies 28(5):1312–1352.
Hombach, K., and T. Sellhorn. 2018. Shaping corporate actions through targeted transparency regulation: a framework and review of extant evidence. Schmalenbach Business Review https://doi.org/10.1007/s41464-018-0065-z.
Hummel, K., and C. Schlick. 2016. The relationship between sustainability performance and sustainability disclosure—reconciling voluntary disclosure theory and legitimacy theory. Journal of Accounting and Public Policy 35(5):455–476.
Hummel, K., S. Mittelbach-Hörmanseder, C.H. Cho, and D. Matten. 2017. Implicit versus explicit corporate social responsibility disclosure: a textual analysis. Working paper.
Ioannou, I., and G. Serafeim. 2017. The consequences of mandatory corporate sustainability reporting. Harvard Business School research working paper No. 11–100.
Johansen, T.R. 2016. EU regulation of corporate social and environmental reporting. Social and Environmental Accountability Journal 36(1):1–9.
KPMG. 2011. KPMG international survey of corporate responsibility reporting 2011. https://www.kpmg.de/docs/Survey-corporate-responsibility-reporting-2011.pdf.
KPMG. 2013. The KPMG survey of corporate responsibility reporting 2013. https://assets.kpmg.com/content/dam/kpmg/pdf/2015/08/kpmg-survey-of-corporate-responsibility-reporting-2013.pdf.
KPMG. 2015. Currents of changes. The KPMG survey of corporate responsibility reporting 2015. https://assets.kpmg.com/content/dam/kpmg/pdf/2016/02/kpmg-international-survey-of-corporate-responsibility-reporting-2015.pdf.
KPMG, Global Reporting Initiative, UNEP, and Centre for Corporate Governance in Africa. 2016. Carrots & sticks. Global trends in sustainability reporting regulation and policy. https://assets.kpmg.com/content/dam/kpmg/pdf/2016/05/carrots-and-sticks-may-2016.pdf.
Krause, J., T. Sellhorn, and K. Ahmed. 2017. Extreme uncertainty and forward-looking disclosure properties. Abacus 53(2):240–272.
Lang, M.H., K.V. Lins, and D.P. Miller. 2004. Concentrated control, analyst following, and valuation: Do analysts matter most when investors are protected least? Journal of Accounting Research 42(3):589–623.
Larrinaga, C., F. Carrasco, C. Correa, F. Llena, and J.M. Moneva. 2002. Accountability and accounting regulation: the case of the Spanish environmental disclosure standard. European Accounting Review 11(4):723–740.
Leuz, C., and P.D. Wysocki. 2016. The economics of disclosure and financial reporting regulation: evidence and suggestions for future research. Journal of Accounting Research 54(2):525–622.
Leuz, C., D. Nanda, and P.D. Wysocki. 2003. Earnings management and investor protection: an international comparison. Journal of Financial Economics 69(3):505–527.
Li, F. 2008. Annual report readability, current earnings, and earnings persistence. Journal of Accounting and Economics 45:221–247.
Lo, K., F. Ramos, and R. Rogo. 2017. Earnings management and annual report readability. Journal of Accounting and Economics 63:1–25.
Loughran, T., and B. McDonald. 2011. When is a liability not a liability? Textual analysis, dictionaries, and 10-Ks. Journal of Finance 66(1):35–65.
Loughran, T., and B. McDonald. 2014. Measuring readability in financial disclosures. Journal of Finance 69(4):1643–1671.
Loughran, T., and B. McDonald. 2016. Textual analysis in accounting and finance: a survey. Journal of Accounting Research 54(4):1187–1230.
McDonald, B. 2017. Stop words list. http://www3.nd.edu/~mcdonald/Word_Lists.html.
Melloni, G., A. Caglio, and P. Perego. 2017. Saying more with less? Disclosure conciseness, completeness and balance in integrated reports. Journal of Accounting and Public Policy 36:220–238.
Muslu, V., S. Mutlu, S. Radhakrishnan, and A. Tsang. 2017. Corporate social responsibility report narratives and analyst forecast accuracy. Journal of Business Ethics https://doi.org/10.1007/s10551-016-3429-7.
Nazari, J.A., K. Hrazdil, and F. Mahmoudian. 2017. Assessing social and environmental performance through narrative complexity in CSR reports. Journal of Contemporary Accounting & Economics 13:166–178.
Neu, D., H. Warsame, and K. Pedwell. 1998. Managing public impressions: environmental disclosures in annual reports. Accounting, Organizations and Society 23(3):265–282.
Palmrose, Z.V., and S. Scholz. 2004. The circumstances and legal consequences of non-GAAP reporting: evidence from restatements. Contemporary Accounting Research 21(1):139–180.
Patten, D.M. 1991. Exposure, legitimacy, and social disclosure. Journal of Accounting and Public Policy 10(4):297–308.
Patten, D.M. 2002. The relation between environmental performance and environmental disclosure: a research note. Accounting, Organizations and Society 27(8):763–773.
Pencle, N., and I. Mălăescu. 2016. What’s in the words? Development and validation of a multidimensional dictionary for CSR and application using prospectuses. Journal of Emerging Technologies in Accounting 13(2):109–127.
Peters, G.F., and A.M. Romi. 2013. Discretionary compliance with mandatory environmental disclosures: evidence from SEC filings. Journal of Accounting and Public Policy 32(4):213–236.
Roulstone, D. 2003. Analyst following and market liquidity. Contemporary Accounting Research 20(3):551–578.
Sethi, S.P. 1978. Advocacy advertising—the American experience. California Management Review 21(1):55–67.
Suchman, M.C. 1995. Managing legitimacy: strategic and institutional approaches. Academy of Management Review 20(3):571–610.
Verrecchia, R.E. 2001. Essays on disclosure. Journal of Accounting and Economics 32(1–3):97–180.
Verriest, A., A. Gaeremynck, and D.B. Thornton. 2013. The impact of corporate governance on IFRS adoption choices. European Accounting Review 22(1):39–77.
Details On the Construction of the Disclosure Measures
The construction of KPI disclosure measures:
For each topic, we query according to predefined logical expressions across all documents. In particular, with respect to disclosure of
GHG emissions, the following logical expression is used for the search query:
(‘tonne’ OR ‘ton’ OR ‘numeric’) AND (‘GHG’ OR ‘*CO2*’ OR ‘carbon’ OR (‘greenhouse’ AND ‘gas’))
gender distribution, the following logical expression is used for the search query:
((‘female’ OR ‘gender’ OR ‘woman’ OR ‘sex’) AND (‘board’ OR ‘director’ OR ‘executive’ OR ‘manager’ OR ‘employee’) AND ‘numeric’) OR ((‘gender’ AND ‘distribution’) OR (‘gender’ AND ‘split’) OR (‘gender’ AND ‘breakdown’) AND ‘numeric’)
ghg_KPI and gender_KPI take on the value of “1” if the report loads on the search query and “0” otherwise.
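To make the logic of the GHG expression concrete, the following sketch evaluates it on lists of tokens. It is a simplified illustration under assumptions: the helper names `has`, `ghg_kpi_hit` and `ghg_kpi` are hypothetical, ‘numeric’ is interpreted as any token consisting of digits with optional “,” or “.” separators, and the query is applied to pre-extracted word windows rather than to a full report.

```python
import re
from fnmatch import fnmatchcase

# Assumed reading of 'numeric': digits with optional "," or "." separators.
NUMERIC = re.compile(r"^\d+([.,]\d+)*$")


def has(tokens, pattern):
    """True if any token matches; 'numeric' stands for any number token."""
    if pattern == "numeric":
        return any(NUMERIC.match(t) for t in tokens)
    return any(fnmatchcase(t.lower(), pattern.lower()) for t in tokens)


def ghg_kpi_hit(tokens):
    """Evaluate the GHG search expression on one word window."""
    quantity = has(tokens, "tonne") or has(tokens, "ton") or has(tokens, "numeric")
    topic = (
        has(tokens, "ghg")
        or has(tokens, "*co2*")
        or has(tokens, "carbon")
        or (has(tokens, "greenhouse") and has(tokens, "gas"))
    )
    return quantity and topic


def ghg_kpi(windows):
    """Report-level indicator: 1 if any window satisfies the query, else 0."""
    return int(any(ghg_kpi_hit(w) for w in windows))


print(ghg_kpi([["total", "emission", "12,500", "tCO2e"]]))
print(ghg_kpi([["carbon", "strategy", "remains", "priority"]]))
```

The first call satisfies both parts of the expression (a number and a ‘*CO2*’ match), whereas the second mentions carbon without any quantity and therefore does not load on the query.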
The construction of topic-specific narrative disclosure measures:
Let N denote the number of unique words in the entire corpus.
For each topic, we query according to predefined logical expressions across all documents. In particular, with respect to disclosure on
GHG emissions, the following logical expression is used for the search query:
(‘ghg’ AND ‘emission’) OR (‘*CO2*’ AND ‘emission’) OR (‘carbon’ AND ‘dioxide’) OR (‘greenhouse’ AND ‘gas’) OR (‘climate’ AND ‘change’) OR (‘kyoto’ AND ‘protocol’) OR (‘global’ AND ‘warming’)
gender distribution, the following logical expression is used for the search query:
(‘gender’ AND ‘split’) OR (‘gender’ AND ‘diversity’) OR (‘gender’ AND ‘distribution’) OR (‘gender’ AND ‘breakdown’) OR (‘female’ AND ‘manager’) OR (‘woman’ AND ‘manager’) OR (‘female’ AND ‘management’) OR (‘woman’ AND ‘management’) OR (‘female’ AND ‘director’) OR (‘woman’ AND ‘director’) OR (‘female’ AND ‘executive’) OR (‘woman’ AND ‘executive’) OR (‘female’ AND ‘board’) OR (‘woman’ AND ‘board’)
human rights, the following logical expression is used for the search query:
‘human’ AND ‘right’
For each topic, we aggregate all retrieved ten-word windows into a topic-specific vocabulary list. The vocabulary list includes all words that appear in all retrieved ten-word windows for each topic.
For each topic, we define an N-vector search that is filled with the term-frequency-inverse-document-frequency (tf-idf) of each word in the topic vocabulary corresponding to each of the N elements.
For each firm i in each year t, we define an N-vector text_{i,t} that is filled with the tf-idf for each word in firm i’s annual report in year t corresponding to each of the N elements (see footnote 38).
For each element j of the N-vector, the inverse-document-frequency (idf) is calculated according to:
$w_j = \log_2\left(\frac{n}{n_j}\right)$
where
n: number of all documents
n_j: number of documents in which the word j appears
For each element of the N-vector search, the tf-idf is calculated as the product of the number of times the word appears in the training set and the idf.
For each element of the N-vector text_{i,t}, the tf-idf is calculated as the product of the number of times the word appears in the annual report of firm i in year t (i. e., the term frequency) and the idf.
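To neutralize the impact of the document length, we normalize the N-vector search according to:
$search\_norm = \frac{search}{\sqrt{\sum_{j=1}^{N} search_j^2}}$
Similarly, we normalize the N-vector text_{i,t} according to:
$norm_{i,t} = \frac{text_{i,t}}{\sqrt{\sum_{j=1}^{N} text_{i,t,j}^2}}$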
To obtain the similarity between firm i’s annual disclosure in year t and the topic vocabulary, we calculate similarity_{i,t} as the cosine similarity (i. e., the dot product of the two normalized vectors) between norm_{i,t} and search_norm.
For convenience, the cosine similarity is multiplied by 100.
A Simple Example for the Calculation of the Cosine Similarity (Analogous to Hummel et al. (2017))
Consider three texts that, after application of the preprocessing methods, can be described according to the following word lists:
text_1 = [‘employee’, ‘educate’, ‘women’]
text_2 = [‘engage’, ‘board’, ‘gender’, ‘composition’, ‘women’]
text_3 = [‘board’, ‘composition’, ‘engage’, ‘educate’, ‘women’]
Consider the following training set (as a result of the search query):
search = [‘gender’, ‘board’, ‘women’]
The corpus is given by:
corpus = [‘gender’, ‘board’, ‘women’, ‘composition’, ‘engage’, ‘employee’, ‘educate’]
The inverse-document-frequency for each word corresponds to:
w_gender = 1.5850
w_board = 0.5850
w_women = 0.0000
w_composition = 0.5850
w_engage = 0.5850
w_employee = 1.5850
w_educate = 0.5850
The tfidf-vector for the training set and each text corresponds to:
search = [1.5850, 0.5850, 0.0, 0.0, 0.0, 0.0, 0.0]
text_1 = [0.0, 0.0, 0.0, 0.0, 0.0, 1.5850, 0.5850]
text_2 = [1.5850, 0.5850, 0.0, 0.5850, 0.5850, 0.0, 0.0]
text_3 = [0.0, 0.5850, 0.0, 0.5850, 0.5850, 0.0, 0.5850]
The normalized tfidf-vector for the training set and each text corresponds to:
norm_search = [0.9381, 0.3462, 0.0, 0.0, 0.0, 0.0, 0.0]
norm_text_1 = [0.0, 0.0, 0.0, 0.0, 0.0, 0.9381, 0.3462]
norm_text_2 = [0.8426, 0.311, 0.0, 0.311, 0.311, 0.0, 0.0]
norm_text_3 = [0.0, 0.5, 0.0, 0.5, 0.5, 0.0, 0.5]
The cosine similarity for each text corresponds to:
similarity_text_1 = norm_search ∙ norm_text_1 = 0.0000
similarity_text_2 = norm_search ∙ norm_text_2 = 0.8981
similarity_text_3 = norm_search ∙ norm_text_3 = 0.1731
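The example can be retraced with a few lines of Python. This is a sketch under the assumption, inferred from the idf values listed above, that the idf is the base-2 logarithm of the number of documents divided by the number of documents containing the word, and that the vectors are normalized by their Euclidean length. The helper names `tfidf`, `normalize` and `similarity` are chosen for illustration.

```python
import math

texts = {
    "text_1": ["employee", "educate", "women"],
    "text_2": ["engage", "board", "gender", "composition", "women"],
    "text_3": ["board", "composition", "engage", "educate", "women"],
}
search = ["gender", "board", "women"]
corpus = ["gender", "board", "women", "composition", "engage",
          "employee", "educate"]

# idf: log2(number of documents / number of documents containing the word).
n = len(texts)
idf = {w: math.log2(n / sum(w in doc for doc in texts.values()))
       for w in corpus}


def tfidf(words):
    """Term frequency times idf for every word in the corpus."""
    return [words.count(w) * idf[w] for w in corpus]


def normalize(vec):
    """Scale a vector to unit Euclidean length (unchanged if all zeros)."""
    length = math.sqrt(sum(x * x for x in vec))
    return [x / length for x in vec] if length else vec


def similarity(words):
    """Cosine similarity between the search vocabulary and one text."""
    a, b = normalize(tfidf(search)), normalize(tfidf(words))
    return sum(x * y for x, y in zip(a, b))


for name, words in texts.items():
    print(name, round(similarity(words), 4))
# text_1 0.0
# text_2 0.8981
# text_3 0.1731
```

Since ‘women’ appears in all three texts, its idf is zero and it contributes nothing to any similarity score, which is why text_1 scores zero despite sharing a word with the search vocabulary.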
Examples of Incorrect Classifications with Regard to ghg_KPI and gender_KPI
We manually checked the validity of the disclosure measures, particularly with regard to the disclosure of key performance indicators. The results indicate that the textual analysis might not correctly identify the disclosure of the key performance indicators in some cases. Table 10 provides examples of incorrect classifications.
Hummel, K., Rötzel, P. Mandating the Sustainability Disclosure in Annual Reports—Evidence from the United Kingdom. Schmalenbach Bus Rev 71, 205–247 (2019). https://doi.org/10.1007/s41464-019-00069-8
- Mandatory sustainability disclosure
- Reporting incentives
- Textual analysis