Abstract
This research introduces a machine learning approach to forecasting Environmental, Social, and Governance (ESG) controversies in corporations from public opinion expressed on Twitter. Drawing on legitimacy theory and stakeholder theory, the proposed methodology emphasizes the role of stakeholder engagement in managing ESG risks and promoting sustainable business practices. Eight machine learning algorithms are examined, with LightGBM achieving an overall F1-score of 80%. The findings underscore the contribution of machine learning models and social media analytics to ESG risk management and controversy mitigation. By actively monitoring public sentiment, especially on social media platforms, companies can anticipate potential controversies and proactively improve their Corporate Social Responsibility practices. Treating positive sentiment as an indicator of successful practices and negative sentiment as a signal of potential concern can further enhance their legitimacy and foster stakeholder engagement.
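For readers less familiar with the metric, the F1-score is the harmonic mean of precision and recall. The sketch below computes it for a single positive class from confusion-matrix counts. The function name and the counts are purely illustrative and are not results from this study; the abstract does not state how the "overall" score is averaged across classes, so no averaging scheme is assumed here.

```python
def f1_score(tp, fp, fn):
    """Return F1, the harmonic mean of precision and recall, from raw counts."""
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)  # share of predicted controversies that were real
    recall = tp / (tp + fn)     # share of real controversies that were predicted
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
score = f1_score(tp=8, fp=2, fn=2)  # precision = recall = 0.8, so F1 is about 0.8
```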
Data availability
Research data are available on request.
Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Appendix
Comprehensive data set description
Variable | Variable definition | Data type | Variable group
---|---|---|---
Having Controversy | Target Variable | Boolean | ESG Controversy
Neg Ratio (Negative) | Predictor Variable | Ratio (numerical) | Twitter Sentiment Result
Pos Ratio (Positive) | Predictor Variable | Ratio (numerical) | Twitter Sentiment Result
Neu Ratio (Neutral) | Predictor Variable | Ratio (numerical) | Twitter Sentiment Result
ESG Score | Predictor Variable | Numerical | ESG Metrics
Social Pillar Score | Predictor Variable | Numerical | ESG Metrics
Governance Pillar Score | Predictor Variable | Numerical | ESG Metrics
Environmental Pillar Score | Predictor Variable | Numerical | ESG Metrics
Resource Use Score | Predictor Variable | Numerical | ESG Metrics
Emissions Score | Predictor Variable | Numerical | ESG Metrics
Environmental Innovation Score | Predictor Variable | Numerical | ESG Metrics
Workforce Score | Predictor Variable | Numerical | ESG Metrics
Human Rights Score | Predictor Variable | Numerical | ESG Metrics
Community Score | Predictor Variable | Numerical | ESG Metrics
Product Responsibility Score | Predictor Variable | Numerical | ESG Metrics
Management Score | Predictor Variable | Numerical | ESG Metrics
Shareholders Score | Predictor Variable | Numerical | ESG Metrics
CSR Strategy Score | Predictor Variable | Numerical | ESG Metrics
Basic Materials | Predictor Variable | Boolean | GICS sector
Consumer Cyclicals | Predictor Variable | Boolean | GICS sector
Consumer Non-Cyclicals | Predictor Variable | Boolean | GICS sector
Energy | Predictor Variable | Boolean | GICS sector
Financials | Predictor Variable | Boolean | GICS sector
Healthcare | Predictor Variable | Boolean | GICS sector
Industrials | Predictor Variable | Boolean | GICS sector
Technology | Predictor Variable | Boolean | GICS sector
Utilities | Predictor Variable | Boolean | GICS sector
Location of Headquarters | Predictor Variable | Boolean | Location of Headquarters
Market Cap USD | Predictor Variable | Numerical | Financial Data
Total Debt USD | Predictor Variable | Numerical | Financial Data
Total Assets USD | Predictor Variable | Numerical | Financial Data
Common Shareholder Equity USD | Predictor Variable | Numerical | Financial Data
ROA | Predictor Variable | Ratio (numerical) | Financial Data
ROE | Predictor Variable | Ratio (numerical) | Financial Data
Dividend Per Share Yield | Predictor Variable | Ratio (numerical) | Financial Data
Annual Return | Predictor Variable | Ratio (numerical) | Stock Exchange Data
Change Volume | Predictor Variable | Ratio (numerical) | Stock Exchange Data
High-Low Return | Predictor Variable | Ratio (numerical) | Stock Exchange Data
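As a rough illustration of how the three Twitter sentiment ratios in the table could be derived, the sketch below assumes that each tweet about a company has already been labeled negative, neutral, or positive by some upstream sentiment classifier. The function name `sentiment_ratios`, the label strings, the output keys, and the sample labels are all hypothetical and are not taken from the article.

```python
from collections import Counter

# Assumed label set; the article's actual label encoding is not shown here.
LABELS = ("negative", "neutral", "positive")


def sentiment_ratios(tweet_labels):
    """Return the share of negative, neutral and positive tweets for one company."""
    counts = Counter(tweet_labels)
    total = sum(counts[label] for label in LABELS)
    if total == 0:
        # No labeled tweets: report zeros rather than dividing by zero.
        return {"neg_ratio": 0.0, "neu_ratio": 0.0, "pos_ratio": 0.0}
    return {
        "neg_ratio": counts["negative"] / total,
        "neu_ratio": counts["neutral"] / total,
        "pos_ratio": counts["positive"] / total,
    }


# Made-up labels for a single hypothetical company.
example_labels = ["negative", "positive", "neutral", "negative"]
ratios = sentiment_ratios(example_labels)
```

Because the three ratios sum to one whenever at least one tweet is labeled, they carry only two independent values; whether that matters depends on the downstream model.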
About this article
Cite this article
Lheureux, Y. Predictive insights: leveraging Twitter sentiments and machine learning for environmental, social and governance controversy prediction. J Comput Soc Sc (2023). https://doi.org/10.1007/s42001-023-00228-5
DOI: https://doi.org/10.1007/s42001-023-00228-5