Predictive insights: leveraging Twitter sentiments and machine learning for environmental, social and governance controversy prediction

Abstract

This research introduces a machine-learning approach to forecasting Environmental, Social, and Governance (ESG) controversies within corporations, based on public opinion expressed on Twitter. Drawing on legitimacy theory and stakeholder theory, the proposed methodology emphasizes the essential role of stakeholder engagement in managing ESG risks and promoting sustainable business practices. Eight machine-learning algorithms are examined, and the results show that ESG controversies can be forecast accurately, with LightGBM achieving an overall F1-score of 80%. The findings underscore the contribution of machine learning models and social media analytics to ESG risk management and controversy mitigation. By actively monitoring public sentiment, especially on social media platforms, companies can anticipate potential controversies and proactively improve their Corporate Social Responsibility practices. Treating positive sentiment as an indicator of successful practices and negative sentiment as a signal of potential concern further enhances their legitimacy and fosters stakeholder engagement.
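For readers unfamiliar with the metric, the F1-score quoted in the abstract is the harmonic mean of precision and recall. The sketch below computes it for a single class from confusion-matrix counts. The counts are invented for illustration and are not taken from the study, and the abstract does not say how the "overall" score was averaged across classes, so that step is left out.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 for one class from true-positive, false-positive and false-negative counts."""
    if tp == 0:
        # No correct positive predictions: precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)  # share of predicted controversies that were real
    recall = tp / (tp + fn)     # share of real controversies that were predicted
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, chosen only to illustrate the arithmetic.
example = f1_score(tp=80, fp=20, fn=20)
print(f"F1 = {example:.2f}")
```

Because the harmonic mean is pulled toward the smaller of the two inputs, a model cannot reach a high F1 by maximizing recall alone while flagging every company as controversial.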

Data availability

Research data are available on request.

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Author information

Corresponding author

Correspondence to Yasemin Lheureux.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix

Comprehensive data set description

| Variable                       | Variable definition | Data type         | Variable group           |
| ------------------------------ | ------------------- | ----------------- | ------------------------ |
| Having Controversy             | Target Variable     | Boolean           | ESG Controversy          |
| Neg Ratio (Negative)           | Predictor Variable  | Ratio (numerical) | Twitter Sentiment Result |
| Pos Ratio (Positive)           | Predictor Variable  | Ratio (numerical) | Twitter Sentiment Result |
| Neu Ratio (Neutral)            | Predictor Variable  | Ratio (numerical) | Twitter Sentiment Result |
| ESG Score                      | Predictor Variable  | Numerical         | ESG Metrics              |
| Social Pillar Score            | Predictor Variable  | Numerical         | ESG Metrics              |
| Governance Pillar Score        | Predictor Variable  | Numerical         | ESG Metrics              |
| Environmental Pillar Score     | Predictor Variable  | Numerical         | ESG Metrics              |
| Resource Use Score             | Predictor Variable  | Numerical         | ESG Metrics              |
| Emissions Score                | Predictor Variable  | Numerical         | ESG Metrics              |
| Environmental Innovation Score | Predictor Variable  | Numerical         | ESG Metrics              |
| Workforce Score                | Predictor Variable  | Numerical         | ESG Metrics              |
| Human Rights Score             | Predictor Variable  | Numerical         | ESG Metrics              |
| Community Score                | Predictor Variable  | Numerical         | ESG Metrics              |
| Product Responsibility Score   | Predictor Variable  | Numerical         | ESG Metrics              |
| Management Score               | Predictor Variable  | Numerical         | ESG Metrics              |
| Shareholders Score             | Predictor Variable  | Numerical         | ESG Metrics              |
| CSR Strategy Score             | Predictor Variable  | Numerical         | ESG Metrics              |
| Basic Materials                | Predictor Variable  | Boolean           | GICS sector              |
| Consumer Cyclicals             | Predictor Variable  | Boolean           | GICS sector              |
| Consumer Non-Cyclicals         | Predictor Variable  | Boolean           | GICS sector              |
| Energy                         | Predictor Variable  | Boolean           | GICS sector              |
| Financials                     | Predictor Variable  | Boolean           | GICS sector              |
| Healthcare                     | Predictor Variable  | Boolean           | GICS sector              |
| Industrials                    | Predictor Variable  | Boolean           | GICS sector              |
| Technology                     | Predictor Variable  | Boolean           | GICS sector              |
| Utilities                      | Predictor Variable  | Boolean           | GICS sector              |
| Location of Headquarters       | Predictor Variable  | Boolean           | Location of Headquarters |
| Market Cap USD                 | Predictor Variable  | Numerical         | Financial Data           |
| Total Debt USD                 | Predictor Variable  | Numerical         | Financial Data           |
| Total Assets USD               | Predictor Variable  | Numerical         | Financial Data           |
| Common Shareholder Equity USD  | Predictor Variable  | Numerical         | Financial Data           |
| ROA                            | Predictor Variable  | Ratio (numerical) | Financial Data           |
| ROE                            | Predictor Variable  | Ratio (numerical) | Financial Data           |
| Dividend Per Share Yield       | Predictor Variable  | Ratio (numerical) | Financial Data           |
| Annual Return                  | Predictor Variable  | Ratio (numerical) | Stock Exchange Data      |
| Change Volume                  | Predictor Variable  | Ratio (numerical) | Stock Exchange Data      |
| High-Low Return                | Predictor Variable  | Ratio (numerical) | Stock Exchange Data      |
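The three Twitter sentiment predictors in the data set description (Neg Ratio, Pos Ratio, Neu Ratio) are listed as ratios. One plausible reading, sketched below, is that each is the share of a company's tweets falling into that sentiment class. This is an assumption: the appendix does not give the formula, and the function name, the label strings and the sample tweets are all hypothetical.

```python
from collections import Counter


def sentiment_ratios(labels: list[str]) -> dict[str, float]:
    """Share of negative, positive and neutral tweets for one company.

    Assumes each tweet has already been assigned exactly one of the labels
    "negative", "positive" or "neutral" by a sentiment classifier.
    """
    total = len(labels)
    if total == 0:
        # No tweets collected for this company: report zero for every class.
        return {"neg_ratio": 0.0, "pos_ratio": 0.0, "neu_ratio": 0.0}
    counts = Counter(labels)
    return {
        "neg_ratio": counts["negative"] / total,
        "pos_ratio": counts["positive"] / total,
        "neu_ratio": counts["neutral"] / total,
    }


# Hypothetical per-tweet labels for a single company.
ratios = sentiment_ratios(["negative", "negative", "positive", "neutral"])
print(ratios)
```

Under this reading the three ratios sum to one for any company with at least one tweet, so they carry two independent pieces of information rather than three.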

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Lheureux, Y. Predictive insights: leveraging Twitter sentiments and machine learning for environmental, social and governance controversy prediction. J Comput Soc Sc (2023). https://doi.org/10.1007/s42001-023-00228-5

  • DOI: https://doi.org/10.1007/s42001-023-00228-5

Keywords