Abstract
Recent NLP breakthroughs have significantly advanced the state of emotion classification (EC) over text data. However, current treatments guide learning by traditional performance metrics, such as classification error rate, which are not suitable for highly imbalanced EC problems; in fact, EC models are predominantly evaluated by variations of the F-measure, in recognition of the data imbalance. This paper addresses the dissonance between the learning objective and the performance evaluation for EC with moderate to severe data imbalance. We propose a series of increasingly powerful algorithms for F-measure improvement. An ablation study demonstrates the benefit of learning an optimal class decision threshold. Performance improves further when joint learning is carried out over both the representation and the class decision thresholds. Thorough empirical evaluation on benchmark EC datasets that span the spectrum of number of classes and class imbalance shows clear F-measure improvements over baseline models, with moderate gains over pre-trained deep models and larger gains over untrained deep architectures.
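The threshold-learning idea in the abstract can be illustrated with a minimal sketch: rather than predicting a class whenever its probability exceeds 0.5, pick, per class, the one-vs-rest decision threshold that maximizes that class's F1 on held-out data. This is only an illustrative ingredient under assumed names (`best_thresholds`, a fixed candidate grid), not the paper's joint learning procedure, which also updates the representation.

```python
import numpy as np

def best_thresholds(probs, labels, grid=np.linspace(0.05, 0.95, 19)):
    """For each class, pick the one-vs-rest decision threshold that
    maximizes that class's F1 on held-out (probability, label) pairs.

    probs  : (n_samples, n_classes) predicted class probabilities
    labels : (n_samples,) integer gold labels
    """
    n_classes = probs.shape[1]
    thresholds = np.full(n_classes, 0.5)
    for c in range(n_classes):
        y = (labels == c).astype(int)          # one-vs-rest gold labels
        best_f1 = -1.0
        for t in grid:
            pred = (probs[:, c] >= t).astype(int)
            tp = np.sum(pred & y)
            fp = np.sum(pred & (1 - y))
            fn = np.sum((1 - pred) & y)
            denom = 2 * tp + fp + fn
            f1 = 2 * tp / denom if denom > 0 else 0.0
            if f1 > best_f1:                   # keep the best-scoring threshold
                best_f1 = f1
                thresholds[c] = t
    return thresholds
```

On imbalanced data the learned thresholds for rare classes typically fall well below 0.5, trading a little precision for substantially better recall, which is exactly the trade-off the F-measure rewards.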
Acknowledgement
All experiments were run on ARGO, a computing cluster provided by the Office of Research Computing at George Mason University, VA (URL: http://orc.gmu.edu).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Inan, T.T., Liu, M., Shehu, A. (2022). F-Measure Optimization for Multi-class, Imbalanced Emotion Classification Tasks. In: Pimenidis, E., Angelov, P., Jayne, C., Papaleonidas, A., Aydin, M. (eds) Artificial Neural Networks and Machine Learning – ICANN 2022. ICANN 2022. Lecture Notes in Computer Science, vol 13529. Springer, Cham. https://doi.org/10.1007/978-3-031-15919-0_14
Print ISBN: 978-3-031-15918-3
Online ISBN: 978-3-031-15919-0