Abstract
Emotions are a vital and fundamental part of our existence. Whatever we do, say, or leave unsaid reflects our feelings in some way, though not always immediately. To understand the most fundamental aspects of human behaviour, we must examine these feelings using emotional data. Our literature review shows that categorising speech text into multiple classes is an active area of investigation. However, this research has seen very limited application to local and regional languages such as Hindi. This study focuses on text emotion analysis, specifically for the Hindi language. We use the BHAAV dataset, which consists of 20,304 sentences, each manually annotated with one of five emotion categories (Anger, Suspense, Joy, Sad, Neutral). We compare the accuracy of multiple machine learning and deep learning techniques that use word embeddings, and then use the trained model to predict the emotions of Hindi text. The best performance was observed with the mBERT model: loss 0.1689, balanced accuracy 93.88%, recall 93.44%, AUC 99.55%, and precision 94.39% on the training data, and loss 0.3073, balanced accuracy 91.84%, recall 91.74%, AUC 98.46%, and precision 92.01% on the testing data.
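To make the reported metrics concrete, the following minimal sketch shows one common way to compute balanced accuracy, precision, and recall for the five emotion classes. It is an illustration only, not the authors' evaluation code: the abstract does not state how precision and recall were averaged, so macro averaging is assumed here, and the function name `macro_scores` and the toy labels are hypothetical.

```python
from collections import Counter

# The five BHAAV emotion categories named in the abstract.
LABELS = ["Anger", "Suspense", "Joy", "Sad", "Neutral"]


def macro_scores(y_true, y_pred, labels=LABELS):
    """Return balanced accuracy, macro recall, and macro precision.

    Assumption: macro averaging over all classes in `labels`; a class
    with no true (or no predicted) examples contributes 0 to the mean.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative

    recalls, precisions = [], []
    for label in labels:
        support = tp[label] + fn[label]      # true examples of the class
        predicted = tp[label] + fp[label]    # predictions of the class
        recalls.append(tp[label] / support if support else 0.0)
        precisions.append(tp[label] / predicted if predicted else 0.0)

    macro_recall = sum(recalls) / len(labels)
    return {
        # Balanced accuracy is defined as the mean of per-class recall.
        "balanced_accuracy": macro_recall,
        "recall": macro_recall,
        "precision": sum(precisions) / len(labels),
    }


# Toy labels for illustration only; these are not BHAAV data.
y_true = ["Joy", "Joy", "Sad", "Anger", "Neutral", "Neutral", "Suspense", "Sad"]
y_pred = ["Joy", "Sad", "Sad", "Anger", "Neutral", "Joy", "Suspense", "Sad"]
scores = macro_scores(y_true, y_pred)
print(scores)
```

Under macro averaging, balanced accuracy and recall coincide by definition. The abstract reports slightly different values for the two, which suggests the study averaged recall differently (for example, micro or weighted averaging), but the abstract does not say which.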
Data Availability
The dataset analysed during the current study is openly available in the Zenodo repository at https://zenodo.org/record/3457467#.YuyoTXZBxPa
Acknowledgements
We express our sincere gratitude to the MIDAS Research Laboratory, IIIT-Delhi, India, for providing us with the BHAAV dataset. The authors would also like to acknowledge the technical support of the Writing Lab, Institute for the Future of Education, Tecnologico de Monterrey, Mexico, in the production of this work.
Funding
No funding was received for this research.
Ethics declarations
Conflict of Interests
The authors declare that there are no conflicts of interest or competing interests in this research.
About this article
Cite this article
Kumar, T., Mahrishi, M. & Sharma, G. Emotion recognition in Hindi text using multilingual BERT transformer. Multimed Tools Appl 82, 42373–42394 (2023). https://doi.org/10.1007/s11042-023-15150-1
DOI: https://doi.org/10.1007/s11042-023-15150-1