Abstract
This paper introduces a large-scale, validated speech database for Persian, the Sharif Emotional Speech Database (ShEMO). The database comprises 3000 semi-natural utterances, equivalent to 3 h and 25 min of speech data, extracted from online radio plays. ShEMO covers speech samples of 87 native Persian speakers for five basic emotions (anger, fear, happiness, sadness and surprise) as well as the neutral state. Twelve annotators labeled the underlying emotional state of each utterance, and majority voting was used to decide on the final labels. According to the kappa measure, the inter-annotator agreement is 64%, which is interpreted as “substantial agreement”. We also present benchmark results based on common classification methods for the speech emotion detection task. According to the experiments, the support vector machine achieves the best results for both the gender-independent (58.2%) and gender-dependent models (female = 59.4%, male = 57.6%). ShEMO will be available free of charge for academic purposes, providing a baseline for further research on Persian emotional speech.
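The label-aggregation step described above (twelve annotators per utterance, final label by majority vote) can be sketched as follows. The function name, the tie-handling policy, and the example votes are illustrative assumptions, not the authors' actual implementation.

```python
from collections import Counter

def majority_label(annotations):
    """Return the majority emotion label for one utterance, or None
    when no single label wins a strict plurality (such utterances
    would need re-annotation or exclusion -- an assumed policy)."""
    counts = Counter(annotations)
    (top, n), *rest = counts.most_common()
    if rest and rest[0][1] == n:  # tie between the top labels
        return None
    return top

# Example: 12 hypothetical annotator judgments for one utterance
votes = ["anger"] * 7 + ["sadness"] * 3 + ["neutral"] * 2
print(majority_label(votes))  # anger
```

With a strict-plurality rule like this, utterances whose votes split evenly between two labels receive no final label; the database size reported above reflects utterances that survived whatever disagreement policy the authors applied.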
Notes
Upon publishing this paper, we release our database for academic purposes.
The prompt excludes any emotional content so as not to interfere with the expression and perception of emotional states.
Cohen’s kappa ranges from −1 to 1, though in practice it typically falls between 0 and 1; larger values indicate higher reliability, and values near zero suggest that agreement is attributable to chance alone.
As Landis and Koch (1977) explain, \(0.61 \le \kappa \le 0.80\) is interpreted as “substantial agreement” among the judges.
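The kappa statistic referenced in these notes can be computed as follows for the two-rater case: \(\kappa = (p_o - p_e)/(1 - p_e)\), where \(p_o\) is the observed agreement and \(p_e\) the chance agreement from the raters' marginal label distributions. This is a minimal sketch with invented example labels; the paper's twelve-annotator figure would require averaging pairwise kappas or a multi-rater generalization such as Fleiss' kappa, which is not shown here.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters judging the same items:
    kappa = (p_o - p_e) / (1 - p_e)."""
    n = len(rater_a)
    # Observed agreement: fraction of items both raters labeled alike
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[lab] * freq_b[lab] for lab in freq_a) / n ** 2
    return (p_o - p_e) / (1 - p_e)

a = ["anger", "anger", "sadness", "neutral", "anger", "sadness"]
b = ["anger", "neutral", "sadness", "neutral", "anger", "anger"]
print(round(cohens_kappa(a, b), 2))  # 0.48
```

On this toy example the raters agree on 4 of 6 items (\(p_o = 2/3\)) but chance alone predicts \(p_e = 13/36\), giving \(\kappa = 11/23 \approx 0.48\), i.e. “moderate agreement” on the Landis–Koch scale.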
The IPA was devised by the International Phonetic Association as a standardized representation of the sounds of oral language.
It contains 88 different parameters. For further information, please refer to Eyben et al. (2016).
Happiness has the second-lowest number of utterances, after fear. As mentioned earlier, fear utterances were excluded from the classification experiments.
Actors were asked to read 10 short emotionally neutral sentences.
We trained the models on the audio (not video), speech (not song) files of the database.
References
Alvarado, N. (1997). Arousal and valence in the direct scaling of emotional response to film clips. Motivation and Emotion, 21, 323–348.
Audhkhasi, K., & Narayanan, S. (2010). Data-dependent evaluator modeling and its application to emotional valence classification from speech. In Proceedings of INTERSPEECH (pp. 2366–2369), Makuhari, Japan.
El Ayadi, M., Kamel, M. S., & Karray, F. (2011). Survey on speech emotion recognition: Features, classification schemes, and databases. Pattern Recognition, 44(3), 572–587.
Batliner, A., Fischer, K., Huber, R., Spilker, J., & Nöth, E. (2000). Desperately seeking emotions or: Actors, wizards, and human beings. In Proceedings of ISCA workshop on speech and emotion (pp. 195–200).
Batliner, A., Fischer, K., Huber, R., Spilker, J., & Nöth, E. (2003). How to find trouble in communication. Speech Communication, 40(1–2), 117–143.
Bijankhan, M., Sheikhzadegan, J., Roohani, M., & Samareh, Y. (1994). FARSDAT—The speech database of Farsi spoken language. In Proceedings of Australian conference on speech science and technology (pp. 826–831), Perth, Australia.
Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.
Burkhardt, F., Paeschke, A., Rolfes, M., Sendlmeier, W., & Weiss, B. (2005). A database of German emotional speech. In Proceedings of INTERSPEECH (pp. 1517–1520), Lissabon, Portugal. ISCA.
Busso, C., Bulut, M., & Narayanan, S. (2013). Toward effective automatic recognition systems of emotion in speech. In J. Gratch & S. Marsella (Eds.), Social emotions in nature and artifact: Emotions in human and human–computer interaction (pp. 110–127). New York, NY: Oxford University Press.
Cawley, G. C., & Talbot, N. L. (2010). On over-fitting in model selection and subsequent selection bias in performance evaluation. Journal of Machine Learning Research, 11, 2079–2107.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37–46.
Cowie, R., Douglas-Cowie, E., Tsapatsoulis, N., Votsis, G., Kollias, S., Fellenz, W., et al. (2001). Emotion recognition in human–computer interaction. IEEE Signal Processing Magazine, 18(1), 32–80.
Deng, J., Han, W., & Schuller, B. (2012). Confidence measures for speech emotion recognition: A start. In Proceedings of speech communication (pp. 1–4), Braunschweig, Germany.
Dickerson, R., Gorlin, E., & Stankovic, J. (2011). Empath: A continuous remote emotional health monitoring system for depressive illness. In Proceedings of the 2nd conference on wireless health (pp. 1–10), New York, NY, USA.
Douglas-Cowie, E., Cowie, R., & Schroeder, M. (2000). A new emotion database: Considerations, sources and scope. In Proceedings of ISCA workshop on speech and emotion (pp. 39–44).
Ekman, P. (1982). Emotion in the human face. Cambridge: Cambridge University Press.
Engberg, I., Hansen, A., Andersen, O., & Dalsgaard, P. (1997). Design, recording and verification of a Danish emotional speech database. In Proceedings of EUROSPEECH (Vol. 4, pp. 1695–1698).
Esmaileyan, Z., & Marvi, H. (2013). A database for automatic Persian speech emotion recognition: Collection, processing and evaluation. International Journal of Engineering, 27, 79–90.
Eyben, F., Scherer, K., Schuller, B., Sundberg, J., Andre, E., Busso, C., et al. (2016). The Geneva minimalistic acoustic parameter set (GeMAPS) for voice research and affective computing. IEEE Transactions on Affective Computing, 7(2), 190–202.
Eyben, F., Wöllmer, M., & Schuller, B. (2010). openSMILE—The Munich versatile and fast open-source audio feature extractor. In Proceedings of ACM multimedia (pp. 1459–1462), Florence, Italy.
Feraru, S. M., Schuller, D., & Schuller, B. (2015). Cross-language acoustic emotion recognition: An overview and some tendencies. In Proceedings of the 6th international conference on affective computing and intelligent interaction (ACII) (pp. 125–131), Xi’an, China.
Frank, M., & Stennett, J. (2001). The forced-choice paradigm and the perception of facial expressions of emotion. Journal of Personality and Social Psychology, 80(1), 75–85.
Furnas, G. W., Landauer, T. K., Gomez, L. M., & Dumais, S. T. (1987). The vocabulary problem in human–system communication. Communications of the ACM, 30(11), 964–971.
Gharavian, D., & Ahadi, S. (2006). Recognition of emotional speech and speech emotion in Farsi. In Proceedings of international symposium on Chinese spoken language processing (Vol. 2, pp. 299–308).
Giannakopoulos, T., Pikrakis, A., & Theodoridis, S. (2009). A dimensional approach to emotion recognition of speech from movies. In Proceedings of IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 65–68).
Grimm, M., Kroschel, K., Mower, E., & Narayanan, S. (2007). Primitives-based evaluation and estimation of emotions in speech. Speech Communication, 49(10–11), 787–800.
Hamidi, M., & Mansoorizade, M. (2012). Emotion recognition from Persian speech with neural network. Artificial Intelligence and Applications, 3(5), 107–112.
Heni, N., & Hamam, H. (2016). Design of emotional education system mobile games for autistic children. In Proceedings of the 2nd international conference on advanced technologies for signal and image processing (ATSIP).
Huahu, X., Jue, G., & Jian, Y. (2010). Application of speech emotion recognition in intelligent household robot. In Proceedings of international conference on artificial intelligence and computational intelligence (Vol. 1, pp. 537–541).
James, A. (1994). Is there universal recognition of emotion from facial expression? A review of the cross-cultural studies. Psychological Bulletin, 115(1), 102–141.
Johnstone, T., Van Reekum, C., Hird, K., Kirsner, K., & Scherer, K. (2005). Affective speech elicited with a computer game. Emotion, 5(4), 513–518.
Keshtiari, N., Kuhlmann, M., Eslami, M., & Klann-Delius, G. (2015). Recognizing emotional speech in Persian: A validated database of Persian emotional speech (Persian ESD). Behavior Research Methods, 47(1), 275–294.
Kort, B., Reilly, R., & Picard, R. (2001). An affective model of interplay between emotions and learning: Reengineering educational pedagogy-building a learning companion. In Proceedings of the IEEE international conference on advanced learning technologies (ICALT) (pp. 43–46), Washington, DC, USA.
Landis, J., & Koch, G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159–174.
Lee, C., Mower, E., Busso, C., Lee, S., & Narayanan, S. (2011). Emotion recognition using a hierarchical binary decision tree approach. Speech Communication, 53(9–10), 1162–1171.
Lewis, P. A., Critchley, H. D., Rotshtein, P., & Dolan, J. R. (2007). Neural correlates of processing valence and arousal in affective words. Cerebral Cortex, 17(3), 742–748.
Livingstone, S., Peck, K., & Russo, F. (2012). RAVDESS: The Ryerson audio-visual database of emotional speech and song. In Proceedings of the 22nd annual meeting of the Canadian Society for Brain, Behaviour and Cognitive Science (CSBBCS), ON, Canada.
Mansoorizadeh, M. (2009). Human emotion recognition using facial expression and speech features fusion. PhD thesis, Tarbiat Modares University, Tehran, Iran (in Persian).
McKeown, G., Valstar, M., Cowie, R., & Pantic, M. (2010). The semaine corpus of emotionally coloured character interactions. In Proceedings of IEEE international conference on multimedia and expo (ICME’10) (pp. 1079–1084), Singapore, Singapore. IEEE Computer Society. https://doi.org/10.1109/ICME.2010.5583006.
Metze, F., Batliner, A., Eyben, F., Polzehl, T., Schuller, B., & Steidl, S. (2011). Emotion recognition using imperfect speech recognition. In Proceedings of INTERSPEECH (pp. 478–481), Makuhari, Japan.
Moosavian, A., Norasteh, R., & Rahati, S. (2007). Speech emotion recognition using adaptive neuro-fuzzy inference systems. In Proceedings of the 8th conference on intelligent systems (in Persian).
Mower, E., Mataric, M., & Narayanan, S. (2009b). Evaluating evaluators: A case study in understanding the benefits and pitfalls of multi-evaluator modeling. In Proceedings of INTERSPEECH (pp. 1583–1586), Brighton, UK.
Mower, E., Metallinou, A., Lee, C., Kazemzadeh, A., Busso, C., Lee, S., & Narayanan, S. (2009a). Interpreting ambiguous emotional expressions. In Proceedings of the 3rd international conference on affective computing and intelligent interaction and workshops (ACII) (pp. 662–669), Amsterdam, The Netherlands.
Nicolaou, M., Gunes, H., & Pantic, M. (2011). Continuous prediction of spontaneous affect from multiple cues and modalities in valence–arousal space. IEEE Transactions on Affective Computing, 2(2), 92–105.
Russell, J. A. (1980). A circumplex model of affect. Journal of Personality and Social Psychology, 39(6), 1161–1178.
Sagha, H., Matejka, P., Gavryukova, M., Povolny, F., Marchi, E., & Schuller, B. (2016). Enhancing multilingual recognition of emotion in speech by language identification. In Proceedings of INTERSPEECH (pp. 2949–2953).
Savargiv, M., & Bastanfard, A. (2015). Persian speech emotion recognition. In Proceedings of the 7th international conference on information and knowledge technology (IKT) (pp. 1–5).
Scherer, K. (1986). Vocal affect expression: A review and a model for future research. Psychological Bulletin, 99(2), 143–165.
Scherer, K., Banse, R., Wallbott, H., & Goldbeck, T. (1991). Vocal cues in emotion encoding and decoding. Motivation and Emotion, 15(2), 123–148.
Schuller, B., Batliner, A., Steidl, S., Schiel, F., & Krajewski, J. (2011). The INTERSPEECH 2011 speaker state challenge. In Proceedings of INTERSPEECH (pp. 3201–3204), Florence, Italy. ISCA.
Schuller, B., & Munchen, T. U. (2002). Towards intuitive speech interaction by the integration of emotional aspects. In Proceedings of IEEE international conference on systems, man and cybernetics (SMC) (Vol. 1, pp. 6–11).
Schuller, B., Reiter, S., Muller, R., Al-Hames, M., Lang, M., & Rigoll, G. (2005). Speaker independent speech emotion recognition by ensemble classification. In Proceedings of IEEE international conference on multimedia and expo (ICME) (pp. 864–867).
Schuller, B., Rigoll, G., & Lang, M. (2004). Speech emotion recognition combining acoustic features and linguistic information in a hybrid support vector machine-belief network architecture. In Proceedings of IEEE international conference on acoustics, speech, and signal processing (ICASSP) (Vol. 1, pp. 577–580).
Schuller, B., Steidl, S., & Batliner, A. (2009). The INTERSPEECH 2009 emotion challenge. In Proceedings of INTERSPEECH (pp. 312–315), Brighton, UK. ISCA.
Schuller, B., Steidl, S., Batliner, A., Burkhardt, F., Devillers, L., Muller, C., et al. (2010). The INTERSPEECH 2010 paralinguistic challenge. In Proceedings of INTERSPEECH (pp. 2794–2797), Makuhari, Japan. ISCA.
Schuller, B., Steidl, S., Batliner, A., Hirschberg, J., Burgoon, J., Baird, A., et al. (2016). The INTERSPEECH 2016 computational paralinguistics challenge: Deception, sincerity & native language. In Proceedings of INTERSPEECH (pp. 2001–2005), San Francisco, USA. ISCA.
Sedaaghi, M. (2008). Documentation of the Sahand Emotional Speech Database (SES). Technical report, Department of Engineering, Sahand University of Technology.
Snoek, J., Larochelle, H., & Adams, R. P. (2012). Practical Bayesian optimization of machine learning algorithms. In Advances in neural information processing systems (pp. 2951–2959).
Steidl, S. (2009). Automatic classification of emotion related user states in spontaneous children’s speech. Ph.D. thesis, University of Erlangen-Nuremberg, Erlangen, Bavaria, Germany.
Wöllmer, M., Kaiser, M., Eyben, F., Schuller, B., & Rigoll, G. (2013). LSTM-modeling of continuous emotions in an audiovisual affect recognition framework. Image and Vision Computing, 31(2), 153–163.
Yu, F., Chang, E., Xu, Y., & Shum, H. (2001). Emotion detection from speech to enrich multimedia content. In Proceedings of the 2nd IEEE Pacific Rim conference on multimedia: Advances in multimedia information processing (pp. 550–557), London, UK. Springer.
Acknowledgements
We would like to thank the anonymous reviewers for their insightful comments and suggestions. We also gratefully thank Dr. Steve Cassidy for his helpful points.
Mohamad Nezami, O., Jamshid Lou, P. & Karami, M. ShEMO: a large-scale validated database for Persian speech emotion detection. Lang Resources & Evaluation 53, 1–16 (2019). https://doi.org/10.1007/s10579-018-9427-x