
Emotion recognition in Arabic speech

Published in: Analog Integrated Circuits and Signal Processing

Abstract

Automatic emotion recognition from speech signals, without relying on linguistic cues, has become an important research area. Integrating emotion into human–computer interaction is essential for simulating real-life scenarios effectively. Most research has focused on recognizing emotions in acted speech, while little work has addressed natural, real-life utterances. English, French, German, and Chinese corpora have been used for this purpose, whereas no natural Arabic corpus has been reported to date. In this paper, emotion recognition in spoken Arabic is studied for the first time. A realistic speech corpus is collected from Arabic TV shows, and the videos are labeled with their perceived emotions: happy, angry, or surprised. Prosodic features are extracted and thirty-five classification methods are applied. The results are analyzed, and conclusions and recommendations for future work are presented.
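To make the pipeline described above concrete, the following is a minimal Python sketch of prosodic feature extraction and classification. It assumes librosa for pitch and energy estimation and a scikit-learn SVM as one stand-in classifier; the file names, feature set, and classifier choice are illustrative assumptions, not the authors' exact method (the paper evaluates thirty-five classifiers).

import numpy as np
import librosa
from sklearn.svm import SVC

def prosodic_features(wav_path):
    # Load the utterance at its native sampling rate.
    y, sr = librosa.load(wav_path, sr=None)
    # Frame-level fundamental frequency (F0); unvoiced frames come back as NaN.
    f0, _, _ = librosa.pyin(y, fmin=librosa.note_to_hz("C2"),
                            fmax=librosa.note_to_hz("C7"), sr=sr)
    # Frame-level energy (root mean square).
    rms = librosa.feature.rms(y=y)[0]
    # Utterance-level statistics form the prosodic feature vector.
    return np.array([np.nanmean(f0), np.nanstd(f0), np.nanmax(f0), np.nanmin(f0),
                     rms.mean(), rms.std(), rms.max(), rms.min()])

# Hypothetical labeled clips; real work would use the collected TV-show corpus.
corpus = [("clip_001.wav", "happy"),
          ("clip_002.wav", "angry"),
          ("clip_003.wav", "surprised")]

X = np.vstack([prosodic_features(path) for path, _ in corpus])
labels = [label for _, label in corpus]

# One example classifier; the paper compares thirty-five such methods.
clf = SVC(kernel="rbf", C=1.0).fit(X, labels)
print(clf.predict(X[:1]))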

Author information

Corresponding author

Correspondence to Samira Klaylat.

About this article

Cite this article

Klaylat, S., Osman, Z., Hamandi, L. et al. Emotion recognition in Arabic speech. Analog Integr Circ Sig Process 96, 337–351 (2018). https://doi.org/10.1007/s10470-018-1142-4
