Abstract
Automatic emotion recognition from speech signals, without linguistic cues, has become an important research area. Integrating emotion into human–computer interaction is essential for effectively simulating real-life scenarios. Most research has focused on recognizing emotions in acted speech, while little work has addressed natural, real-life utterances. English, French, German, and Chinese corpora have been used for this purpose, whereas no natural Arabic corpus has been available to date. In this paper, emotion recognition in spoken Arabic is studied for the first time. A realistic speech corpus is collected from Arabic TV shows, and the videos are labeled with their perceived emotions: happy, angry, or surprised. Prosodic features are extracted and thirty-five classification methods are applied. The results are analyzed, and conclusions and future recommendations are presented.
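The pipeline the abstract describes — extract utterance-level prosodic statistics (pitch, energy) and feed them to a classifier — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the frame sizes, the autocorrelation pitch estimator, and the synthetic "happy"/"angry" signals are all assumptions made for the example.

```python
import numpy as np

def prosodic_features(signal, sr=16000, frame=400, hop=160):
    """Per-frame energy and a crude autocorrelation pitch estimate,
    summarized into utterance-level statistics (mean, std, range)."""
    frames = [signal[i:i + frame] for i in range(0, len(signal) - frame, hop)]
    energies, pitches = [], []
    for f in frames:
        energies.append(float(np.sum(f ** 2)))
        # Autocorrelation pitch estimate, restricted to 80-400 Hz lags.
        ac = np.correlate(f, f, mode="full")[frame - 1:]
        lo, hi = sr // 400, sr // 80
        lag = lo + int(np.argmax(ac[lo:hi]))
        pitches.append(sr / lag)
    feats = []
    for x in (np.array(energies), np.array(pitches)):
        feats += [x.mean(), x.std(), x.max() - x.min()]
    return np.array(feats)  # [e_mean, e_std, e_range, p_mean, p_std, p_range]

# Synthetic stand-ins: a higher-pitched "happy" utterance and a
# louder, lower-pitched "angry" one (illustrative labels only).
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000
happy = np.sin(2 * np.pi * 300 * t) + 0.05 * rng.standard_normal(16000)
angry = 3.0 * np.sin(2 * np.pi * 120 * t) + 0.05 * rng.standard_normal(16000)

f_happy = prosodic_features(happy)
f_angry = prosodic_features(angry)
```

In a real system the six-dimensional feature vector per utterance would then be passed to any of the thirty-five classifiers the paper evaluates (SVM, decision trees, instance-based learners, etc.).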
Cite this article
Klaylat, S., Osman, Z., Hamandi, L. et al. Emotion recognition in Arabic speech. Analog Integr Circ Sig Process 96, 337–351 (2018). https://doi.org/10.1007/s10470-018-1142-4