Recognizing emotional speech in Persian: A validated database of Persian emotional speech (Persian ESD)

An Erratum to this article was published on 26 November 2014

Abstract

Research on emotional speech often requires valid stimuli for assessing perceived emotion through prosody and lexical content. To date, no comprehensive emotional speech database for Persian has been officially available. The present article reports the process of designing, compiling, and evaluating a comprehensive emotional speech database for colloquial Persian. The database contains a set of 90 validated novel Persian sentences classified into five basic emotional categories (anger, disgust, fear, happiness, and sadness), as well as a neutral category. These sentences were validated in two experiments by a group of 1,126 native Persian speakers. The sentences were articulated by two native Persian speakers (one male, one female) in three conditions: (1) congruent (emotional lexical content articulated in a congruent emotional voice), (2) incongruent (neutral sentences articulated in an emotional voice), and (3) baseline (all emotional and neutral sentences articulated in a neutral voice). The speech materials comprise about 470 sentences. The validity of the database was evaluated by a group of 34 native speakers in a perception test. Utterances recognized at better than five times chance performance (71.4%) were regarded as valid portrayals of the target emotions. Acoustic analysis of the valid emotional utterances revealed differences in pitch, intensity, and duration, attributes that may help listeners to correctly classify the intended emotion. The database is designed to serve as a reliable source of material (for both text and speech) in future cross-cultural or cross-linguistic studies of emotional speech, and it is available free of charge for academic research purposes. To access the database, please contact the first author.
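
The 71.4% cutoff equals five times a chance level of 1/7 (5 × 1/7 ≈ 0.714), which, assuming a uniform forced choice, implies a seven-alternative response format in the perception test; with 34 raters, an utterance therefore needs at least 25 correct responses (25/34 ≈ 0.735, whereas 24/34 ≈ 0.706 falls short). The sketch below shows, in Python, how this criterion and the three acoustic measures named above could be computed. It is a minimal illustration, not the authors' pipeline: the Parselmouth bridge to Praat, the default analysis settings, and the example file name are assumptions of the sketch.

    # Minimal sketch, assuming seven response alternatives (implied by the
    # stated cutoff: 5 * 1/7 ~ 0.714) and the Parselmouth bridge to Praat.
    # Install with: pip install praat-parselmouth
    import parselmouth

    CHANCE_LEVEL = 1 / 7                 # assumption: seven-alternative forced choice
    VALIDITY_CUTOFF = 5 * CHANCE_LEVEL   # ~0.714, i.e., the 71.4% criterion

    def is_valid_portrayal(correct_responses, n_raters=34):
        """True if an utterance was recognized above five times chance."""
        return correct_responses / n_raters > VALIDITY_CUTOFF

    def acoustic_profile(wav_path):
        """Mean F0, mean intensity, and duration: the three cues analyzed above."""
        snd = parselmouth.Sound(wav_path)
        f0 = snd.to_pitch().selected_array["frequency"]
        f0 = f0[f0 > 0]                  # keep voiced frames only (0 = unvoiced)
        intensity = snd.to_intensity()
        return {
            "mean_f0_hz": float(f0.mean()) if f0.size else float("nan"),
            "mean_intensity_db": float(intensity.values.mean()),
            "duration_s": snd.xmax - snd.xmin,
        }

    if __name__ == "__main__":
        print(is_valid_portrayal(correct_responses=27))  # 27/34 ~ 0.79 -> valid
        # acoustic_profile("anger_female_01.wav")        # hypothetical file name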

Notes

  1. The Stanislavski method is a progression of techniques (e.g., imagination and other mental or muscular techniques) used to train actors to experience a state similar to the intended emotion. This method, which is based on the concept of emotional memory, helps actors to draw on believable emotions in their performances (O’Brien, 2011).

  2. In order to prevent the same participant from taking the test twice, the IP address of each participant’s computer was checked.

  3. As explained earlier, Ms. Tailor referred to a woman who worked as a tailor, not to someone whose family name was Tailor.

References

  • Anvari, H., & Givi, H. (1996). Persian grammar (2 vols). Tehran, Iran: Fatemi.

  • Banse, R., & Scherer, K. R. (1996). Acoustic profiles in vocal emotion expression. Journal of Personality and Social Psychology, 70, 614–636. doi:10.1037/0022-3514.70.3.614

  • Ben-David, B. M., van Lieshout, P. H., & Leszcz, T. (2011). A resource of validated affective and neutral sentences to assess identification of emotion in spoken language after a brain injury. Brain Injury, 25, 206–220.

  • Boersma, P., & Weenink, D. (2006). Praat: Doing phonetics by computer (Version 4.4.11) [Computer program]. Retrieved February 26, 2010, from www.praat.org

  • Burkhardt, F., Paeschke, A., Rolfes, M., Sendlmeier, W. F., & Weiss, B. (2005, September). A database of German emotional speech. Paper presented at the 9th European Conference on Speech Communication and Technology, Lisbon, Portugal.

  • Calder, J. (1998). Survey research methods. Medical Education, 32, 636–652.

  • Campbell, N. (2000, September). Databases of emotional speech. Paper presented at the ISCA Tutorial and Research Workshop (ITRW) on Speech and Emotion, Newcastle, Northern Ireland, UK.

  • Cowie, R., & Cornelius, R. R. (2003). Describing the emotional states that are expressed in speech. Speech Communication, 40, 5–32.

  • Douglas-Cowie, E., Campbell, N., Cowie, R., & Roach, P. (2003). Emotional speech: Towards a new generation of databases. Speech Communication, 40, 33–60.

  • Ekman, P. (1999). Basic emotions. In T. Dalgleish & M. Power (Eds.), Handbook of cognition and emotion (pp. 45–60). Hove, UK: Wiley.

  • Ekman, P., Friesen, W. V., & O’Sullivan, M. (1988). Smiles when lying. Journal of Personality and Social Psychology, 54, 414–420. doi:10.1037/0022-3514.54.3.414

  • Frank, M. G., & Stennett, J. (2001). The forced-choice paradigm and the perception of facial expressions of emotion. Journal of Personality and Social Psychology, 80, 75–85. doi:10.1037/0022-3514.80.1.75

  • Gerrards-Hesse, A., Spies, K., & Hesse, F. W. (1994). Experimental inductions of emotional states and their effectiveness: A review. British Journal of Psychology, 85, 55–78.

  • Gharavian, D., & Ahadi, S. M. (2009). Emotional speech recognition and emotion identification in Farsi language. Modares Technical and Engineering, 34(13), 2.

  • Gharavian, D., & Sheikhan, M. (2010). Emotion recognition and emotion spotting improvement using formant-related features. Majlesi Journal of Electrical Engineering, 4(4).

  • Gharavian, D., Sheikhan, M., Nazerieh, A., & Garoucy, S. (2012). Speech emotion recognition using FCBF feature selection method and GA-optimized fuzzy ARTMAP neural network. Neural Computing and Applications, 21, 2115–2126.

  • Ghayoomi, M., Momtazi, S., & Bijankhan, M. (2010). A study of corpus development for Persian. International Journal on Asian Language Processing, 20, 17–33.

  • Johnson, W. F., Emde, R. N., Scherer, K. R., & Klinnert, M. D. (1986). Recognition of emotion from vocal cues. Archives of General Psychiatry, 43, 280–283. doi:10.1001/archpsyc.1986.01800030098011

  • Johnstone, T., & Scherer, K. R. (1999, August). The effects of emotions on voice quality. In Proceedings of the 14th International Congress of Phonetic Sciences (pp. 2029–2032). San Francisco, CA: University of California, Berkeley.

  • Juslin, P. N., & Laukka, P. (2003). Communication of emotions in vocal expression and music performance: Different channels, same code? Psychological Bulletin, 129, 770–814. doi:10.1037/0033-2909.129.5.770

  • Likert, R. (1936). A method for measuring the sales influence of a radio program. Journal of Applied Psychology, 20, 175–182.

  • Liu, P., & Pell, M. D. (2012). Recognizing vocal emotions in Mandarin Chinese: A validated database of Chinese vocal emotional stimuli. Behavior Research Methods, 44, 1042–1051. doi:10.3758/s13428-012-0203-3

  • Luo, X., Fu, Q. J., & Galvin, J. J. (2007). Vocal emotion recognition by normal-hearing listeners and cochlear implant users. Trends in Amplification, 11, 301–315.

  • Makarova, V., & Petrushin, V. A. (2002, September). RUSLANA: A database of Russian emotional utterances. Paper presented at the International Conference of Spoken Language Processing, Colorado, USA.

  • Maurage, P., Joassin, F., Philippot, P., & Campanella, S. (2007). A validated battery of vocal emotional expressions. Neuropsychological Trends, 2, 63–74.

  • Mitchell, R. L., Elliott, R., Barry, M., Cruttenden, A., & Woodruff, P. W. (2004). Neural response to emotional prosody in schizophrenia and in bipolar affective disorder. British Journal of Psychiatry, 184, 223–230.

  • Niimi, Y., Kasamatsu, M., Nishinoto, T., & Araki, M. (2001, August). Synthesis of emotional speech using prosodically balanced VCV segments. Paper presented at the 4th ISCA Tutorial and Research Workshop (ITRW) on Speech Synthesis, Perthshire, Scotland.

  • O’Brien, N. (2011). Stanislavski in practice: Exercises for students. New York, NY: Routledge.

  • Pakosz, M. (1983). Attitudinal judgments in intonation: Some evidence for a theory. Journal of Psycholinguistic Research, 12, 311–326.

  • Paulmann, S., Pell, M. D., & Kotz, S. A. (2008). How aging affects the recognition of emotional speech. Brain and Language, 104, 262–269. doi:10.1016/j.bandl.2007.03.002

  • Pell, M. D. (2001). Influence of emotion and focus location on prosody in matched statements and questions. Journal of the Acoustical Society of America, 109, 1668–1680. doi:10.1121/1.1352088

  • Pell, M. D. (2002). Evaluation of nonverbal emotion in face and voice: Some preliminary findings on a new battery of tests. Brain and Cognition, 48, 499–514.

  • Pell, M. D., Jaywant, A., Monetta, L., & Kotz, S. A. (2011). Emotional speech processing: Disentangling the effects of prosody and semantic cues. Cognition and Emotion, 25, 834–853. doi:10.1080/02699931.2010.516915

  • Pell, M. D., & Kotz, S. A. (2011). On the time course of vocal emotion recognition. PLoS ONE, 6, e27256. doi:10.1371/journal.pone.0027256

  • Pell, M. D., Paulmann, S., Dara, C., Alasseri, A., & Kotz, S. A. (2009). Factors in the recognition of vocally expressed emotions: A comparison of four languages. Journal of Phonetics, 37, 417–435.

  • Pell, M. D., & Skorup, V. (2008). Implicit processing of emotional prosody in a foreign versus native language. Speech Communication, 50, 519–530.

  • Petrushin, V. (1999, November). Emotion in speech: Recognition and application to call centers. Paper presented at the Conference on Artificial Neural Networks in Engineering, St. Louis, USA.

  • Roach, P. (2000, September). Techniques for the phonetic description of emotional speech. Paper presented at the ISCA Tutorial and Research Workshop (ITRW) on Speech and Emotion, Newcastle, Northern Ireland, UK.

  • Roach, P., Stibbard, R., Osborne, J., Arnfield, S., & Setter, J. (1998). Transcription of prosodic and paralinguistic features of emotional speech. Journal of the International Phonetic Association, 28, 83–94.

  • Russ, J. B., Gur, R. C., & Bilker, W. B. (2008). Validation of affective and neutral sentence content for prosodic testing. Behavior Research Methods, 40, 935–939. doi:10.3758/BRM.40.4.935

  • Russell, J. A. (1994). Is there universal recognition of emotion from facial expressions? A review of the cross-cultural studies. Psychological Bulletin, 115, 102–141. doi:10.1037/0033-2909.115.1.102

  • Scherer, K. R. (1986). Vocal affect expression: A review and a model for future research. Psychological Bulletin, 99, 143–165. doi:10.1037/0033-2909.99.2.143

  • Scherer, K. R., Banse, R., Wallbott, H. G., & Goldbeck, T. (1991). Vocal cues in emotion encoding and decoding. Motivation and Emotion, 15, 123–148.

  • Scherer, K. R., Ladd, D. R., & Silverman, K. E. A. (1984). Vocal cues to speaker affect: Testing two models. Journal of the Acoustical Society of America, 76, 1346–1356. doi:10.1121/1.391450

  • Schmuckler, M. A. (2001). What is ecological validity? A dimensional analysis. Infancy, 2, 419–436.

  • Schneider, W., Eschman, A., & Zuccolotto, A. (2002). E-Prime 1.0 user’s guide. Pittsburgh, PA: Psychological Software Tools.

  • Sims-Williams, N., & Bailey, H. W. (Eds.). (2002). Indo-Iranian languages and peoples. Oxford, UK: Oxford University Press.

  • Tanenhaus, M. K., & Brown-Schmidt, S. (2007). Language processing in the natural world. Philosophical Transactions of the Royal Society B, 363, 1105–1122. doi:10.1098/rstb.2007.2162

  • Ververidis, D., & Kotropoulos, C. (2003, October). A state of the art review on emotional speech databases. Paper presented at the 1st Richmedia Conference, Lausanne, Switzerland.

  • Wallbott, H. G., & Scherer, K. R. (1986). How universal and specific is emotional experience? Evidence from 27 countries on five continents. Social Science Information, 25, 763–795.

  • Wilson, D., & Wharton, T. (2006). Relevance and prosody. Journal of Pragmatics, 38, 1559–1579.

  • Yu, F., Chang, E., Xu, Y., & Shum, H. Y. (2001, October). Emotion detection from speech to enrich multimedia content. Paper presented at the 2nd IEEE Pacific Rim Conference on Multimedia, London, United Kingdom.

Author note

The authors express their appreciation to Silke Paulmann, Maria Macuch, Klaus Scherer, Luna Beck, Dar Meshi, Francesca Citron, Pooya Keshtiari, Arsalan Kahnemuyipour, Saeid Sheikh Rohani, Georg Hosoya, Jörg Dreyer, Masood Ghayoomi, Elif Alkan Härtwig, Lea Gutz, Reza Nilipour, Yahya Modarresi Tehrani, Fatemeh Izadi, Trudi Falamaki-Zehnder, Liila Taruffi, Laura Hahn, Karl Brian Northeast, Arash Aryani, Christa Bös, and Afsaneh Fazly for their help with sentence construction and validation, recordings, data collection and organization, and manuscript preparation. A special thank you goes to our two speakers, Mithra Zahedi and Vahid Etemad. The authors also thank all of the participants who took part in the various experiments in this study. This research was financially supported by a grant from the German Research Foundation (DFG) to N.K.

Author information

Corresponding author

Correspondence to Niloofar Keshtiari.

Additional information

Copyright 2010–2012 Niloofar Keshtiari. All rights reserved. Although the database is available to researchers, it is subject to copyright law. Any unauthorized use, copying, or distribution of material contained in the database without written permission from the copyright holder constitutes copyright infringement and may result in litigation. Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 describes the legal protection of databases. Published work that refers to the Persian Emotional Speech Database (Persian ESD) should cite this article.

Electronic supplementary material

Below are the links to the electronic supplementary material.

ESM 1

(PDF 176 kb)

ESM 2

(PDF 166 kb)

ESM 3

(PDF 170 kb)

Appendices

Appendix A: Sample of the Persian sentences included in the database, along with their transliteration, glosses, and English translation

Figure A. Sample sentences from the database, with transliteration, glosses, and English translations.

Abbreviations used are as follows: Ez: ezafe particle; CL: clitic; CL.3SG: third person singular clitic; DOM: direct object marker; 3SG: third person singular.

Appendix B: List of scenarios

Anger: The director is late for the rehearsal again, and we have to work until late at night. Once again, I have to cancel an important date.
Disgust: I have a summer job in a restaurant. Today I have to clean the toilets, which are incredibly filthy and smell very strongly.
Fear: While I am on a tour bus, the driver loses control of the bus while trying to avoid another car. The bus comes to a standstill at the edge of a precipice, threatening to fall over.
Happiness: I am acting in a new play. From the start, I get along extremely well with my colleagues, who even throw a party for me.
Sadness: I get a call telling me that my best friend has died suddenly.

Example 1

Figure B. Example 1, with transliteration, glosses, and English translation.

Note that Persian is written from right to left. The abbreviations are as follows: Ez: ezafe particle; CL.3SG: third person singular clitic; DOM: direct object marker; 3SG: third person singular.

About this article

Cite this article

Keshtiari, N., Kuhlmann, M., Eslami, M. et al. Recognizing emotional speech in Persian: A validated database of Persian emotional speech (Persian ESD). Behav Res 47, 275–294 (2015). https://doi.org/10.3758/s13428-014-0467-x

Keywords

  • Emotion recognition
  • Speech
  • Emotional speech database
  • Prosody
  • Persian