International Journal of Speech Technology, Volume 10, Issue 2–3, pp 143–152

Statistical feature evaluation for classification of stressed speech

  • H. Patro
  • G. Senthil Raja
  • S. Dandapat
Abstract

The variations in speech production due to stress have an adverse effect on the performance of speech and speaker recognition algorithms. In this work, different speech features, such as Sinusoidal Frequency Features (SFF), Sinusoidal Amplitude Features (SAF), Cepstral Coefficients (CC) and Mel Frequency Cepstral Coefficients (MFCC), are evaluated to determine their relative effectiveness in representing stressed speech. Different statistical feature evaluation techniques, such as probability density characteristics, the F-ratio test, the Kolmogorov-Smirnov test (KS test) and a Vector Quantization (VQ) classifier, are used to assess the performance of the speech features. Four stress conditions are tested: Neutral, Compassionate, Anger and Happy. The stressed speech database used in this work consists of 600 stressed speech files recorded from 30 speakers. With the VQ classifier, SAF gives the best recognition result, followed by SFF, MFCC and CC, in that order. The relative classification results and the relative magnitudes of the F-ratio values for the SFF, MFCC and CC features follow the same order. The SFF and MFCC features show consistent relative performance across all three tests: F-ratio, KS test and VQ classifier.
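The three quantitative tests named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulations, so everything here is an assumption based on common textbook definitions: the F-ratio is taken as the variance of the class means divided by the average within-class variance, the KS statistic as the largest gap between two empirical CDFs, and VQ classification as choosing the class codebook with the lowest average distortion. The function names, the toy feature values and the codebooks are hypothetical and are not taken from the paper.

```python
from bisect import bisect_right
from statistics import mean, pvariance


def f_ratio(samples_by_class):
    """F-ratio of one scalar feature (assumed textbook form): variance of
    the class means divided by the average within-class variance."""
    class_means = [mean(v) for v in samples_by_class.values()]
    within = mean(pvariance(v) for v in samples_by_class.values())
    return pvariance(class_means) / within


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    gap = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def vq_classify(frames, codebooks):
    """Return the label whose codebook gives the lowest average
    squared-error distortion over the feature frames. Codebooks are
    assumed to be trained already (for example by k-means)."""
    def distortion(codebook):
        return mean(
            min(sum((f - c) ** 2 for f, c in zip(frame, code))
                for code in codebook)
            for frame in frames
        )
    return min(codebooks, key=lambda label: distortion(codebooks[label]))


# Hypothetical toy data: one scalar feature for two stress conditions.
toy_feature = {"neutral": [1.0, 2.0, 3.0], "anger": [7.0, 8.0, 9.0]}
# Hypothetical two-entry codebooks over two-dimensional feature frames.
toy_codebooks = {
    "neutral": [(0.0, 0.0), (1.0, 1.0)],
    "anger": [(5.0, 5.0), (6.0, 6.0)],
}

print(f_ratio(toy_feature))
print(ks_statistic(toy_feature["neutral"], toy_feature["anger"]))
print(vq_classify([(0.9, 1.1), (0.1, -0.1)], toy_codebooks))
```

Under these definitions, a larger F-ratio or KS statistic indicates a feature whose values separate the stress conditions more clearly, which is the sense in which such scores can be compared with classifier accuracy.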

Keywords

Feature evaluation, Probability density, Kolmogorov-Smirnov test

Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  1. Department of Electronics and Communication Engineering, Indian Institute of Technology Guwahati, Guwahati, India