Spectral slope based analysis and classification of stressed speech

Shukla, Sumitra; Dandapat, S.; Prasanna, S. R. M.

doi:10.1007/s10772-011-9100-x

Published: 09 August 2011

International Journal of Speech Technology, Volume 14, pages 245–258 (2011)

Sumitra Shukla¹,
S. Dandapat¹ &
S. R. M. Prasanna¹

Abstract

In this study, spectral slope based features are investigated for the characterization and classification of stressed speech. The vocal tract spectrum is modulated by the glottal flow spectrum, resulting in a tilt in the overall spectrum. This spectral tilt is analyzed for different stress classes. Relative formant peak displacement (RFD) is proposed as the displacement of formant peaks from the first formant peak. The displacements of the second, third and fourth formant peaks from the first formant peak are termed RFD₂, RFD₃ and RFD₄, respectively. The features are extracted from linear prediction coefficient (LPC) and cepstrally smoothed log spectra. Analysis shows that stress affects the higher formant region more than the lower formant region. To evaluate the effectiveness of this feature for different stress classes, stress classification performance is evaluated. A simulated stressed speech database is collected from fifteen speakers under four stress conditions, namely neutral, angry, sad and Lombard. The performance of the RFD feature is similar to that of Mel-frequency cepstral coefficients (MFCC), which shows that the RFD feature has approximately the same discrimination capability for stress as MFCC. Further, RFD features derived from cepstrally smoothed log spectra perform better than LPC derived RFD features. RFD features are combined with MFCC at the feature, score and rank levels, and improved performance is found.
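The RFD definition above can be illustrated with a minimal sketch, which should not be read as the authors' implementation. The abstract does not say whether the displacement is measured along the frequency axis or the level axis, so the sketch returns both. The function names, the liftering length `n_lifter`, and the simple local-maximum peak picking are all assumptions made for illustration.

```python
import numpy as np


def cepstral_smooth_log_spectrum(frame, n_fft=512, n_lifter=30):
    """Cepstrally smoothed log-magnitude spectrum (dB) of one speech frame.

    Hypothetical helper: n_fft and n_lifter are illustrative defaults.
    """
    windowed = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    log_mag = np.log(np.abs(np.fft.rfft(windowed, n_fft)) + 1e-12)
    cepstrum = np.fft.irfft(log_mag, n_fft)       # real cepstrum
    lifter = np.zeros(n_fft)
    lifter[:n_lifter] = 1.0                       # keep low quefrencies
    lifter[n_fft - n_lifter + 1:] = 1.0           # and their mirror image
    smooth = np.fft.rfft(cepstrum * lifter).real  # back to a log spectrum
    return 20.0 / np.log(10.0) * smooth           # natural log -> dB


def relative_formant_displacement(log_spec_db, sample_rate, n_formants=4):
    """Displacement of peaks 2..n_formants from the first spectral peak.

    Returns (frequency displacements in Hz, level displacements in dB).
    Assumption: formant peaks are taken as the lowest-frequency local
    maxima of the smoothed spectrum.
    """
    spec = np.asarray(log_spec_db, dtype=float)
    is_peak = (spec[1:-1] > spec[:-2]) & (spec[1:-1] >= spec[2:])
    peak_bins = np.flatnonzero(is_peak) + 1
    if len(peak_bins) < n_formants:
        raise ValueError("fewer spectral peaks than requested formants")
    peak_bins = peak_bins[:n_formants]
    n_fft = 2 * (len(spec) - 1)                   # spec comes from an rfft
    freqs_hz = peak_bins * sample_rate / n_fft
    levels_db = spec[peak_bins]
    return freqs_hz[1:] - freqs_hz[0], levels_db[1:] - levels_db[0]
```

For a frame sampled at 8 kHz, `relative_formant_displacement(cepstral_smooth_log_spectrum(frame), 8000)` would yield three-element arrays corresponding to RFD₂, RFD₃ and RFD₄ under these assumptions. An LPC derived variant would replace only the smoothing step with an LPC spectral envelope.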

Author information

Authors and Affiliations

Department of Electronics and Electrical Engineering, Indian Institute of Technology Guwahati, Guwahati, 781039, India
Sumitra Shukla, S. Dandapat & S. R. M. Prasanna

Corresponding author

Correspondence to Sumitra Shukla.

About this article

Cite this article

Shukla, S., Dandapat, S. & Prasanna, S.R.M. Spectral slope based analysis and classification of stressed speech. Int J Speech Technol 14, 245–258 (2011). https://doi.org/10.1007/s10772-011-9100-x

Received: 14 March 2011
Accepted: 17 July 2011
Published: 09 August 2011
Issue Date: September 2011
DOI: https://doi.org/10.1007/s10772-011-9100-x
