Spectral slope based analysis and classification of stressed speech

International Journal of Speech Technology

Abstract

In this study, spectral slope based features are investigated for the characterization and classification of stressed speech. The vocal tract spectrum is modulated by the glottal flow spectrum, which introduces a tilt in the overall spectrum, and this spectral tilt is analyzed for different stress classes. Relative formant peak displacement (RFD) is proposed as the displacement of the formant peaks from the 1st formant peak. The displacements of the 2nd, 3rd and 4th formant peaks from the 1st formant peak are termed RFD 2, RFD 3 and RFD 4, respectively. The features are extracted from both the linear prediction coefficient (LPC) spectrum and the cepstrally smoothed log spectrum. Analysis shows that stress affects the higher formant region more than the lower formant region. To evaluate the effectiveness of this feature for different stress classes, stress classification performance is measured on a simulated stressed speech database collected from fifteen speakers under four stress conditions, namely neutral, angry, sad and Lombard. The performance of the RFD feature is similar to that of Mel-frequency cepstral coefficients (MFCC), which shows that RFD has approximately the same discrimination capability for stress as MFCC. Further, RFD derived from the cepstrally smoothed log spectrum performs better than LPC-derived RFD. Combining RFD with MFCC at the feature, score and rank levels improves performance.
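As a rough illustration of how RFD-style features might be computed, the Python sketch below estimates a frame's log spectrum in two ways, from LPC coefficients (autocorrelation method with the Levinson-Durbin recursion) and by cepstral smoothing (low-time liftering of the real cepstrum), and then measures how far each higher spectral peak lies from the first one. The abstract does not give the exact definition, so several choices here are assumptions: displacement is taken as the level difference in dB between the 1st and k-th peaks, peaks are found by naive local-maximum picking rather than a formant tracker, and the helper names (`lpc`, `lpc_log_spectrum`, `cepstral_smoothed_log_spectrum`, `formant_peaks`, `rfd`) and defaults (LPC order 12, 30 cepstral coefficients, 512-point FFT) are hypothetical. It requires NumPy.

```python
import numpy as np


def lpc(frame, order):
    """LPC polynomial a[0..order] (a[0] = 1) via the autocorrelation method
    and the Levinson-Durbin recursion, so that A(z) = sum_j a[j] z^-j."""
    x = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    n = len(x)
    # Autocorrelation lags 0..order (index n-1 of the full correlation is lag 0).
    r = np.correlate(x, x, mode="full")[n - 1 : n + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])
        k = -acc / err  # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1 : 0 : -1]
        a[i] = k
        err *= 1.0 - k * k  # prediction error shrinks at each order
    return a


def lpc_log_spectrum(frame, order=12, n_fft=512):
    """LPC envelope in dB, -20 log10 |A(e^jw)|, on n_fft // 2 + 1 bins.
    The gain term is omitted: it shifts every bin by the same amount."""
    a = lpc(frame, order)
    return -20.0 * np.log10(np.abs(np.fft.rfft(a, n_fft)) + 1e-12)


def cepstral_smoothed_log_spectrum(frame, n_fft=512, n_ceps=30):
    """Log spectrum in dB smoothed by keeping only the first n_ceps
    real-cepstrum coefficients (low-time liftering)."""
    x = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    log_mag = np.log(np.abs(np.fft.rfft(x, n_fft)) + 1e-12)
    ceps = np.fft.irfft(log_mag, n_fft)  # real cepstrum, length n_fft
    lifter = np.zeros(n_fft)
    lifter[:n_ceps] = 1.0
    lifter[n_fft - n_ceps + 1 :] = 1.0  # mirrored half keeps the result real
    smooth = np.fft.rfft(ceps * lifter).real
    return smooth * (20.0 / np.log(10.0))  # natural log -> dB


def formant_peaks(log_spec, sr, n_fft, max_peaks=4):
    """Frequencies (Hz) and levels (dB) of the first max_peaks local maxima,
    in increasing frequency. Naive stand-in for a proper formant tracker."""
    s = np.asarray(log_spec, dtype=float)
    idx = np.nonzero((s[1:-1] > s[:-2]) & (s[1:-1] >= s[2:]))[0] + 1
    idx = idx[:max_peaks]
    return idx * sr / n_fft, s[idx]


def rfd(log_spec, sr, n_fft):
    """Hypothetical RFD: level of the 1st peak minus level of the k-th peak (dB),
    for k = 2..4, i.e. how far each higher peak sits below the first peak."""
    _, levels = formant_peaks(log_spec, sr, n_fft)
    return {f"RFD {k}": float(levels[0] - levels[k - 1]) for k in range(2, len(levels) + 1)}
```

For a short voiced frame `frame` sampled at `sr` Hz, `rfd(lpc_log_spectrum(frame), sr, 512)` and `rfd(cepstral_smoothed_log_spectrum(frame), sr, 512)` give the LPC-derived and cepstrally smoothed variants. Because each value is a difference of two levels taken from the same spectrum, any overall gain offset cancels, which is why the LPC gain can be dropped.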

Author information

Correspondence to Sumitra Shukla.

About this article

Cite this article

Shukla, S., Dandapat, S. & Prasanna, S.R.M. Spectral slope based analysis and classification of stressed speech. Int J Speech Technol 14, 245–258 (2011). https://doi.org/10.1007/s10772-011-9100-x

  • DOI: https://doi.org/10.1007/s10772-011-9100-x
