Abstract
In this study, spectral slope based features are investigated for the characterization and classification of stressed speech. The vocal tract spectrum is modulated by the glottal flow spectrum, resulting in a tilt in the overall spectrum, and this spectral tilt is analyzed for different stress classes. Relative formant peak displacement (RFD) is proposed as the displacement of a formant peak from the 1st formant peak; the displacements of the 2nd, 3rd and 4th formant peaks from the 1st formant peak are termed RFD 2, RFD 3 and RFD 4, respectively. The features are extracted from both the linear prediction coefficient (LPC) spectrum and the cepstrally smoothed log spectrum. Analysis shows that stress affects the higher formant region more than the lower formant region. To evaluate the effectiveness of these features for different stress classes, stress classification performance is evaluated. A simulated stressed speech database is collected from fifteen speakers under four stress conditions, namely neutral, angry, sad and Lombard. The performance of the RFD feature is similar to that of Mel-frequency cepstral coefficients (MFCC), which shows that the RFD feature has approximately the same discrimination capability for stress as MFCC. Further, RFD derived from the cepstrally smoothed log spectrum performs better than RFD derived from LPC. RFD features are combined with MFCC at the feature, score and rank levels, and the combinations are found to improve performance.
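As an illustration of how RFD 2 to RFD 4 could be computed from the two spectral representations named above, here is a minimal Python/NumPy sketch. It is not the authors' implementation, and it rests on several assumptions. The "displacement" is taken to be the log-magnitude (dB) difference between each formant peak and the 1st formant peak. Formant peaks are simply the first four local maxima of the envelope above a hypothetical `fmin` cutoff, a crude stand-in for proper formant tracking. The function names `lpc`, `lpc_log_spectrum`, `cepstral_log_spectrum`, `formant_peaks` and `rfd`, along with the parameters `order`, `nfft`, `n_lifter` and `fmin`, are all hypothetical choices made for this example.

```python
import numpy as np


def lpc(frame, order):
    """Autocorrelation-method LPC via the Levinson-Durbin recursion.

    Returns (a, err): a[0] = 1 so that A(z) = 1 + sum_k a[k] z^-k,
    and err is the final prediction-error power.
    """
    n = len(frame)
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])
        k = -acc / err  # reflection coefficient for stage i
        a[1:i] = a[1:i] + k * a[i - 1 : 0 : -1]  # RHS is a new array, so no aliasing
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def lpc_log_spectrum(frame, order, nfft=512):
    """All-pole (LPC) envelope in dB on nfft//2 + 1 bins spanning 0..fs/2."""
    a, err = lpc(frame, order)
    A = np.fft.rfft(a, nfft)
    return 10.0 * np.log10(err / np.abs(A) ** 2)


def cepstral_log_spectrum(frame, nfft=512, n_lifter=30):
    """Cepstrally smoothed log-magnitude spectrum in dB (low-quefrency lifter).

    n_lifter is an assumed value; it should stay below the pitch period in samples.
    """
    log_mag = np.log(np.abs(np.fft.rfft(frame, nfft)) + 1e-12)
    cep = np.fft.irfft(log_mag, nfft)  # real cepstrum, an even sequence
    cep[n_lifter : nfft - n_lifter + 1] = 0.0  # keep low quefrencies on both ends
    return (20.0 / np.log(10.0)) * np.fft.rfft(cep, nfft).real  # ln|X| -> dB


def formant_peaks(log_spec, fs, n_peaks=4, fmin=90.0):
    """First n_peaks interior local maxima at or above fmin: (freqs_hz, levels_db)."""
    freqs = np.linspace(0.0, fs / 2.0, len(log_spec))
    idx = [
        i
        for i in range(1, len(log_spec) - 1)
        if freqs[i] >= fmin
        and log_spec[i] > log_spec[i - 1]
        and log_spec[i] >= log_spec[i + 1]
    ]
    idx = idx[:n_peaks]
    return freqs[idx], log_spec[idx]


def rfd(log_spec, fs):
    """RFD2..RFD4 as dB level of formant peaks 2-4 relative to peak 1 (assumed definition)."""
    _, levels = formant_peaks(log_spec, fs, n_peaks=4)
    if len(levels) < 4:
        raise ValueError("fewer than four spectral peaks found")
    return {f"RFD{k}": float(levels[k - 1] - levels[0]) for k in (2, 3, 4)}
```

For a windowed voiced frame `frame` sampled at `fs`, `rfd(lpc_log_spectrum(frame, order=8), fs)` gives the LPC-derived features, and `rfd(cepstral_log_spectrum(frame), fs)` gives the cepstrally smoothed variant. Both paths share the same peak picker, so any difference between them comes only from the spectral envelope.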
Cite this article
Shukla, S., Dandapat, S. & Prasanna, S.R.M. Spectral slope based analysis and classification of stressed speech. Int J Speech Technol 14, 245–258 (2011). https://doi.org/10.1007/s10772-011-9100-x