Abstract
The dynamic use of voice qualities in spoken language can reveal useful information about a speaker’s attitude, mood, and affective state. This information may be desirable for a range of speech technology applications. However, annotations of voice quality can frequently be inconsistent across raters. Which rater should one trust, or does the truth lie somewhere in between? The current study first describes a voice quality feature set suitable for differentiating voice qualities on a tense-to-breathy dimension. These features are then used as inputs to a fuzzy-input fuzzy-output support vector machine (F2SVM) algorithm to classify the voice qualities automatically. The F2SVM is compared with standard approaches and shows promising results: performance of around 90% is achieved in cross-validation, leave-one-speaker-out, and cross-corpus experiments.
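To make the idea of fuzzy training targets concrete, the following minimal sketch shows one plausible way to turn disagreeing rater annotations into a soft target vector instead of a single hard label. It is an illustration only, not the procedure used in the paper: the three-way label set (`breathy`, `modal`, `tense`), the vote-fraction rule, and the function names `fuzzy_targets` and `crisp_target` are all assumptions introduced here.

```python
from collections import Counter

# Assumed label set: the abstract only names a tense-to-breathy dimension;
# "modal" is added here as a hypothetical midpoint.
LABELS = ("breathy", "modal", "tense")


def fuzzy_targets(rater_labels, labels=LABELS):
    """Return one membership value per label: the fraction of raters who chose it."""
    if not rater_labels:
        raise ValueError("need at least one rater label")
    counts = Counter(rater_labels)
    unknown = set(counts) - set(labels)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    total = len(rater_labels)
    return [counts[label] / total for label in labels]


def crisp_target(rater_labels, labels=LABELS):
    """Majority-vote baseline: keep only the most frequently chosen label."""
    memberships = fuzzy_targets(rater_labels, labels)
    return labels[memberships.index(max(memberships))]


if __name__ == "__main__":
    votes = ["tense", "tense", "modal", "breathy"]
    print(fuzzy_targets(votes))  # memberships in LABELS order
    print(crisp_target(votes))   # hard label that discards the disagreement
```

A crisp target discards the information that two of the four hypothetical raters disagreed with the majority, whereas the membership vector keeps it available to a classifier that accepts fuzzy outputs.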
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Scherer, S., Kane, J., Gobl, C., Schwenker, F. (2013). The Effect of Fuzzy Training Targets on Voice Quality Classification. In: Schwenker, F., Scherer, S., Morency, LP. (eds) Multimodal Pattern Recognition of Social Signals in Human-Computer-Interaction. MPRSS 2012. Lecture Notes in Computer Science, vol 7742. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37081-6_6
DOI: https://doi.org/10.1007/978-3-642-37081-6_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37080-9
Online ISBN: 978-3-642-37081-6
eBook Packages: Computer Science, Computer Science (R0)