The Effect of Fuzzy Training Targets on Voice Quality Classification

Scherer, Stefan; Kane, John; Gobl, Christer; Schwenker, Friedhelm

doi:10.1007/978-3-642-37081-6_6


Conference paper


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7742)

Abstract

The dynamic use of voice qualities in spoken language can reveal useful information about a speaker's attitude, mood and affective states. This information may be desirable for a range of speech technology applications. However, annotation of voice quality may frequently be inconsistent across raters. Whom, then, should one trust, or does the truth lie somewhere in between? The current study first describes a voice quality feature set that is suitable for differentiating voice qualities along a tense-to-breathy dimension. These features serve as inputs to a fuzzy-input fuzzy-output support vector machine (F²SVM) algorithm that automatically classifies the voice qualities. The F²SVM is compared to standard approaches and shows promising results: performance of around 90% is achieved in cross-validation, leave-one-speaker-out, and cross-corpus experiments.
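The abstract does not spell out how disagreeing annotations become training targets or how the F²SVM consumes them, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes three categories (breathy, modal, tense), takes an utterance's fuzzy target to be the fraction of raters who chose each category, and trains one-against-all linear SVMs in which every sample appears once as a positive example weighted by its membership and once as a negative example weighted by one minus that membership (per-sample penalty weighting in the spirit of Lin and Wang's fuzzy SVM). Fuzzy outputs are obtained with a softmax over the decision values. The feature vectors, rating lists and all function names are hypothetical.

```python
import math
from collections import Counter

# Assumed label set for the tense-to-breathy dimension (hypothetical).
CLASSES = ("breathy", "modal", "tense")


def fuzzy_targets(ratings, classes=CLASSES):
    """Membership of one utterance in each class = share of raters choosing it."""
    counts = Counter(ratings)
    return [counts[c] / len(ratings) for c in classes]


def train_weighted_linear_svm(X, y, w, C=1.0, epochs=200, lr=0.01):
    """Minimise 0.5*||v||^2 + C * sum_i w_i * max(0, 1 - y_i*(v.x_i + b))
    with full-batch subgradient descent; y_i is +1 or -1, w_i >= 0."""
    dim = len(X[0])
    v, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_v, grad_b = v[:], 0.0  # gradient of the regulariser term
        for xi, yi, wi in zip(X, y, w):
            score = sum(vj * xj for vj, xj in zip(v, xi)) + b
            if wi > 0 and yi * score < 1:  # weighted hinge loss is active
                for j in range(dim):
                    grad_v[j] -= C * wi * yi * xi[j]
                grad_b -= C * wi * yi
        v = [vj - lr * gj for vj, gj in zip(v, grad_v)]
        b -= lr * grad_b
    return v, b


def train_fuzzy_ova(X, M, **kwargs):
    """One linear SVM per class. Sample i enters as a positive example weighted
    by its membership M[i][k] and as a negative example weighted by 1 - M[i][k]."""
    models = []
    for k in range(len(M[0])):
        Xk, yk, wk = [], [], []
        for xi, mi in zip(X, M):
            Xk += [xi, xi]
            yk += [1, -1]
            wk += [mi[k], 1.0 - mi[k]]
        models.append(train_weighted_linear_svm(Xk, yk, wk, **kwargs))
    return models


def predict_memberships(models, x):
    """Fuzzy output: softmax over the per-class decision values."""
    scores = [sum(vj * xj for vj, xj in zip(v, x)) + b for v, b in models]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Hypothetical 2-D feature vectors and per-utterance rater labels.
    X = [[2.0, 0.0], [2.5, 0.5], [0.0, 2.0], [0.5, 2.5], [-2.0, -2.0], [-2.5, -1.5]]
    ratings = [
        ["breathy"] * 3,
        ["breathy", "breathy", "modal"],
        ["modal"] * 3,
        ["modal", "modal", "tense"],
        ["tense"] * 3,
        ["tense", "tense", "modal"],
    ]
    M = [fuzzy_targets(r) for r in ratings]
    models = train_fuzzy_ova(X, M)
    print(dict(zip(CLASSES, predict_memberships(models, [3.0, 0.0]))))
```

Splitting each sample into a weighted positive and a weighted negative copy lets an utterance that raters split 2:1 pull on both class boundaries instead of being forced onto a single hard label. Whether the original F²SVM uses this particular scheme is not stated in the abstract.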



Author information

Authors and Affiliations

Institute for Creative Technologies, University of Southern California, United States
Stefan Scherer
Phonetics and Speech Laboratory, Trinity College Dublin, Ireland
John Kane & Christer Gobl
Institute of Neural Information Processing, Ulm University, Germany
Stefan Scherer & Friedhelm Schwenker

Editor information

Editors and Affiliations

Institute of Neural Information Processing, Ulm University, 89069, Ulm, Germany
Friedhelm Schwenker
Institute for Creative Technologies, Multimodal Communication and Computation Laboratory, University of Southern California, 12015 Waterfront Drive, 90094, Playa Vista, CA, USA
Stefan Scherer & Louis-Philippe Morency

About this paper

Cite this paper

Scherer, S., Kane, J., Gobl, C., Schwenker, F. (2013). The Effect of Fuzzy Training Targets on Voice Quality Classification. In: Schwenker, F., Scherer, S., Morency, L.-P. (eds) Multimodal Pattern Recognition of Social Signals in Human-Computer-Interaction. MPRSS 2012. Lecture Notes in Computer Science, vol 7742. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37081-6_6

DOI: https://doi.org/10.1007/978-3-642-37081-6_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37080-9
Online ISBN: 978-3-642-37081-6
eBook Packages: Computer Science, Computer Science (R0)
