The Effect of Fuzzy Training Targets on Voice Quality Classification

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7742)

Abstract

The dynamic use of voice qualities in spoken language can reveal useful information about a speaker’s attitude, mood, and affective state. This information may be desirable for a range of speech technology applications. However, annotations of voice quality may often be inconsistent across raters. Which rater should one trust, or does the truth lie somewhere in between? The current study first describes a voice quality feature set suitable for differentiating voice qualities on a tense-to-breathy dimension. These features are then used as inputs to a fuzzy-input fuzzy-output support vector machine (F2SVM) algorithm to classify the voice qualities automatically. The F2SVM is compared to standard approaches and shows promising results: performance of around 90% is achieved in cross-validation, leave-one-speaker-out, and cross-corpus experiments.
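
One way to picture a fuzzy training target is as the share of raters who assigned each label to an utterance, so that disagreement is kept rather than resolved by a vote. The following is a minimal sketch under that assumption and is not taken from the paper: the label set (including "modal"), the function names, and the vote-fraction rule are all hypothetical illustrations.

```python
from collections import Counter

# Assumed label set along the tense-to-breathy dimension (hypothetical).
LABELS = ("breathy", "modal", "tense")


def fuzzy_target(ratings, labels=LABELS):
    """Return the fraction of raters that chose each label."""
    if not ratings:
        raise ValueError("at least one rating is required")
    counts = Counter(ratings)
    unknown = set(counts) - set(labels)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    # Counter returns 0 for labels no rater chose.
    return {label: counts[label] / len(ratings) for label in labels}


def crisp_target(ratings, labels=LABELS):
    """Majority vote for comparison; ties go to the earlier label in `labels`."""
    memberships = fuzzy_target(ratings, labels)
    return max(labels, key=lambda label: memberships[label])


if __name__ == "__main__":
    example = ["tense", "tense", "modal", "tense"]  # four hypothetical raters
    print(fuzzy_target(example))
    print(crisp_target(example))
```

With these four hypothetical ratings, the fuzzy target keeps the 0.25 membership in "modal" that a majority vote would discard. How a classifier then uses such memberships is left to the paper itself.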

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Scherer, S., Kane, J., Gobl, C., Schwenker, F. (2013). The Effect of Fuzzy Training Targets on Voice Quality Classification. In: Schwenker, F., Scherer, S., Morency, LP. (eds) Multimodal Pattern Recognition of Social Signals in Human-Computer-Interaction. MPRSS 2012. Lecture Notes in Computer Science, vol 7742. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37081-6_6

  • DOI: https://doi.org/10.1007/978-3-642-37081-6_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-37080-9

  • Online ISBN: 978-3-642-37081-6

  • eBook Packages: Computer Science, Computer Science (R0)
