
Part of the book series: Springer Theses


Abstract

This chapter summarises the methods presented for automatic speech and music analysis and the results obtained for speech emotion analytics and music genre identification with the openSMILE toolkit developed by the author. Further, it discusses if and how the aims defined at the outset were achieved, and it identifies open issues for future work.


Notes

  1. According to Google Scholar citations.

  2. Note that a joint normalisation of the training and test sets might yield the best results; it is, however, not possible, because at training time the test set is not known (and must not be used!), although it would be required to compute the joint normalisation parameters for the training set.

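To make the point of the second note concrete, the sketch below (plain Python with NumPy, not part of the thesis's openSMILE toolchain; all names are hypothetical) shows the standard workaround: normalisation parameters are estimated on the training set only and then reused unchanged on the test set.

```python
import numpy as np

def fit_standardiser(X_train):
    """Estimate per-feature mean and standard deviation on the training set only."""
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0)
    sigma[sigma == 0.0] = 1.0  # avoid division by zero for constant features
    return mu, sigma

def apply_standardiser(X, mu, sigma):
    """Standardise X with previously estimated parameters; X never influences them."""
    return (X - mu) / sigma

# Hypothetical feature matrices: one row per instance, one column per feature.
rng = np.random.default_rng(0)
X_train = rng.normal(loc=1.0, scale=3.0, size=(100, 6))
X_test = rng.normal(loc=1.0, scale=3.0, size=(20, 6))

mu, sigma = fit_standardiser(X_train)
X_train_norm = apply_standardiser(X_train, mu, sigma)
X_test_norm = apply_standardiser(X_test, mu, sigma)  # reuses training statistics
```

Estimating mu and sigma on the union of both sets would correspond to the joint normalisation the note calls best, which is exactly what is unavailable at training time.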


Author information

Corresponding author

Correspondence to Florian Eyben.


Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Eyben, F. (2016). Discussion and Outlook. In: Real-time Speech and Music Classification by Large Audio Feature Space Extraction. Springer Theses. Springer, Cham. https://doi.org/10.1007/978-3-319-27299-3_7


  • DOI: https://doi.org/10.1007/978-3-319-27299-3_7


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-27298-6

  • Online ISBN: 978-3-319-27299-3

  • eBook Packages: Engineering, Engineering (R0)
