
Part of the book series: Springer Theses


Abstract

This chapter summarises the methods presented for automatic speech and music analysis and the results obtained for speech emotion analytics and music genre identification with the openSMILE toolkit developed by the author. Further, it discusses if and how the aims defined at the outset were achieved, and it identifies open issues for future work.


Notes

  1. According to Google Scholar citations.

  2. Note that a joint normalisation of the training and test sets might yield the best results; it is, however, not possible, because at training time the test set is not known (and must not be used!), although it would be required to compute the joint normalisation parameters for the training set.

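To make the point of the second note concrete, the sketch below (plain Python with NumPy, not part of the thesis's openSMILE toolchain; all names are hypothetical) shows the standard workaround: normalisation parameters are estimated on the training set only and then reused unchanged on the test set.

```python
import numpy as np

def fit_standardiser(X_train):
    """Estimate per-feature mean and standard deviation on the training set only."""
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0)
    sigma[sigma == 0.0] = 1.0  # avoid division by zero for constant features
    return mu, sigma

def apply_standardiser(X, mu, sigma):
    """Standardise X with previously estimated parameters; X never influences them."""
    return (X - mu) / sigma

# Hypothetical feature matrices: one row per instance, one column per feature.
rng = np.random.default_rng(0)
X_train = rng.normal(loc=1.0, scale=3.0, size=(100, 6))
X_test = rng.normal(loc=1.0, scale=3.0, size=(20, 6))

mu, sigma = fit_standardiser(X_train)
X_train_norm = apply_standardiser(X_train, mu, sigma)
X_test_norm = apply_standardiser(X_test, mu, sigma)  # reuses training statistics
```

Estimating mu and sigma on the union of both sets would correspond to the joint normalisation the note calls best, which is exactly what is unavailable at training time.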


Author information

Corresponding author

Correspondence to Florian Eyben.


Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Eyben, F. (2016). Discussion and Outlook. In: Real-time Speech and Music Classification by Large Audio Feature Space Extraction. Springer Theses. Springer, Cham. https://doi.org/10.1007/978-3-319-27299-3_7


  • DOI: https://doi.org/10.1007/978-3-319-27299-3_7


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-27298-6

  • Online ISBN: 978-3-319-27299-3

  • eBook Packages: Engineering, Engineering (R0)
