Abstract
The chain of processing in a typical Intelligent Audio Analysis system is outlined. It leads from preprocessing through Low Level Descriptor extraction, chunking, suprasegmental analysis and hierarchical functional extraction, feature reduction, feature selection and generation, and parameter selection to model learning and the actual classification or regression. This can be followed by fusion with other information streams and by encoding for the application context. The individual steps are explained in more detail.
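As a rough illustration of the first links of this chain, the following minimal sketch frames a signal, extracts two illustrative Low Level Descriptors per frame, applies functionals over the chunk, and classifies the resulting fixed-length vector. It is not the chapter's implementation. The function names, the frame sizes, the choice of RMS energy and zero-crossing rate as descriptors, and the nearest-centroid classifier are all assumptions made for the example. Feature reduction, feature selection, parameter selection, fusion and encoding are left out.

```python
import math
from statistics import mean, pstdev

def frames(signal, size=400, hop=160):
    """Chunk a signal into overlapping frames (assumed 25 ms / 10 ms at 16 kHz)."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]

def low_level_descriptors(frame):
    """Two illustrative LLDs per frame: RMS energy and zero-crossing rate."""
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    zcr = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0) / (len(frame) - 1)
    return [rms, zcr]

def functionals(lld_per_frame):
    """Suprasegmental step: map each LLD contour to fixed-length statistics."""
    features = []
    for contour in zip(*lld_per_frame):  # one contour per descriptor
        features += [mean(contour), pstdev(contour), min(contour), max(contour)]
    return features

def extract(signal):
    """Signal -> frames -> LLDs -> functionals (one feature vector per chunk)."""
    return functionals([low_level_descriptors(f) for f in frames(signal)])

def nearest_centroid(features, centroids):
    """Stand-in classifier: return the label of the closest centroid."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))

def tone(freq, n=4000, rate=16000):
    """Hypothetical test signal: a pure sine tone."""
    return [math.sin(2 * math.pi * freq * t / rate) for t in range(n)]

# "Model learning" is reduced here to storing one example vector per class.
centroids = {"low": extract(tone(200)), "high": extract(tone(2000))}
label = nearest_centroid(extract(tone(1800)), centroids)
```

Whatever the chunk length, the functionals step yields a vector of the same size (here two descriptors times four statistics), which is what allows a static classifier or regressor to follow it.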
Keywords
- Feature Space
- Linear Discriminant Analysis
- Independent Component Analysis
- Universal Background Model
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
A complex system that works is invariably found to have evolved from a simple system that works. - John Gall
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Schuller, B. (2013). Chain of Audio Processing. In: Intelligent Audio Analysis. Signals and Communication Technology. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36806-6_4
DOI: https://doi.org/10.1007/978-3-642-36806-6_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36805-9
Online ISBN: 978-3-642-36806-6
eBook Packages: Engineering, Engineering (R0)