Abstract
To represent the information contained in an audio stream compactly while focusing on a task of interest, a parameterised form is usually chosen. These parameters describe properties of the audio in a highly information-reduced form and typically at a considerably lower rate, such as the mean energy or pitch over a longer period of time. As different Intelligent Audio Analysis tasks are often best served by different such 'features', a broad selection of the most typical ones is presented. This includes, as a first step, the digitisation and segmentation of the audio. From the speech domain, the features covered include intensity, zero crossings, autocorrelation, spectrum and cepstrum, linear prediction, line spectral pairs, perceptual linear prediction, formants, fundamental frequency and voicing probability, and jitter and shimmer. Further, music, sound, and textual descriptors are included. Then, the principle of supra-segmental brute-forcing and the subsequent feature reduction and selection are explained. The widely used openSMILE feature extractor serves as an example.
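Frame-level acoustic low-level descriptors such as intensity and zero crossings are computed over short, overlapping windows of the digitised signal. The following self-contained Python sketch illustrates this for log energy and zero-crossing rate; the frame and hop lengths (25 ms / 10 ms) and all function names are illustrative assumptions, not taken from the chapter:

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a mono signal into overlapping frames of length frame_len."""
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])

def llds(x, sr=16000, frame_ms=25, hop_ms=10):
    """Per-frame log energy and zero-crossing rate (two typical LLDs)."""
    frame_len = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    frames = frame_signal(x, frame_len, hop)
    # log energy: log of the sum of squared samples per frame
    energy = np.log(np.sum(frames ** 2, axis=1) + 1e-10)
    # zero-crossing rate: fraction of adjacent sample pairs with a sign change
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    return energy, zcr

# a 1-second synthetic 440 Hz tone as a stand-in for real audio
t = np.linspace(0, 1, 16000, endpoint=False)
x = np.sin(2 * np.pi * 440 * t)
e, z = llds(x)
print(e.shape, z.shape)
```

Each 1-second input thus yields a contour of roughly 100 values per descriptor, already a considerable rate reduction compared with the 16 kHz sample stream.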
The ability to focus attention on important things is a defining characteristic of intelligence.
—Robert J. Shiller.
Notes
- 1. ISO/IEC JTC 1/SC 29/WG 11 N7708.
- 2. \((VC)^m\) here means an \(m\)-fold repetition of the string \(VC\).
- 5. openNLP notation is followed for POS classes.
- 6. Available at: http://opensmile.sourceforge.net/.
- 14. A more detailed description can be found in the openSMILE documentation available in the download package at http://sourceforge.net/projects/opensmile/.
- 15. openSMILE was awarded third place in the ACM Multimedia 2010 Open-Source Software Competition. It was further used as the standard feature extractor for baseline computation and for use by participants in six research challenges.
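The supra-segmental brute-forcing named in the abstract, as realised on a much larger scale by extractors such as openSMILE, amounts to applying a fixed bank of statistical functionals to every low-level descriptor contour and its delta, yielding one fixed-length vector per segment. The functional set and the feature names below are a hypothetical miniature for illustration, not openSMILE's actual configuration:

```python
import numpy as np

# Illustrative bank of statistical functionals (a real extractor
# applies dozens of these, e.g. percentiles, regression coefficients).
FUNCTIONALS = {
    "mean":   np.mean,
    "std":    np.std,
    "min":    np.min,
    "max":    np.max,
    "range":  np.ptp,
    "median": np.median,
}

def brute_force(lld_contours):
    """Map {name: 1-D LLD contour} to a flat {feature_name: value} dict,
    applying every functional to each contour and its first-order delta."""
    feats = {}
    for name, contour in lld_contours.items():
        for variant, c in ((name, contour), (name + "_delta", np.diff(contour))):
            for fname, f in FUNCTIONALS.items():
                feats[f"{variant}_{fname}"] = float(f(c))
    return feats

# two toy contours standing in for e.g. energy and F0 tracks of one segment
feats = brute_force({"energy": np.random.randn(100),
                     "f0": 100 + 20 * np.random.rand(100)})
print(len(feats))  # 2 LLDs x 2 variants x 6 functionals = 24
```

The combinatorics make clear why this approach quickly produces thousands of features, and why the subsequent reduction and selection step is needed.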
© 2013 Springer-Verlag Berlin Heidelberg
Cite this chapter
Schuller, B. (2013). Audio Features. In: Intelligent Audio Analysis. Signals and Communication Technology. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36806-6_6
Print ISBN: 978-3-642-36805-9
Online ISBN: 978-3-642-36806-6