Audio Feature Selection for Recognition of Non-linguistic Vocalization Sounds

Theodorou, Theodoros; Mporas, Iosif; Fakotakis, Nikos

doi:10.1007/978-3-319-07064-3_32


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8445)

Included in the following conference series:

Hellenic Conference on Artificial Intelligence

Abstract

Aiming at automatic detection of non-linguistic sounds in vocalizations, we investigate the applicability of various subsets of audio features, formed by ranking the relevance and individual quality of a large set of audio descriptors. Based on this ranking, we selected subsets and evaluated them on the non-linguistic sound recognition task. During audio parameterization, every input utterance is converted to a single feature vector of 207 parameters. A subset of this feature vector is then fed to a classification model, which directly estimates the unknown sound class. The experimental evaluation showed that the feature vector composed of the 50 best-ranked parameters provides a good trade-off between computational demands and accuracy, and that the highest recognition accuracy is observed for the 150-best subset.
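
The rank-then-select procedure described in the abstract can be illustrated with a minimal sketch. The abstract names neither the ranking criterion nor the classifier, so the Fisher-style score and the 1-nearest-neighbour classifier below are stand-ins chosen for brevity, not the authors' method. The data are synthetic (20 features rather than 207), and all function names (`fisher_score`, `rank_features`, `select`, `one_nn_accuracy`, `make_toy_data`) are hypothetical.

```python
import random
from statistics import mean, pvariance


def fisher_score(values, labels):
    """Between-class over within-class variance for a single feature."""
    overall = mean(values)
    between = within = 0.0
    for c in set(labels):
        group = [v for v, y in zip(values, labels) if y == c]
        between += len(group) * (mean(group) - overall) ** 2
        within += len(group) * pvariance(group)
    return between / within if within > 0 else 0.0


def rank_features(X, y):
    """Return feature indices ordered from highest to lowest score."""
    scores = [fisher_score([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def select(X, indices):
    """Keep only the chosen feature columns of every vector."""
    return [[row[j] for j in indices] for row in X]


def one_nn_accuracy(X_train, y_train, X_test, y_test):
    """Accuracy of a 1-nearest-neighbour classifier (squared Euclidean distance)."""
    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    hits = 0
    for x, target in zip(X_test, y_test):
        nearest = min(range(len(X_train)), key=lambda i: dist(x, X_train[i]))
        hits += y_train[nearest] == target
    return hits / len(X_test)


def make_toy_data(n_per_class=40, n_informative=3, n_noise=17, seed=0):
    """Synthetic two-class data: only the first n_informative features carry the class."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            informative = [rng.gauss(3.0 * label, 1.0) for _ in range(n_informative)]
            noise = [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
            X.append(informative + noise)
            y.append(label)
    order = list(range(len(X)))
    rng.shuffle(order)
    return [X[i] for i in order], [y[i] for i in order]


X, y = make_toy_data()
X_train, y_train, X_test, y_test = X[:60], y[:60], X[60:], y[60:]
ranking = rank_features(X_train, y_train)  # rank on training data only

# Evaluate the N-best subsets, mirroring the 50-best / 150-best comparison.
for k in (1, 3, 10, 20):
    top_k = ranking[:k]
    acc = one_nn_accuracy(select(X_train, top_k), y_train,
                          select(X_test, top_k), y_test)
    print(f"{k:2d}-best subset: accuracy = {acc:.2f}")
```

Ranking on the training portion only keeps the held-out vectors from influencing which parameters are chosen. Smaller subsets also reduce the per-utterance cost of both parameterization and classification, which is the trade-off the abstract refers to.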

Author information

Authors and Affiliations

Artificial Intelligence Group, Wire Communications Laboratory, Dept. of Electrical and Computer Engineering, University of Patras, 26500, Rion-Patras, Greece
Theodoros Theodorou, Iosif Mporas & Nikos Fakotakis
Dept. of Mechanical Engineering, Technological Educational Institute of Western Greece, 26334, Koukouli-Patras, Greece
Iosif Mporas

Editor information

Editors and Affiliations

Department of Computer Science, University of Ioannina, GR 45110, Ioannina, Greece
Aristidis Likas
Department of Computer Science, University of Ioannina, P.O. Box 1186, 45110, Ioannina, Greece
Konstantinos Blekas
Hellenic Open University, GR 26335, Peribola, Patras, Greece
Dimitris Kalles

About this paper

Cite this paper

Theodorou, T., Mporas, I., Fakotakis, N. (2014). Audio Feature Selection for Recognition of Non-linguistic Vocalization Sounds. In: Likas, A., Blekas, K., Kalles, D. (eds) Artificial Intelligence: Methods and Applications. SETN 2014. Lecture Notes in Computer Science, vol 8445. Springer, Cham. https://doi.org/10.1007/978-3-319-07064-3_32

DOI: https://doi.org/10.1007/978-3-319-07064-3_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07063-6
Online ISBN: 978-3-319-07064-3
eBook Packages: Computer Science, Computer Science (R0)
