Audio Feature Selection for Recognition of Non-linguistic Vocalization Sounds

  • Conference paper
Artificial Intelligence: Methods and Applications (SETN 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8445)

Abstract

Aiming at the automatic detection of non-linguistic sounds in vocalizations, we investigate the applicability of various subsets of audio features, formed by ranking the relevance and individual quality of a large set of audio descriptors. Specifically, based on this ranking, we selected subsets of features and evaluated them on the non-linguistic sound recognition task. During audio parameterization, every input utterance is converted to a single feature vector of 207 parameters. Next, a subset of this feature vector is fed to a classification model, which directly estimates the unknown sound class. The experimental evaluation showed that the feature vector composed of the 50 best-ranked parameters provides a good trade-off between computational demands and accuracy, and that the highest recognition accuracy is observed for the 150-best subset.
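
The rank-then-select procedure summarized in the abstract can be illustrated with a minimal Python sketch. The abstract does not name the ranking criterion or the classifier, so the sketch substitutes a Fisher-style score (between-class over within-class variance) and a nearest-centroid classifier as stand-ins. The data are synthetic, and every function name here is hypothetical rather than taken from the paper.

```python
import random
from statistics import mean, pvariance


def fisher_scores(X, y):
    """Score each feature by between-class over within-class variance.
    Assumption: a stand-in criterion, not the ranking used in the paper."""
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        overall = mean(column)
        between = within = 0.0
        for c in classes:
            values = [row[j] for row, label in zip(X, y) if label == c]
            between += len(values) * (mean(values) - overall) ** 2
            within += len(values) * pvariance(values)
        scores.append(between / within if within > 0 else 0.0)
    return scores


def top_k_indices(scores, k):
    """Return the indices of the k highest-scoring features, best first."""
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return order[:k]


def project(X, indices):
    """Keep only the selected feature columns of every vector."""
    return [[row[j] for j in indices] for row in X]


def nearest_centroid_accuracy(X_train, y_train, X_test, y_test):
    """Classify each test vector by its closest class mean (stand-in model)."""
    centroids = {}
    for c in set(y_train):
        rows = [r for r, label in zip(X_train, y_train) if label == c]
        centroids[c] = [mean(col) for col in zip(*rows)]
    correct = 0
    for row, label in zip(X_test, y_test):
        predicted = min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])),
        )
        correct += predicted == label
    return correct / len(y_test)


def evaluate_subsets(X_train, y_train, X_test, y_test, sizes):
    """Rank features on the training data, then score each top-k subset."""
    scores = fisher_scores(X_train, y_train)
    results = {}
    for k in sizes:
        idx = top_k_indices(scores, k)
        results[k] = nearest_centroid_accuracy(
            project(X_train, idx), y_train, project(X_test, idx), y_test
        )
    return results


def make_toy_data(n_per_class=40, n_features=20, n_informative=5, seed=0):
    """Synthetic two-class data: only the first n_informative features differ."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            for j in range(n_informative):
                row[j] += 2.0 * label
            X.append(row)
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    # Even-indexed samples train the model, odd-indexed samples test it.
    X_train, y_train = X[0::2], y[0::2]
    X_test, y_test = X[1::2], y[1::2]
    for k, acc in evaluate_subsets(X_train, y_train, X_test, y_test, [2, 5, 20]).items():
        print(f"top-{k} features: accuracy {acc:.2f}")
```

In the setting described above, the same loop would run over a 207-parameter vector with subset sizes such as 50 and 150, which is how a trade-off between subset size and accuracy can be read off.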



Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Theodorou, T., Mporas, I., Fakotakis, N. (2014). Audio Feature Selection for Recognition of Non-linguistic Vocalization Sounds. In: Likas, A., Blekas, K., Kalles, D. (eds) Artificial Intelligence: Methods and Applications. SETN 2014. Lecture Notes in Computer Science, vol 8445. Springer, Cham. https://doi.org/10.1007/978-3-319-07064-3_32

  • DOI: https://doi.org/10.1007/978-3-319-07064-3_32

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-07063-6

  • Online ISBN: 978-3-319-07064-3

  • eBook Packages: Computer Science, Computer Science (R0)
