Audio Feature Selection for Recognition of Non-linguistic Vocalization Sounds

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8445)


Aiming at the automatic detection of non-linguistic sounds in vocalizations, we investigate the applicability of various subsets of audio features, formed by ranking a large set of audio descriptors according to their relevance and individual quality. Based on this ranking, we selected subsets of descriptors and evaluated them on the non-linguistic sound recognition task. During audio parameterization, every input utterance is converted to a single feature vector of 207 parameters. A subset of this feature vector is then fed to a classification model, which directly estimates the unknown sound class. The experimental evaluation showed that the feature vector composed of the 50 best-ranked parameters provides a good trade-off between computational demands and recognition accuracy, and that the highest recognition accuracy is observed for the subset of the 150 best-ranked parameters.
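The rank-then-select procedure described above can be illustrated with a minimal sketch. Everything in it other than the 207-parameter vector length is an assumption made for illustration: the data are synthetic, the relevance score is a simple two-class Fisher-style ratio standing in for whatever ranking criterion the paper uses, and a nearest-centroid rule stands in for the paper's classification models. All function names are hypothetical.

```python
import random
import statistics

N_FEATURES = 207  # feature-vector length stated in the abstract


def make_toy_data(n_per_class=40, n_informative=20, seed=0):
    """Synthetic two-class data; only the first n_informative features differ."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(N_FEATURES)]
            if label == 1:
                for j in range(n_informative):
                    row[j] += 2.0  # shift informative features for class 1
            X.append(row)
            y.append(label)
    return X, y


def rank_features(X, y):
    """Return feature indices sorted from most to least relevant.

    Relevance is a Fisher-style score (squared mean difference over summed
    variances); this is a stand-in criterion, not the paper's ranking method.
    """
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 0]
        b = [row[j] for row, lab in zip(X, y) if lab == 1]
        spread = statistics.pvariance(a) + statistics.pvariance(b) + 1e-12
        scores.append((statistics.fmean(a) - statistics.fmean(b)) ** 2 / spread)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def select(X, idx):
    """Keep only the columns listed in idx."""
    return [[row[j] for j in idx] for row in X]


def fit_centroids(X, y):
    """Per-class mean vectors (nearest-centroid stand-in classifier)."""
    cents = {}
    for lab in set(y):
        rows = [r for r, l in zip(X, y) if l == lab]
        cents[lab] = [statistics.fmean(col) for col in zip(*rows)]
    return cents


def predict(cents, row):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    return min(cents, key=lambda lab: sum((a - b) ** 2 for a, b in zip(row, cents[lab])))


def accuracy_for_top_n(X_tr, y_tr, X_te, y_te, ranking, n):
    """Train and test using only the n best-ranked features."""
    idx = ranking[:n]
    cents = fit_centroids(select(X_tr, idx), y_tr)
    hits = sum(predict(cents, r) == lab for r, lab in zip(select(X_te, idx), y_te))
    return hits / len(y_te)


# Hold out 30% of the toy data; rank on the training part only.
X, y = make_toy_data()
order = list(range(len(y)))
random.Random(1).shuffle(order)
cut = int(0.7 * len(order))
X_tr, y_tr = [X[i] for i in order[:cut]], [y[i] for i in order[:cut]]
X_te, y_te = [X[i] for i in order[cut:]], [y[i] for i in order[cut:]]
ranking = rank_features(X_tr, y_tr)

for n in (10, 50, 150, N_FEATURES):
    print(n, accuracy_for_top_n(X_tr, y_tr, X_te, y_te, ranking, n))
```

The loop mirrors the kind of comparison the abstract reports (accuracy as a function of subset size); the numbers it prints describe the toy data only and say nothing about the paper's results.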


Non-linguistic vocalizations, sound recognition, audio features, classification algorithms





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  1. Artificial Intelligence Group, Wire Communications Laboratory, Dept. of Electrical and Computer Engineering, University of Patras, Rion-Patras, Greece
  2. Dept. of Mechanical Engineering, Technological Educational Institute of Western Greece, Koukouli-Patras, Greece
