Abstract
Automatic identification of activities can give caregivers of persons with dementia information for identifying assistance needs. Environmental audio carries significant and representative information about the context, which makes microphones a natural choice for identifying activities automatically. However, in real situations the audio captured by microphones comes from overlapping sound sources, which makes identification a challenge for audio analysis and retrieval. In this paper we propose a succinct representation of the signal: we measure the multiband spectral entropy of the signal frame by frame and then apply a cosine transform and a binary codification. We call this the Cosine Multi-Band Spectral Entropy Signature (CMBSES). To test our proposal, we created a database of mixtures of three sounds (triples) drawn from a collection of nine environmental sounds, at four different signal-to-noise ratios (SNR). We codified both the original sounds and the triples, and then searched for each original sound in the collection of mixtures. To establish a ground truth, we also ran the same test with 48 people of assorted ages. Our feature extraction outperforms the state-of-the-art Mel Frequency Cepstral Coefficients (MFCC), and it also surpasses humans in the experiment.
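The pipeline named in the abstract (per-frame multiband spectral entropy, a cosine transform, then a binary codification) can be illustrated with a minimal sketch. The abstract does not give the details, so everything concrete here is an assumption made for illustration: the frame length, hop size, number of bands, the equal-width band split, applying the cosine transform across bands within each frame, and encoding each bit as the sign of the frame-to-frame change. The function names are hypothetical and this is not the authors' implementation.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Slice a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def multiband_spectral_entropy(x, frame_len=1024, hop=512, n_bands=8):
    """Shannon entropy of the normalized power spectrum in each band, per frame.

    Assumption: bands are equal-width splits of the FFT bins; the paper may
    use a different (e.g. perceptual) band layout.
    """
    frames = frame_signal(x, frame_len, hop) * np.hanning(frame_len)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    entropy = np.empty((power.shape[0], n_bands))
    for b, band in enumerate(np.array_split(power, n_bands, axis=1)):
        # Treat the band's power distribution as a probability mass function.
        prob = band / (band.sum(axis=1, keepdims=True) + 1e-12)
        entropy[:, b] = -(prob * np.log2(prob + 1e-12)).sum(axis=1)
    return entropy

def dct_ii(v):
    """Unnormalized DCT-II along the last axis, via an explicit cosine basis."""
    n = v.shape[-1]
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * (m + 0.5) * k / n)
    return v @ basis.T

def cmbses_sketch(x, **kwargs):
    """Binary signature: 1 where a cosine coefficient rises between frames.

    Assumption: the sign-of-change rule stands in for the paper's
    binary codification, which the abstract does not specify.
    """
    coeffs = dct_ii(multiband_spectral_entropy(x, **kwargs))
    return (np.diff(coeffs, axis=0) > 0).astype(np.uint8)

def hamming_distance(a, b):
    """Number of differing bits between two equal-shape signatures."""
    return int(np.count_nonzero(a != b))
```

Under these assumptions, searching for an original sound in a mixture would amount to sliding the shorter signature along the longer one and keeping the offset with the smallest Hamming distance. A binary signature keeps that comparison cheap, which is one plausible reason to prefer it over comparing real-valued feature vectors.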
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Beltrán-Márquez, J., Chávez, E., Favela, J. (2012). Environmental Sound Recognition by Measuring Significant Changes in the Spectral Entropy. In: Carrasco-Ochoa, J.A., Martínez-Trinidad, J.F., Olvera López, J.A., Boyer, K.L. (eds) Pattern Recognition. MCPR 2012. Lecture Notes in Computer Science, vol 7329. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31149-9_34
DOI: https://doi.org/10.1007/978-3-642-31149-9_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31148-2
Online ISBN: 978-3-642-31149-9
eBook Packages: Computer Science (R0)