
International Journal of Speech Technology, Volume 21, Issue 4, pp 825–836

Large scale data based audio scene classification

  • E. Sophiya
  • S. Jothilakshmi

Abstract

Artificial intelligence and machine learning have been used by many research groups to process large-scale data, known as big data. Machine learning techniques for large-scale, complex datasets are computationally expensive. Spark MLlib, the machine learning library of the Apache Spark framework, is becoming a popular platform for big data analysis and is used for many machine learning problems such as classification, regression and clustering. In this work, a Deep Multilayer Perceptron (MLP) built on Apache Spark is proposed for audio scene classification. Log Mel band features are used to represent the characteristics of the input audio scenes. The parameters of the deep neural network (DNN) are set according to the DNN baseline of the DCASE 2017 challenge. The system is evaluated on the TUT dataset (2017), and the results are compared with the provided baseline.
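
The pipeline described in the abstract (log Mel band features feeding a multilayer perceptron) can be sketched in NumPy as below. This is not the authors' code. The 40 Mel bands, 40 ms frames with a 20 ms hop, Hamming window, HTK-style Mel formula, 5-frame context (giving 200 inputs) and the layer sizes [200, 50, 50, 15] are assumptions chosen for illustration, and the helper names (`log_mel_energies`, `stack_context`, `mlp_predict_proba`) are hypothetical. The forward pass uses sigmoid hidden units and a softmax output, which are the activations Spark MLlib documents for its `MultilayerPerceptronClassifier`.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style Mel formula (an assumption; Slaney-style scales also exist)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular Mel filters with shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    # n_mels + 2 edge frequencies, equally spaced on the Mel scale
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        left, center, right = edges[i], edges[i + 1], edges[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_energies(signal, sr=44100, frame_s=0.04, hop_s=0.02,
                     n_mels=40, n_fft=2048, eps=1e-10):
    """Log Mel band energies: one row of n_mels values per frame."""
    signal = np.asarray(signal, dtype=float)
    frame_len = int(round(frame_s * sr))
    hop = int(round(hop_s * sr))
    if len(signal) < frame_len or frame_len > n_fft:
        raise ValueError("signal shorter than one frame, or frame longer than n_fft")
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)  # window choice is an assumption
    fb = mel_filterbank(sr, n_fft, n_mels)
    feats = np.empty((n_frames, n_mels))
    for t in range(n_frames):
        frame = signal[t * hop:t * hop + frame_len] * window
        power = np.abs(np.fft.rfft(frame, n=n_fft)) ** 2  # zero-padded power spectrum
        feats[t] = np.log(fb @ power + eps)               # eps avoids log(0)
    return feats


def stack_context(feats, context=5):
    """Concatenate `context` consecutive frames into one flat input vector."""
    n = feats.shape[0] - context + 1
    if n < 1:
        raise ValueError("fewer frames than the context width")
    return np.stack([feats[t:t + context].ravel() for t in range(n)])


def mlp_predict_proba(x, weights, biases):
    """MLP forward pass: sigmoid hidden layers, softmax output layer."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = 1.0 / (1.0 + np.exp(-(h @ W + b)))
    logits = h @ weights[-1] + biases[-1]
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)


if __name__ == "__main__":
    sr = 44100
    tone = np.sin(2 * np.pi * 1000.0 * np.arange(sr) / sr)  # 1 s, 1 kHz test tone
    x = stack_context(log_mel_energies(tone, sr=sr))         # 5 frames x 40 bands = 200 inputs
    layers = [200, 50, 50, 15]                               # assumed topology
    rng = np.random.default_rng(0)
    weights = [rng.normal(0.0, 0.1, (a, b)) for a, b in zip(layers[:-1], layers[1:])]
    biases = [np.zeros(b) for b in layers[1:]]
    proba = mlp_predict_proba(x, weights, biases)            # untrained, random weights
    print(x.shape, proba.shape)
```

In Spark MLlib the same topology would be passed through the `layers` parameter of `pyspark.ml.classification.MultilayerPerceptronClassifier` (for example `layers=[200, 50, 50, 15]`, with 15 outputs matching the 15 scene classes of TUT Acoustic Scenes 2017), and training would be done by Spark rather than by hand. Because MLlib's classifier fixes sigmoid hidden activations and offers no dropout option, any baseline setting that relies on other activations or on dropout can only be approximated there.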

Keywords

Big data analytics, Machine learning, Apache Spark MLlib, Audio processing, Audio scene analysis, Audio scene classification, Deep learning, Audio features

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Computer Science and Engineering, Annamalai University, Chidambaram, India
  2. Department of Information Technology, Annamalai University, Chidambaram, India