Audio Classification

Sen, Soumya; Dutta, Anjan; Dey, Nilanjan

doi:10.1007/978-981-13-6098-5_4

Part of the book series: SpringerBriefs in Applied Sciences and Technology (BRIEFSINTELL)

Abstract

Classification is a supervised learning task. In supervised learning, a model learns from a training dataset in which each input is paired with its corresponding output. Decision rules are derived by observing the training dataset and are then used to determine the category, or class, of future inputs. Classification is thus the process of assigning an individual item or dataset to one of a number of existing categories or classes according to the characteristics, or features, of the input data.
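The process described above can be illustrated with a minimal sketch. It assumes a k-nearest-neighbour rule as the decision rule and a tiny, made-up training set; the feature values, the class labels "speech" and "music", and the function name `classify` are hypothetical and chosen only for illustration, not taken from the chapter.

```python
import math
from collections import Counter

# Hypothetical training set: (feature vector, class label) pairs.
# The two features stand in for audio descriptors (made-up values in [0, 1]).
TRAINING_SET = [
    ((0.10, 0.20), "speech"),
    ((0.15, 0.25), "speech"),
    ((0.20, 0.15), "speech"),
    ((0.80, 0.90), "music"),
    ((0.85, 0.80), "music"),
    ((0.90, 0.95), "music"),
]


def classify(features, training_set=TRAINING_SET, k=3):
    """Assign `features` to the majority class among its k nearest training items."""
    # Sort labelled examples by Euclidean distance to the new input, keep the k closest.
    nearest = sorted(training_set, key=lambda item: math.dist(features, item[0]))[:k]
    # Majority vote over the labels of those neighbours.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(classify((0.12, 0.22)))  # speech
print(classify((0.88, 0.85)))  # music
```

Here the labelled pairs play the role of the training dataset, and the majority vote among the nearest neighbours is one possible decision rule; other classifiers would replace only the body of `classify`.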

Author information

Authors and Affiliations

Soumya Sen: A.K. Choudhury School of Information Technology, University of Calcutta, Kolkata, West Bengal, India
Anjan Dutta: Department of Information Technology, Techno India College of Technology, Kolkata, West Bengal, India
Nilanjan Dey: Department of Information Technology, Techno India College of Technology, Kolkata, West Bengal, India

About this chapter

Cite this chapter

Sen, S., Dutta, A., Dey, N. (2019). Audio Classification. In: Audio Processing and Speech Recognition. SpringerBriefs in Applied Sciences and Technology. Springer, Singapore. https://doi.org/10.1007/978-981-13-6098-5_4

DOI: https://doi.org/10.1007/978-981-13-6098-5_4
Published: 31 January 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-6097-8
Online ISBN: 978-981-13-6098-5
eBook Packages: Engineering, Engineering (R0)
