Skip to main content

Audio Classification

  • Chapter
  • First Online:
Audio Processing and Speech Recognition

Part of the book series: SpringerBriefs in Applied Sciences and Technology ((BRIEFSINTELL))

Abstract

Classification falls under supervised learning. Supervised learning is a learning process from a given dataset or training dataset where both input and mapping output data are provided. The decision rules are designed by observing the training dataset to determine the category or class for future decision-making. Classification is the process of assigning an individual item or dataset to one of the number of existing categories or classes depending on the characteristics or features of the input data.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Retrieved September 26, 2018, from https://www.youtube.com/watch?v=4HKqjENq9OU.

  2. Retrieved October 22, 2018, from http://www.scholarpedia.org/article/K-nearest_neighbor.

  3. Retrieved October 22, 2018, from https://www.jstor.org/stable/1403796?seq=1#page_scan_tab_contents.

  4. Cover, T., & Hart, P. (1967). Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 13(1), 21–27.

    Article  Google Scholar 

  5. Hellman, M. E. (1970). The nearest neighbor classification rule with a reject option. IEEE Transactions on Systems Science and Cybernetics, 3, 179–185.

    Article  Google Scholar 

  6. Fukunaga, K., & Hostetler, L. (1975). k-nearest-neighbor bayes-risk estimation. IEEE Transactions on Information Theory, 21(3), 285–293.

    Article  MathSciNet  Google Scholar 

  7. Dudani, S. A. (1976). The distance-weighted k-nearest-neighbor rule. IEEE Transactions on Systems Science and Cybernetics, SMC-6:325–327.

    Article  Google Scholar 

  8. Bailey, T., & Jain, A. (1978). A note on distance-weighted k-nearest neighbor rules. IEEE Transactions on Systems, Man, Cybernetics, 8, 311–313.

    Google Scholar 

  9. Bermejo, S., & Cabestany, J. (2000). Adaptive soft k-nearest-neighbour classifiers. Pattern Recognition, 33, 1999–2005.

    Article  Google Scholar 

  10. Jozwik, A. (1983). A learning scheme for a fuzzy k-nn rule. Pattern Recognition Letters, 1, 287–289.

    Article  Google Scholar 

  11. Pao, T. L., Liao, W. Y., & Chen, Y. T. (2007). Audio-visual speech recognition with weighted KNN-based classification in mandarin database. In 2007 Third International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIHMSP 2007 (Vol. 1, pp. 39–42). IEEE.

    Google Scholar 

  12. Kacur, J., Vargic, R., & Mulinka, P. (2011). Speaker identification by K-nearest neighbors: Application of PCA and LDA prior to KNN. In 2011 18th International Conference on Systems, Signals and Image Processing (IWSSIP) (pp. 1–4). IEEE.

    Google Scholar 

  13. Feraru, M., & Zbancioc, M. (2013). Speech emotion recognition for SROL database using weighted KNN algorithm. In 2013 International Conference on Electronics, Computers and Artificial Intelligence (ECAI) (pp. 1–4). IEEE.

    Google Scholar 

  14. Rizwan, M., & Anderson, D. V. (2014). Using k-Nearest Neighbor and speaker ranking for phoneme prediction. In 2014 13th International Conference on Machine Learning and Applications (ICMLA) (pp. 383–387). IEEE.

    Google Scholar 

  15. Retrieved October 08, 2018, from http://www.statsoft.com/textbook/naive-bayes-classifier.

  16. Russell, S., & Norvig, P. (2003). Artificial intelligence: A modern approach (2nd ed.). Prentice Hall. ISBN 978-0137903955. [1995].

    Google Scholar 

  17. Fu, Z., Lu, G., Ting, K. M., & Zhang, D. (2010). Learning Naïve Bayes classifiers for music classification and retrieval. In 2010 20th International Conference on Pattern Recognition (ICPR) (pp. 4589–4592). IEEE.

    Google Scholar 

  18. Sanchis, A., Juan, A., & Vidal, E. (2012). A word-based Naïve Bayes classifier for confidence estimation in speech recognition. IEEE Transactions on Audio, Speech, and Language Processing, 20(2), 565–574.

    Google Scholar 

  19. Bhakre, S. K., & Bang, A. (2016). Emotion recognition on the basis of audio signal using Naïve Bayes classifier. In 2016 International Conference on Advances in Computing, Communications and Informatics (ICACCI) (pp. 2363–2367). IEEE.

    Google Scholar 

  20. Quinlan, J. R. (1986). Induction of decision trees. Machine learning, 1(1), 81–106.

    Google Scholar 

  21. Retrieved October 11, 2018, from https://www.youtube.com/watch?v=qDcl-FRnwSU.

  22. Navada, A., Ansari, A. N., Patil, S., & Sonkamble, B. A. (2011). Overview of use of decision tree algorithms in machine learning. In Control and system graduate research colloquium (icsgrc), 2011 IEEE (pp. 37–42). IEEE.

    Google Scholar 

  23. Akamine, M., & Ajmera, J. (2012). Decision tree-based acoustic models for speech recognition. EURASIP Journal on Audio, Speech, and Music Processing, 2012(1), 10.

    Article  Google Scholar 

  24. Telaar, D., & Fuhs, M. C. (2013). Accent-and speaker-specific polyphone decision trees for non-native speech recognition. In INTERSPEECH (pp. 3313–3316).

    Google Scholar 

  25. Hinton, G., et al. (2012). Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine, 29(6), 82–97.

    Article  Google Scholar 

  26. Mohamed, A. R., Dahl, G., & Hinton, G. (2009). Deep belief networks for phone recognition. In Nips workshop on deep learning for speech recognition and related applications (Vol. 1, No. 9, p. 39).

    Google Scholar 

  27. Mohamed, A. R., Dahl, G. E., & Hinton, G. (2012). Acoustic modeling using deep belief networks. IEEE Transactions on Audio, Speech & Language Processing, 20(1), 14–22.

    Article  Google Scholar 

  28. Jaitly, N., Nguyen, P., Senior, A., & Vanhoucke, V. (2012). Application of pretrained deep neural networks to large vocabulary speech recognition. In Thirteenth Annual Conference of the International Speech Communication Association.

    Google Scholar 

  29. Seide, F., Li, G., & Yu, D. (2011). Conversational speech transcription using context-dependent deep neural networks. In Twelfth Annual Conference of the International Speech Communication Association.

    Google Scholar 

  30. Dahl, G. E., Yu, D., Deng, L., & Acero, A. (2012). Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition. IEEE Transactions on Audio, Speech and Language Processing, 20(1), 30–42.

    Article  Google Scholar 

  31. Li, X., & Wu, X. (2014). Decision tree based state tying for speech recognition using DNN derived embeddings. In 2014 9th International Symposium on Chinese Spoken Language Processing (ISCSLP) (pp. 123–127). IEEE.

    Google Scholar 

  32. Bressan, G. M., de Azevedo, B. C., & ElisangelaAp, S. L. (2017). A decision tree approach for the musical genres classification. Applied Mathematics, 11(6), 1703–1713.

    Google Scholar 

  33. Wang, Y., Cao, L., Dey, N., Ashour, A. S., & Shi, F. (2017). Mice liver cirrhosis microscopic image analysis using gray level co-occurrence matrix and support vector machines. Frontiers in artificial intelligence and applications. In Proceedings of ITITS (pp. 509–515).

    Google Scholar 

  34. Zemmal, N., Azizi, N., Dey, N., & Sellami, M. (2016). Adaptive semi supervised support vector machine semi supervised learning with features cooperation for breast cancer classification. Journal of Medical Imaging and Health Informatics, 6(1), 53–62.

    Article  Google Scholar 

  35. Wang, C., et al. (2018). Histogram of oriented gradient based plantar pressure image feature extraction and classification employing fuzzy support vector machine. Journal of Medical Imaging and Health Informatics, 8(4), 842–854.

    Article  Google Scholar 

  36. Retrieved October 10, 2018, from https://www.kdnuggets.com/2016/07/support-vector-machines-simple-explanation.html.

  37. Kowalczyk, A. (2017). Support vector machines succinctly.

    Google Scholar 

  38. Padrell-Sendra, J., Martin-Iglesias, D., & Diaz-de-Maria, F. (2006, September). Support vector machines for continuous speech recognition. In 2006 14th European Signal Processing Conference (pp. 1–4). IEEE.

    Google Scholar 

  39. Dey, N., & Ashour, A. S. (2018). Challenges and future perspectives in speech-sources direction of arrival estimation and localization. In Direction of arrival estimation and localization of multi-speech sources (pp. 49–52). Springer, Cham.

    Google Scholar 

  40. Dey, N., & Ashour, A. S. (2018). Direction of arrival estimation and localization of multi-speech sources. Springer International Publishing.

    Google Scholar 

  41. Dey, N., & Ashour, A. S. (2018). Applied examples and applications of localization and tracking problem of multiple speech sources. In Direction of arrival estimation and localization of multi-speech sources (pp. 35–48). Springer, Cham.

    Google Scholar 

  42. Dey, N., & Ashour, A. S. (2018). Microphone array principles. In Direction of arrival estimation and localization of multi-speech sources (pp. 5–22). Springer, Cham.

    Google Scholar 

  43. Shen, P., Changjun, Z., & Chen, X. (2011). Automatic speech emotion recognition using support vector machine. In 2011 International Conference on Electronic and Mechanical Engineering and Information Technology (EMEIT) (Vol. 2, pp. 621–625). IEEE.

    Google Scholar 

  44. Mahmoodi, D., Marvi, H., Taghizadeh, M., Soleimani, A., Razzazi, F., & Mahmoodi, M. (2011). Age estimation based on speech features and support vector machine. In 2011 3rd Computer Science and Electronic Engineering Conference (CEEC), (pp. 60–64). IEEE.

    Google Scholar 

  45. Matoušek, J., & Tihelka, D. (2013). SVM-based detection of misannotated words in read speech corpora. In International Conference on Text, Speech and Dialogue (pp. 457–464). Springer, Heidelberg.

    Chapter  Google Scholar 

  46. Aida-zade, K., Xocayev, A., & Rustamov, S. (2016). Speech recognition using support vector machines. In 2016 IEEE 10th International Conference on Application of Information and Communication Technologies (AICT), (pp. 1–4). IEEE.

    Google Scholar 

  47. Shi, W., & Fan, X. (2017). Speech classification based on cuckoo algorithm and support vector machines. In 2017 2nd IEEE International Conference on Computational Intelligence and Applications (ICCIA) (pp. 98–102). IEEE.

    Google Scholar 

  48. Chan, M. V., Feng, X., Heinen, J. A., & Niederjohn, R. J. (1994). Classification of speech accents with neural networks. In 1994 IEEE International Conference on Neural Networks, IEEE World Congress on Computational Intelligence (Vol. 7, pp. 4483–4486). IEEE.

    Google Scholar 

  49. Kohonen, T. (2012). Self-organization and associative memory (Vol. 8). Springer Science & Business Media, New York.

    Google Scholar 

  50. Rumelhart, D. E., & McClelland, J. L. (1986). Parallel distributed processing: explorations in the microstructure of cognition. volume 1. foundations.

    Google Scholar 

  51. Hecht-Nielsen, R. (1990). Neurocomputing. Boston: Addison-Wesley.

    Google Scholar 

  52. Hansen, J. H., & Womack, B. D. (1996). Feature analysis and neural network-based classification of speech under stress. IEEE Transactions on Speech and Audio Processing, 4(4), 307–313.

    Article  Google Scholar 

  53. Polur, P. D., Zhou, R., Yang, J., Adnani, F., & Hobson, R. S. (2001). Isolated speech recognition using artificial neural networks. Virginia Commonwealth Univ Richmond School of Engineering.

    Google Scholar 

  54. Shao, C., & Bouchard, M. (2003). Efficient classification of noisy speech using neural networks. In 2003 Proceedings of Seventh International Symposium on Signal Processing and Its Applications (Vol. 1, pp. 357–360). IEEE.

    Google Scholar 

  55. Alexandre, E., Cuadra, L., Rosa-Zurera, M., & López-Ferreras, F. (2008). Speech/non-speech classification in hearing aids driven by tailored neural networks. In Speech, Audio, Image and Biomedical Signal Processing using Neural Networks (pp. 145–167). Springer, Heidelberg.

    Google Scholar 

  56. Hinton, G., et al. (2012). Deep neural networks for acoustic modeling in speech recognition. Signal Processing Magazine, 29(6), 82–97. IEEE.

    Google Scholar 

  57. Wang, Y., et al. (2018). Classification of mice hepatic granuloma microscopic images based on a deep convolutional neural network. Applied Soft Computing.

    Google Scholar 

  58. Lan, K., Wang, D. T., Fong, S., Liu, L. S., Wong, K. K., & Dey, N. (2018). A survey of data mining and deep learning in bioinformatics. Journal of Medical Systems, 42(8), 139.

    Article  Google Scholar 

  59. Hu, S., Liu, M., Fong, S., Song, W., Dey, N., & Wong, R. (2018). Forecasting China future MNP by deep learning. In Behavior engineering and applications (pp. 169–210). Springer, Cham.

    Google Scholar 

  60. Dey, N., Fong, S., Song, W., & Cho, K. (2017). Forecasting energy consumption from smart home sensor network by deep learning. In International Conference on Smart Trends for Information Technology and Computer Communications (pp. 255–265). Springer, Singapore.

    Google Scholar 

  61. Dey, N., Ashour, A. S., & Nguyen, G. N. Recent advancement in multimedia content using deep learning.

    Google Scholar 

  62. Mohamed, A., Dahl, G.E., & Hinton, G. (2012). Acoustic modeling using deep belief networks. IEEE Transactions on Audio, Speech, and Language Processing, 20(1), 14–22.

    Article  Google Scholar 

  63. Richardson, F., Reynolds, D., & Dehak, N. (2015). Deep neural network approaches to speaker and language recognition. IEEE Signal Processing Letters, 22(10), 1671–1675.

    Article  Google Scholar 

  64. Rajanna, A. R., Aryafar, K., Shokoufandeh, A., &Ptucha, R. (2015). Deep neural networks: A case study for music genre classification. In 2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA) (pp. 655–660). IEEE.

    Google Scholar 

  65. Dumpala, S. H., & Kopparapu, S. K. (2017). Improved speaker recognition system for stressed speech using deep neural networks. In 2017 International Joint Conference on Neural Networks (IJCNN) (pp. 1257–1264). IEEE.

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Rights and permissions

Reprints and permissions

Copyright information

© 2019 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter

Check for updates. Verify currency and authenticity via CrossMark

Cite this chapter

Sen, S., Dutta, A., Dey, N. (2019). Audio Classification. In: Audio Processing and Speech Recognition. SpringerBriefs in Applied Sciences and Technology(). Springer, Singapore. https://doi.org/10.1007/978-981-13-6098-5_4

Download citation

  • DOI: https://doi.org/10.1007/978-981-13-6098-5_4

  • Published:

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-6097-8

  • Online ISBN: 978-981-13-6098-5

  • eBook Packages: EngineeringEngineering (R0)

Publish with us

Policies and ethics