Research on sound classification based on SVM

  • Pengcheng Wei
  • Fangcheng He
  • Li Li
  • Jing Li
Deep Learning for Big Data Analytics


Sound is a ubiquitous natural phenomenon that carries a wealth of information and continually enriches our understanding of the objective world. With the continuous development of computer network and communication technology, audio has become a very important information medium. Audio is a non-semantic symbolic representation and an unstructured binary stream. Because audio lacks both a description of its content semantics and a structured organization, classifying it is very difficult. As the number of digital audio resources on the network grows, research on digital audio classification becomes increasingly important. Digital audio classification is the key technology for addressing this difficulty: it is central to recovering audio structure and to extracting structured information and content semantics from audio. It is a research hot spot in the field of audio analysis and has important application value in many fields, such as audio retrieval, video summarization and auxiliary video analysis. This paper studies the structure of audio, the analysis and extraction of audio features, a digital audio classifier based on support vector machines (SVM) and an audio segmentation technique based on the Bayesian information criterion (BIC). SVM is an important achievement of machine learning research in recent years. As a new machine learning method, SVM can handle practical problems involving small samples, nonlinearity and high dimensionality, so it has become a research hot spot following the study of neural networks. Experiments show that the SVM-based audio classification algorithm classifies well and that the smoothed audio segmentation results are more accurate. With further development, the research results are expected to be well applied in practice.
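The idea of an SVM-based audio classifier summarized above can be illustrated with a small sketch. It is not the method used in this paper. The following choices are all assumptions made only for illustration: the two frame-level features (zero-crossing rate and short-time energy), the synthetic tone and noise frames, the linear kernel, and the batch sub-gradient training loop. All function names are hypothetical.

```python
import math
import random

def frame_features(frame):
    """Return (zero-crossing rate, short-time energy) for one frame."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    zcr = crossings / (len(frame) - 1)
    energy = sum(s * s for s in frame) / len(frame)
    return (zcr, energy)

def make_frame(kind, rng, n=256, rate=8000):
    """Synthetic stand-in data (assumption): a low-pitched tone or white noise."""
    if kind == "tone":
        freq = rng.uniform(150.0, 300.0)
        phase = rng.uniform(0.0, 2.0 * math.pi)
        return [math.sin(2.0 * math.pi * freq * t / rate + phase)
                for t in range(n)]
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]

def make_dataset(count, rng):
    """Build (features, label) pairs: -1 for tone, +1 for noise."""
    data = []
    for _ in range(count):
        data.append((frame_features(make_frame("tone", rng)), -1))
        data.append((frame_features(make_frame("noise", rng)), +1))
    return data

def train_linear_svm(data, lam=0.001, lr=0.1, epochs=500):
    """Minimize lam/2*|w|^2 + mean hinge loss by batch sub-gradient descent."""
    w = [0.0, 0.0]
    b = 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = [lam * w[0], lam * w[1]]
        grad_b = 0.0
        for x, y in data:
            margin = y * (w[0] * x[0] + w[1] * x[1] + b)
            if margin < 1.0:  # inside the margin or misclassified
                grad_w[0] -= y * x[0] / n
                grad_w[1] -= y * x[1] / n
                grad_b -= y / n
        w = [w[0] - lr * grad_w[0], w[1] - lr * grad_w[1]]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Classify a feature vector by the sign of the decision function."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b >= 0 else -1

rng = random.Random(0)          # fixed seed so the run is repeatable
train = make_dataset(20, rng)   # 40 labelled training frames
test = make_dataset(20, rng)    # 40 held-out frames
w, b = train_linear_svm(train)
accuracy = sum(predict(w, b, x) == y for x, y in test) / len(test)
print(f"test accuracy: {accuracy:.2f}")
```

A practical system would replace the synthetic frames with features extracted from real recordings and would typically use a kernel SVM from an established library rather than this hand-written linear trainer.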


Support vector machine; Audio segmentation; Audio classification; Audio signal preprocessing



This work was supported by the Chongqing Big Data Engineering Laboratory for Children, the Chongqing Electronics Engineering Technology Research Center for Interactive Learning, the Science and Technology Research Project of the Chongqing Municipal Education Commission of China (No. KJ1601401), the Science and Technology Research Project of Chongqing University of Education (No. KY201725C), the Basic Research and Frontier Exploration program of the Chongqing Science and Technology Commission (No. CSTC2014jcyjA40019), and the Science and Technology Research Program of the Chongqing Education Commission of China (No. KJZD-K201801601).



Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2019

Authors and Affiliations

  1. School of Mathematics and Information Engineering, Chongqing University of Education at Nanshan, Chongqing, China
  2. College of Foreign Languages Literature, Chongqing University of Education at Nanshan, Chongqing, China
