Segregation of Speech and Songs - A Precursor to Audio Interactive Applications

Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 836)


Audio interactive applications, ranging from speech recognition to song identification, have eased our lives in numerous ways. Such applications have helped ordinary users adopt Information Technology by letting them bypass complicated user interaction procedures. Audio based search applications have become very popular, especially for searching songs. A system that can distinguish between speech and songs can boost the performance of such applications by reducing the search space and by selecting the recognition method according to the type of audio. It can also help in music-speech separation from audio for karaoke development. In this paper, a system to segregate songs and speech is proposed using Line Spectral Pair based features. The system was tested on a database of 19374 clips, and a highest accuracy of 99.88% was obtained with Ensemble Learning based classification.
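
The abstract does not spell out how the Line Spectral Pair features are computed, so the following is only a minimal sketch of the generic textbook conversion from linear prediction (LPC) coefficients to line spectral frequencies, not the authors' implementation. The function name `lpc_to_lsf` and the parameters `grid` and `iters` are hypothetical. The sketch assumes a minimum-phase LPC polynomial A(z) = 1 + a1 z^-1 + ... + ap z^-p and that no two roots of the same polynomial fall inside one grid cell. It forms the symmetric and antisymmetric polynomials P(z) and Q(z), evaluates each as a real function on the unit circle, and locates the zeros in (0, π) by sign change and bisection.

```python
import math

def lpc_to_lsf(a, grid=1024, iters=60):
    """Convert LPC coefficients [1, a1, ..., ap] to p line spectral
    frequencies in radians, sorted in ascending order.

    Assumes A(z) is minimum phase, so the roots of P and Q lie on the
    unit circle, and that the grid is fine enough to separate them.
    """
    p = len(a) - 1
    n = p + 1                                   # degree of P(z) and Q(z)
    ext = list(a) + [0.0]                       # A(z) padded to degree p + 1
    rev = ext[::-1]                             # z^-(p+1) * A(1/z)
    sym = [x + y for x, y in zip(ext, rev)]     # P(z), palindromic
    anti = [x - y for x, y in zip(ext, rev)]    # Q(z), antipalindromic

    # Multiplying by e^{j*w*n/2} makes P real and Q purely imaginary on the
    # unit circle, so each reduces to a real-valued function of w.
    def f_sym(w):
        return sum(c * math.cos(w * (n / 2 - k)) for k, c in enumerate(sym))

    def f_anti(w):
        return sum(c * math.sin(w * (n / 2 - k)) for k, c in enumerate(anti))

    def zeros(f):
        found = []
        # Grid points lie strictly inside (0, pi), which skips the trivial
        # roots at w = 0 and w = pi.
        pts = [math.pi * (i + 0.5) / grid for i in range(grid)]
        for lo, hi in zip(pts, pts[1:]):
            neg_lo = f(lo) < 0
            if neg_lo == (f(hi) < 0):
                continue                        # no sign change in this cell
            for _ in range(iters):              # bisection refinement
                mid = 0.5 * (lo + hi)
                if (f(mid) < 0) == neg_lo:
                    lo = mid
                else:
                    hi = mid
            found.append(0.5 * (lo + hi))
        return found

    return sorted(zeros(f_sym) + zeros(f_anti))

# Illustrative second-order predictor (coefficients chosen for the example,
# not taken from the paper).
lsf = lpc_to_lsf([1.0, -0.9, 0.5])
print(lsf)
```

For this illustrative predictor, P(z) and Q(z) factor by hand into (1 + z^-1)(1 - 1.4 z^-1 + z^-2) and (1 - z^-1)(1 - 0.4 z^-1 + z^-2), so the two frequencies should match acos(0.7) and acos(0.2). In a frame-based pipeline, such a vector would be computed per frame and passed to the classifier.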


Keywords

Speech recognition, Audio based searching, Line Spectral Pair, Framing, Ensemble Learning



Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

  1. Department of Computer Science, West Bengal State University, Kolkata, India
  2. Department of Computer Science and Engineering, Maulana Abul Kalam Azad University of Technology, Kolkata, India
