Text-Dependent Speaker Recognition System Using Symbolic Modelling of Voiceprint

  • Shanmukhappa A. Angadi
  • Sanjeevakumar M. Hatture
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 801)


A speaker recognition system automatically identifies a speaker from combined behavioral and physiological characteristics. This paper presents a symbolic inference system for text-dependent speaker recognition that exploits the physiological characteristics embedded in the user's utterance. The inter-lexical pause position, complementary spectral features (spectral centroid, spectral entropy and spectral flatness), loudness, pitch and formant features are extracted from the voiceprint, and a symbolic data object is constructed from them. These features are chosen for the following reasons. The inter-lexical pause position reflects the articulation capability of the user's vocal tract. The spectral characteristics model the functional properties of the human ear, and the loudness feature captures the strength of the ear's perception. Pitch estimates the relation between the physical and perceptual properties of sound, whereas formants capture the acoustic resonance of the human vocal tract. The variability in the features across a speaker's utterances of the words is represented as symbolic data. Speaker identification is performed using the span, content and position symbolic similarity measures [3], modified for the current work. The proposed method is evaluated on 100 users from the voice corpus of the VTU-BEC-DB multimodal biometric database and achieves an overall identification rate of 90.56%.
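As an illustration of the three complementary spectral features named in the abstract, the following is a minimal sketch using their common textbook definitions: centroid as the power-weighted mean frequency, entropy as the Shannon entropy of the normalised spectrum, and flatness as the ratio of geometric to arithmetic mean. The abstract does not state the exact formulas, frame length or sampling rate used by the authors, so the function names, the naive DFT helper and the parameters below are assumptions made only for this example.

```python
import cmath
import math


def power_spectrum(frame):
    """Naive DFT power spectrum (bins 0..N/2) of a real-valued frame.

    Assumption: a plain O(N^2) DFT keeps the sketch dependency-free;
    a real system would use an FFT library.
    """
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def spectral_centroid(power, sample_rate, n_fft):
    """Power-weighted mean frequency in Hz."""
    total = sum(power)
    if total == 0:
        return 0.0
    return sum((k * sample_rate / n_fft) * p
               for k, p in enumerate(power)) / total


def spectral_entropy(power):
    """Shannon entropy (bits) of the spectrum normalised to sum to 1."""
    total = sum(power)
    if total == 0:
        return 0.0
    return -sum((p / total) * math.log2(p / total) for p in power if p > 0)


def spectral_flatness(power, eps=1e-12):
    """Geometric mean divided by arithmetic mean of the power spectrum."""
    n = len(power)
    arith = sum(power) / n
    if arith == 0:
        return 0.0
    geo = math.exp(sum(math.log(p + eps) for p in power) / n)
    return geo / arith


# Hypothetical usage: one 64-sample frame of a 1 kHz tone sampled at 8 kHz.
frame = [math.sin(2 * math.pi * 1000 * t / 8000) for t in range(64)]
power = power_spectrum(frame)
features = {
    "centroid_hz": spectral_centroid(power, 8000, 64),
    "entropy_bits": spectral_entropy(power),
    "flatness": spectral_flatness(power),
}
```

For a pure tone the centroid sits at the tone frequency while entropy and flatness are close to zero; a noise-like frame pushes both of the latter upward, which is why the three measures are regarded as complementary.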


Speaker identification; Symbolic object; Voice biometric; Complementary spectral features; Symbolic similarity measure
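To make the "symbolic object" and "symbolic similarity measure" keywords concrete, the sketch below represents each speaker as a set of interval-valued features and scores a probe against enrolled speakers with span, content and position components. The component formulas are a simplified variant in the style of the measures cited as [3]; they are an assumption, not the modified measures used in the paper, which the abstract does not specify. The feature name `pitch_hz`, the speaker labels and all numeric values are invented for illustration.

```python
def interval_similarity(a, b, domain_length):
    """Similarity of two intervals (lo, hi) as span + content + position.

    Assumed formulas (each component lies in [0, 1]):
      span     = (len(a) + len(b)) / (2 * joint span)
      content  = overlap / joint span
      position = 1 - |a.lo - b.lo| / domain_length
    """
    a_lo, a_hi = a
    b_lo, b_hi = b
    joint_span = max(a_hi, b_hi) - min(a_lo, b_lo)
    overlap = max(0.0, min(a_hi, b_hi) - max(a_lo, b_lo))
    if joint_span == 0:
        # Both intervals collapse to the same single point.
        s_span = s_content = 1.0
    else:
        s_span = ((a_hi - a_lo) + (b_hi - b_lo)) / (2.0 * joint_span)
        s_content = overlap / joint_span
    s_position = 1.0 - abs(a_lo - b_lo) / domain_length
    return s_span + s_content + s_position


def symbolic_similarity(obj_a, obj_b, domain_lengths):
    """Sum the per-feature interval similarities of two symbolic objects."""
    return sum(interval_similarity(obj_a[k], obj_b[k], domain_lengths[k])
               for k in obj_a)


def identify(probe, enrolled, domain_lengths):
    """Return the enrolled speaker whose symbolic object is most similar."""
    return max(enrolled,
               key=lambda name: symbolic_similarity(probe, enrolled[name],
                                                    domain_lengths))


# Hypothetical enrolment data: the min-max range of each feature over a
# speaker's training utterances forms that speaker's symbolic object.
enrolled = {
    "speaker_a": {"pitch_hz": (110.0, 140.0)},
    "speaker_b": {"pitch_hz": (190.0, 230.0)},
}
domain_lengths = {"pitch_hz": 400.0}  # assumed overall feature range
probe = {"pitch_hz": (115.0, 135.0)}
best_match = identify(probe, enrolled, domain_lengths)
```

Representing a feature as an interval rather than a single value is what lets the symbolic object absorb the utterance-to-utterance variability that the abstract mentions.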


  1. Faundez-Zanuy, M., Monte-Moreno, E.: State-of-the-art in speaker recognition. IEEE Aerosp. Electron. Syst. Mag. 20(5), 7–12 (2005)
  2. Angadi, S.A., Kagawade, V.C.: A robust face recognition approach through symbolic modeling of polar FFT features. Pattern Recogn. 71, 235–248 (2017)
  3. Gowda, C.K.: Symbolic objects and symbolic classification. In: International Conference on Symbolic and Spatial Data Analysis: Mining Complex Data Structures, pp. 1–18 (2004)
  4. Nagabhushan, P., Angadi, S.A., Anami, B.S.: Symbolic data structure for postal address representation and address validation through symbolic knowledge base. In: Pal, Sankar K., Bandyopadhyay, S., Biswas, S. (eds.) PReMI 2005. LNCS, vol. 3776, pp. 388–394. Springer, Heidelberg (2005)
  5. Jourlin, P., Luettin, J., Genoud, D., Wassner, H.: Integrating acoustic and labial information for speaker identification and verification. In: Fifth European Conference on Speech Communication Technology, pp. 1603–1606 (1997)
  6. Rabiul, I.M., Fayzur, R.M.: Improvement of text dependent speaker identification system using neuro-genetic hybrid algorithm in office environmental conditions. Int. J. Comput. Sci. Issues 1, 42–47 (2009)
  7. Dash, K., Padhi, D., Panda, B., Mohanty, S.: Speaker identification using mel frequency cepstral coefficient and BPNN. Int. J. Adv. Res. Comput. Sci. Softw. Eng. 2(4), 326–332 (2012)
  8. Erzin, E., Yemez, Y., Tekalp, A.M.: Multimodal speaker identification using an adaptive classifier cascade based on modality reliability. IEEE Trans. Multimed. 7(5), 840–852 (2005)
  9. Aladwan, A.A., Shamroukh, R.M., Aladwan, A.: A novel study of biometric speaker identification using neural networks and multi-level wavelet decomposition. World Comput. Sci. Inf. Technol. J. 2(2), 68–73 (2012)
  10. Li, Z., Gao, Y.: Acoustic feature extraction method for robust speaker identification. Multimed. Tools Appl. 75(12), 7391–7406 (2016)
  11. Stafylakis, T., Jahangir, A.M., Kenny, P.: Text-dependent speaker recognition with random digit strings. IEEE/ACM Trans. Audio Speech Lang. Process. 24(7), 1194–1203 (2016)
  12. Zhang, S-X., Chen, Z., Zhao, Y., Li, J., Gong, Y.: End-to-end attention based text-dependent speaker verification. In: IEEE Workshop on Spoken Language Technology, pp. 171–178 (2017)
  13. Büyük, O.: Sentence-HMM state-based i-vector/PLDA modelling for improved performance in text dependent single utterance speaker verification. IET Sig. Process. 10(8), 918–923 (2016)
  14. You, H., Li, W., Li, L., Zhu, J.: Lexicon-based local representation for text-dependent speaker verification. IEICE Trans. Inf. Syst. E100–D(3), 587–589 (2017)
  15. Sun, H., Kong, A.L., Ma, B.: A new study of GMM-SVM system for text-dependent speaker recognition. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4195–4199 (2015)
  16. Jang, R., Shing, J.: End point detection, pp. 1–23. MIR lab, CSIE department national Taiwan university, Taiwan (2015)
  17. Zellner, B.: Pauses and the temporal structure of speech. In: Fundamentals of Speech Synthesis and Speech Recognition, pp. 41–62 (1994)
  18. Misra, H., Shajith, I., Bourlard, H., Hermansky, H.: Spectral entropy based feature for robust ASR. In: IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 1–6 (2004)
  19. Paliwal, K.K.: Spectral subband centroid features for speech recognition. In: IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 2, pp. 617–620 (1998)
  20. Tzanetakis, G., Cook, P.R.: Musical genre classification of audio signals. IEEE Trans. Speech Audio Process. 10, 293–302 (2002)
  21. Rouat, J., Liu, Y.C., Morissette, D.: A pitch determination and voiced/unvoiced decision algorithm for noisy speech. Speech Commun. 21, 191–207 (1997)
  22. Esben, S., Soren, H.N.: Evaluation of different loudness models with music and speech material. In: Audio Engineering Society 117 Convention, pp. 1–34 (2004)
  23. Zwicker, E., Fastl, H., Widmann, U., Kurakata, K., Kuwano, S.N.: Program for calculating loudness according to DIN 45631 (ISO 532B). J. Acoust. Soc. Jpn. 12, 39–42 (1991)
  24. Roy, S.C., Fausto, M.: Formant location from LPC analysis data. IEEE Trans. Speech Audio Process. 1(2), 129–134 (1993)
  25. Rabiner, L.R., Schafer, R.W.: Introduction to digital speech processing. Found. Trends Sig. Process. 1(1–2), 1–194 (2007)

Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

  1. Visvesvaraya Technological University, Belagavi, India
  2. Basaveshwar Engineering College, Bagalkot, India
