Application of Proposed Phoneme Segmentation Technique for Speaker Identification

  • Mousmita Sarma
  • Kandarpa Kumar Sarma
Part of the Studies in Computational Intelligence book series (SCI, volume 550)


This chapter presents a neural model for speaker identification that uses speaker-specific information extracted from vowel sounds. The vowels are segmented out of words spoken by the speaker to be identified. Vowel sounds occur more frequently in speech, and with higher energy, than other sounds. Therefore, in situations where the acoustic information is corrupted by noise, vowel sounds can still be used to extract speaker-discriminative information. The model uses a neural framework built from a probabilistic neural network (PNN) and learning vector quantization (LVQ), with the vowels obtained by the proposed self-organizing map (SOM)-based segmentation technique. The work first extracts glottal source information of the speakers using the linear prediction (LP) residual. Later, empirical-mode decomposition (EMD) of the speech signal is performed to extract the residual. From these residual features, an LVQ-based speaker codebook is formed. The work thereby shows that the residual signal obtained from EMD of speech can serve as a speaker-discriminative feature. The neural approach to speaker identification gives superior performance in comparison with conventional statistical approaches found in the literature, such as hidden Markov models (HMMs) and Gaussian mixture models (GMMs). Although the proposed model has been tested only on speakers of the Assamese language, it should also be suitable for other Indian languages, provided the speaker database contains samples of the language in question.
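To make the LP residual step concrete, the following is a minimal sketch of how a residual can be computed from one vowel frame. It is not the authors' implementation: it assumes the autocorrelation method with the Levinson-Durbin recursion, and the function names `lp_coefficients` and `lp_residual`, the default order of 10, and the use of NumPy are illustrative choices that do not come from the chapter.

```python
import numpy as np


def lp_coefficients(frame, order):
    """Estimate LP coefficients a[0..order] (with a[0] = 1) for one frame.

    Assumption: autocorrelation method solved by Levinson-Durbin, so that
    the prediction error is e[n] = sum_k a[k] * s[n - k].
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])

    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or degenerate frame: stop the recursion early
        # Reflection coefficient for model order i.
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a


def lp_residual(frame, order=10):
    """Inverse-filter the frame with A(z) to obtain the LP residual."""
    frame = np.asarray(frame, dtype=float)
    a = lp_coefficients(frame, order)
    # FIR filtering with A(z), truncated to the frame length.
    return np.convolve(frame, a)[:len(frame)]
```

In a pipeline of the kind the abstract describes, `lp_residual` would be applied to each segmented vowel frame, and features derived from the residual would then be passed to the codebook stage. The frame length, windowing, and LP order are left open here because the abstract does not state them.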


Keywords: Speaker Identification, ANN, Codebook



Copyright information

© Springer India 2014

Authors and Affiliations

  1. Department of Electronics and Communication Engineering, Gauhati University, Guwahati, India
  2. Department of Electronics and Communication Technology, Gauhati University, Guwahati, India
