Application of Proposed Phoneme Segmentation Technique for Speaker Identification

Sarma, Mousmita; Sarma, Kandarpa Kumar

doi:10.1007/978-81-322-1862-3_9

Part of the book series: Studies in Computational Intelligence (SCI, volume 550)

Abstract

This chapter presents a neural model for speaker identification that uses speaker-specific information extracted from vowel sounds. The vowel sound is segmented out from words spoken by the speaker to be identified. Vowel sounds occur more frequently in speech than other sounds and carry higher energy. Therefore, in situations where the acoustic information is corrupted by noise, vowel sounds can still be used to extract speaker-discriminative information. The model described here uses a neural framework built from a probabilistic neural network (PNN) and learning vector quantization (LVQ), in which the proposed self-organizing map (SOM)-based vowel segmentation technique is used. The work first extracts glottal source information of the speakers using the linear prediction (LP) residual. Later, empirical mode decomposition (EMD) of the speech signal is performed to extract the residual. Based on these residual features, an LVQ-based speaker codebook is formed. The work shows that the residual signal obtained from EMD of speech can serve as a speaker-discriminative feature. The neural approach to speaker identification gives superior performance in comparison with conventional statistical approaches found in the literature, such as hidden Markov models (HMMs) and Gaussian mixture models (GMMs). Although the proposed model has been tested on speakers of the Assamese language, it should also be suitable for other Indian languages, provided that the speaker database contains samples of the language in question.
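The LP residual mentioned in the abstract can be illustrated with a short sketch. The code below is not the chapter's implementation; it assumes the standard autocorrelation method with the Levinson-Durbin recursion, and the function names (`autocorrelation`, `levinson_durbin`, `lp_residual`) and the default prediction order of 10 are hypothetical choices made only for illustration. The residual is the part of each sample that the linear predictor cannot explain, which is why it is commonly treated as an approximation of the excitation (glottal source) signal.

```python
def autocorrelation(x, max_lag):
    """Return autocorrelation values r[0..max_lag] of the sequence x."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations by the Levinson-Durbin recursion.

    Returns (a, err), where a[k-1] is the predictor coefficient for lag k,
    so that s[n] is approximated by sum_k a[k-1] * s[n-k], and err is the
    final prediction error energy.
    """
    a = [0.0] * (order + 1)  # a[0] is unused; a[k] belongs to lag k
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or degenerate frame: stop early
            break
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient for this stage
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


def lp_residual(frame, order=10):
    """Return the LP residual e[n] = s[n] - sum_k a[k-1] * s[n-k].

    The order of 10 is an assumed default, not a value from the chapter.
    """
    a, _ = levinson_durbin(autocorrelation(frame, order), order)
    residual = []
    for n in range(len(frame)):
        # Samples before the start of the frame are treated as zero.
        pred = sum(a[k - 1] * frame[n - k] for k in range(1, order + 1) if n - k >= 0)
        residual.append(frame[n] - pred)
    return residual
```

In practice the function would be applied to short, windowed frames of a segmented vowel, for example `lp_residual(vowel_frame, order=10)`, where `vowel_frame` is a list of samples. Frame length, windowing, and prediction order are left open here because the abstract does not specify them.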

Author information

Authors and Affiliations

Department of Electronics and Communication Engineering, Gauhati University, Guwahati, Assam, India
Mousmita Sarma
Department of Electronics and Communication Technology, Gauhati University, Guwahati, Assam, India
Kandarpa Kumar Sarma

Corresponding author

Correspondence to Mousmita Sarma.

About this chapter

Cite this chapter

Sarma, M., Sarma, K.K. (2014). Application of Proposed Phoneme Segmentation Technique for Speaker Identification. In: Phoneme-Based Speech Segmentation using Hybrid Soft Computing Framework. Studies in Computational Intelligence, vol 550. Springer, New Delhi. https://doi.org/10.1007/978-81-322-1862-3_9

DOI: https://doi.org/10.1007/978-81-322-1862-3_9
Published: 05 April 2014
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-322-1861-6
Online ISBN: 978-81-322-1862-3
eBook Packages: Engineering, Engineering (R0)
