Automatic Speaker Recognition Using Multi-Directional Local Features (MDLF)

Mahmood, Awais; Alsulaiman, Mansour; Muhammad, Ghulam

doi:10.1007/s13369-014-1048-0

Research Article - Computer Engineering and Computer Science
Published: 09 April 2014

Arabian Journal for Science and Engineering, Volume 39, pages 3799–3811 (2014)

Abstract

A new feature called the multi-directional local feature (MDLF) is proposed and applied to automatic speaker recognition. To extract the MDLF, a windowed speech signal is processed by a fast Fourier transform and passed through a bank of 24 mel-scaled filters, followed by log compression. A three-point linear regression is then applied in four directions: horizontal (time axis), vertical (frequency axis), 45° (time–frequency) and 135° (time–frequency). MDLF captures the characteristics of the speaker in the time spectrum, which results in better performance. In the experiments, Gaussian mixture models (GMMs) with different numbers of mixtures are used as the classifier. Experimental results show that the proposed MDLF achieves better recognition accuracy than traditional mel-frequency cepstral coefficient (MFCC) features. The MDLF achieves excellent results in both text-dependent and text-independent speaker recognition, and in both Arabic and English speech. The proposed technique is also language independent.
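The extraction steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract fixes only the FFT, the 24 mel-scaled filters, the log compression, and the three-point regression in four directions. The frame length, hop size, FFT size, Hamming window, triangular filter shape, the unit step used for the diagonal fits, the orientation chosen for 135°, and all function names are assumptions made here for illustration. For three equally spaced points, the least-squares slope reduces to half the difference between the two outer points.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular mel-spaced filters, shape (n_filters, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    bank = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        lo, mid, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

def log_mel_spectrogram(signal, sample_rate, frame_len=400, hop=160,
                        n_fft=512, n_filters=24):
    """Window -> FFT -> 24 mel filters -> log. Returns (frames, filters).

    frame_len, hop, n_fft and the Hamming window are assumed values.
    """
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    mel_energy = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    return np.log(mel_energy + 1e-10)  # small floor avoids log(0)

def directional_slopes(log_mel):
    """Three-point regression slopes in four directions.

    Input shape (T, F); output shape (4, T - 2, F - 2), ordered as
    horizontal (time), vertical (frequency), 45 degrees, 135 degrees.
    The slope of a least-squares line through x = -1, 0, 1 is
    (y[+1] - y[-1]) / 2; diagonal neighbours are treated as unit steps
    (an assumption).
    """
    s = log_mel
    c = slice(1, -1)
    horizontal = (s[2:, c] - s[:-2, c]) / 2.0
    vertical = (s[c, 2:] - s[c, :-2]) / 2.0
    diag_45 = (s[2:, 2:] - s[:-2, :-2]) / 2.0    # time up, frequency up
    diag_135 = (s[:-2, 2:] - s[2:, :-2]) / 2.0   # time down, frequency up
    return np.stack([horizontal, vertical, diag_45, diag_135])
```

Calling `directional_slopes(log_mel_spectrogram(signal, 16000))` on a one-dimensional NumPy array yields four slope maps per interior time–frequency point. How the paper combines or reduces these maps before passing them to the GMM classifier is not stated in the abstract, so that step is left out.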

Author information

Authors and Affiliations

Speech Processing Group, Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, Riyadh, 11543, Saudi Arabia
Awais Mahmood, Mansour Alsulaiman & Ghulam Muhammad

Corresponding author

Correspondence to Awais Mahmood.

About this article

Cite this article

Mahmood, A., Alsulaiman, M. & Muhammad, G. Automatic Speaker Recognition Using Multi-Directional Local Features (MDLF). Arab J Sci Eng 39, 3799–3811 (2014). https://doi.org/10.1007/s13369-014-1048-0

Received: 04 September 2012
Accepted: 15 January 2013
Published: 09 April 2014
Issue Date: May 2014
DOI: https://doi.org/10.1007/s13369-014-1048-0
