Automatic Speaker Recognition Using Multi-Directional Local Features (MDLF)

  • Research Article - Computer Engineering and Computer Science
  • Arabian Journal for Science and Engineering

Abstract

A new feature called the multi-directional local feature (MDLF) is proposed and applied to automatic speaker recognition. To extract the MDLF, a windowed speech signal is processed by a fast Fourier transform and passed through a 24-channel mel-scaled filter bank, followed by log compression. A three-point linear regression is then applied in four directions: horizontal (time axis), vertical (frequency axis), 45° (time–frequency) and 135° (time–frequency). The MDLF thus captures speaker characteristics across the time–frequency plane, which results in better performance. In the experiments, Gaussian mixture models (GMMs) with varying numbers of mixtures are used as the classifier. Experimental results show that the proposed MDLF achieves better recognition accuracy than traditional mel-frequency cepstral coefficient (MFCC) features. The MDLF achieves excellent results in both text-dependent and text-independent speaker recognition, and for both Arabic and English speech. The proposed technique is also language independent.
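The extraction steps described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the sample rate, frame length, hop size, FFT size, Hamming window, edge padding, and the orientation chosen for the 45° and 135° directions are all assumptions, and the function names are hypothetical. The only fixed facts used are the 24 mel channels from the abstract and the least-squares slope through three equally spaced points, which equals (y(+1) - y(-1)) / 2.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale, shape (n_filters, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    bank = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        lo, mid, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def log_mel_spectrogram(signal, sample_rate=16000, frame_len=400, hop=160,
                        n_fft=512, n_filters=24):
    """Window -> FFT -> 24-channel mel filter bank -> log compression.

    All numeric defaults except n_filters=24 are assumptions for illustration.
    Returns an array of shape (n_frames, n_filters).
    """
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)          # window type is an assumption
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    bank = mel_filterbank(n_filters, n_fft, sample_rate)
    return np.log(power @ bank.T + 1e-10)                 # small floor avoids log(0)

def three_point_slopes(log_mel):
    """Three-point regression slopes in four directions, shape (n_frames, n_channels, 4).

    Slopes are measured per index step; the diagonal orientation and the
    edge padding at the borders are assumptions.
    """
    p = np.pad(log_mel, 1, mode="edge")
    c = slice(1, -1)
    horizontal = (p[2:, c] - p[:-2, c]) / 2.0    # along the time axis
    vertical = (p[c, 2:] - p[c, :-2]) / 2.0      # along the frequency axis
    diag_45 = (p[2:, 2:] - p[:-2, :-2]) / 2.0    # time and frequency both increase
    diag_135 = (p[2:, :-2] - p[:-2, 2:]) / 2.0   # time increases, frequency decreases
    return np.stack([horizontal, vertical, diag_45, diag_135], axis=-1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    signal = rng.standard_normal(16000)          # one second of synthetic noise
    features = three_point_slopes(log_mel_spectrogram(signal))
    print(features.shape)                        # (98, 24, 4)
```

On a spectrogram that is an exact linear ramp in time and frequency, the interior slopes recover the ramp coefficients, which is a quick sanity check for the direction conventions. How the four slope maps are combined into the final feature vector is not stated in the abstract and is left out here.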

Author information

Corresponding author

Correspondence to Awais Mahmood.

About this article

Cite this article

Mahmood, A., Alsulaiman, M. & Muhammad, G. Automatic Speaker Recognition Using Multi-Directional Local Features (MDLF). Arab J Sci Eng 39, 3799–3811 (2014). https://doi.org/10.1007/s13369-014-1048-0

  • DOI: https://doi.org/10.1007/s13369-014-1048-0
