International Journal of Speech Technology, Volume 21, Issue 4, pp 941–951

Speaker identification based on normalized pitch frequency and Mel Frequency Cepstral Coefficients

  • Marwa A. Nasr
  • Mohammed Abd-Elnaby
  • Adel S. El-Fishawy
  • S. El-Rabaie
  • Fathi E. Abd El-Samie

Abstract

This paper presents an efficient approach for automatic speaker identification based on cepstral features and the Normalized Pitch Frequency (NPF). Most existing speaker identification methods adopt a cepstral strategy. Including the pitch frequency as an additional feature is expected to improve identification accuracy. The proposed framework uses a neural classifier with a single hidden layer. Different transform domains are investigated for reliable feature extraction from the speech signal. Moreover, a noise reduction pre-processing step is applied before feature extraction to improve the performance of the system. Simulation results show that using the NPF as a feature improves the performance of the speaker identification system, especially when combined with the Discrete Cosine Transform (DCT) and a wavelet denoising pre-processing step.
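
To make the idea of appending a pitch feature to a cepstral feature vector concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not say how the pitch is estimated or how it is normalized, so the autocorrelation pitch estimator, the division by an assumed upper pitch bound `f_max`, and all function names here are hypothetical placeholders.

```python
import math


def estimate_pitch_autocorr(frame, sample_rate, f_min=50.0, f_max=400.0):
    """Estimate the pitch (Hz) of one frame from its autocorrelation peak.

    Assumption: autocorrelation is a generic pitch estimator used only for
    illustration; it is not necessarily the method used in the paper.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]  # remove the DC offset
    lag_min = int(sample_rate / f_max)            # shortest period searched
    lag_max = min(int(sample_rate / f_min), n - 1)  # longest period searched
    best_lag, best_val = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        val = sum(x[i] * x[i + lag] for i in range(n - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return sample_rate / best_lag


def normalized_pitch(pitch_hz, f_max=400.0):
    """Scale the pitch into [0, 1] by an assumed upper bound (hypothetical)."""
    return pitch_hz / f_max


def append_pitch_feature(cepstral_features, npf):
    """Concatenate a cepstral feature vector with the normalized pitch."""
    return list(cepstral_features) + [npf]


# Usage: a synthetic 200 Hz tone sampled at 8 kHz stands in for a voiced frame.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 200 * t / sample_rate) for t in range(400)]
pitch = estimate_pitch_autocorr(frame, sample_rate)
npf = normalized_pitch(pitch)
placeholder_mfccs = [0.0] * 13  # stand-in values, not real MFCCs
feature_vector = append_pitch_feature(placeholder_mfccs, npf)
```

The resulting `feature_vector` is the kind of input a single-hidden-layer neural classifier could be trained on; the cepstral values would come from an actual MFCC extraction step rather than the zeros used here.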

Keywords

Speaker identification, MFCCs, Normalized pitch frequency, ANNs

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  • Marwa A. Nasr (1)
  • Mohammed Abd-Elnaby (1)
  • Adel S. El-Fishawy (1)
  • S. El-Rabaie (1)
  • Fathi E. Abd El-Samie (1)
  1. Department of Electronics and Electrical Communications, Faculty of Electronic Engineering, Menoufia University, Menouf, Egypt
