Filterbank Optimization for Text-Dependent Speaker Verification by Evolutionary Algorithm Using Spline-Defined Design Parameters

  • Research Article - Computer Engineering and Computer Science
  • Arabian Journal for Science and Engineering

Abstract

Mel frequency cepstral coefficients (MFCCs) are the most widely used spectral features in speech-based applications. They were introduced primarily for speech recognition and were later adopted for other applications such as speaker recognition and emotion recognition. Several recent findings suggest that the Mel-scale filterbank, which is inspired primarily by human perception, may not be optimal for speaker recognition. Working in the same direction, this study attempts to optimize the filterbank design for text-dependent speaker verification. Motivated by the success of evolutionary computation in related fields, an evolutionary algorithm is used to carry out the optimization. This enables data-driven learning of the design parameters and is hypothesized to yield filterbanks suited to the specific task of speaker-phrase discrimination. The filterbanks have been optimized for text-dependent speaker verification in general, and also for specific speakers and phrases. The proposed filterbank yields a relative equal error rate (EER) reduction of up to 39.41% with respect to the baseline MFCCs.
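To make the idea of evolving spline-defined filterbank parameters concrete, the following is a minimal sketch and not the authors' implementation. It assumes a piecewise-linear (degree-1) spline as a stand-in for whatever spline family the paper uses, triangular filters, a simple elitist evolutionary loop with Gaussian mutation, and a 4 kHz upper band edge. The names `warp_from_genes`, `center_frequencies`, `placeholder_fitness`, and `evolve` are hypothetical, and the fitness function is a placeholder: a real system would instead score each candidate by the verification error (for example, negative EER) on development data.

```python
import math
import random

def warp_from_genes(genes):
    """Turn unconstrained genes into a monotone frequency warp on [0, 1].
    exp() keeps every increment positive, so the knot values increase."""
    steps = [math.exp(g) for g in genes]
    total = sum(steps)
    knots = [0.0]
    for s in steps:
        knots.append(knots[-1] + s / total)
    knots[-1] = 1.0  # pin the end point against rounding error
    return knots

def eval_warp(knots, x):
    """Linearly interpolate the warp at x in [0, 1] (degree-1 spline)."""
    pos = x * (len(knots) - 1)
    i = min(int(pos), len(knots) - 2)
    return knots[i] + (pos - i) * (knots[i + 1] - knots[i])

def center_frequencies(genes, n_filters, f_max):
    """Filter centers in Hz: evenly spaced points pushed through the warp."""
    knots = warp_from_genes(genes)
    return [f_max * eval_warp(knots, (k + 1) / (n_filters + 1))
            for k in range(n_filters)]

def triangular_filterbank(centers, f_max, n_bins):
    """Triangular weights over n_bins linearly spaced frequencies."""
    edges = [0.0] + list(centers) + [f_max]
    bank = []
    for k in range(len(centers)):
        lo, mid, hi = edges[k], edges[k + 1], edges[k + 2]
        row = []
        for b in range(n_bins):
            f = f_max * b / (n_bins - 1)
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def placeholder_fitness(genes, n_filters, f_max, target):
    """ASSUMPTION: toy objective (closeness to an arbitrary target layout).
    A real system would return, e.g., negative EER on development data."""
    centers = center_frequencies(genes, n_filters, f_max)
    return -sum((c - t) ** 2 for c, t in zip(centers, target))

def evolve(fitness, n_genes, pop_size=20, n_gen=30, sigma=0.3, seed=0):
    """Elitist loop: keep the better half, refill with mutated copies."""
    rng = random.Random(seed)
    pop = [[0.0] * n_genes]  # all-zero genes give a uniform (linear) spacing
    pop += [[rng.gauss(0, sigma) for _ in range(n_genes)]
            for _ in range(pop_size - 1)]
    for _ in range(n_gen):
        parents = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        children = [[max(-3.0, min(3.0, g + rng.gauss(0, sigma)))
                     for g in rng.choice(parents)]
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return max(pop, key=fitness)

def relative_eer_reduction(eer_baseline, eer_proposed):
    """Relative EER reduction in percent, the metric quoted in the abstract."""
    return 100.0 * (eer_baseline - eer_proposed) / eer_baseline

# Illustrative settings only; none of these values come from the paper.
N_FILTERS, F_MAX, N_GENES = 20, 4000.0, 5
TARGET = [F_MAX * ((k + 1) / (N_FILTERS + 1)) ** 2 for k in range(N_FILTERS)]
fit = lambda g: placeholder_fitness(g, N_FILTERS, F_MAX, TARGET)
best = evolve(fit, N_GENES)
centers = center_frequencies(best, N_FILTERS, F_MAX)
bank = triangular_filterbank(centers, F_MAX, n_bins=129)
```

The point of the spline parameterization in this sketch is that only five genes control the placement of all twenty filters, which keeps the search space small compared with evolving every filter edge independently.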



Acknowledgements

The authors would like to extend their gratitude to the members of the Speech and Image Processing Lab of the National Institute of Technology Silchar for their encouragement and support.

Author information

Corresponding author

Correspondence to Mohammad Azharuddin Laskar.

About this article

Cite this article

Laskar, M.A., Laskar, R.H. Filterbank Optimization for Text-Dependent Speaker Verification by Evolutionary Algorithm Using Spline-Defined Design Parameters. Arab J Sci Eng 44, 9703–9718 (2019). https://doi.org/10.1007/s13369-019-04090-4

  • DOI: https://doi.org/10.1007/s13369-019-04090-4
