Efficient speaker identification using spectral entropy

  • Fernando Luque-Suárez
  • Antonio Camarena-Ibarrola
  • Edgar Chávez
Abstract

In voice recognition, the two main problems are speech recognition (what was said) and speaker recognition (who was speaking). The usual approach to speaker recognition is to postulate a model whose parameters encode the speaker's identity; estimating those parameters can be time-consuming when the number of candidate speakers is large. In this paper, we model the speaker as a high-dimensional point cloud of entropy-based features extracted from the speech signal. The method allows indexing, and hence it can manage large databases. We experimentally assessed the quality of the identification with a publicly available database formed by extracting audio from a collection of YouTube videos of 1,000 different speakers. With 20-second audio excerpts, we were able to identify a speaker with 97% accuracy when the recording environment is not controlled, and with 99% accuracy for controlled recording environments.
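
To make the idea of a point cloud of entropy-based features concrete, the following is a minimal sketch and not the authors' pipeline. It assumes equal-width frequency bands, a naive DFT, and a linear nearest-neighbour scan with majority voting in place of an index; the function names, the parameters (`frame_len`, `hop`, `n_bands`), and the speaker labels are hypothetical and chosen only for illustration.

```python
import cmath
import math
from collections import Counter

def power_spectrum(frame):
    """Naive DFT power spectrum (bins 0..N/2); fine for a sketch, slow for real audio."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))) ** 2
            for k in range(n // 2 + 1)]

def shannon_entropy(values):
    """Entropy in bits of non-negative values normalised to sum to one."""
    total = sum(values)
    if total <= 0:
        return 0.0
    return -sum((v / total) * math.log2(v / total) for v in values if v > 0)

def entropy_features(signal, frame_len=64, hop=32, n_bands=4):
    """One point per frame: the spectral entropy of each frequency band."""
    points = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        spec = power_spectrum(signal[start:start + frame_len])
        # Equal-width bands are a simplifying assumption; bins left over
        # after the last full band are ignored.
        width = len(spec) // n_bands
        points.append([shannon_entropy(spec[b * width:(b + 1) * width])
                       for b in range(n_bands)])
    return points

def identify(query_points, enrolled):
    """Each query point votes for the speaker owning its nearest enrolled point."""
    votes = Counter()
    for q in query_points:
        # Linear scan over every enrolled point (assumption: no index is used here).
        _, name = min((math.dist(q, p), name)
                      for name, cloud in enrolled.items() for p in cloud)
        votes[name] += 1
    return votes.most_common(1)[0][0]
```

Here `enrolled` would map each speaker label to the list of points returned by `entropy_features` for that speaker's enrolment audio. The linear scan costs time proportional to the total number of enrolled points per query point, which is the step a nearest-neighbour index would replace when the database is large.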

Keywords

  • Speaker recognition
  • Speaker identification
  • Entropygrams

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. CICESE, Ensenada, Mexico
  2. Universidad Michoacana, Morelia, Mexico