
Frame level sparse representation classification for speaker verification

Published in Multimedia Tools and Applications

Abstract

In this paper, we analyze the application of sparse representation of speech-signal frames to speaker verification. It has recently been shown that Sparse Representation Classification (SRC) is promising for speaker recognition. We present evidence that frame-level sparse representation classification resembles the process of speech recognition in the human sensory system. Since recognizing different voices (noises) helps individuals immediately distinguish noise from the original speech signal, we design a noise-aware system. As a key principle of sparse representation, we examine the mutual coherence of the dictionary columns, called dictionary atoms, which has not been adequately addressed in previously published SRC-based speaker verification studies. To suppress mutual coherence, we use a dictionary learning method to construct a dictionary with effective atoms. Our proposed Frame-Level Sparse Representation Classification (FSRC) provides new insights into SRC-based speaker verification. We demonstrate that, in SRC-based speaker verification, a dictionary whose atoms are orthogonal can generalize better than one whose atoms are highly correlated, and that suppressing mutual coherence is even more effective than imposing strict orthogonality on the dictionary atoms. We evaluate state-of-the-art speaker recognition systems and the proposed method on NIST SRE 2004 data. Experimental results show that, compared to the baseline methods, the proposed method improves the performance of the speaker verification system in noisy conditions when enough information is available during target-speaker enrollment.
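The mutual coherence referred to above has a standard definition: for a dictionary D whose columns (atoms) are normalized to unit length, it is the largest absolute inner product between any two distinct atoms. The following minimal Python sketch, using hypothetical random data rather than the authors' learned dictionaries, illustrates the quantity and shows that an orthogonal dictionary is the extreme low-coherence case:

```python
import numpy as np

def mutual_coherence(D):
    """Largest absolute correlation between two distinct unit-norm atoms of D."""
    D = D / np.linalg.norm(D, axis=0, keepdims=True)  # normalize each atom (column)
    G = np.abs(D.T @ D)                               # atom-to-atom correlations
    np.fill_diagonal(G, 0.0)                          # ignore self-correlation
    return G.max()

rng = np.random.default_rng(0)

# Hypothetical overcomplete dictionary: 20-dimensional frame features, 50 atoms.
D_random = rng.standard_normal((20, 50))
print("coherence, random atoms:    ", mutual_coherence(D_random))

# An orthogonal dictionary (orthonormal columns) has coherence that is essentially zero.
Q, _ = np.linalg.qr(rng.standard_normal((20, 20)))
print("coherence, orthogonal atoms:", mutual_coherence(Q))
```

Lower coherence makes the sparse codes of frames from different speakers easier to tell apart, which is the motivation for the coherence-suppressing dictionary learning described above.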



Author information

Corresponding author

Correspondence to Hassan Farsi.


About this article


Cite this article

Hasheminejad, M., Farsi, H. Frame level sparse representation classification for speaker verification. Multimed Tools Appl 76, 21211–21224 (2017). https://doi.org/10.1007/s11042-016-4071-1

