Multimedia Tools and Applications, Volume 76, Issue 5, pp 7251–7281

Multi-factor authentication model based on multipurpose speech watermarking and online speaker recognition

  • Mohammad Ali Nematollahi
  • Hamurabi Gamboa-Rosales
  • Francisco J. Martinez-Ruiz
  • Jose I. De la Rosa-Vargas
  • S. A. R. Al-Haddad
  • Mansour Esmaeilpour


This paper develops a Multi-Factor Authentication (MFA) method that combines a Personal Identification Number (PIN), a One-Time Password (OTP), and a speaker biometric, all carried through speech watermarks. To this end, a multipurpose digital speech watermarking scheme embeds semi-fragile and robust watermarks simultaneously in the speech signal, providing tamper detection and proof of ownership, respectively. For the blind semi-fragile watermark, the Discrete Wavelet Packet Transform (DWPT) and Quantization Index Modulation (QIM) are used to embed the watermark in an angle of the wavelet sub-bands where more speaker-specific information is available. For copyright protection of the speech, a blind, robust watermark is embedded using the DWPT and multiplication: the amplitude of the wavelet sub-bands is manipulated where less speaker-specific information is available. Experimental results on TIMIT, MIT, and MOBIO demonstrate a trade-off among speaker recognition performance, robustness, and capacity, which is presented as various triangles. Furthermore, a threat model and attack analysis are used to evaluate the feasibility of the developed MFA model. Accordingly, the developed MFA model can enhance system security against spoofing and communication attacks while improving recognition performance by overcoming existing limitations.
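The QIM step named in the abstract can be illustrated with a minimal sketch. This is generic scalar QIM applied to the polar angle of a pair of values, not the authors' actual embedding rule. The pair `(a, b)` is a hypothetical stand-in for two sub-band quantities, the function names are invented for illustration, and the step size `delta` is an assumed parameter, chosen to divide 2π evenly so that angle wrap-around maps each lattice onto itself.

```python
import math


def qim_quantize(value, bit, delta):
    """Snap value to the nearest point of the lattice delta*k + bit*delta/2."""
    offset = bit * delta / 2.0
    return delta * round((value - offset) / delta) + offset


def embed_bit_in_angle(a, b, bit, delta):
    """Embed one bit in the polar angle of (a, b); the radius is left unchanged.

    Assumption: delta divides 2*pi evenly, so wrapped angles stay on the lattice.
    """
    radius = math.hypot(a, b)
    theta = math.atan2(b, a)
    theta_marked = qim_quantize(theta, bit, delta)
    return radius * math.cos(theta_marked), radius * math.sin(theta_marked)


def extract_bit_from_angle(a, b, delta):
    """Blind extraction: choose the bit whose lattice lies closest to the angle."""
    theta = math.atan2(b, a)
    distances = [abs(theta - qim_quantize(theta, bit, delta)) for bit in (0, 1)]
    return 0 if distances[0] <= distances[1] else 1


if __name__ == "__main__":
    delta = math.pi / 8  # assumed step size, not taken from the paper
    marked = embed_bit_in_angle(0.8, 0.3, 1, delta)
    print(extract_bit_from_angle(marked[0], marked[1], delta))
```

In this sketch the extractor needs only `delta`, not the original signal, which is what makes the scheme blind. An angle shift smaller than `delta / 4` leaves the extracted bit intact, while larger changes can flip it; this is the general mechanism by which a quantization-based watermark can be tuned to be semi-fragile.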


Speech watermarking; Online speaker recognition; Discrete wavelet packet transform; Threat model; Attack analysis; Multi-factor authentication



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • Mohammad Ali Nematollahi (3)
  • Hamurabi Gamboa-Rosales (1)
  • Francisco J. Martinez-Ruiz (1)
  • Jose I. De la Rosa-Vargas (1)
  • S. A. R. Al-Haddad (2)
  • Mansour Esmaeilpour (3)

  1. Department of Electronics Engineering, Autonomous University of Zacatecas, Zac., Mexico
  2. Department of Computer & Communication Systems Engineering, Faculty of Engineering, University Putra Malaysia, UPM, Serdang, Malaysia
  3. Computer Engineering Department, Hamedan Branch, Islamic Azad University, Hamedan, Iran
