Sensitivity of automatic speaker identification to SVD digital audio watermarking

International Journal of Speech Technology

Abstract

This paper proposes the use of SVD digital audio watermarking to increase the security of automatic speaker identification (ASI) systems and presents a study of the effect of watermarking on ASI system performance. The SVD audio watermarking algorithm can be implemented on audio signals in the time domain or in another appropriate transform domain, and it can be applied to the audio signal as a whole or on a segment-by-segment basis. The speaker recognition system works by generating a database of speakers' features using the MFCCs and polynomial shape coefficients, which are extracted from each speaker after lexicographic ordering into 1-D signals. For any new speaker, a matching process based on a trained neural network determines whether that speaker belongs to the database. Experimental results show that SVD audio watermarking does not severely degrade the ASI system performance, so it can be used with ASI to increase security. The results also show that segment-by-segment watermarking in the time domain achieves the highest detectability of the watermark. It is therefore recommended to use segment-by-segment SVD audio watermarking with ASI systems that use features extracted from the DCT or the DWT.
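The abstract does not spell out the embedding steps, so the following is only a minimal sketch of one common SVD watermarking scheme applied segment by segment in the time domain. It assumes that each segment of length n*n is reshaped into an n-by-n matrix, that a scaled n-by-n watermark is added to the singular-value matrix, and that the side information from embedding is kept for extraction. The function names, the gain `k`, and the key layout are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def embed(segment, watermark, k=0.01):
    """Embed an n-by-n watermark into a 1-D segment of length n*n (assumed scheme)."""
    n = watermark.shape[0]
    A = segment.reshape(n, n)                # 1-D signal -> 2-D matrix
    U, s, Vt = np.linalg.svd(A)
    D = np.diag(s) + k * watermark           # perturb the singular-value matrix
    Uw, sw, Vwt = np.linalg.svd(D)
    Aw = U @ np.diag(sw) @ Vt                # rebuild with the new singular values
    key = (Uw, Vwt, s)                       # side information needed for extraction
    return Aw.reshape(-1), key

def extract(segment_w, key, k=0.01):
    """Recover the watermark estimate from a (possibly distorted) watermarked segment."""
    Uw, Vwt, s = key
    n = s.shape[0]
    sw_est = np.linalg.svd(segment_w.reshape(n, n), compute_uv=False)
    D_est = Uw @ np.diag(sw_est) @ Vwt
    return (D_est - np.diag(s)) / k

def embed_segments(audio, watermark, k=0.01):
    """Apply embed() to each full, non-overlapping segment; any tail is left untouched."""
    seg_len = watermark.size
    out = np.asarray(audio, dtype=float).copy()
    keys = []
    for start in range(0, len(out) - seg_len + 1, seg_len):
        marked, key = embed(out[start:start + seg_len], watermark, k)
        out[start:start + seg_len] = marked
        keys.append(key)
    return out, keys

def correlation(w, w_est):
    """Correlation coefficient between original and extracted watermark."""
    return float(np.corrcoef(w.ravel(), w_est.ravel())[0, 1])
```

In this sketch, detectability would be judged by the correlation between the original watermark and the one extracted from each segment; embedding the same watermark in every segment gives several independent chances to recover it after an attack.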

Author information

Corresponding author

Correspondence to Osama S. Faragallah.

About this article

Cite this article

Abd El-Samie, F.E., Shafik, A., El-sayed, H.S. et al. Sensitivity of automatic speaker identification to SVD digital audio watermarking. Int J Speech Technol 18, 565–581 (2015). https://doi.org/10.1007/s10772-015-9292-6

  • DOI: https://doi.org/10.1007/s10772-015-9292-6
