
An efficient perceptual hashing based on improved spectral entropy for speech authentication

Multimedia Tools and Applications

Abstract

This paper proposes an efficient perceptual hashing algorithm for speech authentication that combines linear prediction-minimum mean squared error (LP-MMSE) analysis with an improved spectral entropy. After preprocessing, framing, and windowing, linear prediction analysis is applied to the speech signal to obtain the minimum mean squared error coefficient matrix. The spectral entropy parameter matrix of each frame is then calculated with the improved spectral entropy method. The final binary perceptual hash sequence is generated from these two matrices and is used to complete speech authentication. Experimental results comparing the combination of the Teager energy operator (TEO) with linear predictive coefficients (LPC), LP-MMSE, and line spectrum pair (LSP) coefficients, respectively, show that the proposed algorithm achieves a good compromise among robustness, discrimination, and authentication efficiency, and that it can meet the requirements of real-time speech authentication in speech communication. The experimental results also show that the proposed algorithm is more compact than other existing methods.
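The pipeline outlined in the abstract (framing and windowing, linear prediction error, per-frame spectral entropy, binary hash) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not specify the improved spectral entropy or the rule that merges the two matrices, so the sketch assumes plain Shannon spectral entropy, a median threshold for binarization, and a bit error rate for matching. All function names, the frame length, the hop size, and the prediction order are hypothetical choices made for illustration.

```python
import numpy as np

def frame_signal(x, frame_len=256, hop=128):
    """Split a 1-D signal into overlapping Hamming-windowed frames."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * np.hamming(frame_len)

def lp_min_error(frame, order=10):
    """Minimum prediction error of an order-p linear predictor,
    computed with the Levinson-Durbin recursion on the autocorrelation."""
    n = len(frame)
    r = np.correlate(frame, frame, mode="full")[n - 1 : n + order]  # lags 0..order
    if r[0] <= 0:
        return 0.0
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        if err <= 1e-12:  # predictor already (numerically) perfect
            break
        k = -np.dot(a[:i], r[i:0:-1]) / err      # reflection coefficient
        a[: i + 1] = a[: i + 1] + k * a[i::-1]   # update predictor coefficients
        err *= 1.0 - k * k                       # shrink residual energy
    return err / n

def spectral_entropy(frame):
    """Shannon entropy of the normalized power spectrum of one frame.
    Assumption: the paper's *improved* variant is not reproduced here."""
    power = np.abs(np.fft.rfft(frame)) ** 2
    total = power.sum()
    if total <= 0:
        return 0.0
    p = power / total
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def perceptual_hash(x, frame_len=256, hop=128, order=10):
    """Binary hash: each feature sequence is thresholded at its median
    (an assumed binarization rule) and the two bit strings are concatenated."""
    frames = frame_signal(np.asarray(x, dtype=float), frame_len, hop)
    errs = np.array([lp_min_error(f, order) for f in frames])
    ents = np.array([spectral_entropy(f) for f in frames])
    bits = [(feat > np.median(feat)).astype(np.uint8) for feat in (errs, ents)]
    return np.concatenate(bits)

def bit_error_rate(h1, h2):
    """Fraction of differing bits between two equal-length hashes."""
    return float(np.mean(h1 != h2))
```

In such a scheme, authentication would typically compare the bit error rate between the hash of the received speech and the stored hash against a threshold. The threshold value would have to be chosen experimentally and is not given here.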



Acknowledgement

This work was supported by the National Natural Science Foundation of China (No. 61363078) and the Natural Science Foundation of Gansu Province of China (No. 1310RJYA004). The authors would like to thank the anonymous reviewers for their helpful comments and suggestions.

Author information

Corresponding author

Correspondence to Qiu-yu Zhang.


Cite this article

Zhang, Qy., Hu, Wj., Huang, Yb. et al. An efficient perceptual hashing based on improved spectral entropy for speech authentication. Multimed Tools Appl 77, 1555–1581 (2018). https://doi.org/10.1007/s11042-017-4381-y


  • DOI: https://doi.org/10.1007/s11042-017-4381-y
