Abstract
This paper proposes an efficient perceptual hashing algorithm for speech authentication that combines the linear prediction minimum mean squared error (LP-MMSE) with an improved spectral entropy. After preprocessing, framing, and windowing, linear prediction analysis is applied to the speech signal to obtain the minimum mean squared error coefficient matrix. The spectral entropy parameter matrix of each frame is then calculated using the improved spectral entropy method. The final binary perceptual hash sequence is generated from these two matrices and used to complete the speech authentication. Experiments compare combinations of the Teager energy operator (TEO) with the linear predictive coefficients (LPC), LP-MMSE, and line spectrum pair (LSP) coefficients, respectively. The results show that the proposed algorithm achieves a good compromise among robustness, discrimination, and authentication efficiency, and that it can meet the requirements of real-time speech authentication in speech communication. The experimental results also show that the proposed algorithm outperforms other existing methods in compactness.
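The pipeline summarized in the abstract (framing and windowing, a per-frame LP-MMSE value, a per-frame spectral entropy, then binarization) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It uses the standard Shannon spectral entropy instead of the paper's improved variant, computes a single LP-MMSE value per frame with the Levinson-Durbin recursion, and assumes a median-threshold binarization and a bit-error-rate comparison, neither of which is specified in the abstract. All function names, the frame length, the hop size, and the prediction order are hypothetical choices made for illustration.

```python
import numpy as np


def frame_signal(x, frame_len=256, hop=128):
    """Split x into overlapping frames (rows) and apply a Hamming window."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * np.hamming(frame_len)


def lp_mmse(frame, order=10):
    """Minimum mean squared prediction error of one frame (Levinson-Durbin)."""
    n = len(frame)
    # Autocorrelation at lags 0..order.
    r = np.correlate(frame, frame, mode="full")[n - 1:n + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0:  # silent or degenerate frame
            break
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        a[1:i + 1] = a[1:i + 1] + k * np.concatenate((a[i - 1:0:-1], [1.0]))
        err *= 1.0 - k * k
    return err


def spectral_entropy(frame, eps=1e-12):
    """Shannon entropy of the normalized power spectrum (standard form)."""
    power = np.abs(np.fft.rfft(frame)) ** 2
    p = power / (power.sum() + eps)
    return -np.sum(p * np.log(p + eps))


def perceptual_hash(x, frame_len=256, hop=128, order=10):
    """Assumed binarization: one bit per frame and feature, set when the
    feature exceeds its median over all frames."""
    frames = frame_signal(np.asarray(x, dtype=float), frame_len, hop)
    mmse = np.array([lp_mmse(f, order) for f in frames])
    entropy = np.array([spectral_entropy(f) for f in frames])
    bits = np.concatenate((mmse > np.median(mmse),
                           entropy > np.median(entropy)))
    return bits.astype(np.uint8)


def bit_error_rate(h1, h2):
    """Normalized Hamming distance between two equal-length hashes."""
    return float(np.mean(h1 != h2))
```

In a scheme of this kind, authentication would compare `bit_error_rate(perceptual_hash(a), perceptual_hash(b))` against a decision threshold, and that threshold would have to be tuned experimentally.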
Acknowledgement
This work was supported by the National Natural Science Foundation of China (No. 61363078) and the Natural Science Foundation of Gansu Province of China (No. 1310RJYA004). The authors would like to thank the anonymous reviewers for their helpful comments and suggestions.
Cite this article
Zhang, Qy., Hu, Wj., Huang, Yb. et al. An efficient perceptual hashing based on improved spectral entropy for speech authentication. Multimed Tools Appl 77, 1555–1581 (2018). https://doi.org/10.1007/s11042-017-4381-y
DOI: https://doi.org/10.1007/s11042-017-4381-y