An efficient perceptual hashing based on improved spectral entropy for speech authentication

Zhang, Qiu-yu; Hu, Wen-jin; Huang, Yi-bo; Qiao, Si-bin

doi:10.1007/s11042-017-4381-y

Published: 17 January 2017

Volume 77, pages 1555–1581, (2018)

Multimedia Tools and Applications

Qiu-yu Zhang¹ (ORCID: orcid.org/0000-0003-1488-388X), Wen-jin Hu¹, Yi-bo Huang² & Si-bin Qiao¹

1399 Accesses
16 Citations
Abstract

This paper proposes an efficient perceptual hashing algorithm for speech authentication that combines linear prediction-minimum mean squared error (LP-MMSE) with an improved spectral entropy. After preprocessing, framing and windowing, linear prediction analysis is applied to the speech signal to obtain the minimum mean squared error coefficient matrix. The spectral entropy parameter matrix of each frame is then calculated using the improved spectral entropy method. The final binary perceptual hash sequence is generated from these two matrices and is used to complete speech authentication. Experiments compare the results of combining the Teager energy operator (TEO) with linear predictive coefficients (LPC), LP-MMSE and line spectrum pair (LSP) coefficients, respectively. The results indicate that the proposed algorithm achieves a good compromise among robustness, discrimination and authentication efficiency, and that it can meet the requirements of real-time speech authentication in speech communication. The results also show that the proposed algorithm is more compact than the other existing methods it was compared with.
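The pipeline described in the abstract (framing and windowing, a per-frame linear prediction error, a per-frame spectral entropy, then a binary hash) can be illustrated with a minimal sketch. The abstract gives neither the improved spectral entropy formula nor the rule that combines the two matrices, so the code below assumes the classical spectral entropy, a Levinson-Durbin recursion for the minimum prediction error, and an illustrative median threshold on their product. The function names, frame length, hop size and prediction order are hypothetical and are not taken from the paper.

```python
import cmath
import math


def frame_signal(x, frame_len, hop):
    """Split x into overlapping frames and apply a Hamming window."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    return [[x[start + n] * window[n] for n in range(frame_len)]
            for start in range(0, len(x) - frame_len + 1, hop)]


def lp_min_error(frame, order):
    """Mean squared prediction error of an order-p linear predictor,
    computed with the Levinson-Durbin recursion on the autocorrelation."""
    n = len(frame)
    r = [sum(frame[i] * frame[i + k] for i in range(n - k))
         for k in range(order + 1)]
    if r[0] == 0.0:
        return 0.0
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 1e-12 * r[0]:  # frame is already (numerically) predictable
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return max(err, 0.0) / n


def spectral_entropy(frame):
    """Classical spectral entropy: Shannon entropy of the normalised
    power spectrum (naive DFT, adequate for short frames)."""
    n = len(frame)
    power = []
    for k in range(n // 2 + 1):
        s = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n))
        power.append(abs(s) ** 2)
    total = sum(power)
    if total == 0.0:
        return 0.0
    return -sum((p / total) * math.log(p / total) for p in power if p > 0.0)


def perceptual_hash(x, frame_len=64, hop=32, order=8):
    """Illustrative hash: one bit per frame, set when the product of the
    two features exceeds the median (an assumed rule, not the paper's)."""
    frames = frame_signal(x, frame_len, hop)
    features = [spectral_entropy(f) * lp_min_error(f, order) for f in frames]
    if not features:
        return []
    median = sorted(features)[len(features) // 2]
    return [1 if v > median else 0 for v in features]


def bit_error_rate(h1, h2):
    """Fraction of differing bits between two equal-length hashes."""
    return sum(b1 != b2 for b1, b2 in zip(h1, h2)) / len(h1)
```

In a scheme of this kind, authentication would typically compare the hash of the received speech with a stored hash and accept when `bit_error_rate` falls below a chosen threshold; the threshold value is application-specific and is not stated in the abstract.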

Acknowledgement

This work is supported by the National Natural Science Foundation of China (No. 61363078) and the Natural Science Foundation of Gansu Province of China (No. 1310RJYA004). The authors would like to thank the anonymous reviewers for their helpful comments and suggestions.

Author information

Authors and Affiliations

School of Computer and Communication, Lanzhou University of Technology, No. 287, Lan-Gong-Ping Road, Lanzhou, Gansu, 730050, China
Qiu-yu Zhang, Wen-jin Hu & Si-bin Qiao
College of Physics and Electronic Engineering, Northwest Normal University, Lanzhou, 730070, China
Yi-bo Huang

Corresponding author

Correspondence to Qiu-yu Zhang.

About this article

Cite this article

Zhang, Qy., Hu, Wj., Huang, Yb. et al. An efficient perceptual hashing based on improved spectral entropy for speech authentication. Multimed Tools Appl 77, 1555–1581 (2018). https://doi.org/10.1007/s11042-017-4381-y

Received: 08 July 2016
Revised: 01 December 2016
Accepted: 09 January 2017
Published: 17 January 2017
Issue Date: January 2018
DOI: https://doi.org/10.1007/s11042-017-4381-y
