Non-negative Frequency-Weighted Energy-Based Speech Quality Estimation for Different Modes and Quality of Speech

Shome, Nirupam; Laskar, Rabul Hussain; Kashyap, Richik

doi:10.1007/s00034-022-02070-y

Published: 12 July 2022

Volume 41, pages 6788–6826, (2022)

Circuits, Systems, and Signal Processing

Abstract

This paper proposes a robust technique for estimating the signal-to-noise ratio (SNR) of continuous speech, based on the non-negative frequency-weighted energy operator, which is computed from the envelope of the derivative of the speech signal. The method works in two phases. The first phase identifies speech regions by glottal activity detection (GAD), supported by a pre-processing module and a post-processing module: speech enhancement with background subtraction precedes GAD, and obstruent detection with duration modification follows it. Together, these modules improve the detection of speech and non-speech regions. The second phase estimates the SNR from the instantaneous energy contour obtained from the envelope of the derivative of the speech signal, that is, from the non-negative frequency-weighted energy. The ratio of the instantaneous energy of the speech region to that of the non-speech region is taken as the estimated segmental SNR. The final SNR is the average of the segmental SNRs, and it is compared with the true SNR to measure the estimation accuracy of the proposed method. Performance is evaluated on the TIMIT, NOIZEUS, and TESDHE acoustic speech corpora. The TIMIT and NOIZEUS databases are used to evaluate performance under different types of noise, and the TESDHE database is used for analysis under different modes of speech data. The results show that the proposed system estimates SNR accurately across different types of noise and different modes of signal at various signal levels, and that it outperforms the other methods compared.
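The second phase described above can be illustrated with a short sketch. It is not the authors' implementation: the abstract gives no formulas, so the code assumes the commonly used envelope-derivative form of the operator (the squared derivative plus the squared Hilbert transform of the derivative), approximates the derivative with finite differences, and takes the speech/non-speech mask as given rather than computing it by GAD. The function names, the frame length, and the per-frame averaging are all hypothetical choices made for illustration.

```python
import numpy as np


def analytic_signal(x):
    """Analytic signal of a real sequence via the FFT (x + j * Hilbert{x})."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0  # keep DC once
    if n % 2 == 0:
        weights[n // 2] = 1.0      # keep Nyquist once
        weights[1:n // 2] = 2.0    # double positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)


def envelope_derivative_energy(x):
    """Assumed operator: dx^2 + Hilbert{dx}^2, which is never negative."""
    x = np.asarray(x, dtype=float)
    dx = np.gradient(x)  # finite-difference estimate of the derivative
    return np.abs(analytic_signal(dx)) ** 2


def estimate_snr_db(x, speech_mask, frame_len=400):
    """Average per-frame ratio of speech energy to non-speech energy, in dB.

    speech_mask is a boolean array (True = speech). In the paper it would
    come from GAD with pre- and post-processing; here it is simply an input.
    """
    energy = envelope_derivative_energy(x)
    speech_mask = np.asarray(speech_mask, dtype=bool)
    if speech_mask.all() or not speech_mask.any():
        raise ValueError("need both speech and non-speech samples")
    noise_energy = energy[~speech_mask].mean()

    segmental = []
    for start in range(0, len(energy) - frame_len + 1, frame_len):
        frame = slice(start, start + frame_len)
        if speech_mask[frame].all():  # use frames lying fully inside speech
            segmental.append(10.0 * np.log10(energy[frame].mean() / noise_energy))
    if not segmental:
        raise ValueError("no frame lies entirely inside a speech region")
    return float(np.mean(segmental))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fs = 8000
    t = np.arange(fs) / fs
    mask = np.zeros(fs, dtype=bool)
    mask[2000:6000] = True                      # synthetic "speech" region
    tone = np.sin(2 * np.pi * 200 * t) * mask   # stand-in for voiced speech
    noisy = tone + 0.05 * rng.standard_normal(fs)
    print(estimate_snr_db(noisy, mask))
```

Because the operator weights energy by frequency, the value returned here is a ratio of frequency-weighted energies and will generally differ from an SNR computed from plain signal power; how the paper maps one to the other is not stated in the abstract.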

Data Availability

The data that support the findings of this study are available in TIMIT [13], NOISEX-92 [57], NOIZEUS [21], and TESDHE [31] with the identifiers https://doi.org/10.35111/17gk-bn40, https://doi.org/10.1016/0167-6393(93)90095-3, https://doi.org/10.1016/j.specom.2006.12.006, and https://doi.org/10.1007/s10772-018-9557-y.

References

P. Alku, T. Bäckström, E. Vilkman, Normalized amplitude quotient for parametrization of the glottal flow. J. Acoust. Soc. Am. 112(2), 701–710 (2002)
R. Aralikatti, D.K. Margam, T. Sharma, A. Thanda, S.M. Venkatesan, Global SNR Estimation of speech signals using entropy and uncertainty estimates from dropout networks, in INTERSPEECH, Hyderabad, India, 2018
S. Boll, Suppression of acoustic noise in speech using spectral subtraction. IEEE Trans. Acoust. Speech Signal Process. 27(2), 113–120 (1979)
A.L. Bowers, T. Saltuklaroglu, A. Harkrider, M. Wilson, M.A. Toner, Dynamic modulation of shared sensory and motor cortical rhythms mediates speech and non-speech discrimination performance. Front. Psychol. 5, 366 (2014)
C. Breithaupt, T. Gerkmann, R. Martin, A novel a priori SNR estimation approach based on selective cepstro-temporal smoothing, in ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, IEEE, 2008, pp. 4897–4900. https://doi.org/10.1109/ICASSP.2008.4518755.
I. Cohen, Relaxed statistical model for speech enhancement and a priori SNR estimation. IEEE Trans. Speech Audio Process. 13(5), 870–881 (2005). https://doi.org/10.1109/TSA.2005.851940
J.A.M. Cordovilla, N. Ma, V. Sánchez, J.L. Carmona, A.M. Peinado, J. Barker, A pitch based noise estimation technique for robust speech recognition with missing data, in ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, IEEE, 2011, pp. 4808–4811. https://doi.org/10.1109/ICASSP.2011.5947431
N. Dhananjaya, B. Yegnanarayana, Voiced/nonvoiced detection based on robustness of voiced epochs. IEEE Signal Process. Lett. 17(3), 273–276 (2009)
M.A.A. El-Fattah, M.I. Dessouky, A.M. Abbas, S.M. Diab, E.S.M. El-Rabaie, W. Al-Nuaimy, S.A. Alshebeili, F.E. Abd El-Samie, Speech enhancement with an adaptive Wiener filter. Int. J. Speech Technol. 17(1), 53–64 (2014). https://doi.org/10.1007/s10772-013-9205-5
S. Elshamy, N. Madhu, W. Tirry, T. Fingscheidt, An iterative speech model-based a priori SNR estimator, in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2015, pp. 1740–1744
A.K. Fuchs, C. Amon, M. Hagmüller, Speech/non-speech detection for electro-larynx speech using EMG, in International Conference on Bio-Inspired Systems and Signal Processing, SCITEPRESS, 2015, pp. 138–144
S. Furui, Digital speech processing, synthesis, and recognition. CRC Press (2018). https://doi.org/10.1201/9781482270648
J.S. Garofolo, TIMIT acoustic phonetic continuous speech corpus. Linguistic Data Consortium (1993)
D. Govind, S.R. Mahadeva Prasanna, B. Yegnanarayana, Significance of glottal activity detection for duration modification, in Proceedings of the 6th International Conference on Speech Prosody, SP 2012, 2012, pp. 470–473
F. Grondin, F. Michaud, Robust speech/non-speech discrimination based on pitch estimation for mobile robots, in 2016 IEEE International Conference on Robotics and Automation (ICRA), IEEE, 2016, pp. 1650–1655.
J.H.L. Hansen, B.L. Pellom, An effective quality evaluation protocol for speech enhancement algorithms, in Fifth International Conference on Spoken Language Processing, 1998
S. Hiroya, K. Jasmin, S. Krishnan, C. Lima, M. Ostarek, D. Boebinger, S.K. Scott, Speech rhythm measure of non-native speech using a statistical phoneme duration model, in The 8th Annual Meeting of the Society for the Neurobiology of Language, 2016
H.G. Hirsch, C. Ehrlicher, Noise estimation techniques for robust speech recognition, in ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, IEEE, 1995, pp. 153–156. https://doi.org/10.1109/icassp.1995.479387
R.S. Holambe, M.S. Deshpande, Nonlinearity framework in speech processing, in Advances in Non-Linear Modeling for Speech Processing, Springer, 2012, pp. 11–25
Y. Hu, P.C. Loizou, Subjective comparison and evaluation of speech enhancement algorithms. Speech Commun. 49(7–8), 588–601 (2007)
G. Hu, D. Wang, Segregation of unvoiced speech from nonspeech interference. J. Acoust. Soc. Am. 124(2), 1306–1319 (2008). https://doi.org/10.1121/1.2939132
J.F. Kaiser, On a simple algorithm to calculate the ‘energy’ of a signal, in International Conference on Acoustics, Speech, and Signal Processing, IEEE, 1990, pp. 381–384
J.F. Kaiser, Some useful properties of Teager’s energy operators, in Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing, IEEE, 1993, pp. 149–152. https://doi.org/10.1109/icassp.1993.319457
C. Kim, R.M. Stern, Robust signal-to-noise ratio estimation based on waveform amplitude distribution analysis, in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2008, pp. 2598–2601
T. Kinnunen, H. Li, An overview of text-independent speaker recognition: from features to supervectors. Speech Commun. 52(1), 12–40 (2010)
P. Ladefoged, K. Johnson, A course in phonetics, Cengage Learning, 2014
H. Li, D. Wang, X. Zhang, G. Gao, Frame-level signal-to-noise ratio estimation using deep learning, in INTERSPEECH, 2020, pp. 4626–4630
S. Lv, Y. Hu, S. Zhang, L. Xie, DCCRN+: Channel-wise subband DCCRN with SNR estimation for speech enhancement, ArXiv Preprint. (2021). http://arxiv.org/abs/2106.08672.
R. Martin, An efficient algorithm to estimate the instantaneous SNR of speech signals, in Third European Conference on Speech Communication and Technology (EUROSPEECH ’93), 1993, pp. 1093–1096.
R. Martin, Noise power spectral density estimation based on optimal smoothing and minimum statistics. IEEE Trans. Speech Audio Process. 9(5), 504–512 (2001). https://doi.org/10.1109/89.928915
A. Milton, K.A. Monsely, Tamil and English speech database for heartbeat estimation. Int. J. Speech Technol. 21(4), 967–973 (2018)
T. Moazzeni, A. Amei, J. Ma, Y. Jiang, Statistical model based SNR estimation method for speech signals. Electron. Lett. 48(12), 727–729 (2012). https://doi.org/10.1049/el.2012.0799
K.S.R. Murty, B. Yegnanarayana, Epoch extraction from speech signals. IEEE Trans. Audio Speech Lang. Process. 16(8), 1602–1613 (2008)
A. Narayanan, D. Wang, A CASA-based system for long-term SNR estimation. IEEE Trans. Audio Speech Lang. Process. 20(9), 2518–2527 (2012). https://doi.org/10.1109/TASL.2012.2205242
NIST-SNR, “NIST speech signal to noise ratio measurements,” [Online]. Created May 19, 2015. Available: https://www.nist.gov/itl/iad/mig/nist-speech-signal-noise-ratio-measurements
J.M. O’Toole, A. Temko, N. Stevenson, Assessing instantaneous energy in the EEG: a non-negative, frequency-weighted energy operator, in 2014 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, IEEE, 2014, pp. 3288–3291
J.M. O’Toole, B.G. Zapirain, I.M. Saiz, A.B.A. Chen, I.Y. Santamaría, Estimating the time-varying periodicity of epileptiform discharges in the electroencephalogram, in 2012 11th International Conference on Information Science, Signal Processing and Their Applications (ISSPA), IEEE, 2012, pp. 1229–1234
K. Palmu, N. Stevenson, S. Wikström, L. Hellström-Westas, S. Vanhatalo, J.M. Palva, Optimization of an NLEO-based algorithm for automated detection of spontaneous activity transients in early preterm EEG. Physiol. Meas. 31(11), N85 (2010)
P. Papadopoulos, A. Tsiartas, S. Narayanan, Long-term SNR estimation of speech signals in known and unknown channel conditions. IEEE/ACM Trans. Audio Speech Lang. Process. 24(12), 2495–2506 (2016). https://doi.org/10.1109/TASLP.2016.2615240
C. Plapous, C. Marro, P. Scalart, Improved signal-to-noise ratio estimation for speech enhancement. IEEE Trans. Audio Speech Lang. Process. 14(6), 2098–2108 (2006). https://doi.org/10.1109/TASL.2006.872621
S.R.M. Prasanna, J.M. Zachariah, B. Yegnanarayana, Begin-end detection using vowel onset points, in Workshop on Spoken Language Processing, 2003
F. Qu, S. Lei, Z. Zhao, J. Zhang, Z. Nie, A modified a priori SNR estimation for spectral subtraction speech enhancement, in 2021 IEEE 4th International Conference on Electronics Technology (ICET), IEEE, 2021, pp. 861–864. https://doi.org/10.1109/icet51757.2021.9451018
Z. Rafii, B. Pardo, Online REPET-SIM for real-time speech enhancement, in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, IEEE, 2013, pp. 848–852
Z. Rafii, B. Pardo, Music/voice separation using the similarity matrix, in ISMIR, 2012, pp. 583–588
Z. Rafii, B. Pardo, Repeating pattern extraction technique (REPET): a simple method for music/voice separation. IEEE Trans. Audio Speech Lang. Process. 21(1), 73–84 (2012)
Y. Ren, M.T. Johnson, An improved SNR estimator for speech enhancement, in ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, IEEE, 2008, pp. 4901–4904. https://doi.org/10.1109/ICASSP.2008.4518756
P. Saha, U. Baruah, R.H. Laskar, S. Mishra, S.P. Choudhury, T.K. Das, Robust analysis for improvement of vowel onset point detection under noisy conditions. Int. J. Speech Technol. 19(3), 433–448 (2016). https://doi.org/10.1007/s10772-016-9336-6
B.D. Sarma, S.R.M. Prasanna, P. Sarmah, Consonant-vowel unit recognition using dominant aperiodic and transition region detection. Speech Commun. 92, 77–89 (2017)
Y. Shao, C.-H. Chang, A versatile speech enhancement system based on perceptual wavelet denoising, in 2005 IEEE International Symposium on Circuits and Systems, IEEE, 2005, pp. 864–867.
N. Shome, R.H. Laskar, D. Das, Reference free speech quality estimation for diverse data condition. Int. J. Speech Technol. (2019). https://doi.org/10.1007/s10772-018-9537-2
S. Suhadi, C. Last, T. Fingscheidt, A data-driven approach to a priori SNR estimation. IEEE Trans. Audio Speech Lang. Process. 19(1), 186–195 (2011). https://doi.org/10.1109/TASL.2010.2045799
J. Tchorz, B. Kollmeier, SNR estimation based on amplitude modulation analysis with applications to noise suppression. IEEE Trans. Speech Audio Process. 11(3), 184–192 (2003). https://doi.org/10.1109/TSA.2003.811542
H.M. Teager, S.M. Teager, Evidence for nonlinear sound production mechanisms in the vocal tract, in Speech Production and Speech Modelling, Springer, 1990, pp. 241–261.
S. V Thambi, K.T. Sreekumar, C.S. Kumar, P.C.R. Raj, Random forest algorithm for improving the performance of speech/non-speech detection, in 2014 First International Conference on Computational Systems and Communications (ICCSC), IEEE, 2014, pp. 28–32.
R. Thirumuru, A.K. Vuppala, Application of non-negative frequency-weighted energy operator for vowel region detection. Int. J. Speech Technol. 21(2), 279–291 (2018)
D. Thornton, A.W. Harkrider, D. Jenson, T. Saltuklaroglu, Sensorimotor activity measured via oscillations of EEG mu rhythms in speech and non-speech discrimination tasks with and without segmentation demands. Brain Lang. 187, 62–73 (2018)
N. Upadhyay, R.K. Jaiswal, Single channel speech enhancement: using Wiener filtering with recursive noise estimation. Procedia Comput. Sci. 84, 22–30 (2016)
A. Varga, The NOISEX-92 study on the effect of additive noise on automatic speech recognition, Technical Report, DRA Speech Res. Unit. (1992)
D. Wang, On ideal binary mask as the computational goal of auditory scene analysis, in Speech Separation by Humans and Machines. (Springer, Boston, 2005), pp. 181–197. https://doi.org/10.1007/0-387-22794-6_12
K. Yang, Z. Huang, X. Wang, F. Wang, An SNR estimation technique based on deep learning. Electronics 8(10), 1139 (2019). https://doi.org/10.3390/electronics8101139
X. Zhao, Y. Shao, D.L. Wang, Robust speaker identification using a CASA front-end, in ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, IEEE, 2011, pp. 5468–5471. https://doi.org/10.1109/ICASSP.2011.5947596

Author information

Authors and Affiliations

Department of Electronics & Communication Engineering, Assam University, Silchar, Assam, India
Nirupam Shome & Richik Kashyap
Department of Electronics and Communication Engineering, National Institute of Technology, Silchar, Assam, India
Rabul Hussain Laskar

Corresponding author

Correspondence to Nirupam Shome.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Shome, N., Laskar, R.H. & Kashyap, R. Non-negative Frequency-Weighted Energy-Based Speech Quality Estimation for Different Modes and Quality of Speech. Circuits Syst Signal Process 41, 6788–6826 (2022). https://doi.org/10.1007/s00034-022-02070-y

Received: 23 August 2021
Revised: 20 May 2022
Accepted: 21 May 2022
Published: 12 July 2022
Issue Date: December 2022
DOI: https://doi.org/10.1007/s00034-022-02070-y
