Abstract
This paper proposes a robust technique for estimating the signal-to-noise ratio (SNR) of continuous speech, based on the non-negative frequency-weighted energy operator computed from the envelope of the derivative of the speech signal. The proposed method is implemented in two phases. The first phase identifies speech regions by glottal activity detection (GAD), supported by one pre-processing and one post-processing module. Speech enhancement with background subtraction is the pre-processing module, and obstruent detection with duration modification is the post-processing module; together they improve the detection of speech and non-speech regions. The second phase estimates the SNR from the instantaneous energy contour obtained from the envelope of the derivative of the speech signal, which is a non-negative frequency-weighted energy estimate. The ratio of the instantaneous energy of the speech region to that of the non-speech region is taken as the estimated segmental SNR. The final SNR is calculated by averaging the segmental SNR values and is compared with the true SNR to measure the estimation efficiency of the proposed method. The performance of the proposed method is evaluated using the TIMIT, NOIZEUS, and TESDHE acoustic speech corpora. The TIMIT and NOIZEUS databases are used to evaluate performance under different types of noise, and the TESDHE database is used for analysis under different modes of data. The results show that the proposed system provides accurate SNR estimation across different types of noise and different modes of signal at various signal levels, and that it outperforms the other methods under observation.
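The second phase described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the envelope-derivative form of the operator, that is, the squared magnitude of the analytic signal of a central-difference derivative, and it assumes that a speech/non-speech mask is already available (the GAD stage with its pre- and post-processing is not reproduced here). The function names, the frame length of 160 samples, and the choice to compare each fully voiced frame against the mean energy of all non-speech samples are hypothetical choices made for illustration only.

```python
import numpy as np


def analytic_signal(x):
    """Analytic signal of a real sequence, built by zeroing negative frequencies."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)


def envelope_derivative_energy(x):
    """Instantaneous energy as the squared envelope of the signal derivative.

    Assumption: the derivative is approximated with central differences.
    The result is non-negative by construction.
    """
    x = np.asarray(x, dtype=float)
    dx = np.gradient(x)
    return np.abs(analytic_signal(dx)) ** 2


def estimate_snr_db(x, speech_mask, frame_len=160):
    """Average segmental SNR in dB (hypothetical framing, see the lead-in).

    speech_mask is a boolean array, True where a detector marked speech.
    Only frames lying entirely inside the speech region are scored.
    """
    energy = envelope_derivative_energy(x)
    mask = np.asarray(speech_mask, dtype=bool)
    if mask.all() or not mask.any():
        raise ValueError("need both speech and non-speech samples")
    noise_energy = energy[~mask].mean()
    segmental = []
    for start in range(0, len(energy) - frame_len + 1, frame_len):
        stop = start + frame_len
        if mask[start:stop].all():
            ratio = energy[start:stop].mean() / noise_energy
            segmental.append(10.0 * np.log10(ratio))
    if not segmental:
        raise ValueError("no frame lies entirely inside the speech region")
    return float(np.mean(segmental))
```

Note that the speech-region energy in this sketch still contains the noise, so the value is the speech-to-non-speech energy ratio that the abstract describes rather than a clean-speech SNR.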
Data Availability
The data that support the findings of this study are available in TIMIT [13], NOISEX-92 [57], NOIZEUS [21], and TESDHE [31], with the identifiers https://doi.org/10.35111/17gk-bn40, https://doi.org/10.1016/0167-6393(93)90095-3, https://doi.org/10.1016/j.specom.2006.12.006, and https://doi.org/10.1007/s10772-018-9557-y, respectively.
References
P. Alku, T. Bäckström, E. Vilkman, Normalized amplitude quotient for parametrization of the glottal flow. J. Acoust. Soc. Am. 112(2), 701–710 (2002)
R. Aralikatti, D.K. Margam, T. Sharma, A. Thanda, S.M. Venkatesan, Global SNR estimation of speech signals using entropy and uncertainty estimates from dropout networks, in INTERSPEECH, Hyderabad, India, 2018
S. Boll, Suppression of acoustic noise in speech using spectral subtraction. IEEE Trans. Acoust. Speech Signal Process. 27(2), 113–120 (1979)
A.L. Bowers, T. Saltuklaroglu, A. Harkrider, M. Wilson, M.A. Toner, Dynamic modulation of shared sensory and motor cortical rhythms mediates speech and non-speech discrimination performance. Front. Psychol. 5, 366 (2014)
C. Breithaupt, T. Gerkmann, R. Martin, A novel a priori SNR estimation approach based on selective cepstro-temporal smoothing, in ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, IEEE, 2008, pp. 4897–4900. https://doi.org/10.1109/ICASSP.2008.4518755
I. Cohen, Relaxed statistical model for speech enhancement and a priori SNR estimation. IEEE Trans. Speech Audio Process. 13(5), 870–881 (2005). https://doi.org/10.1109/TSA.2005.851940
J.A.M. Cordovilla, N. Ma, V. Sánchez, J.L. Carmona, A.M. Peinado, J. Barker, A pitch based noise estimation technique for robust speech recognition with missing data, in ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, IEEE, 2011, pp. 4808–4811. https://doi.org/10.1109/ICASSP.2011.5947431
N. Dhananjaya, B. Yegnanarayana, Voiced/nonvoiced detection based on robustness of voiced epochs. IEEE Signal Process. Lett. 17(3), 273–276 (2009)
M.A.A. El-Fattah, M.I. Dessouky, A.M. Abbas, S.M. Diab, E.S.M. El-Rabaie, W. Al-Nuaimy, S.A. Alshebeili, F.E. Abd El-Samie, Speech enhancement with an adaptive Wiener filter. Int. J. Speech Technol. 17(1), 53–64 (2014). https://doi.org/10.1007/s10772-013-9205-5
S. Elshamy, N. Madhu, W. Tirry, T. Fingscheidt, An iterative speech model-based a priori SNR estimator, in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2015, pp. 1740–1744
A.K. Fuchs, C. Amon, M. Hagmüller, Speech/non-speech detection for electro-larynx speech using EMG, in International Conference on Bio-Inspired Systems and Signal Processing, SCITEPRESS, 2015, pp. 138–144
S. Furui, Digital speech processing, synthesis, and recognition. CRC Press (2018). https://doi.org/10.1201/9781482270648
J.S. Garofolo, TIMIT acoustic phonetic continuous speech corpus. Linguist. Data Consort. (1993)
D. Govind, S.R. Mahadeva Prasanna, B. Yegnanarayana, Significance of glottal activity detection for duration modification, in Proceedings of the 6th International Conference on Speech Prosody, SP 2012, 2012, pp. 470–473
F. Grondin, F. Michaud, Robust speech/non-speech discrimination based on pitch estimation for mobile robots, in 2016 IEEE International Conference on Robotics and Automation (ICRA), IEEE, 2016, pp. 1650–1655
J.H.L. Hansen, B.L. Pellom, An effective quality evaluation protocol for speech enhancement algorithms, in Fifth International Conference on Spoken Language Processing, 1998
S. Hiroya, K. Jasmin, S. Krishnan, C. Lima, M. Ostarek, D. Boebinger, S.K. Scott, Speech rhythm measure of non-native speech using a statistical phoneme duration model, in The 8th Annual Meeting of the Society for the Neurobiology of Language, 2016
H.G. Hirsch, C. Ehrlicher, Noise estimation techniques for robust speech recognition, in ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, IEEE, 1995, pp. 153–156. https://doi.org/10.1109/icassp.1995.479387
R.S. Holambe, M.S. Deshpande, Nonlinearity framework in speech processing, in Advances in Non-Linear Modeling for Speech Processing, Springer, 2012, pp. 11–25
Y. Hu, P.C. Loizou, Subjective comparison and evaluation of speech enhancement algorithms. Speech Commun. 49(7–8), 588–601 (2007)
G. Hu, D. Wang, Segregation of unvoiced speech from nonspeech interference. J. Acoust. Soc. Am. 124(2), 1306–1319 (2008). https://doi.org/10.1121/1.2939132
J.F. Kaiser, On a simple algorithm to calculate the ‘energy’ of a signal, in International Conference on Acoustics, Speech, and Signal Processing, IEEE, 1990, pp. 381–384
J.F. Kaiser, Some useful properties of Teager’s energy operators, in Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing, IEEE, 1993, pp. 149–152. https://doi.org/10.1109/icassp.1993.319457
C. Kim, R.M. Stern, Robust signal-to-noise ratio estimation based on waveform amplitude distribution analysis, in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2008, pp. 2598–2601
T. Kinnunen, H. Li, An overview of text-independent speaker recognition: from features to supervectors. Speech Commun. 52(1), 12–40 (2010)
P. Ladefoged, K. Johnson, A course in phonetics, Cengage Learning, 2014
H. Li, D. Wang, X. Zhang, G. Gao, Frame-level signal-to-noise ratio estimation using deep learning, in INTERSPEECH, 2020, pp. 4626–4630
S. Lv, Y. Hu, S. Zhang, L. Xie, DCCRN+: Channel-wise subband DCCRN with SNR estimation for speech enhancement, arXiv preprint (2021). http://arxiv.org/abs/2106.08672
R. Martin, An efficient algorithm to estimate the instantaneous SNR of speech signals, in Third European Conference on Speech Communication and Technology (EUROSPEECH ’93), 1993, pp. 1093–1096
R. Martin, Noise power spectral density estimation based on optimal smoothing and minimum statistics. IEEE Trans. Speech Audio Process. 9(5), 504–512 (2001). https://doi.org/10.1109/89.928915
A. Milton, K.A. Monsely, Tamil and English speech database for heartbeat estimation. Int. J. Speech Technol. 21(4), 967–973 (2018)
T. Moazzeni, A. Amei, J. Ma, Y. Jiang, Statistical model based SNR estimation method for speech signals. Electron. Lett. 48(12), 727–729 (2012). https://doi.org/10.1049/el.2012.0799
K.S.R. Murty, B. Yegnanarayana, Epoch extraction from speech signals. IEEE Trans. Audio Speech Lang. Process. 16(8), 1602–1613 (2008)
A. Narayanan, D. Wang, A CASA-based system for long-term SNR estimation. IEEE Trans. Audio Speech Lang. Process. 20(9), 2518–2527 (2012). https://doi.org/10.1109/TASL.2012.2205242
NIST-SNR, NIST speech signal to noise ratio measurements [Online]. Created May 19, 2015. Available: https://www.nist.gov/itl/iad/mig/nist-speech-signal-noise-ratio-measurements
J.M. O’Toole, A. Temko, N. Stevenson, Assessing instantaneous energy in the EEG: a non-negative, frequency-weighted energy operator, in 2014 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, IEEE, 2014, pp. 3288–3291
J.M. O’Toole, B.G. Zapirain, I.M. Saiz, A.B.A. Chen, I.Y. Santamaría, Estimating the time-varying periodicity of epileptiform discharges in the electroencephalogram, in 2012 11th International Conference on Information Science, Signal Processing and Their Applications (ISSPA), IEEE, 2012, pp. 1229–1234
K. Palmu, N. Stevenson, S. Wikström, L. Hellström-Westas, S. Vanhatalo, J.M. Palva, Optimization of an NLEO-based algorithm for automated detection of spontaneous activity transients in early preterm EEG. Physiol. Meas. 31(11), N85 (2010)
P. Papadopoulos, A. Tsiartas, S. Narayanan, Long-term SNR estimation of speech signals in known and unknown channel conditions. IEEE/ACM Trans. Audio Speech Lang. Process. 24(12), 2495–2506 (2016). https://doi.org/10.1109/TASLP.2016.2615240
C. Plapous, C. Marro, P. Scalart, Improved signal-to-noise ratio estimation for speech enhancement. IEEE Trans. Audio Speech Lang. Process. 14(6), 2098–2108 (2006). https://doi.org/10.1109/TASL.2006.872621
S.R.M. Prasanna, J.M. Zachariah, B. Yegnanarayana, Begin-end detection using vowel onset points, in Workshop on Spoken Language Processing, 2003
F. Qu, S. Lei, Z. Zhao, J. Zhang, Z. Nie, A modified a priori SNR estimation for spectral subtraction speech enhancement, in 2021 IEEE 4th International Conference on Electronics Technology (ICET), IEEE, 2021, pp. 861–864. https://doi.org/10.1109/icet51757.2021.9451018
Z. Rafii, B. Pardo, Online REPET-SIM for real-time speech enhancement, in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, IEEE, 2013, pp. 848–852
Z. Rafii, B. Pardo, Music/voice separation using the similarity matrix, in ISMIR, 2012, pp. 583–588
Z. Rafii, B. Pardo, Repeating pattern extraction technique (REPET): a simple method for music/voice separation. IEEE Trans. Audio Speech Lang. Process. 21(1), 73–84 (2012)
Y. Ren, M.T. Johnson, An improved SNR estimator for speech enhancement, in ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, IEEE, 2008, pp. 4901–4904. https://doi.org/10.1109/ICASSP.2008.4518756
P. Saha, U. Baruah, R.H. Laskar, S. Mishra, S.P. Choudhury, T.K. Das, Robust analysis for improvement of vowel onset point detection under noisy conditions. Int. J. Speech Technol. 19(3), 433–448 (2016). https://doi.org/10.1007/s10772-016-9336-6
B.D. Sarma, S.R.M. Prasanna, P. Sarmah, Consonant-vowel unit recognition using dominant aperiodic and transition region detection. Speech Commun. 92, 77–89 (2017)
Y. Shao, C.-H. Chang, A versatile speech enhancement system based on perceptual wavelet denoising, in 2005 IEEE International Symposium on Circuits and Systems, IEEE, 2005, pp. 864–867
N. Shome, R.H. Laskar, D. Das, Reference free speech quality estimation for diverse data condition. Int. J. Speech Technol. (2019). https://doi.org/10.1007/s10772-018-9537-2
S. Suhadi, C. Last, T. Fingscheidt, A data-driven approach to a priori SNR estimation. IEEE Trans. Audio Speech Lang. Process. 19(1), 186–195 (2011). https://doi.org/10.1109/TASL.2010.2045799
J. Tchorz, B. Kollmeier, SNR estimation based on amplitude modulation analysis with applications to noise suppression. IEEE Trans. Speech Audio Process. 11(3), 184–192 (2003). https://doi.org/10.1109/TSA.2003.811542
H.M. Teager, S.M. Teager, Evidence for nonlinear sound production mechanisms in the vocal tract, in Speech Production and Speech Modelling, Springer, 1990, pp. 241–261
S.V. Thambi, K.T. Sreekumar, C.S. Kumar, P.C.R. Raj, Random forest algorithm for improving the performance of speech/non-speech detection, in 2014 First International Conference on Computational Systems and Communications (ICCSC), IEEE, 2014, pp. 28–32
R. Thirumuru, A.K. Vuppala, Application of non-negative frequency-weighted energy operator for vowel region detection. Int. J. Speech Technol. 21(2), 279–291 (2018)
D. Thornton, A.W. Harkrider, D. Jenson, T. Saltuklaroglu, Sensorimotor activity measured via oscillations of EEG mu rhythms in speech and non-speech discrimination tasks with and without segmentation demands. Brain Lang. 187, 62–73 (2018)
N. Upadhyay, R.K. Jaiswal, Single channel speech enhancement: using Wiener filtering with recursive noise estimation. Procedia Comput. Sci. 84, 22–30 (2016)
A. Varga, The NOISEX-92 study on the effect of additive noise on automatic speech recognition, Technical Report, DRA Speech Res. Unit. (1992)
D. Wang, On ideal binary mask as the computational goal of auditory scene analysis, in Speech Separation by Humans and Machines. (Springer, Boston, 2005), pp. 181–197. https://doi.org/10.1007/0-387-22794-6_12
K. Yang, Z. Huang, X. Wang, F. Wang, An SNR estimation technique based on deep learning. Electronics 8(10), 1139 (2019). https://doi.org/10.3390/electronics8101139
X. Zhao, Y. Shao, D.L. Wang, Robust speaker identification using a CASA front-end, in ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, IEEE, 2011, pp. 5468–5471. https://doi.org/10.1109/ICASSP.2011.5947596
Cite this article
Shome, N., Laskar, R.H. & Kashyap, R. Non-negative Frequency-Weighted Energy-Based Speech Quality Estimation for Different Modes and Quality of Speech. Circuits Syst Signal Process 41, 6788–6826 (2022). https://doi.org/10.1007/s00034-022-02070-y
DOI: https://doi.org/10.1007/s00034-022-02070-y