Dominant Voiced Speech Segregation and Noise Reduction Pre-processing Module for Hearing Aids and Speech Processing Applications

Hamsa, Shibani; Iraqi, Youssef; Shahin, Ismail; Werghi, Naoufel

doi:10.1007/978-3-030-73689-7_38


Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1383)

Included in the following conference series:

International Conference on Soft Computing and Pattern Recognition


Abstract

Speech encounters various acoustic interferences in everyday environments, and many applications need an effective way to separate the dominant speech from that interference. An ideal hearing system should be able to isolate and recognize auditory events accurately in complex auditory scenes and under adverse conditions. Difficulty in distinguishing a particular voice from a mixture of unwanted conversations is one of the problems faced by people who wear hearing aids. Separating the dominant speech from other speech signals and then enhancing it would therefore help people with hearing impairment. Recent computational auditory scene analysis (CASA) systems are based on the gammatone filter bank and the short-time Fourier transform (STFT), but the high computational complexity of these models hinders their implementation in digital hearing aids. This paper introduces a cochlear model based on the wavelet packet transform (WPT) and a novel approach for dominant voiced speech segregation. Experiments show that the proposed model improves on competing models in both computational complexity and recognition rate.
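As a rough illustration of the wavelet packet idea mentioned in the abstract, the following Python sketch splits a signal into uniform subbands with a full wavelet packet tree and measures the energy in each band. It is not the authors' cochlear model: the abstract does not state the wavelet, the tree depth, or the band layout, so the Haar filter, the three-level depth, the 8 kHz sampling rate, the 200 Hz test tone, and the function names `haar_split`, `wavelet_packet`, and `band_energies` are all assumptions chosen for brevity.

```python
import math


def haar_split(x):
    """Split an even-length sequence into orthonormal Haar low/high halves."""
    s = 1.0 / math.sqrt(2.0)
    low = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return low, high


def wavelet_packet(x, levels):
    """Full wavelet packet tree: every node is split at every level,
    giving 2**levels subbands (unlike the plain wavelet transform,
    which only keeps splitting the low-pass branch)."""
    if len(x) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    bands = [list(x)]
    for _ in range(levels):
        next_bands = []
        for band in bands:
            low, high = haar_split(band)
            next_bands.extend([low, high])
        bands = next_bands
    return bands


def band_energies(bands):
    """Sum of squared coefficients in each subband."""
    return [sum(v * v for v in band) for band in bands]


# Hypothetical test input: a 200 Hz tone sampled at 8 kHz (assumed values).
fs = 8000
n = 256
signal = [math.sin(2 * math.pi * 200 * t / fs) for t in range(n)]

bands = wavelet_packet(signal, 3)   # 8 subbands of 32 coefficients each
energies = band_energies(bands)
```

Because the Haar split is orthonormal, the subband energies sum to the energy of the input signal. Each level touches every sample once, so a depth-L tree costs on the order of N·L operations for N samples. The bands come out in natural (Paley) order rather than strictly increasing frequency, so a practical filter bank would reorder them and would likely use a longer wavelet filter than Haar.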



Author information

Authors and Affiliations

C2PS, Department of ECE, Khalifa University, Abu Dhabi, UAE
Shibani Hamsa, Youssef Iraqi & Naoufel Werghi
University of Sharjah, Sharjah, UAE
Ismail Shahin


Corresponding author

Correspondence to Naoufel Werghi.

Editor information

Editors and Affiliations

Scientific Network for Innovation and Research Excellence, Machine Intelligence Research Labs (MIR Labs), Auburn, WA, USA
Ajith Abraham
Systems Innovation, School of Engineering, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
Yukio Ohsawa
Scientific Network for Innovation and Research Excellence, Machine Intelligence Research Labs (MIR Labs), Auburn, WA, USA
Niketa Gandhi
Vardhaman College of Engineering, Hyderabad, Telangana, India
M.A. Jabbar
Faculty of Sciences and Techniques, Hassan 1st University, Settat, Morocco
Abdelkrim Haqiq
Queen’s University, Belfast, UK
Seán McLoone
Northumbria University, Newcastle, UK
Biju Issac


Cite this paper

Hamsa, S., Iraqi, Y., Shahin, I., Werghi, N. (2021). Dominant Voiced Speech Segregation and Noise Reduction Pre-processing Module for Hearing Aids and Speech Processing Applications. In: Abraham, A., et al. Proceedings of the 12th International Conference on Soft Computing and Pattern Recognition (SoCPaR 2020). SoCPaR 2020. Advances in Intelligent Systems and Computing, vol 1383. Springer, Cham. https://doi.org/10.1007/978-3-030-73689-7_38


DOI: https://doi.org/10.1007/978-3-030-73689-7_38
Published: 16 April 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-73688-0
Online ISBN: 978-3-030-73689-7
eBook Packages: Intelligent Technologies and Robotics (R0)
