
Dominant Voiced Speech Segregation and Noise Reduction Pre-processing Module for Hearing Aids and Speech Processing Applications

  • Conference paper
Proceedings of the 12th International Conference on Soft Computing and Pattern Recognition (SoCPaR 2020)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1383)


Abstract

In everyday environments speech is subject to many kinds of acoustic interference, and numerous applications require an effective way to separate the original dominant speech from that interference. An ideal hearing system should be able to segregate and recognize auditory events accurately in complex auditory scenes and under adverse conditions. Difficulty in distinguishing a particular speaker from a mixture of other, unwanted conversations is one of the problems faced by people who wear hearing aids. Separating the dominant speech from other speech signals, and then enhancing it, would therefore benefit people with hearing impairment. Recent computational auditory scene analysis (CASA) systems in the literature are based on the gammatone filter bank and the short-time Fourier transform (STFT). However, the high computational complexity of these models hinders their implementation in digital hearing aids. This paper introduces a cochlear model based on the wavelet packet transform (WPT) and a novel approach to dominant voiced speech segregation. Experiments confirmed that our model improves on competing models in terms of both computational complexity and recognition rate.
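The abstract contrasts gammatone/STFT front ends with a WPT-based cochlear model. As a rough illustration of the generic decomposition such a model builds on (not the authors' cochlear model), the sketch below computes a full wavelet packet tree. It assumes orthonormal Haar filters, a uniform full-depth tree and plain Python lists; the function names `haar_split`, `wavelet_packet` and `subband_energies` are hypothetical. The paper's model may well use a different wavelet and a non-uniform tree shaped to auditory critical bands.

```python
import math
from typing import List, Tuple

# Hypothetical helpers: Haar filters and a uniform tree are assumptions,
# not the cochlear model described in the paper.

INV_SQRT2 = 1.0 / math.sqrt(2.0)


def haar_split(x: List[float]) -> Tuple[List[float], List[float]]:
    """One orthonormal Haar analysis step: (approximation, detail), each len(x)//2."""
    if len(x) % 2:
        raise ValueError("node length must be even at every level")
    half = len(x) // 2
    approx = [(x[2 * k] + x[2 * k + 1]) * INV_SQRT2 for k in range(half)]
    detail = [(x[2 * k] - x[2 * k + 1]) * INV_SQRT2 for k in range(half)]
    return approx, detail


def wavelet_packet(x: List[float], levels: int) -> List[List[float]]:
    """Full wavelet packet tree: split every node (not only the low band) at each level.

    Returns the 2**levels leaf subbands in natural (Paley) order.
    """
    nodes = [list(x)]
    for _ in range(levels):
        children: List[List[float]] = []
        for node in nodes:
            approx, detail = haar_split(node)
            children.extend([approx, detail])
        nodes = children
    return nodes


def subband_energies(bands: List[List[float]]) -> List[float]:
    """Energy per subband, a crude band-level feature."""
    return [sum(v * v for v in band) for band in bands]


if __name__ == "__main__":
    # 64-sample test tone; any length divisible by 2**levels works.
    signal = [math.sin(2 * math.pi * 3 * n / 64) for n in range(64)]
    bands = wavelet_packet(signal, levels=3)
    energies = subband_energies(bands)
    print(len(bands), len(bands[0]))  # 8 subbands of 8 samples each
    # Orthonormal filters preserve total energy (up to rounding).
    print(abs(sum(energies) - sum(v * v for v in signal)) < 1e-9)
```

Each level touches all N samples once, so a depth-L tree costs on the order of N·L multiply-adds with these two-tap filters; longer wavelets scale that by their filter length. This says nothing by itself about the paper's measured complexity. Note also that the leaves come out in natural order rather than increasing frequency; a Gray-code permutation of the leaf indices is needed for frequency ordering.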



Author information


Correspondence to Naoufel Werghi.


Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Hamsa, S., Iraqi, Y., Shahin, I., Werghi, N. (2021). Dominant Voiced Speech Segregation and Noise Reduction Pre-processing Module for Hearing Aids and Speech Processing Applications. In: Abraham, A., et al. Proceedings of the 12th International Conference on Soft Computing and Pattern Recognition (SoCPaR 2020). SoCPaR 2020. Advances in Intelligent Systems and Computing, vol 1383. Springer, Cham. https://doi.org/10.1007/978-3-030-73689-7_38
