Skip to main content
Log in

Unsupervised Phonetic Segmentation of Classical Arabic Speech Using Forward and Inverse Characteristics of the Vocal Tract

  • Research Article - Electrical Engineering
  • Published:
Arabian Journal for Science and Engineering Aims and scope Submit manuscript

Abstract

Automatic segmentation of speech is about identifying boundaries of phonemes in a given utterance. This paper presents a strategy driven by cosine distance similarity scores for identifying phoneme boundaries. The proposed strategy helps in the selection of appropriate feature extraction technique for speech segmentation applications. After assessing various state-of-the-art speech processing techniques, a novel combination of forward and inverse characteristics of vocal tract (FICV) is developed. The proposed technique is evaluated on Classical Arabic dataset. Extensive experiments are made to compare the proposed technique with state-of-the-art techniques, including the hidden Markov model-based forced alignment procedures. The results show that proposed technique has total error rate of 14.48%, while the accuracy is 85.2% within 10 ms alignment error. When compared with the existing state-of-the-art technique, the proposed technique outperforms by 12.29% and 22.73% in terms of error rates and alignment accuracies, respectively, which signifies the potential of using FICV in speech segmentation.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14
Fig. 15
Fig. 16
Fig. 17
Fig. 18
Fig. 19

Similar content being viewed by others

References

  1. Brognaux, S.; Drugman, T.: HMM-based speech segmentation: improvements of fully automatic approaches. IEEE/ACM Trans. Audio Speech Lang. Process. (TASLP) 24, 5–15 (2016)

    Article  Google Scholar 

  2. Adell, J.; Bonafonte, A.: Towards phone segmentation for concatenative speech synthesis. In: Fifth ISCA Workshop on Speech Synthesis (2004)

  3. Lee, K.-F.; Hon, H.-W.; Reddy, R.: An overview of the SPHINX speech recognition system. In: Waibel, A., Lee, K.-F. (eds.) Readings in Speech Recognition, pp. 600–610. Morgan Kaufmann, San Francisco (1990). https://doi.org/10.1016/B978-0-08-051584-7.50056-5

    Chapter  Google Scholar 

  4. Graves, A.; Mohamed, A.-R.; Hinton, G.: Speech recognition with deep recurrent neural networks. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6645–6649 (2013)

  5. Sharma, M.; Mammone, R.: Subword-based text-dependent speaker verification system with user-selectable passwords. In: 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing, 1996. ICASSP-96. Conference Proceedings, pp. 93–96 (1996)

  6. Alsulaiman, M.; Mahmood, A.; Muhammad, G.: Speaker recognition based on Arabic phonemes. Speech Commun. 86, 42–51 (2017)

    Article  Google Scholar 

  7. Pradhan, G.; Prasanna, S.M.: Speaker verification by vowel and nonvowel like segmentation. IEEE Trans. Audio Speech Lang. Process. 21, 854–867 (2013)

    Article  Google Scholar 

  8. Muthusamy, Y.K.; Barnard, E.; Cole, R.A.: Reviewing automatic language identification. IEEE Signal Process. Mag. 11, 33–41 (1994)

    Article  Google Scholar 

  9. Adami, A.G.; Hermansky, H.: Segmentation of speech for speaker and language recognition. In: Eighth European Conference on Speech Communication and Technology (2003)

  10. van Hemert, J.P.: Automatic segmentation of speech. IEEE Trans. Signal Process. 39, 1008–1012 (1991)

    Article  Google Scholar 

  11. Hosom, J.-P.: Automatic time alignment of phonemes using acoustic-phonetic information. Thesis, OHSU (2000). http://digitalcommons.ohsu.edu/etd/175

  12. Awais, M.; Masud, S.; Shamail, S.: Continuous arabic speech segmentation using FFT spectrogram. Innov. Inf. Technol. 2006, 1–6 (2006)

    Google Scholar 

  13. Ljolje, A.; Riley, M.: Automatic segmentation and labeling of speech. In: 1991 International Conference on Acoustics, Speech, and Signal Processing, 1991. ICASSP-91, pp. 473–476 (1991)

  14. Kessens, J.M.; Strik, H.: On automatic phonetic transcription quality: lower word error rates do not guarantee better transcriptions. Comput. Speech Lang. 18, 123–141 (2004)

    Article  Google Scholar 

  15. Kim, Y.-J.; Conkie, A.: Automatic segmentation combining an HMM-based approach and spectral boundary correction. In: Seventh International Conference on Spoken Language Processing (2002)

  16. Scharenborg, O.; Wan, V.; Ernestus, M.: Unsupervised speech segmentation: an analysis of the hypothesized phone boundaries. J. Acoust. Soc. Am. 127, 1084–1095 (2010)

    Article  Google Scholar 

  17. Rasanen, O.; Laine, U.; Altosaar, T.: Blind segmentation of speech using non-linear filtering methods. In: Ipsic, I. (ed.) Speech Technologies, pp. 105–124. IntechOpen (2011). https://doi.org/10.5772/16433

    Google Scholar 

  18. Khanagha, V.; Daoudi, K.; Pont, O.; Yahia, H.: Phonetic segmentation of speech signal using local singularity analysis. Digit. Signal Proc. 35, 86–94 (2014)

    Article  Google Scholar 

  19. Dusan, S.; Rabiner, L.: On the relation between maximum spectral transition positions and phone boundaries. In: Ninth International Conference on Spoken Language Processing (2006)

  20. Frihia, H.; Bahi, H.: HMM/SVM segmentation and labelling of Arabic speech for speech recognition applications. Int. J. Speech Technol. 20, 563–573 (2017)

    Article  Google Scholar 

  21. Sangeetha, J.; Jothilakshmi, S.: Robust automatic continuous speech segmentation for indian languages to improve speech to speech translation. Int. J. Comput. Appl. 53, 13–16 (2012)

    Google Scholar 

  22. Anwar, M.J.; Awais, M.; Masud, S.; Shamail, S.: Automatic Arabic speech segmentation system. Int. J. Inf. Technol. 12, 102–111 (2006)

    Google Scholar 

  23. Kaur, E.A.; Singh, E.T.: Segmentation of continuous punjabi speech signal into syllables. In: Proceedings of the World Congress on Engineering and Computer Science, pp. 20–22 (2010)

  24. Tolba, M.; Nazmy, T.; Abdelhamid, A.; Gadallah, M.: A novel method for Arabic consonant/vowel segmentation using wavelet transform. Int. J. Intell. Coop. Inf. Syst. IJICIS 5, 353–364 (2005)

    Google Scholar 

  25. Shah, N.J.; Vachhani, B.B.; Sailor, H.B.; Patil, H.A.: Effectiveness of PLP-based phonetic segmentation for speech synthesis. In: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 270–274 (2014)

  26. Nagarajan, T.; Murthy, H.A.; Hegde, R.M.: Segmentation of speech into syllable-like units. In: Eighth European Conference on Speech Communication and Technology (2003)

  27. Rabiner, L.R.; Schafer, R.W.: Theory and Applications of Digital Speech Processing, vol. 64. Pearson, Upper Saddle River (2011)

    Google Scholar 

  28. Hermansky, H.: Perceptual linear predictive (PLP) analysis of speech. J. Acoust. Soc. Am. 87, 1738–1752 (1990)

    Article  Google Scholar 

  29. Hemdal, J.F.; Lougheed, R.M.: Morphological approaches to the automatic extraction of phonetic features. IEEE Trans. Signal Process. 39, 490–497 (1991)

    Article  Google Scholar 

  30. Dimitriadis, D.; Maragos, P.; Potamianos, A.: On the effects of filterbank design and energy computation on robust speech recognition. IEEE Trans. Audio Speech Lang. Process. 19, 1504–1516 (2011)

    Article  Google Scholar 

  31. Do, C.-T.; Pastor, D.; Goalic, A.: On the recognition of cochlear implant-like spectrally reduced speech with MFCC and HMM-based ASR. IEEE Trans. Audio Speech Lang. Process. 18, 1065–1068 (2010)

    Article  Google Scholar 

  32. Makhoul, J.: Linear prediction: a tutorial review. Proc. IEEE 63, 561–580 (1975)

    Article  Google Scholar 

  33. Markel, J.D.; Gray, A.J.: Linear Prediction of Speech. Springer, Berlin (1976)

    Book  Google Scholar 

  34. Sara, S.: Ibn Sina: A Treatise on Arabic Phonetics. LINCOM Publishers, Munich (2009)

    Google Scholar 

  35. Baig, M.M.A.; Qazi, S.A.; Kadri, M.B.: Discriminative training for phonetic recognition of the Holy Quran. Arab. J. Sci. Eng. 40, 2629–2640 (2015)

    Article  Google Scholar 

  36. Alghamdi, M.M.; Ajami Alotaibi, Y.: HMM automatic speech recognition system of Arabic alphadigits. Arab. J. Sci. Eng. 35, 137 (2010)

    Google Scholar 

  37. Alotaibi, Y.A.; Muhammad, G.: Study on pharyngeal and uvular consonants in foreign accented Arabic for ASR. Comput. Speech Lang. 24, 219–231 (2010)

    Article  Google Scholar 

  38. Boersma, P.: Praat: doing phonetics by computer. http://www.praat.org/ (2006). Accessed 1 Jan 2014

  39. Furui, S.: On the role of spectral transition for speech perception. J. Acoust. Soc. Am. 80, 1016–1025 (1986)

    Article  Google Scholar 

  40. Davis, S.B.; Mermelstein, P.: Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. In: Waibel, A., Lee, K.-F. (eds.) Readings in Speech Recognition, pp. 65–74. Morgan Kaufmann, San Francisco (1990). https://doi.org/10.1016/B978-0-08-051584-7.50056-5

    Chapter  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Muhammad Javed.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Javed, M., Baig, M.M.A. & Qazi, S.A. Unsupervised Phonetic Segmentation of Classical Arabic Speech Using Forward and Inverse Characteristics of the Vocal Tract. Arab J Sci Eng 45, 1581–1597 (2020). https://doi.org/10.1007/s13369-019-04065-5

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s13369-019-04065-5

Keywords

Navigation