Unsupervised Phonetic Segmentation of Classical Arabic Speech Using Forward and Inverse Characteristics of the Vocal Tract

Javed, Muhammad; Baig, Mirza Muhammad Ali; Qazi, Saad Ahmed

doi:10.1007/s13369-019-04065-5

Research Article - Electrical Engineering
Published: 22 August 2019

Volume 45, pages 1581–1597 (2020)

Arabian Journal for Science and Engineering

Muhammad Javed¹,
Mirza Muhammad Ali Baig¹ &
Saad Ahmed Qazi¹


Abstract

Automatic speech segmentation identifies the boundaries of phonemes in a given utterance. This paper presents a strategy, driven by cosine-distance similarity scores, for identifying phoneme boundaries. The strategy also helps in selecting an appropriate feature extraction technique for speech segmentation applications. After assessing various state-of-the-art speech processing techniques, a novel combination of forward and inverse characteristics of the vocal tract (FICV) is developed. The proposed technique is evaluated on a Classical Arabic dataset. Extensive experiments compare it with state-of-the-art techniques, including hidden Markov model-based forced alignment procedures. The results show that the proposed technique has a total error rate of 14.48% and an accuracy of 85.2% within a 10 ms alignment error. Compared with the existing state-of-the-art technique, it improves error rate and alignment accuracy by 12.29% and 22.73%, respectively, which indicates the potential of FICV for speech segmentation.
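
The idea of scoring phoneme boundaries with cosine distance can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the FICV features are computed, so the frames below are toy vectors, and the names detect_boundaries and hit_rate, the 5 ms hop, and the 0.3 threshold are assumptions chosen only for illustration. The hit rate shown is a simplified tolerance-based measure and may differ from the accuracy definition used in the paper.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: an all-zero frame counts as "no change"
    return 1.0 - dot / (norm_a * norm_b)


def detect_boundaries(frames, hop_ms=5.0, threshold=0.3):
    """Hypothesise boundaries at local peaks of adjacent-frame distance.

    hop_ms and threshold are illustrative values, not taken from the paper.
    """
    dist = [cosine_distance(frames[i], frames[i + 1])
            for i in range(len(frames) - 1)]
    boundaries = []
    for i, d in enumerate(dist):
        left = dist[i - 1] if i > 0 else -1.0
        right = dist[i + 1] if i + 1 < len(dist) else -1.0
        if d >= threshold and d > left and d >= right:
            # Place the boundary at the start time of frame i + 1.
            boundaries.append((i + 1) * hop_ms)
    return boundaries


def hit_rate(detected, reference, tol_ms=10.0):
    """Fraction of reference boundaries matched within tol_ms."""
    if not reference:
        return 0.0
    hits = sum(1 for r in reference
               if any(abs(r - d) <= tol_ms for d in detected))
    return hits / len(reference)


# Toy 2-D "feature" frames: the spectrum-like shape changes between frames 1 and 2.
frames = [[1.0, 0.0], [1.0, 0.1], [0.1, 1.0], [0.0, 1.0], [0.0, 1.0]]
detected = detect_boundaries(frames)
score = hit_rate(detected, reference=[12.0, 40.0])
```

With these toy frames, a single boundary is hypothesised where the feature vector changes direction, and one of the two hypothetical reference boundaries falls within the 10 ms tolerance.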



Author information

Authors and Affiliations

NED University of Engineering and Technology, Karachi, Pakistan
Muhammad Javed, Mirza Muhammad Ali Baig & Saad Ahmed Qazi


Corresponding author

Correspondence to Muhammad Javed.


About this article

Cite this article

Javed, M., Baig, M.M.A. & Qazi, S.A. Unsupervised Phonetic Segmentation of Classical Arabic Speech Using Forward and Inverse Characteristics of the Vocal Tract. Arab J Sci Eng 45, 1581–1597 (2020). https://doi.org/10.1007/s13369-019-04065-5


Received: 02 November 2018
Accepted: 24 July 2019
Published: 22 August 2019
Issue Date: March 2020
DOI: https://doi.org/10.1007/s13369-019-04065-5
