Syllable modeling in continuous speech recognition for Tamil language

Thangarajan, R.; Natarajan, A. M.; Selvam, M.

doi:10.1007/s10772-009-9058-0

Published: 18 November 2009

Volume 12, pages 47–57, (2009)

International Journal of Speech Technology

R. Thangarajan¹,
A. M. Natarajan² &
M. Selvam¹

Abstract

In automatic speech recognition, the phone has probably been the dominant sub-word unit for more than a decade. Context-dependent phone (triphone) modeling accounts for contextual variation between adjacent phones, and state tying addresses the modeling of triphones that are not seen during training. Recently, the syllable has been gaining momentum as a sub-word unit. Because a syllable is larger than a phone, it absorbs the severe contextual variation between the phones within it; it is therefore more stable than a phone and models pronunciation variability in a systematic way. The Tamil language has challenging features such as agglutination and morpho-phonology. This paper attempts to address these issues by using the syllable as a sub-word unit in the acoustic model. Initially, small-vocabulary context-independent word models and medium-vocabulary context-dependent phone models are developed. Subsequently, an algorithm based on the prosodic syllable is proposed and two experiments are conducted. In the first, syllable-based context-independent models are trained and tested. Despite the large number of syllables, this system performs reasonably well compared with the context-independent word models in terms of word error rate and out-of-vocabulary words. In the second experiment, syllable information is integrated into conventional triphone modeling: cross-syllable triphones are replaced with monophones, which reduces the number of context-dependent phone models by 22.76% in untied units. Despite this reduction in the number of models, the accuracy of the proposed system is comparable to that of the baseline triphone system.
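
The second experiment can be illustrated with a small sketch. It is not the authors' algorithm, which the abstract does not spell out. It assumes one plausible reading of "cross-syllable triphone": a triphone whose left or right context phone lies in a different syllable. Under that assumption, a phone keeps its triphone label only when both neighbours are in the same syllable and is otherwise modeled as a monophone. The function names, the `l-p+r` label format, and the toy syllabified word are all hypothetical and are not taken from the paper.

```python
from typing import List

Syllable = List[str]  # one syllable as a list of phone symbols (toy symbols)


def syllable_aware_units(syllables: List[Syllable]) -> List[str]:
    """Label each phone as a triphone 'l-p+r' only if both neighbours are
    inside the same syllable; otherwise fall back to a monophone.
    Assumption: this is one reading of 'cross-syllable triphones are
    replaced with monophones', not the paper's confirmed procedure."""
    units = []
    for syl in syllables:
        for i, phone in enumerate(syl):
            if 0 < i < len(syl) - 1:
                units.append(f"{syl[i - 1]}-{phone}+{syl[i + 1]}")
            else:
                units.append(phone)
    return units


def word_internal_triphones(syllables: List[Syllable]) -> List[str]:
    """Baseline for comparison: ignore syllable boundaries, so every
    non-edge phone of the word gets a triphone label."""
    flat = [p for syl in syllables for p in syl]
    return syllable_aware_units([flat])


def count_cd_units(units: List[str]) -> int:
    """Number of distinct context-dependent (triphone) labels."""
    return len({u for u in units if "-" in u and "+" in u})


if __name__ == "__main__":
    # Hypothetical syllabification of a toy word; not the paper's phone set.
    word = [["p", "a", "l"], ["l", "i"]]
    baseline = word_internal_triphones(word)
    proposed = syllable_aware_units(word)
    print(baseline)
    print(proposed)
    print(count_cd_units(baseline), count_cd_units(proposed))
```

On this toy word the syllable-aware labeling keeps only the triphone that lies wholly inside the first syllable, which shows in miniature why the number of context-dependent models drops. The 22.76% figure in the abstract comes from the authors' own corpus and cannot be reproduced from this sketch.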

Abbreviations

ANN: Artificial Neural Network
ASR: Automatic Speech Recognition
CD: Context Dependent
CI: Context Independent
CIIL: Central Institute of Indian Languages, Mysore
CMU: Carnegie Mellon University
HMM: Hidden Markov Model
LVCSR: Large Vocabulary Continuous Speech Recognition
SVM: Support Vector Machine
WER: Word Error Rate

Author information

Authors and Affiliations

Department of Information Technology, Kongu Engineering College, Perundurai, 638 052, Erode, India
R. Thangarajan & M. Selvam
Department of Electronics and Communication Engineering, Bannari Amman Institute of Technology, Sathyamangalam, 638 401, Erode, India
A. M. Natarajan

Corresponding author

Correspondence to R. Thangarajan.

About this article

Cite this article

Thangarajan, R., Natarajan, A.M. & Selvam, M. Syllable modeling in continuous speech recognition for Tamil language. Int J Speech Technol 12, 47–57 (2009). https://doi.org/10.1007/s10772-009-9058-0

Received: 01 November 2009
Accepted: 02 November 2009
Published: 18 November 2009
Issue Date: March 2009
DOI: https://doi.org/10.1007/s10772-009-9058-0
