
Syllable modeling in continuous speech recognition for Tamil language

International Journal of Speech Technology

Abstract

In automatic speech recognition, the phone has arguably been the dominant sub-word unit for more than a decade. Context-dependent phone (triphone) modeling accounts for contextual variation between adjacent phones, and state tying addresses the modeling of triphones that are not seen during training. Recently, the syllable has been gaining momentum as a sub-word unit. Being a larger unit than a phone, the syllable absorbs the severe contextual variation between the phones within it. It is therefore more stable than a phone and models pronunciation variability in a systematic way. The Tamil language has challenging features such as agglutination and morpho-phonology. This paper attempts to address these issues by using the syllable as a sub-word unit in the acoustic model. Initially, small-vocabulary context-independent word models and medium-vocabulary context-dependent phone models are developed. Subsequently, an algorithm based on the prosodic syllable is proposed and two experiments are conducted. In the first experiment, syllable-based context-independent models are trained and tested. Despite the large number of syllables, this system performs reasonably well compared with the context-independent word models in terms of word error rate and out-of-vocabulary words. In the second experiment, syllable information is integrated into conventional triphone modeling: cross-syllable triphones are replaced with monophones, which reduces the number of context-dependent phone models by 22.76% in untied units. Despite this reduction in the number of models, the accuracy of the proposed system is comparable to that of the baseline triphone system.
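The replacement of cross-syllable triphones with monophones can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the paper: a phone is given a triphone label only when both of its neighbours lie in the same syllable, and it falls back to a monophone otherwise. The function names, the `l-p+r` label format, the `SIL` padding symbol, and the romanized syllabification of the example word are all hypothetical, and the paper's prosodic-syllable algorithm is not reproduced here.

```python
from typing import List


def baseline_triphones(syllables: List[List[str]]) -> List[str]:
    """Conventional labelling: every phone gets left and right context,
    regardless of syllable boundaries (SIL pads the word edges)."""
    phones = [p for syl in syllables for p in syl]
    padded = ["SIL"] + phones + ["SIL"]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]


def syllable_aware_units(syllables: List[List[str]]) -> List[str]:
    """Assumed rule: keep a triphone only when both neighbours are inside
    the same syllable; any context that would cross a syllable boundary
    is dropped and the phone is modelled as a monophone."""
    units = []
    for syl in syllables:
        for i, phone in enumerate(syl):
            if 0 < i < len(syl) - 1:
                units.append(f"{syl[i - 1]}-{phone}+{syl[i + 1]}")
            else:
                units.append(phone)
    return units


# Illustrative syllabification of one word (va.nak.kam); not from the paper.
word = [["v", "a"], ["n", "a", "k"], ["k", "a", "m"]]
print(baseline_triphones(word))
print(syllable_aware_units(word))
```

Under this rule, only syllable-internal phones keep their context, so the inventory of distinct context-dependent labels shrinks. The size of that reduction depends on the corpus and on the exact boundary rule used.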


Abbreviations

ANN: Artificial Neural Networks

ASR: Automatic Speech Recognition

CD: Context Dependent

CI: Context Independent

CIIL: Central Institute of Indian Languages, Mysore

CMU: Carnegie Mellon University

HMM: Hidden Markov Model

LVCSR: Large Vocabulary Continuous Speech Recognition

SVM: Support Vector Machine

WER: Word Error Rate


Author information

Corresponding author

Correspondence to R. Thangarajan.

About this article

Cite this article

Thangarajan, R., Natarajan, A.M. & Selvam, M. Syllable modeling in continuous speech recognition for Tamil language. Int J Speech Technol 12, 47–57 (2009). https://doi.org/10.1007/s10772-009-9058-0

  • DOI: https://doi.org/10.1007/s10772-009-9058-0
