Language Identification Using Prosodic Features

  • K. Sreenivasa Rao
  • V. Ramu Reddy
  • Sudhamay Maity
Part of the SpringerBriefs in Electrical and Computer Engineering book series


In the previous chapter, language-specific spectral features were discussed for language identification (LID). The present chapter focuses mainly on language-specific prosodic features at the syllable, word, and global levels for the LID task. To further improve the recognition accuracy of the LID system, the combination of spectral and prosodic features is also explored.


Vowel onset point; Zero frequency filter; Intonation; Rhythm; Stress; Word level prosodic features; Syllable level prosodic features; Combination of speech features; Multilevel prosodic features; Prosodic contours; Global prosodic features



Copyright information

© The Author(s) 2015

Authors and Affiliations

  • K. Sreenivasa Rao (1)
  • V. Ramu Reddy (2)
  • Sudhamay Maity (3)
  1. Indian Institute of Technology Kharagpur, Kharagpur, India
  2. Innovation Lab Kolkata, Kolkata, India
  3. Indian Institute of Technology Kharagpur, Kharagpur, India
