Improved Language Identification in Presence of Speech Coding

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9468)


Automatically identifying the language being spoken plays a vital role in operating multilingual speech processing applications. The rapid growth in the use of mobile communication devices has made it necessary to operate all speech processing applications in mobile environments, where performance degrades mainly due to varying background conditions, speech coding, and transmission errors. In this work, we focus on developing a language identification (LID) system for the Indian scenario that is robust to the degradations introduced by speech coding. Sonorant regions of speech are the regions that are perceptually loud and carry a clear pitch, and the quality of coded speech is higher in highly sonorant regions than in less sonorant ones. Spectral features (MFCCs) extracted from high-sonority regions of speech are therefore used for language identification, and a GMM-UBM based modelling technique is employed to develop the LID system. The present study is carried out on the IITKGP-MLILSC speech database.
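The GMM-UBM modelling described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes MFCC feature matrices are already extracted (the sonority-region selection step is omitted), and the component count and relevance factor are illustrative choices. A universal background model (UBM) is fit on features pooled over all languages, and per-language models are obtained by MAP-adapting only the UBM means (Reynolds-style adaptation), here using scikit-learn's `GaussianMixture`.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_ubm(pooled_features, n_components=8, seed=0):
    """Fit a diagonal-covariance UBM on MFCC frames pooled over all languages."""
    ubm = GaussianMixture(n_components=n_components,
                          covariance_type="diag", random_state=seed)
    ubm.fit(pooled_features)
    return ubm

def map_adapt_means(ubm, features, relevance=16.0):
    """MAP-adapt only the UBM means towards one language's data;
    weights and covariances remain shared with the UBM."""
    post = ubm.predict_proba(features)            # (T, K) frame responsibilities
    n_k = post.sum(axis=0)                        # soft frame counts per component
    x_bar = (post.T @ features) / np.maximum(n_k, 1e-10)[:, None]
    alpha = (n_k / (n_k + relevance))[:, None]    # data-dependent adaptation weight
    return alpha * x_bar + (1.0 - alpha) * ubm.means_

def llr_score(ubm, adapted_means, features):
    """Average per-frame log-likelihood ratio of the language model vs the UBM."""
    lang = GaussianMixture(n_components=ubm.n_components, covariance_type="diag")
    lang.weights_ = ubm.weights_
    lang.covariances_ = ubm.covariances_
    lang.means_ = adapted_means
    lang.precisions_cholesky_ = 1.0 / np.sqrt(ubm.covariances_)
    return lang.score(features) - ubm.score(features)
```

At test time, the language whose adapted model gives the highest log-likelihood ratio over the utterance is hypothesized as the spoken language.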


Automatic language identification · Mobile environments · Speech coders · Sonority regions · Glottal closure region · GMM · GMM-UBM



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Speech and Vision Lab, International Institute of Information Technology, Hyderabad, India
  2. SRKR Engineering College, Hyderabad, India
