A Pre-classification-Based Language Identification for Northeast Indian Languages Using Prosody and Spectral Features

  • Chuya China Bhanja
  • Mohammad Azharuddin Laskar
  • Rabul Hussain Laskar

Abstract

This paper is aimed at developing a two-stage language identification (LID) system for Northeast (NE) Indian languages. In the first stage, languages are pre-classified into tonal and non-tonal categories; in the second stage, the individual language is identified from among the languages of the corresponding category. New parameters that model the prosodic characteristics of the speech signal have been proposed for both pre-classification and individual language identification. The effectiveness of spectral features, namely Mel-frequency cepstral coefficients (MFCCs), and of their combination with prosodic features has been studied for the pre-classification task. The usefulness of MFCCs with their delta and acceleration coefficients, in combination with prosodic features, has been investigated for individual language identification. System performance is analyzed for features extracted from different analysis units: syllable, disyllable, word, and utterance. A comparative performance analysis of three classifiers, namely artificial neural network (ANN), Gaussian mixture model–universal background model (GMM–UBM), and i-vector-based support vector machine (i-vector SVM), has been made for both pre-classification and individual language identification. A new database, the NIT Silchar language database (NITS-LD), has been developed for seven NE Indian languages using All India Radio broadcast news. The experimental analysis suggests that the proposed prosodic parameters improve the performance of both stages; in the pre-classification stage, they outperform existing parameters by as much as 7.4%, 11.9%, and 9.1% for 30 s, 10 s, and 3 s test data, respectively. Of the baseline single-stage systems, GMM–UBM provides the highest accuracies: 80%, 76.8%, and 72% for 30 s, 10 s, and 3 s test data, respectively.
In the proposed system, combining the ANN model in the pre-classification stage with the GMM–UBM model in the individual language identification stage provides the highest accuracies, improving on the baseline system by 7.2%, 7%, and 4.9% for 30 s, 10 s, and 3 s test data, respectively. For the OGI Multi-language Telephone Speech (OGI-MLTS) database, improvements of 8.1%, 7.4%, and 5.7% for 30 s, 10 s, and 3 s test data, respectively, are observed over the baseline LID system.
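
The two-stage scheme and the MFCC delta/acceleration features mentioned above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and the abstract does not specify their settings. Several pieces are assumptions. The regression-based delta formula with a ±2-frame window and repeated edge frames is a common convention, chosen here for illustration. The 0.5 decision threshold is assumed. The function names `deltas`, `with_deltas`, and `two_stage_lid` and the language labels `lang_A`, `lang_B`, `lang_C` are hypothetical. The stage-1 pre-classifier (an ANN in the best reported configuration) and the stage-2 language scorers (GMM–UBM in that configuration) are abstracted as plain callables.

```python
from typing import Callable, Dict, List, Sequence

Frame = List[float]  # one feature vector, e.g. the MFCCs of a single frame
Scorer = Callable[[Sequence[Frame]], float]  # higher score = better match (assumption)


def deltas(frames: Sequence[Frame], n: int = 2) -> List[Frame]:
    """Regression-based delta coefficients over a +/-n frame window (assumed n=2).

    d_t = sum_{k=1..n} k * (c_{t+k} - c_{t-k}) / (2 * sum_{k=1..n} k^2),
    with the first and last frames repeated at the utterance edges.
    """
    num_frames = len(frames)
    denom = 2 * sum(k * k for k in range(1, n + 1))
    out: List[Frame] = []
    for t in range(num_frames):
        row = []
        for j in range(len(frames[t])):
            acc = 0.0
            for k in range(1, n + 1):
                nxt = frames[min(t + k, num_frames - 1)][j]  # clamp at the last frame
                prv = frames[max(t - k, 0)][j]               # clamp at the first frame
                acc += k * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out


def with_deltas(mfcc: Sequence[Frame], n: int = 2) -> List[Frame]:
    """Append delta and acceleration (delta-of-delta) coefficients to every frame."""
    d1 = deltas(mfcc, n)
    d2 = deltas(d1, n)
    return [list(c) + a + b for c, a, b in zip(mfcc, d1, d2)]


def two_stage_lid(
    features: Sequence[Frame],
    p_tonal: Callable[[Sequence[Frame]], float],
    language_scorers: Dict[str, Dict[str, Scorer]],
    threshold: float = 0.5,
) -> str:
    """Stage 1 picks a category; stage 2 picks the best-scoring language within it."""
    # Stage 1: tonal / non-tonal pre-classification (hypothetical probability callable).
    category = "tonal" if p_tonal(features) >= threshold else "non-tonal"
    # Stage 2: only the languages of the chosen category compete.
    candidates = language_scorers[category]
    return max(candidates, key=lambda lang: candidates[lang](features))


if __name__ == "__main__":
    toy_mfcc = [[0.0], [1.0], [2.0], [3.0], [4.0]]  # hypothetical 1-D "MFCC" track
    feats = with_deltas(toy_mfcc)
    print(len(feats[0]))  # 3: static + delta + acceleration
    scorers = {
        "tonal": {"lang_A": lambda f: -1.0, "lang_B": lambda f: -2.0},
        "non-tonal": {"lang_C": lambda f: 0.0},
    }
    print(two_stage_lid(feats, lambda f: 0.9, scorers))  # lang_A
```

With hard routing as sketched here, a tonal/non-tonal misclassification in stage 1 cannot be recovered in stage 2. In exchange, restricting stage 2 to one category reduces the number of languages that compete for each decision.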

Keywords

Language identification, Pre-classification of tonal and non-tonal languages, Syllables, Features, Classifiers, Database

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  • Chuya China Bhanja¹
  • Mohammad Azharuddin Laskar¹
  • Rabul Hussain Laskar¹
  1. Department of Electronics and Communication Engineering, National Institute of Technology Silchar, Silchar, India