
Language identification using phase information

International Journal of Speech Technology

Abstract

This work investigates the importance of phase in language identification (LID). We propose three phase-based features for the language recognition task, using an auto-regressive (AR) model with scale factor error augmentation to obtain a better representation of the phase information. Three group delay based systems are developed: a normal group delay system, an AR model group delay system, and an AR group delay system with scale factor augmentation. Because mel-frequency cepstral coefficients (MFCCs) are extracted from the magnitude of the Fourier transform, we combine the MFCC-based system with our phase-based systems to exploit the complete information contained in a speech signal. Experiments use the IITKGP-MLILSC speech database and the OGI Multi-language Telephone Speech (OGI-MLTS) corpus, and the language models are built with Gaussian mixture models. The experimental results show that the LID accuracy obtained with the proposed phase-based features is comparable to that of MFCC features. Combining the proposed phase-based systems with the state-of-the-art MFCC-based system also yields some improvement in LID accuracy.
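As an illustration of the kind of phase-derived quantity the abstract refers to, the following is a minimal sketch of the textbook group delay function of a single frame. It is not the authors' feature extraction pipeline: the AR modelling and scale factor augmentation are not reproduced, and the function names `dft` and `group_delay`, the `eps` floor, and the test frame are assumptions made for this example. It relies on the standard identity tau(k) = (X_R Y_R + X_I Y_I) / |X|^2, where X is the DFT of x[n] and Y is the DFT of n·x[n], which sidesteps explicit phase unwrapping.

```python
import cmath

def dft(x, n_fft):
    """Naive DFT of a real sequence, implicitly zero-padded to n_fft points."""
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_fft) for n in range(len(x)))
        for k in range(n_fft)
    ]

def group_delay(frame, n_fft=None, eps=1e-12):
    """Group delay (in samples) of one frame at each DFT bin.

    Uses tau(k) = (X_R*Y_R + X_I*Y_I) / |X|^2, with X = DFT{x[n]} and
    Y = DFT{n*x[n]}. The eps floor (an assumption of this sketch) guards
    against division by zero at spectral nulls.
    """
    n_fft = n_fft or len(frame)
    X = dft(frame, n_fft)
    Y = dft([n * v for n, v in enumerate(frame)], n_fft)
    return [
        (a.real * b.real + a.imag * b.imag) / max(abs(a) ** 2, eps)
        for a, b in zip(X, Y)
    ]

# Illustrative check: a unit impulse delayed by 3 samples has a
# constant group delay of 3 samples at every frequency bin.
frame = [0.0] * 16
frame[3] = 1.0
tau = group_delay(frame)
```

In a practical system the naive DFT would be replaced by an FFT routine, and the resulting group delay spectrum would typically be converted to cepstral-like features before Gaussian mixture modelling.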



Author information

Corresponding author

Correspondence to Arup Kumar Dutta.


Cite this article

Dutta, A.K., Rao, K.S. Language identification using phase information. Int J Speech Technol 21, 509–519 (2018). https://doi.org/10.1007/s10772-017-9482-5


  • DOI: https://doi.org/10.1007/s10772-017-9482-5
