Phone-Segments Based Language Identification for Spanish, Basque and English

  • Víctor Guijarrubia
  • M. Inés Torres
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4756)


This paper presents a series of language identification (LID) experiments for Spanish, Basque and English. Spanish and Basque are both official languages of the Basque Country, a region in northern Spain. We focus on techniques based on phone decoding and propose using phone segments, rather than individual phones, as decoding units. We describe a simple procedure for obtaining a set of phone segments that typically appear in the languages involved. Compared with similar techniques that do not rely on phone segments, using these segments as decoding units substantially improves LID accuracy on trilingual read speech: from 93.02% with phones to 98.32% with phone segments.
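The abstract does not say how the phone segments are selected or how decoded sequences are scored, so the Python sketch below illustrates only one plausible phonotactic setup, under explicit assumptions. The selection rule (phone n-grams occurring at least `min_count` times in pooled training transcripts), the greedy longest-match segmentation, the add-one smoothed bigram models and every name in it (`frequent_segments`, `segment`, `UnitBigramLM`, `train`, `identify`) are hypothetical and are not the authors' procedure. It also skips acoustic decoding entirely: it assumes a phone string has already been produced by a recognizer and only shows how multi-phone units change the tokens a per-language phonotactic model sees. The toy phone strings are invented and are not taken from the paper's corpora.

```python
import math
from collections import Counter


def frequent_segments(transcripts, max_len=3, min_count=2):
    """Hypothetical segment inventory: every phone n-gram (2 <= n <= max_len)
    occurring at least min_count times in the pooled training transcripts."""
    counts = Counter()
    for phones in transcripts:
        for n in range(2, max_len + 1):
            for i in range(len(phones) - n + 1):
                counts[tuple(phones[i:i + n])] += 1
    return {seg for seg, c in counts.items() if c >= min_count}


def segment(phones, inventory, max_len=3):
    """Greedy longest-match tokenization of a phone list into units; phones
    not covered by any inventory segment become one-phone units."""
    units, i = [], 0
    while i < len(phones):
        for n in range(min(max_len, len(phones) - i), 0, -1):
            candidate = tuple(phones[i:i + n])
            if n == 1 or candidate in inventory:
                units.append(candidate)
                i += n
                break
    return units


class UnitBigramLM:
    """Add-one smoothed bigram model over units. The vocabulary size is
    shared by all languages so that their scores stay comparable."""

    def __init__(self, unit_sequences, vocab_size):
        self.bigrams, self.history = Counter(), Counter()
        self.vocab_size = vocab_size
        for units in unit_sequences:
            tokens = ["<s>"] + units + ["</s>"]
            for prev, cur in zip(tokens, tokens[1:]):
                self.bigrams[(prev, cur)] += 1
                self.history[prev] += 1

    def log_prob(self, units):
        tokens = ["<s>"] + units + ["</s>"]
        return sum(
            math.log((self.bigrams[(p, c)] + 1) / (self.history[p] + self.vocab_size))
            for p, c in zip(tokens, tokens[1:])
        )


def train(corpus, max_len=3, min_count=2):
    """corpus maps a language code to a list of space-separated phone strings."""
    pooled = [s.split() for seqs in corpus.values() for s in seqs]
    inventory = frequent_segments(pooled, max_len, min_count)
    segmented = {
        lang: [segment(s.split(), inventory, max_len) for s in seqs]
        for lang, seqs in corpus.items()
    }
    units = {u for seqs in segmented.values() for seq in seqs for u in seq}
    vocab_size = len(units) + 2  # "</s>" plus one slot for unseen units
    models = {lang: UnitBigramLM(seqs, vocab_size) for lang, seqs in segmented.items()}
    return inventory, models


def identify(phone_string, inventory, models, max_len=3):
    """Return the language whose unit model gives the highest log-likelihood."""
    units = segment(phone_string.split(), inventory, max_len)
    return max(models, key=lambda lang: models[lang].log_prob(units))


if __name__ == "__main__":
    # Invented toy transcriptions, one phone symbol per token.
    corpus = {
        "es": ["k a s a", "k a s o", "p e s o"],
        "eu": ["e tx e a", "b e tx e", "tx o r i a"],
        "en": ["dh ax k ae t", "dh ax d aa g"],
    }
    inventory, models = train(corpus)
    print(identify("b e tx e a", inventory, models))    # → eu
    print(identify("p a s o", inventory, models))       # → es
    print(identify("dh ax d ae t", inventory, models))  # → en
```

Sharing one vocabulary size across the language models is a deliberate choice in this sketch: with per-language vocabularies, a language with fewer units would assign larger smoothed probabilities to unseen events and could win for the wrong reason. For reference, the reported accuracies correspond to error rates of 6.98% with phones and 1.68% with phone segments, a relative error reduction of about 76%.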


Keywords: language identification, phone decoding



Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Víctor Guijarrubia (1)
  • M. Inés Torres (1)

  1. Departamento de Electricidad y Electrónica, Universidad del País Vasco, Apartado 644, 48080 Bilbao, Spain
