Comparative Study of Several Phonotactic-Based Approaches to Spanish-Basque Language Identification

  • Víctor G. Guijarrubia
  • M. Inés Torres
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5197)


This paper presents a series of language identification (LID) experiments for Spanish and Basque. Spanish and Basque are both official languages in the Basque Country, a region located in northern Spain. We focused our research on several phonotactic-based methodologies, comparing the performance of phonotactic models trained from text with that of models trained from audio samples, as well as the use of phones and phone sequences as decoding units. The results show that although audio-based phonotactic models perform better than text-based ones, high accuracy can also be achieved when task-specific information is used. Using phone sequences as decoding units appears to be useful when the phone decoders are constrained to those sequences.
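
As a rough illustration of the phonotactic idea summarised above, the following minimal sketch trains one smoothed phone-bigram model per language and picks the language whose model gives a decoded phone sequence the highest log-likelihood. It is not the authors' system: the function names, the add-alpha smoothing, the bigram order, and the tiny phone strings are all assumptions made for this example, and a real setup would obtain the phone sequences from an acoustic phone decoder.

```python
import math
from collections import Counter


def train_bigram_model(sequences, vocab, alpha=1.0):
    """Return a function giving add-alpha smoothed bigram log-probabilities."""
    bigrams = Counter()
    contexts = Counter()
    for seq in sequences:
        padded = ["<s>"] + list(seq) + ["</s>"]
        for prev, cur in zip(padded, padded[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    size = len(vocab) + 1  # every phone plus the end-of-sequence symbol

    def logprob(prev, cur):
        return math.log((bigrams[(prev, cur)] + alpha) / (contexts[prev] + alpha * size))

    return logprob


def score(logprob, seq):
    """Log-likelihood of a phone sequence under one phonotactic model."""
    padded = ["<s>"] + list(seq) + ["</s>"]
    return sum(logprob(prev, cur) for prev, cur in zip(padded, padded[1:]))


def identify(models, seq):
    """Pick the language whose model scores the phone sequence highest."""
    return max(models, key=lambda lang: score(models[lang], seq))


# Invented toy phone strings, for illustration only (not real corpus data).
toy = {
    "spanish": [["k", "a", "s", "a"], ["p", "e", "r", "o"], ["m", "e", "s", "a"]],
    "basque": [["e", "tx", "e", "a"], ["tx", "a", "k", "u", "r"], ["u", "r", "a"]],
}
vocab = {phone for seqs in toy.values() for seq in seqs for phone in seq}
models = {lang: train_bigram_model(seqs, vocab) for lang, seqs in toy.items()}

print(identify(models, ["e", "tx", "e", "a"]))
print(identify(models, ["k", "a", "s", "a"]))
```

In this sketch the training sequences stand in for either text-derived or audio-derived phone transcriptions; swapping one source for the other changes only the data passed to `train_bigram_model`, not the scoring step.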


Keywords: language identification, phone decoding, PPRLM



Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Víctor G. Guijarrubia (1)
  • M. Inés Torres (1)
  1. Departamento de Electricidad y Electrónica, Universidad del País Vasco, Bilbao, Spain
