Abstract
This paper presents a series of language identification (LID) experiments for Spanish and Basque, the two official languages of the Basque Country, a region in northern Spain. We study several phonotactic-based methodologies, comparing the performance of phonotactic models trained from text with that of models trained from audio samples, and the use of phones versus phone sequences as decoding units. The results show that although audio-based phonotactic models outperform text-based ones, high accuracy can also be achieved when task-specific information is used. Using phone sequences as decoding units appears to be useful when the phone decoders are constrained to those sequences.
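As a rough illustration of the phonotactic idea summarized above, the following minimal sketch scores a decoded phone string against one phone n-gram model per language and picks the best-scoring language. It is not the authors' implementation: the bigram order, the add-alpha smoothing, the function names, and the toy phone strings are all assumptions made for this example, and a real system would train on phone sequences obtained from text transcriptions or from a phone decoder run on audio.

```python
import math
from collections import Counter

def train_bigram_model(phone_sequences, phone_inventory, alpha=1.0):
    """Return a function giving add-alpha smoothed bigram log-probabilities."""
    bigram_counts = Counter()
    context_counts = Counter()
    for seq in phone_sequences:
        padded = ["<s>"] + list(seq) + ["</s>"]
        for prev, cur in zip(padded, padded[1:]):
            bigram_counts[(prev, cur)] += 1
            context_counts[prev] += 1
    # Possible next symbols: every phone plus the end-of-sequence marker.
    vocab_size = len(phone_inventory) + 1

    def log_prob(prev, cur):
        num = bigram_counts[(prev, cur)] + alpha
        den = context_counts[prev] + alpha * vocab_size
        return math.log(num / den)

    return log_prob

def score(log_prob, phones):
    """Log-likelihood of a phone sequence under one language model."""
    padded = ["<s>"] + list(phones) + ["</s>"]
    return sum(log_prob(p, c) for p, c in zip(padded, padded[1:]))

def identify(models, phones):
    """Pick the language whose phonotactic model scores the sequence highest."""
    return max(models, key=lambda lang: score(models[lang], phones))

# Hypothetical toy training data (illustrative phone labels, not a real corpus).
training = {
    "es": [["k", "a", "s", "a"], ["m", "e", "s", "a"]],
    "eu": [["e", "tx", "e", "a"], ["e", "tz", "i"]],
}
inventory = {p for seqs in training.values() for seq in seqs for p in seq}
models = {
    lang: train_bigram_model(seqs, inventory)
    for lang, seqs in training.items()
}

decoded = ["e", "tx", "e"]  # stand-in for the output of a phone decoder
print(identify(models, decoded))
```

With these toy strings, the decoded sequence is assigned to the model whose training data contains similar phone transitions. In practice the choice of n-gram order and smoothing would need tuning on held-out data.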
This work was partially supported by the Spanish CICYT project TIN2005-08660-C04-03 and by the University of the Basque Country under grant GIU07/57.
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Guijarrubia, V.G., Torres, M.I. (2008). Comparative Study of Several Phonotactic-Based Approaches to Spanish-Basque Language Identification. In: Ruiz-Shulcloper, J., Kropatsch, W.G. (eds) Progress in Pattern Recognition, Image Analysis and Applications. CIARP 2008. Lecture Notes in Computer Science, vol 5197. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85920-8_16
DOI: https://doi.org/10.1007/978-3-540-85920-8_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85919-2
Online ISBN: 978-3-540-85920-8
eBook Packages: Computer Science (R0)