Comparative Study of Several Phonotactic-Based Approaches to Spanish-Basque Language Identification

Guijarrubia, Víctor G.; Torres, M. Inés

doi:10.1007/978-3-540-85920-8_16


Part of the book series: Lecture Notes in Computer Science (LNIP, volume 5197)

Included in the following conference series:

Iberoamerican Congress on Pattern Recognition


Abstract

This paper presents a series of language identification (LID) experiments for Spanish and Basque, both official languages in the Basque Country, a region in northern Spain. We study several phonotactic-based methodologies, comparing phonotactic models trained on text with models trained on audio samples, and comparing phones with phone sequences as decoding units. The results show that although audio-based phonotactic models perform better than text-based ones, it is also possible to achieve high accuracy when using task-specific information. Using phone sequences as decoding units appears to be useful when the phone decoders are constrained to those sequences.

This work was partially supported by the Spanish CICYT project TIN2005-08660-C04-03 and by the University of the Basque Country under grant GIU07/57.
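
As a rough illustration of the phonotactic idea summarized in the abstract, the sketch below scores a decoded phone sequence under one phone n-gram model per language and picks the language with the highest score. It is a minimal sketch under stated assumptions, not the authors' system: the bigram order, the add-one smoothing, the function names (`train_bigram_model`, `score`, `identify`), the labels `lang_a` and `lang_b`, and the toy phone strings are all hypothetical, and the phone sequence is assumed to come from a phone decoder that is not shown.

```python
import math
from collections import Counter

def train_bigram_model(phone_sequences, phone_inventory):
    """Return a function giving add-one smoothed bigram log-probabilities."""
    bigram_counts = Counter()
    context_counts = Counter()
    for seq in phone_sequences:
        padded = ["<s>"] + list(seq) + ["</s>"]
        for prev, cur in zip(padded, padded[1:]):
            bigram_counts[(prev, cur)] += 1
            context_counts[prev] += 1
    # Possible successors: every phone plus the end-of-sequence marker.
    vocab_size = len(phone_inventory) + 1

    def log_prob(prev, cur):
        numerator = bigram_counts[(prev, cur)] + 1
        denominator = context_counts[prev] + vocab_size
        return math.log(numerator / denominator)

    return log_prob

def score(seq, log_prob):
    """Log-likelihood of a phone sequence under one phonotactic model."""
    padded = ["<s>"] + list(seq) + ["</s>"]
    return sum(log_prob(prev, cur) for prev, cur in zip(padded, padded[1:]))

def identify(seq, models):
    """Pick the language whose phonotactic model scores the sequence highest."""
    return max(models, key=lambda lang: score(seq, models[lang]))

# Toy data (invented for illustration): each character stands for one phone.
inventory = ["a", "k", "s", "t"]
models = {
    "lang_a": train_bigram_model(["kasa", "taka", "saka"], inventory),
    "lang_b": train_bigram_model(["tsa", "atsa", "tsats"], inventory),
}

print(identify("kaka", models))
print(identify("atsa", models))
```

In this sketch the models are trained on symbol strings, which loosely corresponds to text-derived phonotactic models; training the same models on phone decoder output would correspond to the audio-derived variant the abstract compares against.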



Author information

Authors and Affiliations

Departamento de Electricidad y Electrónica, Universidad del País Vasco, Apartado 644, 48080, Bilbao, Spain
Víctor G. Guijarrubia & M. Inés Torres


Editor information

José Ruiz-Shulcloper, Walter G. Kropatsch


Cite this paper

Guijarrubia, V.G., Torres, M.I. (2008). Comparative Study of Several Phonotactic-Based Approaches to Spanish-Basque Language Identification. In: Ruiz-Shulcloper, J., Kropatsch, W.G. (eds) Progress in Pattern Recognition, Image Analysis and Applications. CIARP 2008. Lecture Notes in Computer Science, vol 5197. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85920-8_16


DOI: https://doi.org/10.1007/978-3-540-85920-8_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85919-2
Online ISBN: 978-3-540-85920-8
eBook Packages: Computer Science, Computer Science (R0)
