A Straightforward Method for Automatic Identification of Marginalized Languages
Spoken language identification is the task of recognizing a language from a speech sample produced by an unknown speaker. The traditional approach to this task relies mainly on the phonotactic information of languages. However, for marginalized languages (languages with few speakers, or oral languages without a fixed writing standard), this information is rarely available, and consequently the usual approach is not applicable. In this paper, we present a method that considers only the acoustic features of the speech signal and does not use any kind of linguistic information. Experimental results on a pairwise discrimination task among nine languages show that our proposal is comparable to other similar methods. Its main advantage is the straightforward characterization of the acoustic signal.
Keywords: Speech Signal, Gaussian Mixture Model, Automatic Identification, Acoustic Feature, Language Identification
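The abstract describes a pairwise discrimination task among nine languages, which amounts to 36 two-language problems. The sketch below illustrates only that evaluation layout. The language labels, the synthetic two-dimensional "acoustic" vectors, and the nearest-centroid classifier are all illustrative assumptions, not the paper's data, features, or method.

```python
from itertools import combinations
import random


def make_features(center, n, rng):
    """Draw n synthetic feature vectors around a center (placeholder data)."""
    return [[rng.gauss(c, 1.0) for c in center] for _ in range(n)]


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pairwise_accuracy(train_a, train_b, test_a, test_b):
    """Accuracy of a nearest-centroid stand-in classifier on one language pair."""
    ca, cb = centroid(train_a), centroid(train_b)
    correct = sum(sq_dist(v, ca) < sq_dist(v, cb) for v in test_a)
    correct += sum(sq_dist(v, cb) <= sq_dist(v, ca) for v in test_b)
    return correct / (len(test_a) + len(test_b))


rng = random.Random(0)
languages = [f"lang_{i}" for i in range(9)]  # hypothetical labels

# Hypothetical data: each language gets its own cluster of synthetic vectors.
train, test = {}, {}
for i, name in enumerate(languages):
    center = [5.0 * i, 0.0]
    train[name] = make_features(center, 40, rng)
    test[name] = make_features(center, 20, rng)

# One binary discrimination problem per unordered pair of languages.
pairs = list(combinations(languages, 2))
results = {
    (a, b): pairwise_accuracy(train[a], train[b], test[a], test[b])
    for a, b in pairs
}

print(len(pairs), "language pairs evaluated")
```

In an actual system, each placeholder piece would be swapped for real components: acoustic features extracted from speech recordings and whatever classifier the method prescribes. The loop over language pairs would stay the same.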