A Straightforward Method for Automatic Identification of Marginalized Languages

  • Ana Lilia Reyes-Herrera
  • Luis Villaseñor-Pineda
  • Manuel Montes-y-Gómez
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4139)


Spoken language identification consists of recognizing a language from a speech sample produced by an unknown speaker. The traditional approach to this task relies mainly on the phonotactic information of languages. However, for marginalized languages (languages with few speakers, or oral languages without a fixed writing standard), this information is practically unavailable, and consequently the usual approach is not applicable. In this paper, we present a method that considers only the acoustic features of the speech signal and does not use any kind of linguistic information. Experimental results on a pairwise discrimination task among nine languages show that our proposal is comparable to other similar methods. Its main advantage is the straightforward characterization of the acoustic signal.
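
To make the evaluation protocol concrete, the following is a minimal sketch of a pairwise discrimination task among nine languages, which yields 36 language pairs. It is an illustration only: the language labels, the synthetic feature vectors, and the nearest-centroid classifier are all assumptions standing in for the acoustic features and classifier of the paper, which the abstract does not specify.

```python
from itertools import combinations
import random

# Placeholder labels: the abstract does not name the nine languages.
LANGUAGES = [f"lang_{i}" for i in range(9)]


def synthetic_features(lang_index, n_samples, rng, dim=4):
    """Stand-in for per-utterance acoustic feature vectors (NOT real speech data)."""
    return [[rng.gauss(lang_index, 1.0) for _ in range(dim)]
            for _ in range(n_samples)]


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pairwise_accuracy(train_a, train_b, test_a, test_b):
    """Nearest-centroid accuracy for one language pair (assumed classifier)."""
    ca, cb = centroid(train_a), centroid(train_b)
    correct = sum(sq_dist(v, ca) < sq_dist(v, cb) for v in test_a)
    correct += sum(sq_dist(v, cb) <= sq_dist(v, ca) for v in test_b)
    return correct / (len(test_a) + len(test_b))


def run_all_pairs(seed=0):
    """Evaluate every unordered pair of languages: C(9, 2) = 36 binary tasks."""
    rng = random.Random(seed)
    data = {
        lang: (synthetic_features(i, 20, rng), synthetic_features(i, 10, rng))
        for i, lang in enumerate(LANGUAGES)
    }
    results = {}
    for a, b in combinations(LANGUAGES, 2):
        results[(a, b)] = pairwise_accuracy(
            data[a][0], data[b][0], data[a][1], data[b][1]
        )
    return results


if __name__ == "__main__":
    scores = run_all_pairs()
    print(len(scores), "language pairs evaluated")
```

Each pair is treated as an independent binary task, so results are typically reported per pair or averaged over all pairs. Real acoustic features computed from the speech signal would replace `synthetic_features` in an actual experiment.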


Keywords: Speech Signal, Gaussian Mixture Model, Automatic Identification, Acoustic Feature, Language Identification
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Ana Lilia Reyes-Herrera (1)
  • Luis Villaseñor-Pineda (1)
  • Manuel Montes-y-Gómez (1)
  1. Language Technologies Group, Computer Science Department, National Institute of Astrophysics, Optics and Electronics (INAOE), Mexico
