Automatic Language Recognition Via Spectral and Token Based Approaches

Part of the Springer Handbooks book series (SPRINGERHAND)

Abstract

Automatic language recognition from speech consists of algorithms and techniques that model and classify the language being spoken. Current state-of-the-art language recognition systems fall into two broad categories: spectral-based and token-sequence-based approaches. In this chapter, we describe algorithms for extracting features and building models that represent these types of language cues, as well as systems that make recognition decisions using one or more of these cues. We also assess the performance of these systems, in terms of both accuracy and computational cost, using the National Institute of Standards and Technology (NIST) language recognition evaluation (LRE) benchmarks.
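The abstract refers to assessing accuracy on the NIST LRE benchmarks, and the equal error rate (EER) and detection error tradeoff (DET) appear among the keywords and abbreviations. As an illustration only, the sketch below estimates an EER from detection scores by sweeping a decision threshold. It assumes that higher scores favor the target language; the function name `equal_error_rate` and the toy scores are hypothetical, and this is not the chapter's or NIST's scoring software.

```python
from typing import Sequence, Tuple


def equal_error_rate(target_scores: Sequence[float],
                     nontarget_scores: Sequence[float]) -> Tuple[float, float]:
    """Return (eer, threshold) via a simple sweep over observed scores.

    Hypothetical helper for illustration. A trial is accepted when
    score >= threshold. The miss rate is the fraction of target trials
    rejected; the false-alarm rate is the fraction of non-target trials
    accepted. The EER is approximated at the threshold where the two rates
    are closest, by averaging them there.
    """
    if not target_scores or not nontarget_scores:
        raise ValueError("need at least one target and one non-target score")

    best = None  # (gap between rates, averaged rate, threshold)
    for thr in sorted(set(target_scores) | set(nontarget_scores)):
        p_miss = sum(s < thr for s in target_scores) / len(target_scores)
        p_fa = sum(s >= thr for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(p_miss - p_fa)
        if best is None or gap < best[0]:
            best = (gap, (p_miss + p_fa) / 2, thr)
    return best[1], best[2]


if __name__ == "__main__":
    # Toy scores: one target trial (0.3) scores below one non-target (0.5).
    eer, thr = equal_error_rate([2.0, 1.5, 0.8, 0.3], [0.5, -0.2, -1.0, 0.9])
    print(f"EER = {eer:.2%} at threshold {thr}")
```

Sweeping only the observed scores keeps the sketch short; a finer estimate would interpolate between neighboring operating points on the DET curve.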

Keywords

Support Vector Machine, Linear Discriminant Analysis, Gaussian Mixture Model, Target Language, Equal Error Rate

Abbreviations

ASR: automatic speech recognition
DET: detection error tradeoff
EER: equal error rate
EM: expectation maximization
FFT: fast Fourier transform
GLDS: generalized linear discriminant sequence
GMM: Gaussian mixture model
HMM: hidden Markov model
HTK: Hidden Markov Model Toolkit
LDA: linear discriminant analysis
LRE: language recognition evaluation
MFCC: mel-frequency cepstral coefficient
PLP: perceptual linear prediction
PPRLM: parallel PRLM
PR: phone recognizer
PRLM: phoneme recognition followed by language modeling
RASTA: relative spectra
ROC: receiver operating characteristic
RT: rich transcription
SDC: shifted delta cepstral
SVM: support vector machine
TFLLR: term frequency log-likelihood ratio
UBM: universal background model


Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  1. Lincoln Laboratory, Information Systems Technology Group, Massachusetts Institute of Technology, Lexington, USA
  2. Information Systems Technology Group, MIT Lincoln Laboratory, Lexington, USA
  3. Communication Systems, Information Systems Technology, Lincoln Laboratory, Massachusetts Institute of Technology, Lexington, USA
  4. Information Systems Technology Group, Lincoln Laboratory, Massachusetts Institute of Technology, Lexington, USA
