Springer Handbook of Speech Processing, pp. 811-824
Automatic Language Recognition Via Spectral and Token Based Approaches
Abstract
Automatic language recognition from speech consists of algorithms and techniques that model and classify the language being spoken. Current state-of-the-art language recognition systems fall into two broad categories: spectral- and token-sequence-based approaches. In this chapter, we describe algorithms for extracting features and building models that represent these types of language cues, and systems for making recognition decisions using one or more of these cues. A performance assessment of these systems is also provided, in terms of both accuracy and computational considerations, using the National Institute of Standards and Technology (NIST) language recognition evaluation benchmarks.
Keywords
Support Vector Machine, Linear Discriminant Analysis, Gaussian Mixture Model, Target Language, Equal Error Rate
Abbreviations
- ASR: automatic speech recognition
- DET: detection error tradeoff
- EER: equal error rate
- EM: expectation maximization
- FFT: fast Fourier transform
- GLDS: generalized linear discriminant sequence
- GMM: Gaussian mixture model
- HMM: hidden Markov models
- HTK: hidden Markov model toolkit
- LDA: linear discriminant analysis
- LRE: language recognition evaluation
- MFCC: mel-filter cepstral coefficient
- PLP: perceptual linear prediction
- PPRLM: parallel PRLM
- PR: phone recognizer
- PRLM: phoneme recognition followed by language modeling
- RASTA: relative spectra
- ROC: receiver operating characteristic
- RT: rich transcription
- SDC: shifted delta cepstral
- SVM: support vector machines
- TFLLR: term frequency log-likelihood ratio
- UBM: universal background model