Unsupervised Accent Modeling for Language Identification

  • David Martínez González
  • Jesús Villalba López
  • Eduardo Lleida Solano
  • Alfonso Ortega Gimenez
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8854)

Abstract

In this paper we propose to cluster iVectors to model different accents within a language. The motivation is that not all speakers of the same language share the same pronunciation style. This source of variability is not usually considered in state-of-the-art language identification systems, and we show that taking it into account improves performance. For each language, the iVector space is partitioned according to the similarity of the iVectors, and each cluster is treated as a different accent. A simplified probabilistic linear discriminant analysis model is then trained over all the accents, and at test time each utterance is evaluated against all of them. The highest score within each language is selected to make decisions. The experiments were carried out on 6 languages of the 2011 NIST LRE dataset. For the 30 s condition, the relative improvement over the baseline was 11%.
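The pipeline described above can be illustrated with a short sketch. The code below is only a minimal illustration under stated assumptions, not the authors' implementation: it uses synthetic iVectors, scikit-learn's AgglomerativeClustering to partition each language's iVector space into accent clusters, and cosine scoring of length-normalized cluster means as a stand-in for the simplified PLDA scoring used in the paper; the per-language score is the maximum over that language's accent clusters.

    # Hypothetical sketch of per-language accent clustering and
    # max-over-accents scoring. Synthetic data; cosine similarity of
    # length-normalized iVectors stands in for simplified PLDA.
    import numpy as np
    from sklearn.cluster import AgglomerativeClustering
    from sklearn.preprocessing import normalize

    rng = np.random.default_rng(0)
    DIM, N_ACCENTS = 400, 3        # iVector dimension, clusters per language

    # Synthetic training iVectors for two languages (placeholder for real data).
    train = {
        "eng": rng.normal(0.0, 1.0, size=(300, DIM)),
        "spa": rng.normal(0.5, 1.0, size=(300, DIM)),
    }

    # 1) Partition each language's iVector space with agglomerative
    #    clustering; keep one length-normalized mean ("accent model") per cluster.
    accent_means = {}
    for lang, X in train.items():
        X = normalize(X)                                  # length normalization
        labels = AgglomerativeClustering(
            n_clusters=N_ACCENTS, linkage="ward").fit_predict(X)
        accent_means[lang] = np.stack(
            [normalize(X[labels == k].mean(0, keepdims=True))[0]
             for k in range(N_ACCENTS)])

    # 2) Score a test iVector against every accent of every language and
    #    take the maximum accent score per language.
    def score(ivector):
        iv = normalize(ivector.reshape(1, -1))[0]
        return {lang: float(np.max(means @ iv))           # cosine similarity
                for lang, means in accent_means.items()}

    test_iv = rng.normal(0.5, 1.0, size=DIM)              # synthetic test utterance
    scores = score(test_iv)
    print(max(scores, key=scores.get), scores)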

Keywords

Language identification · Accent modeling · iVector · Agglomerative hierarchical clustering · PLDA

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • David Martínez González
  • Jesús Villalba López
  • Eduardo Lleida Solano
  • Alfonso Ortega Gimenez
  1. ViVoLab Speech Technologies Group, University of Zaragoza, Spain