Machine Translation, Volume 21, Issue 4, pp 187–207

Bilingual LSA-based adaptation for statistical machine translation

Abstract

We propose a novel approach to cross-lingual language model and translation lexicon adaptation for statistical machine translation (SMT) based on bilingual latent semantic analysis (LSA). Bilingual LSA allows latent topic distributions to be transferred efficiently across languages by enforcing a one-to-one topic correspondence during training. Within this framework, model adaptation proceeds in two steps: first, the topic posterior distribution of the source text is inferred; then, the inferred distribution is applied via marginal adaptation to a target-language n-gram language model and to the translation lexicon. The background phrase table is augmented with additional phrase scores computed from the adapted translation lexicon. The framework also supports rapid bootstrapping of LSA models for new languages from a source LSA model of another language. We evaluate our approach on the Chinese–English MT06 test set with two systems, a medium-scale SMT system and the GALE SMT system, using BLEU and NIST scores. Both scores improve on both systems when the adapted language model or the adapted translation lexicon is applied individually, and the gains are additive when the two are applied together. Measured against the 95% confidence interval of the unadapted baseline system, the gains in both scores are statistically significant on the medium-scale SMT system, while on the GALE SMT system the gain in NIST score is statistically significant.
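
The marginal adaptation step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the unigram-scaling form of marginal adaptation associated with Kneser, Peters and Klakow (1997), P_a(w|h) ∝ (P_lsa(w) / P_bg(w))^β · P_bg(w|h), where the LSA unigram P_lsa(w) = Σ_k θ_k P(w|k) mixes target-language topic-word distributions using the topic posterior θ inferred from the source text. The function names `lsa_unigram` and `marginal_adapt`, the exponent `beta`, and the dense matrix representation are hypothetical choices for illustration. Topic inference itself is not shown; θ is assumed to be given and already mapped to target-language topics through the one-to-one topic correspondence.

```python
import numpy as np


def lsa_unigram(theta, topic_word):
    """Return P_lsa(w) = sum_k theta[k] * P(w | k).

    theta:      (K,) topic posterior inferred from the source text
                (assumed already mapped to target-language topics via
                the one-to-one topic correspondence).
    topic_word: (K, V) target-language topic-word distributions, rows sum to 1.
    """
    return theta @ topic_word


def marginal_adapt(p_bg_cond, p_bg_uni, p_lsa_uni, beta=0.5):
    """Unigram-scaling marginal adaptation (hypothetical helper).

    p_bg_cond: (H, V) background conditionals; row i is P_bg(w | h_i) for an
               n-gram history, or P_bg(e | f_i) for a source word in a lexicon.
    p_bg_uni:  (V,) background unigram P_bg(w).
    p_lsa_uni: (V,) LSA-derived unigram P_lsa(w).
    beta:      scaling exponent; 0.5 is an illustrative default, not a tuned value.
    """
    # Per-word scaling factor (P_lsa(w) / P_bg(w)) ** beta.
    alpha = (p_lsa_uni / p_bg_uni) ** beta
    # Scale every row by alpha, then renormalize each row to sum to 1.
    scaled = p_bg_cond * alpha
    return scaled / scaled.sum(axis=1, keepdims=True)


if __name__ == "__main__":
    # Toy example: 2 topics, 3-word target vocabulary.
    theta = np.array([0.8, 0.2])
    topic_word = np.array([[0.6, 0.3, 0.1],
                           [0.1, 0.3, 0.6]])
    p_lsa = lsa_unigram(theta, topic_word)  # [0.5, 0.3, 0.2]
    p_bg_uni = np.full(3, 1.0 / 3.0)
    p_bg_cond = np.array([[0.2, 0.5, 0.3],
                          [0.4, 0.4, 0.2]])
    p_adapted = marginal_adapt(p_bg_cond, p_bg_uni, p_lsa, beta=1.0)
    print(np.round(p_adapted, 4))
```

Because the same scaling applies whether the rows of `p_bg_cond` are n-gram histories or source words (giving an adapted lexicon P_a(e|f)), one function covers both the language model and the translation lexicon in this sketch. Words favored by the inferred topics (here, the first word) gain probability mass in every row, and words disfavored by them lose mass. A real backoff n-gram model would need this per-history normalization computed efficiently over the full vocabulary, which the dense toy version does directly.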

Keywords

Bilingual latent semantic analysis; Latent Dirichlet-tree allocation; Cross-lingual language model adaptation; Lexicon adaptation; Topic distribution transfer; Statistical machine translation

Copyright information

© Springer Science+Business Media B.V. 2008

Authors and Affiliations

  1. Carnegie Mellon University, Pittsburgh, USA