DUSTer: A Method for Unraveling Cross-Language Divergences for Statistical Word-Level Alignment

  • Bonnie J. Dorr
  • Lisa Pearl
  • Rebecca Hwa
  • Nizar Habash
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2499)

Abstract

The frequent occurrence of divergences (structural differences between languages) presents a great challenge for statistical word-level alignment. In this paper, we introduce DUSTer, a method for systematically identifying common divergence types and transforming an English sentence structure to bear a closer resemblance to that of another language. Our ultimate goal is to enable more accurate alignment and projection of dependency trees in another language without requiring any training on dependency-tree data in that language. We present an empirical analysis comparing the complexities of performing word-level alignments with and without divergence handling. Our results suggest that our approach facilitates word-level alignment, particularly for sentence pairs containing divergences.

Keywords

Foreign Language; Machine Translation; Regular Expression; Dependency Tree; Statistical Machine Translation
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Bonnie J. Dorr (1)
  • Lisa Pearl (1)
  • Rebecca Hwa (1)
  • Nizar Habash (1)
  1. Institute for Advanced Computer Studies, University of Maryland, College Park
