
An Improved Method for Finding Bilingual Collocation Correspondences from Monolingual Corpora

  • Conference paper
Computer Processing of Oriental Languages. Beyond the Orient: The Research Challenges Ahead (ICCPOL 2006)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4285)


Abstract

Bilingual collocation correspondence is helpful for machine translation and second language learning. Existing techniques for identifying Chinese-English collocation correspondences suffer from two major problems: they are sensitive to the coverage of the bilingual dictionary, and they are insensitive to semantic and contextual information. This paper presents the ICT (Improved Collocation Translation) method to overcome these problems. For a given Chinese collocation, the word translation candidates extracted from a bilingual dictionary are expanded to improve coverage. A new translation model is then used to estimate and rank the probabilities of the collocation correspondence candidates. This model incorporates three sources of evidence: statistics extracted from monolingual corpora, word semantic similarities from a monolingual thesaurus, and bilingual context similarities. Experiments show that ICT is robust to the coverage of the bilingual dictionary. It achieves 50.1% accuracy for the first candidate and 73.1% accuracy for the top-3 candidates.
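The Python sketch below makes the pipeline in the abstract concrete: expand the translation candidates for each word of a Chinese collocation, score every English combination by combining several features, rank the combinations, and measure top-k accuracy. It is not the ICT model. The abstract does not give the model's formula, so everything here is a hypothetical stand-in:

- the functions `expand_candidates`, `rank_collocation` and `top_k_accuracy`;
- the synonym-based expansion;
- the weighted-sum scoring;
- all of the toy data.

A thesaurus-based word-similarity feature could be added as one more entry in `features`.

```python
import math
from itertools import product

# Hypothetical sketch; not the scoring model from the paper.

def expand_candidates(word, dictionary, thesaurus):
    """Return dictionary translations of `word` plus translations of its
    thesaurus synonyms (an assumed way to widen dictionary coverage)."""
    candidates = set(dictionary.get(word, []))
    for synonym in thesaurus.get(word, []):
        candidates.update(dictionary.get(synonym, []))
    return candidates

def score(pair, weights, features):
    """Weighted sum of feature scores for one English candidate pair."""
    return sum(weights[name] * feature(pair) for name, feature in features.items())

def rank_collocation(zh_pair, dictionary, thesaurus, weights, features):
    """Rank every (head, dependent) English combination, best first."""
    heads = expand_candidates(zh_pair[0], dictionary, thesaurus)
    deps = expand_candidates(zh_pair[1], dictionary, thesaurus)
    # Sort the inputs so that tied scores keep a deterministic order.
    pairs = list(product(sorted(heads), sorted(deps)))
    return sorted(pairs, key=lambda p: score(p, weights, features), reverse=True)

def top_k_accuracy(ranked_lists, gold, k):
    """Fraction of collocations whose gold translation is in the top k."""
    hits = sum(1 for ranked, g in zip(ranked_lists, gold) if g in ranked[:k])
    return hits / len(gold)

# Toy data, invented for illustration only.
dictionary = {"打": ["hit", "beat"], "拨": ["dial"], "电话": ["telephone", "phone"]}
thesaurus = {"打": ["拨"]}
en_counts = {("dial", "phone"): 40, ("dial", "telephone"): 25, ("hit", "phone"): 3}
context_sim = {("dial", "phone"): 0.6, ("dial", "telephone"): 0.7}

features = {
    # Association of the English pair in a monolingual English corpus.
    "corpus": lambda p: math.log(1 + en_counts.get(p, 0)),
    # Similarity of the bilingual contexts of the two collocations.
    "context": lambda p: context_sim.get(p, 0.0),
}
weights = {"corpus": 1.0, "context": 2.0}

ranked = rank_collocation(("打", "电话"), dictionary, thesaurus, weights, features)
print(ranked[:3])  # [('dial', 'phone'), ('dial', 'telephone'), ('hit', 'phone')]
print(top_k_accuracy([ranked], [("dial", "telephone")], k=1))  # 0.0
print(top_k_accuracy([ranked], [("dial", "telephone")], k=3))  # 1.0
```

In this toy run, "dial" enters the candidate set only through the synonym expansion. This illustrates why widening dictionary coverage can matter when the direct dictionary entry lacks the right sense.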




Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Xu, R., Wong, KF., Lu, Q., Li, W. (2006). An Improved Method for Finding Bilingual Collocation Correspondences from Monolingual Corpora. In: Matsumoto, Y., Sproat, R.W., Wong, KF., Zhang, M. (eds) Computer Processing of Oriental Languages. Beyond the Orient: The Research Challenges Ahead. ICCPOL 2006. Lecture Notes in Computer Science, vol 4285. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11940098_6


  • DOI: https://doi.org/10.1007/11940098_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-49667-0

  • Online ISBN: 978-3-540-49668-7

  • eBook Packages: Computer Science, Computer Science (R0)
