Bilingual Chunk Alignment Based on Interactional Matching and Probabilistic Latent Semantic Indexing

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3248)

Abstract

This paper proposes "Interactional Matching", an integrated method for bilingual chunk partition and alignment. Unlike previous work, our method tries to obtain as much of the necessary information as possible from the bilingual corpora themselves. Through bilingual constraints, it automatically builds one-to-one chunk-pairs, each associated with a chunk-pair confidence coefficient. In addition, unlike collocation extraction methods, our method partitions bilingual sentences entirely into chunks and leaves no fragments. Furthermore, by using Probabilistic Latent Semantic Indexing (PLSI), the method can handle both compositional and non-compositional chunks. Experiments show that, for the overall process (partition and alignment together), our method achieves 85% precision with 57% recall on written-language chunk-pairs and 78% precision with 53% recall on spoken-language chunk-pairs.
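The abstract does not say exactly how PLSI is applied to chunks. The following is a minimal sketch of Hofmann's generic asymmetric aspect model, P(w|d) = Σ_z P(z|d) P(w|z), fitted with EM in plain Python. It is not the authors' bilingual variant. For illustration only, it treats each aligned sentence pair as a "document" whose vocabulary mixes English and Chinese tokens. Under that assumption, latent topics can group source and target words that co-occur, even when the pair does not translate word by word. The function names (`plsi`, `log_likelihood`), the toy vocabulary, and the counts are all hypothetical.

```python
import math
import random


def _normalized(row):
    """Scale non-negative weights so they sum to 1 (uniform if all are zero)."""
    total = sum(row)
    if total == 0.0:
        return [1.0 / len(row)] * len(row)
    return [x / total for x in row]


def plsi(counts, num_topics, iterations=50, seed=0):
    """Fit the asymmetric PLSI aspect model P(w|d) = sum_z P(z|d) P(w|z) by EM.

    counts[d][w] is how often word w occurs in document d.
    Returns (p_z_d, p_w_z) with p_z_d[d][z] = P(z|d) and p_w_z[z][w] = P(w|z).
    """
    rng = random.Random(seed)
    num_docs, num_words = len(counts), len(counts[0])
    # Random positive initialization, normalized into valid distributions.
    p_z_d = [_normalized([rng.random() + 0.1 for _ in range(num_topics)])
             for _ in range(num_docs)]
    p_w_z = [_normalized([rng.random() + 0.1 for _ in range(num_words)])
             for _ in range(num_topics)]

    for _ in range(iterations):
        # Expected counts accumulated for the M-step.
        exp_w_z = [[0.0] * num_words for _ in range(num_topics)]
        exp_z_d = [[0.0] * num_topics for _ in range(num_docs)]
        for d in range(num_docs):
            for w in range(num_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
                joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(num_topics)]
                total = sum(joint)
                if total == 0.0:
                    continue
                for z in range(num_topics):
                    share = n * joint[z] / total
                    exp_w_z[z][w] += share
                    exp_z_d[d][z] += share
        # M-step: renormalize expected counts into P(w|z) and P(z|d).
        p_w_z = [_normalized(row) for row in exp_w_z]
        p_z_d = [_normalized(row) for row in exp_z_d]
    return p_z_d, p_w_z


def log_likelihood(counts, p_z_d, p_w_z):
    """Sum over observed (d, w) of n(d, w) * log P(w|d) under the fitted model."""
    ll = 0.0
    for d, row in enumerate(counts):
        for w, n in enumerate(row):
            if n:
                p = sum(p_z_d[d][z] * p_w_z[z][w] for z in range(len(p_w_z)))
                ll += n * math.log(p)
    return ll


# Toy example: each "document" is one aligned sentence pair (hypothetical data).
vocab = ["book", "书", "read", "读", "ticket", "票", "train", "火车"]
counts = [
    [2, 2, 1, 1, 0, 0, 0, 0],  # pairs about reading books
    [1, 1, 2, 2, 0, 0, 0, 0],
    [0, 0, 0, 0, 2, 2, 1, 1],  # pairs about buying train tickets
    [0, 0, 0, 0, 1, 1, 2, 2],
]
p_z_d, p_w_z = plsi(counts, num_topics=2, iterations=100)
for z, dist in enumerate(p_w_z):
    top = sorted(range(len(vocab)), key=lambda w: -dist[w])[:4]
    print("topic", z, [vocab[w] for w in top])
```

EM never decreases the likelihood, so `log_likelihood` gives a simple sanity check. With the same seed, running more iterations should yield a value at least as large.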

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Liu, F., Jin, Q., Zhao, J., Xu, B. (2005). Bilingual Chunk Alignment Based on Interactional Matching and Probabilistic Latent Semantic Indexing. In: Su, KY., Tsujii, J., Lee, JH., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2004. IJCNLP 2004. Lecture Notes in Computer Science, vol 3248. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30211-7_44

  • DOI: https://doi.org/10.1007/978-3-540-30211-7_44

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-24475-2

  • Online ISBN: 978-3-540-30211-7

  • eBook Packages: Computer Science, Computer Science (R0)
