Bilingual Chunk Alignment Based on Interactional Matching and Probabilistic Latent Semantic Indexing

  • Feifan Liu
  • Qianli Jin
  • Jun Zhao
  • Bo Xu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3248)


This paper proposes an integrated method for bilingual chunk partition and alignment, called “Interactional Matching”. Unlike earlier work, the method tries to obtain as much of the necessary information as possible from the bilingual corpora themselves, and through bilingual constraints it automatically builds one-to-one chunk pairs, each associated with a chunk-pair confidence coefficient. Unlike collocation extraction methods, it also partitions bilingual sentences entirely into chunks, leaving no fragments. Furthermore, by using Probabilistic Latent Semantic Indexing (PLSI), the method can handle not only compositional chunks but also non-compositional ones. Experiments show that, for the overall process (including partition and alignment), the method obtains 85% precision with 57% recall on written-language chunk pairs and 78% precision with 53% recall on spoken-language chunk pairs.
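For readers unfamiliar with PLSI, the following is a minimal sketch of the standard aspect model, P(d, w) = sum over z of P(z) P(d|z) P(w|z), fitted by expectation-maximization. It is not the authors' implementation: the abstract does not say how PLSI is applied to chunk pairs, and the function name `plsi`, its parameters, and the toy count matrix are all assumptions made for illustration.

```python
import random


def plsi(counts, n_topics, n_iter=50, seed=0):
    """Fit P(d, w) = sum_z P(z) P(d|z) P(w|z) by EM (illustrative sketch).

    counts[d][w] is a co-occurrence count; rows and columns are hypothetical
    units (e.g. text segments and words), not the paper's actual features.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def random_dist(n):
        # Random strictly positive distribution used for initialization.
        vals = [rng.random() + 0.1 for _ in range(n)]
        total = sum(vals)
        return [v / total for v in vals]

    p_z = [1.0 / n_topics] * n_topics
    p_d_z = [random_dist(n_docs) for _ in range(n_topics)]   # p_d_z[z][d]
    p_w_z = [random_dist(n_words) for _ in range(n_topics)]  # p_w_z[z][w]

    for _ in range(n_iter):
        new_z = [0.0] * n_topics
        new_d = [[0.0] * n_docs for _ in range(n_topics)]
        new_w = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: posterior P(z | d, w) up to normalization.
                joint = [p_z[z] * p_d_z[z][d] * p_w_z[z][w]
                         for z in range(n_topics)]
                norm = sum(joint)
                if norm == 0.0:
                    continue
                # Accumulate expected counts for the M-step.
                for z in range(n_topics):
                    r = n * joint[z] / norm
                    new_z[z] += r
                    new_d[z][d] += r
                    new_w[z][w] += r
        total = sum(new_z)
        if total == 0.0:
            break
        # M-step: renormalize expected counts into probabilities.
        for z in range(n_topics):
            if new_z[z] > 0.0:
                p_d_z[z] = [v / new_z[z] for v in new_d[z]]
                p_w_z[z] = [v / new_z[z] for v in new_w[z]]
            p_z[z] = new_z[z] / total
    return p_z, p_d_z, p_w_z


# Toy count matrix with two loosely separated blocks (made-up data).
toy_counts = [
    [4, 3, 0, 0],
    [3, 4, 0, 0],
    [0, 0, 5, 2],
    [0, 0, 2, 5],
]
p_z, p_d_z, p_w_z = plsi(toy_counts, n_topics=2)
```

The latent variable z lets two items be related even when they share no surface words, which is the general property that makes latent-semantic models a candidate for non-compositional chunks. The sketch uses plain EM without tempering or smoothing, so it is only suitable for small illustrative inputs.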


Bilingual Chunking; Alignment; Interactional Matching; PLSI





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Feifan Liu (1)
  • Qianli Jin (1)
  • Jun Zhao (1)
  • Bo Xu (1)
  1. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing
