Automatic Acquisition of Adjacent Information and Its Effectiveness in Extraction of Bilingual Word Pairs from Parallel Corpora

Echizen-ya, Hiroshi; Araki, Kenji; Momouchi, Yoshio

doi:10.1007/11428817_34

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3513)

Included in the following conference series:

International Conference on Application of Natural Language to Information Systems

Abstract

We propose a learning method that addresses the sparse data problem in the automatic extraction of bilingual word pairs from parallel corpora. Methods based on similarity measures alone are generally insufficient because of this problem. Our method rests on the following inference: in local parts of bilingual sentence pairs (e.g., phrases rather than whole sentences), the equivalents of the words that adjoin the source-language word of a bilingual word pair also adjoin its target-language word. The learning method acquires such adjacent information automatically and then uses it to extract bilingual word pairs. Because the system extracts only word pairs that adjoin the acquired adjacent information, it can limit the search scope when deciding equivalents in bilingual sentence pairs. We applied our method to two systems, one based on Yates’ χ² and one based on AIC. Evaluation experiments indicate that the extraction rates improved by 6.1 and 6.0 percentage points, respectively.
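
For readers unfamiliar with the first of the two similarity measures named above, the following is a minimal sketch of how a Yates-corrected χ² score could be computed for one candidate word pair from sentence-level co-occurrence counts. It is not taken from the paper: the function names, the 2×2 table layout, the toy tokens, and the choice to clamp the corrected difference at zero are all assumptions made for illustration, and the sketch does not implement the adjacent-information learning itself.

```python
def cooccurrence_counts(sentence_pairs, src_word, tgt_word):
    """Build a 2x2 contingency table over bilingual sentence pairs.

    Returns (a, b, c, d):
      a = pairs containing both src_word and tgt_word
      b = pairs containing src_word only
      c = pairs containing tgt_word only
      d = pairs containing neither
    (Hypothetical helper; the table layout is an assumption.)
    """
    a = b = c = d = 0
    for src_tokens, tgt_tokens in sentence_pairs:
        has_src = src_word in src_tokens
        has_tgt = tgt_word in tgt_tokens
        if has_src and has_tgt:
            a += 1
        elif has_src:
            b += 1
        elif has_tgt:
            c += 1
        else:
            d += 1
    return a, b, c, d


def yates_chi_square(a, b, c, d):
    """Chi-square statistic with Yates' continuity correction (2x2 table)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A zero marginal means the statistic is undefined; report no association.
        return 0.0
    # Assumption: clamp at zero so the correction never flips the sign.
    diff = max(abs(a * d - b * c) - n / 2.0, 0.0)
    return n * diff * diff / denom


# Toy, made-up parallel data: (source tokens, target tokens).
toy_pairs = [
    (["a", "dog"], ["inu"]),
    (["a", "cat"], ["neko"]),
    (["the", "dog"], ["inu", "ga"]),
]
table = cooccurrence_counts(toy_pairs, "dog", "inu")
score = yates_chi_square(*table)
```

With counts this small the score is driven by a handful of observations, which is the kind of situation the abstract refers to as the sparse data problem.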

Author information

Authors and Affiliations

Dept. of Electronics and Information, Hokkai-Gakuen University, S26-Jo, W11-Chome, Chuo-ku, Sapporo, 064-0926, Japan
Hiroshi Echizen-ya & Yoshio Momouchi
Graduate School of Information Science and Technology, Hokkaido University, N14-Jo, W9-Chome, Kita-ku, Sapporo, 060-0814, Japan
Kenji Araki

Editor information

Editors and Affiliations

Department of Software and Computing Systems, University of Alicante, Spain
Andrés Montoyo
Grupo de investigación del Procesamiento del Lenguaje y Sistemas de Información, Departamento de Lenguajes y Sistemas Informáticos, Universidad de Alicante, Alicante, Spain
Rafael Muñoz
Lab. CEDRIC, CNAM, Paris, France
Elisabeth Métais

About this paper

Cite this paper

Echizen-ya, H., Araki, K., Momouchi, Y. (2005). Automatic Acquisition of Adjacent Information and Its Effectiveness in Extraction of Bilingual Word Pairs from Parallel Corpora. In: Montoyo, A., Muñoz, R., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2005. Lecture Notes in Computer Science, vol 3513. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11428817_34

DOI: https://doi.org/10.1007/11428817_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26031-8
Online ISBN: 978-3-540-32110-1
eBook Packages: Computer Science, Computer Science (R0)