Japanese-English aligned bilingual corpora

Isahara, Hitoshi; Haruno, Masahiko

doi:10.1007/978-94-017-2535-4_16

Part of the book series: Text, Speech and Language Technology (TLTB, volume 13)

Abstract

This chapter describes the bilingual corpora developed in Japan. First, we discuss problems in corpus development and survey some corpora that are, or will be, available in Japan. Next, we describe the bilingual corpus project of JEIDA (Japan Electronics Industry Development Association), whose main purpose is to develop a medium-sized aligned parallel corpus of English and Japanese. The project also gives us the opportunity to discuss the various facets involved in developing a bilingual corpus, to research the alignment of Japanese and English sentences, and to investigate the automatic acquisition of linguistic knowledge from the resulting corpus. The chapter then gives an overview of the automatic alignment system developed by NTT (Nippon Telegraph and Telephone Corporation), including a detailed description of the entire alignment algorithm. It also describes BACCS, a graphical alignment environment in which the user can view the alignment results and easily modify both the results and the user dictionary.
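The abstract does not spell out the NTT alignment algorithm, so the sketch below is not that method. It is only a rough illustration of what sentence alignment computes: a simplified, length-based dynamic-programming aligner in the general style of Gale and Church. It pairs groups of Japanese and English sentences ("beads" of type 1-1, 1-0, 0-1, 2-1 and 1-2) so that total cost is minimal. The cost function, the bead penalties, and the function name `align` are hypothetical choices made for this example; the English/Japanese length ratio is simply estimated from the input texts. Sentence segmentation is assumed to have been done already.

```python
from typing import List, Optional, Tuple

# Bead types: (number of Japanese sentences, number of English sentences).
BEADS = [(1, 1), (1, 0), (0, 1), (2, 1), (1, 2)]
# Hypothetical penalties that discourage non 1-1 beads; values are illustrative only.
PENALTY = {(1, 1): 0.0, (1, 0): 1.0, (0, 1): 1.0, (2, 1): 0.5, (1, 2): 0.5}


def align(ja: List[str], en: List[str]) -> List[Tuple[List[int], List[int]]]:
    """Simplified length-based DP sentence alignment (a sketch, not NTT's algorithm)."""
    # Estimate the English/Japanese character-length ratio from the texts themselves.
    total_ja = sum(len(s) for s in ja) or 1
    total_en = sum(len(s) for s in en)
    ratio = total_en / total_ja

    def bead_cost(lj: int, le: int) -> float:
        # Normalized difference between actual and expected English length (hypothetical cost).
        expected = ratio * lj
        return abs(le - expected) / (le + expected + 1.0)

    n, m = len(ja), len(en)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int]]]] = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    # Row-major order guarantees every predecessor cell is final before it is expanded.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for dj, de in BEADS:
                ni, nj = i + dj, j + de
                if ni > n or nj > m:
                    continue
                lj = sum(len(s) for s in ja[i:ni])
                le = sum(len(s) for s in en[j:nj])
                c = cost[i][j] + bead_cost(lj, le) + PENALTY[(dj, de)]
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)

    # Trace back from (n, m) to (0, 0) to recover the chosen beads.
    beads: List[Tuple[List[int], List[int]]] = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    beads.reverse()
    return beads


if __name__ == "__main__":
    ja = ["今日は晴れです。", "明日は雨です。"]
    en = ["It is sunny today.", "It will rain tomorrow."]
    print(align(ja, en))  # [([0], [0]), ([1], [1])]
```

Each bead lists the indices of the Japanese and English sentences it joins. Length alone is a weak signal for a language pair as different as Japanese and English, which is one reason practical systems add lexical or dictionary evidence on top of a length model like this one.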

Author information

Authors and Affiliations

Communications Research Laboratory, Japan
Hitoshi Isahara
ATR Human Information Processing Research Laboratories, Japan
Masahiko Haruno

Editor information

Editors and Affiliations

Université de Provence and CNRS, 29, Avenue Robert Schuman, 13100, Aix-en-Provence, France
Jean Véronis

About this chapter

Cite this chapter

Isahara, H., Haruno, M. (2000). Japanese-English aligned bilingual corpora. In: Véronis, J. (ed.) Parallel Text Processing. Text, Speech and Language Technology, vol 13. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-2535-4_16

DOI: https://doi.org/10.1007/978-94-017-2535-4_16
Publisher Name: Springer, Dordrecht
Print ISBN: 978-90-481-5555-2
Online ISBN: 978-94-017-2535-4
eBook Packages: Springer Book Archive