Japanese-English aligned bilingual corpora

  • Chapter
Parallel Text Processing

Part of the book series: Text, Speech and Language Technology ((TLTB,volume 13))

Abstract

This chapter describes bilingual corpora developed in Japan. First, we discuss problems of corpus development and some corpora that are, or will be, available in Japan. Next, we describe the bilingual corpus project of JEIDA (Japan Electronics Industry Development Association). The main purpose of this project is to develop a medium-sized aligned parallel corpus of English and Japanese. The project also lets us discuss various facets of bilingual corpus development, conduct research on the alignment of Japanese and English sentences, and investigate the automatic acquisition of linguistic knowledge from the resulting corpus. The chapter then gives an overview of the automatic alignment system developed by NTT (Nippon Telegraph and Telephone Co. Ltd.), including a detailed description of the entire alignment algorithm. Finally, it describes BACCS, a graphical alignment environment in which the user can view the alignment results and easily modify both the results and the user dictionary.

Copyright information

© 2000 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Isahara, H., Haruno, M. (2000). Japanese-English aligned bilingual corpora. In: Véronis, J. (eds) Parallel Text Processing. Text, Speech and Language Technology, vol 13. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-2535-4_16

  • DOI: https://doi.org/10.1007/978-94-017-2535-4_16

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-90-481-5555-2

  • Online ISBN: 978-94-017-2535-4

  • eBook Packages: Springer Book Archive
