Combining Sentence Length with Location Information to Align Monolingual Parallel Texts

  • Conference paper
Information Retrieval Technology (AIRS 2004)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3411)

Abstract

Abundant Chinese paraphrasing resources can be obtained on the Internet from different Chinese translations of the same foreign masterpiece. A paraphrase corpus is a corpus of sentence pairs that convey the same information. The irregular characteristics of real monolingual parallel texts, especially the absence of strictly aligned paragraph boundaries between two translations, pose a challenge to alignment technology, and traditional alignment methods developed for bilingual texts have difficulty handling them. This paper describes a new method for aligning real monolingual parallel texts using the length and location information of sentence pairs. The model was motivated by the observation that sentence pairs of a given length are similarly distributed in location across the whole text. A paraphrase corpus of about fifty thousand sentence pairs has been constructed to date.
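The abstract does not give the model's formulas, so the following is only a minimal sketch of how length and location evidence could be combined, assuming a dynamic-programming aligner in the style of Gale and Church (a swapped-in technique, not necessarily the authors' model). Each candidate "bead" (1-1, 1-0, 0-1, 2-1 or 1-2 sentences) is scored by a length term (log ratio of character counts) plus a location term (difference in relative position within each text). The weights, the skip and merge penalties, and the functions `relative_positions`, `bead_cost` and `align` are all hypothetical.

```python
import math
from typing import List, Optional, Tuple

# Hypothetical weights and penalties; not taken from the paper.
LENGTH_WEIGHT = 1.0
LOCATION_WEIGHT = 10.0
SKIP_PENALTY = 3.0   # cost per sentence left unaligned (1-0 or 0-1 beads)
MERGE_PENALTY = 0.5  # extra cost for 2-1 and 1-2 beads

# Allowed beads: (sentences taken from text A, sentences taken from text B).
BEADS = [(1, 1), (1, 0), (0, 1), (2, 1), (1, 2)]


def relative_positions(sentences: List[str]) -> List[float]:
    """Relative location (0..1) of each sentence's midpoint, measured in characters."""
    total = sum(len(s) for s in sentences) or 1
    positions, offset = [], 0
    for s in sentences:
        positions.append((offset + len(s) / 2) / total)
        offset += len(s)
    return positions


def bead_cost(a: List[str], b: List[str], pos_a: List[float], pos_b: List[float],
              i: int, j: int, da: int, db: int) -> float:
    """Cost of aligning a[i:i+da] with b[j:j+db]: length term plus location term."""
    if da == 0 or db == 0:
        return SKIP_PENALTY * (da + db)
    len_a = sum(len(s) for s in a[i:i + da])
    len_b = sum(len(s) for s in b[j:j + db])
    length_cost = abs(math.log((len_a + 1) / (len_b + 1)))
    # Compare the average relative location of the sentences on each side.
    loc_a = sum(pos_a[i:i + da]) / da
    loc_b = sum(pos_b[j:j + db]) / db
    location_cost = abs(loc_a - loc_b)
    merge_cost = MERGE_PENALTY if da + db > 2 else 0.0
    return LENGTH_WEIGHT * length_cost + LOCATION_WEIGHT * location_cost + merge_cost


def align(a: List[str], b: List[str]) -> List[Tuple[List[int], List[int]]]:
    """Find the minimum-cost sequence of beads covering both texts."""
    pos_a, pos_b = relative_positions(a), relative_positions(b)
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int]]]] = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Row-major order guarantees every predecessor cell is final before it is extended.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for da, db in BEADS:
                ni, nj = i + da, j + db
                if ni > n or nj > m:
                    continue
                c = cost[i][j] + bead_cost(a, b, pos_a, pos_b, i, j, da, db)
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (da, db)
    # Trace back from (n, m) to recover the chosen beads as index lists.
    pairs = []
    i, j = n, m
    while (i, j) != (0, 0):
        da, db = back[i][j]
        pairs.append((list(range(i - da, i)), list(range(j - db, j))))
        i, j = i - da, j - db
    pairs.reverse()
    return pairs
```

For example, with two short "translations" whose sentences appear in the same order and have similar lengths, `align(["He went home.", "It was raining hard that night."], ["He returned home.", "That night it rained heavily."])` returns `[([0], [0]), ([1], [1])]`. The location term is what distinguishes this from a purely length-based aligner: two sentences of similar length that sit far apart in their respective texts are penalized, which helps when paragraph boundaries cannot be used as anchors.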

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Li, W., Liu, T., Li, S. (2005). Combining Sentence Length with Location Information to Align Monolingual Parallel Texts. In: Myaeng, S.H., Zhou, M., Wong, KF., Zhang, HJ. (eds) Information Retrieval Technology. AIRS 2004. Lecture Notes in Computer Science, vol 3411. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-31871-2_11

  • DOI: https://doi.org/10.1007/978-3-540-31871-2_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-25065-4

  • Online ISBN: 978-3-540-31871-2

  • eBook Packages: Computer Science, Computer Science (R0)
