Skip to main content

Comparative Discourse Analysis of Parallel Texts

  • Chapter
Book cover Natural Language Processing Using Very Large Corpora

Part of the book series: Text, Speech and Language Technology ((TLTB,volume 11))

  • 375 Accesses

Abstract

A quantitative representation of discourse structure can be computed by measuring lexical cohesion relations among adjacent text elements. These representations have previously been proposed to deal with sub-topic text segmentation. In a parallel corpus, similar representations can be derived for versions of a text in various languages. These can be used for parallel segmentation and as an alternative measure of text-translation similarity 1.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 129.00
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 169.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info
Hardcover Book
USD 169.99
Price excludes VAT (USA)
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  • Brown, P., Lai, J. and Mercer, R. 1991. Aligning sentences in parallel corpora. In Proceedings of the 29th Annual Meeting of the Association for Computational. Linguistics, pp. 169–176.

    Google Scholar 

  • Digital Equipment Corporation. 1993. Digital Extended Math Library for DEC OSF/1 AXP, August 1993. Maynard, Massachusetts.

    Google Scholar 

  • Gale, W. and Church, K. 1993. A program for aligning sentences in bilingual corpora. Computational Linguistics, 19 (1): 75–102.

    Google Scholar 

  • Gale, W., Church, K. and Yarowsky, D. 1992. Using bilingual materials to develop word sense disambiguation methods. In Fourth International Conference on Theoretical and Methodological Issues in Machine Translation,pp. 101–112, Montréal.

    Google Scholar 

  • Hearst, M. 1993. TextTiling: a quantitative approach to discourse segmentation.

    Google Scholar 

  • Technical Report 93/24, Project Sequoia, University of California, Berkeley. Morris, J. and Hirst, G. 1991. Lexical cohesion computed by thesaural relations as an indicator of the structure of text. Computational Linguistics, 17 (1): 21–48.

    Google Scholar 

  • O’Shaughnessy, D. 1987. Speech Communication. Addison-Wesley.

    Google Scholar 

  • Salton, G. and McGill, M. 1983. Introduction to Modern Structured Information Retrieval. McGraw-Hill.

    Google Scholar 

  • Alphen, P. 1992. HMM-based continuous-speech recognition. Ph.D. thesis, Universiteit van Amsterdam.

    Google Scholar 

  • Eijk, P. 1993. Automating the acquisition of bilingual terminology. In Sixth Conference of the European Chapter of the Association for Computational Linguistics, pp. 113–119.

    Google Scholar 

  • Yarowsky, D. 1992. Word-sense disambiguation using statistical models of Roget’s categories trained on large corpora. In Proceedings of the 14th International Conference on Computational Linguistics (COLING), pp. 454–460.

    Google Scholar 

Download references

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 1999 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Van Der Eijk, P. (1999). Comparative Discourse Analysis of Parallel Texts. In: Armstrong, S., Church, K., Isabelle, P., Manzi, S., Tzoukermann, E., Yarowsky, D. (eds) Natural Language Processing Using Very Large Corpora. Text, Speech and Language Technology, vol 11. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-2390-9_16

Download citation

  • DOI: https://doi.org/10.1007/978-94-017-2390-9_16

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-90-481-5349-7

  • Online ISBN: 978-94-017-2390-9

  • eBook Packages: Springer Book Archive

Publish with us

Policies and ethics