Abstract
A quantitative representation of discourse structure can be computed by measuring lexical cohesion relations among adjacent text elements. Such representations have previously been proposed for sub-topic text segmentation. In a parallel corpus, similar representations can be derived for the versions of a text in different languages. These can be used for parallel segmentation and as an alternative measure of text-translation similarity.
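As a rough illustration of the idea, the sketch below scores lexical cohesion between adjacent text elements and then compares the resulting profiles of two language versions. It is not the chapter's method: the cosine measure over raw term counts, the Pearson correlation used as a similarity score, and the names `cohesion_profile` and `profile_similarity` are all assumptions made for this example. It also assumes both versions are already split into the same number of aligned elements.

```python
from collections import Counter
import math
import re


def tokenize(text):
    """Lowercase a text element and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cohesion_profile(elements):
    """Return one cohesion score per pair of adjacent text elements.

    Assumption: cohesion is approximated by cosine similarity of raw
    term counts; a thesaurus-based measure could be substituted.
    """
    vectors = [Counter(tokenize(e)) for e in elements]
    return [cosine(vectors[i], vectors[i + 1]) for i in range(len(vectors) - 1)]


def profile_similarity(xs, ys):
    """Pearson correlation between two equally long cohesion profiles.

    Assumption: correlation of the two profiles stands in for a
    text-translation similarity score.
    """
    if len(xs) != len(ys):
        raise ValueError("profiles must have the same length")
    n = len(xs)
    if n == 0:
        return 0.0
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no structure to compare
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical aligned paragraphs in two languages.
    english = ["The bank raised rates.", "Rates rose at the bank.", "Rain fell all day."]
    french = ["La banque a relevé les taux.", "Les taux de la banque montent.", "Il a plu toute la journée."]
    print(profile_similarity(cohesion_profile(english), cohesion_profile(french)))
```

Low points in a profile are candidate sub-topic boundaries. Comparing whole profiles, rather than word-level translation pairs, keeps the similarity score independent of any bilingual lexicon.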
Copyright information
© 1999 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Van Der Eijk, P. (1999). Comparative Discourse Analysis of Parallel Texts. In: Armstrong, S., Church, K., Isabelle, P., Manzi, S., Tzoukermann, E., Yarowsky, D. (eds) Natural Language Processing Using Very Large Corpora. Text, Speech and Language Technology, vol 11. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-2390-9_16
DOI: https://doi.org/10.1007/978-94-017-2390-9_16
Publisher Name: Springer, Dordrecht
Print ISBN: 978-90-481-5349-7
Online ISBN: 978-94-017-2390-9
eBook Packages: Springer Book Archive