Text Segmentation into Paragraphs Based on Local Text Cohesion

  • Igor A. Bolshakov
  • Alexander Gelbukh
Conference paper

DOI: 10.1007/3-540-44805-5_20

Part of the Lecture Notes in Computer Science book series (LNCS, volume 2166)
Cite this paper as:
Bolshakov I.A., Gelbukh A. (2001) Text Segmentation into Paragraphs Based on Local Text Cohesion. In: Matoušek V., Mautner P., Mouček R., Taušer K. (eds) Text, Speech and Dialogue. TSD 2001. Lecture Notes in Computer Science, vol 2166. Springer, Berlin, Heidelberg

Abstract

The problem of automatic text segmentation can be subdivided into two different problems: thematic segmentation into fairly large, topically self-contained sections, and splitting into paragraphs, i.e., lower-level lexico-grammatical segmentation. In this paper we consider the latter problem. We propose a method for reasonably splitting text into paragraphs based on a text cohesion measure. Specifically, we propose a method for quantitatively evaluating text cohesion based on a large linguistic resource: a collocation network. At each step, our algorithm compares word occurrences in the text against a large database of collocations and semantic links between words in the given natural language. The procedure consists of evaluating the cohesion function, smoothing and normalizing it, and comparing it with a specially constructed threshold.
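The abstract names four stages: evaluate a cohesion function, smooth it, normalize it, and compare it with a threshold. The Python sketch below chains these stages together to show the shape of such a pipeline. Every concrete choice in it is a hypothetical stand-in, because the abstract does not give the authors' actual formulas or their collocation database:

  • A tiny toy set of word pairs (`COLLOCATIONS`) plays the role of the collocation network.
  • Raw cohesion at a gap between sentences is the number of linked word pairs across the gap.
  • Smoothing is a (0.25, 0.5, 0.25) weighted moving average.
  • Normalization divides by the maximum value.
  • The threshold is simply the mean normalized score.

A paragraph break is placed at a gap whose normalized cohesion is a local minimum below that threshold.

```python
from statistics import mean

# HYPOTHETICAL toy "collocation network": unordered word pairs assumed to be linked.
COLLOCATIONS = {
    frozenset(pair) for pair in [
        ("rain", "cloud"), ("cloud", "sky"), ("rain", "umbrella"),
        ("market", "price"), ("price", "stock"), ("stock", "investor"),
    ]
}

def linked(a, b):
    """True if the (hypothetical) network links words a and b."""
    return frozenset((a, b)) in COLLOCATIONS

def raw_cohesion(sentences, window=1):
    """Raw cohesion at each gap: count of linked word pairs between the
    `window` sentences before the gap and the `window` sentences after it."""
    scores = []
    for gap in range(1, len(sentences)):
        left = [w for s in sentences[max(0, gap - window):gap] for w in s]
        right = [w for s in sentences[gap:gap + window] for w in s]
        scores.append(sum(linked(a, b) for a in left for b in right))
    return scores

def smooth(values, weights=(0.25, 0.5, 0.25)):
    """Weighted moving average; weights are renormalized at the edges."""
    half = len(weights) // 2
    out = []
    for i in range(len(values)):
        num = den = 0.0
        for k, w in enumerate(weights):
            j = i + k - half
            if 0 <= j < len(values):
                num += w * values[j]
                den += w
        out.append(num / den)
    return out

def normalize(values):
    """Scale to [0, 1] by dividing by the maximum (assumed normalization)."""
    top = max(values, default=0.0)
    return [v / top for v in values] if top > 0 else [0.0 for _ in values]

def find_breaks(scores, factor=1.0):
    """Sentence indices that start a new paragraph: gaps whose score is a
    local minimum below factor * mean(scores) (an assumed threshold)."""
    threshold = factor * mean(scores)
    breaks = []
    for i, v in enumerate(scores):
        left = scores[i - 1] if i > 0 else float("inf")
        right = scores[i + 1] if i + 1 < len(scores) else float("inf")
        if v < threshold and v <= left and v <= right:
            breaks.append(i + 1)  # gap i lies between sentence i and i + 1
    return breaks

def segment(sentences, window=1):
    """Split a list of tokenized sentences into paragraphs."""
    if len(sentences) < 2:
        return [sentences]
    scores = normalize(smooth(raw_cohesion(sentences, window)))
    paragraphs, start = [], 0
    for b in find_breaks(scores) + [len(sentences)]:
        paragraphs.append(sentences[start:b])
        start = b
    return paragraphs

# Hypothetical pre-tokenized, lemmatized input.
EXAMPLE = [
    ["cloud", "sky"], ["rain", "cloud"], ["rain", "umbrella"],
    ["stock", "price"], ["market", "price"], ["investor", "stock"],
]

if __name__ == "__main__":
    for paragraph in segment(EXAMPLE):
        print(paragraph)
```

On this toy input, the weather sentences and the market sentences share no links across the third gap. That gap becomes the single break, and the text splits into two paragraphs of three sentences each. A real system would replace `COLLOCATIONS`, the smoothing weights, and the threshold rule with the resource and constructions the paper actually describes.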


Copyright information

© Springer-Verlag Berlin Heidelberg 2001

Authors and Affiliations

  • Igor A. Bolshakov (1)
  • Alexander Gelbukh (1)
  1. Center for Computing Research (CIC), National Polytechnic Institute (IPN), Mexico City, Mexico
