TextLec: A Novel Method of Segmentation by Topic Using Lower Windows and Lexical Cohesion

  • Laritza Hernández Rojas
  • José E. Medina Pagola
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4756)


The automatic detection of appropriate subtopic boundaries in a document is a difficult and very useful task in text processing. Several methods have tried to solve this problem, and some of them have achieved favorable results, but they also have drawbacks; in addition, several of these solutions depend on the application domain. In this work we propose a new algorithm that measures lexical cohesion using windows below the paragraph level in order to detect subtopics in scientific papers. We compare our method against two algorithms that also rely on lexical cohesion, and the comparison shows that our method performs well and outperforms both of them.
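The abstract does not spell out TextLec's actual procedure. To illustrate the general idea it builds on (lexical cohesion measured over windows smaller than a paragraph), here is a minimal, hypothetical sketch. It is not the authors' algorithm. For each gap between consecutive sentences, it compares the bag-of-words vectors of the sentence windows on either side using cosine similarity. It then marks low-cohesion local minima as subtopic boundaries, similar in spirit to block-comparison approaches such as TextTiling. The function names, the tiny stopword list, the window size and the threshold are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical tiny stopword list; a real system would use a fuller list and stemming.
STOPWORDS = {"the", "a", "an", "of", "and", "is", "are", "in", "to", "it", "its", "on", "for"}


def tokens(sentence: str) -> list[str]:
    """Lowercase alphabetic words of a sentence, with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def gap_scores(sentences: list[str], window: int = 2) -> list[float]:
    """Cohesion score at each gap between sentence i-1 and sentence i.

    Compares the `window` sentences before the gap with the `window` sentences
    after it, so the comparison unit is a sentence window, not a paragraph.
    """
    bags = [Counter(tokens(s)) for s in sentences]
    scores = []
    for gap in range(1, len(sentences)):
        left = sum(bags[max(0, gap - window):gap], Counter())
        right = sum(bags[gap:gap + window], Counter())
        scores.append(cosine(left, right))
    return scores


def boundaries(sentences: list[str], window: int = 2, threshold: float = 0.1) -> list[int]:
    """Indices of sentences that start a new segment.

    A gap counts as a boundary when its score is a local minimum of the
    score sequence and falls below the (assumed) `threshold`.
    """
    s = gap_scores(sentences, window)
    result = []
    for i, score in enumerate(s):
        left_ok = i == 0 or score <= s[i - 1]
        right_ok = i == len(s) - 1 or score <= s[i + 1]
        if left_ok and right_ok and score < threshold:
            result.append(i + 1)  # score i belongs to the gap just before sentence i + 1
    return result


# Usage on a toy text: three sentences about cats, then three about stock markets.
example = [
    "Cats purr when cats feel safe.",
    "A cat sleeps most of the day.",
    "Cats hunt mice at night.",
    "Stock markets fell sharply today.",
    "Investors sold stock in panic.",
    "Markets may recover next week.",
]
print(boundaries(example))  # → [3]
```

On the toy input, the gap between the third and fourth sentences has no shared vocabulary across its windows, so its score is 0.0 and it is reported as the only boundary. Making the window sentence-sized rather than paragraph-sized lets a boundary fall inside a paragraph. The price is noisier scores, which is why a real system would need stemming, a proper stopword list and tuned parameters.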


Keywords: Text processing · Segmentation by topic · Lexical cohesion



Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Laritza Hernández Rojas (1)
  • José E. Medina Pagola (1)
  1. Advanced Technologies Application Centre (CENATAV), 7a #21812 e/ 218 y 222, Rpto. Siboney, Playa, C.P. 12200, C. Habana, Cuba
