Abstract
Automatically detecting appropriate subtopic boundaries in a document is a difficult but very useful task in text processing. Several methods have addressed this problem, and some have achieved favorable results, but they also have drawbacks. Moreover, several of these solutions depend on the application domain. In this work we propose a new algorithm that uses a window below the paragraphs to measure lexical cohesion and thereby detect subtopics in scientific papers. We compare our method against two other algorithms that also use lexical cohesion. In this comparison, our method performs well and outperforms both of them.
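To illustrate the general idea of measuring lexical cohesion between a paragraph and a window of paragraphs below it, here is a minimal sketch. It is not the TextLec algorithm itself, whose details the abstract does not give. The cosine measure over raw term counts, the function names (`tokens`, `cosine`, `lower_window_cohesion`, `boundaries`), and the `window` and `threshold` parameters are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase word tokens. Assumption: a real system would also
    remove stop words and apply stemming or lemmatization."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lower_window_cohesion(paragraphs, window=2):
    """Score each paragraph (except the last) against the merged
    vocabulary of up to `window` paragraphs that follow it."""
    bags = [Counter(tokens(p)) for p in paragraphs]
    scores = []
    for i in range(len(bags) - 1):
        below = Counter()
        for bag in bags[i + 1 : i + 1 + window]:
            below.update(bag)
        scores.append(cosine(bags[i], below))
    return scores


def boundaries(paragraphs, window=2, threshold=0.1):
    """Return indices i where a subtopic boundary is hypothesized
    after paragraph i (cohesion with the lower window is low).
    The threshold value is an arbitrary illustrative choice."""
    scores = lower_window_cohesion(paragraphs, window)
    return [i for i, s in enumerate(scores) if s < threshold]
```

Calling `boundaries(paragraphs, window=2)` on a list of paragraph strings returns the positions after which cohesion with the following paragraphs drops below the threshold. A fixed threshold is the simplest possible decision rule and would need tuning per corpus.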
© 2007 Springer-Verlag Berlin Heidelberg
Cite this paper
Rojas, L.H., Pagola, J.E.M. (2007). TextLec: A Novel Method of Segmentation by Topic Using Lower Windows and Lexical Cohesion. In: Rueda, L., Mery, D., Kittler, J. (eds) Progress in Pattern Recognition, Image Analysis and Applications. CIARP 2007. Lecture Notes in Computer Science, vol 4756. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-76725-1_75
DOI: https://doi.org/10.1007/978-3-540-76725-1_75
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-76724-4
Online ISBN: 978-3-540-76725-1
eBook Packages: Computer Science, Computer Science (R0)