International Conference on Knowledge Engineering and the Semantic Web

Knowledge Engineering and Semantic Web pp 3-15

Subtopic Segmentation of Scientific Texts: Parameter Optimisation

  • Natalia Avdeeva
  • Galina Artemova
  • Kirill Boyarsky
  • Natalia Gusarova
  • Natalia Dobrenko
  • Eugeny Kanevsky
Conference paper

DOI: 10.1007/978-3-319-24543-0_1

Part of the Communications in Computer and Information Science book series (CCIS, volume 518)
Cite this paper as:
Avdeeva N., Artemova G., Boyarsky K., Gusarova N., Dobrenko N., Kanevsky E. (2015) Subtopic Segmentation of Scientific Texts: Parameter Optimisation. In: Klinov P., Mouromtsev D. (eds) Knowledge Engineering and Semantic Web. Communications in Computer and Information Science, vol 518. Springer, Cham

Abstract

Searching for information in a scientific text requires automatically partitioning the document into subtopics while taking the specifics of the text and the user's purposes into account. This task is important for selecting primary sources, for working with texts in foreign languages, and for getting acquainted with a research problem. This paper focuses on applying subtopic segmentation algorithms to real-life scientific texts. To study this, we use monographs on the same subject written in three languages. The corpus includes several original and professionally translated fragments. The research is based on the TextTiling algorithm, which analyses how tightly adjacent parts of the text cohere. We examine how several parameters (the cutoff rate, the size of the moving window, and the shift from one block to the next) influence segmentation quality, and we determine the optimal combinations of these parameters for several languages. The studies on Russian suggest that external lexical resources notably improve segmentation quality.
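To make the three parameters named in the abstract concrete, here is a minimal sketch of a TextTiling-style segmenter. It is not the authors' implementation: the function names (`tokenize`, `cosine`, `gap_scores`, `depth_scores`, `find_boundaries`) are hypothetical, blocks are measured in tokens rather than sentences, and the threshold `mean - cutoff * std` over depth scores is only one assumed reading of "cutoff rate". Stemming, stop-word removal, smoothing and external lexical resources are omitted.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens (no lemmatisation or stop-word filtering here)."""
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def gap_scores(tokens, window=20, shift=10):
    """Compare the `window` tokens before and after each candidate gap.

    Gaps are placed every `shift` tokens (assumed meaning of the "shift
    from one block to the next").
    """
    gaps, scores = [], []
    for pos in range(window, len(tokens) - window + 1, shift):
        left = Counter(tokens[pos - window:pos])
        right = Counter(tokens[pos:pos + window])
        gaps.append(pos)
        scores.append(cosine(left, right))
    return gaps, scores


def depth_scores(scores):
    """Depth of each point: climb to the nearest peak on either side."""
    depths = []
    for i, s in enumerate(scores):
        left = i
        while left > 0 and scores[left - 1] >= scores[left]:
            left -= 1
        right = i
        while right < len(scores) - 1 and scores[right + 1] >= scores[right]:
            right += 1
        depths.append((scores[left] - s) + (scores[right] - s))
    return depths


def find_boundaries(tokens, window=20, shift=10, cutoff=0.5):
    """Return token offsets of valleys deeper than mean - cutoff * std.

    The threshold formula is an assumption made for this sketch.
    """
    gaps, scores = gap_scores(tokens, window, shift)
    if not scores:
        return []
    depths = depth_scores(scores)
    mean = sum(depths) / len(depths)
    std = math.sqrt(sum((d - mean) ** 2 for d in depths) / len(depths))
    threshold = mean - cutoff * std
    result = []
    for i in range(1, len(scores) - 1):
        # Only local minima of the similarity curve are boundary candidates.
        is_valley = scores[i - 1] > scores[i] <= scores[i + 1]
        if is_valley and depths[i] > threshold:
            result.append(gaps[i])
    return result
```

In this sketch a larger `window` smooths the similarity curve, a smaller `shift` samples more candidate gaps, and a larger `cutoff` lowers the threshold so that shallower valleys also become boundaries. Calling `find_boundaries(tokenize(text), window, shift, cutoff)` over a grid of values is one way to explore such parameter combinations.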

Keywords

Text tiling, Classification, Parsing, Segmentation

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Natalia Avdeeva (1)
  • Galina Artemova (2)
  • Kirill Boyarsky (2)
  • Natalia Gusarova (2)
  • Natalia Dobrenko (2)
  • Eugeny Kanevsky (1)
  1. Saint Petersburg Institute for Economics and Mathematics, Russian Academy of Sciences, Saint Petersburg, Russia
  2. Saint Petersburg National Research University of Information Technologies, Mechanics and Optics (ITMO University), Saint Petersburg, Russia
