Abstract
This chapter introduces the core librarian-inspired concepts and text analysis techniques needed to realise the ‘Librarian of the Web’. In particular, it presents and evaluates a new graph-based approach that represents text documents and search queries alike by single, descriptive terms, called centroid terms, together with a method for their fast calculation. Important properties of centroid terms and their applicability to automatic text processing tasks are investigated in a large set of experiments. The following two chapters build upon and extend these fundamentals, so understanding them is essential.
Notes
1. This can be achieved by adding a sufficiently high number of documents to it during its building process.
2. Modified from https://commons.wikimedia.org/wiki/File:Bird_toy_showing_center_of_gravity.jpg, original author: APN MJM, Creative Commons licence: CC BY-SA 3.0.
3. Interested readers may download these datasets (4.1 MB) from: http://www.docanalyser.de/sa-corpora.zip.
4. Interested readers may download these datasets (1.03 MB) from: http://www.docanalyser.de/cd-prop-corpora.zip.
5. Interested readers may download these datasets (1.3 MB) from: http://www.docanalyser.de/cd-corpora.zip.
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Kubek, M. (2020). Text-Representing Centroid Terms. In: Concepts and Methods for a Librarian of the Web. Studies in Big Data, vol 62. Springer, Cham. https://doi.org/10.1007/978-3-030-23136-1_6
DOI: https://doi.org/10.1007/978-3-030-23136-1_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-23135-4
Online ISBN: 978-3-030-23136-1
eBook Packages: Intelligent Technologies and Robotics (R0)