Text-Representing Centroid Terms

Concepts and Methods for a Librarian of the Web

Part of the book series: Studies in Big Data (SBD, volume 62)

Abstract

This chapter introduces the core librarian-inspired concepts and text analysis techniques needed to realise the ‘Librarian of the Web’. In particular, it presents and evaluates a new graph-based approach that represents text documents and search queries alike by single, descriptive terms, called centroid terms, together with a method for their fast calculation. Important properties of centroid terms and their applicability to automatic text processing tasks are investigated in a large set of experiments. Because the following two chapters build upon and extend these fundamentals, understanding them is essential.

Notes

  1. This can be achieved by adding a sufficiently high number of documents to it during its building process.

  2. Modified from https://commons.wikimedia.org/wiki/File:Bird_toy_showing_center_of_gravity.jpg, original author: APN MJM, Creative Commons licence: CC BY-SA 3.0.

  3. Interested readers may download these datasets (4.1 MB) from: http://www.docanalyser.de/sa-corpora.zip.

  4. Interested readers may download these datasets (1.03 MB) from: http://www.docanalyser.de/cd-prop-corpora.zip.

  5. Interested readers may download these datasets (1.3 MB) from: http://www.docanalyser.de/cd-corpora.zip.

Author information

Correspondence to Mario Kubek.

Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Kubek, M. (2020). Text-Representing Centroid Terms. In: Concepts and Methods for a Librarian of the Web. Studies in Big Data, vol 62. Springer, Cham. https://doi.org/10.1007/978-3-030-23136-1_6
