Vector Space Model for Texts and the tf-idf Measure

  • Grigori Sidorov
Part of the SpringerBriefs in Computer Science book series (BRIEFSCOMPUTER)


In this chapter, we discuss the features used to represent texts in the vector space model, such as words or n-grams, so that texts can be compared. We also present the possible values of these features: term frequency (tf), inverse document frequency (idf), and their combination, tf-idf.
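
As a rough illustration of the three values named in the abstract, the sketch below computes tf-idf weights for word features over a tiny made-up corpus. It assumes the common textbook definitions (raw counts for tf, and idf = log(N / df), where N is the number of documents and df is the number of documents containing the term); the chapter itself may use a different variant. The function name `tf_idf` and the sample sentences are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document.

    Assumed definitions (not taken from the chapter):
      tf  = raw count of the term in the document
      idf = log(N / df), with df = number of documents containing the term
    """
    n_docs = len(docs)
    # Document frequency: count each term once per document it occurs in.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

# Hypothetical toy corpus, tokenized by whitespace.
docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cat chased the dog".split(),
]
weights = tf_idf(docs)
```

Under these assumptions, a word that occurs in every document ("the") gets weight zero, while a word that occurs in only one document ("mat") gets the largest weight within that document. Each resulting dictionary can be read as a sparse vector in the vector space model.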



Copyright information

© The Author(s), under exclusive licence to Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Grigori Sidorov
    • Instituto Politécnico Nacional, Centro de Investigación en Computación, Mexico City, Mexico
