Chapter

Artificial Intelligence and Soft Computing

Volume 6114 of the series Lecture Notes in Computer Science pp 532-539

Clustering Polish Texts with Latent Semantic Analysis

  • Marcin Kuta, Institute of Computer Science, AGH University of Science and Technology
  • Jacek Kitowski, Institute of Computer Science, AGH University of Science and Technology

Abstract

Document clustering is an important technique in Natural Language Processing (NLP). The paper presents the performance of partitional and agglomerative algorithms applied to clustering a large number of Polish newspaper articles. We investigate different representations of the documents. The focus of the paper is on the applicability of Latent Semantic Analysis to such clustering for Polish.

Keywords

document clustering, latent semantic analysis, part-of-speech tagging, natural language processing