Sense Cluster Based Categorization and Clustering of Abstracts

  • Davide Buscaldi
  • Paolo Rosso
  • Mikhail Alexandrov
  • Alfons Juan Ciscar
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3878)

Abstract

This paper focuses on the use of sense clusters for the classification and clustering of very short texts such as conference abstracts. Common keyword-based techniques are effective for very short documents only when the documents belong to different domains. Conference abstracts, however, all come from a narrow domain (i.e., they share a similar terminology), which makes the task harder. Sense clusters are extracted from the abstracts by exploiting the WordNet relationships between words that occur in the same text. Experiments were carried out both on the categorization task, using Bernoulli mixtures for binary data, and on the clustering task, using Stein's MajorClust method.
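For the categorization task, the abstract names Bernoulli mixtures for binary data. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: each abstract is assumed to be encoded as a binary vector (for instance, one bit per sense cluster), each category is modeled by its own mixture of K multivariate Bernoulli components fitted with EM, and a new abstract is assigned to the category that maximizes log prior plus log-likelihood (the Bayes decision rule). The class names `BernoulliMixture` and `MixtureClassifier`, the clamping constant `eps`, and the toy vectors are hypothetical.

```python
import math
import random
from typing import Dict, Hashable, List, Sequence

# Assumed encoding (hypothetical): x[d] = 1 if feature d (e.g. a sense cluster)
# occurs in the abstract, 0 otherwise.
Vector = Sequence[int]


class BernoulliMixture:
    """Mixture of K multivariate Bernoulli components, fitted by EM."""

    def __init__(self, n_components: int = 2, n_iter: int = 50,
                 eps: float = 1e-3, seed: int = 0) -> None:
        self.k = n_components
        self.n_iter = n_iter
        self.eps = eps    # keeps every p_jd in [eps, 1 - eps] so log() never sees 0
        self.seed = seed  # fixed seed -> reproducible initialization

    def _log_terms(self, x: Vector) -> List[float]:
        # For each component j: log pi_j + sum_d [x_d log p_jd + (1 - x_d) log(1 - p_jd)]
        terms = []
        for w, probs in zip(self.weights, self.probs):
            lp = math.log(w)
            for xd, p in zip(x, probs):
                lp += math.log(p) if xd else math.log(1.0 - p)
            terms.append(lp)
        return terms

    @staticmethod
    def _logsumexp(values: List[float]) -> float:
        m = max(values)
        return m + math.log(sum(math.exp(v - m) for v in values))

    def log_likelihood(self, x: Vector) -> float:
        """log p(x) under the fitted mixture."""
        return self._logsumexp(self._log_terms(x))

    def fit(self, data: List[Vector]) -> "BernoulliMixture":
        rng = random.Random(self.seed)
        n, dim = len(data), len(data[0])
        self.k = min(self.k, n)
        self.weights = [1.0 / self.k] * self.k
        self.probs = [[rng.uniform(0.25, 0.75) for _ in range(dim)]
                      for _ in range(self.k)]
        for _ in range(self.n_iter):
            # E-step: responsibility of each component for each vector
            resp = []
            for x in data:
                terms = self._log_terms(x)
                total = self._logsumexp(terms)
                resp.append([math.exp(t - total) for t in terms])
            # M-step: re-estimate mixing weights and Bernoulli parameters
            for j in range(self.k):
                nj = sum(r[j] for r in resp)
                self.weights[j] = max(nj / n, 1e-12)
                for d in range(dim):
                    hits = sum(r[j] * x[d] for r, x in zip(resp, data))
                    p = hits / nj if nj > 0 else 0.5
                    self.probs[j][d] = min(max(p, self.eps), 1.0 - self.eps)
        return self


class MixtureClassifier:
    """One Bernoulli mixture per category, combined with the Bayes decision rule."""

    def __init__(self, n_components: int = 2) -> None:
        self.n_components = n_components

    def fit(self, docs: List[Vector], labels: List[Hashable]) -> "MixtureClassifier":
        by_class: Dict[Hashable, List[Vector]] = {}
        for x, y in zip(docs, labels):
            by_class.setdefault(y, []).append(x)
        self.log_prior = {y: math.log(len(xs) / len(docs)) for y, xs in by_class.items()}
        self.models = {y: BernoulliMixture(self.n_components).fit(xs)
                       for y, xs in by_class.items()}
        return self

    def predict(self, x: Vector) -> Hashable:
        # argmax over categories y of log P(y) + log p(x | y)
        return max(self.models,
                   key=lambda y: self.log_prior[y] + self.models[y].log_likelihood(x))


# Toy data (hypothetical): six binary features, two categories.
docs = [[1, 1, 0, 0, 0, 0], [1, 0, 1, 0, 0, 0], [1, 1, 1, 0, 0, 0],
        [0, 0, 0, 1, 1, 0], [0, 0, 0, 1, 0, 1], [0, 0, 0, 0, 1, 1]]
labels = ["A", "A", "A", "B", "B", "B"]
clf = MixtureClassifier(n_components=2).fit(docs, labels)
print(clf.predict([1, 1, 0, 0, 0, 0]))  # A
print(clf.predict([0, 0, 0, 1, 1, 0]))  # B
```

Clamping each probability to [eps, 1 - eps] is a simple stand-in for proper smoothing: it keeps a feature never seen in a category from driving that category's likelihood to exactly zero.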
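For the clustering task, the abstract names Stein's MajorClust method. Below is a minimal Python sketch of the MajorClust idea, assuming the abstracts are nodes of an undirected graph whose edge weights are pairwise similarities (how those similarities are derived from sense clusters is left open here). Every node starts in its own cluster; each node is then repeatedly moved to the neighboring cluster with the largest total edge weight to it, until no node changes. The function name `majorclust`, the fixed visiting order, the tie-breaking rule (stay put if possible, otherwise take the smallest cluster id) and the `max_rounds` guard are assumptions made for reproducibility.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[int, int, float]  # (node u, node v, similarity weight w > 0)


def majorclust(n_nodes: int, edges: List[Edge], max_rounds: int = 100) -> List[List[int]]:
    """Cluster nodes 0..n_nodes-1 of an undirected weighted graph (hypothetical helper)."""
    adj: List[Dict[int, float]] = [dict() for _ in range(n_nodes)]
    for u, v, w in edges:
        if u != v:  # self-loops are ignored
            adj[u][v] = adj[u].get(v, 0.0) + w
            adj[v][u] = adj[v].get(u, 0.0) + w

    label = list(range(n_nodes))  # every node starts in its own cluster
    for _ in range(max_rounds):  # safety guard (assumption); normally stops earlier
        changed = False
        for v in range(n_nodes):  # fixed visiting order (assumption)
            # attraction of each neighboring cluster = sum of edge weights from v into it
            attraction: Dict[int, float] = defaultdict(float)
            for u, w in adj[v].items():
                attraction[label[u]] += w
            if not attraction:
                continue  # isolated node stays a singleton
            best = max(attraction.values())
            if attraction.get(label[v], 0.0) == best:
                continue  # already in a maximally attracting cluster
            # tie-break (assumption): smallest cluster id among the best ones
            label[v] = min(c for c, a in attraction.items() if a == best)
            changed = True
        if not changed:
            break

    clusters: Dict[int, List[int]] = defaultdict(list)
    for v, c in enumerate(label):
        clusters[c].append(v)
    return sorted(clusters.values())


# Toy graph (hypothetical): two dense triangles joined by one weak edge.
edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
         (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
         (2, 3, 0.1)]
print(majorclust(6, edges))  # [[0, 1, 2], [3, 4, 5]]
```

MajorClust does not take the number of clusters as an input; it emerges from the structure of the similarity graph.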

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Davide Buscaldi (1)
  • Paolo Rosso (1)
  • Mikhail Alexandrov (2)
  • Alfons Juan Ciscar (1)

  1. Dpto. Sistemas Informáticos y Computación (DSIC), Universidad Politécnica de Valencia, Spain
  2. Center for Computing Research, National Polytechnic Institute, Mexico