BUAP: Performance of K-Star at the INEX’09 Clustering Task

  • David Pinto
  • Mireya Tovar
  • Darnes Vilariño
  • Beatriz Beltrán
  • Héctor Jiménez-Salazar
  • Basilia Campos
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6203)

Abstract

This paper applies unsupervised classification techniques to group the documents of a very large collection into clusters. We approached this challenge by using a simple clustering algorithm (K-Star) in a recursive clustering process over subsets of the complete collection.

The presented approach is scalable and can automatically discover the number of clusters. The results obtained outperformed several baselines presented in the INEX 2009 clustering task.
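
The abstract gives no algorithmic detail, so the following is only an illustrative sketch of the general idea of clustering subsets recursively with a threshold-based, star-style method that does not fix the number of clusters in advance. It is not the authors' K-Star implementation. The function names, the cosine similarity over term counts, the summed-vector cluster representative, the `threshold` and `chunk_size` parameters, and the rule of doubling the subset size when no merge occurs are all assumptions made for this example.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def star_cluster(vectors, threshold):
    """Greedy star-style clustering (assumed variant, not the paper's K-Star).

    Each unassigned item in turn becomes a center and absorbs every other
    unassigned item whose similarity to it is at least `threshold`.
    The number of clusters is a result, not an input.
    """
    unassigned = list(range(len(vectors)))
    clusters = []
    while unassigned:
        center = unassigned.pop(0)
        members, rest = [center], []
        for i in unassigned:
            if cosine(vectors[center], vectors[i]) >= threshold:
                members.append(i)
            else:
                rest.append(i)
        unassigned = rest
        clusters.append(members)
    return clusters


def recursive_cluster(vectors, threshold, chunk_size):
    """Cluster fixed-size subsets, then re-cluster the cluster representatives.

    Each cluster is represented by the sum of its members' vectors (an
    assumption). The loop ends once all representatives fit in one subset.
    Returns a list of clusters, each a list of original document indices.
    """
    groups = [[i] for i in range(len(vectors))]
    current = list(vectors)
    while True:
        new_groups, new_vectors = [], []
        for start in range(0, len(current), chunk_size):
            chunk = current[start:start + chunk_size]
            for members in star_cluster(chunk, threshold):
                merged_ids, merged_vec = [], Counter()
                for m in members:
                    merged_ids.extend(groups[start + m])
                    merged_vec.update(current[start + m])
                new_groups.append(merged_ids)
                new_vectors.append(merged_vec)
        if len(current) <= chunk_size:
            return new_groups
        if len(new_groups) == len(groups):
            chunk_size *= 2  # no merge happened: widen the subsets
        groups, current = new_groups, new_vectors
```

As a usage example, documents can be turned into term-count vectors with `Counter(text.split())` and passed to `recursive_cluster(vectors, threshold=0.5, chunk_size=2)`. Because each level only compares items inside one subset, memory use is bounded by the subset size rather than by the size of the whole collection, which is the scalability property this sketch tries to illustrate.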

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • David Pinto (1)
  • Mireya Tovar (1)
  • Darnes Vilariño (1)
  • Beatriz Beltrán (1)
  • Héctor Jiménez-Salazar (2)
  • Basilia Campos (1)

  1. Faculty of Computer Science, B. Autonomous University of Puebla, Mexico
  2. Department of Information Technologies, Autonomous Metropolitan University, Mexico
