Categorization of Wikipedia Articles with Spectral Clustering

  • Julian Szymański
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6936)


This article reports on the application of clustering algorithms to create hierarchical groups of Wikipedia articles. We evaluate three spectral clustering algorithms on datasets constructed from Wikipedia categories. The selected algorithm has been implemented in a system that categorizes Wikipedia search results on the fly.
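The abstract does not say which three spectral clustering algorithms were compared, so the following is only a minimal sketch of the general technique, in the style of the normalized method of Ng, Jordan, and Weiss. It assumes documents are represented as term-frequency vectors and compared by cosine similarity. The toy data and all function names (`cosine_affinity`, `spectral_embedding`, `kmeans`, `cluster_documents`) are hypothetical and are not taken from the paper.

```python
import numpy as np


def cosine_affinity(docs):
    """Pairwise cosine similarity between row vectors, with a zero diagonal."""
    unit = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    affinity = unit @ unit.T
    np.fill_diagonal(affinity, 0.0)  # no self-loops in the similarity graph
    return affinity


def spectral_embedding(affinity, k):
    """Row-normalized top-k eigenvectors of D^(-1/2) A D^(-1/2)."""
    degrees = affinity.sum(axis=1)  # assumed strictly positive
    d_inv_sqrt = 1.0 / np.sqrt(degrees)
    normalized = affinity * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(normalized)  # eigenvalues in ascending order
    top = eigvecs[:, -k:]
    return top / np.linalg.norm(top, axis=1, keepdims=True)


def kmeans(points, k, n_iter=50):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    for _ in range(1, k):
        dists = np.min(
            [np.sum((points - c) ** 2, axis=1) for c in centers], axis=0
        )
        centers.append(points[np.argmax(dists)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        sq_dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = sq_dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels


def cluster_documents(docs, k):
    """Spectral clustering: affinity -> spectral embedding -> k-means."""
    return kmeans(spectral_embedding(cosine_affinity(docs), k), k)


# Invented toy term-frequency vectors: documents 0-2 mostly use terms 0-2,
# documents 3-5 mostly use terms 3-5.
docs = np.array(
    [
        [3, 2, 1, 0, 0, 0],
        [2, 3, 1, 0, 0, 1],
        [3, 1, 2, 0, 0, 0],
        [0, 0, 0, 3, 2, 1],
        [0, 1, 0, 2, 3, 1],
        [0, 0, 0, 1, 2, 3],
    ],
    dtype=float,
)

labels = cluster_documents(docs, 2)
print(labels)
```

A hierarchy of groups, as mentioned in the abstract, could in principle be obtained by applying such a procedure recursively to each resulting cluster. Whether the paper builds its hierarchy this way is not stated here.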


Spectral Cluster; Test Package; Laplacian Matrix; Vector Space Model; High Abstraction Level
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Julian Szymański (1)
  1. Department of Computer Systems Architecture, Gdańsk University of Technology, Poland
