Using Links to Classify Wikipedia Pages

  • Rianne Kaptein
  • Jaap Kamps
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5631)


This paper contains a description of experiments for the 2008 INEX XML-mining track. Our goal for the XML-mining track is to explore whether we can use link information to improve classification accuracy. Our approach is to propagate category probabilities over linked pages. We find that using link information leads to marginal improvements over a baseline that uses a Naive Bayes model. For the initially misclassified pages, link information is either not available or contains too much noise.
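The idea of propagating category probabilities over linked pages can be illustrated with a minimal sketch. The abstract does not give the actual update rule, so everything below is an assumption made for illustration: the function names `propagate` and `classify`, the mixing weight `alpha`, the use of a simple average over linked pages, and the toy probabilities, which stand in for the output of a Naive Bayes baseline.

```python
def propagate(probs, links, alpha=0.5):
    """Mix each page's category distribution with the mean of its linked pages.

    probs: dict page -> dict category -> probability (e.g. from a baseline classifier)
    links: dict page -> list of pages it links to
    alpha: weight kept on the page's own distribution (hypothetical parameter)
    """
    updated = {}
    for page, own in probs.items():
        # Only linked pages that have a distribution of their own can contribute.
        neighbours = [n for n in links.get(page, []) if n in probs]
        if not neighbours:
            # No usable link information: keep the baseline distribution.
            updated[page] = dict(own)
            continue
        mixed = {}
        for cat in own:
            avg = sum(probs[n].get(cat, 0.0) for n in neighbours) / len(neighbours)
            mixed[cat] = alpha * own[cat] + (1 - alpha) * avg
        total = sum(mixed.values())
        # Renormalise so the result is again a probability distribution.
        updated[page] = {c: v / total for c, v in mixed.items()} if total else dict(own)
    return updated


def classify(probs):
    """Assign each page its most probable category."""
    return {page: max(dist, key=dist.get) for page, dist in probs.items()}


# Toy data (invented): page "c" is a borderline case that links to two
# confidently classified pages; page "d" has no links at all.
baseline = {
    "a": {"x": 0.9, "y": 0.1},
    "b": {"x": 0.8, "y": 0.2},
    "c": {"x": 0.45, "y": 0.55},
    "d": {"x": 0.3, "y": 0.7},
}
links = {"c": ["a", "b"]}

before = classify(baseline)
after = classify(propagate(baseline, links))
print(before["c"], after["c"], after["d"])
```

In this toy setup the borderline page changes category because its linked pages agree with each other, while the page without links keeps its baseline label. That matches the limitation noted in the abstract: pages with no link information cannot benefit, and noisy links could just as easily pull a page toward the wrong category.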





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Rianne Kaptein (1)
  • Jaap Kamps (1, 2)
  1. Archives and Information Studies, Faculty of Humanities, University of Amsterdam, Netherlands
  2. ISLA, Faculty of Science, University of Amsterdam, Netherlands
