Exploit Semantic Information for Category Annotation Recommendation in Wikipedia

  • Yang Wang
  • Haofen Wang
  • Haiping Zhu
  • Yong Yu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4592)


Compared with plain-text resources, resources on “semi-semantic” web sites such as Wikipedia contain high-level semantic information that can benefit automatic annotation tasks on those same resources. In this paper, we propose a “collaborative annotating” approach that automatically recommends categories for a Wikipedia article by reusing the category annotations of its most similar articles and ranking these annotations by confidence. Four typical semantic features in Wikipedia, namely incoming links, outgoing links, section headings and template items, are investigated and used to represent articles for the similarity calculation. The experimental results show that these semantic features improve category annotation performance compared with the plain-text feature, and demonstrate the strength of our approach in discovering missing annotations and annotations at the proper level for Wikipedia articles.
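The following is a minimal sketch of the general idea, not the implementation evaluated in the paper. It assumes each article is represented as a sparse count vector over semantic features, that similarity is plain cosine similarity in a vector space model, and that the confidence of a category is its similarity-weighted vote among the k nearest articles. The abstract does not specify the ranking function, so that choice is an assumption. The function names and the feature prefixes `in:`, `out:`, `sec:` and `tpl:` are hypothetical, as is the toy data.

```python
from collections import Counter, defaultdict
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse feature-count vectors."""
    if not a or not b:
        return 0.0
    dot = sum(count * b[feat] for feat, count in a.items() if feat in b)
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b)


def recommend_categories(target, corpus, k=2):
    """Rank categories reused from the k articles most similar to `target`.

    target: Counter of semantic features for the article to annotate.
    corpus: dict mapping title -> (feature Counter, set of category names).
    Confidence is an assumed similarity-weighted vote, normalised by the
    total similarity of the selected neighbours.
    """
    neighbours = sorted(
        ((cosine(target, feats), cats) for feats, cats in corpus.values()),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in neighbours)
    if total == 0:
        return []  # no similar article, so nothing to recommend
    scores = defaultdict(float)
    for sim, cats in neighbours:
        for cat in cats:
            scores[cat] += sim / total
    # Highest confidence first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy corpus: features are prefixed by their type.
corpus = {
    "Article A": (Counter({"out:Physics": 1, "out:Nobel Prize": 1}),
                  {"Physicists", "Nobel laureates"}),
    "Article B": (Counter({"out:Physics": 1, "sec:Career": 1}),
                  {"Physicists"}),
    "Article C": (Counter({"out:Football": 1}),
                  {"Footballers"}),
}
target = Counter({"out:Physics": 1, "out:Nobel Prize": 1})

for category, confidence in recommend_categories(target, corpus, k=2):
    print(f"{category}: {confidence:.2f}")
```

In this sketch a category shared by every selected neighbour receives the highest confidence, while categories of dissimilar articles never enter the ranking. Incoming links and template items would be added to the feature vectors in the same way as the outgoing-link and section-heading features shown here.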


Collaborative Annotating, Semantic Features, Vector Space Model, Wikipedia Category





Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Yang Wang (1)
  • Haofen Wang (1)
  • Haiping Zhu (1)
  • Yong Yu (1)
  1. APEX Data and Knowledge Management Lab, Department of Computer Science and Engineering, Shanghai JiaoTong University, Shanghai 200240, P.R. China
