Distinguishing between Instances and Classes in the Wikipedia Taxonomy

  • Cäcilia Zirn
  • Vivi Nastase
  • Michael Strube
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5021)


This paper presents an automatic method for distinguishing instances from classes in a large-scale taxonomy induced from the Wikipedia category network. The method exploits characteristics of the category names and the structure of the network. It is the first attempt to make this distinction automatically in a large-scale resource; in WordNet and Cyc, by contrast, the distinction rests on manual annotation. We evaluate the result against ResearchCyc and report 84.52% accuracy on the subnetwork shared by our taxonomy and ResearchCyc.
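The abstract names two kinds of evidence (category names and network structure) without detailing the heuristics. The following Python sketch is only an illustration under stated assumptions, not the authors' method: it assumes a plural head noun suggests a class, assumes a category with subcategories behaves like a class, and scores predictions only on items that also carry a gold label. All function names, the preposition list, and the example categories are hypothetical.

```python
from typing import Dict, Set

# Hypothetical preposition list used to locate the head of a category name.
PREPOSITIONS = {"in", "of", "by", "from", "at", "for"}


def head_token(name: str) -> str:
    """Return the token just before the first preposition, else the last token."""
    tokens = name.split()
    if not tokens:
        raise ValueError("empty category name")
    for i, tok in enumerate(tokens):
        if i > 0 and tok.lower() in PREPOSITIONS:
            return tokens[i - 1]
    return tokens[-1]


def looks_plural(name: str) -> bool:
    """Crude plural test on the head token (assumption: a trailing 's').

    A real system would use a part-of-speech tagger; names such as
    'Paris' fool this check.
    """
    head = head_token(name).lower()
    return head.endswith("s") and not head.endswith("ss")


def classify(name: str, children: Dict[str, Set[str]]) -> str:
    """Label a category 'class' or 'instance' from two illustrative signals."""
    # Structural signal (assumed): a node with subcategories acts as a class.
    if children.get(name):
        return "class"
    # Name signal (assumed): a plural head noun suggests a class.
    if looks_plural(name):
        return "class"
    return "instance"


def accuracy_on_shared(predicted: Dict[str, str], gold: Dict[str, str]) -> float:
    """Accuracy over the items that appear in both labelings."""
    shared = predicted.keys() & gold.keys()
    if not shared:
        raise ValueError("no shared items to evaluate")
    correct = sum(predicted[k] == gold[k] for k in shared)
    return correct / len(shared)
```

Restricting the score to shared items mirrors the evaluation on the subnetwork common to the taxonomy and ResearchCyc; the two signals themselves are placeholders for whatever features the full paper defines.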


Noun Phrase; Standard Deviation Score; Name Entity Recognizer; Computational Linguistics; Plural Noun
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Cäcilia Zirn (1, 2)
  • Vivi Nastase (1)
  • Michael Strube (1)
  1. EML Research gGmbH, Heidelberg, Germany
  2. Department of Computational Linguistics, University of Heidelberg, Heidelberg, Germany
