Uncovering the Semantics of Wikipedia Categories
Abstract
The Wikipedia category graph serves as the taxonomic backbone for large-scale knowledge graphs such as YAGO and Probase, and has been used extensively for tasks such as entity disambiguation and semantic similarity estimation. Wikipedia’s categories are a rich source of taxonomic as well as non-taxonomic information. The category German science fiction writers, for example, encodes the type of its resources (Writer), as well as their nationality (German) and genre (Science Fiction). Several approaches in the literature use parts of this encoded information without exploiting its full potential. In this paper, we introduce an approach for the discovery of category axioms that uses information from the category network, category instances, and their lexicalisations. With DBpedia as background knowledge, we discover 703k axioms covering 502k of Wikipedia’s categories and populate the DBpedia knowledge graph with an additional 4.4M relation assertions and 3.3M type assertions at more than 87% and 90% precision, respectively.
Keywords
Knowledge graph completion, Wikipedia category graph, Ontology learning, DBpedia