Automatic Expansion of DBpedia Exploiting Wikipedia Cross-Language Information

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7882)


DBpedia is a project that aims to represent Wikipedia content as RDF triples. It plays a central role in the Semantic Web, owing to the large and growing number of resources linked to it. Currently, only 1.7M Wikipedia pages are deeply classified in the DBpedia ontology, although the English Wikipedia contains almost 4M pages, which reveals a clear coverage problem. In other languages (such as French and Spanish) coverage is even lower. The objective of this paper is to define a methodology for increasing the coverage of DBpedia in different languages. The major problems to be solved are the high number of classes in the DBpedia ontology and the lack of coverage for some classes in certain languages. To deal with these problems, we first extend the population of the classes for the different languages by connecting the corresponding Wikipedia pages through cross-language links. We then train a supervised classifier using this extended set as training data. We evaluated our system on a manually annotated test set, demonstrating that our approach can add more than 1M new entities to DBpedia with high precision (90%) and recall (50%). The resulting resource is available through a SPARQL endpoint and as a downloadable package.
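The first step described in the abstract, extending class populations through cross-language links, can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `propagate_classes`, the page identifiers, the toy data, and the majority-vote rule for conflicting labels are all assumptions made for illustration.

```python
from collections import Counter


def propagate_classes(clusters, known_classes):
    """Spread known ontology classes across cross-language page clusters.

    clusters: iterable of tuples of page ids that describe the same entity
              in different languages (hypothetical ids such as "en:Trento").
    known_classes: dict mapping already-classified page ids to a class label.
    Returns a new dict that also covers the unclassified pages of every
    cluster containing at least one classified page.
    """
    extended = dict(known_classes)
    for cluster in clusters:
        # Collect the labels of pages in this cluster that are already classified.
        votes = Counter(known_classes[p] for p in cluster if p in known_classes)
        if not votes:
            continue  # no evidence in any language: leave the cluster untouched
        # Assumption: resolve disagreements by majority vote.
        label, _ = votes.most_common(1)[0]
        for page in cluster:
            # Keep existing labels; only fill in pages that have none.
            extended.setdefault(page, label)
    return extended


# Toy data, invented for this example.
clusters = [
    ("en:Trento", "it:Trento", "fr:Trente"),
    ("en:Example", "de:Example"),
]
known = {"en:Trento": "City"}
extended = propagate_classes(clusters, known)
```

In this toy run the Italian and French pages inherit the class of the English page, while the second cluster stays unclassified because no language version carries a label. Pages labelled this way could then serve as the extended training set for the supervised classifier mentioned in the abstract.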


Keywords: Singular Value Decomposition; Vector Space Model; Computational Linguistics; Proximity Matrix; SPARQL Endpoint
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  1. Fondazione Bruno Kessler, Trento, Italy
  2. Università degli Studi di Milano, Milano, Italy
