
Towards an Automatic Creation of Localized Versions of DBpedia

  • Alessio Palmero Aprosio
  • Claudio Giuliano
  • Alberto Lavelli
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8218)

Abstract

DBpedia is a large-scale knowledge base that exploits Wikipedia as its primary data source. The extraction procedure requires manually mapping Wikipedia infoboxes to the DBpedia ontology. Thanks to crowdsourcing, a large number of infoboxes have been mapped in the English DBpedia. Consequently, the same procedure has been applied to other languages to create the localized versions of DBpedia. However, the number of completed mappings is still small and limited to the most frequent infoboxes. Furthermore, mappings need maintenance because Wikipedia articles change constantly and quickly. In this paper, we focus on the problem of automatically mapping infobox attributes to properties in the DBpedia ontology, either to extend the coverage of the existing localized versions or to build from scratch versions for languages not covered in the current version. The evaluation has been performed on the Italian mappings. We compared our results with the current mappings on a random sample re-annotated by the authors. We report results comparable to those obtained by a human annotator in terms of precision, but our approach leads to a significant improvement in recall and speed. Specifically, we mapped 45,978 Wikipedia infobox attributes to DBpedia properties in 14 different languages for which mappings were not yet available. The resource is made available in an open format.

Keywords

Localize Version; Target Language; Schema Match; Primary Data Source; Buenos Aires Province
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Alessio Palmero Aprosio (1)
  • Claudio Giuliano (2)
  • Alberto Lavelli (2)
  1. Università degli Studi di Milano, Milano, Italy
  2. Fondazione Bruno Kessler, Trento, Italy
