Utilization of DBpedia Mapping in Cross Lingual Wikipedia Infobox Completion

Part of the Lecture Notes in Computer Science book series (LNAI, volume 9992)

Abstract

Wikipedia plays a central role on the web as one of the largest knowledge sources, owing to its broad coverage of information from many domains. However, because of the enormous number of pages and the limited number of contributors available to maintain them, information is often missing from Wikipedia articles, especially from articles that exist in multiple language versions. Several approaches have been proposed to close the information gap between cross-language Wikipedia articles, but they can only be applied to languages that share the same root. In this paper, we propose an approach that generates new information for Wikipedia infoboxes written in languages with different roots by utilizing existing DBpedia mappings. We combined mapping information from DBpedia with an instance-based method to align existing Korean-English infobox attribute-value pairs and to generate new pairs from the Korean version that fill missing information in the English version. The results showed that we could expand the existing English Wikipedia attribute-value pairs in our datasets by up to 38%, with 61% accuracy.
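The abstract combines two alignment signals: attributes that the DBpedia mappings assign to the same ontology property, and attributes whose values agree across the Korean and English infoboxes of the same entity (the instance-based part). The following is a minimal sketch of that combination under stated assumptions, not the authors' implementation. The mapping dictionaries, the function names (`align_by_mapping`, `align_by_instances`, `complete_english`), the exact-match value comparison, and the `min_support` threshold are all hypothetical. Korean values are assumed to have been translated into English beforehand; the notes list the Microsoft Translator API, but no translation step is shown here.

```python
from collections import Counter

# Hypothetical DBpedia-style mappings: infobox attribute -> ontology property.
KO_MAPPING = {"출생일": "dbo:birthDate", "국적": "dbo:nationality"}
EN_MAPPING = {"birth_date": "dbo:birthDate", "nationality": "dbo:nationality"}


def normalize(value: str) -> str:
    """Crude normalization so casing and spacing do not block a value match."""
    return " ".join(value.lower().split())


def align_by_mapping(ko_map, en_map):
    """Align Korean and English attributes mapped to the same ontology property."""
    prop_to_en = {prop: attr for attr, prop in en_map.items()}
    return {ko: prop_to_en[prop] for ko, prop in ko_map.items() if prop in prop_to_en}


def align_by_instances(pairs, already_aligned, min_support=2):
    """Instance-based alignment for attributes the mappings do not cover.

    `pairs` holds (ko_infobox, en_infobox) dicts describing the same entity,
    with Korean values ASSUMED to be translated into English already.
    A Korean/English attribute pair is aligned when their values agree on at
    least `min_support` entity pairs (an illustrative, hypothetical threshold).
    """
    votes = Counter()
    for ko_box, en_box in pairs:
        for ko_attr, ko_val in ko_box.items():
            if ko_attr in already_aligned:
                continue  # the mapping already decided this attribute
            for en_attr, en_val in en_box.items():
                if normalize(ko_val) == normalize(en_val):
                    votes[(ko_attr, en_attr)] += 1
    aligned = {}
    # Keep, for each Korean attribute, its best-supported English partner.
    for (ko_attr, en_attr), count in votes.most_common():
        if count >= min_support and ko_attr not in aligned:
            aligned[ko_attr] = en_attr
    return aligned


def complete_english(ko_box, en_box, alignment):
    """Propose English attribute-value pairs missing from `en_box`."""
    new_pairs = {}
    for ko_attr, value in ko_box.items():
        en_attr = alignment.get(ko_attr)
        if en_attr is not None and en_attr not in en_box:
            new_pairs[en_attr] = value
    return new_pairs


# Toy data: two entities present in both language editions.
pairs = [
    ({"출생일": "1950-01-01", "직업": "singer"},
     {"birth_date": "1950-01-01", "occupation": "Singer"}),
    ({"직업": "actor", "국적": "South Korea"},
     {"occupation": "Actor"}),
]
alignment = align_by_mapping(KO_MAPPING, EN_MAPPING)
alignment.update(align_by_instances(pairs, alignment))
print(complete_english(pairs[1][0], pairs[1][1], alignment))
# prints {'nationality': 'South Korea'}
```

In this toy run, `직업` is not covered by the mappings but is aligned to `occupation` because the values agree on both entities, while `국적` is aligned through the shared `dbo:nationality` property and therefore fills the missing `nationality` field of the second English infobox. Exact matching after normalization is a deliberate simplification; a fuzzier string metric would be a natural substitute.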

Keywords

  • Infobox alignment
  • Infobox completion
  • DBpedia
  • Cross language Wikipedia

Fig. 1.
Fig. 2.
Fig. 3.
Fig. 4.

Notes

  1. https://www.wikipedia.org/.
  2. http://wiki.dbpedia.org/Downloads2015-10.
  3. https://www.microsoft.com/en-us/translator/translatorapi.aspx.
  4. https://dumps.wikimedia.org/.
  5. DBpedia mapping (http://mappings.dbpedia.org/), version of 5 March 2016.
  6. https://github.com/thomlee/infobox2rdf.

Acknowledgments

This work was supported by the Industrial Strategic Technology Development Program, 10052955, Experiential Knowledge Platform Development Research for the Acquisition and Utilization of Field Expert Knowledge, funded by the Ministry of Trade, Industry & Energy (MI, Korea).

Author information

Corresponding author

Correspondence to Mun Yong Yi.

Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 2.5 International License (http://creativecommons.org/licenses/by-nc/2.5/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Megawati, Jang, S., Yi, M.Y. (2016). Utilization of DBpedia Mapping in Cross Lingual Wikipedia Infobox Completion. In: Kang, B., Bai, Q. (eds) AI 2016: Advances in Artificial Intelligence. AI 2016. Lecture Notes in Computer Science, vol 9992. Springer, Cham. https://doi.org/10.1007/978-3-319-50127-7_25

  • DOI: https://doi.org/10.1007/978-3-319-50127-7_25

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-50126-0

  • Online ISBN: 978-3-319-50127-7

  • eBook Packages: Computer Science, Computer Science (R0)