Abstract
Wikipedia plays a central role on the web as one of the largest knowledge sources, owing to its broad coverage of information from many domains. However, because the number of pages is enormous and the number of contributors available to maintain them is limited, information is often missing from Wikipedia articles, especially from articles that exist in multiple language versions. Several approaches have been studied to close the information gap between cross-language Wikipedia articles, but they apply only to languages that share the same root. In this paper, we propose an approach that generates new information for Wikipedia infoboxes written in languages with different roots by utilizing the existing DBpedia mappings. We combined mapping information from DBpedia with an instance-based method to align the existing Korean-English infobox attribute-value pairs and to generate new pairs from the Korean version that fill in missing information in the English version. The results showed that we could expand up to 38% of the existing English Wikipedia attribute-value pairs in our datasets with 61% accuracy.
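As a rough illustration of the mapping-based step described above, the following minimal sketch aligns attributes of a Korean and an English infobox through a shared ontology property and proposes the pairs that only the Korean side provides. It is not the authors' implementation: the function name `fill_missing_pairs`, the dictionaries, and all attribute names, property names, and values are hypothetical stand-ins for DBpedia mapping data, and the instance-based alignment and any value translation are left out.

```python
def fill_missing_pairs(ko_infobox, en_infobox, ko_map, en_map):
    """Propose English attribute-value pairs that exist only in the Korean infobox.

    ko_map and en_map are hypothetical stand-ins for DBpedia mappings:
    they map an infobox attribute name to an ontology property name.
    """
    # Ontology properties the English infobox already covers.
    covered = {en_map[attr] for attr in en_infobox if attr in en_map}

    # Invert the English mapping: ontology property -> English attribute name.
    prop_to_en = {}
    for attr, prop in en_map.items():
        prop_to_en.setdefault(prop, attr)

    candidates = {}
    for attr, value in ko_infobox.items():
        prop = ko_map.get(attr)
        # Skip unmapped attributes, properties already present on the English
        # side, and properties with no English attribute to write to.
        if prop is None or prop in covered or prop not in prop_to_en:
            continue
        candidates[prop_to_en[prop]] = value  # value is copied untranslated
    return candidates


# Hypothetical toy data, for illustration only.
ko_map = {"출생일": "birthDate", "국적": "nationality"}
en_map = {"birth_date": "birthDate", "nationality": "nationality"}
ko_infobox = {"출생일": "1970-01-01", "국적": "대한민국", "별명": "예시"}
en_infobox = {"birth_date": "1970-01-01"}

new_pairs = fill_missing_pairs(ko_infobox, en_infobox, ko_map, en_map)
print(new_pairs)
```

In this sketch the unmapped Korean attribute is ignored and the already-present birth date is not duplicated; a fuller pipeline would also need to translate or normalize the copied values before inserting them into the English infobox.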
Notes
- 5. DBpedia mapping (http://mappings.dbpedia.org/), version of 5 March 2016.
Acknowledgments
This work was supported by the Industrial Strategic Technology Development Program, 10052955, Experiential Knowledge Platform Development Research for the Acquisition and Utilization of Field Expert Knowledge, funded by the Ministry of Trade, Industry & Energy (MI, Korea).
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Megawati, Jang, S., Yi, M.Y. (2016). Utilization of DBpedia Mapping in Cross Lingual Wikipedia Infobox Completion. In: Kang, B.H., Bai, Q. (eds) AI 2016: Advances in Artificial Intelligence. AI 2016. Lecture Notes in Computer Science, vol 9992. Springer, Cham. https://doi.org/10.1007/978-3-319-50127-7_25
DOI: https://doi.org/10.1007/978-3-319-50127-7_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-50126-0
Online ISBN: 978-3-319-50127-7
eBook Packages: Computer Science (R0)