Knowledge graph construction from multiple online encyclopedias

  • Tianxing Wu
  • Haofen Wang
  • Cheng Li
  • Guilin Qi
  • Xing Niu
  • Meng Wang
  • Lin Li
  • Chaomin Shi
Part of the following topical collections:
  1. Special Issue on Application-Driven Knowledge Acquisition


In recent years, many knowledge graphs built from Wikipedia, the largest multilingual online encyclopedia, have been published on the Web to support various applications. However, because non-English data in Wikipedia are sparse, some projects construct knowledge graphs from multiple non-English online encyclopedias instead. Many technical details of these projects are missing, so their frameworks and techniques are hard to reuse. In this paper, we propose a new framework for knowledge graph construction from multiple online encyclopedias. Its core modules are knowledge extraction and knowledge linking. Knowledge extraction consists of regular extraction, which periodically extracts the targeted article contents from the whole of each online encyclopedia, and live extraction, which extracts only the article contents of new and updated entities. Knowledge linking uses heuristic lightweight entity matching strategies and a semi-supervised learning method to find duplicated entities and properties across different online encyclopedias. Experimental results show that our approaches to knowledge extraction and linking outperform state-of-the-art baselines on different evaluation metrics, and that our framework can generate a large-scale knowledge graph from multiple online encyclopedias as input.
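The difference between regular and live extraction, and the idea of a lightweight label-based entity match, can be illustrated with a minimal sketch. This is not the authors' implementation: the `Article` record, the integer `revision` counter, the source names `enc_a` and `enc_b`, and the label normalization are all assumptions made for illustration. The semi-supervised learning step is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    """One encyclopedia article (hypothetical, simplified record)."""
    source: str      # which encyclopedia the article came from
    title: str       # article title, used here as the entity label
    revision: int    # assumed to increase whenever the article is edited


def regular_extract(articles):
    """Regular extraction: process every targeted article in the snapshot."""
    return {(a.source, a.title): a.revision for a in articles}


def live_extract(articles, extracted):
    """Live extraction: return only new or updated articles.

    `extracted` maps (source, title) to the revision already extracted.
    """
    return [
        a for a in articles
        if extracted.get((a.source, a.title), -1) < a.revision
    ]


def normalize(label):
    """Lowercase and drop whitespace so trivially different labels agree."""
    return "".join(label.lower().split())


def match_by_label(articles):
    """Group articles from different sources that share a normalized label."""
    groups = {}
    for a in articles:
        groups.setdefault(normalize(a.title), []).append(a)
    # Keep only groups that span more than one encyclopedia.
    return [g for g in groups.values() if len({a.source for a in g}) > 1]


# First pass: a full (regular) extraction over an illustrative snapshot.
snapshot = [
    Article("enc_a", "Nanjing", 3),
    Article("enc_b", "nanjing", 1),
    Article("enc_a", "Shanghai", 2),
]
extracted = regular_extract(snapshot)

# Later pass: only the updated and the new article need live extraction.
later = [
    Article("enc_a", "Nanjing", 4),    # updated since the snapshot
    Article("enc_b", "nanjing", 1),    # unchanged
    Article("enc_a", "Shanghai", 2),   # unchanged
    Article("enc_b", "Beijing", 1),    # new
]
changed = live_extract(later, extracted)
duplicates = match_by_label(later)
```

In this sketch `changed` holds only the updated and the new article, and `duplicates` holds the one group of articles that share a label across both sources. A real system would need a stronger change signal than a counter and far more robust matching than exact normalized labels.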


Knowledge graph, Knowledge extraction, Knowledge linking, Semantic Web



This work was supported in part by the National Key R&D Program of China (2017YFB1002801, 2018YFC0830200), the National Natural Science Foundation of China Key Project (U1736204), and the Judicial Big Data Research Centre, School of Law at Southeast University.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  • Tianxing Wu (1, 2)
  • Haofen Wang (3)
  • Cheng Li (1)
  • Guilin Qi (1, corresponding author)
  • Xing Niu (4)
  • Meng Wang (1)
  • Lin Li (1)
  • Chaomin Shi (1)
  1. Southeast University, Nanjing, China
  2. Nanyang Technological University, Singapore, Singapore
  3. Intelligent Big Data Visualization Lab, Tongji University, Shanghai, China
  4. University of Maryland, College Park, USA
