Evaluating and Comparing Web-Scale Extracted Knowledge Bases in Chinese and English

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9544)

Abstract

DBpedia and YAGO are the two main data sources serving as hubs of Linking Open Data (LOD), and both contain Chinese data. Zhishi.me and SSCO extract Chinese knowledge from Wikipedia and from other Chinese encyclopedic websites such as Baidu-Baike and Hudong-Baike. The quality of these Knowledge Bases (KBs) has not been well investigated, although it is key to smart applications. In this paper, we evaluate three large Chinese KBs, namely DBpedia Chinese, Zhishi.me and SSCO, and further compare them with English KBs. Since traditional methods for evaluating Web ontologies cannot be easily adapted to web-scale extracted KBs, we design two metric sets, covering Richness and Correctness, based on a quasi-formal conceptual representation to measure and compare these KBs. We also design a novel metric set on the overlapping instances of different KBs to make the metric results comparable. Finally, we employ random sampling to reduce the human effort needed to assess correctness. Our findings give a detailed status report on the current state of extracted KBs in both Chinese and English.
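
The sampling-based correctness assessment mentioned above can be illustrated with a minimal sketch. It is not the authors' procedure: the function names, the uniform sampling without replacement, and the choice of a Wilson score interval are all assumptions made here for illustration, and the abstract does not state the sample sizes or the estimator used in the paper.

```python
import math
import random


def sample_triples(triples, k, seed=0):
    """Draw up to k triples uniformly at random, without replacement.

    Hypothetical helper: the sampled triples would be handed to human assessors.
    """
    rng = random.Random(seed)
    return rng.sample(triples, min(k, len(triples)))


def wilson_interval(correct, n, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 is roughly 95%).

    Assumption: this interval is one common choice, not one named in the paper.
    """
    if n == 0:
        raise ValueError("need at least one judged sample")
    p = correct / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def estimate_correctness(judgements):
    """Estimate KB correctness from human judgements on a random sample.

    judgements: list of booleans, one per sampled triple (True = judged correct).
    Returns (point estimate, lower bound, upper bound).
    """
    n = len(judgements)
    correct = sum(judgements)
    low, high = wilson_interval(correct, n)
    return correct / n, low, high
```

Under these assumptions, assessors judge only the sampled triples, and the interval indicates how far the sample estimate may lie from the correctness of the full KB; a larger sample narrows the interval at the cost of more human effort.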

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai, China
