Journal of Zhejiang University SCIENCE C, Volume 13, Issue 4, pp 268–280

Knowledge extraction from Chinese wiki encyclopedias

  • Zhi-chun Wang
  • Zhi-gang Wang
  • Juan-zi Li
  • Jeff Z. Pan
Article

Abstract

The vision of the Semantic Web is to build a ‘Web of data’ that enables machines to understand the semantics of information on the Web. The Linked Open Data (LOD) project encourages people and organizations to publish various open datasets as Resource Description Framework (RDF) data on the Web, which promotes the development of the Semantic Web. Among the various LOD datasets, DBpedia has proved to be a successful structured knowledge base and has become the central interlinking hub of the English-language Web of data. However, little Chinese linked data has been published and linked to DBpedia, which hinders the sharing of structured knowledge in both Chinese and cross-lingual resources. This paper presents an approach for building a large-scale Chinese structured knowledge base from Chinese wiki resources, including Hudong and Baidu Baike. The approach first builds an ontology based on the wiki category system and infoboxes, and then extracts instances from wiki articles. Using Hudong as the source, our approach builds an ontology containing 19 542 concepts and 2381 properties; it extracts 802 593 instances, described using the concepts and properties of the extracted ontology, and links 62 679 of them to equivalent instances in DBpedia. Using Baidu Baike as the source, our approach builds an ontology containing 299 concepts, 37 object properties, and 5590 datatype properties; it extracts 1 319 703 instances and links 84 343 of them to instances in DBpedia. We provide RDF dumps and a SPARQL endpoint for accessing the resulting Chinese knowledge bases. The knowledge bases built with our approach can be used not only for building Chinese linked data but also in many applications of large-scale knowledge bases, such as question answering and semantic search.
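To make the two-stage pipeline in the abstract (ontology from categories and infoboxes, then instances from articles) more concrete, the following is a minimal Python sketch of how wiki category edges and infobox key-value pairs might be serialized as RDF in N-Triples syntax. It is not the authors' method. The namespace `http://example.org/zhwiki/`, the helper names (`iri`, `literal`, `ontology_triples`, `instance_triples`), the naive rule that turns every category edge into `rdfs:subClassOf`, the heuristic that an infobox value naming another known article yields an object property (and anything else a datatype property with a literal), and the use of `owl:sameAs` for DBpedia links are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple
from urllib.parse import quote

# Hypothetical namespace: the IRIs actually minted by the authors are not shown here.
BASE = "http://example.org/zhwiki/"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
SUBCLASS_OF = "<http://www.w3.org/2000/01/rdf-schema#subClassOf>"
SAME_AS = "<http://www.w3.org/2002/07/owl#sameAs>"


def iri(kind: str, name: str) -> str:
    """Mint an IRI under BASE, percent-encoding non-ASCII (e.g. Chinese) names."""
    return f"<{BASE}{kind}/{quote(name, safe='')}>"


def literal(value: str) -> str:
    """Render a Chinese-tagged string literal in N-Triples syntax."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"@zh'


def ontology_triples(category_edges: List[Tuple[str, str]]) -> List[str]:
    """Naively map each (subcategory, parent) edge to an rdfs:subClassOf triple."""
    return [
        f"{iri('class', child)} {SUBCLASS_OF} {iri('class', parent)} ."
        for child, parent in category_edges
    ]


def instance_triples(
    title: str,
    categories: List[str],
    infobox: Dict[str, str],
    known_titles: Set[str],
    dbpedia_iri: Optional[str] = None,
) -> List[str]:
    """Describe one wiki article as an instance typed by its categories.

    Assumed heuristic: an infobox value that is the title of another known
    article becomes an object-property link; any other value becomes a literal.
    """
    subject = iri("resource", title)
    triples = [f"{subject} {RDF_TYPE} {iri('class', c)} ." for c in categories]
    for key, value in infobox.items():
        obj = iri("resource", value) if value in known_titles else literal(value)
        triples.append(f"{subject} {iri('property', key)} {obj} .")
    if dbpedia_iri is not None:
        # Assumed link predicate: owl:sameAs to the matched DBpedia resource.
        triples.append(f"{subject} {SAME_AS} <{dbpedia_iri}> .")
    return triples


if __name__ == "__main__":
    # Toy input invented for illustration; not data from Hudong or Baidu Baike.
    lines = ontology_triples([("城市", "地点")]) + instance_triples(
        "北京",
        ["城市"],
        {"国家": "中国", "面积": "16410.54平方千米"},
        known_titles={"中国"},
        dbpedia_iri="http://dbpedia.org/resource/Beijing",
    )
    print("\n".join(lines))
```

The object-versus-datatype split mirrors the distinction the abstract draws for the Baidu Baike ontology (object properties versus datatype properties), but the string-matching rule used here is only a placeholder; a real system would need proper disambiguation of category edges and infobox values before emitting triples.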

Key words

Semantic Web; Linked Data; Ontology; Knowledge base

CLC number

TP311 

Copyright information

© Journal of Zhejiang University Science Editorial Office and Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Zhi-chun Wang (1)
  • Zhi-gang Wang (1)
  • Juan-zi Li (1)
  • Jeff Z. Pan (2)
  1. Department of Computer Science and Technology, Tsinghua University, Beijing, China
  2. Department of Computer Science, University of Aberdeen, Aberdeen, UK