
Knowledge extraction from Chinese wiki encyclopedias

Journal of Zhejiang University SCIENCE C

Abstract

The vision of the Semantic Web is to build a ‘Web of data’ that enables machines to understand the semantics of information on the Web. The Linked Open Data (LOD) project encourages people and organizations to publish open data sets in the Resource Description Framework (RDF) on the Web, which promotes the development of the Semantic Web. Among the LOD datasets, DBpedia has proved to be a successful structured knowledge base and has become the central interlinking hub of the Web of data in English. In Chinese, however, little linked data has been published and linked to DBpedia, which hinders structured knowledge sharing for both Chinese and cross-lingual resources. This paper presents an approach for building a large-scale Chinese structured knowledge base from Chinese wiki resources, including Hudong and Baidu Baike. The approach first builds an ontology based on the wiki category system and infoboxes, and then extracts instances from wiki articles. With Hudong as the source, the approach builds an ontology containing 19 542 concepts and 2381 properties; 802 593 instances are extracted and described using these concepts and properties, and 62 679 of them are linked to equivalent instances in DBpedia. With Baidu Baike as the source, the approach builds an ontology containing 299 concepts, 37 object properties, and 5590 data type properties; 1 319 703 instances are extracted, and 84 343 of them are linked to instances in DBpedia. We provide RDF dumps and a SPARQL endpoint for accessing the resulting Chinese knowledge bases. The knowledge bases built using our approach can be used not only for building Chinese linked data, but also in applications of large-scale knowledge bases such as question answering and semantic search.
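To make the extraction step concrete, the following is a minimal sketch of how one extracted instance might be written out as RDF in N-Triples form: a concept assignment, a label, infobox values, and a link to DBpedia. It is not the authors' implementation. The `http://example.org/zhkb/` namespace, the function names, the sample data, and the choice of `owl:sameAs` for the DBpedia link are all assumptions made for illustration.

```python
from urllib.parse import quote

# Hypothetical namespace; the paper's actual URIs are not given in the abstract.
BASE = "http://example.org/zhkb/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def escape_literal(text):
    """Escape characters that may not appear raw inside an N-Triples literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def instance_to_ntriples(name, concept, infobox, dbpedia_uri=None):
    """Return N-Triples lines describing one instance (illustrative only).

    name        -- article title, used as the instance label
    concept     -- ontology concept derived from the wiki category system
    infobox     -- dict mapping property names to string values
    dbpedia_uri -- URI of the equivalent DBpedia instance, if one was found
    """
    # Percent-encode names so that spaces and other characters stay valid in an IRI.
    subject = f"<{BASE}instance/{quote(name, safe='')}>"
    lines = [
        f"{subject} <{RDF_TYPE}> <{BASE}concept/{quote(concept, safe='')}> .",
        f'{subject} <{RDFS_LABEL}> "{escape_literal(name)}"@zh .',
    ]
    for prop, value in infobox.items():
        predicate = f"<{BASE}property/{quote(prop, safe='')}>"
        lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    if dbpedia_uri is not None:
        # Assumption: equivalence to DBpedia is expressed with owl:sameAs.
        lines.append(f"{subject} <{OWL_SAME_AS}> <{dbpedia_uri}> .")
    return lines


if __name__ == "__main__":
    # Sample data invented for this sketch.
    for line in instance_to_ntriples(
        "北京", "City", {"population": "21540000"},
        "http://dbpedia.org/resource/Beijing",
    ):
        print(line)
```

Lines produced this way could be concatenated into a dump file and loaded into a triple store that exposes a SPARQL endpoint. All infobox values are emitted as plain literals here; distinguishing object properties from data type properties, as the ontology in the abstract does, would require an additional step.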



Author information


Corresponding author

Correspondence to Zhi-chun Wang.

Additional information

Project supported by the National Natural Science Foundation of China (Nos. 661035004 and 60973102), the China Postdoctoral Science Foundation (No. 20110490390), and the THU-NUS Next Research Center.


About this article

Cite this article

Wang, Zc., Wang, Zg., Li, Jz. et al. Knowledge extraction from Chinese wiki encyclopedias. J. Zhejiang Univ. - Sci. C 13, 268–280 (2012). https://doi.org/10.1631/jzus.C1101008


  • DOI: https://doi.org/10.1631/jzus.C1101008
