Mining High-Quality Fine-Grained Type Information from Chinese Online Encyclopedias

  • Maoxiang Hao
  • Zhixu Li
  • Yan Zhao
  • Kai Zheng
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11234)

Abstract

Entity typing is a necessary step in building knowledge graphs. Considerable effort has gone into mining type information for entities from online encyclopedias, but usually only coarse-grained types can be obtained, which are not fine enough for knowledge graph construction or query answering. The situation is even worse for entities in Chinese. In this paper, we mine high-quality fine-grained type information for entities not only from the title-labels and info-boxes in an entity’s encyclopedia page, but also from the abstracts and crowd-labels in the page, which provide many more candidate fine-grained types (with noise). To keep the quality of the mined type information high, we initially take reliable type information only from the title-labels and info-boxes. Then, by putting entities, attributes, values and types into one graph, we obtain path information between each candidate entity-type pair, and we rely on a proposed Path-CNN binary classification model to identify more correct entity-type pairs from the graph. Compared with the previous approach and DBpedia, our approach mines many more high-quality fine-grained types for entities from the online encyclopedia. Applying our approach to the largest Chinese online encyclopedia, Baidu Baike, we have generated 25,651,022 pieces of type information (with more than 80% accuracy) for the entities in this encyclopedia.
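The graph step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the edge layout (entity to attribute, entity to value, entity to seed type), the function names `build_graph` and `paths_between`, the `max_edges` limit, and the toy data are all assumptions made for illustration. The Path-CNN classifier itself is not shown; the sketch only enumerates the kind of paths between a candidate entity-type pair that such a classifier might take as input.

```python
from collections import defaultdict


def build_graph(infobox_triples, seed_types):
    """Build an undirected graph over entity, attribute, value and type nodes.

    Assumed layout (not specified in the abstract): each infobox triple links
    the entity to its attribute and to its value; each reliable seed pair
    links the entity to its type. Nodes are (kind, name) tuples so that a
    value and a type with the same spelling stay distinct.
    """
    graph = defaultdict(set)

    def link(a, b):
        graph[a].add(b)
        graph[b].add(a)

    for entity, attribute, value in infobox_triples:
        link(("entity", entity), ("attribute", attribute))
        link(("entity", entity), ("value", value))
    for entity, type_name in seed_types:
        link(("entity", entity), ("type", type_name))
    return graph


def paths_between(graph, entity, type_name, max_edges=3):
    """Enumerate simple paths from an entity node to a type node."""
    source, target = ("entity", entity), ("type", type_name)
    found = []

    def walk(node, path):
        if node == target:
            found.append(list(path))
            return
        if len(path) - 1 >= max_edges:  # path already uses max_edges edges
            return
        for nxt in sorted(graph.get(node, ())):  # sorted for a stable order
            if nxt not in path:
                path.append(nxt)
                walk(nxt, path)
                path.pop()

    if source in graph and target in graph:
        walk(source, [source])
    return found


# Toy data (illustrative only): one seed pair and three infobox triples.
triples = [
    ("Li Bai", "occupation", "poet"),
    ("Du Fu", "occupation", "poet"),
    ("Du Fu", "dynasty", "Tang"),
]
graph = build_graph(triples, seed_types=[("Du Fu", "Poet")])

for path in paths_between(graph, "Li Bai", "Poet"):
    print(" -> ".join(f"{kind}:{name}" for kind, name in path))
```

In this toy graph the candidate pair (Li Bai, Poet) is connected through the shared attribute and the shared value of an entity whose type is already known. A binary classifier over such paths would then decide whether the candidate pair is correct.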

Keywords

Entity typing, Entity classification, Knowledge graph

Notes

Acknowledgments

This research is partially supported by the National Natural Science Foundation of China (Grant Nos. 61632016, 61402313, 61472263) and the Natural Science Research Project of Jiangsu Higher Education Institution (No. 17KJA520003).

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. School of Computer Science and Technology, Soochow University, Suzhou, China
  2. University of Electronic Science and Technology of China, Chengdu, China
