Entity set expansion in knowledge graph: a heterogeneous information network perspective

  • Research Article
  • Frontiers of Computer Science

Abstract

Entity set expansion (ESE) aims to expand a seed set of entities with additional entities that share common properties. ESE is important for many applications, such as dictionary construction and query suggestion. Traditional ESE methods rely heavily on the text and Web information of entities. Recently, some ESE methods have employed knowledge graphs (KGs) to expand entity sets. However, they fail to utilize the rich semantics contained in a KG effectively and efficiently, and they ignore the text information of entities in Wikipedia. In this paper, we model a KG as a heterogeneous information network (HIN) containing multiple types of objects and relations. We propose fine-grained multi-type meta paths to capture the hidden relations among seed entities in a KG and thereby retrieve candidate entities. We then rank the candidates according to meta-path-based structural similarity. Furthermore, to utilize the text descriptions of entities in Wikipedia, we propose an extended model, CoMeSE++, which combines the structural information revealed by a KG with the text information in Wikipedia for ESE. Extensive experiments on real-world datasets demonstrate that our model achieves better performance by combining structural and textual information of entities.
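To make the idea of retrieving and ranking candidates along meta paths concrete, the following is a minimal sketch and not the CoMeSE++ model itself. It assumes a toy KG given as (head, relation, tail) triples, treats a meta path as a sequence of relation labels (with a hypothetical `^` prefix marking an inverse relation), and scores each candidate by a plain count of path instances linking it to the seeds. The triples, the function names `build_index`, `follow`, and `expand`, and the counting score are all illustrative assumptions; the similarity measure used in the article may differ.

```python
from collections import defaultdict

# Hypothetical toy knowledge graph as (head, relation, tail) triples.
TRIPLES = [
    ("Inception", "directedBy", "Nolan"),
    ("Interstellar", "directedBy", "Nolan"),
    ("Dunkirk", "directedBy", "Nolan"),
    ("Titanic", "directedBy", "Cameron"),
    ("Inception", "starring", "DiCaprio"),
    ("Titanic", "starring", "DiCaprio"),
]


def build_index(triples):
    """Index edges in both directions; '^rel' denotes the inverse of 'rel'."""
    index = defaultdict(lambda: defaultdict(set))
    for head, rel, tail in triples:
        index[rel][head].add(tail)
        index["^" + rel][tail].add(head)
    return index


def follow(index, start, meta_path):
    """Count path instances from `start` to each entity reached along `meta_path`."""
    counts = {start: 1}
    for rel in meta_path:
        reached = defaultdict(int)
        for node, n in counts.items():
            for neighbor in index[rel].get(node, ()):
                reached[neighbor] += n
        counts = reached
    return counts


def expand(index, seeds, meta_paths):
    """Rank non-seed entities by the number of path instances linking them to seeds.

    This plain count is an assumed stand-in for a structural similarity score.
    """
    scores = defaultdict(int)
    for path in meta_paths:
        for seed in seeds:
            for entity, n in follow(index, seed, path).items():
                if entity not in seeds:
                    scores[entity] += n
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    index = build_index(TRIPLES)
    seeds = {"Inception", "Interstellar"}
    meta_paths = [("directedBy", "^directedBy"), ("starring", "^starring")]
    for entity, score in expand(index, seeds, meta_paths):
        print(entity, score)
```

In this toy setting, a film that shares a director with both seeds outranks one that shares only an actor with a single seed, which mirrors the intuition that meta paths expose the relation the seeds have in common.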



Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61806020, 61772082, 61972047, 61702296), the National Key Research and Development Program of China (2017YFB0803304), the Beijing Municipal Natural Science Foundation (4182043), the CCF-Tencent Open Fund, and the Fundamental Research Funds for the Central Universities.

Author information

Corresponding author

Correspondence to Linmei Hu.

Additional information

Chuan Shi received the BS degree from Jilin University, China in 2001, the MS degree from Wuhan University, China in 2004, and the PhD degree from the ICT of the Chinese Academy of Sciences, China in 2007. He joined the Beijing University of Posts and Telecommunications as a lecturer in 2007, and is at present a professor and deputy director of the Beijing Key Lab of Intelligent Telecommunications Software and Multimedia. His research interests are in data mining, machine learning, and evolutionary computing. He has published more than 40 papers in refereed journals and conferences.

Jiayu Ding received the BS degree from the Beijing University of Posts and Telecommunications (BUPT), China in 2017. He is currently working toward the MS degree in the School of Computer Science at BUPT, China. His research interests are in data mining and natural language processing.

Xiaohuan Cao received the BS degree from the Beijing University of Posts and Telecommunications (BUPT), China in 2015, and the MS degree from the School of Computer Science at BUPT, China in 2018. Her research interests are in data mining and machine learning, especially heterogeneous information network studies.

Linmei Hu is an assistant professor in the School of Computer Sciences, Beijing University of Posts and Telecommunications, China. She received her PhD degree from Tsinghua University, China in 2018. Her research interests focus on natural language processing and data mining. She received the Beijing Excellent PhD Student award in 2018.

Bin Wu received the BS degree from the Beijing University of Posts and Telecommunications, China in 1991, and the MS and PhD degrees from the ICT of the Chinese Academy of Sciences, China in 1998 and 2002, respectively. He joined the Beijing University of Posts and Telecommunications as a lecturer in 2002, and is a professor at present. His research interests include data mining, complex networks, and cloud computing. He has published more than 100 papers in refereed journals and conferences. He is a member of the IEEE.

Xiaoli Li is currently a department head at the Institute for Infocomm Research, A*STAR, Singapore. He also holds adjunct professor positions at the National University of Singapore and Nanyang Technological University. His research interests include data mining, machine learning, AI, and bioinformatics. He has served as a (senior) PC member, workshop chair, and session chair in leading data mining related conferences (including KDD, ICDM, SDM, PKDD/ECML, WWW, IJCAI, AAAI, ACL, and CIKM). He has published more than 160 high-quality papers and won best paper and benchmark competition awards.

About this article

Cite this article

Shi, C., Ding, J., Cao, X. et al. Entity set expansion in knowledge graph: a heterogeneous information network perspective. Front. Comput. Sci. 15, 151307 (2021). https://doi.org/10.1007/s11704-020-9240-8

  • DOI: https://doi.org/10.1007/s11704-020-9240-8
