Incorporating multi-kernel function and Internet verification for Chinese person name disambiguation

  • Research Article
  • Frontiers of Computer Science

Abstract

Person name disambiguation aims to distinguish different entities that share the same person name by linking each document to the entity it refers to. Traditional disambiguation approaches use the words in documents as features to distinguish entities. Because they ignore word order and make only limited use of external knowledge, their performance is limited. This paper presents an approach to named entity disambiguation through entity linking, based on a multi-kernel function and Internet verification, to improve Chinese person name disambiguation. The proposed approach extends a linear kernel over in-document word features with a string kernel to construct a multi-kernel function. The multi-kernel function computes the similarity between an input document and each entity description in a person name knowledge base, producing a ranked list of candidate entities. Furthermore, Internet search results retrieved with keywords extracted from the input document and from the entity descriptions in the knowledge base are used to train classifiers for verification. Evaluations on the CIPS-SIGHAN 2012 person name disambiguation bakeoff dataset show that using word order and Internet knowledge through a multi-kernel function improves both precision and recall, and that the system achieves state-of-the-art performance.
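
The ranking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the string kernel or how the kernels are combined, so the character n-gram (spectrum) kernel, the cosine normalization, the `weight` parameter, the whitespace tokenization, and the toy knowledge base below are all assumptions made for illustration. Chinese text would need word segmentation before the linear kernel could be applied.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n):
    """Count contiguous character n-grams; these keep local order information."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (a normalized dot product)."""
    dot = sum(count * b[key] for key, count in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def linear_kernel(doc_a, doc_b):
    """Order-insensitive kernel over bag-of-words features (whitespace tokens)."""
    return cosine(Counter(doc_a.split()), Counter(doc_b.split()))


def string_kernel(doc_a, doc_b, n=3):
    """Order-sensitive stand-in (assumption): normalized n-gram spectrum kernel."""
    return cosine(char_ngrams(doc_a, n), char_ngrams(doc_b, n))


def multi_kernel(doc_a, doc_b, weight=0.5):
    """Convex combination of both kernels; `weight` is a hypothetical parameter."""
    return weight * linear_kernel(doc_a, doc_b) + (1 - weight) * string_kernel(doc_a, doc_b)


def rank_candidates(document, knowledge_base, weight=0.5):
    """Return (entity_id, score) pairs sorted by descending similarity."""
    scores = [(entity_id, multi_kernel(document, description, weight))
              for entity_id, description in knowledge_base.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy knowledge base with invented entries, for illustration only.
knowledge_base = {
    "E1": "li ming professor of physics at peking university",
    "E2": "li ming football player for shanghai club",
}
document = "physics professor li ming published a paper at peking university"
for entity_id, score in rank_candidates(document, knowledge_base):
    print(entity_id, round(score, 3))
```

The weighted sum is used here because a non-negative combination of valid kernels is itself a valid kernel, so the combined similarity could also be passed to a kernel-based classifier. The Internet verification stage is not sketched, since the abstract gives no detail on its features or classifiers.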

Author information

Corresponding author

Correspondence to Qin Lu.

Additional information

Ruifeng Xu is an associate professor and PhD supervisor at the School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, China. His research areas are natural language processing, emotion computing, text mining, and bioinformatics. His main research focuses on computational methods for natural language processing and understanding. He received his BS degree in computer science from Harbin Institute of Technology, China, and his MS and PhD degrees in computer science from the Hong Kong Polytechnic University, China.

Lin Gui is a PhD candidate at the School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, China. His research areas are natural language processing, sentiment analysis, emotion computing and machine learning. His main research focuses on machine learning for natural language processing. He received his BS degree from Nankai University, China and MS degree in computer science from the Harbin Institute of Technology, China.

Qin Lu is a professor and associate head of the Department of Computing, the Hong Kong Polytechnic University, China. Her research areas are natural language processing, computational linguistics, lexical semantics, text mining, and knowledge discovery. Her main research focuses on using computational methods for information extraction, text mining, and knowledge discovery. She has conducted extensive work on Chinese collocation extraction, terminology extraction, ontology construction, named entity disambiguation, and emotion analysis. She received her BS degree from Beijing Normal University, China, and her MS and PhD degrees in computer science from the University of Illinois at Urbana-Champaign, USA.

Shuai Wang is a master's candidate at the School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, China. His research areas are natural language processing, information retrieval, and machine learning. His main research focuses on machine learning for natural language processing. He received his BS degree from Heilongjiang Institute of Technology, China.

Jian Xu obtained his PhD degree from the Department of Computing, the Hong Kong Polytechnic University, China, and currently works for Huawei Technologies Company Limited. His research areas are natural language processing, computational linguistics, lexical semantics, text mining and knowledge discovery. His main research focuses on entity disambiguation. He received his BS degree from Beijing Language and Culture University, China, and MS degree from Peking University, China.

About this article

Cite this article

Xu, R., Gui, L., Lu, Q. et al. Incorporating multi-kernel function and Internet verification for Chinese person name disambiguation. Front. Comput. Sci. 10, 1026–1038 (2016). https://doi.org/10.1007/s11704-016-4503-0

  • DOI: https://doi.org/10.1007/s11704-016-4503-0
