Choosing Better Seeds for Entity Set Expansion by Leveraging Wikipedia Semantic Knowledge

  • Zhenyu Qi
  • Kang Liu
  • Jun Zhao
Part of the Communications in Computer and Information Science book series (CCIS, volume 321)


Entity Set Expansion, the task of expanding a human-input seed set into a more complete set of entities belonging to the same semantic category, is an important task in open information extraction. Because human-input seeds may be ambiguous or sparse, seed quality has a great influence on expansion performance, as many previous studies have shown. To improve seed quality, this paper proposes a novel method that chooses better seeds from the originally input ones. In our method, we leverage Wikipedia semantic knowledge to measure the semantic relatedness and ambiguity of each seed. Moreover, to avoid seed sparseness, we use a web corpus to measure each seed's population. Lastly, we use a linear model to combine these factors and determine the final selection. Experimental results show that the seed sets chosen by our method improve expansion performance by up to 13.4% on average over randomly selected seed sets.
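
As a rough illustration of the linear combination described in the abstract, the following sketch scores each candidate seed from three precomputed factors and keeps the top-scoring ones. It is not the authors' implementation: the names `SeedFeatures`, `seed_score`, and `choose_seeds`, the weights, the [0, 1] feature ranges, the sign given to ambiguity, and all feature values are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeedFeatures:
    """Hypothetical per-seed features, each assumed to be scaled to [0, 1]."""
    name: str
    relatedness: float  # semantic relatedness to the other seeds
    ambiguity: float    # higher means more ambiguous
    population: float   # how well the seed is represented in a web corpus


def seed_score(f: SeedFeatures,
               w_rel: float = 0.5,
               w_amb: float = 0.3,
               w_pop: float = 0.2) -> float:
    """Linear combination of the three factors.

    Assumption: ambiguity lowers the score, the other two raise it.
    The weights are placeholders, not values from the paper.
    """
    return w_rel * f.relatedness - w_amb * f.ambiguity + w_pop * f.population


def choose_seeds(candidates: list[SeedFeatures], k: int) -> list[SeedFeatures]:
    """Return the k highest-scoring candidates, best first."""
    return sorted(candidates, key=seed_score, reverse=True)[:k]


# Made-up feature values for a "technology companies" seed set.
CANDIDATES = [
    SeedFeatures("Apple", relatedness=0.6, ambiguity=0.9, population=0.9),
    SeedFeatures("Microsoft", relatedness=0.9, ambiguity=0.1, population=0.8),
    SeedFeatures("Zyxel", relatedness=0.7, ambiguity=0.1, population=0.1),
    SeedFeatures("IBM", relatedness=0.8, ambiguity=0.2, population=0.7),
]

if __name__ == "__main__":
    for seed in choose_seeds(CANDIDATES, k=2):
        print(seed.name, round(seed_score(seed), 2))
```

In this toy setting, an ambiguous seed such as "Apple" and a sparse one such as "Zyxel" are ranked below seeds that are both unambiguous and well represented. In practice the weights would have to be tuned on held-out data.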


Keywords: information extraction, seed set refinement, semantic knowledge





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Zhenyu Qi (1)
  • Kang Liu (1)
  • Jun Zhao (1)
  1. National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences, Beijing, China
