Named Entity Based Document Similarity with SVM-Based Re-ranking for Entity Linking

  • Ayman Alhelbawy
  • Rob Gaizauskas
Part of the Communications in Computer and Information Science book series (CCIS, volume 322)

Abstract

In this paper we present a novel approach to searching a knowledge base for the entry that describes a named entity (NE) mention as it appears in a given context. We develop a document similarity function, NEBsim, based on NE co-occurrence, which calculates the similarity between two documents given a specific NE mention in one of them. NEBsim is also combined with the traditional cosine similarity measure to learn a ranking model, and Naive Bayes and SVM classifiers are used to re-rank the retrieved documents. Our experiments, carried out on TAC-KBP 2011 data, show that NEBsim achieves a significant improvement in accuracy over a cosine similarity approach. They also show that re-ranking with learning-to-rank techniques can significantly improve accuracy at high ranks.
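The abstract does not define NEBsim, so the following is only an illustrative sketch of the general idea of pairing a bag-of-words cosine score with a named-entity overlap score when ranking knowledge base candidates. The Jaccard overlap used here is a stand-in and not the paper's NEBsim; the function names, the `weight` parameter, and the dictionary layout are all assumptions made for this example.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two bags of words using raw term counts."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def entity_overlap(entities_a, entities_b):
    """Jaccard overlap of two named-entity sets (a stand-in, NOT NEBsim)."""
    a, b = set(entities_a), set(entities_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_candidates(query, candidates, weight=0.5):
    """Rank candidates by a weighted mix of the two scores.

    `query` and each candidate are dicts with "tokens" and "entities" keys
    (a hypothetical layout). `weight` is the share given to entity overlap.
    """
    scored = []
    for cand_id, cand in candidates.items():
        cos = cosine_similarity(query["tokens"], cand["tokens"])
        ne = entity_overlap(query["entities"], cand["entities"])
        scored.append((cand_id, weight * ne + (1.0 - weight) * cos))
    # Highest combined score first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


query = {
    "tokens": ["paris", "capital", "france"],
    "entities": ["Paris", "France"],
}
candidates = {
    "kb_paris_france": {
        "tokens": ["paris", "capital", "france", "city"],
        "entities": ["Paris", "France"],
    },
    "kb_paris_texas": {
        "tokens": ["paris", "city", "texas"],
        "entities": ["Paris", "Texas"],
    },
}
ranked = rank_candidates(query, candidates)
```

In this toy setup the fixed `weight` plays the role that a learned ranking model would play in the paper, where the combination of features is learned from data rather than set by hand.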

Keywords

NEBsim, Entity Linking, Support Vector Machine, Learning to Rank, SVM-map, SVM-rank, Naive Bayes

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Ayman Alhelbawy (1, 2)
  • Rob Gaizauskas (1)
  1. Computer Science Department, University of Sheffield, Sheffield, UK
  2. Information Science Department, Fayoum University, Fayoum, Egypt
