On the Long-Tail Entities in News

  • José Esquivel
  • Dyaa Albakour
  • Miguel Martinez
  • David Corney
  • Samir Moussa
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10193)

Abstract

Long-tail entities pose unique challenges for state-of-the-art entity linking systems because they are under-represented in general knowledge bases. This paper studies long-tail entities in news corpora. Using a large news collection of one million articles, we devise an approach for measuring the volume of such entities in news and uncover insights into the challenges of linking them to general knowledge bases.

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • José Esquivel (1, 2)
  • Dyaa Albakour (2)
  • Miguel Martinez (2)
  • David Corney (2)
  • Samir Moussa (2)

  1. School of Computer Science and Electronic Engineering, University of Essex, Colchester, UK
  2. Signal Media Ltd., London, UK
