On Emerging Entity Detection

  • Michael Färber
  • Achim Rettinger
  • Boulos El Asmar
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10024)


While large Knowledge Graphs (KGs) already cover a broad range of domains to an extent sufficient for general use, they typically lack emerging entities that are just starting to attract public interest. This makes such KGs unsuitable for tasks like entity-based media monitoring, since a large portion of news inherently covers entities that have not been noted by the public before. Such entities are unlinkable, which ultimately means that they cannot be monitored in media streams. This is the first paper that thoroughly investigates all types of challenges that arise from out-of-KG entities for entity linking tasks. Using large-scale analytics of news streams, we quantify the importance of each challenge for real-world applications. We then propose a machine learning approach that tackles the most frequent but least investigated challenge, i.e., when entities are missing in the KG and cannot be considered by entity linking systems. We construct a publicly available benchmark data set based on English news articles and editing behavior on Wikipedia. Our experiments show that predicting whether an entity will be added to Wikipedia is challenging. However, we can reliably identify emerging entities that could be added to the KG according to Wikipedia’s own notability criteria.


Keywords: Emerging information discovery; Evolving knowledge; Novelty detection; Entity linking; Text annotation



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Michael Färber (1)
  • Achim Rettinger (1)
  • Boulos El Asmar (1)
  1. Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
