All that Glitters Is Not Gold – Rule-Based Curation of Reference Datasets for Named Entity Recognition and Entity Linking

  • Kunal Jha
  • Michael Röder
  • Axel-Cyrille Ngonga Ngomo
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10249)


The evaluation of Named Entity Recognition and Entity Linking systems is mostly based on manually created gold standards. However, the current gold standards have three main drawbacks. First, they do not share a common set of rules pertaining to what is to be marked and linked as an entity. Second, most of the gold standards have not been checked by other researchers after they were published, and hence commonly contain mistakes. Finally, many gold standards are outdated: the reference knowledge bases used to link entities are refined over time, while the gold standards are typically not updated to the newest version of the reference knowledge base. In this work, we analyze existing gold standards and derive a set of rules for annotating documents for named entity recognition and entity linking. We present Eaglet, a tool that supports the semi-automatic checking of a gold standard based on these rules. A manual evaluation of Eaglet’s results shows that it achieves an accuracy of up to 88% when detecting errors. We apply Eaglet to 13 English gold standards and detect 38,453 errors. An evaluation of 10 tools on a subset of these datasets shows a performance difference of up to 10% micro F-measure on average.
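The micro F-measure used in the abstract's final comparison aggregates true positives, false positives, and false negatives across all documents before computing precision and recall, so large documents weigh more than in macro averaging. A minimal sketch (with made-up per-document counts, not the paper's evaluation data):

```python
# Micro-averaged F-measure over entity annotations, as commonly used to
# compare annotator output against a gold standard. Illustrative only.

def micro_f1(per_doc_counts):
    """per_doc_counts: list of (true_positives, false_positives, false_negatives)
    tuples, one per document. Counts are pooled before computing P, R, F1."""
    tp = sum(c[0] for c in per_doc_counts)
    fp = sum(c[1] for c in per_doc_counts)
    fn = sum(c[2] for c in per_doc_counts)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Example: three documents with hypothetical (TP, FP, FN) annotation counts.
score = micro_f1([(8, 2, 1), (5, 0, 3), (10, 1, 1)])
print(round(score, 4))  # pooled TP=23, FP=3, FN=5 -> F1 = 46/54 ~ 0.8519
```

Because pooled micro F1 simplifies to 2·TP / (2·TP + FP + FN), a single misannotated entity in the gold standard shifts the score of every evaluated tool, which is why curation errors of the kind Eaglet detects can move results by several percentage points.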


Keywords

Entity recognition · Entity linking · Benchmarks



This work has been supported by the H2020 project HOBBIT (GA no. 688227) as well as the EuroStars projects DIESEL (project no. 01QE1512C) and QAMEL (project no. 01QE1549C).



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Kunal Jha (1)
  • Michael Röder (1) — email author
  • Axel-Cyrille Ngonga Ngomo (1, 2)

  1. AKSW Research Group, University of Leipzig, Leipzig, Germany
  2. Data Science Group, University of Paderborn, Paderborn, Germany
