Predicting Entity Mentions in Scientific Literature

  • Yalung Zheng
  • Jon Ezeiza
  • Mehdi Farzanehpour
  • Jacopo Urbani (Email author)
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11503)


Predicting which entities are likely to be mentioned in scientific articles is a task with significant academic and commercial value. For instance, it can lead to monetary savings if the articles are behind paywalls, or be used to recommend articles that are not yet available. Despite extensive prior work on entity prediction in Web documents, the peculiarities of scientific literature make it a unique scenario for this task. In this paper, we present an approach that uses a neural network to predict whether the (unseen) body of an article contains entities defined in domain-specific knowledge bases (KBs). The network uses features from the abstracts and the KB, and it is trained using open-access articles and authors’ prior works. Our experiments on biomedical literature show that our method is able to predict subsets of entities with high accuracy. As far as we know, our method is the first of its kind and is currently used in several commercial settings.
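As a rough illustration of the task framing described above, the prediction can be viewed as binary classification over (article, candidate entity) pairs. The sketch below is not the authors' architecture or feature set: the toy knowledge base, the two features (KB overlap with the abstract, presence in the authors' prior works), and the single sigmoid unit standing in for the neural network are all assumptions made for illustration.

```python
import math

# Hypothetical toy knowledge base: entity -> set of related entities.
KB_NEIGHBOURS = {
    "aspirin": {"inflammation", "cox-1"},
    "cox-1": {"aspirin", "inflammation"},
    "insulin": {"diabetes", "glucose"},
    "glucose": {"insulin", "diabetes"},
}

def features(abstract_entities, prior_entities, candidate):
    """Build an illustrative feature vector for one (article, candidate) pair."""
    neighbours = KB_NEIGHBOURS.get(candidate, set())
    # Share of abstract entities that the KB links to the candidate.
    kb_overlap = len(neighbours & abstract_entities) / max(len(abstract_entities), 1)
    # Whether the candidate occurs in the authors' earlier articles.
    in_prior = 1.0 if candidate in prior_entities else 0.0
    return [kb_overlap, in_prior, 1.0]  # last entry is a bias term

def predict(weights, x):
    """Single sigmoid unit: estimated probability that the body mentions the candidate."""
    z = sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, epochs=500, lr=0.5):
    """Fit the weights with plain stochastic gradient descent on log loss."""
    weights = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for x, y in samples:
            error = predict(weights, x) - y
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
    return weights

# Toy training pairs: (abstract entities, prior-work entities, candidate, label).
raw = [
    ({"aspirin", "inflammation"}, {"cox-1"}, "cox-1", 1),
    ({"aspirin", "inflammation"}, set(), "glucose", 0),
    ({"insulin", "diabetes"}, {"glucose"}, "glucose", 1),
    ({"insulin", "diabetes"}, set(), "cox-1", 0),
]
samples = [(features(a, p, c), y) for a, p, c, y in raw]
weights = train(samples)

# Score two unseen pairs: one supported by the KB and prior work, one not.
p_related = predict(weights, features({"aspirin"}, {"cox-1"}, "cox-1"))
p_unrelated = predict(weights, features({"aspirin"}, set(), "insulin"))
```

In this toy setting the candidate that is linked to the abstract in the KB and appears in the authors' prior work receives a higher score than the unrelated one. A realistic system would presumably use a much richer feature set and a deeper network.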


Copyright information

© Springer Nature Switzerland AG 2019

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Authors and Affiliations

  • Yalung Zheng (1)
  • Jon Ezeiza (2)
  • Mehdi Farzanehpour (2)
  • Jacopo Urbani (1), Email author

  1. Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
  2. SCITODATE B.V., Amsterdam, The Netherlands
