Joint Entity Recognition and Linking in Technical Domains Using Undirected Probabilistic Graphical Models

  • Hendrik ter Horst
  • Matthias Hartung
  • Philipp Cimiano
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10318)

Abstract

The problems of recognizing mentions of entities in texts and linking them to unique knowledge base identifiers have received considerable attention in recent years. In this paper, we present a probabilistic system based on undirected graphical models that jointly addresses both the entity recognition and the linking task. Our framework treats the spans of entity mentions and the corresponding knowledge base identifiers as random variables and models their joint assignment using a factorized distribution. We show that our approach can be easily applied to different technical domains by merely exchanging the underlying ontology. On the task of recognizing and linking disease names, we show that our approach outperforms the state-of-the-art systems DNorm and TaggerOne, as well as two strong lexicon-based baselines. On the task of recognizing and linking chemical names, our system achieves performance comparable to the state of the art.
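
The idea of treating a mention span and its knowledge base identifier as jointly assigned random variables can be illustrated with a toy sketch. This is not the system described in the paper: the lexicon, the concept identifiers, the two log-potentials, and their weights are all hypothetical, the model covers a single mention only, and normalization is done by exhaustive enumeration, which is feasible only because the example is tiny.

```python
import math
from itertools import product

# Hypothetical toy lexicon mapping surface forms to concept identifiers.
# The identifiers are invented placeholders, not real knowledge base entries.
LEXICON = {
    "lung cancer": "C:0001",
    "cancer": "C:0002",
}

TOKENS = ["lung", "cancer", "risk"]


def candidate_spans(tokens, max_len=2):
    """Enumerate all token spans (start, end) of at most max_len tokens."""
    return [(i, j) for i in range(len(tokens))
            for j in range(i + 1, min(i + max_len, len(tokens)) + 1)]


def log_factor(tokens, span, concept):
    """Sum of illustrative log-potentials for one (span, concept) pair."""
    text = " ".join(tokens[span[0]:span[1]])
    score = 0.0
    if LEXICON.get(text) == concept:
        score += 2.0  # span text matches the concept's lexicon entry
    score += 0.5 * (span[1] - span[0])  # mild preference for longer spans
    return score


def joint_distribution(tokens, concepts):
    """Normalize exp(score) over all joint (span, concept) assignments."""
    assignments = list(product(candidate_spans(tokens), concepts))
    weights = [math.exp(log_factor(tokens, s, c)) for s, c in assignments]
    z = sum(weights)  # partition function, computed by brute force
    return {a: w / z for a, w in zip(assignments, weights)}


dist = joint_distribution(TOKENS, sorted(set(LEXICON.values())))
best = max(dist, key=dist.get)
print(best, round(dist[best], 3))
```

Because the span and the identifier are scored together, the longer span "lung cancer" with its matching identifier can win over the shorter span "cancer", a preference that a pipeline recognizing spans first and linking afterwards could not express in the same way.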

Keywords

  • Joint entity recognition and linking
  • Undirected probabilistic graphical models
  • Diseases
  • Chemicals

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Hendrik ter Horst (1)
  • Matthias Hartung (1)
  • Philipp Cimiano (1)
  1. Cognitive Interaction Technology Cluster of Excellence (CITEC), Semantic Computing Group, Bielefeld University, Bielefeld, Germany
