An Ontology-Driven Probabilistic Soft Logic Approach to Improve NLP Entity Annotations

  • Marco Rospocher
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11136)


Many approaches for Knowledge Extraction and Ontology Population rely on well-known Natural Language Processing (NLP) tasks, such as Named Entity Recognition and Classification (NERC) and Entity Linking (EL), to identify and semantically characterize the entities mentioned in natural language text. Although these tasks are intrinsically related, the analyses they perform differ, and combining their outputs may yield NLP annotations that are implausible or even conflicting with respect to common world knowledge about entities. In this paper we present a Probabilistic Soft Logic (PSL) model that leverages ontological entity classes to relate NLP annotations produced by different tasks for the same entity mention. The intuition behind the model is that an annotation likely implies some ontological classes for the entity identified by the mention, and that annotations from different tasks on the same mention should imply largely the same entity classes. In a setting where various NLP tools return multiple confidence-weighted candidate annotations for a single mention, the model can be applied operationally to compare the different annotation combinations and, where appropriate, to revise each tool's best annotation choice. We applied the model to the candidate annotations produced by two state-of-the-art tools for NERC and EL on three different datasets. The results show that the joint "a posteriori" annotation revision suggested by our PSL model consistently improves the original scores of the two tools.
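The intuition described above can be illustrated with a small toy sketch. It is not the PSL model of the paper and performs no PSL inference: it only scores each combination of candidate annotations by the product of the tools' confidences and a soft agreement between the ontological classes each annotation implies. All class names, implication weights, confidences, and function names below are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical weights: how strongly each annotation implies an ontological class.
NERC_IMPLIES = {
    "PERSON": {"Person": 0.9, "Agent": 0.9},
    "LOCATION": {"Place": 0.9},
}
EL_IMPLIES = {
    "dbpedia:Paris": {"Place": 1.0},
    "dbpedia:Paris_Hilton": {"Person": 1.0, "Agent": 1.0},
}


def class_agreement(a, b):
    """Weighted Jaccard overlap of two implied-class maps, in [0, 1]."""
    classes = set(a) | set(b)
    overlap = sum(min(a.get(c, 0.0), b.get(c, 0.0)) for c in classes)
    total = sum(max(a.get(c, 0.0), b.get(c, 0.0)) for c in classes)
    return overlap / total if total else 0.0


def best_combination(nerc_candidates, el_candidates):
    """Return (score, nerc_type, el_entity) for the highest-scoring pair.

    Each argument maps a candidate annotation to the tool's confidence.
    """
    scored = []
    for (nerc_type, c_nerc), (entity, c_el) in product(
        nerc_candidates.items(), el_candidates.items()
    ):
        agreement = class_agreement(
            NERC_IMPLIES.get(nerc_type, {}), EL_IMPLIES.get(entity, {})
        )
        scored.append((c_nerc * c_el * agreement, nerc_type, entity))
    return max(scored)


# Invented candidates for one mention: the tools' top choices
# (PERSON, dbpedia:Paris) imply disjoint classes and so conflict.
nerc = {"PERSON": 0.6, "LOCATION": 0.4}
el = {"dbpedia:Paris": 0.7, "dbpedia:Paris_Hilton": 0.3}
score, nerc_type, entity = best_combination(nerc, el)
print(nerc_type, entity)
```

With these invented numbers, the individually best choices (PERSON and dbpedia:Paris) share no implied class, so the joint comparison selects LOCATION with dbpedia:Paris instead, revising the NERC choice. A real PSL model would express such class implications as weighted logical rules and infer soft truth values jointly, rather than enumerate pairs as done here.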





The author would like to thank Dr. Francesco Corcoglioniti for some useful suggestions and fruitful discussions while developing the idea.



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Fondazione Bruno Kessler – IRST, Trento, Italy
