Integrating NLP Using Linked Data

  • Sebastian Hellmann
  • Jens Lehmann
  • Sören Auer
  • Martin Brümmer
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8219)


A plethora of Natural Language Processing (NLP) tools and services is currently being made available. Each tool and service has its particular strengths and weaknesses, but exploiting those strengths and combining different tools synergistically is currently an extremely cumbersome and time-consuming task. Moreover, once a particular set of tools has been integrated, that integration is not reusable by others. We argue that simplifying the interoperability of different NLP tools that perform similar but also complementary tasks will facilitate the comparability of results and the creation of sophisticated NLP applications. In this paper, we present the NLP Interchange Format (NIF). NIF is based on a Linked Data-enabled URI scheme for identifying elements in (hyper-)texts and an ontology for describing common NLP terms and concepts. In contrast to more centralized solutions such as UIMA and GATE, NIF enables the creation of heterogeneous, distributed and loosely coupled NLP applications, which use the Web as an integration platform. We present several use cases of the second version of the NIF specification (NIF 2.0) and the results of a developer study.
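To make the idea of a URI scheme for text elements more concrete, the following minimal sketch builds an offset-based URI for a substring of a document. It assumes the RFC 5147-style fragment form `#char=begin,end` with zero-based, end-exclusive character offsets; the abstract itself does not spell out the scheme, and the function name `char_offset_uri` and the document URI `http://example.org/doc1` are hypothetical and chosen only for illustration.

```python
def char_offset_uri(document_uri, text, begin, end):
    """Return (uri, anchor) for the substring text[begin:end].

    Assumption: the fragment has the form '#char=begin,end' with
    zero-based, end-exclusive character offsets.
    """
    if not (0 <= begin <= end <= len(text)):
        raise ValueError("offsets must satisfy 0 <= begin <= end <= len(text)")
    uri = f"{document_uri}#char={begin},{end}"
    anchor = text[begin:end]  # the string the URI refers to
    return uri, anchor


# Hypothetical document and mention, for illustration only.
text = "My favourite actress is Natalie Portman."
mention = "Natalie Portman"
begin = text.index(mention)
uri, anchor = char_offset_uri("http://example.org/doc1", text, begin, begin + len(mention))
print(uri, "->", anchor)
```

Because such a URI is derived only from the document URI and the character offsets, two independent tools that annotate the same span would arrive at the same identifier, which is what allows their output to be merged without a central coordinating component.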


Keywords: Data Integration, Natural Language Processing, RDF





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Sebastian Hellmann (1)
  • Jens Lehmann (1)
  • Sören Auer (1)
  • Martin Brümmer (1)
  1. Institute of Computer Science, AKSW Group, University of Leipzig, Leipzig, Germany
