Representing Annotated Texts as RDF

  • Philipp Cimiano
  • Christian Chiarcos
  • John P. McCrae
  • Jorge Gracia


Text annotation consists of defining markables (the elements to be annotated), their features (attributes and values of annotations), and relations between markables (e.g., syntactic dependencies or semantic links). In this chapter we describe the principles for annotating text data using RDF-compliant formalisms. These principles provide the basis for making annotated corpora and text collections accessible from the Linguistic Linked Open Data (LLOD) ecosystem.
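
To make the three notions concrete, the following is a minimal sketch (not the chapter's own model) of how markables, features, and relations could be written down as RDF triples. It assumes a two-word sample text, mints markable URIs with an RFC 5147-style `char=` fragment identifier, and serializes to N-Triples by hand. Every URI under `http://example.org/`, and the property names `anchorOf`, `pos`, `head`, and `deprel`, are hypothetical placeholders chosen for illustration only.

```python
# Minimal sketch: markables, features, and relations as RDF triples.
# All URIs under http://example.org/ and all property names below are
# hypothetical placeholders, not part of any published vocabulary.

DOC = "http://example.org/doc1.txt"   # assumed document URI
EX = "http://example.org/vocab#"      # assumed annotation vocabulary

text = "Dogs bark"


def markable(start: int, end: int) -> str:
    """Mint a URI for the character span between positions start and end,
    using an RFC 5147-style `char=` fragment identifier."""
    return f"{DOC}#char={start},{end}"


def uri(value: str) -> str:
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Format a plain string literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


# Markables: the two tokens of the sample text.
dogs = markable(0, 4)
bark = markable(5, 9)

triples = [
    # Features: attribute-value pairs attached to a markable.
    (uri(dogs), uri(EX + "anchorOf"), literal(text[0:4])),
    (uri(dogs), uri(EX + "pos"), literal("NOUN")),
    (uri(bark), uri(EX + "anchorOf"), literal(text[5:9])),
    (uri(bark), uri(EX + "pos"), literal("VERB")),
    # Relation: a syntactic dependency linking one markable to another.
    (uri(dogs), uri(EX + "head"), uri(bark)),
    (uri(dogs), uri(EX + "deprel"), literal("nsubj")),
]


def to_ntriples(ts) -> str:
    """Serialize (subject, predicate, object) tuples, one triple per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in ts)


print(to_ntriples(triples))
```

Because each markable is itself a URI, a relation between markables is just another triple whose object is a markable, which is what lets annotations from different tools be merged into one graph.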





Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Semantic Computing Group, Bielefeld University, Bielefeld, Germany
  2. Angewandte Computerlinguistik, Goethe-University, Frankfurt am Main, Germany
  3. Insight Centre for Data Analytics, National University of Ireland, Galway, Ireland
  4. Aragon Institute of Engineering Research (I3A), University of Zaragoza, Zaragoza, Spain
