CoNLL-RDF: Linked Corpora Done in an NLP-Friendly Way

  • Christian Chiarcos
  • Christian Fäth
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10318)


We introduce CoNLL-RDF, a direct rendering of the CoNLL format in RDF, accompanied by a formatter whose output mimics CoNLL’s original TSV-style layout. CoNLL-RDF represents a middle ground: it accounts for the needs of NLP specialists (easy to read, easy to parse, close to conventional representations), while also facilitating integration with the Linguistic Linked Open Data (LLOD) cloud, because off-the-shelf Semantic Web technology can be applied directly to CoNLL corpora and annotations. The CoNLL-RDF infrastructure is published as open source. We also provide SPARQL Update scripts for selected use cases described in this paper.
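The idea of a "direct rendering" that keeps a TSV-like layout can be illustrated with a small sketch. The Python below is not the authors' converter. It assumes a five-column CoNLL dialect (ID, WORD, POS, HEAD, EDGE), a placeholder base URI (`http://example.org/corpus#`), a hypothetical `conll:` namespace, and a hypothetical URI scheme (`:s1_0` for the sentence, `:s1_N` for word N), and it borrows the NIF terms `nif:Word`, `nif:Sentence` and `nif:nextWord`. Each CoNLL row becomes exactly one Turtle line, so the output keeps the one-word-per-line shape of the original.

```python
# Hypothetical sketch: render one CoNLL sentence as Turtle, one word per line.
# Column labels, the base URI and the conll: namespace are assumptions for
# illustration; they are not taken from the CoNLL-RDF distribution.
COLUMNS = ["ID", "WORD", "POS", "HEAD", "EDGE"]

PREFIXES = [
    "@prefix : <http://example.org/corpus#> .",
    "@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .",
    "@prefix conll: <http://example.org/conll#> .",
]


def turtle_literal(value: str) -> str:
    """Quote a cell value as a Turtle string literal (escapes \\ and ")."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def sentence_to_turtle(tsv: str, sent: int = 1) -> str:
    """Render one CoNLL sentence (tab-separated rows) as Turtle, one word per line."""
    rows = [line.split("\t") for line in tsv.strip().splitlines()
            if line.strip() and not line.startswith("#")]
    sentence_uri = f":s{sent}_0"  # hypothetical: position 0 names the sentence
    out = PREFIXES + ["", f"{sentence_uri} a nif:Sentence ."]
    for i, row in enumerate(rows):
        cells = dict(zip(COLUMNS, row))
        props = ["a nif:Word"]
        for col in COLUMNS[1:]:
            value = cells.get(col, "_")
            if value == "_":
                continue  # CoNLL uses "_" for an empty cell; emit no triple
            if col == "HEAD":
                # HEAD links to another word; head 0 (root) links to the sentence
                target = sentence_uri if value == "0" else f":s{sent}_{value}"
                props.append(f"conll:HEAD {target}")
            else:
                props.append(f"conll:{col} {turtle_literal(value)}")
        if i + 1 < len(rows):
            # Preserve token order explicitly, since RDF triples are unordered
            props.append(f"nif:nextWord :s{sent}_{rows[i + 1][0]}")
        out.append(f":s{sent}_{cells['ID']} " + "; ".join(props) + " .")
    return "\n".join(out)


example = "1\tThe\tDT\t2\tdet\n2\tdog\tNN\t3\tnsubj\n3\tbarks\tVBZ\t0\troot"
print(sentence_to_turtle(example))
```

Because every word stays on its own line, the result can still be skimmed like a CoNLL file, yet it is valid Turtle that any RDF store or SPARQL engine can load.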





The research of Christian Chiarcos was supported by the BMBF-funded Research Group ‘Linked Open Dictionaries (LiODi)’ (2015–2020). The research of Christian Fäth was conducted in the context of DFG-funded projects ‘Virtuelle Fachbibliothek’ (2015–2016) and ‘Fachinformationsdienst Linguistik’ (2017–2019).



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Goethe-University Frankfurt, Frankfurt, Germany
