TELIX: An RDF-Based Model for Linguistic Annotation

  • Emilio Rubiera
  • Luis Polo
  • Diego Berrueta
  • Adil El Ghali
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7295)

Abstract

This paper proposes the application of the RDF framework to the representation of linguistic annotations. We argue that RDF is a suitable data model to capture multiple annotations on the same text segment, and to integrate multiple layers of annotations. Beyond the use of RDF for this purpose, the main contribution of the paper is an OWL ontology, called TELIX (Text Encoding and Linguistic Information eXchange), which models annotation content. This ontology builds on the SKOS XL vocabulary, a W3C standard for the representation of lexical entities as RDF graphs. We extend SKOS XL to capture lexical relations between words (e.g., synonymy), and to support word sense disambiguation, morphological features and syntactic analysis, among others. In addition, a formal mapping of feature structures to RDF graphs is defined, enabling complex composition of linguistic entities. Finally, the paper suggests the use of RDFa as a convenient syntax that combines source texts and linguistic annotations in the same file.
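
The abstract mentions a mapping of feature structures to RDF graphs but does not spell it out here. As a rough illustration only, the sketch below flattens a nested attribute-value structure into triples, using one blank node per complex value. The namespace `http://example.org/fs#`, the function `fs_to_triples`, and the feature names are hypothetical and are not taken from TELIX.

```python
from itertools import count

# Hypothetical namespace for feature names; not part of TELIX.
FS = "http://example.org/fs#"


def fs_to_triples(fs, subject=None, _ids=None):
    """Flatten a nested dict (a feature structure) into (s, p, o) triples.

    Atomic values become plain literals; nested dicts become fresh
    blank nodes that are described recursively.
    """
    _ids = _ids if _ids is not None else count()
    if subject is None:
        subject = f"_:b{next(_ids)}"
    triples = []
    for feature, value in fs.items():
        predicate = f"<{FS}{feature}>"
        if isinstance(value, dict):
            # Complex value: introduce a blank node and recurse into it.
            node = f"_:b{next(_ids)}"
            triples.append((subject, predicate, node))
            triples.extend(fs_to_triples(value, node, _ids))
        else:
            # Atomic value: emit it as a plain literal.
            triples.append((subject, predicate, f'"{value}"'))
    return triples


# Illustrative feature structure for a noun; the features are made up.
noun = {"cat": "noun", "agr": {"num": "sg", "per": "3"}}
triples = fs_to_triples(noun)
for s, p, o in triples:
    print(s, p, o, ".")
```

The printed lines resemble N-Triples statements. A real pipeline would more likely build the graph with an RDF library and the ontology's own vocabulary rather than hand-formatted strings.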

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Emilio Rubiera (1)
  • Luis Polo (1)
  • Diego Berrueta (1)
  • Adil El Ghali (2)

  1. Fundación CTIC, Spain
  2. IBM CAS, France
