Linking to Linguistic Data Categories in ISOcat

  • Menzo Windhouwer
  • Sue Ellen Wright


ISO Technical Committee 37, Terminology and other language and content resources, established an ISO 12620:2009-based Data Category Registry (DCR), called ISOcat, to foster semantic interoperability of linguistic resources. This registry follows a grassroots approach, which means that any linguist can add the data categories (s)he needs. Standardized subsets of these data categories are created through a standardization procedure involving groups of international experts who are members of various Thematic Domain Groups (TDGs) and of the DCR Board. However, the goal of improving semantic interoperability can only be met if the data categories are reused by a wide variety of linguistic resource types. A resource indicates its usage of data categories by linking to them. ISO 12620:2009 specifies a small DC Reference XML vocabulary to annotate XML documents with links to data categories. Each link is established by a URI, which serves as the Persistent IDentifier (PID) of a data category. Any XML document can thus refer to data categories to make the semantics of its elements, attributes and values explicit. This paper discusses the efforts to mimic the same approach for RDF-based resources. It also introduces RELcat, a Relation Registry based on an RDF quad store, which enables ontological relationships between data categories that ISOcat does not support and thus adds an extra level of linguistic knowledge.
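As a rough illustration of the linking mechanism described above, the following Python sketch annotates an XML element with a data category reference. It rests on several assumptions that the abstract does not confirm: the namespace URI `http://www.isocat.org/ns/dcr` and the attribute name `datcat` are assumed to be those of the DC Reference vocabulary, the PID `http://www.isocat.org/datcat/DC-0000` is a made-up placeholder rather than a real data category, and the helper names and the `entry`/`partOfSpeech` elements are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace of the DC Reference vocabulary (not stated in the abstract).
DCR_NS = "http://www.isocat.org/ns/dcr"
ET.register_namespace("dcr", DCR_NS)

# Assumption: the linking attribute is called "datcat" in that namespace.
DATCAT_ATTR = f"{{{DCR_NS}}}datcat"


def link_to_data_category(element, pid):
    """Attach a data category PID (a URI) to an XML element (hypothetical helper)."""
    element.set(DATCAT_ATTR, pid)
    return element


def data_category_of(element):
    """Return the PID the element links to, or None if it carries no link."""
    return element.get(DATCAT_ATTR)


# Hypothetical lexicon fragment; the PID below is a placeholder, not a real one.
PLACEHOLDER_PID = "http://www.isocat.org/datcat/DC-0000"

entry = ET.Element("entry")
pos = ET.SubElement(entry, "partOfSpeech")
pos.text = "noun"
link_to_data_category(pos, PLACEHOLDER_PID)

xml_text = ET.tostring(entry, encoding="unicode")
print(xml_text)
```

A consumer that reads such a document back can call `data_category_of` on any element to recover the PID and, from there, look up the shared definition in the registry instead of relying on the local element name.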


Keywords: Data Category, Language Resource, Content Resource, Transitive Relationship, Persistent IDentifier




References

  1. Berners-Lee T (1998) Cool URIs don’t change. Tech. rep., World Wide Web Consortium
  2. Broeder D, Declerck T, Hinrichs E, Piperidis S, Romary L, Calzolari N, Wittenburg P (2008) Foundation of a component-based flexible registry for language resources and technology. In: Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008), Marrakech, Morocco
  3. Farrar S, Langendoen DT (2010) An OWL-DL implementation of GOLD: An ontology for the semantic web. In: Witt AW, Metzing D (eds) Linguistic Modeling of Information and Markup Languages: Contributions to Language Technology, Springer
  4. ISO 12620 (2009) Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources
  5. ISO 24613 (2008) Language resource management - Lexical markup framework (LMF)
  6. Kemps-Snijders M, Windhouwer M, Wittenburg P, Wright SE (2008) ISOcat: Corralling data categories in the wild. In: Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC’08), Marrakech, Morocco
  7. Schuurman I, Windhouwer M (2011) Explicit semantics for enriched documents. What do ISOcat, RELcat and SCHEMAcat have to offer? In: Proceedings of the 2nd Supporting Digital Humanities Conference, Copenhagen, Denmark
  8. Simons G, Bird S (2003) The open language archives community: An infrastructure for distributed archiving of language resources. Literary and Linguistic Computing 18(2):117–128
  9. Windhouwer M, Wright SE, Kemps-Snijders M (2010) Referencing ISOcat data categories. In: Budin G, Declerck T, Romary L, Wittenburg P (eds) Proceedings of the LREC 2010 LRT standards workshop, Malta

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  1. Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
