Linking to Linguistic Data Categories in ISOcat
ISO Technical Committee 37, Terminology and other language and content resources, established an ISO 12620:2009-based Data Category Registry (DCR), called ISOcat (see http://www.isocat.org), to foster semantic interoperability of linguistic resources. This registry follows a grassroots approach, which means that any linguist can add the data categories (s)he needs. Standardized subsets of these data categories are created by a standardization procedure involving groups of international experts who are members of various Thematic Domain Groups (TDGs) and of the DCR Board. However, the goal of improving semantic interoperability can only be met if the data categories are reused by a wide variety of linguistic resource types. A resource indicates its usage of data categories by linking to them. ISO 12620:2009 specifies a small DC Reference XML vocabulary to annotate XML documents with links to data categories. The link is established by a URI, which serves as the Persistent IDentifier (PID) of a data category. Any XML document can now refer to data categories to explicate the semantics of elements, attributes and values. This paper discusses the efforts to mimic the same approach for RDF-based resources. It also introduces the RDF quad-store-based Relation Registry RELcat, which enables ontological relationships between data categories that are not supported by ISOcat and thus adds an extra level of linguistic knowledge.
Keywords: Data Category, Language Resource, Content Resource, Transitive Relationship, Persistent IDentifier
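The linking mechanism summarized in the abstract, where an XML element points to a data category through its PID, can be illustrated with a minimal Python sketch. The namespace URI, the `dcr:datcat` attribute name, the data category identifier, and the `entry`/`pos` element names are assumptions made for illustration and are not taken from this page; the normative vocabulary is defined in ISO 12620:2009.

```python
import xml.etree.ElementTree as ET

# Assumed namespace of the DC Reference vocabulary (illustrative only).
DCR_NS = "http://www.isocat.org/ns/dcr"
# Hypothetical PID of a data category; real PIDs come from the registry.
POS_PID = "http://www.isocat.org/datcat/DC-1345"

# Serialize the namespace with the conventional "dcr" prefix.
ET.register_namespace("dcr", DCR_NS)


def annotate(element, pid):
    """Link an element to a data category by storing its PID in dcr:datcat."""
    element.set(f"{{{DCR_NS}}}datcat", pid)
    return element


def datcat_of(element):
    """Return the PID an element links to, or None if it carries no link."""
    return element.get(f"{{{DCR_NS}}}datcat")


# A hypothetical lexicon fragment: the <pos> element is given explicit semantics.
entry = ET.Element("entry")
pos = ET.SubElement(entry, "pos")
pos.text = "noun"
annotate(pos, POS_PID)

xml_text = ET.tostring(entry, encoding="unicode")
print(xml_text)
```

Because the link is an ordinary namespaced attribute, the host document keeps its own schema: a consumer that understands the vocabulary can call `datcat_of` on any element, while other tools can ignore the attribute.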
- Berners-Lee T (1998) Cool URIs don’t change. Tech. rep., World Wide Web Consortium, http://www.w3.org/Provider/Style/URI.html
- Broeder D, Declerck T, Hinrichs E, Piperidis S, Romary L, Calzolari N, Wittenburg P (2008) Foundation of a component-based flexible registry for language resources and technology. In: Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008), Marrakech, Morocco
- Farrar S, Langendoen DT (2010) An OWL-DL implementation of GOLD: An ontology for the semantic web. In: Witt AW, Metzing D (eds) Linguistic Modeling of Information and Markup Languages: Contributions to Language Technology, Springer
- ISO 12620 (2009) Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources
- ISO 24613 (2008) Language resource management - Lexical markup framework (LMF)
- Kemps-Snijders M, Windhouwer M, Wittenburg P, Wright SE (2008) ISOcat: Corralling data categories in the wild. In: Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC’08), Marrakech, Morocco, http://www.lrec-conf.org/proceedings/lrec2008/
- Schuurman I, Windhouwer M (2011) Explicit semantics for enriched documents. What do ISOcat, RELcat and SCHEMAcat have to offer? In: Proceedings of the 2nd Supporting Digital Humanities Conference, Copenhagen, Denmark
- Windhouwer M, Wright SE, Kemps-Snijders M (2010) Referencing ISOcat data categories. In: Budin G, Declerck T, Romary L, Wittenburg P (eds) Proceedings of the LREC 2010 LRT standards workshop, Malta, http://www.lrec-conf.org/proceedings/lrec2010/workshops/W4.pdf