Creating & Testing CLARIN Metadata Components

  • Project Note
  • Language Resources and Evaluation

Abstract

The CLARIN Metadata Infrastructure (CMDI), which is being developed within the Common Language Resources and Technology Infrastructure (CLARIN), is a computer-supported framework that combines a flexible component approach with the explicit declaration of semantics. The goal of the Dutch CLARIN project “Creating & Testing CLARIN Metadata Components” was to create metadata components and profiles, according to the CMDI specifications, for a wide variety of existing resources housed at two data centres. In doing so, the project tested the principles of the framework. The results benefit other CLARIN projects that are expected to adhere to the CMDI framework and its accompanying tools.

Notes

  1. http://www.clarin.eu/files/metadata-CLARIN-ShortGuide.pdf.

  2. http://www.isocat.org.

  3. http://www.mpi.nl.

  4. http://www.clarin.eu.

  5. http://www.meertens.knaw.nl.

  6. http://www.inl.nl.

  7. The HLT Agency (http://www.inl.nl/tst-centrale), an initiative of the Dutch Language Union, which also funds it, was set up to provide easy access to, and reuse of, Dutch language resources developed with public funding.

  8. Although CMDI can also be used for creating metadata for tools and web services, in the project these were not taken into account.

  9. http://www.mpi.nl/IMDI/.

  10. http://www.clarin.eu/toolkit.

  11. http://www.clarin.eu/cmdi.

  12. The standardisation process is carried out by domain experts who evaluate data categories and work in Thematic Domain Groups. Each Thematic Domain Group focusses on a specific topic like morphosyntax, metadata or lexicology.

  13. http://catalog.clarin.eu/ds/ComponentRegistry/#.

  14. http://www.isocat.org/interface/index.html.

  15. http://trac.clarin.nl/raw-attachment/wiki/WikiStart/BestPracticeGuide-V4.pdf.

References

  • Barbiers, S., Cornips, L., & Kunst, J. P. (2007). The Syntactic Atlas of the Dutch Dialects: A corpus of elicited speech and text as an on-line dynamic atlas. In J. C. Beal, K. C. Corrigan, & H. Moisl (Eds.), Creating and digitizing language corpora. Volume 1: Synchronic databases. Palgrave Macmillan, Hampshire, pp. 54–90.

  • Beeken, J. C. & van der Kamp, P. (2004). The Centre for Dutch Language and Speech Technology (TST Centre). In Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC), pp. 555–558.

  • Broeder, D., Declerck, T., Hinrichs, E., Piperidis, S., Romary, L., Calzolari, N., & Wittenburg, P. (2008). Foundation of a component-based flexible registry for language resources and technology. In Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC).

  • Broeder, D., & Wittenburg, P. (2006). The IMDI metadata framework, its current application and future direction. International Journal of Metadata, Semantics and Ontologies, 1(2), 119–132.

  • Cucchiarini, C., Driesen, J., Van Hamme, H., & Sanders, E. (2008). Recording Speech of Children, Non-Natives and Elderly People for HLT Applications: The JASMIN-CGN Corpus. In Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC).

  • ISLE Metadata Initiative (IMDI). (2009). Metadata Elements for Catalogue Descriptions. Part 1 B, Version 3.0.13. http://www.mpi.nl/IMDI/documents/Proposals/IMDI_Catalogue_3.0.0.pdf.

  • Kemps-Snijders, M., Windhouwer, M., Wittenburg, P. & Wright, S.E. (2009). ISOcat: Remodeling Metadata for Language Resources. In the special issue on the Open Forum on Metadata Registries of the International Journal of Metadata, Semantics and Ontologies (IJMSO), 4(4), pp. 261–276.

  • Meder, T. (2010). From a Dutch Folktale Database towards an International Folktale Database. Fabula, 51(1/2). Berlin/New York: Walter de Gruyter.

  • NISO. (2004). Understanding Metadata. Bethesda, MD: NISO Press. URL: http://www.niso.org/standards/resources/UnderstandingMetadata.pdf.

  • Simons, G., & Bird, S. (2008). OLAC Metadata. Cited version: http://www.language-archives.org/OLAC/metadata-20080531.html. Latest version: http://www.language-archives.org/OLAC/metadata.html.

  • TEI Text Encoding Initiative. (2009). http://www.tei-c.org/.

  • Váradi, T., Wittenburg, P., Krauwer, S., Wynne, M., & Koskenniemi, K. (2008). CLARIN: Common language resources and technology infrastructure. In Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC).

Acknowledgments

The authors would like to thank Jan Pieter Kunst (Meertens Institute) and Anna Aalstein (INL) for their valuable input during the project. The project reported on in this paper was funded by CLARIN-NL (www.clarin.nl).

Author information

Corresponding author

Correspondence to Folkert de Vriend.

Appendix

See Tables 1 and 2.

About this article

Cite this article

de Vriend, F., Broeder, D., Depoorter, G. et al. Creating & Testing CLARIN Metadata Components. Lang Resources & Evaluation 47, 1315–1326 (2013). https://doi.org/10.1007/s10579-013-9231-6
