Fine-GRAINed Process Metadata

  • Kerstin Jung
  • Markus Gärtner
  • Jonas Kuhn
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1057)

Abstract

We describe the process metadata of GRAIN, a complex language data corpus, as a showcase for the application of metadata in the Digital Humanities. While the creation of language resources usually involves some automatic processing, ranging from format conversion to the labeling of structural features, data selection, inspection and interpretation are important manual steps that tend to be neglected in the description of scientific workflows. GRAIN uses a format that (i) maps all workflow steps to flexible triples of \(\{input,operator,output\}\) and (ii) treats manual and automatic steps equally. Moreover, the process metadata has been generated semi-automatically and allows for a straightforward visualization of how the resource was created.
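
The idea of \(\{input,operator,output\}\) triples can be illustrated with a minimal sketch. It is not the GRAIN format itself: the class `Step`, its field names, the `manual` flag, the helper `to_dot` and the example file names are all hypothetical and chosen only for illustration. The sketch shows how manual and automatic steps might share one structure, and how such triples could be turned into a simple Graphviz DOT graph for visualization.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Step:
    """One workflow step as an {input, operator, output} triple (hypothetical layout)."""
    inputs: Tuple[str, ...]
    operator: str          # a tool or a person; both are recorded the same way
    outputs: Tuple[str, ...]
    manual: bool = False   # descriptive flag only; the triple structure is identical


def to_dot(steps: List[Step]) -> str:
    """Render the steps as a Graphviz DOT digraph: resource -> operator -> resource."""
    lines = ["digraph workflow {"]
    for i, step in enumerate(steps):
        node = f"step{i}"
        # Different node shapes only affect the drawing, not the data model.
        shape = "ellipse" if step.manual else "box"
        lines.append(f'  {node} [label="{step.operator}", shape={shape}];')
        for src in step.inputs:
            lines.append(f'  "{src}" -> {node};')
        for dst in step.outputs:
            lines.append(f'  {node} -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical example data, not taken from the GRAIN metadata.
steps = [
    Step(("audio.wav",), "annotator A (manual transcription)", ("transcript.txt",), manual=True),
    Step(("transcript.txt",), "tokenizer (automatic)", ("tokens.tsv",)),
]
dot = to_dot(steps)
print(dot)
```

The resulting text can be passed to any DOT renderer. Because the manual transcription and the automatic tokenization are both plain `Step` values, a chain of steps can be followed from the final resource back to the raw data without special cases for human work.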

Keywords

Process metadata, Natural language processing, Linguistic annotation

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Institute for Natural Language Processing, University of Stuttgart, Stuttgart, Germany
