European Language Grid, pp 151–169

Datasets, Corpora and other Language Resources

  • Victoria Arranz
  • Khalid Choukri
  • Valérie Mapelli
  • Mickaël Rigault
  • Penny Labropoulou
  • Miltos Deligiannis
  • Leon Voukoutis
  • Stelios Piperidis
  • Chapter
  • Open Access
  • First Online: 02 November 2022

Part of the Cognitive Technologies book series (COGTECH)

Abstract

This chapter provides an overview of what is available in the European Language Grid (ELG) in terms of datasets, corpora and other language resources (LRs) and how this has been achieved. We look at the procedures and steps that have been followed to complete the full resource ingestion cycle, which goes from repository and LR identification to metadata description and ingestion. We explain the approaches, priorities and methodology. The chapter also outlines the repositories that have been integrated into ELG, discussing the different procedures followed (metadata conversion, extraction, and completion, as well as harvesting) and the reasons behind these choices. Furthermore, the ELG catalogue content is described, with details on key elements and features as well as accomplishments. The last two sections are devoted, respectively, to the crucial legal issues behind such a complex platform and to its data management plan.

Author information

Authors and Affiliations

  1. ELDA, Paris, France

    Victoria Arranz, Khalid Choukri, Valérie Mapelli & Mickaël Rigault

  2. Institute for Language and Speech Processing, R. C. "Athena", Greece

    Penny Labropoulou, Miltos Deligiannis, Leon Voukoutis & Stelios Piperidis

Corresponding author

Correspondence to Victoria Arranz.

Editor information

Editors and Affiliations

  1. Deutsches Forschungszentrum für Künstliche Intelligenz GmbH (DFKI), Berlin, Germany

    Georg Rehm

Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Copyright information

© 2023 The Author(s)

About this chapter

Cite this chapter

Arranz, V. et al. (2023). Datasets, Corpora and other Language Resources. In: Rehm, G. (eds) European Language Grid. Cognitive Technologies. Springer, Cham. https://doi.org/10.1007/978-3-031-17258-8_8

  • DOI: https://doi.org/10.1007/978-3-031-17258-8_8

  • Published: 02 November 2022

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-17257-1

  • Online ISBN: 978-3-031-17258-8

  • eBook Packages: Computer Science, Computer Science (R0)