Introduction

In 2016, Wilkinson et al. introduced the FAIR principles of data management: that research data should be Findable, Accessible, Interoperable and Reusable. To meet the requirements of the Findable aspect of FAIR data, a dataset must be described by rich metadata in a searchable resource and must be assigned a clearly labelled, persistent, unique identifier. The metadata describing the data resource should be released with a clear data usage license and detailed data provenance, and should meet domain-relevant community standards.

Online metadata catalogues for environmental data have been documented since early in the history of the World Wide Web (Günther et al., 1996). This early approach focused on helping users to discover, or find, data relevant to a given topic and to access it quickly in a user-friendly manner. It was also beneficial in meeting legislative requirements, such as the European Access to Information on the Environment Regulations (European Parliament, 2003). Even at this stage the semantic interoperability of metadata and data was recognised as important. As new paradigms of data have come to the fore, this issue has only grown in importance (Hilbring & Usländer, 2006; Proctor et al., 2010; Tanhua et al., 2019).

In the context of environmental Big Data, Vitolo et al. (2015) call for the use of data catalogues to allow the discovery of data services and their functionality. However, they point out that semantic heterogeneity is a hurdle which must be overcome when searching through catalogue services. Leadbetter et al. (2014) and Leadbetter & Vodden (2016) demonstrate how interoperable, homogeneous semantics can provide improved knowledge-building and cross-disciplinary data integration in environmental data catalogues.

One such cross-disciplinary activity is Marine Spatial Planning (MSP), which is concerned with managing the distribution of human activities in space and time in and around seas and oceans to achieve ecological, economic and societal objectives and outcomes (Ehler et al., 2019). Nylén et al. (2019) include the establishment of a metadata catalogue for the data to be used in the process as one of the steps in their MSP data workflow. Within their framework, the catalogue should be able to differentiate between the original versions of existing spatial data and newly created data products derived from one or more original datasets, and should record the processing steps taken to generate the new data products. The data catalogue should also be able to handle both observed and modelled data and, for modelled data, provide information on the input parameters to the model and the methods employed by the model. Flynn et al. (2019) conclude that a data cataloguing system for MSP allows the availability and suitability of data for the MSP process to be assessed at regular review cycles. Friddell et al. (2014) demonstrate that in other cross-disciplinary topics, in their case polar research, modularity is required in order to represent datasets, projects or programmes, and other polar data resources within the catalogue system.

Marine Spatial Planning is also a European legislative requirement (European Parliament, 2014), as are other data integration programmes including the Marine Strategy Framework Directive (European Parliament, 2008) and the INSPIRE Spatial Data Infrastructure. A data catalogue should recognise these targets and look to meet the technical requirements they set, as well as highlighting which datasets may be relevant to them. These include, for example, the delivery of ISO 19115/19139 standard metadata to comply with the INSPIRE Spatial Data Infrastructure (Craglia & Annoni, 2007). In addition to legislative requirements, community standards should also be adhered to, such as the European Directory of Marine Environmental Datasets (Schaap & Lowry, 2010) and the Marine Community Profile (Proctor et al., 2010).

Therefore, in the sphere of marine science data management, the need for a modular approach to data cataloguing designed to meet the requirements highlighted above (see Table 1) can be clearly seen. In this paper we describe a data cataloguing system developed at, and in use at, the Marine Institute, Ireland. We expand on the data model used in developing the catalogue; describe the approach taken to implementing it; and discuss our findings and future work.

Table 1 Functional requirements for a marine data cataloguing system

Data model

The data model used within this modular catalogue is focused on a number of high-level concepts and their inter-relationships, illustrated in Fig. 1. These concepts are developed as modular classes within the data model and are described below. Examples of instances of the classes are given in the text and are summarised in Table 2.

Fig. 1

A high-level overview of the data model used in the modular data catalogue approach. The overall class structure is shown in the Unified Modelling Language

Table 2 Examples of instances of the classes in the Data Catalogue data model

Dataset

First is the high-level Dataset class (Fig. 2). A Dataset may combine many different parameters, collected at multiple times and locations, using different instruments. A Dataset is linked to its storage and retention information and to the classification, including licensing, associated with the Dataset under a machine-actionable data policy. This machine-actionable data policy is derived from a set of business rules associated with the data classifications laid out in the institutional data policy (such as Marine Institute, 2017). Therefore, a Dataset which is marked as containing personal data, as defined by the European General Data Protection Regulation (Voigt & Von dem Bussche, 2017), or business-sensitive data will not be made publicly available. Examples of a Dataset include an institution’s entire research vessel Conductivity-Temperature-Depth profile archive; or a spatial dataset such as the distribution and abundance of cetacean species within an exclusive economic zone.
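As an illustration, a minimal sketch of how such a business rule might be made machine actionable follows; the classification labels, class and function names are hypothetical, not those of the Marine Institute's actual policy.

```python
from dataclasses import dataclass

# Hypothetical classification labels; in practice these come from the
# institutional data policy (e.g. Marine Institute, 2017).
RESTRICTED_CLASSIFICATIONS = {"personal-data", "business-sensitive"}

@dataclass
class DatasetRecord:
    title: str
    classification: str  # assigned under the institutional data policy
    licence: str

def publicly_releasable(record: DatasetRecord) -> bool:
    """Business rule: records classified as containing personal data
    (per the GDPR) or business-sensitive data are never made public."""
    return record.classification not in RESTRICTED_CLASSIFICATIONS

ctd_archive = DatasetRecord(
    title="Research vessel CTD profile archive",
    classification="open",
    licence="CC-BY-4.0")
assert publicly_releasable(ctd_archive)
```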

Fig. 2

A more detailed UML view of the Dataset, Dataset Collection Activity and Dataset Collection classes

Dataset Collection Activity

Related to a Dataset is a Dataset Collection Activity (Fig. 2). This class specialises the Dataset in that it has a mandatory end date and also a mandatory platform element, which indicates the vehicles, structures or organisms capable of bearing instruments or tools for the collection of physical, chemical, geological or biological samples or data. Examples of a Dataset Collection Activity include a research vessel survey or cruise; or the deployment of a moored buoy at a specific location for a given time period.
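A minimal sketch of how the mandatory elements of this class might be encoded is given below; the field names are illustrative rather than the catalogue's actual schema.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class DatasetCollectionActivity:
    """Unlike a Dataset, which may be open-ended, a Dataset Collection
    Activity must carry an end date and a platform reference."""
    name: str
    start_date: date
    end_date: date   # mandatory for an Activity
    platform: str    # reference to a Platform instance

survey = DatasetCollectionActivity(
    name="Annual groundfish survey",   # illustrative
    start_date=date(2019, 10, 1),
    end_date=date(2019, 11, 15),
    platform="RV Celtic Explorer")
```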

Platform

Within the INSPIRE spatial data infrastructure, the Environmental Monitoring Facilities component describes an environmental monitoring facility (e.g. a research vessel or a satellite) as a spatial object, together with the observations and measurements linked to that facility (INSPIRE TWG EMF, 2013). The Platform class (Fig. 3) of this catalogue system seeks to carry the attributes required to complete an Environmental Monitoring Facilities instance when combined with details from the Dataset Collection Activity class. It is also synonymous with the GeoLink class Platform, which describes a “physical object of significance enabling observations resulting in a Dataset” (Krisnadhi et al., 2015). To this end a Platform instance is attributed with: its platform type; whether or not it is a mobile platform; which environmental regime it operates in; its operational start date and, if applicable, end date; and which Organisation is responsible for the platform. Where available, the International Council for the Exploration of the Sea platform code is also attributed to the Platform. Example instances of the Platform class include a research vessel, such as the RV Celtic Explorer, or an individual Argo programme drifting profiling float.
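Sketched as a data structure, under the assumption that the field names below map one-to-one onto the attributes described above (the dates and ICES code shown are purely illustrative):

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Platform:
    name: str
    platform_type: str               # from a controlled vocabulary
    is_mobile: bool
    environmental_regime: str        # e.g. "marine"
    operational_start: date
    operational_end: Optional[date]  # None while still in service
    responsible_organisation: str
    ices_platform_code: Optional[str] = None  # where available

celtic_explorer = Platform(
    name="RV Celtic Explorer",
    platform_type="research vessel",
    is_mobile=True,
    environmental_regime="marine",
    operational_start=date(2003, 1, 1),   # illustrative date
    operational_end=None,
    responsible_organisation="Marine Institute",
    ices_platform_code="45CE")            # illustrative code
```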

Fig. 3

A more detailed UML view of the Geographic Feature, Organisation, Platform and Programme classes

Dataset Collection

The Dataset Collection class (Fig. 2) provides the link between a Dataset Collection Activity (e.g. a research vessel based survey; a deployment of a mooring) and a Dataset. As such, the Dataset Collection may be a subset of both the data collected by the Dataset Collection Activity (a limited set of the full parameters from that Activity) and the Dataset (possibly limited in time and/or parameter space). The Dataset Collection is linked to both a Dataset Collection Activity and a Dataset, and to the Device(s) used to sample the environment for a given range of parameters. An example of a Dataset Collection may be the Conductivity-Temperature-Depth profiles taken on a research vessel survey, allowing the individual sensors to be connected to the activity and the calibration of those sensors to be connected with the associated measurements. A further example could be the time series of atmospheric weather conditions recorded during the deployment of a sea-surface monitoring buoy, which allows the change of sensors at service intervals of the buoy to be properly tracked within the catalogue.
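A sketch of the linking role of this class, with hypothetical identifiers, might look as follows:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class DatasetCollection:
    """Links a Dataset Collection Activity to a Dataset and records the
    Device(s) used, so that sensor changes and calibrations remain
    traceable to the associated measurements."""
    dataset_id: str
    activity_id: str
    device_ids: List[str] = field(default_factory=list)
    parameter_codes: List[str] = field(default_factory=list)

ctd_profiles = DatasetCollection(
    dataset_id="national-ctd-archive",     # hypothetical identifiers
    activity_id="groundfish-survey-2019",
    device_ids=["ctd-serial-0423"],
    parameter_codes=["PSALCU01"])          # BODC P01 code (see Listing 1)
```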

Geographic Feature

A Geographic Feature (Fig. 3) is a mandatory attribute of a Dataset Collection Activity, and a recommended attribute of a Dataset. The Geographic Feature within this data catalogue model is closely related to the Open Geospatial Consortium and International Organisation for Standardisation’s Simple Feature Access model (Herring, 2011). To this end, the Geographic Feature class stores the geographic coordinates of points, lines, and polygons, and the feature type for both the Simple Feature Access model and the European Commission’s INSPIRE spatial data infrastructure. An instance of the Geographic Feature class may be attributed as a child of another Geographic Feature in order to build hierarchical networks of Geographic Features, such as river catchments and sea areas. Further attributes of a Feature within this model are the Coordinate Reference System used to define the latitude and longitude of the point, line or polygon; a URL to a definition of the Geographic Feature; and an organisation responsible for the Geographic Feature. Example instances of the class are a sampling location; a research vessel survey track; or a polygon defining a lake or river catchment area.
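As a sketch, the class might be represented as below, using the shapely library for Simple Feature Access geometries; the station name and coordinates are illustrative.

```python
from dataclasses import dataclass
from typing import Optional

from shapely.geometry import Point  # Simple Feature Access geometries

@dataclass
class GeographicFeature:
    name: str
    geometry_wkt: str                # point, line or polygon as WKT
    feature_type: str                # Simple Feature Access / INSPIRE type
    crs: str                         # Coordinate Reference System
    definition_url: Optional[str] = None
    responsible_organisation: Optional[str] = None
    parent: Optional["GeographicFeature"] = None  # hierarchy support

# An illustrative sampling location, expressed in WGS 84 (EPSG:4326)
station = GeographicFeature(
    name="Weather buoy station",
    geometry_wkt=Point(-10.55, 51.22).wkt,   # "POINT (-10.55 51.22)"
    feature_type="Point",
    crs="EPSG:4326")
```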

Programme

The Programme class (Fig. 3) is similar in scope to the EarthCube GeoLink ontology’s Program class in that instances represent a “formally recognized scientific effort receiving significant funding, requiring large scale coordination” (Krisnadhi et al., 2015). An instance of the Programme class may have a coordinating organisation, a number of contributing and funding organisations, and the name of an individual who is the principal investigator of the Programme. A Programme is time-bound by a start date and an optional end date, and may have a URL link to a website describing the Programme. A Programme may have a number of deliverables associated with it. An instance of the Programme class may also be the child of another instance of the same class.

Device

As stated above, a Dataset Collection Activity takes place via a Platform and is linked to a Dataset through a Dataset Collection which describes the deployment of a Device on a Platform. The Device class (Fig. 4) is designed to allow a SensorML (Botts & Robin, 2007) record to be constructed for a given Device instance. As such, an instance of the Device class carries the input and output parameters of the Device, its measurement units, its manufacturer, its operating organisation, and its start and end dates. It also carries links to the documentation regarding the calibration history of the Device. The Device class is more detailed than the similar GeoLink class Instrument as it holds the Device's serial number as well as the instrument type from a controlled vocabulary.
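A sketch of the Device class as a data structure, from which a SensorML record could then be rendered; the field names are assumptions based on the description above, not the catalogue's actual schema.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

@dataclass
class Device:
    """Carries the attributes needed to construct a SensorML record."""
    instrument_type: str   # from a controlled vocabulary
    serial_number: str     # distinguishes Device from GeoLink's Instrument
    manufacturer: str
    operating_organisation: str
    start_date: date
    end_date: Optional[date] = None
    input_parameters: List[str] = field(default_factory=list)
    output_parameters: List[str] = field(default_factory=list)
    measurement_units: List[str] = field(default_factory=list)
    calibration_documents: List[str] = field(default_factory=list)  # URLs
```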

Fig. 4

A more detailed UML view of the Device class

Organisation

The Organisation class (Fig. 3) is designed to capture the details of research institutes, data holding centres, monitoring agencies, and governmental and private organisations that are in one way or another engaged in oceanographic and marine research activities, data and information management, and/or data acquisition activities. It is synonymous with the GeoLink Organisation class, but is more detailed in its attribution. Attributes include the full postal address of the organisation and institutional contact details (email, telephone, fax number, web site), which are used instead of personal contact details in any publicly available metadata in order to comply with the European General Data Protection Regulation. A link to the page from which the information was collected is maintained. Where an organisation has an entry in the European Directory of Marine Organisations (Schaap & Lowry, 2010), the unique identifier from that directory is also assigned to the Organisation record here.

Re-use of community-managed controlled vocabulary terms

Many attributes of the classes in the data model are constrained against well-managed, community-governed controlled vocabularies, which addresses one of the Interoperability aspects of the FAIR principles. These are highlighted in Table 3. Controlled vocabularies provide consistency in the labelling of metadata and, when published online, allow for interoperability through accessing labels and definitions through web services (Schaap & Lowry, 2010). Controlled vocabularies which publish a hierarchy of terms, that is a “thesaurus” (McGuinness, 2002), allow the coarse-grained terminology often used as a data discovery vector to be inferred from the fine-grained terminology which is important in usage metadata (see Fig. 5). Rather than storing a local copy of the full hierarchy of the vocabulary terms, the data catalogue solution presented here tags its entities only with the finest-grained vocabulary terms; when coarser-grained terms need to be attributed to a dataset for discovery purposes, these are inferred from queries to web services at the vocabulary service host organisations. Listing 1 shows an example query in SPARQL (the query language for semantic databases) which builds up the hierarchy, illustrated in Fig. 5, for a parameter usage vocabulary term.

Table 3 The use of community-governed controlled vocabularies to constrain various properties within the data model
Fig. 5

Hierarchy of inferred vocabulary terms as a result of tagging a Dataset with a term from the British Oceanographic Data Centre (BODC) Parameter Usage Vocabulary. The codes in brackets - e.g. P01, P02 - indicate the collection identifier from the NERC Vocabulary Server, such that http://vocab.nerc.ac.uk/collection/P01/current/ returns the BODC Parameter Usage Vocabulary. MSFD indicates the European Commission’s Marine Strategy Framework Directive; ISO indicates the International Organisation for Standardisation; GEMET indicates the European Environment Agency’s General Multilingual Environmental Thesaurus; and INSPIRE is the European Commission’s Spatial Data Infrastructure

Listing 1

An example SPARQL query to be issued against the NERC Vocabulary Server to build the hierarchy shown in Fig. 5 for the code which represents “Practical salinity of the water body by CTD and computation using UNESCO 1983 algorithm and NO calibration against independent measurements” with the URL http://vocab.nerc.ac.uk/collection/P01/current/PSALCU01/
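A minimal sketch of a query of this kind, issued from Python, is shown below; the SPARQL endpoint address and the reliance on SKOS broader mappings are assumptions to be verified against the NERC Vocabulary Server documentation.

```python
import requests

# Assumed endpoint; check the NERC Vocabulary Server documentation.
ENDPOINT = "https://vocab.nerc.ac.uk/sparql/sparql"

# Walk the SKOS "broader" relationships upward from the fine-grained
# P01 term to recover the coarser discovery-level terms (cf. Fig. 5).
QUERY = """
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?broader ?label WHERE {
  <http://vocab.nerc.ac.uk/collection/P01/current/PSALCU01/>
      skos:broader+ ?broader .
  ?broader skos:prefLabel ?label .
}
"""

response = requests.get(
    ENDPOINT,
    params={"query": QUERY},
    headers={"Accept": "application/sparql-results+json"})
response.raise_for_status()
for row in response.json()["results"]["bindings"]:
    print(row["broader"]["value"], "-", row["label"]["value"])
```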

Implementation

In order to implement the data model described above, the architecture described below and illustrated in Fig. 6 has been adopted.

Fig. 6

A high-level view of the adopted system architecture. The component numbers are identified in the main body of the text

The first component (component 1 in Fig. 6) is an internal repository of metadata, developed using the Drupal content management system. Drupal is an open-source, community-based framework enabling rapid development of web applications and is particularly suited to content management systems such as the Data Catalogue. The flexible native content management ability of a framework such as Drupal was key to the decision to capture metadata in it rather than in a more familiar data cataloguing platform such as CKAN or GeoNetwork. In addition to core Drupal functionality provided ‘out-of-the-box’, the Data Catalogue also makes use of extended functionality through the inclusion of contributed software modules which can be managed within the Drupal framework. It is also possible to develop new modules to provide custom functionality that may not be available as core or contributed modules. It should be noted that:

  • This repository is designed as an internal intranet portal only and not for general public access. A subset of relevant and appropriately classified data descriptions, as defined by the machine-actionable data policy, is shared externally, and only after the criteria for external publication have been met.

  • The Data Catalogue implements role-based access control, allowing user access to be appropriately managed, e.g. limiting create/update privileges to data owners and administrators.

  • The Data Catalogue is available in read-only mode to any users already authenticated on the internal network. In this case a restricted view is provided, ensuring that any restricted-access information is hidden.

The Data Catalogue has been developed to export metadata for datasets and services in ISO 19115/19139 based XML format in compliance with the INSPIRE implementing rules for metadata (component 2 of Fig. 6). This allows dataset descriptions and associated information (e.g. owning organisation, programme, sensor information etc.) to be published and/or harvested from the Data Catalogue using industry standard formats and metadata rules. In addition, the internal Catalogue supports the DataCite metadata schema, allowing a completed data description entry to be exported in support of the minting of Digital Object Identifiers (DOIs) for published data. The assignment of a DOI to a dataset is a well-documented paradigm which allows data to be cited within the scientific literature, and allows a data centre or data publishing organisation to assert that an assessment of the technical quality (metadata, data format) of the dataset has been passed and that the data will be maintained and served for the foreseeable future (Callaghan et al., 2012). The dataset citation is the only place in the Data Catalogue system where individuals’ names are made publicly available alongside the dataset, as dataset authors. As can be seen in Fig. 2, it is not the only place where individuals’ names are stored in the Data Catalogue system; in these other occurrences only organisational-level contact information (such as an info@example.org email address) and contact points based on an individual’s organisational affiliation are made available to public view. The operational procedure established around this is to obtain explicit consent from all dataset authors prior to the publication step, in order to comply with the European General Data Protection Regulation. When a DOI is assigned to an entity in the Data Catalogue, it is recommended best practice to create and store the shortened form of the DOI from the ShortDOI.org service at the time the DOI is minted.
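A minimal sketch of retrieving that shortened form, assuming the JSON interface documented at ShortDOI.org and using a well-known DOI purely for illustration:

```python
import requests

def short_doi(doi: str) -> str:
    """Ask the ShortDOI.org service for the shortened form of a DOI.
    Assumes the service's documented JSON response, which includes a
    'ShortDOI' key."""
    response = requests.get(f"https://shortdoi.org/{doi}",
                            params={"format": "json"})
    response.raise_for_status()
    return response.json()["ShortDOI"]

# Illustrative call with the DOI of the DOI Handbook:
# print(short_doi("10.1000/182"))
```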

A subset of the content maintained within the internal Data Catalogue is shared externally. This publication process makes use of the standard metadata export functionality (XML formatted files) and the external-facing GeoNetwork instance (component 3 of Fig. 6). GeoNetwork is an open-source catalogue application for managing spatially referenced resources; it provides powerful metadata editing and search functions as well as an interactive web map viewer, and is currently used in numerous Spatial Data Infrastructure initiatives worldwide. A custom implementation of GeoNetwork has been developed to serve as the external, public-facing web portal for the Data Catalogue. A number of steps are involved in the publication process, which are described below and illustrated in Fig. 7. Content is regularly exported from the internal data catalogue in ISO 19139 XML. This process can be configured to run as a background task or be manually initiated if updates are required immediately. Publication criteria and rules are applied through the machine-actionable data policy to ensure that only content appropriate for publication is included in the export process. These rules are based on data classification, publication status, licensing etc., and can be updated as required. Once exported, metadata XML files are moved to a central staging area located on the external perimeter network or ‘demilitarized zone’ (DMZ). This serves as the collection point for the GeoNetwork instance. The GeoNetwork instance includes an automated and configurable harvest capability, allowing the previously exported data descriptions to be imported and published on the public-facing portal.
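The export step might be sketched as follows; the function and path names are hypothetical, and the policy check and XML rendering are parameters standing in for the production implementations.

```python
from pathlib import Path

# Hypothetical DMZ staging directory harvested by GeoNetwork.
STAGING_AREA = Path("/dmz/geonetwork-staging")

def export_for_publication(records, is_publishable, to_iso19139_xml):
    """records: iterable of catalogue entries (dicts with an 'id' key).
    is_publishable: the machine-actionable policy check (classification,
    publication status, licensing).
    to_iso19139_xml: callable rendering a record as ISO 19139 XML."""
    for record in records:
        if not is_publishable(record):
            continue  # policy rules keep restricted content internal
        xml = to_iso19139_xml(record)
        (STAGING_AREA / f"{record['id']}.xml").write_text(xml)
```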

Fig. 7

The metadata publication process from internal data catalogue to external GeoNetwork instance. The Data Steward and Data Coordinator Roles are described in Leadbetter et al. (2019)

While not a core component of the Data Catalogue, the external-facing Catalogue, through GeoNetwork, supports integration with other data serving applications (components 4 and 5 of Fig. 6). This allows Data Catalogue users to download, or link to, the underlying data as described in the Data Catalogue. For spatial data this is achieved via Open Geospatial Consortium compliant web services from a GeoServer instance. GeoServer implements a number of standards such as Web Feature Services, Web Map Services, and Web Coverage Services. Another important data serving application is ERDDAP, a data server that gives a simple, consistent way to download subsets of scientific datasets in common file formats and make graphs and maps (Simons, 2019). ERDDAP has been developed by the National Oceanic and Atmospheric Administration in the United States to provide access to data stored in multiple different formats through a web interface and through RESTful URLs as a web service, brokering the storage formats to a number of data delivery formats. ERDDAP is a useful tool in scientific data delivery, not just for marine science, as it can access and serve any tabular or gridded data. These data integration components are included here to provide a complete, unified solution view of metadata and data delivery.
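The flavour of ERDDAP's RESTful interface is sketched below; the server address, dataset identifier and variable names are hypothetical, while the URL pattern follows ERDDAP's documented tabledap convention.

```python
# ERDDAP tabledap request URL:
#   /erddap/tabledap/<datasetID>.<fileType>?<variables>&<constraints>
base = "https://erddap.example.org/erddap/tabledap"   # hypothetical server
dataset_id = "wave_buoy_observations"                 # hypothetical dataset
query = ("time,latitude,longitude,sea_surface_temperature"
         "&time>=2019-01-01T00:00:00Z")
url = f"{base}/{dataset_id}.csv?{query}"
print(url)  # requests a CSV subset of the dataset, filtered by time
```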

This approach provides a clear decoupling and separation of potentially sensitive internal data descriptions and externally published metadata. Only information explicitly categorised as suitable for publication is persisted on the external system. The external-facing portal publishes a ‘read-only’ copy of the metadata contained within the Data Catalogue, which mitigates data loss should the external system be compromised. Core user accounts and system provisioning details are maintained on the internal Drupal system, residing on a controlled and secure network. The publication criteria can be updated and modified at any time if requirements or user needs change. The approach makes use of industry standard metadata and provides an excellent reference for other implementers that may be interested in using metadata in a similar way, for example Ireland’s Open Data Portal (https://data.gov.ie/).

Conclusions

We have presented a reusable, modular approach to cataloguing marine science data which meets a number of functional requirements derived from both academic literature and legislative drivers. The Data Catalogue system presented above also meets, at a base level, the requirements of the FAIR principles of data management (see Table 4). One particular development of note in the “Findability” principle is that the data model is presented within the HTML representations of the metadata landing pages using JSON-LD encoded Schema.org (see Listing 2). This improves the discoverability of the content of the Data Catalogue through exposing it to tools such as Google’s Dataset Search.

Table 4 How the data cataloguing platform described in this paper addresses the requirements of the FAIR principles of Data Management
Listing 2

A Schema.org representation of a dataset from within this data catalogue model. The original record is available at http://data.marine.ie/geonetwork/srv/eng/catalog.search#/metadata/ie.marine.data:dataset.2752
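A skeletal sketch of a Schema.org Dataset description of this kind, generated from Python, is shown below; all values are illustrative placeholders rather than the published record.

```python
import json

dataset_jsonld = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example marine dataset",                 # placeholder values
    "description": "Illustrative dataset description.",
    "url": "https://data.example.org/dataset/1234",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "identifier": "https://doi.org/10.0000/example",  # hypothetical DOI
}

# Embedded in the HTML landing page for harvesting by dataset search tools:
html_snippet = ('<script type="application/ld+json">\n'
                + json.dumps(dataset_jsonld, indent=2)
                + "\n</script>")
print(html_snippet)
```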

Although Table 4 shows a good alignment of the work presented above with the FAIR principles, work remains to complete the formalised representation of the data in structured formats beyond Schema.org and the provenance of the datasets described in the Data Catalogue. Firstly, although GeoNetwork supports a generic Resource Description Framework (Miller, 1998) description of metadata records using the Data Catalog vocabulary (Maali et al., 2014), this requires extension to add specific terms from domain-specific ontologies such as GeoLink. This should also allow for more formalised descriptions of linkages between various datasets, using richer semantics to describe the connections. Better connectivity between datasets and the reports which use them is also required in the future. A further semantic application would be the use of spatial semantics to provide textual geographic search, which requires extensions to the existing structured thesauri describing geographic regions of the sea, such as the SeaVoX salt and freshwater body gazetteer (http://vocab.nerc.ac.uk/collection/C19).
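A minimal sketch of the kind of DCAT description extended with a domain ontology term, using the rdflib library; the GeoLink namespace URI is an assumption and the dataset URI is a placeholder.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, DCTERMS

DCAT = Namespace("http://www.w3.org/ns/dcat#")
GEOLINK = Namespace("http://schema.geolink.org/1.0/base/main#")  # assumed URI

g = Graph()
ds = URIRef("https://data.example.org/dataset/1234")   # placeholder URI
g.add((ds, RDF.type, DCAT.Dataset))                    # generic DCAT typing
g.add((ds, DCTERMS.title, Literal("Example marine dataset")))
g.add((ds, RDF.type, GEOLINK.Dataset))   # domain-specific extension term
print(g.serialize(format="turtle"))
```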

Marine science programmes often collect biological samples in combination with environmental data. A collection of physical samples is analogous to a Dataset, with added complexity due to the samples’ tangibility. For example, a Dataset Collection Activity in the form of a marine research vessel survey may have a primary goal to measure stock abundance of a specific fishery (e.g. haddock or cod). In this example, a large part of the survey will involve taking biological samples (in the form of fish otoliths) for ageing the population to report on stock recruitment. The resulting age dataset will be used to inform policy advice on regulatory measures regarding fishing effort in succeeding years (Marine Institute, 2018). The biological samples (in this case, the otoliths), and the associated fish metadata, are often stored for an extended period of time after the Dataset Collection Activity, for scientific reproducibility and transparency of the age dataset generated. In addition, otoliths can be used for microchemical analyses to investigate fish diet and habitat (Campana & Thorrold, 2001), which can be valuable for fisheries conservation efforts in subsequent years. Therefore, the necessity for appropriate physical and digital storage of biological samples and their associated metadata is evident. We anticipate the development of an optional accessory extension to the Data Catalogue to model biological samples and their associated metadata. The extension will utilise selected concepts from the Data Catalogue, such as Geographic Feature and Programme, but also include additional metadata. For example, in the fisheries use-case, phenotype data (e.g. fish length and weight) will be associated with each biological sample (i.e. each otolith). We expect the physical samples extension to the Data Catalogue to become a useful tool for the long-term archiving and reusability of physical samples resulting from various marine science programmes.

Finally, there is ongoing work in the data catalogue beyond the FAIR principles, as these offer only a base level of good data stewardship (Boeckhout et al., 2018). One example is the automation of assessments of the maturity of the stewardship of datasets within the Data Catalogue system. This takes the Data Stewardship Maturity Framework of Peng et al. (2015) as its starting point and will assess the values encoded for various elements in the Data Catalogue’s data model to produce a rating for a given dataset. As discussed by Flynn et al. (2019), this approach can also be specialised to provide an assessment of the suitability of a dataset for a given application, in the case of their study Marine Spatial Planning.