Environmental and Earth Science

Answering the key challenges that environmental issues such as climate change, pollution and loss of biodiversity pose for society, and making the right decisions to tackle them in a cost-efficient and sustainable way, requires scientific understanding of the Earth System. This scientific knowledge can then be used to inform the general public and policymakers. Scientific understanding starts with the availability of the right data, often in the form of observations. Research Infrastructures (RIs) exist to perform these observations at the required quality and to make the data available, first and foremost to researchers. In the current Big Data era, the growing challenge is to provide the data in an interoperable form that is both machine-readable and machine-understandable. The European environmental RIs have formed a project cluster called ENVRI to tackle these issues. In this chapter, we introduce the societal relevance of the environmental data produced by the RIs and discuss the issues involved in providing the relevant data according to the so-called FAIR principles.

In August 2016, the Anthropocene Working Group of the Subcommission on Quaternary Stratigraphy 1 of the International Commission on Stratigraphy 2 officially voted to define our time as the Anthropocene in the Geological Time Scale. Ratification of this Anthropocene epoch by the International Commission on Stratigraphy of the International Union of Geological Sciences 3 is pending, owing to a discussion on where this period should begin (with the beginning of the Agricultural Revolution about 12,000 years ago, or only with the so-called Great Acceleration in 1945 A.D.). Nevertheless, we can safely say that we are now in a period in which mankind is the main determinant of the fate of the Earth [1].
Human impacts on climate and biodiversity are the most striking illustrations of the Anthropocene, as demonstrated by the UN IPCC programme in its most recent Fifth Assessment Report on climate [2] and by the very recent 2019 IPBES Global Assessment Report on Biodiversity and Ecosystem Services 4 . Global rates of extinction have been shown to be on the rise since at least 1500 and are now accelerating at an unparalleled pace. A recent estimate is that since the rise of human civilisation, 83% of wild mammals and 50% of plants have already been lost [8]. The use of fossil fuels since the Industrial Revolution has increased the global average atmospheric CO2 concentration from the 180-280 ppm range that prevailed over the past million years to more than 405 ppm in 2017 5 .
The human influence on natural resources is increasing due to population and economic growth. In return, natural processes in the solid Earth, climate, ecosphere, terrestrial and marine domains have an increasing effect on mankind, because our societies and economies have become more complex and capital-intensive. Understanding and quantifying these pressures and the resulting changes is a requirement for the sustainable development of our societies through fact-based decision making. Assessments of changes in environmental conditions and their relationship with the driving forces must be based on trustworthy and well-documented observations. This is not an easy task: there are many interactions between changes in the atmosphere, land and hydrosphere, and these changes and their resulting impacts on ecosystems all need dedicated, high-quality, long-term observations. Better observations and data on these important preconditions are therefore needed to inform decision makers about the measures required to maintain a thriving society. Research infrastructures are an important element in providing the information required to support science and fact-based policy development.

Supporting Sustainable Development with Data
The United Nations Sustainable Development Goals are a call for action by all countries (poor, rich and middle-income) to promote prosperity while protecting the planet.
They recognise that ending poverty must go hand in hand with strategies that build economic growth and address a range of social needs, including education, health, social protection and job opportunities, while tackling climate change and protecting the environment. The UN defined a set of 17 Sustainable Development Goals (SDGs), for which data are required in order to develop policies and to evaluate and track progress, as shown in Fig. 1. For the environmental research infrastructures (ENVRI) discussed in this book, most SDGs are relevant, but particularly relevant are Climate Action (Goal 13), Life Below Water (Goal 14) and Life on Land (biodiversity, forests and land degradation) (Goal 15). These SDGs are in turn closely related to others, such as Clean Water and Sanitation (Goal 6), Affordable and Clean Energy (Goal 7), Sustainable Cities and Communities (Goal 11) and Responsible Consumption and Production (Goal 12). One of the global partnerships in the framework of the UN SDGs is the Global Partnership for Sustainable Development Data, with the motto: BETTER DATA. BETTER DECISIONS. BETTER LIVES 6 .

The Role of Research Infrastructures
Research Infrastructures (RIs) of the Environment Domain as defined by ESFRI 7 cover the four main subdomains of the complex Earth system (Atmosphere, Marine, Solid Earth, and Biodiversity/Terrestrial Ecosystems), and together form the cluster of European Environmental and Earth System Research Infrastructures (ENVRI) 8 . The RI facilities were developed to respond to the needs of specific research communities, following the individual requirements and methods of specific disciplines. However, the need for interdisciplinary cooperation has been evident for decades. The ENVRI community has therefore cooperated increasingly within the cluster projects ENVRI (2011-2014, FP7) [9], which paved the way for the ENVRIplus 9 project (2015-2019, H2020) [9,10] and the ENVRI-FAIR 10 project (2019-2022, H2020) [11]. ENVRIplus brought together all subdomains of Earth system science to work together, capitalise on the progress made in the various disciplines, and strengthen interoperability amongst RIs and subdomains.
In Sect. 3, three example cases will be shown where Research Infrastructures from ENVRI provide data to inform policy and society for better decision making with regards to reaching the Sustainable Development Goals.

The ENVRIplus Objectives
The objective of ENVRIplus was to provide common solutions to shared challenges for European Environmental and Earth System Research Infrastructures (RIs) in their efforts to deliver new services for science and society.
To reach this overall goal, ENVRIplus brought together the environmental RIs included in the ESFRI Roadmap, leading preparatory projects, key developing RI networks and specific technical specialist partners to build common synergistic solutions for pressing issues in RI construction and implementation. ENVRIplus was organised around six key objectives, identified as "Themes" as shown in Fig. 2:
1. Improve the ability of RIs to observe the Earth System, in particular through development and testing of new sensor technologies, harmonisation of observation methodologies and development of techniques to overcome common problems associated with distributed remote observation networks;
2. Generate common solutions for shared information technology and data-related challenges of the environmental RIs, especially in data and service discovery and use, workflow documentation, mechanisms for data citation, service virtualisation, and user characterisation and interaction;
3. Develop harmonised policies for access (physical and virtual) to the environmental RIs, including access services for multidisciplinary users;
4. Investigate the interactions between RIs and society, including: finding common approaches and methodologies for assessing the ability of an RI to address economic and societal challenges; developing ethics guidelines for RIs; and investigating the possibility of enhancing the use of Citizen Science in RI products and services;
5. Ensure cross-fertilisation and knowledge exchange between RIs on new technologies, best practices, approaches and policies, by generating training material that instructs RI personnel in the use of the new observational, technological and computational tools, and by facilitating inter-RI knowledge transfer via a staff exchange programme;
6. Create a communication and cooperation framework to coordinate the activities of the environmental RIs for the purposes of common strategic development, improved user interaction and interdisciplinary cross-RI products and services.

Climate Change and Atmospheric Composition Research (ICOS, ACTRIS and IAGOS)
Climate change has been recognised by the United Nations and the European Union as the major environmental challenge for mankind. Research is needed on future climate change scenarios, as climate change will have a dramatic effect on natural environments, plants and animals, accelerating biodiversity loss in some areas. The impacts will have knock-on effects for the many communities and sectors that depend on natural resources, including agriculture, fisheries, energy, tourism and water. The Stern Review [3] stated as early as 2007 that climate change is the greatest and widest-ranging market failure ever seen, presenting a unique challenge for economics. According to the Stern Review, without action, the overall costs of climate change will be equivalent to losing at least 5% of global gross domestic product (GDP) each year, now and forever. Another important area where research-based information is needed for climate policy is the validation of the emission reductions required under the COP21 Paris Climate Agreement of 2015. To keep the global temperature increase caused by human emissions of greenhouse gases below 2.0°C, and preferably below 1.5°C, the world will need to be carbon neutral by 2050. The mitigation measures and the speed of their implementation need to be validated by independent methods and closely monitored. Natural feedbacks due to ongoing climate change will also require attention, as they may force a change in the speed at which mitigation and adaptation measures are implemented.
The data from the Integrated Carbon Observation System (ICOS) 11 Research Infrastructure support climate science and inform scientists and society about natural and human emissions of greenhouse gases and their uptake by the ocean, land ecosystems and atmosphere. The ICOS data portal 12 , which has been set up as a FAIR 13 [4] compliant repository, provides data from over 130 monitoring stations, as shown in Fig. 3. It gives access to high-quality data processed by the Thematic Centres as raw, near-real-time and final quality-controlled data, supplemented with elaborated (model) data and analyses, almost all of which are licensed under a CC BY 4.0 14 license.
The IAGOS 15 research infrastructure provides atmospheric composition information including greenhouse gas observations from commercial aircraft. IAGOS data are being used by researchers worldwide for process studies, trend analysis, validation of climate and air quality models, and the validation of spaceborne data retrievals.
The ACTRIS 16 research infrastructure observes aerosols and their precursors. Aerosols also have a large influence on the Earth's radiation balance and thus on climate, and their concentrations are closely connected to human activities and emissions.
All of these infrastructures are part of a global endeavour to advance science-based, high-quality observations that ultimately allow for better decisions. Their methods and data are therefore based on global, often community-based standards, which ensures interoperability on the global scale, for example with the World Meteorological Organisation (WMO) 17 .

Mitigating the Societal and Economic Impacts of Future Volcanic Eruptions and the Role of the European Plate Observing System (EPOS)
In 2010, the eruption of the Icelandic volcano Eyjafjallajökull sent ash to a height of around 9 km into the atmosphere. Because of the potential damage to aircraft engines from the ash, the ongoing eruption of Eyjafjallajökull (see Fig. 4) from April to June 2010 led to the largest suspension of commercial air traffic since World War II. The closure of European airspace led to the cancellation of large numbers of flights, left millions of passengers stranded and cost airlines an estimated $200 million per day in lost revenue. The total global losses in GDP due to the prolonged inability to move people or goods have been estimated at approximately $4.7 billion. This figure incorporates net losses of both the airline industry and destinations, along with general productivity losses [5]. The eruption also continues to have long-term effects on local inhabitants and the environment, owing to the potential toxicity to humans, animals and plant life, either through direct inhalation of the particulates or through the acid rain that can result from the sulphur in the ash.
Eruptions of Icelandic volcanoes are relatively frequent, with events similar to that of Eyjafjallajökull occurring on average every 20-40 years. In this case, the combination of a volcanic event with the prevailing weather conditions caused significant disruption both within Europe and beyond, with major economic and societal impacts. The potential for this type of event had been recognised previously, but precautionary measures to limit its impact had been limited [6].
To mitigate the effects of future volcanic eruptions and reduce their potential impact, it has become a priority to enhance the monitoring of Icelandic volcanoes, to increase the availability of the data for integrated use by multiple agencies, and to provide timely information to local inhabitants. Enhanced monitoring of volcanoes also allows better disaster response planning at the local, national and international level, in an effort to minimise the impact of future events on both local inhabitants and the wider population.
The European Plate Observing System (EPOS) 18 Research Infrastructure has integrated various solid Earth research facilities, the so-called Thematic Core Services (TCS), into a single framework that facilitates the sharing of data across the solid Earth domain. These facilities range from monitoring networks, such as those delivering real-time seismic data from Icelandic volcanoes, to Global Navigation Satellite System (GNSS) data used for global positioning and navigation. Data services made available by the EPOS research infrastructure, such as those delivered by the Icelandic FUTUREVOLC 19 supersite initiative, can be used by various agencies in Iceland to provide real-time monitoring information for the approximately 130 Icelandic volcanoes known to be either active or potentially active. This information can be used to give local inhabitants early warning of an eruption, and can also be combined with other types of data, such as meteorological information, to predict the likely impact of an eruption. For example, the Icelandic Met Office reports volcanic activity using a colour code that conforms to the International Civil Aviation Organization (ICAO) 20 standard, to inform the aviation industry of potential risks to aircraft from ash plumes associated with an eruption 21 . This allows better modelling of the potential disruption that an eruption may cause, depending on different combinations of prevailing winds, type and volume of ejecta, and duration of the eruption.
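The aviation colour code is a simple ordered scale, so it maps naturally onto a small data type. The Python sketch below is illustrative only: the class and function names are hypothetical, the level descriptions are paraphrased from the four-level ICAO volcano aviation colour code rather than quoted, and the rule that ORANGE and RED trigger a notification is an assumed policy, not the Icelandic Met Office's procedure.

```python
from enum import Enum


class AviationColourCode(Enum):
    """Four-level volcano aviation colour code (descriptions paraphrased)."""
    GREEN = "normal, non-eruptive state"
    YELLOW = "elevated unrest above known background levels"
    ORANGE = "heightened unrest, or eruption under way with little or no ash"
    RED = "eruption imminent or under way with significant ash emission"


# Enum iteration follows definition order, so the index is the severity rank.
SEVERITY = {code: rank for rank, code in enumerate(AviationColourCode)}


def most_severe(codes):
    """Return the most severe code among several monitored volcanoes."""
    return max(codes, key=SEVERITY.__getitem__)


def notify_aviation(code):
    """Hypothetical policy: alert flight planning at ORANGE or above."""
    return SEVERITY[code] >= SEVERITY[AviationColourCode.ORANGE]


current = [AviationColourCode.GREEN, AviationColourCode.ORANGE, AviationColourCode.YELLOW]
worst = most_severe(current)
print(worst.name, notify_aviation(worst))
```

Encoding the order once, rather than comparing colour names ad hoc, keeps the escalation rule in a single place when codes from several volcanoes are combined.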
The ENVRI community brings together environmental research infrastructures from different domains. Integrating EPOS with the RIs focused on atmospheric data and data products provides the necessary framework for modelling the potential impacts of a major volcanic event and for informing the mitigation strategies of the various agencies that need timely information for disaster response and remediation.

The Importance of Data Management to Solve Societal and Scientific Questions for the Oceans (SeaDataNet)
The ocean plays a central role in regulating the Earth's climate [12]. As the International Oceanographic Data and Information Exchange (IODE) 22 states: "The timely, free and unrestricted international exchange of oceanographic data is essential for the efficient acquisition, integration and use of ocean observations gathered by the countries of the world for a wide variety of purposes including the prediction of weather and climate, the operational forecasting of the marine environment, the preservation of life, the mitigation of human-induced changes in the marine and coastal environment, as well as for the advancement of scientific understanding that makes this possible" 23 . Marine data are important and relevant for many uses, such as:
• Scientific research to gain knowledge and insight
• Monitoring and assessment (water quality, climate status, stock)
• Coastal zone management
• Modelling (including hindcast, nowcast, forecast)
• Dimensioning and supporting operations and activities at sea (shipping, offshore industry, and dredging industry)
• Implementation and execution of marine conventions for the protection of the seas, including alignment with international legislation such as the European Marine Strategy Framework Directive (MSFD).
Acquisition of marine data is expensive: the annual cost in Europe is estimated at €1.4 billion (€1 billion for in-situ data and €0.4 billion for satellite data). To achieve IODE's goal of unrestricted exchange of oceanographic data, professional data management is essential, with agreements on standardisation, quality control procedures, long-term archiving, cataloguing and access. The main objective of data management is to ensure safe, long-term storage of data and metadata, so that present and future users are able to use all of the data collected over time.
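As an illustration of what an agreed quality control procedure can look like in practice, the following Python sketch applies a simple range check and attaches a flag to each value instead of deleting it, so that the original observations remain available to future users. The flag values and the valid range are hypothetical and deliberately simplified; they are not the SeaDataNet quality flag scale.

```python
# Hypothetical, simplified quality flags (not an official flag scale)
GOOD, SUSPECT, BAD, MISSING = 1, 3, 4, 9


def range_check(values, valid_min, valid_max, margin):
    """Flag each value: GOOD inside the valid range, SUSPECT within
    `margin` of it, BAD beyond that, MISSING for absent values."""
    flags = []
    for v in values:
        if v is None:
            flags.append(MISSING)
        elif valid_min <= v <= valid_max:
            flags.append(GOOD)
        elif valid_min - margin <= v <= valid_max + margin:
            flags.append(SUSPECT)
        else:
            flags.append(BAD)
    return flags


# Example: sea temperatures in degrees Celsius, assumed valid range -2 to 35
temps = [12.3, None, 35.5, -5.0]
print(list(zip(temps, range_check(temps, -2.0, 35.0, 1.0))))
```

Because the flags travel with the data, a later user can reapply a stricter or looser threshold without the original values having been lost.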
SeaDataNet 24 is a pan-European infrastructure set up and operated for managing marine and ocean data in cooperation with the National Oceanographic Data Centres (NODCs) and data focal points of 34 countries bordering the European seas, as shown in Fig. 5. SeaDataNet's significant contribution to the ocean data landscape comes from establishing collaboration across the partners and agreements on the consistent use of standards and controlled vocabularies for data annotation, formatting and discovery. SeaDataCloud, the EU project currently driving the further development of the SeaDataNet infrastructure, will deliver a collaborative, high-performance cloud and virtual research environment (VRE), configured with tools and services for processing essential marine data. Using Open Geospatial Consortium (OGC), ISO and World Wide Web Consortium (W3C) standards and incorporating scientific expertise, dynamic workflows are configured for analysing, processing and combining subsets of data. The VRE and workflows will allow data product teams to work more efficiently when processing large numbers of input datasets and generating data products collaboratively, while also adopting innovations such as machine learning for QA/QC of large data collections. In this way, the production cycle for data products can be shortened and higher-quality products can be achieved. One of the challenges is to make the SeaDataNet data, metadata and related services more FAIR [4]. This focuses on improving and optimising Findability, Accessibility, Interoperability and Re-usability, both for machines and for people, with an emphasis on machines. As part of improving the FAIRness of SeaDataNet services, several activities are planned, and some have already been undertaken.

The ENVRIplus Data to Science Theme
Environmental research infrastructures are important pillars not only for supporting their own communities, but also (a) for interdisciplinary research, (b) for the European Earth Observation Programme Copernicus 25 , and (c) as a contribution to the Global Earth Observation System of Systems (GEOSS 26 ). It is therefore very important that the data-related activities of the environmental RIs are well integrated. This requires common policies, models and e-infrastructure to optimise technological implementation, define workflows, and ensure coordination, harmonisation, integration and interoperability of data, applications and other services between ESFRI and other research infrastructure initiatives.
The key is a common metadata system that uses a rich metadata model with formal syntax and declared semantics, acting as the 'switchboard' for interoperation. Metadata is used to characterise data, services, users and ICT resources (including sensors and detectors). This approach provides an e-infrastructure that is virtualised for end-users, but within which expert domain users and ICT experts can work to provide improved services as requirements evolve.
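The 'switchboard' idea can be sketched with a few linked metadata records: because datasets, services and ICT resources are described in one model, a query can connect a dataset to a service that accepts its format and to the resource that service runs on. All record types, field names and identifiers below are hypothetical, and a real catalogue would use a formal metadata standard rather than Python classes.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    pid: str
    variable: str      # term taken from a controlled vocabulary
    data_format: str


@dataclass
class Service:
    name: str
    accepted_formats: frozenset
    runs_on: str       # identifier of a Resource record


@dataclass
class Resource:
    rid: str
    kind: str          # e.g. "cloud", "grid", "hpc"


def match(dataset, services, resources):
    """Return (service name, resource kind) pairs able to process the dataset."""
    by_id = {r.rid: r for r in resources}
    return [
        (s.name, by_id[s.runs_on].kind)
        for s in services
        if dataset.data_format in s.accepted_formats and s.runs_on in by_id
    ]


ds = Dataset("example-pid-0001", "sea_water_temperature", "netcdf")
services = [
    Service("gridding", frozenset({"netcdf", "csv"}), "res-1"),
    Service("plotting", frozenset({"csv"}), "res-2"),
]
resources = [Resource("res-1", "cloud"), Resource("res-2", "hpc")]
print(match(ds, services, resources))
```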
The objectives of the ENVRIplus Data to Science theme were to:
• optimise data processing and develop common models, rules and guidelines for research data workflow documentation;
• facilitate data discovery and use, and provide integrated end-user information technology to access heterogeneous data sources;
• make data citable by developing existing approaches with practical examples, exchange of expertise, and agreements with publishers;
• facilitate the discovery of software services and their composition;
• characterise users and build a community evolving from current RI communities;
• characterise ICT resources (including sensors and detectors) to allow virtualisation of the environment (for instance onto Grid- or Cloud-based platforms), such that data and information management and analysis are optimised in their use of resources and energy;
• facilitate the connection of users, composed software services, appropriate data and necessary resources in order to meet end-user requirements.
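Making data citable largely comes down to holding enough metadata to generate a stable reference. The sketch below assembles a citation string loosely modelled on the DataCite recommended form (creators, year, title, version, publisher, identifier); the function name, the metadata keys and the DOI are hypothetical examples.

```python
def format_citation(meta):
    """Build a data citation string from a metadata dictionary."""
    creators = "; ".join(meta["creators"])
    parts = [f"{creators} ({meta['year']}): {meta['title']}."]
    if meta.get("version"):
        parts.append(f"Version {meta['version']}.")
    parts.append(f"{meta['publisher']}.")
    # Cite the persistent identifier in resolvable form
    parts.append(f"https://doi.org/{meta['doi']}")
    return " ".join(parts)


record = {
    "creators": ["Doe, J.", "Roe, R."],
    "year": 2019,
    "title": "Example station time series",
    "version": "1.0",
    "publisher": "Example RI Data Portal",
    "doi": "10.1234/example",  # hypothetical DOI
}
print(format_citation(record))
```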
To maximise re-use of existing technologies and solutions, this theme conducted an in-depth review of the results from the ESFRIs (such as ICOS, Euro-Argo, EPOS and SIOS) [7], and interacted closely with computational e-Infrastructures (such as EGI and CLOUD Nebula), platforms (such as DIRAC), data infrastructures (such as EUDAT CDI and D4Science), and other initiatives working on related issues, such as the European Open Science Cloud (EOSC), which was initiated during the ENVRIplus project.

The FAIR Principles as Guidelines for Data Management
The term FAIR, denoting a set of guiding principles to make data Findable, Accessible, Interoperable and Reusable, was developed in 2014; the principles were published two years later [4].
Based on these 15 principles, a set of 14 metrics has been defined to quantify levels of FAIRness. The latest developments on FAIR are available from GO-FAIR 27 . The FAIR principles are characterised as:

Findable
• F1. (meta)data are assigned a globally unique and eternally persistent identifier.
• F2. data are described with rich metadata.
• F3. metadata clearly and explicitly include the identifier of the data it describes.
• F4. (meta)data are registered or indexed in a searchable resource.

Accessible
• A1 (meta)data are retrievable by their identifier using a standardised communications protocol.
- A1.1 the protocol is open, free, and universally implementable.
- A1.2 the protocol allows for an authentication and authorization procedure, where necessary.
• A2 metadata are accessible, even when the data are no longer available.

Interoperable
• I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
• I2. (meta)data use vocabularies that follow FAIR principles.
• I3. (meta)data include qualified references to other (meta)data.

Re-usable
• R1. meta(data) are richly described with a plurality of accurate and relevant attributes.
- R1.1. (meta)data are released with a clear and accessible data usage license.
- R1.2. (meta)data are associated with detailed provenance.
- R1.3. (meta)data meet domain-relevant community standards.

Although good data management is not a goal in itself, it is a necessary condition for innovation, knowledge creation, data and knowledge integration, and reuse of data by other users. At present, many factors are missing or inadequately implemented, and there are many institutional barriers that limit the deployment of research data. This situation can be improved by systematically applying these principles to maximise the FAIRness of data management.
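To make the principles more concrete, the Python sketch below runs a handful of automated checks on a metadata record, each loosely tied to one principle. It is not an implementation of the 14 FAIR metrics mentioned above; the record fields, the accepted identifier prefixes and the pass criteria are all assumptions chosen for illustration.

```python
from urllib.parse import urlparse

# Assumed resolvable persistent-identifier prefixes
PID_PREFIXES = ("https://doi.org/", "https://hdl.handle.net/")


def is_uri(term):
    return urlparse(term).scheme in ("http", "https")


def fair_checks(record):
    """Return a pass/fail result per principle for one metadata record."""
    terms = record.get("variable_terms", [])
    return {
        "F1": record.get("identifier", "").startswith(PID_PREFIXES),
        "F4": bool(record.get("indexed_in")),
        "A1": urlparse(record.get("access_url", "")).scheme == "https",
        "I2": bool(terms) and all(is_uri(t) for t in terms),
        "R1.1": bool(record.get("license")),
    }


record = {
    "identifier": "https://doi.org/10.1234/example",  # hypothetical DOI
    "indexed_in": ["example-catalogue"],
    "access_url": "https://data.example.org/datasets/1",
    "variable_terms": ["sea_water_temperature"],  # plain label, not a URI
    "license": "CC BY 4.0",
}
print(fair_checks(record))
```

In this example the record fails the I2 check because its variable is given as a plain label rather than as a resolvable term from a published vocabulary.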

Challenges
There are many challenges for ENVRIs on the way to becoming fully FAIR-compliant. To begin with, the concept of FAIRness is still evolving, has different interpretations depending on the community of practice, and continues to be discussed in different fora such as the Research Data Alliance (RDA 28 ) and the GoFAIR 29 initiative.
One of the biggest challenges for RIs is that most of them are already (partly) operational and rely largely on legacy database and metadata systems that were built years, or in some cases decades, ago, and that are based on highly specialised and sometimes informal and dynamically generated community standards. RIs cannot simply redesign existing systems, and cannot afford system downtime, as this would interrupt their services to users and might even lead to unacceptable data loss.
In addition, the underlying databases are often rigid relational database systems that have been optimised for performance to serve the designated user community of the RI, and in some cases utilise proprietary software that requires authentication and authorisation through custom systems. This complicates the accessibility of the systems and hampers the linking to external catalogues necessary for enhanced findability of the data. These challenges will be discussed further in Chapter 3 of this book.
Interoperability has many facets, one of which is the translation of community standards into more generally usable metadata standards. Translating from one metadata standard into another (machine-operable) metadata standard risks loss of information or even the introduction of errors, which would hamper acceptance by the scientific communities involved. An important first step towards interoperability is therefore the development of controlled vocabularies and data type registries that document and stabilise the community standards.
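The risk of information loss can be made explicit by having the translation report what it could not map. The sketch below translates a record from a community schema into a more generic one through a field crosswalk and a controlled vocabulary. Both mappings are hypothetical; the community codes TEMP and PSAL are used only as examples, and the target terms follow the style of CF standard names.

```python
# Hypothetical crosswalk: community field name -> generic field name
CROSSWALK = {
    "station_code": "identifier",
    "param": "variable",
    "start": "temporal_start",
    "end": "temporal_end",
}

# Hypothetical controlled vocabulary: community code -> shared term
VOCAB = {
    "TEMP": "sea_water_temperature",
    "PSAL": "sea_water_practical_salinity",
}


def translate(record):
    """Translate a record and list every item that could not be mapped."""
    out, unmapped = {}, []
    for key, value in record.items():
        target = CROSSWALK.get(key)
        if target is None:
            unmapped.append(key)  # no equivalent field in the target schema
            continue
        if key == "param":
            if value not in VOCAB:
                unmapped.append(f"{key}={value}")  # unknown vocabulary term
                continue
            value = VOCAB[value]
        out[target] = value
    return out, unmapped


source = {"station_code": "ST01", "param": "TEMP",
          "start": "2019-01-01", "instrument_note": "sensor replaced"}
print(translate(source))
```

Reporting unmapped items, rather than silently dropping them, lets a community review exactly what a translation loses; the vocabulary lookup also shows why a stable, documented vocabulary has to exist before such a translation can be trusted.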