Remote Sensing

In this chapter we present the concepts of remote sensing and Earth Observation and explain why several of their characteristics (volume, variety and velocity) lead us to consider Earth Observation data as Big Data. Thereafter, we discuss the most commonly used open data formats for storing and sharing the data. The main sources of Earth Observation data are also described, with particular focus on the constellation of Sentinel satellites, the Copernicus Hub and its six thematic services, as well as other private initiatives such as the five Copernicus-related Data and Information Access Services and Sentinel Hub. Next, we present an overview of representative software technologies for efficiently describing, storing, querying and accessing Earth Observation datasets. The chapter concludes with a summary of the Earth Observation datasets used in each DataBio pilot.


Introduction
Remote sensing is one of the most common ways to extract relevant information about the Earth and our environment. It can be defined as "the acquisition of information about an object or phenomenon without making physical contact with the object and thus in contrast to on-site observation, especially the Earth, including on the surface and in the atmosphere and oceans, based on propagated signals (e.g. electromagnetic radiation)" [1]. The term "remote sensing" was first used in the early 1960s to describe any means of observing the Earth from afar, particularly as applied to aerial photography, the main sensor used at that time. Today, as a result of rapid technological advances, we routinely survey our planet's surface from different platforms: low-altitude unmanned aerial vehicles (UAVs), airplanes and satellites. The surveillance of Earth's terrestrial landscapes, oceans and ice sheets constitutes the main goal of remote sensing techniques [2]. Remote sensing acquisitions, made with both active (synthetic aperture radar, LiDAR) and passive (optical and thermal range, multispectral and hyperspectral) sensors, provide a variety of information about land and ocean processes. In a broader context, remote sensing activities include a wide range of aspects, from the physical basis for obtaining information from a distance, to the operation of the platforms carrying the sensor system, and further to data acquisition, storage and interpretation. The remotely collected data are then converted into relevant information, which is provided to a wide variety of potential end users: farmers, foresters, fishers, hydrologists, geologists, ecologists, geographers, etc.
The use of Earth observation data imposes a series of technological challenges to:
• Combine satellite data with in situ or enterprise data.
• Understand, select, download, conserve and process data.
• Harness a range of scientific and technical skills and manpower.
• Load and store petabytes of data.
• Deploy high-performance processing capabilities.

Earth Observation Relation to Big Data
Different types of Earth observation data have been produced over the last forty years, bringing significant changes in the context of the big data concept. Moreover, precise and up-to-date worldwide Earth observation data are changing the way the Earth is interpreted, leading to applications powered by very large amounts of remote sensing information. In that regard, several characteristics of remote sensing data allow us to consider them as big data:

• Volume
Among the various areas where big data sets have become common, those related to remote sensing and information and communication technology are foremost, since the datasets involved have reached huge dimensions. This makes their visualization, analysis and interpretation exceptionally complex [2]. As early as 2010, the satellite observation networks around the world had more than 200 on-orbit satellite sensors, capturing several gigabytes of information per second [3]. Nowadays, with the advent of the Copernicus programme, with its Sentinel and contributing missions' satellites, and with the entry of the US satellite operator Planet into the commercial market, observation capacities have increased dramatically, adding several petabytes of observations annually. According to the Open Geospatial Consortium (OGC), the worldwide observation information currently most likely surpasses one exabyte.

• Variety
Variety refers to the number of types of data, and concerning remote sensing data, it is specifically linked to structured information such as images obtained by satellite sensors. More specifically, in this context, variety depends on the different resolutions (spectral, temporal, spatial and radiometric) of the captured data. Remote sensing data variety is enormous. There are approximately 200 satellite sensors with a huge variety of spatial, temporal, radiometric and spectral resolutions [3]. For instance, satellites have a wide range of orbital altitudes, optics and acquisition techniques. Consequently, the imagery acquired can be at very fine resolutions (fine level of detail) of 1 m or less with very narrow coverage swaths, or the images may have much larger swaths and cover entire continents at very coarse resolutions (>1 km). In addition, the satellites are equipped with sensors capable of acquiring data from portions of the electromagnetic spectrum that cannot be sensed by the human eye or conventional photography. The ultraviolet, near-infrared, shortwave infrared, thermal infrared and microwave portions of the spectrum provide valuable information on critical environmental variables [1].

• Velocity
Velocity refers to the frequency of incoming data and the speed at which it is generated, processed and transmitted. In the case of remote sensing data, the orbital characteristics of most satellite sensors enable repetitive coverage of the same area of Earth's surface on a regular basis with a uniform method of observation. The repeat cycle of the various satellite sensor systems varies from 15 min to nearly a month. This characteristic makes remote sensing ideal for multi-temporal studies, from seasonal observations over an annual growing season to inter-annual observations depicting land surface changes [2].
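
To give a feeling for the orders of magnitude behind the volume and velocity characteristics, the following back-of-envelope sketch converts a data rate into an annual volume and a repeat cycle into a number of acquisitions per year. It rests on simplifying assumptions that are not taken from the cited sources: the rate is treated as sustained over a whole non-leap year, decimal units are used (1 PB = 10^6 GB), and the 5-day and 30-day repeat cycles are merely illustrative values between the 15 min and "nearly a month" extremes mentioned above.

```python
from datetime import timedelta

SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # non-leap year (assumption)


def annual_volume_pb(rate_gb_per_s: float) -> float:
    """Petabytes accumulated in one year at a sustained rate (1 PB = 1e6 GB)."""
    return rate_gb_per_s * SECONDS_PER_YEAR / 1e6


def revisits_per_year(repeat_cycle: timedelta) -> int:
    """Number of complete repeat cycles that fit into a non-leap year."""
    return int(timedelta(days=365) / repeat_cycle)


if __name__ == "__main__":
    # Illustrative rate only: 1 GB/s sustained for a full year.
    print(f"{annual_volume_pb(1.0):.1f} PB per year at a sustained 1 GB/s")
    # Illustrative repeat cycles: 15 min, 5 days, 30 days.
    for cycle in (timedelta(minutes=15), timedelta(days=5), timedelta(days=30)):
        print(f"repeat cycle {cycle}: {revisits_per_year(cycle)} acquisitions per year")
```

Real sensors do not acquire continuously at their peak rate, so such figures should be read as upper-bound intuition rather than as measured archive growth.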

Formats and Standards
Nowadays, remote sensing images (both currently acquired and historical) are typically distributed in digital format. A digital image is a numeric translation of the original radiances received by the sensor, forming a 2D matrix of numbers. Those values represent the optical properties of the area sampled, where the pixel represents the minimum spatial unit of measurement within the sensor coverage [2].
The following are the file formats most generally accepted as standards for encoding and transferring remote sensing images:
• HDF is a self-describing, portable and platform-independent data format for sharing science data. It can store many different kinds of data objects, including multi-dimensional arrays, metadata, raster images, colour palettes and tables, in a single file. There is no limit on the number or size of data objects in the collection, giving great flexibility for big data.
• NetCDF is also a self-describing, portable and scalable format that is currently widely used by climate modellers.
• JPEG 2000 is an image coding system that uses state-of-the-art compression techniques based on wavelet technology and offers an extremely high level of scalability and accessibility. Content can be coded once at any quality, up to lossless, but accessed and decoded at a potentially very large number of other qualities and resolutions and/or by region of interest, with no significant penalty in coding efficiency. It is typically used for distributing Sentinel-2 images.
• GeoTIFF is a public domain metadata standard which allows georeferencing information to be embedded within a TIFF file. The potential additional information includes map projection, coordinate systems, ellipsoids, datums and everything else necessary to establish the exact spatial reference for the file. More interestingly, "Cloud Optimized GeoTIFF" (COG), a standard based on GeoTIFF, is designed to make it straightforward to use GeoTIFFs hosted on HTTP web servers, so that users and software can work with part of the data in a file without having to download the entire file. It is designed to work with HTTP range requests and specifies a particular layout of data and metadata within the GeoTIFF file, so that clients can predict which range of bytes they need to download.
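
The HTTP range request mechanism on which COG relies can be sketched as follows. This is a minimal illustration using only the Python standard library; the URL is a hypothetical placeholder, the function name `build_range_request` is introduced here for illustration, and the 16 KiB size of the first read is an arbitrary assumption rather than a value prescribed by the format. The request is built but deliberately not sent.

```python
from urllib.request import Request


def build_range_request(url: str, first_byte: int, last_byte: int) -> Request:
    """Build (but do not send) an HTTP GET asking only for bytes first_byte..last_byte inclusive."""
    if first_byte < 0 or last_byte < first_byte:
        raise ValueError("invalid byte range")
    return Request(url, headers={"Range": f"bytes={first_byte}-{last_byte}"})


# Hypothetical COG location; replace with a real URL before sending.
COG_URL = "https://example.org/data/scene.tif"

# A client would typically read the beginning of the file first, because the
# COG layout places the metadata that locates each tile at the start.
header_request = build_range_request(COG_URL, 0, 16383)
print(header_request.get_header("Range"))
```

Once the metadata has been read, a client can compute the byte offsets of the tiles covering its region of interest and issue further range requests for those bytes only, for example with `urllib.request.urlopen(header_request)` against a server that supports range requests.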
These specially designed data formats work quite well when the amount of data is not very large. However, issues start to arise when data volumes increase. The most obvious problem is that it is not easy to find, retrieve and query the information needed.
Considerable effort has been invested in recent years in standardising many of the EO ground segment interfaces in the context of HMA (OGC) [4] and CEOS [5]. The interfaces for which widely accepted standards exist and are deployed include:
• EO dataset/product metadata [6].
Further details concerning standards for EO metadata and discovery interfaces can be found in Chap. 2 "Standardized EO data platforms".

SENTINEL-1
With the objectives of land and ocean monitoring, SENTINEL-1 is composed of two polar-orbiting satellites that operate day and night and perform radar imaging, enabling them to acquire imagery regardless of the weather.

SENTINEL-2
Its main objective is land monitoring, and the mission is composed of two polar-orbiting satellites providing high-resolution optical imagery. Vegetation, soil and coastal areas are among the monitoring objectives.

SENTINEL-3
Its primary objective is marine observation, with a focus on studying sea surface topography, sea and land surface temperature, and ocean and land colour. The mission is composed of three satellites. Its primary instrument is a radar altimeter, but the polar-orbiting satellites carry multiple instruments, including optical imagers.

SENTINEL-4
It is dedicated to air quality monitoring. Its UVN instrument is a spectrometer carried aboard Meteosat Third Generation satellites, operated by EUMETSAT.
The mission aims to provide continuous monitoring of the composition of the Earth's atmosphere at high temporal and spatial resolution, and the data will be used to support monitoring and forecasting over Europe.

SENTINEL-5
It is dedicated to air quality monitoring. The SENTINEL-5 UVNS instrument is a spectrometer carried aboard the MetOp Second Generation satellites. The mission aims to provide continuous monitoring of the composition of the Earth's atmosphere. It provides wide-swath, global coverage data to monitor air quality around the world.

SENTINEL-5P
A precursor satellite mission, SENTINEL-5P aims to fill the data gap and provide data continuity between the retirement of the Envisat satellite and NASA's Aura mission and the launch of SENTINEL-5. The main objective of the SENTINEL-5P mission is to perform atmospheric measurements, with high spatio-temporal resolution, relating to air quality, climate forcing, ozone and UV radiation.

Copernicus Programme and Sentinel Missions
The Copernicus EO programme is a cooperation between the European Union (EU) and the European Space Agency (ESA). ESA is responsible for coordinating the satellite acquisition and delivery of the EO data. Since the launch of Sentinel-1A in 2014, the fleet of Sentinel satellites has been delivering data for environmental monitoring and civil security applications. Copernicus is served by a set of dedicated satellites (the Sentinel families) and contributing missions (existing commercial and public satellites). The Sentinel satellites are specifically designed to meet the needs of the Copernicus services and their users (Table 4.1).

Thematic Services
Besides the Sentinel satellite constellation, Copernicus also provides access to specific services, which fall into six main thematic categories: services for land management, services for the marine environment, services relating to the atmosphere, services to aid emergency response, services associated with security and services relating to climate change.
• Land Monitoring: Monitoring the Earth's land is useful for many fields, particularly agriculture, forestry, topography and land-cover and land-change studies. The data can be used to track current trends and predict future changes.
• Marine Monitoring: Information on the state and dynamics of the ocean and coastal zones can be used to help protect and manage the marine environment and resources more effectively, as well as ensure safety at sea and monitor pollution from oil spills and other events.
• Atmospheric Monitoring: Monitoring the quality and condition of our planet's atmosphere is important in that it helps us to understand how we may be affected and is an essential tool in forecasting weather events.
• Managing Emergency: When an emergency occurs, satellite data can prove essential in forming a response. Historical data can provide perspective on a situation, while current data can help to analyse and manage the emergency.
• Security: Surveillance and security can be difficult to manage from the ground. Observations from space can make monitoring borders and sea routes much easier and track developing situations.
• Climate Change: Satellites are a vital tool in monitoring our world's changing climate, providing wide-scale views of affected areas and contributing to growing archives of data for use in long-term studies.
Most of the data and information delivered by Copernicus and its services are made available under a "free, full and open" policy to any citizen and any organization anywhere on Earth.
For the dissemination of level 0, level 1 and level 2 products, ESA provides access via the Copernicus Open Access Hub portal, which offers Sentinel-1, -2, -3 and -5P data through an interactive graphical user interface. Additionally, the Collaborative Data Hub, International Access Hub and Copernicus Services Data Hub provide access for public authorities, European projects and Copernicus services.

DIAS
In order to facilitate access to Earth observation products and the development of EO-powered applications for end users, five different Data and Information Access Services (DIAS) are available (see Table 4.2). The DIASes provide access to product repositories in cloud storage. They are not primarily intended to be used as "dissemination" hubs (download bandwidth is even lower than at the Open Access Hub, and it is generally not free). Instead, the DIASes provide platforms for hosting processing close to the cloud storage. End users can bring their algorithms and run them with free and fast access to the product data, benefiting from simple access to curated petabyte-size collections of Copernicus, other satellite and third-party data. Eventually, the end user only needs to download the (typically low-volume) processing results and not the (high-volume) satellite input products.

Other
Other data access portals are available as well:
• Amazon Web Services (AWS) and Google Cloud Platform (GCP) offer storage and processing platform services similar to those of the DIASes, but differ in product offers and service pricing.
• Sentinel Hub is a commercial data access and on-the-fly processing software instantiated on AWS and on two of the DIASes. It exposes an application programming interface (API) to user applications for accessing Copernicus and Landsat products and derivatives.

Selected Technologies
This section identifies relevant information technology domains and offers practical insights into them (mainly drawn from DataBio data access components) for builders of applications and systems that use EO data and cloud-based environments.

Metadata Catalogue
As per the OGC definition: "Catalogue services support the ability to publish and search collections of descriptive information (metadata) for data, services and related information objects. Metadata in catalogues represent resource characteristics that can be queried and presented for evaluation and further processing by both humans and software. Catalogue services are required to support the discovery and binding to registered information resources within an information community". In the case of Earth observation datasets, a series of specific EO metadata profiles have been defined in order to facilitate their description and findability. Chapter 2 "Standards and EO data platforms" provides further details about them. The following describes the concrete EO metadata catalogue implementations used in DataBio.

FedEO Gateway
This component [13] acts as a unique endpoint allowing clients to access metadata and data from different backend EO catalogues implementing different protocols. It supports access through OGC 10-032r8 and OGC 13-026r8 OpenSearch interfaces and provides Atom responses with metadata in OGC 10-157r4 format (i.e. the EO profile of Observations and Measurements). Alternative response formats such as RDF/XML, Turtle, JSON-LD and GeoJSON (OGC 17-003) are also available, as are SRU-style bindings and W3C Linked Data Platform bindings.
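
To illustrate how a client might use such an OpenSearch interface, the sketch below fills in an OpenSearch URL template with a bounding box and a time interval. The placeholder names `geo:box`, `time:start` and `time:end` come from the OpenSearch Geo and Time extensions (OGC 10-032r8) and `count` from OpenSearch itself. Everything else is assumed for illustration: the endpoint, the query parameter names on the left of each `=` sign and the helper `fill_template` are hypothetical. In practice the template is obtained from the OpenSearch description document published by the catalogue.

```python
import re
from urllib.parse import quote

# Hypothetical URL template, of the kind found in an OpenSearch description
# document. A trailing "?" inside the braces marks a parameter as optional.
TEMPLATE = (
    "https://example.org/opensearch/request?"
    "bbox={geo:box?}&start={time:start?}&end={time:end?}&count={count?}"
)


def fill_template(template: str, values: dict) -> str:
    """Replace each {name} or {name?} placeholder with its URL-encoded value."""

    def substitute(match: re.Match) -> str:
        name, optional = match.group(1), match.group(2)
        if name in values:
            # Keep commas and colons readable; they are allowed in a query string.
            return quote(str(values[name]), safe=",:")
        if optional:
            return ""  # optional parameters without a value become empty
        raise KeyError(f"missing required search parameter: {name}")

    return re.sub(r"\{([^{}?]+)(\?)?\}", substitute, template)


search_url = fill_template(
    TEMPLATE,
    {
        "geo:box": "-10.0,35.0,5.0,45.0",  # west,south,east,north
        "time:start": "2020-01-01T00:00:00Z",
        "time:end": "2020-01-31T23:59:59Z",
    },
)
print(search_url)
```

Working from the template rather than hard-coding parameter names keeps the client independent of any particular backend catalogue.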

FedEO Catalogue
This component [13] implements an EO catalogue server that stores metadata for EO (satellite) collections (series) and products (datasets). It offers one API to populate the catalogue and another to search it.
Both components have been developed by Spacebel s.a.

Object Storage and Data Access
GeoRocket
GeoRocket is a high-performance data store for geospatial files developed by the Fraunhofer Institute for Computer Graphics Research IGD. It can store 3D city models (e.g. CityGML), GML files or GeoJSON data sets. It provides the following features:
• High-performance data storage with multiple back ends such as Amazon S3, MongoDB, distributed file systems (e.g. HDFS or Ceph), or the local hard drive (enabled by default).
• High-speed search based on the popular open-source framework Elasticsearch, supporting spatial queries and searches for attributes, layers and tags.
• A design and implementation based on the open-source toolkit Vert.x, which makes it reactive, capable of handling big files and large numbers of parallel requests, and well suited to deployment in cloud environments.

Rasdaman
Rasdaman is an array database system, which provides flexible, fast, scalable geoservices for multi-dimensional spatio-temporal sensor, image, simulation and statistics data of unlimited volume. Data are stored in a PostgreSQL database, thereby achieving full information integration (e.g. latitudes, longitudes, time coordinates, resolutions and other ancillary annotations). Ad hoc access, extraction, aggregation, as well as remix and analytics are enabled through a new SQL raster query language, the Rasdaman query language (RasQL), with highly effective server-side optimization. The core features include:
• Truly multi-dimensional data: 1D, 2D, 3D, 4D and beyond.
• A powerful, flexible query language for visualization, classification, convolution, aggregation and many more geospatial functions.
• Spatial indexing and adaptive tiling for fast data access.
• Parallelization for unlimited scalability from laptop to cluster and cloud.
• Full information integration of raster data with all geo data in the PostgreSQL database.
• Support for the raster-relevant OGC standards, with reference implementations for WCS core and WCPS.
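
As an indication of what an ad hoc aggregation in RasQL can look like, the following sketch composes a query string that averages the cells of a rectangular window of a stored array. It only builds the text of the query and does not contact a server. The collection name and the helper function are hypothetical, and the query follows the general select/from pattern of RasQL with the `avg_cells` condenser, so it should be checked against the Rasdaman documentation for the version in use.

```python
def rasql_mean_over_window(collection: str, x_range: tuple, y_range: tuple) -> str:
    """Compose a RasQL query averaging all cells in a 2D window (bounds inclusive)."""
    (x0, x1), (y0, y1) = x_range, y_range
    if not collection.isidentifier():
        # Guard against injecting arbitrary text into the query.
        raise ValueError("collection name must be a plain identifier")
    if x1 < x0 or y1 < y0:
        raise ValueError("upper bounds must not be smaller than lower bounds")
    return f"select avg_cells(c[{x0}:{x1}, {y0}:{y1}]) from {collection} as c"


# "SceneCollection" is a hypothetical collection name used for illustration.
query = rasql_mean_over_window("SceneCollection", (0, 99), (0, 99))
print(query)
```

The point of the example is that the subsetting and the aggregation are both expressed in the query, so only a single number, rather than the 100 by 100 window, has to leave the server.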

Data Cubes
EO data cubes are an advanced way for users to interact with large spatio-temporal EO data [14]. Figure 4.1 illustrates the principle. The idea is to read incoming image tiles covering an area ("Dice") and arrange these in time series pixel stacks ("Stack"). This makes access to the time series of observations ("Use") much easier. Data cube implementations (such as Rasdaman or ADAM) allow access to a large variety of multi-year global geospatial collections, enabling data discovery, visualization, combination, processing and download. They make it possible to exploit data from global to local scale: data taken from distributed data sources are made accessible through the data cube layer, which exposes OGC-standardized interfaces. On top of the data cube layer, platform-based interfaces (web application, mobile application, Jupyter Notebook and APIs) as well as third-party user interfaces can be deployed.
Another example is xcube, an open-source Python package for generating and exploiting data cubes. Together with Sentinel Hub, it forms one of the core parts of the Euro Data Cube (EDC). The EDC engine can also serve custom raster data in addition to the freely available EO data archives such as Sentinel, MODIS or Landsat.
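
The "Dice, Stack, Use" principle described above can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the way Rasdaman, ADAM or xcube are implemented: the tiles are assumed to be already diced to the same small area and grid, the pixel values and dates are invented, and the function name `stack_tiles` is hypothetical.

```python
from collections import defaultdict


def stack_tiles(acquisitions):
    """Arrange co-registered tiles into per-pixel time series.

    acquisitions: iterable of (iso_date, tile), where tile is a 2D list of values.
    Returns a dict mapping (row, col) to a list of (iso_date, value) sorted by date.
    """
    cube = defaultdict(list)
    # ISO 8601 date strings sort chronologically when sorted as text.
    for date, tile in sorted(acquisitions, key=lambda item: item[0]):
        for row_index, row in enumerate(tile):
            for col_index, value in enumerate(row):
                cube[(row_index, col_index)].append((date, value))
    return dict(cube)


# "Dice": three invented 2x2 tiles over the same area, arriving out of order.
acquisitions = [
    ("2020-06-11", [[0.61, 0.58], [0.30, 0.72]]),
    ("2020-06-01", [[0.55, 0.52], [0.31, 0.70]]),
    ("2020-06-21", [[0.66, 0.60], [0.29, 0.75]]),
]

# "Stack": build the per-pixel time series.
cube = stack_tiles(acquisitions)

# "Use": read the full time series of a single pixel without touching the tiles.
print(cube[(0, 0)])
```

Real data cube engines achieve the same effect with chunked array storage and indexing rather than Python dictionaries, but the access pattern they optimize is the one shown in the last line.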

Usage of Earth Observation Data in DataBio's Pilots
A significant part of the 27 DataBio pilots uses Earth observation data as input for their specific purposes, in the context of efficient resource use and increasing productivity in agriculture [15], forestry [16] and fishery [17] (Table 4.3).