Introduction

Satellite-based Earth observation programs are increasingly providing ever-larger datasets for end-users to access and analyse for a variety of applications. A key component of such programs is to enable users to access datasets through portals that (1) facilitate rapid identification of the relevant data or data products at various levels of pre-processing and that (2) enable efficient downloading of the data products for direct application or further analysis. Many of the solutions to achieving these aims are provided by the programs themselves, while others are generated by the wider research or end-user community as unforeseen technical issues or user needs arise.

The European Copernicus program is a major publicly funded Earth observation initiative managed by the European Commission on behalf of the European Union Member States and in partnership with other agencies (e.g., the European Space Agency and the European Environment Agency). The Copernicus program is divided into three distinct components: (1) the space component, which manages satellite missions such as the Sentinel family of satellites; (2) the in-situ measurement component; and (3) the services component. The services component is in turn divided according to six main themes: (1) Atmosphere; (2) Marine; (3) Climate Change; (4) Security; (5) Emergency; and (6) Land. Within the latter theme, mm-scale to cm-scale measurements of ground surface motions made across Europe by Interferometric Synthetic Aperture Radar (InSAR) analyses of Sentinel-1 SAR satellite imagery (Costantini et al. 2021; Crosetto et al. 2020, 2021; European Ground Motion Service – Copernicus Land Monitoring Service) are now freely available through the European Ground Motion Service (EGMS).

The EGMS aims to offer extensive dissemination of satellite-derived ground surface motion measurements to end users. Users interact with EGMS via an online platform named EGMS Explorer (European Ground Motion Service Explorer). There, they can visualise EGMS-derived displacement velocities on an interactive map that is rendered through a JavaScript web application. Users can also display local time series of displacements by selecting an InSAR measurement point in the online Explorer. The time series of InSAR measurements have a temporal resolution of 6 to 12 days, as determined by the revisit times of the Sentinel-1 satellites. For registered users (free registration), EGMS allows datasets to be downloaded for standalone use and further processing and analysis.

Unfortunately, there are several bottlenecks in the dissemination of EGMS products (e.g., downloading) through the online platform. A non-exhaustive list of the limitations includes: (1) the splitting of searches if the requested area is too large; (2) the lack of tools to select datasets with regard to satellite parameters (e.g., satellite track and direction) (see Sect. Sentinel-1 satellites and data); and (3) the complex and inefficient post-processing of EGMS-derived data files (e.g., merging of files, etc.). These issues arise due to certain limitations at server level (i.e., number of parallel downloads, number of queries, etc.), and they mean that large-scale analyses of EGMS data are relatively complex and are not as straightforward or as quick as analyses of relatively small areas. Some user community members recently released (Festa and Del Soldato 2023) a standalone application to address some of the issues within the post-processing workflow, but this application still requires manual interaction with EGMS Explorer for selecting the datasets, which maintains many of the limitations of EGMS Explorer and does not resolve the difficulties for large-scale applications.

In this article, we present the EGMS-toolkit: a set of Python (version 3) scripts to “automatically” detect and download EGMS products, and then merge, crop, and clip individual EGMS files into seamless mosaics. The EGMS-toolkit is suitable for non-expert users, can be easily installed on any operating system, and provides a unified workflow for large-scale applications. Below we first give an overview of the Sentinel-1 satellites, the InSAR technique and the EGMS datasets. We then present the toolkit, its design, and an associated example of its use and functionality. Finally, we discuss large-scale applications and future developments.

Overview of Sentinel-1 SAR data and EGMS InSAR products

Sentinel-1 satellites and data

Sentinel-1 consists of a constellation of twin C-band SAR satellites in sun-synchronous and near-polar orbit: Sentinel-1A and Sentinel-1B. Sentinel-1A was launched in April 2014 and has systematically acquired SAR images over land (and selected maritime regions) with a revisit time of 12 days. Sentinel-1B was launched in April 2016, and, until it was lost due to a power failure at the end of 2021, it shortened the revisit time of Sentinel-1 acquisitions to 6 days. The SAR data from Sentinel-1, and therefore the InSAR-derived displacement measurements, can cover a maximum period of 2014 to the present day with a temporal resolution of 6 to 12 days. New satellites for the same constellation (i.e., Sentinel-1C and Sentinel-1D) are planned to be launched in the upcoming years.

The Sentinel-1 satellites are right-looking – the radar beam is transmitted rightward of the satellite heading (see Fig. 1). Consequently, as the satellite orbits the rotating Earth, the direction from which an area on the ground is illuminated by the radar beam varies according to whether the satellite heading is southward (descending) or northward (ascending). Regarding the satellite configuration, the ground is illuminated from the east on the descending pass and from the west on the ascending pass. The orbital tracks taken by the satellites as they traverse the globe are defined by a unique number and often called track or relative orbit. Additionally, the radar beam is inclined toward the Earth’s surface at a look angle of around 29°-46° from vertical (nadir). Observations of ground motion are thus made in the oblique, one-dimensional Line-of-Sight (LOS) of the satellite.

SAR satellites can use different acquisition modes depending on the mission requirements (e.g., spatial resolution and spatial coverage). The default mode of Sentinel-1 satellites is the Interferometric Wide Swath (IW) mode (De Zan and Monti Guarnieri 2006) – this is used by the EGMS. The IW acquisition mode acquires data with a ~250-km (in range) swath made up of 3 sub-swaths (see Fig. 1). Each sub-swath consists of a continuous set of independent bursts with a small overlap. A single burst covers an area of approximately 21 km parallel to the satellite trajectory (azimuth direction) and 80 km perpendicular to the trajectory (range direction) (see Fig. 1). Each burst can be considered as a single image for which InSAR computation can be performed. The spatial resolution (ground sampling dimensions) of a Sentinel-1 SAR image is approximately 4 m by 14 m in the range and azimuth directions, respectively.

Fig. 1 Schematic representation of the IW acquisition mode of the Sentinel-1 satellites – successive acquisitions of Sentinel-1 bursts

EGMS products

EGMS provides unified, calibrated and verified ground-surface displacement measurements via InSAR (Crosetto et al. 2021). InSAR exploits phase information of two SAR images of a specific region collected at different times (Hanssen 2001). The images are co-registered (i.e., aligned) and the phase information is differenced to produce an interferogram (Ferretti et al. 2023; Massonnet and Feigl 1998). The differential phase information represents a direct measure of ground surface displacements that have occurred between the two times of image acquisition, once other differential phase contributions (e.g., topography and atmospheric water vapour content) (e.g., Jolivet et al. 2014; Jolivet et al. 2011) are removed. From multiple SAR acquisitions, InSAR Time-Series-Analysis algorithms (or approaches) (e.g., Casu et al. 2006; Ferretti et al. 2001; Hooper et al. 2012; Hooper et al. 2007; Hooper et al. 2004; Osmanoğlu et al. 2016) can resolve the temporal evolution of ground surface displacement by extracting phase information from points with a high signal-to-noise ratio.

Since the late 1990s, InSAR has been used in a variety of applications that include: (1) monitoring of volcano unrest and activity (Hrysiewicz et al. 2023b; Pinel et al. 2014); (2) landslide detection (Solari et al. 2020); (3) mapping of active geological fault movements (e.g., earthquakes) (Jonsson et al. 2003); (4) detection of subsidence induced by extraction of hydrocarbons, minerals or groundwater (Hrysiewicz et al. 2023a); and (5) quantification of infrastructure instabilities and urban subsidence (Yan et al. 2012). However, the extraction of InSAR-derived motions remains computationally challenging, resource-intensive (i.e., IT resources), and time-consuming for most stakeholders. These limitations are accentuated when it comes to large-scale and long-term monitoring of ground surface displacements. The EGMS thus represents a solution for wide-scale dissemination of reliable ground motion data to end users, and it is the first open access and freely available InSAR analysis undertaken at continental scale (Potin et al. 2019).

After quality control and validation by internal and external consortia, the EGMS provides an annual update of the ground surface displacements across Europe. At the time of writing, there have been two releases of EGMS data, each spanning a different time-period. The first data release was in November 2022 and the data set spans the period February 2015 to end-2021. The second data release was in July 2023 and that dataset spans the period 2018 to end-2022. The EGMS-derived InSAR products are currently available at three processing levels: basic (L2a); calibrated (L2b); and ortho (L3) (Crosetto et al. 2021).

The L2a (uncalibrated or basic) products correspond to ground surface motions estimated by InSAR, at full spatial resolution, from the four different algorithms and approaches as unified through the EGMS (Ferretti et al. 2023): PSP-INSAR (Costantini et al. 2013), SqueeSAR (Ferretti et al. 2011), NORCE InSAR processing chain, and Persistent Scatterer InSAR (Kampes 2005). The algorithm used varies by track number and is recorded in the metadata files of EGMS datasets. However, the algorithms have been unified (e.g., in terms of pixel selection) to produce similar results. The data comprise the geolocation of each measurement point, the InSAR-derived displacement velocity (i.e., the linear trend of the displacement time series) at that point, the Line-of-Sight (LOS) displacement time series for that point and some computed parameters (e.g., standard deviation of displacement velocities and acceleration of displacements). Ground surface motions are given relative, both spatially and temporally, to a stable reference point, which is selected on a burst-by-burst basis (Ferretti et al. 2023). The L2b (calibrated) products are also provided at full resolution and derived from L2a products, but are corrected via GNSS data to obtain absolute LOS ground surface displacements. The calibration is based on a GNSS model (vertical and horizontal velocity on a regular grid) estimated from three sources of GNSS observations (EUREF’s EPN Densification Program, data from Nevada Geodetic Laboratory, and EUREF’s Working Group on European Dense Velocities). The model is then used to correct the EGMS-derived velocities. More information can be found in Larsen et al. (2023). The L2a and L2b datasets are available for the two orbit directions (ascending and descending). Each L2a and L2b data file that can be accessed from the EGMS corresponds to a single Sentinel-1 burst on one of the orbit directions (ascending or descending).
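To illustrate what the displacement velocity represents, the following minimal sketch fits a straight line to a displacement time series by least squares and expresses the slope in mm/yr. It is not the estimator used by the EGMS processing chains; the function name, the 6-day sampling and the synthetic values are assumptions made for this example.

```python
import numpy as np


def linear_velocity(days, displacement_mm):
    """Least-squares linear trend of a displacement series, in mm/yr.

    days            -- acquisition times in days since the first acquisition
    displacement_mm -- displacements in mm (NaN marks a missing acquisition)
    """
    days = np.asarray(days, dtype=float)
    disp = np.asarray(displacement_mm, dtype=float)
    valid = ~np.isnan(disp)  # ignore missing acquisitions
    slope_per_day, _intercept = np.polyfit(days[valid], disp[valid], 1)
    return slope_per_day * 365.25


# Synthetic series: steady subsidence of 3 mm/yr sampled every 6 days,
# with one missing acquisition.
demo_days = np.arange(0, 366, 6)
demo_disp = -3.0 * demo_days / 365.25 + 1.0
demo_disp[10] = np.nan
demo_velocity = linear_velocity(demo_days, demo_disp)
print(f"velocity: {demo_velocity:.2f} mm/yr")
```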

The L3 (ortho) products consist of the vertical (Up-Down) and horizontal (East-West) components of displacement as obtained by combining ascending and descending L2b LOS displacements (Wright 2004). The resulting products are the time series and velocities of absolute ground surface motions in vertical and horizontal directions. Due to the merging of LOS displacement points that correspond to slightly different ground positions, the L2b measurement points are gridded onto a regular 100-m grid (in EPSG:3035) adopted from the Copernicus Digital Elevation Model (Festa and Del Soldato 2023). The L3 InSAR-derived measurement points are thus virtual because they are produced by gridding. In the EGMS platform, the datasets are separated into 100-km-coverage tiles (European Ground Motion Service Explorer).
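The principle of combining ascending and descending LOS observations can be sketched as a small linear system. The example below assumes a right-looking sensor, LOS displacements counted positive towards the satellite, and a negligible north-south contribution (near-polar orbits make the LOS weakly sensitive to that component). The incidence angle, headings and displacement values are illustrative assumptions, and the code is not the EGMS implementation.

```python
import numpy as np


def los_unit_vector(incidence_deg, heading_deg):
    """Ground-to-satellite unit vector (east, north, up) for a right-looking SAR.

    heading_deg is the flight direction measured clockwise from north.
    """
    inc = np.radians(incidence_deg)
    head = np.radians(heading_deg)
    return np.array([-np.sin(inc) * np.cos(head),
                     np.sin(inc) * np.sin(head),
                     np.cos(inc)])


def decompose_east_up(d_asc, d_desc, u_asc, u_desc):
    """Solve for east and up displacement, neglecting the north component."""
    design = np.array([[u_asc[0], u_asc[2]],
                       [u_desc[0], u_desc[2]]])
    east, up = np.linalg.solve(design, np.array([d_asc, d_desc]))
    return east, up


# Assumed geometry: 39 deg incidence, headings of about -10 deg (ascending)
# and -170 deg (descending).
u_asc = los_unit_vector(39.0, -10.0)
u_desc = los_unit_vector(39.0, -170.0)

# Forward-model a known motion (2 mm east, 5 mm down), then recover it.
true_enu = np.array([2.0, 0.0, -5.0])
d_asc = u_asc @ true_enu
d_desc = u_desc @ true_enu
east, up = decompose_east_up(d_asc, d_desc, u_asc, u_desc)
print(f"east = {east:.2f} mm, up = {up:.2f} mm")
```

With only two viewing geometries, the north component cannot be resolved, which is why only the vertical and East-West components are provided.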

L2a and L2b datasets are delivered as text files in Comma-Separated Values (.csv) format that contain all point-measurement information (e.g., locations and displacement velocity) and the time series of displacements. Similar text files are delivered for the L3 datasets, together with additional raster images (in GeoTIFF format) of displacement velocity on the L3 spatial grid.

Toolkit design and implementation

Requirements

EGMS-toolkit is written in Python 3. The package contains all the requirements files (e.g., setup.py) and can be easily installed – after being downloaded from GitHub – with the pip package manager for Python 3. EGMS-toolkit requires a few external Python packages (e.g., shapely, numpy and fiona), which are automatically installed by the package manager. The user additionally needs working installations of GDAL (GDAL/OGR Geospatial Data Abstraction software Library, 2024) and of the GMT geospatial mapping tools (with the GSHHG files for country borders and coastlines) (Wessel et al. 2019; Wessel and Smith 1996). The Python package can be used from a Python script or from the command line of the operating system shell through a dedicated console entry-point: EGMStoolkit. In addition, EGMS-toolkit is provided with full documentation (local and online).

Python package structures

EGMS-toolkit consists of three Python classes (with associated methods), one Python independent module (named egmsdatatools) and two other scripts adding some functions to the classes. The workflow is cascaded (see Fig. 2). The first class must be used by the second class, and the third class requires the second class. The independent module corresponds to certain post-processing tools (merging, cropping, etc.).

The first Python class – called S1burstIDmap – is used to detect and download the burst identity map (Sentinel-1-Burst-ID) from the European Space Agency (ESA) server. This map contains the predicted locations of Sentinel-1 bursts and is required by EGMS-toolkit to automatically select datasets of EGMS-derived displacements in the L2a and L2b levels.

The second Python class – called S1ROIparameter – is for detecting EGMS datasets based on a Region of Interest (ROI). The user must first specify the ROI with a coordinate string (W, S, E, N) in EPSG:4326 format, an ESRI shapefile (multiline string format) or a country code (e.g., FR for the French metropolitan territory). If a country code is given, the toolkit generates the ROI outline by using the GMT software (Wessel et al. 2019; Wessel and Smith 1996). This ROI is converted into a unique ESRI shapefile by EGMS-toolkit. Finally, this Python class has a method for detecting the bursts and tiles corresponding to the ROI. At this point, the first Python class is used as an input argument. The results of the user searches can be saved and reloaded.
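The idea behind ROI-based detection can be sketched with a simple bounding-box test. The snippet below parses a (W, S, E, N) coordinate string and keeps the footprints that intersect it. It is a simplified illustration only: real burst footprints are polygons taken from the Sentinel-1 burst ID map, and the function names, footprint names and coordinates used here are hypothetical.

```python
def parse_bbox(text):
    """Parse a 'W,S,E,N' string (EPSG:4326 degrees) into a tuple of floats."""
    west, south, east, north = (float(value) for value in text.split(","))
    if west >= east or south >= north:
        raise ValueError("expected W < E and S < N")
    return west, south, east, north


def bboxes_intersect(a, b):
    """True if two (W, S, E, N) boxes overlap with a non-zero area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def select_footprints(roi, footprints):
    """Return the names of the footprints whose bounding box intersects the ROI."""
    return [name for name, box in footprints.items()
            if bboxes_intersect(roi, box)]


# Hypothetical ROI and footprints, for illustration only.
roi = parse_bbox("-6.45,53.20,-6.00,53.50")
footprints = {
    "burst_A": (-6.90, 53.00, -6.20, 53.30),
    "burst_B": (-5.90, 53.00, -5.00, 53.60),
    "burst_C": (-6.30, 53.40, -5.50, 54.00),
}
selected = select_footprints(roi, footprints)
print(selected)
```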

The third Python class – called egmsdownloader – is for downloading EGMS files available online. The user is required to give the second Python class as an input argument. For authentication on the EGMS server, the user must provide their “token”. This 32-character token is a time-limited user ID generated when the user authenticates on the EGMS Explorer. The token can be found at the end of any download link generated by using the EGMS Explorer. When the user downloads files with EGMS-toolkit, any bursts or tiles that are not covered by EGMS data (e.g., because the corresponding area of the user’s ROI is the sea) are automatically rejected. Files are stored in an output directory according to their levels (L2a, L2b or L3) and their releases. In addition, multiple searches can be concatenated and simultaneously downloaded.
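Because the token sits at the end of a download link, it can be copied by hand or extracted with a short helper such as the sketch below. The helper assumes that the token consists of 32 alphanumeric characters; the character set, the function name and the example link are assumptions, not a description of the EGMS server or of the toolkit.

```python
import re

# Assumption: the token is made of 32 letters and/or digits.
TOKEN_PATTERN = re.compile(r"([0-9A-Za-z]{32})$")


def extract_token(download_link):
    """Return the 32-character token found at the end of a download link."""
    match = TOKEN_PATTERN.search(download_link.strip())
    if match is None:
        raise ValueError("no 32-character token found at the end of the link")
    return match.group(1)


# Hypothetical link with a dummy token, for illustration only.
example_link = "https://example.org/archive/file.zip?id=" + "a1b2c3d4" * 4
token = extract_token(example_link)
print(len(token))
```

Since the token is time-limited, a new one has to be obtained from the EGMS Explorer when an old one expires.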

Fig. 2 Schematic illustration of EGMS-toolkit with an overview of Python modules/classes. Red and blue areas correspond to the parts provided by the toolkit. The green parts are the final files, which can be opened by the user via their preferred GIS software

The final part of EGMS-toolkit is a Python module for post-processing (see Fig. 2). Several functions are provided to enable the user to: (1) merge files to obtain a single file for the satellite tracks and passes; (2) crop or clip files to the user’s ROI; (3) interpolate .csv files into raster files; and (4) convert text files into other formats. By default, the conventional post-processing workflow is: (1) merge files; (2) crop/clip files; and (3) convert files. The toolkit offers the possibility of applying post-processing to files regardless of their formats and levels (e.g., merging of L2 files in .csv or .tiff formats). In addition, if functions are applied on .csv files, the user can define the parameters that will be saved (i.e., velocity, standard deviation of velocity, etc.). Supplementary to .csv and .tiff formats, another option of EGMS-toolkit is the ability to use the GDAL Virtual Format (VRT) to merge and interpolate EGMS datasets. This format allows the user to minimise the pressure on RAM space, which is especially useful in case of large-scale analysis.

Merging of datasets

For the L2a and L2b levels, and due to the Sentinel-1 acquisition mode and the processing workflow of the EGMS consortium, measurement points in burst overlaps (and swath overlaps) can be estimated from both the backward and the forward bursts. The resulting points are therefore “duplicated” (2–6% of points in the example presented in the next section) and may bias statistical analyses (e.g., the average displacement velocity over an area located in the burst overlaps) if the L2a and L2b files are merged without taking the “duplicated” points into account. To solve this problem, EGMS-toolkit offers two merging options (see Appendix 1):

  1. without removal of “duplicated” points within overlaps;

  2. with removal of “duplicated” points within overlaps. This option is based on an implementation of the concave hull algorithm (Park and Oh 2012). This algorithm produces the best reduction in “duplicated” points in terms of RAM usage and processing time. The user can modify the option (and associated parameters).
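The principle of the second option can be sketched as follows: when a burst is appended, the points that fall inside the footprint of a burst already merged are dropped. For brevity, this sketch swaps the concave hull for a plain axis-aligned bounding box, which is a much cruder footprint; it is therefore not the algorithm implemented in EGMS-toolkit, and the function name and coordinates are hypothetical.

```python
import numpy as np


def merge_bursts_drop_overlap(bursts):
    """Merge point sets, dropping points inside the bounding box of earlier bursts.

    bursts -- list of non-empty (n, 2) arrays of easting/northing coordinates.
    """
    merged = []
    boxes = []  # bounding boxes of the bursts already merged
    for points in bursts:
        points = np.asarray(points, dtype=float)
        keep = np.ones(len(points), dtype=bool)
        for xmin, ymin, xmax, ymax in boxes:
            inside = ((points[:, 0] >= xmin) & (points[:, 0] <= xmax) &
                      (points[:, 1] >= ymin) & (points[:, 1] <= ymax))
            keep &= ~inside
        merged.append(points[keep])
        # The footprint is taken from all points of the burst, kept or not.
        boxes.append((points[:, 0].min(), points[:, 1].min(),
                      points[:, 0].max(), points[:, 1].max()))
    return np.vstack(merged)


# Two hypothetical bursts; the first point of burst_b lies in the overlap.
burst_a = [[0.0, 0.0], [10.0, 0.0], [10.0, 10.0], [0.0, 10.0]]
burst_b = [[9.0, 5.0], [15.0, 5.0], [20.0, 8.0], [12.0, 1.0]]
merged_points = merge_bursts_drop_overlap([burst_a, burst_b])
print(len(merged_points))
```

A bounding box overestimates the footprint of an irregular point cloud, so it can remove points that are not duplicated; a concave hull follows the outline of the points more closely.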

For L3 datasets, the measurement dates can be different for each tile from the L2b datasets used to compute the vertical and horizontal components of displacements. EGMS-toolkit checks each virtual measurement point to create a unique time sampling. If a point has no measurement for a date, a “nan” value is used to fill the time-series dataframe.
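A minimal sketch of this kind of alignment is given below: the dates of all tiles are combined into a single sorted list, and each point receives a NaN for the dates on which it has no measurement. The data layout (a list of dates plus an array with one row per point) and the function name are assumptions made for the example, not the internal structures of EGMS-toolkit.

```python
import numpy as np


def align_time_series(tiles):
    """Put the time series of several tiles on a common set of dates.

    tiles -- list of (dates, values) pairs, where values has one row per
             point and one column per date of that tile.
    Returns the sorted union of dates and a (n_points, n_dates) array in
    which missing measurements are NaN.
    """
    all_dates = sorted({date for dates, _ in tiles for date in dates})
    column = {date: index for index, date in enumerate(all_dates)}
    blocks = []
    for dates, values in tiles:
        values = np.asarray(values, dtype=float)
        block = np.full((values.shape[0], len(all_dates)), np.nan)
        block[:, [column[date] for date in dates]] = values
        blocks.append(block)
    return all_dates, np.vstack(blocks)


# Two hypothetical tiles with partly different measurement dates.
tile_1 = (["20180101", "20180107"], [[0.0, 1.0]])
tile_2 = (["20180107", "20180113"], [[2.0, 3.0], [4.0, 5.0]])
dates, series = align_time_series([tile_1, tile_2])
print(dates)
print(series)
```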

Example of EGMS-toolkit in use

The following section is an example of the use of EGMS-toolkit over the city of Dublin, Ireland. The aim is to obtain:

  1. calibrated (L2b) LOS displacements from the ascending satellite direction for the first data release; and

  2. vertical displacements over the city for the second data release.

Selecting and downloading the relevant EGMS datasets

The first step is dataset selection. The user first gives a ROI (black rectangle in Fig. 3). The ROI is centred on Dublin, Ireland. The first search is the selection of ascending datasets regardless of relative orbits. EGMS-toolkit detects two different tracks (001 and 103) with the user ROI coverage. Several bursts are required to cover the Dublin area: i.e., four for the track 001 (blue rectangles) and three for the track 103 (green rectangles in Fig. 3). For vertical displacements, the toolkit detects only one tile (red rectangle) covering Dublin, with the ROI corresponding to a relatively small part of this tile. The total number of measurement points is as follows: (1) 2,285,407 points for 001; (2) 1,779,313 for 103; and (3) 128,757 for vertical displacements.

The user – by giving their “token” – can then download the detected datasets. The download time (around 10 min on a public Wi-Fi network at the authors’ university) will vary depending on the data volume, internet connection speed and other factors. The downloaded files are then automatically stored and decompressed for post-processing. In this example, EGMS-toolkit downloads 8 .csv files and one .tiff image (for the L3 dataset) (see Figs. 3 and 4a). However, the number of files can reach several hundred when large-scale analyses are carried out.

Fig. 3 Screenshot of the data selection figure generated by EGMS-toolkit. This figure can be stored in .jpg format or viewed directly in the user’s internet browser. The tooltips (not visible on the screenshot) give information on the track numbers and satellite directions. The green rectangles correspond to the L2b measurements of the 001 ascending datasets. The blue rectangles are for the L2b measurements for the 103 ascending datasets. The red rectangle is for the vertical (L3) measurements of displacements. The black rectangle corresponds to the user ROI. Coordinates in EPSG:4326. Basemap Data: ©OpenStreetMap Contributors

Post-processing of datasets

The first post-processing step is to merge datasets to obtain a single file (.csv or .tiff) for each of the downloaded datasets – i.e., here for the track 001 LOS displacements, the track 103 LOS displacements, and the vertical displacements. The number of files thus decreases to three in this example, and the number of measurement points in each file decreases due to the removal of “duplicated” points. The merged files now contain 2,149,411 points for track 001 and 1,747,525 points for track 103 – i.e., ~6% and ~2%, respectively, of the downloaded points are considered “duplicates” and are thus removed. The number of measurement points for the L3 dataset is unchanged (see Fig. 4b). The .csv-related merging step takes a couple of minutes if only one parameter (e.g., velocity or single-time displacement) is saved for all measurement points, and around 10 min if all the parameters are saved (time measured on a Mac M1 2020, 16 GB of RAM). The time to merge .tiff files is negligible compared to the time required for .csv files.
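The proportions of “duplicated” points follow directly from the point counts before and after merging, as the short calculation below shows. The function name is chosen for the example; the counts are those reported for the Dublin example.

```python
def percent_removed(n_before, n_after):
    """Share of points removed by the merging step, in percent."""
    return 100.0 * (n_before - n_after) / n_before


# Point counts before and after merging for the two ascending tracks.
track_001 = percent_removed(2_285_407, 2_149_411)
track_103 = percent_removed(1_779_313, 1_747_525)
print(f"track 001: {track_001:.2f}% of points removed")
print(f"track 103: {track_103:.2f}% of points removed")
```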

The user can automatically crop/clip the merged datasets based on the given ROI. The number of measurement points in the clipped datasets drops to 1,201,018 for the 001 dataset, 970,960 for the 103 dataset, and 26,682 for the L3 dataset. With the same computer used for the previous step, the time required varied from 1 min to 3 min (see Fig. 4c).
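Clipping measurement points to an ROI outline amounts to a point-in-polygon test. The sketch below uses the classic ray-casting rule, vectorised with numpy. It only illustrates the principle: the function name and coordinates are hypothetical, points lying exactly on the outline are not treated specially, and the toolkit itself relies on its geospatial dependencies rather than on this code.

```python
import numpy as np


def points_in_polygon(x, y, polygon):
    """Ray-casting test: True for points located inside a simple polygon.

    x, y    -- arrays of point coordinates
    polygon -- list of (x, y) vertices; the ring is closed implicitly
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    inside = np.zeros(x.shape, dtype=bool)
    n_vertices = len(polygon)
    for i in range(n_vertices):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n_vertices]
        # Does the edge straddle the horizontal line through each point?
        crosses = (y1 > y) != (y2 > y)
        with np.errstate(divide="ignore", invalid="ignore"):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
        # Toggle the state for each crossing located to the right of the point.
        inside ^= crosses & (x < x_cross)
    return inside


# Hypothetical triangular ROI and three measurement points.
roi_outline = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
px = np.array([1.0, 8.0, -1.0])
py = np.array([1.0, 8.0, 5.0])
mask = points_in_polygon(px, py, roi_outline)
clipped_x, clipped_y = px[mask], py[mask]
print(mask)
```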

Then, the displacement-acceleration parameter (second derivative of the displacement time series), which is provided by EGMS and only available in .csv format, is gridded from the .csv file of the L3 dataset. In this case, the clipped/cropped result is produced from the result of the interpolation function (see Fig. 4d). The time required for interpolation varies widely depending on the selected parameters (spatial resolution, algorithm, etc.). In this example, the operation takes a few seconds.
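As an illustration of gridding, the sketch below averages the values of the points that fall in each cell of a regular grid and leaves empty cells as NaN. Cell averaging is only one possible choice and stands in for the interpolation algorithms available in the toolkit; the function name, grid bounds and values are assumptions made for the example.

```python
import numpy as np


def grid_mean(x, y, values, cell, bounds):
    """Average point values on a regular grid (row 0 is the northern edge).

    bounds -- (xmin, ymin, xmax, ymax) of the grid, in the units of x and y
    cell   -- cell size, in the same units
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    values = np.asarray(values, dtype=float)
    xmin, ymin, xmax, ymax = bounds
    ncols = int(np.ceil((xmax - xmin) / cell))
    nrows = int(np.ceil((ymax - ymin) / cell))
    col = np.floor((x - xmin) / cell).astype(int)
    row = np.floor((ymax - y) / cell).astype(int)
    ok = ((col >= 0) & (col < ncols) & (row >= 0) & (row < nrows)
          & ~np.isnan(values))
    total = np.zeros((nrows, ncols))
    count = np.zeros((nrows, ncols))
    np.add.at(total, (row[ok], col[ok]), values[ok])  # sum per cell
    np.add.at(count, (row[ok], col[ok]), 1)           # points per cell
    with np.errstate(divide="ignore", invalid="ignore"):
        grid = total / count
    grid[count == 0] = np.nan  # cells without any measurement point
    return grid


# Three hypothetical points on a 2 x 2 grid of 100-m cells.
grid = grid_mean(x=[50.0, 60.0, 150.0], y=[150.0, 140.0, 50.0],
                 values=[1.0, 3.0, 5.0], cell=100.0,
                 bounds=(0.0, 0.0, 200.0, 200.0))
print(grid)
```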

Fig. 4 Example of post-processing with EGMS-toolkit. (a) Measurement-point locations from the .csv files downloaded for the 001-ascending dataset. The colours represent the different files. (b) Measurement-point locations for the 001-ascending dataset after merging. (c) Measurement-point locations for the 001-ascending datasets after merging and clipping. (d) Map of displacement-acceleration parameter from the vertical-displacement dataset, after interpolation and cropping. Coordinates in EPSG:3035. Basemap Data: Google, Landsat / Copernicus / ©2023 TerraMetrics / ©2023 Airbus / ©2023 Maxar Technologies / IBCAO

No data visualisation tools are provided by EGMS-toolkit because many powerful open-source or commercial Geographic Information System (GIS) software packages are already available. The generated csv and tiff files can be opened in any GIS program, such as QGIS (QGIS 2024) or ArcGIS Pro (ArcGISPro 2024), by following the guidelines provided by the EGMS consortium (Crosetto and Cuevas-Gonzalez 2024).

During the post-processing (e.g., clipping and cropping), the attribute values of EGMS-derived measurement points are not modified (except for gridding). Potential errors and uncertainties are therefore related to InSAR processing. A comprehensive description of EGMS limitations and errors is provided by the EGMS consortium via a validation report (Calero et al. 2023). In summary, the agreement between EGMS results and in-situ measurements (e.g., GNSS) is excellent. However, EGMS results are less accurate and less precise for non-linear displacement trends: e.g., seasonal variations in displacement are affected by the GNSS calibration and could be erroneous. It is therefore recommended that “seasonal effects should only be interpreted, with some care, for Basic/L2a products” (Calero et al. 2023).

Discussion and future developments of EGMS-toolkit

Recommendations for use in large-scale analysis of ground motions

For a large-scale analysis of EGMS ground surface motion data, we recommend using the L3 dataset because the LOS displacements (L2a or L2b levels) are more complex to analyse and require more resources to process. Figure 5 gives an example of the results of a large-scale post-processing of EGMS data by EGMS-toolkit over Ireland and Great Britain. Two distinct post-processing operations were carried out: (1) for Ireland by using a rectangular ROI; and (2) for Great Britain by using a ROI generated in the toolkit from the GB country code. In both cases, the L3 datasets of vertical and horizontal displacements, from each of the two data releases, were downloaded and merged. No cropping or clipping was done. On a Linux workstation (Intel® Xeon W-2104 CPU, 31 GB of RAM), the processing took several hours (including download and merging of .csv and .tiff files) and required no user intervention during the workflows (see Appendix 2). Each dataset represents several million virtual measurement points (2.5 million over Ireland and 7.5 million over Great Britain) for a total of ~62 GB. Time series were stored in .csv files, and the .tiff images of displacement velocities are directly ready for analysis.

For large-scale analyses, merging of L2a and L2b datasets may be limited by RAM space. To resolve this limitation, the user can currently merge datasets in VRT format. We recommend avoiding the use of formats other than those provided by EGMS. For example, the shapefile format may have several limitations regarding file size (should be less than 2 GB) and number of features contained within the file. In addition, we also recommend interpolating EGMS measurements of .csv files onto raster images to reduce resource requirements.

Limitations

Although developed for large-scale analyses, EGMS-toolkit is subject to certain limitations that are independent of the Python scripts themselves. The selection and downloading of EGMS datasets are not limited by the user’s computing resources. However, the “clipping” and “gridding” post-processing steps can require reading and writing very large files (several GB), whereas VRT files can be used to lighten the merging step. The same constraint applies if the user wishes to open the results with GIS software. The user’s computer should therefore have adequate RAM and disk space for these post-processing steps.

On the server side, the EGMS server may reject requests when too many queries are made. To bypass this limitation, the user can increase the sleep time in the toolkit scripts (the default value is 15 s). Another solution may be that the user reruns the workflow to download any missed files.
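Both workarounds (pausing between requests and re-running the workflow for missed files) follow a common retry pattern, sketched below. The function and parameter names are hypothetical and do not correspond to the toolkit's own settings; the download function is passed in as an argument, and the pause is replaced by a no-op in the demonstration so that the example runs instantly.

```python
import time


def download_all(items, fetch, sleep_s=15.0, max_rounds=3, sleeper=time.sleep):
    """Fetch every item, pausing between requests and re-trying failures.

    fetch   -- callable that downloads one item and raises OSError on failure
    sleep_s -- pause between two requests, in seconds
    Returns the list of items that could not be fetched after max_rounds.
    """
    pending = list(items)
    for _round in range(max_rounds):
        failed = []
        for item in pending:
            try:
                fetch(item)
            except OSError:
                failed.append(item)  # keep the item for the next round
            sleeper(sleep_s)
        if not failed:
            return []
        pending = failed
    return pending


# Demonstration with a simulated server that rejects "file_b" once.
attempts = {}


def flaky_fetch(name):
    attempts[name] = attempts.get(name, 0) + 1
    if name == "file_b" and attempts[name] == 1:
        raise OSError("simulated server rejection")


missed = download_all(["file_a", "file_b"], flaky_fetch,
                      sleeper=lambda seconds: None)
print(missed, attempts)
```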

Another limitation concerns the country borders generated by GMT. For some territories (e.g., Ireland), EGMS-toolkit fails to process the resulting ESRI shapefile, probably because some polygons are not closed. In addition, the accuracy of the country borders should be checked, especially if the clipping/cropping is carried out along coasts. In such cases, the user may wish to find an alternative input file or generate their own.

Future improvements

The current version of EGMS-toolkit (version 0.2.10 Beta) is compatible with both releases of EGMS (i.e., 2015–2021 and 2018–2022) and with the first update of the second release. A new release of EGMS-toolkit will be implemented when the next EGMS data release becomes available at the end of 2024. For updates, the EGMS appends an integer to the end of file names based on the update iteration. During our investigations, only update number 1 was found, and we are aware that EGMS-toolkit will need to handle several EGMS updates. We plan to implement this feature in the future. Although the data visualisation currently depends on the user’s own resources and preferences, we also plan to add some functions to quickly display downloaded files (e.g., velocity maps and time series of displacements) once the best technology to respect the spirit of large-scale analysis is found.

Finally, EGMS-toolkit is compatible with GIS software such as QGIS (QGIS 2024) or ArcGIS Pro (ArcGISPro 2024), via their respective Python plugins or Python add-ins. We therefore plan to add these features in the next version of EGMS-toolkit.

Fig. 5 Example of EGMS-toolkit processing on Ireland and Great Britain from L3 datasets. The displayed vertical ground surface displacement rates are from the 2018–2022 release. Coordinates in EPSG:3035. Basemap Data: Google, Landsat / Copernicus / ©2023 TerraMetrics / ©2023 Airbus / ©2023 Maxar Technologies / IBCAO

Summary

In this paper, we present the new open-source EGMS-toolkit for more efficient downloading, merging, and clipping/cropping of European-Ground-Motion-Service datasets. The toolkit package provides three Python classes and a Python module for: (1) management of the Sentinel-1-Burst-ID map; (2) automatic detection of EGMS datasets within a Region of Interest given by the user; (3) downloading of the required EGMS datasets; and (4) post-processing the downloaded EGMS datasets. The post-processing capabilities of the toolkit include tools to manipulate the EGMS files: i.e., merging several bursts, eliminating InSAR-derived measurement point duplication between bursts, clipping/cropping datasets with the ROI; interpolation of measurement-point observations to raster image to facilitate user investigations of ground surface motions on a large scale.

We illustrate the EGMS-toolkit workflow with the extraction of EGMS datasets over the city of Dublin. Although this region of interest is relatively small, the extracted, merged, and cropped datasets contain ~2 million InSAR-derived measurement points for the LOS displacement (L2) data, and the downloaded vertical displacement (L3) tile contains about 130,000 virtual measurement points (about 27,000 within the ROI). Furthermore, we demonstrate the efficiency of EGMS-toolkit for large-scale analyses with an example over Ireland and Great Britain – here a seamlessly merged vertical displacement (L3) dataset of ~10 million InSAR-derived virtual measurement points was generated for both islands, which have a land surface area of nearly 300,000 km².

In the future, we will continue to develop the toolkit to improve and optimise its functions and scripts, including maintaining the compatibility of EGMS-toolkit with new releases and updates of the EGMS. We expect that EGMS-toolkit can enhance ground surface motion monitoring on a large scale, and so aid the investigation and interpretation of observations caused by Earth motions, for a wide range of applications such as geohazard assessment and infrastructure monitoring.

Appendix 1: Algorithm example of reducing “duplicated” points

The algorithm used to reduce “duplicated” points gives the best results while optimising computing resources. Figure 6 shows the results with and without the use of the reducing algorithm. Performance varies depending on the location of overlaps and the land types. For the track 001 ascending, the burst and swath overlaps are on the city, where the point density is highest. About 6% (~ 10% in the ROI area) of the points are deleted. The reduction is clearly visible with the point density (see Fig. 6b). The algorithm therefore produces a reasonable and correct reduction of “duplicated” points. For the second track (track 103 ascending), the burst overlaps are in rural areas (outside the greater Dublin urban area), where the point density is relatively low. The number of “duplicated” points is correspondingly smaller at ~ 2% of measurement points (< 1% of points in the ROI area) and the reduction is not easily visible (see Fig. 6d). We recommend that users are aware of this problem and ideally select EGMS datasets (in L2a and L2b levels) with the minimal overlap areas in the ROI coverage.

Fig. 6 Results of “duplicated”-point reduction. (a) Point density of the 001-ascending dataset without reduction. (b) Point density of the 001-ascending dataset with reduction. (c) Point density of the 103-ascending dataset without reduction. (d) Point density of the 103-ascending dataset with reduction. Coordinates in EPSG:3035. Basemap Data: Google, Landsat / Copernicus / ©2023 TerraMetrics / ©2023 Airbus / ©2023 Maxar Technologies / IBCAO

Appendix 2: script and command lines used

A set of Python scripts similar to those used for the Dublin example here can be found in the EGMS-toolkit documentation. The command lines used to create the EGMS-derived vertical displacement rates on a large scale were as follows:

  • for Great Britain: EGMStoolkit -l L3UD,L3EW -r 2015_2021,2018_2022 -b GB -o ./EGMS_Data_UK -t <user token> --noclipping

  • for Ireland: EGMStoolkit -l L3UD,L3EW -r 2015_2021,2018_2022 -b -10.751,51.075,-4.932,55.749 -o ./EGMS_Data_UK -t <user token> --noclipping
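When such command lines need to be generated for several regions, they can be assembled from Python. The sketch below only builds and prints the argument list, using the options that appear in the commands above; the helper name and its parameters are hypothetical, and the token is left as a placeholder.

```python
import shlex


def build_command(levels, releases, roi, out_dir, token, clip=True):
    """Assemble an EGMStoolkit command line as a list of arguments."""
    args = ["EGMStoolkit",
            "-l", ",".join(levels),
            "-r", ",".join(releases),
            "-b", roi,
            "-o", out_dir,
            "-t", token]
    if not clip:
        args.append("--noclipping")
    return args


command = build_command(levels=["L3UD", "L3EW"],
                        releases=["2015_2021", "2018_2022"],
                        roi="GB",
                        out_dir="./EGMS_Data_UK",
                        token="<user token>",
                        clip=False)
# shlex.join quotes the placeholder so that the printed line is shell-safe.
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run once the placeholder is replaced by a valid token.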

Code availability and requirements

The Python package is available via GitHub: https://github.com/alexisInSAR/EGMStoolkit. The scripts are provided with local documentation. An online version of the EGMS-toolkit documentation can also be found via GitHub Pages: https://alexisinsar.github.io/EGMStoolkit/. The programming language is Python 3, and the package requires binary versions of GDAL (GDAL/OGR Geospatial Data Abstraction software Library, 2024) and GMT (Wessel et al. 2019; Wessel and Smith 1996). The installation can be done by using the pip Python package manager. The scripts have been developed and actively tested on Linux (Ubuntu 20.04) and macOS (Ventura 13.5) with Python 3.10. The example (given in the documentation) has been tested on a Windows system (Windows 11 and Python 3.12). The full package size is around 8 MB (including 7 MB for documentation).