The OA cyberinfrastructure is implemented as a service-oriented architecture with three tiers: a presentation tier, a services tier and a data tier. The data tier uses a hybrid data management solution that combines a highly optimized PostgreSQL database on tiered storage with a decoupled object-based storage system holding data in HDF5 file format. The services (application logic) tier consists of programs and scripts that process data. These are exposed as Web services, which applications in the presentation tier can invoke. Most users interact with the data through the OA portal in the presentation (applications) tier. Several components of this three-tier architecture are described below.
The OA cyberinfrastructure was developed in multiple stages, and the initial objective was to make ICESat data access seamless and efficient. Because the ICESat mission ended before OA development began, the final version of the entire dataset (GLAS/ICESat L1B Global Elevation Data, Version 34) was made available in OA. The ICESat-2 datasets (geolocated photon data and surface-specific elevations) presented several additional challenges. Meeting them required re-architecting and modifying components across all three tiers, with the biggest impact on the data tier. First, the ICESat-2 mission is more complex than ICESat: it collects data simultaneously along six ground tracks, whereas ICESat used a single ground track. Second, the ICESat-2 laser fires at 10 kHz, compared with ICESat's 40 Hz. Finally, ICESat-2 collects data continuously throughout the mission, whereas ICESat collected data during several month-long campaigns each year. As a result, ICESat-2 generates several orders of magnitude more data than ICESat. OA therefore needs a presentation and data management system that can handle discovery, access and processing of massive data volumes, as well as a highly streamlined data ingestion workflow.
One of the technical goals from the outset was a highly modular service-oriented architecture that could easily accommodate changing requirements. This modularity enabled rapid prototyping and development, and it shortened the time needed for everything from designing the architecture to deploying changes.
Cyberinfrastructure design philosophy
OpenAltimetry was built as a web-based application so that end users would not have to install or download any applications to their local compute environment. In our experience, most users want to quickly browse satellite datasets in an interactive, easy-to-use interface that also lets them download data for their area of interest. Most users do not need all the information contained in the standard data products generated by the mission. However, they do need a convenient way to acquire the authoritative source product for their spatial and temporal area of interest. Users also need to quickly visualize elevation profiles for the data products, including the waveforms (from ICESat) and photon clouds (from ICESat-2).
ICESat data are derived from individual laser shots acquired along specified reference ground tracks. Each track represents the target path of the satellite over a single orbit of Earth. There are multiple unique tracks, and each was traversed multiple times during the mission. ICESat data were temporally organized into campaigns, which are demarcated by start and stop times. Each laser shot has a unique identifier and is described by a set of attributes. These include the laser footprint position (latitude/longitude/elevation), the shot time, and the histogram of reflected laser energy collected by the instrument (the waveform). Multiple shots along a track are aggregated into an elevation profile.
ICESat-2 operates similarly to ICESat. However, instead of a single laser beam directed toward the ground track, ICESat-2 has six beams. These are organized into three widely spaced pairs (~3 km apart), and the two beams within each pair are closely spaced (~90 m apart). In addition, ICESat-2 records the time and position of each surface-reflected photon from a laser shot, rather than ICESat's waveform of returned energy versus time. For ICESat-2, photon data are aggregated into overlapping 40 m along-track segments, which are somewhat analogous to ICESat footprints. As with ICESat, each segment has a unique identifier and is described by a set of attributes (time, position, etc.). Because the ICESat-2 laser fires at 10 kHz, photons are returned continuously along the satellite ground track. The aggregate of all collected photons is called the photon cloud, and its shape reveals fine topographic details of the surface.
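The following Python sketch gives a rough picture of how along-track photons map onto overlapping segments. It bins photons into 40 m windows. It assumes a 20 m step between segment starts (50% overlap) and uses the median photon height as the segment elevation. Both are simplifying assumptions for illustration only. The mission's surface-specific products use considerably more involved algorithms.

```python
from statistics import median

SEGMENT_LENGTH_M = 40.0  # segment length quoted in the text
SEGMENT_STEP_M = 20.0    # assumption: 50% overlap between consecutive segments


def segment_photons(along_track_m, heights_m,
                    length=SEGMENT_LENGTH_M, step=SEGMENT_STEP_M):
    """Group photons into overlapping along-track windows.

    Returns a list of (segment_id, center_m, median_height_m, photon_count).
    Windows that contain no photons are skipped, but their ids are still
    consumed so that ids stay tied to along-track position.
    """
    if len(along_track_m) != len(heights_m):
        raise ValueError("coordinate and height sequences must have equal length")
    if not along_track_m:
        return []
    photons = sorted(zip(along_track_m, heights_m))
    start, end = photons[0][0], photons[-1][0]
    segments = []
    seg_id = 0
    x0 = start
    while x0 <= end:
        x1 = x0 + length
        # Half-open window [x0, x1): a photon on a boundary belongs to the later window.
        window = [h for x, h in photons if x0 <= x < x1]
        if window:
            segments.append((seg_id, x0 + length / 2, median(window), len(window)))
        seg_id += 1
        x0 += step
    return segments
```

Each photon contributes to two consecutive windows, so neighbouring segments share half their photons. This overlap is what makes the segment series smooth along track.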
OA builds on existing open source software projects, which is more efficient and takes less development time than building software solutions from scratch. We also considered alternate pathways to the data (e.g. APIs) important. Our experience with OpenTopography showed that such pathways tend to increase data usage and to foster novel applications that are independent of the portal environment.
While the OA cyberinfrastructure was initially installed and operated on commodity hardware, it was designed to be readily deployable in a cloud environment. That is, it had to be possible to migrate it to the cloud without major changes or effort. This would allow OA to take advantage of mature commercial cloud offerings, including high scalability, load balancing and other optimizations.
Finally, central to OA’s architectural design was the ability to enable continuous ingestion and rapid availability of ICESat-2 data from NASA’s NSIDC DAAC, as well as the ability to add new data products from other missions in the future.
The OpenAltimetry web portal
Most users of OA interact with the data and visualization features through the application's portal. The portal's data discovery interfaces were modeled on NASA's popular EOSDIS Worldview application (Murphy et al. 2015) to provide a recognizable and minimally disruptive user experience. The NASA EarthData Search tool must handle hundreds of different datasets within a single user interface. OA instead takes a more focused approach to data access for one class of Earth-orbiting missions: satellite laser altimeters.
In the current implementation of OpenAltimetry, the ICESat and ICESat-2 missions each have their own map-based data discovery and elevation profile visualization interfaces. This lets us tailor each interface to its mission while maintaining a core set of functions common to all missions (e.g. base maps, geographic/polar projection views, etc.). The interactive data discovery user interfaces were custom built with the open source OpenLayers mapping library. In the geographic projection view, the base layer is the ESRI ArcGIS World Imagery service. The north and south polar projection views use the MODIS Blue Marble Next Generation layer. The ICESat and ICESat-2 data are delivered in geographic coordinates, and mid-latitude map views use the geographic projection (EPSG:4326). For the polar views, we convert projections on the fly using ESRI Web services. The north pole uses WGS 84 / NSIDC Sea Ice Polar Stereographic North (EPSG:3413), and the south pole uses WGS 84 / Antarctic Polar Stereographic (EPSG:3031).
OA implements the NASA EarthData OAuth module, so users can log in with their EarthData credentials. Logged-in users get additional functions. One is the ability to download the authoritative, original data in HDF5 format from the NSIDC DAAC via its REST API, subsetted to their area of interest. Logged-in users can also create and share annotations, which are referenceable areas of interest with a particular date or campaign selected.
The primary challenge with ICESat and ICESat-2 data is to provide a visual representation of the laser footprints on the map interface without overwhelming the users with huge data volumes. This must be achieved using commodity compute resources, optimized for web-based access. Showing too much data is not useful to the users, overwhelms the client browser, and quickly consumes limited server-side resources.
OA solves this problem by initially showing a small percentage of the data and increasing the amount shown as the user zooms in on the map interface. The zoom levels, the amount of data shown, and the functional capabilities vary by mission and by product within a mission. The amount of data shown at each zoom level is predetermined, based on the most common client screen resolutions recorded by web analytics. All interfaces share the same core capabilities: rapid response times, on-demand visualization tools for plotting elevation data, and the ability to quickly download data for a user-selected temporal and spatial area of interest.
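The zoom-dependent density described above can be sketched in two parts: a schedule that maps a zoom level to a fraction of footprints, and a deterministic stride that picks which footprints to draw. The doubling-per-zoom-level schedule and the `full_zoom` threshold are hypothetical. In OA, the actual fractions are predetermined per mission and product from performance testing.

```python
def fraction_shown(zoom, full_zoom, min_zoom=0):
    """Fraction of footprints to draw at a given map zoom level.

    Hypothetical schedule: each zoom level below full_zoom halves the
    fraction, so every zoom-in step doubles the density until 100% is shown.
    """
    if zoom < min_zoom:
        return 0.0  # below min_zoom nothing is drawn (e.g. only tracks are shown)
    if zoom >= full_zoom:
        return 1.0
    return 1.0 / (2 ** (full_zoom - zoom))


def select_footprints(footprint_ids, fraction):
    """Keep every k-th footprint, where k is roughly 1 / fraction.

    A fixed stride (rather than random sampling) means the same points stay
    visible while the user pans at a constant zoom level.
    """
    if fraction <= 0:
        return []
    stride = max(1, round(1 / fraction))
    return footprint_ids[::stride]
```

For example, with `full_zoom=10`, zoom level 8 draws every fourth footprint. A zoom slider could display the corresponding label with `f"{fraction_shown(8, 10):.0%}"`, which yields `25%`.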
ICESat data discovery interfaces
Since the ICESat mission is complete, data for the entire mission are displayed in the map interface. In the ICESat data discovery interface, reference ground tracks appear once the user zooms in to a predetermined level. At that point a small percentage of the actual ICESat laser footprints start appearing as blue dots (Fig. 1). This percentage increases as users zoom in to their area of interest, and eventually all of the actual footprints are displayed. The zoom slider (upper left of Fig. 1) shows the percentage of data currently displayed.
Footprints can be filtered on the fly by reference track or by ICESat observation period (i.e. laser campaign; Borsa et al. 2019). At sufficiently high zoom levels, users can draw a bounding box and download basic data attributes (latitude, longitude, elevation, etc.) for the boxed area in comma-separated values (CSV) format. At zoom levels that display 100% of the data, the footprint dots are color-coded by campaign (Fig. 2).
Users can view individual tracks and their laser footprints, color-coded by campaign (Fig. 2). This is useful because the actual tracks of laser footprints do not all fall precisely on the reference ground track, owing to imprecision in the orientation of the spacecraft. Users can also click on individual footprints to open a window with additional footprint metadata, including track ID, laser footprint position (latitude/longitude), surface elevation, campaign and shot time (Fig. 1). This window also shows the energy waveform (i.e. the profile of returned energy) for the footprint.
Finally, when users choose to view data from an individual track, they can select a region and view the elevation profile for one or more campaigns along the track (Fig. 2). Additional options let users view the combined and individual waveforms of the laser footprints and download presentation-quality plots as well as the underlying data. All on-the-fly visualizations and plots in OA use Highcharts, an SVG-based, multi-platform JavaScript charting library that is free for non-commercial use.
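The bounding-box download can be sketched as follows: filter the footprints whose coordinates fall inside a user-drawn box, then serialize their basic attributes as CSV. The `Footprint` fields and column names are illustrative assumptions. The simple min/max test does not handle boxes that cross the antimeridian.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Footprint:
    # Hypothetical record holding the basic attributes offered for download.
    shot_id: int
    latitude: float
    longitude: float
    elevation_m: float


def in_bbox(fp, min_lon, min_lat, max_lon, max_lat):
    """True if the footprint lies inside the box (edges inclusive)."""
    return min_lon <= fp.longitude <= max_lon and min_lat <= fp.latitude <= max_lat


def footprints_to_csv(footprints, bbox):
    """Return CSV text for the footprints inside bbox = (min_lon, min_lat, max_lon, max_lat)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["shot_id", "latitude", "longitude", "elevation_m"])
    for fp in footprints:
        if in_bbox(fp, *bbox):
            writer.writerow([fp.shot_id, fp.latitude, fp.longitude, fp.elevation_m])
    return buf.getvalue()
```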
ICESat-2 data discovery interfaces
The ICESat-2 data discovery interface has a map-based design similar to that for ICESat (Fig. 3). A small percentage of ICESat-2 segments is shown initially, increasing to 100% as the user zooms in to an area of interest. The ICESat-2 segments shown in OA depend on the data product from which they are derived. Segment elevations are provided for land ice height (the ATL06 product), sea ice height (ATL07), land and vegetation heights (ATL08), sea ice freeboard (ATL10), ocean surface height (ATL12) and inland water body heights (ATL13). The segments in these datasets differ in spatial coverage and can yield different elevations. Documentation on all ICESat and ICESat-2 standard products is available from NSIDC. Elevations are also provided for the geolocated photons (ATL03), which serve as the raw data for the other data products.
Since the ICESat-2 mission is ongoing, with high volumes of data being continuously collected, we display ICESat-2 data in one-day increments in the map interface. When users initially access the ICESat-2 user interface, data from the latest available date is shown for a default data product, which is currently ATL08. Users can select other data products at the top-right of the OA window, or other dates using the calendar in the sidebar (Fig. 3). There are also controls for selecting particular tracks and individual beams. When a user requests a plot of the elevations from a surface-specific product, they are also given the option of viewing the photon data from which the product was derived (Fig. 4).
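Displaying ICESat-2 data in one-day increments amounts to bucketing segments by acquisition date and opening the interface on the most recent bucket. The sketch assumes timestamps are Unix seconds in UTC. ICESat-2 products actually store time relative to a mission-specific epoch, so a real ingest step would convert the timestamps first.

```python
from collections import defaultdict
from datetime import datetime, timezone


def group_by_utc_day(segments):
    """Bucket (segment_id, unix_time_s) pairs into one-day bins keyed by ISO date."""
    days = defaultdict(list)
    for seg_id, t in segments:
        day = datetime.fromtimestamp(t, tz=timezone.utc).date().isoformat()
        days[day].append(seg_id)
    return dict(days)


def latest_day(groups):
    """Most recent day that has data (ISO dates sort chronologically as strings)."""
    return max(groups) if groups else None
```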
The percentage of total segments shown at different zoom levels is pre-determined based on performance testing that considers the most popular screen resolutions from web analytics data. It also varies by data product, since segment spacing can vary by product. Users have the ability to click on individual segments to reveal additional metadata, including its position (latitude/longitude/elevation), ID, beam number and track number. Users also have the ability to visualize the corresponding photon data on the fly for that segment (photon height vs latitude) (Fig. 4).
Beyond a certain zoom level, users can draw a bounding box and view the elevation profiles within the selected area. This opens a new browser window that displays the elevation profiles of the selected data products within that area. For ATL08, canopy heights are also displayed. For ATL10, sea ice freeboard is displayed in addition to ice surface elevation. Plots of the return signal photons (Fig. 4) are also available in the new browser window. Because the number of photons can be extremely large, only a sample of the photons is typically displayed, and users can optionally request to visualize all photon data.
As with the ICESat interface, users can download data as comma-separated values (CSV) for any of the available mission data products, subsetted by the spatial and temporal area of interest. Users logged in with EarthData Login can also download the original HDF5 files from the NSIDC DAAC via its REST API, subsetted to their area of interest.
Another powerful capability of the OA portal is the ability for users to add annotations to the datasets. Users can save specific data views (spatial and temporal) as persistent URLs for easy sharing and collaboration. Annotations also let the community help identify areas of interest and anomalies in the data. Finally, we have integrated the MODIS daily true-color surface product from NASA GIBS (Murphy et al. 2015) so that users can see cloud and surface conditions on the day of any ICESat-2 acquisition. Users toggle this satellite imagery basemap with a button in the sidebar, so they do not need to open NASA Worldview to view these images.
Capturing user metrics and usage analytics is a key part of OA and has been built into the system from the outset. The metrics include spatial queries performed, data downloads, and the number of elevation and photon visualization plots generated (broken out by product). They also include the number of callbacks to the original HDF5 datasets at NSIDC and spatiotemporal “hot zones” of high data usage, among others. User metrics and usage analytics are vital not only for architecture design and optimization but also as a measure of success.
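One way to make an annotated view shareable is to serialize its spatial and temporal state into a URL query string that can be decoded later. This is only a sketch. The base URL is a placeholder and the parameter names are hypothetical. OA may instead store annotations server-side and reference them by identifier.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Placeholder host (reserved example domain), not the real OA annotation URL.
BASE_URL = "https://example.org/annotations"


def encode_view(product, date, bbox):
    """Serialize a (product, date, bbox) view into a shareable URL.

    bbox is (min_lon, min_lat, max_lon, max_lat); the parameter names are hypothetical.
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    params = {
        "product": product,
        "date": date,
        "bbox": f"{min_lon},{min_lat},{max_lon},{max_lat}",
    }
    return f"{BASE_URL}?{urlencode(params)}"  # commas are percent-encoded as %2C


def decode_view(url):
    """Recover (product, date, bbox) from a URL produced by encode_view."""
    query = parse_qs(urlparse(url).query)
    bbox = tuple(float(v) for v in query["bbox"][0].split(","))
    return query["product"][0], query["date"][0], bbox
```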
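Spatial “hot zones” can be estimated by snapping each query location to a latitude/longitude grid cell and counting the queries per cell. The sketch below covers only the spatial part and assumes a 1° cell size. Adding a date to the cell key would give the spatiotemporal version.

```python
import math
from collections import Counter


def hot_zones(query_points, cell_deg=1.0, top=3):
    """Count queries per lat/lon grid cell and return the busiest cells.

    query_points: iterable of (lat, lon), e.g. the center of each spatial query.
    Returns [((cell_lat, cell_lon), count), ...], where each key is the
    south-west corner of its cell, ordered from most to least queried.
    """
    counts = Counter()
    for lat, lon in query_points:
        cell = (math.floor(lat / cell_deg) * cell_deg,
                math.floor(lon / cell_deg) * cell_deg)
        counts[cell] += 1
    return counts.most_common(top)
```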
OA data management system
OA uses a hybrid data management solution. It combines a highly optimized PostgreSQL database with PostGIS (for geographic object support) on tiered storage, and a decoupled object-based storage system for HDF5 files. The ICESat waveform energy data files and the ICESat-2 photon data files are stored in their original HDF5 format. From the original ICESat-2 HDF5 files, we ingest only the few data elements that OA needs for visualization. This dramatically shortens the data ingestion process and enables rapid turnaround of newly available data. Decoupling the object storage has a further advantage: the data management system is easy to port to distributed cloud resources for scalability.
The core software component that makes this storage strategy possible is the open source JHDF5 (HDF5 for Java) library, a Java binding for HDF5 that extracts waveform and photon data from the HDF5 files on the fly. JHDF5 lets us fetch specific blocks of data by index. By contrast, most other libraries (e.g. HDFql and the official HDF5 Java library) require loading the entire dataset into memory and then extracting the needed data. This significantly reduces memory usage and improves performance, both of which are vital for rapid response times in the application.
To further speed up response times at low zoom levels, we extract a decimated subset of each data product and load it into the PostgreSQL database. Sampling rates vary with product density, from 1 in 2048 for ATL06 and ATL08 to 1 in 128 for ATL12. As a result, OA does not need to read multiple HDF5 files to show data footprint locations at global zoom levels. This improves performance while adding minimal overhead to the data loading process and requiring little additional storage space.
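The benefit of index-based reads can be illustrated without any HDF5 dependency, using a flat file of fixed-size records. The reader seeks directly to the byte offset of the first requested record and reads only that block. This is an analogy, not JHDF5's API. HDF5 libraries achieve the same effect through hyperslab selections, computing offsets from the dataset's layout and chunking.

```python
import io
import struct

# One little-endian float64 per record (e.g. one photon height per record).
RECORD = struct.Struct("<d")


def write_heights(stream, heights):
    """Append fixed-size records to a binary stream."""
    for h in heights:
        stream.write(RECORD.pack(h))


def read_block(stream, start, count):
    """Read `count` records starting at record index `start`.

    Only the requested byte range is read: we seek straight to
    start * RECORD.size instead of loading the whole file into memory.
    """
    stream.seek(start * RECORD.size)
    data = stream.read(count * RECORD.size)
    n = len(data) // RECORD.size  # fewer records if the block runs past the end
    return [RECORD.unpack_from(data, i * RECORD.size)[0] for i in range(n)]


if __name__ == "__main__":
    buf = io.BytesIO()
    write_heights(buf, [float(i) for i in range(10)])
    print(read_block(buf, 3, 4))  # [3.0, 4.0, 5.0, 6.0]
```

The same functions work unchanged on a real file opened with `open(path, "rb")`. The memory cost of `read_block` depends on `count`, not on the file size.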
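The decimation step can be sketched as keeping one row in every N, with N taken from the rates quoted above. Products whose rates are not given here are deliberately left out of the table. The helper `overview_rows` shows how small the overview table stays relative to the full product.

```python
# Decimation factors quoted in the text; other products are omitted because
# their rates are not stated here.
DECIMATION = {"ATL06": 2048, "ATL08": 2048, "ATL12": 128}


def decimate(rows, product):
    """Keep 1 of every N rows (starting with the first) for the low-zoom overview."""
    n = DECIMATION[product]
    return rows[::n]


def overview_rows(total_rows, product):
    """Number of rows decimate() keeps: ceil(total_rows / N)."""
    n = DECIMATION[product]
    return -(-total_rows // n)  # ceiling division on integers
```

For example, one million ATL12 segments yield 7,813 overview rows (one million divided by 128, rounded up), roughly 0.8% of the original. For ATL06 and ATL08 the fraction is below 0.05%.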
ICESat-2 data pipeline from NSIDC
OA has a data pipeline with the NSIDC DAAC that ingests the geolocated photon data product (ATL03) and other surface products (ATL06, ATL07, ATL08, ATL10, ATL12, ATL13) as they become available at NSIDC. The pipeline queries the NSIDC API service with the data product, the temporal range and the required subset of data elements. The NSIDC service endpoint uses URS (EarthData Login) authentication for its EOSDIS Service Interface results, which satisfies NASA's requirement to collect metrics on all data access and downloads. Once a request is received, the NSIDC services generate subsetted files for download. Several extract-transform-load (ETL) scripts then transfer these files and ingest them into the OA data management system. The pipeline is further optimized by parallelizing the data pulls in asynchronous mode, so that different sections of the dataset download simultaneously. This cuts the overall ETL time by at least half, with the exact gain depending on the computational load at NSIDC.
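The parallel pull can be sketched as splitting the requested temporal range into equal sub-ranges and issuing one request per sub-range from a thread pool. `fetch_granules` is a hypothetical stand-in that only formats the request. A real client would submit an asynchronous subsetting order to NSIDC, poll until it completes, and download the results. The choice of four sub-ranges is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timedelta


def split_range(start, end, parts):
    """Split [start, end) into `parts` contiguous, equal-length sub-ranges."""
    step = (end - start) / parts
    return [(start + i * step, start + (i + 1) * step) for i in range(parts)]


def fetch_granules(product, t0, t1):
    """Hypothetical stand-in for one asynchronous subsetting request.

    It only echoes the request it would make; no network access happens here.
    """
    return f"{product}:{t0:%Y-%m-%dT%H:%M}/{t1:%Y-%m-%dT%H:%M}"


def parallel_pull(product, start, end, parts=4):
    """Issue one request per sub-range concurrently; results keep sub-range order."""
    ranges = split_range(start, end, parts)
    with ThreadPoolExecutor(max_workers=parts) as pool:
        return list(pool.map(lambda r: fetch_granules(product, *r), ranges))


if __name__ == "__main__":
    day = datetime(2019, 1, 1)
    for request in parallel_pull("ATL03", day, day + timedelta(days=1)):
        print(request)
```

Threads suit this workload because each request spends most of its time waiting on the remote service rather than on local computation. `pool.map` also returns results in sub-range order, which keeps the subsequent ETL step simple.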
The OpenAltimetry API
We have implemented application programming interfaces (APIs) that allow programmatic access to all data products available in OA. API-based access to OA data products gives researchers a flexible platform for designing complex data access and processing workflows, independent of the processing tools available in the OA portal.
The APIs in OA are implemented with the open source Jersey RESTful Web Services framework. The open source Swagger framework is used to document these RESTful services and to help clients consume them. We have also enabled pre-formatted API endpoints for every elevation plot. Users can copy the endpoint URL into their own application and pull the corresponding data with the same spatial and temporal subsetting parameters.
Data outputs are available in multiple formats (including CSV and JSON). This encourages direct use of OA data in third-party applications such as Jupyter notebooks and promotes the development of independent algorithms and functionality. For convenience, the user interfaces show the relevant API endpoint URLs under the elevation and photon plots. They also link to sample Jupyter notebooks (Fig. 5) that use these endpoints to pull in data.
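The following minimal client sketch follows this pattern. It builds a subsetting URL for a product, date and bounding box, then parses the JSON response into tuples. The base URL, path, parameter names and response schema are all hypothetical placeholders. OA's Swagger documentation describes the actual endpoints.

```python
import json
from urllib.parse import urlencode

# Placeholder base URL (reserved example domain); not the real OA API host or path.
API_BASE = "https://example.org/api/icesat2"


def build_url(product, date, bbox, fmt="json"):
    """Build a hypothetical subsetting URL for one product, day and bounding box."""
    min_lon, min_lat, max_lon, max_lat = bbox
    params = {"date": date, "minx": min_lon, "miny": min_lat,
              "maxx": max_lon, "maxy": max_lat, "outputFormat": fmt}
    return f"{API_BASE}/{product.lower()}?{urlencode(params)}"


def parse_points(body):
    """Extract (lat, lon, h) tuples from a JSON body shaped like
    [{"lat": ..., "lon": ..., "h": ...}, ...] (an assumed schema)."""
    return [(p["lat"], p["lon"], p["h"]) for p in json.loads(body)]
```

In a notebook, the URL from `build_url` (or one copied from under an OA plot) would be fetched with any HTTP client, and the response body passed to `parse_points` or loaded into a data frame.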