Earth Observation Scientific Data Access, Discovery, and Challenges

Because information portrayed on a map is intuitive to understand, maps are extremely useful to decision-makers, and the tools used to interact with web-based maps further increase the utility of information products. As more information about our world becomes open to an ever-growing number of people, we have more opportunities to help decision-makers incorporate that information into their processes and workflows. This is especially true of the vast amount of data, including scientific multidimensional data, that could be used more effectively through online maps and apps once it is made easily accessible.

US government agencies such as NOAA, NASA, USGS, USDA, and DOE produce large volumes of near-real-time, forecast, and historical data that drive climatological and meteorological studies. These data underpin critical operations ranging from short-term weather prediction to long-term climate projections, covering temperature changes or anomalies across land and sea and how they are driving sea ice loss (Shrestha et al. 2016).

Modern science is computationally intensive because of the availability of this enormous amount of scientific data, the adoption of data-driven analysis, and the need to share data and research results with the public. Advances in IT and mapping technology have made a tremendous impact on how we ingest, manage, analyze, visualize, and share the complex scientific data that can be consumed by scientists, policy makers and the public.

The volume, variety, and complexity of multidimensional earth observation, model, and forecast data pose challenges for how the data are shared with a diverse community, visualized intuitively, or fused to answer scientific questions. Implementing a spatial data infrastructure for Earth and Space Sciences on top of frameworks that support satellite images, airborne data, climate and weather observations, simulations, and forecasts allows us to manage data, generate science information products, and develop focused tools that support the specific needs of end users (Shrestha et al. 2016).

How Are Maps and Earth Observation Data Transforming Our World?

The challenge posed by existing and upcoming earth observation data is how to make complicated data available and interoperable for a growing community of potential users, from scientists, data wranglers, and facility managers to policy makers. Geospatial software is evolving in step with the technology industry to help meet this challenge. Two of the biggest changes in the geospatial community in recent years have been the increase in collaboration and the growth of cloud computing. In the early days, most GIS analysts and image scientists were individuals working on their own projects, with their own data, on their own computers. As technology evolved, people within organizations began to see the benefit of sharing, and they developed centralized data storage and some centralized analytic services for use within their organization. Organizations that adopted this Service Oriented Architecture approach have been early adopters of cloud-based solutions and are now sharing their data and services with other organizations. Collaborative communities have emerged that reach beyond corporate or organizational boundaries. The result is a system of systems: a collection of portals, with distributed data and distributed analytics, that can interoperate as a single system, allowing people to collaborate across space and discipline. One example is GEOSS, the Global Earth Observation System of Systems, which has 104 member countries and many affiliated research groups. Esri and many of its customers are active participants in GEOSS.

A large factor in this increase in collaboration has been the growth of cloud computing in the geosciences. ArcGIS Online is Esri’s cloud-hosted, fully distributed collaborative environment, providing data hosting, mapping, and spatial analysis. It is included with every Esri software license, has several million users, and serves over one billion maps per day. Many organizations still want or need to host their own data and applications on their own portal. Although some still choose an on-premises solution in which they purchase and manage their own hardware, we have seen a large increase in the use of commercial cloud services in the last 3 years. Commercial cloud providers such as Microsoft and Amazon have become more cost-effective and easier to work with than building and configuring your own hardware. To simplify setup, Esri provides a collection of predefined machine images. Getting started is as easy as visiting a webpage and picking the hardware and software configuration you want, and you can be up and running with a 20 or 200 CPU cluster the same day.

Delivering Geospatial Data and Information Products with Web GIS

The capability to share geospatial data and information products that meet the needs of users at varying levels is provided by Web GIS, an architectural approach for implementing a modern GIS. It is powered by web services that deliver data and capabilities as well as connect components. It can be implemented in the cloud, on-premises, or, more typically, as a hybrid combination of the two, leveraging the best of both worlds.

Web GIS is a transformation of GIS that brings analytics to spatial data in a way that was not possible before. Previously, spatial data had to be processed, modified, and extracted to answer a given set of questions. With Web GIS, the data is published as web maps or services that can be mashed up with different layers, so that the data answers questions dynamically and no longer needs to be processed for each individual parameter. Web GIS offers a much more flexible and agile workflow. It puts GIS into the hands of a much larger audience, reduces the need to create custom applications, provides a platform for integrating GIS with other business systems, and enables cross-organizational collaboration.

Web GIS (Esri 2017a) provides the capabilities that let organizations properly manage all of their geographic knowledge. At the heart of Web GIS is a map-centric content management system. It lets you share your data and information products as map services and other types of services, with the flexibility to enable various access capabilities for user interaction. For example, if you want to make your imagery available through an open, recognized standard, you can enable the Web Coverage Service (WCS) capability on the image service. You can enable similar capabilities on maps, feature data, analysis tasks, and so on. When you publish a service with ArcGIS Enterprise, the service is exposed through the common web service technologies SOAP and REST. This lets you share your web maps using web service URLs and gives you freedom over how your users interact with your data and information products over the web. In a nutshell, a map service represents a map that you have made available to others on a server, and map services are designed to work in many web and intranet scenarios. The same map service may be consumed by several users on different platforms at the same time without those users physically downloading the data. Because Esri web services also support the applicable OGC standards in addition to SOAP and REST, they are usable in a wide variety of Esri and non-Esri applications.
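
As a rough illustration of what sharing through a web service URL can look like, the following Python sketch builds a standard OGC WCS GetCapabilities request in key-value-pair form. The host and service name are hypothetical placeholders, and the path layout is an assumption that should be checked against the actual server; only the SERVICE, VERSION, and REQUEST parameters come from the WCS standard itself.

```python
from urllib.parse import urlencode


def wcs_request_url(service_url, request="GetCapabilities", version="1.0.0", **extra):
    """Build a key-value-pair WCS request URL for a WCS endpoint."""
    # SERVICE, VERSION, and REQUEST are the standard WCS request parameters.
    params = {"SERVICE": "WCS", "VERSION": version, "REQUEST": request}
    params.update(extra)
    return f"{service_url}?{urlencode(params)}"


# Hypothetical endpoint: replace with the WCS URL of a real image service.
EXAMPLE_ENDPOINT = "https://example.com/arcgis/services/Imagery/ImageServer/WCSServer"

capabilities_url = wcs_request_url(EXAMPLE_ENDPOINT)
print(capabilities_url)
```

Any WCS-aware client, Esri or non-Esri, could issue the same request, which is the point of enabling an open standard on the service.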

To help leverage this wealth of dynamic data-driven web services, Esri provides a large suite of configurable open source apps that help people interact with the maps and data on desktops, the web, smartphones, and tablets. This suite includes apps for the field (delivering focused workflows and tools for everyday tasks), apps for the office (which let you view, analyze, create, and share maps and location information), and apps for the community (including easily configurable apps like Esri Story Maps as well as an open data portal to share important information with the community). Story Maps are a new approach to telling dynamic map-based stories and interactive narratives in a way that makes geographic information easy to understand. They use geography as a means of organizing and presenting information, and they are designed for technical as well as non-technical audiences. They combine interactive maps with other rich content (text, photos, video, and audio) within user experiences that are basic and intuitive (Esri 2017b). By telling the story of a place, event, issue, trend, or pattern in a geographic context, they are a great tool for public engagement. Hundreds of thousands of these stories have already been created and shared on the web (http://storymaps.arcgis.com/en/gallery/). Most Story Maps are created using simple templates and do not require the user to write any code. Users who want to go further can extend the templates, which are open source and available for customization.

Esri and Big Data

Big Data storage and processing is one of Esri’s important development thrusts, and it has become an integral component of the ArcGIS platform. GIS in the age of Big Data is about organizing, visualizing, and analyzing massive and diverse data from various sources to make more accurate, better-informed decisions. Demand for such capabilities is high among large organizations, which manage tremendous amounts of operational data using enterprise systems, creating a useful but overwhelming source of information. They can integrate their geospatial systems with their critical business systems and Big Data solutions to analyze massive amounts of data across the enterprise. Esri has used its Big Data tools, including ArcGIS Image Server, ArcGIS GeoEvent Server, and ArcGIS GeoAnalytics Server, to accomplish specific objectives in several domains, including intelligence, telematics, telecommunications, insurance, and advertising.

ArcGIS Image Server is Esri’s solution for storing, analyzing, and serving massive collections of diverse earth observation imagery and raster geospatial data. It is designed to take advantage of the storage and computational power provided by the cloud with a new approach to distributed storage and distributed computation. The raster data storage leverages Limited Error Raster Compression (LERC) in a cloud variant of the Meta Raster Format (MRF) file format (Baker 2015). Distributing raster data, for example across multiple Amazon S3 buckets, improves parallel computation by minimizing “I/O bound” operations, in which CPUs sit idle waiting for data; this can happen when the data is very large and the computation relatively simple. The new analysis tools are built as raster functions (Esri 2017c), which can be combined into function chains that are executed as a single process, create no intermediate data, and can be configured to run on selected spatial extents and resolutions against a collection of image files. For example, a vegetation index (Esri 2017d) can be calculated immediately from a collection of Landsat images across Europe and Asia at screen extent and resolution, and every time the map is panned or zoomed, the result is recomputed and drawn to the screen. A large collection of predefined raster functions is available, and you can extend it by writing your own Python raster function. When it is necessary to calculate a full-resolution result or process a large collection of imagery, the software is also built to leverage distributed parallel computation in a multi-node cluster. These capabilities are well illustrated by the Landsat Explorer web application, which Esri developed and which is referenced through the Unlock Earth’s Secret web page (www.esri.com/landsatonaws). This application provides instantaneous access to nearly half a million scenes, or over 500 terabytes of imagery, stored on Amazon S3 (Fig. 1).
The services are updated with about 600 new Landsat 8 scenes from the USGS every day. The services provide dynamic mosaics of Landsat data with the ability to view and perform analysis interactively over any location at any scale. The services are provided using standard equatorial map projections as well as polar projections covering the Arctic and Antarctic. All imagery is stored once in its native coordinate system and is projected and analyzed on the fly. This architecture simplifies data management, reduces storage cost, and improves analysis time.
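
The idea of a function chain evaluated only at the requested extent and resolution can be sketched in a few lines of plain Python. This is an illustrative toy, not Esri's raster function API: the function names, the tiny 4 x 4 band arrays, and the `step` parameter standing in for screen resolution are all assumptions made for the example.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index for one pixel."""
    total = nir + red
    return 0.0 if total == 0 else (nir - red) / total


def stretch(value, low=-1.0, high=1.0):
    """Rescale a value from [low, high] to the 0-255 display range."""
    return round(255 * (value - low) / (high - low))


def render_window(nir_band, red_band, rows, cols, step=1):
    """Apply the NDVI -> stretch chain only to the requested pixels.

    rows and cols are ranges describing the visible extent; step skips
    pixels to mimic a coarser screen resolution. Each output pixel is
    computed in one pass, so no intermediate NDVI raster is stored.
    """
    return [
        [stretch(ndvi(nir_band[r][c], red_band[r][c])) for c in cols[::step]]
        for r in rows[::step]
    ]


# Toy near-infrared and red bands (hypothetical values).
NIR = [[30, 30, 10, 10], [30, 30, 10, 10], [10, 10, 0, 0], [10, 10, 0, 0]]
RED = [[10, 10, 30, 30], [10, 10, 30, 30], [0, 0, 10, 10], [0, 0, 10, 10]]

# Zoomed out: whole extent at half resolution.
overview = render_window(NIR, RED, range(0, 4), range(0, 4), step=2)
# Zoomed in: a small extent at full resolution.
detail = render_window(NIR, RED, range(0, 1), range(0, 2))
```

Panning or zooming corresponds to calling `render_window` again with a different extent or step, which is why the result can be recomputed for every redraw without touching pixels outside the view.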

Fig. 1 Landsat Explorer web application

The value of such a user experience is the ability to find and use data instantaneously, and the application currently receives 1–2 million requests per month. Users do not need to search for, select, and download imagery; they can use it directly in their desktop and web applications. They can define the processing to be applied and instantaneously get the results, either visually or as data values, without the need to download. The data and analysis capabilities are also available as image services for use in other applications.

The project provides a wide range of image data ingest and management tools, image processing workflows, as well as raster GIS analytics, enabling solutions such as:

  • Automated daily ingest, ortho-correction and serving of current image basemaps for web applications.

  • Segmentation and classification of very large high resolution image collections.

  • Interactive site selection suitability modeling for green infrastructure development for the entire United States at 30 m resolution.

  • Terrain analysis such as viewshed, elevation profiles, and watershed delineation anywhere in the world in seconds.

ArcGIS GeoAnalytics Server (Esri 2017e) introduces distributed storage and computation for vector-based feature data and can be used to analyze massive collections of vector data. It supports parallel read/write of spatiotemporal data and supports distributed analysis across multi-node clusters of machines, enabling a new scale of spatial analytics. These tools can analyze patterns and aggregate data in the context of both space and time as well as help answer questions such as:

  • Using millions of emergency calls accumulated over decades, which areas had the highest rates of emergency calls?

  • What are the most popular locations for taxi pickups in Paris, and how is this trend changing weekly?

  • What is the flight path of all US commercial aircraft in 2016, and how many of those paths occurred within 100 km of a no-fly zone?
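
Questions like these come down to aggregating events in both space and time. The following single-process Python sketch shows only the binning logic, under assumed choices of a 0.01-degree grid cell and ISO weeks; it is not the GeoAnalytics Server API, and the pickup records are made-up sample points. A distributed system would apply the same keying step in parallel across partitions of the data and then merge the counts.

```python
import math
from collections import Counter
from datetime import datetime


def space_time_bin(lon, lat, when, cell_deg=0.01):
    """Return a (column, row, ISO year, ISO week) key for one event."""
    iso = when.isocalendar()
    return (
        math.floor(lon / cell_deg),
        math.floor(lat / cell_deg),
        iso[0],
        iso[1],
    )


def aggregate(events, cell_deg=0.01):
    """Count events per grid cell and week."""
    return Counter(
        space_time_bin(lon, lat, when, cell_deg) for lon, lat, when in events
    )


# Hypothetical taxi pickups as (longitude, latitude, timestamp).
PICKUPS = [
    (2.3522, 48.8566, datetime(2016, 1, 4, 8, 0)),
    (2.3529, 48.8561, datetime(2016, 1, 5, 18, 30)),
    (2.3522, 48.8566, datetime(2016, 1, 11, 8, 0)),
    (2.2945, 48.8584, datetime(2016, 1, 4, 9, 15)),
]

counts = aggregate(PICKUPS)
busiest_bin, busiest_count = counts.most_common(1)[0]
```

Comparing the counts for the same cell across consecutive weeks gives the weekly trend for that location.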

ArcGIS GeoEvent Server (Esri 2017f) enables real-time event-based data streams to be integrated as data sources in the GIS. Data streams come from fleet vehicle GPS, in situ environmental sensors, and other devices capturing location and measurements. Event data can be filtered, processed, and sent to multiple destinations, allowing you to connect with virtually any type of streaming data and automatically alert personnel when specified conditions occur, all in real-time. It changes GIS applications into frontline decision support applications, helping decision makers respond faster and with increased awareness whenever and wherever change occurs. With ArcGIS GeoEvent Server you can:

  • Ingest high-velocity sensor network data and make it available as web services for visualization and analysis.

  • Set alerts that fire when values exceed predefined thresholds and action should be taken.

  • Geofence areas of interest using existing feature data to detect the spatial proximity of events.
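
The filtering described above, combining a value threshold with a geofence, can be sketched as a small generator over an event stream. This is a minimal illustration assuming planar coordinates, a simple polygon fence, and events represented as dictionaries with hypothetical `id`, `x`, `y`, and `value` keys; it does not use or represent GeoEvent Server's own configuration model.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Consider only edges that straddle the horizontal line through y.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def alerts(events, fence, threshold):
    """Yield events that exceed the threshold inside the fence."""
    for event in events:
        if event["value"] > threshold and point_in_polygon(
            event["x"], event["y"], fence
        ):
            yield event


# Hypothetical square area of interest and sensor readings.
FENCE = [(0, 0), (10, 0), (10, 10), (0, 10)]
STREAM = [
    {"id": "a", "x": 5, "y": 5, "value": 80},   # inside, above threshold
    {"id": "b", "x": 5, "y": 5, "value": 20},   # inside, below threshold
    {"id": "c", "x": 15, "y": 5, "value": 90},  # outside the fence
]

triggered = [event["id"] for event in alerts(STREAM, FENCE, threshold=50)]
```

Because `alerts` is a generator, it processes events one at a time as they arrive rather than waiting for the whole stream, which matches the real-time use described in the text.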

Notable Solutions and Capabilities

Several technological advances in recent years are changing data usage across the earth science community and driving innovative research and development. Advances such as increased internet speeds, machine-to-machine communication, the use of Application Programming Interfaces (APIs), and the increased adoption of mobile devices have changed how organizations provide access to earth science data using GIS.

With the rise of faster servers and increased internet speeds, the paradigm of downloading data to your local system is being replaced by accessing data through services over the web. These web services provide the capability to visualize the data, change projections, and perform processing and analysis on the fly. Any application and device now connected to the web can access and use these services, providing users direct access to just the data they need in the format they want.
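
As a hedged sketch of requesting "just the data they need in the format they want", the following Python builds a request URL for the Export Image operation of an ArcGIS image service REST endpoint, asking for one bounding box, reprojected on the fly and optionally processed with a server-side raster function. The service URL and the raster function name are hypothetical; which functions a service accepts depends on how it was published.

```python
import json
from urllib.parse import urlencode


def export_image_url(service_url, bbox, bbox_wkid, out_wkid, size, rendering_rule=None):
    """Build an exportImage request for one extent, projection, and size."""
    params = {
        "bbox": ",".join(str(v) for v in bbox),  # xmin, ymin, xmax, ymax
        "bboxSR": bbox_wkid,                     # spatial reference of bbox
        "imageSR": out_wkid,                     # spatial reference of output
        "size": f"{size[0]},{size[1]}",          # output width, height in pixels
        "format": "png",
        "f": "image",
    }
    if rendering_rule is not None:
        # Server-side processing is passed as a JSON rendering rule.
        params["renderingRule"] = json.dumps(rendering_rule)
    return f"{service_url}/exportImage?{urlencode(params)}"


# Hypothetical service URL and raster function name.
SERVICE = "https://example.com/arcgis/rest/services/Landsat/ImageServer"
url = export_image_url(
    SERVICE,
    bbox=(-10, 35, 30, 60),
    bbox_wkid=4326,
    out_wkid=3857,
    size=(800, 600),
    rendering_rule={"rasterFunction": "NDVI"},
)
```

The client never downloads the source scenes; it receives only the rendered result for the extent it asked for.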

One such use case is the NASA Atmospheric Science Data Center (ASDC) and the Prediction of Worldwide Energy Resources (POWER) Surface meteorology and Solar Energy (SSE) project. The goal of the SSE project is to make NASA’s solar and meteorological data more readily accessible to the renewable energy community, particularly where such data has proven its value in enhancing energy-related decision support systems. The parameters available through the SSE project are based primarily on solar radiation derived from satellite observations and on meteorological data from NASA’s Global Modeling and Assimilation Office (GMAO) Goddard Earth Observing System (GEOS) assimilation models, and have been developed through close collaboration with industry and government partners in the solar renewable energy community (Fig. 2).

Fig. 2 End-to-end solutions for processing, analyzing and visualizing NASA ASDC scientific data

The ASDC and SSE worked together to develop geospatial web services based on the SSE data products, giving users a greatly improved way to interact with and access the data. The public, commercial organizations, and other government agencies can now integrate these data services from Portal for ArcGIS (https://asdc-arcgis.larc.nasa.gov/portal) directly into their workflows and applications, vastly improving the speed of data access and the flexibility to mash up this data with other data products. The SSE project also developed its own web mapping application (https://asdc-arcgis.larc.nasa.gov/sse) using the geospatial web services and Web AppBuilder for ArcGIS. Web AppBuilder for ArcGIS natively makes the application responsive, so the same application can be viewed on a desktop or laptop, tablet, or smartphone (Fig. 3). The application gives users a powerful visualization capability for the SSE parameters as well as the ability to download data, create tables for a given point and time span, subset data by region, zoom to their current location, take measurements, and switch basemaps. This level of interactivity and instant feedback has resulted in wider access to and greater use of the SSE data for scientific research and applications.

Fig. 3 NASA ASDC Portal

Conclusion

As the proliferation of Earth Observation data meets emerging technology, it is becoming easier to share data with a rapidly growing and diverse audience. Data providers further enable this by providing not only data but also intelligent web services and information products that are tailored to user experiences. Driving this change is a shift toward open data portals and collaboration, increasingly hosted on commercial cloud infrastructure. Geospatial software is in many ways driving innovation by developing architecture and algorithms that leverage this vast storage and computational power. The level of IT skill needed to spin up a compute cluster and share useful web services is dropping quickly, which is leading to the democratization of geospatial Big Data. We see growing excitement and activity in our work and that of our customers in merging image science and spatial science with data science techniques, leading to the emergence of geospatial data science as a new specialty. Combining the growing volume of Earth Observation data with new capabilities to manage, analyze, and serve this data leads to improved decision making for our planet and many opportunities for new discovery.