17.1 Introduction

In the face to accelerated environmental change, being able to assess different aspects of plant biodiversity, such as those related to, e.g., the productivity or health of an ecosystem, repeatedly at large spatial scales, is increasingly important. The recent decade has seen an explosion of in-situ databases necessary to assess such patterns and processes, often cover large parts of the Earth (e.g., plant functional traits, phenology (PhenoCam networks)), and integrating this data with remotely sensed products enables assessment at critical scales which would otherwise be impossible or extremely costly to do.

However, RS data comes with limitation of their own, and despite of the many opportunities offered by RS data, certain aspects and scales of biodiversity are currently not measurable using RS technology alone. Thus, the combination of RS, in-situ and other auxiliary data, provides the most powerful approach to assessing ecosystem functioning and conservation at large spatial scales.

The means and uses of RS to link biodiversity with other relevant ecosystem metrics are manifold, and some are covered in detail in other chapters within this book. A compilation of some of the major RS data types and sensors used for vegetation analyses at the landscape scale are provided in Fig. 17.1. Here, we discuss strengths, weaknesses, and caveats of linking RS with in-situ data to address the themes of ecosystem functioning and conservation of biodiversity.

Fig. 17.1
figure 1

Non-exhaustive compilation of some of the major RS data types and sensors used for vegetation analyses at the landscape scale, revisit times (line style and symbols), spatial resolution (from low resolution at the top to high resolution at the bottom), sensor types (colors), and yearsFig. 17.1 (continued) active (x-axis). For airborne data, it was assumed that theoretically, data can be available for the past although these data will be very sparse and not easily accessible. In all cases, highest possible spatial and temporal resolutions are shown. This might not be the case for all bands covered by these sensors. Where satellites are currently active or proposed, they are presented here as being active into the future; this is subject to change and should be reviewed frequently. Codes after sensor names: “not continuous, ∗∗ ATSR-1/2 also contained a Microwave (MW) sensor, ∗^ 16-day revisit with one satellite, 8 days if using data from both, ∗^∗ 2 satellites proposed—details specific to Biosphere observations, ∗∗∗ Film and digital. Green and red boxes refer to the examples given in the text, enclosing MODIS and ALOS PALSAR sensors. Other than Kite, UAV, and declared airborne methods, all instruments are either satellites or sensors carried by a satellite. Many satellites have a payload of a range of instruments; where this is the case, hyperspectral or multispectral units have been presented. Many satellites also carry panchromatic sensors, which are not represented in this figure. (For more information, see Toth and Jóźków et al. (2016) and Khorram et al. (2016))

17.2 Ecosystem Functioning

Pettorelli et al.’s (2017) framework for monitoring ecosystem functions at all scales lends itself well to the flexibilities and strengths provided by RS. We consider ecosystem functions as those attributes related to the performance of an ecosystem that are the consequence of one or more ecosystem processes (Lovett et al. 2005). With respect to plant diversity, the attributes underpinning functions that benefit plant species (and indirectly fauna and humans) are crucial. Such functions include pollination, water regulation, disturbance regulation, supporting habitats, and biological control. The measurement and monitoring of these ecosystem functions often relies on a remotely sensed proxy. For example, the ecosystem function of greenhouse gas regulation could be monitored with RS-based measurements of emissions from fires, as provided by Moderate Resolution Imaging Spectroradiometer (MODIS) and expected from missions such as Environmental Mapping and Analysis Program (EnMap) and the Surface Biology and Geology imaging spectrometer (currently under planning to replace the cancelled HyspIRI mission). The advantage of such sensors is their moderate to high spatial and temporal resolution that creates dense time series. Analyzing ecosystem functions over large scales can provide information on drivers of species diversity and abundance change, as well as aspects that affect human well-being such as those related to ecosystem services (ES ). Ecosystems provide regulating, provisioning, and cultural and supporting services to society, such as nutrient regulation, the provision of food, and waste treatment (De Groot et al. 2002; Ma 2005). Detailed techniques for mapping ES at large scales (Englund et al. 2017) and rapid assessments (Meyer et al. 2015; Cerreta and Poli 2017) are discussed elsewhere. Although ES functions and processes are closely related, “services” implies inherent contributions to humans and an attached monetary or cultural value (De Groot et al. 2002). All ESs are under increasing threat due to pollution, overexploitation, land use change, and climate change, with concerns of overstepping the “safe operating space for humanity” (Rockström et al. 2009).

We choose to focus on a few contrasting ecosystem functions in this chapter, with particular focus on those that plants provide or require for survival and fitness; further details on other ecosystem functions can be obtained in Pettorelli et al. (2017). The narrative below focuses on satellite RS measurements because they are widely accessible and offer a relatively inexpensive and verifiable means of deriving data with complete spatial coverage in a consistent manner over large areas with (near) appropriate temporal resolutions, thus offering great potential for tracking change in ecosystem functions (Cabello et al. 2012; Nagendra et al. 2013). However, these satellite measurements can (and should) be integrated with a host of other data sources that are remotely acquired; these are also discussed where appropriate.

17.2.1 Pollination

The transport of pollen between plants is crucial for reproduction. Different pollination types exist with varying distributions in space and time. The ecosystem function of pollination is under varying threats especially with declines in abundance and the loss of the organisms providing the service (e.g., bees) (Vanbergen et al. 2013; IPBES 2016). Thus, up-to-date information on pollination is extremely important.

Two distinct RS approaches can be used to study this function: (i) direct RS of different pollination types and (ii) remotely sensed indicators of pollination (i.e., vegetation phenology or biomass as an indicator). The former is challenging because many pollination traits cannot be directly measured with RS sensors due to the signal contribution of some pollination traits (nectar content, flower structure, etc.) being too low relative to other surface components. Alternatively, Feilhauer et al. (2016) posited that different pollination types might be inferred from leaf and canopy optical traits (leaf area index, leaf tilt angle, mean canopy height, cover, specific leaf area, leaf dry matter content, leaf dry mass; see also Olinger 2011), allowing for an indirect classification of plant pollination types. Using data acquired by an airborne hyperspectral sensor, pollination types were related to canopy reflectance in a way that allows their discrimination, opening the potential to expand this approach to other ecosystems and different phenological stages. Such an approach should benefit from upcoming satellite missions equipped with hyperspectral sensors, for example, the German EnMAP and the Chinese GF-5 Hyperspectral Imager (Fig. 17.1). Further, pollination traits such as floral display size may have too little of a signal to be discernible from other surface components with RS, although several studies have been able to detect flowers with hyperspectral airborne sensors and use the information to, for example, detect invasive species (see Bolch et al., Chap. 12, in this book).

Plant phenology is another ecosystem function directly linked to pollination, since any change in the phenological cycle may affect interactions such as competition between plant species and mutualism with pollinators (Buitenwerf et al. 2015). Any direct alteration in functional or taxonomic plant diversity as a result of change in vegetation phenology is further compounded by both short- and long-term climatic changes. RS has demonstrated capacity for measuring and monitoring vegetation phenology (Cleland et al. 2007) and thus indirectly detecting pollination (Neil and Wu 2006). In the next section, we describe how RS can be used to measure phenology.

17.2.2 Phenology

Recent climate change has shifted phenology and associated species distributions globally. This has in turn increased the risk of extinction for affected species through the alteration of development rates of species or by altering the timing of environmental cues that affect a species’ presence in the community (Yang and Rudolf 2009). Ongoing climatic and phenological changes are expected to further increase this risk. Moreover, the extent to which changes in vegetation phenology will feed back into the climate system by modifying albedo and hence cloud formation is a major source of uncertainty in climate change projections (Zhao et al. 2013). Recent work on land surface phenology has focused on assessing changes in phenology globally (rather than regionally) over long time periods as is now afforded by the Earth Observation (EO) archive (Ganguly et al. 2010). Buitenwerf et al. (2015) used phenomes (83 phenologically similar zones) in their global study of 32 years (1980–2012) and showed via metrics extracted from the normalized difference vegetation index (NDVI ) record that most of Earth’s land surface has undergone some form of change in the seasonal pattern of vegetation activity. Other studies used alternative indices, such as the MERIS Global Vegetation Index (MGVI) and Terrestrial Chlorophyll Index (MTCI), from the now defunct Envisat platform and MODIS Enhanced Vegetation Index (EVI). Indeed, comparison between these indices for a temperate deciduous forest showed that MTCI corresponded more closely with vegetation phenology from ground observations and climatic proxies than any of the other indices (Boyd and Foody 2011). This finding suggests that the Envisat MTCI is best suited for monitoring vegetation phenology, advocated by its sensitivity to canopy chlorophyll content, a proxy for the canopy’s physical and chemical alterations occurring during phenological cycles (Boyd et al. 2012). These studies also point to the value of increasing both the temporal and spatial sampling of vegetation phenology. Limitations to spatial resolution mean that satellite-derived data represent land surface phenology rather than direct vegetation phenology and are therefore too coarse to detect critical individual, species, or community-level responses. Further improving the temporal sampling means that those challenges to using satellite data, including high sensitivity to effects of clouds and atmospheric conditions, could be overcome. With the recent launch of the ESA Sentinel-2 and Sentinel-3 , this improvement in data characteristics is assured to go forward.

Key to applying data from these or any satellite for the derivation and study of phenology, however, are accurate calibration and atmospheric correction to obtain surface reflectance data. National Oceanic and Atmospheric Administration (NOAA) satellite sensors have legacy calibrated data. For the Landsat and Terra/Aqua satellites, calibrated top-of-atmosphere radiance and surface reflectance products are delivered directly to the users. While Sentinel satellites are calibrated, users are provided with top-of-atmosphere reflectance and have to perform the atmospheric correction (using plug-ins such as sen2cor (http://step.esa.int/main/third-party-plugins-2/sen2cor/)). That said, Vuolo et al. (2016) have demonstrated very good agreement between calibrated Sentinel-2 and Landsat-8 data for six test sites. A Harmonized Landsat and Sentinel-2 land surface reflectance data set is now readily available (Claverie et al. 2018).

The requirement for high-temporal-resolution RS data has, since 1982, been provided by the NOAA Advanced Very High Resolution Radiometer (AVHRR) sensor and its successors, with much regional scale analyses undertaken using NDVI (Reed et al. 2009). Capturing the seasonal pattern of photosynthetically active radiation absorbed by the land surface, using repeated measures of vegetation indices such as NDVI throughout the year for an area of interest, allows depicting the cycle of events that drive the seasonal progression of vegetation through stages of dormancy, active growth, and senescence. Several phenological metrics can be extracted from the temporal sequence, relating to leaf-on and leaf-off, length of growing season, peak of growing season, trough, and measures of seasonal amplitude (integral, trough), and their patterns over time and space. Specially written and customizable open-source software, such as TIMESAT (Eklundh and Jonsson 2015) and QPhenoMetrics (Duarte et al. 2018), afford some robustness to the study of phenology.

With this emphasis on using satellite RS of land surface phenology to assess changes in vegetation, validation of extracted metrics is imperative. Three principal approaches are suitable, all of which use observations taken at ground level. The first relies on collaboration between experts to ensure suitable spatial and temporal coverage. The PEP725 ground phenology database generated as a result of the Pan European Phenology (PEP) project (a European infrastructure to promote and facilitate phenological research, education, and environmental monitoring) is one example (Templ et al. 2018). The second approach is an extension of the first and uses citizen science projects where the public exploits Web 2.0 technologies and contributes ground-based observations (Kosmala et al. 2016). The idea of citizens as sensors is not new, but full use of their input still requires effort. This is the focus of a current European Union (EU) Horizon 2020 project, LandSense (https://landsense.eu/). The third approach uses proximal sensing (often automated systems) to provide detailed information at particular locations (e.g., traffic cameras, Morris et al. 2013; archived TV video footage, De Frenne et al. 2018; and webcams, e.g., the PhenoCam project, Richardson et al. 2018). However, as Brown et al. (2016) pointed out, these technological advances present challenges with respect to data standards. These authors suggest that continental-scale ecological research networks, such as the US National Ecological Observatory Network (NEON) and the EU’s Integrated Carbon Observation System (ICOS), can serve as templates for developing rigorous data standards and extending the utility of PhenoCam data through standardized protocols for ground-truthing.

17.2.3 Carbon Storage

Biomass estimates are fundamental to estimate carbon fluxes and stocks and link them to carbon credit initiatives. Active RS such as laser scanning provides fundamental data on canopy structure useful to include in allometric equations to estimate biomass (Chave et al. 2014). For example, the analysis of first returns from laser scanning is used to estimate canopy height (Bouvet et al. 2017). Recent Terrestrial Laser Scanning (TLS) technology shows promising advances in more accurate estimates of biomass because it allows for a first-order estimate of diameter at breast height. Further, TLS analysis is able to provide more precise representation of tree structure, allowing for moving beyond the assumed cylindrical shape of a tree trunk used for many biomass estimates. Novel methods also make it possible to fit geometric shapes along not only the trunk but also the branches and stems, giving a more precise estimate of the woody component of trees and producing more reliable estimates of biomass.

Another area of active research in carbon storage and credits is the estimation of tree cover and the number of trees per pixel, because even if the biomass per tree is correctly estimated, it is scaled to regional or global estimates by a multiplier of tree density. MODIS offers a tree cover product (as a layer within the Vegetation Continuous Fields product) systematically and repeatedly at a global scale, but this product makes some assumptions on the minimum cover of trees needed to be detected by the MODIS sensors. In addition, it is important to estimate the contribution of non-tree functional types such as shrubs and grasslands to the carbon storage and credits calculations. Current assessments of the performance of reducing emissions from deforestation and forest degradation (REDD+) programs have shown varied success; most are linked to the quality of the biomass estimates, which are fundamental to calculate the carbon potential of a given ecosystem as a fraction (often assumed around 1/2) of its biomass.

17.2.4 Challenges

Although there is a clear role for RS for monitoring ecosystem functions (see Serbin and Townsend, Chap. 3; Martin, Chap. 5), there are still many challenges to be overcome to ensure its full potential is realized. Many acknowledge the lack of an acceptable framework that brings together the many proxies for ecosystem functioning that can be directly remotely sensed (e.g., Asner and Olinger 2009; De Araujo Barbosa et al. 2015). But fundamentally there is a need to improve the RS estimates of the many proxies that are used to infer the ecosystem functions of interest. Developments in methodologies for processing, analyzing, and interpreting RS data will serve to improve the mapping accuracy and monitoring opportunities. However, those developments cannot exist in isolation; a dialogue between computational scientists and those concerned with ecosystem functions must occur for the full potential of RS to be realized (Cabello et al. 2012; Paganini et al. 2016). Coupled with this is the need for advances in sensor technologies that will enable accurate, timely data with the right thematic, i.e., where interpretation of the sensor’s raw data can provide the information necessary for applications in ecosystem function and biodiversity. Further, data should be open access, maintained, and interoperable (Pettorelli et al. 2017), particularly given the plethora of existing local to regional scale data capture initiatives by airborne methods based on drones or airplanes (Cord et al. 2017). Finally, RS proxies will often need to be combined with field measurements to accurately represent the desired ecosystem function (e.g., Tong et al. 2004). Indeed, joint analyses of satellite data with in-situ measurements or process measurements in the lab may be essential steps to the refinement and increased capacity and utility of satellite-based indicators for ecosystem function monitoring (see also Meireles et al., Chap. 7, in this book). This is likely to be a nontrivial task, particularly in highly dynamic situations.

17.3 Conservation

Global environmental change has led to major losses, changes, and erosion of biodiversity and ecosystem function and counteracted to some extent by conservation action. Conservation science focuses on understanding the distribution of organisms, their rarity status, the viability of populations, drivers and disturbances, and current and future restoration need. RS has been increasingly used to answer conservation science questions and applications (Rose et al. 2015), namely, species mapping (see Chaps. 9, 10, and 11 in this book by Record et al.; Paz et al.; Pinto-Ledezma), biodiversity monitoring (Feret and Asner 2014; Rocchini et al. 2017), detecting invasive alien species (see Bolch et al., Chap. 12, in this book), assessing vegetation condition, monitoring carbon storage and credits, and assessing habitat extent and condition (see Record et al., Chap. 10, in this book), among others (Bustamante et al. 2016; Lawley et al. 2016; Niphadkar and Nagendra 2016; Reddy et al. 2017). Here, we focus on aspects of conservation related to the abovementioned measures of ecosystem function at large scales.

17.3.1 Biodiversity Monitoring

RS has long been recognized as useful for biodiversity measurement (Stoms and Estes 1993; Turner et al. 2003; Turner 2014), and more recently it has emerged prominently in biodiversity monitoring, through essential biodiversity variables (Pereira et al. 2013; Skidmore et al. 2015; Pereira et al. 2015). Rose et al. (2015) identified the top ten applications of RS in conservation, namely, for species distribution and abundance, movement and life stages, ecosystem processes, climate change, rapid response, protected areas, ecosystem services, conservation effectiveness, changes in land use/cover, and degradation and disturbance regimes.

Beyond land cover classification, one of the most extensive uses of RS data is to produce distribution maps of species, communities, and ecosystems (Kerr and Ostrovsky 2003). Most RS studies of biodiversity focused on mapping species using all kinds of information from optical, radar, and Light Detection and Ranging (LiDAR ) data. Nagendra (2001) reviewed the potential of RS data for assessing biodiversity, namely, species mapping and species diversity and habitat mapping, and concluded that at the time the most feasible application of RS would be to map species distributions and habitat, the former at smaller spatial scales and the latter at larger scales. RS data such as those from Landsat and other multispectral sensors have been widely used to map species and vegetation communities (Xie et al. 2008). There is now a growing literature on identifying species in many case studies. However, use of imaging spectroscopy for species identification needs to be understood at a more fundamental level—especially the development of generalized methodologies and rules for detection and mapping, which is an area of active research today. Conceptually, we have yet to resolve how to identify unique spectral signatures for the estimated 400,000 extant plant species or groups of species (i.e., functional types or optical types; Ustin and Gamon 2010). In contrast to geologic minerals, which are often spectrally distinct, all land plants share a common basic metabolism and biochemistry. This fundamental similarity makes identification of plant species difficult. The interactions of a spectral signal with environmental conditions and shifts in spectral signatures through phenological stages contribute to spectral variation, in addition to the characteristic properties of individual species (Ustin and Jacquemoud, Chap. 14). In recent years, with the advent of hyperspectral sensors and the fusion of these data sets with other auxiliary data, novel avenues to map and monitor biodiversity have emerged. These new data sets make it possible to directly discriminate species in terrestrial and freshwater ecosystems (Jones and Vaughan 2010; Turak et al. 2017; Choa et al. 2012; Fassnacht et al. 2016) and assess relationships between the diversity of spectra and the diversity of species and the fundamentals of the spectral diversity hypothesis (i.e., that the diversity of spectral profiles generally predicts diversity of species, Nagendra 2001), which are presented in other chapters of this book (see especially Schweiger, Chap. 15; Cavender-Bares et al., Chap. 2). A recent study highlights the relation of spectral diversity to functional and phylogenetic components of biodiversity (Schweiger et al. 2018), and this topic is covered in more detail in Meireles et al. (Chap. 7).

Another avenue in which RS can contribute to biodiversity monitoring is through its use in species distribution models (SDMs) (see Pinto-Ledézma and Cavender-Bares, Chap. 9; Paz et al., Chap. 11). SDMs are empirical statistical approaches that predict the spatial distribution of species (Guisan and Zimmermann 2000), and the choice of environmental predictors is fundamental for SDM. RS measurements of vegetation condition (Turner et al. 2003), ecosystem productivity (Running et al. 2004), and seasonality (Reed et al. 1994), among others, are now available over time series (e.g., Landsat time series; Kennedy et al. 2014) and might be used in SDMs (Bradley and Fleishman 2008; He et al. 2015) making it possible to predict species distributions over time. Although the use of RS in SDMs is widely advocated and applied, it has yet to be scaled to most species, especially non-plant taxa. Promising progress toward the inclusion of RS products in SDMs includes responses to nutritional value (Sheppard et al. 2007), food resources (Coops et al. 2009), seasonal variation (Bischof et al. 2012), and combined effects of climate and land use changes (Santos et al. 2017). Upcoming sensors are expected to provide even better and more diverse measurements of ecosystem processes and other information that might be relevant to map species distributions at finer spatial, temporal, and spectral resolutions (e.g., Sentinel satellites; Berger and Aschbacher 2012). Further, through SDMs we can expand our capacity to monitor taxonomic groups beyond plants, providing a broader understanding of the dynamics and dimensions of biodiversity, its feedbacks and interactions, and its change.

Yet another way in which RS can be useful in biodiversity monitoring is to detect animals using unmanned aerial vehicles with visible and thermal sensors and LiDAR data. Nagendra et al. (2013) concluded that despite the potential for RS in monitoring habitat, the integration has not happened yet because of technical challenges of conducting and accurately interpreting image analyses, insufficient integration between in-situ data and expert knowledge RS data, and lack of funding and platforms that provide such services and capacity in an accessible way globally. However, progress in this domain includes the provision of environmental data layers from RS sources in Movebank, a major animal movement data repository. Environmental data are directly extracted at the recorded locations and interpolated to the date and time of each GPS fix (Dodge et al. 2013). This allows animal movement ecologists to easily extract environmental variables co-registered in space and time with their animal location data.

Current RS capabilities allow for improving species mapping and monitoring such as the local species pool (i.e., alpha diversity, Feret and Asner 2014) as well as to move beyond species (Jetz et al. 2016) to measure and monitor other components of diversity such as compositional turnover (i.e., beta diversity, Leitão et al. 2015; Schwieder et al. 2016; Rocchini et al. 2017). There were a few attempts to map species richness using Landsat TM multispectral data to calculate NDVI as a proxy for species richness with limited success (Gould 2000). The spectral resolution of imaging spectrometer data today is sufficiently fine to implement and test the spectral diversity hypothesis (Feret and Asner 2014) because plant species exhibit a set of traits that respond to light at different wavelengths (plant optical types, Ustin and Gamon 2010). Novel findings, however, show some limitations to the application of the spectral diversity principles at larger spatial resolutions (Schmidtlein and Fassnacht 2017), and more studies are needed to identify challenges and opportunities of this approach to mapping and monitoring biodiversity.

17.3.2 Vegetation Condition

There have been many efforts to move beyond species assessments toward functional aspects, which include vegetation condition. The current wealth of time series data from sensors like NOAA AVHRR, MODIS, or Landsat allows measurements of ecosystem phenology, seasonality, and changes in onset of seasons and assessment of ecosystem condition. Vegetation condition is the measurement of the vegetation response to stress (Liu and Kogan 1994). High or good condition corresponds to green, photosynthetically active vegetation, while stress such as water and nutrient limitations, pest outbreaks, and fire results in low or poor condition, after accounting for seasonal changes.

Physiologically, plants respond to stress by reducing chlorophyll activity and subsequently expressing other pigments. Plants also respond to stress by closing their stomata and reducing gas and water exchanges, which results in cells becoming turgid under water stress. These responses can be measured in the RS signal in both the visible and near infrared (NIR). In the visible the signal switches from a weak to a stronger reflectance signal due to chlorophyll absorption of red and blue wavelengths. Water stress can be measured in the NIR because as cells become turgid, they increase scattering of NIR radiation and therefore change the measured signal. One advance toward systematic measurements of vegetation condition is the Australian BioCondition (Lawley et al. 2016). This approach provides a framework to systematically assess terrestrial biodiversity condition—“[t]he similarity in key features of the regional ecosystem being assessed with those of the same regional ecosystem in its reference state”—using attributes like fraction of large trees, tree canopy height, recruitment of canopy species, native plant richness, size of patch, and connectivity. Thus, such in-situ measurements of vegetation condition can be linked to RS estimates to better provide an assessment of an ecosystem’s stress level and ability to function and to provide services like habitat provisioning. In the next section, we will cover the later issue.

17.3.3 Habitat Intactness and Critical Transitions

Habitat intactness may be defined temporally or spatially as either (i) the degree to which the condition of the vegetation that forms habitat has not changed beyond what is expected from natural processes such as phenology and other dynamics or (ii) the spatial pattern of a given habitat, its degree of connectivity or fragmentation, and its edge extent. An active area of research on the potential of RS for conservation is the assessment of habitat intactness. Nagendra et al. (2013) employed RS to estimate how habitat has been changing in several regions of the Western Ghats in India. Coops et al. (2008, 2009) developed a dynamic habitat index using time series satellite data and showed its potential to monitor habitat condition in Canada. Later the approach was expanded to other regions, and dynamic habitat indices have been used as predictors of the richness of other taxa (Hobi et al. 2017). Coops et al. (2018) have now expanded it globally and have shown how these data sets mimic global biodiversity patterns. These data sets are currently available at http://silvis.forest.wisc.edu/data/DHIs-clusters/ and can be very useful to monitor protected area performance.

RS data are increasingly applied across large spatial scales to study stable state conditions of habitats and assess early warning signals for catastrophic shifts. For example, the relationship between stable ecosystem states and rainfall can be inferred from the global probability density of forests extracted from remotely sensed forest cover and its relation to rainfall (Verbesselt et al. 2016). Temporal autocorrelation of NDVI time series and vegetation optical depth from radar over tropical forests indicated a reduced rate of recovery (critical slowing down) when tall canopy trees of intact forests under decreasing rainfall approached a tipping point, inducing high mortality. The rainfall threshold was at a similar level as the one indicated for stable state transitions from the spatial analysis. Similar to the temporal warning signals, the patch size distribution of spatially patterned ecosystems, such as arid ecosystems, showed a meltdown when approaching extinction (Dakos et al. 2011). High-resolution RS data across large spatial scales and patch size analysis can therefore be used to assess the extinction risk of these vulnerable ecosystems under decreasing rainfall conditions.

17.3.4 Protected Area Monitoring

RS plays an essential role in monitoring natural ecosystems, especially in protected areas. Human pressure over these areas has changed dramatically over the last decades (Geldmann et al. 2014), justifying a need for monitoring. Perhaps the two most influential papers that first demonstrated the usefulness of RS to monitor biodiversity in protected areas were Liu et al. (2001), which showed ecological degradation in protected areas designed to protect giant pandas (Ailuropoda melanoleuca), and Asner et al. (2005), which mapped deforestation in the Amazon with Landsat data and showed large rates of deforestation within legally designated protected areas. In 2007, the journal Remote Sensing of Environment published a special issue on monitoring protected areas in which a series of papers provided a framework for establishing monitoring programs, presented techniques and methods to make operational the use of remotely sensed data in protected area monitoring, and showcased a few examples linking remotely sensed data to models used to inform ecological assessments (Gross et al. 2008). RS can aid monitoring of many aspects of biodiversity (Cavender-Bares et al., Chap. 2; Gamon et al., Chap. 16) and ecosystem functioning within protected areas, including forest extent, land use/land cover change, local species pool and turnover, invasions (Bolch et al., Chap. 12), and carbon dynamics. The technical aspects of how RS can address some of these issues are presented in other chapters; here we review a few selected examples.

One of the major uses of RS in monitoring protected areas involves assessing land cover change and dynamics, for example, due to anthropogenic or natural disturbance. Liu et al. (2001) used Landsat data to estimate changes in forest cover and giant panda habitat before and after a reserve was created and showed how the Wolong Nature Reserve was becoming progressively more fragmented and how this resulted in crucial loss of habitat for the giant panda. Koltunov et al. (2009) showed how selective logging in the Amazon region led to different forest dynamics and to land cover change. Asner et al.’s (2005) seminal work was followed by global maps of deforestation and was also produced using the Landsat archive (Hansen et al. 2013), which have been used to track deforestation dynamics and development frontiers (Potapov et al. 2017). These data sets are now freely available for initiatives such as the Global Forest Watch (https://www.globalforestwatch.org/), in which the dynamics of deforestation can be monitored within and outside protected areas. These initiatives are fundamental to downscale global biodiversity goals to local scale action (Geijzendorffer et al. 2018). The quality of the forest classification in the Hansen et al. (2013) product is, however, limiting. In this data set, any pixel with >25% tree cover is considered a forest, and there is no distinction between naturally occurring forests and planted forests (e.g., eucalyptus or oil palm plantations) or forests planted for REDD+ programs. While these readily available products are fundamental to monitoring protected areas, they still rely on careful interpretation of their results on the ground. Asner and Tupayachi (2017) showed the extent of mining in the Amazon, and within this system, road development has long been shown to lead to deforestation and land cover changes within and outside protected areas. For example, Gude et al. (2007) showed how land use change around Yellowstone National Park could increase the risk to biodiversity inside the park. Svancara et al. (2009) showed areas surrounding US national parks are more protected and natural than areas farther away but had higher human population density and subsequently higher conversion risk for the parks’ ecosystems and natural processes.

Such changes in land cover might result in changes in habitat availability and quality both outside and within protected areas. For example, Taubert et al. (2018) showed global patterns of tropical forest fragmentation follow a power-law distribution, which suggests that tropical forest fragmentation is close to a critical point. Santos et al. (2017) showed how the extent of habitat for small mammals in Yosemite National Park has changed in the last 100 years and that these habitat changes might in some cases counteract negative effects of climate change on species persistence. Platforms such as the Global Forest Watch (mentioned above) and the Global Surface Water Explorer from the EU Joint Research Centre show ways by which this integration may be achieved. Novel satellite configurations also show unexpected potential to monitor ecosystems and their responses to disturbance, providing potential new avenues for further integration at conservation-relevant scales.

17.3.5 Challenges

Two main challenges are more immediate in the conservation applications of RS. The first is that it is important to move beyond considering biodiversity as only number of species, and novel approaches looking at the four dimensions of biodiversity (genetic, species, function, and ecosystem structure) are necessary. Jetz et al. (2016) provide a framework for such an approach, and several chapters in this book (see Record et al., Chap. 10, in this book) already show the state of the art of the RS potential in these areas.

The second is that there is a growing need that conservation moves beyond protected areas to protected landscapes, which include livelihoods. Remotely sensed supporting services include habitat, nutrient cycling, terrestrial and aquatic primary production, soil formation, and provision of biological refugia, but there is an acute lack of RS applications in the study of cultural ES with the exception of cultural heritage and recreation (Andrew et al. 2014; de Araujo Barbosa et al. 2015; Cord et al. 2017). Similar to the concept of essential biodiversity variables, indicators have been developed to allow fast and efficient assessment of ES across large spatial scales. One of these indicators is the recently proposed Rapid Ecosystem Function Assessment (REFA, Meyer et al. 2015). REFA builds on a suite of core variables such as aboveground primary productivity, soil fertility, decomposition, and pollination. While many of these could theoretically be assessed using RS technology, the concept was developed for in-situ measured data. On the other hand, Cerreta and Poli (2017) propose a GIS-based framework with scalable and transferable methodology to rapidly assess multiple ecosystem functional features of a landscape using a multi-criteria spatial decision support system. While none of these approaches has been implemented for the use of RS technology at large spatial extents, there is potential for setting up near-real-time systems of fast ES assessment. One issue with extending temporal coverage and amount of ESs covered, despite increasing amounts and quality of remotely sensed data products, is the lack of ground data necessary to validate satellite outputs (Jones and Vaughan 2010). Alternatives to traditional validation approaches based on comparison with ground data are new methods such as the application of process models to test the consistency of time series of more complex satellite data products (Loew et al. 2017).

17.4 Data Availability and Issues

Large amounts of auxiliary data from RS and other sources are now freely available, together with the models and technology necessary to process disparate data in geospatial frameworks. To take advantage of the full range of aspects covered, increase the reliability of different data sets, and account for data uncertainty as much as possible, auxiliary data sources are often used in tandem with RS data. For example, satellite and census land use data can be integrated to scale administrative-level information to the globe (Ellis et al. 2013), and vegetation indices from satellites can be combined with local ecological knowledge to improve assessments of ecosystem degradation (Eddy et al. 2017). We provide an overview of the most frequently and widely integrated remotely sensed and auxiliary data products using approaches discussed at the end of this chapter. We focus on data sources that are useful in assessing plant diversity-related aspects at the landscape scale. In a recent review, Englund et al. (2017) found RS-related publications to use the term landscape rather loosely as describing studies at anything between 24 and 122 million ha. Here, we define landscape as referring to studies going beyond local, plot-level scales and generally not past country-level scales, although many methods and data products will be applicable to continental and global scales, too. Where the RS data products are described in other chapters, we only report on non-remotely sensed products; otherwise, references to both are given. Details of data, spatial, and temporal resolution as well as the formats available are given in Table 17.1.

Table 17.1 Non-exhaustive examples of key, global, open access auxiliary variables; some sources; their maximum spatial and temporal resolution; and the available data format

17.4.1 In-Situ Biodiversity-Related Data

Decades of field work by dedicated researchers , their assistants, and students have resulted in collection of large amounts of biodiversity data, including plant species records on geographical location and abundance, traits of individual species, taxonomy, and phylogenetic data, as well as information on associated parameters such as pollination or dispersal and growth. This plethora of information is often scattered, hidden in scientific publications and a range of online (and offline) databases, herbaria, and agency reports. One means of retrieving relevant data semiautomatically from online sources is web scraping. Tools have been developed that make this a viable option for people who are not experts in languages routinely used for creating web pages and applications. These include several R packages (e.g., rvest, xml2, httr, TR8), Python libraries (e.g., Beautiful Soup), online tools (e.g., Nokogiri), and software assisting with identifying relevant CSS selectors on websites (e.g., Selector Gadget).

Recent efforts to cover this step and make dissemination of data more traceable, convenient, and standardized have resulted in large databases covering all the aspects discussed above. For example, large global databases exist for plant functional traits (e.g., TRY, Kattge et al. 2011), plant community data (species co-occurrences; sPlot, Dengler and sPlot Core Team 2014), plant phylogeny (e.g., TreeBASE, Smith and Brown 2018; Open Tree of Life), species distributions (e.g., Global Biodiversity Information Facility (GBIF), and botanical description and identification tools (e.g., JSTOR’s Global Plants) to name just a few.

Some issues with using such large databases, however, are unavoidable. Regarding plant phylogeny data, one needs to be aware of the lack of molecular data associated with most species of plants resulting in many phylogenetic placements being based on data at the genus or even family level (Smith and Brown 2018). Where genetic data are available at the species level, large uncertainties with regard to the placement of many taxa remain (Smith and Brown 2018), and, increasingly, genetic sequences have not yet been linked to species names (so-called dark taxa). Species distribution data, on the other hand, are known to have an inherently large spatial sampling bias (see, e.g., Fig. 17.2) and generally lack absence data, which can inflate the effect of sampling bias even further (Barbet-Massin et al. 2012; Kramer-Schadt et al. 2013; Beck et al. 2014; Maldonado et al. 2015).

Fig. 17.2
figure 2

Number of species distribution databases reporting the presence of two tree species: Abies alba (right) and Corylus avellana (left) across Europe. Color shows 0.5° raster including geo-references of A. alba or C. avellana presence, ordered from red (only one database reports presence of the species in this pixel) to dark blue (all seven examined databases report presence of the species in this pixel). Note the strong country border-related pattern for C. avellana. The number of pixels as a percentage of the total number of “presence pixel” where all seven databases agree is indicated in the plot (3.65% and 5.32%). Seven species distribution databases covering Europe were analyzed, including the Atlas Florae Europaeae, GBIF (status Nov. 2017), European Vegetation Archive, EUFORGEN (EU Forestry Commission), data from Brus et al. (2012), data collated by colleagues from the University of Leipzig (DE), and the FunDivEUROPE project (Baeten et al. 2013)

On the other hand, in the case of trait data, for example, TRY—a global database of plant functional traits—has been shown to be biased toward more extreme trait values, that is, frequently measured species consistently have higher or lower trait values than species missing in TRY (Sandel et al. 2015). Although plant functional traits are conventionally measured at the peak of the growing season and in full light conditions (top of canopy) (Pérez-Harguindeguy et al. 2013), one of the most commonly measured traits, specific leaf area (SLA) has been shown to have values in TRY that are typical of partial canopy shading (Keenan and Niinemets 2016). Due to the extremely diverse nature of studies contributing data to the TRY database, most entries have on average only three traits measured simultaneously, which makes multivariate analyses at the individual plant level extremely challenging (Schrodt et al. 2015).

These issues are mainly due to studies represented within these databases not necessarily following standardized protocols (e.g., Pérez-Harguindeguy et al. 2013), studies having different foci, data from opportunistic sampling (Maes et al. 2015) being mixed with data from directed approaches, and rare species, from a purely statistical viewpoint, being less likely to be measured. As such, avoiding them at the database level, especially where such a large number of data entries are managed in open access databases (e.g., Version 4 of TRY contained almost seven million trait records), is currently virtually impossible. An additional challenge when using trait data in tandem with RS is the lack of geo-referenced measurements within trait databases. For example, only about 60% of all data points within TRY are geo-referenced with variable levels of precision.

This leaves it to the data user to work around and with the data issues. Means of dealing with some of the aforementioned challenges regarding data availability and quality are (i) gap filling of missing trait data (Swenson 2014; Schrodt et al. 2015) and (ii) spatial extrapolation of plant traits (Butler et al. 2017), accounting for “dark diversity,” i.e., the portion of species absent from species distribution data such as biodiversity maps (Ronk et al. 2015). Care must be taken to avoid circularity (e.g., using phylogeny to gap fill trait data when including an aspect of taxonomy or phylogeny in subsequent analyses). In addition, a possible lack of representativeness throughout analyses should be considering, as well as the fact that different approaches might require different data collection protocols (e.g., statistical versus process models).

17.4.2 In-Situ Abiotic Factors

Much abiotic information, including data from the lithosphere, atmosphere, hydrosphere, and cryosphere, can be assessed remotely (see Record et al., Chap. 10, in this book for a thorough discussion, including access to climatological data). However, many important aspects are only accessible from in-situ sources. These include, for example, soil chemical and physical characteristics, geomorphology, and subsurface hydrology (see Table 17.1). Many are available at static temporal but relatively high spatial resolutions with high associated uncertainties in geolocation, bias due to different sampling efforts depending on the location, etc. For example, Generalized Linear Interactive Modelling (GLIM), a lithology and mineralogy data source, has been shown to be highly biased by country boundaries—an issue that is perpetuated in other products using GLIM, such as the SoilGrids database (Hengl et al. 2017), resulting in error propagation to higher-level agglomerate analyses. Other challenges include breakdown of concepts and assumptions related to up- and downscaling of composite products (e.g., inter-cell redistribution of soil water at fine spatial resolution, which can be ignored at coarser resolutions) and a lack of knowledge about parameters and processes acting at different resolutions (Bierkens et al. 2015).

17.4.3 Socioeconomic Factors and Land Use

Socioeconomic aspects are often ignored in assessments of plant biodiversity at the landscape scale, despite the obvious imprint humans have left on most of the globe. For example, Abelleira Martínez et al. (2016) found that studies linking local plant trait measurements to environmental gradients without accounting for anthropogenic effects on these traits render them of limited use due to the multivariate nature of the processes governing observed patterns. The same applies to studies integrating in-situ plant trait variability with land cover types, e.g., for ES assessments without explicitly taking into account human modification of the landscape through management, engineered novel communities, land use history, and heterogeneous landscapes (Abelleira Martínez et al. 2016). Anthropogenic aspects that can be assessed easily and incorporated into landscape scale analyses include data on population density, socioeconomics, and pollution (see Table 17.1).

Land use is another important anthropogenic aspect but less easily assessed at the landscape scale using RS techniques alone. This is mainly due to the same land cover (which refers to the physical characteristics of a landscape) having the potential of belonging to different land use categories (referring to the human use of this landscape). For example, the land cover class forest could be within the land use category natural primary forest or heavily managed degraded forest. Consequently, large-scale land use mapping depends on auxiliary data such as that coming from open access crowdsourced land cover and land use data to improve ground-truthing and validation (Fritz et al. 2017) and a combination of RS data sources, such as fusion of spaceborne optical data with radar data (Joshi et al. 2016).

At the other end of the spectrum of land uses are the human modified, urbanized, and infrastructure types such as cities and roads. RS has demonstrated a great potential to map impervious surface and more limited success in detecting roads. There has been a growing interest in urban ecology because more than half of the global population now lives in cities, and there is a growing interest to increase healthy urban living that combines well-being and biodiversity (Botzat et al. 2016). The first step to reach this goal is to create urban green belts (Hostetler et al. 2011), which are expected to bring about increasing numbers of native species and increased connectivity (Aronson et al. 2017). However, urban areas are also linked to high richness (Gavier-Pizarro et al. 2010) and spread of invasive species (Hui et al. 2017), and small urban centers are sources of invasive plants into natural areas (McLean et al. 2017).

RS of urban (invasive) plant species is covered in another chapter in this book (see Bolch et al., Chap. 12). Roads are more difficult to retrieve with RS alone, although the fishbone patterns in the Brazilian Amazon are evident in Landsat data (Alves and Skole 1996). Initiatives like OpenStreetMap can provide auxiliary data to improve the accuracy of RS-only estimates. These data are fundamental to assess global roadless areas and fundamental for maintaining biodiversity processes and avoiding deleterious effects of fragmentation.

17.4.4 Land Cover

Among the most traditional applications of RS are those related to the estimation of biophysical variables (e.g., tree density, vegetation health). AVHRR, Landsat, Sentinel-2, and MODIS are the most widely used sensors for this purpose, but integration of optical RS with LiDAR technology significantly improves the estimation and assessment of vegetation structure due to added horizontal and vertical information of vegetation properties (e.g., canopy height) (Lim et al. 2003). Studies combining optical, LiDAR, and radar RS have been applied to study interactions between biotic (i.e., vegetation) and abiotic (i.e., soil, geomorphology) elements at landscape scales and to quantify the carbon cycle and biomass.

Land cover is another commonly analyzed measure (in their recent review, Ma et al. (2017) analyzed 254 experimental cases and 173 scientific papers on the subject) that has been shown to be highly sensitive to the classification method applied, with the optimal approach depending to a large extent on spatial resolution, differences between land cover types, and training set size. Land cover maps are often used to derive landscape structural features such as patch size, isolation, and perimeter-to-area ratio.

Such landscape metrics can be assessed using patch matrix models (PMM), which are most suitable for high-hemeroby (low naturalness and high anthropogenic pressure, e.g., urban) landscapes due to reduced spatiotemporal heterogeneity, while gradient models (GM) are recommended for low-hemeroby landscapes (e.g., undisturbed forest) (Lausch et al. 2015). While PMMs are relatively well established and easy to use, disadvantages include that heterogeneity information might be lost, patches tend to have sharp boundaries, and results are highly sensitive to misclassifications of land cover and use metrics (Lausch et al. 2015). GMs, on the other hand, are more complex to use and require more computing capacity and RS expertise while being less susceptible to loss of heterogeneity information and artificially sharp boundaries (Lausch et al. 2015). Both models use a variety of data as inputs, including hyperspectral and LiDAR RS as well as in-situ data, thereby taking full advantage of opportunities offered by each methodology.

17.5 Methods to Integrate Remotely Sensed Measures of Plant Biodiversity with In-Situ Plant Diversity, Abiotic, and Socioeconomic Data

Studies of plant diversity at the landscape level frequently require a mix of data sources from various sensors as well as in-situ data (Table 17.1, Fig. 17.3). There are thus three main reasons for integrating measures of biodiversity-related variables across different data sources: (i) combining data from different sensors to make use of different vegetation aspects measured (e.g., MODIS vs. Advanced Land Observation Satellite Phased Array type L-band Synthetic Aperture Radar (ALOS PALSAR)); (ii) combining different sensors to simulate higher spatiotemporal and spectral resolutions to save financial resources or account for gaps in available RS data (e.g., Zeng et al. 2017); and (iii) combining in-situ with RS data for upscaling and validation.

Fig. 17.3
figure 3

Example work flow to correlate biomass (k) with human population density (f). Raw data from a variety of sources needs to be integrated, including Sentinel-2 (a), Landsat (b), and LiDAR (c) from RS as well as modeled land cover (d, INEGI 2013), plot level in-situ biomass measurements (e), and raw population density data (f, GPWv4 2016). After general data checking and cleaning (which is advisable for any data source), atmospheric and geometric corrections are performed on the remotely sensed data using software such as ENVI or SNAP, followed by transformation of the bands—in this case, calculation of the NDVI vegetation index (g). Radar data are classified into ground and nonground points using LAStools software, followed by application of a digital terrain and height canopy model to derive a canopy height map (h). Aboveground biomass (AGB, k) is calculated using the NDVI, canopy height, and (to validate the model) ground data and vegetation map (i). (j) Rasterized population density map, (m) pixelwise regression between (k) and (l). Please note no visible difference between (j) and (l) due to resampling, resulting in only small changes in pixel size

The process of integrating, combining, and correlating data from different sensors and data types of different temporal and spatial scales is not straightforward. Challenges are numerous and include sensor calibration, the propagation of uncertainties from individual data sets with inherent and variable uncertainty and impreciseness, outliers and spurious data, bias due to spatial autocorrelation, differences in geospatial data registration and alignment, and different processing frameworks (see Quattrochi et al. 2017 for a thorough discussion).

Figure 17.3 presents an example workflow that depicts the steps required to assess if there is a correlation between biomass and human population density in a coastal area of Mexico. In this case, since biomass cannot be inferred directly from RS sources (as described elsewhere in this book), several preprocessing and processing steps are required. Preprocessing steps include data preparation (data cleaning, atmospheric and geometric corrections), data transformation (e.g., from tabular into rasterized data) and fusion, running of auxiliary methods such as classification methods, and application of digital terrain models and height canopy models. Processing steps incorporate the use of allometric equations using in-situ plot level measurements of plant biomass in that area, which are also used as training and validation data to formulate the final aboveground biomass model fusing RS with in-situ data. Here, we present examples of techniques dealing with some of the abovementioned challenges in aligning different sensors, fusing data from these different sources across space and time, training fusion methods, and validating results.

17.5.1 Fusion

Data fusion is an invaluable tool to assess patterns and processes of biodiversity at large spatial scales and integrate data across different aspects of remotely sensed plant diversity, abiotic, and socioeconomic factors. Data fusion allows integration of data from different sensors and of diverging spatial, spectral, and temporal extent to produce outputs of increased fidelity and usefulness. Fusion is often performed to account for limitation in one data source, e.g., where single data rather than time series data are available in the sensor of interest (Carreiras et al. 2017) or to resample low-resolution data from one spaceborne RS channel using data from another, high-resolution channel on the same sensor.

In the example mentioned above (Fig. 17.3), the authors chose to use optical data from MODIS and radar data from ALOS PALSAR, and for their successful fusion, it is important to consider the different spatial resolutions, temporal data availability, and sensor characteristics (Fig. 17.1). This kind of constellation is frequently used to map a range of land cover and land use characteristics, including change, conversion, and modification where detailed information on both broad land cover classes from optical data and detailed surface roughness and moisture information from radar images are required (Pereira et al. 2013; Dusseux et al. 2014; Stefanski et al. 2014).

Different fusion techniques are applied to spaceborne or airborne sensors. For example, Sankey et al. (2018) described an approach to fuse unmanned aerial vehicle (UAV) LiDAR with hyperspectral data using a decision tree classification technique and found the combined use of these sensors provided more accurate assessments of 3D analyses of plant characteristics and plant species identification at submeter spatial resolutions.

A common application of fusion is an increase of spatial or spectral resolution, either accounting for limitations in the available data or imitating high-cost systems using low-cost alternatives, e.g., in precision agriculture. For example, Zeng et al. (2017) developed a system imitating very high spatial resolution hyperspectral measurements such as those required for the calculation of some vegetation indices (VI) using low-cost UAV-mounted sensors and fusing multispectral imagery with spectrometer data using Bayesian imputation and principal component analysis. Data fusion can also be applied when linking RS data to in-situ data. In their recent review, Lesiv et al. (2016) compared different algorithms fusing RS with crowdsourced data for forest cover mapping, including geographically weighted logistic regression (GWR), naïve Bayes, nearest neighbor, logistic regression, and classification and regression trees (CART), finding GWR to perform slightly better where input data were disparate.

In its simplest form, fusion can be a basic overlay of high- (spectral/spatial/temporal) resolution data over low-resolution data. However, as Lesiv et al. (2016) and others have shown, it is worth comparing different fusion techniques. Several studies have performed such comparisons but mainly with respect to land cover classification and specific to certain sensors and spatiotemporal scales (e.g., Caruana and Niculescu-Mizil 2006, Clinton et al. 2015). Consequently, Liu et al. (2017) recommend routine use of statistical comparisons between different fusion techniques (e.g., a Wilcoxon signed-ranks test for two algorithms or a Friedman test with Iman and Davenport extension if more than five algorithms are compared) to detect the optimal solution for a given application.

17.5.2 Assimilation

In essence, data assimilation is an extension of data fusion, linking noisy RS measurements with the outputs from imperfect numerical models to optimize estimates of measures that are not directly observable from RS [e.g., for detailed, high spatiotemporal drought monitoring (Ahmadalipour et al. 2017) or, in the example given in Fig. 17.3, to derive biomass estimates using a combination of canopy height, digital terrain, and aboveground biomass (AGB) modeling]. Advantages of data assimilation include enhanced quality control, the ability to take into account errors and uncertainties in data and models simultaneously, gap filling in data-poor locations and where insufficient temporal information is available, and improved parameter estimation in models.

However, data assimilation can also result in circular and inconsistent analyses. The end user needs to be aware that many remotely sensed variables [e.g., leaf area index (LAI)] are based on models incorporating ancillary information and are thus not independently retrieved. In our example, land cover might already be used as an information layer to tune the RS LAI retrieval within the data assimilation step. Thus, using LAI as biodiversity variable and adding land cover as an explanatory variable could be problematic (inconsistent if from different sources or circular). This illustrates the importance of carefully considering all input variables at different steps of the data fusion and assimilation process before using a RS product for further analyses.

A range of data assimilation techniques is available, from univariate (scalar) and multivariate (vector) 3D and 4D Kalman filter to ensemble methods that are particularly suitable where large sets of parameters are required and models are complex (for details see, e.g., Bouttier and Courtier 2002; Evensen 2002).

17.5.3 Validation

Like any other source of data, spaceborne remotely sensed products have errors and uncertainties associated with them. Validation is thus an ongoing challenge, although a number of guidelines and recommendations for best practice exist. For example, the NASA Land Product Validation Subgroup has published a framework for product validation and inter-comparison, as well as a “guide to the expression of uncertainty in measurement” (Schaepman-Strub et al. 2014). In the case of remotely sensed LAI, the list of auxiliary parameters with associated uncertainties that should be considered when validating LAI measures is long. It includes input data [land cover, radiometric calibration error, geometry, aerosol optical depth at 550 nm, canopy condition (chlorophyll, dry matter, and moisture content), understory reflectance and geolocation (sensitivity to terrain slope), sensor noise (especially for dark targets such as dense vegetation), clear sky top-of-atmosphere radiance, bidirectional reflectance distribution function (BRDF) modeling uncertainty, canopy and understory modeling uncertainty, and geometric considerations (where products are gridded in map projection systems of varying shape and area, Fernandes et al. 2014). A detailed overview of validation techniques used across different levels of RS data is given in Zeng et al. (2015), and guidelines on terminology, unified satellite validation metrics, and strategies, as well as explicit examples of RS validation techniques, including their mathematical basis, are provided in Loew et al. (2017). Luckily for the end user, many of these validation steps are performed by the respective satellite agencies (e.g., the European Space Agency (Dorigo et al. 2017) and NASA (Justice et al. 2013). However, being aware of the complexity of this endeavor and the importance of considering both the target variable (e.g., LAI) and its associated quality measure (e.g., uncertainty) as provided by the space agencies is of utmost importance to ensure appropriate use of RS products.

Apart from validating spaceborne RS data, validation techniques are also used to evaluate the quality of modeled secondary indices, as well as to assess uncertainty propagation after data fusion [e.g., accounting for uncertainty due to variable data quality of in-situ or crowdsourced data (see, e.g., Comber et al. 2016) or to validate downscaled RS products and airborne RS products]. For instance, crowdsourced data have been used to validate a high-resolution global land cover map (Fritz et al. 2017), and in-situ measurements of LAI collected simultaneously with airborne hyperspectral images were used to validate canopy radiative transfer models in agricultural landscapes (Haboudane et al. 2003). In its simplest form, validation is a pixelwise comparison of presumably high accuracy (often in-situ) data with the remotely sensed or modeled data, using an x-fold validation approach (splitting the in-situ data into training and validation data) if some of the in-situ data are needed for model development or downscaling. For an overview of more complex techniques see, for example, Montesano et al. (2016) for Landsat-derived tree cover, Lesiv et al. (2016) for crowdsourced forest cover, Joshi et al. (2016) for optical- and radar-derived land use, and Sun et al. (2017) for in-situ validated land cover.

With the rapidly growing availability of RS and auxiliary data, validation can become a time-consuming and complex task. Thus, increasingly, web-based validation systems are being developed that integrate big data access and storage, adjustment, and different intercomparison and validation techniques simultaneously (e.g. Sun et al. 2017).

One of the potential issues with the data fusion and assimilation methods described above is that they are often applied globally without testing whether variables and correlations remain stable in space and time (Comber et al. 2012). This is a recognized problem, and solutions have been proposed for over a decade (e.g., geographically distributed correspondence matrices, Foody 2005) with new approaches being continually proposed. Often these are specific for certain applications, such as net primary production (Wang et al. 2005), epidemiology (Khormi and Kumar 2011), biomass (Propastin 2012) or population segregation (Yu and Wu 2013). One recently proposed more generic approach is that of locally geographically weighted correspondence matrices, which combine categorical difference measures (Pontius and Milones 2011, Pontius and Santacruz 2014) with spatially distributed kappa coefficient, user, and producer accuracy estimates—with code to run these tests in R being available, e.g., see packages gwxtab, differ and RSLcode (available on github (https://github.com/lexcomber/RSLcode)) (Comber et al. 2017). All of these draw attention to the fact that, even after performing data cleaning, fusion, assimilation, and validation steps, local approaches, data, and techniques cannot necessarily be directly transferred from one location and spatiotemporal resolution to another.

17.6 Conclusions

We are living in an increasingly data-rich world, in which not only more but also more accurate and reliable data are available on many aspects related to plant biodiversity, both from remotely sensed as well as in-situ measurements. Increasingly, limitations and potential circularities inherent to these data are acknowledged, often aided by the provision of associated estimates of uncertainties and dedicated intercomparison studies. Techniques are being developed that enable even nonexperts to account for and learn from these. Even in the case of data limitations, however, our ability to map all aspects of biodiversity over large spatial and temporal scales has increased exponentially over the last decade, and monitoring and understanding ecosystem functions is easier than ever before. Nevertheless, significant challenges remain, including scaling mismatches, misuse of data and techniques, insufficiently high spatiotemporal resolution of RS data, biases in in-situ data, and many more. It is imperative that we acknowledge and work with these challenges to devise even more accurate and suitable approaches to assessing biodiversity for the study of ecosystem function, conservation, and other applications at large spatial scales.