MODIS land cover uncertainty in regional climate simulations

MODIS land cover datasets are used extensively across the climate modeling community, but inherent uncertainties and associated propagating impacts are rarely discussed. This paper modeled uncertainties embedded within the annual MODIS Land Cover Type (MCD12Q1) products and propagated these uncertainties through the Regional Atmospheric Modeling System (RAMS). First, land cover uncertainties were modeled using pixel-based trajectory analyses from a time series of MCD12Q1 for Urumqi, China. Second, alternative land cover maps were produced based on these categorical uncertainties and passed into RAMS. Finally, simulations from RAMS were analyzed temporally and spatially to reveal impacts. Our study found that MCD12Q1 struggles to discriminate between grasslands and croplands or grasslands and barren in this study area. Such categorical uncertainties have significant impacts on regional climate model outputs. All climate variables examined demonstrated impact across the various regions, with latent heat flux affected most with a magnitude of 4.32 W/m2 in domain average. Impacted areas were spatially connected to locations of greater land cover uncertainty. Both biophysical characteristics and soil moisture settings in regard to land cover types contribute to the variations among simulations. These results indicate that formal land cover uncertainty analysis should be included in MCD12Q1-fed climate modeling as a routine procedure.


Introduction
Many global land cover datasets have been used by research communities exploring land surface processes and land-atmosphere interactions at regional to global scales (Sterling and Ducharne 2008). In the 1990s, global land cover datasets derived from the Advanced Very High Resolution Radiometer (AVHRR) became available (DeFries et al. 1995;Hansen et al. 2000;Loveland et al. 2000). Later, Moderate-Resolution Imaging Spectroradiometer (MODIS), on board the Terra (1999) and Aqua (2002) satellites, with improved spectral, spatial, geometric, and radiometric characteristics, provided new opportunities for remote sensing-based global land cover mapping research. The MODIS Land Cover Type (MCD12Q1) dataset was first produced (Friedl et al. 2002) soon after data from the sensors became available and continue to be updated (Friedl et al. 2010). Unlike other similar moderate resolution datasets that were produced for limited time periods, such as GLC2000 (Bartholome and Belward 2005) from SPOT VEGETATION and GlobCover (Arino et al. 2008) from MEdium Resolution Imaging Spectrometer (MERIS), MCD12Q1 provides yearly products from 2001. They are widely used in studies of atmospheric science, hydrology, ecology, and land change science (Guenther et al. 2006;Gerten et al. 2004;Reichstein et al. 2007;Turner et al. 2007).
In regional climate modeling, land cover is a key component (Cotton and Pielke 2007;Pielke et al. 2007). Various models, including the Regional Atmospheric Modeling System (RAMS) and the Weather Research and Forecasting (WRF) model, were found to be sensitive to input land cover data. Stohlgren et al. (1998) successfully simulated the phenomena of lower summer temperature in Colorado mountains as a result of agriculture and urban land cover change in the plains, using the RAMS model. Ge et al. (2007) reported that input land cover data with accuracies lower than 80% would have strong impacts on RAMS precipitation simulation in East Africa. Sertel et al. (2010) replaced the default Global Land Cover Characteristics dataset in the WRF model with a newly developed land cover map from recent Landsat ETM+ and achieved better temperature simulations for Northwest Turkey. MCD12Q1 products are used widely as input for climate models, but data quality is rarely fully considered. Although the overall classification accuracy of MCD12Q1 products is assessed at 74.8% (MODIS Land Team 2014), it can be much lower for certain classes or in certain regions. Zeng et al. (2015) reported an overall accuracy of MCD12Q1 Version 5.1 at 64.62% for China, varying between 36.11 and 76.52% in different provinces and between 3.74% (shrublands) and 82.92% (water bodies) in different land cover classes. Confidence assessment maps (McIver and Friedl 2001) are provided with the products, but they do little to help in understanding the impact of data quality on climate modeling outputs.
Inaccuracies in land cover products arise in every step of data production (Congalton et al. 2014) and are described in different terms (Messina et al. 2008). In this study, we focus on "uncertainty" as the degree of data fidelity open to question, instead of "error" as discrepancies between data and reference. Land cover errors are typically recognized via disagreement between one land cover dataset and a reference map, e.g. ground control or other land cover datasets (Cohen et al. 2003;Giri et al. 2005;Ran et al. 2010;Fritz et al. 2011). Land cover error propagation is then studied by comparing modeled outputs obtained using different land cover inputs (Yin et al. 2007;Ge et al. 2009;DeVisser and Messina 2009;Gao and Jia 2013). However, land cover comparisons are often problematic, since data were usually collected in different years, different spatial scales, and coded in inconsistent classification schemes. In this paper we explore MCD12Q1 uncertainties routinely introduced by classification processes through the use of the time series of MCD12Q1 products themselves. A similar approach to assess the quality of MCD12Q1 at global scales was introduced by Liang and Gong (2010), which hypothesized that locations with highly unstable land cover classes over time are more likely to be suffering from classification process inaccuracy rather than experiencing actual land cover changes. Thus, we propose to be able to quantify the uncertainty of land cover data from unstable locations and develop models to propagate this uncertainty.
Using a large region around Urumqi as the case study area, we explored the levels of land cover uncertainty embedded within the time series of MCD12Q1 products for this area and explored the impacts of uncertainty propagation through RAMS. We answered two specific research questions. (1) How should uncertainty within the time series of MCD12Q1 products be characterized? (2) What are the propagation impacts of MCD12Q1 uncertainties on regional climate simulations?

Study area and data
Our study site, Urumqi, is an oasis metropolis in semi-arid northwest China (Fig. 1a). The average yearly temperature is 6.9 °C, and average yearly precipitation is 286.3 mm, with summers slightly wetter than winters. The city sits on the northern foot of Tianshan Mountains and southern edge of the Dzungaria Basin. The area for analysis was selected to cover a mountainous region within 42.5°N-45°N, 86°E-88.5°E, roughly 5.6 × 10 4 km 2 (Fig. 1b). Elevation ranges from below sea level at Turpan Depression to over 5000 m on nearby mountain peaks. Vegetation is most abundant along the northern slope of the mountains and declines to desert both north and south.
Urumqi was selected as a target city for its rapid growth and hypothesized sensitivity to climate change. When preparing land cover inputs for regional climate simulations, we discovered the spatially extensive class-flipping problem in the MCD12Q1 product: pixels frequently alternating between pairs of land cover classes over time. Due to the particular political environment, collection of ground reference data by foreign institutes is prohibited, precluding the production of a typical error assessment. Therefore, we could not assess single time period classification errors with regard to ground control. However, as we discovered that class flipping was largely limited to classes with acknowledged significant impacts in climate simulations (e.g., Stohlgren et al. 1998), we sought to characterize these propagation impacts.
MCD12Q1 is a yearly product available from 2001 to the present. The base algorithm is a C4.5 (Quinlan 1993) decision-tree and ensemble classifications are estimated using boosting. Classification inputs 12 sets of 32-day average nadir BRDF-adjusted reflectance (Schaaf et al. 2002) for MODIS band 1-7, enhanced vegetation index (Huete et al. 2002) and land surface temperature (Wan et al. 2002) together with their annual minimum, maximum and mean. The high temporal resolution of these datasets made it possible to utilize phenological and other temporally variable information to facilitate classification. Outputs then go through a 3-year movingwindow stabilization procedure to reduce the amount of spurious inter-annual changes from 30 to 10% (see Friedl et al. 2010 for more details on MCD12Q1 production). However, the producer suggested that this rate is still well above the actual global land cover change rate, leaving a large model output space to data uncertainty. Urban areas were produced separately from other land cover types using MODIS data from 2001 to 2002 (Schneider et al. 2009), and have not been updated.
In this paper, subsets of MCD12Q1 products are used to study uncertainty in the Urumqi region of northwestern China (Fig. 1a). The study area covers a semi-arid mountainous region within 42.5°N-45°N, 86°E-88.5°E, roughly 5.6 × 10 4 km 2 (Fig. 1b). Version 5.1 MCD12Q1 products with a spatial resolution of 500 m were acquired from 2001 to 2012 using the IGBP class layer (e.g. Fig. 1c). Most years show unrealistically high (above 10%) inter-annual change rates as compared to previous years (Fig. 1d), suggesting that something other than actual land cover change is occurring (Zhang et al. 2007).

Land cover uncertainty
Since stability is the dominant characteristic of land cover, uncertainty can be disentangled through a systematic evaluation of changes. A change trajectory from 2001 to 2012 was generated for each pixel. Two variables were computed to characterize trajectories: the number of changes (C, Fig. 2a) and different varieties (V, Fig. 2b) of types that occupied the pixel. Pixels were further divided into groups based on the distributions of these two variables (Fig. 2c). Stable pixels (C34) were those consistently classified throughout the period. Unstable (C > 4) pixels were further divided into two groups depending on V with a cutoff at 3. For the group with both high C and V, it is difficult to retrieve meaningful categorical uncertainty while the other group (high C low V, Fig. 2d) contains pixels flipping among a small set of land cover types. The latter indicates that the automated MCD12Q1 classification model may have difficulty in distinguishing between particular land cover types, especially for this study area or ecotone.
In order to find inter-category uncertainties, we subdivided the 12-year trajectories of those flipping pixels into separate changes from one year to the next. Table 1 shows the most frequent change directions. The first four directions account for over 85% of the total changes, and they perfectly cover two pairs of land cover types. For example, in 2011 over 9% of the entire landscape fell into these four spurious directions compared to 2010. Figure 3a shows that the uncertain areas present some spatial structure across the entire landscape, with Opposite directions tend to be spatially intermingled with each other, which suggest inherent spatial patterns with our discovered uncertainty. However, it could be argued that the spurious (or uncertain) inter-annual changes might result from particular farming practice or natural vegetation dynamics, thus not necessarily from data uncertainty.
To lend support to our findings, a set of Landsat 5 TM data were collected for our study area in August of respective years (Fig. 3b), along with those years of 16-day average Enhanced Vegetation Index (EVI) from MOD13A1 Version 6 (Didan 2015) for 100 random sample locations of each spurious change direction (Fig. 3c). The August TM images were chosen for their low cloud cover and strong vegetation signal. In this semi-arid environment, croplands are heavily dependent on irrigation therefore showing vigorous green color in the pseudo-color TM images compared to nearby natural grasslands. No large-scale shifts between grasslands and croplands were found in the TM images, as opposed to the spurious changes detected from MCD12Q1 (see Fig. 3a). Not surprisingly, the grasslands/ barren uncertainty is much harder to elucidate. The IGBP scheme defines barren as no more than 10% vegetation at any time of the year. It requires a reference dataset of high temporal resolution to distinguish between grasslands and Year 2011 may have slightly higher averaged EVIs during the growing season but this is the case in all samples. Therefore we consider the identified changes more likely to result from data uncertainty rather than actual land cover changes.

RAMS
RAMS v6.0 (Cotton et al. 2003) is a state-of-the-art limited area model that numerically integrates the fully compressible non-hydrostatic equations of motion, and it solves the equations of radiative transfer; water, heat and momentum exchange between the surface and air, turbulent boundary layer transport, convection; and cloud microphysics. Boundary conditions from large-scale datasets allow the model to be nudged towards the large-scale data. Land surface conditions, including vegetation, soil, water and snow, interact with the atmosphere through the LEAF-3 sub-model (Walko et al. 2000), which can and should be customized to represent regional characteristics ( Table 2). The IGBP classification system used in MCD12Q1 covers a broad swath of vegetation types for a large domain, translating to variabilities in these biophysical characteristics.
The main simulation domain has a 32 km spatial resolution and the nested domain has 8 km (Fig. 4). No finer grids were added due to their potentially massive computation demand and the fact that 8 km resolution is sufficient for the scale of this study. The 8 km grid was selected to span at least 6 grid points outside the landscape so that boundary nudging was not included in the domain. The 32 km grid was selected again to keep nudged pixels away from the inner grid and to give enough space for wave information to propagate across without creating large inhomogeneities at the boundaries. Given that prevailing winds are from the west we needed to make sure that the domain center was to the west of Urumqi. From this basic approach, we tested multiple configurations (slightly more north, slightly more west, etc) until we arrived at a grid configuration that validated well against historical data.
The land cover inputs were aggregated into coarser grids to match the nested domain resolution. The LEAF-3 model examines the 500 m land cover lattices falling into each 8 km grid and calculates the fraction of area each land cover type occupies. Water fraction is always stored regardless of value. The other land cover types are sorted by abundance and stored into the 8 km grid for up to four types (e.g. Fig. 5). Therefore, sub-grid information was preserved to some degree.
A set of variables were selected as major outputs from RAMS, including sensible heat flux (SHF), latent heat flux (LHF), vertical motion (VM) at the top of boundary layer, 2-m temperature (TEMP) and 10-m horizontal wind speed (WS) above land surface, and hourly precipitation (HP). Although it is possible to retrieve all variables used in calculating atmospheric conditions, these six are key ones that describe climate and underlying energy and moisture exchange. Land cover classes with less vegetation should produce lower LHF and (depending on albedo) lower SHF. Lower SHF should be connected to lower 2 m air temperatures in smooth terrain. The changes in the variables we examined in detail depend sensitively on which specific biophysical parameters are involved. It is difficult to hypothesize about vertical motion, WS, and convection given that these variables are connected to spatial heterogeneity, thus dependent on the specific situation, which is what we are here to test. Of course, convection is linked to precipitation, so part of the reason for looking at these simulations is to estimate if coarse-scale, cohesive circulations can develop solely from land cover uncertainty. The RAMS model simulations were run and output hourly from March 21 to June 25, 2011, but the first 30-days were discarded for model spin-up. This time period was selected as the most recent non-drought, non-snowcovered growing season onset period available for simulating a non-dormant vegetation signal.

Uncertainty propagation
A range of uncertainty analysis methods are available for propagating input uncertainty through relatively simple applications (Wang and Gertner 2013). However, in complex spatio-temporal system models like RAMS, uncertainty effects on the results cannot be analytically estimated, and may manifest in unknown spatial and temporal segments. Monte Carlo simulation is a widely applicable approach to study uncertainty propagation in complex nonlinear system modeling (Heuvelink 1998). The basic idea is to generate a set of statistically equivalent realizations for the uncertain input variable and then run the model using those realizations, thus producing a set of model outputs. Uncertainty propagation can be studied based on the distributions of those outputs.
However in this case, defining a proper distribution for land cover is challenging, so we made a few assumptions. Let the two land cover types in an arbitrary spurious direction be A and B. (1) For a single pixel that changed in a spurious direction as shown in Table 1, it has equal probabilities of being either A or B. (2) The classification uncertainties between two consecutive years are systematic, which means all pixels that fell into a specific spurious direction were either all A or all B. (3) The spurious directions are independent from each other. We made such assumptions under the circumstance that: (1) real land cover changes are rare when compared to classification errors, (2) given the limited size of the study area, locations changed in the same spurious directions are more likely to have similar ground conditions, and (3) lack of any reference that helps determine error structure within pixels with spurious changes, e.g. false rates and spatial patterns.
Given the above assumptions, a binary control model (see e.g., Li et al. 2014) was developed based on combinations of spurious change directions. A simple on-off rule was used to generate possible land cover realizations within our defined uncertainty space. For example, from maps 2010 to 2011, if the direction grasslands-to-barren is turned off, all pixels that fell in this direction needed to be modified from barren to grasslands in the 2011 map. Since we discovered 4 spurious directions, the on-off perturbation can be set to all these directions, thus yielding, in total, 16 equally possible land cover realizations. All were later passed into RAMS, including the original 2011 land cover (the all-on realization) as reference, and the variations among their outputs were examined.

Variation analysis
The variation among spatial-temporal outputs of RAMS is complex, and the temporal profiles and spatial patterns can be illustrated separately by aggregating data along different dimensions. More specifically, for a temporal profile, domain-averaged daily series for a limited set of variables from the modeled outputs were calculated for each simulation and presented together using box plots. For the same set of variables, pixel-based data ranges among simulations at every hour were calculated and then averaged over the entire analysis period to show spatial patterns. Furthermore, we tested the effects of particular land cover perturbations by comparing each simulation with the reference. Mean and variance over the simulation period were selected to describe variabilities in climate, with t test and f test (von Storch and Zwiers 2001) performed at the pixel-level to examine whether the differences between simulation and reference are statistically significant. Due to the fact that climate variables are highly autocorrelated temporally, equivalent sample sizes (Zwiers and Storch 1995) were calculated and used to adjust calculations for significance tests. In addition, all simulations were combined into a whole population and compared with the reference. All these post-analysis of RAMS outputs were implemented using NCAR Command Language (NCL) 6.2.0.
Unlike domain-averaged results, substantial differences were found in most variables with great spatial heterogeneity (Fig. 7). At roughly 44°N, 87°E, a hot spot was evident in SHF, LHF, TEMP and WS, which corresponds to the areas with high grasslands/croplands uncertainty. The associated northeast down-wind area also manifests impacts in SHF, VM and WS, likely demonstrating a spatial propagation effect. Additionally, the hourly animation (see supplementary file) presents a daily east-west movement within this down-wind area. Another hot spot (~43°N, 87.5°E) emerged with high VM and moderately high TEMP and WS variations. It is collocated in an area with grasslands-barren uncertainty on mountain slopes. As for HP, greater impact was found along the western border where most of the resolved precipitation occurs.  Figure 8 shows the percentage of pixels identified as significantly different from the reference in mean or variance. Different means suggest changes in their distribution baseline while different variances suggest changes in the magnitude of their fluctuations. In general, two variables stand out in the significance tests, LHF and HP. For LHF, similar amounts of pixels were identified to have different means or variances, but HP differs only in variances and more pixels were affected comparing to other variables. SHF, VM and WS emerged, while TEMP was found not significant in any case.

Comparison among simulations
For simulations with only one direction turned off, manipulations on direction 1 and 4, which are grasslands to/from croplands, had the greatest impacts. Direction 2 and 3, barren to/from grasslands, caused smaller changes. When more directions were altered together, the patterns of significant pixels become complicated. Direction 1 and 4 slightly mitigated each other, and did more so on HP than on LHF, while direction 2 and 3 seems added up on each other. Combinations on three or more directions even resulted in significant pixels in VM and/or WS.
Finally, the combined experiment shows that the aggregated results exhibited different distributions for only LHF and HP, variables that are strongly connected to the presence of water. For LHF, pixels with significantly different means or variances were both around 3%; and for HP, there were no significant differences in mean, but over 12% of the pixels showed significantly different variances. Figure 9 further illustrated the area and magnitude of such differences. The statistical significant pixels for LHF in both t test (Fig. 9a) and f test (Fig. 9b) seem connect to locations with high land cover uncertainty. All the significant pixels in Fig. 9b have values over 1, suggesting greater variability comparing to the reference. Variability in precipitation (Fig. 9c) exhibit a more complex pattern.

Discussion
Grasslands/croplands and grasslands/barren were the major categorical uncertainties identified in the study area. These three land cover types were also the most prevalent types in this area. The specific reasons for these uncertainties to occur is buried in the particular classification model and model inputs used to produce the MCD12Q1 products. However more broadly speaking, such uncertainties may be the product of ambiguous class definitions or thresholds for classification that happen to fall near the natural break point in any particular ecoregion. For example, any single class may describe an overly broad feature space, e.g. grasslands in lush or sparse form. Alternatively, different classes may overlap in feature space, e.g. lush grasslands vs. croplands and sparse grasslands vs. barren. These types of problems are hard to wholly avoid, therefore assessing effects on applications might be a better error assessment strategy.
In our experiments, the grasslands/croplands uncertainty is the biggest contributor to the propagated impacts through climate simulation. Although the grasslands/barren uncertainty are more different biophysically and more extensive in space, their propagated effects were considerably lower. This outcome was not what we expected. However, the underlying factor, as it turns out, is how soil moisture is modeled in RAMS with very different parameter settings for natural landscapes vs. irrigated croplands. In this application, irrigated croplands always have the soil set at the saturation level, which obviously affects the evapotranspiration and propagates further to energy and moisture fluxes in atmosphere. LHF is directly related to this matter thus emerging as the most impacted variable overall. Interesting impacts on HP emerged as well. Although most of the precipitation in this semi-arid region resolved along the western border, where the greatest ranges among simulations occur, statistically significant changes were only found in locations related to landcover uncertainty.
There were discrepancies between the areas identified in range maps and significance tests. The range maps suggested that greater spatial extents, even beyond the uncertain locations, were affected to varying degrees. While many of these spatially extensive disturbed regions were not statistically significant for impacts, it does not at all mean that these insignificant disturbances are irrelevant. For systems focusing on aspects other than overall means and variances, such as critical values or certain space-time subsets, areas identified in the range analyses become critical for decision making.

Conclusion
Our study found that MCD12Q1 products for the Urumqi area have substantial categorical uncertainties introducing significant and complex propagation impacts on climate simulation results. The MCD12Q1 classification algorithm struggles to discriminate between two pairs of locally relevant land cover types, grasslands vs. croplands and grasslands vs. barren. In a 66-day RAMS experiment, such categorical uncertainties significantly affected latent heat flux with overall average ranges at 4.32 W/m 2 . Sensible heat flux, vertical motion, temperature, wind speed and hourly precipitation were not significantly affected across the domain average, but were locally significant in diverse regions. Impacted areas were most frequently connected to the locations with land cover uncertainty but these impacts also propagated spatially. It is evident that differences in both biophysical characteristics, as indicated by MCD12Q1 Fig. 8 Percentage of significant pixels in t tests and f tests against reference-run at 0.05 level. The 4-digit simulation ids correspond to the four spurious directions in Table 1,   We believe that uncertainty space is an underestimation of the total uncertainty budget in regional climate models due to a lack of fidelity in the land cover data. We only considered regular variability within MCD12Q1 without accounting for irregular uncertainty and systematic biases. Some places may be classified chaotically or consistently wrong. As mentioned earlier, the urban class was produced once from data over a decade ago. For areas with rapid urban expansion, such as the Urumqi area, the classification bias on urban land escalates over time. Another example in this study area is that wetland/riparian areas are often misclassified as irrigated croplands, as they are spectrally similar and spatially co-located along river valley and lowlands. Despite these well-known production issues, MCD12Q1 is still the most efficient choice for models requiring large spatial extents and a time-series.
Our experiments covering the semi-arid region around Urumqi offer direct reference to areas with similar semiarid environments, particularly other parts of Central Asia and the Western United States. The type of MCD12Q1 categorical uncertainties we observed and the magnitudes of their propagation impacts are most likely similar in these environments. Different environments (e.g. semi-humid) and/or different climate models (e.g. WRF) and model setups (e.g. soil moisture) will experience different propagation impact from MCD12Q1 uncertainty. Our work establishes a framework to characterize MCD12Q1 uncertainties and test for their propagation impacts not only for climate simulations under any environment but also any application model using MCD12Q1 as a primary land use and cover input. Fig. 9 Maps of significant pixels from combined experiment, including a t test and b f test results for LHF and c f test for HP. Insignificant pixels were shown in grey. t test shows the difference in mean comparing to reference in W/m 2 , and f test shows the ratio of standard deviation to that of the reference