1 Map Curves

Description

This is a quantitative method proposed by Hargrove et al. (2006) to evaluate the spatial concordance between different categorical raster or vector datasets. It calculates the Goodness of Fit (GOF) (Fig. 1), a standard metric that evaluates the spatial concordance between the patches of two or more rasters or the polygons of two or more vectors. Unlike other methods, it does not evaluate spatial agreement at cell level, and instead focuses on agreement at patch level in rasters or at polygon level in vectors. Consequently, this method is independent of spatial resolution.

Fig. 1  Goodness of Fit (GOF) algorithm, where ∑ refers to all the polygons or patches in Map 2 intersecting each polygon or patch in Map 1; A refers to the area of each polygon or patch in Map 1 that is not intersected by polygons or patches in Map 2; B refers to the area of each polygon or patch in Map 2 that is not intersected by polygons or patches in Map 1; and C refers to the area of intersection between polygons or patches from Maps 1 and 2
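Written out as a formula (our reading of the caption above, consistent with Hargrove et al. 2006), the GOF contributed by a single polygon or patch in Map 1 is:

    GOF = ∑ [ C / (B + C) ] × [ C / (A + C) ]

where the sum runs over all Map 2 polygons or patches intersecting it. The GOF for the whole map is then aggregated over all Map 1 polygons or patches, commonly weighting each one by its area.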

GOF values range from 0 to 1. Maximum GOF (1) is obtained when there is full overlap between two polygons or patches. If there is no overlap, GOF is 0. If overlap affects half the area of the polygons or patches, GOF will be 0.5.

When comparing pairs of maps, the GOF value may vary depending on whether the assessed map is evaluated against the reference map or the reference map is evaluated against the assessed map. Map Curves calculates the GOF values for both these operations. It then uses the highest of these two GOF values in the comparison.

GOF values may be obtained either for the whole dataset or for the set of patches or polygons that make up each category on the map. Although it is technically possible to calculate a GOF for each individual polygon or patch, it is computationally very demanding and is not normally done.

Based on the GOF metrics at the category level, the results of the map comparison may be expressed in a graph, which shows the percentage of the categories in the map that have a specific GOF value. For example, if there are 10 categories and 2 of these have a GOF value of ≥0.8, the graph will show that 20% of the categories have GOF values of ≥0.8.
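To make the arithmetic concrete, the minimal R sketch below derives category-level GOF values and the corresponding Map Curves graph from a cross-tabulation of two aligned categorical maps. It is only an illustration, not the script used in the exercises: the toy matrices m1 and m2, all object names and the area-weighted aggregation of the overall GOF are our own assumptions.

    # Minimal sketch: category-level Map Curves GOF from a cross-tabulation.
    # m1 and m2 are toy integer matrices standing in for two aligned categorical
    # rasters; for real data the same table can be built from the cell values of
    # two co-registered layers.
    set.seed(1)
    m1 <- matrix(sample(1:4, 400, replace = TRUE), 20, 20)
    m2 <- matrix(sample(1:4, 400, replace = TRUE), 20, 20)

    xt    <- table(map1 = as.vector(m1), map2 = as.vector(m2))  # cells per class pair
    area1 <- rowSums(xt)   # total area (cells) of each Map 1 category (A + C)
    area2 <- colSums(xt)   # total area (cells) of each Map 2 category (B + C)

    # GOF per Map 1 category: sum over Map 2 categories of (C/(B+C)) * (C/(A+C))
    gof_cat <- sapply(seq_len(nrow(xt)), function(a) {
      sum((xt[a, ] / area2) * (xt[a, ] / area1[a]))
    })
    names(gof_cat) <- rownames(xt)

    # Overall GOF, aggregated here as an area-weighted mean of the category values
    gof_overall <- sum(gof_cat * area1 / sum(area1))

    # Map Curves graph: percentage of categories reaching each GOF threshold
    thresholds <- seq(0, 1, by = 0.05)
    pct_cat    <- sapply(thresholds, function(t) 100 * mean(gof_cat >= t))
    plot(thresholds, pct_cat, type = "s",
         xlab = "GOF threshold", ylab = "% of categories")

With ten categories, two of which score 0.8 or more, pct_cat would show 20% at the 0.8 threshold, exactly as in the example above.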

Utility

Exercises

1. To validate a map against reference data/map

2. To validate a simulation against a reference map

3. To validate simulated changes against a reference map of changes

4. To validate a series of maps with two or more time points

Map Curves provides a simple metric for assessing the extent to which two datasets share the same spatial structure, i.e. the same number and shape of polygons or patches. Unlike many other metrics, GOF evaluates the spatial agreement between maps at a polygon or patch level. In most cases, this type of analysis is based on raster data and comparisons are made at cell level. However, polygons or patches reflect the real structure of a landscape better than cells. GOF therefore provides a better, more realistic method for validating the similarity between maps than cell-based metrics.

GOF provides a standard and, therefore, comparable metric. The GOF value in one validation exercise may be compared with the GOF value obtained in another. Consequently, when using this metric to assess validity, we can establish a general minimum acceptable GOF threshold above which the map can be considered valid.

Map Curves gives an overview of the pattern agreement for the whole landscape and at category level. However, it does not provide information about the agreement per polygon. This means that a few polygons that do not show good overlap when comparing the maps could be hidden in the general analysis. Thus, as currently implemented, this technique only provides information on spatial agreement at a category level and does not shed light on disagreements occurring at more detailed scales of analysis.

The fact that GOF is unaffected by the spatial resolution used in the analysis should be considered an important strength, as spatial resolution is one of the main sources of uncertainty associated with any validation exercise. Nonetheless, at very coarse spatial resolutions, the area and shape of some polygons and patches can become very distorted, and this could affect the results of the analysis. Therefore, when used with rasters, GOF can be considered independent of spatial resolution below a certain threshold.

We do not recommend validating the spatial structure of a map by comparing it with another map obtained at a different resolution. Changes in spatial resolution or scale will always result in changes in the spatial structure of the maps. The results of the analysis will highlight not only the differences between the original maps in the way they represent LUC in the landscape, but also the differences produced by changes in the spatial resolution.

Although Map Curves can be a useful tool for comparing the spatial patterns of different maps, its results must be treated with caution when validating map patterns. This is because Map Curves only assesses the degree of overlap between the patches or polygons belonging to each category in the two maps compared. If the overlap is low, the GOF score obtained by the Map Curves analysis will also be low. However, this only means that their classes do not overlap well; it does not imply that the two maps being compared have completely different patterns.

Spatial metrics (see Chap. “Spatial Metrics to Validate Land Use Cover Maps”) are more suitable for validating the pattern of the map. Even if there is no spatial overlap, they provide objective information about the fragmentation of the landscape or the complexity of the polygons/patches, which can be used when comparing two maps. Spatial metrics therefore allow us to compare pattern agreement between maps, even if they do not locate land uses in the same positions.

QGIS Exercise

Available tools

• Processing Toolbox

     R

        Pattern evaluation

          Map Curves raster R script

          Map Curves vector R script

There is no default tool in QGIS for carrying out Map Curves analysis. It is, however, implemented in R. We have developed two R tools for QGIS to perform the Map Curves analysis for either raster or vector data. To learn how to configure QGIS to work with R scripts, see Chap. “About This Book”, which also explains how to install the different R scripts required for some of the exercises presented in this book.

The Map Curves raster script is based on code developed by Professor Emiel van Loon from the University of Amsterdam. The script provides full Map Curves results, consisting of: (i) the GOF value of the analysis, with details of the map used as a reference; (ii) the table of GOF values per category; and (iii) the Map Curves graph. The underlying R code also allows raster and vector maps to be compared. However, the vector option is unstable and does not always produce correct results, so its use is not recommended.

The Map Curves vector script, which can only be employed to compare vector maps, is based on the “sabre” R package. Unlike the previous script, it only provides information on the overall GOF between the two maps and the map used as a reference when obtaining it.
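For readers who prefer to call the package directly from R rather than through QGIS, a sketch along the following lines should work; the file and column names are hypothetical, and the argument and element names are given as we recall them from the sabre documentation, so they should be checked against the installed version of the package.

    # Hypothetical direct use of the sabre package on two vector LUC maps
    library(sf)
    library(sabre)

    siose  <- st_read("siose_2011.shp")     # placeholder file names
    corine <- st_read("corine_2011.shp")

    # x_name / y_name indicate the attribute holding the LUC category in each layer
    mc <- mapcurves_calc(x = siose, y = corine, x_name = category, y_name = category)
    mc$gof       # overall GOF (element names assumed; see the package help)
    mc$ref_map   # map used as the reference when obtaining it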

The Map Curves raster script provides more information than the Map Curves vector script. It is also much faster and more efficient. We therefore recommend that this analysis be carried out with raster data.

Exercise 1. To validate a map against reference data/map

Aim

To check the agreement between the SIOSE and CORINE maps, considering SIOSE as a valid reference. We will assess to what extent the spatial structure of the CORINE map (number of polygons, shape) is similar to the SIOSE map.

Materials

SIOSE Land Use Map Asturias Central Area 2011

CORINE Land Use Map Asturias Central Area 2011

Requisites

The two maps must be raster and have the same projection. Although the tool does work with raster maps at different extents and with different thematic resolutions, we recommend comparing rasters with the same or very similar extents and thematic resolutions, so as to avoid results that may not be particularly meaningful.
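Before running the script, it can be worth checking that the two rasters are actually aligned. A quick sketch with the terra package (the file names are placeholders):

    # Pre-check that the two rasters share the same CRS, extent and resolution
    library(terra)

    siose  <- rast("siose_2011.tif")    # placeholder file names
    corine <- rast("corine_2011.tif")

    compareGeom(siose, corine)   # returns TRUE if the grids match, errors otherwise
    # If they differ, one map can be resampled onto the other's grid, e.g.
    # corine_aligned <- project(corine, siose, method = "near")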

Execution

If necessary, install the Processing R provider plugin, and download the MapCurves_raster.rsx R script into the R scripts folder (processing/rscripts). For more details, see Chap. “About This Book” of this book.

Step 1

Open the Map Curves Raster function and fill in the required parameters. These are basically the two LUC maps to be compared: “Land Use map 1” (SIOSE) and “Land Use map 2” (CORINE) (Fig. 2).

Fig. 2  Exercise 1. Step 1. Map Curves Raster R script

Results and Comments

After running the function, we obtain two tables and one graph. All the information, with the exception of the graph, will also be displayed in the “Log” window (Fig. 3).

Fig. 3  Results from Exercise 1 displayed in the Log window of the Map Curves Raster script. General GOF value and GOF table

The GOF value is a measure of the general agreement between the two maps being compared. This value ranges from 0 to 1, with 0 meaning no agreement and 1 total agreement. The GOF value for our comparison (0.54) indicates that the agreement between the two maps is considerable, although not very high: patches belonging to the same categories only partially overlap.

The reference map ($Refmap) value informs us as to which map was used as the reference when obtaining the GOF value. If value “A” is obtained, it means that “Land use map 1” was used as the reference map in the comparison. If value “B” appears, it means that “Land use map 2” was used. Therefore, in our case, a GOF of 0.54 was obtained when comparing SIOSE and CORINE and taking CORINE as the reference. If SIOSE had been taken as the reference, agreement (GOF value) would have been lower.

The GOF table details the GOF value for agreement per category, so providing a measure of how similar the pattern for a particular category is in the two maps. It therefore answers the following question: to what extent do the patches that make up a particular category overlap in the two maps being compared?

In our case, the category that shows the greatest pattern agreement between the two maps is water bodies (Category 11), with a GOF value of 0.968. Agricultural areas (Category 0; GOF 0.783) and vegetation areas (Category 1; GOF 0.800) also show high levels of agreement. By contrast, agreement between the two maps is very low for road and rail networks (Category 6; GOF 0.112).

If we compare the two maps, much of the agreement and disagreement is due to the fact that they follow different Minimum Mapping Unit (MMU) and Minimum Mapping Width (MMW) criteria. Thus, if a patch is larger than the MMU and MMW of both maps, it will be mapped similarly in both cases. However, if a patch is drawn in SIOSE but is too small for the MMU and MMW of CORINE, this will lead to disagreement between the two maps.

This explains the results for Category 6 (road and rail networks). Whereas many patches representing road and rail networks are mapped in SIOSE, most of them are not mapped in CORINE because they are less than 100 m wide and therefore do not comply with its MMW criterion (Fig. 3). As a result, the agreement for this category in terms of overlapping patches is very low. Although in the few patches for this category in which the two maps overlap the agreement is high, in most cases the SIOSE road and rail networks patches do not overlap with patches in CORINE, and the agreement is null. Overall, the agreement for this category in the two maps is very low, with a GOF of just 0.112.

In this exercise, the GOF values for the different categories did not indicate a high degree of similarity between the category patterns on the two maps. On the contrary, they indicated different patterns of fragmentation for each category because of the different MMU and MMW rules applied in each map.

In addition to the overall GOF and the GOF table detailing the GOF agreement per category, the Map Curves function also produces two extra tables: the $BMC_A2B and the $BMC_B2A (Fig. 4).

Fig. 4  Results from Exercise 1 displayed in the Log window of the Map Curves Raster R script. Tables indicating the categories with which each category in the reference map shows the highest agreement

Unlike the other two tables, these tables are only displayed in the “Log” window and are not stored in any folder. For each category, they indicate the category with which it shows the greatest agreement (GOF) on the other map. Whereas the information in the first table ($BMC_A2B) was obtained using map A (Land use map 1) as the reference, the information in the second table ($BMC_B2A) was obtained using map B (Land use map 2) as the reference.
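The content of these tables can be reproduced from the cross-tabulation sketch given in the Description section (reusing xt, area1 and area2 from that sketch): for each reference category, the best match is taken here as the category on the other map contributing the largest term to its GOF. The script may use a slightly different criterion (for example, the largest raw overlap), so this is only an illustration.

    # Per-pair GOF contributions: G[a, b] = (C/(B+C)) * (C/(A+C))
    G <- (xt / matrix(area2, nrow(xt), ncol(xt), byrow = TRUE)) *
         (xt / matrix(area1, nrow(xt), ncol(xt)))

    # $BMC_A2B direction: best-matching Map 2 category for each Map 1 category
    best_match_A2B <- colnames(G)[apply(G, 1, which.max)]
    names(best_match_A2B) <- rownames(G)
    best_match_A2B
    # The $BMC_B2A direction is analogous: rownames(G)[apply(G, 2, which.max)]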

When Land use map 1 (SIOSE) was used as the reference map, the agricultural areas (category 0) in SIOSE showed the best agreement with the agricultural areas (category 0) in CORINE. The GOF value was 0.783, which indicates a very high overlap between the patches of this category on the two maps.

For Land use map 2 (CORINE), the agricultural areas (category 0) showed the best agreement with the agricultural areas (category 0) of SIOSE. The GOF value was the same as that obtained when SIOSE was used as the reference. In this category it therefore makes no difference which map is used as the reference map.

All the categories showed their best agreement with the same category on the other map. In other words, agricultural areas in Map 1 showed their best agreement with agricultural areas in Map 2, and vegetation areas in Map 1 showed their best agreement with vegetation areas in Map 2 etc. This indicates that the two maps are thematically consistent, i.e. the categories are distributed in a similar way in both maps.

Finally, the last result provided by the Map Curves function is the Map Curves graph (Fig. 5), which is stored in .png format in the folder specified when running the tool (R plots). The graph presents the same information provided in the GOF table. It represents the percentage of categories that reach or exceed a specific GOF threshold. Thus, all the categories (100%) always have a GOF score higher than 0. However, only around 40% of the categories in this map have a GOF score of over 0.5 and none of the categories show perfect agreement (0% of the categories have a GOF score of 1) (Fig. 5).

Fig. 5  Result from Exercise 1. Map Curves graph

The graph provides the GOF scores using either Land use map 1 (A) or Land use map 2 (B) as a reference. It is therefore a good summary of the pattern agreement between the two maps.

In summary, in this exercise we have noted that although the GOF value is not very high, CORINE has a very similar pattern to SIOSE. The lower GOF is the result of different pattern fragmentation in the two maps: SIOSE has many small patches that do not appear in CORINE. However, if we look at the maps, the polygons of the same category usually overlap very well and have a similar pattern structure. In addition, thematic agreement, as we noted in the $BMC_A2B and $BMC_B2A tables, seems to be very high.

Exercise 2. To validate a simulation against a reference map

Aim

To assess the similarity between the spatial structure of a simulation and the spatial structure of a map used as a reference.

Materials

Simulation CORINE Asturias Central Area 2011

CORINE Land Use Map Asturias Central Area 2011

Requisites

The two maps must be raster and have the same projection. Although the tool works with raster maps at different extents and with different thematic resolutions, we recommend that raster maps with the same or very similar extents and thematic resolutions be compared, so as to avoid results that may not be fully informative. For a proper validation, the reference map must be for the same year as the simulation.

Execution

If necessary, install the Processing R provider plugin and download the MapCurves_raster.rsx R script into the R scripts folder (processing/rscripts). For more details, see Chap. “About This Book”.

Step 1

Open the Map Curves Raster function and fill in the required parameters: “Land Use map 1” (CORINE simulation) and “Land Use map 2” (CORINE reference map) (Fig. 6).

Fig. 6  Exercise 2. Step 1. Map Curves Raster R script

Results and Comments

After running the tool, a GOF value is obtained for the two maps as a whole and broken down per pair of classes (GOF table). The GOF values are stored in different tables and displayed in the “Log” window ($GOF, $GOFtable). The GOF values per pair of classes are also represented in the Map Curves graph, which is stored in the specified folder (R Plots).

The GOF value for our comparison is very high (0.92). This is logical given that most of the simulated landscape did not change over the simulation period, and permanence is one of the easiest processes to simulate in LUC modelling. As a result, the reference and simulated maps look very similar: most of the pattern remained unchanged over the simulation period and was correctly simulated as such.

The agreement (GOF) per category was always very high. The minimum scores were for port areas (0.669) and mineral extraction sites (0.708). In the modelling exercise, these categories were treated as features (categories that remain invariant during the simulation) and were therefore not simulated. However, a few changes did in fact occur in these categories in the reference map, so the Map Curves analysis produced a relatively poor fit for them when comparing the simulation with the reference map. Given that these categories consist of a very small number of patches, even a small number of changes can reduce the GOF values substantially.

All in all, this analysis is not particularly meaningful. It confirms that the two compared maps have very similar patterns because most of the landscape was correctly simulated as permanence. However, more meaningful results could be obtained by focusing exclusively on the areas that were simulated as change. Hence, for a proper validation of the simulation, the simulated changes must be compared with the changes observed on the reference maps.

Exercise 3. To validate simulated changes against a reference map of changes

Aim

To evaluate how similar the changes we simulated in our modelling exercise are to those observed on the reference map.

Materials

CORINE Land Use Changes Asturias Central Area 2005–2011

Simulated CORINE changes Asturias Central Area 2005–2011

Requisites

The two maps must be raster and have the same projection. Although the tool does work with raster maps at different extents and with different thematic resolutions, we recommend comparing rasters with the same or very similar extents and thematic resolutions, so as to avoid results that may not be very meaningful. For a proper validation, the simulation and the reference map must refer to the same time period. In both cases, the maps must only display the changes that occurred during the study period, showing all other areas as 0 or some other suitable code.
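As a sketch of how such change-only layers can be prepared with terra (the file names, and the choice to code unchanged cells as 0 and changed cells with the class gained, are our own assumptions):

    # Build "changes only" layers: 0 where nothing changed, otherwise the class gained
    library(terra)

    obs2005 <- rast("corine_2005.tif")              # placeholder file names
    obs2011 <- rast("corine_2011.tif")
    sim2011 <- rast("simulated_corine_2011.tif")

    obs_changes <- ifel(obs2005 != obs2011, obs2011, 0)   # observed changes 2005-2011
    sim_changes <- ifel(obs2005 != sim2011, sim2011, 0)   # simulated changes 2005-2011

    writeRaster(obs_changes, "corine_changes_2005_2011.tif", overwrite = TRUE)
    writeRaster(sim_changes, "simulated_changes_2005_2011.tif", overwrite = TRUE)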

Execution

If necessary, install the Processing R provider plugin and download the MapCurves_raster.rsx R script into the R scripts folder (processing/rscripts). For more details, see Chap. “About This Book”.

Step 1

Open the Map Curves Raster function and fill in the required parameters: “Land Use map 1” (Simulated CORINE changes) and “Land Use map 2” (CORINE changes) (Fig. 7).

Fig. 7  Exercise 3. Step 1. Map Curves Raster R script

Results and Comments

After running the function, we get the overall GOF ($GOF) value, the GOF value per category ($GOFtable) and the Map Curves graph (R Plots). In this case, the only results that might be useful for interpreting the validity of the simulated changes are the results per category.

The general GOF value is 0.3, but this is artificially high due to the almost perfect overlap of class 0 (areas with no change), which has a GOF value of 0.993 (Table 1). A high level of agreement between areas of permanence is always expected, as explained in detail in the previous exercise (Exercise 2). In this case, however, we want to assess the agreement between simulated changes and reference map changes for the two classes that were modelled actively: urban fabric and industrial and commercial areas.

Table 1  Result from Exercise 3 showing the class GOF values between observed and simulated land use

The spatial overlap between these two categories in the two maps is very low. The GOF value for urban fabric (Category 3 in the maps) is only 0.05. In the case of industrial and commercial areas (Category 4), it is even lower: 0.039.

This means that the spatial structure of the simulated changes is very different to that of the changes used as a reference for the same period. Thus, even though the Map Curves analysis for the whole simulation (persistence and changes) obtained good results, the simulated changes overlap poorly with the changes mapped in the reference data.

We cannot draw final conclusions about the different patterns of simulated and reference changes. Even if there is no overlap between them, their shape or fragmentation could be similar. For a clearer picture of these aspects, other tools, such as spatial metrics, must be used (see Chap. “Spatial Metrics to Validate Land Use Cover Maps”).

Exercise 4. To validate a series of maps with two or more time points

Aim

To test the consistency of the pattern of land uses in a series of LUC maps made up of two different time points.

Materials

CORINE Land Use Map Asturias Central Area 2005 v.0

CORINE Land Use Map Asturias Central Area 2011

Requisites

The two maps must be raster and have the same projection. It is also recommended that they have similar extents and thematic resolutions.

Execution

If necessary, install the Processing R provider plugin and download the MapCurves_raster.rsx R script into the R scripts folder (processing/rscripts). For more details, see Chap. “About This Book”.

Step 1

Open the Map Curves Raster function and fill in the required parameters: “Land Use map 1” (CORINE 2005) and “Land Use map 2” (CORINE 2011) (Fig. 8).

Fig. 8  Exercise 4. Step 1. Map Curves Raster R script

Results and Comments

The results show the level of overall agreement between the pair of maps compared ($GOF), the agreement per category ($GOFtable), the best matches between categories ($BMC_A2B, $BMC_B2A) and the Map Curves graph (R plots). All results are displayed in the “Log” window and stored in the preselected folders.

The overall agreement between our maps is 0.5, which is not high. This means that there is only partial overlap between the categories in the two maps. In a series of two or more Land Use maps, persistence is the norm and one would expect almost perfect overlap between the maps for most of the landscape. Landscapes must be very dynamic to experience changes affecting more than 10% of the study area. The Asturias Central Area is not a dynamic landscape of this kind. The low GOF score therefore suggests that a lot of the differences between the two maps are due to technical changes or errors.

When agreement was assessed at the category level, the only very high values were for water bodies (Category 11), with a GOF of 0.961 (Fig. 9), and background (Category 12), with a GOF of 1. The background is therefore identical in the two maps, whereas the water bodies have an almost perfect overlap. The small difference between the two maps for the water bodies category (0.039) could be due to spurious or erroneous changes, although real changes in the areas covered by water may also have taken place.

Fig. 9  Result from Exercise 4. GOF matrix

The agricultural areas (0.709), vegetation areas (0.704) and airports (0.778) show a high level of agreement between the two maps. However, there are still important differences between them that cannot be explained solely by the normal land use dynamism of the study area, in which only small changes usually take place.

For all the other categories, agreement is low or very low. Nonetheless, there is no evidence of systematic confusion between one category on the first map and a different category on the second. This is confirmed by the tables showing the best matches between categories (Fig. 10) in which the best match for each category (i.e. the largest overlap or agreement) was always with the same category on the other map.

Fig. 10  Results from Exercise 4. Tables indicating, for each category in the reference map, the category in the compared map with which it shows the highest agreement. On the right, agreements when using map A as the reference. On the left, agreements when using map B as the reference

The low agreement or overlap between the categories in the two maps is also summarized in the Map Curves graph (Fig. 11), which shows that only around 40% of the classes on the maps obtained a GOF score of over 0.5. This means that more than half the categories show poor overlaps, i.e. most of the categories are mapped very differently on each map.

Fig. 11  Result from Exercise 4. Map Curves graph

All in all, we can conclude that the time series we assessed has many errors and uncertainties and is therefore affected by many erroneous or spurious changes. These are changes that did not really happen on the ground and arose due to technical reasons, such as different production methods. In a coherent time series of LUC maps, high GOF scores of 0.9 or over would be expected.

The low agreement in our exercise is due to the change in the methodology used to produce the Spanish CORINE Land Cover maps between 2006 and 2011. The CORINE 2005 map (v.00) used in this exercise was obtained using photointerpretation of satellite imagery. However, from 2011 onwards the CORINE maps were obtained by generalizing more detailed Land Use maps (SIOSE). This change in the production method resulted in LUC maps with important differences from their predecessors. In order to solve this problem, the Copernicus service produced another CORINE map for 2005 in Spain according to the new methodology, which was consistent and comparable with the CORINE 2011 map. This more recent version of the CORINE 2005 map is the one normally used in the different exercises of this book.

2 Change on Pattern Borders

Description

In pairs of maps or time series, this technique is used to identify the changes taking place on the edges of patches. The location of changes (on the edge of an existing patch or in a new, disconnected patch) provides useful information about the nature of change dynamics: the expansion or shrinkage of existing patches or the appearance of new land use patches.

Utility

Exercises

1. To validate a series of maps with two or more time points

By detecting the changes taking place on the edges of the patches, we can assess both the type of landscape dynamics taking place and the data errors resulting from different data sources, classifiers or spectral responses.

QGIS Exercise

Available tools

∙ Raster

     Raster Calculator

     Conversion

     Polygonize

∙ Vector Selection

     Extract by location

∙ Vector Table

     Field Calculator

∙ Vector Analysis

     Basic statistics for fields

For the sake of simplicity, we will only be presenting the tools used in this exercise, although we are aware that there are many other tools that could be used to carry out this analysis.

Exercise 1. To validate a series of maps with two or more time points

Aim

To focus on gains taking place on the edges of patches for a specific land use/cover category. We can then assess the proportion of change taking place on the edges of existing patches compared to the change that appears in new, disconnected areas.

Materials

CORINE Land Cover Map Val d’Ariège 2000

CORINE Land Cover Map Val d’Ariège 2018

Requisites

All maps must be rasters and have the same resolution, extent and projection.

Execution

Step 1

First, we extract forests in 2000 (Fig. 12) and then new forested locations in 2018 (non-forest in 2000 AND forest in 2018) using the Raster Calculator (Fig. 13).

Fig. 12  Exercise 1. Step 1. Raster Calculator

Fig. 13  Exercise 1. Step 1. Raster Calculator
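The same two layers can also be produced outside the QGIS Raster Calculator, for instance with terra; the file names and the code of the forest class below are placeholders.

    # Forest in 2000 and forest gains 2000-2018 as binary (1/NA) layers
    library(terra)

    clc2000 <- rast("clc_val_dariege_2000.tif")   # placeholder file names
    clc2018 <- rast("clc_val_dariege_2018.tif")
    forest_code <- 3                              # hypothetical code of the forest class

    forest2000 <- ifel(clc2000 == forest_code, 1, NA)
    gains      <- ifel(clc2000 != forest_code & clc2018 == forest_code, 1, NA)

Coding the background as NA rather than 0 makes the vectorization in the next step cleaner, because only the areas of interest are turned into polygons.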

Figure 14 shows the result as an overlay of the two maps obtained: forest in 2000 in light green and forest gains between 2000 and 2018 in dark green.

Fig. 14  Exercise 1. Step 1. Intermediate map displaying the overlay of forest areas in 2000 (light green) and forest gains between 2000 and 2018 (dark green)

Step 2

We then vectorize the binary raster maps computed in Step 1 using the Polygonize Raster Conversion function with no specific parameters.
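An equivalent of this step in terra, continuing with the layers created in the previous sketch (the exact arguments of as.polygons() may vary slightly between terra versions):

    # Vectorize the binary layers; NA cells are not polygonized, and disagg()
    # splits multipart geometries so that each disconnected patch is one feature
    gains_patches  <- disagg(as.polygons(gains))
    forest_patches <- as.polygons(forest2000)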

Step 3

We now isolate the forest gains on the edge of the pattern. The aim is to distinguish between new areas of forest in 2018 (i.e. that did not exist in 2000) which are contiguous with forests that existed in 2000 and others that are not. For this purpose, we use the Extract by location Vector Selection tool with the ‘touch’ operator (Fig. 15).

Fig. 15  Exercise 1. Step 3. Extract by Location tool

Figure 16 shows a detail from the resulting layer: the forests that existed in 2000 are shown in light green, while the new forests that appeared in 2018 separately from existing forests are in dark green. The new forests that appeared in connection with forests that already existed in 2000 are overlaid in brown.

Fig. 16  Exercise 1. Step 3. An example area of the resulting layer

Step 4

In this step we will isolate the new forests that are not connected to forests that existed in 2000. This step is optional insofar as new forest patches not connected to forests that existed in 2000 can be obtained simply by subtracting new connected forests from the total area for new forests.

To get an independent layer of new forest in 2018 that is not connected to forests that existed in 2000, we use the same Extract by location tool, opting this time for the ‘disjoint’ operator (Figs. 17 and 18).

Fig. 17  Exercise 1. Step 4. Extract by Location tool

Fig. 18  Exercise 1. Step 4. An example area of the resulting layer showing unconnected features
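Steps 3 and 4 can be mirrored with a spatial predicate, continuing with the objects created above: a gain patch is treated as connected if it touches a year-2000 forest patch and as disconnected otherwise. The GEOS "touches" relation is used here as a stand-in for the QGIS "touch" operator.

    # Flag gain patches that touch forest existing in 2000, then split the layer
    touches_forest <- rowSums(relate(gains_patches, forest_patches, "touches")) > 0

    connected    <- gains_patches[touches_forest, ]    # Step 3: gains on patch edges
    disconnected <- gains_patches[!touches_forest, ]   # Step 4: new, isolated patches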

Step 5

The next step is to calculate the area covered by new connected/unconnected forests. We use the Vector table Field Calculator tool to create a new attribute called area_ha (decimal number), selecting the $area operator, divided by 10,000 to calculate the area in ha (Fig. 19).

Fig. 19  Exercise 1. Step 5. Vector Table Field Calculator

This operation is carried out for both connected and isolated forests. The updated attribute tables are shown in Fig. 20: table for connected new forests on the left, and for unconnected new forests on the right.

Fig. 20  Exercise 1. Step 5. Updated attribute tables

Step 6

Of the various tools available to summarize the characteristics of the assessed patches, we use the Basic statistics for fields vector analysis tool. On the left of Fig. 21 we can see the various parameters that must be filled in, and on the right the log containing the sum of the areas of unconnected new patches of forest.

Fig. 21  Exercise 1. Step 6. Basic Statistics for Fields tool
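Steps 5 and 6 (area calculation and summary statistics) also have a direct scripted equivalent, reusing the connected and disconnected layers from the previous sketch:

    # Area in hectares of each new forest patch, plus simple summary statistics
    area_connected    <- expanse(connected) / 10000      # m2 to ha, as with $area / 10,000
    area_disconnected <- expanse(disconnected) / 10000

    data.frame(group   = c("connected", "disconnected"),
               n       = c(length(area_connected), length(area_disconnected)),
               sum_ha  = c(sum(area_connected), sum(area_disconnected)),
               mean_ha = c(mean(area_connected), mean(area_disconnected)))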

Results and Comments

The results consist of updated attribute tables and statistics, which appear in the log for the Basic Statistics for Fields function. After examining the attribute tables, we found that there were 74 contiguous and 2 isolated polygons representing new forests that did not appear on the map for the year 2000. Table 2 summarizes the basic statistics for both connected and unconnected new forest patches.

Table 2  Results from Exercise 1. Spatial metrics for both connected and unconnected new forest patches

As can be seen in Table 2, almost all new forest patches (97.4%) are connected to forests that existed in 2000. These patches cover 92.94% of the total area of new forest. In addition, to better interpret these results, we have to bear in mind that most of the analysed territory is covered by forest; there are too few isolated patches of new forest to allow us to come to general conclusions; and changes take place more frequently on the edges of existing patches, especially for semi-natural dynamics like reforestation, than in new, separate areas of the landscape.

3 Allocation Error Distance

Description

Allocation error distance refers to the distance from a wrongly allocated pixel or patch to the closest object belonging to the same category on the reference map. It can be measured in different ways:

  (a) The minimum distance from the edge of the wrongly allocated patch to the edge of the closest patch belonging to the same category on the reference map.

  (b) The distance between the centroids of the two patches described in (a).

Allocation error distance can be expressed in terms of (i) individual pixels/patches, (ii) LUC classes (mean distance) or (iii) the mean distance for all the allocation errors. The mean allocation error distance can usefully be complemented by the minimum, maximum and standard deviation values when applied to several patterns (a LUC class or the whole map).

Utility

Exercises

1. To validate a simulation against a reference map (vector)

2. To validate a simulation against a reference map (raster)

Simulation accuracy can be measured in different ways, such as quantity agreement, allocation agreement, landscape structure agreement, etc. (Hagen-Zanker 2006; Paegelow et al. 2014) as described in Part III of this book. Generally, the indices and maps assessing allocation error tend to focus on the amount involved. Here we go further by measuring “how wrong” the simulation errors are. This analysis, which measures the individual (entity) or mean error distance (LUC class), is complementary to the cross-tabulation of maps at varying spatial resolution, often implemented by fuzzy logic.

QGIS Exercise

Available tools

• Raster

     Raster Calculator

• Raster

     Analysis

        Proximity

• Processing Toolbox

     GRASS

        r.distance

        r.grow.distance

• Processing Toolbox

     SAGA

        Distance

GRASS and SAGA toolboxes offer several algorithms for measuring the distance inside a raster grid (r.grow.distance; SAGA distance) or the minimum distance between pixels/patches belonging to two different grid layers (r.distance). Their use inside QGIS may be unstable.

Vector analysis tools require converting raster layers into vector format and then calculating the centroids of the resulting polygons. The Distance to nearest hub (points) tool creates a points layer whose attribute table contains the minimum distance from each point in one layer to the nearest point in the second layer.

Both approaches (raster and vector) are used in the next two exercises because they provide complementary results.

Exercise 1. To validate a simulation against a reference map (vector)

Aim

To calculate the seriousness (degree) of allocation errors for a specific LUC category, expressed as the minimum mean distance between all the pixels wrongly allocated to this category in the simulation and the nearest patch belonging to the same category on the reference map.

Materials

CORINE Land Cover Map Val d’Ariège 2018

Simulation LCM Val d’Ariège 2018

Requisites

Maps can be raster or vector. They must have the same resolution, extent and projection. If using vector maps, readers can skip the first steps detailed in the execution.

Execution

Step 1

We extract the real built-up areas in 2018 (Fig. 22) and the pixels wrongly allocated as built-up with the Raster Calculator (Fig. 23). The latter are areas simulated as built-up that are not built up according to the reference map.

Fig. 22  Exercise 1. Step 1. Raster Calculator

Fig. 23  Exercise 1. Step 1. Raster Calculator
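The same two layers can be derived with terra (the file names and the code of the built-up class are placeholders):

    # Reference built-up areas (2018) and built-up commission errors of the simulation
    library(terra)

    ref2018 <- rast("clc_val_dariege_2018.tif")      # placeholder file names
    sim2018 <- rast("simulated_lcm_2018.tif")
    builtup_code <- 1                                # hypothetical code of the built-up class

    builtup_ref <- ifel(ref2018 == builtup_code, 1, NA)
    builtup_err <- ifel(sim2018 == builtup_code & ref2018 != builtup_code, 1, NA)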

The right map (A) in Fig. 24 is an overlay of real built-up areas (light grey) in 2018 (CORINE Land Cover) and areas wrongly simulated as built-up (black). The left map in Fig. 24 represents the allocation errors that we will now go on to analyse.

Fig. 24  Exercise 1. Step 1. Intermediate map showing the built-up areas correctly allocated in light gray (A) and the wrongly simulated built-up areas in black (B)

Step 2

The two raster layers obtained in Step 1 are now polygonized into vector layers. This is done using the Polygonize function in the Raster—Conversion menu (Fig. 25).

Fig. 25  Exercise 1. Step 2. Polygonize (Raster to Vector)

The map in Fig. 26 shows an overlay of the two vector layers: real built-up polygons in 2018 (reference map) and areas wrongly allocated as built-up (red) by the simulation. Results vary depending on whether or not diagonal connections are allowed.

Fig. 26  Exercise 1. Step 2. Intermediate map showing built-up areas correctly simulated in cyan and wrongly allocated built-up areas in red

Step 3

We then calculate the centroids for each of these vector layers with the Centroids tool (Vector—Geometric tools) (Fig. 27).

Fig. 27  Exercise 1. Step 3. Centroids

Step 4

Once we have obtained the two centroids maps (built-up areas in 2018 and built-up allocation errors), we use the Distance to nearest hub (points) tool available in the Processing Toolbox (QGIS Vector). The source points layer is the point layer containing allocation errors and the destination hubs layer is the layer containing the built-up centroids from the reference map (Fig. 28). We measure the distance in metres and give the output point layer a name.

Fig. 28  Exercise 1. Step 4. Distance to Nearest Hub (Points)

Step 5

To obtain the desired statistics about the allocation error distance for wrongly simulated built-up areas, we use the Basic statistics for fields tool (Processing Toolbox, Vector—analysis) by selecting the field containing the calculated distance to the nearest hub (Fig. 29).

Fig. 29  Exercise 1. Step 5. Basic Statistics for Fields
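Steps 2 to 5 can also be scripted. The sketch below, which reuses the two layers created in the Step 1 sketch, polygonizes them, takes centroids and measures, for each error patch, the distance to the nearest real built-up centroid; it combines terra and sf and is only indicative.

    # Vectorize, take centroids and measure centroid-to-centroid distances
    library(sf)

    err_poly <- st_as_sf(disagg(as.polygons(builtup_err)))   # individual error patches
    ref_poly <- st_as_sf(disagg(as.polygons(builtup_ref)))   # individual built-up patches

    err_cent <- st_centroid(err_poly)
    ref_cent <- st_centroid(ref_poly)

    idx      <- st_nearest_feature(err_cent, ref_cent)        # nearest reference centroid
    hub_dist <- st_distance(err_cent, ref_cent[idx, ], by_element = TRUE)

    summary(as.numeric(hub_dist))   # minimum, median, mean and maximum (metres)
    sd(as.numeric(hub_dist))        # standard deviation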

Results and Comments

The resulting points layer contains one point for each allocation error polygon, located at the same position. The corresponding table contains the minimum distance between each allocation error (centroid) and the nearest existing built-up area (centroid) on the reference map (Fig. 30).

Fig. 30  Result from Exercise 1. Attribute table in which HubDist indicates the minimum distance between each allocation error and the nearest built-up area

A summary of the statistics appears in the log of the Basic statistics for fields function (Fig. 31).

Fig. 31  Result from Exercise 1. Log window from Basic Statistics for Fields tool

As we can see, the mean distance for the 132 allocation errors is about 1,236 m. This is quite close to the median value (1,119 m), although the standard deviation is also quite high (775 m). When interpreting these values, it is important to remember how the distance was calculated: from centroids, which reduce the built-up areas (polygons) to single points. If we had measured the distance from edge to edge, the values would have been lower.

The mean allocation error distance of about 1.2 km should be put into context by comparing it with the spatial extent of the layer, which is about 31 × 62 km. It may also be useful to compare this value with the mean allocation error distances for other LUC categories and the mean value for all the allocation errors.

Exercise 2. To validate a simulation against a reference map (raster)

Aim

To calculate the seriousness (degree) of allocation errors for a specific LUC category expressed as the minimum, individual and mean distance between wrongly allocated areas (simulation map) and the nearest patch belonging to the same LUC category on the reference map.

Materials

CORINE Land Cover Map Val d’Ariège 2018

Simulation LCM Val d’Ariège 2018

Built-up allocation error map (generated during Exercise 1)

Requisites

All maps must be rasters and have the same resolution, extent and projection.

Execution

Step 1

First, we compute a raster distance map from built-up areas using the QGIS raster function Proximity (Fig. 32). If the built-up areas layer is not available, it must be extracted from the CLC_2018 layer using the Raster Calculator (see Step 1 of the previous exercise).

Fig. 32  Exercise 2. Step 1. Proximity (Raster Distance)

In the Proximity tool, the input layer is built-up areas in 2018. We have to specify the target pixel value (built-up = 1) and the fact that we want to calculate the distance in Coordinate Reference System (CRS) units (Fig. 32). The result is shown in Fig. 33. This map illustrates the distance between areas wrongly allocated to built-up (red) in the simulation and real built-up areas on the reference map (mapped in grey).

Fig. 33  Exercise 2. Step 1. Distance map between wrongly simulated and real built-up areas
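The Proximity step has a close equivalent in terra: for a raster whose background is NA, distance() computes, for every NA cell, the distance to the nearest non-NA cell. Reusing the builtup_ref layer defined in the previous exercise:

    # Distance from every cell to the nearest real built-up cell (CRS units, metres)
    dist_builtup <- distance(builtup_ref)
    plot(dist_builtup)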

Step 2

Once you have obtained a distance map and an allocation error map in vector format (obtained in the previous exercise, Step 2), the next step involves extracting statistics from the raster distance map in order to update the table for the polygon (vector) layer of allocation errors using the Zonal statistics tool (Processing toolbox) (Fig. 34).

Fig. 34  Exercise 2. Step 2. Zonal Statistics

Open this function and choose the distance map to built-up areas 2018 (reference map) and the vector layer containing the allocation errors for the built-up category in 2018 (simulation). The table (Fig. 36) for the vector layer will be enhanced by one or more additional columns depending on the number of statistics selected. In this case, the following values were measured: minimum, mean, median, standard deviation and maximum (Fig. 35). Figure 36 shows the updated table.

Fig. 35  Exercise 2. Step 2. Zonal Statistics

Fig. 36  Exercise 2. Step 2. Updated attribute table

Step 3

The third and last step can be done in a spreadsheet. We calculate the average of each per-polygon statistic extracted in Step 2 (mean, median, standard deviation, minimum and maximum) (Table 3).
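Steps 2 and 3 can also be scripted in one go, reusing dist_builtup and the error polygons from the earlier sketches (converted back to a terra vector if needed): per-patch statistics are extracted from the distance raster and then averaged. The choice of statistics mirrors Fig. 35.

    # Zonal statistics of the distance raster for each allocation-error polygon
    err_vect <- vect(err_poly)                       # sf back to a terra SpatVector

    stats_min  <- extract(dist_builtup, err_vect, fun = min,    na.rm = TRUE)
    stats_mean <- extract(dist_builtup, err_vect, fun = mean,   na.rm = TRUE)
    stats_med  <- extract(dist_builtup, err_vect, fun = median, na.rm = TRUE)
    stats_sd   <- extract(dist_builtup, err_vect, fun = sd,     na.rm = TRUE)
    stats_max  <- extract(dist_builtup, err_vect, fun = max,    na.rm = TRUE)

    zonal_tab <- data.frame(min  = stats_min[, 2],  mean = stats_mean[, 2],
                            med  = stats_med[, 2],  sd   = stats_sd[, 2],
                            max  = stats_max[, 2])

    # Step 3: average each per-patch statistic over all allocation errors
    colMeans(zonal_tab, na.rm = TRUE)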

Table 3  Results from Exercise 2. Calculated statistics

Results and Comments

As we can see, the mean minimum distance for built-up commission errors is about 28.5 m. The mean distance is close to 57 m. The mean maximum distance is quite small (106.81 m) and the standard deviation is low (21.98 m). This means that allocation errors affect small patches or are close to the right location.

The values obtained in this exercise differ greatly from those obtained in Exercise 1. In Exercise 1 we calculated the distances between the centroids of polygons, which may result in longer distances than the technique used in Exercise 2, which measures distances from the cells of each wrongly allocated patch to the nearest real built-up area. The two techniques can produce different results depending on the number, extent and shape of the features being analysed.