
1 Basic Cross-Tabulation

Description

Cross-Tabulation is a primary analysis that crosses two datasets, either raster or vector, to analyse their spatial relation. This analysis combines the datasets in spatial terms. It produces a map or table that shows how the values of one dataset spatially relate to the values in the other, thereby telling us whether the two datasets share the same values at a given location and, if not, with which other values they are associated.
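As a minimal sketch of the idea, the Python code below cross-tabulates two small categorical rasters held in memory as NumPy arrays. The array representation, the function name `cross_tabulate` and the toy values are illustrative assumptions, not part of any QGIS tool. Rows are the categories of the first dataset, columns those of the second, and the diagonal counts the pixels where both share the same value.

```python
import numpy as np

def cross_tabulate(map_a, map_b, n_classes):
    """Count the pixels falling into each (map_a, map_b) pair of categories.

    Both inputs are integer rasters of identical shape holding category codes
    0..n_classes-1. Cell [i, j] of the result is the number of pixels that are
    category i in map_a and category j in map_b.
    """
    a = np.asarray(map_a).ravel()
    b = np.asarray(map_b).ravel()
    if a.shape != b.shape:
        raise ValueError("the two rasters must have the same shape")
    # Encode each (a, b) pair as one integer, count occurrences, reshape to a matrix
    pair_index = a * n_classes + b
    counts = np.bincount(pair_index, minlength=n_classes * n_classes)
    return counts.reshape(n_classes, n_classes)

# Two tiny 3 x 3 rasters with categories 0, 1 and 2 (illustrative values only)
reference = np.array([[0, 0, 1],
                      [1, 1, 2],
                      [2, 2, 2]])
compared = np.array([[0, 1, 1],
                     [1, 1, 2],
                     [2, 0, 2]])

matrix = cross_tabulate(reference, compared, n_classes=3)
print(matrix)
print("pixels in agreement:", int(np.trace(matrix)), "of", int(matrix.sum()))
```

Here 7 of the 9 pixels lie on the diagonal (agreement); the off-diagonal cells show, for example, that one pixel of category 2 in the first raster is category 0 in the second.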

Utility

Exercises

1. To validate a map against reference data/map

2. To validate soft maps produced by the model against a reference map

3. To validate a simulation against a reference map

4. To validate a series of maps with two or more time points

Starting with a map and some reference data, we can use Cross-Tabulation to determine to what extent the map we want to validate agrees with the reference data. In this way we can compare the success of a LUC classification exercise or a LUCC modelling exercise against reference data. We can also assess how uncertain a map is with regard to the data used as a reference. Cross-Tabulation can also be used to study the LUC changes between pairs of maps at two or more different points in time, or to validate a chronological series of maps, as it can detect unusual or abnormal changes, which could be due to technical errors.

The Cross-Tabulation matrix provides users with a lot of information from the maps in a single analysis. However, to take advantage of the full potential of this analysis, they need to understand what all this information means. This is what we explain in this chapter.

The results of Cross-Tabulation can then be used to carry out further analyses and to extract other metrics that make full use of this basic analysis. These methods (e.g. LUCC budget, Quantity & Allocation disagreement, the Figure of Merit, Intensity Analysis) (see Sects. 2, 3, 4 and 6 in Chapter “Pontius Jr. Methods Based on a Cross-Tabulation Matrix to Validate Land Use Cover Maps”) make it easier for users to interpret the results. However, they also require additional analyses and are therefore more time-consuming. Some relevant examples of work based on Cross-Tabulation are:

  • Hagen-Zanker (2009) used a well-known Cross-Tabulation matrix to improve the fuzzy Kappa statistic (see Sect. 3 in Chapter “Metrics Based on a Cross-Tabulation Matrix to Validate Land Use Cover Maps”).

  • Alcamo et al. (2011) used the Cross-Tabulation function with potential maps from a land use change model.

  • Mas et al. (2014) and Paegelow et al. (2014) used Cross-Tabulation in different ways to provide useful information to help them assess land change model robustness.

  • Krüger and Lakes (2015) calculated a disagreement index from the Cross-Tabulation matrix used in LUCC modelling exercises.

  • Pontius (2018) created an Excel spreadsheet that performs a range of automatic analyses from the Cross-Tabulation matrix.

The maps to be compared or assessed may be in either raster or vector format. Raster maps can be either hard or soft, the latter including suitability, transition potential and probability maps.

QGIS Exercises

The methods and techniques presented in Chapter “Pontius Jr. Methods Based on a Cross-Tabulation Matrix to Validate Land Use Cover Maps” (e.g. LUCC Budget, Intensity Analysis, Quantity and Allocation disagreement…) are based on this basic Cross-Tabulation analysis. In this chapter, we therefore only describe the fundamental principles and the usual procedure for performing a Cross-Tabulation between two datasets.

Available tools

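• Processing Toolbox > SAGA > Image analysis > Confusion matrix (two grids)

• Processing Toolbox > SAGA > Image analysis > Confusion matrix (polygons/grid)

• Processing Toolbox > SAGA > Raster analysis > Cross-classification and tabulation

• Processing Toolbox > GRASS > Raster > r.cross

• Semi-Automatic Classification Plugin > Postprocessing tab > Cross classification

• Semi-Automatic Classification Plugin > Postprocessing tab > Accuracy

• Semi-Automatic Classification Plugin > Postprocessing tab > Land cover change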

QGIS includes many tools for cross-tabulating spatial data through its GRASS and SAGA providers. The “Semi-Automatic Classification Plugin” also includes tools to cross-tabulate datasets for different purposes.

Table 1 reviews the Cross-Tabulation tools available in QGIS and details the input and output parameters of each one. Although the r.kappa function also cross-tabulates two raster maps to obtain the Kappa index, we do not analyse it in this chapter. Those interested in this tool should refer to the section on Kappa indices (Sect. 3 in Chapter “Metrics Based on a Cross-Tabulation Matrix to Validate Land Use Cover Maps”).

Table 1  Review of Cross-Tabulation tools available in QGIS

R can also be used to cross-tabulate pairs of maps, using the crosstab function of the “raster” package. As QGIS already provides many tools for carrying out this analysis, we do not cover how to run this R function from QGIS here.

Of all the tools available in QGIS, the one we recommend and use in this book is the “Semi-Automatic Classification Plugin”, which proved to be the most efficient and stable of all those assessed.

Exercise 1. To validate a map against reference data/map

Aim

To validate the CORINE 2011 Land Use map, taking the SIOSE 2011 Land Use map as a reference.

Materials

SIOSE Land Use Map Asturias Central Area 2011

CORINE Land Use Map Asturias Central Area 2011

Requisites

The two maps must have the same extent, spatial resolution, projection and classification legend. If the maps have different classification legends, the user must reclassify the maps in such a way as to unify the two legends.

Execution

Step 1

Open the “Semi-Automatic Classification Plugin” and select the “Postprocessing” tab from the sidebar. Then click on Accuracy and select the required parameters: raster to assess (CORINE map) and reference raster (SIOSE map) (Fig. 1).

Fig. 1

Exercise 1. Step 1. Semi-automatic classification plugin

Results and Comments

Once the function has been executed, QGIS creates an output raster in which each pixel receives a code identifying its combination of values in the two input rasters, so that each possible combination has its own code. The meaning of each code is given in a table in CSV format, stored in the same folder as the raster. This information is also displayed in the “output” window of the “Semi-Automatic Classification Plugin” (Fig. 2).
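The sketch below reproduces the idea of a combination raster and its code table in Python. The numbering `code = reference * n_classes + classified + 1` is an assumption: it is consistent with the codes quoted in this exercise for 13 categories (0–0 is code 1, 0–1 is code 2, 2–2 is code 29), but the plugin's CSV table remains the authoritative lookup. The function name `combination_raster` is hypothetical.

```python
import numpy as np

def combination_raster(reference, classified, n_classes):
    """Give every pixel a code identifying its (reference, classified) pair.

    Hypothetical numbering: code = reference * n_classes + classified + 1,
    so codes run from 1 to n_classes ** 2.
    """
    reference = np.asarray(reference)
    classified = np.asarray(classified)
    codes = reference * n_classes + classified + 1
    # Code table rows: (code, reference, classified, pixel_sum), one per pair present
    values, pixel_sums = np.unique(codes, return_counts=True)
    table = [(int(c), int((c - 1) // n_classes), int((c - 1) % n_classes), int(s))
             for c, s in zip(values, pixel_sums)]
    return codes, table

reference = np.array([[0, 0, 1],
                      [1, 1, 2]])
classified = np.array([[0, 1, 1],
                       [1, 2, 2]])
codes, table = combination_raster(reference, classified, n_classes=13)
for row in table:
    print(row)  # (code, reference, classified, pixel_sum)
```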

Fig. 2

Results from Exercise 1 displayed in the “output” window of the Semi-Automatic Classification Plugin

The first matrix shown in the “output” window (ErrMatrixCode/Reference/Classified/PixelSum) explains the meaning of the codes in the raster. “ErrMatrixCode” is the number that identifies each pixel in the new raster. “Reference” is the code for the category on the reference map (i.e. SIOSE Land Use Map). “Classified” is the code for the category on the compared map (i.e. CORINE Land Use Map), and “PixelSum” is the number of pixels with each combination in the new raster.

ErrMatrixCode 1 identifies the 234,164 pixels (PixelSum) that are category 0 in SIOSE (Reference) and 0 in CORINE (Classified). Codes for combinations in which the reference and classified categories are the same (e.g. 0, 0) indicate agreement, while codes for combinations in which they differ (e.g. 0, 1) indicate disagreement. Code 2 is therefore a disagreement area, because its pixels are classified as 0 in SIOSE and as 1 in CORINE.

If we symbolize the obtained raster in such a way that all the codes that refer to combinations of the same classes (1, 15, 29, 43, 57, 71, 85, 99, 113, 127, 141, 155) are labelled as agreement and all the codes that refer to combinations of different classes are labelled as disagreement, we can obtain a map like the one presented in Fig. 3. Code 169 is not represented on this new map because it refers to pixels that are background (category 12) in both SIOSE and CORINE.
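Assuming the numbering `code = reference * 13 + classified + 1`, which is consistent with the codes quoted above, the agreement codes and the background code can be generated rather than typed by hand and then used to reclassify the combination raster into agreement and disagreement. This is a sketch with hypothetical helper names; in QGIS the equivalent is done through the layer symbology.

```python
import numpy as np

N_CLASSES = 13     # categories 0..12 in this exercise
BACKGROUND = 12    # category 12 is background

def pair_code(reference, classified, n_classes=N_CLASSES):
    # Hypothetical numbering, consistent with the codes quoted in the text
    return reference * n_classes + classified + 1

# Same category in both maps, excluding background-background
agreement_codes = [pair_code(k, k) for k in range(N_CLASSES) if k != BACKGROUND]
background_code = pair_code(BACKGROUND, BACKGROUND)
print(agreement_codes)  # [1, 15, 29, 43, 57, 71, 85, 99, 113, 127, 141, 155]
print(background_code)  # 169

def agreement_map(codes):
    """Reclassify a combination raster: 1 = agreement, 0 = disagreement, -1 = background."""
    codes = np.asarray(codes)
    result = np.where(np.isin(codes, agreement_codes), 1, 0)
    result[codes == background_code] = -1
    return result

print(agreement_map([1, 2, 169, 29]))
```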

Fig. 3

Result from Exercise 1. Map showing areas of agreement and disagreement between CORINE and SIOSE maps

Although the map in Fig. 3 illustrates the general pattern of disagreement areas, it does not provide much information about the particular characteristics of the disagreement between the two datasets. For a better understanding of how similar/different CORINE is from SIOSE, other maps must be drawn up.

With the obtained raster we can, for example, show where the urban fabric of CORINE (2) is confused with other classes in SIOSE, and even specify which SIOSE classes are affected.

To do so, we must first identify the codes (ErrMatrixCode) for the combinations we are looking for, i.e. pixels which are urban fabric in CORINE (Classified is 2) and which belong to any other category in SIOSE (Reference is not 2). These are codes 3, 16, 42, 55, 68, 81, 94, 107, 120, 133, 146 and 159. We can also represent the pixels that both CORINE and SIOSE label as urban fabric (code 29). The resulting map can be seen in Fig. 4.
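Assuming the numbering `code = reference * 13 + classified + 1`, which matches the codes listed above, the list of codes for CORINE urban fabric can be derived rather than looked up one by one, together with a lookup from each code to the SIOSE category it involves (convenient when assigning one colour per SIOSE class). The helper names are hypothetical.

```python
N_CLASSES = 13
URBAN_FABRIC = 2

def pair_code(reference, classified, n_classes=N_CLASSES):
    # Hypothetical numbering: reference * n_classes + classified + 1
    return reference * n_classes + classified + 1

# Urban fabric in CORINE (Classified = 2) but another category in SIOSE (Reference != 2)
confusion_codes = [pair_code(r, URBAN_FABRIC) for r in range(N_CLASSES) if r != URBAN_FABRIC]
# Urban fabric in both maps
urban_agreement_code = pair_code(URBAN_FABRIC, URBAN_FABRIC)

# Lookup from each code to the SIOSE (reference) category involved
siose_class_by_code = {pair_code(r, URBAN_FABRIC): r for r in range(N_CLASSES)}

print(confusion_codes)           # [3, 16, 42, 55, 68, 81, 94, 107, 120, 133, 146, 159]
print(urban_agreement_code)      # 29
print(siose_class_by_code[120])  # 9 (artificial green urban areas)
```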

Fig. 4

Result from Exercise 1. Map showing areas of agreement and disagreement between CORINE and SIOSE maps for the CORINE category “urban fabric”. The map specifies with which SIOSE categories the CORINE urban fabric category is confused

This map shows the city of Oviedo and its immediate surroundings. Most of the city is identified as urban fabric in both SIOSE and CORINE. However, CORINE also labels as urban fabric many small patches that SIOSE identifies as, for example, industrial or commercial areas or artificial green urban areas. This disagreement is to be expected given the different Minimum Mapping Units (MMU) and Minimum Mapping Widths (MMW) of the two databases. The MMU used in CORINE is 25 ha, whereas in SIOSE it is only 0.5–2 ha. As a result, many of the small patches inside the city that SIOSE identifies as other classes are classified as urban fabric in CORINE because of the scale at which this map was made. CORINE therefore offers a much more generalized picture of the landscape, and validating it against SIOSE reveals numerous errors that are due to this level of generalization.

In addition to the map, the accuracy analysis of the “Semi-Automatic Classification Plugin” also generates two error/Cross-Tabulation matrices, one in pixels and the other area-based (expressed as proportions of the total area). The pixel matrix (Fig. 5) shows the number of pixels for each combination. For example, for the combination 0–0 there are 234,164 pixels that have the same value (0) in SIOSE and CORINE; in other words, 234,164 pixels are classified as agricultural areas on both maps.

Fig. 5

Result from Exercise 1 displayed in the “output” window of the Semi-Automatic Classification Plugin: Error matrix in pixels

The area-based error matrix gives the same information in different units: the proportion of the total area of the raster represented by each combination. In the example above, combination 0–0 covers a proportion of 0.2535 of the map, i.e. 25.35% of its area (Fig. 6).

Fig. 6

Result from Exercise 1 displayed in the “output” window of the Semi-Automatic Classification Plugin: Area based error matrix (proportions)

An analysis of the two tables (area-based error matrix and pixel-count error matrix) gives us a detailed picture of how the categories on one map relate to the categories on the other, and highlights the degree of agreement between the reference map and the map we are trying to validate. In both tables, the combinations in which there is agreement lie on the main diagonal of the table. All combinations off this diagonal indicate disagreement (Table 2).

Table 2 Traditional scheme of a Cross-Tabulation matrix, differentiating the cells that indicate agreement between the compared maps from those that indicate disagreement

For urban fabric, of the 28,110 pixels labelled as urban fabric in CORINE (Total column on the right), 19,455 are also labelled as urban fabric in SIOSE. That is, almost 70% of the pixels identified as urban fabric by CORINE are also considered urban fabric in SIOSE. In the remaining 30%, CORINE mostly confuses urban fabric with industrial and commercial areas (category 3, 2244 confused pixels), artificial green urban areas (category 9, 1643 confused pixels) and road and rail networks (category 6, 1216 confused pixels).

These results are due to the greater degree of generalization when mapping CORINE, as explained above. On the basis of these results and taking SIOSE as a reference, we can conclude that CORINE maps urban fabric correctly and can be considered a valid map for our exercises.
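The urban fabric percentages quoted above follow from simple arithmetic on the pixel counts; the short check below uses only the figures given in the text.

```python
# Pixel counts for CORINE urban fabric, as reported in the text
total_urban_corine = 28_110
agreeing = 19_455
main_confusions = {
    "industrial and commercial areas (3)": 2_244,
    "artificial green urban areas (9)": 1_643,
    "road and rail networks (6)": 1_216,
}

agreement_share = agreeing / total_urban_corine
disagreeing = total_urban_corine - agreeing
main_three = sum(main_confusions.values())

print(f"agreement: {agreement_share:.1%}")                        # agreement: 69.2%
print(f"disagreement: {disagreeing} pixels")                      # disagreement: 8655 pixels
print(f"three main confusions: {main_three / disagreeing:.0%}")   # three main confusions: 59%
```

The three confusions named in the text therefore account for roughly 59% of the disagreement for this category.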

Users can also carry out more complex analyses with these matrices using the CSV file generated by the tool, which can be imported into spreadsheet software such as Excel or OpenOffice Calc. We can then calculate the agreement and disagreement percentages for the whole raster or for each of the categories under consideration, as we did manually for urban fabric above.
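Outside a spreadsheet, the same per-category calculations can be sketched in Python. The code assumes, as in the urban fabric example, that rows hold the map being validated and columns the reference map; user's accuracy is then the diagonal divided by the row total, and producer's accuracy the diagonal divided by the column total. The function name and the 3-category matrix are illustrative, not taken from the exercise.

```python
import numpy as np

def per_class_accuracy(matrix):
    """User's and producer's accuracy for each category.

    Assumes rows = map being validated (classified) and columns = reference,
    so row totals correspond to the 'Total' column on the right.
    Categories with no pixels give NaN instead of raising an error.
    """
    m = np.asarray(matrix, dtype=float)
    diagonal = np.diag(m)
    with np.errstate(divide="ignore", invalid="ignore"):
        users = diagonal / m.sum(axis=1)      # agreeing share of each classified row
        producers = diagonal / m.sum(axis=0)  # agreeing share of each reference column
    return users, producers

# Illustrative 3-category matrix (rows: classified, columns: reference)
matrix = np.array([[50, 10,  0],
                   [ 5, 30,  5],
                   [ 0, 10, 40]])
ua, pa = per_class_accuracy(matrix)
print("user's accuracy:", np.round(ua, 3))
print("producer's accuracy:", np.round(pa, 3))
```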

The error matrices also provide useful statistical measures (Fig. 6), such as the standard error (SE), confidence interval (CI), producer’s accuracy (PA), user’s accuracy (UA), overall accuracy (in %) and Kappa (see Sect. 3 in Chapter “Metrics Based on a Cross-Tabulation Matrix to Validate Land Use Cover Maps”).
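For reference, overall accuracy and Cohen's Kappa can be computed from any square cross-tabulation matrix of counts as sketched below. This is the textbook formula; the plugin may compute its statistics differently (for example, from area-based proportions), so small differences from its output are possible. The matrix is an illustrative one, not data from the exercise.

```python
import numpy as np

def overall_accuracy_and_kappa(matrix):
    """Overall accuracy and Cohen's Kappa from a square cross-tabulation matrix of counts."""
    m = np.asarray(matrix, dtype=float)
    total = m.sum()
    observed = np.trace(m) / total                                 # share on the diagonal
    expected = (m.sum(axis=0) * m.sum(axis=1)).sum() / total ** 2  # agreement expected by chance
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa

matrix = np.array([[50, 10,  0],
                   [ 5, 30,  5],
                   [ 0, 10, 40]])
oa, kappa = overall_accuracy_and_kappa(matrix)
print(f"overall accuracy: {oa:.1%}, kappa: {kappa:.3f}")  # overall accuracy: 80.0%, kappa: 0.699
```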

Exercise 2. To validate soft maps produced by the model against a reference map

Aim

To find out whether the urban fabric soft map produced by our model agrees with the urban fabric areas of the reference map for the year of the simulation.

Materials

CORINE Land Use Map Asturias Central Area 2011

Urban fabric suitability map – CORINE model

Requisites

The two maps must have the same extent, spatial resolution and projection. The soft map must be categorical, so a continuous soft map must first be reclassified into categories. The Land Use map must only contain information about the category being assessed. For a proper validation, the reference map must refer to the same date as the simulated landscape.

Execution

Step 1

Only discrete or categorical maps can be cross-tabulated. As the soft map we want to validate is continuous (continuous values from 0.1 to 1), the first step must be to convert it into a categorical map, using the Reclassify by table function (Processing toolbox > Raster analysis > Reclassify by table).

After opening this tool, we select the map we want to reclassify (Urban fabric suitability map) and fill in the “Reclassification table” with the new values that will be replacing the old ones in the raster (Fig. 7). In this case, we are going to reclassify the values on our suitability map (0–1) into four categories, from low to high suitability. The new categories will be 1 (suitability 0–0.25), 2 (0.25–0.50), 3 (0.50–0.75) and 4 (0.75–1).
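The same reclassification can be sketched with NumPy, which may help when checking the class boundaries. We assume that a value equal to a limit goes to the lower class (min < value <= max), which we believe matches the default range-boundary setting of Reclassify by table, and that NoData cells are stored as NaN and written as 0; both are assumptions to adapt to the actual data.

```python
import numpy as np

# Upper limits of classes 1-3; values above 0.75 fall into class 4.
# Assumption: a value equal to a limit goes to the lower class (min < value <= max).
UPPER_LIMITS = [0.25, 0.50, 0.75]
NODATA = 0  # assumption: NaN cells in the input are written as 0

def reclassify_suitability(suitability):
    """Turn a continuous 0-1 suitability raster into categories 1-4."""
    s = np.asarray(suitability, dtype=float)
    classes = np.digitize(s, UPPER_LIMITS, right=True) + 1
    return np.where(np.isnan(s), NODATA, classes)

suitability = np.array([[0.10, 0.25, 0.30],
                        [0.60, 0.80, np.nan]])
print(reclassify_suitability(suitability))
```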

Fig. 7

Exercise 2. Step 1. Reclassify by table

Step 2

Given that our objective is to compare the suitability values for urban fabric in the model with the areas classified as urban fabric on the 2011 map, we must ignore all the other categories on the Land Use Cover map. We must therefore obtain a binary map from the initial CORINE map. In this binary map, 1 will mean the category being evaluated (urban fabric) and 0 all the others.

To obtain this binary map, we repeat the same process as in Step 1. In this case, we reclassify the CORINE map, assigning a value of 1 to urban fabric (code 2 in the original map) and a value of 0 to the other categories (codes 0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 11 and 12) (Fig. 8).
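In array terms, the binary map is just an equality test against the category of interest. A minimal sketch, with an illustrative CORINE fragment and a hypothetical function name:

```python
import numpy as np

URBAN_FABRIC = 2  # code of urban fabric in the original CORINE legend

def binary_category_map(land_use, category=URBAN_FABRIC):
    """1 where the raster holds the assessed category, 0 for every other class."""
    return (np.asarray(land_use) == category).astype(np.uint8)

corine = np.array([[0, 2, 3],
                   [2, 12, 1]])
print(binary_category_map(corine))
```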

Fig. 8

Exercise 2. Step 2. Table required for the Reclassify by Table tool

Step 3

Once we have obtained the two maps, we carry out the Cross-Tabulation using the “Semi-Automatic Classification Plugin”. We click on the “Postprocessing” tab and select the Cross classification option.

We then select the required parameters. In “Select the classification” we choose the reference Land Use Cover map obtained after reclassification (Step 2). In “Select the reference vector or raster” we choose the soft map obtained after reclassification (Step 1) (Fig. 9).

Fig. 9

Exercise 2. Step 3. Semi-Automatic Classification plugin

Results and Comments

Once the function has been executed, QGIS creates a raster and a CSV file with all the results of the Cross-Tabulation. These are also displayed in the “Output” window (Fig. 10).

Fig. 10

Results from Exercise 2 displayed in the “output” window of the Semi-Automatic Classification Plugin

The first table gives the meaning of each code in the new raster. Pixels with value 2 are areas that are urban fabric (Classification is 1) and have a suitability of less than 0.25 (Reference category is 1). This combination occurs in just 2 pixels (PixelSum), which represent an area of 5000 m² on the map (Area [metre2]).

The second table gives an overview of the possible combinations on the two maps and the area, in square meters, covered by each of them. It shows, for example, that the areas that are not urban fabric (Classification is 0.0) and have a suitability below 0.25 (Reference 1) occupy 2,312,499 m².

Of all the possible combinations, most of the pixels that are urban fabric on the reference map coincide with the areas with the highest suitability to become urban fabric (26,137 pixels, 65,342,474 m²). There are relatively few urban fabric pixels with a suitability between 0.5 and 0.75 (1971 pixels, 4,927,498 m²) and an insignificant number with a suitability of less than 0.5.

These results validate our suitability map: the high suitability values on the soft map correspond to urban fabric areas on the reference map, while the low suitability values correspond to areas without urban fabric. This means that when we use this map in our simulation, it will help us to correctly identify the areas that can become urban in the future and those that cannot.

Other more sophisticated tools, such as the ROC curve and the Difference in Potential (see Sects. 2 and 3 in Chapter “Validation of Soft Maps Produced by a Land Use Cover Change Model”), can be used to complement this analysis and offer the user a full overview of the validity of their potential maps.

Exercise 3. To validate a simulation against a reference map

Aim

To validate a simulation for the year 2011, which we obtained through our LUCC modelling exercise with CORINE maps, against a CORINE reference map for the year 2011.

Materials

Simulation CORINE Asturias Central Area 2011

CORINE Land Use Map Asturias Central Area 2011

Requisites

The two maps must have the same extent, spatial resolution, projection and legend. For a proper validation, the reference map must refer to the same date as the simulated landscape.

Execution

Step 1

Open the “Semi-Automatic Classification Plugin”, click on the “Postprocessing” tab and select the section Accuracy. Then, select the required parameters: raster to assess (Simulation) and reference raster (CORINE reference map) (Fig. 11).

Fig. 11

Exercise 3. Step 1. Semi-Automatic Classification plugin

Results and Comments

When we execute the function, QGIS creates an output raster showing the combination of classes between the two input maps. The function generates three tables in the “output” window, which are also stored in CSV format in the same folder as the raster. The first specifies the meaning of each code in the new raster; the other two are error/Cross-Tabulation matrices, one in pixels and the other area-based (in proportions) (Fig. 12). The tables also provide statistical measures such as standard error (SE), confidence interval (CI), producer’s accuracy (PA), user’s accuracy (UA), overall accuracy (%) and Kappa.

Fig. 12

Results from Exercise 3 displayed in the “output” window of the Semi-Automatic Classification Plugin

If we symbolize the raster and focus on the information in the Cross-Tabulation matrix of most interest for assessing our simulation, we can understand the errors we made in our modelling exercise in greater detail.

In our exercise we only actively modelled two categories: urban fabric and industrial and commercial areas. In the raster we can identify the simulated areas that show agreement (or disagreement) with the reference map for each of these two categories. To do this, we first identify the codes for the combinations involving these two categories: urban fabric (2) and industrial and commercial areas (3).

The combination codes for urban fabric are 3, 16, 29, 42, 81 and 120, of which code 29 represents the areas that were simulated as urban fabric (Classified is 2) and also appear as urban fabric on the reference map (Reference is 2). The combination codes for industrial and commercial areas are 4, 17, 30, 43 and 82, of which code 43 represents the pixels that are industrial and commercial areas in both the simulation and the reference map.
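Going the other way, from a code to its pair of categories, is a matter of integer division under the hypothetical numbering `code = reference * 13 + classified + 1`, which is consistent with all the codes listed above. The sketch labels each code as agreement or disagreement for the two actively modelled categories; the function names are illustrative.

```python
N_CLASSES = 13
ACTIVE = {2: "urban fabric", 3: "industrial and commercial areas"}

def decode(code, n_classes=N_CLASSES):
    """Invert the hypothetical numbering code = reference * n + classified + 1."""
    reference, classified = divmod(code - 1, n_classes)
    return reference, classified

def label(code):
    """Agreement/disagreement label for the simulated (Classified) category, if active."""
    reference, classified = decode(code)
    if classified not in ACTIVE:
        return None  # not one of the actively modelled categories
    status = "agreement" if reference == classified else "disagreement"
    return f"{ACTIVE[classified]}: {status}"

for code in (3, 16, 29, 42, 81, 120, 4, 17, 30, 43, 82):
    print(code, decode(code), label(code))
```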

If we symbolize the raster using these codes in terms of agreement (codes 29 and 43) and disagreement (all the other codes mentioned above), we can visualize the pattern of error in our simulation compared to the reference map (Fig. 13).

Fig. 13

Result from Exercise 3. Map showing areas of agreement and disagreement between our simulation and the reference map for the two categories actively simulated: urban fabric, industrial and commercial areas

Most of the simulated areas agree with the reference map. Disagreement can only be observed in a few cases. However, this conclusion may be misleading. Most of the agreement refers to areas that were already urban fabric or industrial and commercial areas, i.e. areas that were correctly simulated as permanence.

Simulating permanence for artificial surfaces is very easy, and a high rate of success is expected in all cases. If we focus on the areas that actually changed during the simulation period according to the reference map and on those that were simulated as change, we find a higher proportion of errors. However, these errors cannot be seen on our map. To focus on them, we should cross-tabulate only the changes in the simulation with respect to the initial map (CORINE 2005) and the changes in the reference map (CORINE 2011) with respect to the initial map (CORINE 2005). With this method, the new map and the Cross-Tabulation table would only assess the areas that changed between the two dates, thereby removing unchanged areas from the analysis.

An analysis of the error/Cross-Tabulation matrices leads to similar conclusions. For urban fabric, out of a total of 28,183 pixels labelled as such on the simulation map (Total column on the right), 27,402 pixels were also classified as urban fabric on the reference map. A total of 621 pixels are confused with agricultural areas, 60 with vegetation areas and 100 with other categories on the reference map. Most of the confusion is therefore with categories where one would expect new urban fabric to develop.

Once again, whereas most of the agreement refers to areas that were already urban fabric in the past and were correctly simulated as persistence, the confusion seems to refer above all to areas that were not correctly simulated: agricultural and vegetation areas where new urban fabric was simulated but which, according to the reference map, did not actually change. We therefore need to repeat the analysis, focusing only on the areas that actually changed, to assess the success of our simulation more effectively.
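The change-only comparison suggested above can be sketched as follows: the simulation and the reference map are each compared with the initial map (CORINE 2005), and the resulting change/no-change masks are cross-tabulated. The function name and toy arrays are hypothetical. A pixel counted in the "both changed" cell may have changed to a different category in each map; separating those cases is what metrics such as the Figure of Merit do.

```python
import numpy as np

def change_agreement(initial, simulated, reference):
    """Cross-tabulate simulated change against observed change (both vs. the initial map).

    Returns a 2 x 2 matrix of pixel counts:
    rows = observed (reference) change no/yes, columns = simulated change no/yes.
    """
    initial, simulated, reference = map(np.asarray, (initial, simulated, reference))
    observed_change = (reference != initial).ravel().astype(int)
    simulated_change = (simulated != initial).ravel().astype(int)
    counts = np.bincount(observed_change * 2 + simulated_change, minlength=4)
    return counts.reshape(2, 2)

initial   = np.array([0, 0, 1, 2, 0, 1])  # e.g. CORINE 2005
reference = np.array([0, 2, 1, 2, 2, 0])  # e.g. CORINE 2011
simulated = np.array([2, 2, 1, 2, 0, 1])  # e.g. the 2011 simulation
print(change_agreement(initial, simulated, reference))
```

In this toy case, two pixels are unchanged in both maps, one change was simulated but not observed, two observed changes were missed, and one pixel changed in both.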

Other tools, such as the Figure of Merit (see Sect. 4 in Chapter “Pontius Jr. Methods Based on a Cross-Tabulation Matrix to Validate Land Use Cover Maps”), can also be useful to help validate the simulation and overcome some of the limitations we have encountered.

Exercise 4. To validate a series of maps with two or more time points

Aim

To study the land use change between two CORINE maps at two different time points: 2005 and 2011.

Materials

CORINE Land Use Map Asturias Central Area 2005

CORINE Land Use Map Asturias Central Area 2011

Requisites

The two maps must be raster and must have the same extent, spatial resolution, projection and classification legend. If the maps have different classification legends, the user must reclassify the maps in such a way as to unify the two legends. The maps must refer to two different points in time.

Execution

Step 1

The first step is to obtain a raster for the whole study area, showing the areas that changed during the study period and those that remained the same.

To get this map, open the “Semi-Automatic Classification Plugin”, click on the “Postprocessing” tab and then select Land cover change. Then, complete the required parameters, selecting the older map as the reference classification (CORINE 2005) and the more recent one as the new classification (CORINE 2011). Mark the “Report unchanged pixels” option.

Step 2

To obtain a map that only shows the areas that changed during the study period, we must repeat the same operation, this time leaving the “Report unchanged pixels” option unmarked (Fig. 14).

Fig. 14

Exercise 4. Step 2. Semi-Automatic Classification plugin

Results and Comments

After executing Steps 1 and 2, QGIS creates two output rasters, one showing changes and permanent areas (Fig. 15) and the other showing just the changes between the two maps (Fig. 16). In each raster, every possible combination of categories (pixel values) is identified by a unique code.

Fig. 15

Result from Exercise 4. Raster displaying the areas that are the same in the two maps compared, that is, the areas of permanence in the time series

Fig. 16

Result from Exercise 4. Raster displaying the areas that are different in the two maps compared, that is, the areas of change in the time series

The function also generates a table for each map, which is displayed in the “output” window and stored in CSV format. This table shows each possible combination and the code with which it is represented in the output rasters (Fig. 17). All combinations are included in the table, even those that no pixel actually undergoes.

Fig. 17

Results from Exercise 4 displayed in the “output” window of the Semi-Automatic Classification Plugin

The ReferenceClass and NewClass columns may appear switched if a different version of the “Semi-Automatic Classification Plugin” is used.

Both the rasters and the table can be used to understand the changes in our study area. The table shows the changes that took place during the study period (Table 3), which include changes from agricultural areas (category 0 in CORINE 2005), vegetation areas (category 1), urban fabric (2), industrial and commercial areas (3), mineral extraction sites (4), road and rail networks (6) and water bodies (11).

Of the various transitions from agricultural areas, the one to urban fabric (from category 0 in 2005 to category 2 in 2011) is the largest, with a total of 751 pixels. Among the transitions from vegetation areas (category 1), the most common was the change to agricultural areas (from category 1 in 2005 to category 0 in 2011), with a total of 588 pixels.

This change in pixels (Table 3) can be translated into a change in area by multiplying the number of pixels by the area of one pixel. The spatial resolution of our raster is 50 m, so the calculation is easy: a square with a 50 m side covers a surface area of 2500 m², which is the area of each pixel. Therefore, the transition from agricultural areas (0) to urban fabric (2), which took place in 751 pixels, affected an area of 1,877,500 m².

Table 3 Result from Exercise 4. Table showing the transitions detected between the two maps compared and their size

Most of the change in our area was between agricultural and vegetation areas and vice versa and from agricultural and vegetation areas to artificial surfaces. However, there were also various other interesting transitions, such as the conversion of water bodies into port areas (from category 11 to category 7), which affected a total of 657 pixels. This was due to the construction of a dock in Gijón in the north of our study area.
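The pixel-to-area conversion described above is a single multiplication; the sketch below applies it to the three transitions quoted in the text, assuming square 50 m pixels.

```python
PIXEL_SIZE_M = 50                    # spatial resolution of the rasters, in metres
PIXEL_AREA_M2 = PIXEL_SIZE_M ** 2    # 2500 m² per pixel

def pixels_to_area(pixel_count, pixel_area=PIXEL_AREA_M2):
    """Area in m² covered by a number of square pixels."""
    return pixel_count * pixel_area

transitions = {
    "agricultural areas (0) -> urban fabric (2)": 751,
    "vegetation areas (1) -> agricultural areas (0)": 588,
    "water bodies (11) -> port areas (7)": 657,
}
for name, pixels in transitions.items():
    print(f"{name}: {pixels_to_area(pixels):,} m²")
```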

By symbolizing the raster of changes (Fig. 18), we can gain a spatial perspective of what changed. To obtain this map, we must group the changes together according to the new land use. Codes 13, 25, 37, 49 and 73 will show the areas that changed to agricultural areas. Codes 1, 26, 50 and 74 will show changes to vegetation areas. Codes 2, 14, 39 and 51 will show changes to urban fabric. Codes 3, 15, 27, 52 and 136 will show changes to industrial and commercial areas. Codes 4 and 16 will show changes to mineral extraction sites. Codes 5 and 17 will show changes to dump sites. Codes 6 and 18 will show changes to road and rail networks. Code 140 will show changes to port areas. Codes 9 and 33 will show changes to artificial green urban areas. Finally, Code 22 will show changes to open spaces with little or no vegetation.
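The grouping above can also be reproduced programmatically. The codes listed are consistent with a numbering in which, for each 2005 class, the 12 other classes are numbered in order, skipping the unchanged combination (e.g. 0 to 1 is code 1, 1 to 0 is code 13, 11 to 7 is code 140). This scheme is inferred from the text and is not documented plugin behaviour, so the ReferenceClass/NewClass columns of the output table should be preferred when available; `decode_change_code` is a hypothetical helper.

```python
from collections import defaultdict

N_CLASSES = 13

def decode_change_code(code, n_classes=N_CLASSES):
    """Return (class_2005, class_2011) for a change-only code.

    Hypothetical scheme inferred from the codes in the text: for each
    reference class, the n_classes - 1 *other* classes are numbered in
    order, skipping the unchanged (diagonal) combination.
    """
    reference, j = divmod(code - 1, n_classes - 1)
    new = j if j < reference else j + 1
    return reference, new

change_codes = [13, 25, 37, 49, 73, 1, 26, 50, 74, 2, 14, 39, 51,
                3, 15, 27, 52, 136, 4, 16, 5, 17, 6, 18, 140, 9, 33, 22]

groups = defaultdict(list)  # 2011 class -> codes of the changes into it
for code in change_codes:
    _, new = decode_change_code(code)
    groups[new].append(code)

for new_class in sorted(groups):
    print(new_class, groups[new_class])
```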

Fig. 18

Result from Exercise 4. Map showing areas of change between the two maps compared, displayed over the map for the earlier year

In the composition of the map in Fig. 18, we also added CORINE 2005 as the base layer, with an opacity of 10%, to help interpret the changes on the map.

The map shows the changes for the example area of Gijón. In the north, we can observe the new dock built in the port area. Apart from the port, most of the growth in industrial land took place in the south of the city. The same is true for urban fabric, with the construction of a new residential development in Roces. As can be seen on the map, this new residential area is cut off from the existing urban fabric of the city. There is a highway running between the two.

The results of this analysis can also be useful for validating a chronological series of maps. Interpreting the changes can help detect unrealistic changes that may be due to errors in the input data. We can also detect changes at the boundaries of the study area, which cannot be fully represented on the maps because the study area has been clipped.

Other tools and techniques, such as LUCC budget or Quantity and Allocation disagreement, can also help characterize real changes in the study area and detect areas where no changes have taken place, despite being marked as change areas on the maps. In this way, these techniques can provide useful, complementary information on this question.

2 Multiple-Resolution Cross-Tabulation

Description

Multiple-Resolution Cross-Tabulation is based on the same technique as basic Cross-Tabulation (see previous section). It crosses two raster datasets at two or more spatial resolutions: the original resolution and at least one coarser one. Users can compare the datasets at as many resolutions as they see fit, but these must always be coarser than the original spatial resolution.

The concept of spatial resolution refers to the level of spatial detail available in the spatial data. It applies to data in raster format, where the spatial resolution is defined by the pixel size. This means that, unlike basic Cross-Tabulation, this analysis can only be performed with raster data.
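A minimal sketch of the procedure: both rasters are coarsened by an integer factor and cross-tabulated again at each resolution. We use a majority (most frequent class) rule to aggregate the pixels, which is only one possible choice; other aggregation rules, such as comparing class proportions within each coarse cell, will give different results. All names and the toy maps are illustrative.

```python
import numpy as np

def majority_aggregate(raster, factor, n_classes):
    """Coarsen a categorical raster by an integer factor, keeping the most frequent class.

    Rows/columns that do not fill a whole block are dropped; ties go to the
    lowest category code, because argmax returns the first maximum.
    """
    r = np.asarray(raster)
    rows = (r.shape[0] // factor) * factor
    cols = (r.shape[1] // factor) * factor
    blocks = r[:rows, :cols].reshape(rows // factor, factor, cols // factor, factor)
    blocks = blocks.transpose(0, 2, 1, 3).reshape(rows // factor, cols // factor, -1)
    counts = np.stack([(blocks == k).sum(axis=-1) for k in range(n_classes)], axis=-1)
    return counts.argmax(axis=-1)

def cross_tabulate(a, b, n_classes):
    pairs = np.asarray(a).ravel() * n_classes + np.asarray(b).ravel()
    return np.bincount(pairs, minlength=n_classes ** 2).reshape(n_classes, n_classes)

def agreement_by_resolution(map_a, map_b, factors, n_classes):
    """Proportion of pixels on the diagonal at each aggregation factor (1 = original)."""
    result = {}
    for f in factors:
        a = map_a if f == 1 else majority_aggregate(map_a, f, n_classes)
        b = map_b if f == 1 else majority_aggregate(map_b, f, n_classes)
        m = cross_tabulate(a, b, n_classes)
        result[f] = float(np.trace(m) / m.sum())
    return result

# A 3 x 3 patch of category 1, shifted one pixel to the right in the second map
map_a = np.zeros((6, 6), dtype=int)
map_a[0:3, 0:3] = 1
map_b = np.zeros((6, 6), dtype=int)
map_b[0:3, 1:4] = 1

print(agreement_by_resolution(map_a, map_b, factors=[1, 3], n_classes=2))
```

The one-pixel shift produces disagreement in 6 of the 36 original pixels, whereas after aggregating by a factor of 3 the two maps agree completely.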

Utility

Exercises

1. To validate a map against reference data/map

2. To validate soft maps produced by the model against a reference map

3. To validate a simulation against a reference map

This technique aims to account for the multiscale uncertainty of a validation exercise, which basic Cross-Tabulation analyses do not consider. It can also be used to evaluate the uncertainty of a LUC classification exercise, a LUC map or a LUCC modelling exercise against reference data.

Maps that show a lot of disagreement at detailed scales may nevertheless agree at coarser scales. This technique can therefore be used to discover the spatial resolution at which a map is least uncertain according to the information provided by a reference map.
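The idea that disagreement at a detailed scale can disappear at a coarser one can be illustrated with a toy example. The sketch below is a hypothetical illustration, not the chapter's procedure: it assumes two small categorical grids stored as lists of lists, coarsens them by majority rule over square blocks (ties broken by the smallest code, an arbitrary choice), and compares the proportion of agreeing cells before and after coarsening.

```python
from collections import Counter


def majority_aggregate(grid, factor):
    """Coarsen a categorical grid by majority rule over factor x factor blocks.

    Assumes both grid dimensions are multiples of `factor`; ties are broken
    by taking the smallest category code (an arbitrary, illustrative choice).
    """
    coarse = []
    for r in range(0, len(grid), factor):
        row = []
        for c in range(0, len(grid[0]), factor):
            counts = Counter(grid[i][j]
                             for i in range(r, r + factor)
                             for j in range(c, c + factor))
            top = max(counts.values())
            row.append(min(code for code, n in counts.items() if n == top))
        coarse.append(row)
    return coarse


def agreement(a, b):
    """Proportion of cells that hold the same category in both grids."""
    pairs = [(x, y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)]
    return sum(x == y for x, y in pairs) / len(pairs)


# Hypothetical 6 x 6 maps: the boundary between categories 1 and 0 is drawn
# one pixel further west on the assessed map than on the reference map.
reference = [[1, 1, 1, 0, 0, 0] for _ in range(6)]
assessed = [[1, 1, 0, 0, 0, 0] for _ in range(6)]

fine = agreement(assessed, reference)
coarse = agreement(majority_aggregate(assessed, 3), majority_aggregate(reference, 3))
print(f"original resolution: {fine:.2f}, 3x coarser: {coarse:.2f}")
```

Here the one-pixel shift produces 30 agreeing cells out of 36 at the original resolution, while the two coarsened grids agree completely. Real maps rarely behave this neatly, which is precisely what the Multiple-Resolution exercises below test.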

This analysis can be used as a complement to fuzzy logic tools (Fritz and See 2005), which evaluate the agreement between maps by considering spatial near-hits. A near-hit occurs when two pixels that share the same value are not in the same spatial position, but close to each other.
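A near-hit rule can be sketched in a few lines. This is a deliberately simple, crisp version written for illustration, not the fuzzy method of Fritz and See (2005): it assumes two grids on the same raster grid and counts a reference pixel as agreeing if its category appears anywhere in a square neighbourhood of the assessed map (the `radius` parameter is an assumption of this sketch). With `radius=0` it reduces to ordinary cell-by-cell agreement.

```python
def near_hit_agreement(assessed, reference, radius=1):
    """Share of reference pixels whose category appears in the assessed map
    within a (2 * radius + 1) x (2 * radius + 1) window around the same position."""
    rows, cols = len(reference), len(reference[0])
    hits = 0
    for i in range(rows):
        for j in range(cols):
            window = [assessed[r][c]
                      for r in range(max(0, i - radius), min(rows, i + radius + 1))
                      for c in range(max(0, j - radius), min(cols, j + radius + 1))]
            if reference[i][j] in window:
                hits += 1
    return hits / (rows * cols)


# Hypothetical maps: the assessed boundary is shifted one pixel to the west.
reference = [[1, 1, 1, 0, 0, 0] for _ in range(6)]
assessed = [[1, 1, 0, 0, 0, 0] for _ in range(6)]

print(near_hit_agreement(assessed, reference, radius=0))  # exact agreement
print(near_hit_agreement(assessed, reference, radius=1))  # near-hits also count
```

Both approaches reward maps that are "almost right": the near-hit rule does so by tolerating small positional shifts, whereas Multiple-Resolution Cross-Tabulation does so by coarsening the pixels.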

Multiple-Resolution Cross-Tabulation can only be carried out with raster data. However, we can make the comparison with either hard-classified raster maps or soft-classified ones, such as suitability, transition potential or probability maps. In the latter case, we must always reclassify the soft-classified raster maps into a set of categories, because it is not possible to cross-tabulate rasters with a continuous range of values.

As in the case of basic Cross-Tabulation, if we want to explore the full potential of the results of these analyses, we can use other complementary metrics such as the Land Use Cover Change budget (LUCC budget), Quantity and Allocation disagreement or the Figure of Merit (see Sects. 2, 3 and 4 in Chapter “Pontius Jr. Methods Based on a Cross-Tabulation Matrix to Validate Land Use Cover Maps”).

In addition to the basic Multiple-Resolution Cross-Tabulation presented in this section, some more sophisticated variants have been proposed by other authors. These include:

  • Costanza (1989), who proposed a method to determine the goodness of fit between model output and spatial and/or time series data, based on the idea that measurements at a single resolution are not sufficient to describe more complex patterns. In his method, an expanding window is used to gradually degrade the resolution of the data, distinguishing within the lack of fit between “registration”, “resolution” and residual components.

  • Kok et al. (2001), who proposed a multiscale land use change modelling procedure, applied at five spatial resolutions, and demonstrated that results improve strongly as spatial resolution decreases.

  • Pontius and Cheuk (2006), who proposed a method for computing a Cross-Tabulation matrix at multiple scales, focusing on soft-classified pixels. Their Multiple-Resolution method overcomes difficulties of traditional Cross-Tabulation approaches and fuzzy methods by proposing a Composite operator.

QGIS Exercises

Available tools

• Processing Toolbox > GRASS > Raster > r.resample

• Processing Toolbox > GDAL > Raster projections > Warp (reproject)

• Processing Toolbox > SAGA > Raster tools > Resampling

• Layer > Save As…

QGIS does not include a tool to cross-tabulate maps at multiple resolutions. To carry out this analysis, it is therefore necessary to combine raster resampling tools with the basic Cross-Tabulation tools. For detailed information on the tools available in QGIS for performing Cross-Tabulation, please refer to Sect. 1.

Several tools can be used to resample raster maps in QGIS. The GRASS module provides a tool (r.resample) for resampling a raster according to the Nearest Neighbour method. The GDAL module provides a tool to reproject rasters (Warp (reproject)) that also enables resampling through different methods, including Nearest Neighbour. The SAGA toolbox provides a tool for resampling rasters with similar options. In addition, the QGIS interface allows the user to resample a map by saving a copy of a displayed layer via the option “Save raster layer as…” (Layer > Save As…).

For categorical maps such as Land Use Cover maps, two resampling strategies are usually applied: Nearest Neighbour and Majority Rule. We decided to apply Nearest Neighbour because this is the method that best preserves the landscape composition and configuration, in other words, the proportions of the different categories and their spatial patterns.
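The difference between the two strategies can be seen on a toy grid. The sketch below is a hypothetical illustration that assumes an integer coarsening factor (3, so each block has an exact centre pixel): Nearest Neighbour keeps the pixel at the centre of each block, Majority Rule keeps the most frequent category (ties broken by the smallest code, an arbitrary choice), and `proportions` reports the share of each category before and after coarsening.

```python
from collections import Counter


def majority_rule(grid, factor):
    """Most frequent category in each factor x factor block (ties: smallest code)."""
    coarse = []
    for r in range(0, len(grid), factor):
        row = []
        for c in range(0, len(grid[0]), factor):
            counts = Counter(grid[i][j]
                             for i in range(r, r + factor)
                             for j in range(c, c + factor))
            top = max(counts.values())
            row.append(min(code for code, n in counts.items() if n == top))
        coarse.append(row)
    return coarse


def nearest_neighbour(grid, factor):
    """Pixel at the centre of each block (the exact centre when factor is odd)."""
    half = factor // 2
    return [[grid[r + half][c + half] for c in range(0, len(grid[0]), factor)]
            for r in range(0, len(grid), factor)]


def proportions(grid):
    """Share of the grid occupied by each category."""
    counts = Counter(v for row in grid for v in row)
    total = sum(counts.values())
    return {code: counts[code] / total for code in sorted(counts)}


# Hypothetical map: category 2 is a minority that never dominates a 3 x 3 block.
grid = [
    [0, 0, 0, 1, 1, 1],
    [0, 2, 0, 1, 1, 1],
    [0, 0, 2, 1, 1, 2],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 2, 0],
]
print("original:         ", proportions(grid))
print("majority rule:    ", proportions(majority_rule(grid, 3)))
print("nearest neighbour:", proportions(nearest_neighbour(grid, 3)))
```

Majority Rule erases category 2 entirely because it never wins a block, whereas Nearest Neighbour keeps a sample of it (here over-represented, because it happens to sit on one block centre). A grid this small cannot show which method is better overall; it only shows why Majority Rule tends to remove minority categories.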

The four resampling tools available in QGIS are all equally valid. In this case we decided to use the tool that becomes available when making a copy of an existing raster (Save as…) because of its simplicity and efficiency. Nevertheless, users must be aware that the resampled rasters will vary slightly depending on the method chosen, and are therefore not fully comparable. Once a method or tool has been selected, all the resampling procedures must be performed using this same method or tool.

Exercise 1. To validate a map against reference data/map

Aim

To validate the CORINE 2011 Land Use map, taking the SIOSE 2011 Land Use map as the reference and determining the resolution at which the maps show most agreement.

Materials

SIOSE Land Use Vector Map Asturias Central Area 2011

CORINE Land Use Vector Map Asturias Central Area 2011

Requisites

The two maps must have the same extent, projection and classification legend. If the maps have different classification legends, the user must reclassify the maps in such a way as to unify the two legends.

Execution

Step 1

Given that to carry out Cross-Tabulation at multiple resolutions we need to have maps in raster format, the first thing we have to do is rasterize our vector maps. If you would like to perform this analysis by resampling original raster maps, please refer to Exercise 2, Step 3.

We are going to convert our original vector file to raster at four different spatial resolutions: 25, 50, 75 and 100 m. Our analysis will be based on the same four spatial resolutions.

To rasterize vector data, we use the Rasterize (Vector to raster) tool. Once inside this tool, we begin by indicating the vector layer we want to rasterize (SIOSE 2011 map). Then, we go to “Field to use for burn-in value [optional]” where we indicate the field in the attribute table of the vector layer that will give the raster the pixel values (Metro) (Fig. 19).

Fig. 19 Exercise 1. Step 1. Rasterize (Vector to Raster)

We must also set the spatial resolution for the raster we want to create. To do this, we must first define the units for the spatial resolution in the “Output raster size unit” option (Georeferenced Units). Then, we choose the spatial resolution or pixel size through the “Width/Horizontal resolution” (25) and “Height/Vertical resolution” options (25). We must also specify the extent of the raster that will be created in the option “Output extent (xmin, xmax, ymin, ymax)”. We are going to use the extent of the layer we are rasterizing (SIOSE 2011) through the submenu on the right (Use layer extent…).

The final stage is to assign a value to the background, i.e. the pixels that are not covered by any polygon in the vector file. Given that the vector already has values from 0 to 11, we will define the background with code 12. We do this via the option “Pre-initialize the output image with value [optional]”, available under the “Advanced parameters” options (Fig. 20).

Fig. 20 Exercise 1. Step 1. Advanced parameters of the Rasterize (Vector to Raster) tool

Our background value (12) will also be the nodata value of our raster. We can assign a nodata value for the raster we are going to create using the option “Assign a specified nodata value to output bands [optional]” (Fig. 19).

Step 2

Once we have finished the first rasterization, we must repeat the same procedure for the other three spatial resolutions that we need for the SIOSE dataset. Then, we must repeat the whole workflow for the CORINE map. Once all these tasks have been completed, we will have 8 different maps (4 SIOSE and 4 CORINE) at 4 different spatial resolutions (25, 50, 75 and 100 m).
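Repeating the rasterization eight times by hand is error-prone, so it can be scripted. The QGIS Rasterize (Vector to Raster) tool is a front end to GDAL's gdal_rasterize, and the sketch below calls the same operation through the GDAL Python bindings. It is a sketch under assumptions: the file names and output folder are hypothetical, GDAL's Python bindings must be installed, and the optional `bounds` argument (xmin, ymin, xmax, ymax) is how one would force the SIOSE and CORINE rasters onto a common extent, as the Requisites demand.

```python
from pathlib import Path

RESOLUTIONS_M = [25, 50, 75, 100]
BACKGROUND = 12  # background code, also used as nodata (Step 1)


def rasterize_jobs(vector_paths, resolutions=RESOLUTIONS_M, out_dir="rasters"):
    """One (source, output, resolution) job per map and resolution."""
    return [(src, str(Path(out_dir) / f"{Path(src).stem}_{res}m.tif"), res)
            for src in vector_paths for res in resolutions]


def run_rasterize(jobs, field="Metro", bounds=None):
    """Run the jobs with GDAL; `bounds` = (xmin, ymin, xmax, ymax) forces a common extent."""
    from osgeo import gdal  # imported lazily so the job list can be built without GDAL
    gdal.UseExceptions()
    for src, dst, res in jobs:
        Path(dst).parent.mkdir(parents=True, exist_ok=True)
        gdal.Rasterize(
            dst, src,
            format="GTiff",
            outputType=gdal.GDT_Byte,
            attribute=field,        # attribute field that provides the pixel values
            xRes=res, yRes=res,     # pixel size in georeferenced units (m)
            initValues=BACKGROUND,  # pre-initialize pixels not covered by any polygon
            noData=BACKGROUND,      # and declare that value as nodata
            outputBounds=bounds,
        )


# Hypothetical file names for the SIOSE and CORINE 2011 vector maps.
jobs = rasterize_jobs(["siose_2011.shp", "corine_2011.shp"])
for job in jobs:
    print(job)
# run_rasterize(jobs)  # uncomment once the real paths are in place
```

Building the job list separately from running it makes it easy to check that all 8 expected outputs (4 resolutions for each map) are requested before any file is written.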

Step 3

Once all the maps have been created, we can start the Cross-Tabulation. To do this, open the “Semi-Automatic Classification Plugin”, click on the “Postprocessing” tab and select Cross Classification. Then, select the required parameters: raster to assess (CORINE map 25 m) and reference raster (SIOSE map 25 m) (Fig. 21).
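What the Cross Classification tool computes can be sketched as a count of category pairs multiplied by the pixel area. This is an illustrative re-implementation, not the plugin's code: it assumes two grids on exactly the same raster grid and, as an assumption of this sketch, skips any pixel flagged as nodata (code 12 in this exercise) in either map; the plugin's own handling of nodata may differ.

```python
from collections import Counter


def cross_tabulate(assessed, reference, pixel_size, nodata=None):
    """Area (m²) covered by each (assessed category, reference category) pair."""
    pixel_area = pixel_size * pixel_size
    counts = Counter(
        (a, r)
        for row_a, row_r in zip(assessed, reference)
        for a, r in zip(row_a, row_r)
        if nodata is None or (a != nodata and r != nodata)
    )
    return {pair: n * pixel_area for pair, n in sorted(counts.items())}


# Hypothetical 2 x 3 maps at 25 m; code 12 is background/nodata.
corine = [[0, 0, 12],
          [1, 1, 0]]
siose = [[0, 1, 0],
         [1, 1, 0]]
print(cross_tabulate(corine, siose, pixel_size=25, nodata=12))
```

Pairs with equal codes, such as (0, 0), are the diagonal of the matrix (agreement); every other pair is a specific type of disagreement.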

Fig. 21 Exercise 1. Step 3. Semi-Automatic Classification plugin

Step 4

After the first execution, repeat this process with the other pairs of maps (one CORINE and one SIOSE map at each of the other spatial resolutions).

Results and Comments

Once we have executed the function four times, QGIS will have created, for each execution, an output map with the combined classes and an error/Cross-Tabulation matrix. These are stored in the folder we selected when executing the tool. The matrixes are also displayed in the “output” window. For a detailed description of each of these results, please refer to Sect. 1.

If we compare the error matrixes, we can see that there are few differences between them. The error matrixes show the area in square meters covered by each possible combination of classes. The combination that covers most area is always the agreement between agricultural areas: pixels that are 0 (agricultural areas) in both the validated (CORINE) and the reference (SIOSE) maps. At a spatial resolution of 25 m, these areas occupy 585,267,500 m²; at 50 m, 585,225,000 m²; at 75 m, 585,815,625 m²; and at 100 m, 584,660,000 m². The differences are therefore very small.
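The size of these differences can be made explicit with a little arithmetic on the four areas reported above. The short script below converts each area into a number of pixels (area divided by the area of one pixel) and expresses the spread between the largest and smallest value as a share of the 25 m value.

```python
# Agreement area between agricultural areas (code 0) in CORINE and SIOSE, in m².
agri_agreement = {25: 585_267_500, 50: 585_225_000, 75: 585_815_625, 100: 584_660_000}

for res, area in agri_agreement.items():
    pixels = area // res ** 2  # each pixel covers res x res square meters
    print(f"{res:>3} m: {area:,} m² = {pixels:,} pixels of {res ** 2:,} m²")

spread = max(agri_agreement.values()) - min(agri_agreement.values())
print(f"spread: {spread:,} m² ({spread / agri_agreement[25]:.2%} of the 25 m value)")
```

Each reported area is an exact multiple of its pixel area, and the spread across the four resolutions is 1,155,625 m², about 0.2% of the agreement area, which supports the statement that the differences are very small.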

A similar pattern can be observed if we look at the rest of the combinations. This means that at all the spatial resolutions there are very similar levels of agreement and disagreement between the classes on the two maps (CORINE and SIOSE). We can therefore conclude that the spatial resolution selected to make the analysis has no substantial effect on the results.

This means that the areas classified differently on the two maps are not due to small details drawn on one map that do not appear on the other. Disagreement is not the result of isolated pixels on one map that are not classified in the same category on the other. If this were the case, the agreement between the two maps should be higher at coarser resolutions, because these are more generalized and rule out minor details.

In conclusion, it would seem that the differences between the two maps are structural. In other words, they are not caused by the spatial resolution or level of detail of the maps, and instead result from the fact that each map represents a different reality on the ground. Even when we generalize both maps and rule out all small details, the level of agreement between them remains similar. Notwithstanding this, we must always remember that most of the areas in both maps agree, as confirmed in Sect. 1.

When compared with SIOSE, CORINE can be considered a valid map because the agreement between the two is very high. The differences between them are the same regardless of the spatial resolution employed to make the analysis, at least within the range we used (from 25 to 100 m). Thus, although the differences between SIOSE and CORINE are the result of their different scale and Minimum Mapping Unit, they cannot be eliminated simply by generalizing the maps to coarser spatial resolutions. In fact, their agreements and disagreements remain the same, which suggests that the different scale of production introduces important structural differences in the way the two maps represent land uses and land covers on the ground.

Exercise 2. To validate soft maps produced by the model against a reference map

Aim

To evaluate to what extent the urban fabric suitability map of our model agrees with the urban fabric areas of the reference map for the year of the simulation at multiple spatial resolutions, determining the resolution at which there is most agreement.

Materials

CORINE Land Use Map Asturias Central Area 2011

Urban fabric suitability map—CORINE model

Requisites

The two maps must have the same extent, spatial resolution and projection. The soft map must be a categorical map. The Land Use map must only contain information about the category being assessed. For a proper validation, the reference map must refer to the same date as the simulation.

Execution

Step 1

We begin by converting our soft map into a categorical one to comply with the requirements of the Cross-Tabulation tool. This is done using the Reclassify by table function (Processing toolbox > Raster analysis > Reclassify by table).

There are no standard criteria for the reclassification of soft maps and users can apply whatever thresholds they think best. In this case, we will use the same thresholds we used in Exercise 2 of Sect. 1. We will therefore reclassify the map into four new categories: 1 (suitability 0–0.25), 2 (0.25–0.50), 3 (0.50–0.75) and 4 (0.75–1).
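The four-class reclassification can be sketched as a threshold lookup. The text does not say which class a value lying exactly on a threshold belongs to, so this sketch assumes that each class includes its lower bound and excludes its upper bound (0.25 goes to class 2), with 1.0 placed in class 4; check how your own tool treats boundary values before comparing results.

```python
from bisect import bisect_right

# Upper bounds of classes 1-3; class 4 covers 0.75-1.
THRESHOLDS = [0.25, 0.50, 0.75]


def suitability_class(value):
    """Class 1-4 for a suitability value in [0, 1] (boundary convention assumed)."""
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"suitability outside [0, 1]: {value}")
    return bisect_right(THRESHOLDS, value) + 1


def reclassify(grid):
    """Apply the four-class scheme to every pixel of a suitability grid."""
    return [[suitability_class(v) for v in row] for row in grid]


print(reclassify([[0.10, 0.25, 0.49],
                  [0.60, 0.75, 1.00]]))
```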

Step 2

As stated in the requisites, we will cross-tabulate the reclassified soft map with a map that only shows the Land Use Cover category of interest, i.e. urban fabric. To this end, we must extract the urban fabric areas from the LUC map (CORINE) using the same function as in Step 1 (Reclassify by table). In the reclassification, we will assign a value of 1 to urban fabric (code 2 in the original map) and a value of 0 to the other categories (codes 0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 11 and 12). For a detailed explanation of how to carry out these first two steps, readers are referred to Exercise 2 of Sect. 1.
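The binary reclassification described above amounts to a one-line rule per pixel. In this sketch the grid is a hypothetical list of lists of CORINE codes; every code other than 2, including the background code 12, becomes 0. A helper also returns the urban fabric area for a given pixel size.

```python
URBAN_FABRIC = 2  # urban fabric code in the original CORINE legend


def urban_fabric_mask(grid, code=URBAN_FABRIC):
    """1 where the pixel is urban fabric, 0 for every other code (0-1, 3-12)."""
    return [[1 if v == code else 0 for v in row] for row in grid]


def urban_area(mask, pixel_size):
    """Urban fabric area in m² for a square pixel of side `pixel_size` (m)."""
    return sum(map(sum, mask)) * pixel_size ** 2


mask = urban_fabric_mask([[0, 2, 2],
                          [12, 2, 5]])
print(mask, urban_area(mask, pixel_size=50))
```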

Step 3

Once we have the two maps, we can resample them at different spatial resolutions to carry out the Multiple-Resolution Cross-Tabulation. In our case, as the original pixel size is 50 m, we will resample our maps at 75, 100, 125 and 150 m using the Save As… tool. In this tool, we need to indicate the name of the map we are going to resample (the reclassified suitability map of urban fabric) and the spatial resolution at which we will resample it (Fig. 22), in our case, 75 m.
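Resampling from 50 m to 75 m is not an integer coarsening, so each new pixel cannot simply be a block of old ones. The Nearest Neighbour principle handles this by giving each target pixel the value of the source pixel that contains its centre. The sketch below illustrates that principle under stated assumptions (north-up grids sharing the same top-left corner, partial last rows and columns kept and clamped to the edge pixel); the exact centre and edge conventions used by QGIS and GDAL may differ.

```python
import math


def nearest_neighbour_resample(grid, src_res, dst_res):
    """Resample a categorical grid to a new cell size by nearest neighbour.

    Assumptions: north-up grid, input and output share the top-left corner,
    and a partial last row/column is kept (its centre is clamped to the edge).
    """
    rows, cols = len(grid), len(grid[0])
    out_rows = math.ceil(rows * src_res / dst_res)
    out_cols = math.ceil(cols * src_res / dst_res)

    def source_index(k, limit):
        centre = (k + 0.5) * dst_res  # distance of the target cell centre from the origin
        return min(int(centre // src_res), limit - 1)

    return [[grid[source_index(i, rows)][source_index(j, cols)] for j in range(out_cols)]
            for i in range(out_rows)]


# Hypothetical 6 x 6 grid at 50 m with a unique value per pixel, resampled to 75 m.
grid = [[r * 6 + c for c in range(6)] for r in range(6)]
for row in nearest_neighbour_resample(grid, src_res=50, dst_res=75):
    print(row)
```

With these values the 300 m wide grid becomes 4 x 4 pixels, built from source rows and columns 0, 2, 3 and 5: some original pixels are sampled and others are skipped, which is how Nearest Neighbour keeps existing categories without inventing new ones.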

Fig. 22 Exercise 2. Step 3. Save Raster Layer as... tool

Step 4

After resampling the map, we must repeat the same procedure for the other resolutions (100, 125 and 150 m). Then, we do the same for the urban fabric areas map. By the end we should have 8 resampled maps (4 suitability maps and 4 urban fabric maps) at 4 different spatial resolutions (75, 100, 125 and 150 m), in addition to the two original maps at 50 m.
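As with the rasterization in Exercise 1, the eight resampling runs can be scripted. The sketch below uses GDAL's Warp function (the operation behind the Warp (reproject) tool listed earlier) with Nearest Neighbour resampling. It assumes hypothetical file names and an installed GDAL Python binding; it is an alternative to the Save As… tool used in the text, and, as noted above, rasters produced by different tools may differ slightly, so a single tool should be used for all resolutions.

```python
from pathlib import Path

TARGET_RESOLUTIONS_M = [75, 100, 125, 150]


def warp_jobs(rasters, resolutions=TARGET_RESOLUTIONS_M, out_dir="resampled"):
    """One (source, output, resolution) job per raster and target resolution."""
    return [(src, str(Path(out_dir) / f"{Path(src).stem}_{res}m.tif"), res)
            for src in rasters for res in resolutions]


def run_warp(jobs):
    """Resample every job with GDAL using Nearest Neighbour."""
    from osgeo import gdal  # imported lazily so the job list can be built without GDAL
    gdal.UseExceptions()
    for src, dst, res in jobs:
        Path(dst).parent.mkdir(parents=True, exist_ok=True)
        gdal.Warp(dst, src, format="GTiff", xRes=res, yRes=res,
                  resampleAlg=gdal.GRA_NearestNeighbour)


# Hypothetical names for the reclassified suitability map and the urban fabric map.
jobs = warp_jobs(["suitability_urban_reclass.tif", "corine_2011_urban.tif"])
print(len(jobs), "resampling jobs")
# run_warp(jobs)  # uncomment once the real paths are in place
```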

Step 5

Once we have obtained all the maps we need, we can then carry out the Cross-Tabulation exercise using the Cross classification tool from the “Semi-Automatic Classification Plugin”. Once inside the tool, we must indicate the two rasters that we want to cross-tabulate: the soft map (Select the classification) and the land use map for the category of interest (Select the reference vector or raster) (Fig. 23).

Fig. 23 Exercise 2. Step 5. Semi-Automatic Classification plugin

Step 6

After we do this for the maps at the original resolution (50 m), we repeat the process at the other 4 spatial resolutions (75, 100, 125 and 150 m).

Results and Comments

After executing the function for each pair of maps at each spatial resolution, the tool produces (for each spatial resolution) an output map with the combination and two matrixes detailing how the values of both maps cross-tabulate. These are stored in the folder we selected and are also displayed on the screen (Output tab). For a detailed description of each of these results, please refer to Sect. 1.

The “Cross Matrix” is the most interesting of all these results in that it provides us with all the information we need for our analysis. It details how much of the area for each category in the reclassified suitability map falls inside areas that are urban fabric in our reference maps (Tables 4, 5, 6, 7 and 8).

Table 4 Result from Exercise 2. Table showing the correspondence between the urban fabric category in CORINE and the different groups of suitability values for urban fabric in the map at a spatial resolution of 50 m
Table 5 Result from Exercise 2. Table showing the correspondence between the urban fabric category in CORINE and the different groups of suitability values for urban fabric in the map at a spatial resolution of 75 m
Table 6 Result from Exercise 2. Table showing the correspondence between the urban fabric category in CORINE and the different groups of suitability values for urban fabric in the map at a spatial resolution of 100 m
Table 7 Result from Exercise 2. Table showing the correspondence between the urban fabric category in CORINE and the different groups of suitability values for urban fabric in the map at a spatial resolution of 125 m
Table 8 Result from Exercise 2. Table showing the correspondence between the urban fabric category in CORINE and the different groups of suitability values for urban fabric in the map at a spatial resolution of 150 m

For the analysis at a spatial resolution of 50 m, there are 4999 m² of low suitability (suitability below 0.25) that cross-tabulate with areas that are urban fabric in the reference LUC map. Since each pixel represents an area of 2500 m² (50 m × 50 m), this means that only about 2 pixels of urban fabric cross-tabulate with areas of low suitability on the suitability map. 1971 pixels with medium to high suitability (0.5–0.75) cross-tabulate with areas that are urban fabric. Finally, most of the urban fabric pixels cross-tabulate with areas with the highest suitability (0.75–1): this combination is represented by 26,137 pixels. These data show that there is a positive correlation between suitability and the presence of urban fabric. We can therefore conclude that suitability is a good driver for our model.

Varying the spatial resolution of the analysis did not lead to any major differences in the correlation between the suitability map and the urban fabric areas in the reference maps. At all five spatial resolutions assessed, most of the urban fabric pixels fell within the highest suitability category (0.75–1).

The dissimilarities between the analyses at different resolutions were very small. At 75 m, just two pixels fell within the areas of lowest suitability (11,245 m²). At 100 m, there were many more: 74 pixels (738,436 m²). At 125 m there was just 1 pixel (15,651 m²), and at 150 m, no pixels at all (0 m²). Similar behaviour can be observed for the other categories of suitability at all five resolutions.
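The pixel counts quoted above follow from dividing each area by the area of one pixel at the corresponding resolution. The reported areas are not exact multiples of the pixel area (presumably because of how the tool computes or rounds areas), so the sketch below rounds to the nearest whole pixel; the figures are the ones given in the text for the lowest suitability class.

```python
# Urban fabric area falling in the lowest suitability class (0-0.25), in m².
low_suitability_area = {50: 4_999, 75: 11_245, 100: 738_436, 125: 15_651, 150: 0}


def area_to_pixels(area_m2, resolution_m):
    """Convert an area to the nearest whole number of square pixels of the given size."""
    return round(area_m2 / resolution_m ** 2)


for res, area in low_suitability_area.items():
    print(f"{res:>3} m: {area:>7,} m² ≈ {area_to_pixels(area, res)} pixel(s)")

# The same conversion in the other direction, for the 50 m counts quoted earlier.
print(1971 * 50 ** 2, 26_137 * 50 ** 2)  # m² in classes 0.5-0.75 and 0.75-1
```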

This indicates that the suitability map for urban fabric in our modelling exercise is correct. It positively correlates with those areas that are urban fabric in our reference map, so helping us to identify the areas in which new urban fabric is most likely to appear. However, no conclusions can be drawn regarding the best spatial resolution at which to carry out the modelling exercise. As the explanatory power of the suitability maps is very similar at all the spatial resolutions assessed, the decision as to which spatial resolution would be best for our modelling exercise should be based on other factors, such as how realistic the pattern looks or what the minimum level of detail might be for the model to be useful for stakeholders and users.

This analysis could be complemented with more sophisticated tools like the ROC curve and the Difference in Potential (see Sects. 2 and 3 in Chapter “Validation of Soft Maps Produced by a Land Use Cover Change Model”). These tools also provide information about how well a model soft map simulates a category of interest, such as urban fabric.

Exercise 3. To validate a simulation against a reference map

Aim

To validate a simulation for the year 2011 against a reference map for the same year at multiple spatial resolutions, determining the resolution at which both maps show the best agreement.

Materials

Simulation CORINE Asturias Central Area 2011

CORINE Land Use Map Asturias Central Area 2011

Requisites

The two maps must have the same extent, spatial resolution, projection and legend. For proper validation, the reference map must refer to the date for which the landscape was simulated.

Execution

Step 1

For Multiple-Resolution Cross-Tabulation, we first need to resample the original rasters (50 m) at other spatial resolutions. In this case, we will resample our simulation at 100, 150 and 200 m, following the procedure for the Save As… tool set out in the previous exercise (Exercise 2, Execution, Step 3). Once inside the tool, we fill in the required parameters: name of the raster to be resampled (Simulation CORINE) and spatial resolution (100 m).

Step 2

Once we have resampled the first map, we then repeat the procedure for the other spatial resolutions (150 and 200 m) and for the reference map. By the end, we should have 8 maps (4 simulations and 4 reference maps) at 4 spatial resolutions (50, 100, 150 and 200 m).

Step 3

With all these resampled maps, we can then carry out the Cross-Tabulation exercise at multiple resolutions. To do this, open the “Semi-Automatic Classification Plugin”, click on the “Postprocessing” tab and select Accuracy. Fill in the required parameters: raster to assess (Simulation CORINE 11 map at 50 m) and reference raster (CORINE 11 map at 50 m) (Fig. 24).

Fig. 24 Exercise 3. Step 3. Semi-Automatic Classification plugin

Step 4

Repeat the same procedure for the other pairs of maps at 100, 150 and 200 m.

Results and Comments

After this function has been executed for each spatial resolution, QGIS will create an output map, a couple of matrixes and some statistical measures. All the tables and statistics can be consulted in the “output window” and all the results will be saved in the folder we selected earlier. For a detailed description of each of these results, please refer to Sect. 1.

The analysis of the matrixes at the different spatial resolutions shows no important differences between resolutions, and very similar results in all cases. In general, there is a high level of agreement between the simulation and the reference map, as shown in Sect. 1 when conducting the analysis at the original resolution of the modelling exercise.

If we take Overall Accuracy as a summary metric describing the similarity between the two maps, we can see that similarity is very high in all cases (Table 9). Only the exercise at 100 m shows a lower agreement rate. This may be due to multiple causes, but it does indicate that coarsening the spatial resolution of the simulation does not ensure higher levels of agreement between the simulated landscape and the reference landscape.
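Overall Accuracy is the share of the total on the diagonal of the cross-tabulation matrix, i.e. the proportion of the study area where both maps assign the same category. The sketch below computes it from a square matrix given as a list of lists (rows for the simulation, columns for the reference, though the result does not depend on the orientation); the matrix values are hypothetical, not the results of Table 9.

```python
def overall_accuracy(matrix):
    """Share of the total (pixels or area) lying on the diagonal of a square matrix."""
    total = sum(sum(row) for row in matrix)
    agreement = sum(matrix[i][i] for i in range(len(matrix)))
    return agreement / total


# Hypothetical 3-category matrix, in pixels.
example = [[70, 3, 2],
           [4, 12, 1],
           [1, 2, 5]]
print(f"Overall Accuracy: {overall_accuracy(example):.1%}")
```

Because persistent land dominates most landscapes, this single number can stay high even when simulated changes are poorly placed, which is the limitation discussed next.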

Table 9 Results from Exercise 3. Overall accuracies of the simulation, when assessed against a reference map, at four spatial resolutions: 50, 100, 150 and 200 m

We must also bear in mind the limitations of this exercise mentioned in Sect. 1. Validating a simulation by cross-tabulating the simulated map with a reference map may be misleading. Most of the areas in both maps agree because most of the areas in the simulated landscape remain the same during the modelling period.

The best way to validate the changes modelled in our exercise is to focus exclusively on the simulated changes and on a map of reference showing the changes on the ground. In this case, the Multiple-Resolution exercise could provide very interesting insights, as agreement between simulated and reference changes may be higher at coarser spatial resolutions.
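Focusing exclusively on change can be sketched by first deriving change maps and then cross-tabulating them. This is an illustrative sketch, not a procedure from this chapter: it assumes a hypothetical initial map for the start of the simulation period, on the same grid as the simulated and reference maps, and it only records whether a pixel changed, not which category it changed to. The labels (hits, misses, false alarms) follow the usual terminology for validating simulated change. The same cross-tabulation could be repeated after coarsening the three maps, as in the Multiple-Resolution exercises above.

```python
def change_mask(before, after):
    """1 where the category differs between two dates, 0 where it persists."""
    return [[int(b != a) for b, a in zip(row_b, row_a)]
            for row_b, row_a in zip(before, after)]


def change_crosstab(initial, simulated, reference):
    """Count pixels by simulated change versus reference change."""
    sim = change_mask(initial, simulated)
    ref = change_mask(initial, reference)
    counts = {"hits": 0, "misses": 0, "false_alarms": 0, "correct_persistence": 0}
    for row_s, row_r in zip(sim, ref):
        for s, r in zip(row_s, row_r):
            if s and r:
                counts["hits"] += 1          # change in both maps
            elif r:
                counts["misses"] += 1        # real change not simulated
            elif s:
                counts["false_alarms"] += 1  # simulated change that did not happen
            else:
                counts["correct_persistence"] += 1
    return counts


# Hypothetical 2 x 3 maps: initial state, simulation and reference for the same date.
initial = [[0, 0, 1], [0, 0, 1]]
simulated = [[2, 0, 1], [2, 0, 1]]
reference = [[2, 2, 1], [0, 0, 1]]
print(change_crosstab(initial, simulated, reference))
```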