1 Overall, User’s and Producer’s Accuracy Through GWR

Description

Overall accuracy (OA), user’s accuracy (UA) and producer’s accuracy (PA) are assessment metrics obtained from the cross-tabulation matrix (see Sect. 5 in chapter “Metrics Based on a Cross-Tabulation Matrix to Validate Land Use Cover Maps”). Overall accuracy is expressed as the proportion of the map that has been correctly classified. User’s accuracy indicates the probability that a pixel from a specific category on the classified map correctly represents the real situation on the ground or reference map. Producer’s accuracy indicates the probability that a reference pixel belonging to a specific category has been correctly allocated to that category (Story and Congalton 1986). These last two metrics (user’s and producer’s accuracies) refer to commission and omission errors, respectively.
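In terms of the cross-tabulation matrix, with \({n}_{kk}\) the number of correctly classified samples of category k, \({n}_{k+}\) the total number of samples classified as k (row total), \({n}_{+k}\) the total number of reference samples of k (column total) and n the overall number of samples, these metrics can be written as

$$\mathrm{OA}=\frac{{\sum }_{k}{n}_{kk}}{n}, \qquad {\mathrm{UA}}_{k}=\frac{{n}_{kk}}{{n}_{k+}}, \qquad {\mathrm{PA}}_{k}=\frac{{n}_{kk}}{{n}_{+k}}$$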

None of these accuracy assessment statistics provides spatially distributed information: each yields a single accuracy value for the entire study area or for each land use/land cover class. However, it is possible to explore how the error and accuracy of a classified map are spatially distributed with respect to the reference data using Geographically Weighted Regression (GWR) methods.

GWR allows us to explore local spatial relationships between a dependent variable and a set of explanatory variables (Brunsdon et al. 1996; Fotheringham et al. 2002). In this chapter, we use the logistic version of the geographically weighted approach (Geographically Weighted Logistic Regression, GWLR) to generate land use/land cover accuracy metrics that vary over space, following the proposal by Comber (2013), also developed in Comber et al. (2012), Comber et al. (2017) and Tsutsumida and Comber (2015).

GWR is a statistical technique in which regression points are estimated on the basis of the spatial distribution of data points. A moving window collects the surrounding data points and uses them to estimate the coefficients at the selected regression point. This window, or kernel, weights each data point according to its distance from the regression point and the chosen weighting function (Gaussian, exponential, bisquare, tricube, boxcar). The maximum weight is 1, and it decreases as the distance between the data point and the regression point increases. The size of the kernel is defined by the bandwidth, which determines how many data points are included in the local calculation for each regression point. The bandwidth can be adaptive, in which case a fixed number (or proportion) of the nearest data points is used and the spatial extent of the kernel varies with the local density of the data, or fixed, in which case a constant distance is used and the number of data points included varies according to their distribution. It is important to select a suitable bandwidth so as to minimise the cross-validation prediction error. According to Fotheringham et al. (2002), the GWR formula is

$${y}_{i}={\beta }_{0({u}_{i}, {v}_{i})}+ {\sum }_{n}{\beta }_{n({u}_{i}, {v}_{i})}{x}_{ni}$$

where \({\beta }_{0}\) is the local intercept, \({\beta }_{n}\) is the local coefficient of the nth explanatory variable, \({x}_{ni}\) is the value of that variable at location i, and \(({u}_{i}, {v}_{i})\) are the coordinates of data point i (Fig. 1).
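A common choice is the Gaussian weighting function,

$${w}_{ij}=\exp\left[-\frac{1}{2}{\left(\frac{{d}_{ij}}{b}\right)}^{2}\right]$$

where \({d}_{ij}\) is the distance between regression point i and data point j and b is the bandwidth, so that the weight equals 1 at zero distance and decays smoothly towards 0 as the distance increases.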

Fig. 1 Spatial kernel, showing the regression point, the data points and the bandwidth. The curve represents the Gaussian function that determines the weighting of the data points for the regression point. Retrieved from Fotheringham et al. (2002)

This geographically weighted method was adapted for the calculation of local accuracy assessment statistics by Comber (2013). According to his proposal, the probability that a reference data point is correctly identified by a classified data point is given by

$$\text{Overall accuracy} \to P(A=1)=\mathrm{logit}\left({\beta }_{0({u}_{i}, {v}_{i})}\right)$$

where P(A = 1) is the probability that the agreement A between the classified data and the reference data is equal to 1; A takes the value 0 where there is no agreement and 1 where there is agreement.

To estimate user’s accuracy, the reference data are analysed against the classified data. This metric indicates the probability that the reference LUC class \({y}_{i}\) is correctly predicted by the classified data \({x}_{i}\).

$$\text{User's accuracy} \to P\left({y}_{i}=1\right)=\mathrm{logit}\left({\beta }_{0({u}_{i}, {v}_{i})}+{\beta }_{1({u}_{i}, {v}_{i})}{x}_{i}\right)$$

To estimate producer’s accuracy, the classified data are analysed against the reference data. This indicates the probability that the classified data \({x}_{i}\) correctly represent the reference LUC class \({y}_{i}\).

$$\text{Producer's accuracy} \to P\left({x}_{i}=1\right)=\mathrm{logit}\left({\beta }_{0({u}_{i}, {v}_{i})}+{\beta }_{1({u}_{i}, {v}_{i})}{y}_{i}\right)$$

Finally, in order to obtain the accuracy values, the estimated coefficients have to be transformed back into probabilities. To this end, the coefficients are added together and the alogit (inverse logit) function is applied.
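For the user's accuracy model, for example, this means

$${\mathrm{UA}}_{i}=\mathrm{alogit}\left({\beta }_{0({u}_{i}, {v}_{i})}+{\beta }_{1({u}_{i}, {v}_{i})}\right), \qquad \mathrm{alogit}(z)=\frac{{e}^{z}}{1+{e}^{z}}$$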

Utility

Exercises

1. To validate a map against reference data/map

Geographically Weighted methods can be used to validate single LUC maps by analysing spatial variations in the agreement between reference data and classified remotely sensed data, enabling us to analyse the spatial non-stationarity of LUC error and accuracy. They make it possible to explore the spatial relationships between the reference data and the classified data, exposing possible clusters of land cover error and reporting values for each data point, in contrast to global accuracy assessment statistics, which only provide a single value for the entire map.

This technique allows us not only to discover what proportion of the map has been correctly classified but also to estimate in which areas the classification fits best and to analyse possible trends that are only visible spatially. In this way, the spatial distribution of the overall, user’s and producer’s accuracy metrics can be visualized on a map so as to enable a better understanding of classification uncertainty.

QGIS Exercise

Available tools

▪ Processing Toolbox

  ○ R > Geographically weighted methods > Local accuracy assessment statistics

  ○ Interpolation > IDW Interpolation

  ○ GDAL > Raster extraction > Clip raster by mask layer

By default, QGIS has no tools that carry out a Geographically Weighted analysis to estimate overall, user’s and producer’s accuracy values for local areas. We have therefore developed an R-based tool that calculates these local accuracy assessment metrics from within QGIS, taking advantage of the Geographically Weighted methods already implemented in R.

The Local accuracy assessment statistics script is based on the code developed by Professor Alexis J. Comber from the University of Leicester, which relies above all on the “spgwr” R package. The script provides overall, user’s and producer’s accuracy values for each data point, allowing accuracy and error distribution surfaces to be generated by interpolating the results obtained by the tool.

First, to estimate local OA values, the tool internally calculates, for each data point, the agreement between the reference data and the classified data, where 0 represents disagreement and 1 represents agreement. Agreement is automatically selected as the dependent variable [y] and “1” is selected as the independent variable [x] (an intercept-only model), where P(A = 1) is the probability that the agreement is equal to 1.
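To illustrate what this intercept-only model looks like in practice, the following is a minimal R sketch of a geographically weighted logistic fit with the spgwr functions the script builds on. The file name, the column names (refLUC, classLUC) and the adaptive bandwidth of 0.15 are assumptions made for the example, not the script’s actual internals.

```r
library(sf)     # read the vector point sample
library(sp)     # Spatial* classes used by spgwr
library(spgwr)  # ggwr(): generalised (here logistic) geographically weighted regression

pts <- st_read("sample_points.gpkg")                      # hypothetical point layer
pts$agreement <- as.integer(pts$refLUC == pts$classLUC)   # 1 = agreement, 0 = disagreement
spdf <- as(pts, "Spatial")                                # sf -> SpatialPointsDataFrame

# Intercept-only GW logistic model: the local intercept carries the local overall accuracy
gw_oa <- ggwr(agreement ~ 1, data = spdf,
              family  = binomial(),   # logistic link
              adapt   = 0.15,         # adaptive kernel: 15% nearest neighbours
              gweight = gwr.Gauss)    # Gaussian weighting function

# Local OA values: inverse logit (plogis) of the local intercepts held in the result's SDF
coefs <- gw_oa$SDF@data
local_oa <- plogis(coefs[[grep("Intercept", names(coefs))[1]]])
summary(local_oa)
```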

To estimate local UA values, the tool generates a new data frame with two columns: one records the presence (1)/absence (0) of the chosen category in the reference data, while the other does the same for the classified data. The reference data (RD) is selected as the dependent variable [y] and the classified data (CD) as the independent variable [x], giving P(RD = 1 | CD = 1). The procedure for producer’s accuracy is very similar: the classified data for the chosen category is selected as the dependent variable [y] and the reference data as the independent variable [x], giving P(CD = 1 | RD = 1).
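Continuing the sketch above with the same assumed objects and column names, the user's and producer's accuracy models for one category (here tropical rain forest, code 50) could be set up along these lines.

```r
cat_code <- 50                                            # tropical rain forest in this exercise
spdf$ref_cat   <- as.integer(spdf$refLUC   == cat_code)   # presence/absence in the reference data (RD)
spdf$class_cat <- as.integer(spdf$classLUC == cat_code)   # presence/absence in the classified data (CD)

# User's accuracy: reference as response, classified as predictor -> P(RD = 1 | CD = 1)
gw_ua <- ggwr(ref_cat ~ class_cat, data = spdf,
              family = binomial(), adapt = 0.15, gweight = gwr.Gauss)

# Producer's accuracy: classified as response, reference as predictor -> P(CD = 1 | RD = 1)
gw_pa <- ggwr(class_cat ~ ref_cat, data = spdf,
              family = binomial(), adapt = 0.15, gweight = gwr.Gauss)

# Local UA: inverse logit of intercept + slope, i.e. the linear predictor evaluated at CD = 1
# (coefficient columns in the SDF are named after the covariates)
ua <- gw_ua$SDF@data
local_ua <- plogis(ua[[grep("Intercept", names(ua))[1]]] + ua$class_cat)
summary(local_ua)
```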

In order to ensure that the tool works correctly, various parameters must be configured, and selecting an appropriate bandwidth is crucial. A small bandwidth would include too few data points in the local sample, making the calibration of the model unreliable, while a large bandwidth would include too many data points, diluting the local character of the analysis. A spatially well-distributed data sample is also required.
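spgwr also provides cross-validation-based bandwidth selection for generalised models; reusing the objects assumed in the earlier sketch, a possible call is shown below (treat the exact arguments as indicative rather than definitive).

```r
# Cross-validated adaptive bandwidth for the intercept-only agreement model
bw <- ggwr.sel(agreement ~ 1, data = spdf,
               family  = binomial(),
               adapt   = TRUE,        # return a proportion of nearest neighbours
               gweight = gwr.Gauss)
bw   # the selected proportion could then be supplied to the tool as its bandwidth
```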

The need to configure these parameters, together with the more in-depth knowledge required to interpret the results, could be considered a disadvantage of these validation methods. Another important consideration is that using large data samples can lead to long runtimes.

Exercise 1. To validate a map against reference data/map

Aim

To assess the spatial variation of accuracy assessment measures (overall, user’s and producer’s accuracy) when validating the Marqués de Comillas LUC map against a reference set of points.

Materials

Marqués de Comillas random sample points from Mexico (2019)

Boundary of Marqués de Comillas

Requisites

The data points must be projected in their corresponding reference system. The vector point file must include two attributes, one corresponding to the reference LUC data and one to the classified LUC data. A spatially well-distributed random sample is recommended. The sample size should not be overly large, as this could lead to long runtimes.
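Before launching the tool, a quick check along the following lines can confirm these requisites; the file path and column names are placeholders.

```r
library(sf)

pts <- st_read("marques_de_comillas_points.gpkg")         # placeholder path to the sample points
st_crs(pts)                                               # confirm a projected CRS is set
stopifnot(all(c("refLUC", "classLUC") %in% names(pts)))   # assumed names of the two LUC attributes
nrow(pts)                                                 # keep the sample size moderate to limit runtimes
```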

Execution

If necessary, install the Processing R provider plugin, and download the Local accuracy assessment statistics.rsx R script into the R scripts folder (processing/rscripts). For more details, see chapter “About This Book” of this book.

Step 1

Open the Local accuracy assessment statistics function and fill in the required parameters (see Fig. 2). The input for this tool is the point layer containing the LUC random sample dataset. Select the type of accuracy assessment statistic to be obtained (“Overall”), and indicate the attribute table columns that contain the reference data and the classified data. The category can also be indicated, although it is only used when estimating the user’s and producer’s accuracy values. The remaining value to be set is the bandwidth, which in this exercise is 0.15, meaning that the 15% nearest neighbours will be used to estimate the coefficients for each regression point. The kernel is set internally by the tool, with a Gaussian function by default.

Fig. 2 Exercise 1, Step 1. Local accuracy assessment statistics (Overall accuracy)

Step 2

The parameter configuration for calculating User’s Accuracy is very similar. Select the corresponding accuracy assessment statistic in the “Accuracy” option (“User”) and the category you want to assess in the “Category” option (see Fig. 3). In this exercise, we will use the tropical rain forest class [50] as an example.

Fig. 3 Exercise 1, Step 2. Local accuracy assessment statistics (User's accuracy)

Step 3

To estimate the producer’s accuracy values, the same steps must be followed (see Fig. 4). Select the corresponding accuracy assessment statistic (“Producer”), and the tool will modify the internal inputs. The tropical rain forest class [50] will again be used as an example.

Fig. 4 Exercise 1, Step 3. Local accuracy assessment statistics (Producer's accuracy)

Step 4

Finally, the coefficients adjusted by the Local accuracy assessment statistics tool are interpolated using the Inverse Distance Weighted method (IDW interpolation tool in QGIS) (see Fig. 5) to obtain a map showing the continuous spatial variation of the accuracy measures and to make the results easier to interpret visually.
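For reference, IDW estimates the value at an unsampled location as a distance-weighted average of the surrounding sample values,

$$\widehat{z}({s}_{0})=\frac{{\sum }_{i=1}^{n}{d}_{i}^{-p}\,{z}_{i}}{{\sum }_{i=1}^{n}{d}_{i}^{-p}}$$

where \({z}_{i}\) are the local accuracy values at the sample points, \({d}_{i}\) their distances to the interpolated location and p the distance (power) coefficient set in the QGIS dialog.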

Fig. 5 Exercise 1, Step 4. IDW Interpolation

The columns produced by the tool that contain the local overall, user’s and producer’s accuracy values are named “g__SDF_”, “coefs_u” and “coefs_p”, respectively. The relevant column must be specified in the “Interpolation attribute” option, in line with the accuracy metric being analysed.

Step 5

As an additional, optional step, the raster images obtained by interpolation can be clipped by mask using the Marqués de Comillas boundary (Clip raster by mask layer tool in QGIS) in order to provide a better visual representation. In addition, a discrete colour scale with six classes was chosen to make interpretation of the data more straightforward.
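If this clipping step is preferred as a script rather than through the QGIS dialog, an equivalent operation can be sketched in R with the terra package; the file names below are placeholders.

```r
library(terra)

acc <- rast("oa_idw.tif")                          # interpolated accuracy surface (placeholder name)
aoi <- vect("marques_de_comillas_boundary.gpkg")   # study-area boundary (placeholder name)

acc_clip <- mask(crop(acc, aoi), aoi)              # crop to the boundary extent, then mask outside it
writeRaster(acc_clip, "oa_idw_clipped.tif", overwrite = TRUE)
```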

Results and Comments

After executing the previous steps, we obtain, for each run of the tool, a new point layer with an attribute column containing the estimated local values for OA, UA or PA, together with the interpolated distribution maps for these accuracy measures. In addition, a summary of the local and global values calculated is displayed in the log window (Fig. 6), showing the minimum, first quartile, median, mean, third quartile and maximum local values, as well as the global overall accuracy value (Table 1).

Fig. 6 Results from Exercise 1 displayed in the "output" window of the "Local accuracy assessment statistics" tool, showing variations in overall accuracy

Table 1 Results from Exercise 1. Table summarizing the variations in Overall, User’s and Producer’s accuracy values

The IDW interpolation method is used to generate a surface that visually represents the distribution of the values obtained, offering a more detailed spatial picture of the distribution of accuracy and error than a single overall accuracy value can provide. Figure 7 clearly shows a higher degree of accuracy in the north of the map, decreasing towards the south and east.

Fig. 7 Results from Exercise 1. Map showing the spatial distribution of overall accuracy values

The example category in this exercise is tropical rain forest (code 50). User’s accuracy describes the commission errors in the tropical rain forest category. Its local values range between 0.55 and 0.87, a variation of 0.32, whereas the global value for the entire study area is 0.74 (Fig. 8).

Fig. 8 Results from Exercise 1 displayed in the "output" window of the "Local accuracy assessment statistics" tool, showing variations in user's accuracy

Figure 9 represents the probability that a data point classified as tropical rain forest is correctly represented in the reference data (user’s accuracy). Values are high through the centre and south of the region, but fall towards the northeast.

Fig. 9 Result from Exercise 1. Map showing the spatial distribution of user's accuracy values

The last part of this exercise focuses on Producer’s Accuracy, which in this case describes the omission errors related to the tropical rain forest class. Producer’s accuracy varies from 0.56 to 0.89 (a variation of 0.33), whereas the global value for the entire area is 0.74 (Fig. 10).

Fig. 10 Results from Exercise 1 displayed in the "output" window of the "Local accuracy assessment statistics" tool, showing variations in producer's accuracy

Figure 11 represents the probability that a reference data point belonging to the tropical rain forest class is correctly classified (producer’s accuracy). Most of the omission errors are concentrated in the north-east of the study area, while higher levels of producer’s accuracy can be seen in the south-west.

Fig. 11 Result from Exercise 1. Map showing the spatial distribution of producer's accuracy values

The values set out in Figs. 6, 8 and 10 are summarized in Table 1, which shows the variations in the accuracy of the classified data points with respect to the reference data points. The overall accuracy value for the entire study area is 0.80. Nonetheless, OA clearly varies over space: the minimum local value is 0.77 and the maximum is 0.84, a variation of 0.07.

Producer’s accuracy has the highest range of variation, with user’s accuracy close behind. By contrast, overall accuracy has a relatively small range, indicating low levels of spatial variation. Even so, the maximum overall accuracy value (0.84) remains below the minimum accuracy of 0.85 proposed by Anderson (1971).

In conclusion, local accuracy assessment statistics should be considered a useful complement to the cross-tabulation matrix and its global accuracy statistics, in that they provide more detailed information that can help improve classification techniques by locating possible error clusters with greater precision. It is also worth stressing that visual interpretation of the resulting maps can support better decisions when evaluating and validating LUC maps.