Imprecision and Spatial Uncertainty
Synonyms
Data quality; Error; Inaccuracy; Vagueness
Definition
Spatial uncertainty is defined as the difference between the contents of a spatial database and the corresponding phenomena in the real world. Because all contents of spatial databases are representations of the real world, it is inevitable that differences will exist between them and the real phenomena that they purport to represent. Spatial databases are compiled by processes that include approximation, measurement error, and generalization through the omission of detail. Many spatial databases are based on definitions of terms, classes, and values that are vague, such that two observers may interpret them in different ways. All of these effects fall under the general term of spatial uncertainty, since they leave the user of a spatial database uncertain about what will be found in the real world. Numerous other terms are partially synonymous with spatial uncertainty. Data quality is often used in the context of metadata, and describes the measures and assessments intended by data producers to characterize known uncertainties. Vagueness, imprecision, and inaccuracy each imply a specific conceptual framework, ranging from fuzzy and rough sets to traditional theories of scientific measurement error; the terms also differ in whether they imply that some true value exists in the real world against which the value stored in the database can be compared.
Historical Background
Very early interest in these topics can be found in the literature of stochastic geometry (Kendall 1961), which applies concepts of probability theory to geometric structures. An early paper by Frolov and Maling (1969) analyzed the uncertainties present in finite-resolution raster representations, and derived confidence limits on measures such as area, motivated in part by the common practice of estimating the areas of irregular patches by counting grid cells. Their analysis established connections between the spatial resolution of the overlaid raster of cells and confidence limits on area estimates. Maling's book (Maling 1989) was a seminal venture into the application of statistical methods to maps, and helped to stimulate interest in the topic of spatial uncertainty. The growth of geographic information systems (GIS) provided the final impetus, and led to the first research initiative of the new US National Center for Geographic Information and Analysis in 1988, on the topic of accuracy in spatial databases (Goodchild and Gopal 1989).
The notion that spatial databases could be treated through the application of classical theories of measurement error soon proved too limiting, however. The definitions of types that are used in the compilation of maps of soil class, vegetation cover class, or land use are clearly open to interpretation, and such maps must be regarded as to some degree subjective and outside the normal bounds of scientific replicability. Concepts of fuzzy and rough sets were explored by researchers interested in these issues (Fisher and Unwin 2005). While the definition of a given class may be vague, it is nevertheless helpful to think about degrees of membership in the class. For example, researchers interested in developing plain-language interfaces to GIS found that prepositions such as “near” had vague meanings that could be represented more formally through membership functions. This approach resonated well with the move in the early 1990s to introduce theories of linguistics and cognition into GIS research.
By the end of the 1990s the literature on spatial uncertainty had grown to include several distinct theoretical frameworks, including geostatistics, fuzzy sets, rough sets, and spatial statistics. Zhang and Goodchild (2002) published a synthesis, framed within the fundamental dichotomy between discrete objects and continuous fields that underlies much of GIScience. Research continues, particularly on such topics as spatial uncertainty in digital terrain data.
Scientific Fundamentals
In the classical theory of measurement, an observed value z′ is distorted from its true value z by a series of random effects. If these effects are additive, the distortion δz = z′ − z is expected to follow a Gaussian distribution, and each observed measurement is interpreted as a sample drawn from that distribution. The mean of the distribution is termed the bias or systematic error, and the root mean square of δz is termed the standard error. The standard deviation of δz with respect to its own mean is often termed precision, and a biased measurement device is thus said to be possibly precise but not accurate. However, precision can also refer to the number of numerical digits used to report a measurement, and imprecision is used in several ways in the literature on spatial uncertainty.
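To make these definitions concrete, the following is a minimal sketch that simulates repeated measurements of a single quantity and recovers the bias, precision, and standard error from the simulated distortions; the true value, bias, and spread are illustrative assumptions, not values for any particular instrument.

```python
import numpy as np

rng = np.random.default_rng(42)

true_value = 100.0   # hypothetical true value z (assumed)
bias = 0.5           # systematic error of the instrument (assumed)
sigma = 2.0          # spread of the random component (assumed)

# Simulate repeated observations z' = z + bias + random noise
observed = true_value + bias + rng.normal(0.0, sigma, size=10_000)
errors = observed - true_value              # delta-z = z' - z

print("bias (mean error):     ", errors.mean())
print("precision (std of dz): ", errors.std(ddof=1))
print("standard error (RMSE): ", np.sqrt((errors ** 2).mean()))
```

Note how the standard error combines the two components: for large samples it approaches the square root of bias squared plus precision squared.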
This analysis extends readily to measurement of position in two or three dimensions, and thus to measurements made by such technologies as the Global Positioning System (GPS), where the multivariate Gaussian distribution is widely used to characterize positional uncertainty. Measurement errors in the two horizontal dimensions are commonly found to have equal variance, errors in the vertical dimension typically have substantially larger variance, and measurement errors in all three dimensions are commonly found to be uncorrelated.
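A brief sketch of this error model follows; the 3 m horizontal and 6 m vertical standard errors are assumed figures for illustration only. Because the three dimensions are taken to be uncorrelated, the covariance matrix is diagonal.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed standard errors: equal in the two horizontal dimensions,
# larger in the vertical, and uncorrelated across dimensions
sigma_h, sigma_v = 3.0, 6.0                      # metres (illustrative)
cov = np.diag([sigma_h**2, sigma_h**2, sigma_v**2])

# Draw simulated positional errors from the trivariate Gaussian
errors = rng.multivariate_normal(mean=np.zeros(3), cov=cov, size=5)
print(errors)   # columns: east, north, up
```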
This classical theory has been developed extensively within the discipline of surveying, under the rubric of adjustment theory, in order to understand the effects that errors in raw measurements may have on the inferred locations of items of interest. For example, errors in the measurement of bearing, elevation, and range will translate into errors in the inferred positions of the objects of the survey. Complications arise when closed loops are surveyed in the interests of reducing errors, which must then be allocated around the loop in a process known as adjustment. This body of theory has not had much influence on spatial databases, however, outside of the domain of traditional surveying.
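As a toy illustration of the adjustment idea (not surveying-grade software), the sketch below applies the classical compass (Bowditch) rule, which distributes the misclosure of a closed traverse among its legs in proportion to their lengths; the leg measurements are invented for the example.

```python
import numpy as np

# Displacement vectors (dx, dy) measured along the legs of a
# traverse that should close back on its starting point
legs = np.array([
    [100.2,    0.3],
    [ -0.4,  150.1],
    [-99.5,    0.2],
    [ -0.1, -150.4],
])

misclosure = legs.sum(axis=0)          # (0, 0) if measurements were error-free
lengths = np.linalg.norm(legs, axis=1)
weights = lengths / lengths.sum()

# Compass (Bowditch) rule: remove the misclosure from each leg
# in proportion to its length
adjusted = legs - np.outer(weights, misclosure)
print("misclosure before:", misclosure)
print("misclosure after: ", adjusted.sum(axis=0))   # ~ (0, 0)
```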
Any spatial database will consist of large numbers of measurements. For example, a remotely sensed image may contain millions of pixels, each containing several measurements of surface reflectance. Although measurements made by simple devices such as thermometers can reasonably be assumed to have statistically independent errors, this is almost never true of data compiled across geographic space. Instead, strong and mostly positive correlations are observed between data values that are close together in space. These correlations may be induced by the production process, when many data values inherit the errors in a smaller number of nearby measurements through various forms of interpolation, or through the measurements themselves, which are distorted by effects that operate across areas of space. Such correlations are generally known as spatial dependence or spatial autocorrelation.
This tendency turns out to be quite useful. For example, consider a curved segment of a street, recorded in a spatial database as a sequence of coordinate pairs, and assume a measurement error of 10 m, not unreasonable in today's street centerline databases. If each point were independently disturbed by errors of this magnitude, the result would be unacceptably erratic, and the segment's length as determined from the database would be severely overestimated. Instead, positive correlation between nearby errors ensures that the general shape of the street will be preserved, even though its position is disturbed. Similar arguments apply to the preservation of slopes in disturbed elevation models, and to many other examples of spatial data.
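The contrast can be demonstrated with a simple simulation. In the sketch below, a hypothetical curved street is disturbed first with independent vertex errors and then with spatially correlated errors drawn from an exponential covariance; the sinusoidal shape, the 500 m correlation range, and the covariance model are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# A gently curved street represented by closely spaced vertices
t = np.linspace(0, 1000, 201)                    # ~5 m vertex spacing
xy = np.column_stack([t, 50 * np.sin(t / 200)])

def length(pts):
    return np.sum(np.linalg.norm(np.diff(pts, axis=0), axis=1))

sigma = 10.0                                      # 10 m standard error

# (a) independent disturbance of every vertex
indep = xy + rng.normal(0, sigma, xy.shape)

# (b) spatially correlated disturbance: exponential covariance
#     along the line, with an assumed 500 m correlation range
d = np.abs(t[:, None] - t[None, :])
L = np.linalg.cholesky(sigma**2 * np.exp(-d / 500) + 1e-9 * np.eye(len(t)))
corr = xy + np.column_stack([L @ rng.normal(size=len(t)) for _ in range(2)])

print("true length:       ", length(xy))
print("independent errors:", length(indep))   # grossly inflated
print("correlated errors: ", length(corr))    # close to the truth
```

With independent errors the computed length is grossly inflated by the zig-zag between vertices, while the correlated disturbance shifts the street bodily and leaves its shape and length close to the truth.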
Several authors have drawn attention to an apparent paradox that follows from this argument. Consider a straight line, such as a straight segment of a street or property boundary, and suppose that the endpoints are disturbed by measurement error. If the disturbances are independent with known distributions, standard errors can be computed at any point along the line, and are found to be in general smaller away from the endpoints. If the disturbances have perfect positive correlation then standard errors are constant along the line; if they have identical and independent distributions then standard error is least at the midpoint, where it is equal to 0.707 times the endpoint standard error; and if errors have perfect negative correlation then standard error drops to zero at an intermediate point (the midpoint, if the endpoint variances are equal). Kyriakidis and Goodchild (2006) have generalized this problem to several other instances of linear interpolation. In practice, however, the straight line may itself be a fiction, and deviations of the truth from the straight line will tend to rise away from the endpoints, more than compensating for this effect.
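The midpoint figure follows directly from the variance of a linear combination. Writing the error at a fraction t along the line as a linear interpolation of the endpoint errors, and assuming equal endpoint variances σ² and correlation ρ:

```latex
% Error at fraction t, interpolated from endpoint errors e_0, e_1:
%   e(t) = (1 - t)\,e_0 + t\,e_1
\sigma^2(t) = (1-t)^2 \sigma^2 + t^2 \sigma^2 + 2\,t(1-t)\,\rho\,\sigma^2
% \rho = 1:   \sigma^2(t) = \sigma^2, constant along the line
% \rho = 0:   \sigma^2(1/2) = \sigma^2/2, so \sigma(1/2) = 0.707\,\sigma
% \rho = -1:  \sigma^2(t) = (1-2t)^2 \sigma^2, which vanishes at t = 1/2
```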
Geostatistics (Goovaerts 1997) provides a comprehensive theoretical framework for modeling such spatial autocorrelation of errors. The variance of the difference between errors at two locations is expected to increase monotonically with the distance between them, up to a distance known as the range, beyond which there is no further increase. The variance at this range is termed the sill, and corresponds to the absolute error of the database; relative error, however, is less over distances shorter than the range, and near zero over very short distances. Mathematical functions such as the spherical and exponential models describe the monotonic increase of variance with distance.
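As one example, a sketch of the spherical model follows; the sill and range values are assumed, not fitted to any dataset. The semivariance rises monotonically from zero to the sill at the range and stays flat beyond it.

```python
import numpy as np

def spherical_variogram(h, sill=25.0, range_=500.0):
    """Spherical semivariogram model; the sill and range values
    used here are illustrative assumptions."""
    h = np.asarray(h, dtype=float)
    inside = sill * (1.5 * (h / range_) - 0.5 * (h / range_) ** 3)
    return np.where(h < range_, inside, sill)

distances = np.array([0, 100, 250, 500, 1000])
print(spherical_variogram(distances))   # ~[ 0.  7.4  17.19  25.  25. ]
```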
Such models provide a convenient and powerful basis for exploring the effects of errors in applications such as terrain databases. Just as one might simulate the effects of error by adding independent samples from a Gaussian distribution to an observed value, so the effects of error in such databases can be simulated by adding realizations of random field models with suitable spatial covariances. In such cases, however, and because of the strong spatial dependence present in virtually all spatial data, it is the entire database that must be simulated in each realization of the random process, not its individual measurements; samples from the stochastic process are entire maps, not simple measurements. Such simulations have proven very useful in visualizing the effects of spatially autocorrelated errors in spatial databases, and in exploring the propagation of such errors through GIS analysis. Several studies have demonstrated the use of geostatistical techniques such as conditional simulation to provide models of error in spatial databases.
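The sketch below generates one such realization: an entire correlated error surface drawn from a Gaussian random field with an assumed exponential covariance, using a dense Cholesky factorization. This is practical only for small grids; geostatistical packages use conditional simulation and more scalable algorithms for real databases.

```python
import numpy as np

rng = np.random.default_rng(7)

# Cell centres of a small grid (dense Cholesky is O(n^3),
# so the grid is kept small for this illustration)
n = 20
ys, xs = np.mgrid[0:n, 0:n]
pts = np.column_stack([xs.ravel(), ys.ravel()])

# Exponential covariance between every pair of cells;
# the sill (4.0) and range parameter (5 cells) are assumptions
d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
cov = 4.0 * np.exp(-d / 5.0)

# One realization is a whole map of correlated errors,
# not a set of independent cell-by-cell draws
L = np.linalg.cholesky(cov + 1e-8 * np.eye(n * n))
error_surface = (L @ rng.standard_normal(n * n)).reshape(n, n)
print(error_surface.shape)   # (20, 20)
```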
Based on this theoretical framework, progress has been made in modeling the ways in which uncertainties propagate through GIS operations. Although simple queries may refer only to a single point, and require knowledge only of that point's marginal distribution of uncertainty, other operations such as the measurement of area, distance, slope, or direction require knowledge of joint distributions and thus covariances. Heuvelink (1998) has developed a comprehensive framework for the propagation of uncertainty, using both analytic and numeric methods, including Taylor series approximations.
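As a worked illustration of first-order (Taylor series) propagation, consider the slope computed from two elevation values a distance d apart; because slope is linear in the two elevations, the first-order result is exact in this case, and a Monte Carlo check confirms it. All numerical values are assumed for the example, and the positive correlation between nearby elevation errors visibly reduces the slope error, echoing the point made above about disturbed elevation models.

```python
import numpy as np

rng = np.random.default_rng(3)

# Slope between two elevation samples a distance d apart:
#   s = (z2 - z1) / d
z1, z2, d = 100.0, 104.0, 30.0     # illustrative values (m)
sigma, rho = 0.5, 0.6              # equal std errors, correlated (assumed)

# First-order (Taylor) propagation; exact for this linear operation
var_analytic = 2 * sigma**2 * (1 - rho) / d**2

# Monte Carlo check with correlated elevation errors
cov = sigma**2 * np.array([[1.0, rho], [rho, 1.0]])
e = rng.multivariate_normal([0.0, 0.0], cov, size=200_000)
s = ((z2 + e[:, 1]) - (z1 + e[:, 0])) / d
print(var_analytic, s.var())       # the two should agree closely
```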
Such approaches are fundamentally limited by their insistence on the existence of a truth that is distorted by measurement. They fit well with applications in terrain modeling and the positional accuracy of well-defined features such as roads, but poorly with applications involving classifications of soil, vegetation cover, or land use. Progress has been made, however, in analyzing these latter types of database using the theoretical frameworks of fuzzy and rough sets. Briefly, such frameworks suppose that although the exact nature of a class A may remain unknown, it is still possible to measure membership m(A) in the class. Zhu et al. (1996) have shown how maps of membership can be useful in characterizing inherently vague phenomena, and Woodcock and Gopal (2000) have shown how such maps can be useful in managing forests. Fisher and Unwin (2005) have explored more advanced versions of these simple frameworks. Fundamentally, however, and despite the simplicity and intuitive appeal of these approaches, the question remains: if A cannot be defined, how is it possible to believe that m(A) can be measured? Moreover, it has proven difficult to represent the fundamental spatial dependence properties of spatial data within these frameworks, so while marginal properties can be analyzed with some success, the joint properties that underlie many forms of GIS analysis remain the preserve of statistical methods and of frameworks such as geostatistics and spatial statistics.
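For instance, the vague preposition "near" discussed in the historical background can be given a simple membership function; in the sketch below the breakpoints (full membership within 100 m, declining linearly to zero at 1 km) are arbitrary assumptions chosen only to illustrate the idea.

```python
import numpy as np

def membership_near(distance_m, full_until=100.0, zero_beyond=1000.0):
    """Membership in the vague class 'near' as a function of
    distance; the breakpoints are illustrative assumptions."""
    d = np.asarray(distance_m, dtype=float)
    m = (zero_beyond - d) / (zero_beyond - full_until)
    return np.clip(m, 0.0, 1.0)

print(membership_near([50, 300, 550, 2000]))   # ~[1.0, 0.78, 0.5, 0.0]
```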
Key Applications
The literature on imprecision and spatial uncertainty now encompasses virtually all types of spatial data. As noted earlier, the literature on uncertainty in terrain data is voluminous. Several authors have demonstrated the use of representations of uncertainty in spatial decision support (e.g., Aerts et al. 2003), and have discussed the many sources of uncertainty in such applications. Interesting methods have been devised for visualizing uncertainty, including animation (Ehlschlaeger et al. 1997). To date, however, the implementation of these methods in GIS software remains limited. Duckham (2002) and Heuvelink (2005) have described efforts to build error-aware systems, and data quality is now an important element of metadata. But mainstream GIS products continue to report the results of calculations to far more decimal places than any assessment of accuracy would justify, and to draw lines whose positions are uncertain using line widths that are in no way representative of that uncertainty. Indeed, GIS practice seems still to be driven largely by the belief that accuracy is a function of computation, not representation, and that the last uncertainties were removed from maps many decades ago.
Future Directions
Uncertainty has been described as the Achilles' heel of GIS (Goodchild 1998): the dark secret that, once exposed, perhaps through the arguments of clever lawyers, will bring down the entire house of cards. While this sounds extreme, it is certainly true that the results of GIS analysis are often presented as far more accurate than they really are. As GIS moves further into the realm of prediction and forecasting, the dangers of failing to deal with uncertainty are likely to become more pressing. At the same time the accuracy of databases is steadily improving, as more accurate measurements become available. Nevertheless there is an enormous legacy of less accurate data that is sure to continue to find application for many years to come.
While much has been learned over the past two decades about the nature of spatial uncertainty, a large proportion of the literature remains comparatively inaccessible, due to the complexity of its mathematics. Some progress has been made in making the work more accessible, through visualization and through the comparatively straightforward methods of Monte Carlo simulation. In time, such approaches will result in greater awareness of what is possible, and greater adoption of these methods within the wider community.
Progress is also needed on the construction of suitable data models for error-sensitive spatial databases. The simple expedient of adding values representing uncertainty to the entire database in its metadata, or to individual objects as additional attributes, fails to capture all of the complexity of spatial uncertainty, particularly its essential spatial dependence. Goodchild (2004) has argued that this problem is profound, stemming from the complex structures of spatial dependence; that it presents fundamental and expensive barriers to any attempt to improve spatial databases through partial correction and update; and that it can only be addressed by a radical restructuring of spatial databases around the concept of measurement, in what he terms a measurement-based GIS. In practice, the almost universal use of absolute coordinates to define position in spatial databases ensures that any information about spatial dependence, and the processes used to compute or compile such positions, will have been lost at some point during the production process.
References
- Aerts JCJH, Goodchild MF, Heuvelink GBM (2003) Accounting for spatial uncertainty in optimization with spatial decision support systems. Trans GIS 7(2):211–230
- Duckham M (2002) A user-oriented perspective of error-sensitive GIS development. Trans GIS 6(2):179–194
- Ehlschlaeger CR, Shortridge AM, Goodchild MF (1997) Visualizing spatial data uncertainty using animation. Comput Geosci 23(4):387–395
- Fisher PF, Unwin DJ (eds) (2005) Re-presenting GIS. Wiley, Hoboken
- Frolov YS, Maling DH (1969) The accuracy of area measurement by point counting techniques. Cartogr J 1:21–35
- Goodchild MF (1998) Uncertainty: the Achilles heel of GIS? Geo Info Syst 50–52
- Goodchild MF (2004) A general framework for error analysis in measurement-based GIS. J Geogr Syst 6(4):323–324
- Goodchild MF, Gopal S (1989) Accuracy of spatial databases. Taylor and Francis, New York
- Goovaerts P (1997) Geostatistics for natural resources evaluation. Oxford University Press, New York
- Heuvelink GBM (1998) Error propagation in environmental modelling with GIS. Taylor and Francis, London
- Heuvelink GBM (2005) Handling spatial uncertainty in GIS: development of the data uncertainty engine. Instituto Geografico Portugues, Estoril
- Kendall MG (1961) A course in the geometry of n dimensions. Hafner, New York
- Kyriakidis P, Goodchild MF (2006) On the prediction error variance of three common spatial interpolation schemes. Int J Geogr Inf Sci 20(8):823–856
- Maling DH (1989) Measurement from maps: principles and methods of cartometry. Pergamon, New York
- Woodcock CE, Gopal S (2000) Fuzzy set theory and thematic maps: accuracy assessment and area estimation. Int J Geogr Inf Sci 14(2):153–172
- Zhang J-X, Goodchild MF (2002) Uncertainty in geographical information. Taylor and Francis, New York
- Zhu AX, Band LE, Dutton B, Nimlos T (1996) Automated soil inference under fuzzy logic. Ecol Model 90(2):123–145