Abstract
The paper addresses the question of how to update existing geodatabases taking into account both their accuracies and those of the new measurements taken for the updating. Traditionally, maintaining geodatabases (or map bases) has been highly time-consuming, costly, and sometimes difficult work, especially in urban and high-density areas. The most common procedure is to regenerate geodatabases globally every few years by photogrammetric techniques. By contrast, the possibility of dynamically updating the landscape information from a maintained core spatial database is an appealing alternative to traditional map revision techniques. A kriging solution is proposed, based on the hypothesis that the vector field of the position error on a geodatabase is a homogeneous, isotropic intrinsic random field with constant mean and a variogram depending only on the squared distance, known a priori from the relative accuracy of the map. The method is a first approach to the problem, since at the moment it does not consider the constraints to which points on the geodatabase must adapt. That is the reason why it is presented as an intermezzo.
Position of the problem
Spatial databases can nowadays be easily managed by means of geographic information system tools (Rigaux et al. 2002). In the first era of the digital revolution, the main issue was to collect pieces of information. Now, by contrast, the great challenge is to provide geographic data of a high quality level (and therefore to estimate that quality) and to make available methods that exploit the already available information as much as possible, correctly modeling the uncertainty intrinsic to spatial data, instead of remaking maps from scratch every time. Thus data sharing, data conflation (Brovelli and Zamboni 2004, 2006), and updating (Arnold and Wright 2005) must be supported by rigorous approaches with a sound statistical basis. Efforts have been made in this regard by some researchers (Leung et al. 2004). The paper is placed in this frame, trying to solve the problem at least for some elementary cases.
We consider a geodatabase basically as a collection of points {P_i; i = 1,2,…,N} of known planar coordinates {(x_i, y_i)}, topological information allowing one to identify linear or areal features, and other numerical or thematic information concerning several attributes related to these points and objects. Here we will be concerned only with the first aspect, namely the set of points {P_i} and their coordinates. The question we want to handle is the following: imagine we perform new geodetic measurements involving part of the points {P_i}; how do we update their coordinates in view of such information?
As an example, suppose that at some points P_i we have made observations with a GPS receiver connected to a positioning service in RTK mode, as is quite usual nowadays.
Of course, if the full covariance matrix of the vector of coordinates {(x_i, y_i)} and that of the new measurements were available, the update could be performed rigorously, at least in terms of least squares theory.
This is however not the practical situation, for two reasons. The first is that when the number of points is of the order of 10⁴ or more, its covariance information amounts to 10⁸ real numbers or more, and storing and using such a huge quantity of information is not the easiest thing to do.
The second reason is even more important: in practice, this information is usually not available, because the production of such coordinates is the result of many steps through which noise propagation is not possible or was not performed. At the opposite logical extreme there would be the practice of moving, namely updating, only the points that have been involved in the new measurements. This immediately leads to unacceptable results, because it would modify the relative position of objects in an incompatible way, for instance moving a building into another one or into the middle of a road. It is simply impossible to ignore that there is a relation between points, so that once one point is shifted by new observations the others should follow. The point is that some subsets of points should move like rigid bodies, satisfying metric constraints such as lengths of sides, rectangular shapes, etc., while in general all points should have a kind of “elastic” relation to one another.
In order to cope with such problems, the authors have already invoked a Bayesian concept (Belussi et al. 2006), first of all introducing geometrical constraints into the prior distribution of the coordinate vector and binding all the points to one another by means of a simple network of pseudo-observations.
The idea has some merits, in particular in showing that Bayesian statistics is a natural tool for an updating problem. Yet the prior distribution reflecting the true information available on the geodatabase is still a problem. Specifically, the proposal of connecting all the points of the geodatabase by a planar pseudo-traverse is questionable, in that the accuracy of each individual point then depends on the quite arbitrary design of the traverse itself.
In this paper, we give a solution to this problem for the case in which the prior information consists merely of a relative accuracy prescription.
By “relative accuracy” we here mean that, given any two points P, Q on the geodatabase, the errors in their coordinates are such that
with K a given constant, \( \underline{\text{r}}_{PQ} \) the planar base vector between P and Q, and σ the r.m.s. of its modulus.
It turns out that (1.1) defines an essential feature of the error field \( \underline{\text{u}} (P) = \left( {\delta x(P),\delta y(P)} \right) \) of the geodatabase, namely the variogram. In this way, the well-known kriging theory (Wackernagel 2003) becomes available for the solution of the present problem. So we find a clear method to update the coordinates of all points, without considering them as possibly belonging to rigid bodies, i.e., without any further constraint.
This is the reason why we have entitled the paper “Bayesian intermezzo”; because the full solution under the Bayesian concept still has to be studied.
The relative accuracy of the geodatabase defines the variogram of the error random field
In this section, we set up the prior stochastic model of the geodatabase on the basis of our hypotheses, namely the knowledge of the relative accuracy translated into formula (1.1).
So we assume that any identifiable point P on the geodatabase, with coordinates
has a random position, due to previous noise propagation, with average
and the difference between the two, the position error vector \( \underline{\text{u}} (P) \), is a random field with zero average and finite variance.
Since the theory we are going to develop is fully translation invariant, the hypothesis \( E\left\{ {\underline{\text{u}} (P)} \right\} = 0 \) can be replaced by the much weaker
a constant translation vector valid for the whole geodatabase.
This hypothesis is often useful if we have to compare geodatabase coordinates with GPS coordinates which might be given in a different geodetic datum.
Let us agree on the hypothesis that the vector random field \( \underline{\text{u}} (P) \) is intrinsically homogeneous and isotropic (cf. Matheron 1970).
Notice that this implies an extension of the usual kriging concept in the sense that we assume
where d PQ is the Euclidean distance between P and Q and in addition that the component of \( \underline{\text{u}} (P) \) along any fixed direction is again an intrinsic process such that
with f independent of the direction of the arbitrary unit vector \( \underline{\text{e}} \).
This implies that
and also
because the sum of the second and third terms in (2.6) has to be \( \gamma (d_{PQ} ) \).
In addition it turns out that
This last relation is easily derived by imposing that the field
has \( f(d_{PQ} ) \) as variogram irrespective of the value of α.
In fact, using (2.6)
and this implies (2.8).
We call γ(d) the variogram of the field \( \underline{\text{u}} \) and we note that each component of \( \underline{\text{u}} \) has a variogram which is just a half of γ. Under these hypotheses, let us consider the random vector
Indeed we have
Now we need to compute \( E\left\{ {\left| {\underline{\text{R}}_P - \underline{\text{R}}_Q } \right|} \right\} \). We do that with an approximation up to second order in \( \left| {\underline{\text{u}} } \right| \).
Observe that for any finite vector \( \underline{\text{r}} \) and a small enough vector \( \underline{\varepsilon } \) we have (denoting \( \left| {\underline{\text{r}} } \right| = r \))
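Explicitly, this is the classical second-order Taylor expansion of the modulus, which we restate for the reader (a standard result, not specific to this paper):

```latex
\left|\underline{r}+\underline{\varepsilon}\right|
  = r\,\sqrt{1+\frac{2\,\underline{r}\cdot\underline{\varepsilon}}{r^{2}}
             +\frac{|\underline{\varepsilon}|^{2}}{r^{2}}}
  \;\approx\; r+\frac{\underline{r}\cdot\underline{\varepsilon}}{r}
  +\frac{1}{2r}\left(|\underline{\varepsilon}|^{2}
  -\frac{(\underline{r}\cdot\underline{\varepsilon})^{2}}{r^{2}}\right)
```

The first-order term averages to zero when \( E\left\{ {\underline{\varepsilon }} \right\} = 0 \), so only the second-order terms survive in the expectation.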
We recall that \( E\left\{ {\underline{\text{u}}_P - \underline{\text{u}}_Q } \right\} = 0 \) by hypothesis and use (2.7), to find, neglecting third order terms in \( \underline{\varepsilon } = \underline{\text{u}}_P - \underline{\text{u}}_Q \),
Squaring (2.12) and subtracting from (2.10) yields
which is the sought relation.
According to (1.1), i.e., σ(d) = K·d, relation (2.13) implies the purely quadratic variogram model γ(d) = K²d² of (2.14), used numerically in the examples below.
One interesting remark is that the model (2.14), which is indeed an authorized variogram (Wackernagel 2003), is not however compatible with a stationary covariance.
This is the reason why we have from the beginning pointed to kriging, rather than the ordinary Wiener–Kolmogorov theory, also known as collocation in geodesy.
The updating algorithm
Now that the relations (2.13), (2.14) are established, the full machinery of kriging theory is available to solve the updating problem formalized in the following.
We assume to have a geodatabase with points P identified by the vector
where \( \underline{\text{u}} (P) \) is an intrinsic random field, homogeneous and isotropic, endowed with the known variogram γ(d PQ ), given, in the present context, by (2.14).
Moreover, we assume that new measurements have been performed involving part of the points {P i } so that we have from them, namely for a certain subset J of the indexes {i = 1,2,…,N}, a new sequence of coordinates
where we have just denoted with an index i quantities related to the point P i .
In (3.2), we suppose that
so that the full covariance of the vector \( \underline{\nu } = \left[ {\begin{array}{*{20}c} {...} & {\underline{\nu }_i^t } & {...} \\ \end{array} } \right]^t \) is known. For the sake of simplicity, we shall assume that the index set J is just that of the first J positions. Since we are thinking of an updating of the geodatabase, we obviously think of the variances of the coordinates in \( \underline{\text{N}}_i \) as significantly smaller than the variances of the geodatabase errors \( \underline{\text{u}}_i \). Moreover, we assume \( \underline{\nu }_i \) to be uncorrelated with the field \( \underline{\text{u}} (P) \).
Now we observe that
this relation means that we can consider \( \underline{\text{R}}_i - \underline{\text{N}}_i \) as an observation of the random field \( \underline{\text{u}} (P) \) at the updating points P i , \( i \in J \), with noise \( - \underline{\nu }_i \). So, we shall put
\( \underline{\text{u}}_{{{\text{o}}i}} \) being the observation vector.
The problem is precisely how to predict \( \underline{\text{u}} (P) \) at any other point of the geodatabase.
We look for a linear prediction, i.e., for a linear combination of the observed \( \underline{\text{u}}_{{{\text{o}}i}} \),
unconditionally unbiased or, recalling (2.3), such that
Moreover we will require that the mean square prediction error E2
be minimized. Note that the class of our predictors (3.6) is not the most general one, which would be of the form
with Λ i a set of 2 × 2 matrices. On the other hand, it is easy to see that a predictor like (3.9) could be determined only if we knew the full covariance of \( \underline{\text{u}}_i \), which is not the case.
The form (3.6) in fact translates our prior hypothesis of isotropy, which is not a model of how the “true” error \( \underline{\text{u}} (P) \) behaves but rather the formalization of our lack of prior information.
In other words, this is where the Bayesian concept plays its role. As usual the condition (3.7) implies that the coefficients λ i have to satisfy the relation
Now the proof runs as that of ordinary kriging (Wackernagel 2003) and we report it here only briefly.
First one can prove that, under (3.10),
where
and we have used
So by using the synthetic notation
we have
In fact from
which holds because of (3.10), one derives
Now using the identity
and again recalling (3.10), one easily derives (3.14).
The target function (3.14) can be minimized with respect to \( \underline{\lambda } \) under the constraint (3.10), which, introducing the vector \( \underline{\text{e}}^t = \left[ {\begin{array}{*{20}c} 1 & 1 & {...} & 1 \\ \end{array} } \right] \), is written as
By using a Lagrange multiplier −2α, one finds the usual kriging equations
Once \( \underline{\lambda } \) is derived, \( \widehat{{\underline{u} }}\left( P \right) \) is computed through (3.6) and (3.14) provides its prediction error.
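To fix ideas, the whole procedure (assemble the kriging system, solve for the weights, form the prediction (3.6)) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name and interface are ours, the variogram is the quadratic model γ(d) = K²d² consistent with the numbers of Example 1, and the new-measurement noise is assumed uncorrelated between points, so it enters only the diagonal of the system.

```python
import numpy as np

def krige_update(points, u_obs, noise_var, K, P):
    """Ordinary-kriging prediction of the error field u at point P.

    points    : (n, 2) coordinates of the updated points P_i (meters)
    u_obs     : (n, 2) observed errors u_oi = R_i - N_i (meters)
    noise_var : (n,) noise variance of each observation (m^2)
    K         : relative-accuracy constant of (1.1)
    P         : (2,) prediction point (meters)
    Returns the predicted error vector u_hat(P).
    """
    gamma = lambda d: K**2 * d**2                 # quadratic variogram model
    n = len(points)
    # matrix of mutual distances between the updated points
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = gamma(d) + np.diag(noise_var)     # variogram block plus noise
    A[:n, n] = A[n, :n] = 1.0                     # unbiasedness constraint (3.10)
    b = np.append(gamma(np.linalg.norm(points - P, axis=1)), 1.0)
    lam = np.linalg.solve(A, b)[:n]               # kriging weights (Lagrange part dropped)
    # same scalar weights applied to both coordinates, as in (3.6)
    return lam @ u_obs
```

With zero noise the predictor is an exact interpolator at the updated points, and for the symmetric geometry of Example 1 (two points at ±1 km on the x-axis) the weights at the origin are (½, ½), so the correction there is the average of the two observed errors.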
Then the updated geodatabase at the original points P i , of coordinates \( \underline{\text{R}}_{\text{up}} \left( {P_i } \right) \) is
and, since
the prediction error at P i becomes the error of the updated geodatabase.
We conclude this section with a remark: in addition to the above results, one would also like to get hold of the translation between geodatabase and updated coordinates. It is known that the optimal estimate can be obtained only if we know the full covariance structure of \( \underline{\text{u}} (P) \).
But since this is not available, we can still have a non-optimal estimate, for instance by taking
Though we are not able to give the variance of this estimator, (3.19) is sufficient for many practical purposes.
An extension and a few examples
Since most of the updating work is nowadays done by GPS, and since, when a positioning service is not available, a natural outcome of GPS observations is base vectors between pairs of points or even local networks of base vectors, it is interesting to see whether and how the theory developed above can account for this kind of measurement.
We will do that by taking the case of a single base between points P_1 and P_2,
leaving to the reader the easy generalization to more bases. We assume that \( E\left\{ {\underline{\nu } } \right\} = E\left\{ {\underline{\nu }_2 - \underline{\nu }_1 } \right\} = 0 \) and that the covariance of \( \underline{\nu } \) is known. This case is not covered by ordinary kriging theory and therefore requires some adjustment.
The point is that now the observations are themselves translation invariant and therefore they cannot convey any information on the absolute value of \( \underline{\text{u}} (P) \).
Nevertheless, we can expect to be able to say something about variations of \( \underline{\text{u}} (P) \) for instance about \( \delta \underline{\text{u}}_P = \underline{\text{u}} (P) - \underline{\text{u}} \left( {P_1 } \right) \). So we have as observation:
and we try to estimate \( \delta \underline{\text{u}}_P \) as a linear function of \( \delta \underline{\text{u}}_{{{\text{o}}2}} \), namely
It is extremely important to notice however that in this context
as well as
so that there is no constraint on (4.3) to get an unbiased predictor, namely λ is a free variable. Now we proceed to compute the r.m.s. prediction error, i.e.
On the other hand
We call, for the sake of simplicity
and then, combining (4.7) and (4.6), we find
The minimum of E² is attained at
From (4.9) and (4.8), the prediction error can be computed. So we can put now
and therefore we have
Formula (4.10) tells us how to update the geodatabase relative to the position of P_1, while (4.11) tells us that the prediction error of \( \delta \underline{\text{u}}_P \) is the new error of the updated geodatabase.
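The weight of (4.9) can be written out using only the variogram and the noise. The following sketch is our own reconstruction under the intrinsic-field hypotheses of the previous section (the function name and the scalar `noise_trace`, standing for tr(C_ν), are ours): the cross-moment in the numerator follows from the standard variogram identity for increments, and the denominator is the total variance of the observed base.

```python
import numpy as np

def base_vector_weight(P, P1, P2, K, noise_trace):
    """Weight lambda for predicting du_P = u(P) - u(P1) from the single-base
    observation du_o2 = u(P2) - u(P1) + nu, as in (4.3)-(4.9).
    gamma is the quadratic variogram gamma(d) = K^2 d^2; noise_trace = tr(C_nu)."""
    g = lambda A, B: K**2 * np.sum((np.asarray(A) - np.asarray(B))**2)
    # E{ du_P . du_o2 } = [gamma(d_P,P1) + gamma(d_P2,P1) - gamma(d_P,P2)] / 2
    num = 0.5 * (g(P, P1) + g(P2, P1) - g(P, P2))
    # E{ |du_o2|^2 } = gamma(d_P2,P1) + tr(C_nu)
    den = g(P2, P1) + noise_trace
    return num / den
```

With P_1 = (−1, 0) km and P_2 = (1, 0) km as in the examples, the weight vanishes at P = P_1 (where δu_P = 0 by definition) and, with the quadratic variogram, along the whole vertical line x = −1: a single base directed along x carries no information on δu in the orthogonal direction.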
To get acquainted with the above concepts, we present two small artificial examples.
Example 1: assume you have a geodatabase with a prior relative error of σ(d) = 2·10⁻⁴ d. We have two points P_1 and P_2 observed by GPS with errors \( \underline{\nu }_1, \,\,\underline{\nu }_2 \) such that
The position of the two points is respectively −1 km and +1 km on the x-axis at the scale of the terrain.
The errors observed at P 1, P 2 are \( \underline{\text{u}}_{\text{o1}}, \underline{\text{u}}_{\text{o2}} \), and are given in meters.
Note that, according to (2.14), γ(d) = 4·10⁻⁸ d² (d in meters) = 4·10⁻² d² (d in kilometers), meaning that we shall express d in km but the result for γ will be in m².
So we have
and the conditioned normal system (3.16) becomes
We notice that, in this system, \( \underline{\lambda } \) is non-dimensional and α is in m² like the known terms, but d_1 and d_2 are expressed in km.
The solution of the system is
Then, using the relation \( d_1^2 - d_2^2 = 4x \) (x in km), the geodatabase correction is given by
The result nicely illustrates that, when \( \underline{\text{u}}_{\text{o1}} = \underline{\text{u}}_{{\text{o}}2} \), \( \widehat{{\underline{u} }}\left( P \right) \) is just a common translation. The units of \( \widehat{{\underline{u} }}\left( P \right) \) are meters, like those of \( \underline{\text{u}}_{\text{o1}}, \underline{\text{u}}_{{\text{o}}2} \).
Finally, we compute the new geodatabase error (in m2) and we find
Here, we read that at the origin the error is 0.1 m, which is a reasonable number, and that the error increases faster in the Y direction, which again is understandable, given the design of the updating points.
Example 2: we take the same situation as in Example 1, with the only difference that now, instead of giving separately the updated coordinates of P_1 and P_2, we give the base vector \( \underline{\text{N}}_2 - \underline{\text{N}}_1 \).
Correspondingly (cf. 4.2), we will have an observation \( \delta \underline{\text{u}}_{{\text{o}}2} \) with a vector noise \( \underline{\nu } \) that we now assume to have covariance
Our purpose is to predict \( \delta \widehat{{\underline{u} }}\left( P \right) \) according to (4.3). The solution is essentially given by (4.9), so we only have to use
in that formula.
The result is
We observe that indeed λ = 0 when x = −1, y = 0, because in that case \( \delta \underline{\text{u}}_P = 0 \).
On the other hand, the same is true when x = −1 and y whatever; this is explained by the fact that we have only one base vector observed, directed along x and this gives no information on \( \delta \underline{\text{u}} \) when \( \underline{\text{r}}_P - \underline{\text{r}}_1 \) is along the y axis.
Finally, a computation of E² up to 10⁻⁴ m², neglecting terms of order 10⁻⁶ m², gives
As we see, E² is zero for y = 0, x = −1, as it should be. Also interesting is the case y = 0, x = 1, namely P = P_2, where \( E^2 (P) = 4 \cdot 10^{- 2} \,{\text{m}}^2 \), which complies with the noise of \( \delta \underline{\text{u}}_{{\text{o}}2} \).
Conclusions
If one assumes that the vector field \( \underline{\text{u}} (P) \) of the position error on a geodatabase is a homogeneous, isotropic intrinsic random field with constant mean and variogram (2.14), known a priori from the relative accuracy of the geodatabase, one can attack the problem of updating the position \( \underline{\text{R}}_P \), of any point P on the geodatabase, by kriging theory.
With a typical GPS updating survey in mind, two cases are covered in the paper: that the coordinates of individual points P_i, \( i \in J \), are re-determined, or that base vectors \( \underline{\text{b}}_{ik} = \underline{\text{N}}_{P_i } - \underline{\text{N}}_{P_k } \) are determined.
On the basis of the new observations, the new geodatabase coordinates are derived and their mean square error is computed.
As explicitly stated, this approach does not take into account any geometric relation between points other than the relative accuracy of the bases \( \left| {\underline{\text{R}}_i - \underline{\text{R}}_k } \right| \).
If other constraints have to be considered, this can always be done a posteriori.
For instance, consider the case of Example 1 and assume that four points P_3, P_4, P_5, P_6 have been accordingly updated, with coordinates \( \underline{\text{R}}_{\text{up3}}, \,\,\underline{\text{R}}_{\text{up4}}, \,\,\underline{\text{R}}_{\text{up5}}, \,\,\underline{\text{R}}_{\text{up6}} \) with errors of known variances. Assume further that we want to impose the condition that P_3, P_4, P_5, P_6 are the vertices of a rectangle; then one can easily create the family of rectangles in the plane, depending on five parameters (namely the sides a, b and the three parameters of a roto-translation), and find the one that best fits the updated coordinates. Similar expedients can be devised to impose other conditions. Of course, a fully rigorous theory would adjust the updating observations taking the constraints into account all together, but this is left for future work.
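The a posteriori rectangle fit sketched above can be illustrated as follows. This is only an illustration under our own parameterization (center, rotation angle, sides a and b, i.e., the five parameters mentioned in the text); the estimate is a simple direct one, not the weighted least-squares adjustment a rigorous treatment would require, and it assumes the four updated vertices are given in order around the rectangle.

```python
import numpy as np

def fit_rectangle(pts):
    """Fit a rectangle to four updated vertices given in order around it.
    Returns the four projected vertices of the fitted rectangle."""
    pts = np.asarray(pts, float)
    c = pts.mean(axis=0)                               # center = mean of vertices
    # the sides 0->1 and 3->2 should be parallel: average them to get the rotation
    v = (pts[1] - pts[0]) + (pts[2] - pts[3])
    theta = np.arctan2(v[1], v[0])
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    local = (pts - c) @ R                              # rotate into the aligned frame
    a = local[:, 0] @ np.array([-1.0, 1.0, 1.0, -1.0]) / 2.0   # mean extent along x
    b = local[:, 1] @ np.array([-1.0, -1.0, 1.0, 1.0]) / 2.0   # mean extent along y
    corners = np.array([[-a/2, -b/2], [a/2, -b/2], [a/2, b/2], [-a/2, b/2]])
    return c + corners @ R.T                           # back to terrain coordinates
```

By construction the output vertices always form an exact rectangle, and an input that is already a rectangle is returned unchanged.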
References
Arnold LM, Wright GL (2005) Analysing product-specific behaviour to support process dependent updates in a dynamic spatial updating model. Trans GIS 9(3):397–419
Belussi A, Brovelli MA, Negri M, Pelagatti G, Sansò F (2006) Dealing with multiple accuracy levels in spatial databases with continuous update, Proceedings of Accuracy 2006—7th International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences, Instituto Geografico Portugues, Lisboa, Portugal, pp 203–212
Brovelli MA, Zamboni G (2004) Adaptive transformation of cartographic bases by means of multiresolution spline interpolation. Int Arch Photogramm Remote Sens Spat Inf Sci 35(part B):206–211
Brovelli MA, Zamboni G (2006) The usability of vectorization and a new point matching procedure as first step in conflating raster and vector maps. Int Arch Photogramm Remote Sens Spat Inf Sci 36(part 2/W40):85–91
Leung Y, Ma J-H, Goodchild MF (2004) A general framework for error analysis in measurement-based GIS part 1: the basic measurement-error model and related concepts. J Geogr Syst 6(4):325–354
Matheron G (1970) La théorie des variables régionalisées, et ses applications. Fascicule 5 des cahiers du Centre de Morphologie Mathématique de Fontainebleau
Rigaux P, Scholl M, Voisard A (2002) Spatial databases: with application to GIS. Morgan Kaufmann Series in Data Management Systems
Wackernagel H (2003) Multivariate geostatistics: an introduction with applications, 3rd edn. Springer, Berlin
Brovelli, M.A., Sansò, F. Geodatabase updating by new measurements; a Bayesian intermezzo. Appl Geomat 1, 41–47 (2009). https://doi.org/10.1007/s12518-009-0003-3