Simultaneous estimation of cross-validation errors in least squares collocation applied for statistical testing and evaluation of the noise variance components

Abstract

The cross-validation technique is a popular method for assessing and improving the quality of prediction by least squares collocation (LSC). We present a formula for direct estimation of the vector of cross-validation errors (CVEs) in LSC, which is much faster than element-wise CVE computation. We show that a quadratic form of the CVEs follows a chi-squared distribution. Furthermore, an a posteriori noise variance factor is derived from the quadratic form of the CVEs. To detect blunders in the observations, the estimated standardized CVE is proposed as a test statistic that can be applied whether the noise variances are known or unknown. We use LSC together with the methods proposed in this research to interpolate crustal subsidence along the northern coast of the Gulf of Mexico. The results show that after detecting and removing outliers, the root mean square (RMS) of the CVEs and the estimated noise standard deviation are reduced by about 51 and 59%, respectively. In addition, the RMS of the LSC prediction error at data points and the RMS of the estimated noise of the observations decrease by 39 and 67%, respectively. However, the RMS of the LSC prediction error on a regular grid of interpolation points covering the area is reduced by only about 4%, which is a consequence of the sparse distribution of data points in this case study. The influence of gross errors on LSC prediction results is also investigated using lower cutoff CVEs. After elimination of outliers, the RMS of this type of error is also reduced by 19.5% for a 5 km radius of vicinity. We propose a method that uses standardized CVEs to classify the dataset into three groups presumed to have different noise variances. The noise variance components of the groups are estimated by the restricted maximum-likelihood method via the Fisher scoring technique. Finally, LSC assessment measures were computed for the estimated heterogeneous noise variance model and compared with those of the homogeneous model. The advantage of the proposed method is the reduction in the estimated noise levels for the groups with fewer noisy data points.

Figs. 1–11

References

  • Amiri-Simkooei A (2007) Least-squares variance component estimation: theory and GPS applications. Doctoral dissertation, TU Delft, Delft

  • Arabelos DN, Forsberg R, Tscherning CC (2007) On the a priori estimation of collocation error covariance functions: a feasibility study. Geophys J Int 170:527–533

  • Baarda W (1968) A testing procedure for use in geodetic networks. In: Geodesy, New series, vol 2, issue 5. Netherlands Geodetic Commission, Delft

  • Burden RL, Faires JD (2011) Numerical analysis, 9th edn. Brooks/Cole, Pacific Grove

  • Darbeheshti N, Featherstone WE (2009) Non-stationary covariance function modelling in 2D least-squares collocation. J Geod 83(6):495–508

  • Dokka RK (2011) The role of deep processes in late 20th century subsidence of New Orleans and coastal areas of southern Louisiana and Mississippi. J Geophys Res Solid Earth 116:B06403. https://doi.org/10.1029/2010jb008008

  • El-Fiky G, Kato T, Fuji Y (1997) Distribution of vertical crustal movement rates in the Tohoku district, Japan, predicted by least-squares collocation. J Geod 71(7):432–442

  • Eshagh M, Sjöberg LE (2011) Determination of gravity anomaly at sea level from inversion of satellite gravity gradiometric data. J Geodyn 51(5):366–377

  • Featherstone WE, Sproule DM (2006) Fitting AusGeoid98 to the Australian height datum using GPS-levelling and least squares collocation: application of a cross-validation technique. Surv Rev 38(301):573–582

  • Grafarend EW (1976) Geodetic applications of stochastic processes. Phys Earth Planet Inter 12(3):151–179

  • Grodecki J (1999) Generalized maximum-likelihood estimation of variance components with inverted gamma prior. J Geod 73(7):367–374

  • Harville DA (1997) Matrix algebra from a statistician’s perspective. Springer, New York

  • Jarmołowski W (2013) A priori noise and regularization in least squares collocation of gravity anomalies. Geod Cartogr 62(2):199–216

  • Jarmołowski W (2015) Least squares collocation with uncorrelated heterogeneous noise estimated by restricted maximum likelihood. J Geod 89(6):577–589

  • Jarmołowski W, Bakuła M (2014) Precise estimation of covariance parameters in least-squares collocation by restricted maximum likelihood. Stud Geophys Geod 58(2):171–189

  • Kitanidis PK (1983) Statistical estimation of polynomial generalized covariance functions and hydrologic applications. Water Resour Res 19(4):909–921

  • Koch KR (1977) Least squares adjustment and collocation. Bull Geod 51(2):127–135

  • Koch KR (1986) Maximum likelihood estimate of variance components. Bull Geod 60(4):329–338

  • Koch KR (1999) Parameter estimation and hypothesis testing in linear models, 2nd edn. Springer, Berlin

  • Koch KR (2007) Introduction to Bayesian statistics, 2nd edn. Springer, New York

  • Koch KR, Kusche J (2002) Regularization of geopotential determination from satellite data by variance components. J Geod 76(5):259–268

  • Krakiwsky EJ, Biacs ZF (1990) Least squares collocation and statistical testing. Bull Geod 64(1):73–87

  • Krarup T (1969) A contribution to the mathematical foundation of physical geodesy, pub. 44. Dan Geod Inst, Copenhagen

  • Kusche J, Klees R (2002) Regularization of gravity field estimation from satellite gravity gradients. J Geod 76(6–7):359–368

  • Mikhail EM, Ackermann F (1976) Observations and least squares. Harper and Row, New York

  • Moritz H (1962) Interpolation and prediction of gravity and their accuracy, rep. 24. Inst Geod Phot Cart, Ohio State University, Columbus

  • Moritz H (1972) Advanced least-squares methods, vol 175. Department of Geodetic Science, Ohio State University, Columbus

  • Moritz H (1980) Advanced physical geodesy. Herbert Wichmann Verlag, Karlsruhe

  • Pope AJ (1976) The statistics of residuals and the detection of outliers. NOAA technical report NOS 65 NGS 1

  • Rummel R, Schwarz KP, Gerstl M (1979) Least squares collocation and regularization. Bull Geod 53(4):343–361

  • Sadiq M, Tscherning CC, Ahmad Z (2009) An estimation of the height system bias parameter N0 using least squares collocation from observed gravity and GPS-levelling data. Stud Geophys Geod 53(3):375–388

  • Schaffrin B (2001) Softly unbiased prediction. Part 2: the random effects model. Boll Geod Sci Affini 60(1):49–62

  • Shinkle KD, Dokka RK (2004) Rates of vertical displacement at benchmarks in the lower Mississippi Valley and the northern Gulf Coast, US Department of Commerce NOAA technical report NOS/NGS 50

  • Snow KB (2012) Topics in total least-squares adjustment within the errors-in-variables model: singular cofactor matrices and prior information. Doctoral dissertation, The Ohio State University

  • Stein ML (1999) Interpolation of spatial data: some theory for kriging. Springer, New York

  • Teunissen PJG (2000) Testing theory: an introduction. Series on mathematical geodesy and positioning. Delft University Press, Delft

  • Tscherning CC (1991a) Strategy for gross-error detection in satellite altimeter data applied in the Baltic-sea area for enhanced geoid and gravity determination. Determination of the geoid. Springer, New York, pp 95–107

  • Tscherning CC (1991b) The use of optimal estimation for gross-error detection in databases of spatially correlated data. Bull d’Inf 68:79–89

  • Vaníček P, Krakiwsky EJ (1986) Geodesy: the concepts. North Holland, Amsterdam

  • Vestøl O (2006) Determination of postglacial land uplift in Fennoscandia from leveling, tide-gauges and continuous GPS stations using least squares collocation. J Geod 80(5):248–258

  • Wei M (1987) Statistical problems in collocation. Manuscr Geod 12:282–289

  • Yang Y, Zeng A, Zhang J (2009) Adaptive collocation with application in height system transformation. J Geod 83(5):403–410

Acknowledgements

The US National Oceanic and Atmospheric Administration and the US National Geodetic Survey are thanked for providing access to the observational data used in this research. We would like to thank the editors and reviewers for many constructive and insightful comments that led to major improvements of the manuscript. Dr. Soheil Vasheghani is acknowledged for proofreading the English of the manuscript.

Author information

Corresponding author

Correspondence to Behzad Behnabian.

Appendices

Appendix A

A lemma in linear algebra

Notation. For an arbitrary matrix \(\mathbf{D}=\left\{ {d_{ij} } \right\} \), \(\mathbf{d}_{i,-i} \) is the \(i\)th row of \(\mathbf{D}\) with its \(i\)th element removed, and \(\mathbf{D}_{-i,-i}\) is the matrix obtained from \(\mathbf{D}\) by removing its \(i\)th row and column.

Lemma

If \(\mathbf{A}\) represents an arbitrary symmetric positive definite matrix and \(\mathbf{B}=\mathbf{A}^{-1}\), then

$$\begin{aligned} \mathbf{a}_{i,-i} \mathbf{A}_{-i,-i}^{-1} =-\frac{1}{b_{ii} }\mathbf{b}_{i,-i} \end{aligned}$$
(A1)
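Eq. (A1) is easy to check numerically. The following Python/NumPy sketch evaluates both sides of (A1) for every row of a symmetric positive definite matrix; the function name `lemma_a1_sides` and the random test matrix are illustrative assumptions, not part of the paper:

```python
import numpy as np


def lemma_a1_sides(A, i):
    """Return both sides of Eq. (A1) for row i of a symmetric positive definite A."""
    B = np.linalg.inv(A)
    keep = np.arange(A.shape[0]) != i       # boolean mask that drops index i
    a_row = A[i, keep]                      # a_{i,-i}
    A_sub = A[np.ix_(keep, keep)]           # A_{-i,-i}
    # a_{i,-i} A_{-i,-i}^{-1}; a plain solve suffices because A_{-i,-i} is symmetric
    lhs = np.linalg.solve(A_sub, a_row)
    rhs = -B[i, keep] / B[i, i]             # -(1/b_ii) b_{i,-i}
    return lhs, rhs


# Hypothetical test matrix: M M^T plus a diagonal shift is symmetric positive definite.
rng = np.random.default_rng(0)
M = rng.standard_normal((6, 6))
A = M @ M.T + 6.0 * np.eye(6)
for i in range(A.shape[0]):
    lhs, rhs = lemma_a1_sides(A, i)
    print(i, np.allclose(lhs, rhs))
```

Each printed line should report `True`; the check only illustrates the identity and does not replace the proof.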

Proof

We define the vector \(\mathbf{d}^{(i)}\) by

$$\begin{aligned} \mathbf{d}^{(i)}=\mathbf{b}_i \mathbf{A} \end{aligned}$$
(A2)

where \(\mathbf{b}_i \) is the \(i\)th row of \(\mathbf{B}\). Since \(\mathbf{BA}=\mathbf{I}\), the \(k\)th element of \(\mathbf{d}^{(i)}\) is simply

$$\begin{aligned} d_k^{(i)} =\sum _{j} {b_{ij} a_{jk}} =\delta _{ik} \end{aligned}$$
(A3)

where \(\delta _{ik} \) is the Kronecker delta. Considering the vector \(\mathbf{e}^{(i)}\), whose elements are indexed by \(k\ne i\), defined by

$$\begin{aligned} \mathbf{e}^{(i)}=\mathbf{b}_{i,-i} \mathbf{A}_{-i,-i} \end{aligned}$$
(A4)

and using Eq. (A3), one can conclude that:

$$\begin{aligned} e_k^{(i)}= & {} \sum _{j\ne i} {b_{ij} a_{jk} } =d_k^{(i)} -b_{ii} a_{ik}\nonumber \\= & {} -b_{ii} a_{ik} \quad \forall k\ne i \end{aligned}$$
(A5)

Finally, the following relations are deduced from Eq. (A5)

$$\begin{aligned} a_{ik}= & {} \frac{-1}{b_{ii} }e_k^{(i)} \nonumber \\= & {} \frac{-1}{b_{ii} }\sum _{j\ne i} {b_{ij} a_{jk} } \quad \forall k\ne i \end{aligned}$$
(A6)

Therefore,

$$\begin{aligned} \mathbf{a}_{i,-i} =-\frac{1}{b_{ii} }{} \mathbf{b}_{i,-i} \mathbf{A}_{-i,-i} \end{aligned}$$
(A7)

Any principal submatrix of a positive definite matrix is also positive definite (Harville 1997, p. 214), so \(\mathbf{A}_{-i,-i} \) is always invertible; likewise, \(b_{ii} >0\) because \(\mathbf{B}=\mathbf{A}^{-1}\) is positive definite. Post-multiplying Eq. (A7) by \(\mathbf{A}_{-i,-i}^{-1} \) therefore yields Eq. (A1). \(\square \)
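One consequence of the lemma can be illustrated for the simplified case of zero-mean observations without trend parameters (an assumption made here for brevity; the paper's CVE formula also accounts for the trend). Taking \(\mathbf{A}=\mathbf{C}\), the covariance matrix of the observations, and \(\mathbf{B}=\mathbf{C}^{-1}\), Eq. (A1) turns the leave-one-out prediction \(\mathbf{c}_{i,-i}\mathbf{C}_{-i,-i}^{-1}\mathbf{y}_{-i}\) into \(-\mathbf{b}_{i,-i}\mathbf{y}_{-i}/b_{ii}\), so the error at point \(i\) is \((\mathbf{By})_i/b_{ii}\), and its variance is the Schur complement \(c_{ii}-\mathbf{c}_{i,-i}\mathbf{C}_{-i,-i}^{-1}\mathbf{c}_{i,-i}^T=1/b_{ii}\). The NumPy sketch below compares this direct form with element-wise refitting; the covariance model, data, and function names are hypothetical, and the standardized values \((\mathbf{By})_i/\sqrt{b_{ii}}\) are only an illustration for this simplified case, not the paper's test statistic:

```python
import numpy as np


def cve_elementwise(C, y):
    """Leave-one-out errors and variances by refitting n times (zero-trend case)."""
    n = len(y)
    errors = np.empty(n)
    variances = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        w = np.linalg.solve(C[np.ix_(keep, keep)], C[keep, i])  # C_{-i,-i}^{-1} c_{-i,i}
        errors[i] = y[i] - w @ y[keep]                          # y_i minus its prediction from the rest
        variances[i] = C[i, i] - C[i, keep] @ w                 # leave-one-out prediction error variance
    return errors, variances


def cve_direct(C, y):
    """All leave-one-out errors at once via Eq. (A1): e_i = (B y)_i / b_ii with B = C^{-1}."""
    B = np.linalg.inv(C)
    b_diag = np.diag(B)
    return (B @ y) / b_diag, 1.0 / b_diag   # errors and their variances


rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 10.0, 40))
C = np.exp(-((x[:, None] - x[None, :]) ** 2) / 2.0) + 0.05 * np.eye(len(x))  # hypothetical covariance
y = np.linalg.cholesky(C) @ rng.standard_normal(len(x))                      # simulated zero-mean data

e_loop, v_loop = cve_elementwise(C, y)
e_fast, v_fast = cve_direct(C, y)
standardized = e_fast / np.sqrt(v_fast)   # illustrative standardized CVEs
print(np.allclose(e_loop, e_fast), np.allclose(v_loop, v_fast))
```

Refitting needs one \((n-1)\times(n-1)\) solve per point, whereas the direct form needs a single inverse of \(\mathbf{C}\), which illustrates why a direct formula can be much faster than element-wise computation.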

Appendix B

LSC prediction errors and noise estimation

The LSC prediction error variance at an unobserved point \(p_0 \) is computed by (Moritz 1972, p. 47; Mikhail and Ackermann 1976, p. 422)

$$\begin{aligned} \sigma _{\hat{{y}}_0 }^2= & {} c_{s_{0} s_{0} } -\mathbf{c}_{{s}_{0} \mathbf{s}} \mathbf{C}_{\mathbf{ww}}^{-1} \mathbf{c}_{s_0 \mathbf{s}}^T \nonumber \\&+\left( {\mathbf{c}_{s_0 \mathbf{s}} \mathbf{C}_{\mathbf{ww}}^{-1} \mathbf{A}-\mathbf{a}_0 } \right) \mathbf{C}_{{\hat{\mathbf{x}}\hat{\mathbf{x}}}} \left( {\mathbf{c}_{s_0 \mathbf{s}} \mathbf{C}_{\mathbf{ww}}^{-1} \mathbf{A}-\mathbf{a}_0 } \right) ^{T} \end{aligned}$$
(B1)

where \(\hat{{y}}_0 \) is the prediction of y at \(p_0 \), \(c_{s_0 s_0 } \) is the signal variance at \(p_0 \), \(\mathbf{c}_{s_0 \mathbf{s}} \) is the cross-covariance vector between the signal at the predicted point and the signal at the data points, \(\mathbf{a}_0 \) is the trend row vector of the predicted point, and \(\mathbf{C}_{{\hat{\mathbf{x}}\hat{\mathbf{x}}}} \) denotes the covariance matrix of the estimated trend parameters, which is computed by the following formula

$$\begin{aligned} \mathbf{C}_{{\hat{\mathbf{x}}\hat{\mathbf{x}}}} =\left( {\mathbf{A}^{T}\mathbf{C}_{\mathbf{ww}}^{-1} \mathbf{A}} \right) ^{-1} \end{aligned}$$
(B2)
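Eqs. (B1) and (B2) translate directly into a few linear solves. The following sketch assumes that \(\mathbf{C}_{\mathbf{ww}}\) is the full covariance matrix of the observations (signal plus noise) and uses a hypothetical Gaussian covariance function, a hypothetical noise variance of 0.1, and a linear trend in one dimension purely for illustration; the function names are not from the paper:

```python
import numpy as np


def lsc_prediction_variance(c00, c0s, a0, C_ww, A):
    """LSC prediction error variance at an unobserved point, Eqs. (B1)-(B2)."""
    Cww_inv_A = np.linalg.solve(C_ww, A)                  # C_ww^{-1} A
    C_xx = np.linalg.inv(A.T @ Cww_inv_A)                 # Eq. (B2)
    signal_term = c00 - c0s @ np.linalg.solve(C_ww, c0s)  # c_{s0s0} - c_{s0s} C_ww^{-1} c_{s0s}^T
    g = c0s @ Cww_inv_A - a0                              # c_{s0s} C_ww^{-1} A - a_0
    return signal_term + g @ C_xx @ g                     # add the trend-uncertainty term


def gauss_cov(u, v, sill=1.0, length=1.5):
    """Hypothetical Gaussian signal covariance between two sets of 1D positions."""
    return sill * np.exp(-((u[:, None] - v[None, :]) ** 2) / (2.0 * length**2))


x = np.linspace(0.0, 10.0, 12)                    # data point positions
A = np.column_stack([np.ones_like(x), x])         # linear trend design matrix
C_ww = gauss_cov(x, x) + 0.1 * np.eye(len(x))     # signal plus assumed white noise

p0 = np.array([4.3])
c0s = gauss_cov(p0, x)[0]                         # c_{s0 s}
a0 = np.array([1.0, p0[0]])                       # trend row for p0
var0 = lsc_prediction_variance(1.0, c0s, a0, C_ww, A)
print(var0)
```

Solving with \(\mathbf{C}_{\mathbf{ww}}\) instead of forming its inverse is a standard numerical choice. For a point far from all data, where \(\mathbf{c}_{s_0 \mathbf{s}}\approx \mathbf{0}\), the function returns \(c_{s_0 s_0}+\mathbf{a}_0\mathbf{C}_{\hat{\mathbf{x}}\hat{\mathbf{x}}}\mathbf{a}_0^T\), as Eq. (B1) implies.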

The LSC internal error (a term adopted from Darbeheshti and Featherstone 2009) is the LSC prediction error at an observed point \(p_i \):

$$\begin{aligned} \sigma _{\hat{{y}}_i }^2= & {} c_{s_i s_i } -\mathbf{c}_{s_i \mathbf{s}} \mathbf{C}_{\mathbf{ww}}^{-1} \mathbf{c}_{s_i \mathbf{s}}^T \nonumber \\&+\left( {\mathbf{c}_{s_i \mathbf{s}} \mathbf{C}_{\mathbf{ww}}^{-1} \mathbf{A}-\mathbf{a}_i } \right) \mathbf{C}_{{\hat{\mathbf{x}}\hat{\mathbf{x}}}} \left( {\mathbf{c}_{s_i \mathbf{s}} \mathbf{C}_{\mathbf{ww}}^{-1} \mathbf{A}-\mathbf{a}_i } \right) ^{T} \end{aligned}$$
(B3)

where \(\hat{{y}}_i \) is the prediction of y at \(p_i \), \(c_{s_i s_i } \) is the signal variance at \(p_i \), \(\mathbf{c}_{s_i \mathbf{s}}\) is the cross-covariance vector between the signal at \(p_i \) and the signal at the data points, and \(\mathbf{a}_i \) is the \(i\)th row of \(\mathbf{A}\).
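Because Eq. (B3) is the prediction error formula evaluated at the data points, the internal errors of all points can be obtained together from matrix products, with row \(i\) of \(\mathbf{C}_{\mathbf{ss}}\) playing the role of \(\mathbf{c}_{s_i \mathbf{s}}\). A minimal vectorized NumPy sketch, assuming white noise so that \(\mathbf{C}_{\mathbf{ww}}=\mathbf{C}_{\mathbf{ss}}+\mathbf{C}_{\mathbf{nn}}\), a hypothetical Gaussian signal covariance, and a linear trend (the function name is illustrative):

```python
import numpy as np


def lsc_internal_errors(C_ss, C_ww, A):
    """Eq. (B3) for all data points at once; returns one variance per point p_i."""
    Cww_inv_A = np.linalg.solve(C_ww, A)
    C_xx = np.linalg.inv(A.T @ Cww_inv_A)                    # Eq. (B2)
    K = np.linalg.solve(C_ww, C_ss).T                        # row i: c_{s_i s} C_ww^{-1}
    signal_term = np.diag(C_ss) - np.sum(K * C_ss, axis=1)   # c_{s_i s_i} - c_{s_i s} C_ww^{-1} c_{s_i s}^T
    G = K @ A - A                                            # row i: c_{s_i s} C_ww^{-1} A - a_i
    trend_term = np.sum((G @ C_xx) * G, axis=1)              # quadratic form in C_xx, row by row
    return signal_term + trend_term


x = np.linspace(0.0, 10.0, 12)
C_ss = np.exp(-((x[:, None] - x[None, :]) ** 2) / (2.0 * 1.5**2))  # hypothetical Gaussian covariance
C_nn = 0.1 * np.eye(len(x))                                         # hypothetical white noise
C_ww = C_ss + C_nn
A = np.column_stack([np.ones_like(x), x])
sigma2 = lsc_internal_errors(C_ss, C_ww, A)
print(np.sqrt(sigma2))
```

The row-wise sums avoid forming full \(n\times n\) products for the diagonal only; for small \(n\) a loop over Eq. (B3) gives the same values.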

The noise of the observations in Eq. (1) is always unknown, but it can be estimated by the following formula (Moritz 1980, p. 119)

$$\begin{aligned} {\hat{\mathbf{n}}}=\mathbf{C}_{\mathbf{nn}} \mathbf{C}_{\mathbf{ww}}^{-1} \left( {\mathbf{y}-\mathbf{A}{\hat{\mathbf{x}}}} \right) \end{aligned}$$
(B4)
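Eq. (B4) requires the trend estimate \(\hat{\mathbf{x}}\). The sketch below assumes the usual generalized least-squares estimate \(\hat{\mathbf{x}}=\mathbf{C}_{\hat{\mathbf{x}}\hat{\mathbf{x}}}\mathbf{A}^T\mathbf{C}_{\mathbf{ww}}^{-1}\mathbf{y}\), with \(\mathbf{C}_{\hat{\mathbf{x}}\hat{\mathbf{x}}}\) from Eq. (B2), and \(\mathbf{C}_{\mathbf{ww}}=\mathbf{C}_{\mathbf{ss}}+\mathbf{C}_{\mathbf{nn}}\); the function name, covariance model, and simulated data are hypothetical:

```python
import numpy as np


def lsc_signal_and_noise(y, A, C_ss, C_nn):
    """Trend, signal and noise estimates at data points, assuming C_ww = C_ss + C_nn."""
    C_ww = C_ss + C_nn
    Cww_inv_A = np.linalg.solve(C_ww, A)
    # Generalized least-squares trend: (A^T C_ww^{-1} A)^{-1} A^T C_ww^{-1} y
    x_hat = np.linalg.solve(A.T @ Cww_inv_A, Cww_inv_A.T @ y)
    r = np.linalg.solve(C_ww, y - A @ x_hat)     # C_ww^{-1} (y - A x_hat)
    s_hat = C_ss @ r                             # filtered signal at the data points
    n_hat = C_nn @ r                             # estimated noise, Eq. (B4)
    return x_hat, s_hat, n_hat


rng = np.random.default_rng(2)
x = np.linspace(0.0, 10.0, 15)
A = np.column_stack([np.ones_like(x), x])
C_ss = np.exp(-((x[:, None] - x[None, :]) ** 2) / (2.0 * 1.5**2))  # hypothetical Gaussian covariance
C_nn = 0.1 * np.eye(len(x))                                         # hypothetical white noise
# Simulated observations: linear trend plus a correlated signal-and-noise term.
y = A @ np.array([2.0, -0.3]) + np.linalg.cholesky(C_ss + C_nn) @ rng.standard_normal(len(x))
x_hat, s_hat, n_hat = lsc_signal_and_noise(y, A, C_ss, C_nn)
print(x_hat, np.sqrt(np.mean(n_hat**2)))
```

Under these assumptions the filtered signal \(\mathbf{C}_{\mathbf{ss}}\mathbf{C}_{\mathbf{ww}}^{-1}(\mathbf{y}-\mathbf{A}\hat{\mathbf{x}})\) and the estimated noise add up to the trend-reduced observations, which gives a quick consistency check.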

About this article

Cite this article

Behnabian, B., Mashhadi Hossainali, M. & Malekzadeh, A. Simultaneous estimation of cross-validation errors in least squares collocation applied for statistical testing and evaluation of the noise variance components. J Geod 92, 1329–1350 (2018). https://doi.org/10.1007/s00190-018-1122-6

  • DOI: https://doi.org/10.1007/s00190-018-1122-6

Keywords

  • Cross-validation errors
  • Least squares collocation
  • Statistical tests
  • Blunder detection
  • Estimation of noise variance components