On the computation of confidence regions and error ellipses: a critical appraisal

Customary confidence regions do not, in the majority of our geodetic applications, truly reflect the confidence one can have in one's produced estimators. As it is common practice in our daily data analyses to combine methods of parameter estimation and hypothesis testing before the final estimator is produced, it is their combined uncertainty that has to be taken into account when constructing confidence regions. Ignoring the impact of testing on estimation will produce faulty confidence regions and therefore provide an incorrect description of the estimator's quality. In this contribution, we address the interplay between estimation and testing and show how their combined non-normal distribution can be used to construct truthful confidence regions. In doing so, our focus is on the designing phase prior to when the actual measurements are collected, where it is assumed that the working (null) hypothesis is true. We discuss two different approaches for constructing confidence regions: Approach I, in which the region's shape is user-fixed and only its size is determined by the distribution, and Approach II, in which both the size and shape are simultaneously determined by the estimator's non-normal distribution. We also prove and demonstrate that the estimation-only confidence regions have a poor coverage in the sense that they provide an optimistic picture. Next to the provided theory, we provide computational procedures, for both Approach I and Approach II, on how to compute confidence regions and confidence levels that truthfully reflect the combined uncertainty of estimation and testing.


Introduction
To evaluate the quality of an estimator, it is not uncommon to compute the probability that a certain region, dependent on the estimator, covers the unknown true parameter (vector) or, alternatively, to compute the size of the region that corresponds with a certain preset value of the probability. Such a region is called the confidence region/set and the corresponding probability is called its confidence level/coefficient (Koch 1999). Using the concept of confidence region and confidence level, one can evaluate the separation between the estimator and the unknown value it is aimed to estimate, and connect this separation to a probability. For a given confidence level, the smaller the size of the confidence region the higher the quality of the estimator, and for a given confidence region, the higher the confidence level the higher the quality of the estimator.
To construct the confidence region for the parameters of interest, one may take one of the following two approaches.
- Approach I: One can define beforehand the shape of the region, e.g., spherical, ellipsoidal or rectangular, and then compute its size for a given confidence level (Hoover 1984; Šidák 1967; Hofmann-Wellenhof et al 2003); or
- Approach II: One can have the confidence region determined by the contours of the probability density function (PDF) of the estimator for a given confidence level (Hyndman 1996; Gundlich and Koch 2002; Teunissen 2007).
Examples of Approach I are the circular error probable (CEP) or the spherical error probable (SEP) as used in the field of navigation, for instance. CEP/SEP is defined as the radius of the circle/sphere around the to-be-estimated point such that it contains a certain fraction, say 90% or 95%, of the estimator's outcomes. Examples of Approach II are the error ellipses/ellipsoids as used in surveying and geodetic networks. Approach I is the simplest and gives information on how often one can expect the solution to fall within the predefined shape region. Approach II is more complex, but also more informative since it shows through the shape of the confidence region where in space the solution is 'weak' and where 'strong'. We note that if the estimator has an elliptically contoured distribution (Cambanis et al 1981), e.g., normal distribution, then there would be no difference between Approach I and Approach II provided that the confidence region in Approach I is taken to be ellipsoidal and defined by the estimator's variance matrix.
Whether Approach I or Approach II is taken, it is important to realize that the properties of the confidence region are determined by the probabilistic properties of the estimator. The size and, in case of Approach II, the shape of the confidence region are driven by the estimator's PDF. It is therefore crucial, in order to have a realistic confidence region and one that properly reflects the quality of the estimator, that one works with the correct PDF of one's estimator. It is here that we believe improvements in our geodetic and navigational practice of data analysis and quality control are in order.
In our daily data processing, it is common practice to combine methods of parameter estimation and hypothesis testing. Such examples can be found in a wide variety of different applications, like geodetic quality control (Kösters and Van der Marel 1990; Seemkooei 2001; Perfetti 2006), navigational integrity (Teunissen 1990; Gillissen and Elema 1996; Yang et al 2014), deformation analysis and structural health monitoring (Verhoef and De Heus 1995; Yavaşoglu et al 2018; Durdag et al 2018; Lehmann and Lösler 2017; Nowel 2020), and GNSS integrity monitoring (Jonkman and De Jong 2000; Kuusniemi et al 2004; Hewitson and Wang 2006; Khodabandeh and Teunissen 2016). However, when combining parameter estimation and statistical testing, the finally produced estimator will have inherited uncertainties stemming from both the estimation and testing steps. It is therefore of importance, when constructing confidence regions, to take this combined uncertainty into account. In Teunissen (2018) it was shown, by means of the PDF of the DIA-estimator, that this combined uncertainty cannot be captured by the usually assumed normal distribution. Hence, the customary procedure followed in practice of ignoring the impact of testing on estimation (Wieser 2004; Devoti et al 2011; Dheenathayalan et al 2016; Huang et al 2019; Niemeier and Tengen 2020) will produce faulty confidence regions and therefore provide an incorrect description of the estimator's quality.
The aim of the present contribution is to address how the interplay between estimation and testing affects the confidence region and confidence level of the parameters of interest. We prove and demonstrate that the estimation-only confidence regions have a poor coverage in the sense that they provide a too optimistic picture and that they should thus be made larger in order to contain the required probability of one's estimator. Next to the provided theory, we also provide computational procedures, for both Approach I and Approach II, on how to correctly compute and construct confidence regions and their confidence levels.
In doing so, our focus is on the designing phase prior to when the actual measurements are collected. When designing a measurement setup (e.g., a geodetic network), one will usually assume that the actual measurements and adjustment will also be done under the working (null) hypothesis H_0. That is, a priori one has no reason to believe that a specific model other than H_0 would be valid. But one does know a priori what estimation and testing procedure one will apply, and it is on the basis of this knowledge that one then should construct the appropriate confidence regions, in order to be able to judge one's measurement design properly.
This contribution is structured as follows. A brief review of the necessary background theory is provided in Sect. 2. We describe the null and alternative hypotheses, highlight the role of the misclosure space partitioning when testing these hypotheses, and provide and discuss the statistical distribution of the DIA-estimator. It is hereby emphasized that its distribution is non-normal even if one works with linear models and normally distributed observations. In Sect. 3, we consider, also as reference for later sections, the customary estimation-only case when testing is not taken into account. In Sect. 4, we discuss the method of constructing confidence regions of Approach I. It is shown, if one neglects the testing contribution and assumes the resulting estimator to be normally distributed, that the resulting confidence region of Approach I would actually cover a smaller probability than one presumes with the preset confidence level. Then, in Sect. 5, we take Approach II to form the confidence region of the DIA-estimator using its PDF contours. It is demonstrated that the confidence region resulting from Approach II can be a non-convex and even disconnected region. In Sect. 6, we describe the necessary steps for the numerical algorithms. This is first done for evaluating the confidence level of the DIA-estimator corresponding with any translational-invariant confidence region, which is then followed by a description of the numerical algorithms for constructing confidence regions based on both Approach I and Approach II. Finally, a summary with conclusions is provided in Sect. 7. We use the following notation: E(·) and D(·) denote the expectation and dispersion operator, respectively. The n-dimensional space of real numbers is denoted as R^n. Random vectors are indicated by use of an underline. Thus x̲ ∈ R^n is a random vector, while x is not. x̂ and x̄ denote the BLUE-estimator and the DIA-estimator of x, respectively.
The squared weighted norm of a vector v, with respect to a positive-definite matrix Q, is defined as ‖v‖²_Q = v^T Q^{-1} v. H is reserved for statistical hypotheses, P for regions partitioning the misclosure space, N(x, Q) for the normal distribution with mean x and variance matrix Q, and χ²(n, ξ) for the Chi-square distribution with n degrees of freedom and non-centrality parameter ξ. P(·) denotes the probability of the occurrence of the event within parentheses.

Brief review of background theory
In this section, we provide a brief review of estimation + testing inference with corresponding DIA-estimator and describe the concept of translational-invariant confidence regions. We restrict our attention to the linear model with normally distributed observables. Departures from these assumptions, such as nonlinearity and/or distributional approximations are not taken into account, but see, e.g., (Teunissen 1989;Zaminpardaz et al. 2017;Niemeier and Tengen 2017;Lösler et al. 2021).

Integrated estimation and testing
Let y ∈ R^m be the normally distributed random observable vector with an unknown mean but a known variance matrix. The following null and κ > 0 alternative hypotheses are put forward on the mean of y:

H_0: E(y) = A x   versus   H_i: E(y) = A x + C_i b_i, for i = 1, ..., κ   (1)

with x ∈ R^n the estimable unknown parameter vector, A ∈ R^{m×n} the known design matrix of rank(A) = n, and Q_yy ∈ R^{m×m} the positive-definite variance matrix of y. We assume that [A C_i] is a known matrix of full rank and b_i is an unknown vector.
To test the hypotheses in (1) against one another and select the most likely one, it is required to have redundant measurements under H_0, i.e., r = m − n > 0. In that case, an ancillary statistic, known as the misclosure vector t ∈ R^r, can be formed as (Teunissen 2006)

t = B^T y   (2)

where B ∈ R^{m×r} is a matrix of rank(B) = r, such that [A B] ∈ R^{m×m} is invertible and A^T B = 0; see, e.g., (Yang et al 2021; Teunissen 2003, p. 62) for constructing matrix B.
With y normally distributed, the misclosure vector is distributed under H_i as

t ∼ N(C_{t_i} b_i, Q_tt), for i = 0, 1, ..., κ   (3)

with Q_tt = B^T Q_yy B the variance matrix of t and C_{t_i} = B^T C_i (with C_{t_0} b_0 = 0 under H_0). As t has a known PDF under H_0, which is the PDF of N(0, Q_tt), any statistical testing procedure is driven by the misclosure vector t and its known PDF under H_0. A testing procedure can be established through unambiguously assigning the outcomes of t to the hypotheses H_i for i = 0, 1, ..., κ, which can be realized by partitioning the misclosure space R^r into κ + 1 subsets P_i ⊂ R^r (i = 0, 1, ..., κ). The testing procedure is then unambiguously defined as (Teunissen 2018)

select H_i ⟺ t ∈ P_i, for i = 0, 1, ..., κ   (4)

Therefore, the testing procedure's decisions are driven by the outcome of the misclosure vector t. If H_i is true, then the decision is correct if t ∈ P_i, and wrong if t ∈ P_{j≠i}. Note, in the single-redundancy case r = 1, that P_1 = ... = P_κ = P_0^c, implying that the alternative hypotheses are not distinguishable from one another.
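For the single-redundancy case (r = 1, κ = 1), the rule above reduces to checking whether the scalar misclosure falls inside a symmetric acceptance interval around zero. The following is a minimal sketch; the function names and the choice of a symmetric P_0 tied to the false-alarm probability P_FA are illustrative assumptions, not taken from the paper:

```python
import statistics

def critical_value(p_fa, sigma_t):
    """Boundary k of the acceptance region P0 = [-k, k]:
    chosen so that P(|t| > k | H0) = P_FA for t ~ N(0, sigma_t^2)."""
    return sigma_t * statistics.NormalDist().inv_cdf(1.0 - p_fa / 2.0)

def select_hypothesis(t, k):
    """Decision rule for r = 1, kappa = 1: select H0 iff t falls in P0."""
    return 0 if abs(t) <= k else 1

k = critical_value(0.05, 1.0)      # about 1.96 for P_FA = 0.05, sigma_t = 1
print(select_hypothesis(0.7, k))   # t in P0  -> 0 (select H0)
print(select_hypothesis(2.5, k))   # t not in P0 -> 1 (select the alternative)
```

Because r = 1, all alternatives share the same rejection region, in line with the non-separability remark above.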
Statistical testing is usually followed by the estimation of the parameters of interest x. These parameters then get estimated according to the model selected through the testing procedure, as

x̂ = x̂_i if t ∈ P_i, with x̂_i = x̂_0 − L_i t   (5)

where x̂_0 is the BLUE of x under H_0 and L_i ∈ R^{n×r} (L_0 = 0). Being linear functions of the normally distributed random vector y, the individual estimators x̂_i (i = 0, 1, ..., κ) are all normally distributed with a mean of x under H_0.

DIA-estimator
As (5) shows, the outcome of testing (4) dictates how the parameter vector x gets estimated. The probabilistic properties of such an estimation + testing combination are captured by the DIA-estimator of Teunissen (2018). The DIA-estimator of the parameter vector x is given as

x̄ = ∑_{i=0}^{κ} x̂_i p_i(t)   (6)

where p_i(t) is the indicator function of region P_i (cf. 4), i.e., p_i(t) = 1 for t ∈ P_i and p_i(t) = 0 for t ∉ P_i. The PDF of x̄ is then driven by the probabilistic properties of the estimators x̂_i (i = 0, 1, ..., κ) and t, which under H_i is given by

f_x̄(ξ|H_i, x) = f_{x̂_0}(ξ|H_i, x) P(t ∈ P_0|H_i) + ∑_{j=1}^{κ} f_{x̂_j|t∈P_j}(ξ|H_i, x) P(t ∈ P_j|H_i)   (7)

where f_{x̂_0}(ξ|H_i, x) denotes the PDF of x̂_0 under H_i, which is characterized by the parameter vector x. The conditional PDF f_{x̂_j|t∈P_j}(ξ|H_i, x) follows from

f_{x̂_j|t∈P_j}(ξ|H_i, x) = (1 / P(t ∈ P_j|H_i)) ∫_{P_j} f_{x̂_j,t}(ξ, τ|H_i, x) dτ   (8)

In the above equation, f_{x̂_j,t}(ξ, τ|H_i, x) is the joint PDF of x̂_j and t under H_i, which is characterized by the parameter vector x and, given (5) and the independence of x̂_0 and t, can be obtained as

f_{x̂_j,t}(ξ, τ|H_i, x) = f_{x̂_0}(ξ + L_j τ|H_i, x) f_t(τ|H_i)   (9)

Note, although the estimators x̂_i (i = 0, 1, ..., κ) and the misclosure t are normally distributed, that the DIA-estimator x̄ is not, due to its nonlinear dependency on t. Figure 1 illustrates the non-normal PDF of x̄ for a simple binary hypothesis-testing example, with the null hypothesis H_0 and the alternative H_a, as given in Examples 5 and 8 of Teunissen (2018). The results are shown for different values of the false-alarm probability P_FA, the standard deviation of x̂_0 denoted by σ_{x̂_0}, the standard deviation of t denoted by σ_t, and bias values b_a under H_a. The difference between the panels in Fig. 1 and those of Figures 6-8 in (ibid) is that the latter were accidentally based on other values than given in their captions (e.g., standard deviations instead of variances).
In this contribution, we focus on confidence region and confidence level evaluation under H_0, and thus we work with the estimators' PDFs under H_0. Using the total probability rule with the correct acceptance CA = {t ∈ P_0, H_0} and false alarm FA = {t ∉ P_0, H_0} events, the PDF of x̄ under H_0 can be decomposed as

f_x̄(ξ|H_0, x) = f_{x̄|CA}(ξ|CA, x) P_CA + f_{x̄|FA}(ξ|FA, x) P_FA = f_{x̂_0}(ξ|H_0, x) P_CA + f_{x̄|FA}(ξ|FA, x) P_FA   (10)

with the false-alarm probability P_FA and the correct-acceptance probability P_CA satisfying P_CA + P_FA = 1. The second equality of (10) is a consequence of x̄|CA = x̂_0|CA and the fact that x̂_0 and t are independent of each other, i.e., x̂_0|CA = x̂_0.
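The decomposition above can be checked by Monte Carlo: simulate x̂_0 and t independently, apply the test, and compare the empirical correct-acceptance frequency with P_CA = 1 − P_FA; conditioned on CA, x̄ should behave exactly like x̂_0. A sketch with assumed values (σ_{x̂_0} = 0.5, σ_t = 1, P_FA = 0.05); all names are illustrative:

```python
import random, statistics

rng = random.Random(1)
sigma_x0, sigma_t, p_fa = 0.5, 1.0, 0.05
k = sigma_t * statistics.NormalDist().inv_cdf(1 - p_fa / 2)  # P0 = [-k, k]

n = 100_000
xbar_ca = []                           # realizations of xbar given correct acceptance
for _ in range(n):
    x0 = rng.gauss(0.0, sigma_x0)      # BLUE under H0 (x = 0), independent of t
    t = rng.gauss(0.0, sigma_t)        # misclosure under H0
    if abs(t) <= k:                    # CA event: xbar = x0
        xbar_ca.append(x0)

p_ca_emp = len(xbar_ca) / n            # close to P_CA = 1 - P_FA = 0.95
# xbar|CA = x0|CA = x0 (independence): same mean and dispersion as x0
print(round(p_ca_emp, 2), round(statistics.mean(xbar_ca), 2),
      round(statistics.pstdev(xbar_ca), 2))
```

The conditional sample retains the mean and standard deviation of x̂_0, which is exactly the independence argument behind the second equality of (10).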

Confidence region
The general definition of a confidence region and confidence level reads as follows (Schervish 1995).
Definition (Confidence region & confidence level) Let x̲ ∈ R^n be a random vector of which the PDF is dependent on the unknown parameter vector x ∈ R^n. Let C_α(x̲) ⊂ R^n be a random set of which the location, size and shape are determined by α and x̲ such that

P(x ∈ C_α(x̲)) = 1 − α   (11)

Thus, a confidence region is random and it corresponds with a certain probability that it covers the parameter vector x. The interpretation of the confidence region is that if independent samples of x̲ were to be generated many times, the realized confidence regions would then include x, 100(1 − α)% of the time.
A random confidence region C_α(x̲) can be constructed as follows (Casella and Berger 2001). Let S_α(x) ⊂ R^n be an x-dependent set with the property that

P(x̲ ∈ S_α(x)) = 1 − α   (12)

Then, the random set

C_α(x̲) = {ξ ∈ R^n | x̲ ∈ S_α(ξ)}   (13)

is a 100(1 − α)% confidence region for x, satisfying (11). With this definition in mind, we refer, in the following, to S_α(x) also as a confidence region. For every realization of x̲, the corresponding realization of (13) gives all values of ξ ∈ R^n for which S_α(ξ) covers that realization of x̲. To compute the confidence region (13) for a given confidence level, or the other way round, one needs to make use of (12). The probability P(x̲ ∈ S_α(x)) in this equation is, in general, dependent on the unknown x through the PDF of x̲ and S_α(x).
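The duality in (12)-(13) can be illustrated by simulation: for a 1-D normal estimator, take S_α(x) as a symmetric interval around x and count how often the realized region C_α covers the true parameter. A sketch with assumed values (σ = 1, α = 0.1, x_true = 3):

```python
import random, statistics

alpha, sigma, x_true = 0.10, 1.0, 3.0
# half-width of S_alpha(x): P(sample in S_alpha(x_true)) = 1 - alpha
c = sigma * statistics.NormalDist().inv_cdf(1 - alpha / 2)

rng = random.Random(42)
trials = 100_000
covered = 0
for _ in range(trials):
    sample = rng.gauss(x_true, sigma)   # one realization of the estimator
    # x_true in C_alpha(sample) = [sample - c, sample + c]
    #   <=>  sample in S_alpha(x_true)  (the duality of (12)-(13))
    if abs(sample - x_true) <= c:
        covered += 1
print(round(covered / trials, 2))       # close to 1 - alpha = 0.9
```

The empirical coverage matches the preset confidence level, which is the "100(1 − α)% of the time" interpretation given above.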
The following result specifies the conditions under which this probability becomes independent of x.
Fig. 1 PDF of the DIA-estimator x̄ under H_0 and H_a with b_a = 5. The top panels correspond to σ²_{x̂_0} = 0.5 and σ²_t = 2, while the bottom panels to σ²_{x̂_0} = 0.25 and σ²_t = 1. For the results under H_a, L_a was set to 0.5

Lemma 1 Let x̲ ∈ R^n be a random vector of which the PDF, denoted by f_x̲(ξ|x), is dependent on the unknown parameter vector x ∈ R^n, and let S_α(x) ⊂ R^n be an x-dependent set. Then the probability P(x̲ ∈ S_α(x)) is independent of x if (i) S_α(x) is translational invariant, i.e., S_α(x) = x + S_α(0), and (ii) the PDF of x̲ is translational invariant, i.e., f_x̲(ξ|x) = f_x̲(ξ − x|0). Here x + S_α(0) = {x + s | s ∈ S_α(0)} is a set of vectors obtained by translating all the vectors inside S_α(0) by x.
In this contribution, we will consider three different confidence regions as follows.
- Estimation-only confidence region, which is constructed based on the normal distribution of x̂_0 under H_0, neglecting the impact of testing preceding the estimation process, and is denoted by S^E_α(x);
- Estimation + testing confidence region (Approach I), which has the same shape as the estimation-only one but whose size is determined by the PDF of x̄ under H_0, and is denoted by S^I_α(x);
- Estimation + testing confidence region (Approach II), which is constructed based on the concept of highest density regions, where its shape and size are determined by the contours of the PDF of x̄ under H_0, and is denoted by S^II_α(x).

Table 1 provides an overview of these three confidence regions. For a one-dimensional example x ∈ R, Fig. 2 shows the construction of these confidence regions for the same value of α. Confidence regions in the one-dimensional case are simply intervals, referred to as confidence intervals. In each panel, the horizontal border of the green area indicates the corresponding confidence interval. The estimation-only confidence interval in Fig. 2 [left] is an x-centered interval over which the probability mass of the normal PDF of x̂_0 is (1 − α). The estimation + testing confidence interval using Approach I in Fig. 2 [middle] is an x-centered interval over which the probability mass of the PDF of x̄ is (1 − α). It is observed that since the PDF of x̄ is less peaked around x compared to the PDF of x̂_0, the confidence interval in the middle panel is wider than the one in the left panel. This implies that working with the estimation-only confidence interval when combining estimation with testing provides a too optimistic description of the estimator's quality. The estimation + testing confidence interval using Approach II in Fig. 2 [right] consists of three disconnected sub-intervals over which the total probability mass of the PDF of x̄ is (1 − α) and the PDF of x̄ is larger than a certain value.
The reason for getting a disconnected confidence interval is that the PDF of x̄ is multi-modal, having one major mode at x and two minor modes equally spaced to the left and right of x.
Fig. 2 Construction of the three confidence intervals in Table 1

Estimation-only confidence region: current practice
Although parameter estimation is often carried out following a statistical testing procedure to select the most likely hypothesis, the approach that is usually followed in practice to evaluate the quality of the resulting estimator, including its confidence region, does not take into account the uncertainty of the testing decisions (Wieser 2004; Devoti et al 2011; Dheenathayalan et al 2016; Huang et al 2019; Niemeier and Tengen 2020). Therefore, in the designing phase, to construct a confidence region for the parameters of interest x under H_0, the eventual estimator is taken as x̂_0 (cf. 5) rather than x̄ (cf. 6). Confidence regions can in principle take many different shapes. For the linear model as in (1), the confidence region is often taken to be of ellipsoidal shape. This is a consequence of the squared weighted norm of x̂_0 − x being distributed as a central Chi-square distribution with n degrees of freedom,

‖x̂_0 − x‖²_{Q_{x̂_0 x̂_0}} ∼ χ²(n, 0)   (14)

The 100(1 − α)% ellipsoidal confidence region for x is then given as

S^E_α(x) = {ξ ∈ R^n | ‖ξ − x‖²_{Q_{x̂_0 x̂_0}} ≤ χ²_{1−α}(n, 0)}   (15)

where χ²_{1−α}(n, 0) is the (1 − α) quantile of the central Chi-square distribution with n degrees of freedom. By construction, the region (15) satisfies

P(x̂_0 ∈ S^E_α(x)|H_0) = 1 − α   (16)

The smaller the α, the larger the confidence region. It will be clear that both the set (15) and the normal distribution f_{x̂_0}(ξ|H_0, x) satisfy the conditions of Lemma 1 and that therefore the probability of x̂_0 lying in S^E_α(x) under H_0 can be computed without knowing x.
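Membership of a point in the ellipse (15) is a single quadratic-form test. For n = 2 the quantile has the closed form χ²_{1−α}(2, 0) = −2 ln α, so the check needs no statistical tables; the variance matrix below is an assumed value for illustration:

```python
import math

def in_confidence_ellipse(xi, x, Qinv, alpha):
    """Check whether point xi lies in the estimation-only ellipse S^E_alpha(x)
    for a 2-D parameter vector: ||xi - x||^2_{Q_{x0 x0}} <= chi2_{1-alpha}(2, 0).
    Qinv is the inverse of the 2x2 variance matrix Q_{x0 x0}."""
    d = (xi[0] - x[0], xi[1] - x[1])
    q = (d[0] * (Qinv[0][0] * d[0] + Qinv[0][1] * d[1])
         + d[1] * (Qinv[1][0] * d[0] + Qinv[1][1] * d[1]))
    return q <= -2.0 * math.log(alpha)        # closed-form quantile for 2 dof

Qinv = [[4.0, 0.0], [0.0, 1.0]]               # inverse of diag(0.25, 1.0), assumed
print(in_confidence_ellipse((0.4, 0.5), (0.0, 0.0), Qinv, 0.10))  # True
print(in_confidence_ellipse((2.0, 0.0), (0.0, 0.0), Qinv, 0.10))  # False
```

The smaller the α, the larger −2 ln α and hence the larger the ellipse, in line with the remark above.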
Example 1 (Three-point network with angle measurements) Consider a two-dimensional (2-D) terrestrial survey network of three points forming an isosceles triangle as shown in Fig. 3. At each point, the internal angle is measured, yielding three measurements in total (m = 3). In the following, we use E_i and N_i to denote the East and North coordinates of point i, respectively, d_ij for the observed-minus-computed direction measurement made from point i to point j, l_ij for the computed distance between point i and point j, and η_i ∈ R² for the coordinate increment vector of point i; the angle at point i between points j and k then follows as a_jik = d_ik − d_ij. For the 2-D network in Fig. 3, we take points 1 and 2 as S-basis (Baarda 1973; Teunissen 1985), and thus the total number of unknowns is n = 2, namely the two unknown coordinate increments of point 3 (η_3). With m = 3 and n = 2, the redundancy is r = 1. The angle measurements are assumed to be uncorrelated and normally distributed. Figure 4 shows the confidence ellipse S^E_α(x) in (15) for α = 0.1, assuming that the standard deviations of the direction measurements are the same and equal to 5 seconds of arc.
To understand the shape and orientation of the ellipse in Fig. 4, driven by Qx 0x0 , let us first assume that the network points form an equilateral triangle. In this case, due to the symmetry of the network configuration and measuring all the three internal angles with the same precision, S E α (x) would be of circular shape. Now, if one moves point 3 further away from the other two points in the North direction, the network configuration changes to an isosceles triangle like that in Fig. 3. In this case, the observational model becomes more sensitive to the movement of point 3 in the East direction than in the North direction. Thus, the previous circular confidence region would become an ellipse with its minor axis parallel to the East direction, as shown in Fig. 4.

Estimation + testing confidence region: Approach I
When estimation is combined with testing, using the probabilistic properties of x̂_0 (cf. 5) to compute the confidence region/level is statistically incorrect, as it does not do justice to the statistical testing that preceded the estimation of the parameters x. Thus, for a proper computation of the confidence region/level, one needs to work with x̄ (cf. 6). In this section, we consider the construction of the confidence region using Approach I with the ellipsoidal shape common in geodetic practice, but based on the PDF of x̄, as

S^I_α(x) = {ξ ∈ R^n | ‖ξ − x‖²_{Q_{x̂_0 x̂_0}} ≤ s^I_α}   (17)

where s^I_α is chosen so as to satisfy

P(x̄ ∈ S^I_α(x)|H_0) = 1 − α   (18)

The smaller the α, the larger the s^I_α. The confidence region in (17) is an ellipsoidal region of the same shape and orientation as that of (15), be it that the size of (17) is determined by the PDF of x̄ through (18). For the PDF f_x̄(ξ|H_0, x), we have the following result.

Lemma 2 The PDF of the DIA-estimator x̄ under H_0 is translational invariant, i.e.,

f_x̄(ξ|H_0, x) = f_x̄(ξ − x|H_0, 0)   (19)

Proof See 'Appendix.'
With the above result and the fact that (17) is a translational-invariant set, the probability in (18) can be computed without knowing x, according to Lemma 1. This computation is, however, not trivial due to the complexity of f_x̄(ξ|H_0, x). We will come back to this in Sect. 6 where we present the required numerical algorithm.
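For the 1-D binary case, the probability (18) can already be evaluated by straightforward Monte Carlo: draw the DIA-estimator under H_0 and take s^I_α as the empirical (1 − α)-quantile of the squared normalized error ((x̄ − x)/σ_{x̂_0})². The parameter values below are assumed for illustration; the resulting s^I_α exceeds the estimation-only quantile χ²_{1−α}(1, 0), so the ellipsoidal region has to grow:

```python
import random, statistics

rng = random.Random(7)
sigma_x0, sigma_t, L_a, p_fa, alpha = 0.5, 1.0, 0.8, 0.10, 0.10
k = sigma_t * statistics.NormalDist().inv_cdf(1 - p_fa / 2)   # boundary of P0

# Draw the DIA-estimator under H0 (x = 0); record squared normalized errors.
w = []
for _ in range(150_000):
    x0 = rng.gauss(0.0, sigma_x0)        # BLUE under H0, independent of t
    t = rng.gauss(0.0, sigma_t)          # misclosure
    xbar = x0 if abs(t) <= k else x0 - L_a * t
    w.append((xbar / sigma_x0) ** 2)

w.sort()
s_I = w[int((1 - alpha) * len(w))]       # empirical (1-alpha)-quantile -> s^I_alpha
chi2 = statistics.NormalDist().inv_cdf(1 - alpha / 2) ** 2    # chi2_{1-alpha}(1,0)
print(s_I > chi2)                        # True: the region must be scaled up
```

The same quantile idea carries over to n > 1 by sampling the quadratic form ‖x̄ − x‖²_{Q_{x̂_0 x̂_0}}.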
Here we now discuss the consequences if one, for convenience, would stick to using the estimation-only ellipsoidal confidence region as in (15). Then, since the probability is now driven by the PDF of x̄, we have the following result.
Theorem 1 (Confidence levels of estimation-only versus estimation + testing under H_0) Let S^E_α(x) be the ellipsoidal confidence region as in (15). Then, we have

P(x̄ ∈ S^E_α(x)|H_0) ≤ 1 − α   (20)

Proof See 'Appendix.'

The above theorem shows that S^E_α(x) has poorer coverage for the DIA-estimator x̄ than (1 − α), and thus needs to be scaled up in order to contain the required probability of (1 − α), which implies that s^I_α ≥ χ²_{1−α}(n, 0). Therefore, neglecting the link between estimation and testing and taking S^E_α(x) as if it would be the 100(1 − α)% confidence region of x̄ under H_0 will provide a too optimistic description of the DIA-estimator's quality, i.e., the region is then shown too small. With (7), the difference between the actual confidence level P(x̄ ∈ S^E_α(x)|H_0) and the desired confidence level (1 − α) can be written as

(1 − α) − P(x̄ ∈ S^E_α(x)|H_0) = ∑_{i=1}^{κ} [(1 − α) P(t ∈ P_i|H_0) − P(x̂_i ∈ S^E_α(x), t ∈ P_i|H_0)]   (21)

which would approach zero under any of the conditions below: for i = 1, ..., κ, if the probability P(t ∈ P_i|H_0) would approach zero, or if the correlation between x̂_i and t would be zero. One can get zero correlation between x̂_i and t if L_i = 0 (cf. 5), which can happen if A^T Q_yy^{-1} C_i = 0 (C_i being Q_yy-orthogonal to the columns of A; zero influential bias (Teunissen 2017)). In addition, the correlation between x̂_i and t would get close to zero if L_i t becomes far more precise than x̂_0, i.e., Q_{x̂_0 x̂_0} ≫ L_i Q_tt L_i^T, and/or L_i → 0 and/or Q_tt → 0. Given that t ∼ N(0, Q_tt) under H_0, the probability P(t ∈ P_i|H_0), for i = 1, ..., κ, would approach zero if f_t(τ|H_0) becomes highly peaked at zero, i.e., Q_tt → 0, and/or if the regions P_{i≠0} shrink, i.e., P_0 → R^r (P_FA → 0). Table 2 summarizes conditions under which P(x̄ ∈ S^E_α(x)|H_0) gets closer to (1 − α).

Table 2 Conditions under which P(x̄ ∈ S^E_α(x)|H_0) gets closer to (1 − α)

Condition | Consequence
A^T Q_yy^{-1} C_i = 0 (L_i = 0) | zero correlation between x̂_i and t
Q_{x̂_0 x̂_0} ≫ L_i Q_tt L_i^T | near-zero correlation between x̂_i and t
Q_tt → 0 | near-zero correlation between x̂_i and t, and P(t ∈ P_i|H_0) → 0 for i ≠ 0
P_FA → 0 (P_0 → R^r) | P(t ∈ P_i|H_0) → 0 for i ≠ 0
Despite Theorem 1, one might still be willing to work with S^E_α(x), due to its lower computational burden compared to S^I_α(x), if the difference between P(x̄ ∈ S^E_α(x)|H_0) and (1 − α) is not deemed significant for the requirements of the application at hand. The computation of the exact confidence level P(x̄ ∈ S^E_α(x)|H_0) requires integrating f_x̄(ξ|H_0, x) over S^E_α(x), which is not trivial due to the complexity of the integrand. The following result however presents an easy-to-compute upper bound for the difference between the actual and desired confidence levels.

Theorem 2 (Upper bound for the actual and desired confidence levels difference) The difference between the desired confidence level (1 − α) and the actual confidence level P(x̄ ∈ S^E_α(x)|H_0) can be bounded from above as

(1 − α) − P(x̄ ∈ S^E_α(x)|H_0) ≤ (1 − α) P_FA   (22)

Proof See 'Appendix.'

With the above easy-to-compute upper bound, one can decide whether or not to use the region S^E_α(x) as the 100(1 − α)% confidence region.
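One simple bound follows directly from the decomposition (10): since x̄|CA = x̂_0, we have P(x̄ ∈ S^E_α(x)|H_0) ≥ P_CA (1 − α), so the coverage shortfall can never exceed (1 − α) P_FA. The sketch below verifies this by simulation for the 1-D binary case; all parameter values are assumed for illustration:

```python
import random, statistics

rng = random.Random(3)
sigma_x0, sigma_t, L_a, p_fa, alpha = 0.5, 1.0, 0.8, 0.10, 0.10
k = sigma_t * statistics.NormalDist().inv_cdf(1 - p_fa / 2)      # boundary of P0
half = sigma_x0 * statistics.NormalDist().inv_cdf(1 - alpha / 2)  # S^E half-width

n, inside = 200_000, 0
for _ in range(n):
    x0 = rng.gauss(0.0, sigma_x0)
    t = rng.gauss(0.0, sigma_t)
    xbar = x0 if abs(t) <= k else x0 - L_a * t    # DIA-estimator under H0 (x = 0)
    if abs(xbar) <= half:
        inside += 1

shortfall = (1 - alpha) - inside / n              # (1-alpha) - P(xbar in S^E | H0)
print(0.0 <= shortfall <= (1 - alpha) * p_fa)     # True: the bound is respected
```

For this configuration the shortfall is close to its bound, i.e., nearly all false-alarm outcomes fall outside the estimation-only region.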
Example 2 Here we graphically show, with the PDFs of a simple 1-dimensional example having one alternative hypothesis (κ = 1), the effect of each of the factors in Table 2 on the difference between P(x̄ ∈ S^E_α(x)|H_0) and (1 − α). Suppose that the H_0-model in (1) contains only one unknown parameter (n = 1) and that its redundancy is one (r = 1), i.e., x ∈ R and t ∈ R. The canonical form of such a model, applying the Tienstra-transformation T to the normally distributed vector of observables y, reads (Teunissen 2018)

T y = [x̂_0, t]^T   (23)

where σ_{x̂_0} and σ_t are the standard deviations of x̂_0 and t, respectively. As r = 1, we can only consider one single alternative hypothesis, say H_a, and thus i ∈ {0, a}. If more than one alternative is considered, they will not be separable from each other. Under the null and alternative hypotheses, we have

H_0: x̂_0 ∼ N(x, σ²_{x̂_0}), t ∼ N(0, σ²_t);   H_a: x̂_0 ∼ N(x + L_a b_a, σ²_{x̂_0}), t ∼ N(b_a, σ²_t)   (24)

for some b_a ∈ R\{0} and L_a ∈ R (cf. 5). To test H_0 against H_a, we use the following misclosure space partitioning

P_0 = {τ ∈ R | |τ| ≤ σ_t √(χ²_{1−P_FA}(1, 0))}, P_a = R \ P_0   (25)

In the simple model as given in (23), there is a scalar parameter of interest x ∈ R, and thus S^E_α(x) in (15) reduces to the following interval

S^E_α(x) = [x − σ_{x̂_0} √(χ²_{1−α}(1, 0)), x + σ_{x̂_0} √(χ²_{1−α}(1, 0))]   (26)

Figure 5 [left] shows the graphs of f_x̄(ξ|H_0, x) and its components (10), i.e., f_{x̂_0}(ξ|H_0, x) and f_{x̄|FA}(ξ|FA, x), for different values of σ_{x̂_0}, σ_t, L_a and P_FA. Figure 5 [right] shows the corresponding graphs of the confidence level difference (1 − α) − P(x̄ ∈ S^E_α(x)|H_0) and its upper bound (22) as a function of CI = σ_{x̂_0} √(χ²_{1−α}(1, 0)) for 0.001 ≤ CI ≤ 10. In accordance with Table 2, it is observed that the difference between P(x̄ ∈ S^E_α(x)|H_0) and (1 − α) gets smaller for the following situations: σ_{x̂_0} ↑ (first vs. second row), σ_t ↓ (first vs. third row), L_a ↓ (first vs. fourth row) and P_FA ↓ (first vs. fifth row).
We note that increasing CI for a given σ_{x̂_0} is equivalent to decreasing α (cf. 26). As the left panels in Fig. 5 show, f_{x̂_0}(ξ|H_0, x), in blue, is more peaked than f_x̄(ξ|H_0, x), in magenta, around x, implying that 1 − α = P(x̂_0 ∈ S^E_α(x)|H_0) increases more rapidly than P(x̄ ∈ S^E_α(x)|H_0) as CI increases from 0.001. In addition, we have

lim_{CI→0} [(1 − α) − P(x̄ ∈ S^E_α(x)|H_0)] = lim_{CI→∞} [(1 − α) − P(x̄ ∈ S^E_α(x)|H_0)] = 0   (27)

Therefore, as the actual confidence level P(x̄ ∈ S^E_α(x)|H_0) and the desired confidence level (1 − α) are continuous functions of CI, the difference (1 − α) − P(x̄ ∈ S^E_α(x)|H_0) will be an increasing and then decreasing function of CI. This behavior can clearly be recognized in all the panels.

Estimation + testing confidence region: Approach II
In this section, we take Approach II to construct the confidence region by using the contours of the PDF f_x̄(ξ|H_0, x). For a given H_0-model and a misclosure space partitioning, the value of this PDF depends on ξ and x. We make use of the concept of highest density regions and define the 100(1 − α)% confidence region as the one which has the smallest volume among all the regions which cover the probability (1 − α) for the DIA-estimator. Such a region can be shown to take the following form (Hyndman 1996; Teunissen 2007)

S^II_α(x) = {ξ ∈ R^n | f_x̄(ξ|H_0, x) ≥ s^II_α}   (28)

where s^II_α is chosen such that

P(x̄ ∈ S^II_α(x)|H_0) = 1 − α   (29)

The smaller the α, the smaller the s^II_α. The confidence region in (28) is driven by the contours of the DIA-estimator's PDF. Therefore, Approach II is more informative than Approach I since it shows through the shape of the confidence region where in space the solution is 'weak' and where 'strong'. In case x̄ has an elliptically contoured distribution, e.g., normal distribution, the confidence region S^II_α(x) based on Approach II (28) would be identical to the confidence region S^I_α(x) based on Approach I (cf. 17).

Special case (detection only): Let the integrated estimation + testing procedure provide x̂_0 as the estimate of x if H_0 is accepted, while declaring the solution for x to be unavailable if H_0 is rejected. Hence, in this case the testing is confined to detection only and no identification is attempted. The DIA-estimator for this case is given by

x̄ = x̂_0 if t ∈ P_0; unavailable otherwise   (30)

To determine the confidence in x̄, one should note that the x̄ solution is only available when t ∈ P_0. Thus we compute the confidence region conditioned on t ∈ P_0. As x̄|(t ∈ P_0) = x̂_0|(t ∈ P_0) and x̂_0 is normally distributed and independent of t, i.e., x̂_0|(t ∈ P_0) = x̂_0, the confidence regions (17) and (28) should be formed based on the normal PDF of x̂_0. Thus, these two confidence regions become identical and the same as S^E_α(x) in (15).
Therefore, if the testing procedure is confined to validating the null hypothesis only, there would be no difference between the estimation-only and estimation + testing confidence regions under H 0 , for a given α.
Determining the confidence region (28) requires the computation of s^II_α, which can be done using (29). It was already shown in Lemma 2 that the PDF f_x̄(ξ|H_0, x) is translational invariant. Therefore, the probability in (29) will become independent of the unknown x if S^II_α(x) satisfies condition (i) in Lemma 1, i.e., S^II_α(x) should be translational invariant. We have the following result.

Theorem 3 The confidence region S^II_α(x) in (28) is translational invariant, i.e., S^II_α(x) = x + S^II_α(0).

Proof See 'Appendix.'
As the above theorem shows, the confidence region (28) inherits the translational-invariance property from that of the PDF f_x̄(ξ|H_0, x). Therefore, the probability in (29) is independent of the unknown x.
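In practice, s^II_α in (29) can be approximated with the density-quantile rule of Hyndman (1996): s^II_α is the α-quantile of the density values f_x̄ evaluated at draws of x̄ itself. A 1-D sketch under H_0 with assumed parameter values; the PDF is evaluated via the decomposition (10), with a simple midpoint-rule integral over the rejection region:

```python
import math, random, statistics

sigma_x0, sigma_t, L_a, p_fa, alpha = 0.25, 1.0, 1.5, 0.10, 0.05
k = sigma_t * statistics.NormalDist().inv_cdf(1 - p_fa / 2)   # P0 = [-k, k]

def phi(u, s):
    return math.exp(-0.5 * (u / s) ** 2) / (s * math.sqrt(2.0 * math.pi))

def f_xbar(xi, grid=400, lim=8.0):
    """PDF of the 1-D DIA-estimator under H0 (x = 0): correct-acceptance part
    plus a midpoint-rule integral over the rejection region |tau| > k."""
    total = (1.0 - p_fa) * phi(xi, sigma_x0)      # P_CA * f_x0 (xbar|CA = x0)
    h = 2.0 * lim / grid
    for j in range(grid):
        tau = -lim + (j + 0.5) * h
        if abs(tau) > k:                          # t outside P0: xbar = x0 - L_a t
            total += phi(xi + L_a * tau, sigma_x0) * phi(tau, sigma_t) * h
    return total

# Density-quantile rule: s^II_alpha = alpha-quantile of f_xbar at xbar-draws
rng = random.Random(11)
dens = []
for _ in range(2000):
    x0, t = rng.gauss(0.0, sigma_x0), rng.gauss(0.0, sigma_t)
    dens.append(f_xbar(x0 if abs(t) <= k else x0 - L_a * t))
dens.sort()
s_II = dens[int(alpha * len(dens))]               # threshold for the HDR (28)
print(0.0 < s_II < f_xbar(0.0))                   # True
```

The multimodality that produces disconnected regions shows up directly in the computed density values: the density near the minor modes exceeds that in the valleys between them, so the set {ξ | f_x̄(ξ) ≥ s^II_α} can split into disjoint pieces for small enough s^II_α.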
Example 3 (Continuation of Example 1) We now add, in parallel to the three angle measurements, one length ratio measurement (Baarda 1967) taken at point 1, thus increasing the total number of measurements by one to m = 4. In the following, we use ρ_ij to denote the observed-minus-computed pseudo-distance measurement made from point i to point j; the length ratio at point i with respect to points j and k then follows as r_jik = ρ_ik / l_ik − ρ_ij / l_ij. As the total number of unknowns is n = 2, the redundancy is r = 2. The length ratio measurement is assumed to be normally distributed and uncorrelated with the angle measurements.
As alternative hypotheses, we consider those describing outliers in individual observations. Here we restrict ourselves to the case of one outlier at a time. In that case, there are as many alternative hypotheses as there are observations, i.e., $\kappa = m = 4$, with $C_i = c_i \in \mathbb{R}^{4}$, for $i=1,\ldots,4$, where the $c_i$-vector takes the form of a canonical unit vector having 1 as $i$th entry and zeros elsewhere. To test the hypotheses at hand, we first use the overall model test to check the validity of $\mathcal{H}_0$ and then employ datasnooping to screen each individual observation for the presence of an outlier (Baarda 1968; Kok 1984). The corresponding misclosure space partitioning is given by

$$\mathcal{P}_0 = \left\{ t \in \mathbb{R}^{r} \;\middle|\; \|t\|^{2}_{Q_{tt}} \leq \chi^{2}_{1-P_{FA}}(r,0) \right\}, \qquad \mathcal{P}_i = \left\{ t \in \mathbb{R}^{r}\setminus\mathcal{P}_0 \;\middle|\; |w_i| = \max_{j\in\{1,\ldots,4\}} |w_j| \right\},\quad i=1,\ldots,4,$$

where $w_i$ is Baarda's w-test statistic computed as (Baarda 1967; Teunissen 2006)

$$w_i = \frac{c_i^{T} Q_{yy}^{-1}\, \hat{e}_0}{\sqrt{c_i^{T} Q_{yy}^{-1} Q_{\hat{e}_0\hat{e}_0} Q_{yy}^{-1} c_i}},$$

with $\hat{e}_0$ the least-squares residual vector under $\mathcal{H}_0$.

Fig. 6 Green area: the region $S^{\mathrm{II}}_{\alpha}(x)$ of (28); red ellipse: boundary of (17); blue ellipse: boundary of (15)

Figure 6 shows the three confidence regions in (15), (17) and (28), see Table 1. The green area shows the region $S^{\mathrm{II}}_{\alpha}(x)$, while the blue and red ellipses show the boundaries of the regions $S^{E}_{\alpha}(x)$ and $S^{\mathrm{I}}_{\alpha}(x)$, respectively. The results are illustrated for $\alpha = 0.1$, $0.05$, $0.02$ and $0.01$, assuming that the standard deviations of the direction and pseudo-distance measurements are 5 seconds of arc and 3 mm, respectively, and $P_{FA} = 10^{-1}$. According to (20), the blue ellipse does not cover the probability $(1-\alpha)$ for the DIA-estimator. Figure 7 elaborates more on this by showing positive values for the difference $(1-\alpha) - P(\bar{x} \in S^{E}_{\alpha}(x)|\mathcal{H}_0)$ as a function of $\alpha$. Thus the blue ellipse should be made larger in order to contain the required probability of $(1-\alpha)$. This explains why the red ellipse always encompasses the blue one.
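For concreteness, the detection-plus-identification step described above can be sketched as follows. The design matrix and covariance below are an invented toy model, not the triangle network of this example; only the structure (overall model test for detection, largest $|w_i|$ for identification) mirrors the partitioning just given.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 4, 2
A = rng.normal(size=(m, n))                # toy design matrix (assumption)
Qyy = np.eye(m)                            # toy observation covariance (assumption)
P_FA = 0.1
k_alpha = -2.0 * np.log(P_FA)              # chi^2_{1-P_FA}(r, 0) for r = m - n = 2

Qinv = np.linalg.inv(Qyy)
P = A @ np.linalg.solve(A.T @ Qinv @ A, A.T @ Qinv)   # least-squares projector
Qe = (np.eye(m) - P) @ Qyy                 # Q_e0e0 of the H0 residual

def identify(y):
    e0 = y - P @ y                         # least-squares residual under H0
    if e0 @ Qinv @ e0 <= k_alpha:          # overall model test
        return 0                           # t in P0: accept H0
    w = np.empty(m)
    for i in range(m):                     # Baarda w-test for each observation
        c = np.zeros(m); c[i] = 1.0
        w[i] = (c @ Qinv @ e0) / np.sqrt(c @ Qinv @ Qe @ Qinv @ c)
    return 1 + int(np.argmax(np.abs(w)))   # t in P_i of the largest |w_i|

y = A @ np.array([1.0, 2.0]) + rng.normal(size=m)
print(identify(y))
```

An outlier-free observation vector lands in $\mathcal{P}_0$ (return value 0), while a sufficiently large outlier triggers rejection of $\mathcal{H}_0$ and identification of one of the four alternatives.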
Adding the length ratio measurement from point 1 to the previous angular measurements increases the sensitivity of the model to the movement of point 3 along the triangle side connecting point 1 to point 3. Therefore, the ellipse shown in Fig. 4 would shrink, and its orientation would change depending on the length ratio precision with respect to the angular measurements. In the current example, the length ratio measurement is around two times more precise than the angle measurements. In the extreme case, if the length ratio were known with zero standard deviation, the confidence ellipse would have its minor axis parallel to the triangle side connecting point 1 to point 3. As this length ratio measurement gets less precise, the confidence ellipse's minor axis moves further toward the East direction. This explains the shape and orientation of the ellipses in Fig. 6 as compared to those of the ellipse shown in Fig. 4.

It is observed in Fig. 6 that the green region $S^{\mathrm{II}}_{\alpha}(x)$ can significantly differ in shape from the ellipsoidal confidence region which is conventionally used. Depending on the $\alpha$ value, $S^{\mathrm{II}}_{\alpha}(x)$ can be a non-convex and even disconnected region, which can be explained by the shape of the PDF of $\bar{x}$.

Fig. 8 Contour maps of the PDF of $\bar{x}$ under $\mathcal{H}_0$ and its constituent components for the model specified in Example 3. The range of PDF values of each colored region is specified in the color bar on the right

Figure 8 (top-left) shows the contour plot of this PDF, i.e., $f_{\bar{x}}(\xi|\mathcal{H}_0,x)$, which is a multi-modal distribution with three modes (peaks); one major mode at $x = [x_E, x_N]^{T}$ with a PDF value larger than 2000, and two minor modes, approximately North and South of $x = [x_E, x_N]^{T}$, with PDF values between 20 and 50. According to the contour plot of $f_{\bar{x}}(\xi|\mathcal{H}_0,x)$, for $\alpha$ values corresponding with $s^{\mathrm{II}}_{\alpha} > 50$, the confidence region (28) is a single contiguous area.
As $s^{\mathrm{II}}_{\alpha}$ gets smaller than 50, the two minor modes of $f_{\bar{x}}(\xi|\mathcal{H}_0,x)$ start contributing to the construction of the confidence region. As a result, (28) becomes a non-convex region which, for some values of $s^{\mathrm{II}}_{\alpha}$ within the range $10 < s^{\mathrm{II}}_{\alpha} < 50$, can even consist of three disconnected sub-regions.
The multi-modality of $f_{\bar{x}}(\xi|\mathcal{H}_0,x)$ can also explain why the red ellipse is significantly larger than the blue one, i.e., why $s^{\mathrm{I}}_{\alpha}$ is significantly larger than $\chi^{2}_{1-\alpha}(2,0)$, for small values of $\alpha$. Although the majority of the probability mass of the DIA-estimator under $\mathcal{H}_0$ is located around $x$, some probability mass can still be located far from $x$, around the two minor modes of $f_{\bar{x}}(\xi|\mathcal{H}_0,x)$. Therefore, in order for the red ellipse to cover large probabilities, equivalent to having small $\alpha$ values, one needs to set large values for $s^{\mathrm{I}}_{\alpha}$ such that the probability mass around the minor modes is also captured by the red ellipse.
The shape of $f_{\bar{x}}(\xi|\mathcal{H}_0,x)$ is driven by its constituent components (7), i.e., $f_{\hat{x}_0}(\xi|\mathcal{H}_0,x)$ and $f_{\bar{x}|t\in\mathcal{P}_i}(\xi|t\in\mathcal{P}_i,\mathcal{H}_0,x)$ for $i=1,\ldots,4$, which are all illustrated in Fig. 8. The contribution of $f_{\hat{x}_0}(\xi|\mathcal{H}_0,x)$ and $f_{\bar{x}|t\in\mathcal{P}_i}(\xi|t\in\mathcal{P}_i,\mathcal{H}_0,x)$, for $i=1,\ldots,4$, to the PDF of $\bar{x}$ under $\mathcal{H}_0$ is driven by the respective probabilities $P(t\in\mathcal{P}_0|\mathcal{H}_0) = 1-P_{FA}$ and $P(t\in\mathcal{P}_i|\mathcal{H}_0)$. As $P_{FA} = 10^{-1}$ and $\sum_{i=1}^{4} P(t\in\mathcal{P}_i|\mathcal{H}_0) = P_{FA}$, the component $f_{\hat{x}_0}(\xi|\mathcal{H}_0,x)$ has the largest contribution. This explains the similarity of the shapes of the three contour lines 200, 1000 and 2000 (the three closest lines to $x = [x_E, x_N]^{T}$) between the top-left and top-middle panels. The multi-modality of $f_{\bar{x}}(\xi|\mathcal{H}_0,x)$ results from the same property of the conditional distribution $f_{\bar{x}|t\in\mathcal{P}_4}(\xi|t\in\mathcal{P}_4,\mathcal{H}_0,x)$ shown in the bottom-right panel.

Confidence level
Let $S_{\alpha}(x) \subset \mathbb{R}^{n}$ be a translational-invariant confidence region satisfying condition (i) in Lemma 1. The computation of the confidence level $P(\bar{x}\in S_{\alpha}(x)|\mathcal{H}_0)$ requires integrating the non-normal PDF $f_{\bar{x}}(\xi|\mathcal{H}_0,x)$ over the region $S_{\alpha}(x)$. This is not an easy task due to the complexity of the PDF of $\bar{x}$, see, e.g., Fig. 8. We therefore show how the confidence level can be computed by means of Monte Carlo simulation. In doing so, we make use of the fact that a probability can always be written as an expectation, and that an expectation can be approximated by taking the average of a sufficient number of samples from the distribution. The probability $P(\bar{x}\in S_{\alpha}(x)|\mathcal{H}_0)$ can be written as

$$P(\bar{x}\in S_{\alpha}(x)|\mathcal{H}_0) = \mathsf{E}\left(p_{S_{\alpha}(x)}(\bar{x})\right) \quad (34)$$

with the indicator function

$$p_{S_{\alpha}(x)}(\xi) = \begin{cases} 1 & \text{if } \xi \in S_{\alpha}(x) \\ 0 & \text{otherwise.} \end{cases}$$

Note, the above probability is independent of the unknown $x$, as $S_{\alpha}(x)$ and $f_{\bar{x}}(\xi|\mathcal{H}_0,x)$ satisfy conditions (i) and (ii) in Lemma 1, respectively. Therefore, one may choose for the unknown $x$ any $n$-vector for the computation of (34). Let $\bar{x}^{(s)} \in \mathbb{R}^{n}$, $s=1,\ldots,N$, be sample vectors independently drawn from the distribution $f_{\bar{x}}(\xi|\mathcal{H}_0,x)$. Then, for sufficiently large $N$, determined by the requirements of the application at hand, the confidence level $P(\bar{x}\in S_{\alpha}(x)|\mathcal{H}_0)$ can be well approximated by

$$\hat{P}(\bar{x}\in S_{\alpha}(x)|\mathcal{H}_0) = \frac{1}{N}\sum_{s=1}^{N} p_{S_{\alpha}(x)}(\bar{x}^{(s)}). \quad (36)$$

The simulation of $\bar{x}^{(s)}$ under $\mathcal{H}_0$ can be carried out through the following steps.
1. Simulation of $y^{(s)}$ under $\mathcal{H}_0$: To simulate a sample vector $y^{(s)} \in \mathbb{R}^{m}$ from $N(Ax, Q_{yy})$, we first use a random number generator to simulate a sample $u^{(s)} \in \mathbb{R}^{m}$ from the multivariate standard normal distribution $N(0, I_m)$. Then, we use the Cholesky factor $C^{T}$ of the Cholesky factorization $Q_{yy} = C^{T}C$ to transform $u^{(s)}$ to $C^{T}u^{(s)}$, which can now be considered a sample from $N(0, Q_{yy})$. Finally, we shift $C^{T}u^{(s)}$ to $y^{(s)} = C^{T}u^{(s)} + Ax$ to obtain the asked-for sample from $N(Ax, Q_{yy})$. Note that $A$, $Q_{yy}$ and $x$ are needed to generate $y^{(s)}$. The matrices $A$ and $Q_{yy}$ are known, as they define the assumed model.
Although the vector $x \in \mathbb{R}^{n}$ is not known, one can, fortunately, take any $n$-vector as choice for $x$.

2. Testing of $y^{(s)}$: Compute the misclosure sample $t^{(s)}$ from $y^{(s)}$ and determine to which region of the misclosure space partitioning it belongs, thereby selecting either $\mathcal{H}_0$ or one of the alternatives. If $\mathcal{H}_{i\neq 0}$ is selected, we first compute the corresponding sample of the bias-estimator $\hat{b}_i^{(s)}$ from $t^{(s)}$.

3. Computation of $\bar{x}^{(s)}$: If $t^{(s)} \in \mathcal{P}_0$, set $\bar{x}^{(s)} = \hat{x}_0^{(s)}$; otherwise, adapt the $\mathcal{H}_0$-based solution for the identified bias, i.e., $\bar{x}^{(s)} = \hat{x}_0^{(s)} - L_i\, t^{(s)}$ for $t^{(s)} \in \mathcal{P}_i$.
With the above three steps repeated $N$ times, one obtains the sample set $\bar{x}^{(s)}$, $s=1,\ldots,N$, as needed for the computation of (36).
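A minimal sketch of the three steps, under assumptions chosen for brevity: a toy 4-observation, 2-parameter model with $Q_{yy} = I_m$ (so the Cholesky transform of step 1 is trivial), and the adaptation of step 3 implemented by re-estimating under the identified one-outlier hypothesis, which for such alternatives coincides with the $\hat{x}_0 - L_i t$ correction. The sketch then evaluates the Monte Carlo confidence level (36) for the estimation-only 95% ellipse:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, N = 4, 2, 20_000
A = np.array([[1., 0.], [0., 1.], [1., 1.], [1., -1.]])   # toy design matrix
x_true = np.zeros(n)                   # any choice works (translational invariance)
P_FA = 0.1
k_alpha = -2.0 * np.log(P_FA)          # chi^2_{1-P_FA}(r, 0) for r = 2

P = A @ np.linalg.solve(A.T @ A, A.T)  # least-squares projector (Q_yy = I)
R = np.eye(m) - P                      # redundancy matrix; Q_e0e0 = R here

def dia_estimate(y):
    e0 = R @ y                                           # H0 residual
    if e0 @ e0 <= k_alpha:                               # overall model test
        return np.linalg.solve(A.T @ A, A.T @ y)         # accept: x_hat_0
    i = int(np.argmax(np.abs(e0) / np.sqrt(np.diag(R)))) # w-test identification
    Ai = np.column_stack([A, np.eye(m)[:, i]])           # adapt for outlier in obs i
    return np.linalg.solve(Ai.T @ Ai, Ai.T @ y)[:n]

# steps 1-3 repeated N times (step 1 needs no Cholesky transform since Q_yy = I)
xbar = np.array([dia_estimate(A @ x_true + rng.normal(size=m)) for _ in range(N)])

# confidence level (36) of the estimation-only 95% ellipse S_E
Qx0_inv = A.T @ A                      # inverse covariance of x_hat_0
d2 = np.einsum('si,ij,sj->s', xbar - x_true, Qx0_inv, xbar - x_true)
level = np.mean(d2 <= 5.991)           # chi^2_{0.95}(2, 0) = 5.991
print(round(level, 3))                 # falls short of 0.95, cf. (20)
```

Here the acceptance branch alone covers exactly 95% (the residual is independent of $\hat{x}_0$), so the whole shortfall comes from the conditionally more dispersed adapted samples.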

Confidence region: Approach I
In order to determine the confidence region $S^{\mathrm{I}}_{\alpha}(x)$ in (17) for a chosen value of $\alpha$, one needs to find the threshold value $s^{\mathrm{I}}_{\alpha}$. With (18), this threshold value can be computed by inverting

$$P\left(\|\bar{x}-x\|^{2}_{Q_{\hat{x}_0\hat{x}_0}} \leq s^{\mathrm{I}}_{\alpha} \,\middle|\, \mathcal{H}_0\right) = 1-\alpha,$$

which is not trivial due to the complexity of the PDF of $\bar{x}$. We therefore show how $s^{\mathrm{I}}_{\alpha}$ can be computed for a chosen value of $\alpha$ by means of Monte Carlo simulation and the density quantile approach (Hyndman 1996). We first define the random variable

$$\theta = -\|\bar{x}-x\|^{2}_{Q_{\hat{x}_0\hat{x}_0}},$$

so that $P(\theta \leq -s^{\mathrm{I}}_{\alpha}|\mathcal{H}_0) = \alpha$, implying that $-s^{\mathrm{I}}_{\alpha}$ is the $\alpha$ quantile of $\theta$. Therefore, $-s^{\mathrm{I}}_{\alpha}$ can be approximated as the $\alpha$ sample quantile of a set of independent samples from the distribution of $\theta$. Let $\bar{x}^{(s)} \in \mathbb{R}^{n}$, $s=1,\ldots,N$, be sample vectors independently drawn from the distribution $f_{\bar{x}}(\xi|\mathcal{H}_0,x)$. Then $\theta^{(s)} = -\|\bar{x}^{(s)}-x\|^{2}_{Q_{\hat{x}_0\hat{x}_0}} \in \mathbb{R}$, $s=1,\ldots,N$, are independent samples from the distribution of $\theta$ under $\mathcal{H}_0$. After sorting the samples in ascending order, the first sample with an index larger than $\alpha N$ gives an approximation of the $\alpha$ quantile of $\theta$ (Hyndman 1996). An approximation of $s^{\mathrm{I}}_{\alpha}$ is then obtained by changing the sign of the mentioned sample.
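The quantile recipe can be sketched as follows. The covariance matrix and the normal stand-in samples are assumptions for illustration only; in a real run the samples come from the DIA-estimator simulation above. With exactly normal samples, the computed $s^{\mathrm{I}}_{\alpha}$ should land near $\chi^2_{0.95}(2,0) \approx 5.99$:

```python
import numpy as np

rng = np.random.default_rng(3)
alpha, N = 0.05, 200_000
Qx0 = np.array([[2.0, 0.5], [0.5, 1.0]])   # invented Q_x0x0
Qinv = np.linalg.inv(Qx0)
x = np.zeros(2)                            # any n-vector may be chosen for x

# stand-in samples: exactly normal here; in a real run these are DIA samples
xbar = rng.multivariate_normal(x, Qx0, size=N)

theta = -np.einsum('si,ij,sj->s', xbar - x, Qinv, xbar - x)
theta.sort()                               # ascending order
s_I = -theta[int(np.ceil(alpha * N))]      # minus the alpha sample quantile
print(round(s_I, 2))
```

Because the stand-in samples are normal, the sample quantile reproduces the chi-square threshold; with genuine DIA samples the same code returns the larger value $s^{\mathrm{I}}_{\alpha}$ that accounts for the testing uncertainty.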

Confidence region: Approach II
In order to determine the confidence region $S^{\mathrm{II}}_{\alpha}(x)$ in (28) for a chosen value of $\alpha$, one first needs to find the threshold value $s^{\mathrm{II}}_{\alpha}$. With (29), this threshold value can be computed by inverting

$$\int_{\{\xi \,\mid\, f_{\bar{x}}(\xi|\mathcal{H}_0,x)\,\geq\, s^{\mathrm{II}}_{\alpha}\}} f_{\bar{x}}(\xi|\mathcal{H}_0,x)\,\mathrm{d}\xi = 1-\alpha,$$

which is again not trivial due to the complexity of the PDF of $\bar{x}$. We therefore show how $s^{\mathrm{II}}_{\alpha}$ can be computed for a chosen value of $\alpha$ by means of Monte Carlo simulation and the density quantile approach. We first define the random variable

$$\nu = f_{\bar{x}}(\bar{x}|\mathcal{H}_0,x). \quad (44)$$

Therefore $s^{\mathrm{II}}_{\alpha}$ should satisfy

$$P(\nu \leq s^{\mathrm{II}}_{\alpha}|\mathcal{H}_0) = \alpha,$$

implying that $s^{\mathrm{II}}_{\alpha}$ is the $\alpha$ quantile of $\nu$. Therefore, $s^{\mathrm{II}}_{\alpha}$ can be approximated as the $\alpha$ sample quantile of a set of independent samples from the distribution of $\nu$. Let $\bar{x}^{(s)} \in \mathbb{R}^{n}$, $s=1,\ldots,N$, be sample vectors independently drawn from the distribution $f_{\bar{x}}(\xi|\mathcal{H}_0,x)$. Then $\nu^{(s)} = f_{\bar{x}}(\bar{x}^{(s)}|\mathcal{H}_0,x)$, $s=1,\ldots,N$, are independent samples from the distribution of $\nu$ under $\mathcal{H}_0$. After sorting the samples in ascending order, the first sample with an index larger than $\alpha N$ gives an approximation of the $\alpha$ quantile of $\nu$, i.e., of $s^{\mathrm{II}}_{\alpha}$. To compute the samples of $\nu$ in (44), one needs to compute the PDF of $\bar{x}$, which, using (7) to (9), can be written as

$$f_{\bar{x}}(\xi|\mathcal{H}_0,x) = \sum_{i=0}^{\kappa} \int_{\mathcal{P}_i} f_{\hat{x}_0}(\xi + L_i\tau|\mathcal{H}_0,x)\, f_t(\tau|\mathcal{H}_0)\,\mathrm{d}\tau = \mathsf{E}\left(f_{\hat{x}_0}(\xi + L_i t|\mathcal{H}_0,x)\right),$$

where $L_0 = 0$ and the expectation in the last equality is with respect to $f_t(\tau|\mathcal{H}_0)$, with $i$ the index of the region $\mathcal{P}_i$ containing $t$. Given the last equality, we compute $f_{\bar{x}}(\xi|\mathcal{H}_0,x)$ using Monte Carlo simulation. Let $t^{(l)} \in \mathbb{R}^{r}$, $l=1,\ldots,M$, be sample vectors independently drawn from the distribution $f_t(\tau|\mathcal{H}_0)$. Then, for sufficiently large $M$, determined by the requirements of the application at hand, the value of $f_{\bar{x}}(\xi|\mathcal{H}_0,x)$ can be well approximated by

$$\hat{f}_{\bar{x}}(\xi|\mathcal{H}_0,x) = \frac{1}{M}\sum_{l=1}^{M} f_{\hat{x}_0}\left(\xi + L_{i_l} t^{(l)}\,\middle|\,\mathcal{H}_0,x\right), \quad (47)$$

where $i_l$ denotes the index of the region $\mathcal{P}_{i_l}$ containing $t^{(l)}$. The procedure of finding an approximation of $s^{\mathrm{II}}_{\alpha}$ for a given $\alpha$, explained above, can be summarized in the following steps.
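A compact sketch of this density-quantile computation for an invented one-parameter model: $\hat{x}_0 \sim N(x, q)$ independent of a scalar misclosure $t \sim N(0, q_t)$; the detection threshold $k$ and the values $L_1, L_2$ are assumptions, with $L_0 = 0$ as in the text.

```python
import numpy as np

rng = np.random.default_rng(4)
alpha, N, M = 0.05, 5_000, 5_000
x, q, qt, k = 0.0, 1.0, 1.0, 1.64          # k gives ~10% false alarm for |t| > k
L = {0: 0.0, 1: 0.8, 2: -0.8}              # L_0 = 0; L_1, L_2 invented

def region(t):
    # misclosure partitioning: P0 = {|t| <= k}, P1 = {t > k}, P2 = {t < -k}
    return 0 if abs(t) <= k else (1 if t > k else 2)

def npdf(u, mu, var):
    return np.exp(-0.5 * (u - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

# samples of the DIA-estimator: xbar = x_hat_0 - L_i * t for t in P_i
x0 = rng.normal(x, np.sqrt(q), N)
t = rng.normal(0.0, np.sqrt(qt), N)
xbar = x0 - np.array([L[region(ti)] for ti in t]) * t

# Monte Carlo approximation (47) of the DIA PDF, from fresh misclosure samples
tl = rng.normal(0.0, np.sqrt(qt), M)
Ll = np.array([L[region(ti)] for ti in tl])

def f_xbar(xi):
    return np.mean(npdf(xi + Ll * tl, x, q))

nu = np.sort([f_xbar(v) for v in xbar])     # samples of nu, ascending
s_II = nu[int(np.ceil(alpha * N))]          # alpha sample quantile = s^II_alpha
print(round(float(s_II), 4))
```

The region $\{\xi : \hat{f}_{\bar{x}}(\xi) \geq s^{\mathrm{II}}_{\alpha}\}$ then covers approximately $(1-\alpha)$ of the simulated DIA samples by construction.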

Summary and conclusions
In this contribution, a critical appraisal was provided of the computation and evaluation of confidence regions and standard ellipses. We made the case that, in the majority of our geodetic applications, the customary confidence regions do not truly reflect the confidence one can have in one's produced estimators. We have shown that the common practice of combining parameter estimation with hypothesis testing necessitates that the uncertainties of both estimation and testing be taken into account when constructing and evaluating confidence regions. We provided theory and computational procedures on how to compute and construct such confidence regions and associated confidence levels. In doing so, we made use of the concept of the DIA-estimator and focused on the designing phase prior to when the actual measurements are collected, thereby naturally assuming that the null hypothesis $\mathcal{H}_0$ is valid.
To construct the confidence region for the parameters of interest, two different approaches were discussed: Approach I, in which the region's shape is user-fixed and only its size is determined by the estimator's distribution; and Approach II, in which both the size and shape are simultaneously determined by the estimator's distribution. It was hereby emphasized that, irrespective of which approach is taken, the properties of the confidence region are determined by the probabilistic properties of the estimator.
In using Approach I, we considered the geodetically common choice of an ellipsoidal region and then determined its size by the non-normal PDF of the DIA-estimator. We proved and demonstrated that if one neglects the testing contribution and assumes the estimator resulting from combined estimation + testing to be normally distributed, the resulting confidence region of Approach I actually covers a smaller probability than one presumes with the preset confidence level. We also provided an easy-to-compute upper bound for the difference between the actual and preset confidence level.
To form the confidence region of Approach II, we used the concept of highest density regions and defined the confidence region as the one which has the smallest volume among all regions that cover the same probability. The shape and size of such a confidence region are then driven by the contours of the DIA-estimator's PDF. It was demonstrated that the confidence region of Approach II can significantly differ in shape from the ellipsoidal confidence region which is conventionally used. It can be a non-convex and even disconnected region, depending on the preset confidence level. However, in the very special case when one opts not to include identification, but has the testing procedure confined to detection only, there is no difference between the customary confidence region and those of Approach I and Approach II.
We presented the numerical algorithms for computing the confidence level of the DIA-estimator corresponding with any translational-invariant confidence region, as well as the numerical algorithms for constructing confidence regions based on both Approach I and Approach II. These are summarized in Fig. 9. Although Approach II confidence regions are more informative than those of Approach I, they demand a higher computational complexity. The trade-off between informativity and complexity in selecting between these approaches is determined by the application at hand.
We remark that although our analyses were presented assuming a normal distribution for the measurements, our confidence region/level computation algorithms in Sect. 6 and Fig. 9 can be applied for any arbitrary distribution of the measurements. In that case, $\hat{x}_0$ and $t$ are no longer independent, and for Approach II use needs to be made of a more generalized form of (47) as

$$\hat{f}_{\bar{x}}(\xi|\mathcal{H}_0,x) = \frac{1}{M}\sum_{l=1}^{M} f_{\hat{x}_0|t}\left(\xi + L_{i_l} t^{(l)}\,\middle|\, t^{(l)},\mathcal{H}_0,x\right),$$

in which the PDF of $\hat{x}_0$ is replaced by its conditional counterpart given $t = t^{(l)}$. Finally, in this study, attention was focused on the computation of truthful confidence regions/levels under the null hypothesis $\mathcal{H}_0$. A similar study under alternative hypotheses is the topic of future work.
showing that the probability in (51) is independent of $x$, thus proving Lemma 1. In (52), the first equality makes use of condition (i), the second equality is obtained by the change of variable $\xi \rightarrow \eta = \xi - \gamma$, and the last equality makes use of condition (ii).

Proof of Lemma 2 Combining (7) to (9) and defining $L_0 = 0$, one can write, for any arbitrary vector $\gamma \in \mathbb{R}^{n}$,

$$f_{\bar{x}}(\xi+\gamma|\mathcal{H}_0,x+\gamma) = \sum_{i=0}^{\kappa} \int_{\mathcal{P}_i} f_{\hat{x}_0}(\xi+\gamma+L_i\tau|\mathcal{H}_0,x+\gamma)\, f_t(\tau|\mathcal{H}_0)\,\mathrm{d}\tau = \sum_{i=0}^{\kappa} \int_{\mathcal{P}_i} f_{\hat{x}_0}(\xi+L_i\tau|\mathcal{H}_0,x)\, f_t(\tau|\mathcal{H}_0)\,\mathrm{d}\tau = f_{\bar{x}}(\xi|\mathcal{H}_0,x),$$

which proves Lemma 2. Note, the second equality is a consequence of the PDF of $\hat{x}_0$ under $\mathcal{H}_0$ being translational invariant.

Proof of Theorem 2 Taking the integral of both sides of (10) over $S^{E}_{\alpha}(x)$ gives

$$P(\bar{x}\in S^{E}_{\alpha}(x)|\mathcal{H}_0) \geq (1-P_{FA})\, P(\hat{x}_0 \in S^{E}_{\alpha}(x)|\mathcal{H}_0) = (1-P_{FA})(1-\alpha).$$

Using the above lower bound and that $P_{FA} = 1-P_{CA}$, we arrive at (22).
Proof of Theorem 3 'if' part: with $f_{\bar{x}}(\xi|\mathcal{H}_0,x)$ being translational invariant, we have, for any $\gamma, x, \xi \in \mathbb{R}^{n}$,

$$f_{\bar{x}}(\xi|\mathcal{H}_0,x+\gamma) = f_{\bar{x}}(\xi-\gamma|\mathcal{H}_0,x),$$

from which we can conclude that $S^{\mathrm{II}}_{\alpha}(x+\gamma) = S^{\mathrm{II}}_{\alpha}(x)+\gamma$ for any $\gamma, x \in \mathbb{R}^{n}$. 'only if' part: let $S^{\mathrm{II}}_{\alpha}(x+\gamma) = S^{\mathrm{II}}_{\alpha}(x)+\gamma$ be valid for any $\gamma, x \in \mathbb{R}^{n}$ and $\alpha \geq 0$. Then, we have

$$\left\{\xi \mid f_{\bar{x}}(\xi|\mathcal{H}_0,x+\gamma) \geq s^{\mathrm{II}}_{\alpha}\right\} = \left\{\xi \mid f_{\bar{x}}(\xi-\gamma|\mathcal{H}_0,x) \geq s^{\mathrm{II}}_{\alpha}\right\}. \quad (57)$$

That (57) holds for all $s^{\mathrm{II}}_{\alpha}$ proves that the PDF $f_{\bar{x}}(\xi|\mathcal{H}_0,x)$ is translational invariant.