DIA-datasnooping and identifiability

In this contribution, we present and analyze datasnooping in the context of the DIA method. As the DIA method for the detection, identification and adaptation of mismodelling errors is concerned with estimation and testing, it is the combination of both that needs to be considered. This combination is rigorously captured by the DIA estimator. We discuss and analyze the DIA-datasnooping decision probabilities and the construction of the corresponding partitioning of misclosure space. We also investigate the circumstances under which two or more hypotheses are nonseparable in the identification step. By means of a theorem on the equivalence between the nonseparability of hypotheses and the inestimability of parameters, we demonstrate that one can forget about adapting the parameter vector for hypotheses that are nonseparable. However, as this concerns the complete vector and not necessarily functions of it, we also show that parameter functions may exist for which adaptation is still possible. It is shown what this adaptation looks like and how it changes the structure of the DIA estimator. To demonstrate the performance of the various elements of DIA-datasnooping, we apply the theory to some selected examples. We analyze how geometry changes in the measurement setup affect the testing procedure, by studying the corresponding partitioning of misclosure space, the decision probabilities and the minimal detectable and identifiable biases. The difference between these two minimal biases is highlighted by showing the difference between their corresponding contributing factors.
We also show that if two alternative hypotheses, say H_i and H_j, are nonseparable, the testing procedure may have different levels of sensitivity to H_i-biases compared to the same H_j-biases.


Introduction
The DIA method for the detection, identification and adaptation of mismodelling errors combines estimation with testing. This combination of estimation and testing can be rigorously captured in the DIA estimator as introduced in Teunissen (2017). The DIA method has already been widely employed in a variety of applications, such as the quality control of geodetic networks and the integrity monitoring of GNSS models, see, e.g., (DGCC 1982; Teunissen 1990; Salzmann 1995; Tiberius 1998; Perfetti 2006; Khodabandeh and Teunissen 2016; Zaminpardaz et al. 2015).
In this contribution, as an important example of multiple hypothesis testing, datasnooping (Baarda 1967, 1968; Teunissen 1985) is presented in the context of the DIA method. In doing so, we make use of the partitioning of misclosure space, based on which we discuss the datasnooping decision probabilities and the construction of the corresponding DIA estimator. Through this partitioning, the distribution of the misclosure vector can be used to determine the correct detection (CD) and correct identification (CI) probabilities of each of the alternative hypotheses, as well as their corresponding minimal biases, the minimal detectable bias (MDB) and the minimal identifiable bias (MIB). We highlight their difference by showing the difference between their corresponding contributing factors. We also investigate the circumstances under which two or more hypotheses are nonseparable and discuss the relevant corrective actions, including 'remeasurement', 'adaptation' or stating that the solution is 'unavailable'. Of these, the adaptation step is the most involved and will be discussed in more detail.
This contribution is structured as follows. In Sect. 2, we briefly review the DIA method, describe the steps of DIA-datasnooping and define its corresponding DIA estimator. We hereby highlight the role played by the chosen partitioning of misclosure space. In Sect. 3, the decision probabilities of DIA-datasnooping are discussed, whereby a distinction is made between the following events: correct acceptance (CA), false alarm (FA), correct/missed detection and correct/wrong identification. It is hereby highlighted that the MDB provides information about correct detection and not about correct identification. Namely, a high probability of correct detection does not necessarily imply a high probability of correct identification, unless one is dealing with the special case of having only one single alternative hypothesis.
As identification of hypotheses becomes problematic if the misclosure vector has the same distribution under different hypotheses, we study its consequences for the identification and adaptation steps in Sect. 4. We discuss the corrective actions one can choose from in terms of 'remeasurement', 'adaptation' or stating that the solution is 'unavailable'. Of these, the adaptation step is the most involved. By means of a theorem on the equivalence between the nonseparability of hypotheses and the inestimability of parameters, we demonstrate that one can forget about adapting the complete vector of unknowns for hypotheses that are nonseparable. However, it is demonstrated that there may exist parameter functions for which adaptation is still possible. It is shown what this adaptation looks like and how it changes the structure of the DIA estimator.
To illustrate and explain the performance of the various elements of DIA-datasnooping, the theory is applied to selected examples in Sect. 5. The following three different cases are treated: height-difference observations of a leveling network, distance measurements of a horizontal geodetic network and pseudorange measurements between a single ground station and GPS satellites. We analyze how geometry changes in the measurement setup affect the testing procedure, including its partitioning of the misclosure space and the corresponding CD probabilities (MDB) and CI probabilities (MIB). We also demonstrate that for a given bias-to-noise ratio and false alarm probability, the ordering of the CD probabilities of the alternative hypotheses is not necessarily the same as that of their CI probabilities. It is also shown that if two alternative hypotheses, say H_i and H_j, are not distinguishable, the testing procedure may have different levels of sensitivity to H_i-biases compared to the same H_j-biases. Finally, a summary and conclusions are given in Sect. 6.

DIA in brief
We first formulate the null and alternative hypotheses, denoted as H_0 and H_i, respectively. Let the observational model under the null hypothesis be given as

H_0 :  E(y) = A x ,  D(y) = Q_yy    (1)

with E(.) the expectation operator, D(.) the dispersion operator, y ∈ R^m the normally distributed random vector of observables linked to the estimable unknown parameters x ∈ R^n through the design matrix A ∈ R^{m×n} of rank(A) = n, and Q_yy ∈ R^{m×m} the positive-definite variance-covariance matrix of y. The redundancy of H_0 is r = m − rank(A) = m − n. The validity of the null hypothesis can be violated if the functional model and/or the stochastic model is misspecified.
Here we assume that a misspecification is restricted to an underparametrization of the mean of y, which is the most common error that occurs when formulating the model. Thus, the alternative hypothesis H_i is formulated as

H_i :  E(y) = A x + C_i b_i ,  D(y) = Q_yy    (2)

where C_i is a known matrix of full rank with rank([A C_i]) < m, and b_i the corresponding unknown bias vector. C_i and b_i will further be specified in detail in Sect. 2.2. The best linear unbiased estimator (BLUE) of x under H_0 and H_i is, respectively, denoted by x̂_0 and x̂_i and given as

x̂_0 = (A^T Q_yy^{-1} A)^{-1} A^T Q_yy^{-1} y    (3)
x̂_i = (Ā_i^T Q_yy^{-1} Ā_i)^{-1} Ā_i^T Q_yy^{-1} y ,  Ā_i = P_{C_i}^⊥ A    (4)

with P_{C_i}^⊥ = I_m − C_i (C_i^T Q_yy^{-1} C_i)^{-1} C_i^T Q_yy^{-1} being the orthogonal projector that projects, along the range space of C_i, onto the Q_yy^{-1}-orthogonal complement of the range space of C_i.
As one often will have to consider more than one single alternative hypothesis, the statistical model validation of H 0 and k alternatives H i (i = 1, . . . , k) usually goes along the following three steps of detection, identification and adaptation (DIA) (Baarda 1968;Teunissen 1990).

Detection
The validity of the null hypothesis is checked by virtue of an overall model test, without the need of having to consider a particular set of alternative hypotheses. If H 0 is accepted,x 0 is provided as the estimate of x.

Identification
In case H_0 is rejected, a search is carried out among the specified alternative hypotheses H_i (i = 1, ..., k) with the purpose of pinpointing the potential source of model error. In doing so, two decisions can be made: either one of the alternative hypotheses, say H_i, is confidently identified, or none can be identified as such, in which case an 'undecided' decision is made.

Adaptation
In case H_i is confidently identified, it is chosen as the new null hypothesis. The H_0-based inferences are then accordingly corrected and x̂_i is provided as the estimate of x. However, in case the 'undecided' decision is made, the solution for x is declared 'unavailable'.
All the information that is needed for the above three steps is contained in the misclosure vector t ∈ R^r, given as

t = B^T y ,  Q_tt = B^T Q_yy B

where the m × r matrix B is a basis matrix of the null space of A^T (cf. 1), i.e., A^T B = 0 and rank(B) = r, and Q_tt is the variance matrix of t. Assuming that the observations are normally distributed, i.e., y ~ N_m(A x + C_i b_i, Q_yy) under H_i, the misclosure vector is distributed as

t ~ N_r(μ_{t_i}, Q_tt) ,  μ_{t_i} = B^T C_i b_i ,  μ_{t_0} = 0    (5)

As t is zero-mean under H_0 and also independent of x̂_0, it provides all the available information useful for the validation of H_0 (Teunissen 2017). Thus, an unambiguous testing procedure can be established through assigning the outcomes of t to the statistical hypotheses H_i for i = 0, 1, ..., k.
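As an aside, the construction of the misclosure vector can be sketched numerically. The following is a minimal illustration (not from the paper), assuming a toy model with m = 3 repeated observations of one unknown; the numbers are illustrative only:

```python
import numpy as np
from scipy.linalg import null_space

# Toy model: m = 3 repeated observations of one unknown (n = 1), so r = 2.
A = np.ones((3, 1))                      # design matrix (rank 1)
Q_yy = 0.01 * np.eye(3)                  # observation variance matrix
B = null_space(A.T)                      # basis of null(A^T): A^T B = 0, rank(B) = r

y = A @ np.array([5.0]) + np.array([0.01, -0.02, 0.005])  # noisy observations
t = B.T @ y                              # misclosure vector, zero-mean under H0
Q_tt = B.T @ Q_yy @ B                    # variance matrix of t
```

Note that t is invariant to the value of x, since B^T A x = 0; any other basis of the null space of A^T yields a misclosure vector related to this one by a one-to-one transformation.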

DIA-datasnooping
So far, no assumption was made about the structure of C_i in (2). As the problem of screening observations for possible outliers is an important example of multiple hypothesis testing (see, e.g., Baarda 1968; Van Mierlo 1980; Hawkins 1980; Teunissen 1985; Parkinson and Axelrad 1988; Sturza 1988; Van der Marel and Kosters 1990; Su et al. 2014), we will restrict our attention to this important case. We further assume that only one observation at a time is affected by an outlier. Thus, in (2), b_i is the scalar outlier and C_i takes the form of a canonical unit vector c_i ∈ R^m having 1 as its ith entry and zeros elsewhere. This leads to having as many alternative hypotheses as observations, i.e., k = m. This procedure of screening each individual observation for the presence of an outlier is known as datasnooping (Baarda 1968; Kok 1984). The corresponding DIA steps are specified as follows:

1. Detection Accept H_0 if

t ∈ P_0 = { t ∈ R^r | ||t||^2_{Q_tt} ≤ k_α }    (6)

in which ||·||^2_{Q_tt} = (·)^T Q_tt^{-1} (·) and k_α is the user-chosen α-percentage of the central Chi-square distribution with r degrees of freedom. If H_0 is accepted, then x̂_0 is provided as the estimate of x. Otherwise, go to step 2.

2. Identification Form Baarda's test statistic as (Baarda 1967; Teunissen 2000)

w_i = c_{t_i}^T Q_tt^{-1} t / sqrt( c_{t_i}^T Q_tt^{-1} c_{t_i} )    (7)

in which c_{t_i} = B^T c_i. Select H_{i≠0} if t lies in the region

P_{i≠0} = { t ∈ R^r \ P_0 | |w_i(t)| = max_{j∈{1,...,k}} |w_j(t)| }    (8)

3. Adaptation When H_i is identified, x̂_i is provided as the estimate of x.

Note, since t = B^T ê_0, with ê_0 = y − A x̂_0, that the above procedure can be formulated by means of the least-squares residual vector ê_0 as well, thus providing a perhaps more recognizable form of the testing procedure (Teunissen 2000). Also note that we assume the variance-covariance matrix Q_yy to be known. Variance-component estimation (Teunissen and Amiri-Simkooei 2008), with further modification of the partitioning of misclosure space, would need to be included in case of unknown variance components. In the simplest case of a single unknown variance of unit weight, the datasnooping partitioning gets determined by only the w_j statistics, which then have a studentized distribution instead of a standard normal one (Koch 1999; Teunissen 2000).
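The detection and identification steps above can be sketched as follows. This is a minimal numpy/scipy illustration under the stated assumptions (known Q_yy, one outlier at a time), not the authors' implementation; the function name and interface are our own:

```python
import numpy as np
from scipy.linalg import null_space
from scipy.stats import chi2

def dia_datasnooping(y, A, Q_yy, alpha=0.05):
    """One pass of the detection/identification steps of datasnooping.
    Returns ('H0', None) on acceptance, otherwise ('Hi', i) with i the
    index of the observation singled out by max |w_i|."""
    m, n = A.shape
    B = null_space(A.T)                  # A^T B = 0
    t = B.T @ y                          # misclosure vector
    Q_tt = B.T @ Q_yy @ B
    Qi = np.linalg.inv(Q_tt)
    # Detection: overall model test ||t||^2_Qtt against k_alpha
    if t @ Qi @ t <= chi2.ppf(1 - alpha, m - n):
        return 'H0', None
    # Identification: Baarda's w-test for each canonical unit vector c_i
    w = np.empty(m)
    for i in range(m):
        c_t = B.T[:, i]                  # c_t_i = B^T c_i
        w[i] = (c_t @ Qi @ t) / np.sqrt(c_t @ Qi @ c_t)
    return 'Hi', int(np.argmax(np.abs(w)))
```

For example, screening four repeated observations of one unknown with a large outlier injected into the third observation singles out that observation, while a bias-free sample leads to acceptance of H_0.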
Finally, note that the vector of misclosures t is not uniquely defined. This, however, does not affect the testing outcome, as both the detector ||t||^2_{Q_tt} and Baarda's test statistic w_i remain invariant for any one-to-one transformation of the misclosure vector. Therefore, instead of t, one can for instance also work with

t̄ = Q_tt^{-1/2} t    (9)

which, given (5), is distributed as t̄ ~ N_r(Q_tt^{-1/2} μ_{t_i}, I_r). The advantage of using t̄ over t lies in the ease of visualizing certain effects due to the identity-variance matrix of t̄. We will make use of this in Sect. 5. The partitioning corresponding with t̄ is then characterized through

P_0 = { t̄ ∈ R^r | ||t̄||^2 ≤ k_α }    (10)
P_{i≠0} = { t̄ ∈ R^r \ P_0 | |c̄_i^T t̄| = max_{j∈{1,...,k}} |c̄_j^T t̄| }    (11)

with c̄_i = Q_tt^{-1/2} c_{t_i} / ||c_{t_i}||_{Q_tt} being a unit vector and ||·||^2 = (·)^T (·). As such, P_0 contains the t̄'s inside and on a zero-centered sphere with the radius √k_α, whereas P_{i≠0} includes all t̄'s outside the mentioned sphere which, among the c̄_j for j = 1, ..., k, make the smallest angle with c̄_i. The border between P_{i≠0} and P_{j≠0} is then the locus of the vectors t̄ ∈ R^r \ P_0 which make the same angle with c̄_i and c̄_j. Therefore, the partitioning of R^r is driven by k_α and the relative orientation of the c̄_j, j = 1, ..., k, with respect to each other.
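In the transformed frame, assigning t̄ to a region of the partitioning reduces to a sphere test plus an angle comparison. A minimal sketch (our own illustration; the square root Q_tt^{-1/2} is realized here via a Cholesky factor, one of several admissible choices):

```python
import numpy as np
from scipy.stats import chi2

def assign_region(t, Q_tt, C_t, alpha=0.05):
    """Assign the misclosure vector to a region of the datasnooping
    partitioning via the unit-variance transform t_bar.
    C_t holds the vectors c_t_i = B^T c_i as its columns.
    Returns 0 for P_0, or i (1-based) for P_i."""
    L = np.linalg.cholesky(Q_tt)              # Q_tt = L L^T
    t_bar = np.linalg.solve(L, t)             # t_bar has identity variance
    r = len(t)
    if t_bar @ t_bar <= chi2.ppf(1 - alpha, r):
        return 0                              # inside the sphere: P_0
    C_bar = np.linalg.solve(L, C_t)
    C_bar /= np.linalg.norm(C_bar, axis=0)    # unit vectors c_bar_i
    # smallest (two-sided) angle with t_bar <=> largest |cosine|
    return 1 + int(np.argmax(np.abs(C_bar.T @ t_bar)))
```

The returned index is invariant to which admissible square root of Q_tt is used, in line with the invariance of the testing outcome noted above.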

DIA estimator
As the above three steps show, DIA-datasnooping combines estimation with testing. By using a canonical model formulation and a partitioning of misclosure space, a unifying framework to rigorously capture the probabilistic properties of this combination was presented in Teunissen (2017). It was there also shown how the combined estimation-testing scheme can be captured in one single DIA estimator. The DIA estimator is a function of the x̂_j (j = 0, 1, ..., k) and the misclosure vector t, and it is given as

x̄ = Σ_{j=0}^{k} p_j(t) x̂_j    (12)

with p_j(t) being the indicator function of region P_j, i.e., p_j(t) = 1 for t ∈ P_j and p_j(t) = 0 elsewhere. As x̄ is linear in the x̂_j, the DIA estimator of θ = F^T x with F ∈ R^{n×p} is given as

θ̄ = Σ_{j=0}^{k} p_j(t) θ̂_j    (13)

with θ̂_j = F^T x̂_j. For a general probabilistic evaluation of the DIA estimator, we refer to Teunissen (2017), but see also Teunissen et al. (2017). Here we note, however, that expressions (12) and (13) are only valid under the assumption that the set of regions P_i (i = 0, 1, ..., k) forms a partitioning of misclosure space, i.e., ∪_{i=0}^{k} P_i = R^r and P_i ∩ P_j = ∅ for any i ≠ j. Note that the second condition concerns the interior points of the distinct regions P_i. The regions P_i are allowed to have common boundaries, since we assume the probability of t lying on one of the boundaries to be zero. That the set of regions P_i (i = 0, 1, ..., k) forms a partitioning of misclosure space requires that the canonical unit vectors of the individual hypotheses satisfy certain conditions.

Lemma 1 (Datasnooping partitioning) The m + 1 regions P_i of (6) and (8) form a partitioning of misclosure space iff c_{t_i} ∦ c_{t_j} for any i ≠ j.
Proof See Appendix.
It will be clear that the conditions of the above lemma may not always be fulfilled. The question is then which strategy to follow to deal with such a situation. Should one decide for 'undecidedness' if c_{t_i} ∥ c_{t_j} for some i ≠ j, or should one re-measure all such involved observables, or would it still be possible to perform an adaptation? An answer to these questions is provided in Sect. 4, where we consider the more general case and do not restrict C_i to be the canonical unit vector c_i. First, however, we discuss the testing probabilities that are involved in the detection and identification steps.
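The combination of estimation and testing expressed by the DIA estimator (12) can be sketched for the datasnooping case as follows: the misclosure-based decision selects the region P_j, and the estimate of the correspondingly identified hypothesis is output, with adaptation implemented here by re-estimating under the design extended with c_i. This is our own illustrative sketch, not the authors' code:

```python
import numpy as np
from scipy.linalg import null_space
from scipy.stats import chi2

def blue(A, Q_yy, y):
    """Best linear unbiased estimator under design matrix A."""
    W = np.linalg.inv(Q_yy)
    return np.linalg.solve(A.T @ W @ A, A.T @ W @ y)

def dia_estimator(y, A, Q_yy, alpha=0.05):
    """DIA estimator x_bar = sum_j p_j(t) xhat_j for datasnooping:
    the region containing t selects which estimator is output."""
    m, n = A.shape
    B = null_space(A.T)
    t = B.T @ y
    Qi = np.linalg.inv(B.T @ Q_yy @ B)
    if t @ Qi @ t <= chi2.ppf(1 - alpha, m - n):
        return blue(A, Q_yy, y)                       # t in P_0: keep xhat_0
    w = np.array([(B.T[:, i] @ Qi @ t) / np.sqrt(B.T[:, i] @ Qi @ B.T[:, i])
                  for i in range(m)])
    i = int(np.argmax(np.abs(w)))                     # t in P_i: adapt
    c_i = np.zeros((m, 1))
    c_i[i, 0] = 1.0                                   # canonical unit vector
    return blue(np.hstack([A, c_i]), Q_yy, y)[:n]     # xhat_i (x-part only)
```

Because the output switches between the x̂_j depending on the random region assignment, the resulting estimator is nonlinear in y, which is why its probabilistic evaluation requires the framework of Teunissen (2017).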

The probabilities
As shown by (6), (7) and (8), the decisions of the testing procedure are driven by the outcome of the misclosure vector t. The probabilities of their occurrence depend on which of the hypotheses is true. If H_i is true, then the decision is correct if t ∈ P_i, and wrong if t ∈ P_{j≠i}. We therefore discriminate between the following events:

CA (correct acceptance): t ∈ P_0 under H_0
FA (false alarm): t ∉ P_0 under H_0
MD_i (missed detection): t ∈ P_0 under H_{i≠0}
CD_i (correct detection): t ∉ P_0 under H_{i≠0}
WI_i (wrong identification): t ∈ P_{j≠0,i} under H_{i≠0}
CI_i (correct identification): t ∈ P_i under H_{i≠0}

We denote the probability of event * by P_*, satisfying P_CA + P_FA = 1. Computation of P_* requires information about the misclosure's probability density function (PDF), which is given in (5). Here, it is important to note the difference between the CD and CI probabilities, i.e.,

P_CD_i ≥ P_CI_i    (15)

They would be the same only if there is one single alternative hypothesis, say H_i, since then P_i = R^r \ P_0. Analogous to the CD and CI probabilities, we have the concepts of the minimal detectable bias (MDB) (Baarda 1968) and the minimal identifiable bias (MIB) (Teunissen 2017). In the following, the difference between the MDB (P_CD_i) and the MIB (P_CI_i) is highlighted by showing the difference between their corresponding contributing factors.

Minimal detectable bias (MDB)
The MDB of the alternative hypothesis H_i is defined as the smallest value of |b_i| that can be detected given a certain CD probability. Therefore, the MDB is an indicator of the sensitivity of the detection step. Under H_{i≠0}, with the definition of P_0 in (6), the probability of correct detection reads

P_CD_i = P( ||t||^2_{Q_tt} > k_α | H_i )    (16)

The MDB of H_i can then be computed by inverting the above equation for a certain CD probability. With (5), ||t||^2_{Q_tt} is under H_i distributed as χ²(r, λ_i²), with the noncentrality parameter λ_i² = b_i² ||c_{t_i}||^2_{Q_tt}. For certain P_FA = α, P_CD_i = γ_CD and r, one can compute λ_i² = λ²(α, γ_CD, r) from the Chi-square distribution, and then the MDB is (Baarda 1968; Teunissen 2000)

|b_{i,MDB}| = λ(α, γ_CD, r) / ||c_{t_i}||_{Q_tt}    (17)

which shows that for a given set {α, γ_CD, r}, the MDB depends on ||c_{t_i}||_{Q_tt}. One can compare the MDBs of different alternative hypotheses for a given set {α, γ_CD, r}, which provides information on how sensitive the rejection of H_0 is to biases of the size of the |b_{i,MDB}|'s. The smaller the MDB |b_{i,MDB}| is, the more sensitive is the rejection of H_0.
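The inversion of (16) for the MDB can be sketched with scipy's noncentral Chi-square distribution; the function below is our own illustration of (17), solving for the noncentrality λ² by root finding:

```python
import numpy as np
from scipy.stats import chi2, ncx2
from scipy.optimize import brentq

def mdb(alpha, gamma_cd, r, c_t, Q_tt):
    """Minimal detectable bias: invert the noncentral Chi-square CD
    probability for the noncentrality lambda^2, then divide lambda by
    the Q_tt-norm of c_t_i."""
    k_alpha = chi2.ppf(1 - alpha, r)
    # find lambda^2 such that P(chi2(r, lambda^2) > k_alpha) = gamma_cd
    lam2 = brentq(lambda x: ncx2.sf(k_alpha, r, x) - gamma_cd, 1e-12, 1e4)
    norm_ct = np.sqrt(c_t @ np.linalg.solve(Q_tt, c_t))
    return np.sqrt(lam2) / norm_ct
```

As a sanity check, for r = 1, α = 0.05, γ_CD = 0.80 and ||c_{t_i}||_{Q_tt} = 1, the result reproduces the normal-theory value z_{1−α/2} + z_{γ_CD} ≈ 1.96 + 0.84 = 2.80.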

Minimal identifiable bias (MIB)
It is important to realize that the MDB provides information about correct detection and not correct identification. A high probability of correct detection does therefore not necessarily imply a high probability of correct identification (cf. 15), unless we have the special case of only a single alternative hypothesis. In case of multiple hypotheses, one can define the MIB of the alternative hypothesis H_i as the smallest value of |b_i| that can be identified given a certain CI probability. It is an indicator of the sensitivity of the identification step. The MIB, denoted by |b_{i,MIB}|, can be computed through inverting

P_CI_i = P( t ∈ P_i | H_i )    (18)

for a given CI probability. The above probability is an r-fold integral over the complex region P_i. Thus, the inversion of (18) is not as trivial as that of (16). The MIB then needs to be computed through numerical simulations, see, e.g., Teunissen (2017, p. 73) and Robert and Casella (2013). From P_CD_i ≥ P_CI_i, one can infer that |b_{i,MDB}| ≤ |b_{i,MIB}| given P_CI_i = γ_CD. The identification of mismodeling errors is thus more difficult than their detection (Imparato et al. 2018). Although the computation of (18) is not trivial, we can still assess the behavior of the CI probability in relation to its contributing factors. To simplify such assessment, we make use of t̄ instead of t and present the CI probability as

P_CI_i = ∫_{P_i} f_t̄(τ | H_i) dτ    (19)

with f_t̄(τ | H_i) the PDF of t̄ under H_i and P_i given by (11). For a given value of b_i, the CI probability is then dependent on the following three factors:

- P_i: As the integrand in (19) is positive for all τ ∈ R^r, the integral value will increase as P_i expands.
- The orientation of c̄_i w.r.t. the borders of P_i: The unit vector c̄_i, lying within the borders of P_i, determines the direction of E(t̄ | H_i), about which the PDF f_t̄(τ | H_i) is symmetric. The following lemma elaborates the role of the orientation of c̄_i in the CI probability for r = 2. For this case, the regions P_i in (11) are defined in R².
Each region then has three borders, of which one is curved (with P_0) and two are straight lines on either side of c̄_i.
Lemma 2 (P_CI_i as function of the orientation of c̄_i) Let β_i be the angle between the two straight borders of P_i and let β_{i,1} be the angle between c̄_i and the closest straight border on its right side (see Fig. 2). For a given β_i, k_α and f_t̄(τ | H_i), the CI probability depends on β_{i,1} and reaches its maximum at β_{i,1} = β_i / 2.

Proof See the Appendix.
Therefore, for r = 2 and a given β_i, k_α and f_t̄(τ | H_i), the CI probability reaches its maximum if c̄_i is parallel to the bisector of the angle between the two straight borders of P_i.
- ||c_{t_i}||_{Q_tt}: The scalar ||c_{t_i}||_{Q_tt} determines the magnitude of E(t̄ | H_i). Therefore, the larger the value of ||c_{t_i}||_{Q_tt}, the further the center of f_t̄(τ | H_i) gets from the origin along c̄_i, and the larger the probability mass of f_t̄(τ | H_i) inside P_i becomes.
We will use this insight into the contributing factors of the CI probability to explain some of the phenomena that we come across in our numerical analysis in Sect. 5.
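Since (18) has no closed form, the CD and CI probabilities can be approximated by Monte Carlo sampling in the t̄-frame, where t̄ ~ N(b·c̄_i, I_r) with b denoting the noncentrality magnitude |b_i|·||c_{t_i}||_{Q_tt}. The following is a simple sketch of such a simulation (our own, not the scheme of Teunissen 2017):

```python
import numpy as np
from scipy.stats import chi2

def cd_ci_probabilities(b, i, C_bar, k_alpha, n_samples=200_000, seed=1):
    """Monte Carlo estimate of P_CD and P_CI under H_i, in the t_bar
    frame: t_bar ~ N(b * c_bar_i, I_r), b the noncentrality magnitude.
    C_bar holds the unit vectors c_bar_j as its columns."""
    rng = np.random.default_rng(seed)
    r, _ = C_bar.shape
    t = b * C_bar[:, [i]] + rng.standard_normal((r, n_samples))
    detected = np.sum(t * t, axis=0) > k_alpha           # t_bar outside P_0
    identified = np.argmax(np.abs(C_bar.T @ t), axis=0) == i
    return detected.mean(), (detected & identified).mean()
```

By construction, the CI estimate can never exceed the CD estimate, mirroring P_CD_i ≥ P_CI_i; inverting such a simulation over a grid of b-values yields the MIB for a prescribed γ_CI.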

Identifying nonseparable hypotheses
As any testing procedure is driven by the misclosure vector, identification of hypotheses becomes problematic if the misclosure vector has the same distribution under different hypotheses. According to (5), this happens when, for two different hypotheses, say H_i and H_j (i ≠ j), the range spaces of B^T C_i and B^T C_j coincide. In such a case, the misclosure vector t remains insensitive to the differences between H_i and H_j, as a consequence of which we have P_i = P_j. One can then not distinguish between the two hypotheses H_i and H_j in the identification step. If this is the case and t ∈ P_i = P_j, one may consider the following:

1. Remeasurement If, in case of datasnooping, H_i and H_j are singled out in the identification step, then it is one of the two observables, y_i = c_i^T y or y_j = c_j^T y, that is suspected to contain a blunder or outlier. To remedy the situation, one may then decide to replace both y_i and y_j by their remeasured values.
2. Adaptation If remeasurement is not an option, one might think that adaptation of x̂_0 would be possible by extending the design matrix to [A C_i C_j], so as to cover both hypotheses H_i and H_j. But, as the theorem below shows, this is unfortunately not possible, as x will then become inestimable. Also note that, despite the nonseparability of the two hypotheses, adaptation for either one of them individually should not be pursued. Such adaptation will still produce a biased result if done for the wrong hypothesis.
3. Unavailability Without remeasurement or adaptation, the remaining option is to declare a solution for x to be unavailable.
In the following theorem, we show an equivalence between the nonseparability of hypotheses and the inestimability of parameters.

Theorem 1 (Nonseparable hypotheses and inestimable parameters) Let [A B] be an invertible matrix, with A of order m × n and B of order m × (m − n). Furthermore, for any i ≠ j and i, j = 1, ..., l, let the C_i be full-rank matrices of order m × q with m − n > q, such that rank([C_i C_j]) > q and rank([A C_i]) = n + q. Then, for any i ≠ j and i, j = 1, ..., l,

B^T C_j = B^T C_i M_{ij}    (22)

for some invertible matrix M_{ij} iff

rank([A C_i C_j]) < n + rank([C_i C_j])    (23)

Proof See the Appendix.
The above theorem conveys that if the alternative hypotheses H_i, i = 1, ..., l, are not distinguishable, then extending the design matrix A by any two or more of the matrices C_i, i = 1, ..., l, will result in a rank-deficient design matrix and therefore make unbiased estimation of the parameter vector x impossible. The conclusion is therefore that, if remeasurement is not an option and x is the parameter vector for which a solution is sought, the issue of nonseparable hypotheses should already be tackled at the design phase of the measurement experiment.
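The equivalence stated by the theorem can be verified numerically on a small example. The following sketch (our own toy setup: one leveling loop of three height differences between two unknown heights) shows that parallel misclosure signatures B^T c_i go together with a rank-deficient extended design matrix [A c_i c_j]:

```python
import numpy as np
from scipy.linalg import null_space

# Toy setup: one leveling loop of three height differences between two
# unknown heights (n = 2, m = 3, r = 1). Observations 2 and 3 then have
# parallel misclosure signatures B^T c_i, i.e. nonseparable hypotheses.
A = np.array([[1.0, 0.0],
              [-1.0, 1.0],
              [0.0, -1.0]])
B = null_space(A.T)                         # single misclosure (r = 1)
c1 = np.array([0.0, 1.0, 0.0])              # canonical unit vectors of the
c2 = np.array([0.0, 0.0, 1.0])              # two suspected observations
ct1, ct2 = B.T @ c1, B.T @ c2               # misclosure signatures
parallel = np.isclose(abs(ct1 @ ct2),
                      np.linalg.norm(ct1) * np.linalg.norm(ct2))
# Extending the design by both unit vectors destroys estimability of x:
ext_rank = np.linalg.matrix_rank(np.hstack([A, c1[:, None], c2[:, None]]))
```

Here the extended design has rank 3 while full column rank would require 4, matching the inestimability side of the theorem.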

Adaptation for estimable functions
The above theorem has shown that one can forget about adapting x̂_0 for hypotheses that are nonseparable. This concerns, however, the complete vector x and not necessarily functions of x. It could still be possible that some relevant components of x, or some relevant functions of x, remain estimable despite the rank deficiency of the extended design matrix. The following theorem specifies which parameters remain estimable after the mentioned extension of the design matrix and presents the corresponding adaptation step for these estimable parameters.
Also, let C ∈ R^{m × l_1 q} be the matrix formed by putting the l_1 matrices C_i column-wise next to each other. Then θ = F^T x, with F ∈ R^{n×p}, is unbiasedly estimable under the extended model

E(y) = A x + C b ,  D(y) = Q_yy    (24)

iff F^T V = 0, in which V is a basis matrix of the null space of C^{⊥T} A, i.e., C^{⊥T} A V = 0, and C^⊥ is a basis matrix of the orthogonal complement of the range space of C.
(ii) Adaptation: The BLUE of θ = F^T x under (24) and its variance matrix, denoted as θ̂ and Q_θ̂θ̂, respectively, can be written in adapted form as given in (25) and (26).

Proof See the Appendix.
Note that if one opts for the adaptation of θ̂_0 as given above, one can no longer use the expression for the DIA estimator as given in (13). For example, if the hypotheses H_i, i = 1, ..., l, are indistinguishable, i.e., P_1 = ... = P_l, adaptation according to (26) implies that the DIA estimator in (13) changes to

θ̄ = p_0(t) θ̂_0 + ( Σ_{i=1}^{l} p_i(t) ) θ̂ + Σ_{j=l+1}^{k} p_j(t) θ̂_j    (27)

Thus, the k + 1 terms in the sum are now reduced to k − l + 2, with θ̂ being the BLUE under (24).

Numerical analysis
In this section, we apply the theory of the previous sections to some selected examples so as to illustrate and explain the performance of the various decision elements of DIA-datasnooping. The insight so obtained will also help us appreciate some of the more complex intricacies of the theory. The following three different cases are considered: height-difference observations of a leveling network, distance measurements of a horizontal geodetic network and pseudorange measurements between a single ground station and GPS satellites. We analyze and illustrate how geometry changes in the measurement setup affect the testing procedure, including its partitioning of the misclosure space and the corresponding CD probabilities (MDB) and CI probabilities (MIB). The CD probability under H_i (i = 1, ..., k) is computed based on (16) from χ²(r, λ_i²), whereas the CI probability under H_i (i = 1, ..., k) is computed as described in the Appendix.

Leveling network
Suppose that we have two leveling loops, each containing n ≥ 2 height-difference observations and sharing one observation with each other (see Fig. 1). For such a leveling network, two misclosures can be formed, stating that the sum of the observations in each loop equals zero. Assuming that all the observations are uncorrelated and of the same precision σ, and choosing the observation signs such that each loop misclosure is the sum of the corresponding entries of y, the misclosure vector t and its variance matrix Q_tt can be formed as

t = B^T y ,  B^T = [ 1  1_n^T  0_n^T ; 1  0_n^T  1_n^T ] ,  Q_tt = σ² [ n+1  1 ; 1  n+1 ]    (28)

where y = (y_A, y_B^T, y_C^T)^T, with y_A the observation shared between the two leveling loops, y_B and y_C the n-vectors of observations of the leveling loops B and C, respectively, and 1_n and 0_n the n-vectors of ones and zeros. The number of datasnooping alternative hypotheses for the above model is equal to 2n + 1. But it will be clear of course that not all of them are separately identifiable. Looking at the structure of B^T in (28), it can be seen that out of the 2n + 1 vectors c_{t_i} (columns of B^T), only the following three are nonparallel

c_{t_A} = (1, 1)^T ,  c_{t_B} = (1, 0)^T ,  c_{t_C} = (0, 1)^T    (29)

which implies that, within each leveling loop and excluding the shared observation y_A, an outlier on each of the observations is sensed in the same way by the vector of misclosures. In other words, the testing procedure cannot distinguish between the outliers on the observations in y_B, nor between those on the observations in y_C. Therefore, among the 2n + 1 alternative hypotheses, we retain three: H_A corresponding with y_A, H_B corresponding with one of the observations in y_B and H_C corresponding with one of the observations in y_C.

Misclosure space partitioning
Given (29), the datasnooping partitioning of the misclosure space is formed by four distinct regions P_i with i ∈ {0, A, B, C}. For the sake of visualization, instead of t, we work with t̄ (cf. 9). The datasnooping partitioning, as mentioned earlier, is then driven by the relative orientation of c̄_A, c̄_B and c̄_C (cf. 11). The angles between these unit vectors are computed as

cos ∠(c̄_A, c̄_B) = cos ∠(c̄_A, c̄_C) = sqrt( n / (2(n+1)) ) ,  cos ∠(c̄_B, c̄_C) = −1/(n+1)    (30)

As (30) suggests, when n → ∞, the angles ∠(c̄_A, c̄_B) and ∠(c̄_A, c̄_C) go to 45°, and the angle ∠(c̄_B, c̄_C) goes to 90°. Figure 2 demonstrates the impact of n on the misclosure space partitioning given α = 0.05, r = 2 and σ = 5 mm. Using different shades of gray, the first row of Fig. 2 shows, for n = 2, n = 10 and n = 100, the partitioning of the misclosure space formed by the P_i with i ∈ {0, A, B, C}.
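The angle limits in (30) can be reproduced numerically; the sketch below (our own) evaluates cos ∠(c̄_i, c̄_j) = c_{t_i}^T Q_tt^{-1} c_{t_j} / (||c_{t_i}||_{Q_tt} ||c_{t_j}||_{Q_tt}) for the two-loop leveling network:

```python
import numpy as np

def leveling_angles(n, sigma=0.005):
    """Acute angles (degrees) between c_bar_A, c_bar_B and between
    c_bar_B, c_bar_C for the two-loop leveling network of (28)-(29)."""
    Q_tt = sigma**2 * np.array([[n + 1.0, 1.0], [1.0, n + 1.0]])
    Qi = np.linalg.inv(Q_tt)
    c = {'A': np.array([1.0, 1.0]),
         'B': np.array([1.0, 0.0]),
         'C': np.array([0.0, 1.0])}
    def angle(i, j):
        cosv = (c[i] @ Qi @ c[j]) / np.sqrt((c[i] @ Qi @ c[i]) * (c[j] @ Qi @ c[j]))
        return float(np.degrees(np.arccos(abs(cosv))))   # acute angle
    return angle('A', 'B'), angle('B', 'C')
```

For n = 2 the angle ∠(c̄_A, c̄_B) is about 54.7°, shrinking toward 45° as n grows, while the (acute) angle between c̄_B and c̄_C approaches 90°, in agreement with (30).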

CD and CI probabilities
According to (17), for a given λ(α, γ_CD, r), the MDB depends only on ||c_{t_i}||_{Q_tt}. For the leveling network characterized in (28) and its corresponding vectors c_{t_i} in (29), we have

||c_{t_A}||_{Q_tt} = (1/σ) sqrt( 2/(n+2) )  >  ||c_{t_B}||_{Q_tt} = ||c_{t_C}||_{Q_tt} = (1/σ) sqrt( (n+1)/(n(n+2)) )    (31)

which clearly shows that for a given set {α, γ_CD, r}, smaller H_A-biases can be detected compared to H_B- and H_C-biases. Equivalently, it can be stated that for a given {α, r} and b_i = b, the CD probability of H_A is larger than that of H_B and H_C. That is because each observation in y_B and y_C contributes to only one leveling loop, while y_A contributes to two leveling loops, thus being checked by the observations of both loops. The solid curves in Fig. 2 (second row) depict P_CD_i as function of the bias-to-noise ratio |b|/σ. The dark gray graphs correspond with H_A, while the light gray graphs correspond with H_B and H_C. These graphs can be used as follows. For a certain b_i = b, one can compare the corresponding P_CD_i of the different alternative hypotheses. One can also take the reverse route by comparing the MDBs of the different alternative hypotheses for a certain P_CD_i = γ_CD. In agreement with (31), the solid dark gray graphs always lie above the solid light gray ones. As the number of observations in each loop increases (n ↑), the corresponding P_CD_i decreases for a given H_i-bias. This is due to the fact that the variance of the misclosures is an increasing function of the number of observations in each loop (see 28). The lower the precision of the misclosures, the lower the sensitivity of the testing procedure to a given bias in an observation. The dashed curves in Fig. 2 (second row) depict P_CI_i as function of |b|/σ. These curves (P_CI_i) always lie below their solid counterparts (P_CD_i). Like the solid graphs, these dashed graphs can be used either for comparing the MIBs of the different alternative hypotheses given a certain P_CI_i = γ_CI, or for comparing the corresponding P_CI_i of the different alternative hypotheses given a certain b_i = b.
We note that while the CD probability of H_A is always larger than that of H_B and H_C, the CI probability of H_A is not always larger than that of H_B and H_C. Depending on the number of measurements n in each loop, if |b|/σ is smaller than a certain value, we have P_CI_A < P_CI_B = P_CI_C. This discrepancy between the behavior of the CD probability and that of the CI probability as function of |b|/σ for a given α is due to the fact that while P_CD_i is driven only by ||c_{t_i}||_{Q_tt}, P_CI_i is in addition driven by P_i and the orientation of c̄_i w.r.t. the straight borders of P_i (cf. 19). Looking at the first row of Fig. 2, we note that P_A has a smaller area than P_B and P_C. Therefore, |b| should be large enough such that ||c_{t_A}||_{Q_tt} > ||c_{t_B}||_{Q_tt} = ||c_{t_C}||_{Q_tt} can compensate for P_A being smaller than P_B and P_C.

Impact of partitioning on CI probability
As was mentioned, P_CI_i depends on P_i, the orientation of c̄_i and the magnitude of ||c_{t_i}||_{Q_tt}. While the last two factors are driven by the underlying model, the first one depends on the testing procedure. Our above conclusions about the CI probability will then change if we opt for another testing scheme. For example, let P_0 be defined by (10), and let the regions P_{i≠0} be constructed as in (11) but with the c̄_i replaced by the vectors d_A = c̄_A, d_B = R(−60°) c̄_A and d_C = R(60°) c̄_A, with R(θ) the counterclockwise rotation matrix over angle θ. This testing scheme leads to P_A, P_B and P_C being of the same shape. In addition, while c̄_A is parallel to the bisector of the angle between the two straight borders of P_A, c̄_B and c̄_C are close to one of the straight borders of their corresponding regions. This, combined with the fact that ||c_{t_A}||_{Q_tt} > ||c_{t_B}||_{Q_tt} = ||c_{t_C}||_{Q_tt}, leads to the conclusion that P_CI_A > P_CI_B = P_CI_C holds for any given bias b. Figure 3 shows the difference between the testing procedure based on (11) and the alternative scheme described above.

Horizontal geodetic network
Consider a horizontal geodetic network containing $m$ reference points from which we measure distances toward an unknown point in order to determine its horizontal coordinates. Assuming that all the measurements are uncorrelated and of the same precision, the design matrix and the variance matrix of the observations of the linearized model under $\mathcal{H}_0$ are given in (33), where the unit direction 2-vector from the unknown point to reference point $i$ ($i = 1, \ldots, m$) is denoted by $u_i$. In this observational model, the redundancy is $r = m - 2$, revealing that the misclosure vector $t$ is of dimension $m - 2$.

Misclosure space partitioning
For the model in (33), the angles between the corresponding $\bar{c}_i$ vectors follow from (34). Assuming that the horizontal geodetic network comprises $m = 4$ reference points, Fig. 4 presents the same information as Fig. 2, but for geodetic networks corresponding with (33). The first row shows the orientation of the vectors $u_i$. The standard deviation of the distance measurements is taken as $\sigma = 5$ mm, and the false-alarm probability is set to $\alpha = 0.05$. In (a), the geometry of the measuring points leads to a cofactor matrix of $C_{\hat{x}\hat{x}} = 2 I_2$, the substitution of which in (34) gives $\cos(\bar{c}_i, \bar{c}_j) = -\cos(u_i, u_j)$. Given that the angle between consecutive vectors $u_i$ is $45^{\circ}$, the four regions $\mathcal{P}_{i \neq 0}$ then have the same shape. Moving the reference point D to a new location such that $u_D = -u_A$, as illustrated in (b), the two regions $\mathcal{P}_B$ and $\mathcal{P}_C$, as Theorem 1 states, become identical. The proof is given as follows. Let $u_D = p\, u_A$ ($p = \pm 1$). As the vectors $c_{t_i}$ are the columns of $B^T$ and given that $B^T A = 0$, we obtain (35). Multiplying both sides of (35) with $u_A^{\perp}$ from the right, we get $c_{t_B} \parallel c_{t_C}$, which means that $\bar{c}_B \parallel \bar{c}_C$ and $\mathcal{P}_B = \mathcal{P}_C$. If, in addition, we have $u_C = q\, u_B$ ($q = \pm 1$), then (35) simplifies further. Multiplying the simplified equation once with $u_A^{\perp}$ and once with $u_B^{\perp}$ from the right, we get $c_{t_A} \parallel c_{t_D}$ and $c_{t_B} \parallel c_{t_C}$, thus $\bar{c}_A \parallel \bar{c}_D$ and $\bar{c}_B \parallel \bar{c}_C$. From (b) to (c), as the angle between $u_B$ and $u_C$ decreases, the errors in the measurements of A and D become less distinguishable from each other, but better separable from those in the measurements of B and C.
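The relation $\cos(\bar{c}_i, \bar{c}_j) = -\cos(u_i, u_j)$ for geometry (a) can be verified numerically. Below is a minimal sketch (the $45^{\circ}$-spaced directions and the helper name `cos_cbar` are our assumptions for illustration); with an orthonormal null-space basis $B$, $Q_{tt} \propto I$, so the weighted cosines reduce to Euclidean ones.

```python
import numpy as np
from scipy.linalg import null_space

# Geometry (a), assumed: unit direction vectors u_i spaced 45 degrees apart
angles = np.deg2rad([0, 45, 90, 135])
A = np.column_stack([np.cos(angles), np.sin(angles)])   # m x 2 design matrix

B = null_space(A.T)      # m x r orthonormal basis of null(A^T), r = m - 2 = 2
P = B @ B.T              # P[i, j] = c_t_i^T c_t_j (projector onto null(A^T))

def cos_cbar(i, j):
    """Cosine of the angle between cbar_i and cbar_j."""
    return P[i, j] / np.sqrt(P[i, i] * P[j, j])

# For this geometry, cos(cbar_i, cbar_j) equals -cos(u_i, u_j)  (cf. 34)
```

Since $A^T A = 2 I_2$ here, $B B^T = I - \tfrac{1}{2} A A^T$, whose off-diagonal entries are $-\tfrac{1}{2}\cos(u_i, u_j)$, which makes the sign flip explicit.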

CD and CI probabilities
The illustrations on the third row of Fig. 4 show the graphs of $P_{\mathrm{CD}}^i$ (solid lines) and $P_{\mathrm{CI}}^i$ (dashed lines) under all four alternative hypotheses $\mathcal{H}_i$ with $i \in \{A, B, C, D\}$. The CD probability $P_{\mathrm{CD}}^i$ corresponding with (33), for a given $\alpha$, $r$ and bias value $|b|$, is driven by $\|c_{t_i}\|_{Q_{tt}}$ (cf. 17), with $\det(\cdot)$ being the determinant operator in the corresponding expression. In (a), owing to the $45^{\circ}$ angle between the consecutive vectors $u_i$, we have $\|c_{t_i}\|_{Q_{tt}} = \|c_{t_j}\|_{Q_{tt}}$ for any $i \neq j$, hence $P_{\mathrm{CD}}^i = P_{\mathrm{CD}}^j$ for any given value of the bias $|b|$ and $i \neq j$. Furthermore, as a consequence of having a symmetric partitioning, we also have $P_{\mathrm{CI}}^i = P_{\mathrm{CI}}^j$ for any given value of the bias $|b|$ and $i \neq j$. In (b) and (c), given that $u_A \parallel u_D$ and $u_A \perp u_C$, we have $\|c_{t_A}\|_{Q_{tt}} = \|c_{t_B}\|_{Q_{tt}} = \|c_{t_D}\|_{Q_{tt}}$, conveying that the hypotheses $\mathcal{H}_A$, $\mathcal{H}_B$ and $\mathcal{H}_D$ have the same CD probability. $\mathcal{H}_A$ and $\mathcal{H}_D$ have, in addition, the same CI probability, since $\mathcal{P}_A$ and $\mathcal{P}_D$ have the same shape and the orientation of $\bar{c}_A$ inside $\mathcal{P}_A$ is the same as that of $\bar{c}_D$ inside $\mathcal{P}_D$.
In (b) and (c), $\mathcal{H}_B$ is not distinguishable from $\mathcal{H}_C$. For these hypotheses, although they are not identifiable from each other, we still define the CI probabilities as $P_{\mathrm{CI}}^B = P(t \in \mathcal{P}_B \mid \mathcal{H}_B)$ and $P_{\mathrm{CI}}^C = P(t \in \mathcal{P}_B \mid \mathcal{H}_C)$. It can be seen that, although $\mathcal{H}_B$ is not distinguishable from $\mathcal{H}_C$, the two differ in both their CD and CI probabilities. Also, the testing procedure is more sensitive to biases in $y_B$ than to the same biases in $y_C$. This is due to the fact that the observation of C contributes less to the misclosure vector than the observation of B. The contribution of the measurement of C to the misclosure vector depends on the relative orientation of $u_B$ w.r.t. $u_C$. In case $u_B$ is parallel to $u_A$ and $u_D$, the measurement of point C would have zero contribution to the misclosure vector and could not be screened at all. As the angle between $u_B$ and $u_C$ decreases, the mentioned contribution increases, and so does the sensitivity of the testing procedure to biases in the measurement of C.

[Fig. 4 caption fragment — Middle: datasnooping partitioning of the misclosure space $\mathbb{R}^2$ corresponding with $\bar{t}$ (cf. 9). Bottom: the graphs of the CD (solid lines) and CI (dashed lines) probabilities of the different alternative hypotheses as a function of the bias-to-noise ratio.]
Note that for the geometries shown in (b) and (c), if the misclosure vector lies in $\mathcal{P}_B$, it cannot be inferred whether $y_B$ or $y_C$ is biased. For adaptation, one may extend the design matrix $A$ to $[A\ c_B\ c_C]$, which would be of relevance if the parameters of interest remain estimable (see Theorem 2). As $c_B$ and $c_C$ are canonical unit vectors, $[c_B\ c_C]^{\perp T} A$ is the matrix obtained by removing from $A$ the rows corresponding with $y_B$ and $y_C$; the resulting reduced design matrix clearly shows that the x-coordinate is not estimable. However, the above adaptation strategy is still of relevance if one is interested in the y-coordinate.
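An estimability check of this kind can be sketched as follows (a minimal illustration; the function name `estimable` and the choice of $u_A = (0, 1)^T$, $u_D = (0, -1)^T$ for the two surviving rows are our assumptions): $\theta = F^T x$ is unbiasedly estimable from a reduced design matrix exactly when the columns of $F$ lie in its row space.

```python
import numpy as np

def estimable(A_red, F, tol=1e-10):
    """True iff theta = F^T x is unbiasedly estimable from design A_red,
    i.e., iff the columns of F lie in the row space of A_red."""
    r0 = np.linalg.matrix_rank(A_red, tol=tol)
    r1 = np.linalg.matrix_rank(np.vstack([A_red, F.T]), tol=tol)
    return r1 == r0

# Rows of A that survive the removal of y_B and y_C (assumed geometry:
# u_A and u_D point along the y-axis)
A_red = np.array([[0.0, 1.0],
                  [0.0, -1.0]])

print(estimable(A_red, np.array([[0.0], [1.0]])))   # y-coordinate: True
print(estimable(A_red, np.array([[1.0], [0.0]])))   # x-coordinate: False
```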
A summary of the above qualitative findings in relation to the geometry of the measuring points is given as follows:

- If $|\cos(u_i, u_{i+1})| = \cos 45^{\circ}$ for any $i = 1, 2, 3$, then
  • $|\cos(\bar{c}_i, \bar{c}_{i+1})| = \cos 45^{\circ}$
  • $\mathcal{P}_i$ has the same shape as $\mathcal{P}_j$ for any $i \neq j$
  • $P_{\mathrm{CD}}^i = P_{\mathrm{CD}}^j$ and $P_{\mathrm{CI}}^i = P_{\mathrm{CI}}^j$ for any $i \neq j$
- If $u_A \parallel u_D$, then
  • $\mathcal{P}_B = \mathcal{P}_C$
  • $P_{\mathrm{CD}}^A = P_{\mathrm{CD}}^D$ and $P_{\mathrm{CI}}^A = P_{\mathrm{CI}}^D$
- If $u_A \parallel u_D$ and $u_B \parallel u_C$, then
  • $\mathcal{P}_A = \mathcal{P}_D$ and $\mathcal{P}_B = \mathcal{P}_C$
  • $\mathcal{P}_A$ has the same shape as $\mathcal{P}_B$
  • $P_{\mathrm{CD}}^i = P_{\mathrm{CD}}^j$ and $P_{\mathrm{CI}}^i = P_{\mathrm{CI}}^j$ for any $i \neq j$
- If $u_A \parallel u_D$ and $u_C \perp u_A$, then
  • if $\angle(u_B, u_C)$ decreases, so do the differences $P_{\mathrm{CD}}^B - P_{\mathrm{CD}}^C$ and $P_{\mathrm{CI}}^B - P_{\mathrm{CI}}^C$
- If $u_A \parallel u_B$, $u_A \parallel u_D$ and $u_C \perp u_A$, then $P_{\mathrm{CD}}^C = P_{\mathrm{CI}}^C = 0$.

GPS single-point positioning
Let the pseudorange observations of $m$ GPS satellites be collected by a single receiver to estimate its three-dimensional position coordinates and clock error. Assuming that all the code observations are uncorrelated and of the same precision $\sigma$, the corresponding linearized observational model, also known as the single-point positioning (SPP) model, under $\mathcal{H}_0$ is characterized through the full-rank design matrix and observation variance matrix given in (40), in which the 3-vectors $u_i$ ($i = 1, \ldots, m$) are the receiver-satellite unit direction vectors. The first three columns of $A$ correspond with the receiver North-East-Up coordinate increments, while the last one corresponds with the receiver clock error increment. Given that the design matrix $A$ is of order $m \times 4$, the redundancy of the SPP model is $r = m - 4$.

Misclosure space partitioning
With the SPP model in (40), the angles between the vectors $\bar{c}_i$ can be computed in the same manner as before. Assuming that six GPS satellites are transmitting signals to a single receiver ($m = 6$), two misclosures can be formed, i.e., $r = 2$. Figure 5 shows, for three different geometries of these satellites (first row), the partitioning of the misclosure space (second row). The satellite geometries in (a) and (b) are artificial, while that in (c), except for the names of the satellites, is a real GPS geometry at Perth, Australia.
In (a), despite having six pseudorange observations, the partitioning is formed by five distinct regions. The regions corresponding with $\mathcal{H}_5$ and $\mathcal{H}_6$ coincide with each other, i.e., $\mathcal{P}_5 = \mathcal{P}_6$, which can be explained as follows. The lines-of-sight of the four satellites G1, G2, G3 and G4 lie on a cone whose symmetry axis is indicated by the red circle. Therefore, we have (42), with $d$ the unit 3-vector of the symmetry axis of the mentioned cone and $c$ the cosine of half the vertex angle of the cone. The extended SPP design matrix $[A\ c_5\ c_6]$ will then have a nontrivial null vector, so the $6 \times 6$ matrix $[A\ c_5\ c_6]$ is rank-deficient which, according to Theorem 1, implies that the two alternative hypotheses $\mathcal{H}_5$ and $\mathcal{H}_6$ are not separable. If the misclosure vector lies in $\mathcal{P}_5$, it cannot be inferred whether observation $y_5$ or $y_6$ is biased. For adaptation, one may use the above-extended design matrix in case the parameters of interest remain estimable (see Theorem 2). As $c_5$ and $c_6$ are canonical unit vectors, $[c_5\ c_6]^{\perp T} A$ is the matrix obtained by removing the last two rows of $A$. Based on such a reduced design matrix, according to (42), the position solution in the direction of $d$ is indeterminate. Since $d$ is vertically oriented, the horizontal coordinates (East-North) remain estimable based on the first four rows of $A$. In (b), all the alternative hypotheses are distinguishable. In (c), the two vectors $\bar{c}_3$ and $\bar{c}_5$ are almost parallel, which is due to the satellites G1, G2, G4 and G6 forming a cone-like geometry whose axis is indicated by a red circle.
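This rank deficiency can be checked numerically. In the sketch below, the satellite elevations and azimuths are our assumptions, chosen so that G1-G4 share elevation $50^{\circ}$ (i.e., they lie on a cone with vertical axis and $c = \cos 40^{\circ}$), while G5 and G6 satisfy $u_5^T d = \cos 60^{\circ}$ and $u_6^T d = \cos 80^{\circ}$ as in panel (a).

```python
import numpy as np

def sat_unit(elev_deg, az_deg):
    """Receiver-satellite unit vector in a North-East-Up frame (assumed)."""
    e, a = np.deg2rad(elev_deg), np.deg2rad(az_deg)
    return np.array([np.cos(e) * np.cos(a), np.cos(e) * np.sin(a), np.sin(e)])

# G1-G4 on a cone about the vertical d = (0, 0, 1): common elevation 50 deg,
# hence u_i^T d = cos 40 deg. G5 (elev 30) and G6 (elev 10) lie off the cone.
U = np.array([sat_unit(50, az) for az in (0, 90, 180, 270)]
             + [sat_unit(30, 45), sat_unit(10, 200)])
A = np.hstack([U, np.ones((6, 1))])            # SPP design matrix of (40)

E = np.eye(6)
A_ext = np.hstack([A, E[:, [4]], E[:, [5]]])   # extended matrix [A c5 c6]

print(np.linalg.matrix_rank(A))       # 4: A itself is full rank
print(np.linalg.matrix_rank(A_ext))   # 5: [A c5 c6] is rank-deficient
```

The null vector is $[d^T,\, -c,\, c - u_5^T d,\, c - u_6^T d]^T$, so after removing the last two observations the position component along $d$ is indeterminate, exactly as stated above.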

CD and CI probabilities
The graphs of $P_{\mathrm{CD}}^i$ and $P_{\mathrm{CI}}^i$ for $i = 1, \ldots, 6$ as a function of the bias-to-noise ratio are given in the third row of Fig. 5.

[Fig. 5 caption fragment — Middle: datasnooping partitioning of the misclosure space $\mathbb{R}^2$ corresponding with $\bar{t}$ (cf. 9). Bottom: the graphs of the CD (solid lines) and CI (dashed lines) probabilities of the different alternative hypotheses as a function of the bias-to-noise ratio.]

One notes that the signature of $P_{\mathrm{CD}}^i$ is generally different from that of $P_{\mathrm{CI}}^i$. For example, in (a), we have $P_{\mathrm{CD}}^2 > P_{\mathrm{CD}}^3$ while $P_{\mathrm{CI}}^3 > P_{\mathrm{CI}}^2$. That is because $P_{\mathrm{CI}}^i$, in addition to $\|c_{t_i}\|_{Q_{tt}}$, is also driven by $\mathcal{P}_i$ and the orientation of $\bar{c}_i$ within $\mathcal{P}_i$. In (a), we also note that although $\mathcal{H}_5$ and $\mathcal{H}_6$ cannot be distinguished, the testing procedure has a different sensitivity to the $\mathcal{H}_5$- and $\mathcal{H}_6$-biases. For the same bias-to-noise ratios, we have $P_{\mathrm{CD}}^5 > P_{\mathrm{CD}}^6$ and $P_{\mathrm{CI}}^5 > P_{\mathrm{CI}}^6$, which can be explained as follows. The difference between $P_{\mathrm{CD}}^5$ and $P_{\mathrm{CD}}^6$ for a given bias-to-noise ratio lies in the difference between $\|c_{t_5}\|_{Q_{tt}}$ and $\|c_{t_6}\|_{Q_{tt}}$ (cf. 17). Given that $c_{t_i}$ is the $i$th column of $B^T$ and given (42), multiplying the corresponding SPP design matrix $A$ with $B^T$ from the left and with $[d^T, c]^T$ from the right, we arrive at a relation in which only the G5 and G6 terms survive: $c_{t_5}\,(u_5^T d - c) + c_{t_6}\,(u_6^T d - c) = 0$. According to the skyplot in (a), $c = \cos 40^{\circ}$, $u_5^T d = \cos 60^{\circ}$ and $u_6^T d = \cos 80^{\circ}$, which means that $\|c_{t_5}\|_{Q_{tt}} > \|c_{t_6}\|_{Q_{tt}}$, thus $P_{\mathrm{CD}}^5 > P_{\mathrm{CD}}^6$. Since $\bar{c}_5 \parallel \bar{c}_6$ and $\mathcal{P}_5 = \mathcal{P}_6$, the difference between $P_{\mathrm{CI}}^5$ and $P_{\mathrm{CI}}^6$ for a given bias-to-noise ratio depends only on the difference between $\|c_{t_5}\|_{Q_{tt}}$ and $\|c_{t_6}\|_{Q_{tt}}$. Therefore, $\|c_{t_5}\|_{Q_{tt}} > \|c_{t_6}\|_{Q_{tt}}$ also leads to $P_{\mathrm{CI}}^5 > P_{\mathrm{CI}}^6$. In (b), all the satellites except G4 lie almost on a cone with its axis shown as the red circle. If the satellites G1, G2, G3, G5 and G6 formed a perfect cone, then the contribution of the G4 observation to the misclosures would be identically zero. This can be shown by proving that the fourth column of $B^T$, i.e., $c_{t_4}$, becomes a zero vector. If the unit vectors $u_i$ for $i \neq 4$ lie on a cone with $d$ as its symmetry axis, then for some scalar $c \in \mathbb{R}$ we have $u_i^T d = c$ (cf. 42).
Multiplying the corresponding SPP design matrix $A$ with $B^T$ from the left and with $[d^T, c]^T$ from the right, we arrive at $c_{t_4}\,(u_4^T d - c) = 0$. Since $u_4$ does not lie on the mentioned cone, we have $u_4^T d \neq c$, implying that $c_{t_4} = 0$ and thus $P_{\mathrm{CD}}^4 = P_{\mathrm{CI}}^4 = 0$. However, as the lines-of-sight to the satellites G1, G2, G3, G5 and G6 do not form a perfect cone, i.e., $u_{i \neq 4}^T d \approx c$, the observation of satellite G4 has a nonzero contribution to the misclosure vector, resulting in nonzero values for $P_{\mathrm{CD}}^4$ and $P_{\mathrm{CI}}^4$. It can be seen that $P_{\mathrm{CD}}^4$ and $P_{\mathrm{CI}}^4$ are significantly smaller than, respectively, $P_{\mathrm{CD}}^{i \neq 4}$ and $P_{\mathrm{CI}}^{i \neq 4}$. To understand the distinct behavior of $P_{\mathrm{CD}}^4$ compared to $P_{\mathrm{CD}}^{i \neq 4}$, we look at $\|c_{t_i}\|_{Q_{tt}}$ as given in (45)-(46). The quadratic expression within the brackets can be worked out using the eigenvalue decomposition of $C_{\hat{x}\hat{x}_{\neq i}}$ as in (47), in which $\lambda_{j,i}$ and $v_{j,i}$ for $j = 1, 2, 3$ are, respectively, the eigenvalues and the corresponding eigenvectors of $C_{\hat{x}\hat{x}_{\neq i}}$. Assuming $\lambda_{1,i} \geq \lambda_{2,i} \geq \lambda_{3,i}$, for a given value of $\|u_i - \bar{u}_{\neq i}\|$, (47) achieves its maximum when $(u_i - \bar{u}_{\neq i}) \parallel v_{3,i}$. In the following, we check $\lambda_{3,i}$ (the minimum eigenvalue), the angle between $(u_i - \bar{u}_{\neq i})$ and $v_{3,i}$ (the eigenvector corresponding with the minimum eigenvalue), and $\|u_i - \bar{u}_{\neq i}\|$ for $i = 1, \ldots, 6$.
- $\lambda_{3,i}$: For $i = 4$, since $u_{j \neq 4}^T d \approx c$, it can be concluded that $v_{3,4}$ is almost parallel to $d$ and $\lambda_{3,4} \approx 0$. This implies that $\lambda_{3,4}^{-1}$ is extremely large. For $i \neq 4$, among the five remaining satellites there are still four unit vectors which satisfy $u_{j \neq i,4}^T d \approx c$. Therefore, the eigenvector $v_{3,i \neq 4}$ does not deviate too much from the direction $d$. However, due to the presence of satellite G4, which does not lie on the mentioned cone, $\lambda_{3,i \neq 4}$ is much larger than zero, implying that $\lambda_{3,i \neq 4}^{-1}$ is much smaller than $\lambda_{3,4}^{-1}$.
- The angle between $(u_i - \bar{u}_{\neq i})$ and $v_{3,i}$: As shown in the skyplot in (b), while $u_4$ is almost parallel to $v_{3,4}$, $u_{i \neq 4}$ makes an angle of almost $56^{\circ}$ with $v_{3,i \neq 4}$ (almost parallel to $d$).
For the geometry shown in (b), $\bar{u}_{\neq 4}$ is almost parallel to $v_{3,4}$, whereas this is not the case with $\bar{u}_{\neq i}$ ($i \neq 4$).
- $\|u_i - \bar{u}_{\neq i}\|$: Since $\bar{u}_{\neq i}$ is computed based on five out of the six unit direction vectors, its norm does not change too much for different $i$. Therefore, $\|u_i - \bar{u}_{\neq i}\|$ attains its minimum value for $i = 4$, as $u_4$ is almost parallel to $\bar{u}_{\neq 4}$.

Given the above explanation, $\|u_4 - \bar{u}_{\neq 4}\|^2_{C_{\hat{x}\hat{x}_{\neq 4}}}$ is much larger than $\|u_i - \bar{u}_{\neq i}\|^2_{C_{\hat{x}\hat{x}_{\neq i}}}$, and $\|c_{t_4}\|_{Q_{tt}}$ is thus much smaller than $\|c_{t_i}\|_{Q_{tt}}$. This explains why the CD probability of $\mathcal{H}_4$ is much smaller than that of $\mathcal{H}_{i \neq 4}$. As $\mathcal{P}_4$ and the orientation of $\bar{c}_4$ within it are similar to those of $\mathcal{H}_i$ with $i = 1, 3, 6$ and poorer than those of $\mathcal{H}_i$ with $i = 2, 5$, $\|c_{t_{i \neq 4}}\|_{Q_{tt}} > \|c_{t_4}\|_{Q_{tt}}$ can also explain why $P_{\mathrm{CI}}^{i \neq 4} > P_{\mathrm{CI}}^4$.
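The net effect, a much smaller $\|c_{t_4}\|_{Q_{tt}}$ for the off-cone satellite, can also be verified directly from the null-space basis, without the eigenvalue detour. The sketch below uses an assumed near-cone geometry standing in for panel (b) (five satellites with elevations close to $50^{\circ}$, G4 near zenith); the chosen elevations and azimuths are ours.

```python
import numpy as np
from scipy.linalg import null_space

def sat_unit(elev_deg, az_deg):
    """Receiver-satellite unit vector in a North-East-Up frame (assumed)."""
    e, a = np.deg2rad(elev_deg), np.deg2rad(az_deg)
    return np.array([np.cos(e) * np.cos(a), np.cos(e) * np.sin(a), np.sin(e)])

# Assumed stand-in for panel (b): five satellites almost on a cone about the
# vertical (elevations close to 50 deg) and G4 (index 3) far off the cone.
elev = [49.5, 50.5, 50.0, 85.0, 49.8, 50.3]
az = [0, 72, 144, 30, 216, 288]
A = np.hstack([np.array([sat_unit(e, a) for e, a in zip(elev, az)]),
               np.ones((6, 1))])          # SPP design matrix of (40)

B = null_space(A.T)                       # 6 x 2 orthonormal, B^T A = 0
Qtt_inv = np.linalg.inv(B.T @ B)          # Q_yy = I (sigma factored out)
# ||c_t_i||_{Q_tt}, with c_t_i the i-th column of B^T (i-th row of B)
norms = np.sqrt(np.einsum('ij,jk,ik->i', B, Qtt_inv, B))
# norms[3] (satellite G4) comes out much smaller than the rest
```

Had the five elevations been exactly equal, $c_{t_4}$ would vanish identically, reproducing the limiting case $P_{\mathrm{CD}}^4 = P_{\mathrm{CI}}^4 = 0$ discussed above.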

Conclusion and summary
In this contribution, we presented datasnooping in the context of the DIA method, discussed its decision probabilities for detection and identification and showed what options one has available when two or more of the alternative hypotheses are nonseparable.
In our discussion, we emphasized the central role played by the partitioning of misclosure space, both in the formation of the decision probabilities and in the construction of the DIA estimator. In the case of datasnooping, the partitioning is determined by the row vectors of the basis matrix of the null space of $A^T$. Through this partitioning, the distribution of the misclosure vector can be used to determine the correct detection (CD) and correct identification (CI) probabilities of each of the alternative hypotheses. These probabilities can be 'inverted' to determine their corresponding minimal biases, the minimal detectable bias (MDB) and the minimal identifiable bias (MIB). We highlighted their difference by showing the difference between their corresponding contributing factors. In particular, it should be realized that the MDB provides information about correct detection and not about correct identification. A high probability of correct detection does not necessarily imply a high probability of correct identification, unless one is dealing with the special case of having only one single alternative hypothesis.
In the identification step, one has to ascertain whether or not all the hypotheses are identifiable. Identification of hypotheses becomes problematic if the misclosure vector has the same distribution under different hypotheses. We discussed the options one can choose from in terms of 'remeasurement', 'adaptation' or stating that the solution is 'unavailable'. Of these, the adaptation step is the most involved. By means of an equivalence between the nonseparability of hypotheses and the inestimability of parameters (cf. Theorem 1), we demonstrated that one can forget about adapting $\hat{x}_0$ for hypotheses that are nonseparable. However, as this concerns the complete vector $x$ and not necessarily functions of $x$, we also demonstrated that functions of $x$ may exist for which adaptation is still possible (cf. Theorem 2). It was shown what this adaptation looks like and how it changes the structure of the DIA estimator.
We applied the theory to selected examples so as to illustrate and explain the performance of the various elements of DIA-datasnooping. Three different cases were discussed in detail: height-difference observations of a leveling network, distance measurements of a horizontal geodetic network and pseudorange measurements between a single ground station and GPS satellites. We analyzed and illustrated how geometry changes in the measurement setup affect the testing procedure, including its partitioning of the misclosure space and the corresponding CD probabilities (MDB) and CI probabilities (MIB). We also demonstrated that, for a given bias-to-noise ratio and false-alarm probability, the ordering of the CD probabilities of the alternative hypotheses is not necessarily the same as that of their CI probabilities. And we showed that if two alternative hypotheses, say $\mathcal{H}_i$ and $\mathcal{H}_j$, are not distinguishable, the testing procedure may have different levels of sensitivity to $\mathcal{H}_i$-biases compared to the same $\mathcal{H}_j$-biases.

Proof of Lemma 2
Consider the 2-vector $\bar{t}$ with length $\bar{l}$, making an angle $\bar{\beta}_i$ with $\bar{c}_i$ measured counterclockwise; this gives the polar parametrization in (49). Given (49) and the orientation of $\bar{c}_i$ w.r.t. the straight borders of $\mathcal{P}_i$ in (11), one can write (50) with (51). With (49), one can also obtain the joint PDF of $[\bar{l}, \bar{\beta}_i]^T$ from $f_{\bar{t}}(\tau \mid \mathcal{H}_i)$ through the PDF transformation rule as (52), with $\mu_{\bar{t}_i} = |b_i|\, \|c_{t_i}\|_{Q_{tt}}$. Equations (50) and (52) enable us to express the CI probability in terms of $\bar{l}$ and $\bar{\beta}_i$ as (53). Substituting (52) into (53) and then taking the derivative w.r.t. $\beta_{i,1}$, we obtain

$$\int_{\sqrt{k_{\alpha}}}^{\infty} \frac{\bar{l}}{2\pi} \exp\left\{-\frac{1}{2}\left(\bar{l}^{2} + \mu_{\bar{t}}^{2}\right)\right\} \Big[\exp\{\bar{l}\mu_{\bar{t}}\cos\beta_{i,1}\} - \exp\{\bar{l}\mu_{\bar{t}}\cos(\beta_{i} - \beta_{i,1})\} + \exp\{-\bar{l}\mu_{\bar{t}}\cos\beta_{i,1}\} - \exp\{-\bar{l}\mu_{\bar{t}}\cos(\beta_{i} - \beta_{i,1})\}\Big]\, d\bar{l} \quad (54)$$

Setting this derivative equal to zero, a set of solutions for $\beta_{i,1}$ is given by (55). Since $\beta_{i,1} < \pi$, the only valid solution is $\beta_{i,1} = \frac{1}{2}\beta_i$. To check whether this critical point is the maximizer of (53), we compute the derivative of the expression in (54), given in (56). Since $0 < \beta_i < \pi$, we have $\sin\frac{\beta_i}{2} > 0$ and $\cos\frac{\beta_i}{2} > 0$. These, in tandem with $\bar{l}\mu_{\bar{t}} > 0$ and the fact that $\exp\{\cdot\}$ is a positive increasing function, imply that the expression in (56) is negative. Thus, $\beta_{i,1} = \frac{1}{2}\beta_i$ is the maximizer of (53).

Proof of Theorem 1
We start with the 'if' part. If there exists a nonzero matrix $X \in \mathbb{R}^{n \times q}$ such that $AX = C_i - C_j X_{i,j}$ for some invertible matrix $X_{i,j} \in \mathbb{R}^{q \times q}$, then multiplying both sides of the equation with $B^T$ from the left gives $B^T C_i = B^T C_j X_{i,j}$, since $B^T A = 0$. For the 'only if' part: if $B^T C_i = B^T C_j X_{i,j}$ for some invertible matrix $X_{i,j} \in \mathbb{R}^{q \times q}$, then one of two conclusions can be drawn: 1. $C_i = C_j X_{i,j}$, which is not possible as it contradicts our assumption that $\mathrm{rank}([C_i\ C_j]) > q$; 2. $(C_i - C_j X_{i,j}) \in \mathcal{R}(A)$. Therefore, there exists a nonzero matrix $X \in \mathbb{R}^{n \times q}$ such that $AX = C_i - C_j X_{i,j}$.

Proof of Theorem 2

Since $\theta$ is a linear function of $x$ and not of $b$, we reduce the observational model in (24) to one containing only the unknown parameters $x$, as in (59), where $C^{\perp}$ is a basis matrix of the orthogonal complement of the range space of $C$. Let $V$ be a basis matrix of the null space of $C^{\perp T} A$. Furthermore, let $S$ be a full-rank matrix whose range space is complementary to that of $V$. Therefore, $[S\ V] \in \mathbb{R}^{n \times n}$ is an invertible matrix, thus a basis matrix of $\mathbb{R}^n$. This indicates that any vector $x \in \mathbb{R}^n$ can be parametrized as in (60). (i) We now show that $\theta$ is unbiasedly estimable under (59) iff $F^T V = 0$. 'If' part: if $F^T V = 0$, then $\theta = F^T S\, x_S$. Substituting (60) into (59), as $C^{\perp T} A V = 0$, gives us (61), which shows that $x_S$, as well as any linear function of it like $\theta$, is unbiasedly estimable under (59). 'Only if' part: Equation (61) shows that $x_V$ is not estimable. Therefore, for $\theta = F^T x = F^T S\, x_S + F^T V\, x_V$ to be estimable in (59), $F^T V = 0$ must hold. (ii) The model (59) can be cast in the form of (62), in which $\bar{A} = P_C^{\perp} A$ and $P_C^{\perp} = I_m - C(C^T Q_{yy}^{-1} C)^{-1} C^T Q_{yy}^{-1}$. It can be shown that the BLUE of $x_S$ based on (1), $\hat{x}_{S_0}$, is linked to $\hat{x}_S$ as in (63), where $t$ and $Q_{tt}$ are given in (4) and $Q_{\hat{x}_S, t}$ is the covariance between $\hat{x}_S$ and $t$, which can be computed given (4) and (62). Multiplying both sides of this equation with $F^T S$ from the left gives the link between $\hat{\theta} = F^T S\, \hat{x}_S$ and $\hat{\theta}_0 = F^T S\, \hat{x}_{S_0}$. The link between their variances is also obtained using the error propagation law.
Numerical evaluation of $P_{\mathrm{CI}}^i$ in (18)

In this study, given Eqs. (5)-(8), the CI probability under $\mathcal{H}_i$ (cf. 18) is computed through the following steps.
- Generate $n$ samples of $t$ from the normal distribution $\mathcal{N}(\mu_{t_i} = b_i c_{t_i}, Q_{tt})$.
- Compute $\|t\|^2_{Q_{tt}}$ for all the samples. Single out those samples satisfying $\|t\|^2_{Q_{tt}} > k_{\alpha}$ (i.e., $t \notin \mathcal{P}_0$) and collect them in a set denoted by $\mathcal{S}_{\mathrm{CD}}$.
- For each sample in $\mathcal{S}_{\mathrm{CD}}$, compute the w-tests $w_j$ for $j = 1, \ldots, k$ in (7). Count the number of samples for which $|w_i| \geq |w_j|$ for any $j \neq i$ (i.e., $t \in \mathcal{P}_i$), and denote it by $n_{\mathrm{CI}}$.
- Compute the CI probability under $\mathcal{H}_i$ as $P_{\mathrm{CI}}^i = n_{\mathrm{CI}}/n$.
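The steps above can be sketched in a few lines. The toy model below is our assumption (three repeated measurements of a single parameter with $\sigma = 1$, so $r = 2$); it implements the sampling, detection and identification steps in turn.

```python
import numpy as np
from scipy import stats
from scipy.linalg import null_space

rng = np.random.default_rng(1)

# Toy model (assumed): three repeated measurements of one parameter
A = np.ones((3, 1))
Qyy = np.eye(3)
B = null_space(A.T)                  # 3 x 2 basis of null(A^T), r = 2
Qtt = B.T @ Qyy @ B
Qtt_inv = np.linalg.inv(Qtt)
Ct = B.T                             # column j is c_t_j

def p_ci(i, b, alpha=0.05, n=100_000):
    """Monte Carlo estimate of the CI probability under H_i with bias b."""
    k_alpha = stats.chi2.ppf(1 - alpha, df=B.shape[1])
    t = rng.multivariate_normal(b * Ct[:, i], Qtt, size=n)          # step 1
    detected = np.einsum('sj,jk,sk->s', t, Qtt_inv, t) > k_alpha    # step 2
    w = (t @ Qtt_inv @ Ct) / np.sqrt(np.diag(Ct.T @ Qtt_inv @ Ct))  # w-tests
    identified = np.abs(w).argmax(axis=1) == i                      # step 3
    return np.mean(detected & identified)                           # step 4

# p_ci(0, 4.0) approximates P_CI^1 for a 4-sigma bias in the first observation
```

By the exchangeability of the three observations in this toy model, the CI probabilities of the three alternative hypotheses coincide, which provides a simple sanity check of the sampler.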