Best integer equivariant estimation for elliptically contoured distributions

This contribution extends the theory of integer equivariant estimation (Teunissen in J Geodesy 77:402–410, 2003) by developing the principle of best integer equivariant (BIE) estimation for the class of elliptically contoured distributions. The presented theory provides new minimum mean squared error solutions to the problem of GNSS carrier-phase ambiguity resolution for a wide range of distributions. The associated BIE estimators are universally optimal in the sense that they have an accuracy which is never poorer than that of any integer estimator and any linear unbiased estimator. Next to the BIE estimator for the multivariate normal distribution, special attention is given to the BIE estimators for the contaminated normal and the multivariate t-distribution, both of which have heavier tails than the normal. Their computational formulae are presented and discussed in relation to that of the normal distribution.


Introduction
This contribution extends the theory of integer equivariant (IE) estimation as introduced in Teunissen (2003). The theory of IE estimation provides a solution to the problem of carrier-phase ambiguity resolution, which is key to highprecision GNSS positioning and navigation. It is well known that the so-called fixed GNSS baseline estimator is superior to its 'float' counterpart if the integer ambiguity success rate is sufficiently close to its maximum value of one. Although this is a strong result, the necessary condition on the success rate does not make it hold for all measurement scenarios. This restriction was the motivation to search for a class of estimators that could provide a universally optimal estimator while still benefiting from the integerness of the carrier-phase ambiguities. The solution was found in the class of IE estimators, with its best integer equivariant (BIE) estimator being best in this class in the mean squared error sense (Teunissen 2003). The class of IE estimators obeys the integer removerestore principle and was shown to be larger than the class of integer (I) estimators as well as larger than the class of linear unbiased (LU) estimators. As a consequence, the BIE estimator has the universal property that its mean squared error (MSE) is never larger than that of any I estimator or any LU estimator. This shows that from the MSE perspective one should always prefer the use of the BIE baseline over that of the integer least-squares (ILS) baseline and best linear unbiased (BLU) baseline.
In Teunissen (2003), an explicit expression of the BIE estimator was derived that holds true when the data can be assumed to be normally distributed. In Verhagen and Teunissen (2005), the performance of this estimator was compared with the float and fixed GNSS ambiguity estimators, while in Wen et al. (2012) it was shown how to use this BIE estimator for GNSS precise point positioning (PPP). In Brack et al. (2014), a sequential approach to best integer equivariant estimation was developed and tested, while Odolinski and Teunissen (2020) analyzed the normal distribution-based BIE estimation for low-cost single-frequency multi-GNSS RTK positioning.
In this contribution, we will generalize BIE estimation to the larger class of elliptically contoured distributions (ECDs). Many important distributions belong to this class (Chmielewski 1981;Cabane et al. 1981). Explicit expressions of the BIE estimator will be given for the multivariate normal distribution, the contaminated normal distribution and the multivariate t-distribution. The relevance of the contaminated distribution stems from the fact that it is a finite mixture distribution particularly useful for modeling data that are thought to contain a distinct subgroup of observations and thus can be used to model experimental error or contamination. The multivariate t-distribution is another class of distributions with tails heavier than the normal (Kibria and Joarder 2006;Roth 2013).
Several studies have already indicated the occurrence of GNSS instances where working with distributions that have tails heavier than the normal would be more appropriate. In Heng et al. (2011), for instance, it is shown that GPS satellite clock errors and instantaneous UREs have heavier tails than the normal distribution for about half of the satellites. Similar findings can be found in Dins et al. (2015). Also in fusion studies of GPS and INS, Student's t-distribution has been proposed as the more suited distribution, see e.g., (Zhu et al. 2012;Zhong et al. 2018;Wang and Zhou 2019). And similar findings can be found in studies of multi-sensor GPS fusion for personal and vehicular navigation (Dhital et al. 2013;Al Hage et al. 2019).
This contribution is organized as follows. We start with a brief review of the theory of integer equivariant estimation in Sect. 2. Here, we emphasize the difference between integer equivariant estimation and integer estimation and show that the structure of the BIE estimator is one of a weighted average over the combined spaces of real and integer numbers. In Sect. 3, we present the general expression of the BIE estimator for elliptically contoured distributions. This will be done for the mixed-integer model, the integer-only model and the real-only model. We also emphasize here how the probability density function determines the structure of the estimator. Then, in Sects. 4, 5 and 6, we derive the explicit expressions of the BIE estimator for the multivariate normal, the contaminated normal and the multivariate t-distribution. In Sect. 7, they are compared and is it shown how they can be computed efficiently. The summary and conclusions are given in Sect. 8.
We make use of the following notation: A T is the transpose of matrix A, R(A) denotes the range space of matrix A and N (A) its null space. ||y|| 2 Σ = (y) T Σ −1 (y) denotes the weighted squared norm of vector y and |Σ yy | the determinant of matrix Σ yy . A matrix U is said to be a basis matrix of a subspace V, if the column vectors of U form a basis of V, i.e., the columns of U are linearly independent (full rank) and R(U ) = V. The subspace V ⊥ is the orthogonal complement of V. We also make a frequent use of the PDF transformation rule: If x = T y + t and f y (y) is the probability density function (PDF) of y, then f x (x) = |T | −1 f y (T −1 (y − t)) is the PDF of x. P [E] denotes the probability of event E, ∼ 'distributed as,' and E(.) and D(.), the expectation and dispersion operator, respectively.

Integer equivariant estimation
The class of integer equivariant (IE) estimators was introduced in Teunissen (2003). To appreciate the differences between IE estimators and integer estimators, we first recall the three conditions that an integer (I) estimator has to fulfill (Teunissen 1999a, b): Definition 1 (I-estimation) Letâ ∈ R n be a real-valued estimator of the integer vector a ∈ Z n . Then,ǎ = I(â), I : R n → Z n , is an integer estimator of a if the pull-in region of I, P z = {x ∈ R n | z = I(x)}, satisfies the three conditions: Each of the above three conditions states a property which is reasonable to ask of an arbitrary integer estimator. The first condition states that the pull-in regions should not leave any gaps, and the second that they should not overlap. The absence of gaps is needed in order to be able to map any 'float' solutionâ to Z n , while the absence of overlaps is needed to guarantee that the 'float' solution is mapped to just one integer vector.
The third condition of the definition follows from the requirement that I(x + z) = I(x) + z, ∀x ∈ R n , z ∈ Z n . Also, this condition is a reasonable one to ask for. It states that when the 'float' solutionâ is perturbed by an arbitrary integer vector z, the corresponding integer solution is perturbed by the same amount. This property thus allows one to apply the integer remove-restore technique, I(â − z) + z = I(â). It allows one to work with the fractional parts ofâ, instead of with its complete entries.
It is this last condition which forms the guiding principle of IE estimators. Let the mean of the m-vector of observables be mixed-integer parametrized as with [A, B] ∈ R m×(n+ p) given and of full rank, and let our interest lie in estimating the linear function It then seems reasonable that the estimator should at least obey the integer remove-restore principle. For instance, when estimating integer ambiguities in case of GNSS, one would like, when adding an arbitrary number of cycles to the carrier-phase data, that the solution of the integer ambiguities be shifted by the same integer amount. For the estimator of θ , this means that adding Az to y, for arbitrary z ∈ Z n , must result in a shift of L a z. Similarly, it seems reasonable to require of the estimator that adding Bβ to y, for arbitrary β ∈ R p , results in a shift of L b β. After all, we would not like the integer part of the estimator to be affected by such an addition to y. Requiring these two properties leads to the following definition of integer equivariant estimation (Teunissen 2003): As the IE-class only requires the integer remove-restore property for estimating integer parameters, it encompasses the I-class, i.e., I ⊂ IE. In Teunissen (2003), it was shown that the IE-class also encompasses the class of linear unbiased (LU) estimators, LU ⊂ IE. An important consequence of both I and LU being subsets of IE is that optimality in IE automatically carries over to I and LU. Probably, the most popular optimal estimator in the LU-class is the best linear unbiased estimator (BLUE), where 'best' is taken in the minimum mean squared error (MMSE) sense (Rao 1973;Koch 1999;Teunissen 2000). Using the same criterion, but now in the larger class of integer equivariant estimators, leads to the best integer equivariant (BIE) estimator. (Teunissen 2003): Let the vector of observables y ∈ R m , with PDF f y (y), have the mean E(y) = Aa + Bb, with a ∈ Z n and b ∈ R p . Then, the BIE estimator of the linear function θ(a, b) = L a a + L b b, L a ∈ R q×n , L b ∈ R q× p , is given aŝ

BIE Theorem
As a consequence of LU ⊂ IE, we have that the mean squared error (MSE) of the best linear unbiased (BLU) estimator is never better than that of a BIE estimator: This implies that in the context of GNSS it would make sense to always compute the BIE baseline estimator, as its mean squared error will never be poorer than that of the 'float' baseline solutionb. Likewise, since I ⊂ IE, it follows that the BIE estimator is also MSE-superior to any integer estimator, thus also to such popular estimators of integer least-squares (ILS), integer bootstrapping (IB) and integer rounding (IR). The BIE estimator is a 'weighted average.' This can be seen if we write (3) in the compact form to unity, z∈Z n R p w(z, β)dβ = 1. In fact, by interpreting w(z, β) as a joint probability mass/density function, the BIE estimate can be interpreted as the mean of that distribution. Would one be interested in only estimating the ambiguities, then with L a = I n and L b = 0, one obtains from (5) Likewise, if one would be interested in estimating b, then with L a = 0 and L b = I p , one obtains from (5) With bothâ BIE andb BIE available, one can compute the BIE estimator of θ directly asθ BIE = L aâBIE + L bbBIE . Hence, the BIE estimator of the mean E(y) is given aŝ y BIE = Aâ BIE + Bb BIE . This is then the expression to use, when in case of GNSS for instance, one would like to obtain the BIE solution for the pseudorange and carrier-phase data. Note that the above theorem holds true for any PDF the vector of observables y might have. A closer look at (3) reveals, however, that the unknowns a and b are needed in order to compute the estimator. This dependence on a and b is present in the numerator of (3), but not in its denominator as the dependence there disappears due to the integer summation and integration. Would the dependence persist, we would not be able to compute the BIE estimator. Note, however, that it disappears if the PDF of y has the structure f y (y) = f (y − Aa − Bb). Fortunately, this is still true for a large class of PDFs, and in particular for the class of elliptically contoured distributions (ECD), which we consider next.
Definition 3 (Elliptically contoured distribution (ECD)) A random vector y ∈ R m is said to have an ECD, denoted as y ∼ ECD m (ȳ, Σ, g), if its PDF is of the form Note that the contours of constant density of an ECD are ellipsoids, (y −ȳ) T Σ −1 yy (y −ȳ) = constant, from which the ECD draws its name. Also note, since (8) is symmetric with respect toȳ, thatȳ in (8) is the mean of y, E(y) =ȳ. The positive-definite matrix Σ yy in (8), however, is in general not the variance matrix of y. It can be shown that the variance matrix of y is a scaled version of Σ yy , D(y) = σ 2 Σ yy . The function g(.) of (8) is called the generating function of the ECD (not to be confused with its moment generating function).
Several of the properties of ECDs are similar to those of the multivariate normal distribution. For instance, ECDs also have the linearity property: Also, marginal and conditional PDFs of ECDs are again an ECD. Several important distributions belong to the ECD-class. Such examples are the multivariate normal distribution, the contaminated normal distribution and the multivariate t-distribution.
If y has mean (1), its likelihood function reads f y (y|a, b) = |Σ yy | −1/2 g(||y − Aa − Bb|| 2 Σ yy ), from which the maximum likelihood estimators of a ∈ Z n and b ∈ R p follow as thus showing that the maximum likelihood estimator coincides with the (mixed) integer least-squares estimator. The minimization (9) can be further worked out if we make use of the orthogonal decomposition (Teunissen 1995): withê = y − Aâ − Bb, the least-squares residual vector,â = ΣââĀ T Σ −1 yy y andb = ΣbbB T Σ −1 yy y, the 'float' least-squares solutions of a and b, respectively, andb(a) =b − ΣbâΣ −1 aâ (â − a), the conditional leastsquares solution of b given a. The matrices are given as yy , respectively. Substitution of (10) into (9) shows thať In Teunissen (1999b), it is proven that for ECDs the ILS estimatorǎ is optimal in the sense that it has, of all estimators in the I-class, the largest probability of correct integer estimation (i.e., the largest success rate). The above property that the maximum likelihood estimate of a ∈ Z n and b ∈ R p remains the same, irrespective the choice for g (.), does in general not carry over to best integer equivariant estimation. Hence, for different functions g(.), we will have different BIE solutions. In Sects. 4, 5 and 6, we will give the explicit BIE solutions when the distributions are normal, contaminated normal and multivariate t. First, however, we will determine the general expression for the ECD-BIE estimator. We will do so by considering the three cases: A = 0 (real-valued model), B = 0 (integer-valued model) and A = 0, B = 0 (mixed-integer real model).
Proof For the proof, see the "Appendix".
Note in case the model has no integer parameters (A = 0), the BIE estimator of the real-valued parameter vector is identical to its BLUE. This is a consequence of the fact that the class of linear unbiased estimators is a subset of the IEclass, while both estimators have the MMSE property. Also note that in case of a mixed model (A = 0, B = 0), the BIE estimatorb BIE has the same structure asb of (11). By replacingǎ in the expression ofb byâ BIE , one obtainsb BIE .
An important difference between the two type of estimators is, however, that in the (mixed) ILS case the mapping of y toǎ andb does not depend on g(.), whereas in the BIE case it does. This dependence on g(.) reveals itself in the function h(z) (cf. 13). In general, it depends, through c z , on both ||ê|| 2 Σ yy and ||â − z|| 2 Σââ . Hence, h(z) will then not only be smaller whenâ is further away from z, but also when the norm of the least-squares residual vector gets larger.
The above theorem also shows that the complexity of the BIE estimator of a ∈ Z n differs depending on whether the model contains additional real-valued parameters or not. Both cases are relevant for GNSS. The model without realvalued parameters occurs, for instance, in the geometry-fixed case, when the data of a short GNSS baseline with known coordinates are analyzed for the purpose of stochastic model estimation.
As the estimatorâ BIE is simpler to compute when the model only contains integer parameters, one would perhaps be inclined, since real-valued parameters are easily eliminated by means of a linear transformation of the vector of observables, to always opt for using case (B = 0) instead of case (B = 0). Such would, however, be wrong. Although the class of elliptically contoured distributions is closed under linear transformations, the transformed ECD will generally not be a simple scaled version of the original. The following lemma makes this clear: Lemma 1 (Linear function of ECD) Let y have the PDF f y (y) = |Σ yy | −1/2 g(||y − Aa − Bb|| 2 Σ yy ) and y c = C T y, with basis matrix C ∈ R(B) ⊥ (i.e., C T B = 0). Then, y c ∼ |Σ y c y c | −1/2 g c ||y c − C T Aa|| 2 Σ yc yc Proof For proof, see the "Appendix".
This result shows that although the PDF of y c = C T y is again an ECD, its generating function g c (.) differs from the original g(.) as it is now an integrated version of it. It thus depends on g(.) whether or not one would be allowed, in the computation ofâ BIE , to still work with g(.) when eliminating the real-valued parameters from the model. As we will see in the next sections, this is the case with the normal distribution, but not in general.

BIE for multivariate normal distribution
The elliptically contoured PDF f y (y) = |Σ yy | −1/2 g(||y − Aa − Bb|| 2 Σ yy becomes that of a multivariate normal distribu-tion N m (Aa + Bb, Σ yy ), when the generating function g(x) is chosen as Thus, in this case, Σ yy is the variance matrix of y. By now using (15) with (13), one can obtain the weights for the BIE estimator in case y is normally distributed.
Corollary 1 (BIE weights for normal distribution) Let y ∼ N m (Aa + Bb, Σ yy ), a ∈ Z n , b ∈ R p . Then, the BIE weights of (12) follow using Proof By substituting (15) into (13), one obtains for the case Σââ } ∞ 0 exp{− 1 2 r 2 }r p−1 dr, of which only the z-dependent part remains when substituted into the weights of (12). The same result is obtained for the case B = 0.
Note that where the weights of the general ECD-BIE expression (cf. 13) depend on c z and thus on ||ê|| 2 Σ yy , that this dependence on the least-squares residual vectorê is absent in case of a normal distribution.

BIE for contaminated normal distribution
The contaminated normal distribution is a mixture of two normal distributions having the same mean but proportional variance matrices. A mixture of two distributions is the distribution of a random vector y of which the sample is created from the realizations of two other random vectors x 1 and x 2 as follows: First, one of the two random vectors is selected by chance according to the two given probabilities of selection, say for x 1 and 1− (0 ≤ ≤ 1) for x 2 , and then the sample of the selected random vector is realized.
The PDF of the so-created random vector y can be expressed as a convex combination of the density functions f x 1 (y) and f x 2 (y) of the two random vectors: Σ yy ) and f x 2 (y) = |δΣ yy | −1/2 g(||y −ȳ|| 2 δΣ yy ) (δ > 1) are two ECDs with same mean but proportional variance matrices, then f y (y) is a contaminated ECD.
The contaminated ECD is again an ECD, with generating function (1 − )g(x) + δ −m/2 g( x δ ). Thus, with g(x) = (2π) −m/2 exp{− 1 2 x} being the generating function for the multivariate normal distribution, the generating function for the contaminated normal distribution is given as Usually δ is chosen large and small. The idea is that the main distribution N m (ȳ, Σ yy ) is slightly 'contaminated' by the wider distribution N m (ȳ, δΣ yy ). This results in a distribution with heavier tails than the main one. By substituting (17) into (13), one obtains the weights for the contaminated normal BIE estimator.
Compare (18) with (16). Note hereby, as k(z) depends on c z , that the weights of the contaminated normal BIE estimator depend on the least-squares residual, this in contrast to the normal BIE estimator. This dependence gets less, the closer δ is chosen to one and in the limit we have lim δ→1 k(z) = 1 + /(1 − ), which as a constant cancels in the weights w z . Also note the dependence of k(z) on the dimension of the real-valued vector b ∈ R p . When all else remains the same, k(z) gets larger for larger p. The case B = 0 corresponds with p = 0.

BIE for multivariate t-distribution
The random vector y ∈ R m is said to have a multivariate t-distribution with d > 2 degrees of freedom, denoted as y ∼ T m (ȳ, Σ yy , d), if its PDF is given as (Zellner 1973;Kibria and Joarder 2006;Roth 2013) in which Γ (.) denotes the gamma-function. The mean of y is E(y) =ȳ, and its variance matrix is D(y) = d d−2 Σ yy .
It is noted that T 1 (0, 1, d) is Student's t-distribution with d degrees of freedom (Gosset 1908;Koch 1999). The t distribution has heavier tails than the normal distribution, but in the limit d → ∞ converges to the standard normal distribution N 1 (0, 1). The analogy in the multivariate case is that T m (0, Σ yy , d) converges to N m (0, Σ yy ) as d → ∞.
Also, the multivariate t-distribution is an elliptically contoured distribution. Its generating function is given as By substituting (21) into (13), one obtains the weights for the multivariate t BIE estimator.
Corollary 3 (BIE weights for multivariate t) Let y ∼ T m (Aa+ Bb, Σ yy , d), a ∈ Z n , b ∈ R p . Then, the BIE weights of (12) follow using 2a 2n Γ (s) (Gradshteyn and Ryzhik 2007), it follows from Note, as with the contaminated normal distribution, that the BIE weights for the multivariate t-distribution depend on the least-squares residual vectorê. Also note, when all else remains the same, that h(z) gets larger for larger p, i.e., if the underlying model gets weaker. The case B = 0 corresponds with p = 0.
As the t-distribution converges to the normal distribution for d → ∞, one would expect h(z) of (22) to converge to (16) for increasing degrees of freedom. And indeed, since lim n→∞ [1 + x n ] n = e x , it follows that which is the h(z) of (16).

On the computation of the BIE estimators
As (12) cannot be computed in practice due to the occurrence of infinite sums, we need to replace the sum over all integers by a sum over a finite set. It seems reasonable to neglect the contributions to the sum, when h(z) is sufficiently small. As h(z) gets smaller, the larger the (weighted) distance between z andâ is (cf. 12), we define the integer set as and approximateâ BIE bŷ This will be a good approximation ofâ BIE if the size of the integer set Ω λ a , and thus λ can be properly chosen. To find such way of choosing λ, we first note that the approximationâ λ BIE can also be seen as an exact BIE estimator in its own right, namely when the PDF is given by the following truncated version of f y (y), with δ λ a (x) being the indicator function of the ellipsoidal region E λ a = {x ∈ R n | ||x − a|| 2 Σââ ≤ λ 2 }. Thus, by using (25) as PDF instead of (8), one will obtain instead of (12), with its infinite sums, the BIE estimator (24), having finite sums, since z∈Z n δ λ z (â)h(z) = z∈Ω λ a h(z). As (25) produces (24) as BIE estimator, one will have a good approximation of (12) when f λ y (y) is a good approximation to the original PDF f y (y), which will be the case when the denominator of (25) is close enough to one, and thus for α small enough. By applying the appropriate change-ofvariable rule, one will recognize the integral of (26) as the probability ofâ residing in E λ a . Hence, the proper size of the integer set Ω λ a in (24) is found by choosing λ to satisfy The following lemma shows how this probability can be computed for the three different ECDs that we have considered.
Once the choice for the size of the integer set Ω λ a has been made, the integer vectors contained in it have to be collected, so as to be able to compute (24). This collection can be done efficiently with the LAMBDA method (Teunissen 1995;De Jonge et al. 1996), as also demonstrated in Verhagen and Teunissen (2005) and (Odolinski and Teunissen 2020).

Summary and conclusions
In this contribution, the theory of integer equivariant (IE) estimation (Teunissen 2003) was extended to include the family of elliptically contoured distributions. As the class of IE estimators includes both the class of integer (I) estimators and class of linear unbiased (LU) estimators, any optimality in the IE-class automatically carries over to the I-class and the LUclass. Hence, ifb is an arbitrary I estimator andb an arbitrary LU estimator, then the best integer equivariant (BIE) estimatorb BIE , which is optimal in the minimum mean squared error sense, will have a mean squared error (MSE) that satisfies MSE(b BIE ) ≤ MSE(b) and MSE(b BIE ) ≤ MSE(b). (28) In the context of GNSS, this implies that the MSE of the baseline BIE estimator is always less than, or at the most equal to, that of both its 'float' and 'fixed' counterparts. This shows that from the MSE perspective one should always prefer the use of the BIE baseline over that of the best linear unbiased (BLU) baseline and integer least-squares (ILS) baseline.
In contrast to the BLU estimator and the ILS estimator, the expression for the BIE estimator depends on the probability density function (PDF) of the vector of observables. Different PDFs give different mappings from the vector of observables y to the baseline b. In this contribution, we developed the expressions of the BIE estimator for the family of elliptically contoured distributions. For the mixed-integer model E(y) = et al. 1989), it follows from an application of the change-ofvariable rule that R p g(c z + v T v)dv = S( p) ∞ 0 r p−1 g(c z + r 2 )dr (38) where S( p) = J (1, α)dα is the surface area of the pdimensional unit sphere. Combining (37) and (38) with (35) concludes the proof for the case A = 0. For the case A = 0, we havê which, with ||y − Bβ|| 2 Σ yy = ||ê|| 2 Σ yy + ||b − β|| 2 Σbb ,b = (B T Σ −1 yy B) −1 B T Σ −1 yy y, can be written aŝ which proves the stated result.
This concludes the proof of the lemma.