1 Introduction

The mixed-integer model forms the basis for ultraprecise GNSS parameter estimation (Hofmann-Wellenhof et al. 2008; Leick et al. 2015; Teunissen and Montenbruck 2017). Characteristically, the mixed-integer least-squares model parameters are computed in a fixed order: first the integer ambiguities and then the ambiguity-resolved baseline parameters. There is in principle, however, no a-priori reason for this particular order. In this contribution we study the dual mixed-integer least-squares problem by reversing the computational order of the ambiguities and baseline parameters. This has the potential advantage of a direct computation of the baseline vector, without the need for an explicit computation of the resolved integer ambiguities. We study the opportunities and drawbacks of this approach, and show that certain approximations of the dual problem may have practical potential under specified conditions. We provide a complete distributional analysis of their estimators, thus allowing users to judge whether or not the approximations are acceptable for their application. We also develop the algorithmic details to ensure that the global minimizer of the dual function can be computed.

This contribution is organized as follows. Section 2 provides a brief review of integer least-squares (ILS) ambiguity resolution, together with the distributional properties of the ambiguity- and baseline-estimators. The dual mixed-ILS formulation is introduced in Sect. 3, together with a representation of its objective function \({\mathcal {D}}(b)\). It is shown that it implicitly also relies on an ILS-problem, albeit with a metric driven by the more precise conditional ambiguity variance matrix. The potential advantage of the dual formulation, solving for b directly, therefore relies on the ease with which this implicit ILS problem can be solved. For the purpose of alleviating the computational demand of the implicit ILS problem, two approximations to the dual are introduced in Sects. 4 and 5, respectively. The dual approximation of Sect. 4 consists of approximating the conditional ambiguity variance matrix. It is shown to which primal formulation this approximate dual belongs, and a complete distributional description of its estimators, together with success-rate bounds, is provided. The approximation of Sect. 5 consists of replacing the implicit ILS-estimator of the dual function by a simpler integer map. It is shown that as a result a new class of admissible integer estimators is found. Also for this class a distributional description of its estimators, together with success-rate bounds, is provided. With the purpose of providing insight into the challenge of minimizing \({\mathcal {D}}(b)\), Sect. 6 illustrates and describes the multimodality of the dual function. As the dual function lacks convexity, special algorithmic care is required to find its global minimizer. We present our proposed method in Sect. 7. It has finite termination with a guaranteed \(\epsilon \)-tolerance and is constructed by combining the branch-and-bound principle with a special convex relaxation of the dual, to which the projected-gradient-descent method is applied to obtain the required bounds. As the described approach is not restricted to the presented dual formulation, we provide an outlook for the constrained and partitioned dual problems in Sect. 8. Finally, Sect. 9 contains the summary and conclusions.

The following notation is used: \({\textsf{E}}(.)\) and \({\textsf{D}}(.)\) stand for the expectation and dispersion operator, respectively, and \({\mathcal {N}}_{p}(\mu , Q)\) denotes a p-dimensional, normally distributed random vector, with mean (expectation) \(\mu \) and variance matrix (dispersion) Q. \({\mathbb {R}}^{p}\) and \({\mathbb {Z}}^{p}\) denote the p-dimensional spaces of real and integer numbers, respectively, and the range space of a matrix M is denoted as \({\mathcal {R}}(M)\). The least-squares (LS) inverse of a full column rank matrix M is denoted as \(M^{+}=(M^{T}Q_{yy}^{-1}M)^{-1}M^{T}Q_{yy}^{-1}\) and the orthogonal projector onto \({\mathcal {R}}(M)\) as \(P_{M}=MM^{+}\). \(P_{M}^{\perp }=I-P_{M}\) is then the projector that projects orthogonally onto the orthogonal complement of \({\mathcal {R}}(M)\). The Q-weighted squared norm is denoted as \(||.||_{Q}^{2}=(.)^{T}Q^{-1}(.)\), and \(\lceil x\rfloor \) denotes the rounding of x to the nearest integer. If applied to a vector, the rounding is understood to apply to each of its coordinates. \(\cup \) and \(\cap \) denote the union and intersection operators, and the vectorial inequality \(\preceq \) denotes the componentwise inequality \(\le \). \({\textsf{P}}[{\mathcal {A}}]\) denotes the probability of event \({\mathcal {A}}\), \(f_{{\hat{b}}}(b)\) the probability density function (PDF) of the continuous random vector \({\hat{b}}\) and \({\textsf{P}}[{\check{a}}=z]\) the probability mass function (PMF) of the integer random vector \({\check{a}}\). The noncentral Chi-square distribution with p degrees of freedom and noncentrality parameter \(\lambda \) is denoted as \(\chi ^{2}(p, \lambda )\) and its \(\delta \)-percentage critical value as \(\chi ^{2}_{\delta }(p,0)\).

2 Brief review of ILS ambiguity resolution

We start from the mixed-integer model of (linearized) GNSS observation equations (Leick et al. 2015; Teunissen and Montenbruck 2017), which in vector–matrix form reads,

$$\begin{aligned} {\textsf{E}}(y)=Aa+Bb,\; {\textsf{D}}(y)=Q_{yy} \end{aligned}$$
(1)

with \(y \sim {\mathcal {N}}_{m}({\textsf{E}}(y), {\textsf{D}}(y))\) the m-vector of normally distributed pseudorange and carrier-phase observables, \([A, B] \in {\mathbb {R}}^{m \times (n+p)}\) the given design matrix of full rank \(n+p\), \(a \in {\mathbb {Z}}^{n}\) the unknown ambiguity vector consisting of the integer carrier-phase ambiguities, \(b \in {\mathbb {R}}^{p}\) the unknown baseline vector consisting of the remaining real-valued parameters, such as position coordinates, atmosphere parameters, receiver/satellite clock parameters, and instrumental biases, and \(Q_{yy} \in {\mathbb {R}}^{m \times m}\) the given positive-definite variance matrix of the observables. The above GNSS model may be given in undifferenced, single-differenced or double-differenced form. In any of these forms, possible rank defects in the design matrix are assumed to have been eliminated through a careful reparametrization in clearly defined estimable parameters (Odijk et al. 2015; Teunissen 2019).

The mixed integer least-squares (ILS) estimation of the integer ambiguity vector \(a \in {\mathbb {Z}}^{n}\) and the real-valued baseline vector \(b \in {\mathbb {R}}^{p}\) is executed in three steps (float-integer-fixed). In the first step, the integer constraint on a is discarded, giving the so-called float-solution of a and b as

$$\begin{aligned} {\hat{a}}={\bar{A}}^{+}y\;\textrm{and}\;{\hat{b}}={\bar{B}}^{+}y \end{aligned}$$
(2)

with \({\bar{A}}=P_{B}^{\perp }A\) and \({\bar{B}}=P_{A}^{\perp }B\). In the second step, the integer constraint \(a \in {\mathbb {Z}}^{n}\) is invoked, and \({\hat{a}} \in {\mathbb {R}}^{n}\) of (2) is used as input to obtain the integer estimate of the ambiguity vector a as

$$\begin{aligned} {\check{a}}= \arg \min _{a \in {\mathbb {Z}}^{n}} ||{\hat{a}}-a||_{Q_{{\hat{a}}{\hat{a}}}}^{2} \end{aligned}$$
(3)

with \(Q_{{\hat{a}}{\hat{a}}}=({\bar{A}}^{T}Q_{yy}^{-1}{\bar{A}})^{-1}\) being the variance matrix of \({\hat{a}}\). Once the integer solution (3) has been obtained, the expression of the conditional least-squares (LS) baseline estimator, \({\hat{b}}(a)={\hat{b}}-Q_{{\hat{b}}{\hat{a}}}Q_{{\hat{a}}{\hat{a}}}^{-1}({\hat{a}}-a)\) (i.e. conditioned on knowing a), is used in the third step to compute the ambiguity-fixed baseline solution as

$$\begin{aligned} {\check{b}}={\hat{b}}({\check{a}})={\hat{b}}-Q_{{\hat{b}}{\hat{a}}}Q_{{\hat{a}}{\hat{a}}}^{-1}\left( {\hat{a}}-{\check{a}}\right) \end{aligned}$$
(4)
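To make the three steps concrete, the following Python sketch computes (2), (3) and (4) for a hypothetical small-scale setup: the matrices A, B, Qyy and the observation vector y are assumed given, and a brute-force integer search merely stands in for an efficient solver such as LAMBDA.

```python
import itertools
import numpy as np

def mixed_ils(y, A, B, Qyy, radius=2):
    """Float-integer-fixed solution of E(y)=Aa+Bb, D(y)=Qyy, cf. (2)-(4).
    The integer step is a brute-force search over a small box around the
    rounded float ambiguities; LAMBDA would be used in practice."""
    W = np.linalg.inv(Qyy)
    n = A.shape[1]
    # step 1: float solution (2) from the joint normal equations
    M = np.hstack((A, B))
    Qxx = np.linalg.inv(M.T @ W @ M)
    x_hat = Qxx @ (M.T @ W @ y)
    a_hat, b_hat = x_hat[:n], x_hat[n:]
    Qaa, Qba = Qxx[:n, :n], Qxx[n:, :n]
    # step 2: integer least-squares (3) by brute-force search
    Waa = np.linalg.inv(Qaa)
    a0 = np.rint(a_hat)
    a_check = min((a0 + np.array(d) for d in
                   itertools.product(range(-radius, radius + 1), repeat=n)),
                  key=lambda z: (a_hat - z) @ Waa @ (a_hat - z))
    # step 3: ambiguity-fixed baseline (4), conditioned on a_check
    b_check = b_hat - Qba @ np.linalg.solve(Qaa, a_hat - a_check)
    return a_check, b_check
```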

That (2), (3) and (4) are indeed the LS and ILS solutions of the GNSS model (1) follows readily from the orthogonal decomposition (Teunissen 1998a)

$$\begin{aligned} ||y-Aa-Bb||_{Q_{yy}}^{2}=||P_{[A,B]}^{\perp }y||_{Q_{yy}}^{2}+F(a,b) \end{aligned}$$
(5)

where

$$\begin{aligned} F(a,b)=||{\hat{a}}-a||_{Q_{{\hat{a}}{\hat{a}}}}^{2}+||{\hat{b}}(a)-b||_{Q_{{\hat{b}}(a){\hat{b}}(a)}}^{2} \end{aligned}$$
(6)

with \(Q_{{\hat{b}}(a){\hat{b}}(a)}=(B^{T}Q_{yy}^{-1}B)^{-1}\) the variance matrix of \({\hat{b}}(a)\). As \(||P_{[A,B]}^{\perp }y||_{Q_{yy}}^{2}\) is independent of a and b, the minimizers of \(||y-Aa-Bb||_{Q_{yy}}^{2}\) are those of F(a,b). It therefore follows from (6), recognizing \({\hat{b}}({\hat{a}})={\hat{b}}\), that the real-valued minimizers of F(a,b) are given by (2), while their mixed-integer counterparts are given by (3) and (4), respectively. We therefore have for the LS and the mixed ILS solutions,

$$\begin{aligned} \left. \begin{array}{c} {\hat{a}}\\ {\hat{b}} \end{array} \right\} = \underset{a \in {\mathbb {R}}^{n}, b \in {\mathbb {R}}^{p}}{\arg \min }F(a,b),\; \left. \begin{array}{c} {\check{a}}\\ {\check{b}} \end{array} \right\} = \underset{a \in {\mathbb {Z}}^{n}, b \in {\mathbb {R}}^{p}}{\arg \min }F(a,b) \end{aligned}$$
(7)

In order to judge the quality of the mixed ILS estimators \({\check{a}}\) and \({\check{b}}\), we need their probability distributions. They are given in the following theorem.

Theorem 1

(Teunissen 1999b) The probability mass function (PMF) of \({\check{a}}\) and the probability density function (PDF) of \({\check{b}}\) are given as

$$\begin{aligned} \left\{ \begin{array}{lcl} {\textsf{P}}[{\check{a}}=z] &{}=&{} \int _{{\mathscr {P}}_{z}} f_{{\hat{a}}}(\alpha )d \alpha \\ f_{{\check{b}}}(\beta )&{}=&{} \sum \limits _{z \in {\mathbb {Z}}^{n}} f_{{\hat{b}}(z)}(\beta ) {\textsf{P}}[{\check{a}}=z] \end{array} \right. \end{aligned}$$
(8)

with \({\hat{a}} \sim {\mathcal {N}}_{n}(a, Q_{{\hat{a}}{\hat{a}}})\), \({\hat{b}}(z) \sim {\mathcal {N}}_{p}(b-Q_{{\hat{b}}{\hat{a}}}Q_{{\hat{a}}{\hat{a}}}^{-1}(a-z), Q_{{\hat{b}}(z){\hat{b}}(z)})\), and the pull-in region of \(z \in {\mathbb {Z}}^{n}\) given as \({\mathscr {P}}_{z} = \{ x \in {\mathbb {R}}^{n}|\; ||x-z||_{Q_{{\hat{a}}{\hat{a}}}}^{2} \le ||x-u||_{Q_{{\hat{a}}{\hat{a}}}}^{2}, \forall u \in {\mathbb {Z}}^{n}\}\). \(\blacksquare \)

In the practice of GNSS ambiguity resolution one aims to resolve the ambiguities with a high success-rate, i.e. a high probability of correct integer estimation \({\textsf{P}}[{\check{a}}=a]\). When the success-rate is high enough, one may neglect the uncertainty in \({\check{a}}\) and describe the uncertainty in \({\check{b}}\) by means of the PDF of \({\hat{b}}(a)\). This is made precise by the following bounds of Teunissen (1999b),

$$\begin{aligned} {\textsf{P}}[{\hat{b}}(a) \in \varOmega ]{\textsf{P}}[{\check{a}}=a]\le {\textsf{P}}[{\check{b}} \in \varOmega ] \le {\textsf{P}}\left[ {\hat{b}}(a) \in \varOmega \right] \end{aligned}$$
(9)

which hold true for any convex region \(\varOmega \subset {\mathbb {R}}^{p}\) centred at \({\textsf{E}}({\hat{b}})\). Thus when the success-rate \({\textsf{P}}[{\check{a}}=a]\) is close enough to one, then

$$\begin{aligned} {\textsf{P}}[{\check{b}} \in \varOmega ] \approx {\textsf{P}}[{\hat{b}}(a) \in \varOmega ] \end{aligned}$$
(10)

which in case of GNSS, due to the very precise carrier-phase data, is usually a much larger probability than that obtained from the float-solution \({\hat{b}}\), \({\textsf{P}}[{\hat{b}}(a) \in \varOmega ] \gg {\textsf{P}}[{\hat{b}} \in \varOmega ]\).
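To give a feel for these probabilities, the success-rate \({\textsf{P}}[{\check{a}}=a]\) can be estimated by Monte Carlo simulation. A minimal sketch follows; the \(2\times 2\) variance matrix is an assumed example, and the brute-force search again stands in for LAMBDA.

```python
import itertools
import numpy as np

def ils(x, W, radius=2):
    # brute-force integer least-squares minimizer of ||x - z||_W^2
    a0 = np.rint(x)
    return min((a0 + np.array(d) for d in
                itertools.product(range(-radius, radius + 1), repeat=x.size)),
               key=lambda z: (x - z) @ W @ (x - z))

def ils_success_rate(Qaa, trials=20000, seed=1):
    """Monte Carlo estimate of P[a_check = a]; by the translational
    invariance of the pull-in regions we may put a = 0."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(Qaa)      # samples a_hat ~ N(0, Qaa)
    W = np.linalg.inv(Qaa)
    hits = sum(np.all(ils(L @ rng.standard_normal(Qaa.shape[0]), W) == 0)
               for _ in range(trials))
    return hits / trials

# assumed example: highly correlated float ambiguities
Qaa = np.array([[0.0865, 0.0820],
                [0.0820, 0.0865]])
print(ils_success_rate(Qaa))
```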

Fig. 1

Primal and dual orthogonal decompositions, after (Teunissen 1998a): \(||y-Aa-Bb||_{Q_{yy}}^{2}=||P_{[A,B]}^{\perp }y||_{Q_{yy}}^{2}+||P_{[A,B]}(y-Aa-Bb)||_{Q_{yy}}^{2}\), with primal decomposition \(||P_{[A,B]}(y-Aa-Bb)||_{Q_{yy}}^{2}=||P_{{\bar{A}}}(y-Aa)||_{Q_{yy}}^{2}+||P_{B}(y-Aa-Bb)||_{Q_{yy}}^{2}=||{\hat{a}}-a||_{Q_{{\hat{a}}{\hat{a}}}}^{2}+||{\hat{b}}(a)-b||_{Q_{{\hat{b}}(a){\hat{b}}(a)}}^{2}\) and dual decomposition \(||P_{[A,B]}(y-Aa-Bb)||_{Q_{yy}}^{2}=||P_{{\bar{B}}}(y-Bb)||_{Q_{yy}}^{2} +||P_{A}(y-Aa-Bb)||_{Q_{yy}}^{2}=||{\hat{b}}-b||_{Q_{{\hat{b}} {\hat{b}}}}^{2}+||{\hat{a}}(b)-a||_{Q_{{\hat{a}} (b){\hat{a}}(b)}}^{2}\)

3 A dual mixed ILS formulation

3.1 Primal and dual mixed ILS

A characteristic of the 3-step solution approach is the order in which the mixed ILS solutions \({\check{a}}\) and \({\check{b}}\) are computed in the last two steps. First the integer ambiguity estimate \({\check{a}}\) (cf. 3) is computed and then the fixed baseline estimate as \({\check{b}}={\hat{b}}({\check{a}})\) (cf. 4). There is in principle, however, no a-priori reason for this particular order. The same solution is obtained if one interchanges the order of the two minimization steps, since

$$\begin{aligned} \min _{a \in {\mathbb {Z}}^{n}}\left[ \min _{b \in {\mathbb {R}}^{p}}F(a,b)\right] = \min _{b \in {\mathbb {R}}^{p}}\left[ \min _{a \in {\mathbb {Z}}^{n}}F(a,b)\right] \end{aligned}$$
(11)

With this equivalence, the solution to the mixed ILS problem can be formulated in two alternative ways, each working with a different objective function: a primal function \({\mathcal {P}}(a)\) that solely depends on the ambiguity vector a, and a dual function \({\mathcal {D}}(b)\) that solely depends on the baseline vector b. With the aid of the following short-hand notation

$$\begin{aligned} \begin{array}{lcl} {\hat{b}}(a) &{}=&{} \arg \min \limits _{b \in {\mathbb {R}}^{p}}F(a,b) \\ {\check{a}}(b) &{}=&{} \arg \min \limits _{a \in {\mathbb {Z}}^{n}}F(a,b) \end{array} \end{aligned}$$
(12)

we have the following result.

Theorem 2

(Primal and Dual Mixed ILS) Let the primal and dual objective functions be defined as

$$\begin{aligned} \begin{array}{llcl} \mathrm{Primal:}&{}{\mathcal {P}}(a) &{}=&{} \min \limits _{b \in {\mathbb {R}}^{p}}F(a,b)=F\left( a, {\hat{b}}(a)\right) \\ \mathrm{Dual:}&{} {\mathcal {D}}(b) &{}=&{} \min \limits _{a \in {\mathbb {Z}}^{n}}F(a,b)=F\left( {\check{a}}(b),b\right) \end{array} \end{aligned}$$
(13)

Then the mixed ILS solution is given as

$$\begin{aligned} \begin{array}{lcl} {\check{a}} &{}=&{} \arg \min \limits _{a \in {\mathbb {Z}}^{n}} {\mathcal {P}}(a) = {\check{a}}({\check{b}})\\ {\check{b}} &{}=&{} \arg \min \limits _{b \in {\mathbb {R}}^{p}} {\mathcal {D}}(b) = {\hat{b}}({\check{a}}) \end{array} \end{aligned}$$
(14)

\(\blacksquare \)

This result shows that one has two routes available for computing the mixed ILS solution. Either one minimizes \({\mathcal {P}}(a)\) to obtain \({\check{a}}\) first and then \({\check{b}}={\hat{b}}({\check{a}})\), or one minimizes \({\mathcal {D}}(b)\) to obtain \({\check{b}}\) first and then \({\check{a}}={\check{a}}({\check{b}})\). Both routes determine the same minimum of F(a,b),

$$\begin{aligned} \min _{a \in {\mathbb {Z}}^{n}, b \in {\mathbb {R}}^{p}}F(a,b)= {\mathcal {P}}\left( {\check{a}}\right) ={\mathcal {D}}\left( {\check{b}}\right) \end{aligned}$$
(15)

The first route is the one described in the previous section. The second route is the object of study of the present contribution.

3.2 The dual objective function

If the parameters of interest are not the ambiguities, but rather the entries of b, it seems that working with the dual function \({\mathcal {D}}(b)\) is a natural way to go. To determine an explicit expression for \({\mathcal {D}}(b)\), it is useful to start from the orthogonal decomposition (6), but now with the roles of a and b interchanged, i.e.

$$\begin{aligned} F(a,b)= ||{\hat{b}}-b||_{Q_{{\hat{b}}{\hat{b}}}}^{2}+||{\hat{a}}(b)-a||_{Q_{{\hat{a}}(b){\hat{a}}(b)}}^{2} \end{aligned}$$
(16)

where \({\hat{a}}(b)={\hat{a}}-Q_{{\hat{a}}{\hat{b}}}Q_{{\hat{b}}{\hat{b}}}^{-1}({\hat{b}}-b)\), \(Q_{{\hat{b}}{\hat{b}}}=({\bar{B}}^{T}Q_{yy}^{-1}{\bar{B}})^{-1}\), and \(Q_{{\hat{a}}(b){\hat{a}}(b)}=(A^{T}Q_{yy}^{-1}A)^{-1}\). For the geometry of the primal and dual orthogonal decompositions, see Fig. 1. We can now obtain the following representations of the dual function \({\mathcal {D}}(b)\).

Lemma 1

(Dual objective function): Let \({\mathscr {S}}_{z}=\{ x \in {\mathbb {R}}^{n}|\; ||x-z||_{Q_{{\hat{a}}(b){\hat{a}}(b)}}^{2}\le ||x-u||_{Q_{{\hat{a}}(b){\hat{a}}(b)}}^{2}, \forall u \in {\mathbb {Z}}^{n}\}\) be the ILS pull-in regions of \(Q_{{\hat{a}}(b){\hat{a}}(b)}\), having \(s_{z}(x)\) as its indicator function, i.e. \(s_{z}(x)=1\) if \(x \in {\mathscr {S}}_{z}\) and \(s_{z}(x)=0\) otherwise. Then

$$\begin{aligned} \begin{array}{l} {\mathcal {D}}(b)=\min \limits _{a \in {\mathbb {Z}}^{n}}F(a,b)\\ \quad =||{\hat{b}}-b||_{Q_{{\hat{b}}{\hat{b}}}}^{2}+||{\hat{a}}(b)-{\check{a}}(b)||_{Q_{{\hat{a}}(b){\hat{a}}(b)}}^{2}\\ \quad = ||{\hat{b}}-b||_{Q_{{\hat{b}}{\hat{b}}}}^{2}+\sum \limits _{z \in {\mathbb {Z}}^{n}}||{\hat{a}}(b)-z||_{Q_{{\hat{a}}(b){\hat{a}}(b)}}^{2}s_{z}({\hat{a}}(b)) \end{array} \end{aligned}$$
(17)

where

$$\begin{aligned} {\check{a}}(b)=\arg \min \limits _{a \in {\mathbb {Z}}^{n}} ||{\hat{a}}(b)-a||_{Q_{{\hat{a}}(b){\hat{a}}(b)}}^{2} \end{aligned}$$
(18)

\(\blacksquare \)

This shows that the dual function \({\mathcal {D}}(b)\) is a sum of two functions in b,

$$\begin{aligned} {\mathcal {D}}(b)=\underbrace{||{\hat{b}}-b||_{Q_{{\hat{b}}{\hat{b}}}}^{2}}_{{\mathcal {D}}_{1}(b)}+\underbrace{||{\hat{a}}(b)-{\check{a}}(b)||_{Q_{{\hat{a}}(b){\hat{a}}(b)}}^{2}}_{{\mathcal {D}}_{2}(b)} \end{aligned}$$
(19)

As the second function \({\mathcal {D}}_{2}(b)\) is formed by again solving an ILS problem, one may wonder whether anything would be gained by working with the dual \({\mathcal {D}}(b)\), in particular if we also note that the ILS problem of (18) needs to be re-evaluated for every different value of the unknown b. A comparison of the two ILS problems, (3) and (18), shows, however, that the second is formulated with respect to the conditional variance matrix \(Q_{{\hat{a}}(b){\hat{a}}(b)}\) and not with respect to \(Q_{{\hat{a}}{\hat{a}}}\) as is the case with (3). Although both ILS problems can be solved efficiently by means of the LAMBDA method (Teunissen 1995), we recall that herein the two dominant computational components are (1) the Z-decorrelation, and (2) the ellipsoidal integer search. Hence, if the structure of the conditional vc-matrix \(Q_{{\hat{a}}(b){\hat{a}}(b)}\) is such that one or both of these components can be skipped or simplified, then working with the dual \({\mathcal {D}}(b)\) could perhaps become attractive in some instances. For instance, if \(Q_{{\hat{a}}(b){\hat{a}}(b)}\) is diagonal, \({\check{a}}(b)\) equals the component-wise rounded version of \({\hat{a}}(b)\), and both components can be avoided. Diagonality of \(Q_{{\hat{a}}(b){\hat{a}}(b)}\) occurs when the columns of A are mutually orthogonal in the metric of \(Q_{yy}\). In the realm of GNSS, this is the case with the multi-frequency geometry-free GNSS model. Ease of computation would also result if \({\check{a}}(b)\) depended only moderately on b. To provide insight into this, we consider the probability mass function of \({\check{a}}(b)\) and in particular its success-rate \({\textsf{P}}[{\check{a}}(b)=a]\).
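For the diagonal case just mentioned, the implicit ILS step (18) degenerates to componentwise rounding, so that \({\mathcal {D}}(b)\) of (19) becomes cheap to evaluate. A minimal sketch, with all inputs assumed given and d_cond holding the diagonal of \(Q_{{\hat{a}}(b){\hat{a}}(b)}\):

```python
import numpy as np

def dual_diagonal(b, b_hat, a_hat, Qbb, Qab, d_cond):
    """Evaluate D(b), cf. (19), when Q_{a(b)a(b)} = diag(d_cond):
    then a_check(b) = round(a_hat(b)), so neither decorrelation
    nor an ellipsoidal search is needed."""
    db = b_hat - b
    a_b = a_hat - Qab @ np.linalg.solve(Qbb, db)  # conditional estimate a_hat(b)
    frac = a_b - np.rint(a_b)                     # a_hat(b) - a_check(b)
    return db @ np.linalg.solve(Qbb, db) + np.sum(frac**2 / d_cond)
```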

3.3 Probability mass function of \({\check{a}}(b)\)

For every value of b at which \({\mathcal {D}}(b)\) needs to be evaluated, the integer estimate \({\check{a}}(b)\) must be computed. The performance of this integer estimator can be described by its probability mass function (PMF).

Lemma 2

(PMF of \({\check{a}}(b)\)) The probability mass function of \({\check{a}}(b)\) (cf. 18) is given as

$$\begin{aligned} \begin{array}{lcl} {\textsf{P}}[{\check{a}}(b)=z]&{}=&{} {\textsf{P}}[{\hat{a}}(b) \in {\mathscr {S}}_{z}]\\ &{}=&{} \int _{{\mathscr {S}}_{z}} f_{{\hat{a}}(b)}(x)dx, \forall z \in {\mathbb {Z}}^{n} \end{array} \end{aligned}$$
(20)

with pull-in regions \({\mathscr {S}}_{z}=\{x \in {\mathbb {R}}^{n}\;|\; ||x-z||_{Q_{{\hat{a}}(b){\hat{a}}(b)}}^{2} \le ||x-u||_{Q_{{\hat{a}}(b){\hat{a}}(b)}}^{2}, \forall u \in {\mathbb {Z}}^{n}\}\) and the PDF \(f_{{\hat{a}}(b)}(x)\) of \({\hat{a}}(b)\) given as

$$\begin{aligned} {\hat{a}}(b) \sim {\mathcal {N}}_{n}\left( a+\varDelta a, Q_{{\hat{a}}(b){\hat{a}}(b)}\right) \end{aligned}$$
(21)

where \(\varDelta a = -Q_{{\hat{a}}{\hat{b}}}Q_{{\hat{b}}{\hat{b}}}^{-1}\varDelta b\), with \(\varDelta b={\textsf{E}}({\hat{b}})-b\). \(\blacksquare \)

The PMF of \({\check{a}}(b)\) (cf. 20) is driven by the PDF of \({\hat{a}}(b)\) (cf. 21). Its ambiguity success-rate can be evaluated with the bounds of Teunissen (2001) or with the simulation algorithms provided in Ps-LAMBDA (Verhagen et al. 2013). The PDF of \({\hat{a}}(b)\) is usually very peaked, especially in case of GNSS where we have \(Q_{{\hat{a}}(b){\hat{a}}(b)} \ll Q_{{\hat{a}}{\hat{a}}}\) due to the very precise phase data. Would this peakedness of the PDF be such that it is located over only a single pull-in region, say \({\mathscr {S}}_{u}\), \(u \in {\mathbb {Z}}^{n}\), for a certain b, then the PMF of \({\check{a}}(b)\) could be well approximated for that value of b by a Kronecker delta function,

$$\begin{aligned} {\textsf{P}}[{\check{a}}(b)=z] \approx \delta _{uz} = \left\{ \begin{array}{lcl} 1 &{} \textrm{if}\;z=u\\ 0 &{} \textrm{if}\;z \ne u \end{array} \right. \end{aligned}$$
(22)

The ambiguity success-rate of \({\check{a}}(b)\) would then be large, i.e. \({\textsf{P}}[{\check{a}}(b)=a] \approx 1\), if \(u=a\). For this to happen however, we need \(a+\varDelta a \in {\mathscr {S}}_{a}\), i.e. the bias in \({\hat{a}}(b)\) needs to be sufficiently small, with \(\varDelta a\) residing in \({\check{a}}(b)\)’s pull-in region of the origin, \(\varDelta a\in {\mathscr {S}}_{0}\). For the squared \(Q_{{\hat{a}}(b){\hat{a}}(b)}\)-weighted norm of this ambiguity bias we have the following result.

Lemma 3

(Bias of \({\hat{a}}(b)\)) Let \(\varDelta a = -Q_{{\hat{a}}{\hat{b}}}Q_{{\hat{b}}{\hat{b}}}^{-1}\varDelta b\). Then

$$\begin{aligned} \begin{array}{lcl} ||\varDelta a||_{Q_{{\hat{a}}(b){\hat{a}}(b)}}^{2}&{}=&{}||P_{A}B\varDelta b||_{Q_{yy}}^{2} \\ &{}\le &{} ||\varDelta b||_{Q_{{\hat{b}}(a){\hat{b}}(a)}}^{2} \end{array} \end{aligned}$$
(23)

\(\blacksquare \)

Proof

see Appendix. \(\square \)

This result shows that, for the range of b-values that has to be accounted for, one can generally not expect the bias \(\varDelta a\) to be small enough to have \(\varDelta a \in {\mathscr {S}}_{0}\). This would namely require knowledge of b such that \(\varDelta b= {\textsf{E}}({\hat{b}})-b\) is sufficiently small with respect to the phase-driven, small standard deviations of \({\hat{b}}(a)\). Such knowledge can only be expected in a model having strong a-priori constraints on b. As the following example demonstrates, this cannot be expected from a regular unconstrained GNSS model.

Example 1

Consider the single-frequency, single epoch, single baseline, double-differenced (DD), \(m+1\) satellite GNSS model

$$\begin{aligned} {\textsf{E}}\left[ \begin{array}{c} p \\ \phi \end{array} \right] = \left[ \begin{array}{cc} 0 &{} D^{T}G\\ \lambda I_{m} &{} D^{T}G \end{array} \right] \left[ \begin{array}{c} a\\ b \end{array} \right] ,\quad {\textsf{D}}\left[ \begin{array}{c} p \\ \phi \end{array} \right] = \left[ \begin{array}{cc} \sigma _{p}^{2}D^{T}D &{} 0\\ 0 &{} \sigma _{\phi }^{2}D^{T}D \end{array} \right] \end{aligned}$$
(24)

with \(p, \phi \in {\mathbb {R}}^{m}\) the DD pseudorange and carrier-phase data vectors, \(\lambda \) the wavelength, \(D^{T}=[-e_{m}, I_{m}]\) the differencing matrix, \(G \in {\mathbb {R}}^{(m+1) \times 3}\) the receiver-satellite geometry matrix, and \(\sigma _{p}^{2}, \sigma _{\phi }^{2}\) the variances of the single-differenced pseudoranges and carrier-phases. For this model the variance matrix of \({\hat{a}}(b)\) and its bias work out to be

$$\begin{aligned} \begin{array}{lcl} Q_{{\hat{a}}(b){\hat{a}}(b)}&{}=&{}\left( A^{T}Q_{yy}^{-1}A\right) ^{-1}=\tfrac{\sigma _{\phi }^{2}}{\lambda ^{2}}D^{T}D \\ \varDelta a &{}=&{}\left( A^{T}Q_{yy}^{-1}A\right) ^{-1}A^{T}Q_{yy}^{-1}B\varDelta b = \tfrac{1}{\lambda }D^{T}G\varDelta b \end{array} \end{aligned}$$
(25)

Recognizing that the rows of \(D^{T}G\) consist of differences of the rows of G and that each row of G consists of a unit direction vector, the entries of \(\varDelta a\) can be bounded from above as \(|(\varDelta a)_{i}| \le \tfrac{2}{\lambda } ||\varDelta b||\), \(i=1, \ldots , m\). This shows, as \(\lambda \approx 20\) cm in case of GNSS, that \(\varDelta b = {\textsf{E}}({\hat{b}})-b\) has to be very small indeed to keep the \(|(\varDelta a)_{i}|\) at the subcycle level. \(\square \)
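A quick numeric reading of this bound (GPS L1 wavelength \(\lambda \approx 0.19\) m; the \(||\varDelta b||\) values are merely illustrative):

```python
lam = 0.19                     # GPS L1 wavelength in metres (approx.)
for db in (0.01, 0.05, 0.10):  # assumed values of ||Delta b|| in metres
    print(f"||db|| = {db:.2f} m -> |(da)_i| <= {2 * db / lam:.2f} cycles")
```

Already at \(||\varDelta b||=5\) cm the bound exceeds half a cycle, consistent with the conclusion that only strongly constrained models can keep \(\varDelta a\) inside \({\mathscr {S}}_{0}\).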

As the above has demonstrated, without strong a-priori constraints on b, one cannot expect the success-rate of \({\check{a}}(b)\) to be large. That is, despite the high precision of the conditional estimate \({\hat{a}}(b)\), the influence of the unknown b is still too large to have \({\check{a}}(b)=a\) with high probability. This implies that one will have to evaluate \({\mathcal {D}}_{2}(b)=||{\hat{a}}(b)-{\check{a}}(b)||_{Q_{{\hat{a}}(b){\hat{a}}(b)}}^{2}\) for a range of values of b and thus also solve as many ILS problems. In case solving the ILS-problem is too time-consuming, it would therefore be beneficial if we could replace the evaluation of \({\mathcal {D}}_{2}(b)\) with a simpler one, without affecting the performance of the whole estimation process by much. One can ask, for instance, whether one can take advantage of the peakedness of the PDF of \({\hat{a}}(b)\) and replace the ILS estimator \({\check{a}}(b)\) by the integer-rounding (IR) estimator \(\lceil {\hat{a}}(b) \rfloor \), without a serious degradation in performance. This would be possible if 'all' probability of the PDF of \({\hat{a}}(b)\) were concentrated in the intersections of the ILS and IR pull-in regions, which would require \(a+\varDelta a \in {\mathscr {S}}_{u} \cap {\mathscr {R}}_{u}\), with \({\mathscr {R}}_{u}\) denoting the integer-rounding pull-in region of \(u \in {\mathbb {Z}}^{n}\). Such an assumption cannot be generally valid, however, as by changing b in \(\varDelta a= - Q_{{\hat{a}}{\hat{b}}}Q_{{\hat{b}}{\hat{b}}}^{-1}\varDelta b\), one would also be able to pull \(a+\varDelta a\) out of such an intersection \({\mathscr {S}}_{u} \cap {\mathscr {R}}_{u}\). If one replaces the evaluation of \({\mathcal {D}}_{2}(b)\) by a simpler one, it is therefore important to also be able to rigorously evaluate the performance of such a simplification. In the following we introduce two different simplifications of \({\mathcal {D}}(b)\) and study the probabilistic properties of their minimizers.

4 Dual with approximate weight matrix

In this and the next section we study the properties of the baseline- and ambiguity-estimators when one works, instead of with the dual \({\mathcal {D}}(b)\) itself, with easier-to-compute approximations to it. The two types of approximation that we consider are,

$$\begin{aligned} \begin{array}{lcl} (a)\;\;{\mathcal {D}}^{\circ }(b)&{}=&{}||{\hat{b}}-b||_{Q_{{\hat{b}}{\hat{b}}}}^{2}+ \min \limits _{ a \in {\mathbb {Z}}^{n}}||{\hat{a}}(b)-a||_{Q^{\circ }_{{\hat{a}}(b){\hat{a}}(b)}}^{2}\\ (b)\;\;{\mathcal {D}}^{\bullet }(b)&{}=&{}||{\hat{b}}-b||_{Q_{{\hat{b}}{\hat{b}}}}^{2}+ ||{\hat{a}}(b)-{\mathcal {I}}^{\bullet }({\hat{a}}(b))||_{Q_{{\hat{a}}(b){\hat{a}}(b)}}^{2}\\ \end{array} \end{aligned}$$
(26)

In the first type, we have replaced the conditional variance matrix \(Q_{{\hat{a}}(b){\hat{a}}(b)}\) by an approximation \(Q^{\circ }_{{\hat{a}}(b){\hat{a}}(b)}\), the idea being that the approximation will then allow for a simpler ambiguity minimization in (26a). For instance, when \(Q^{\circ }_{{\hat{a}}(b){\hat{a}}(b)}\) is chosen to be a diagonal matrix, the minimization in (26a) reduces to a straightforward componentwise integer rounding of \({\hat{a}}(b)\). In the second approximation type, we have replaced the integer ambiguity minimizer \({\check{a}}(b)=\arg \min \limits _{a \in {\mathbb {Z}}^{n}}||{\hat{a}}(b)-a||_{Q_{{\hat{a}}(b){\hat{a}}(b)}}^{2}\) of \({\mathcal {D}}_{2}(b)\) (cf. 19) by an arbitrary admissible integer estimator \({\check{a}}^{\bullet }(b)={\mathcal {I}}^{\bullet }({\hat{a}}(b))\), \({\mathcal {I}}^{\bullet }:{\mathbb {R}}^{n} \mapsto {\mathbb {Z}}^{n}\). This second type will be studied in the next section.

To determine the properties of the baseline estimator \({\check{b}}^{\circ }=\arg \min \limits _{b \in {\mathbb {R}}^{p}}{\mathcal {D}}^{\circ }(b)\) and its corresponding integer ambiguity estimator, we again make use of the correspondence between the primal and dual formulations. The quadratic form identity, as provided by the following Lemma, forms the basis for establishing this correspondence.

Lemma 4

Let the conditional ambiguity variance matrix \(Q_{{\hat{a}}(b){\hat{a}}(b)}\) in (16) be replaced by \(Q^{\circ }_{{\hat{a}}(b){\hat{a}}(b)}\). Then

$$\begin{aligned} \begin{array}{c} ||{\hat{b}}-b||_{Q_{{\hat{b}}{\hat{b}}}}^{2}+||{\hat{a}}(b)-a||_{Q^{\circ }_{{\hat{a}}(b){\hat{a}} (b)}}^{2}\\ \quad =||{\hat{a}}-a||_{Q^{\circ }_{{\hat{a}}{\hat{a}}}}^{2}+||{\hat{b}}^{\circ }(a)-b||_{Q^{\circ }_{{\hat{b}}(a){\hat{b}}(a)}}^{2} \end{array} \end{aligned}$$
(27)

with

$$\begin{aligned} \begin{array}{lcl} Q^{\circ }_{{\hat{a}}{\hat{a}}} &{}=&{} Q^{\circ }_{{\hat{a}}(b){\hat{a}}(b)}+Q_{{\hat{a}}{\hat{b}}}Q_{{\hat{b}}{\hat{b}}}^{-1}Q_{{\hat{b}}{\hat{a}}}\\ {\hat{b}}^{\circ }(a) &{}=&{} {\hat{b}}-Q_{{\hat{b}}{\hat{a}}}Q^{\circ -1}_{{\hat{a}}{\hat{a}}}({\hat{a}}-a)\\ Q^{\circ }_{{\hat{b}}(a){\hat{b}}(a)}&{}=&{} Q_{{\hat{b}}{\hat{b}}}-Q_{{\hat{b}}{\hat{a}}}Q^{\circ -1}_{{\hat{a}}{\hat{a}}}Q_{{\hat{a}}{\hat{b}}} \end{array} \end{aligned}$$
(28)

Proof

see Appendix. \(\square \)

This result shows that replacing the variance matrix \(Q_{{\hat{a}}(b){\hat{a}}(b)}\) in \({\mathcal {D}}(b)\) (cf. 19) by \(Q^{\circ }_{{\hat{a}}(b){\hat{a}}(b)}\) provides an objective function of the type (6). It therefore again establishes a primal-dual equivalence, but now one that is driven by the approximate dual function \({\mathcal {D}}^{\circ }(b)\). Note that the single replacement \(Q_{{\hat{a}}(b){\hat{a}}(b)} \rightarrow Q^{\circ }_{{\hat{a}}(b){\hat{a}}(b)}\) resulted in three changes of the primal formulation: \(Q_{{\hat{a}}{\hat{a}}} \rightarrow Q^{\circ }_{{\hat{a}}{\hat{a}}}\), \({\hat{b}}(a) \rightarrow {\hat{b}}^{\circ }(a)\), and \(Q_{{\hat{b}}(a){\hat{b}}(a)} \rightarrow Q^{\circ }_{{\hat{b}}(a){\hat{b}}(a)}\). These changes will therefore also drive the properties of the corresponding baseline- and integer ambiguity estimators. Using the quadratic identity (27), the following equivalence for the minimizer of \({\mathcal {D}}^{\circ }(b)\) can be established.

Theorem 3

Let the approximate dual be given as

$$\begin{aligned} {\mathcal {D}}^{\circ }(b)=||{\hat{b}}-b||_{Q_{{\hat{b}}{\hat{b}}}}^{2}+||{\hat{a}}(b)-{\check{a}}^{\circ }(b)||_{Q^{\circ }_{{\hat{a}}(b){\hat{a}}(b)}}^{2} \end{aligned}$$
(29)

with \({\check{a}}^{\circ }(b)=\arg \min \limits _{a \in {\mathbb {Z}}^{n}}||{\hat{a}}(b)-a||_{Q^{\circ }_{{\hat{a}}(b){\hat{a}}(b)}}^{2}\). Then the corresponding primal is \({\mathcal {P}}^{\circ }(a) = ||{\hat{a}}-a||_{Q^{\circ }_{{\hat{a}}{\hat{a}}}}^{2}\) and the minimizer \({\check{b}}^{\circ }\) of \({\mathcal {D}}^{\circ }(b)\), with corresponding integer ambiguity solution \({\check{a}}^{\circ }\), satisfies the primal-dual equivalence,

$$\begin{aligned} \left. \begin{array}{lcl} {\check{a}}^{\circ }&{}=&{}\arg \min \limits _{a \in {\mathbb {Z}}^{n}}{\mathcal {P}}^{\circ }(a)\\ {\check{b}}^{\circ }&{}=&{}{\hat{b}}^{\circ }({\check{a}}^{\circ }) \end{array} \right\} \Leftrightarrow \left\{ \begin{array}{lcl} {\check{b}}^{\circ }&{}=&{}\arg \min \limits _{b \in {\mathbb {R}}^{p}}{\mathcal {D}}^{\circ }(b)\\ {\check{a}}^{\circ }&{}=&{}{\check{a}}^{\circ }({\check{b}}^{\circ }) \end{array} \right. \end{aligned}$$
(30)

Proof

see Appendix. \(\square \)

This equivalence can now be used to apply available theory for the primal formulation to determine the distributional properties of the estimators \({\check{a}}^{\circ }\) and \({\check{b}}^{\circ }\). It should hereby be noted, however, that although \({\check{a}}^{\circ }\), like \({\check{a}}\), is still computed as the solution of an ILS-problem having \({\hat{a}}\) as its input, the weight matrix used is now different, \(Q^{\circ -1}_{{\hat{a}}{\hat{a}}}\) instead of \(Q^{-1}_{{\hat{a}}{\hat{a}}}\). Also note, although \({\hat{b}}^{\circ }(a)\) has the same structure as \({\hat{b}}(a)\), that \({\hat{a}}\) is now not independent of \({\hat{b}}^{\circ }(a)\). The matrix \(Q^{\circ }_{{\hat{b}}(a){\hat{b}}(a)}\) of (28) is therefore not the variance matrix of \({\hat{b}}^{\circ }(a)\). We have the following distributional result.

Theorem 4

(Distributions of \({\check{a}}^{\circ }\) and \({\check{b}}^{\circ }\)) With the PDF of \({\hat{a}} \sim {\mathcal {N}}_{n}(a, Q_{{\hat{a}}{\hat{a}}})\) denoted as \(f_{{\hat{a}}}(\alpha )\), the PMF of the ambiguity estimator \({\check{a}}^{\circ }\) is given as

$$\begin{aligned} \begin{array}{lcl} {\check{a}}^{\circ } &{}\sim &{} {\textsf{P}}[{\check{a}}^{\circ }=z]=\int _{{\mathscr {P}}^{\circ }_{z}}f_{{\hat{a}}}(\alpha )d\alpha ,\;\textrm{with}\\ &{} &{} {\textsf{P}}[{\check{a}}^{\circ }=a] \le {\textsf{P}}[{\check{a}}=a] \end{array} \end{aligned}$$
(31)

and the limiting PDF of the baseline estimator \({\check{b}}^{\circ }\) is given as

$$\begin{aligned} \begin{array}{lcl} \lim \limits _{{\textsf{P}}[{\check{a}}^{\circ }=a] \uparrow 1} {\check{b}}^{\circ } &{}\sim &{} {\mathcal {N}}_{p}(b, Q_{{\hat{b}}^{\circ }(a){\hat{b}}^{\circ }(a)}),\;\textrm{with}\\ Q_{{\hat{b}}^{\circ }(a){\hat{b}}^{\circ }(a)}&{}=&{}Q_{{\hat{b}}(a){\hat{b}}(a)}+T_{{\hat{b}}{\hat{a}}}Q_{{\hat{a}}{\hat{a}}}T_{{\hat{b}}{\hat{a}}}^{T} \ge Q_{{\hat{b}}(a){\hat{b}}(a)} \end{array} \end{aligned}$$
(32)

where \({\mathscr {P}}^{\circ }_{z}=\left\{ x \in {\mathbb {R}}^{n}|\; ||x-z||_{Q^{\circ }_{{\hat{a}}{\hat{a}}}}^{2} \le ||x-u||_{Q^{\circ }_{{\hat{a}}{\hat{a}}}}^{2}, \forall u \in {\mathbb {Z}}^{n}\right\} \) and \(T_{{\hat{b}}{\hat{a}}}=Q_{{\hat{b}}{\hat{a}}}\left[ Q_{{\hat{a}}{\hat{a}}}^{-1}-Q^{\circ -1}_{{\hat{a}}{\hat{a}}}\right] \). \(\blacksquare \)

Proof

See Appendix. \(\square \)

The above result shows that replacing \(Q_{{\hat{a}}(b){\hat{a}}(b)}\) by \(Q^{\circ }_{{\hat{a}}(b){\hat{a}}(b)}\) will always degrade the performance of the associated estimators. It will give a smaller ambiguity success-rate (cf. 31), as well as a poorer precision of the ambiguity-fixed baseline (cf. 32). Still, depending on the choice made for \(Q^{\circ }_{{\hat{a}}(b){\hat{a}}(b)}\) and on the requirements of the application, the degradation could be acceptably small.

To evaluate the success-rate \({\textsf{P}}[{\check{a}}^{\circ }=a]\), the multivariate integral of (31) needs to be computed. This is a nontrivial numerical task due to the geometric complexity of the pull-in region \({\mathscr {P}}^{\circ }_{a}\), over which the integration needs to be carried out. One approach is to rely on simulation, whereby the Ps-LAMBDA simulation tools of (Verhagen et al. 2013) can be used. Note hereby that the success-rate \({\textsf{P}}[{\check{a}}^{\circ }=a]\) is driven by both \(Q_{{\hat{a}}{\hat{a}}}\) and \(Q^{\circ }_{{\hat{a}}{\hat{a}}}\), i.e. by the ambiguity variance matrix that determines \(f_{{\hat{a}}}(\alpha )\) and by its approximation that determines the pull-in region \({\mathscr {P}}^{\circ }_{a}\).
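A hedged Monte Carlo sketch of this simulation route follows, reusing the brute-force ils helper from the sketch in Sect. 2 (both variance matrices assumed given): the samples are drawn with the true \(Q_{{\hat{a}}{\hat{a}}}\), while the minimization uses the approximation \(Q^{\circ }_{{\hat{a}}{\hat{a}}}\).

```python
import numpy as np

def approx_success_rate(Qaa, Qaa_circ, trials=20000, seed=2):
    """Monte Carlo estimate of P[a_circ = a], cf. (31): a_hat is drawn
    from N(0, Qaa), but the ILS weight matrix is inv(Qaa_circ), so the
    counting is over the pull-in region of the *approximate* metric.
    Uses the brute-force ils(x, W) helper from the Sect. 2 sketch."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(Qaa)   # sampling with the true variance matrix
    W = np.linalg.inv(Qaa_circ)   # improperly weighted search metric
    hits = sum(np.all(ils(L @ rng.standard_normal(Qaa.shape[0]), W) == 0)
               for _ in range(trials))
    return hits / trials
```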

The following is an example in which the two success-rates \({\textsf{P}}[{\check{a}}=a]\) and \({\textsf{P}}[{\check{a}}^{\circ }=a]\) are compared.

Example 2

Consider the GNSS model of Example 1 (cf. 24) and assume that in solving the dual problem we approximate the fully populated variance matrix \(Q_{{\hat{a}}(b){\hat{a}}(b)}=\frac{\sigma _{\phi }^{2}}{\lambda ^{2}}D^{T}D\) with the diagonal matrix \(Q^{\circ }_{{\hat{a}}(b){\hat{a}}(b)}=2\frac{\sigma _{\phi }^{2}}{\lambda ^{2}}I_{m}\). Then, with \(Q_{{\hat{a}}{\hat{b}}}Q_{{\hat{b}}{\hat{b}}}^{-1}Q_{{\hat{b}}{\hat{a}}}= \tfrac{\sigma _{p}^{2}}{\lambda ^{2}}D^{T}G\left[ G^{T}P_{D}G\right] ^{-1}G^{T}D\) and \(P_{D}=D(D^{T}D)^{-1}D^{T}\), we get

$$\begin{aligned} Q^{\circ }_{{\hat{a}}{\hat{a}}}= \tfrac{1}{\lambda ^{2}}\left( 2\sigma _{\phi }^{2}I_{m}+\sigma _{p}^{2}D^{T}G\left[ G^{T}P_{D}G\right] ^{-1}G^{T}D\right) \end{aligned}$$
(33)

With reference to Theorem 4 (cf. 31), Fig. 2 compares the two success-rates \({\textsf{P}}[{\check{a}}=a]\) and \({\textsf{P}}[{\check{a}}^{\circ }=a]\), based on (33), for a case of single-epoch, single-frequency L1 GPS, using a \(10^{-4}\) phase-code variance ratio. It shows indeed that \({\textsf{P}}[{\check{a}}^{\circ }=a] \le {\textsf{P}}[{\check{a}}=a]\), but also that \({\textsf{P}}[{\check{a}}^{\circ }=a]\) can still be acceptably large for some measurement scenarios. \(\square \)

Fig. 2

Comparison of \({\textsf{P}}[{\check{a}}^{\circ }=a]\) with \({\textsf{P}}[{\check{a}}=a]\) as described in Example 2, based on the GNSS model of Example 1. The plot on the right side shows a zoom-in of the left one

Instead of simulation, success-rate bounds may sometimes be used as an alternative. Upper-bounds are then useful to identify when successful ambiguity resolution would be problematic, while lower-bounds are useful to identify when to expect successful ambiguity resolution. As upper-bound of \({\textsf{P}}[{\check{a}}^{\circ }=a]\), one may directly use the ILS success-rate \({\textsf{P}}[{\check{a}}=a]\) (cf. 31), or alternatively, any of its simpler-to-compute upper-bounds given in (Teunissen 2000a; Verhagen et al. 2013). The following theorem provides two lower-bounds on the ambiguity success-rate of \({\check{a}}^{\circ }\).

Theorem 5

(Success-rate lower-bounds) Let \({\hat{a}} \sim {\mathcal {N}}_{n}(a,Q_{{\hat{a}}{\hat{a}}})\) and \({\check{a}}^{\circ } = \arg \min \limits _{a \in {\mathbb {Z}}^{n}}||{\hat{a}}-a||_{Q^{\circ }_{{\hat{a}}{\hat{a}}}}^{2}\). Then the success-rate of \({\check{a}}^{\circ }\) can be lower-bounded as follows:

  (i)

    If \(Q^{\circ }_{{\hat{a}}{\hat{a}}}\ge Q_{{\hat{a}}{\hat{a}}}\), then

    $$\begin{aligned} \prod _{i=1}^{n}[2\varPhi \left( \tfrac{1}{2 \sqrt{d_{i}}}\right) -1] \le {\textsf{P}}[{\check{a}}^{\circ }=a] \end{aligned}$$
    (34)

    where \(D=\textrm{diag}(d_{1}, \ldots , d_{n})\) is the diagonal matrix of the triangular decomposition \(Q^{\circ }_{{\hat{a}}{\hat{a}}}=LDL^{T}\) and \(\varPhi (x)=\int _{-\infty }^{x}\tfrac{1}{\sqrt{2 \pi }}\exp ( -\tfrac{1}{2}v^{2}) dv\).

  (ii)

    For any \(Q^{\circ }_{{\hat{a}}{\hat{a}}}>0\),

    $$\begin{aligned} {\textsf{P}}[\chi ^{2}(n,0) \le r^{2}] \le {\textsf{P}}[{\check{a}}^{\circ }=a] \end{aligned}$$
    (35)

    with

    $$\begin{aligned} \left\{ \begin{array}{lcl} r^{2} &{}=&{} \lambda _{\textrm{min}} \times \tfrac{1}{4} \min \limits _{z \in {\mathbb {Z}}^{n}/\{0\}}||z||_{Q^{\circ }_{{\hat{a}}{\hat{a}}}}^{2}\\ \lambda _{\textrm{min}} &{}=&{} \min \limits _{x \in {\mathbb {R}}^{n}} \frac{x^{T}Q^{\circ }_{{\hat{a}}{\hat{a}}}x}{x^{T}Q_{{\hat{a}}{\hat{a}}}x} \end{array} \right. \end{aligned}$$
    (36)

Proof

see Appendix \(\square \)

Note that lower-bound (34) is somewhat easier to compute than (35). It requires however that \(Q^{\circ }_{{\hat{a}}{\hat{a}}}\ge Q_{{\hat{a}}{\hat{a}}}\), while no such restriction is placed on the lower-bound (35). Also note, although both lower-bounds are here presented in the context of the primal-dual formulations, that they are in fact success-rate lower-bounds of improperly weighted ILS-estimators, i.e. ILS-estimators that do not use the inverse variance matrix as their weight matrix. They can therefore also be used more generally for studying the impact that misspecifications in the stochastic model have on the success-rate.
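Lower-bound (34) is straightforward to compute from the triangular factorization. A minimal sketch, assuming SciPy is available and that \(Q^{\circ }_{{\hat{a}}{\hat{a}}}\) is symmetric positive-definite (and satisfies \(Q^{\circ }_{{\hat{a}}{\hat{a}}}\ge Q_{{\hat{a}}{\hat{a}}}\), as the theorem requires):

```python
import numpy as np
from scipy.linalg import ldl
from scipy.stats import norm

def sr_lower_bound_ldl(Qaa_circ):
    """Bound (34): prod_i [2*Phi(1/(2*sqrt(d_i))) - 1], with d_i the
    diagonal entries of D in the factorization Qaa_circ = L D L^T."""
    _, D, _ = ldl(Qaa_circ, lower=True)
    d = np.diag(D)   # diagonal for a positive-definite input
    return float(np.prod(2.0 * norm.cdf(0.5 / np.sqrt(d)) - 1.0))
```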

5 Dual with approximate integer map

In this section we consider the second approximation of the dual function, \({\mathcal {D}}^{\bullet }(b)\) (cf. 26), and determine the statistical properties of its minimizer.

5.1 The minimizer of \({\mathcal {D}}^{\bullet }(b)\)

The approximation \({\mathcal {D}}^{\bullet }(b)\) of \({\mathcal {D}}(b)\) is a result of replacing the integer vector \({\check{a}}(b)= \arg \min \limits _{z \in {\mathbb {Z}}^{n}}||{\hat{a}}(b)-z||_{Q_{{\hat{a}}(b){\hat{a}}(b)}}^{2}\) in \({\mathcal {D}}(b)\) by the integer vector \({\mathcal {I}}^{\bullet }({\hat{a}}(b))\), for which \({\mathcal {I}}^{\bullet }: {\mathbb {R}}^{n} \mapsto {\mathbb {Z}}^{n}\) may be chosen as any member from the class of admissible integer estimators, such as, for instance, integer rounding (IR), integer bootstrapping (IB), integer least-squares (ILS) or vectorial integer bootstrapping (VIB) (Teunissen et al. 2021). The following theorem provides the solution of minimizing \({\mathcal {D}}^{\bullet }(b)\).

Theorem 6

(Minimizer of \({\mathcal {D}}^{\bullet }(b)\)) Let \({\mathcal {I}}^{\bullet }: {\mathbb {R}}^{n} \mapsto {\mathbb {Z}}^{n}\) be any admissible integer estimator. Then the minimizer \({\check{b}}^{\bullet }= \arg \min \limits _{b \in {\mathbb {R}}^{p}}{\mathcal {D}}^{\bullet }(b)\) of the approximate dual

$$\begin{aligned} {\mathcal {D}}^{\bullet }(b)=||{\hat{b}}-b||_{Q_{{\hat{b}}{\hat{b}}}}^{2}+ ||{\hat{a}}(b)-{\mathcal {I}}^{\bullet }({\hat{a}}(b))||_{Q_{{\hat{a}}(b){\hat{a}}(b)}}^{2} \end{aligned}$$
(37)

is given as

$$\begin{aligned} {\check{b}}^{\bullet }= {\hat{b}}({\check{a}}^{\bullet }), \; \textrm{with}\;\; {\check{a}}^{\bullet }= \arg \min \limits _{z \in \varOmega ^{\bullet }_{{\hat{a}}}}||{\hat{a}}-z||_{Q_{{\hat{a}}{\hat{a}}}}^{2} \end{aligned}$$
(38)

where

$$\begin{aligned} \varOmega ^{\bullet }_{{\hat{a}}} = \{z \in {\mathbb {Z}}^{n} |\; z ={\mathcal {I}}^{\bullet }({\hat{a}}+M\beta ), \forall \beta \in {\mathbb {R}}^{p} \} \end{aligned}$$
(39)

with \(M=Q_{{\hat{a}}{\hat{b}}}Q_{{\hat{b}}{\hat{b}}}^{-1} \in {\mathbb {R}}^{n \times p}\).

Proof

see Appendix. \(\square \)

Fig. 3

(Top) The integer set \(\varOmega ^{\bullet }_{{\hat{a}}}=\{ z \in {\mathbb {Z}}^{n} |\; z={\mathcal {I}}^{\bullet }({\hat{a}}+M\beta ), \forall \beta \in {\mathbb {R}}^{p}\}\) for \(n=2, p=1\), and \({\mathcal {I}}^{\bullet }(.)=\lceil . \rfloor \); (Centre) The real-valued set \({\bar{\varOmega }}^{\bullet }_{z} = \{ x \in {\mathbb {R}}^{n}|\; z={\mathcal {I}}^{\bullet }(x+M\beta ), \exists \beta \in {\mathbb {R}}^{p}\}\) for \(n=2, p=1\); (Bottom) The integer set \(\varPhi ^{\bullet }_{z}= \{ u \in {\mathbb {Z}}^{n}|\; u={\mathcal {I}}^{\bullet }(x+M\beta ), x \in {\bar{\varOmega }}^{\bullet }_{z}, \forall \beta \in {\mathbb {R}}^{p}\}\) for \(n=2, p=1\)

Note, in contrast to the baseline estimator \({\check{b}}^{\circ }\) (cf. 30), that the baseline estimator \({\check{b}}^{\bullet }\) (cf. 38) is based, like the estimator \({\check{b}}\) (cf. 4), on the conditional LS baseline mapping \({\hat{b}}(a)\). However, \({\check{b}}\) and \({\check{b}}^{\bullet }\) make use of different integer ambiguity estimators in general. Although the integer estimator \({\check{a}}^{\bullet }\) (cf. 38) has the appearance of a standard ILS-estimator, it is generally not, unless of course \({\mathcal {I}}^{\bullet }(x)\) is chosen as \({\mathcal {I}}^{\bullet }(x)=\arg \min \limits _{z \in {\mathbb {Z}}^{n}}||x-z||_{Q_{{\hat{a}}(b){\hat{a}}(b)}}^{2}\), in which case \({\check{a}}^{\bullet } = {\check{a}}\), since then \({\mathcal {D}}^{\bullet }(b)={\mathcal {D}}(b)\).

The difference between the two integer estimators \({\check{a}}\) and \({\check{a}}^{\bullet }\) is driven by the characteristics of the integer set \(\varOmega _{{\hat{a}}}^{\bullet }\) (cf. 39), which in turn is driven by the \(n \times p\) matrix M in

$$\begin{aligned} {\hat{a}}(\beta )={\hat{a}}+M(\beta -{\hat{b}}) \end{aligned}$$
(40)

The following three cases can be discriminated:

(1) \({\hat{a}}(\beta )\equiv {\hat{a}}\) if \(M=0\), i.e. \(Q_{{\hat{a}}{\hat{b}}}=0\) or \(A^{T}Q_{yy}^{-1}B=0\);

(2) \({\hat{a}}(\beta )\) describes a linear manifold if \(\textrm{rank}(M) <n\);

(3) \({\hat{a}}(\beta )\) covers the whole of \({\mathbb {R}}^{n}\) if \(\textrm{rank}(M)=n\).

Case (1) happens if the estimators \({\hat{a}}\) and \({\hat{b}}\) are uncorrelated. As the integer set (39) then reduces to the single integer vector \(\varOmega ^{\bullet }_{{\hat{a}}}=\{ z={\mathcal {I}}^{\bullet }({\hat{a}})\}\), we have in that case \({\check{a}}^{\bullet }={\mathcal {I}}^{\bullet }({\hat{a}})\). In case (2), the subset \(\varOmega ^{\bullet }_{{\hat{a}}}\) contains all integer vectors to which \({\mathcal {I}}^{\bullet }(x)\) is mapped when x varies along the \(\textrm{rank}(M)\)-dimensional linear manifold (40). This is shown in Fig. 3(Top) when \({\mathcal {I}}^{\bullet }(x)\) represents integer rounding and \(n=2, p=1\), in which case the pull-in regions of \({\mathcal {I}}^{\bullet }(x)\) are unit-squares centred at integer grid points. As \(\varOmega ^{\bullet }_{{\hat{a}}} \ne {\mathbb {Z}}^{n}\), we have \({\check{a}}^{\bullet } \ne {\check{a}}\) in case (2). In case (3), the full row rank of matrix M implies that the whole of \({\mathbb {R}}^{n}\) is integer-mapped by \({\mathcal {I}}^{\bullet }(x)\), thus giving \(\varOmega ^{\bullet }_{{\hat{a}}}={\mathbb {Z}}^{n}\). This shows that \({\check{a}}^{\bullet }={\check{a}}\) in case (3).

Note, as \(\textrm{rank}(M)\le \min (n,p)\), that case (3) can only happen if \(p \ge n\). Hence, since \(p<n\) in most GNSS models, the equality of the two estimators, \({\check{a}}^{\bullet }\) and \({\check{a}}\), is very unlikely in case of GNSS. Although the two estimators \({\check{a}}^{\bullet }\) and \({\check{a}}\)  are then generally different, their integer sample outcomes can, of course, sometimes be the same. This happens when the outcome of \({\check{a}}\) lies in \(\varOmega ^{\bullet }_{{\hat{a}}}\).

5.2 A qualitative comparison of \({\check{a}}^{\bullet }\) and \({\check{a}}\)

We now compare the two integer estimators for case (2), i.e. when \({\hat{a}}(\beta )\) describes a linear manifold of dimension \(\textrm{rank}(M)<n\) and \({\check{a}}^{\bullet }\ne {\check{a}}\).

To aid the comparison between \({\check{a}}^{\bullet }\) and \({\check{a}}\), we first introduce the ambiguity search space

$$\begin{aligned} E_{{\hat{a}}} =\left\{ z \in {\mathbb {Z}}^{n}|\; ||{\hat{a}}-z||_{Q_{{\hat{a}}{\hat{a}}}}^{2} \le \chi ^{2}\right\} \end{aligned}$$
(41)

where \(\chi ^{2}\) is assumed chosen such that \({\check{a}} \in E_{{\hat{a}}}\) (note: for any integer \(z_{0} \in {\mathbb {Z}}^{n}\), e.g. \(z_{0}=\lceil {\hat{a}} \rfloor \), the value \(\chi ^{2}=||{\hat{a}}-z_{0}||_{Q_{{\hat{a}}{\hat{a}}}}^{2}\) satisfies this assumption). With the help of \(E_{{\hat{a}}}\) we may write \({\check{a}}=\arg \min \limits _{u \in {\mathbb {Z}}^{n}} ||{\hat{a}}-u||_{Q_{{\hat{a}}{\hat{a}}}}^{2}\) in a similar form as that of \({\check{a}}^{\bullet }\). We therefore have

$$\begin{aligned} \begin{array}{lcl} {\check{a}}&{}=&{}\arg \min \limits _{u \in E_{{\hat{a}}}} ||{\hat{a}}-u||_{Q_{{\hat{a}}{\hat{a}}}}^{2}\\ {\check{a}}^{\bullet }&{}=&{}\arg \min \limits _{u \in \varOmega ^{\bullet }_{{\hat{a}}}} ||{\hat{a}}-u||_{Q_{{\hat{a}}{\hat{a}}}}^{2} \end{array} \end{aligned}$$
(42)

which shows that the two estimators can be compared by comparing their respective search spaces, \(E_{{\hat{a}}}\) vs \(\varOmega ^{\bullet }_{{\hat{a}}}\). For \(\varOmega ^{\bullet }_{{\hat{a}}}\) we have

$$\begin{aligned} \varOmega ^{\bullet }_{{\hat{a}}} =\left\{ z \in {\mathbb {Z}}^{n}|\; z={\mathcal {I}}^{\bullet }({\hat{a}}+M\beta ), \forall \beta \in {\mathbb {R}}^{p}\right\} \end{aligned}$$
(43)

For \(E_{{\hat{a}}}\) we may write, with the help of \(M=Q_{{\hat{a}}{\hat{b}}}Q_{{\hat{b}}{\hat{b}}}^{-1}\) and \(Q_{{\hat{a}}{\hat{a}}}=Q_{{\hat{a}}(b){\hat{a}}(b)}+MQ_{{\hat{b}}{\hat{b}}}M^{T}\),

$$\begin{aligned} E_{{\hat{a}}} =\left\{ z \in {\mathbb {Z}}^{n}|\; ({\hat{a}}-z)^{T}\left[ Q_{{\hat{a}}(b){\hat{a}}(b)}+MQ_{{\hat{b}}{\hat{b}}}M^{T}\right] ^{-1}({\hat{a}}-z) \le \chi ^{2}\right\} \end{aligned}$$
(44)

We can now compare the two integer sets (43) and (44). We will do so, for two extreme cases.

$$\begin{aligned} \textrm{Case}\;(a): Q_{{\hat{a}}(b){\hat{a}}(b)}=\textrm{small}\;\;\textrm{and}\;\;Q_{{\hat{b}}{\hat{b}}}=\textrm{large} \end{aligned}$$
(45)

This is the typical ‘GNSS case’, in particular for instantaneous positioning. As the very precise carrier phase data do not contribute to the determination of b in case of a single epoch, the precision of \({\hat{b}}\) is solely driven by the noisy pseudorange data and \(Q_{{\hat{b}}{\hat{b}}}=\textrm{large}\). Were b known, it would be the very precise carrier phase data that predominantly determine the ambiguities, giving \(Q_{{\hat{a}}(b){\hat{a}}(b)}=\textrm{small}\).

When (45) is true, the ellipsoidal search space \(E_{{\hat{a}}}\) (cf. 44) will have an extreme elongation in the directions of the range space of M and therefore closely resemble the integer set \(\varOmega ^{\bullet }_{{\hat{a}}}\), which, after all, is constructed from integer mapping the points of the linear manifold \({\hat{a}}+M\beta \). Under case (a) one can therefore expect the two estimators to be not too different, i.e. sample values of the ILS-estimator \({\check{a}} \in E_{{\hat{a}}}\) will quite often lie inside \(\varOmega ^{\bullet }_{{\hat{a}}}\) as well. We hereby note that the GNSS-typical extreme elongation of \(E_{{\hat{a}}}\) results in integer search-halting when solving for \({\check{a}}\). Resolving this bottleneck was the motivation for developing LAMBDA. By means of its decorrelating Z-transformation, the discontinuity in the spectrum of sequential conditional ambiguity variances is largely removed and search-halting avoided, see (Teunissen 1995).

$$\begin{aligned} \textrm{Case}\;(b): Q_{{\hat{a}}(b){\hat{a}}(b)}=\textrm{large}\;\;\textrm{and}\;\;Q_{{\hat{b}}{\hat{b}}}=\textrm{small} \end{aligned}$$
(46)

Now we have a different situation, which in GNSS-terminology could be described as having ultra-precise pseudorange data and very poor carrier-phase data. In such a case the shape of the ellipsoidal search space \(E_{{\hat{a}}}\) is primarily driven by \(Q_{{\hat{a}}(b){\hat{a}}(b)}\), implying that its shape will now generally not be aligned with \(\varOmega ^{\bullet }_{{\hat{a}}}\). This will be even more so if the ellipsoidal search space has its elongation orthogonal to the range space of M. In this case one would expect the two estimators, \({\check{a}}^{\bullet }\) and \({\check{a}}\), to have different performances, i.e. sample values of the ILS-estimator \({\check{a}} \in E_{{\hat{a}}}\) are then not likely to reside inside \(\varOmega ^{\bullet }_{{\hat{a}}}\) as well.

Fig. 4

For \(p=1\), \(n=2\), the pull-in regions \({\mathscr {P}}^{\bullet }_{z}\) of the integer ambiguity estimator \({\check{a}}^{\bullet }\) are shown, i.e. the regions where float ambiguity values are mapped to the same integer vector \(z \in {\mathbb {Z}}^{n}\). In green colour, the \({\mathscr {P}}^{\bullet }_{0}\) region is depicted for \(z=0\), surrounded by similar regions (in grey) for \(z \ne 0\). The top plot shows a comparison with ILS pull-in regions, with the integer set \(\varOmega _{0}^{\bullet }\) depicted through encircled gridpoints; the bottom plot shows a zoom-in of \({\mathscr {P}}^{\bullet }_{0}\) of \({\check{a}}^{\bullet }\) together with the (lower) bounding ellipse \(E_{0}\) (cf. proof of Theorem 7)

5.3 The pull-in regions of \({\check{a}}^{\bullet }\)

In order to study the statistical properties of \({\check{a}}^{\bullet }\) and \({\check{b}}^{\bullet }\), it is useful to first determine the pull-in regions of \({\check{a}}^{\bullet }\). As the pull-in region \({\mathscr {P}}^{\bullet }_{z}\) of \({\check{a}}^{\bullet }\) is the region in which the float solution \({\hat{a}}\) gets mapped to \(z \in {\mathbb {Z}}^{n}\), we have

$$\begin{aligned} {\mathscr {P}}^{\bullet }_{z} =\left\{ x \in {\mathbb {R}}^{n}|\; z= \arg \min \limits _{u \in \varOmega ^{\bullet }_{x}}||x-u||_{Q_{{\hat{a}}{\hat{a}}}}^{2}\right\} \end{aligned}$$
(47)

To further characterize this region, we recognize that the choice of \(z \in {\mathbb {Z}}^{n}\), i.e. the integer-vector for which the pull-in region is described, already constrains the values of \(x \in {\mathbb {R}}^{n}\) to a subset. As z has to lie in \(\varOmega ^{\bullet }_{x}\), the choice of z implies the following subset for the values of x,

$$\begin{aligned} {\bar{\varOmega }}^{\bullet }_{z} =\left\{ x \in {\mathbb {R}}^{n}|\; z={\mathcal {I}}^{\bullet }(x+M\beta ), \exists \beta \in {\mathbb {R}}^{p}\right\} \end{aligned}$$
(48)

This is the set of x-values for which a \(\beta \in {\mathbb {R}}^{p}\) exists, such that \(x+M\beta \) gets mapped by \({\mathcal {I}}^{\bullet }(.)\) to z, see Fig. 3(Middle).

With the help of the region \({\bar{\varOmega }}^{\bullet }_{z}\) we can now characterize the whole integer set that is in play in the minimization of (47). As \({\mathscr {P}}^{\bullet }_{z}\) is characterized by the minimization of \(||x-u||_{Q_{{\hat{a}}{\hat{a}}}}^{2}\) over the integer subset \(\varOmega ^{\bullet }_{x}\), while, at the same time, this is constrained to all \(x \in {\bar{\varOmega }}^{\bullet }_{z}\), the integer set considered is actually

$$\begin{aligned} \varPhi ^{\bullet }_{z}= \left\{ u \in {\mathbb {Z}}^{n}|\; u={\mathcal {I}}^{\bullet }(x), \forall x \in {\bar{\varOmega }}^{\bullet }_{z}\right\} \end{aligned}$$
(49)

It consists of all integer vectors to which the elements of \({\bar{\varOmega }}^{\bullet }_{z}\) are mapped by \({\mathcal {I}}^{\bullet }(.)\), see Fig. 3(Bottom).

With the above three constructed sets, \(\varOmega ^{\bullet }_{x} \subset {\mathbb {Z}}^{n}\), \({\bar{\varOmega }}^{\bullet }_{z} \subset {\mathbb {R}}^{n}\), and \(\varPhi ^{\bullet }_{z} \subset {\mathbb {Z}}^{n}\), we have the following three representations of the pull-in regions of \({\check{a}}^{\bullet }\).

Lemma 5

(Representations of pull-in region \({\mathscr {P}}^{\bullet }_{z}\)) The pull-in regions \({\mathscr {P}}^{\bullet }_{z}\), \(z \in {\mathbb {Z}}^{n}\), of \({\check{a}}^{\bullet }\) can be represented, with

$$\begin{aligned} \begin{array}{lcl} \varOmega ^{\bullet }_{x} &{}=&{}\left\{ z \in {\mathbb {Z}}^{n} |\; z ={\mathcal {I}}^{\bullet }(x+M\beta ), \forall \beta \in {\mathbb {R}}^{p} \right\} \\ {\bar{\varOmega }}^{\bullet }_{z} &{}=&{}\left\{ x \in {\mathbb {R}}^{n}|\; z={\mathcal {I}}^{\bullet }(x+M\beta ), \exists \beta \in {\mathbb {R}}^{p}\right\} \\ \varPhi ^{\bullet }_{z} &{}=&{}\left\{ u \in {\mathbb {Z}}^{n}|\; u={\mathcal {I}}^{\bullet }(x), \forall x \in {\bar{\varOmega }}^{\bullet }_{z}\right\} \end{array} \end{aligned}$$

as

$$\begin{aligned} \begin{array}{lcl} {\mathscr {P}}^{\bullet }_{z} &{}=&{}\left\{ x \in {\mathbb {R}}^{n}|\; z= \arg \min \limits _{u \in \varOmega ^{\bullet }_{x}}||x-u||_{Q_{{\hat{a}}{\hat{a}}}}^{2}\right\} \\ &{}=&{} \left\{ x \!\in \! {\mathbb {R}}^{n}|\;||x-z||_{Q_{{\hat{a}}{\hat{a}}}}^{2}\!\le \! ||x-{\mathcal {I}}^{\bullet }(y)||_{Q_{{\hat{a}}{\hat{a}}}}^{2},\! \forall y \!\in \! {\bar{\varOmega }}^{\bullet }_{z}\right\} \\ &{}=&{}\left\{ x \in {\mathbb {R}}^{n}|\; ||x-z||_{Q_{{\hat{a}}{\hat{a}}}}^{2}\le ||x-u||_{Q_{{\hat{a}}{\hat{a}}}}^{2}, \forall u \in \varPhi ^{\bullet }_{z}\right\} \end{array}\nonumber \\ \end{aligned}$$
(50)

\(\blacksquare \)

We can now use these pull-in representations to show that \({\check{a}}^{\bullet }\) is an admissible integer estimator. Recall that an integer estimator is said to be admissible if its pull-in regions are translationally invariant and cover the whole space \({\mathbb {R}}^{n}\) without gaps and overlaps (Teunissen 2002).

Lemma 6

(Admissible integer estimator \({\check{a}}^{\bullet }\)) The integer estimator \({\check{a}}^{\bullet }\) is admissible as its pull-in regions satisfy

$$\begin{aligned} \begin{array}{ll} (i) &{} \underset{z \in {\mathbb {Z}}^{n}}{\cup } {\mathscr {P}}^{\bullet }_{z} = {\mathbb {R}}^{n}\\ (ii)&{} {\mathscr {P}}^{\bullet }_{z_{1}}\cap {\mathscr {P}}^{\bullet }_{z_{2}} = \emptyset ,\;\forall z_{1}\ne z_{2}\\ (iii)&{} {\mathscr {P}}^{\bullet }_{z}={\mathscr {P}}^{\bullet }_{0}+z,\;\forall z \in {\mathbb {Z}}^{n} \end{array} \end{aligned}$$
(51)

Proof

see Appendix. \(\square \)

The admissibility property implies that if y is perturbed by Az to give \(y'=y+Az\), the ambiguity float solution changes from \({\hat{a}}={\bar{A}}^{+}y\) to \({\hat{a}}'={\bar{A}}^{+}(y+Az)={\hat{a}}+z\), and the integer ambiguity solution from \({\check{a}}^{\bullet }\) to \({\check{a}}'^{\bullet }={\check{a}}^{\bullet }+z\). Hence, this provides the pleasant property that, if one wants to work with manageable numbers, one can subtract arbitrary integers from the ambiguity float solution and still get the correct integer solution by restoring the subtracted integer at the end, i.e. if \({\hat{a}}'={\hat{a}}-z\) then \({\check{a}}^{\bullet }={\check{a}}'^{\bullet }+z\).

Figure 4 shows, for \(n=2\) and \(p=1\), an example of the pull-in regions \({\mathscr {P}}^{\bullet }_{z}\) of the integer ambiguity estimator \({\check{a}}^{\bullet }\). The choice made for the integer map \({\mathcal {I}}^{\bullet }: {\mathbb {R}}^{n} \mapsto {\mathbb {Z}}^{n}\) is in this case integer-rounding, i.e. \({\mathcal {I}}^{\bullet }(x)= \lceil x \rfloor \). The encircled integer gridpoints constitute the integer set \(\varOmega ^{\bullet }_{x=0}\) (cf. Lemma 5) and the line through the origin has \(M=Q_{{\hat{a}}{\hat{b}}}Q_{{\hat{b}}{\hat{b}}}^{-1}\) as its direction vector. For comparison also the hexagonal ILS pull-in regions of \({\check{a}}\) are shown, thus illustrating the close overlap between the two types of pull-in regions.

Fig. 5

The dual function \({\mathcal {D}}(b)\) for \(n=p=1\), \(\sigma _{{\hat{b}}}^{2}=2.2^2\), \(\sigma _{{\hat{a}}}^{2}=0.15^2\) and \(\rho _{{\hat{a}}{\hat{b}}}=0.5\) (left), \(\rho _{{\hat{a}}{\hat{b}}}=0.9\) (right)

5.4 Distributions of \({\check{a}}^{\bullet }\) and \({\check{b}}^{\bullet }\)

With the knowledge that the integer estimator \({\check{a}}^{\bullet }\) is admissible, we can now apply existing theory of Teunissen (1999b) to determine the distributions of \({\check{a}}^{\bullet }\) and \({\check{b}}^{\bullet }\).

Corollary (Distributions of \({\check{a}}^{\bullet }\) and \({\check{b}}^{\bullet }\)): Let \(f_{{\hat{a}}}(\alpha )\) be the PDF of \({\hat{a}} \sim {\mathcal {N}}_{n}(a, Q_{{\hat{a}}{\hat{a}}})\) and \(f_{{\hat{b}}(z)}(\beta )\) be the PDF of \({\hat{b}}(z) \sim {\mathcal {N}}_{p}(b-Q_{{\hat{b}}{\hat{a}}}Q_{{\hat{a}}{\hat{a}}}^{-1}(a-z), Q_{{\hat{b}}(a){\hat{b}}(a)})\). Then, as \({\check{a}}^{\bullet }\) is an admissible integer estimator and \({\check{b}}^{\bullet }={\hat{b}}({\check{a}}^{\bullet })\), their PMF and PDF follow from (Teunissen 1999b) as

$$\begin{aligned} \left\{ \begin{array}{lcl} {\textsf{P}}[{\check{a}}^{\bullet }=z] &{}=&{} \int _{{\mathscr {P}}^{\bullet }_{z}} f_{{\hat{a}}}(\alpha )d \alpha \\ f_{{\check{b}}^{\bullet }}(\beta )&{}=&{} \sum \limits _{z \in {\mathbb {Z}}^{n}} f_{{\hat{b}}(z)}(\beta ) {\textsf{P}}[{\check{a}}^{\bullet }=z] \end{array} \right. \end{aligned}$$
(52)

\(\blacksquare \)

This result shows that the distribution of the ambiguity-resolved baseline \({\check{b}}^{\bullet }\) can be approximated well by the peaked PDF \({\mathcal {N}}_{p}(b, Q_{{\hat{b}}(a){\hat{b}}(a)})\) if the ambiguity success-rate \({\textsf{P}}[{\check{a}}^{\bullet }=a]\) is sufficiently close to one. Whether or not the success-rate is large enough can be verified by simulation or by using the following lower-bound.

Theorem 7

(Lower-bound of \({\textsf{P}}[{\check{a}}^{\bullet }=a]\)) Let \({\hat{a}} \sim {\mathcal {N}}_{n}(a, Q_{{\hat{a}}{\hat{a}}})\) and \({\check{a}}^{\bullet } = \arg \min \limits _{z \in \varOmega ^{\bullet }_{{\hat{a}}}}||{\hat{a}}-z||_{Q_{{\hat{a}}{\hat{a}}}}^{2}\), with \(\varOmega ^{\bullet }_{{\hat{a}}}=\{u \in {\mathbb {Z}}^{n}|\; u={\mathcal {I}}^{\bullet }({\hat{a}}+Q_{{\hat{a}}{\hat{b}}}Q_{{\hat{b}}{\hat{b}}}^{-1}\beta ), \forall \beta \}\). Then

$$\begin{aligned} {\textsf{P}}\left[ \chi ^{2}(n,0) \le r^{2}\right] \le {\textsf{P}}[{\check{a}}^{\bullet }=a] \end{aligned}$$
(53)

with \(r^{2}=\tfrac{1}{4} \min \limits _{z \in {\mathbb {Z}}^{n}{\setminus }\{0\}}||z||_{Q_{{\hat{a}}{\hat{a}}}}^{2}\).

Proof

see Appendix. \(\square \)
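For small ambiguity dimensions, the lower-bound (53) is straightforward to evaluate numerically. The following sketch (assuming numpy and scipy are available) approximates \(r^{2}\) by brute-force enumeration of the nonzero integer vectors in a small box, which is practical only for low n; for realistic GNSS dimensions the minimizer would instead be obtained with a LAMBDA-type integer search. The example matrix is illustrative only.

```python
import itertools
import numpy as np
from scipy.stats import chi2

def success_rate_lower_bound(Q_aa, box=3):
    """Lower bound of P[a_check = a] per Theorem 7 (brute-force r^2)."""
    n = Q_aa.shape[0]
    Q_inv = np.linalg.inv(Q_aa)
    r2 = np.inf
    for z in itertools.product(range(-box, box + 1), repeat=n):
        z = np.array(z, dtype=float)
        if not z.any():
            continue                        # exclude z = 0
        r2 = min(r2, 0.25 * z @ Q_inv @ z)  # (1/4) ||z||^2_{Q_aa}
    return chi2.cdf(r2, df=n)               # P[chi^2(n,0) <= r^2]

# Illustrative 2D example
Q_aa = np.array([[0.0865, -0.0364],
                 [-0.0364, 0.0847]])
print(success_rate_lower_bound(Q_aa))
```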

6 On the multimodality of \({\mathcal {D}}(b)\)

So far we have studied the distributional properties of the estimators that follow from the dual formulation. A convergent algorithm for actually computing these estimators has, however, not been developed yet. To be able to do so, it is useful to first illustrate some insightful characteristics of the dual function. We start with the simple one-dimensional case \(n=p=1\), for which the two components of \({\mathcal {D}}(b)\) (cf. 19) simplify to

$$\begin{aligned} \begin{array}{lcl} {\mathcal {D}}_{1}(b)&{}=&{}||{\hat{b}}-b||_{Q_{{\hat{b}}{\hat{b}}}}^{2}=({\hat{b}}-b)^{2}/\sigma _{{\hat{b}}}^{2}\\ {\mathcal {D}}_{2}(b)&{}=&{}||{\hat{a}}(b)\!-\!{\check{a}}(b)||_{Q_{{\hat{a}}(b){\hat{a}}(b)}}^{2}\!=\!\left( {\hat{a}}(b)-\lceil {\hat{a}}(b) \rfloor \right) ^{2}/\sigma _{{\hat{a}}(b)}^{2} \end{array}\nonumber \\ \end{aligned}$$
(54)

For the second component, we may write \({\mathcal {D}}_{2}(b)=({\hat{a}}(b)-z)^{2}/\sigma _{{\hat{a}}(b)}^{2}\) if \({\hat{a}}(b) \in [z-\tfrac{1}{2}, z+\tfrac{1}{2}]\). Since \({\hat{a}}(b)={\hat{a}}-\sigma _{{\hat{a}}{\hat{b}}}\sigma _{{\hat{b}}}^{-2}({\hat{b}}-b)\), we have the equivalence

$$\begin{aligned} {\hat{a}}(b) \in \left[ z-\tfrac{1}{2}, z+\tfrac{1}{2}\right] \Leftrightarrow b \in \left[ b(z)-\tfrac{1}{2}\varDelta , b(z)+\tfrac{1}{2}\varDelta \right] \nonumber \\ \end{aligned}$$
(55)

where \(b(z)={\hat{b}}+(z-{\hat{a}})\varDelta \) and \(\varDelta =\sigma _{{\hat{b}}}^{2}/\sigma _{{\hat{a}}{\hat{b}}}\) (note: b(z) should here not be confused with \({\hat{b}}(z)\)). Using this equivalence one can show that the sum of \({\mathcal {D}}_{1}(b)\) and \({\mathcal {D}}_{2}(b)\) can be written as

$$\begin{aligned} {\mathcal {D}}(b) = \sigma _{{\hat{b}}}^{-2}\left\{ ({\hat{b}}-b)^{2}+\frac{\rho _{{\hat{a}}{\hat{b}}}^{2}}{1-\rho _{{\hat{a}}{\hat{b}}}^{2}}\left( b(z)-b\right) ^{2}\right\} \end{aligned}$$
(56)

for \(b \in [b(z)-\tfrac{1}{2}\varDelta , b(z)+\tfrac{1}{2}\varDelta ]\), \(z \in {\mathbb {Z}}\), where \(\rho _{{\hat{a}}{\hat{b}}}\) denotes the correlation coefficient of \({\hat{a}}\) and \({\hat{b}}\). Thus \({\mathcal {D}}(b)\) is the sum of the parabola \({\mathcal {D}}_{1}(b)=({\hat{b}}-b)^{2}/\sigma _{{\hat{b}}}^{2}\) and an infinite z-sequence of equally shaped parabolas \({\mathcal {D}}_{2}(b)=\frac{\rho _{{\hat{a}}{\hat{b}}}^{2}}{1-\rho _{{\hat{a}}{\hat{b}}}^{2}}(b(z)-b)^{2}/\sigma _{{\hat{b}}}^{2}\), centred at b(z) and with domain \(b \in [b(z)-\tfrac{1}{2}\varDelta , b(z)+\tfrac{1}{2}\varDelta ]\).
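To make this one-dimensional structure concrete, the following sketch (numpy assumed; the variances and correlation follow Fig. 5, while the float values \({\hat{a}}\) and \({\hat{b}}\) are chosen arbitrarily) evaluates \({\mathcal {D}}(b)\) of (54) on a grid, with rounding as the one-dimensional ILS map \({\check{a}}(b)\).

```python
import numpy as np

def dual_1d(b, a_hat, b_hat, sig_a, sig_b, rho):
    """Evaluate the 1D dual D(b) = D_1(b) + D_2(b) of (54)."""
    sig_ab = rho * sig_a * sig_b                   # covariance of a_hat and b_hat
    a_b = a_hat - sig_ab / sig_b**2 * (b_hat - b)  # conditional estimate a_hat(b)
    var_ab = sig_a**2 * (1.0 - rho**2)             # conditional variance of a_hat(b)
    D1 = (b_hat - b)**2 / sig_b**2
    D2 = (a_b - np.round(a_b))**2 / var_ab         # rounding = 1D ILS map
    return D1 + D2

b = np.linspace(-6.0, 6.0, 2001)
D = dual_1d(b, a_hat=0.3, b_hat=0.0, sig_a=0.15, sig_b=2.2, rho=0.9)
print(b[np.argmin(D)])   # approximate (grid) global minimizer
```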

Equation (56) shows that the contribution of \({\mathcal {D}}_{2}(b)\) to \({\mathcal {D}}(b)\) is driven by the correlation coefficient \(\rho _{{\hat{a}}{\hat{b}}}\); it is small if the correlation is small and it gets larger the closer the correlation coefficient gets to one. An illustration of \({\mathcal {D}}(b)\), together with its two components \({\mathcal {D}}_{1}(b)\) and \({\mathcal {D}}_{2}(b)\), is given in Fig. 5 for two different values of the correlation coefficient, \(\rho _{{\hat{a}}{\hat{b}}}=0.5\) and \(\rho _{{\hat{a}}{\hat{b}}}=0.9\). It shows that \({\mathcal {D}}(b)\) is a multimodal function whose multimodality, with its multiple local minima, gets more pronounced the larger the correlation coefficient gets, i.e. the more weight is given to \({\mathcal {D}}_{2}(b)\) in the sum of \({\mathcal {D}}(b)\). Figure 6 illustrates the multimodality of \({\mathcal {D}}(b)\) for \(p=1\) and \(n=2\), with diagonal (left) and nondiagonal (right) conditional ambiguity variance matrix. The multiple local minima of \({\mathcal {D}}(b)\), and also the domain in which its global minimizer is guaranteed to reside, are given by the following lemma.

Fig. 6

The dual function \({\mathcal {D}}(b)\) for \(p=1\) and \(n=2\), with diagonal (left plot) and nondiagonal (right plot) conditional ambiguity variance matrix, having 0.8 correlation

Lemma 7

(Local minimizers and global domain)

  (a)

    The local minimizers and corresponding minima of \({\mathcal {D}}(b)\) are

    $$\begin{aligned} \left\{ \begin{array}{lcl} {\hat{b}}(z) &{}=&{} \arg \min \limits _{{\hat{a}}(b) \in {\mathscr {S}}_{z}} {\mathcal {D}}(b)\\ {\mathcal {D}}({\hat{b}}(z))&{}=&{}||{\hat{a}}-z||_{Q_{{\hat{a}}{\hat{a}}}}^{2},\;\forall z \in {\mathbb {Z}}^{n} \end{array} \right. \end{aligned}$$
    (57)
  (b)

    The global minimizer of \({\mathcal {D}}(b)\),

    $$\begin{aligned} {\check{b}} = \arg \min _{b \in {\mathbb {R}}^{p}} {\mathcal {D}}(b) \in {\mathcal {E}}(r_{z}),\; z \in {\mathbb {Z}}^{n} \end{aligned}$$
    (58)

    resides in the ellipsoidal region \({\mathcal {E}}(r_{z}) = \{ b \in {\mathbb {R}}^{n}\;| \;||{\hat{b}}-b||_{Q_{{\hat{b}}{\hat{b}}}}^{2} \le r_{z}^{2}\}\), for all \(z \in {\mathbb {Z}}^{n}\), with \(r_{z}^{2}=||{\hat{a}}-z||_{Q_{{\hat{a}}{\hat{a}}}}^{2}\).

\(\blacksquare \)

Proof

First we prove (57). For \({\hat{a}}(b) \in {\mathscr {S}}_{z}\), we have \({\check{a}}(b)=z\), and therefore

$$\begin{aligned} \begin{array}{lcl} {\mathcal {D}}(b) &{}=&{}||{\hat{b}}-b||_{Q_{{\hat{b}}{\hat{b}}}}^{2}+||{\hat{a}}(b)-z||_{Q_{{\hat{a}}(b){\hat{a}}(b)}}^{2}\\ &{}=&{} ||{\hat{a}}-z||_{Q_{{\hat{a}}{\hat{a}}}}^{2}+||{\hat{b}}(z)-b||_{Q_{{\hat{b}}(a){\hat{b}}(a)}}^{2} \end{array} \end{aligned}$$
(59)

showing that its local minimizer and minimum are given by (57). That no other minima of \({\mathcal {D}}(b)\) exist follows from the fact that the pull-in regions \({\mathscr {S}}_{z}\), \(z \in {\mathbb {Z}}^{n}\), partition \({\mathbb {R}}^{n}\) (Teunissen 1999a). The proof of (58) follows by recognizing that, as the global minimizer \({\check{b}}\) is one of the local minimizers, \({\check{b}}\) resides in the set \(\{ b \in {\mathbb {R}}^{p}\;|\; {\mathcal {D}}(b)\le ||{\hat{a}}-z||_{Q_{{\hat{a}}{\hat{a}}}}^{2}\}\), and thus also in the larger set \({\mathcal {E}}(r_{z})\). \(\square \)
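As a numerical illustration of Lemma 7, the following sketch (numpy assumed; all inputs are placeholders) tabulates the local minimizers \({\hat{b}}(z)\) and their minima (57) for integer candidates z near \(\lceil {\hat{a}} \rfloor \); sorting on the minima shows which candidate confines the global minimizer via (58).

```python
import itertools
import numpy as np

def local_minima(a_hat, b_hat, Q_aa, Q_ba, box=2):
    """Tabulate (z, b_hat(z), D(b_hat(z))) per (57), sorted by minimum."""
    Q_inv = np.linalg.inv(Q_aa)
    base = np.round(a_hat)
    out = []
    for dz in itertools.product(range(-box, box + 1), repeat=len(a_hat)):
        z = base + np.array(dz)
        b_z = b_hat - Q_ba @ Q_inv @ (a_hat - z)  # conditional baseline b_hat(z)
        D_z = (a_hat - z) @ Q_inv @ (a_hat - z)   # minimum ||a_hat - z||^2_{Q_aa}
        out.append((z, b_z, D_z))
    return sorted(out, key=lambda t: t[2])
```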

This result shows the size of the local minima and where they are located in b-space \({\mathbb {R}}^{p}\), but it does not show how their global minimum can be obtained, other than that it is confined to \({\mathcal {E}}(r_{z})\). Due to this multimodality of \({\mathcal {D}}(b)\), one cannot expect standard iterative descent techniques (Teunissen 1990) to be successful in finding its global minimum \({\check{b}}\). We will therefore have to develop a global algorithm dedicated to \({\mathcal {D}}(b)\).

7 Global minimization of dual

In this section we present our proposed method for finding the global minimizer \({\check{b}}\) of the dual function \({\mathcal {D}}(b)\). According to Lemma 7, we can confine the search for \({\check{b}}\) to a convex set \({\mathcal {C}} \subset {\mathbb {R}}^{p}\), being either a suitably scaled ellipsoid \({\mathcal {E}}(r)=\{b \in {\mathbb {R}}^{p}|\; ||{\hat{b}}-b||_{Q_{{\hat{b}}{\hat{b}}}}^{2} \le r^{2}\}\) or any other of its circumscribing convex regions. The problem to be solved reads therefore

$$\begin{aligned} {\check{b}}=\arg \min _{b \in {\mathcal {C}} \subset {\mathbb {R}}^{p}} {\mathcal {D}}(b) \end{aligned}$$
(60)

The challenge in solving this problem is due to the multimodal dual function not being convex. Although various heuristic and stochastic methods for the approximate computation of nonconvex global minimizers exist (Zhigljavsky 1991; Horst et al. 2000; Pardalos and Romeijn 2002), we choose to present a method that has finite termination with a guaranteed \(\epsilon \)-tolerance. Our method for solving (60) is constructed from the following three constituents:

  1.

    Branch and bound (BB): Branch and bound algorithms (Lawler and Wood 1966; Balakrishnan et al. 1991; Guida 2015) are methods for global minimization of nonconvex problems. They are nonheuristic, in the sense that they maintain a provable upper and lower bound on the global minimum, i.e. they terminate with a guarantee that the computed solution has a prescribed accuracy.

  2.

    Convex relaxation: To be able to compute the required lower bounds in the BB-algorithm, we construct differentiable, convex lower bounding functions of \({\mathcal {D}}(b)\) over convex sets. They are constructed such that the lower bounds converge to the nonconvex dual function as the convex sets shrink to a point.

  3.

    Projected gradient descent (PGD): As our convex lower bounding functions are only continuously differentiable (i.e. \(C^{1}\)-functions), the projected gradient descent method (Bertsekas 1999; Nocedal and Wright 2006) is used for the computation of their convex constrained minima.

We now describe each of these constituents and how they interrelate and integrate.

7.1 Branch and bound (BB)

The basic idea of the BB-algorithm is

  • to partition the initial box \({\mathcal {C}} \subset {\mathbb {R}}^{p}\) in k boxes \({\mathcal {B}}_{i}\),

    $$\begin{aligned} {\mathcal {C}}= \cup _{i=1}^{k} {\mathcal {B}}_{i} \end{aligned}$$
    (61)
  • to find local lower and upper bounds of \({\mathcal {D}}(b)\) for each box \({\mathcal {B}}_{i}\),

    $$\begin{aligned} L({\mathcal {B}}_{i}) \le \min \limits _{b \in {\mathcal {B}}_{i}}{\mathcal {D}}(b) \le U({\mathcal {B}}_{i}), \end{aligned}$$
    (62)
  • to form global bounds from the local bounds,

    $$\begin{aligned} L_{k} \le \min \limits _{b \in {\mathcal {C}}}{\mathcal {D}}(b) \le U_{k} \end{aligned}$$
    (63)

    where

    $$\begin{aligned} L_{k}=\min \limits _{i=1, \ldots , k} L({\mathcal {B}}_{i})\;\textrm{and}\;U_{k}=\min \limits _{i=1, \ldots , k} U({\mathcal {B}}_{i}) \end{aligned}$$
    (64)
  • to terminate if the difference of these bounds is small enough, \(U_{k}-L_{k} \le \epsilon \), else to refine the partition and repeat the process.

The efficacy of the BB-concept depends on the chosen method of partitioning, on the sharpness of the bounds and on the ease with which they can be computed. Importantly, for convergence, the bounds should become tight as the box shrinks to a point.

Although a large variety of different BB-mechanizations exists, we shall here restrict ourselves to the simple approach in which the partitioning of \({\mathcal {C}} \subset {\mathbb {R}}^{p}\) is sequentially constructed by splitting boxes in half. So at the first level, we start with the trivial partitioning, which is \({\mathcal {C}}\) itself, and compute the lower and upper bounds \(L_{1}=L({\mathcal {C}})\) and \(U_{1}=U({\mathcal {C}})\) (hence, these bounds are local and global at the same time),

$$\begin{aligned} L_{1}=L({\mathcal {C}}) \le {\mathcal {D}}({\check{b}}) \le U_{1}=U({\mathcal {C}}) \end{aligned}$$
(65)

If \(U_{1}-L_{1} \le \epsilon \), the algorithm terminates. Otherwise we go to the second iteration level and partition \({\mathcal {C}}\) into two boxes \({\mathcal {C}}={\mathcal {B}}_{1} \cup {\mathcal {B}}_{2}\), and compute \(L({\mathcal {B}}_{i})\) and \(U({\mathcal {B}}_{i})\), \(i = 1, 2\). The splitting of the box is usually done along its longest edge. Then we can construct new global lower and upper bounds,

$$\begin{aligned} \underbrace{\min (L({\mathcal {B}}_{1}), L({\mathcal {B}}_{2}))}_{L_{2}} \le {\mathcal {D}}({\check{b}}) \le \underbrace{\min (U({\mathcal {B}}_{1}), U({\mathcal {B}}_{2}))}_{U_{2}} \end{aligned}$$
(66)

As both \({\mathcal {B}}_{1}\) and \({\mathcal {B}}_{2}\) are 'smaller' than \({\mathcal {C}}\) (i.e. they form its partition), one can generally expect their local bounds to be sharper than the previous global bounds. One can therefore assume that the lower and upper bounds of the pair of boxes obtained by splitting are no worse than the lower and upper bounds of the box they were formed from.

If \(U_{2}-L_{2}< \epsilon \), the algorithm terminates. Otherwise, we partition one of \({\mathcal {B}}_{1}\) and \({\mathcal {B}}_{2}\) into two boxes, to obtain a new partition of \({\mathcal {C}}\) into three boxes, and we compute the local lower and upper bounds for these new boxes. We then update the global lower bound \(L_{3}\) as the minimum of the local lower bounds over the partition of \({\mathcal {C}}\), and similarly for the upper bound \(U_{3}\). The choice of which of the two boxes to split, \({\mathcal {B}}_{1}\) or \({\mathcal {B}}_{2}\), is based on the value of their local lower bound: the box to be split is the one whose local lower bound equals the global lower bound, i.e. the one with the smallest local lower bound. As at each iteration level a box is split into two, we have after k iterations a partitioning of the form (61), with associated global lower and upper bounds of \({\mathcal {D}}({\check{b}})\) as given in (63), with \(L_{k}\) nondecreasing and \(U_{k}\) nonincreasing. Note that although the chosen box may not contain the solution \({\check{b}}\), the BB-algorithm will at a later stage revisit the nonselected box containing \({\check{b}}\), as its local lower bound will by then have become the smallest.
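The following schematic sketch (plain Python) outlines this BB loop; `lower`, `upper`, and `split` are placeholders for the box-wise bound computations \(L({\mathcal {B}})\), \(U({\mathcal {B}})\) and the longest-edge halving developed below.

```python
def branch_and_bound(C, lower, upper, split, eps=1e-6, max_iter=10000):
    """Schematic BB loop of Sect. 7.1; returns the box confining the minimizer."""
    boxes = [(lower(C), C)]                     # active partition of C
    for _ in range(max_iter):
        boxes.sort(key=lambda t: t[0])          # smallest local lower bound first
        L_k = boxes[0][0]                       # global lower bound (64)
        U_k = min(upper(B) for _, B in boxes)   # global upper bound (64)
        if U_k - L_k <= eps:
            return boxes[0][1]                  # epsilon-tolerance reached
        _, B = boxes.pop(0)                     # split box with smallest L(B)
        B1, B2 = split(B)                       # halve along the longest edge
        boxes += [(lower(B1), B1), (lower(B2), B2)]
    raise RuntimeError("no convergence within max_iter")
```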

Fig. 7

a The function \(g(x) = (x-\lceil x \rfloor )^2\) as a sequence of cut-off parabolas on interval \([z_{L}-\tfrac{1}{2}, z_{U}+\tfrac{1}{2}]\); b with its parabolic lower bounding function; and c with its best possible, continuously differentiable lower bounding convex function

7.1.1 Initialization and bounds

To start the BB-algorithm, the initial box \({\mathcal {C}}\) needs to be formed. We choose \({\mathcal {C}}\) to be the box

$$\begin{aligned} {\mathcal {C}} = \left\{ b \in {\mathbb {R}}^{p}|\; |b_{\alpha }-{\hat{b}}_{\alpha }|\le r \sigma _{{\hat{b}}_{\alpha }}, \alpha =1, \ldots ,p\right\} \end{aligned}$$
(67)

It follows from the ellipsoidal planes-of-support lemma that the box \({\mathcal {C}}\) circumscribes the ellipsoid \({\mathcal {E}}(r)=\{b \in {\mathbb {R}}^{p}|\; ||b-{\hat{b}}||_{Q_{{\hat{b}}{\hat{b}}}}^{2}\le r^{2}\}\), see, e.g. (Teunissen 1995). The scalar \(r>0\) is a user-defined parameter. It can be set following Lemma 7, or by choosing a user-defined confidence-level. In the latter case, \(r^{2}=\chi ^{2}_{\alpha }(p,0)\) corresponds with a confidence-level \(1-\alpha \).
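A minimal sketch of this initialization (numpy and scipy assumed):

```python
import numpy as np
from scipy.stats import chi2

def initial_box(b_hat, Q_bb, r=None, alpha=None):
    """Box (67) around b_hat; r set directly or via confidence level 1-alpha."""
    if r is None:
        r = np.sqrt(chi2.ppf(1.0 - alpha, df=len(b_hat)))  # r^2 = chi^2_alpha(p,0)
    sig = np.sqrt(np.diag(Q_bb))             # standard deviations sigma_{b_alpha}
    return b_hat - r * sig, b_hat + r * sig  # (b_L, b_U)
```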

For the bounds we need to be able to compute upper and lower bounds of \(\min _{b \in {\mathcal {B}}}{\mathcal {D}}(b)\) for any relevant box \({\mathcal {B}}\) that the BB-algorithm creates. The computation of local upper bounds \(U({\mathcal {B}})\) is rather straightforward, since any \(b \in {\mathcal {B}}\) can be used for that purpose. We choose to compute the bound as

$$\begin{aligned} U({\mathcal {B}})= {\mathcal {D}}(b_{*}) \ge \min \limits _{b \in {\mathcal {B}}}{\mathcal {D}}(b) \end{aligned}$$
(68)

with \(b_{*}\) being the 'centre of gravity' of the box, i.e. if box \({\mathcal {B}}\) is bounded as \(b_{L} \preceq b \preceq b_{U}\), then \( b_{*}= \tfrac{1}{2} (b_{L}+b_{U}) \).

The computation of local lower bounds \(L({\mathcal {B}})\) is much more involved. We cannot simply use standard gradient-based methods to compute the minimum, since \({\mathcal {D}}(b)\) is not convex and convergence to the minimum over \({\mathcal {B}}\) is therefore not assured. The idea is therefore to find a differentiable convex lower bounding function \( {\mathcal {D}}_{L}(b) \le {\mathcal {D}}(b), \forall b \in {\mathcal {B}} \), such that the minimum of \({\mathcal {D}}_{L}(b)\) over \({\mathcal {B}}\) can be computed with standard means and used as the local lower bound

$$\begin{aligned} L({\mathcal {B}}) = \min \limits _{b \in {\mathcal {B}}}{\mathcal {D}}_{L}(b) \le \min \limits _{b \in {\mathcal {B}}}{\mathcal {D}}(b) \end{aligned}$$
(69)

We now show how this can be achieved.

7.2 Convex relaxation

We will develop the convex relaxation for the dual function \({\mathcal {D}}^{\circ }(b)\), as a similar approach can be developed for the other dual versions, like \({\mathcal {D}}^{\bullet }(b)\). Using the diagonal approximation \(Q^{\circ }_{{\hat{a}}(b){\hat{a}}(b)}=\textrm{diag}(\sigma ^{2}_{1}, \ldots , \sigma ^{2}_{n})\), the dual function (29) can be written as

$$\begin{aligned} {\mathcal {D}}^{\circ }(b)=||b-{\hat{b}}||_{Q_{{\hat{b}}{\hat{b}}}}^{2}+ {\mathcal {G}}(b) \end{aligned}$$
(70)

with

$$\begin{aligned} \begin{array}{lcl} {\mathcal {G}}(b) &{}=&{} \sum _{i=1}^{n} g(x_{i})/\sigma _{i}^{2}\\ g(x)&{}=&{}\left( x-\lceil x \rfloor \right) ^{2}\\ x_{i}&{}=&{}{\hat{a}}_{i}(b)={\hat{a}}_{i}+m_{i}^{T}(b-{\hat{b}})\\ m_{i}^{T} &{}=&{} c_{i}^{T}Q_{{\hat{a}}{\hat{b}}}Q_{{\hat{b}}{\hat{b}}}^{-1}\\ c_{i} &{}=&{} i\mathrm{th\;column\;of\;unit\;matrix}\;I_{n} \end{array} \end{aligned}$$
(71)

Since the first term of (70) is already convex for the whole space \({\mathbb {R}}^{p}\), we can concentrate on the second term and try to find a convex differentiable lower bounding function \({\mathcal {G}}_{L}(b)\) such that

$$\begin{aligned} {\mathcal {G}}_{L}(b) \le {\mathcal {G}}(b), \;\forall b \in {\mathcal {B}} \end{aligned}$$
(72)

Once this function is found, the dual convex lower bounding function follows as \({\mathcal {D}}^{\circ }_{L}(b)=||b-{\hat{b}}||_{Q_{{\hat{b}}{\hat{b}}}}^{2}+ {\mathcal {G}}_{L}(b) \le {\mathcal {D}}^{\circ }(b), \;\forall b \in {\mathcal {B}}\).

Fig. 8

Four convex differentiable lower bounding functions (in red) of \(g(x) = (x-\lceil x \rfloor )^2\) on the interval \([l,u] \subset [z, z+1]\), \(z \in {\mathbb {Z}}\)

Note that the lack of convexity of \({\mathcal {G}}(b)\) is due to the single function g(x), which itself is a sequence of cut-off parabolas, see Fig. 7a. Hence, if we can find a convex lower bounding function \(g_{L}(x)\) of g(x) on the required interval, then we automatically have constructed a convex \({\mathcal {G}}_{L}(b)\) on the required box \({\mathcal {B}}\). To do so, we first need to construct the intervals of \({\hat{a}}_{i}(b)\), \(i=1, \ldots , n\), that correspond with \(b_{L} \preceq b \preceq b_{U}\). As these n intervals will differ, the lower bounding functions of g(x) on these intervals will differ as well. They will be denoted as \(g_{i,L}(x)\).

The interval \([l_{i},u_{i}]\) for which \(g_{i,L}(x)\) has to be convex

As the function \({\mathcal {G}}_{L}(b)=\sum _{i=1}^{n} g_{i,L}({\hat{a}}_{i}(b))/\sigma _{i}^{2}\) has to be convex for the box \(b_{\alpha , L} \le b_{\alpha } \le b_{\alpha , U}\), \(\alpha = 1, \ldots , p\), the functions \(g_{i,L}(x)\) need to be convex on the intervals \(l_{i} \le {\hat{a}}_{i}(b) \le u_{i}\) that correspond with this box. Application of the projection-lemma from the Appendix shows the relation between box and intervals to be

$$\begin{aligned} b_{\alpha , L} \le b_{\alpha } \le b_{\alpha , U}\; \Rightarrow \;l_{i} \le {\hat{a}}_{i}(b) \le u_{i}\; \end{aligned}$$
(73)

with

$$\begin{aligned} \begin{array}{lcl} l_{i}&{}=&{} {\hat{a}}_{i}(0)+\sum \limits _{\alpha =1}^{p}\left( -{\tilde{b}}_{\alpha }|m_{i \alpha }|+{\bar{b}}_{\alpha }m_{i \alpha }\right) \\ u_{i}&{}=&{} {\hat{a}}_{i}(0)+\sum \limits _{\alpha =1}^{p}\left( +{\tilde{b}}_{\alpha }|m_{i \alpha }|+{\bar{b}}_{\alpha }m_{i \alpha }\right) \end{array} \end{aligned}$$
(74)

where \({\tilde{b}}_{\alpha }=\tfrac{1}{2}(b_{\alpha , U}-b_{\alpha , L})\), \({\bar{b}}_{\alpha }=\tfrac{1}{2}(b_{\alpha , U}+b_{\alpha , L})\), and \(m_{i \alpha }=c_{i}^{T}Q_{{\hat{a}}{\hat{b}}}Q_{{\hat{b}}{\hat{b}}}^{-1}c_{\alpha }\).

Note that the widths of the baseline intervals \(b_{\alpha , U}-b_{\alpha , L}\) propagate into the widths of the ambiguity intervals as \(u_{i}-l_{i}= \sum _{\alpha =1}^{p} (b_{\alpha ,U}-b_{\alpha , L})|m_{i \alpha }|\). Thus as the baseline intervals get smaller due to the rectangular BB-splitting, the corresponding ambiguity intervals get smaller as well, producing in the limit, when \(b_{\alpha , U}=b_{\alpha ,L}={\check{b}}_{\alpha }\), the result \({\hat{a}}({\check{b}})\).
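In code, the mapping (73)-(74) from a baseline box to the ambiguity intervals reads as follows (a sketch, numpy assumed; M denotes the \(n \times p\) matrix with rows \(m_{i}^{T}=c_{i}^{T}Q_{{\hat{a}}{\hat{b}}}Q_{{\hat{b}}{\hat{b}}}^{-1}\)).

```python
import numpy as np

def ambiguity_intervals(b_L, b_U, a_hat, b_hat, M):
    """Intervals [l_i, u_i] of (74) induced by the box b_L <= b <= b_U."""
    b_bar = 0.5 * (b_U + b_L)              # box centre  b_bar_alpha
    b_til = 0.5 * (b_U - b_L)              # half-widths b_til_alpha
    centre = a_hat + M @ (b_bar - b_hat)   # a_hat_i(b) at the box centre
    halfw = np.abs(M) @ b_til              # sum_alpha b_til_alpha |m_{i alpha}|
    return centre - halfw, centre + halfw  # (l, u)
```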

As the BB-splitting acts on only one interval at a time, we can now also show how this halfway splitting affects the intervals \([l_{i}, u_{i}]\). Let the \(\gamma \)th interval \([b_{\gamma , L}, b_{\gamma , U}]\), with \(\gamma \in \{1, \ldots , p\}\), be split halfway into \([b_{\gamma , L}, {\bar{b}}_{\gamma }]\) and \([{\bar{b}}_{\gamma }, b_{\gamma , U}]\). Then the new intervals \([l^{1}_{i}, u^{1}_{i}]\), corresponding with \([b_{\gamma , L}, {\bar{b}}_{\gamma }]\), can be expressed in terms of the old ones as

$$\begin{aligned} \begin{array}{lcl} l^{1}_{i}&{}=&{} l_{i}+\tfrac{1}{2}{\tilde{b}}_{\gamma }\left( |m_{i\gamma }|-m_{i \gamma }\right) \\ u^{1}_{i} &{}=&{} u_{i}-\tfrac{1}{2}{\tilde{b}}_{\gamma }\left( |m_{i\gamma }|+m_{i \gamma }\right) \\ \end{array} \end{aligned}$$
(75)

Hence, the length of the interval changes as \((u^{1}_{i}-l^{1}_{i})=(u_{i}-l_{i})-{\tilde{b}}_{\gamma }|m_{i \gamma }|\), i.e. it gets shorter by \({\tilde{b}}_{\gamma }|m_{i \gamma }|\), where \({\tilde{b}}_{\gamma }=\tfrac{1}{2}(b_{\gamma , U}-b_{\gamma , L})\).

Now that we know the intervals \([l_{i}, u_{i}]\) over which the functions \(g_{i, L}(x)\), \(i=1, \ldots , n\), need to provide a differentiable lower bound of g(x), we can start constructing these functions. As we will do so for an arbitrary interval \([l,u]\), we will dispense with the subscript i and write \(g_{L}(x)\) instead of \(g_{i,L}(x)\).

Fig. 9

The linear-parabolic differentiable lower bounding function (in red) of \(g(x) = (x-\lceil x \rfloor )^2\) when \(l \in [z_{l}-1, z_{l}-\tfrac{1}{2}]\), \(u \in [z_{u}+\tfrac{1}{2}, z_{u}+1]\) for \(z_{u} \ge z_{l}\)

Convex lower bounding function of \(g(x)\) on \([l,u]\)

Many different differentiable convex lower bounding functions \(g_{L}(x)\) of \(g(x) = (x-\lceil x\rfloor )^2\) on \(x \in [l,u]\) can be constructed. For example, if we assume for the moment that \(l=z_{l}-\tfrac{1}{2}\), \(z_{l} \in {\mathbb {Z}}\), and \(u=z_{u}+\tfrac{1}{2}\), \(z_{u} \in {\mathbb {Z}}\), then the parabola

$$\begin{aligned} g_{L}(x)= \alpha (x-{\bar{z}})^{2}+\beta \end{aligned}$$
(76)

with \(\alpha =\tfrac{1}{2}(\tfrac{1}{2}+{\tilde{z}})^{-1}\), \({\tilde{z}}=\tfrac{z_{u}-z_{l}}{2}\), \({\bar{z}}=\tfrac{z_{u}+z_{l}}{2}\), and \(\beta =-\tfrac{1}{2}{\tilde{z}}\), is such a lower bounding function of g(x), see Fig. 7b. It gives a perfect fit to g(x) if \(u-l=1\). However, the longer the interval \([l,u]\) becomes, i.e. the larger the difference \(u-l\), the more negative the minimum \(\beta \) of the parabola becomes. With choice (76), the lower bounding fit to g(x) thus gets poorer the more l and u differ. A much better choice for the lower bounding function would be (see Fig. 7c)

$$\begin{aligned} g_{L}(x)=\left\{ \begin{array}{ccl} (x-z_{l})^{2}&{}\textrm{for}&{} l=z_{l}-\tfrac{1}{2} \le x \le z_{l}\\ 0 &{}\textrm{for}&{} z_{l} \le x \le z_{u}\\ (x-z_{u})^{2}&{}\textrm{for}&{} z_{u}\le x \le z_{u}+\tfrac{1}{2}=u \end{array} \right. \end{aligned}$$
(77)

Also this choice gives a perfect fit when \(u-l=1\), while now its minimum is independent of the interval length. In fact, the minimum value of 0 provides the best possible convex lower bounding over the interval \([z_{l}, z_{u}]\). When we compare the smoothness of the above two choices, we note that (76) is a \(C^{2}\) function, while (77) is only a \(C^{1}\) function. The continuous differentiability of (77) is sufficient however for the application of the projected gradient descent method (see Sect. 7.3).

So far we have made the simplifying, but unrealistic, assumption that \(l=z_{l}-\tfrac{1}{2}\) and \(u=z_{u}+\tfrac{1}{2}\). For the general situation in which \([l,u]\) can be any interval, we first assume that the interval lies between two consecutive integers: \([l,u] \subset [z, z+1]\) for some \(z \in {\mathbb {Z}}\). Then the following four cases can be discriminated (see Fig. 8):

Case 1: If \([l,u] \subset [z, z+\tfrac{1}{2}]\), then \(g_{L}(x)= (x-z)^{2}\). Thus if both l and u lie in the first half of the interval, the parabola itself can be taken as the lower bounding function, see Fig. 8a.

Case 2: If \([l,u] \subset [z+\tfrac{1}{2}, z+1]\), then \(g_{L}(x)= (x-z-1)^{2}\). Thus if both l and u lie in the second half of the interval, the parabola centred at \(z+1\) can be taken as the lower bounding function, see Fig. 8b.

For the third and the fourth case, we assume that l and u lie in different halves of the interval \([z, z+1]\), whereby the distinction has to be made whether or not the function value at l is larger than at u.

Case 3: Let \(l \in [z, z+\tfrac{1}{2}]\), \(u \in [z+\tfrac{1}{2}, z+1]\), and \((l-z)^{2} \ge (u-z-1)^{2}\). Now the best convex lower bounding differentiable function is either a decreasing straight line, connecting the points \((l, (l-z)^{2})\) and \((u, (u-z-1)^{2})\), or a decreasing straight line that starts at the point \((l, (l-z)^{2})\), is tangent to the parabola \(y=(x-z-1)^{2}\) at the point \((\alpha , (\alpha -z-1)^{2})\), and then continues along the parabola to the point \((u, (u-z-1)^{2})\), see Fig. 8c. It is given as:

$$\begin{aligned} g_{L}(x)=\left\{ \begin{array}{lll} a(x-l)+(l-z)^{2} &{} \textrm{for}\; l\le x\le u &{} (\textrm{if}\; u \le \alpha )\\ 2(\alpha -z-1)(x-\alpha )+(\alpha -z-1)^{2} &{} \textrm{for}\; l\le x \le \alpha &{} (\textrm{if}\; u \ge \alpha )\\ (x-z-1)^{2} &{} \textrm{for}\; \alpha \le x \le u &{} (\textrm{if}\; u \ge \alpha ) \end{array} \right. \end{aligned}$$
(78)

with \(\alpha =l+ \sqrt{1-2(l-z)}\) and \(a=\tfrac{(u-z-1)^{2}-(l-z)^{2}}{u-l} \le 0\).

Case 4: Let \(l \in [z, z+\tfrac{1}{2}]\), \(u \in [z+\tfrac{1}{2}, z+1]\), and \((l-z)^{2} \le (u-z-1)^{2}\). This case is a ‘mirror image’ of the previous one, see Fig. 8d. The convex lower bounding differentiable function is given as:

$$\begin{aligned} g_{L}(x)=\left\{ \begin{array}{lll} a(x-l)+(l-z)^{2} &{} \textrm{for}\; l\le x\le u &{} (\textrm{if}\; l \ge \beta )\\ (x-z)^{2} &{} \textrm{for}\; l\le x \le \beta &{} (\textrm{if}\; l \le \beta )\\ 2(\beta -z)(x-\beta )+(\beta -z)^{2} &{} \textrm{for}\; \beta \le x \le u &{} (\textrm{if}\; l \le \beta ) \end{array} \right. \end{aligned}$$
(79)

with \(\beta =u-\sqrt{2(u-z)-1}\) and \(a=\tfrac{(u-z-1)^{2}-(l-z)^{2}}{u-l} \ge 0\).
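A direct transcription of the four cases into code may look as follows (a sketch; it assumes a nondegenerate interval \([l,u] \subset [z, z+1]\)).

```python
import math

def g_L_unit_cell(x, l, u, z):
    """Convex C^1 lower bound of g on [l,u] within one unit cell [z, z+1]."""
    if u <= z + 0.5:                      # Case 1: both in first half
        return (x - z)**2
    if l >= z + 0.5:                      # Case 2: both in second half
        return (x - z - 1)**2
    a = ((u - z - 1)**2 - (l - z)**2) / (u - l)  # chord slope (assumes l < u)
    if (l - z)**2 >= (u - z - 1)**2:      # Case 3 (Eq. 78)
        alpha = l + math.sqrt(1 - 2 * (l - z))
        if u <= alpha:
            return a * (x - l) + (l - z)**2      # single chord
        if x <= alpha:                           # tangent line ...
            return 2 * (alpha - z - 1) * (x - alpha) + (alpha - z - 1)**2
        return (x - z - 1)**2                    # ... then parabola
    beta = u - math.sqrt(2 * (u - z) - 1)  # Case 4 (Eq. 79), mirror image
    if l >= beta:
        return a * (x - l) + (l - z)**2
    if x >= beta:
        return 2 * (beta - z) * (x - beta) + (beta - z)**2
    return (x - z)**2
```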

We can use the above insight also to construct lower bounding functions when \([l,u] \subset [z, z+1]\) does not hold, but instead \(l \in [z_{l}-1, z_{l}]\) and \(u \in [z_{u}, z_{u}+1]\) for some integers \(z_{l}\le z_{u}\), see Fig. 9. When \(l \in [z_{l}-1, z_{l}-\tfrac{1}{2}]\) and \(u \in [z_{u}+\tfrac{1}{2}, z_{u}+1]\), the lower bounding function is given as

$$\begin{aligned} g_{L}(x)=\left\{ \begin{array}{ccl} f_{1}(l,x)&{}\textrm{for}&{} l \le x \le z_{l}\\ 0 &{}\textrm{for}&{} z_{l} \le x \le z_{u}\\ f_{2}(x,u)&{}\textrm{for}&{} z_{u}\le x \le u \end{array} \right. \end{aligned}$$
(80)

with \(f_{1}(l,x)\) and \(f_{2}(x,u)\) constructed such that they provide the required lower bounds. They can be found directly from (78), through replacement \(z:=z_{l}-1\), and from (79), through replacement \(z:=z_{u}\), as

$$\begin{aligned} \begin{array}{c} f_{1}(l,x) = \\ \left\{ \begin{array}{lcl} 2(\alpha -z_{l})(x-\alpha )+(\alpha -z_{l})^{2}&{}\textrm{for}&{} l \le x \le \alpha \\ (x-z_{l})^{2} &{}\textrm{for}&{} \alpha \le x \le z_{l} \end{array} \right. \end{array} \end{aligned}$$
(81)

and

$$\begin{aligned} \begin{array}{c} f_{2}(x,u) = \\ \left\{ \begin{array}{lcl} (x-z_{u})^{2} &{}\textrm{for}&{} z_{u}\le x \le \beta \\ 2(\beta -z_{u})(x -\beta )+(\beta -z_{u})^{2}&{}\textrm{for}&{} \beta \le x \le u\\ \end{array} \right. \end{array} \end{aligned}$$
(82)

where \(\alpha =l+\sqrt{1-2(l-z_{l}+1)}\) and \(\beta =u-\sqrt{2(u-z_{u})-1}\).

Note that either one or both of the straight-line components of (81) and (82) will be absent when \(l \in [z_{l}-\tfrac{1}{2}, z_{l}]\) and/or \(u \in [z_{u}, z_{u}+\tfrac{1}{2}]\). Both are absent when \(l \in [z_{l}-\tfrac{1}{2}, z_{l}]\) and \(u \in [z_{u}, z_{u}+\tfrac{1}{2}]\), in which case the lower bounding function generalizes (77) to become

$$\begin{aligned} g_{L}(x)=\left\{ \begin{array}{ccl} (x-z_{l})^{2}&{}\textrm{for}&{} l \le x \le z_{l}\\ 0 &{}\textrm{for}&{} z_{l} \le x \le z_{u}\\ (x-z_{u})^{2}&{}\textrm{for}&{} z_{u}\le x \le u \end{array} \right. \end{aligned}$$
(83)

With the above construction of \(g_{L}(x)\) for the different scenarios, we are now in a position to formulate our sought-for lower bounding function as \({\mathcal {D}}^{\circ }_{L}(b)=||{\hat{b}}-b||_{Q_{{\hat{b}}{\hat{b}}}}^{2}+\sum _{i=1}^{n}g_{i,L}({\hat{a}}_{i}(b))/\sigma _{i}^{2}\). It is continuously differentiable, convex, and lower bounds \({\mathcal {D}}^{\circ }(b)\) on the box \(b_{L} \preceq b \preceq b_{U}\). Hence, it is now in a form to which the PGD-method can be applied.
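Putting the pieces together, the following sketch implements the general \(g_{L}(x)\) of (80)-(83) and assembles \({\mathcal {D}}^{\circ }_{L}(b)\) (numpy assumed; `g_L_unit_cell` and the intervals \((l, u)\) refer to the earlier sketches).

```python
import math
import numpy as np

def g_L(x, l, u):
    """General convex C^1 lower bound of g on [l,u], Eqs. (80)-(83)."""
    z_l, z_u = math.ceil(l), math.floor(u)
    if z_l > z_u:                        # no integer inside: Cases 1-4 apply
        return g_L_unit_cell(x, l, u, z=math.floor(l))
    if z_l <= x <= z_u:
        return 0.0                       # flat middle part
    if x < z_l:                          # left branch f_1(l, x), Eq. (81)
        if l >= z_l - 0.5:
            return (x - z_l)**2          # line segment absent, cf. (83)
        alpha = l + math.sqrt(1 - 2 * (l - z_l + 1))
        if x <= alpha:
            return 2 * (alpha - z_l) * (x - alpha) + (alpha - z_l)**2
        return (x - z_l)**2
    if u <= z_u + 0.5:                   # right branch f_2(x, u), Eq. (82)
        return (x - z_u)**2
    beta = u - math.sqrt(2 * (u - z_u) - 1)
    if x >= beta:
        return 2 * (beta - z_u) * (x - beta) + (beta - z_u)**2
    return (x - z_u)**2

def dual_lower(b, b_hat, a_hat, Q_bb_inv, M, sig2, l, u):
    """D_L(b) = ||b_hat - b||^2_{Q_bb} + sum_i g_{i,L}(a_hat_i(b)) / sig_i^2."""
    r = b_hat - b
    a_b = a_hat + M @ (b - b_hat)        # conditional estimate a_hat(b)
    G = sum(g_L(a_b[i], l[i], u[i]) / sig2[i] for i in range(len(a_hat)))
    return float(r @ Q_bb_inv @ r) + G
```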

7.3 Projected gradient descent (PGD)

The PGD-method is designed to solve a constrained minimization problem

$$\begin{aligned} \min _{ b \in {\mathcal {B}}} f(b) \end{aligned}$$
(84)

of which the objective function \(f: {\mathbb {R}}^{p} \rightarrow {\mathbb {R}}\) is only continuously differentiable, i.e. a \(C^{1}\)-function (Bertsekas 1999; Nocedal and Wright 2006; Parikh and Boyd 2013; Nesterov 2018). This implies that only first-order gradient information of the objective function can be used. If in addition both the objective function and the constraint set are convex, then any local minimum is automatically a global minimum. This is the case before us when \(f(b):= {\mathcal {D}}_{L}(b)\) and \({\mathcal {B}}=\{b \in {\mathbb {R}}^{p}|\; b_{L} \preceq b \preceq b_{U}\}\).

The PGD algorithmic steps for solving (84) are:

  1.

    Initialize: Start with a feasible solution \(b^{0} \in {\mathcal {B}}\) and then loop for \(k=0, \ldots \) until the stop criterion is met:

  2.

    Gradient descent step: Compute the stepsize \(\mu _{k}\) and the gradient-descent update

    $$\begin{aligned} {\hat{b}}^{k}=b^{k}-\mu _{k}\nabla f\left( b^{k}\right) \end{aligned}$$
    (85)

    such that \(f({\hat{b}}^{k}) \le f(b^{k})\).

  3.

    Projection step: Project \({\hat{b}}^{k}\) onto \({\mathcal {B}}\) to ensure conformity with the constraints,

    $$\begin{aligned} b^{k+1}= P_{{\mathcal {B}}}({\hat{b}}^{k})\;\textrm{with}\;P_{{\mathcal {B}}}(y)=\arg \min _{b \in {\mathcal {B}}}||y-b||^{2} \end{aligned}$$
    (86)
Fig. 10

The function \(g'_{i,L}(x)=\tfrac{dg_{i,L}}{dx}(x)\) (cf. 91) on the interval \([l_{i}, u_{i}]\) for \(l_{i} \in [z_{l_{i}}-1, z_{l_{i}}-\tfrac{1}{2}]\), \(u_{i} \in [z_{u_{i}}+\tfrac{1}{2}, z_{u_{i}}+1]\)

This iterative scheme can be seen as repeatedly solving an approximate version of the original minimization problem, namely one in which the objective function f(b) is approximated by a quadratic function \(F_{k}(b)=f(b^{k})+\nabla f(b^{k})^{T}(b-b^{k})+\tfrac{1}{2 \mu _{k}}||b-b^{k}||^{2}\) (\(\mu _{k}>0\)). The above iterative scheme can then be summarized as repeatedly solving

$$\begin{aligned} b^{k+1}= \arg \min \limits _{b \in {\mathcal {B}}} F_{k}(b) \end{aligned}$$
(87)

To see this, we first rewrite the quadratic function \(F_{k}(b)\) in the more convenient form \(F_{k}(b)= \tfrac{1}{2 \mu _{k}}||{\hat{b}}^{k}-b||^{2} + c_{k}\), with \(c_{k}=f(b^{k})-\tfrac{1}{2}\mu _{k}||\nabla f(b^{k})||^{2}\) and \({\hat{b}}^{k}=b^{k}-\mu _{k}\nabla f(b^{k})\), showing that the gradient step (85) provides the unconstrained minimizer of \(F_{k}(b)\). Substitution of \(F_{k}(b)= \tfrac{1}{2 \mu _{k}}||{\hat{b}}^{k}-b||^{2} + c_{k}\) into (87) gives \(b^{k+1}= \arg \min \limits _{b \in {\mathcal {B}}}||{\hat{b}}^{k}-b||^{2}\) and therefore \(b^{k+1}=P_{{\mathcal {B}}}({\hat{b}}^{k})\), which is the projection step (86).

For computing the stepsize \(\mu _{k}\) in each iteration, different linesearch strategies exist, from simple to advanced (Nesterov 2018). One of the simplest starts with \(\mu _{k}=1\), followed by halving it, \(\mu _{k} \leftarrow \mu _{k}/2\), until \(f(b^{k}-\mu _{k}\nabla f(b^{k})) < f(b^{k})\). More involved accelerated strategies exist, where \(b^{k+1}\) is taken as a convex combination (weighted mean) of \(b^{k}\) and \(P_{{\mathcal {B}}}(b^{k}-\mu _{k}\nabla f(b^{k}))\).

A potential complicating factor in applying the PGD-method lies in the projection onto the convex set \({\mathcal {B}}\), which, depending on the geometry of \({\mathcal {B}}\), can be quite involved. The PGD-method is only efficient if this projection can be done efficiently. Fortunately, in our case, with the convex set given as \({\mathcal {B}} = \{ b \in {\mathbb {R}}^{p}|\; b_{L} \preceq b \preceq b_{U}\}\), the projection can be done very efficiently.

As \(P_{{\mathcal {B}}}(y)= \arg \min \limits _{b \in {\mathcal {B}}}||y-b||^{2}\), the minimization problem to be solved is

$$\begin{aligned} \begin{array}{lcl} \min \limits _{b_{L} \preceq b \preceq b_{U}} ||y-b||^{2} &{}=&{} \min \limits _{b_{L} \preceq b \preceq b_{U}} \sum \limits _{\alpha =1}^{p} (y_{\alpha }-b_{\alpha })^{2}\\ &{}=&{} \sum \limits _{\alpha =1}^{p} \min \limits _{b_{\alpha , L} \le b_{\alpha } \le b_{\alpha , U}} (y_{\alpha }-b_{\alpha })^{2}\\ &{}=&{} \sum \limits _{\alpha =1}^{p} \left( y_{\alpha }- \textrm{median}(b_{\alpha , L}, y_{\alpha }, b_{\alpha , U})\right) ^{2} \end{array} \end{aligned}$$
(88)

in which the minimizer \(\textrm{median}(b_{\alpha , L}, y_{\alpha }, b_{\alpha , U})\) denotes the median value of the triplet \(b_{\alpha ,L}\), \(y_{\alpha }\), and \(b_{\alpha ,U}\). We therefore have,

$$\begin{aligned} P_{{\mathcal {B}}}(y)_{\alpha } = \left\{ \begin{array}{lcl} b_{\alpha ,L} &{}\textrm{if}&{} y_{\alpha } \le b_{\alpha ,L}\\ y_{\alpha } &{}\textrm{if}&{} b_{\alpha ,L}\le y_{\alpha } \le b_{\alpha ,U}\\ b_{\alpha ,U} &{}\textrm{if}&{} b_{\alpha ,U} \le y_{\alpha } \end{array} \right. \end{aligned}$$
(89)
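A compact sketch of the resulting PGD iteration (numpy assumed), with the stepsize-halving linesearch mentioned above and the clipping projection (89); `f` and `grad_f` stand for \({\mathcal {D}}^{\circ }_{L}\) and its gradient, which is derived next.

```python
import numpy as np

def projected_gradient_descent(f, grad_f, b0, b_L, b_U, tol=1e-10, max_iter=500):
    """PGD iteration (85)-(86) on the box b_L <= b <= b_U."""
    b = np.clip(b0, b_L, b_U)                  # feasible start
    for _ in range(max_iter):
        g = grad_f(b)
        mu = 1.0
        while f(b - mu * g) >= f(b) and mu > 1e-16:
            mu *= 0.5                          # halve stepsize until descent
        b_new = np.clip(b - mu * g, b_L, b_U)  # projection (89) = clipping
        if np.linalg.norm(b_new - b) <= tol:
            return b_new
        b = b_new
    return b
```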

What now remains to be determined for the PGD-method to be applicable to solve \(L({\mathcal {B}})=\min _{b \in {\mathcal {B}}} {\mathcal {D}}^{\circ }_{L}(b)\) is the gradient of the objective function.

Fig. 11

Multimodal dual function of the vertical-positioning mixed-integer GNSS model, with the convex lower bounding functions constructed per iteration (red and blue); the red functions mark the intervals that get split (i.e. the intervals whose lower bounding function has the lowest minimum). Convergence was achieved in 7 iterations. Shown are the results of iterations \(\#1\), \(\#2\), \(\#3\), \(\#6\), and \(\#7\), with an additional zoom-in of \(\#7\)

The gradient of \({\mathcal {D}}_{L}^{\circ }(b)\)

The gradient of \({\mathcal {D}}^{\circ }_{L}(b)=||{\hat{b}}-b||_{Q_{{\hat{b}}{\hat{b}}}}^{2}+{\mathcal {G}}_{L}(b)\) is given as

$$\begin{aligned} \begin{array}{lcl} \nabla {\mathcal {D}}^{\circ }_{L}(b) &{}=&{} 2Q_{{\hat{b}}{\hat{b}}}^{-1}(b-{\hat{b}})+\nabla {\mathcal {G}}_{L}(b) \\ &{}=&{} 2Q_{{\hat{b}}{\hat{b}}}^{-1}(b-{\hat{b}})+ \sum \limits _{i=1}^{n} \sigma _{i}^{-2} g_{i,L}'({\hat{a}}_{i}(b)) m_{i}\\ &{}=&{} 2Q_{{\hat{b}}{\hat{b}}}^{-1}\left[ b-{\hat{b}}+ Q_{{\hat{b}}{\hat{a}}}Q^{\circ -1}_{{\hat{a}}(b){\hat{a}}(b)}s(b)\right] \end{array} \end{aligned}$$
(90)

with \(s(b) = \tfrac{1}{2}[g'_{1,L}({\hat{a}}_{1}(b)), \ldots , g'_{n,L}({\hat{a}}_{n}(b))]^{T}\) and \(g'_{i,L}(x) =\tfrac{dg_{i,L}}{dx}(x)\). The entries of the vector s(b) are driven by the intervals \([l_{i}, u_{i}]\), the derivatives \(g'_{i,L}(x)\) of the functions that are convex lower bounding on \([l_{i}, u_{i}]\), and the locations of the \({\hat{a}}_{i}(b)\) within the intervals \([l_{i}, u_{i}]\). For instance, for \(l_{i} \in [z_{l_{i}}-1, z_{l_{i}}-\tfrac{1}{2}]\) and \(u_{i} \in [z_{u_{i}}+\tfrac{1}{2}, z_{u_{i}}+1]\), with \(z_{u_{i}} \ge z_{l_{i}}\), the applicable derivative \(g'_{i,L}(x)\) follows from (80), (81) and (82) as

$$\begin{aligned} g'_{i,L}(x) = \left\{ \begin{array}{ccl} 2\left( \alpha _{i}-z_{l_{i}}\right) &{}\textrm{for}&{} l_{i} \le x \le \alpha _{i}\\ 2\left( x-z_{l_{i}}\right) &{}\textrm{for}&{} \alpha _{i} \le x \le z_{l_{i}}\\ 0 &{} \textrm{for} &{} z_{l_{i}}\le x \le z_{u_{i}}\\ 2\left( x-z_{u_{i}}\right) &{} \textrm{for}&{} z_{u_{i}}\le x \le \beta _{i}\\ 2\left( \beta _{i}-z_{u_{i}}\right) &{} \textrm{for}&{} \beta _{i} \le x\le u_{i} \end{array} \right. \end{aligned}$$
(91)

with

$$\begin{aligned} \begin{array}{lcl} \alpha _{i}&{}=&{}l_{i}+\sqrt{1-2(l_{i}-z_{l_{i}}+1)}\\ \beta _{i} &{}=&{}u_{i}-\sqrt{1-2(z_{u_{i}}+1-u_{i})} \end{array} \end{aligned}$$
(92)

The behaviour of \(g'_{i,L}(x)\) for \(x \in [l_{i}, u_{i}]\) is illustrated in Fig. 10. It shows that the entries of s(b) are determined, depending on the location of \({\hat{a}}_{i}(b)\), through a mixed hard-soft thresholding.
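For the interval configuration of Fig. 10, the derivative (91)-(92) translates directly into code (a sketch; it assumes \(l_{i} \le z_{l_{i}}-\tfrac{1}{2}\) and \(u_{i} \ge z_{u_{i}}+\tfrac{1}{2}\), so that both constant line segments are present).

```python
import math

def g_L_prime(x, l, u):
    """Derivative g'_{i,L}(x) of (91) on [l, u]; mixed hard-soft thresholding."""
    z_l, z_u = math.ceil(l), math.floor(u)
    alpha = l + math.sqrt(1 - 2 * (l - z_l + 1))  # (92)
    beta = u - math.sqrt(1 - 2 * (z_u + 1 - u))   # (92)
    if x <= alpha:
        return 2 * (alpha - z_l)   # constant: left line segment
    if x <= z_l:
        return 2 * (x - z_l)       # left parabola branch
    if x <= z_u:
        return 0.0                 # flat part
    if x <= beta:
        return 2 * (x - z_u)       # right parabola branch
    return 2 * (beta - z_u)        # constant: right line segment
```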

We now present two examples to illustrate the workings of our global algorithm. To provide an insightful graphical display of the box-splitting iterations, we show the results for the 1D and the 2D case, i.e. \(b \in {\mathbb {R}}\) and \(b \in {\mathbb {R}}^{2}\).

Example 3

(Horizontal position known) This example and the next one are based on the single-frequency, single-epoch, single-baseline linearized GNSS model of Example 1 (cf. 24), using an \(m+1=31\) satellite configuration having \(n=30\) DD ambiguities, with signal wavelength \(\lambda =19.03\) cm, and pseudorange and carrier-phase standard deviations of \(\sigma _{p}=20\) cm and \(\sigma _{\phi }=0.2\) cm, respectively. As we use satellite-elevation weighting, the variance matrices of the observables are \({\textsf{D}}(p)=\sigma _{p}^{2}D^{T}QD\) and \({\textsf{D}}(\phi )=\sigma _{\phi }^{2}D^{T}QD\), with \(Q=\textrm{diag}[\sin ^{-1}(\textrm{el}_{1}), \ldots , \sin ^{-1}(\textrm{el}_{m+1})]\), where \(\textrm{el}_{i}\) is the elevation angle of satellite i. In the current example we have \(b \in {\mathbb {R}}\), as we assume the horizontal positions known, while in the next example we have \(b \in {\mathbb {R}}^{2}\), as the vertical positions are then assumed known. Note, as the model is in linearized form, that the parameters are increments with respect to the chosen approximate values, which in our case are taken as the correct values used for the simulation. Hence, an increment value of zero implies that the parameter value equals the correct value. In our case the float increment is computed to be \({\hat{b}}=0.7042\) m.

Fig. 12

Multimodal dual function of the horizontal-positioning mixed-integer GNSS model, with the contour lines of the convex lower bounding functions constructed per iteration over the rectangles that get split (i.e. the rectangles whose lower bounding function has the lowest minimum). Convergence was achieved in 20 iterations. Shown are the results of iterations \(\#1\), \(\#2\), \(\#10\), \(\#16\), and \(\#20\)

Although the initialization in practice should follow our description of (67), we here use a larger, and thus more conservative, value for the initial box or interval size. Here we have chosen the initial interval to be \([-16\textrm{m}, +16\textrm{m}]\). The reason for this choice of larger interval length is that the corresponding increase in the required number of iterations allows us to better show the various steps taken in the splitting process and in the construction of the convex lower bounding functions. The results of the various iteration steps are shown in Fig. 11, starting at the top of the first column with iteration \(\#1\) and finishing at the bottom of the second column with iteration \(\#7\). The initial interval \([-16\textrm{m}, +16\textrm{m}]\) is centred at 0, which corresponds to the float solution \({\hat{b}}\); the horizontal axis thus denotes the difference \(b-{\hat{b}}\). The multimodal dual function is shown as a black curve, while the convex lower bounding functions are shown as either red or blue dashed curves. The lower bounding function is shown as a red dashed curve if its minimum is the smallest, implying that its interval will be split in the next iteration step. We thus start off with a red dashed lower bounding function, as shown in Fig. 11 at iteration \(\#1\).

With the initial interval of iteration \(\#1\) split in the middle, we obtain in iteration \(\#2\) two lower bounding functions. As the red curve has the smallest minimum, we continue with the left interval and split it in the middle, thus again giving us two lower bounding functions in iteration \(\#3\). This time it is the right interval that has the lower bounding function with the smallest minimum. Splitting this interval in the middle and continuing in this way with the splitting process, we reach at the second-to-last iteration \(\#6\) an interval that after splitting has in its left half the lower bounding function with the smallest minimum. With this lower bounding function at iteration \(\#7\), the iteration stops, as our stop criterion, set at \(\epsilon =10^{-6}\), is then fulfilled. The obtained minimizer is then output as the sought solution. In the current example, the iteration gives \({\check{b}}-{\hat{b}}=-0.7019\) m, which combined with the float increment \({\hat{b}}=0.7042\) m gives for the ambiguity-fixed increment \({\check{b}}=0.0023\) m, thus showing that the fixed solution differs by 2.3 mm from the correct value. \(\square \)

Example 4

(Vertical position known) This example continues with the same model as used in the previous example, except that now the vertical position is assumed known and thus \(b \in {\mathbb {R}}^{2}\). The results of the various iteration steps are shown in Fig. 12, with iterations \(\#1\), \(\#2\), \(\#10\) and \(\#16\) columnwise in the first two rows, while the last iteration \(\#20\) is shown in the third row as a greater zoom-in. For each iteration, the function values of the dual function are shown colour coded, together with, next to them, the contour lines of the convex lower bounding function for the box that is to be split. At each iteration step, its iteration number is located close to the centre of gravity of the red box, i.e. the box that will be split. The old iteration number is then moved to the centre of the box that remains unsplit. Iteration results \(\#1\), \(\#2\) and \(\#10\) are shown to the same scale, but for iteration \(\#16\) we use a zoom-in to show greater detail, and this is done again for the last iteration \(\#20\). Note that even at this greater zoom-in, a dense variability of the dual function is still apparent, showing that it still has many local minima at this scale. In the red box containing the global minimum, however, we now have a less pronounced multimodality. At iteration \(\#20\) the stop criterion \(\epsilon =10^{-6}\) is fulfilled and, as before, the fixed solution is obtained at the mm-level. \(\square \)

8 Constrained and partitioned dual

In this contribution we restricted our study of the dual problem to the mixed-integer model (1). It is possible however to generalize the developed methodology also to the constrained case \(b \in {\mathcal {C}} \subset {\mathbb {R}}^{p}\) or to the case when only part of \(b \in {\mathbb {R}}^{p}\) needs to be ambiguity-resolved. For these two cases we provide the following outlook.

Constrained Dual

For the constrained mixed-integer model

$$\begin{aligned} {\textsf{E}}(y)=Aa+Bb,\; a \in {\mathbb {Z}}^{n},\; b \in {\mathcal {C}}\subset {\mathbb {R}}^{p} \end{aligned}$$
(93)

the constrained primal approach would be to first solve for the integer ambiguity a and then the baseline b. As we have, using \(F(a,b)\) of (6),

$$\begin{aligned} {\check{a}}=\arg \min _{a \in {\mathbb {Z}}^{n}}F(a, {\check{b}}(a)),\;{\check{b}}(a)=\arg \min _{b \in {\mathcal {C}}}F(a,b) \end{aligned}$$
(94)

the primal function \(F(a, {\check{b}}(a))\) would be used to set up a (nonellipsoidal) search space to determine \({\check{a}}\), which in turn is used to determine the ambiguity-resolved constrained baseline as \({\check{b}}={\check{b}}({\check{a}})\). This is the approach used in the LAMBDA-based constrained solutions of GNSS attitude determination (Teunissen 2010; Giorgi et al. 2010) and rank-defect bias estimation (Khodabandeh 2022; Teunissen 2006).

The dual formulation to (94) is

$$\begin{aligned} {\check{b}}=\arg \min _{b \in {\mathcal {C}}} F({\check{a}}(b), b),\;{\check{a}}(b)=\arg \min _{a \in {\mathbb {Z}}^{n}}F(a,b) \end{aligned}$$
(95)

In this case, first the constrained baseline is solved as \({\check{b}}\), from which the integer ambiguity follows as \({\check{a}}={\check{a}}({\check{b}})\). Earlier, in Sect. 3.3, we demonstrated that for the typical unconstrained mixed-integer GNSS model one cannot expect the success-rate of \({\check{a}}(b)\) to be large. This may change, however, when constraints on b are included. If \({\mathcal {C}}\) is such that the variability in \({\check{a}}(b)\) is small in relation to the size of the pull-in regions, then only a few integer candidates \({\check{a}}(b)\) may need to be considered, thus simplifying the computation of \({\check{b}}\) in (95) considerably. A typical example from the field of GNSS attitude determination is one where the distance between the antennae is sufficiently small in relation to the used signal wavelength.

In (95) we recognize the dual function as \({\mathcal {D}}(b)=F({\check{a}}(b), b)\), thus giving

$$\begin{aligned} {\check{b}} = \arg \min _{b \in {\mathcal {C}}} {\mathcal {D}}(b) \end{aligned}$$
(96)

This shows that if \({\mathcal {C}}\) can be easily and efficiently covered by a box \({\mathcal {B}}\), our method of the previous section is directly applicable to the constrained mixed-integer model (93) as well. In the more general case, one can use the indicator function of \({\mathcal {C}}\), \(p_{{\mathcal {C}}}(b)=\infty \) if \(b \notin {\mathcal {C}}\) and \(p_{{\mathcal {C}}}(b)=0\) otherwise, to replace the constrained formulation (96) by the unconstrained formulation \({\check{b}} = \arg \min _{b \in {\mathbb {R}}^{p}} \left[ {\mathcal {D}}(b)+p_{{\mathcal {C}}}(b)\right] \). Similar to our development in the previous section, the approach would then be to construct convex lower bounding functions using a continuous barrier function for \(p_{{\mathcal {C}}}(b)\) (Nocedal and Wright 2006; Nesterov 2018).

Partitioned Dual

Earlier we mentioned that a potentially attractive feature of the dual formulation is that the minimization of the objective function takes place in \({\mathbb {R}}^{p}\) instead of \({\mathbb {R}}^{n}\). This can be attractive if p is much smaller than n and the dual function \({\mathcal {D}}(b)\) is easily formed or acceptably approximated. This changes however if the dimension of \(b \in {\mathbb {R}}^{p}\) is large as well.

Still, even with p large, it could be that one is only interested in the ambiguity-resolved solution of some of the components of b, say \(b_{1}\) of \(b=[b_{1}^{T}, b_{2}^{T}]^{T} \in {\mathbb {R}}^{p_{1}+p_{2}}\). In that case one can combine the primal and dual approaches to obtain the decomposition

$$\begin{aligned} \begin{array}{l} F(a, b_{1}, b_{2})=||{\hat{b}}_{1}-b_{1}||_{Q_{{\hat{b}}_{1}{\hat{b}}_{1}}}^{2}\\ \quad +||{\hat{a}}(b_{1})-a||_{Q_{{\hat{a}}(b_{1}){\hat{a}}(b_{1})}}^{2}+||{\hat{b}}_{2}(a,b_{1})-b_{2}||_{Q_{{\hat{b}}_{2}(a,b_{1}){\hat{b}}_{2}(a,b_{1})}}^{2} \end{array} \end{aligned}$$

By now defining the partitioned dual \(\mathcal{P}\mathcal{D}(b_{1})\) as

$$\begin{aligned} \mathcal{P}\mathcal{D}(b_{1})=||{\hat{b}}_{1}-b_{1}||_{Q_{{\hat{b}}_{1}{\hat{b}}_{1}}}^{2}+||{\hat{a}}(b_{1})-{\check{a}}(b_{1})||_{Q_{{\hat{a}}(b_{1}){\hat{a}}(b_{1})}}^{2} \nonumber \\ \end{aligned}$$
(97)

with \({\check{a}}(b_{1})=\arg \min _{a \in {\mathbb {Z}}^{n}}||{\hat{a}}(b_{1})-a||_{Q_{{\hat{a}}(b_{1}){\hat{a}}(b_{1})}}^{2}\), the solution for a and b becomes

$$\begin{aligned} \left\{ \begin{array}{lcl} {\check{b}}_{1} &{}=&{} \arg \min \limits _{b_{1} \in {\mathbb {R}}^{p_{1}}}\mathcal{P}\mathcal{D}(b_{1})\\ {\check{a}} &{}=&{} {\check{a}}({\check{b}}_{1})\\ {\check{b}}_{2}&{}=&{} {\hat{b}}_{2}({\check{a}}, {\check{b}}_{1}) \end{array} \right. \end{aligned}$$
(98)

In this way one can still apply the dual approach to solve for \(b_{1}\) in a lower-dimensional space. However, the price one pays for this reduction in dimension is that \({\hat{a}}(b_{1})\) will be less precise than \({\hat{a}}(b)\), implying that, depending on the application, \(\mathcal{P}\mathcal{D}(b_{1})\) may be more difficult to construct or to approximate.

9 Summary and conclusions

In this contribution we introduced and studied the dual mixed-integer least-squares formulation. The dual differs from its primal counterpart in the order in which the integer ambiguity vector \(a \in {\mathbb {Z}}^{n}\) and baseline vector \(b \in {\mathbb {R}}^{p}\) are estimated. In the primal formulation, the integer ambiguity estimator \({\check{a}}\) is computed first, followed by the computation of the ambiguity-resolved baseline estimator \({\check{b}}={\hat{b}}({\check{a}})\). In the dual formulation, the order is reversed, giving first \({\check{b}}\), followed by the ambiguity estimator \({\check{a}}={\check{a}}({\check{b}})\). As not the ambiguities, but rather the entries of b, are usually the parameters of interest, the dual formulation seems a natural way to go, and even more so for applications where the baseline dimension p is considerably less than the ambiguity dimension n. We show however that this optimism must be somewhat tempered, due to the fact that the formation of the dual objective function \({\mathcal {D}}(b)\) also requires the solution of an n-dimensional integer least-squares (ILS) problem, and even one that depends on the unknown baseline b. The potential advantage of the dual formulation lies therefore in the ease with which this implicit ILS problem can be solved. This problem is trivially solved in case the conditional ambiguity variance matrix is diagonal, which in case of GNSS happens, for instance, with the multi-frequency geometry-free model.

To make the dual formulation more computationally attractive, we introduced two approximations to it, both with the purpose of alleviating the computational demand on the implicit ILS problem. The first approximation is based on replacing the variance matrix of \({\hat{a}}(b)\) by a matrix of simpler structure (preferably diagonal), the second on replacing the integer estimator \({\check{a}}(b)\) by a simpler estimator (preferably integer rounding). To be able to describe the quality of their dual solutions, we provided a complete distributional analysis of the ambiguity- and baseline-estimators that follow from these approximations. This will allow users to rigorously judge whether or not the approximations made in the dual formulation are acceptable for their application. We hereby note that the second approximation of the dual formulation also resulted in the finding of a new class of admissible integer estimators, the pull-in regions of which were described as well.

To characterize the challenges that the computation of \({\check{b}}\), as the global minimizer of \({\mathcal {D}}(b)\), poses, we illustrated and emphasized the multimodality of the dual function \({\mathcal {D}}(b)\) with its many local minima. As a consequence, the dual function lacks convexity, thus requiring special care in developing an algorithm that is globally convergent. Our proposed method, which has finite termination with a guaranteed \(\epsilon \)-tolerance, is constructed by combining the branch-and-bound principle with a special convex relaxation of the dual, to which the projected-gradient-descent method is applied to obtain the required bounds. Each of the method’s three constituents is described, whereby special emphasis is given to the construction of the required continuously differentiable, convex lower bounding function of the dual. Illustrative examples are given to provide insight into the workings of the method, while in future work its numerical performance for advanced models will be evaluated. Finally, we provided an outlook on solving the constrained and partitioned dual problems.