
1 Introduction

About a decade ago, Xu et al. (2012) introduced the partial Errors-In-Variables (PEIV) model in order to accommodate nonrandom elements within the coefficient matrix A, which relates the unknown parameters to the observations collected to determine them. Obviously, nonrandom elements can just as well be considered as “random with zero variance” (and zero covariance if applicable), thus leading to a singular cofactor matrix \(D\{ \operatorname {\mathrm {vec}} A\} = \sigma _0^2 Q_A\) that is only positive-semidefinite.

Incidentally, at about the same time, Snow (2012) published his PhD dissertation, a large part of which is concerned with exactly this much wider subclass of EIV-Models, in which the cofactor matrix \(Q_A\) is allowed to be singular (and, quite possibly, the cofactor matrix \(Q_y\), too). There, a variety of algorithms is proposed to handle all the cases in which a unique Total Least-Squares (TLS) solution exists.

Here, however, most of the attention will be directed to the case where \(Q_A\) shows a Kronecker-product structure: \(Q_A := Q_0 \otimes Q_x\). It was for this special case that the original algorithm of Schaffrin and Wieser (2008) for the Weighted TLS solution was designed, with particular efficiency whenever \(Q_x := Q_y\) can be assumed. More often than not, this assumption is fulfilled when it comes to straight-line adjustment in two or three dimensions; therefore, the prominent example in this study will be taken from this class.
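For concreteness, consider the fit of a straight line in 2D, anticipating the prominent example of Sect. 5; this is our own illustration of the structure just described, not taken from any one of the cited papers. With observed coordinate pairs \((x_i, y_i)\), the observation equations read

$$\displaystyle \begin{gathered} y_i = \xi_1 (x_i - e_{x_i}) + \xi_2 + e_{y_i}, \quad i = 1,\dots,n, \end{gathered} $$

so that \(A = [\,\boldsymbol{x} \mid \boldsymbol{\tau}\,]\) with \(\boldsymbol{\tau} := [1,\dots,1]^T\), only the first column of A being random, and \(Q_A = Q_0 \otimes Q_x\) with the (singular) matrix \(Q_0 = \operatorname{diag}(1,0)\) and \(Q_x\) as cofactor matrix of the observed abscissas in \(\boldsymbol{x}\).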

In Sects. 2.1 and 2.2, a review is provided for the EIV-Model with singular cofactor matrix, resp. with a Kronecker-product structure for \(Q_A\). A similar review for the PEIV-Model will be found in Sect. 3, followed by a selection of algorithms for the various options in Sect. 4. Finally, the key algorithms will be compared in terms of their efficiency when applied to a number of typical examples in Sect. 5, before certain conclusions can be drawn in Sect. 6.

2 The EIV-Model and the Weighted TLS Solution—A Review

2.1 Potentially Singular Cofactor Matrices

The definition of the EIV-Model is given as

$$\displaystyle \begin{gathered} \boldsymbol{y} = (A - E_A)\,\boldsymbol{\xi} + \boldsymbol{e}_y, \qquad \boldsymbol{e}_A := \operatorname{\mathrm{vec}} E_A, {} \end{gathered} $$
(1a)
$$\displaystyle \begin{gathered} \left[\begin{array}{cccccc} \boldsymbol{e}_y \\ \boldsymbol{e}_A \end{array}\right] \sim(\left[\begin{array}{cccccc} \boldsymbol{0} \\ \boldsymbol{0} \end{array}\right],\sigma_0^2 Q = \sigma_0^2 \left[\begin{array}{cccccc} \underset{n\times n}{Q_y} & Q_{yA} \\[0.75em] Q_{Ay} & \underset{nm\times nm}{Q_A} \end{array}\right] ), {} \end{gathered} $$
(1b)

Q symmetric, nonnegative-definite, with the usual notation as in Snow (2012) and Schaffrin et al. (2014), for instance; see also Fang (2011) whose derivation is reviewed in the following while temporarily assuming that Q is positive-definite with

$$\displaystyle \begin{aligned} \begin{gathered} Q^{-1} := \left[\begin{array}{cccccc} P_{11} & P_{12} \\ P_{21} & P_{22} \end{array}\right], \\ P_{11} = P_{11}^T, \; P_{21} = P_{12}^T, \; P_{22} = P_{22}^T. \end{gathered} \end{aligned} $$
(2)

Thus, the target function reads (with \(\boldsymbol {\lambda }\) as \(n\times 1\) vector of Lagrange multipliers):

$$\displaystyle \begin{aligned} \begin{gathered} \Phi(\boldsymbol{e}_y,\boldsymbol{e}_A,\boldsymbol{\xi},\boldsymbol{\lambda}) := \boldsymbol{e}_y^TP_{11} \boldsymbol{e}_y + 2\boldsymbol{e}_y^T P_{12} \boldsymbol{e}_A + \\ + \boldsymbol{e}_A^T P_{22} \boldsymbol{e}_A + 2\boldsymbol{\lambda}^T \big[\boldsymbol{y} - A\boldsymbol{\xi} - \boldsymbol{e}_y + \big(\boldsymbol{\xi}^T \otimes I_n\big)\boldsymbol{e}_A \big] {}, \end{gathered} \end{aligned} $$
(3)

which must be stationary, leading to the necessary Euler-Lagrange conditions:

$$\displaystyle \begin{aligned} \frac{1}{2}\frac{\partial\Phi}{\partial\boldsymbol{e}_y} &= P_{11} \tilde{\boldsymbol{e}}_y + P_{12} \tilde{\boldsymbol{e}}_A - \hat{\boldsymbol{\lambda}} \doteq \mathbf{0}, {} \end{aligned} $$
(4a)
$$\displaystyle \begin{aligned} \frac{1}{2}\frac{\partial\Phi}{\partial\boldsymbol{e}_A} &= P_{21} \tilde{\boldsymbol{e}}_y + P_{22} \tilde{\boldsymbol{e}}_A + \big(\hat{\boldsymbol{\xi}} \otimes I_n \big) \hat{\boldsymbol{\lambda}} \doteq \mathbf{0}, {} \end{aligned} $$
(4b)
$$\displaystyle \begin{aligned} \frac{1}{2}\frac{\partial\Phi}{\partial\boldsymbol{\xi}} \; &= -A^T\hat{\boldsymbol{\lambda}} + \tilde{E}_A^T \hat{\boldsymbol{\lambda}} \doteq \mathbf{0}, {} \end{aligned} $$
(4c)
$$\displaystyle \begin{aligned} \frac{1}{2}\frac{\partial\Phi}{\partial\boldsymbol{\lambda}} \; &= \boldsymbol{y} - A\hat{\boldsymbol{\xi}} - \tilde{\boldsymbol{e}}_y + \big({\hat{\boldsymbol{\xi}}}^T \otimes I_n \big) \tilde{\boldsymbol{e}}_A \doteq \mathbf{0}, {} \end{aligned} $$
(4d)

and the sufficient condition:

$$\displaystyle \begin{aligned} \begin{aligned} \frac{1}{2} \frac{\partial^2 \Phi}{\partial \left[\begin{array}{cccccc} \boldsymbol{e}_y \\ \boldsymbol{e}_A \end{array}\right] \partial \left[\begin{array}{cccccc} \boldsymbol{e}_y^T, & \boldsymbol{e}_A^T \end{array}\right]} = \left[\begin{array}{cccccc} P_{11} & P_{12} \\ P_{21} & P_{22} \end{array}\right] \\ \text{is positive-definite.} \end{aligned} \end{aligned} $$
(5)

Taking (4a) and (4b) together and solving for the combined residual vector gives

$$\displaystyle \begin{gathered} \begin{aligned} \left[\begin{array}{cccccc} \tilde{\boldsymbol{e}}_y \\ \tilde{\boldsymbol{e}}_A \end{array}\right] = \left[\begin{array}{cccccc} Q_y & Q_{yA} \\ Q_{Ay} & Q_A \end{array}\right] \cdot \left[\begin{array}{cccccc} I_n \\ -(\hat{\boldsymbol{\xi}} \otimes I_n) \end{array}\right] \cdot \hat{\boldsymbol{\lambda}} = \\ = \left[\begin{array}{cccccc} Q_y - Q_{yA} \big(\hat{\boldsymbol{\xi}} \otimes I_n\big) \\ Q_{Ay} - Q_A \big(\hat{\boldsymbol{\xi}} \otimes I_n\big) \end{array}\right] \cdot \hat{\boldsymbol{\lambda}} \end{aligned} {} \end{gathered} $$
(6a)

and, with (4d),

$$\displaystyle \begin{gathered} \boldsymbol{y} - A\hat{\boldsymbol{\xi}} = \tilde{\boldsymbol{e}}_y - \big(\hat{\boldsymbol{\xi}}^T \otimes I_n\big)\tilde{\boldsymbol{e}}_A = B \left[\begin{array}{cccccc} \tilde{\boldsymbol{e}}_y \\ \tilde{\boldsymbol{e}}_A \end{array}\right] = \big(BQB^T\big)\,\hat{\boldsymbol{\lambda}}; {} \end{gathered} $$
(6b)

thus,

$$\displaystyle \begin{gathered} \hat{\boldsymbol{\lambda}} = Q_1^{-1} (\boldsymbol{y} - A\hat{\boldsymbol{\xi}}) {} \end{gathered} $$
(7a)

for

$$\displaystyle \begin{gathered} Q_1 := BQB^T = Q_y - Q_{yA}\big(\hat{\boldsymbol{\xi}} \otimes I_n\big) - \big(\hat{\boldsymbol{\xi}}^T \otimes I_n\big)Q_{Ay} + \big(\hat{\boldsymbol{\xi}}^T \otimes I_n\big)Q_A\big(\hat{\boldsymbol{\xi}} \otimes I_n\big) {} \end{gathered} $$
(7b)

and

$$\displaystyle \begin{gathered} \underset{n\times(n+nm)}{B} := \left[\begin{array}{cccccc} I_n & -\big(\hat{\boldsymbol{\xi}}^T \otimes I_n\big) \end{array}\right]. {} \end{gathered} $$
(7c)

Finally, the TLS solution can be obtained from (4c) through

$$\displaystyle \begin{gathered} \boldsymbol{0} = (A-\tilde{E}_A)^T \hat{\boldsymbol{\lambda}} = \\ = (A-\tilde{E}_A)^T Q_1^{-1} \big[(\boldsymbol{y}-\tilde{E}_A \hat{\boldsymbol{\xi}}) - (A-\tilde{E}_A) \hat{\boldsymbol{\xi}} \big]\notag \end{gathered} $$

as estimated parameter vector

$$\displaystyle \begin{gathered} {} \begin{gathered} \hat{\boldsymbol{\xi}} = \big[(A-\tilde{E}_A)^T Q_1^{-1} (A-\tilde{E}_A) \big]^{-1} \cdot \\ \cdot \big[(A-\tilde{E}_A)^T Q_1^{-1} (\boldsymbol{y} - \tilde{E}_A\hat{\boldsymbol{\xi}}) \big] \end{gathered} \end{gathered} $$
(8a)

with the residual vector

$$\displaystyle \begin{gathered} \left[\begin{array}{cccccc} \tilde{\boldsymbol{e}}_y \\ \tilde{\boldsymbol{e}}_A \end{array}\right] = QB^T \cdot Q_1^{-1} (\boldsymbol{y} - A\hat{\boldsymbol{\xi}}) \end{gathered} $$
(8b)

and the estimated variance component

$$\displaystyle \begin{gathered} \hat\sigma_0^2 = (\boldsymbol{y} - A\hat{\boldsymbol{\xi}})^T Q_1^{-1} (\boldsymbol{y} - A\hat{\boldsymbol{\xi}}) / (n-m). {} \end{gathered} $$
(8c)

This would be a “Fang-type” algorithm after Fang (2011).
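To make the flow of (6a)–(8c) concrete, a minimal MATLAB sketch of this iteration is given below; it assumes a nonsingular \(Q_1\), starts from the ordinary least-squares solution, and uses function and variable names of our own choosing. It illustrates the update sequence only and is not a reproduction of the code of Fang (2011) or Snow (2012).

% Minimal sketch of the "Fang-type" iteration (6a)-(8c); assumes Q1 nonsingular.
function [xi, ey, EA, s02] = tls_fang_sketch(y, A, Qy, QyA, QA, tol)
[n, m] = size(A);
Q  = [Qy, QyA; QyA', QA];                  % full cofactor matrix, cf. (1b)
xi = (A' * A) \ (A' * y);                  % ordinary LS solution as initial value
for iter = 1:100                           % 100 = arbitrary safeguard
    B    = [eye(n), -kron(xi', eye(n))];   % cf. (7c)
    Q1   = B * Q * B';                     % cf. (7b)
    lam  = Q1 \ (y - A*xi);                % cf. (7a)
    e    = Q * B' * lam;                   % stacked residuals [ey; eA], cf. (6a)/(8b)
    ey   = e(1:n);
    EA   = reshape(e(n+1:end), n, m);      % since eA = vec(EA)
    Ared = A - EA;                         % "reduced" coefficient matrix
    xi_new = (Ared' * (Q1 \ Ared)) \ (Ared' * (Q1 \ (y - EA*xi)));   % cf. (8a)
    if norm(xi_new - xi) < tol
        xi = xi_new;
        break
    end
    xi = xi_new;
end
B   = [eye(n), -kron(xi', eye(n))];
Q1  = B * Q * B';
s02 = (y - A*xi)' * (Q1 \ (y - A*xi)) / (n - m);   % cf. (8c)
end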

Alternatively, (4c) and (7a) can be combined to

$$\displaystyle \begin{gathered} \begin{aligned} A^TQ_1^{-1} (\boldsymbol{y} - A\hat{\boldsymbol{\xi}}) = A^T\hat{\boldsymbol{\lambda}} = \tilde{E}_A^T \hat{\boldsymbol{\lambda}} = \\ = \operatorname{\mathrm{vec}}(\hat{\boldsymbol{\lambda}}^T\tilde{E}_A) = (I_m\otimes \hat{\boldsymbol{\lambda}}^T) \tilde{\boldsymbol{e}}_A \end{aligned} \end{gathered} $$
(9a)

and, using (6a), to

$$\displaystyle \begin{gathered} \begin{aligned} &A^TQ_1^{-1} (\boldsymbol{y} - A\hat{\boldsymbol{\xi}}) = \\ &\quad = (I_m\otimes \hat{\boldsymbol{\lambda}}^T) \big[ Q_{Ay}\hat{\boldsymbol{\lambda}} - Q_A(\hat{\boldsymbol{\xi}}\otimes\hat{\boldsymbol{\lambda}}) \big] = \\ &\quad = (I_m\otimes \hat{\boldsymbol{\lambda}}^T) \big[ Q_{Ay}\hat{\boldsymbol{\lambda}} - Q_A (I_m \otimes\hat{\boldsymbol{\lambda}}) \hat{\boldsymbol{\xi}} \big] \end{aligned} \end{gathered} $$
(9b)

from which the following equation can be obtained:

$$\displaystyle \begin{gathered} \begin{aligned} \big[A^TQ_1^{-1}A - (I_m\otimes \hat{\boldsymbol{\lambda}})^T Q_A (I_m\otimes \hat{\boldsymbol{\lambda}}) \big] \hat{\boldsymbol{\xi}} = \\ = A^TQ_1^{-1}\boldsymbol{y} - (I_m\otimes \hat{\boldsymbol{\lambda}})^T Q_{Ay} \hat{\boldsymbol{\lambda}}. {} \end{aligned} \end{gathered} $$
(9c)

Evidently, (9c) turns out to be a generalized form of formula (18a) in Schaffrin (2015), one that allows the treatment of EIV-Models with non-zero cross-covariance matrices \(Q_{Ay}\). This would be part of a modified “Mahboub-type” algorithm after Mahboub (2012) and Schaffrin (2015).
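In an implementation, one pass of (9c) amounts to a single linear solve. A minimal MATLAB fragment could read as follows, where lam denotes \(\hat{\boldsymbol{\lambda}}\) from (7a) and all variable names are our own:

% One iteration step of the modified "Mahboub-type" update (9c);
% Q1 and lam refer to (7b) and (7a), QA and QAy to the blocks of Q in (1b).
IL  = kron(eye(m), lam);                       % I_m (x) lambda-hat
N   = A' * (Q1 \ A) - IL' * QA * IL;           % left-hand side matrix of (9c)
rhs = A' * (Q1 \ y) - IL' * QAy * lam;         % right-hand side of (9c)
xi  = N \ rhs;                                 % updated parameter estimate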

It should now be noted that, for a unique TLS solution to be obtained by the above algorithms, only \(Q_1\) needs to be nonsingular, not Q itself! Equivalently, the more restrictive rank condition

$$\displaystyle \begin{aligned} \operatorname{\mathrm{rk}} BQ = \operatorname{\mathrm{rk}} B = n {} \end{aligned} $$
(10)

ought to hold for the algorithms (8a)–(8c), resp. (9c) with (7a)–(7b) to work. But Neitzel and Schaffrin (2016) have already proved that the more general “Neitzel-Schaffrin condition”

$$\displaystyle \begin{aligned} \operatorname{\mathrm{rk}} [\, BQ \, \text{\textbar } \, A \,] = n {} \end{aligned} $$
(11)

is necessary and sufficient for the uniqueness of the Weighted TLS solution. So, there must be a way to generalize the algorithm (8a)–(8c) for the case that (10) is violated (\( \operatorname {\mathrm {rk}} BQ < n\)) but (11) is fulfilled. The generalization of (9c) is left for a future publication; but see Snow (2012, ch. 3.2) for some preliminary results, particularly the system (3.21) shown therein.

Assuming that \( \operatorname {\mathrm {rk}} [BQ \, \text{\textbar } \, A-\tilde {E}_A] = n\) holds true as well, the extended matrix

$$\displaystyle \begin{aligned} Q_3 := Q_1 + (A-\tilde{E}_A) S (A-\tilde{E}_A)^T > 0 {} \end{aligned} $$
(12)

will be nonsingular for any symmetric, positive-definite matrix S (that needs to be suitably chosen as it may affect the efficiency of our algorithms), where

$$\displaystyle \begin{gathered} Q_3 \cdot \hat{\boldsymbol{\lambda}} = (\boldsymbol{y} - A\hat{\boldsymbol{\xi}}) + (A-\tilde{E}_A) S [(A-\tilde{E}_A)^T \hat{\boldsymbol{\lambda}}] = \notag \\ = \boldsymbol{y} - A\hat{\boldsymbol{\xi}} = Q_1 \cdot \hat{\boldsymbol{\lambda}} \end{gathered} $$
(13a)

due to (4c). Thus, wherever \(\hat {\boldsymbol {\lambda }} = Q_1^{-1}(\boldsymbol {y} - A\hat {\boldsymbol {\xi }})\) appears in algorithm (8a)–(8c), it can simply be replaced by \(\hat {\boldsymbol {\lambda }} = Q_3^{-1}(\boldsymbol {y} - A\hat {\boldsymbol {\xi }})\) giving us the more general algorithm (13b)–(13d) as follows:

$$\displaystyle \begin{gathered} \begin{gathered} \hat{\boldsymbol{\xi}} = \big[(A-\tilde{E}_A)^T Q_3^{-1} (A-\tilde{E}_A) \big]^{-1} \cdot \\ \cdot \big[(A-\tilde{E}_A)^T Q_3^{-1} (\boldsymbol{y} - \tilde{E}_A\hat{\boldsymbol{\xi}}) \big], {} \end{gathered} \end{gathered} $$
(13b)
$$\displaystyle \begin{gathered} \left[\begin{array}{cccccc} \tilde{\boldsymbol{e}}_y \\ \tilde{\boldsymbol{e}}_A \end{array}\right] = QB^T \cdot Q_3^{-1} (\boldsymbol{y} - A\hat{\boldsymbol{\xi}}), {} \end{gathered} $$
(13c)
$$\displaystyle \begin{gathered} \hat\sigma_0^2 = (\boldsymbol{y} - A\hat{\boldsymbol{\xi}})^T Q_3^{-1} (\boldsymbol{y} - A\hat{\boldsymbol{\xi}}) / (n-m). {} \end{gathered} $$
(13d)

We note that the interpretation of the vector \(\boldsymbol {y} - \tilde {E}_A\hat {\boldsymbol {\xi }}\) is still not clear!
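In an implementation, the change from (8a)–(8c) to (13b)–(13d) is small: only \(Q_3\) is ever inverted, never \(Q_1\) itself. A minimal MATLAB fragment for one iteration step, with \(S := I_m\) and with EA denoting the current residual matrix of A (names of our own choosing), could read:

% One iteration step of (13b)-(13c) with S := I_m; Q1 may be singular here,
% since only Q3 is "inverted" (via the backslash operator).
B    = [eye(n), -kron(xi', eye(n))];        % cf. (7c)
Q1   = B * Q * B';                          % cf. (7b), possibly singular
Ared = A - EA;                              % current reduced coefficient matrix
Q3   = Q1 + Ared * Ared';                   % cf. (12) with S = I_m
lam  = Q3 \ (y - A*xi);                     % cf. (13a)
xi   = (Ared' * (Q3 \ Ared)) \ (Ared' * (Q3 \ (y - EA*xi)));   % cf. (13b)
e    = Q * B' * lam;                        % stacked residuals [ey; eA], cf. (13c)
EA   = reshape(e(n+1:end), n, m);           % updated residual matrix of A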

2.2 \(Q_A = Q_0 \otimes Q_x\) with Kronecker-Product Structure

Now, a special case should be treated where a Kronecker-product structure can be assumed for

$$\displaystyle \begin{aligned} \underset{nm\times nm}{Q_A} := \underset{m\times m}{Q_0} \otimes\underset{n\times n}{ Q_x} \quad \text{with} \quad Q_{Ay} = 0 = Q_{yA}^T. {} \end{aligned} $$
(14)

Then, \(Q_1 = BQB^T\) can be rewritten as

$$\displaystyle \begin{aligned} Q_1 &= BQB^T = Q_y + \big(\hat{\boldsymbol{\xi}}^T \otimes I_n\big)\big(Q_0 \otimes Q_x\big)\big(\hat{\boldsymbol{\xi}} \otimes I_n\big) = {} \end{aligned} $$
(15a)
$$\displaystyle \begin{aligned} &= Q_y + \hat{\boldsymbol{\xi}}^T Q_0\, \hat{\boldsymbol{\xi}} \cdot Q_x, {} \end{aligned} $$
(15b)

and \(\hat {\boldsymbol {\lambda }}\) (if \(Q_1\) is nonsingular) as

$$\displaystyle \begin{aligned} & \hat{\boldsymbol{\lambda}} = Q_1^{-1} (\boldsymbol{y} - A\hat{\boldsymbol{\xi}}) = \\ &\qquad \qquad \quad = (Q_y + \hat{\boldsymbol{\xi}}^T Q_0 \hat{\boldsymbol{\xi}} \cdot Q_x)^{-1} (\boldsymbol{y} - A\hat{\boldsymbol{\xi}}), \end{aligned} $$
(15c)

which leads to the residual vector

$$\displaystyle \begin{aligned} \tilde{\boldsymbol{e}}_y = Q_y (Q_y + \hat{\boldsymbol{\xi}}^T Q_0 \hat{\boldsymbol{\xi}} \cdot Q_x)^{-1} (\boldsymbol{y} - A\hat{\boldsymbol{\xi}}) \end{aligned} $$
(15d)

and to the residual matrix

$$\displaystyle \begin{aligned} \tilde{E}_A = -Q_x (Q_y + \hat{\boldsymbol{\xi}}^T Q_0 \hat{\boldsymbol{\xi}} \cdot Q_x)^{-1} (\boldsymbol{y} - A\hat{\boldsymbol{\xi}}) \hat{\boldsymbol{\xi}}^T Q_0. \end{aligned} $$
(15e)

From (4c), it now follows that

$$\displaystyle \begin{aligned} -A^T\hat{\boldsymbol{\lambda}} &= A^T (Q_y + \hat{\boldsymbol{\xi}}^T Q_0 \hat{\boldsymbol{\xi}} \cdot Q_x)^{-1} (A\hat{\boldsymbol{\xi}} - \boldsymbol{y}) = \notag \\ &= -\tilde{E}_A^T \hat{\boldsymbol{\lambda}} = Q_0 \hat{\boldsymbol{\xi}} \cdot \hat\nu, {} \end{aligned} $$
(16a)

with the scalar

$$\displaystyle \begin{aligned} \begin{aligned} \hat\nu := (\boldsymbol{y} - A\hat{\boldsymbol{\xi}})^T (Q_y + \hat{\boldsymbol{\xi}}^T Q_0 \hat{\boldsymbol{\xi}} \cdot Q_x)^{-1} Q_x \cdot \\ \cdot (Q_y + \hat{\boldsymbol{\xi}}^T Q_0 \hat{\boldsymbol{\xi}} \cdot Q_x)^{-1} (\boldsymbol{y} - A\hat{\boldsymbol{\xi}}) \end{aligned} {} \end{aligned} $$
(16b)

and, thus, ultimately the estimated parameter vector as

$$\displaystyle \begin{aligned} \hat{\boldsymbol{\xi}} &= \big[A^T (Q_y + \hat{\boldsymbol{\xi}}^T Q_0 \hat{\boldsymbol{\xi}} \cdot Q_x)^{-1} A -\hat\nu \cdot Q_0\big]^{-1} \cdot \notag \\ &\quad \cdot A^T (Q_y + \hat{\boldsymbol{\xi}}^T Q_0 \hat{\boldsymbol{\xi}} \cdot Q_x)^{-1}\boldsymbol{y}, {} \end{aligned} $$
(16c)

which needs to be computed iteratively using (16b)–(16c) until convergence. This constitutes the original algorithm by Schaffrin and Wieser (2008) that only requires the invertibility of \(Q_1\) in (15b) and leads to the estimated variance component

$$\displaystyle \begin{aligned} \hat\sigma_0^2 &= \hat{\boldsymbol{\lambda}}^T (\boldsymbol{y} - A\hat{\boldsymbol{\xi}}) / (n-m) = \notag \\ &= (\boldsymbol{y} - A\hat{\boldsymbol{\xi}})^T (Q_y + \hat{\boldsymbol{\xi}}^T Q_0 \hat{\boldsymbol{\xi}} \cdot Q_x)^{-1} (\boldsymbol{y} - A\hat{\boldsymbol{\xi}}) \cdot \notag \\ &\quad \cdot (n-m)^{-1}. {} \end{aligned} $$
(16d)
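A minimal MATLAB sketch of this iteration, again with names of our own choosing and assuming that \(Q_1 = Q_y + \hat{\boldsymbol{\xi}}^T Q_0 \hat{\boldsymbol{\xi}} \cdot Q_x\) stays nonsingular throughout, could read:

% Minimal sketch of the Schaffrin-Wieser iteration (16b)-(16d) for QA = Q0 (x) Qx.
function [xi, s02] = tls_sw_sketch(y, A, Qy, Q0, Qx, tol)
[n, m] = size(A);
xi = (A' * A) \ (A' * y);                       % ordinary LS solution as initial value
for iter = 1:100                                % 100 = arbitrary safeguard
    Q1  = Qy + (xi' * Q0 * xi) * Qx;            % cf. (15b)
    lam = Q1 \ (y - A*xi);                      % cf. (15c)
    nu  = lam' * Qx * lam;                      % cf. (16b)
    xi_new = (A' * (Q1 \ A) - nu * Q0) \ (A' * (Q1 \ y));   % cf. (16c)
    if norm(xi_new - xi) < tol
        xi = xi_new;
        break
    end
    xi = xi_new;
end
Q1  = Qy + (xi' * Q0 * xi) * Qx;
lam = Q1 \ (y - A*xi);
s02 = lam' * (y - A*xi) / (n - m);              % cf. (16d)
end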

Obviously, \(Q_1 = Q_y + \hat {\boldsymbol {\xi }}^T Q_0 \hat {\boldsymbol {\xi }} \cdot Q_x\) will be nonsingular as long as \(Q_y\) is nonsingular, which is, however, not always necessary due to the second term. On the other hand, oftentimes the cofactor matrices \(Q_y\) and \(Q_x\) turn out to be identical:

$$\displaystyle \begin{aligned} Q_x := Q_y, \end{aligned} $$
(17)

in which case the algorithm (16b)–(16d) simplifies to

$$\displaystyle \begin{aligned} \hat{\boldsymbol{\lambda}} & = Q_y^{-1} (\boldsymbol{y} - A\hat{\boldsymbol{\xi}}) \cdot (1 + \hat{\boldsymbol{\xi}}^T Q_0 \hat{\boldsymbol{\xi}})^{-1}, {} \end{aligned} $$
(18a)
$$\displaystyle \begin{aligned} \hat\nu & = \hat{\boldsymbol{\lambda}}^T Q_y \hat{\boldsymbol{\lambda}} = \hat{\boldsymbol{\lambda}}^T (\boldsymbol{y} - A\hat{\boldsymbol{\xi}}) \cdot (1 + \hat{\boldsymbol{\xi}}^T Q_0 \hat{\boldsymbol{\xi}})^{-1}, {} \end{aligned} $$
(18b)
$$\displaystyle \begin{aligned} \hat{\boldsymbol{\xi}} & =\big[A^TQ_y^{-1}A - \hat{\boldsymbol{\lambda}}^T (\boldsymbol{y} - A\hat{\boldsymbol{\xi}}) \cdot Q_0\big]^{-1} A^TQ_y^{-1}\boldsymbol{y}, {} \end{aligned} $$
(18c)
$$\displaystyle \begin{aligned} \hat\sigma_0^2 &= \hat{\boldsymbol{\lambda}}^T (\boldsymbol{y} - A\hat{\boldsymbol{\xi}}) / (n-m) = \notag \\ &= \hat\nu \cdot (1 + \hat{\boldsymbol{\xi}}^T Q_0 \hat{\boldsymbol{\xi}}) / (n-m), {} \end{aligned} $$
(18d)

but requires \(Q_y\) to be invertible; in contrast, \(Q_0\) and—thus—\(Q_A\) may be singular! In particular, \(Q_0 := 0\) refers to the classical Gauss-Markov Model (GMM).
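In code, the main saving of (18a)–(18d) is that only \(Q_y\) needs to be factored (once, outside the loop), while the Kronecker structure reduces everything else to a few scalar products. One iteration step, in our own shorthand with AtPA and AtPy assumed precomputed, could read:

% One iteration step of (18a) and (18c) for Qx = Qy; here
% AtPA = A'*(Qy\A) and AtPy = A'*(Qy\y) are precomputed outside the loop.
w   = y - A*xi;                                % current misfit vector
lam = (Qy \ w) / (1 + xi' * Q0 * xi);          % cf. (18a)
xi  = (AtPA - (lam' * w) * Q0) \ AtPy;         % cf. (18c); lam'*w = nu*(1 + xi'*Q0*xi), cf. (18b)
% after convergence: s02 = lam' * (y - A*xi) / (n - m), cf. (18d)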

Although the algorithm (18a)–(18d) loses its validity in case of a singular matrix \(Q_y\) and thus singular \(Q_1 = (1 + \hat {\boldsymbol {\xi }}^T Q_0 \hat {\boldsymbol {\xi }}) \cdot Q_y\), it is still possible to handle this case along the lines of algorithm (13b)–(13d), but without the “gain in efficiency” from the Kronecker-product structure of the matrix \(Q_A\).

3 The Special Case of the Partial Errors-In-Variables (PEIV) Model

This special subgroup covers all the EIV-Models where some of the elements within the matrix A happen to be nonrandom. But, instead of introducing zero variances (and zero covariances) with the corresponding singular cofactor matrix \(Q_A\), Xu et al. (2012) preferred a dualistic viewpoint and rewrote the observation equations from (1a) as

$$\displaystyle \begin{gathered} \begin{aligned} \boldsymbol{y} &= (A-E_A)\boldsymbol{\xi} + \boldsymbol{e}_y = \\ &= (\boldsymbol{\xi}^T \otimes I_n) (\operatorname{\mathrm{vec}} A - \boldsymbol{e}_A) + \boldsymbol{e}_y =: \\ &=: (\boldsymbol{\xi}^T \otimes I_n) \cdot \underset{nm\times 1}{\boldsymbol{\mu}_A} + \boldsymbol{e}_y, \end{aligned} \end{gathered} $$
(19a)

where \(\boldsymbol {\mu }_A\) is split into

$$\displaystyle \begin{gathered} \boldsymbol{\mu}_A := \boldsymbol{\alpha} + G\cdot \boldsymbol{\mu}_a \quad \text{with} \quad \boldsymbol{\mu}_a := \underset{t\times 1}{\boldsymbol{a}} - \boldsymbol{e}_a. \end{gathered} $$
(19b)

Here the \(t\times 1\) vector \(\boldsymbol {a}\) contains a basis for the random elements of A, whereas the \(nm\times 1\) vector \(\boldsymbol {\alpha }\) shows all nonrandom elements of \( \operatorname {\mathrm {vec}} A\) plus zeros elsewhere. As a result, the actual random elements within A are generated through the product of the (given) \(nm\times t\) matrix G with the vector \(\boldsymbol {a}\) of basis elements, and they show up precisely in those places where the \(nm\times 1\) vector \(\boldsymbol {\alpha }\) shows zeros.
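For instance, in the straight-line fit mentioned in the Introduction, with \(A = [\,\boldsymbol{x} \mid \boldsymbol{\tau}\,]\) and \(\boldsymbol{\tau} := [1,\dots,1]^T\), only the first column of A is random, so that \(t = n\) and (as our own illustration of the splitting (19b), not taken from Xu et al. 2012)

$$\displaystyle \begin{gathered} \boldsymbol{a} := \boldsymbol{x}, \quad \boldsymbol{\alpha} = \left[\begin{array}{cccccc} \boldsymbol{0} \\ \boldsymbol{\tau} \end{array}\right], \quad G = \left[\begin{array}{cccccc} I_n \\ 0 \end{array}\right], \quad \boldsymbol{\mu}_A = \boldsymbol{\alpha} + G\,(\boldsymbol{x} - \boldsymbol{e}_x). \end{gathered} $$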

In addition, let the random error vectors be specified by

$$\displaystyle \begin{aligned} \left[\begin{array}{cccccc} \boldsymbol{e}_y \\ \boldsymbol{e}_a \end{array}\right] \sim(\left[\begin{array}{cccccc} \boldsymbol{0} \\ \boldsymbol{0} \end{array}\right],\sigma_0^2 Q = \sigma_0^2 \left[\begin{array}{cccccc} \underset{n\times n}{Q_y} & 0 \\[0.75em] 0 & \underset{t\times t}{Q_a} \end{array}\right]), \end{aligned} $$
(19c)

with symmetric, positive-definite matrix Q. Any cross-covariances between \(\boldsymbol {y}\) and \(\boldsymbol {a}\) could also be considered, but they are avoided here to keep the following development of formulas relatively simple.

In analogy to the GMM variant in Schaffrin (2015), let the target function be defined by

$$\displaystyle \begin{aligned} \Phi(\boldsymbol{\mu}_a, \boldsymbol{\xi}) &:= (\boldsymbol{a}-\boldsymbol{\mu}_a)^T Q_a^{-1} (\boldsymbol{a}-\boldsymbol{\mu}_a) + \\ & + \big[\boldsymbol{y} -(\boldsymbol{\xi}^T\otimes I_n) (\boldsymbol{\alpha} + G\boldsymbol{\mu}_a) \big]^T \cdot \\ &\quad \cdot Q_y^{-1} \big[\boldsymbol{y} -(\boldsymbol{\xi}^T\otimes I_n) (\boldsymbol{\alpha} + G\boldsymbol{\mu}_a) \big], \end{aligned} $$
(20)

which must be made stationary, leading to the necessary Euler-Lagrange conditions

$$\displaystyle \begin{aligned} \frac{1}{2} \frac{\partial\Phi}{\partial\boldsymbol{\mu}_a} &= -Q_a^{-1} (\boldsymbol{a} - \hat{\boldsymbol{\mu}}_a) - G^T(\hat{\boldsymbol{\xi}}\otimes I_n) Q_y^{-1} \cdot \notag \\ & \quad \cdot \big[\boldsymbol{y} -(\hat{\boldsymbol{\xi}}^T\otimes I_n) (\boldsymbol{\alpha} + G\hat{\boldsymbol{\mu}}_a) \big] \doteq \boldsymbol{0}, {} \end{aligned} $$
(21a)
$$\displaystyle \begin{aligned} \frac{1}{2} \frac{\partial\Phi}{\partial\boldsymbol{\xi}} &= \big(\left[\begin{array}{cccccc} \boldsymbol{\alpha}_1^T \\ \cdots \\ \boldsymbol{\alpha}_m^T \end{array}\right] + \left[\begin{array}{cccccc} \hat{\boldsymbol{\mu}}_a^T G_1^T \\ \cdots \\ \hat{\boldsymbol{\mu}}_a^T G_m^T \end{array}\right]\big) Q_y^{-1} \cdot \\ &\quad \cdot \big[\boldsymbol{y} -(\hat{\boldsymbol{\xi}}^T\otimes I_n) (\boldsymbol{\alpha} + G\hat{\boldsymbol{\mu}}_a) \big] \doteq \boldsymbol{0}, {} \end{aligned} $$
(21b)

where the terms \(\underset {n\times 1}{\boldsymbol {\alpha }_i}\) and \(\underset {n\times t}{G_i}\) come from

$$\displaystyle \begin{aligned} \begin{gathered} \underset{1\times nm}{\boldsymbol{\alpha}^T} := \left[\begin{array}{cccccc} \boldsymbol{\alpha}_1^T, & {\cdots}, & \boldsymbol{\alpha}_m^T \end{array}\right] \quad \text{and} \\ \underset{t\times nm}{G^T} := \left[\begin{array}{cccccc} G_1^T, & {\cdots}, & G_m^T \end{array}\right]. \end{gathered} \end{aligned} $$
(21c)

Reordering (21a) and (21b) yields first

$$\displaystyle \begin{aligned} & \big[Q_a^{-1} + G^T(\hat{\boldsymbol{\xi}}\otimes I_n) Q_y^{-1} (\hat{\boldsymbol{\xi}}^T\otimes I_n) G \big] \hat{\boldsymbol{\mu}}_a =: \\ &\qquad \quad \qquad =: \big[Q_a^{-1} + S_{\hat\xi}^T Q_y^{-1} S_{\hat\xi} \big] \hat{\boldsymbol{\mu}}_a = \\ & = Q_a^{-1} \cdot \boldsymbol{a} + S_{\hat\xi}^T Q_y^{-1} \big[\boldsymbol{y} - \boldsymbol{\alpha}_1 \cdot \hat\xi_1 - \cdots - \boldsymbol{\alpha}_m \cdot \hat\xi_m \big], \end{aligned} $$
(22a)

with

$$\displaystyle \begin{gathered} \underset{n\times t}{S_{\hat\xi}} := G_1\cdot \hat\xi_1 + \cdots + G_m\cdot \hat\xi_m, \end{gathered} $$
(22b)

and then

$$\displaystyle \begin{gathered} \begin{aligned} \big([\boldsymbol{\alpha}_i + G_i\hat{\boldsymbol{\mu}}_a]^T Q_y^{-1} [\boldsymbol{\alpha}_i + G_i\hat{\boldsymbol{\mu}}_a]\big) \cdot \hat{\boldsymbol{\xi}} = {}\\ = [\boldsymbol{\alpha}_i + G_i\hat{\boldsymbol{\mu}}_a]^T Q_y^{-1} \cdot \boldsymbol{y}. \end{aligned} \end{gathered} $$
(22c)

Furthermore, the residual vectors result from

$$\displaystyle \begin{gathered} \begin{gathered} \tilde{\boldsymbol{e}}_y = \boldsymbol{y} - (\hat{\boldsymbol{\xi}}^T\otimes I_n) (\boldsymbol{\alpha} + G\hat{\boldsymbol{\mu}}_a) \quad \text{and} \\ \tilde{\boldsymbol{e}}_a = \boldsymbol{a} - \hat{\boldsymbol{\mu}}_a, \end{gathered} \end{gathered} $$
(22d)

and the estimated variance component from

$$\displaystyle \begin{gathered} \hat\sigma_0^2 = (\tilde{\boldsymbol{e}}_y^T Q_y^{-1} \tilde{\boldsymbol{e}}_y + \tilde{\boldsymbol{e}}_a^T Q_a^{-1} \tilde{\boldsymbol{e}}_a) / (n-m). {} \end{gathered} $$
(22e)

The above describes just the original approach by Xu et al. (2012). Various improvements in terms of computational efficiency were later achieved by Shi et al. (2015), Wang et al. (2016), and Zhao (2017). Moreover, Wang et al. (2017) and Han et al. (2020) also allowed for cross-covariances between \(\boldsymbol{y}\) and \(\boldsymbol{a}\), a case that is included in the numerical experiments of Sect. 5.
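A minimal MATLAB sketch of the iteration (22a)–(22e), without cross-covariances and with function and variable names of our own choosing, is given below; note that (22c) can be read columnwise as \(\hat{A}^T Q_y^{-1}\hat{A}\,\hat{\boldsymbol{\xi}} = \hat{A}^T Q_y^{-1}\boldsymbol{y}\) with \(\hat{A} := [\,\boldsymbol{\alpha}_1 + G_1\hat{\boldsymbol{\mu}}_a, \dots, \boldsymbol{\alpha}_m + G_m\hat{\boldsymbol{\mu}}_a\,]\).

% Minimal sketch of the partial-EIV iteration (22a)-(22e); no cross-covariances,
% Qy and Qa positive-definite; alpha is nm x 1, G is nm x t, a is t x 1.
function [xi, s02] = peiv_sketch(y, a, alpha, G, Qy, Qa, tol)
n      = numel(y);
m      = numel(alpha) / n;
Pa     = inv(Qa);                        % weight matrix of a (in practice via a Cholesky factor)
Aalpha = reshape(alpha, n, m);           % columns alpha_1, ..., alpha_m
mu_a   = a;                              % initial value for mu_a
Ahat   = Aalpha + reshape(G*mu_a, n, m);
xi     = (Ahat' * Ahat) \ (Ahat' * y);   % ordinary LS solution as initial value
for iter = 1:100                         % 100 = arbitrary safeguard
    S    = kron(xi', eye(n)) * G;        % S_xi = G_1*xi_1 + ... + G_m*xi_m, cf. (22b)
    mu_a = (Pa + S' * (Qy \ S)) \ (Pa*a + S' * (Qy \ (y - Aalpha*xi)));       % cf. (22a)
    Ahat = Aalpha + reshape(G*mu_a, n, m);       % columns alpha_i + G_i*mu_a
    xi_new = (Ahat' * (Qy \ Ahat)) \ (Ahat' * (Qy \ y));                      % cf. (22c)
    if norm(xi_new - xi) < tol
        xi = xi_new;
        break
    end
    xi = xi_new;
end
ey  = y - Ahat * xi;                     % cf. (22d)
ea  = a - mu_a;                          % cf. (22d)
s02 = (ey' * (Qy \ ey) + ea' * Pa * ea) / (n - m);   % cf. (22e)
end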

4 The Various Algorithms

The various algorithms for weighted TLS solutions compared in this contribution are listed below in bulleted form with brief descriptions. For further details about them, the reader is referred to the references provided.

Algorithms for Weighted TLS Solutions Within the EIV-Model

  1. Snow’s (2012) algorithm, (8a)–(8c), referred to as A herein: Presented as Algorithm 1 on p. 18 of Snow (2012), this algorithm handles cross-covariances via the matrix \(Q_{yA}\) but requires the matrix \(Q_1\) of (7b) to be non-singular. Obviously, it also works when \(Q_{yA} = 0\). This algorithm was also presented in Fang (2011).

  2. Snow’s (2012) algorithm, (13b)–(13d), referred to as B herein: Presented as Algorithm 3 on p. 31 of Snow (2012), this algorithm handles both cross-covariances via matrix \(Q_{yA}\) and a singular matrix \(Q_1\). This is the only algorithm we consider here that can handle a singular matrix \(Q_1\). We note that we only experimented with \(S := I_m\) in our formulation of matrix \(Q_3\) defined in (12).

  3. Schaffrin’s (2015) algorithm, (7a)–(7b) with (9c), referred to as D herein: This algorithm is encapsulated in equations (8b), (9), and (18b) of Schaffrin (2015). However, in light of (7b) and (9c), it now allows for a non-zero cross-covariance matrix \(Q_{yA}\).

  4. Schaffrin and Wieser’s (2008) algorithm, (16b)–(16d), which requires that the cofactor matrix \(Q_A\) can be expressed as a Kronecker product according to (14) and does not currently allow for a cross-covariance matrix \(Q_{yA}\). Full details can be found in Schaffrin and Wieser (2008).

Algorithms for Weighted TLS Solutions Within the Partial EIV-Model

  5. The algorithm of Xu et al. (2012), (22c)–(22e): Reviewed in Sect. 3 above, this is the original algorithm for the weighted TLS solution within the partial EIV-Model. It does not allow for a cross-covariance matrix \(Q_{yA}\).

  6. The algorithm of Shi et al. (2015): The authors’ stated purpose was to provide an improvement in efficiency over Xu et al. (2012) for the case when the number of independent random variables in the coefficient matrix A is much larger than the length of the observation vector \(\boldsymbol {y}\); they also stated that, if the converse is true, Xu’s original algorithm would be superior. Like Xu’s original algorithm, this one does not allow for a cross-covariance matrix \(Q_{yA}\).

  7. The algorithm of Wang et al. (2017): Wang et al. (2016) and Wang et al. (2017) published papers on the partial EIV-Model in Chinese, the latter of which extended the model to accommodate a cross-covariance matrix \(Q_{yA}\). Han et al. (2020) described an algorithm that also handles cross-covariances, but we found it to be often only about half as fast as Wang’s, so we did not include it in the tabulated results below.

  8. Zhao’s (2017) algorithm: Stating a motivation to improve upon the algorithms by Xu et al. (2012) and Shi et al. (2015), Zhao (2017) developed yet another algorithm for a TLS solution within the partial EIV-Model, which he argued should be preferred over those earlier algorithms because of the reduction he found in both the number of iterations and the total time required to solve 2D affine and similarity transformation problems. He also compared these algorithms to a “Fang-type” algorithm, listing his results in his Tables 5 and 8, which show a drastic reduction in both iterations and time compared to Xu et al. (2012) and Shi et al. (2015), but only a marginal improvement in time over the “Fang-type” algorithm, without any reduction in the number of iterations. We note that Zhao’s algorithm also does not allow for a cross-covariance matrix \(Q_{yA}\).

Classical Algorithm Without Direct Reference to an EIV-Model as Described Herein

  9. Deming’s (1931, 1934) algorithm (within a Gauss-Helmert Model): Finally, we mention the classical least-squares solution within the Gauss-Helmert Model, which might also be referred to as “Deming’s algorithm.” Because of its long-time usage and well-known behavior, we chose to include it in our experiments for comparison purposes. An example of a rigorous presentation of it can be found in Schaffrin and Snow (2010) as well as in chapter 4 of Snow (2012), among others.

4.1 Uniformity of Algorithm Coding

Many factors that are beyond the scope of our work here could be considered when writing computer code to optimize efficiency (time) in numerical computing. However, our aim was not to try to write code that could run as fast as possible, which would have been a somewhat arduous task considering the number of algorithms we chose to compare. In fact, because we had already written some algorithms in MATLAB in the past, and because other authors cited above have published or shared their algorithms in MATLAB, we decided to stick with that language, though we might have written faster code in C++, for example.

What we were mainly concerned with was following a few simple practices for writing efficient code in MATLAB while keeping the code relatively easy to read. We strove to do this consistently for all the algorithms we tested. To summarize, we used the following guidelines to help ensure all the algorithms were coded with a similar level of efficiency (a short MATLAB fragment illustrating several of them follows the list):

  • The MATLAB inverse function was never used to solve a system of equations. Instead its “backslash operator” was used.

  • When a cofactor matrix had to be inverted to obtain a weight matrix, its Cholesky factor was computed and inverted to save time.

  • The inversion of a cofactor matrix to obtain a weight matrix was never done more than once in any algorithm. Thus, if a weight matrix appeared multiple times in formulas, its inverse was saved and reused within the algorithm.

  • Likewise, certain matrix products that occurred multiple times within an algorithm were computed once, saved, and reused.

  • Products involving two matrices, say A and B, and a trailing vector, say \(\boldsymbol {c}\), were grouped as A(\(B\boldsymbol {c}\)) to reduce the number of dot products.

  • Matrices that were populated within loops were preallocated to their full size first, so that memory was not repeatedly reallocated at successive iterations of the loop, a notoriously slow process.

  • MATLAB sparse matrices were used for diagonal cofactor matrices and for identity matrices appearing in Kronecker products, though this was not necessary for most of our small datasets.
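The following MATLAB fragment illustrates several of these guidelines; the variable names are generic placeholders of our own, not taken from any particular algorithm:

% Illustrations of the coding guidelines listed above (generic variable names).
z  = Q \ b;                      % solve Q*z = b with the backslash operator, not inv(Q)*b
R  = chol(Qy);                   % upper Cholesky factor of the cofactor matrix Qy
Ri = R \ eye(size(R));           % its inverse, computed once ...
Py = Ri * Ri';                   % ... yields the weight matrix, saved and reused
v  = A * (B * c);                % grouped as A*(B*c) rather than (A*B)*c
E  = zeros(n, m);                % preallocated before being filled inside a loop
In = speye(n);                   % sparse identity for Kronecker products, e.g., kron(Q0, In)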

Other time-saving techniques or operations that we might have left out were, at least, done so consistently among all algorithms. We do not suspect that any inefficiencies remaining in our code would affect the number of iterations required by the algorithms.

Regarding the reporting of execution times in the following section, we acknowledge that what counts most here are the relative times among the algorithms, since many factors related to hardware, software, and available computing resources could influence the absolute times. To try to minimize these factors, we wrote a high-level script that called all algorithms sequentially, 5000 times each, for each problem. Any open programs other than MATLAB were closed before executing the script, so that the same computer was in more-or-less the same state for all algorithms used in the comparisons. Nevertheless, the times surely will vary between repeated instances of the same test. As such, we would not distinguish between two algorithms with times (per 5000 runs) that agree to within 10 percent of each other.
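A condensed version of that timing script is sketched below; the function handles are placeholders of our own for the individual algorithm implementations, not their actual names:

% Condensed timing harness; one placeholder function handle per algorithm.
nRuns = 5000;
algs  = {@algorithmA, @algorithmB, @algorithmD};   % hypothetical wrappers
t     = zeros(numel(algs), 1);
for k = 1:numel(algs)
    tic;
    for r = 1:nRuns
        algs{k}(y, A, Qy, QyA, QA);                % independent call; nothing carried over
    end
    t(k) = toc;                                    % elapsed seconds per 5000 executions
end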

5 Numerical Experiments

A motivation for the experiments that follow was provided by the claims or suggestions in many of the cited papers on the partial EIV-Model that TLS solutions within that model should be preferred over those within the standard EIV-Model laid out in Sect. 2 above. The arguments are usually made in favor of computational efficiency, viz. fewer iterations to convergence or faster overall computational times. Sometimes the argument is made instead that the partial EIV-Model and its associated TLS solutions are easier to formulate. One only needs to peruse those papers to find such statements. We certainly reject the argument regarding ease of model and algorithm formulation, as we take the standard model to be more elegant and simpler in form; that does seem obvious to us, but of course others may think differently.

In any case, we would not argue with anyone who selects an algorithm based on its savings in time, especially when computations are being made in time-critical situations. Thus we conducted the following experiments to see how the various algorithms listed above compared in a variety of problems and datasets. The problem types we explored were 2D line fitting, 2D affine and similarity transformations, and a third-order auto-regressive problem. However, for the sake of space, we only report on the 2D line-fitting problem here.

Table 1 lists some results obtained from six different datasets identified in the first column, together with their number of points. The first dataset is a combination of two that appear in section 17 of Deming (1964). The datasets labeled Haneberg, Pearson, and Niemeier can be found in Schaffrin and Snow (2020). Neri’s data can be found in Snow (2012) and Kelly’s data in Kelly (1984). The table also lists Neri*, which are the original Neri data with simulated cross-covariances added in Snow (2012).

Table 1 Results of 2D line-fitting: data set, number of points, and number of iterations/time in seconds for 5000 executions. The lowest and highest times are shown in bold, as are those within 0.2 s of them. Neri* includes a non-zero cross-covariance matrix, which some of the algorithms cannot accommodate

The table shows the number of iterations to convergence and the time in seconds for 5000 consecutive executions. Note that each consecutive execution represents an independent call to the algorithm’s function so that no results from a previous execution are used in a subsequent one. The convergence criterion requires the norm of the incremental vector of estimated parameters to be less than a specified value; the value used was \(10^{-10}\). We do not bother listing the estimated parameters, residuals, or total SSR, as these values agreed to at least six digits beyond the decimal point for all algorithms.

The lowest and highest times for each dataset are highlighted in bold typeface (and those within 0.2 s, too). The algorithms from Wang et al. (2017) and Schaffrin and Wieser (2008) have the lowest times, and the “Fang-type” algorithm B from Snow (2012) or that of Xu et al. (2012) has the highest. For Snow’s algorithm B, we do not know whether a “better choice” than \(S := I_m\) for the matrix S appearing in \(Q_3\) might have led to lower times. However, we should say again that among the algorithms featured here, only Snow’s A and B (also Fang 2011), and now Schaffrin’s D, and that of Wang et al. (2017), resp. Deming’s, can handle a cross-covariance matrix \(Q_{yA}\); and only algorithm B can handle a singular cofactor matrix \(Q_1\). Moreover, the algorithm of Schaffrin and Wieser (2008), which performs admirably here, cannot be used in transformation problems, since the cofactor matrix \(Q_A\) cannot easily be expressed as a Kronecker product in those problems.

6 Conclusions and Outlook

Our study has confirmed that a variety of published algorithms for the TLS solution within (partial) EIV-Models yield equivalent numerical results for the estimated parameters and residuals across a number of tested datasets. This is to be expected. It also suggests that one can do just fine working within the standard EIV-Model rather than resorting to the partial EIV-Model, though it might be worth adopting the latter in certain cases. We suggest the following logic for choosing a TLS algorithm, beginning with more general cases and moving towards more specific ones.

  • If the cofactor matrix \(Q_1\) is singular, choose Snow’s algorithm B.

  • If the cross-covariance matrix \(Q_{yA}\) is nonzero, choose between Snow’s algorithm A, Schaffrin’s D, and that of Wang et al. (2017).

  • If the cross-covariance matrix \(Q_{yA}\) is zero, consider Schaffrin’s D and that of Wang et al. (2016).

  • If the cofactor matrix \(Q_A\) can be expressed as a Kronecker product according to (14), consider first the algorithm from Schaffrin and Wieser (2008).

Our future work will consider making further efficiency improvements to the algorithms as we have coded them and will report on their performance among a wider variety of problems and datasets.