1 Introduction

Wiśniewski (2009) proposed a method for estimating parameters in split functional models of geodetic observations. Such a split occurs when two functional models that differ from each other in terms of mutually competing versions of the same parameter can correspond to a single observation. For example, in a network deformation analysis carried out based on an aggregate set of observations obtained during two measurement epochs, any given observation from this set corresponds to one of the following models: a functional model of observations from the first measurement epoch or a functional model of observations obtained during the second measurement epoch. Another example concerns the sets containing outliers. In that case, any given observation from this set can be a “good” observation or a wrong observation with a functional model appropriate for it.

The proposed method, called Msplit estimation, assumes that if a particular observation occurs, it brings two mutually competing pieces of f-information (Jones and Jones 2000) determined in relation to two versions of the same parameter (Wiśniewski 2009, 2010). Msplit estimators of these versions are the quantities that minimise the aggregate information, being the product of the competing pieces of information. Similar assumptions are also adopted in the maximum likelihood method (ML-method) (e.g. Rao 1973; Wiśniewski 2017). However, that method does not allow for the existence of several versions of the parameters in a functional model relating to the same observation. From this perspective, Msplit estimation can be regarded as a particular kind of development of the ML method. In the absence of competing versions of the parameter, Msplit estimators become ML-estimators. Since the study conducted by Huber (1964), a generalisation of the ML-method known as M-estimation, in which f-information is replaced by certain arbitrary functions, has been very popular. A similar substitution can also be observed in Msplit estimation. Msplit estimation based on the L1 norm condition was developed to take advantage of this possibility (Wyszkowska and Duchnowski 2019).

The general theory of Msplit estimation was developed without detailed assumptions about probabilistic observation models, which enables the creation of Msplit estimation varieties corresponding to specific models of this nature. The most commonly accepted probabilistic model of geodetic observations is the normal distribution. The family of normal distributions corresponds to the basic variant of Msplit estimation called “squared Msplit estimation”. This variant of Msplit estimation can be regarded as a particular type of expansion of the least-squares (LS) method (Wiśniewski 2009). In the absence of competing parameter versions, squared Msplit estimators become LS-estimators. Whenever Msplit estimation is mentioned further on, it refers to this particular variant.

Msplit estimation was applied inter alia in the analysis of geodetic network deformation (Duchnowski and Wiśniewski 2012, 2014; Zienkiewicz 2014; Zienkiewicz et al. 2017; Wiśniewski and Zienkiewicz 2016). In these problems, Msplit estimation is particularly effective in identifying stable potential reference points (PRPs) (Nowel 2019). Janowski and Rapiński (2013) applied Msplit estimation in 3D modelling, primarily for the detection of surface structures (e.g. roof planes) of engineering structures. The modelling was carried out based on laser scanning data. A similar problem is also analysed in a study by Janicka et al. (2020), where Msplit estimation was proposed as a means of detecting and determining the displacements of adjacent planes. Laser scanning data also provided the basis for determining the terrain profiles using Msplit estimation (Błaszczak-Bąk et al. 2015; Wyszkowska et al. 2021).

Msplit estimation can provide an alternative to M-estimation which is robust to gross errors. The possibility of such applications was indicated in the following studies (Wiśniewski 2009, 2010; Yang et al. 2010; Ge et al. 2013; Janicka and Rapinski 2013; Amiri-Simkooei et al. 2017). Wiśniewski and Zienkiewicz (2021a, b) demonstrated that with properly established, competitive functional models, the robustness of Msplit estimators to gross errors is their inherent feature. The robustness of these estimators in a wider context (e.g. to poorly chosen models) was analysed in detail by Duchnowski and Wiśniewski (2019, 2020).

In both the traditional models and split functional models constructed on their basis (in Msplit estimation), it is assumed that only observations are affected by random errors. Currently, an errors-in-variables (EIV) model, in which design matrix elements are also affected by random errors, is applied in many geodetic problems. For example, this model was applied in geodetic datum transformation (Teunissen 1988; Davis 1999; Acar et al. 2006; Akyilmaz 2007; Schaffrin and Felus 2008; Mahboub 2012; Fang 2015; Aydin et al. 2018; Mercan et al. 2018) as well as in remote sensing (Felus and Schaffrin 2005), in function approximation (Wang and Zhao 2019), in linear regression (Schaffrin and Wieser 2008; Amiri-Simkooei and Jazaeri 2012; Zeng et al. 2018; Lv and Sui 2020) and in least-squares collocation (Schaffrin 2020; Wiśniewski and Kamiński 2020). The effect of the random design matrix on the weighted LS estimate is presented in Xu et al. (2014). That study also proposed a bias-corrected weighted LS estimate for the EIV model. An extended EIV stochastic model and the estimation of its components (using, inter alia, the MINQUE method) are presented in Xu and Liu (2014).

The estimation of parameters in functional models extended to the EIV form is most commonly carried out using the total least-squares (TLS) method. The optimisation problem of this method, as well as its solution based on the singular value decomposition (SVD), was presented by Golub and Van Loan (1980). TLS using the SVD procedure was developed and adapted to geodetic purposes as well (e.g. Felus 2004; Akyilmaz 2007; Schaffrin and Felus 2008). Another way to solve the TLS optimisation problem, based on a nonlinear Lagrange function, is proposed in Schaffrin et al. (2006).

In the practical applications of the TLS method, besides having effective algorithms at one's disposal, the possibility of taking into account the weights of the random EIV model components is also very important. The basic solutions in this regard were presented by Van Huffel and Vandewalle (1991), who established the generalised total least-squares (GTLS) method. On the other hand, Schaffrin and Wieser (2008) proposed an expansion of the TLS method, in which weights were derived from the adopted covariance matrix models (stochastic models). In the method proposed in the cited study, called “the weighted total least-squares (WTLS) method”, stochastic models can apply to both observation vectors and the vectors created from random errors affecting the design matrix.

WTLS is still being developed and analysed. For example, Fang (2013) analysed the necessary and sufficient conditions for WTLS optimality. Amiri-Simkooei (2017, 2018) presented the theory behind the constrained weighted total least-squares (CWTLS) method. The WTLS optimisation problem was also formulated and solved using a second-order approximation function (Wang and Zhao 2019). Due to their nonlinear nature, the WTLS estimators are biased. Bias-corrected versions of these estimators are presented in studies by Xu et al. (2012) and Tong et al. (2015). Moreover, an important problem in WTLS theory and practice is the assessment of the accuracy of the determined estimators. What might be helpful in this regard are the strategies for determining the covariance matrix of WTLS estimates (Amiri-Simkooei et al. 2016) and methods for estimating the variance components in EIV models (Xu and Liu 2014).

In the optimisation problem of the WTLS method based on the Lagrange approach, the objective function is minimised with the conditions defined by the nonlinear EIV model. An iterative algorithm to solve this problem was proposed by Schaffrin and Wieser (2008). Shen et al. (2011), based on the Newton–Gauss algorithm of nonlinear LS adjustment (Pope 1974), proposed another iterative method for solving WTLS problems, which is easier to apply in practice. In this method, the nonlinear EIV model is replaced with a linear approximation, which significantly facilitates the organisation of a corresponding computational algorithm.

The origin of TLS or WTLS estimation is the LS-method which is neutral for all observations. The WTLS estimators' lack of robustness to gross errors is, therefore, an inherent feature, which may restrict the scope of the practical application of these estimators. Therefore, a robust estimation of EIV model parameters is of interest to many authors. For example, Wang et al. (2016) proposed a robust total least-squares (RTLS) method in which the robustness of WTLS estimators was obtained by means of the application of weight functions adopted in robust M-estimation. Another proposal, based on the least trimmed squares (LTS) method, was presented by Lv and Sui (2020). In that method, the authors used the inherent robustness of estimators minimising the sum of the squared orthogonal errors. They called the LTS version adjusted to EIV models “total least trimmed squares” (TLTS).

The current study will apply the EIV model in the basic Msplit estimation variant. As in the WTLS method, models of the covariance matrices of the random components of this model will also be taken into account. According to the basic principles of Msplit estimation, it will be assumed that the parameters in the EIV model will have two mutually competing versions (which consequently leads to the split of this model). The objective function of the proposed method, called “Total Msplit (TMsplit) estimation”, will be created through the application of the Lagrange approach (Schaffrin and Wieser 2008) using the approach adopted in Shen et al. (2011), i.e. the split EIV models will be replaced with their linear approximations.

The paper is organised as follows. As the proposed method is an expansion of Msplit estimation, which takes into account the basic assumptions used in the WTLS method, it appears necessary to review the theoretical foundations of both these methods. These foundations, set in the context relevant to this study, are provided in Sect. 2. The theory behind TMsplit estimation and its algorithm are provided in Sect. 3. In Sect. 4, examples of the method application will be provided. TMsplit estimation will be applied to estimate parameters in competing bias models (Sect. 4.1). The obtained results will be compared with classical Msplit estimators calculated in Wiśniewski (2010). In Sect. 4.2, the data provided in Neri et al. (1989) and also used, inter alia, in studies by Schaffrin and Wieser (2008), Shen et al. (2011) and Mahboub (2012), will be used to determine TMsplit estimators of linear regression parameters. It will be assumed that the basic set of “good” observations is disturbed by “strange” observations for which a corresponding regression line also exists. Moreover, in this section, the behaviour of TMsplit estimators will be checked in the event that the set contains one observation affected by a gross error of different magnitudes. The determined TMsplit estimators will be compared with the WTLS estimators published in the cited studies. TMsplit estimators' robustness to gross errors is additionally analysed using an example of a two-dimensional affine transformation (Sect. 4.3). The data for this example are derived from Lv and Sui (2020). The RTLS and TLTS estimators shown in the cited paper will be compared with TMsplit estimators. The paper concludes with a summary.

2 Review of Msplit and WTLS estimation

2.1 Msplit estimation

Let \({\mathbf{y}} = {\mathbf{AX}} + {\mathbf{v}}\) be a functional model of the observation vector \({\mathbf{y}} = [y_{1} , \ldots ,y_{n} ]^{T}\), where \({\mathbf{A}}\) is the \(n \times m\) coefficient matrix (\(rank({\mathbf{A}}) = m\)), \({\mathbf{X}}\) is the m-vector of unknown parameters to be estimated, and \({\mathbf{v}}\) is the n-vector of random observation errors. In Msplit estimation, this model is split into two models:

$$ {\mathbf{y}} = {\mathbf{AX}}_{\alpha } + {\mathbf{v}}_{\alpha } \quad {\text{and}}\quad {\mathbf{y}} = {\mathbf{AX}}_{\beta } + {\mathbf{v}}_{\beta } $$
(1)

where \({\mathbf{X}}_{\alpha }\) and \({\mathbf{X}}_{\beta }\) are mutually competing versions of the same vector of parameters X. The vectors \({\mathbf{v}}_{\alpha }\),\({\mathbf{v}}_{\beta }\) are respective versions of the vector v, which result from the observation errors and the errors of the functional models.

Msplit estimators of parameters \({\mathbf{X}}_{\alpha }\) and \({\mathbf{X}}_{\beta }\) are quantities that minimise the following general objective function (Wiśniewski 2009).

$$ \varphi ({\mathbf{X}}_{\alpha } ,{\mathbf{X}}_{\beta } ) = \sum\limits_{i = 1}^{n} {\rho_{\alpha } (v_{i\alpha })\rho_{\beta } (} v_{i\beta }) $$
(2)

where \(\rho_{\alpha }\) and \(\rho_{\beta }\) are arbitrary functions. In the context of cross-weighting that is natural in Msplit estimation, function (2) can also be expressed in the following form:

$$ \varphi ({\mathbf{X}}_{\alpha } ,{\mathbf{X}}_{\beta } ) = \sum\limits_{i = 1}^{n} {\rho_{\alpha } (v_{i\alpha } )} w_{\alpha } (v_{i\beta } ) = \sum\limits_{i = 1}^{n} {\rho_{\beta } (v_{i\beta } } )w_{\beta } (v_{i\alpha } ) $$
(3)

where \(w_{\alpha } (v_{i\beta } ) = \rho_{\beta } (v_{i\beta } )\) and \(w_{\beta } (v_{i\alpha } ) = \rho_{\alpha } (v_{i\alpha } )\) are now regarded as a special type of weight function. The specific character of weighting is that the contribution of function \(\rho_{\alpha } (v_{i\alpha } )\) to the optimisation problem is enhanced (or weakened) by the weight function whose argument is quantity \(v_{i\beta }\) competing in relation to \(v_{i\alpha }\) (and vice versa). The weight functions are not like those in M-estimation, which are modified to make the estimator robust. Mutual “cross weighting” functions \(w_{\alpha } (v_{i\beta } )\) and \(w_{\beta } (v_{i\alpha } )\) are applied to determine mutually competitive estimates related to the same observation set (Wiśniewski 2009). In the case of Msplit estimation, one supposes that the observation set might be a mixture of realisations of two different random variables that differ from each other in the parameters of the functional models. One of those variables might be regarded as a “strange” one and its realisations as outliers in a particular case. Then results of Msplit estimation are estimates of the parameters of the “good” variable (like in robust M-estimation) but also estimates of the parameters of the “strange” variable.

This study will use the basic variant of Msplit estimation, in which \(\rho (v_{\alpha } ) = v_{i\alpha }^{2} q_{i}^{ - 1}\) and \(\rho (v_{\beta } ) = v_{i\beta }^{2} q_{i}^{ - 1}\). The quantities \(q_{i}\) are diagonal elements of the \({\mathbf{Q}}_{{\mathbf{y}}}\) cofactor matrix occurring in the \({\mathbf{C}}_{{\mathbf{y}}} = \sigma_{0}^{2} {\mathbf{Q}}_{{\mathbf{y}}}\) covariance matrix model (\(\sigma_{0}^{2}\)—unknown variance component). The adopted functions can be associated (although this is not necessary) with normal distributions as probabilistic observation models. Taking these functions into account, based on Eqs. (2) and (3), the following will be recorded (Wiśniewski 2009; Zienkiewicz 2018a, 2018b)

$$ \begin{aligned} \varphi ({\mathbf{X}}_{\alpha } ,{\mathbf{X}}_{\beta } ) & = \sum\limits_{i = 1}^{n} {\rho_{\alpha } (v_{i\alpha } )\rho_{\beta } (} v_{i\beta } ) = \sum\limits_{i = 1}^{n} {v_{i\alpha }^{2} } v_{i\beta }^{2} q_{i}^{ - 2} \\ & = \sum\limits_{i = 1}^{n} {v_{i\alpha }^{2} } w_{\alpha } (v_{i\beta } ) = \sum\limits_{i = 1}^{n} {v_{i\beta }^{2} } w_{\beta } (v_{i\alpha } ) \\ & = {\mathbf{v}}_{\alpha }^{T} {\mathbf{W}}_{\alpha } ({\mathbf{v}}_{\beta } ){\mathbf{v}}_{\alpha } = {\mathbf{v}}_{\beta }^{T} {\mathbf{W}}_{\beta } ({\mathbf{v}}_{\alpha } ){\mathbf{v}}_{\beta } \\ \end{aligned} $$
(4)

where

$$ \begin{aligned} w_{\alpha } (v_{i\beta } ) & = v_{i\beta }^{2} q_{i}^{ - 2} ,\,\,w_{\beta } (v_{i\alpha } ) = v_{i\alpha }^{2} q_{i}^{ - 2} \\ {\mathbf{W}}_{\alpha } ({\mathbf{v}}_{\beta } ) & = {\text{Diag}}\left( {w_{\alpha } (v_{1\beta } ), \ldots ,w_{\alpha } (v_{n\beta } )} \right), \\ {\mathbf{W}}_{\beta } ({\mathbf{v}}_{\alpha } ) & = {\text{Diag}}\left( {w_{\beta } (v_{1\alpha } ), \ldots ,w_{\beta } (v_{n\alpha } )} \right) \\ \end{aligned} $$
(5)

The solution to the optimisation problem \(\varphi ({\mathbf{X}}_{\alpha } ,{\mathbf{X}}_{\beta } ) \to \min\) includes such quantities \({\hat{\mathbf{X}}}_{\alpha }\) and \({\hat{\mathbf{X}}}_{\beta }\) (Msplit estimators) for which the following is true:

$$ \begin{aligned} \frac{1}{2}\left. {\frac{{\partial \varphi ({\mathbf{X}}_{\alpha } ,{\mathbf{X}}_{\beta } )}}{{\partial {\mathbf{X}}_{\alpha } }}} \right|_{{{\mathbf{X}}_{\alpha } = {\hat{\mathbf{X}}}_{\alpha } ,{\mathbf{X}}_{\beta } = {\hat{\mathbf{X}}}_{\beta } }} & = {\mathbf{A}}^{T} {\mathbf{W}}_{\alpha } ({\tilde{\mathbf{v}}}_{\beta } ){\tilde{\mathbf{v}}}_{\alpha } = {\mathbf{A}}^{T} {\mathbf{W}}_{\alpha } ({\tilde{\mathbf{v}}}_{\beta } )({\mathbf{y}} - {\mathbf{A}}\hat{\mathbf{X}}_{\alpha } ) = {\mathbf{0}} \\ \frac{1}{2}\left. {\frac{{\partial \varphi ({\mathbf{X}}_{\alpha } ,{\mathbf{X}}_{\beta } )}}{{\partial {\mathbf{X}}_{\beta } }}} \right|_{{{\mathbf{X}}_{\alpha } = {\hat{\mathbf{X}}}_{\alpha } ,{\mathbf{X}}_{\beta } = {\hat{\mathbf{X}}}_{\beta } }} & = {\mathbf{A}}^{T} {\mathbf{W}}_{\beta } ({\tilde{\mathbf{v}}}_{\alpha } ){\tilde{\mathbf{v}}}_{\beta } = {\mathbf{A}}^{T} {\mathbf{W}}_{\beta } ({\tilde{\mathbf{v}}}_{\alpha } )({\mathbf{y}} - {\mathbf{A}}\hat{\mathbf{X}}_{\beta } ) = {\mathbf{0}} \\ \end{aligned} $$
(6)

where \({\tilde{\mathbf{v}}}_{\alpha } = {\mathbf{y}} - {\mathbf{A}}\hat{\mathbf{X}}_{\alpha }\) and \({\tilde{\mathbf{v}}}_{\beta } = {\mathbf{y}} - {\mathbf{A}}\hat{\mathbf{X}}_{\beta }\) are residual vectors. The above equations are solved by means of iteration. The iterative procedure can be organised in such a manner that in the steps \(l = 1, \ldots ,s\), the following quantities are determined (Wiśniewski and Zienkiewicz 2021a, 2021b):

$$ \begin{aligned} {\mathbf{X}}_{\alpha (l + 1)} & = \left( {{\mathbf{A}}^{T} {\mathbf{W}}_{\alpha } ({\mathbf{v}}_{\beta (l)} ){\mathbf{A}}} \right)^{ - 1} {\mathbf{A}}^{T} {\mathbf{W}}_{\alpha } ({\mathbf{v}}_{\beta (l)} ){\mathbf{y}},\quad {\mathbf{v}}_{\alpha (l + 1)} = {\mathbf{y}} - {\mathbf{A}}\hat{\mathbf{X}}_{\alpha (l + 1)} \\ {\mathbf{X}}_{\beta (l + 1)} & = \left( {{\mathbf{A}}^{T} {\mathbf{W}}_{\beta } ({\mathbf{v}}_{\alpha (l)} ){\mathbf{A}}} \right)^{ - 1} {\mathbf{A}}^{T} {\mathbf{W}}_{\beta } ({\mathbf{v}}_{\alpha (l)} ){\mathbf{y}},\quad {\mathbf{v}}_{\beta (l + 1)} = {\mathbf{y}} - {\mathbf{A}}\hat{\mathbf{X}}_{\beta (l + 1)} \\ \end{aligned} $$
(7)

(the iterative procedure using gradients and Hessians of the function \(\varphi ({\mathbf{X}}_{\alpha } ,{\mathbf{X}}_{\beta } )\) is presented in Wiśniewski 2009, 2010). The iterative process defined by Eq. (7) is convergent and ends for such \(l = s\) that \({\mathbf{X}}_{\alpha (s)} = {\mathbf{X}}_{\alpha (s - 1)}\) and \({\mathbf{X}}_{\beta (s)} = {\mathbf{X}}_{\beta (s - 1)}\). Then, \({\hat{\mathbf{X}}}_{\alpha } = {\mathbf{X}}_{\alpha (s)}\) and \({\hat{\mathbf{X}}}_{\beta } = {\mathbf{X}}_{\beta (s)}\).
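Since the scheme of Eq. (7) is the computational core of the method, a short numerical sketch may be helpful. The following Python/NumPy code (the function name msplit_squared, the toy data and the stopping tolerance are illustrative assumptions, not part of the original formulation) performs the cross-weighted updates sequentially, i.e. in the ordering later used in steps 1(1)–1(3) of Sect. 3.2, so that the two parameter versions can separate from the common LS start:

```python
import numpy as np

def msplit_squared(A, y, q, tol=1e-10, max_iter=200):
    """Squared Msplit estimation: the cross-weighted iteration of Eq. (7),
    carried out sequentially as in steps 1(1)-1(3) of Sect. 3.2 (with A^j = A)."""
    W0 = np.diag(1.0 / q)                                    # classical weights Q_y^{-1}
    x_a = np.linalg.solve(A.T @ W0 @ A, A.T @ W0 @ y)        # LS start, cf. Eq. (44)
    x_b = x_a.copy()
    v_b = y - A @ x_b
    for _ in range(max_iter):
        W_a = np.diag(v_b**2 / q**2)                         # W_alpha(v_beta), Eq. (5)
        x_a_new = np.linalg.solve(A.T @ W_a @ A, A.T @ W_a @ y)
        v_a = y - A @ x_a_new
        W_b = np.diag(v_a**2 / q**2)                         # W_beta(v_alpha), Eq. (5)
        x_b_new = np.linalg.solve(A.T @ W_b @ A, A.T @ W_b @ y)
        v_b = y - A @ x_b_new
        if max(np.linalg.norm(x_a_new - x_a),
               np.linalg.norm(x_b_new - x_b)) < tol:
            return x_a_new, x_b_new
        x_a, x_b = x_a_new, x_b_new
    return x_a, x_b

# toy example: observations mixed from two competing linear trends
t = np.linspace(0.0, 1.0, 12)
A = np.column_stack([np.ones_like(t), t])
y = np.where(np.arange(12) % 2 == 0, 1.0 + 2.0 * t, 3.0 - 1.0 * t)
X_a, X_b = msplit_squared(A, y, np.ones(12))
print(X_a, X_b)
```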

In Msplit estimation, the stochastic model \({\mathbf{C}}_{{\mathbf{v}}} = {\mathbf{C}}_{{\mathbf{y}}} = \sigma_{0}^{2} {\mathbf{Q}}_{{\mathbf{y}}}\), similar to the functional model, is split. The split results in covariance matrices \({\mathbf{C}}_{{{\mathbf{v}}_{\alpha } }} = \sigma_{0\alpha }^{2} {\mathbf{Q}}_{{\mathbf{y}}}\) and \({\mathbf{C}}_{{{\mathbf{v}}_{\beta } }} = \sigma_{0\beta }^{2} {\mathbf{Q}}_{{\mathbf{y}}}\), which are two versions of the covariance matrix \({\mathbf{C}}_{{\mathbf{v}}}\) (Wiśniewski and Zienkiewicz 2021b). The invariant and unbiased estimators of variance coefficients \(\sigma_{0\alpha }^{2}\) and \(\sigma_{0\beta }^{2}\) are the following quantities (Wiśniewski and Zienkiewicz, 2021a, b):

$$ \begin{aligned} \hat{\sigma }_{0\alpha }^{2} & = \frac{{{\tilde{\mathbf{v}}}_{\alpha }^{T} {\overline{\mathbf{Q}}}_{{{\mathbf{v}}_{\alpha } }}^{ - 1} {\mathbf{Q}}_{{\mathbf{y}}} {\overline{\mathbf{Q}}}_{{{\mathbf{v}}_{\alpha } }}^{ - 1} {\tilde{\mathbf{v}}}_{\alpha } }}{{{\text{Tr(}}{\mathbf{N}}_{\alpha }^{T} {\mathbf{N}}_{\alpha } )}}\quad {\text{and}} \\ \hat{\sigma }_{0\beta }^{2} & = \frac{{{\tilde{\mathbf{v}}}_{\beta }^{T} {\overline{\mathbf{Q}}}_{{{\mathbf{v}}_{\beta } }}^{ - 1} {\mathbf{Q}}_{{\mathbf{y}}} {\overline{\mathbf{Q}}}_{{{\mathbf{v}}_{\beta } }}^{ - 1} {\tilde{\mathbf{v}}}_{\beta } }}{{{\text{Tr(}}{\mathbf{N}}_{\beta }^{T} {\mathbf{N}}_{\beta } )}} \\ \end{aligned} $$
(8)

where

$$ \begin{aligned} {\mathbf{N}}_{\alpha } & = {\mathbf{Q}}_{{\mathbf{y}}} {\overline{\mathbf{Q}}}_{{{\mathbf{v}}_{\alpha } }}^{ - 1} {\mathbf{M}}_{\alpha } ,\quad {\mathbf{N}}_{\beta } = {\mathbf{Q}}_{{\mathbf{y}}} {\overline{\mathbf{Q}}}_{{{\mathbf{v}}_{\beta } }}^{ - 1} {\mathbf{M}}_{\beta } \\ {\mathbf{M}}_{\alpha } & = {\mathbf{I}}_{n} - {\mathbf{A}}({\mathbf{A}}^{T} {\overline{\mathbf{Q}}}_{{{\mathbf{v}}_{\alpha } }}^{ - 1} {\mathbf{A}})^{ - 1} {\mathbf{A}}^{T} {\overline{\mathbf{Q}}}_{{{\mathbf{v}}_{\alpha } }}^{ - 1} , \\ {\mathbf{M}}_{\beta } & = {\mathbf{I}}_{n} - {\mathbf{A}}({\mathbf{A}}^{T} {\overline{\mathbf{Q}}}_{{{\mathbf{v}}_{\beta } }}^{ - 1} {\mathbf{A}})^{ - 1} {\mathbf{A}}^{T} {\overline{\mathbf{Q}}}_{{{\mathbf{v}}_{\beta } }}^{ - 1} \\ \end{aligned} $$
(9)

and \({\overline{\mathbf{Q}}}_{{{\mathbf{v}}_{\alpha } }} = [{\mathbf{W}}_{\alpha } ({\mathbf{v}}_{\beta } )]^{ - 1} {\mathbf{Q}}_{{\mathbf{y}}}\), \({\overline{\mathbf{Q}}}_{{{\mathbf{v}}_{\beta } }} = [{\mathbf{W}}_{\beta } ({\mathbf{v}}_{\alpha } )]^{ - 1} {\mathbf{Q}}_{{\mathbf{y}}}\), \({\mathbf{I}}_{n}\) denotes an \(n \times n\) identity matrix (\({\text{Tr}}\)-matrix trace).
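For completeness, Eqs. (8)–(9) translate directly into a few lines of linear algebra. The sketch below (the helper name sigma0_sq_alpha and its argument list are illustrative assumptions) evaluates the alpha-version estimator; the beta-version follows by exchanging the roles of the two residual vectors and weight matrices:

```python
import numpy as np

def sigma0_sq_alpha(A, v_a, W_a, Q_y):
    """Variance coefficient estimator of Eq. (8) for the alpha version.
    v_a is the residual vector, W_a = W_alpha(v_beta) the cross-weight matrix of Eq. (5)."""
    Qv_bar_inv = np.linalg.solve(Q_y, W_a)       # (Q-bar_{v_alpha})^{-1} = Q_y^{-1} W_alpha
    M = np.eye(len(v_a)) - A @ np.linalg.solve(A.T @ Qv_bar_inv @ A,
                                               A.T @ Qv_bar_inv)    # M_alpha, Eq. (9)
    N = Q_y @ Qv_bar_inv @ M                                        # N_alpha, Eq. (9)
    return (v_a @ Qv_bar_inv @ Q_y @ Qv_bar_inv @ v_a) / np.trace(N.T @ N)
```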

2.2 Weighted TLS method

The total least-squares method is applied where the classical model \({\mathbf{y}} = {\mathbf{AX}} + {\mathbf{v}}\) is replaced by the EIV model of the following form:

$$ {\mathbf{y}} = ({\mathbf{A}} - {\mathbf{E}}){\mathbf{X}} + {{\varvec{\upupsilon}}} = {\mathbf{AX}} + {{\varvec{\upupsilon}}} - {\mathbf{EX}} $$
(10)

where \({\mathbf{E}}\) is an \(n \times m\) random matrix corresponding to the matrix \({\mathbf{A}}\) being observed. The random vector corresponding to observation vector y in the EIV model was denoted as \({{\varvec{\upupsilon}}}\). If we assume that \({\mathbf{v}} = {\mathbf{y}} - {\mathbf{AX}}\) is the error vector in the classical functional model, then \({{\varvec{\upupsilon}}} = {\mathbf{v}} + {\mathbf{EX}}\). It should be considered that \({\mathbf{EX}} = ({\mathbf{X}}^{T} \otimes {\mathbf{I}}_{n} ){\mathbf{e}}\), where \({\mathbf{e}} = {\text{vec}}({\mathbf{E}})\) is a vector formed from successive columns of matrix \({\mathbf{E}}\) (\(\otimes\)—the Kronecker product, \({\mathbf{I}}_{n}\)—an identity matrix of dimensions \(n \times n\)). Moreover, in order to simplify further notation, additional designations \({\mathbf{X}}_{ \otimes } = {\mathbf{X}} \otimes {\mathbf{I}}_{n} = ({\mathbf{X}}^{T} \otimes {\mathbf{I}}_{n} )^{T}\), \({\mathbf{X}}_{ \otimes }^{T} = {\mathbf{X}}^{T} \otimes {\mathbf{I}}_{n} = ({\mathbf{X}} \otimes {\mathbf{I}}_{n} )^{T}\) are introduced. Model (10) can then be expressed in the following form:

$$ {\mathbf{y}} = {\mathbf{AX}} + {{\varvec{\upupsilon}}} - {\mathbf{EX}} = {\mathbf{AX}} + {{\varvec{\upupsilon}}} - ({\mathbf{X}}^{T} \otimes {\mathbf{I}}_{n} ){\mathbf{e}} = {\mathbf{AX}} + {{\varvec{\upupsilon}}} - {\mathbf{X}}_{ \otimes }^{T} {\mathbf{e}} $$
(11)

In the WTLS method, in addition to the stochastic model of the observation vector \({\mathbf{C}}_{{\mathbf{y}}} = \sigma_{0}^{2} {\mathbf{Q}}_{{\mathbf{y}}}\), a stochastic model of \({\mathbf{e}}\) vector is also adopted. In the simplest case, it can be assumed that such a model is the expression \({\mathbf{C}}_{{\mathbf{e}}} = \sigma_{0}^{2} {\mathbf{Q}}_{{\mathbf{e}}}\), where \({\mathbf{Q}}_{{\mathbf{e}}}\) is the known cofactor matrix. However, there are examples in which not all columns of matrix \({\mathbf{A}}\) are affected by random disturbances (e.g. in the linear regression analysis). In that case, matrix \({\mathbf{Q}}_{{\mathbf{e}}}\) can be subject to appropriate decomposition \({\mathbf{Q}}_{{\mathbf{e}}} = {\mathbf{Q}}_{0} \otimes {\mathbf{Q}}_{{\mathbf{x}}}\), where \({\mathbf{Q}}_{{\mathbf{x}}}\) denotes a nonnegative definite diagonal matrix of size \(n \times n\) (Schaffrin and Wieser 2008; Shen et al. 2011).
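The Kronecker-product relations used in Eqs. (10)–(11) and the decomposition of \({\mathbf{Q}}_{{\mathbf{e}}}\) can be checked numerically. The short sketch below (with arbitrary toy dimensions; it is an illustration, not part of the method) verifies that \({\mathbf{EX}} = ({\mathbf{X}}^{T} \otimes {\mathbf{I}}_{n} ){\mathbf{e}}\) with \({\mathbf{e}} = {\text{vec}}({\mathbf{E}})\) and builds \({\mathbf{Q}}_{{\mathbf{e}}} = {\mathbf{Q}}_{0} \otimes {\mathbf{Q}}_{{\mathbf{x}}}\) for a regression-type design in which the first column of \({\mathbf{A}}\) is error-free:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 2
E = rng.standard_normal((n, m))               # random errors of the design matrix
X = rng.standard_normal(m)

e = E.flatten(order="F")                      # e = vec(E): columns of E stacked
lhs = E @ X
rhs = np.kron(X[None, :], np.eye(n)) @ e      # (X^T ⊗ I_n) e
print(np.allclose(lhs, rhs))                  # True: EX = (X^T ⊗ I_n) vec(E)

# Q_e = Q_0 ⊗ Q_x for a regression-type design: the first column of A
# (e.g. a column of ones) is error-free, only the second column is observed.
Q_x = np.eye(n)
Q_0 = np.diag([0.0, 1.0])
Q_e = np.kron(Q_0, Q_x)                       # = Diag(0, Q_x)
Q_e_plus = np.linalg.pinv(Q_e)                # Moore–Penrose inverse = Diag(0, Q_x^{-1})
```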

In the Lagrange approach applied in Schaffrin and Wieser (2008) and Lv and Sui (2020), the WTLS method is based on the following objective function:

$$ \varphi ({{\varvec{\upupsilon}}},{\mathbf{e}},{{\varvec{\uplambda}}},{\mathbf{X}}) = {{\varvec{\upupsilon}}}^{T} {\mathbf{Q}}_{{\mathbf{y}}}^{ - 1} {{\varvec{\upupsilon}}} + {\mathbf{e}}^{T} {\mathbf{Q}}_{{\mathbf{e}}}^{ + } {\mathbf{e}} - 2{{\varvec{\uplambda}}}^{T} ({\mathbf{y}} - {\mathbf{AX}} - {{\varvec{\upupsilon}}} + {\mathbf{X}}_{ \otimes }^{T} {\mathbf{e}}) $$
(12)

where \({{\varvec{\uplambda}}}\) denotes an \(n \times 1\) vector of “Lagrange multipliers”. Matrix \({\mathbf{Q}}_{{\mathbf{e}}}^{ + }\) is the Moore–Penrose inverse of \({\mathbf{Q}}_{{\mathbf{e}}}\). For example, when \({\mathbf{Q}}_{0} = {\text{Diag}}(0,1)\), and hence \({\mathbf{Q}}_{{\mathbf{e}}} = {\text{Diag}}({\mathbf{0}},{\mathbf{Q}}_{{\mathbf{x}}} )\), and \({\mathbf{Q}}_{{\mathbf{x}}}\) is regular, then \({\mathbf{Q}}_{{\mathbf{e}}}^{ + } = {\text{Diag}}({\mathbf{0}},{\mathbf{Q}}_{{\mathbf{x}}}^{ - 1} )\) (e.g. Rao 1973; Felus 2004). The iteration procedures that solve the optimisation problem \(\varphi ({{\varvec{\upupsilon}}},{\mathbf{e}},{{\varvec{\uplambda}}},{\mathbf{X}}) \to \min\) with the nonlinear conditions \({\mathbf{y}} - {\mathbf{AX}} - {{\varvec{\upupsilon}}} + {\mathbf{X}}_{ \otimes }^{T} {\mathbf{e}} = {\mathbf{0}}\) are, in general, complicated. The application of a similar objective function in Msplit estimation would generate numerical procedures and computational algorithms of even greater complexity. From the perspective of computational process optimisation, however, the approach adopted in Shen et al. (2011), also applied in Wang et al. (2016), is particularly interesting for the purposes of this study. The iterative method proposed in that study is based on the Newton–Gauss algorithm of nonlinear LS adjustment, proposed by Pope (1974). In this method, the EIV model (11) is replaced, in the j-th iteration, by a linear approximation of the following form:

$$ \begin{aligned} {\mathbf{y}} & = {\mathbf{AX}} + {{\varvec{\upupsilon}}} - {\mathbf{EX}} \\ & = {\mathbf{AX}}^{j} + {\mathbf{A}}^{j} \delta {\mathbf{X}} + {{\varvec{\upupsilon}}} - {\mathbf{EX}}^{j} \\ & = {\mathbf{AX}}^{j} + {\mathbf{A}}^{j} \delta {\mathbf{X}} + {{\varvec{\upupsilon}}} - ({\mathbf{X}}_{ \otimes }^{j} )^{T} {\mathbf{e}} \\ \end{aligned} $$
(13)

where \({\mathbf{A}}^{j} = {\mathbf{A}} - {\tilde{\mathbf{E}}}^{j}\), \({\mathbf{X}} = {\mathbf{X}}^{j} + \delta {\mathbf{X}}\) and \({\mathbf{X}}_{ \otimes }^{j} = {\mathbf{X}}^{j} \otimes {\mathbf{I}}_{n}\). \({\tilde{\mathbf{E}}}^{j}\) is the residual matrix built on the basis of the residual vector \({\tilde{\mathbf{e}}}^{j}\) determined in the j-th iteration. The vector \(\delta {\mathbf{X}}\) is a small quantity to be determined in the iteration. After taking into account the condition \({\mathbf{y}} - {\mathbf{AX}}^{j} - {\mathbf{A}}^{j} \delta {\mathbf{X}} - {{\varvec{\upupsilon}}} + ({\mathbf{X}}_{ \otimes }^{j} )^{T} {\mathbf{e}} = {\mathbf{0}}\) resulting from model (13), the objective function (12) can be rewritten in the following form (Shen et al. 2011):

$$ \varphi ({{\varvec{\upupsilon}}},{\mathbf{e}},{{\varvec{\uplambda}}},\delta {\mathbf{X}}) = {{\varvec{\upupsilon}}}^{T} {\mathbf{Q}}_{{\mathbf{y}}}^{ - 1} {{\varvec{\upupsilon}}} + {\mathbf{e}}^{T} {\mathbf{Q}}_{{\mathbf{e}}}^{ + } {\mathbf{e}} - 2{{\varvec{\uplambda}}}^{T} \left( {{\mathbf{y}} - {\mathbf{AX}}^{j} - {\mathbf{A}}^{j} \delta {\mathbf{X}} - {{\varvec{\upupsilon}}} + ({\mathbf{X}}_{ \otimes }^{j} )^{T} {\mathbf{e}}} \right) $$
(14)

The minimum of this function is obtained through satisfying the following Euler–Lagrange necessary conditions (Shen et al. 2011)

$$ \begin{aligned} & \left. {\frac{1}{2}\frac{\partial \varphi }{{\partial {{\varvec{\upupsilon}}}}}} \right|_{{{{\varvec{\upupsilon}}} = {\tilde{\boldsymbol{\upupsilon}}},{\mathbf{e}} = {\tilde{\mathbf{e}}},\delta {\mathbf{X}} = \delta {\hat{\mathbf{X}}},{{\varvec{\uplambda}}} = {\hat{\boldsymbol{\lambda}}}}} = {\mathbf{Q}}_{{\mathbf{y}}}^{ - 1} {\tilde{\boldsymbol{\upupsilon}}} + {\hat{\boldsymbol{\lambda}}} = {\mathbf{0}} \\ & \left. {\frac{1}{2}\frac{\partial \varphi }{{\partial {\mathbf{e}}}}} \right|_{{{{\varvec{\upupsilon}}} = {\tilde{\boldsymbol{\upupsilon}}},{\mathbf{e}} = {\tilde{\mathbf{e}}},\delta {\mathbf{X}} = \delta {\hat{\mathbf{X}}},{{\varvec{\uplambda}}} = {\hat{\boldsymbol{\lambda}}}}} = {\mathbf{Q}}_{{\mathbf{e}}}^{ + } {\tilde{\mathbf{e}}} - {\mathbf{X}}_{ \otimes }^{j} {\hat{\boldsymbol{\lambda}}} = {\mathbf{0}} \\ & \left. {\frac{1}{2}\frac{\partial \varphi }{{\partial \delta {\mathbf{X}}}}} \right|_{{{{\varvec{\upupsilon}}} = {\tilde{\boldsymbol{\upupsilon}}},{\mathbf{e}} = {\tilde{\mathbf{e}}},\delta {\mathbf{X}} = \delta {\hat{\mathbf{X}}},{{\varvec{\uplambda}}} = {\hat{\boldsymbol{\lambda}}}}} = ({\mathbf{A}}^{j} )^{T} {\hat{\boldsymbol{\lambda}}} = {\mathbf{0}} \\ & \left. {\frac{1}{2}\frac{\partial \varphi }{{\partial {{\varvec{\uplambda}}}}}} \right|_{{{{\varvec{\upupsilon}}} = {\tilde{\boldsymbol{\upupsilon}}},{\mathbf{e}} = {\tilde{\mathbf{e}}},\delta {\mathbf{X}} = \delta {\hat{\mathbf{X}}},{{\varvec{\uplambda}}} = {\hat{\boldsymbol{\lambda}}}}} = {\mathbf{y}} - {\mathbf{AX}}^{j} - {\mathbf{A}}^{j} \delta {\hat{\mathbf{X}}} - {\tilde{\boldsymbol{\upupsilon}}} + ({\mathbf{X}}_{ \otimes }^{j} )^{T} {\tilde{\mathbf{e}}} = {\mathbf{0}} \\ \end{aligned} $$
(15)

The solutions to the equations contained in Eq. (15) are the following quantities:

$$ \begin{aligned} {\hat{\boldsymbol{\lambda}}} & = - ({\mathbf{Q}}_{l}^{j} )^{ - 1} ({\mathbf{y}} - {\mathbf{AX}}^{j} - {\mathbf{A}}^{j} \delta {\hat{\mathbf{X}}}) \\ \delta {\hat{\mathbf{X}}}^{j + 1} & = \left[ {({\mathbf{A}}^{j} )^{T} ({\mathbf{Q}}_{l}^{j} )^{ - 1} {\mathbf{A}}^{j} } \right]^{ - 1} ({\mathbf{A}}^{j} )^{T} ({\mathbf{Q}}_{l}^{j} )^{ - 1} ({\mathbf{y}} - {\mathbf{AX}}^{j} ) \\ {\mathbf{X}}^{j + 1} & = {\hat{\mathbf{X}}}^{j} + \delta {\hat{\mathbf{X}}}^{j + 1} \\ & = \left[ {({\mathbf{A}}^{j} )^{T} ({\mathbf{Q}}_{l}^{j} )^{ - 1} {\mathbf{A}}^{j} } \right]^{ - 1} ({\mathbf{A}}^{j} )^{T} ({\mathbf{Q}}_{l}^{j} )^{ - 1} ({\mathbf{y}} - {\mathbf{E}}^{j} {\mathbf{X}}^{j} ) \\ \end{aligned} $$
(16)

and

$$ \begin{aligned} & {\tilde{\boldsymbol{\upupsilon}}}^{j + 1} = {\mathbf{Q}}_{{\mathbf{y}}} ({\mathbf{Q}}_{l}^{j} )^{ - 1} ({\mathbf{y}} - {\mathbf{A}}\hat{\mathbf{X}}^{j} - {\mathbf{A}}^{j} \delta {\hat{\mathbf{X}}}^{j + 1} ) \\ & {\tilde{\mathbf{e}}}^{j + 1} = - {\mathbf{Q}}_{{\mathbf{e}}} {\mathbf{X}}_{ \otimes }^{j} ({\mathbf{Q}}_{l}^{j} )^{ - 1} ({\mathbf{y}} - {\mathbf{A}}\hat{\mathbf{X}}^{j} - {\mathbf{A}}^{j} \delta {\hat{\mathbf{X}}}^{j + 1} ) \\ \end{aligned} $$
(17)

where

$$ {\mathbf{Q}}_{l}^{j} = {\mathbf{Q}}_{{\mathbf{y}}} + ({\mathbf{X}}_{ \otimes }^{j} )^{T} {\mathbf{Q}}_{{\mathbf{e}}} {\mathbf{X}}_{ \otimes }^{j} $$
(18)

(\({\tilde{\boldsymbol{\upupsilon}}}\)—residual vector corresponding to the observation vector y). Shen et al. (2011) also apply the iterative process on the assumption that \({\mathbf{E}}^{j} \delta {\mathbf{X}}\) is a negligible quantity. Then, \({\mathbf{A}}^{j} \delta {\mathbf{X}} = ({\mathbf{A}} - {\mathbf{E}}^{j} )\delta {\mathbf{X}} = {\mathbf{A}}\delta {\mathbf{X}}\) and \({\mathbf{y}} = {\mathbf{AX}}^{j} + {\mathbf{A}}\delta {\mathbf{X}} + {{\varvec{\upupsilon}}} - {\mathbf{EX}}^{j}\).
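To make the above scheme concrete, a minimal sketch of the iteration of Eqs. (16)–(18) is given below (Python/NumPy; the function name wtls, the zero starting matrix for \({\tilde{\mathbf{E}}}\) and the convergence tolerance are assumptions of this illustration, not a reproduction of the cited algorithms):

```python
import numpy as np

def wtls(A, y, Q_y, Q_e, tol=1e-12, max_iter=100):
    """Linearised WTLS iteration in the spirit of Shen et al. (2011), Eqs. (13), (16)-(18)."""
    n, m = A.shape
    W = np.linalg.inv(Q_y)
    X = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)        # LS start
    E = np.zeros((n, m))                                 # residual matrix, starts at zero
    for _ in range(max_iter):
        Aj = A - E                                       # A^j = A - E~^j
        Xk = np.kron(X[None, :], np.eye(n)).T            # X_kron = X ⊗ I_n
        Ql_inv = np.linalg.inv(Q_y + Xk.T @ Q_e @ Xk)    # (Q_l^j)^{-1}, Eq. (18)
        dX = np.linalg.solve(Aj.T @ Ql_inv @ Aj,
                             Aj.T @ Ql_inv @ (y - A @ X))            # Eq. (16)
        lam = -Ql_inv @ (y - A @ X - Aj @ dX)                        # Eq. (16)
        e = Q_e @ Xk @ lam                               # stationarity in e, cf. Eq. (17)
        E = e.reshape((n, m), order="F")                 # E~^{j+1} rebuilt from e~ = vec(E~)
        X = X + dX
        if np.linalg.norm(dX) < tol:
            break
    return X, E
```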

The estimation of variance coefficient \(\sigma_{0}^{2}\), common to stochastic models \({\mathbf{C}}_{{\mathbf{y}}} = \sigma_{0}^{2} {\mathbf{Q}}_{{\mathbf{y}}}\) and \({\mathbf{C}}_{{\mathbf{e}}} = \sigma_{0}^{2} {\mathbf{Q}}_{{\mathbf{e}}}\), is also of interest in WTLS. Schaffrin and Wieser (2008) proposed a biased estimator in the following form:

$$ \hat{\sigma }_{0}^{2} = ({\mathbf{y}} - {\mathbf{A}}\hat{\mathbf{X}})^{T} {\mathbf{Q}}_{l}^{ - 1} ({\mathbf{y}} - {\mathbf{A}}\hat{\mathbf{X}})/(n - m) $$
(19)

A correction of the estimator (19) by means of introducing a bias term \(\delta b\) into it was presented by Shen et al. (2011). More complex EIV stochastic models are also under consideration. For example, Xu and Liu (2014) introduced a model containing variance components and proposed a way to estimate these components.

3 Total Msplit estimation

3.1 Theoretical foundations

Let it be assumed that according to the Msplit estimation rules, the EIV model (10) is split into two mutually competing models

$$ \begin{aligned} {\mathbf{y}} & = ({\mathbf{A}} - {\mathbf{E}}){\mathbf{X}}_{\alpha } + {{\varvec{\upupsilon}}}_{\alpha } = {\mathbf{AX}}_{\alpha } + {{\varvec{\upupsilon}}}_{\alpha } - {\mathbf{EX}}_{\alpha } \\ {\mathbf{y}} & = ({\mathbf{A}} - {\mathbf{E}}){\mathbf{X}}_{\beta } + {{\varvec{\upupsilon}}}_{\beta } = {\mathbf{AX}}_{\beta } + {{\varvec{\upupsilon}}}_{\beta } - {\mathbf{EX}}_{\beta } \\ \end{aligned} $$
(20)

By applying the approach used in Shen et al. (2011) and Wang et al. (2016), see Eq. (13), the above models in the j-th iteration will be replaced with the following linear approximations:

$$ \begin{aligned} {\mathbf{y}} & = {\mathbf{AX}}_{\alpha }^{j} + {\mathbf{A}}^{j} \delta {\mathbf{X}}_{\alpha } + {{\varvec{\upupsilon}}}_{\alpha } - {\mathbf{EX}}_{\alpha }^{j} \\ & = {\mathbf{AX}}_{\alpha }^{j} + {\mathbf{A}}^{j} \delta {\mathbf{X}}_{\alpha } + {{\varvec{\upupsilon}}}_{\alpha } - ({\mathbf{X}}_{ \otimes \alpha }^{j} )^{T} {\mathbf{e}} \\ {\mathbf{y}} & = {\mathbf{AX}}_{\beta }^{j} + {\mathbf{A}}^{j} \delta {\mathbf{X}}_{\beta } + {{\varvec{\upupsilon}}}_{\beta } - {\mathbf{EX}}_{\beta }^{j} \\ & = {\mathbf{AX}}_{\beta }^{j} + {\mathbf{A}}^{j} \delta {\mathbf{X}}_{\beta } + {{\varvec{\upupsilon}}}_{\beta } - ({\mathbf{X}}_{ \otimes \beta }^{j} )^{T} {\mathbf{e}} \\ \end{aligned} $$
(21)

where \({\mathbf{X}}_{\alpha } = {\mathbf{X}}_{\alpha }^{j} + \delta {\mathbf{X}}_{\alpha }\), \({\mathbf{X}}_{\beta } = {\mathbf{X}}_{\beta }^{j} + \delta {\mathbf{X}}_{\beta }\) and \({\mathbf{X}}_{ \otimes \alpha }^{j} = {\mathbf{X}}_{\alpha }^{j} \otimes {\mathbf{I}}_{n}\), \({\mathbf{X}}_{ \otimes \beta }^{j} = {\mathbf{X}}_{\beta }^{j} \otimes {\mathbf{I}}_{n}\). After determining the quantities \({\mathbf{E}}^{j}\), \({\mathbf{X}}_{\alpha }^{j}\) and \({\mathbf{X}}_{\beta }^{j}\), the models that are valid in the j-th iteration, contained in Eq. (20), take the following forms:

$$ \begin{aligned} {\mathbf{y}} & = ({\mathbf{A}} - {\mathbf{E}}^{j} ){\mathbf{X}}_{\alpha }^{j} + {{\varvec{\upupsilon}}}_{\alpha }^{j} = {\mathbf{A}}^{j} {\mathbf{X}}_{\alpha }^{j} + {{\varvec{\upupsilon}}}_{\alpha }^{j} \\ {\mathbf{y}} & = ({\mathbf{A}} - {\mathbf{E}}^{j} ){\mathbf{X}}_{\beta }^{j} + {{\varvec{\upupsilon}}}_{\beta }^{j} = {\mathbf{A}}^{j} {\mathbf{X}}_{\beta }^{j} + {{\varvec{\upupsilon}}}_{\beta }^{j} \\ \end{aligned} $$
(22)

Total Msplit estimators of parameters \({\mathbf{X}}_{\alpha }\) and \({\mathbf{X}}_{\beta }\) are quantities \({\hat{\mathbf{X}}}_{\alpha }\) and \({\hat{\mathbf{X}}}_{\beta }\) which minimise the following objective function:

$$ \varphi ({\mathbf{X}}_{\alpha } ,{\mathbf{X}}_{\beta } ) = {{\varvec{\upupsilon}}}_{\alpha }^{T} {\mathbf{W}}_{\alpha } ({{\varvec{\upupsilon}}}_{\beta } ){{\varvec{\upupsilon}}}_{\alpha } + {\mathbf{e}}^{T} {\mathbf{Q}}_{{\mathbf{e}}}^{ + } {\mathbf{e}} = {{\varvec{\upupsilon}}}_{\beta }^{T} {\mathbf{W}}_{\beta } ({{\varvec{\upupsilon}}}_{\alpha } ){{\varvec{\upupsilon}}}_{\beta } + {\mathbf{e}}^{T} {\mathbf{Q}}_{{\mathbf{e}}}^{ + } {\mathbf{e}} $$
(23)

This function [like the functions in Eqs. (3) and (4)] is not designed to make the method robust and also does not “predict” outliers among the observations or the elements of matrix A. In view of the following conditions resulting from Eq. (21)

$$ \begin{aligned} {\mathbf{y}} - {\mathbf{AX}}_{\alpha }^{j} - {\mathbf{A}}^{j} \delta {\mathbf{X}}_{\alpha } - {{\varvec{\upupsilon}}}_{\alpha } + ({\mathbf{X}}_{ \otimes \alpha }^{j} )^{T} {\mathbf{e}} & = {\mathbf{0}} \\ {\mathbf{y}} - {\mathbf{AX}}_{\beta }^{j} - {\mathbf{A}}^{j} \delta {\mathbf{X}}_{\beta } - {{\varvec{\upupsilon}}}_{\beta } + ({\mathbf{X}}_{ \otimes \beta }^{j} )^{T} {\mathbf{e}} & = {\mathbf{0}} \\ \end{aligned} $$
(24)

the original objective function (23) will be extended to the following form:

$$ \begin{aligned} & \varphi ({{\varvec{\upupsilon}}}_{\alpha } ,{{\varvec{\upupsilon}}}_{\beta } ,{\mathbf{e}},{\mathbf{X}}_{\alpha } ,{\mathbf{X}}_{\beta } ,{{\varvec{\uplambda}}}_{\alpha } ,{{\varvec{\uplambda}}}_{\beta } ) \\ & \quad = {{\varvec{\upupsilon}}}_{\alpha }^{T} {\mathbf{W}}_{\alpha } ({{\varvec{\upupsilon}}}_{\beta } ){{\varvec{\upupsilon}}}_{\alpha } + {\mathbf{e}}^{T} {\mathbf{Q}}_{{\mathbf{e}}}^{ + } {\mathbf{e}} \\ & \qquad - 2{{\varvec{\uplambda}}}_{\alpha }^{T} \left( {{\mathbf{y}} - {\mathbf{AX}}_{\alpha }^{j} - {\mathbf{A}}^{j} \delta {\mathbf{X}}_{\alpha } - {{\varvec{\upupsilon}}}_{\alpha } + ({\mathbf{X}}_{ \otimes \alpha }^{j} )^{T} {\mathbf{e}}} \right) \\ & \qquad - 2{{\varvec{\uplambda}}}_{\beta }^{T} \left( {{\mathbf{y}} - {\mathbf{AX}}_{\beta }^{j} - {\mathbf{A}}^{j} \delta {\mathbf{X}}_{\beta } - {{\varvec{\upupsilon}}}_{\beta } + ({\mathbf{X}}_{ \otimes \beta }^{j} )^{T} {\mathbf{e}}} \right) \\ & \quad = {{\varvec{\upupsilon}}}_{\beta }^{T} {\mathbf{W}}_{\beta } ({{\varvec{\upupsilon}}}_{\alpha } ){{\varvec{\upupsilon}}}_{\beta } + {\mathbf{e}}^{T} {\mathbf{Q}}_{{\mathbf{e}}}^{ + } {\mathbf{e}} \\ & \qquad - 2{{\varvec{\uplambda}}}_{\alpha }^{T} \left( {{\mathbf{y}} - {\mathbf{AX}}_{\alpha }^{j} - {\mathbf{A}}^{j} \delta {\mathbf{X}}_{\alpha } - {{\varvec{\upupsilon}}}_{\alpha } + ({\mathbf{X}}_{ \otimes \alpha }^{j} )^{T} {\mathbf{e}}} \right) \\ & \qquad - 2{{\varvec{\uplambda}}}_{\beta }^{T} \left( {{\mathbf{y}} - {\mathbf{AX}}_{\beta }^{j} - {\mathbf{A}}^{j} \delta {\mathbf{X}}_{\beta } - {{\varvec{\upupsilon}}}_{\beta } + ({\mathbf{X}}_{ \otimes \beta }^{j} )^{T} {\mathbf{e}}} \right) \\ \end{aligned} $$
(25)

where \({{\varvec{\uplambda}}}_{\alpha }\) and \({{\varvec{\uplambda}}}_{\beta }\) are Lagrange multiplier vectors corresponding to conditions (24). It is established that the Euler–Lagrange necessary conditions have the following forms in the optimisation problem \(\varphi ({{\varvec{\upupsilon}}}_{\alpha } ,{{\varvec{\upupsilon}}}_{\beta } ,{\mathbf{e}},{\mathbf{X}}_{\alpha } ,{\mathbf{X}}_{\beta } ,{{\varvec{\uplambda}}}_{\alpha } ,{{\varvec{\uplambda}}}_{\beta } ) \to \min\):

$$ \left. {\frac{1}{2}\frac{\partial \varphi }{{\partial {{\varvec{\upupsilon}}}_{\alpha } }}} \right|_{\Omega } = {\mathbf{W}}_{\alpha } ({\tilde{\boldsymbol{\upupsilon}}}_{\beta } ){\tilde{\boldsymbol{\upupsilon}}}_{\alpha } + {\hat{\boldsymbol{\lambda}}}_{\alpha } = {\mathbf{0}} $$
(26a)
$$ \left. {\frac{1}{2}\frac{\partial \varphi }{{\partial {{\varvec{\upupsilon}}}_{\beta } }}} \right|_{\Omega } = {\mathbf{W}}_{\beta } ({\tilde{\boldsymbol{\upupsilon}}}_{\alpha } ){\tilde{\boldsymbol{\upupsilon}}}_{\beta } + {\hat{\boldsymbol{\lambda}}}_{\beta } = {\mathbf{0}} $$
(26b)
$$ \left. {\frac{1}{2}\frac{\partial \varphi }{{\partial {\mathbf{e}}}}} \right|_{\Omega } = {\mathbf{Q}}_{{\mathbf{e}}}^{ + } {\tilde{\mathbf{e}}} - {\mathbf{X}}_{ \otimes \alpha }^{j} {\hat{\boldsymbol{\lambda}}}_{\alpha } - {\mathbf{X}}_{ \otimes \beta }^{j} {\hat{\boldsymbol{\lambda}}}_{\beta } = {\mathbf{0}} $$
(26c)
$$ \left. {\frac{1}{2}\frac{\partial \varphi }{{\partial \delta {\mathbf{X}}_{\alpha } }}} \right|_{\Omega } = ({\mathbf{A}}^{j} )^{T} {\hat{\boldsymbol{\lambda}}}_{\alpha } = {\mathbf{0}} $$
(26d)
$$ \left. {\frac{1}{2}\frac{\partial \varphi }{{\partial \delta {\mathbf{X}}_{\beta } }}} \right|_{\Omega } = ({\mathbf{A}}^{j} )^{T} {\hat{\boldsymbol{\lambda}}}_{\beta } = {\mathbf{0}} $$
(26e)
$$ \left. {\frac{1}{2}\frac{\partial \varphi }{{\partial {{\varvec{\uplambda}}}_{\alpha } }}} \right|_{\Omega } = {\mathbf{y}} - {\mathbf{AX}}_{\alpha }^{j} - {\mathbf{A}}^{j} \delta {\hat{\mathbf{X}}}_{\alpha } - {\tilde{\boldsymbol{\upupsilon}}}_{\alpha } + ({\mathbf{X}}_{ \otimes \alpha }^{j} )^{T} {\tilde{\mathbf{e}}} = {\mathbf{0}} $$
(26f)
$$ \left. {\frac{1}{2}\frac{\partial \varphi }{{\partial {{\varvec{\uplambda}}}_{\beta } }}} \right|_{\Omega } = {\mathbf{y}} - {\mathbf{AX}}_{\beta }^{j} - {\mathbf{A}}^{j} \delta {\hat{\mathbf{X}}}_{\beta } - {\tilde{\boldsymbol{\upupsilon}}}_{\beta } + ({\mathbf{X}}_{ \otimes \beta }^{j} )^{T} {\tilde{\mathbf{e}}} = {\mathbf{0}} $$
(26g)

The set of simultaneous substitutions: \({{\varvec{\upupsilon}}}_{\alpha } = {\tilde{\boldsymbol{\upupsilon}}}_{\alpha }\), \({{\varvec{\upupsilon}}}_{\beta } = {\tilde{\boldsymbol{\upupsilon}}}_{\beta }\), \({\mathbf{e}} = {\tilde{\mathbf{e}}}\), \(\delta {\mathbf{X}}_{\alpha } = \delta {\hat{\mathbf{X}}}_{\alpha }\), \(\delta {\mathbf{X}}_{\beta } = \delta {\hat{\mathbf{X}}}_{\beta }\), \({{\varvec{\uplambda}}}_{\alpha } = {\hat{\boldsymbol{\lambda}}}_{\alpha }\), \({{\varvec{\uplambda}}}_{\beta } = {\hat{\boldsymbol{\lambda}}}_{\beta }\), introduced to simplify the notation, was denoted as \(\Omega\). Based on Eqs. (26a)–(26c), the following residual vectors are determined:

$$ \begin{aligned} {\tilde{\boldsymbol{\upupsilon}}}_{\alpha } & = - {\mathbf{W}}_{\alpha }^{ - 1} ({\tilde{\boldsymbol{\upupsilon}}}_{\beta } ){\hat{\boldsymbol{\lambda}}}_{\alpha } \\ {\tilde{\boldsymbol{\upupsilon}}}_{\beta } & = - {\mathbf{W}}_{\beta }^{ - 1} ({\tilde{\boldsymbol{\upupsilon}}}_{\alpha } ){\hat{\boldsymbol{\lambda}}}_{\beta } \\ {\tilde{\mathbf{e}}} & = {\mathbf{Q}}_{{\mathbf{e}}} ({\mathbf{X}}_{ \otimes \alpha }^{j} {\hat{\boldsymbol{\lambda}}}_{\alpha } + {\mathbf{X}}_{ \otimes \beta }^{j} {\hat{\boldsymbol{\lambda}}}_{\beta } ) \\ \end{aligned} $$
(27)

By substituting the quantities obtained above to Eqs. (26f) and (26g), a system of normal equations relating to vectors \({\hat{\boldsymbol{\lambda}}}_{\alpha }\) and \({\hat{\boldsymbol{\lambda}}}_{\beta }\) is obtained. The system has the following form:

$$ \begin{aligned} {{\varvec{\Theta}}}_{\alpha }^{j} {\hat{\boldsymbol{\lambda}}}_{\alpha } + {{\varvec{\Theta}}}_{\alpha \beta }^{j} {\hat{\boldsymbol{\lambda}}}_{\beta } & = - ({\mathbf{y}} - {\mathbf{AX}}_{\alpha }^{j} - {\mathbf{A}}^{j} \delta {\hat{\mathbf{X}}}_{\alpha } ) \\ {{\varvec{\Theta}}}_{\beta \alpha }^{j} {\hat{\boldsymbol{\lambda}}}_{\alpha } + {{\varvec{\Theta}}}_{\beta }^{j} {\hat{\boldsymbol{\lambda}}}_{\beta } & = - ({\mathbf{y}} - {\mathbf{AX}}_{\beta }^{j} - {\mathbf{A}}^{j} \delta {\hat{\mathbf{X}}}_{\beta } ) \\ \end{aligned} $$
(28)

where

$$ \begin{array}{*{20}l} {{{\varvec{\Theta}}}_{\alpha }^{j} = {\mathbf{W}}_{\alpha }^{ - 1} ({\tilde{\boldsymbol{\upupsilon}}}_{\beta } ) + ({\mathbf{X}}_{ \otimes \alpha }^{j} )^{T} {\mathbf{Q}}_{{\mathbf{e}}} {\mathbf{X}}_{ \otimes \alpha }^{j} ,} \hfill & {{{\varvec{\Theta}}}_{\alpha \beta }^{j} = ({\mathbf{X}}_{ \otimes \alpha }^{j} )^{T} {\mathbf{Q}}_{{\mathbf{e}}} {\mathbf{X}}_{ \otimes \beta }^{j} } \hfill \\ {{{\varvec{\Theta}}}_{\beta \alpha }^{j} = ({{\varvec{\Theta}}}_{\alpha \beta }^{j} )^{T} = ({\mathbf{X}}_{ \otimes \beta }^{j} )^{T} {\mathbf{Q}}_{{\mathbf{e}}} {\mathbf{X}}_{ \otimes \alpha }^{j} ,} \hfill & {{{\varvec{\Theta}}}_{\beta }^{j} = {\mathbf{W}}_{\beta }^{ - 1} ({\tilde{\boldsymbol{\upupsilon}}}_{\alpha } ) + ({\mathbf{X}}_{ \otimes \beta }^{j} )^{T} {\mathbf{Q}}_{{\mathbf{e}}} {\mathbf{X}}_{ \otimes \beta }^{j} } \hfill \\ \end{array} $$
(29)

After introducing block matrices

$$ \begin{aligned} {{\varvec{\Theta}}}^{j} & = \left[ {\begin{array}{*{20}l} {{{\varvec{\Theta}}}_{\alpha }^{j} } \hfill & {{{\varvec{\Theta}}}_{\alpha \beta }^{j} } \hfill \\ {{{\varvec{\Theta}}}_{\beta \alpha }^{j} } \hfill & {{{\varvec{\Theta}}}_{\beta }^{j} } \hfill \\ \end{array} } \right], \\ \mathop{\mathbf{A}}\limits^{\frown} & = {\mathbf{I}}_{2} \otimes {\mathbf{A}} = \left[ {\begin{array}{*{20}c} {\mathbf{A}} & {\mathbf{0}} \\ {\mathbf{0}} & {\mathbf{A}} \\ \end{array} } \right], \\ \mathop{\mathbf{y}}\limits^{\frown} & = {\mathbf{1}}_{2} \otimes {\mathbf{y}} = \left[ {\begin{array}{*{20}c} {\mathbf{y}} \\ {\mathbf{y}} \\ \end{array} } \right] \\ \end{aligned} $$
(30)

(\({\mathbf{1}}_{2} = [1,\;1]^{T}\)) and after including the mutually competing quantities being determined in combined vectors, i.e. having introduced vectors

$$ \begin{aligned} {{\varvec{\uplambda}}} & = \left[ {{{\varvec{\uplambda}}}_{\alpha }^{T} ,\;{{\varvec{\uplambda}}}_{\beta }^{T} } \right]^{T} ,\delta {\mathbf{X}} = \left[ {\delta {\mathbf{X}}_{\alpha }^{T} ,\;\delta {\mathbf{X}}_{\beta }^{T} } \right]^{T} , \\ {\mathbf{X}} & = \left[ {{\mathbf{X}}_{\alpha }^{T} ,\;{\mathbf{X}}_{\beta }^{T} } \right]^{T} \\ \end{aligned} $$
(31)

the system of Eqs. (28) can also be expressed as follows:

$$ {{\varvec{\Theta}}}^{j} {\hat{\boldsymbol{\lambda}}} = - (\mathop{\mathbf{y}}\limits^{\frown} - \mathop{\mathbf{A}}\limits^{\frown}\mathbf{X}^{j} - \mathop{\mathbf{A}}\limits^{\frown}{}^{j} \delta {\hat{\mathbf{X}}}) $$
(32)
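For orientation, the block matrix \({{\varvec{\Theta}}}^{j}\) of Eqs. (29)–(30) and the combined quantities entering Eq. (32) can be assembled as in the following sketch (the helper name build_combined_system and its argument list are illustrative assumptions):

```python
import numpy as np

def build_combined_system(A, E, y, W_a, W_b, Q_e, X_a, X_b):
    """Assemble Theta^j (Eqs. (29)-(30)) and the combined quantities of Eq. (30);
    W_a = W_alpha(upsilon_beta), W_b = W_beta(upsilon_alpha); X_a, X_b are the
    current parameter versions."""
    n = len(y)
    Xa_k = np.kron(X_a[None, :], np.eye(n)).T            # X_alpha ⊗ I_n
    Xb_k = np.kron(X_b[None, :], np.eye(n)).T            # X_beta ⊗ I_n
    T_a  = np.linalg.inv(W_a) + Xa_k.T @ Q_e @ Xa_k      # Theta_alpha^j
    T_ab = Xa_k.T @ Q_e @ Xb_k                           # Theta_alphabeta^j
    T_b  = np.linalg.inv(W_b) + Xb_k.T @ Q_e @ Xb_k      # Theta_beta^j
    Theta = np.block([[T_a, T_ab], [T_ab.T, T_b]])       # Theta^j, Eq. (30)
    A_big = np.kron(np.eye(2), A)                        # I_2 ⊗ A
    Aj_big = np.kron(np.eye(2), A - E)                   # I_2 ⊗ (A - E~^j)
    y_big = np.concatenate([y, y])                       # 1_2 ⊗ y
    return Theta, A_big, Aj_big, y_big
```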

The solution to Eq. (32) is the combined Lagrange multiplier vector of the following form:

$$ {\hat{\boldsymbol{\lambda}}} = \left[ {\begin{array}{*{20}c} {{\hat{\boldsymbol{\lambda}}}_{\alpha } } \\ {{\hat{\boldsymbol{\lambda}}}_{\beta } } \\ \end{array} } \right] = - ({{\varvec{\Theta}}}^{j} )^{ - 1} (\mathop{\mathbf{y}}\limits^{\frown} - \mathop{\mathbf{A}}\limits^{\frown}\mathbf{X}^{j} - \mathop{\mathbf{A}}\limits^{\frown}{}^{j} \delta {\hat{\mathbf{X}}}) $$
(33)

After substituting Eq. (33) to conditions (26d) and (26e), jointly recorded as

$$ \left. {\begin{array}{*{20}c} {({\mathbf{A}}^{j} )^{T} {\hat{\boldsymbol{\lambda}}}_{\alpha } = {\mathbf{0}}} \\ {({\mathbf{A}}^{j} )^{T} {\hat{\boldsymbol{\lambda}}}_{\beta } = {\mathbf{0}}} \\ \end{array} } \right\}\quad \Leftrightarrow ( \mathop{\mathbf{A}}\limits^{\frown}{}^{j})^{T} {\hat{\boldsymbol{\lambda}}} = {\mathbf{0}} $$
(34)

the following normal equation is obtained:

$$ ( \mathop{\mathbf{A}}\limits^{\frown}{}^{j})^{T} ({{\varvec{\Theta}}}^{j} )^{ - 1} (\mathop{\mathbf{y}}\limits^{\frown} - \mathop{\mathbf{A}}\limits^{\frown}\mathbf{X}^{j} - \mathop{\mathbf{A}}\limits^{\frown}{}^{j} \delta {\hat{\mathbf{X}}}) = {\mathbf{0}} $$
(35)

The solution to this equation is vector

$$ \delta {\hat{\mathbf{X}}}^{j + 1} = \left[ {\begin{array}{*{20}c}{\delta {\hat{\mathbf{X}}}_{\alpha }^{j + 1} } \\ {\delta{\hat{\mathbf{X}}}_{\beta }^{j + 1} } \\ \end{array} } \right] =\left( {(\mathop{\mathbf{A}}\limits^{\frown}{}^{j})^{T}({{\varvec{\Theta}}}^{j} )^{ -1}\mathop{\mathbf{A}}\limits^{\frown}{}^{j}} \right)^{ - 1} (\mathop{\mathbf{A}}\limits^{\frown}{}^{j})^{T}({{\varvec{\Theta}}}^{j} )^{ - 1} (\mathop{\mathbf{y}}\limits^{\frown} -\mathop{\mathbf{A}}\limits^{\frown}\mathbf{X}^{j} ) $$
(36)

which represents an evaluation of the combined vector of \(\delta {\mathbf{X}}\) increments in the (\(j + 1\)) iteration. In order to determine the combined vector \({\mathbf{X}} = {\mathbf{X}}^{j} + \delta {\mathbf{X}}\) that is valid in this iteration, it will be taken into account that \({\mathbf{A}} = {\mathbf{A}}^{j} + {\mathbf{E}}^{j}\), and thus, \(\mathop{\mathbf{A}}\limits^{\frown} = \mathop{\mathbf{A}}\limits^{\frown}{}^{j} + \mathop{\mathbf{E}}\limits^{\frown}{}^{j}\), where \(\mathop{\mathbf{E}}\limits^{\frown}{}^{j} = {\mathbf{I}}_{2} \otimes {\mathbf{E}}^{j}\). Equation (35) then takes the following form:

$$ (\mathop{\mathbf{A}}\limits^{\frown}{}^{j} )^{T} ({{\varvec{\Theta}}}^{j} )^{ - 1} (\mathop{\mathbf{y}}\limits^{\frown} - \mathop{\mathbf{A}}\limits^{\frown} {}^{j} {\mathbf{X}}^{j} - \mathop{\mathbf{E}}\limits^{\frown}{}^{j} {\mathbf{X}}^{j} - \mathop{\mathbf{A}}\limits^{\frown}{}^{j} \delta {\hat{\mathbf{X}}}) = {\mathbf{0}} $$
(37)

which yields the following:

$$ \begin{aligned} {\hat{\mathbf{X}}}^{j + 1} & = \left[ {\begin{array}{*{20}c} {{\hat{\mathbf{X}}}_{\alpha }^{j + 1} } \\ {{\hat{\mathbf{X}}}_{\beta }^{j + 1} } \\ \end{array} } \right] = {\hat{\mathbf{X}}}^{j} + \delta {\hat{\mathbf{X}}}^{j + 1} \\ & = \left( {({\mathbf{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{A} }}^{j} )^{T} ({{\varvec{\Theta}}}^{j} )^{ - 1} {\mathbf{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{A} }}^{j} } \right)^{ - 1} ({\mathbf{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{A} }}^{j} )^{T} ({{\varvec{\Theta}}}^{j} )^{ - 1} ({\mathbf{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{y} }} - {\mathbf{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{E} }}^{j} {\mathbf{X}}^{j} ) \\ \end{aligned} $$
(38)

Based on Eqs. (26a) and (26b), jointly recorded as

$$ \left. {\begin{array}{*{20}c} {{\tilde{\boldsymbol{\upupsilon}}}_{\alpha } = - {\mathbf{W}}_{\alpha }^{ - 1} ({\tilde{\boldsymbol{\upupsilon}}}_{\beta } ){\hat{\boldsymbol{\lambda}}}_{\alpha } } \\ {{\tilde{\boldsymbol{\upupsilon}}}_{\beta } = - {\mathbf{W}}_{\beta }^{ - 1} ({\tilde{\boldsymbol{\upupsilon}}}_{\alpha } ){\hat{\boldsymbol{\lambda}}}_{\beta } } \\ \end{array} } \right\}\quad \Leftrightarrow \quad {\tilde{\boldsymbol{\upupsilon}}} = - {\mathbf{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{W} }}^{ - 1} {\hat{\boldsymbol{\lambda}}} $$
(39)

the combined residual vector that is valid in the \((j + 1)\) iteration can be determined:

$$ {\tilde{\boldsymbol{\upupsilon}}}^{j + 1} = \left[ {\begin{array}{*{20}c} {{\tilde{\boldsymbol{\upupsilon}}}_{\alpha }^{j + 1} } \\ {{\tilde{\boldsymbol{\upupsilon}}}_{\beta }^{j + 1} } \\ \end{array} } \right] = {\mathbf{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{W} }}^{ - 1} ({\tilde{\boldsymbol{\upupsilon}}}_{\alpha }^{j} ,{\tilde{\boldsymbol{\upupsilon}}}_{\beta }^{j} )({{\varvec{\Theta}}}^{j} )^{ - 1} ({\mathbf{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{y} }} - {\mathbf{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{A} }}{\hat{\mathbf{X}}}^{j} - {\mathbf{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{A} }}^{j} \delta {\hat{\mathbf{X}}}^{j + 1} ) $$
(40)

where

$$ {\mathbf{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{W} }}^{ - 1} ({\tilde{\boldsymbol{\upupsilon}}}_{\alpha }^{j} ,{\tilde{\boldsymbol{\upupsilon}}}_{\beta }^{j} ) = {\text{Diag}}\left( {{\mathbf{W}}_{\alpha }^{ - 1} ({\tilde{\boldsymbol{\upupsilon}}}_{\beta }^{j} ),\;{\mathbf{W}}_{\beta }^{ - 1} ({\tilde{\boldsymbol{\upupsilon}}}_{\alpha }^{j} )} \right) $$
(41)

On the other hand, based on Eq. (26c), expressed in the following form:

$$ {\tilde{\mathbf{e}}} = {\mathbf{Q}}_{{\mathbf{e}}} ({\mathbf{X}}_{ \otimes \alpha }^{j} {\hat{\boldsymbol{\lambda}}}_{\alpha } + {\mathbf{X}}_{ \otimes \beta }^{j} {\hat{\boldsymbol{\lambda}}}_{\beta } ) = {\mathbf{Q}}_{{\mathbf{e}}} {\mathbf{X}}_{ \otimes }^{j} {\hat{\boldsymbol{\lambda}}} $$
(42)

the following residual vector is determined:

$$ {\tilde{\mathbf{e}}}^{j + 1} = - {\mathbf{Q}}_{{\mathbf{e}}} {\mathbf{X}}_{ \otimes }^{j} ({{\varvec{\Theta}}}^{j} )^{ - 1} ({\mathbf{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{y} }} -\mathop{\mathbf{A}}\limits^{\frown} \hat{\mathbf{X}}^{j} - {\mathbf{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{A} }}^{j} \delta {\hat{\mathbf{X}}}^{j + 1} ) $$
(43)

where \({\mathbf{X}}_{ \otimes }^{j} = \left[ {({\mathbf{X}}_{ \otimes \alpha }^{j} )^{T} ,({\mathbf{X}}_{ \otimes \beta }^{j} )^{T} } \right]^{T}\). Taking that vector, we can obtain \({\tilde{\mathbf{E}}}^{j + 1}\), and hence, the matrix \({\mathbf{A}}^{j + 1} = {\mathbf{A}} - {\tilde{\mathbf{E}}}^{j + 1}\).

Variance coefficient estimators in TMsplit estimation should be derived by referring to the split EIV models (e.g. by applying the theory presented in Wiśniewski and Zienkiewicz (2021b)). However, this problem, which requires additional detailed theoretical and empirical analyses, is beyond the scope of this paper. With minor random disturbances of matrix A, the variance coefficient estimators appropriate for Msplit estimation can also be recommended here. Then, in Eq. (8), the vectors \({\tilde{\mathbf{v}}}_{\alpha }\) and \({\tilde{\mathbf{v}}}_{\beta }\) should be replaced by the vectors \({\tilde{\boldsymbol{\upupsilon}}}_{\alpha } = {\mathbf{y}} - ({\mathbf{A}} - {\tilde{\mathbf{E}}}){\hat{\mathbf{X}}}_{\alpha }\) and \({\tilde{\boldsymbol{\upupsilon}}}_{\beta } = {\mathbf{y}} - ({\mathbf{A}} - {\tilde{\mathbf{E}}}){\hat{\mathbf{X}}}_{\beta }\). These estimators of variance coefficients remain unbiased; however, they lose their invariance as the values of \({\tilde{\mathbf{E}}}\hat{\mathbf{X}}_{\alpha }\) and \({\tilde{\mathbf{E}}}\hat{\mathbf{X}}_{\beta }\) grow.

3.2 Algorithm

The Total Msplit estimation algorithm contains the following basic elements: Step 0—a starting step; Step 1—iterative calculation of Msplit estimators for the valid split EIV models (with internal iterations \(l = 0, \ldots ,s\)); Step 2—updating the EIV models' parameters and returning to Step 1 (until the adopted criterion for stopping the iterative process is met, \(j = 0, \ldots ,k\)); Step 3—adopting the final values of the Total Msplit estimators. Each of these steps is described in more detail below.

Step 0: Similar to Msplit estimation, the iterative process can also be initiated here using the following classical least-squares (LS) estimators

$$ {\hat{\mathbf{X}}}_{LS} = ({\mathbf{A}}^{T} {\mathbf{WA}})^{ - 1} {\mathbf{A}}^{T} {\mathbf{Wy}}\quad {\text{and}}\quad {\tilde{\mathbf{v}}}_{LS} = {\mathbf{y}} - {\mathbf{A}}\hat{\mathbf{X}}_{LS} $$
(44)

where \({\mathbf{W}} = {\mathbf{Q}}_{{\mathbf{y}}}^{ - 1}\) is the weight matrix. Therefore, the following are adopted: \({\mathbf{X}}_{\alpha (0)}^{0} = {\hat{\mathbf{X}}}_{LS}\), \({\mathbf{X}}_{\beta (0)}^{0} = {\hat{\mathbf{X}}}_{LS}\), \({{\varvec{\upupsilon}}}_{\alpha (0)}^{0} = {\tilde{\mathbf{v}}}_{LS}\), \({{\varvec{\upupsilon}}}_{\beta (0)}^{0} = {\tilde{\mathbf{v}}}_{LS}\). Moreover, \({\mathbf{E}}^{0} = {\mathbf{0}}\), \({\mathbf{A}}^{0} = {\mathbf{A}}\) and \(\delta {\hat{\mathbf{X}}}^{0} = {\mathbf{0}}\).
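Step 0 thus amounts to a single weighted LS solve; a minimal sketch (the function name step0 is an illustrative assumption) is:

```python
import numpy as np

def step0(A, y, Q_y):
    """Step 0: classical weighted LS start of Eq. (44); both parameter versions
    and both residual vectors are initialised with the same LS quantities."""
    W = np.linalg.inv(Q_y)                               # W = Q_y^{-1}
    X_ls = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)
    v_ls = y - A @ X_ls
    E0 = np.zeros_like(A)                                # E^0 = 0, so A^0 = A
    return X_ls, v_ls, E0
```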

Step 1: Calculate Msplit estimators \({\hat{\mathbf{X}}}_{\alpha }^{j}\) and \({\hat{\mathbf{X}}}_{\beta }^{j}\):

step \(1_{(1)}\): Based on the valid vector \({{\varvec{\upupsilon}}}_{\beta (l)}^{j}\), the following weight matrix is constructed

$$ {\mathbf{W}}_{\alpha } ({{\varvec{\upupsilon}}}_{\beta (l)}^{j} ) = {\text{Diag}}\left( {\left( {{{\varvec{\upupsilon}}}_{1\beta (l)}^{j} } \right)^{2} q_{1}^{ - 2} , \ldots ,\left( {{{\varvec{\upupsilon}}}_{n\beta (l)}^{j} } \right)^{2} q_{n}^{ - 2} } \right) $$
(45)

and the following is calculated:

$$ \begin{aligned} {\mathbf{X}}_{\alpha (l + 1)}^{j} & = \left( {{\mathbf{A}}^{T} {\mathbf{W}}_{\alpha } ({{\varvec{\upupsilon}}}_{\beta (l)}^{j} ){\mathbf{A}}} \right)^{ - 1} {\mathbf{A}}^{T} {\mathbf{W}}_{\alpha } ({{\varvec{\upupsilon}}}_{\beta (l)}^{j} ){\mathbf{y}} \\ {{\varvec{\upupsilon}}}_{\alpha (l + 1)}^{j} & = {\mathbf{y}} - {\mathbf{A}}^{j} {\mathbf{X}}_{\alpha (l + 1)}^{j} \\ \end{aligned} $$
(46)

step \(1_{(2)}\): Based on the vector \({{\varvec{\upupsilon}}}_{\alpha (l + 1)}^{j}\), the following weight matrix is constructed

$$ {\mathbf{W}}_{\beta } ({{\varvec{\upupsilon}}}_{\alpha (l + 1)}^{j} ) = {\text{Diag}}\left( {\left( {{{\varvec{\upupsilon}}}_{1\alpha (l + 1)}^{j} } \right)^{2} q_{1}^{ - 2} , \ldots ,\left( {{{\varvec{\upupsilon}}}_{n\alpha (l + 1)}^{j} } \right)^{2} q_{n}^{ - 2} } \right) $$
(47)

and the following is calculated:

$$ \begin{aligned} {\mathbf{X}}_{\beta (l + 1)}^{j} & = \left( {{\mathbf{A}}^{T} {\mathbf{W}}_{\beta } ({{\varvec{\upupsilon}}}_{\alpha (l + 1)}^{j} ){\mathbf{A}}} \right)^{ - 1} {\mathbf{A}}^{T} {\mathbf{W}}_{\beta } ({{\varvec{\upupsilon}}}_{\alpha (l + 1)}^{j} ){\mathbf{y}} \\ {{\varvec{\upupsilon}}}_{\beta (l + 1)}^{j} & = {\mathbf{y}} - {\mathbf{A}}^{j} {\mathbf{X}}_{\beta (l + 1)}^{j} \\ \end{aligned} $$
(48)

step \(1_{(3)}\): Repeat steps \(1_{(1)}\) and \(1_{(2)}\) until

$$ \begin{gathered} \left\| {{\mathbf{X}}_{\alpha (l + 1)}^{j} - {\mathbf{X}}_{\alpha (l)}^{j} } \right\| < \varepsilon_{0} \quad {\text{and}} \hfill \\ \left\| {{\mathbf{X}}_{\beta (l + 1)}^{j} - {\mathbf{X}}_{\beta (l)}^{j} } \right\| < \varepsilon_{0} \quad ({\text{for a given}}\,\varepsilon_{0} ) \hfill \\ \end{gathered} $$
(49)

Once criterion (49) has been satisfied, the following Msplit estimators, valid in the j-th iteration,

$$ {\hat{\mathbf{X}}}_{\alpha }^{j} = {\mathbf{X}}_{\alpha (l + 1)}^{j} ,{\hat{\mathbf{X}}}_{\beta }^{j} = {\mathbf{X}}_{\beta (l + 1)}^{j} $$
(50)

and residual vectors

$$ {\tilde{\boldsymbol{\upupsilon}}}_{\alpha }^{j} = {\mathbf{y}} - {\mathbf{A}}^{j} {\hat{\mathbf{X}}}_{\alpha }^{j} ,{\tilde{\boldsymbol{\upupsilon}}}_{\beta }^{j} = {\mathbf{y}} - {\mathbf{A}}^{j} {\hat{\mathbf{X}}}_{\beta }^{j} $$
(51)

are adopted.
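Steps \(1_{(1)}\)–\(1_{(3)}\) can be sketched as the inner loop below, a hedged numpy illustration of Eqs. (45)–(51); the function name, the convergence defaults and the interface are assumptions made for this sketch.

```python
import numpy as np

def msplit_inner(A_j, y, q, v_alpha, v_beta, eps0=1e-8, max_iter=100):
    """Step 1: inner Msplit iteration for the currently valid EIV model.

    A_j             : current design matrix of the j-th external iteration (A for j = 0)
    y               : observation vector
    q               : vector of q_i used in the weights w = v**2 / q**2
                      (in Example 1, q_i = sigma_y**2, since q_i**-2 = sigma_y**-4)
    v_alpha, v_beta : starting residual vectors (from Step 0 or the previous step)
    """
    X_alpha = X_beta = None
    for _ in range(max_iter):
        # step 1(1): weights built from the competing residuals v_beta, Eq. (45)
        W_alpha = np.diag(v_beta**2 / q**2)
        X_alpha_new = np.linalg.solve(A_j.T @ W_alpha @ A_j, A_j.T @ W_alpha @ y)
        v_alpha = y - A_j @ X_alpha_new                        # Eq. (46)

        # step 1(2): weights built from the residuals v_alpha, Eq. (47)
        W_beta = np.diag(v_alpha**2 / q**2)
        X_beta_new = np.linalg.solve(A_j.T @ W_beta @ A_j, A_j.T @ W_beta @ y)
        v_beta = y - A_j @ X_beta_new                          # Eq. (48)

        # step 1(3): stopping criterion, Eq. (49)
        if (X_alpha is not None
                and np.linalg.norm(X_alpha_new - X_alpha) < eps0
                and np.linalg.norm(X_beta_new - X_beta) < eps0):
            X_alpha, X_beta = X_alpha_new, X_beta_new
            break
        X_alpha, X_beta = X_alpha_new, X_beta_new

    # Eqs. (50)-(51): adopted estimators and residual vectors of the j-th iteration
    return X_alpha, X_beta, y - A_j @ X_alpha, y - A_j @ X_beta
```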

Step 2: For the determined iterative Msplit estimators \({\hat{\mathbf{X}}}_{\alpha }^{j}\), \({\hat{\mathbf{X}}}_{\beta }^{j}\), and residual vectors \({\tilde{\boldsymbol{\upupsilon}}}_{\alpha }^{j}\), \({\tilde{\boldsymbol{\upupsilon}}}_{\beta }^{j}\), weight matrices that are valid in the j-th iteration are determined:

$$ \begin{aligned} {\mathbf{W}}_{\alpha } ({\tilde{\boldsymbol{\upupsilon}}}_{\beta }^{j} ) & = {\text{Diag}}\left( {\left( {{\tilde{\boldsymbol{\upupsilon}}}_{1\beta }^{j} } \right)^{2} q_{1}^{ - 2} , \ldots ,\left( {{\tilde{\boldsymbol{\upupsilon}}}_{n\beta }^{j} } \right)^{2} q_{n}^{ - 2} } \right) \\ {\mathbf{W}}_{\beta } ({\tilde{\boldsymbol{\upupsilon}}}_{\alpha }^{j} ) & = {\text{Diag}}\left( {\left( {{\tilde{\boldsymbol{\upupsilon}}}_{1\alpha }^{j} } \right)^{2} q_{1}^{ - 2} , \ldots ,\left( {{\tilde{\boldsymbol{\upupsilon}}}_{n\alpha }^{j} } \right)^{2} q_{n}^{ - 2} } \right) \\ \end{aligned} $$
(52)

Based on these matrices and taking \({\hat{\mathbf{X}}}_{ \otimes \alpha }^{j} = {\hat{\mathbf{X}}}_{\alpha }^{j} \otimes {\mathbf{I}}_{n}\), \({\hat{\mathbf{X}}}_{ \otimes \beta }^{j} = {\hat{\mathbf{X}}}_{\beta }^{j} \otimes {\mathbf{I}}_{n}\), we can determine matrix \({{\varvec{\Theta}}}^{j}\), Eq. (30); then, the combined vector of increments is calculated:

$$ \delta {\hat{\mathbf{X}}}^{j + 1} = \left( {(\overset{\frown}{\mathbf{A}} - \overset{\frown}{\mathbf{E}}{}^{j} )^{T} ({{\varvec{\Theta}}}^{j} )^{ - 1} (\overset{\frown}{\mathbf{A}} - \overset{\frown}{\mathbf{E}}{}^{j} )} \right)^{ - 1} (\overset{\frown}{\mathbf{A}} - \overset{\frown}{\mathbf{E}}{}^{j} )^{T} ({{\varvec{\Theta}}}^{j} )^{ - 1} (\overset{\frown}{\mathbf{y}} - \overset{\frown}{\mathbf{A}}{\hat{\mathbf{X}}}^{j} ) $$
(53)

This vector enables the calculation of the Lagrange multiplier vector valid in the (\(j + 1\))-th iteration

$$ {\hat{\boldsymbol{\lambda}}}^{j + 1} = - ({{\varvec{\Theta}}}^{j} )^{ - 1} \left( {\overset{\frown}{\mathbf{y}} - \overset{\frown}{\mathbf{A}}{\hat{\mathbf{X}}}^{j} - (\overset{\frown}{\mathbf{A}} - \overset{\frown}{\mathbf{E}}{}^{j} )\delta {\hat{\mathbf{X}}}^{j + 1} } \right) $$
(54)

and then the calculation of combined residual vectors

$$ {\tilde{\boldsymbol{\upupsilon}}}^{j + 1} = \overset{\frown}{\mathbf{W}}{}^{ - 1} ({\tilde{\boldsymbol{\upupsilon}}}_{\alpha }^{j} ,{\tilde{\boldsymbol{\upupsilon}}}_{\beta }^{j} )({{\varvec{\Theta}}}^{j} )^{ - 1} \left( {\overset{\frown}{\mathbf{y}} - \overset{\frown}{\mathbf{A}}{\hat{\mathbf{X}}}^{j} - (\overset{\frown}{\mathbf{A}} - \overset{\frown}{\mathbf{E}}{}^{j} )\delta {\hat{\mathbf{X}}}^{j + 1} } \right) $$
(55)

and

$$ {\tilde{\mathbf{e}}}^{j + 1} = - {\mathbf{Q}}_{{\mathbf{e}}} {\hat{\mathbf{X}}}_{ \otimes }^{j} ({{\varvec{\Theta}}}^{j} )^{ - 1} \left( {\overset{\frown}{\mathbf{y}} - \overset{\frown}{\mathbf{A}}{\hat{\mathbf{X}}}^{j} - (\overset{\frown}{\mathbf{A}} - \overset{\frown}{\mathbf{E}}{}^{j} )\delta {\hat{\mathbf{X}}}^{j + 1} } \right) $$
(56)

Based on vector \({\tilde{\mathbf{e}}}^{j + 1}\), the matrix of disturbances \({\tilde{\mathbf{E}}}^{j + 1}\) is built and the parameter vector valid in the (\(j + 1\))-th iteration is calculated:

$$ {\hat{\mathbf{X}}}^{j + 1} = \left( {(\overset{\frown}{\mathbf{A}} - \overset{\frown}{\mathbf{E}}{}^{j} )^{T} ({{\varvec{\Theta}}}^{j} )^{ - 1} (\overset{\frown}{\mathbf{A}} - \overset{\frown}{\mathbf{E}}{}^{j} )} \right)^{ - 1} (\overset{\frown}{\mathbf{A}} - \overset{\frown}{\mathbf{E}}{}^{j} )^{T} ({{\varvec{\Theta}}}^{j} )^{ - 1} (\overset{\frown}{\mathbf{y}} - \overset{\frown}{\mathbf{E}}{}^{j} {\hat{\mathbf{X}}}^{j} ) $$
(57)

Step 3: Repeat Step 1 and Step 2 until

$$ \left\| {{\hat{\mathbf{X}}}^{j + 1} - {\hat{\mathbf{X}}}^{j} } \right\| < \varepsilon_{0} $$
(58)

Once criterion (58) has been satisfied, the TMsplit estimator of the combined parameter vector \({\mathbf{X}} = \left[ {{\mathbf{X}}_{\alpha }^{T} ,\;{\mathbf{X}}_{\beta }^{T} } \right]^{T}\) is assumed to be \({\hat{\mathbf{X}}} = {\hat{\mathbf{X}}}^{j + 1}\). The blocks of this vector are the TMsplit estimators \({\hat{\mathbf{X}}}_{\alpha }\) and \({\hat{\mathbf{X}}}_{\beta }\), which can be extracted from \({\hat{\mathbf{X}}}\) using the relationships

$$ {\hat{\mathbf{X}}}_{\alpha } = {\mathbf{D}}_{{X_{\alpha } }} {\hat{\mathbf{X}}}\quad {\text{and}}\quad {\hat{\mathbf{X}}}_{\beta } = {\mathbf{D}}_{{X_{\beta } }} {\hat{\mathbf{X}}} $$
(59)

where \({\mathbf{D}}_{{X_{\alpha } }} = \left[ {{\mathbf{I}}_{m} ,\;{\mathbf{0}}_{m,m} } \right]\) and \({\mathbf{D}}_{{X_{\beta } }} = \left[ {{\mathbf{0}}_{m,m} ,{\mathbf{I}}_{m} \;} \right]\)(\({\mathbf{0}}_{m,m}\)—zero matrix with dimensions of \(m \times m\)). Once the iterative process is complete, the final residual vectors \({\tilde{\boldsymbol{\upupsilon}}} = {\tilde{\boldsymbol{\upupsilon}}}^{k}\) and \({\tilde{\mathbf{e}}} = {\tilde{\mathbf{e}}}^{k}\) are also determined. Based on vector \({\tilde{\boldsymbol{\upupsilon}}} = [{\tilde{\boldsymbol{\upupsilon}}}_{\alpha }^{T} ,{\tilde{\boldsymbol{\upupsilon}}}_{\beta }^{T} ]^{T}\), two versions of the residual vector corresponding to the observation vector y, i.e.

$$ {\tilde{\boldsymbol{\upupsilon}}}_{\alpha } = {\mathbf{D}}_{{\upsilon_{\alpha } }} {\tilde{\boldsymbol{\upupsilon}}}\quad {\text{and}}\quad {\tilde{\boldsymbol{\upupsilon}}}_{\beta } = {\mathbf{D}}_{{\upsilon_{\beta } }} {\tilde{\boldsymbol{\upupsilon}}} $$
(60)

where \({\mathbf{D}}_{{\upsilon_{\alpha } }} = \left[ {{\mathbf{I}}_{n} ,\;{\mathbf{0}}_{n,n} } \right]\) and \({\mathbf{D}}_{{\upsilon_{\beta } }} = \left[ {{\mathbf{0}}_{n,n} ,{\mathbf{I}}_{n} \;} \right]\), are obtained. On the other hand, vector \({\tilde{\mathbf{e}}}\) provides the basis for the construction of residual matrix \({\tilde{\mathbf{E}}}\) corresponding to matrix A, hence also \({\tilde{\mathbf{A}}} = {\mathbf{A}} - {\tilde{\mathbf{E}}}\).
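In code, the extraction in Eqs. (59)–(60) amounts to simple block slicing; the short numpy sketch below (with assumed example dimensions and placeholder vectors) makes the role of the selection matrices explicit.

```python
import numpy as np

m, n = 2, 10                                      # example dimensions (assumed)
D_Xa = np.hstack([np.eye(m), np.zeros((m, m))])   # D_{X_alpha}, Eq. (59)
D_Xb = np.hstack([np.zeros((m, m)), np.eye(m)])   # D_{X_beta}
D_va = np.hstack([np.eye(n), np.zeros((n, n))])   # D_{v_alpha}, Eq. (60)
D_vb = np.hstack([np.zeros((n, n)), np.eye(n)])   # D_{v_beta}

X_hat = np.arange(2.0 * m)                        # placeholder combined parameter vector
v_tilde = np.arange(2.0 * n)                      # placeholder combined residual vector

X_alpha, X_beta = D_Xa @ X_hat, D_Xb @ X_hat       # equivalent to X_hat[:m], X_hat[m:]
v_alpha, v_beta = D_va @ v_tilde, D_vb @ v_tilde   # equivalent to v_tilde[:n], v_tilde[n:]
```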

The above-described iterative process of determining TMsplit estimates is also summarised in the flowchart presented in Fig. 1.

Fig. 1

Flowchart of Total Msplit estimation

4 Examples

4.1 Example 1: competitive models of systematic errors

In one of the examples provided in a study by Wiśniewski (2010), it was assumed that \(y_{i}\), \(i = 1, \ldots ,n\) were observations of a certain value of \(Y\) disturbed not only with random errors \(v_{i}\) but also with systematic errors \(s_{i} = s(t_{i} ) = a + bt_{i}\) (e.g. Wiśniewski 1985; Kubáčková and Kubáček 1991; Yang and Zhang 2005). The problem, however, is that two versions of this model can be used: \(s_{\alpha } (t_{i} ) = a_{\alpha } + b_{\alpha } t_{i}\) and \(s_{\beta } (t_{i} ) = a_{\beta } + b_{\beta } t_{i}\), whereas it is not known which of them concerns specific observation \(y_{i}\). For this reason, in Msplit estimation, the classical observation model

$$ y_{i} = Y + s(t_{i} ) + v_{i} = (Y + a) + bt_{i} + v_{i} = X + bt_{i} + v_{i} $$
(61)

is split into the following models:

$$ \begin{aligned} y_{i} & = (Y + a_{\alpha } ) + b_{\alpha } t_{i} + v_{i\alpha } = X_{\alpha } + b_{\alpha } t_{i} + v_{i\alpha } \\ y_{i} & = (Y + a_{\beta } ) + b_{\beta } t_{i} + v_{i\beta } = X_{\beta } + b_{\beta } t_{i} + v_{i\beta } \\ \end{aligned} $$
(62)

where \(X = Y + a\). In these models, two mutually competing versions of the parameters occur, namely \(X_{\alpha } = Y + a_{\alpha }\) and \(X_{\beta } = Y + a_{\beta }\). Observations were simulated under the assumption of theoretical values of the parameters \(X_{\alpha } ,\;b_{\alpha }\) and \(X_{\beta } ,\;b_{\beta }\). Theoretical observations \(\overline{y}_{i}\), \(i = 1, \ldots ,10\), were affected by Gaussian errors with an expected value of 0 and a standard deviation of \(\sigma_{y}\). For the theoretical values \(X_{\alpha } = 6.0\), \(b_{\alpha } = 0.5\), \(X_{\beta } = 3.0\), \(b_{\beta } = 1.0\), and the standard deviation \(\sigma_{y} = 0.14\), the set of observations presented in Table 1 was obtained. The table also presents the Msplit estimates \(\hat{X}_{\alpha }\), \(\hat{b}_{\alpha }\), \(\hat{X}_{\beta }\), \(\hat{b}_{\beta }\), the mutually competing residuals \(\hat{v}_{i\alpha }\), \(\hat{v}_{i\beta }\), and the weights related to such residuals, \(w_{\alpha } (\hat{v}_{i\beta } ) = \hat{v}_{i\beta }^{2} q_{i}^{ - 2}\) and \(w_{\beta } (\hat{v}_{i\alpha } ) = \hat{v}_{i\alpha }^{2} q_{i}^{ - 2}\) (for \(q_{i}^{ - 2} = \sigma_{y}^{ - 4}\)). A graphical illustration of the set of observations and a graphical interpretation of the obtained results (as compared to the LS-estimators determined using model (61)) are shown in Fig. 2. For the sake of clarity, the figure shows only the competitive residuals of the observation \(y_{10}\). In this figure, the mutually competing results of Msplit estimation are conventionally denoted as Msplit(α) and Msplit(β) (when convenient, these notations will also be used further on in this paper).
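For illustration, a minimal numpy sketch of how such a mixed observation set can be simulated is given below; the values of \(t_i\) and the assignment of observations to the α and β models are assumptions made for this sketch, not taken from Table 1.

```python
import numpy as np

rng = np.random.default_rng(1)

# theoretical parameters and noise level from the example
X_a, b_a = 6.0, 0.5        # alpha version of the split model, Eq. (62)
X_b, b_b = 3.0, 1.0        # beta version
sigma_y = 0.14

# assumed design of the simulation: t_i = 1,...,10, the first five observations
# follow the alpha model and the remaining five the beta model
t = np.arange(1.0, 11.0)
y_bar = np.where(t <= 5, X_a + b_a * t, X_b + b_b * t)   # theoretical observations
y = y_bar + sigma_y * rng.standard_normal(t.size)        # simulated observed values
```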

Table 1 Observed data and results of Msplit estimation (Wiśniewski 2010)
Fig. 2

Set of observations and results of Msplit estimation (as compared to the results of LS-estimation); the example competitive residuals \(\hat{v}_{10,\alpha }\) and \(\hat{v}_{10,\beta }\) are shown for the observation \(y_{10} = 11.2\)

The models contained in Eq. (62) will now be replaced with EIV models of the following form:

$$ \begin{aligned} y_{i} & = X_{\alpha } + b_{\alpha } (t_{i} - e_{{t_{i} }} ) + \upsilon_{i\alpha } \\ y_{i} & = X_{\beta } + b_{\beta } (t_{i} - e_{{t_{i} }} ) + \upsilon_{i\beta } \\ \end{aligned} $$
(63)

where \(e_{{t_{i} }}\) is a random error affecting the variable \(t_{i}\). For n observations, based on Eq. (63), the models \({\mathbf{y}} = ({\mathbf{A}} - {\mathbf{E}}){\mathbf{X}}_{\alpha } + {{\varvec{\upupsilon}}}_{\alpha }\) and \({\mathbf{y}} = ({\mathbf{A}} - {\mathbf{E}}){\mathbf{X}}_{\beta } + {{\varvec{\upupsilon}}}_{\beta }\) are constructed, where

$$ \begin{aligned} {\mathbf{A}} & = \left[ {\begin{array}{*{20}l} {1_{1} } \hfill & {t_{1} } \hfill \\ \vdots \hfill & \vdots \hfill \\ {1_{n} } \hfill & {t_{n} } \hfill \\ \end{array} } \right], \\ {\mathbf{E}} & = \left[ {\begin{array}{*{20}l} {0_{1} } \hfill & {e_{{t_{1} }} } \hfill \\ \vdots \hfill & \vdots \hfill \\ {0_{n} } \hfill & {e_{{t_{n} }} } \hfill \\ \end{array} } \right] = \left[ {{\mathbf{0}}_{n} ,\;{\mathbf{e}}_{t} } \right], \\ {\mathbf{X}}_{\alpha } & = \left[ {\begin{array}{*{20}c} {X_{\alpha } } \\ {b_{\alpha } } \\ \end{array} } \right],{\mathbf{X}}_{\beta } = \left[ {\begin{array}{*{20}c} {X_{\beta } } \\ {b_{\beta } } \\ \end{array} } \right] \\ \end{aligned} $$
(64)

(the first column of matrix A is not random). Vector \({\mathbf{e}} = {\text{vec}}({\mathbf{E}})\), built from the columns of matrix E, has the following form:

$$ {\mathbf{e}} = \left[ {0_{1} , \ldots ,0_{n} ,e_{{t_{1} }} , \ldots ,e_{{t_{n} }} } \right]^{T} = \left[ {{\mathbf{0}}_{n}^{T} ,\;{\mathbf{e}}_{t}^{T} } \right]^{T} $$
(65)

In view of the structure of this vector, the cofactor matrix \({\mathbf{Q}}_{{\mathbf{e}}}\), similarly as in Schaffrin and Wieser (2008) and Shen et al. (2011), is expressed in the following form:

$$ {\mathbf{Q}}_{{\mathbf{e}}} = {\mathbf{Q}}_{0} \otimes {\mathbf{Q}}_{{\mathbf{x}}} = {\mathbf{Q}}_{0} \otimes {\mathbf{Q}}_{{{\mathbf{e}}_{t} }} \quad {\text{with}}\quad {\mathbf{Q}}_{0} = \left[ {\begin{array}{*{20}c} 0 & 0 \\ 0 & 1 \\ \end{array} } \right] $$
(66)

where \({\mathbf{Q}}_{{{\mathbf{e}}_{t} }}\) is the cofactor matrix of vector \({\mathbf{e}}_{t}\). Note that here \({\mathbf{Q}}_{{{\mathbf{e}}_{t} }}\) is regular, but \({\mathbf{Q}}_{0}\) is not; thus, \({\mathbf{Q}}_{{\mathbf{e}}}\) is not regular either (a similar situation occurs in the example given by Schaffrin and Wieser 2008). Random errors \(e_{{t_{i} }}\) are simulated as Gaussian quantities with an expected value of 0 and a standard deviation of \(\sigma_{e}\). Msplit and TMsplit estimators of the model (63) parameters will be determined for four values of this standard deviation: \(\sigma_{e} = 0\) (variant I), \(\sigma_{e} = 0.13\) (variant II), \(\sigma_{e} = 0.28\) (variant III), \(\sigma_{e} = 0.37\) (variant IV). In each of these variants, the observations \(y_{i}\), \(i = 1, \ldots ,10\), are as in the example cited above (Table 1). The data adopted for the calculations are listed in Table 2, while the Msplit and TMsplit estimators obtained for these data are presented in Table 3. Additionally, the residuals \(\tilde{e}_{{t_{i} }}\) are presented in Table 4. Table 5 shows the competitive residuals and the respective weights for variant III.
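A numpy sketch of the structures in Eqs. (64)–(66) is given below; the values of \(t_i\) and the choice of \({\mathbf{Q}}_{{{\mathbf{e}}_{t} }}\) (taken here simply as \(\sigma_{e}^{2}\mathbf{I}\)) are assumptions made only for the illustration.

```python
import numpy as np

n = 10
t = np.arange(1.0, n + 1.0)                      # assumed values of t_i
sigma_e = 0.28                                   # variant III

A = np.column_stack([np.ones(n), t])             # Eq. (64): rows [1, t_i]
rng = np.random.default_rng(2)
e_t = sigma_e * rng.standard_normal(n)           # simulated disturbances of the t-column
E = np.column_stack([np.zeros(n), e_t])          # only the second column of A is random

e = E.flatten(order="F")                         # e = vec(E), Eq. (65)

Q0 = np.array([[0.0, 0.0],
               [0.0, 1.0]])
Q_et = sigma_e**2 * np.eye(n)                    # cofactor matrix of e_t (assumed here)
Q_e = np.kron(Q0, Q_et)                          # Eq. (66); singular, like Q0
```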

Table 2 Observed data for Total Msplit estimation (variant I, II, III, IV)
Table 3 Results of Msplit and Total Msplit estimation (variant I, II, III, IV)
Table 4 Residuals \(\tilde{e}_{t}\) of the second column of matrix A
Table 5 Residuals and weights in Msplit and Total Msplit estimations (variant III, \(\sigma_{e} = 0.28\))

In variant I, since matrix A is not affected by random disturbances, the TMsplit estimators are equal to the Msplit estimators. With an increase in the standard deviation \(\sigma_{e}\), the differences between these estimators increase, with the TMsplit estimators remaining close to the theoretical values (as do the Msplit estimators for the variant without random errors in matrix A, \(\sigma_{e} = 0\)). This is well illustrated by the norm values of the vector of differences between the vector of theoretical parameters \({\mathbf{X}} = [{\mathbf{X}}_{\alpha }^{T} ,{\mathbf{X}}_{\beta }^{T} ]^{T}\) and the vector of obtained estimators \({\hat{\mathbf{X}}} = [{\hat{\mathbf{X}}}_{\alpha }^{T} ,{\hat{\mathbf{X}}}_{\beta }^{T} ]^{T}\), provided in the last row of Table 3.

In general, the iterative process involved in the determination of TMsplit estimators ended after 4 to 6 steps of the “external” iteration. In each of these steps, 6 to 7 “internal” iterations were carried out, resulting in the Msplit estimators valid for that step. The course of the iterative process in Total Msplit estimation, based on the example of variant II, is presented in Fig. 3.

Fig. 3

Iterative process resulting in TMsplit estimators of parameters \(X_{\alpha } ,\;b_{\alpha }\) and parameters \(X_{\beta } ,\;b_{\beta }\) competing in relation to them (j—the number of “external” iterations, l—the number of “internal” iterations)

Each of the examples given above applies a single observation set. The additional analyses are based on Monte Carlo (MC) simulations; their main objective is to determine the empirical accuracy of Total Msplit estimation and the measures of its efficacy.

The accuracy of Msplit estimation can be determined by applying the asymptotic covariance matrices proposed in Wiśniewski and Zienkiewicz (2021b); their diagonal elements allow us to compute the estimated standard deviations \(\sigma_{{\hat{X}_{\alpha } }}\), \(\sigma_{{\hat{b}_{\alpha } }}\), \(\sigma_{{\hat{X}_{\beta } }}\), \(\sigma_{{\hat{b}_{\beta } }}\) of the respective Msplit estimates. Applying such an approach to Total Msplit estimation would require developing the theory presented in the paper mentioned, which is beyond the scope of the present paper. An empirical assessment based on simulated observation sets can serve as an alternative to the analytical one. Total Msplit estimates \(\hat{X}_{\alpha }^{k}\), \(\hat{b}_{\alpha }^{k}\), \(\hat{X}_{\beta }^{k}\), \(\hat{b}_{\beta }^{k}\) are computed for each simulation \(k = 1, \ldots ,N\), which is the basis for determining the following MC-estimates of the parameters \(X_{\alpha }\), \(b_{\alpha }\), \(X_{\beta }\), \(b_{\beta }\):

$$ \begin{gathered} \hat{X}_{\alpha }^{MC} = \frac{1}{N}\sum\limits_{k = 1}^{N} {\hat{X}_{\alpha }^{k} } ,\quad \hat{b}_{\alpha }^{MC} = \frac{1}{N}\sum\limits_{k = 1}^{N} {\hat{b}_{\alpha }^{k} } , \hfill \\ \hat{X}_{\beta }^{MC} = \frac{1}{N}\sum\limits_{k = 1}^{N} {\hat{X}_{\beta }^{k} } ,\quad \hat{b}_{\beta }^{MC} = \frac{1}{N}\sum\limits_{k = 1}^{N} {\hat{b}_{\beta }^{k} } \hfill \\ \end{gathered} $$
(67)

The Monte Carlo estimators of the parameter standard deviations can be computed in the following way (e.g. Koch 2013; Nowel 2016; Lv and Sui 2020):

$$ \begin{aligned} \hat{\sigma }_{{\hat{X}_{\alpha } }}^{MC} & = \sqrt {\frac{1}{N}\sum\limits_{k = 1}^{N} {(\hat{X}_{\alpha }^{k} - \hat{X}_{\alpha }^{MC} )^{2} } } , \\ \hat{\sigma }_{{\hat{b}_{\alpha } }}^{MC} & = \sqrt {\frac{1}{N}\sum\limits_{k = 1}^{N} {(\hat{b}_{\alpha }^{k} - \hat{b}_{\alpha }^{MC} )^{2} } } \\ \hat{\sigma }_{{\hat{X}_{\beta } }}^{MC} & = \sqrt {\frac{1}{N}\sum\limits_{k = 1}^{N} {(\hat{X}_{\beta }^{k} - \hat{X}_{\beta }^{MC} )^{2} } } , \\ \hat{\sigma }_{{\hat{b}_{\beta } }}^{MC} & = \sqrt {\frac{1}{N}\sum\limits_{k = 1}^{N} {(\hat{b}_{\beta }^{k} - \hat{b}_{\beta }^{MC} )^{2} } } \\ \end{aligned} $$
(68)

Such quantities determine the accuracy of the estimates obtained. They can also be used to compare the stability of Msplit and Total Msplit estimators.

Msplit estimation and its several developments (including Total Msplit estimation) focus on optimally fitting the competing functional models to the observation set. The efficacy of Msplit estimation, like that of other methods, can be described by the differences between the parameter estimates obtained and the actual parameter values. Considering N simulations, the efficacy is usually measured by the root mean squared error (RMSE) (e.g. Kargoll et al. 2018; Lv and Sui 2020). Here, the efficacy of Msplit or Total Msplit estimates is determined by the following RMSEs:

$$ \begin{aligned} {\text{RMSE}}_{{\hat{X}_{\alpha } }} & = \sqrt {\frac{1}{N}\sum\limits_{k = 1}^{N} {(\hat{X}_{\alpha }^{k} - X_{\alpha } )^{2} } } , \\ {\text{RMSE}}_{{\hat{b}_{\alpha } }} & = \sqrt {\frac{1}{N}\sum\limits_{k = 1}^{N} {(\hat{b}_{\alpha }^{k} - b_{\alpha } )^{2} } } \\ {\text{RMSE}}_{{\hat{X}_{\beta } }} & = \sqrt {\frac{1}{N}\sum\limits_{k = 1}^{N} {(\hat{X}_{\beta }^{k} - X_{\beta } )^{2} } } , \\ {\text{RMSE}}_{{\hat{b}_{\beta } }} & = \sqrt {\frac{1}{N}\sum\limits_{k = 1}^{N} {(\hat{b}_{\beta }^{k} - b_{\beta } )^{2} } } \\ \end{aligned} $$
(69)

Additionally, the global root mean squared error, concerning the whole parameter vector, is determined (e.g. Wiśniewski 2014)

$$ {\text{RMSE}}_{{{\hat{\mathbf{X}}}}} = \sqrt {\frac{1}{N}\sum\limits_{k = 1}^{N} {({\hat{\mathbf{X}}}^{k} - {\mathbf{X}})^{T} ({\hat{\mathbf{X}}}^{k} - {\mathbf{X}})/r} } $$
(70)

where r is the number of estimated parameters (here \(r = 4\)).
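A compact numpy sketch of the empirical measures in Eqs. (67)–(70) is given below; the function name and interface are assumptions made for this illustration.

```python
import numpy as np

def mc_accuracy_efficacy(X_hat_all, X_true):
    """Empirical MC measures, Eqs. (67)-(70).

    X_hat_all : (N, r) array of estimates [X_a, b_a, X_b, b_b] from N simulations
    X_true    : (r,) vector of theoretical parameter values
    """
    X_mc = X_hat_all.mean(axis=0)                                  # MC-estimates, Eq. (67)
    sigma_mc = np.sqrt(((X_hat_all - X_mc)**2).mean(axis=0))       # empirical std devs, Eq. (68)
    rmse = np.sqrt(((X_hat_all - X_true)**2).mean(axis=0))         # per-parameter RMSEs, Eq. (69)
    diff = X_hat_all - X_true
    r = X_true.size
    rmse_global = np.sqrt((diff * diff).sum(axis=1).mean() / r)    # global RMSE, Eq. (70)
    return X_mc, sigma_mc, rmse, rmse_global
```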

The simulated errors \(v_{i}^{k}\), \(e_{t,i}^{k}\), \(i = 1, \ldots ,10\) (for each \(k = 1, \ldots ,N\)) affect the theoretical observations \(\overline{y}_{i}\) or the elements \(a_{i,2}\) of the second column of matrix A (the theoretical observations and the theoretical matrix A remain unchanged). The simulations are performed by applying the Gaussian random number generator of the MATLAB system, i.e. \(\sigma_{y} \,{\text{randn}}(n,1)\) or \(\sigma_{e} \,{\text{randn}}(n,1)\), respectively.

First, let us examine the efficacy of Msplit estimates, which apply the models in Eq. (62). Since matrix A is constant, only observation errors are simulated. The computations are carried out for several variants of the observation standard deviation, i.e. \(\sigma_{y} = 0.05,\;\;0.1,\;\;0.2,\;\;0.3\). Table 6 presents the results obtained for \(N = 3000\) and the theoretical parameter values \({\mathbf{X}} = [X_{\alpha } ,\;b_{\alpha } ,\;X_{\beta } ,\;b_{\beta } ]^{T} =\) \([6.0,\;0.5,\;3.0,\;1.0]^{T}\). Figure 4 shows the Msplit estimates obtained in each simulation and the MC estimates (for \(\sigma_{y} = 0.1\)).

Table 6 Msplit estimates and their accuracy and efficacy (for sets without random disturbances in coefficients in the functional models)
Fig. 4

Msplit estimates obtained in each simulation \(k = 1, \ldots ,3000\) (for \(\sigma_{y} = 0.1\) and the theoretical values of parameters \(X_{\alpha } = 6.0\), \(b_{\alpha } = 0.5\), \(X_{\beta } = 3.0\), \(b_{\beta } = 1.0\))

The accuracy and efficacy of the estimates \(\hat{b}_{\alpha }\) and \(\hat{b}_{\beta }\) are the most satisfactory. The values of \(\hat{\sigma }_{{\hat{b}_{\alpha } }}^{MC}\), \(\hat{\sigma }_{{\hat{b}_{\beta } }}^{MC}\) and \({\text{RMSE}}_{{\hat{b}_{\alpha } }}\), \({\text{RMSE}}_{{\hat{b}_{\beta } }}\) are relatively small for all values of the observation standard deviation. The values obtained for the estimates \(\hat{X}_{\alpha }\), \(\hat{X}_{\beta }\) are higher; however, they are still acceptable.

Let us now use the models (63) to examine how random disturbances of matrix A might influence the accuracy and efficacy of the estimates. The observation errors are simulated assuming the constant standard deviation \(\sigma_{y} = 0.1\), whereas the errors \(e_{t,i}\) are simulated in several variants with \(\sigma_{e} = 0,\;\;0.05,\;\;0.1,\;\;0.2,\;\;0.3\). Table 7 presents the results for \(N = 3000\).

Table 7 Msplit estimates and their accuracy and efficacy (for sets with random disturbances in coefficients in the functional models)

Both the accuracy and the efficacy of Msplit estimates decrease when the coefficient matrix is disturbed by random errors. That effect can be reduced by using Total Msplit estimation, for which the measures in question are smaller. As in the previous case, the TMsplit estimates of the parameters \(b_{\alpha }\) and \(b_{\beta }\) have the most satisfactory accuracy and efficacy. The efficacy of Total Msplit estimation is confirmed by the parameter estimates obtained in each simulation. Example estimates and the MC estimates are presented in Fig. 5 (for \(\sigma_{e} = 0.2\)).

Fig. 5

TMsplit estimates obtained in each simulation \(k = 1, \ldots ,3000\) (for \(\sigma_{y} = 0.1\), \(\sigma_{e} = 0.2\) and the theoretical values of parameters \(X_{\alpha } = 6.0\), \(b_{\alpha } = 0.5\), \(X_{\beta } = 3.0\), \(b_{\beta } = 1.0\))

4.2 Example 2: linear regression

Schaffrin and Wieser (2008) as well as Shen et al. (2011) and Mahboub (2012) applied WTLS for the estimation of the intercept \(\xi_{1}\) and slope \(\xi_{2}\) of the regression line

$$ y_{i} = \xi_{1} + (x_{i} - e_{i} )\xi_{2} - \upsilon_{i} $$
(71)

Let it now be assumed that the set of observations contains not only observations concerning model (71) but also observations for which the regression line differs in the parameters \(\xi_{1}\) and \(\xi_{2}\) (in Total Msplit estimation, these will be the parameters \(\xi_{1\beta }\) and \(\xi_{2\beta }\)). These observations will hereinafter be referred to conventionally as outliers. In contrast to the classical approach, their outlying character is of a different nature and is not necessarily related to the existence of gross errors. If the assignment of observation \(y_{i}\) to its respective regression line is not known, then this observation may correspond both to the following model

$$ y_{i} = \xi_{1} + (x_{i} - e_{i} )\xi_{2} - \upsilon_{i} = \xi_{1\alpha } + (x_{i} - e_{i} )\xi_{2\alpha } - \upsilon_{i\alpha } $$
(72)

and to the model that is competing in relation to it, namely:

$$ y_{i} = \xi_{1\beta } + (x_{i} - e_{i} )\xi_{2\beta } - \upsilon_{i\beta } $$
(73)

Total Msplit estimation will be applied to estimate the parameters in models (72) and (73). The calculations will be carried out using the data provided in Neri et al. (1989) and also used in Schaffrin and Wieser (2008), Shen et al. (2011) and Mahboub (2012). These data will be supplemented with two variants of bias. In the first variant, regression line (73) with theoretical parameters \(\xi_{1\beta } = 2.0\), \(\xi_{2\beta } = 0.75\) is adopted, while in the second variant, \(\xi_{1\beta } = 4.5\), \(\xi_{2\beta } = - 0.70\) is adopted. For the outliers, the weights \(W_{{x_{i} }}\) were determined based on the information on the weights of the coordinates \(x_{i}\) concerning the group of original observations (where necessary, also using interpolation). Moreover, the outliers are assigned equal weights \(W_{{y_{i} }} = 50\), which corresponds to the standard deviation \(\sigma_{y} = 0.14\). The values of these weights are very important to the success of Total Msplit estimation: too high an accuracy (high weights) of the added observations may cause the original observations to be “ignored”, while too low an accuracy (low weights) may cause the added observations to be “ignored”. In the presented example, satisfactory results that did not differ much from each other were obtained for \(40 < W_{{y_{i} }} < 80\). The original observations (\(i = 1, \ldots ,10\)) (Neri et al. 1989), the outliers (\(i = 11,12,13\)), and the corresponding weights \(W_{{x_{i} }}\) and \(W_{{y_{i} }}\) are listed in Table 8. The results of model (71) parameter estimation using WTLS for the original observations, transcribed from Shen et al. (2011), are provided in columns 2 and 3 of Table 9 [the estimate \(\hat{\sigma }_{0}\) is computed by applying Eq. (19)]. Based on the original observations and using models (72) and (73), the TMsplit estimators of the parameters occurring therein were calculated (column 4, Table 9). A graphical interpretation of the original set of observations and the location of the regression lines (determined based on the WTLS and TMsplit estimators) within this set are shown in Fig. 6. The WTLS and TMsplit estimators determined for the sets extended to include outliers are provided in the remaining columns of Table 9; these sets and the corresponding regression lines are shown in Fig. 7.
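The way the listed weights enter the computation can be sketched as follows, assuming, for illustration only, that the cofactor matrices are obtained as the reciprocals of the given diagonal weights and that only the x-column of the design matrix \([1,\;x_i]\) is treated as random (analogously to Eq. (66)).

```python
import numpy as np

def regression_cofactors(W_x, W_y):
    """Sketch: cofactor matrices for the EIV regression models (72)-(73).

    Assumes Q = W^{-1} element-wise for the given diagonal weights; this is an
    illustrative convention, not necessarily the exact one used by the authors.
    """
    W_x = np.asarray(W_x, dtype=float)
    W_y = np.asarray(W_y, dtype=float)
    Q_y = np.diag(1.0 / W_y)                 # cofactor matrix of the observations
    Q_ex = np.diag(1.0 / W_x)                # cofactor matrix of the x-coordinate errors
    Q0 = np.array([[0.0, 0.0],
                   [0.0, 1.0]])              # only the x-column of A = [1, x_i] is random
    Q_e = np.kron(Q0, Q_ex)                  # structure analogous to Eq. (66)
    return Q_y, Q_e
```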

Table 8 Observed data and corresponding weights
Table 9 WTLS and Total Msplit estimation results for the sets containing biases (additional regression line) (for TMsplit: \(\xi_{1} : = \xi_{1\alpha }\), \(\xi_{2} : = \xi_{2\alpha }\))
Fig. 6

The set of original observations (Shen et al. 2011) and the positions of the regression lines determined based on the WTLS and TMsplit estimators

Fig. 7

Sets containing outliers (green) and the positions of the regression lines determined based on the WTLS and TMsplit estimators

The TMsplit estimators determined for the original set of observations may appear not wholly satisfactory (column 4, Table 9). Due to the lack of outliers, Total Msplit estimation, which predicts the existence of two mutually competing regression lines, forces two regression lines to fit the set of observations (see Fig. 6). It should be noted that the regression line established by the WTLS estimators lies between these lines. The average values of the TMsplit estimators, \(\hat{\xi }_{{1{\text{M}}_{{{\text{split}}}} }} = (\hat{\xi }_{1\alpha } + \hat{\xi }_{1\beta } )/2 = 5.4069\) and \(\hat{\xi }_{{{\text{2M}}_{{{\text{split}}}} }} = (\hat{\xi }_{2\alpha } + \hat{\xi }_{2\beta } )/2 = - 0.4612\), as compared to the WTLS estimators \(\hat{\xi }_{1} = 5.4799\) and \(\hat{\xi }_{2} = - 0.4805\), can already be considered satisfactory. For both sets containing outliers, the TMsplit estimators \(\hat{\xi }_{1\alpha }\) and \(\hat{\xi }_{2\alpha }\) are close to the corresponding WTLS estimators obtained for the original set of observations. On the other hand, the estimators \(\hat{\xi }_{1\beta }\) and \(\hat{\xi }_{2\beta }\) are close to the true values of the parameters \(\xi_{1\beta } = 2.0\), \(\xi_{2\beta } = 0.75\) (Variant I) and \(\xi_{1\beta } = 4.5\), \(\xi_{2\beta } = - 0.70\) (Variant II). In such cases, the WTLS estimators did not yield good results; this is particularly true for Variant II, for which the relevant comparisons are particularly unfavourable. The results obtained using WTLS are not surprising, as the lack of robustness to outlying observations is an inherent feature of WTLS estimators.

The example presented above concerned a situation where the outliers can be assigned a regression line that is appropriate for them. In practice, however, the outlying character of observations may relate to single observations and result from, for example, the effect of gross errors. In order to check the response of TMsplit and WTLS estimators to such errors, one of the observations from the original set will be affected by a gross error with several different values. For example, let it be assumed that such an observation is \(y_{5} = 3.5\) (\(x_{5} = 3.3\)) with weights \(W_{{x_{5} }} = 200\) and \(W_{{y_{5} }} = 20\) (Table 8). This observation will be affected by a gross error with the values \(g = 1\), \(g = 2\), \(g = 5\), \(g = 10\), respectively. The data adopted for the calculations are provided in Table 10, while the TMsplit and WTLS estimators of the parameters \(\xi_{1}\) and \(\xi_{2}\) are provided in Table 11.

Table 10 Observations and weights corresponding to them (observation \(y_{5}\), highlighted in bold, is affected by gross error g with different values)
Table 11 TMsplit and WTLS estimators determined for the set containing an observation affected by gross error (for TMsplit:\(\xi_{1} : = \xi_{1\alpha }\), \(\xi_{2} : = \xi_{2\alpha }\))

The TMsplit estimators \(\hat{\xi }_{1\alpha }\) and \(\hat{\xi }_{2\alpha }\) are satisfactory for each adopted gross error value, especially in comparison with the WTLS estimators obtained, which are unacceptable even for small gross errors. The quantities \(\hat{\xi }_{1\beta }\) and \(\hat{\xi }_{2\beta }\) compete with \(\hat{\xi }_{1\alpha }\) and \(\hat{\xi }_{2\alpha }\); they are estimators of the parameters of the \(y_{5} = \xi_{1\beta } + \xi_{2\beta } x_{5}\) regression line on which the observation affected by the gross error should lie. By using the equation \(\tilde{y}_{5} = \hat{\xi }_{1\beta } + \hat{\xi }_{2\beta } x_{5}\), it is possible to calculate the prediction of the observation \(y_{5}\) affected by the gross error. The predictions of this observation for the adopted gross error values, as compared to its simulated values, are presented in Table 12. A graphical interpretation of the obtained results is provided in Fig. 8.

Table 12 Prediction of an observation affected by gross error
Fig. 8

Regression lines determined using Total Msplit estimation based on the set containing a single observation affected by gross error \(g\) with different values

4.3 Example 3: two-dimensional affine transformation

The estimators of parameters in the EIV model, which are robust to gross errors, include inter alia robust total least-squares (RTLS) and total least trimmed squares (TLTS) estimators (Wang et al. 2016; Lv and Sui 2020). One of the examples provided in Lv and Sui (2020) concerns a two-dimensional affine transformation carried out based on observations affected by gross errors. In that case, the authors applied the following transformation model:

$$ \left[ {\begin{array}{*{20}c} {u_{t} } \\ {\upsilon_{t} } \\ \end{array} } \right] = \left[ {\begin{array}{*{20}l} {u_{s} } \hfill & {\upsilon_{s} } \hfill & 1 \hfill & 0 \hfill & 0 \hfill & 0 \hfill \\ 0 \hfill & 0 \hfill & 0 \hfill & {u_{s} } \hfill & {\upsilon_{s} } \hfill & 1 \hfill \\ \end{array} } \right]\left[ {\begin{array}{*{20}l} {a_{1} } \hfill \\ {b_{1} } \hfill \\ {c_{1} } \hfill \\ {a_{2} } \hfill \\ {b_{2} } \hfill \\ {c_{2} } \hfill \\ \end{array} } \right] $$
(74)

where \(u_{s}\), \(\upsilon_{s}\) and \(u_{t}\), \(\upsilon_{t}\) are the coordinates of the common points in the start and target coordinate systems, while \(a_{1}\), \(b_{1}\), \(c_{1}\), \(a_{2}\),\(b_{2}\), \(c_{2}\) are the parameters being determined. Table 13 presents 15 observation points simulated in the start and target coordinate systems (Lv and Sui 2020). Some of these observations (highlighted in bold) are affected by gross errors. For these data, Lv and Sui (2020) applied TLTS estimation (using two different algorithms yielding the same results) as well as TLTS with RTLS as a starting point (hereafter denoted as TLTS/RTLS). The authors compared the obtained estimators with WTLS and RTLS estimators. These results may now be supplemented with TMsplit estimators (the seventh column of Table 14). TMsplit estimators will also be calculated for the data without gross errors. All the cited and calculated estimators are provided in Table 14.
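A short sketch of how the design matrix of model (74) is stacked for all common points is given below (the function name is hypothetical).

```python
import numpy as np

def affine_design(u_s, v_s):
    """Design matrix of the two-dimensional affine transformation, Eq. (74),
    stacked for all common points (two rows per point: one for u_t, one for v_t)."""
    u_s = np.asarray(u_s, dtype=float)
    v_s = np.asarray(v_s, dtype=float)
    rows = []
    for u, v in zip(u_s, v_s):
        rows.append([u, v, 1.0, 0.0, 0.0, 0.0])   # equation for the target coordinate u_t
        rows.append([0.0, 0.0, 0.0, u, v, 1.0])   # equation for the target coordinate v_t
    return np.array(rows)

# usage sketch: y = affine_design(u_s, v_s) @ np.array([a1, b1, c1, a2, b2, c2])
```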

Table 13 Observed points with outliers in the start and target coordinate systems (Lv and Sui 2020). Boldface numbers indicate outliers
Table 14 A comparison of the estimated parameters from different methods (for TMsplit: \(a_{1} : = a_{1\alpha }\), \(b_{1} : = b_{1\alpha }\), \(c_{1} : = c_{1\alpha }\), \(a_{2} : = a_{2\alpha }\), \(b_{2} : = b_{2\alpha }\), \(c_{2} : = c_{2\alpha }\))

Table 14 shows that the TMsplit(α) estimates are generally the closest to the TLTS estimates. However, in the case of the parameters \(a_{1}\), \(b_{1}\), \(c_{1}\) and \(b_{2}\), the TMsplit(α) estimators are closest to the TLTS/RTLS estimators and thus to the true parameter values. The interpretation of the TMsplit(β) estimators is similar to that in the previous example, i.e. these are estimators of the parameters of the model for the observations affected by gross errors. It is worth noting here that in the absence of gross errors, both versions of the TMsplit estimators differ only slightly from each other.

5 Summary

By using Msplit estimation, it is possible to determine the estimators of mutually competing parameters in classical functional models. However, there are cases in geodetic practice in which classical models need to be replaced with EIV models. The method proposed in this paper, called “Total Msplit estimation”, is a development of Msplit estimation that accepts such models. The Total Msplit estimation objective function was determined by applying the Lagrange approach, as in the case of the WTLS method. The mutually competing EIV models occurring in this function were replaced with their linear approximations. This enabled the construction of a relatively simple yet efficient algorithm for determining TMsplit estimators. The basis of this algorithm is the iterative updating of the EIV models (external iterations) based on the Msplit estimators (internal iterations) obtained in the previous iterative step. The proposed algorithm is efficient in all cases presented in the paper, as regards both the results and the course of the iterative process. Problems with the convergence of the external iterations might occur because of the applied linear approximation of the EIV models; this is especially evident when the errors disturbing matrix A are too large.

The examples presented in the paper showed that the properties of TMsplit estimators are, in general, similar to the properties of Msplit estimators. If the elements of matrix A are not affected by random errors, then the TMsplit and Msplit estimators are equal to each other. The possibility of determining estimators of competing parameters is of particular importance when the sets of observations are a mixture of the realisations of two random variables with mutually competing positional parameters. Such a situation occurs in Examples 1 and 2, where each observation group can be assigned a corresponding regression line. The problem, however, is that it is not known which of these lines is the best for a particular observation. TMsplit estimators, similarly to Msplit estimators for classical models, yield satisfactory results here. Due to their theoretical origin (the neutral LS method), WTLS estimators are not robust to gross errors. Where the sets are realisations of only a single random variable, Total Msplit estimation still offers two mutually competing solutions. These are forced solutions, yet so close to each other that even in such a situation it is possible to evaluate the functional model parameters (e.g. after the calculation of the average values of the respective estimators).

Total Msplit estimation can also be applied for the estimation of EIV model parameters in the case where the outlying of observations results from their being affected by gross errors. From the perspective of the Msplit and TMsplit estimation theory, such a case is not significantly different from that discussed earlier. In the second part of Example 2, it was shown that the determined TMsplit estimators enabled the determination of not only the regression line appropriate for “good” observations (TMsplit(α) solution), but also of the regression line on which the observation affected by gross error lies (TMsplit(β) solution).

In EIV models, it is assumed that matrix A is also observed. Therefore, its elements can also be affected by gross errors arising for various reasons. Such a situation occurs in Example 3, in which certain coordinates, both in the start and target systems of the two-dimensional affine transformation, are affected by gross errors. In this example, TMsplit estimators are close to the robust TLTS estimators and, for certain transformation parameters, they are also close to the results of TLTS estimation with RTLS estimation as a starting step.

The accuracy of the estimates is an important issue (especially when comparing estimation methods). The accuracy of Msplit estimation can be determined by applying asymptotic covariance matrices. Applying such an approach to Total Msplit estimation would require developing this theory further, which is beyond the scope of the present paper. Section 4.1 presents assessments of the accuracy of the Msplit and Total Msplit estimates (empirical standard deviations) obtained from the Monte Carlo simulations. Generally, the accuracy of Msplit estimates decreases with the growing standard deviation of the errors disturbing matrix A; in such a context, Total Msplit estimates have smaller standard deviations than the respective Msplit estimates. Similar relations concern the measures of efficacy, namely the RMSE values. Thus, when the competing functional models are supplemented with EIV models, Msplit estimation should be replaced by Total Msplit estimation. This is especially advisable when the disturbances of matrix A have large standard deviations (here, the application of Total Msplit estimation is justified for \(\sigma_{e} = \sigma_{y}\)).