In the behavioral and social sciences, structural equation models (SEMs) have become widely accepted as a multivariate statistical tool for modeling the relations between latent and observed variables. Apart from maximum likelihood estimation, least squares (LS) estimation is a common approach for parameter estimation. In LS, parameters are estimated by minimizing a nonlinear function of the parameters and data. In practice, this problem is typically solved by applying generic nonlinear optimization techniques, such as Newton-type gradient descent approaches that iteratively minimize the objective function until convergence is reached. However, for some model classes, generic optimization algorithms can be adapted to make better use of the model structure and thus solve the problem more efficiently. For a particular type of model, the parameters separate, that is, one set of parameters enters the objective in a nonlinear way, while another set of parameters enters the objective linearly. For a vector of observations y and predictors x of size m, the objective is of the form

$$\begin{aligned} \sum _{i = 1}^m \left[ y_i - \sum _{j=1}^n \alpha _j \varphi _j(\beta , x_i)\right] ^2 \end{aligned}$$
(1)

where \(\alpha \in \mathbb {R}^n, \beta \in \mathbb {R}^k\) are parameter vectors and the (nonlinear) functions \(\varphi _j\) are continuously differentiable w.r.t. \(\beta \). Golub and Pereyra (1973) showed that this kind of objective allows for a reformulation of the optimization problem, such that only the parameters \(\beta \) have to be obtained iteratively, while the parameters \(\alpha \) can be computed after the optimization in a single step. The procedure has been subsequently called separable nonlinear least squares (SNLLS). It has been successfully applied in many disciplines, and it has been observed that the reduced dimension of the parameter space can lead to reduced computation time, a reduced number of iterations and better convergence properties (Golub & Pereyra, 2003). Inspired by earlier work (Kreiberg et al., 2016, 2021) that showed that this procedure can also be applied to factor analysis models, we generalize their result to the entire class of linear structural equation models and give analytical gradients for the reduced optimization problem, which is central for an efficient implementation.

1 Review of Concepts

In the following, we briefly review the notation for structural equation models, the generalized least squares estimator and the trek rules used to derive the model-implied covariance matrix.

1.1 Linear Structural Equation Models

Linear structural equation models can be defined in RAM notation (reticular action model; McArdle & McDonald, 1984) as follows (we follow the notation from Drton, 2018): Let \(x, \varepsilon \) be random vectors with values in \(\mathbb {R}^m\) and

$$\begin{aligned} x = \varvec{\Lambda }x + \varepsilon \end{aligned}$$
(2)

where \(\varvec{\Lambda }\in \mathbb {R}^{m \times m}\) is a matrix of constants or unknown (directed) parameters. Let \(\varvec{\Omega }\in \mathbb {R}^{m \times m}\) be the covariance matrix of \(\varepsilon \) and \(\textbf{I}\) the identity matrix. If \(\textbf{I}- \varvec{\Lambda }\) is invertible, Eq. 2 can be solved by \(x = (\textbf{I}-\varvec{\Lambda })^{-1}\varepsilon \) with covariance matrix

$$\begin{aligned} \mathbb {V}[x] = \left( \textbf{I}-\varvec{\Lambda }\right) ^{-1} \varvec{\Omega }\left( \textbf{I}-\varvec{\Lambda }\right) ^{-T} \end{aligned}$$
(3)

If x is partitioned into a part \(x_{\text {obs}}\) of \(m_\textrm{obs}\) observed variables and \(x_{\text {lat}}\) of \(m_\textrm{lat}\) latent variables, we can reorder x such that \(x = \left( x_{\text {obs}}^T \; x_{\text {lat}}^T\right) ^T\), and the covariance matrix of the observed variables is given by

$$\begin{aligned} \varvec{\Sigma }:=\mathbb {V}[x_{\text {obs}}] = \textbf{F}(\textbf{I}-\varvec{\Lambda })^{-1}\varvec{\Omega }(\textbf{I}-\varvec{\Lambda })^{-T}\textbf{F}^T\end{aligned}$$
(4)

where \(\textbf{F}= \left[ {\textbf {I}}\,\big |\,{\textbf {0}}\right] \in \mathbb {R}^{m_\textrm{obs} \times (m_\textrm{obs} +m_\textrm{lat})}\) is a rectangular filter matrix. We denote the parameters by \(\theta = \left( \theta _{\varvec{\Lambda }}^{T} \; \theta _{\varvec{\Omega }}^{T}\right) ^{T}\in \mathbb {R}^{q}\), partitioned into directed parameters from \(\varvec{\Lambda }\) and undirected parameters from \(\varvec{\Omega }\). (We call them directed or undirected parameters because they correspond to directed or undirected paths in the graph of the model.) If we want to stress that \(\varvec{\Sigma }\) is a function of the parameters, we write \(\varvec{\Sigma }(\theta )\). If we are also interested in the mean structure, we introduce a vector of (possibly zero) mean parameters \(\gamma \in \mathbb {R}^m\) such that \(x = \gamma + \varvec{\Lambda }x + \varepsilon \) and obtain

$$\begin{aligned} \mu :=\mathbb {E}[x_{obs}] = \textbf{F}(\textbf{I}- \varvec{\Lambda })^{-1}\gamma \end{aligned}$$
(5)
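As a concrete illustration of Eqs. 4 and 5, the following R sketch computes the model-implied covariance matrix and mean vector of a hypothetical one-factor model with three indicators; the variable ordering, loadings, variances and the latent mean used here are purely illustrative and not taken from any model discussed in this article.

```r
# Minimal R sketch of Eqs. 4 and 5 for a hypothetical one-factor model with
# three observed indicators x1-x3 and one latent factor f.
# Variable order: (x1, x2, x3, f); all numeric values are illustrative.
m_obs <- 3; m_lat <- 1; m <- m_obs + m_lat

Lambda <- matrix(0, m, m)
Lambda[1:3, 4] <- c(1, 0.8, 0.6)                      # factor loadings (directed effects)

Omega <- diag(c(0.5, 0.5, 0.5, 1))                    # error variances and factor variance
gamma <- c(0, 0, 0, 0.3)                              # mean parameters (latent mean 0.3)

F_mat <- cbind(diag(m_obs), matrix(0, m_obs, m_lat))  # filter matrix F = [I | 0]

IB_inv <- solve(diag(m) - Lambda)                     # (I - Lambda)^{-1}
Sigma  <- F_mat %*% IB_inv %*% Omega %*% t(IB_inv) %*% t(F_mat)  # Eq. 4
mu     <- F_mat %*% IB_inv %*% gamma                             # Eq. 5
Sigma
mu
```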

1.2 Least Squares Estimation

The least squares objective function for \(\theta \) is:

$$\begin{aligned} F_{\text {LS}} = (s - \sigma )^T \textbf{V}(s - \sigma ) \end{aligned}$$
(6)

where \(\sigma = {{\,\textrm{vech}\,}}(\varvec{\Sigma })\) is the half-vectorization of \(\varvec{\Sigma }\), that is, the vector of non-duplicated elements of the model-implied covariance matrix, \(s = {{\,\textrm{vech}\,}}(\textbf{S})\) is the half-vectorization of the observed covariance matrix and \(\textbf{V}\) is a fixed symmetric positive definite weight matrix. Specific forms of the weight matrix \(\textbf{V}\) lead to commonly used special cases of this estimation technique: Generalized least squares estimation uses \(\textbf{V}= \frac{1}{2} \, \textbf{D}^T\left( \textbf{S}^{-1} \otimes \textbf{S}^{-1}\right) \textbf{D}\) (where \(\textbf{D}\) denotes the duplication matrix from Magnus and Neudecker (2019b)), asymptotic distribution-free estimation uses a consistent estimator of the asymptotic covariance matrix of s, and unweighted least squares estimation uses the identity matrix (Bollen, 1989; Browne, 1982, 1984).
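The following R sketch spells out Eq. 6 and the GLS weight matrix, using a hand-rolled half-vectorization and duplication matrix; the observed and model-implied covariance matrices below are arbitrary illustrative values rather than the output of a fitted model.

```r
# Sketch of the LS objective (Eq. 6) with the GLS weight matrix.
vech <- function(A) A[lower.tri(A, diag = TRUE)]      # half-vectorization

# Duplication matrix D with D %*% vech(A) == vec(A) for symmetric A
duplication_matrix <- function(m) {
  D <- matrix(0, m * m, m * (m + 1) / 2)
  col <- 0
  for (j in 1:m) for (i in j:m) {
    col <- col + 1
    D[(j - 1) * m + i, col] <- 1                      # position of A[i, j] in vec(A)
    D[(i - 1) * m + j, col] <- 1                      # position of A[j, i] in vec(A)
  }
  D
}

# Illustrative observed (S) and model-implied (Sigma) covariance matrices
S     <- matrix(c(1.50, 0.80, 0.60,
                  0.80, 1.20, 0.50,
                  0.60, 0.50, 1.00), 3, 3)
Sigma <- matrix(c(1.40, 0.70, 0.55,
                  0.70, 1.10, 0.45,
                  0.55, 0.45, 0.95), 3, 3)

D <- duplication_matrix(3)
V <- 0.5 * t(D) %*% kronecker(solve(S), solve(S)) %*% D   # GLS weight matrix
r <- vech(S) - vech(Sigma)
drop(t(r) %*% V %*% r)                                    # F_LS, Eq. 6
```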

1.3 Trek Rules

To show that in SEM undirected effects enter the least squares objective linearly, we employ trek rules (Drton, 2018), which are path tracing rules used to derive the model-implied covariance between any pair of variables in a SEM (Boker et al., 2002). Various authors have proposed rules to link the graph to the covariance parametrization of the model. Here, we give the rules as put forward by Drton (2018), which are based on treks as basic building blocks (for an overview of alternative formulations see Mulaik, 2009). A trek \(\tau \) from a node i to a node j is a path connecting them in which directed edges can be traveled forwards and backwards, but which contains no colliding arrowheads, that is, it is not allowed to walk from one arrowhead into another. A top node of a trek is a node which has only outgoing edges along the trek.

Fig. 1

Graph of a bi-factor model with one general factor and two specific factors. Circles represent latent variables, and rectangles represent observed variables. Variances are omitted in this representation.

To derive an expression for the model-implied covariance between any two variables i and j based on the postulated SEM, we follow four steps:

1. Find all treks from i to j.

2. For each trek, multiply all parameters along it.

3. If a trek does not contain a covariance parameter, factor in the variance of the top node.

4. Add all obtained trek monomials from the different treks together.

Note that a trek is ordered in the sense that two treks containing exactly the same nodes and edges are considered different if they are traveled in a different order. In particular, each trek has a source (i) and a target (j), and a trek from j to i is considered to be a different trek, even if it contains exactly the same nodes and edges. Also note that variances are not considered to be edges in the mixed graph corresponding to the model (i.e., it is not allowed to travel variance edges). Therefore, all graphical representations of SEMs in this article omit variance edges, and it is required to factor them in according to rule 3 after the treks are collected.

1.3.1 Example

To illustrate how the model-implied covariances can be derived using trek rules, we give an example based on the path diagram in Fig. 1. To find the model-implied covariance between the nodes \(X_2\) and \(X_6\) in the model shown in Fig. 1, we first find all treks from \(X_2\) to \(X_6\):

$$\begin{aligned} X_2 \leftarrow \zeta _1 \leftrightarrow \zeta _2 \rightarrow X_6 \end{aligned}$$
(7)
$$\begin{aligned} X_2 \leftarrow G \rightarrow X_6 \end{aligned}$$
(8)

We now compute the trek monomials for each trek. The second trek does not contain a covariance parameter, so we need to factor in the variance of the top node. We find the trek’s top node G and denote the variance parameter of G by \(\omega _G\). Finally, we add the resulting trek monomials and we find that the model-implied covariance between \(X_2\) and \(X_6\) can be expressed as follows:

$$\begin{aligned} {{\,\textrm{cov}\,}}(X_2, X_6) = \lambda _2\omega _l\lambda _6 + \beta _2\omega _G\beta _6 \end{aligned}$$
(9)

As a second example, we derive the model-implied variance of \(X_3\). Again, we first find all treks from \(X_3\) to \(X_3\):

$$\begin{aligned} X_3 \leftarrow \zeta _1 \rightarrow X_3 \end{aligned}$$
(10)
$$\begin{aligned} X_3 \leftarrow G \rightarrow X_3 \end{aligned}$$
(11)
$$\begin{aligned} X_3 \end{aligned}$$
(12)

None of these treks contains a covariance parameter, so we need to factor in the variance of the respective top nodes \(\zeta _1\), G and \(X_3\). We denote the variance parameters of \(\zeta _1\) and \(X_3\) by \(\omega _{\zeta _1}\) and \(\omega _3\) and add the resulting trek monomials to obtain

$$\begin{aligned} {{\,\textrm{var}\,}}(X_3) = \lambda _3^2\omega _{\zeta _1} + \beta _3^2\omega _G + \omega _3 \end{aligned}$$
(13)
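Both results can be checked numerically against the matrix expression in Eq. 4. Because the exact specification of Fig. 1 is not reproduced here, the following R sketch assumes a plausible layout: \(X_1\)–\(X_3\) load on \(\zeta _1\), \(X_4\)–\(X_6\) load on \(\zeta _2\), all six indicators load on G, \(\zeta _1\) and \(\zeta _2\) covary with parameter \(\omega _l\), and G is uncorrelated with the specific factors; all numerical values are illustrative.

```r
# Numerical check of Eqs. 9 and 13 against Eq. 4 under an assumed layout of
# the bi-factor model in Fig. 1. All numeric values are illustrative.
m_obs <- 6; m <- 9                          # variable order: X1..X6, zeta1, zeta2, G
lambda <- c(0.7, 0.6, 0.5, 0.8, 0.4, 0.9)   # loadings on the specific factors
beta   <- c(0.3, 0.2, 0.4, 0.5, 0.3, 0.6)   # loadings on the general factor G

Lambda <- matrix(0, m, m)
Lambda[1:3, 7] <- lambda[1:3]               # X1-X3 <- zeta1
Lambda[4:6, 8] <- lambda[4:6]               # X4-X6 <- zeta2
Lambda[1:6, 9] <- beta                      # X1-X6 <- G

Omega <- diag(c(rep(0.5, 6), 1.2, 1.1, 1.0))  # error and factor variances
Omega[7, 8] <- Omega[8, 7] <- 0.4             # omega_l = cov(zeta1, zeta2)

IB_inv <- solve(diag(m) - Lambda)
Sigma  <- (IB_inv %*% Omega %*% t(IB_inv))[1:m_obs, 1:m_obs]      # Eq. 4

cov_26 <- lambda[2] * 0.4 * lambda[6] + beta[2] * 1.0 * beta[6]   # Eq. 9
var_3  <- lambda[3]^2 * 1.2 + beta[3]^2 * 1.0 + 0.5               # Eq. 13

all.equal(Sigma[2, 6], cov_26)   # TRUE
all.equal(Sigma[3, 3], var_3)    # TRUE
```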

1.3.2 Formal Definitions

We denote the elements of \(\varvec{\Omega }\), the undirected effects between nodes i and j, by \(\omega _{ij}\) and the elements of \(\varvec{\Lambda }\), the directed effects, by \(\lambda _{ij}\). Drton (2018) defines a trek monomial of a trek \(\tau \) without a covariance parameter as

$$\begin{aligned} \tau (\varvec{\Lambda }, \varvec{\Omega }) = \omega _{i_0 i_0} \prod _{k \rightarrow l \in \tau } \lambda _{lk} \end{aligned}$$
(14)

where \(i_0\) is the top node of the trek, and a trek monomial of a trek \(\tau \) containing an undirected edge between \(i_0\) and \(j_0\) as

$$\begin{aligned} \tau (\varvec{\Lambda }, \varvec{\Omega }) = \omega _{i_0 j_0} \prod _{k \rightarrow l \in \tau } \lambda _{lk} \end{aligned}$$
(15)

(notice the swapped indices of \(\lambda _{lk}\) compared to the formula in Drton because our \(\varvec{\Lambda }\) corresponds to his \(\varvec{\Lambda }^T\)). With this, the elements of \(\varvec{\Sigma }(\theta )\) are represented as a summation over treks. He proves that

$$\begin{aligned} \varvec{\Sigma }(\theta )_{ij} = \sum _{\tau \in {\mathcal {T}}(i,j)} \tau (\varvec{\Lambda }, \varvec{\Omega }) \end{aligned}$$
(16)

where \({\mathcal {T}}(i,j)\) is the set of all treks from i to j. It follows that the model-implied covariance is a sum of monomials of parameters. Because covariances between the error terms are not transitive, exactly one undirected parameter (variance or covariance) is present in each monomial. Therefore, if all the directed parameters were fixed, the model-implied covariance would be a linear function of the undirected parameters. This is what makes the SNLLS procedure applicable to structural equation models.

For later use, we also note that Drton gives the following expression:

$$\begin{aligned} (\textbf{I}- \varvec{\Lambda })^{-1}_{ij} = \sum _{\tau \in {\mathcal {P}}(j,i)} \prod _{k \rightarrow l \in \tau } \lambda _{lk} \end{aligned}$$
(17)

where \({\mathcal {P}}(j,i)\) is the set of directed paths from j to i. This is because we can write \((\textbf{I}- \varvec{\Lambda })^{-1} = \sum _{k = 0}^\infty \varvec{\Lambda }^k\), where the geometric series converges iff all eigenvalues of \(\varvec{\Lambda }\) have modulus smaller than 1, that is, iff the spectral radius of \(\varvec{\Lambda }\) is smaller than 1. (Further explanations about this and an excellent account of the connections between matrix algebra and graphs can be found in Kepner & Gilbert (2011).)
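A small R sketch of this series representation, using an illustrative strictly lower triangular (hence nilpotent) path matrix whose eigenvalues trivially satisfy the convergence condition:

```r
# (I - Lambda)^{-1} as a geometric (Neumann) series; Lambda is illustrative.
Lambda <- matrix(c(0,   0,   0,
                   0.5, 0,   0,
                   0.2, 0.3, 0), 3, 3, byrow = TRUE)
m <- nrow(Lambda)

series <- diag(m)
term   <- diag(m)
for (k in 1:50) {                      # partial sums of sum_{k>=1} Lambda^k
  term   <- term %*% Lambda
  series <- series + term
}
all.equal(series, solve(diag(m) - Lambda))   # TRUE
```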

2 Separable Nonlinear Least Squares for SEM

We first outline the proofs for the applicability of SNLLS to CFA as given by Golub and Pereyra (1973) and Kreiberg et al. (2021). Subsequently, we prove that SNLLS is applicable to linear structural equation models. We further extend the existing proofs to subsume models that contain a mean structure. Last, we derive analytic gradients that are central for efficient software implementations.

2.1 Outline of Previous Work

To minimize Eq. 1, Golub and Pereyra (1973) define the matrix function

$$\begin{aligned} \Phi _{ij} :=\varphi _j(\beta , x_i) \end{aligned}$$
(18)

such that Eq. 1 can be written as

$$\begin{aligned} \Vert y - \Phi (\beta ) \alpha \Vert ^2 \end{aligned}$$
(19)

where \(\Vert \cdot \Vert \) denotes the Euclidean norm. For a fixed value of \(\beta \), a solution for \(\alpha \) can be obtained as \(\alpha = \Phi ^+(\beta ) y\). They further proved that under the assumption that \(\Phi (\beta )\) has constant rank near the solution, only the nonlinear parameters \(\beta \) have to be obtained iteratively by replacing \(\alpha \) and minimizing the modified objective

$$\begin{aligned} \left\Vert y - \Phi (\beta )\Phi ^+(\beta ) y \right\Vert ^2 \end{aligned}$$
(20)

where \(\Phi ^+\) denotes the Moore–Penrose generalized inverse. Afterward, the least squares solution for the linear parameters \(\alpha \) can be obtained as the standard least squares estimator \({{\,\mathrm{arg\,min}\,}}_{\alpha \in \mathbb {R}^n} \Vert \Phi (\hat{\beta })\alpha - y \Vert = \Phi ^+(\hat{\beta }) y\).
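To make the variable projection idea concrete, the following R sketch applies Eqs. 19 and 20 to a toy (non-SEM) model \(y_i = \alpha _1 \exp (-\beta x_i) + \alpha _2\); the simulated data and the QR-based least squares solve (which coincides with \(\Phi ^+(\beta ) y\) when \(\Phi (\beta )\) has full column rank) are illustrative choices.

```r
# Minimal sketch of variable projection (Eqs. 18-20) for a toy model
# y_i = alpha1 * exp(-beta * x_i) + alpha2, with simulated illustrative data.
set.seed(1)
x <- seq(0, 5, length.out = 50)
y <- 2 * exp(-1.3 * x) + 0.5 + rnorm(50, sd = 0.05)

Phi <- function(beta) cbind(exp(-beta * x), 1)   # Eq. 18: columns phi_j(beta, x)

f_reduced <- function(beta) {                    # reduced objective, Eq. 20
  P     <- Phi(beta)
  alpha <- qr.solve(P, y)                        # least squares solution for alpha
  sum((y - P %*% alpha)^2)
}

beta_hat  <- optimize(f_reduced, interval = c(0.01, 10))$minimum  # iterate over beta only
alpha_hat <- qr.solve(Phi(beta_hat), y)          # linear parameters in a single step
c(beta = beta_hat, alpha = alpha_hat)
```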

Kreiberg et al. (2021) showed that this procedure is applicable for CFA models (we reproduce their main results in our notation), as it is possible to rewrite the model-implied covariances \(\sigma \) as a product of a matrix-valued function \(\textbf{G}(\theta _{\varvec{\Lambda }})\) (that depends only on the directed parameters) and the undirected parameters \(\theta _{\varvec{\Omega }}\), so the LS objective can be written as

$$\begin{aligned} F_{\text {LS}}&= (s - \sigma )^T \textbf{V}(s - \sigma ) \end{aligned}$$
(21)
$$\begin{aligned}&= \left( s - \textbf{G}(\theta _{\varvec{\Lambda }})\theta _{\varvec{\Omega }}\right) ^T \textbf{V}\left( s - \textbf{G}(\theta _{\varvec{\Lambda }})\theta _{\varvec{\Omega }} \right) \end{aligned}$$
(22)
$$\begin{aligned}&= \left\Vert s - \textbf{G}(\theta _{\varvec{\Lambda }})\theta _{\varvec{\Omega }}\right\Vert ^2_\textbf{V}\end{aligned}$$
(23)

They further stated that if \(\theta _{\varvec{\Lambda }}\) is fixed, we know from standard linear least squares estimation that the minimizer for the undirected effects can be obtained as

$$\begin{aligned} \hat{\theta }_{\varvec{\Omega }} = \left( \textbf{G}( \theta _{\varvec{\Lambda }})^T \textbf{V}\textbf{G}( \theta _{\varvec{\Lambda }})\right) ^{-1} \textbf{G}( \theta _{\varvec{\Lambda }})^T\textbf{V}s \end{aligned}$$
(24)

Inserting Eq. 24 into Eq. 22 and simplifying, they obtained a new objective to be minimized:

$$\begin{aligned} \hat{\theta }_{\varvec{\Lambda }}&= \mathop {{{\,\mathrm{arg\,min}\,}}}\limits _{\theta _{\varvec{\Lambda }}} \left[ s^T\textbf{V}s - s^T\textbf{V}\textbf{G}(\theta _{\varvec{\Lambda }}) \left( \textbf{G}(\theta _{\varvec{\Lambda }})^T \textbf{V}\textbf{G}(\theta _{\varvec{\Lambda }})\right) ^{-1} \textbf{G}(\theta _{\varvec{\Lambda }})^T\textbf{V}s \right] \end{aligned}$$
(25)
$$\begin{aligned}&= \mathop {{{\,\mathrm{arg\,min}\,}}}\limits _{\theta _{\varvec{\Lambda }}} F_{\text {SNLLS}}(\theta _{\varvec{\Lambda }}) \end{aligned}$$
(26)

This objective only depends on the directed parameters \(\theta _{\varvec{\Lambda }}\). After minimizing it to obtain an LS estimate \(\hat{\theta }_{\varvec{\Lambda }}\), Eq. 24 can be used to obtain the LS estimate of \(\theta _{\varvec{\Omega }}\). We would like to note that they assume that \(\textbf{G}\) has full rank, which is not a necessary assumption and can be relaxed using alternative formulations of Eqs. 24 and 25. To extend the method to general structural equation models, we have to derive \(\textbf{G}(\theta _{\varvec{\Lambda }})\). We do that in the following for all models formulated in the RAM notation.
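Before deriving \(\textbf{G}(\theta _{\varvec{\Lambda }})\) in general, the following R sketch illustrates Eqs. 24–26 for a one-factor model with three indicators, where \(\textbf{G}(\theta _{\varvec{\Lambda }})\) can be written down directly (the first loading is fixed to 1, so \(\theta _{\varvec{\Lambda }}\) contains the two remaining loadings and \(\theta _{\varvec{\Omega }}\) the factor variance and three error variances); the parameter values, the identity weight matrix (unweighted least squares) and the use of a general-purpose optimizer are illustrative choices.

```r
# Sketch of the SNLLS steps (Eqs. 24-26) for a one-factor model with three
# indicators. theta_Lambda = (lambda2, lambda3), lambda1 fixed to 1;
# theta_Omega = (phi, omega1, omega2, omega3). All numeric values illustrative.
vech <- function(A) A[lower.tri(A, diag = TRUE)]

G_fun <- function(theta_L) {                 # G(theta_Lambda) for this model
  l <- c(1, theta_L)
  cbind(vech(tcrossprod(l)),                 # column for the factor variance phi
        diag(6)[, c(1, 4, 6)])               # columns for omega1, omega2, omega3
}

# "Observed" moments: the population moments of an illustrative true model
theta_L_true <- c(0.8, 0.6); theta_O_true <- c(1, 0.5, 0.4, 0.3)
s <- G_fun(theta_L_true) %*% theta_O_true
V <- diag(6)                                 # unweighted least squares weight

F_snlls <- function(theta_L) {               # reduced objective, Eq. 25
  G <- G_fun(theta_L)
  drop(t(s) %*% V %*% s -
       t(s) %*% V %*% G %*% solve(t(G) %*% V %*% G, t(G) %*% V %*% s))
}

fit   <- optim(c(0.5, 0.5), F_snlls)         # iterate over theta_Lambda only
G_hat <- G_fun(fit$par)
theta_O_hat <- solve(t(G_hat) %*% V %*% G_hat, t(G_hat) %*% V %*% s)  # Eq. 24
fit$par                                      # close to (0.8, 0.6)
drop(theta_O_hat)                            # close to (1, 0.5, 0.4, 0.3)
```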

2.2 Derivation of \(\textbf{G}(\theta _{\varvec{\Lambda }})\)

Since \(\textbf{F}= \left[ {\textbf {I}}\,\big |\,{\textbf {0}}\right] \) with \(\textbf{0} \in \mathbb {R}^{m_\textrm{obs} \times m_\textrm{lat}}\), the product \(\textbf{F}\textbf{M}\textbf{F}^{T}\) for any \(\textbf{M}\in \mathbb {R}^{m \times m}\) is equal to just deleting the last \(m_\textrm{lat}\) rows and columns of \(\textbf{M}\). We also note that for any matrices \(\textbf{M}, \textbf{D}\in \mathbb {R}^{n \times n}\) we can write

$$\begin{aligned} \left( \textbf{M}\textbf{D}\textbf{M}^{T}\right) _{ij} = \sum _{k = 1}^n \sum _{l = 1}^n m_{il} \; d_{lk} \; m_{jk} \end{aligned}$$
(27)

With this in mind, we can rewrite the model-implied covariance matrix \(\varvec{\Sigma }(\theta )\) as

$$\begin{aligned} \varvec{\Sigma }(\theta )_{ij}&= \Big ( \textbf{F}(\textbf{I}-\varvec{\Lambda })^{-1} \varvec{\Omega }(\textbf{I}-\varvec{\Lambda })^{-T} \textbf{F}^T \Big )_{ij} \end{aligned}$$
(28)
$$\begin{aligned}&= \Big ( (\textbf{I}-\varvec{\Lambda })^{-1} \varvec{\Omega }(\textbf{I}-\varvec{\Lambda })^{-T} \Big )_{ij} \end{aligned}$$
(29)
$$\begin{aligned}&= \sum _{k = 1}^m \sum _{l = 1}^m (\textbf{I}- \varvec{\Lambda })^{-1}_{il} \; \omega _{lk} \; (\textbf{I}- \varvec{\Lambda })^{-1}_{jk} \end{aligned}$$
(30)
$$\begin{aligned}&= \sum _{k = 1}^m \sum _{l = 1}^m \Big (\sum _{\tau \in \mathcal {P}(l,i)} \prod _{r \rightarrow s \in \tau } \lambda _{sr}\Big ) \; \omega _{lk} \; \Big (\sum _{\tau \in \mathcal {P}(k,j)} \prod _{r \rightarrow s \in \tau } \lambda _{sr}\Big ) \end{aligned}$$
(31)

with \(i,j \in \{1, \ldots , m_\textrm{obs}\}\). We now immediately see that each entry of \(\varvec{\Sigma }\) is a sum of products of entries of \((\textbf{I}-\varvec{\Lambda })^{-1}\) and \(\varvec{\Omega }\). More importantly, exactly one entry of \(\varvec{\Omega }\) enters each term of the sum; if we keep all entries of \(\varvec{\Lambda }\) fixed, each element in \(\varvec{\Sigma }\) is a linear function of the entries of \(\varvec{\Omega }\) and is therefore a linear function of the undirected parameters in \(\varvec{\Omega }\) (under the assumption that \(\varvec{\Omega }\) is linearly parameterized). As a result, the parameter vector \(\theta \) is separable into two parts, \(\theta _{\varvec{\Lambda }}\) from \(\varvec{\Lambda }\) and \(\theta _{\varvec{\Omega }}\) from \(\varvec{\Omega }\), and \(\theta _{\varvec{\Omega }}\) enters the computation of the model-implied covariance linearly. As stated before, this is the reason why we will be able to apply separable nonlinear least squares estimation to our problem. Before we proceed, we would like to introduce some notation. If \({\mathcal {F}}\) and \({\mathcal {G}}\) are tuples of length n and m, and f and g are functions, we define a column vector of length n as

$$\begin{aligned} \Bigg (\Big [f(i)\Big ]_{i \in {\mathcal {F}}}\Bigg ) = \begin{pmatrix} f({\mathcal {F}}_1) \\ f({\mathcal {F}}_2) \\ \ldots \\ f({\mathcal {F}}_n) \end{pmatrix} \end{aligned}$$
(32)

and a matrix of size \(n \times m\) as

$$\begin{aligned} \Bigg (\Big [g(i,j)\Big ]_{i \in {\mathcal {F}},\; j \in {\mathcal {G}}}\Bigg ) = \begin{pmatrix} g({\mathcal {F}}_1, {\mathcal {G}}_1) &{}\quad g({\mathcal {F}}_1, {\mathcal {G}}_2) &{}\quad \ldots &{}\quad g({\mathcal {F}}_1, {\mathcal {G}}_m)\\ g({\mathcal {F}}_2, {\mathcal {G}}_1) &{}\quad g({\mathcal {F}}_2, {\mathcal {G}}_2) &{}\quad \ldots &{}\quad g({\mathcal {F}}_2, {\mathcal {G}}_m)\\ \ldots &{}\quad \ldots &{}\quad \ldots &{}\quad \ldots \\ g({\mathcal {F}}_n, {\mathcal {G}}_1) &{}\quad \ldots &{}\quad \ldots &{}\quad g({\mathcal {F}}_n, {\mathcal {G}}_m) \end{pmatrix} \end{aligned}$$
(33)

To make the subsequent steps easier to follow, we assume that there are no equality constraints between parameters in \(\varvec{\Omega }\) and no constant terms in \(\varvec{\Omega }\) different from 0. In Appendices A and B, we show how to lift those assumptions. We now further simplify Eq. 30: Since only nonzero entries of \(\varvec{\Omega }\) (the parameters \(\theta _{\varvec{\Omega }}\)) contribute to the sum, we define \({\mathcal {C}}\) as the lower triangular indices of \(\theta _{\varvec{\Omega }}\) in \(\varvec{\Omega }\), i.e., \({\mathcal {C}}_i = (l, k) \in \mathbb {N}\times \mathbb {N}\) with \(({\theta _{\varvec{\Omega }}})_i = \omega _{lk}\) and \(l \ge k\). We now rewrite Eq. 30 by omitting all zero terms:

$$\begin{aligned} \varvec{\Sigma }(\theta )_{ij}&= \sum _{(l,k) \in {\mathcal {C}}} \left[ (\textbf{I}- \varvec{\Lambda })^{-1}_{il} \omega _{lk} (\textbf{I}- \varvec{\Lambda })^{-1}_{jk} + \delta _{k \ne l} \, (\textbf{I}- \varvec{\Lambda })^{-1}_{ik} \omega _{lk} (\textbf{I}- \varvec{\Lambda })^{-1}_{jl} \right] \end{aligned}$$
(34)
$$\begin{aligned}&= \left( \left[ (\textbf{I}- \varvec{\Lambda })^{-1}_{il} (\textbf{I}- \varvec{\Lambda })^{-1}_{jk} + \delta _{k \ne l} \, (\textbf{I}- \varvec{\Lambda })^{-1}_{ik} (\textbf{I}- \varvec{\Lambda })^{-1}_{jl}\right] _{(l,k) \in {\mathcal {C}}} \right) ^T \; \theta _{\varvec{\Omega }} \end{aligned}$$
(35)

where \(\delta _{k \ne l}\) is an indicator function that takes the value 1 if \(k \ne l\) and 0 otherwise. Since we are only interested in the non-duplicated elements \(\sigma \) of \(\varvec{\Sigma }\), we define another index tuple \({\mathcal {D}}\) that denotes the indices of the original position of \(\sigma _k\) in \(\varvec{\Sigma }\), i.e., \({\mathcal {D}}_k = (i, j)\) such that \(\sigma _k = \varvec{\Sigma }_{ij}\). This allows us to stack the expression we just found for \(\varvec{\Sigma }_{ij}\) rowwise to get

$$\begin{aligned} \sigma&= \left( \Big [\varvec{\Sigma }_{ij} \Big ]_{(i, j) \in {\mathcal {D}}} \right) \end{aligned}$$
(36)
$$\begin{aligned}&= \left( \left[ (\textbf{I}- \varvec{\Lambda })^{-1}_{il} (\textbf{I}- \varvec{\Lambda })^{-1}_{jk} + \delta _{k \ne l} \, (\textbf{I}- \varvec{\Lambda })^{-1}_{ik} (\textbf{I}- \varvec{\Lambda })^{-1}_{jl}\right] _{(i, j) \in {\mathcal {D}}, \; (l,k) \in {\mathcal {C}}} \; \right) \theta _{\varvec{\Omega }} \end{aligned}$$
(37)
$$\begin{aligned}&= \textbf{G}(\theta _{\varvec{\Lambda }}) \; \theta _{\varvec{\Omega }} \end{aligned}$$
(38)

where \(\textbf{G}(\theta _{\varvec{\Lambda }}) \in \mathbb {R}^{\dim (\sigma ) \times \dim (\theta _{\varvec{\Omega }})}\). (We let \(\dim (\cdot )\) of a vector denote its number of elements, i.e., the dimension of the underlying (finite-dimensional) vector space.)

Even though this expression may appear involved, it is in fact easy to compute. Before the optimization procedure starts, we store \({\mathcal {C}}\) by looking up the positions of the parameters in \(\varvec{\Omega }\) and also store \({\mathcal {D}}\). At each step of the optimization procedure, to compute \(\textbf{G}(\theta _{\varvec{\Lambda }})\), we now compute \((\textbf{I}- \varvec{\Lambda })^{-1}\) first and then loop through the entries of \({\mathcal {C}}\) and \({\mathcal {D}}\) to compute each entry of \(\textbf{G}(\theta _{\varvec{\Lambda }})\) according to Eq. 37. We note that \(\textbf{G}\) will typically be sparse; therefore, it is advisable to analyze its sparsity pattern prior to the optimization and to loop only through nonzero values.
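A dense reference implementation of this loop might look as follows in R; the example model (a one-factor model with only variance parameters in \(\varvec{\Omega }\)), the index tuples and all numerical values are illustrative, and the sparsity handling discussed above is omitted.

```r
# Sketch of computing G(theta_Lambda) entrywise via Eq. 37: store the index
# tuples C (positions of the Omega-parameters) and D (positions of the
# non-duplicated elements of Sigma) once, then fill G from (I - Lambda)^{-1}.
m_obs <- 3; m <- 4                            # x1, x2, x3, f (illustrative model)

Lambda <- matrix(0, m, m); Lambda[1:3, 4] <- c(1, 0.8, 0.6)
Omega  <- diag(c(0.5, 0.4, 0.3, 1))           # parameters: omega1..omega3, phi
theta_Omega <- c(0.5, 0.4, 0.3, 1)

C <- list(c(1, 1), c(2, 2), c(3, 3), c(4, 4))            # (l, k) with l >= k
D <- list()                                              # (i, j) of vech(Sigma)
for (j in 1:m_obs) for (i in j:m_obs) D[[length(D) + 1]] <- c(i, j)

B <- solve(diag(m) - Lambda)                             # (I - Lambda)^{-1}

G <- matrix(0, length(D), length(C))
for (r in seq_along(D)) for (s in seq_along(C)) {
  i <- D[[r]][1]; j <- D[[r]][2]; l <- C[[s]][1]; k <- C[[s]][2]
  G[r, s] <- B[i, l] * B[j, k] + (k != l) * B[i, k] * B[j, l]   # Eq. 37
}

# Check against the model-implied covariance (Eq. 4)
Sigma <- (B %*% Omega %*% t(B))[1:m_obs, 1:m_obs]
all.equal(drop(G %*% theta_Omega), Sigma[lower.tri(Sigma, diag = TRUE)])  # TRUE
```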

In Appendix D, we present a different way of obtaining \(\textbf{G}(\theta _{\varvec{\Lambda }})\) and the gradients, which mimics the approach of Kreiberg et al. (2021). However, the expressions obtained here are computationally more efficient, as the ones in the appendix contain very large Kronecker products.

2.3 Mean Structures

If the model contains mean parameters, we partition the parameter vector \(\theta \) into three parts: \(\theta _{\varvec{\Lambda }}\) and \(\theta _{\varvec{\Omega }}\) as before, and \(\theta _\gamma \) from the mean vector \(\gamma \). From Eq. 5, we directly see that the model-implied mean vector \(\mu (\theta )\) is a linear function of \(\theta _\gamma \). If we let \({\mathcal {A}}\) denote the indices of the parameters \(\theta _{\gamma }\) in \(\gamma \), i.e., for \(i = {\mathcal {A}}_{j}\) we have \(({\theta _{\gamma }})_j\) = \(\gamma _i\), we obtain the formula

$$\begin{aligned} \mu = \left( \left[ (\textbf{I}- \varvec{\Lambda })^{-1}_{ij}\right] _{i \in (1, \ldots , m_\textrm{obs}), \; j \in {\mathcal {A}}} \right) \; \theta _{\gamma }\end{aligned}$$
(39)

We now make a slight change in notation: For the previously obtained \(\textbf{G}(\theta )\)-matrix, we write \(\textbf{G}_\sigma \) instead and define \(\textbf{G}_\mu :=\left( \left[ (\textbf{I}- \varvec{\Lambda })^{-1}_{ij}\right] _{i \in (1, \ldots , m_\textrm{obs}), \; j \in {\mathcal {A}}} \right) \). Using a formulation of the least squares objective that also includes a mean structure, we see that

$$\begin{aligned} F_{\text {LS}} = \left\Vert \begin{pmatrix} s \\ \bar{x} \end{pmatrix} - \begin{pmatrix} \sigma \\ \mu \end{pmatrix} \right\Vert ^2_\textbf{V}= \left\Vert \begin{pmatrix} s \\ \bar{x} \end{pmatrix} - \textbf{G}\begin{pmatrix} \theta _{\varvec{\Omega }} \\ \theta _\gamma \end{pmatrix} \right\Vert ^2_\textbf{V}\end{aligned}$$
(40)

where \(\bar{x}\) denotes the vector of observed sample means and \(\textbf{V}\) is a weight matrix of appropriate (enlarged) dimension,

with

$$\begin{aligned} \textbf{G}:=\left[ \begin{array}{cc} \textbf{G}_\sigma &{}\quad \textbf{0} \\ \textbf{0} &{}\quad \textbf{G}_\mu \end{array}\right] \end{aligned}$$
(41)

It follows that in addition to the undirected parameters, the mean parameters also do not have to be optimized iteratively but can instead be computed analytically after the iterative optimization is completed.
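In code, obtaining \(\textbf{G}_\mu \) from Eq. 39 and stacking it with \(\textbf{G}_\sigma \) according to Eq. 41 amounts to a simple block-diagonal construction; in the following R sketch, the model is again the illustrative one-factor model with a single latent mean parameter, and \(\textbf{G}_\sigma \) is a random placeholder standing in for the covariance part computed earlier.

```r
# Sketch of Eqs. 39 and 41: G_mu for a single latent mean parameter and the
# block-diagonal stacking with a (placeholder) covariance part G_sigma.
m_obs <- 3; m <- 4
Lambda <- matrix(0, m, m); Lambda[1:3, 4] <- c(1, 0.8, 0.6)
B <- solve(diag(m) - Lambda)

A_idx   <- 4                                    # indices of theta_gamma in gamma
G_mu    <- B[1:m_obs, A_idx, drop = FALSE]      # Eq. 39
G_sigma <- matrix(rnorm(6 * 4), 6, 4)           # placeholder covariance part

G <- rbind(cbind(G_sigma, matrix(0, nrow(G_sigma), ncol(G_mu))),   # Eq. 41
           cbind(matrix(0, nrow(G_mu), ncol(G_sigma)), G_mu))
dim(G)
```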

2.4 Gradient of the SNLLS Objective

There are computationally efficient expressions to compute the SNLLS objective and its gradient analytically (Kaufman, 1975; O’Leary & Rust, 2013). Because numerical approximations of the gradient are often slow and may become numerically unstable, we derive an analytical expression for the part of the gradient that is specific to SEMs. We use the notation and methods from Magnus and Neudecker (2019a) and denote the differential by \({{\,\mathrm{\textsf{d}}\,}}\) and the Jacobian by \({{\,\mathrm{\textsf{D}}\,}}\). The Jacobian of a matrix function \(\textbf{M}\) with respect to a vector x is defined as \({{\,\mathrm{\textsf{D}}\,}}\textbf{M}= \frac{\partial {{\,\textrm{vec}\,}}\textbf{M}}{\partial x^{T}}\). In the approaches by Kaufman (1975) and O’Leary and Rust (2013), the gradient of the SNLLS objective is expressed in terms of the partial derivatives of the entries of \(\textbf{G}\) w.r.t. the nonlinear parameters, i.e., \({{\,\mathrm{\textsf{D}}\,}}\textbf{G}\). In order to be able to implement such efficient approaches in practice, we derive \({{\,\mathrm{\textsf{D}}\,}}\textbf{G}\) here. We also give the full gradient of Eq. 25 for completeness in Appendix C, although in practice, a more efficient expression from the cited literature can be used (which also does not assume \(\textbf{G}\) to have full rank). For reasons of clarity, we here only consider the case without mean structure, i.e., \(\textbf{G}= \textbf{G}_\sigma \). This is because the derivative of \(\textbf{G}_\mu \) can be obtained analogously, and we do not want to make the derivation unnecessarily technical.

Let \({\mathcal {E}}\) denote the indices of \(\theta _{\varvec{\Lambda }}\) in \(\varvec{\Lambda }\), i.e., \({\mathcal {E}}_k = (i,j)\) such that \(\varvec{\Lambda }_{ij} = ({\theta _{\varvec{\Lambda }}})_k\). We note that

$$\begin{aligned} \frac{\partial (\textbf{I}- \varvec{\Lambda })^{-1}_{kl}}{\partial \varvec{\Lambda }_{ij}} = (\textbf{I}- \varvec{\Lambda })^{-1}_{ki}(\textbf{I}- \varvec{\Lambda })^{-1}_{jl} \end{aligned}$$
(42)
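Equation 42 follows from the identity \({{\,\mathrm{\textsf{d}}\,}}(\textbf{I}- \varvec{\Lambda })^{-1} = (\textbf{I}- \varvec{\Lambda })^{-1} \, {{\,\mathrm{\textsf{d}}\,}}\varvec{\Lambda }\, (\textbf{I}- \varvec{\Lambda })^{-1}\) and can be verified numerically, as in the following R sketch with an illustrative \(\varvec{\Lambda }\) and arbitrarily chosen indices.

```r
# Numerical check of Eq. 42: analytic derivative of (I - Lambda)^{-1}[k, l]
# w.r.t. Lambda[i, j] versus a central finite difference. Values illustrative.
Lambda <- matrix(c(0,   0,   0,
                   0.5, 0,   0,
                   0.2, 0.3, 0), 3, 3, byrow = TRUE)
i <- 2; j <- 1; k <- 3; l <- 1; h <- 1e-6

B <- solve(diag(3) - Lambda)
analytic <- B[k, i] * B[j, l]                              # Eq. 42

Lp <- Lambda; Lp[i, j] <- Lp[i, j] + h
Lm <- Lambda; Lm[i, j] <- Lm[i, j] - h
numeric <- (solve(diag(3) - Lp)[k, l] - solve(diag(3) - Lm)[k, l]) / (2 * h)

all.equal(analytic, numeric, tolerance = 1e-6)             # TRUE
```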

With this, we derive the partial derivatives of each entry of \(\textbf{G}\) in terms of the matrix \((\textbf{I}- \varvec{\Lambda })^{-1}\) as

$$\begin{aligned} \frac{\partial \textbf{G}_{r, s}}{\partial ({\theta _{\varvec{\Lambda }}})_n}&= \frac{\partial }{\partial ({\theta _{\varvec{\Lambda }}})_n} \left[ (\textbf{I}- \varvec{\Lambda })^{-1}_{il} (\textbf{I}- \varvec{\Lambda })^{-1}_{jk} +\delta _{k \ne l} \, (\textbf{I}- \varvec{\Lambda })^{-1}_{ik} (\textbf{I}- \varvec{\Lambda })^{-1}_{jl}\right] \end{aligned}$$
(43)
$$\begin{aligned}&= \left[ \frac{\partial }{\partial ({\theta _{\varvec{\Lambda }}})_n}(\textbf{I}- \varvec{\Lambda })^{-1}_{il} (\textbf{I}- \varvec{\Lambda })^{-1}_{jk} + (\textbf{I}- \varvec{\Lambda })^{-1}_{il} \frac{\partial }{\partial ({\theta _{\varvec{\Lambda }}})_n}(\textbf{I}- \varvec{\Lambda })^{-1}_{jk}\right] \nonumber \\&\quad +\delta _{k \ne l} \left[ \frac{\partial }{\partial ({\theta _{\varvec{\Lambda }}})_n}(\textbf{I}- \varvec{\Lambda })^{-1}_{ik} (\textbf{I}- \varvec{\Lambda })^{-1}_{jl} + (\textbf{I}- \varvec{\Lambda })^{-1}_{ik} \frac{\partial }{\partial ({\theta _{\varvec{\Lambda }}})_n}(\textbf{I}- \varvec{\Lambda })^{-1}_{jl} \right] \end{aligned}$$
(44)
$$\begin{aligned}&= \left[ (\textbf{I}- \varvec{\Lambda })^{-1}_{iu} (\textbf{I}- \varvec{\Lambda })^{-1}_{vl} (\textbf{I}- \varvec{\Lambda })^{-1}_{jk} + (\textbf{I}- \varvec{\Lambda })^{-1}_{il} (\textbf{I}- \varvec{\Lambda })^{-1}_{ju} (\textbf{I}- \varvec{\Lambda })^{-1}_{vk} \right] \nonumber \\&\quad +\delta _{k \ne l} \left[ (\textbf{I}- \varvec{\Lambda })^{-1}_{iu} (\textbf{I}- \varvec{\Lambda })^{-1}_{vk} (\textbf{I}- \varvec{\Lambda })^{-1}_{jl} + (\textbf{I}- \varvec{\Lambda })^{-1}_{ik} (\textbf{I}- \varvec{\Lambda })^{-1}_{ju} (\textbf{I}- \varvec{\Lambda })^{-1}_{vl} \right] \end{aligned}$$
(45)

with \((i, j) = {\mathcal {D}}_r\), \((l, k) = {\mathcal {C}}_s\), and \((u, v) = {\mathcal {E}}_n\). Since \(\textbf{G}\) is of dimension \(\dim (\sigma ) \times \dim (\theta _{\varvec{\Omega }})\), with \(k = \dim (\sigma )\) we have

$$\begin{aligned} {{\,\textrm{vec}\,}}(\textbf{G})_t = \textbf{G}_{t - k\lfloor (t-1)/k \rfloor , \; \lceil t / k \rceil } \end{aligned}$$
(46)

and we obtain \({{\,\mathrm{\textsf{D}}\,}}\textbf{G}\in \mathbb {R}^{\dim (\sigma )\dim (\theta _{\varvec{\Omega }}) \times \dim (\theta _{\varvec{\Lambda }})}\) as

$$\begin{aligned} {{\,\mathrm{\textsf{D}}\,}}\textbf{G}= \frac{\partial {{\,\textrm{vec}\,}}\textbf{G}}{\partial \theta _{\varvec{\Lambda }}^T} = \left( \left[ \frac{\partial \textbf{G}_{t - k\lfloor (t-1)/k \rfloor , \; \lceil t / k \rceil }}{\partial ({\theta _{\varvec{\Lambda }}})_n}\right] _{t \in (1, \ldots , \dim (\sigma )\dim (\theta _{\varvec{\Omega }})), \; n \in (1, \ldots , \dim (\theta _{\varvec{\Lambda }}))}\right) \end{aligned}$$
(47)

To facilitate software implementation, we give a way to compute \({{\,\mathrm{\textsf{D}}\,}}\textbf{G}\) in pseudocode in Algorithm 1. In practice, \({{\,\mathrm{\textsf{D}}\,}}\textbf{G}\) will typically contain many zero values. Therefore, it is advisable to analyze the sparsity pattern of \({{\,\mathrm{\textsf{D}}\,}}\textbf{G}\) before the optimization procedure begins and to compute only the nonzero values of \({{\,\mathrm{\textsf{D}}\,}}\textbf{G}\) at each iteration. Also note that the entries of \({{\,\mathrm{\textsf{D}}\,}}\textbf{G}\) are continuous w.r.t. \(\theta _{\varvec{\Lambda }}\), since they are sums of products of entries of the inverse \((\textbf{I}- \varvec{\Lambda })^{-1}\), which is continuous.

Algorithm 1: Pseudocode for computing \({{\,\mathrm{\textsf{D}}\,}}\textbf{G}\) entrywise (see Eq. 45).
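The entrywise computation of Eq. 45 can be sketched in R as follows; this is a dense reference implementation under illustrative model assumptions (a one-factor model with two free loadings), not a reproduction of Algorithm 1, and the sparsity handling described above is again omitted.

```r
# Sketch of computing DG (Eq. 47) entrywise via Eq. 45, using the index
# tuples C, D (as before) and E (positions of theta_Lambda in Lambda).
m_obs <- 3; m <- 4                                       # illustrative model
Lambda <- matrix(0, m, m); Lambda[1:3, 4] <- c(1, 0.8, 0.6)

C <- list(c(1, 1), c(2, 2), c(3, 3), c(4, 4))            # theta_Omega positions
D <- list(); for (j in 1:m_obs) for (i in j:m_obs) D[[length(D) + 1]] <- c(i, j)
E <- list(c(2, 4), c(3, 4))                              # free loadings lambda2, lambda3

B <- solve(diag(m) - Lambda)

dG <- array(0, dim = c(length(D), length(C), length(E)))
for (r in seq_along(D)) for (s in seq_along(C)) for (n in seq_along(E)) {
  i <- D[[r]][1]; j <- D[[r]][2]
  l <- C[[s]][1]; k <- C[[s]][2]
  u <- E[[n]][1]; v <- E[[n]][2]
  dG[r, s, n] <-                                                    # Eq. 45
    B[i, u] * B[v, l] * B[j, k] + B[i, l] * B[j, u] * B[v, k] +
    (k != l) * (B[i, u] * B[v, k] * B[j, l] + B[i, k] * B[j, u] * B[v, l])
}

# Flatten to the Jacobian of Eq. 47: rows index vec(G) (column-major),
# columns index theta_Lambda.
DG <- matrix(dG, nrow = length(D) * length(C), ncol = length(E))
dim(DG)
```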

3 Discussion

We have shown that separable nonlinear least squares is applicable to generalized least squares estimation of structural equation models formulated in the RAM notation. We have also shown a connection to path tracing rules in the form of trek rules. Note that when the same weight matrix is used, the point estimates obtained by SNLLS and LS are identical. Therefore, standard errors and test statistics are obtained using the same methods available for regular least squares estimation. In the following, we would like to discuss the two major benefits of using SNLLS for SEM: better convergence properties and a reduction in the computation time for parameter estimation.

3.1 Convergence

Convergence problems are an important issue in SEM, especially in small samples (De Jonckere & Rosseel, 2022). If the optimizer fails to converge, no parameter estimates can be obtained. Using the SNLLS objective should lead to fewer convergence problems than LS, since only the directed parameters need to be estimated iteratively. Therefore, only the subset of directed parameters requires starting values. In many models, most of the directed parameters are factor loadings, and we can obtain very good starting values for them with the FABIN 3 estimator (Hägglund, 1982). Also, Ruhe and Wedin (1980) and Golub and Pereyra (2003) give additional proofs and reasons for why the reduced optimization problem of SNLLS should in principle be better behaved than the full LS problem. Additionally, for the class of models without unknown directed parameters, convergence problems should be eliminated altogether, as the estimator of the mean and (co)variance parameters can be computed analytically. Most prominently, this class includes many types of latent growth curve models.

To investigate the convergence properties of SNLLS in SEM, we ran a small simulation. We used the model in Fig. 2 to draw 1000 random data sets for varying sample sizes (N = 10 to N = 100) under the assumption of multivariate normality with zero expectation and the model-implied covariance induced by the parameters. The sample size and the factor loadings are deliberately chosen to be small to achieve a setting where non-convergence often occurs. We fitted the true model to each sample with generalized least squares (GLS; Bollen, 1989) and SNLLS estimation. All analyses were done in the programming language R (R Core Team, 2021). For GLS estimation, we used lavaan (Rosseel, 2012). The plots were created with ggplot2 (Wickham, 2016), and the data were prepared with dplyr (Wickham et al., 2021). In Fig. 3 we report the number of converged models for each sample size. In Fig. 4, we report the median number of iterations needed until convergence for each sample size. Using SNLLS effectively halved the median number of iterations until convergence for most sample sizes and more than halved the number of non-converged models for most sample sizes. This indicates that SNLLS might be a useful alternative for applied researchers to consider if they encounter convergence problems.

Fig. 2

The structural equation model used to compare convergence properties of SNLLS and GLS estimation, with two latent variables, \(\zeta _{1}\) and \(\zeta _{2}\). Variances are omitted in this representation. The population values are the same as in De Jonckere and Rosseel (2022): \(\lambda _{1} = \lambda _{4} = 1\), \(\lambda _{2} = \lambda _{5} = 0.8\), \(\lambda _{3} = \lambda _{6} = 0.6\), \(\beta =0.25\), and all error variances are set to 1

Fig. 3

Simulation results—number of converged replications out of 1000. GLS, generalized least squares; SNLLS, separable nonlinear least squares

Fig. 4

Simulation results—median number of iterations by sample size. GLS, generalized least squares; SNLLS, separable nonlinear least squares

3.2 Computation Time

The benefits of SNLLS estimation, specifically the reduced dimensionality of the parameter space, better starting values and fewer iterations to convergence, could lead to reduced computation times. However, the computation of the SNLLS objective function and gradient is also more costly, so the cost per iteration can be higher. In sum, the question whether SNLLS estimation is faster in actual time spent in the optimization hinges upon several aspects, such as the actual implementation of the gradient, meta-parameters of the optimizer and model complexity.

Kreiberg et al. (2021) stated that estimation by SNLLS will typically be multiple times faster than LS when the reduced parameter space is much smaller than the original one. They conducted a simulation study, where they fitted a number of CFA models and concluded that estimation time increases more quickly for LS than for SNLLS as the number of estimated parameters grows. Even though their simulation is useful to illustrate the potential benefits of SNLLS, we do not think it makes a convincing case for a general reduction in computation time when using SNLLS in modern software. The gradient computation in the simulation was based on a finite difference approximation in both the LS and the SNLLS condition. In existing software (Rosseel, 2012; von Oertzen et al., 2015), analytic gradients are implemented for LS estimation, so the authors compare against a straw man that would not be used in practice if computational efficiency is important. In addition, central finite differences require 2q calls to the objective function per gradient computation, where q is the number of parameters. Since SNLLS results in a smaller parameter space, their method of differentiation favors the SNLLS procedure.

A realistic simulation of whether SNLLS outperforms the LS estimator in practice still requires a competitive implementation of SNLLS optimization for SEM that uses the analytic gradients derived in this paper. However, there is a large body of research concerning the efficient implementation of SNLLS (see, for example, Kaufman, 1975; O’Leary and Rust, 2013); writing competitive software for SNLLS in SEMs would be a research topic on its own. Therefore, we only give simulation results concerning the improvement of convergence rates and the number of iterations in this paper. As noted previously, for the class of models without unknown directed parameters, the estimator of the mean and (co)variance parameters can be computed in a single step. As a result, those models should especially benefit from lower computation times.

3.3 An Outlook on Maximum Likelihood Estimation

If the assumption of multivariate normality is tenable, another method of obtaining parameter estimates is maximum likelihood estimation. Here, we briefly discuss to what extent our results may have an impact on maximum likelihood optimization of SEMs. In least squares estimation with a fixed weight matrix, we saw that the undirected parameters \(\theta _{\varvec{\Omega }}\) and the mean parameters \(\theta _\gamma \) enter the objective linearly. For maximum likelihood estimation, we believe it is not possible to factor out the undirected parameters (for most models used in practice). This is because the likelihood of the normal distribution

$$\begin{aligned} \phi (x) = \left( (2\pi )^{m_\textrm{obs}} \det \varvec{\Sigma }\right) ^{-\frac{1}{2}} \exp \left( -\frac{1}{2} (x - \mu )^T \varvec{\Sigma }^{-1} (x - \mu ) \right) \end{aligned}$$
(48)

depends on the inverse of the model-implied covariance matrix. For the simplistic example model depicted in Fig. 5, we derive the model-implied covariance matrix as

$$\begin{aligned} \varvec{\Sigma }= \begin{pmatrix} \omega _l + \omega _1 &{} \omega _l\\ \omega _l &{} \omega _l + \omega _2 \end{pmatrix} \end{aligned}$$
(49)

and the inverse can be computed as

$$\begin{aligned} \varvec{\Sigma }^{-1} = (\det \varvec{\Sigma })^{-1} {{\,\textrm{adj}\,}}\varvec{\Sigma }\end{aligned}$$
(50)

where \({{\,\textrm{adj}\,}}\) refers to the adjugate matrix, so in our example,

$$\begin{aligned} \det \varvec{\Sigma }= (\omega _l + \omega _1)(\omega _l + \omega _2) - \omega _l^2 = \omega _1\omega _l + \omega _2\omega _l + \omega _1\omega _2 \end{aligned}$$
(51)

and

$$\begin{aligned} \varvec{\Sigma }^{-1} = (\omega _1\omega _l + \omega _2\omega _l + \omega _1\omega _2)^{-1} \begin{pmatrix} \omega _l + \omega _2 &{}\quad -\omega _l\\ -\omega _l &{}\quad \omega _l + \omega _1 \end{pmatrix} \end{aligned}$$
(52)
Fig. 5

Graph of a simplistic example model with one latent variable, measured by two indicators. The model contains no unknown directed effects and only two observed variables to allow for an easily traceable computation of the inverse of the model-implied covariance matrix. All variances are treated as unknown parameters

We see that \(\theta _{\varvec{\Omega }}\) enters the determinant and therefore the inverse of \(\varvec{\Sigma }\) in a nonlinear way. In general, the Leibniz Formula for the determinant gives

$$\begin{aligned} \det \varvec{\Sigma }= \sum _{\pi \in {\mathcal {S}}_{m_\textrm{obs}}} {{\,\textrm{sgn}\,}}(\pi ) \prod _{i = 1}^{m_\textrm{obs}} \varvec{\Sigma }_{i,\pi (i)} \end{aligned}$$
(53)

where \({\mathcal {S}}_{m_\textrm{obs}}\) denotes the symmetric group. Since this formula multiplies entries of \(\varvec{\Sigma }\), and we saw in Eq. 30 that the entries of \(\varvec{\Sigma }\) depend on the undirected parameters, it is very likely that the undirected parameters appear in products with each other and thus enter the objective in a nonlinear way. However, for the mean parameters, the picture may be different and we leave this for future work. If the model is saturated (i.e., has zero degrees of freedom), the least squares estimates are the same as the maximum likelihood estimates, since \(\textbf{S} = \varvec{\Sigma }(\hat{\theta }_\textrm{ML}) = \varvec{\Sigma }(\hat{\theta }_\textrm{LS})\). Also, Lee and Jennrich (1979) showed that maximum likelihood estimation can be obtained as a form of iteratively reweighted least squares if \(\textbf{V}\) is a function of the parameters:

$$\begin{aligned} \textbf{V}= \frac{1}{2}\textbf{D}^T\left( \varvec{\Sigma }^{-1} \otimes \varvec{\Sigma }^{-1}\right) \textbf{D}\end{aligned}$$
(54)

where \(\textbf{D}\) denotes the duplication matrix from Magnus and Neudecker (2019b). Another way of obtaining ML estimates with SNLLS would therefore be to minimize the SNLLS objective and use the obtained \(\varvec{\Sigma }\) to update the weight matrix \(\textbf{V}\) as given in Eq. 54. SNLLS could then be rerun with the updated weight matrix, and the weight matrix be updated again, until \(\varvec{\Sigma }\) converges to \(\varvec{\Sigma }(\hat{\theta }_\textrm{ML})\). However, we would like to note that this procedure is probably computationally quite inefficient.

3.4 Conclusion

We generalized separable nonlinear least squares estimation to all linear structural equation models that can be specified in the RAM notation, particularly those including a mean structure. We explained this result with the help of trek rules and the non-transitivity of the covariances of the error terms, providing deeper insight into the algebraic relations between the parameters of SEMs. We further derived analytic gradients and explained why they are of central importance to obtain a competitive implementation. Our simulation indicates that SNLLS leads to improvements in convergence rate and number of iterations. It remains for future research to investigate the computational costs empirically. We also showed why it is unlikely that undirected parameters enter the maximum likelihood objective linearly. Thus, another line of research could be concerned with the applicability of SNLLS to the mean parameters in maximum likelihood estimation and the relationship of SNLLS to other decomposition methods for maximum likelihood estimation (Pritikin et al., 2017, 2018). Further research might also examine whether SNLLS is applicable to multilevel models. SNLLS promises better convergence rates for least squares parameter estimation in SEM and, with an efficient implementation, also reduced computation times. This result is important in its own right but may as well serve as a first step for generating starting values for subsequent ML estimation.