In social and behavioral research involving latent variables or factors, factor analysis (FA) models are widely adopted with two typical modeling approaches. The exploratory approach, characterized by exploratory FA (EFA; Jennrich & Sampson, 1966), is data-driven and requires little substantive knowledge before modeling, whereas the confirmatory approach, represented by confirmatory FA (CFA; Jöreskog, 1969), is theory-driven with a confirmatory structure. In general, however, there is a vast space between the fully exploratory end, where even the number of factors is unclear, and the fully confirmatory end, where every loading is either fixed or specified. Recent developments in regularization methods enable more flexibility to cover a wider space between the two ends. Statistical regularization imposes a penalty, or cost, on the optimization function to achieve a sparse model that is simpler and more interpretable. One typical example is the L1-norm penalty, or the least absolute shrinkage and selection operator (Lasso), which can shrink unspecified parameters towards zero to obtain a sparse model (Tibshirani, 1996).

Statistical regularization has been more successful towards the confirmatory end, where a confirmatory part and the number of factors are known, within either the frequentist (e.g., Huang et al., 2017; Jacobucci et al., 2016) or the Bayesian (e.g., Feng et al., 2017; Lu et al., 2016; Muthén & Asparouhov, 2012) framework. Under both frameworks, coefficient or loading estimation is usually conceptualized as a regularized variable-selection problem under the assumption of local independence. Both can result in sparse models and are complementary to each other to some extent (Jacobucci & Grimm, 2018). More recently, a partially confirmatory approach offers more flexibility by regularizing the loading pattern and local dependence (i.e., correlated residuals) simultaneously (Chen et al., 2021).

On the exploratory end, where the number of factors is unclear, however, the benefit of regularization has been relatively limited. The frequentist framework is often adopted in EFA regularization (e.g., Choi et al., 2010; Hirose & Yamamoto, 2015; Trendafilov et al., 2017). A principal component analysis (PCA)-like decomposition of orthonormal factors with penalized likelihood-type methods is usually adopted to shrink the loading vector, rather than individual loadings, towards zero. However, their advantages over non-regularized EFA remain unclear in either the frequentist (see Auerswald & Moshagen, 2019 for more details) or the Bayesian (e.g., Conti et al., 2014; Frühwirth-Schnatter & Lopes, 2018) framework. For instance, sparse EFA (SEFA; Trendafilov et al., 2017), based on a special reparameterization and an L1-norm penalty on the loading vector, can achieve a sparse model and factor selection at the same time, representing the current frontier of the field. However, it offers only orthogonal solutions, which require post hoc rotation for correlated factors. Moreover, the shrinkage parameters need to be tuned separately, usually by cross-validation or information criteria, and interval estimates of model parameters are not available.

Recently, a one-step Bayesian regularized exploratory factor analysis (BREFA) model with bi-level Bayesian regularization was introduced (Chen, 2021a). It demonstrated clear advantages over traditional multi-step EFA, where the number of factors is first extracted, followed by factor rotation and parameter estimation. Based on an idea similar to bi-level Bayesian sparse group selection (Xu & Ghosh, 2015) with spike-and-slab priors (Ishwaran & Rao, 2005; Mitchell & Beauchamp, 1988), BREFA enables factor selection with a sparse model by conceptualizing the factor and the loading as the group and individual levels, respectively. It can extract factors and estimate model parameters in one step. The tuning parameters can also be estimated at the same time, and interval estimates are available for significance testing when needed.

However, BREFA can be unidentified or unstable due to issues such as sign indeterminacy, column switching, and local dependence, which will be explained in more detail in the Model Identification and Stableness subsection. In this research, the BREFA will be revised to address these issues, and the revised model will be called fully EFA (FEFA). Moreover, the FEFA will be further extended into a partially EFA (PEFA) model, where the loading pattern can be partially specified together with an unknown number of factors. The extension provides more flexibility on the exploratory end to accommodate partial knowledge. Simulation studies and real-data analyses will be conducted to evaluate the performance of both models under different conditions, including the interference of local dependence. Comparisons between the PEFA and the FEFA (and hence the BREFA) will be considered to give applied researchers more guidance in exploratory settings. Finally, both models have been implemented in the free-of-charge R package LAWBL (Chen, 2021c), making them more accessible to applied researchers.

For both models, the loading vector is reparameterized to tackle model sparsity at the factor and loading levels. Then, multivariate spike-and-slab priors (Yuan & Lin, 2005) with the posterior median estimator (PME), which can produce exactly zero estimates, are used at both levels. When factor selection is irrelevant, the approach reduces to single-level regularization on the loading pattern and is similar to Lu et al.'s (2016) sparse CFA with a spike-and-slab prior on the confirmatory end (although the estimation of factorial correlations and shrinkage parameters differs). The full Bayesian hierarchical formulation can be implemented with resampling-based Markov chain Monte Carlo (MCMC; Gilks et al., 1996) estimation, which mixes Gibbs sampling (Casella & George, 1992; Geman & Geman, 1984), empirical Bayes Gibbs sampling (Casella, 2001; Park & Casella, 2008), and the Metropolis–Hastings algorithm (Hastings, 1970; Metropolis et al., 1953).

Theoretical framework

Bayesian sparse group selection

The solution for factor and loading selection in EFA is associated with sparse Lasso-type variable selection and estimation for grouped data at both the group and individual levels.

Consider the following linear regression with G groups, each with a group coefficient vector βg of length mg, as:

$${\mathbf{y}}_{n\times 1}=\sum\nolimits_{g=1}^G{\mathbf{X}}_g{\boldsymbol{\upbeta}}_g+\boldsymbol{\upepsilon},$$
(1)

where the error term is ε<sub>n×1</sub> ~ N<sub>n</sub>(0, σ<sup>2</sup>I<sub>n</sub>), I<sub>n</sub> is the n-dimensional identity matrix, and X<sub>g</sub> is an n × m<sub>g</sub> covariate matrix corresponding to the group coefficient vector β<sub>g</sub>, g = 1, 2, …, G. Extending the group-level Lasso method (Yuan & Lin, 2006), the sparse group Lasso estimator was proposed (Simon et al., 2013) to shrink the coefficient vector towards zero at both the group and individual levels, as:

$$\hat{\beta}=\underset{\beta }{\mathrm{argmin}}\left({\left\Vert y-{\sum}_{g=1}^G\mathbf{X}_g\mathbf{\upbeta}_g\right\Vert}_2^2+{\gamma}_1{\left\Vert \beta \right\Vert}_1+{\gamma}_2{\sum}_{g=1}^G{\left\Vert \mathbf{\upbeta}_g\right\Vert}_2\right),$$
(2)

where β denotes the full coefficient vector stacking all the βg's, and γ1 and γ2 are the shrinkage parameters at the individual and group levels, respectively. According to Xu and Ghosh (2015), Eq. (2) can be reformulated as the Bayesian sparse group Lasso with the prior:

$$p\left(\beta \right)\propto \exp \left(-{\gamma}_1{\left\Vert \beta \right\Vert}_1-{\gamma}_2{\sum}_{g=1}^G{\left\Vert \mathbf{\upbeta}_g\right\Vert}_2\right),$$
(3)

which is essentially a scale mixture of normals with an exponential mixing density.
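As a concrete illustration, the penalized objective in Eq. (2) can be evaluated directly. The following R sketch is for intuition only; all object names are illustrative rather than taken from any package:

```r
# A minimal sketch of the sparse group Lasso objective in Eq. (2),
# assuming X is a list of G covariate matrices and beta a list of
# G coefficient vectors (all names here are illustrative).
sgl_objective <- function(y, X, beta, gamma1, gamma2) {
  fitted <- Reduce(`+`, Map(`%*%`, X, beta))       # sum_g X_g beta_g
  rss    <- sum((y - fitted)^2)                    # ||y - sum_g X_g beta_g||_2^2
  l1     <- gamma1 * sum(abs(unlist(beta)))        # individual-level L1 penalty
  l2grp  <- gamma2 * sum(vapply(beta, function(b) sqrt(sum(b^2)), numeric(1)))  # group-level penalty
  rss + l1 + l2grp
}

# Toy usage: two groups of three covariates each
set.seed(1)
X <- replicate(2, matrix(rnorm(50 * 3), 50, 3), simplify = FALSE)
beta <- list(c(1, 0, 0), c(0, 0, 0))
y <- X[[1]] %*% beta[[1]] + rnorm(50, sd = 0.5)
sgl_objective(y, X, beta, gamma1 = 0.1, gamma2 = 0.5)
```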

The Bayesian sparse group Lasso can be extended with spike-and-slab priors (SSP), which are multivariate zero-inflated mixture priors mixing point-mass priors for the spike part with Lasso-type (heavy-tailed) priors for the slab part (Yuan & Lin, 2005). The SSP can combine the power of point-mass mixture priors and double-exponential distributions in variable selection and estimation. As a result, the Bayesian sparse group Lasso with SSP (BSGL-SSP) is found to have desirable properties such as group-wise and within-group sparsity with exactly zero estimates, an optimal asymptotic estimation rate, and a lower false-positive rate (Xu & Ghosh, 2015). Moreover, it can help eliminate the interference of spurious factors and maintain the stability of the resampling process under the EFA setting (Chen, 2021a). Accordingly, it will be employed and extended to construct the FEFA and PEFA in this study. Readers can refer to the literature for more information about the BSGL-SSP (e.g., Liquet et al., 2017; Xu & Ghosh, 2015).

Fully exploratory factor analysis with spike-and-slab prior

Formulation of the FEFA with SSP has been discussed in previous research (Chen, 2021a) and will be introduced here with related modifications for model identification and stableness. For a psychological test or scale with J items, K factors, and N examinees, the data matrix can be denoted as Y = (Y1, …, YJ) = (yij)N×J with vector yi = (yi1, …, yiJ)T, where examinee i = 1, …, N and item j = 1, …, J. Underlying the response data, there are assumed to be latent factors in the K-dimensional Euclidean space, Ω = (Ω1, …, ΩK) = (ωik)N×K with vector ωi = (ωi1, …, ωiK)T and k = 1, …, K. Note that the K factors include both true and spurious factors. A general FA model satisfies the following equation:

$${\boldsymbol{y}}_i=\boldsymbol{\Lambda} {\boldsymbol{\upomega}}_i+{\boldsymbol{\upepsilon}}_i,$$
(4)

where Λ = (Λ1, …, ΛK) = (λjk)J×K is the J×K loading matrix, and εi ~ NJ(0, Ψ) is the error term with diagonal covariance matrix Ψ = diag(ψjj)J×J. Moreover, the factors are assumed to follow a multivariate normal distribution, ωi ~ NK(0, Φ). As in classic factor analysis, it is assumed that Φ = (ϕkk′)K×K is a correlation matrix for scale determinacy and that the data have been standardized and centered (i.e., there is no intercept term in the equation).

Under the FEFA setting, all elements in the loading matrix Λ are considered unspecified and subject to selection. Accordingly, two levels of regularization are of concern in Λ: 1) the individual loading λjk and 2) the loading vector Λk, which is at the factor level. The situation is comparable to sparse group selection in the BSGL-SSP, but with an important difference: in sparse group selection, the regularized coefficients are regressions on observed variables (i.e., data), whereas the regularized loadings are regressions on latent variables (i.e., factors). Similar to the BSGL-SSP, Λk can be reparameterized to tackle the two concerns separately, as:

$${\boldsymbol{\Lambda}}_k={\mathbf{V}}_k^{1/2}{\mathbf{b}}_k,$$
(5)

where \({\mathbf{V}}_k^{1/2}=\operatorname{diag}\left\{{\tau}_{1k},\dots, {\tau}_{Jk}\right\}\) with τjk ≥ 0 is responsible for the magnitude of individual loading, and bk = (b1k, …, bJk)T controls the entire loading vector. To select the loading vector at the factor level, multivariate SSP can be assumed for bk, as:

$${\mathbf{b}}_k\sim \left(1-{\pi}_0\right){N}_J\left(\mathbf{0},{\mathbf{I}}_J\right)+{\pi}_0{\delta}_0\left({\mathbf{b}}_k\right),$$
(6)

where δ0(bk) represents a point mass at 0 for bk. This implies that bk is either zero (i.e., the factor is spurious) or follows a multivariate normal distribution with zero mean vector and the identity matrix as its covariance matrix.

To select individual loading λjk, the following SSP can be assumed for τjk:

$${\tau}_{jk}\sim \left(1-{\pi}_1\right){N}^{+}\left(0,{s}_k^2\right)+{\pi}_1{\delta}_0\left({\tau}_{jk}\right),$$
(7)

where \({N}^{+}\left(0,{s}_k^2\right)\) represents the normal distribution \(N\left(0,{s}_k^2\right)\) truncated from below at 0 (i.e., a half-normal distribution). It implies that the magnitude of an individual loading is either zero or half-normally distributed above zero. To some extent, the shrinkage parameter in the SSP, \(1/{s}_k^2\), is equivalent to a composite of the individual-level and group-level shrinkage parameters in the sparse group Lasso, namely, γ1 and γ2 in Eq. (3). Interested readers may refer to the literature for more details (e.g., Kuo & Mallick, 1998; Xu & Ghosh, 2015). Note that, unlike the BSGL-SSP, which has an orthogonal design, one shrinkage parameter per factor (i.e., per loading vector) is adopted here to accommodate correlated factors. A hyper-prior can be assigned as \({s}_k^2\sim \mathrm{Inv}-\mathrm{Gamma}\left(1,{r}_k\right)\), and empirical Bayes Gibbs sampling can be adopted to estimate rk (Appendix A).
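To make the bi-level prior concrete, the following R sketch draws one loading vector from Eqs. (5)–(7), taking π0, π1, and s_k² as given; it is a minimal illustration, and all names are assumptions rather than package code:

```r
# A minimal sketch of one prior draw of the loading vector Lambda_k
# under the bi-level spike-and-slab priors in Eqs. (5)-(7); pi0, pi1,
# and s2k are taken as given, and all object names are illustrative.
draw_loading_vector <- function(J, pi0, pi1, s2k) {
  # Factor level (Eq. 6): b_k is either a point mass at 0 or N_J(0, I_J)
  b_k <- if (runif(1) < pi0) rep(0, J) else rnorm(J)
  # Loading level (Eq. 7): tau_jk is 0 or half-normal N+(0, s2k)
  spike <- runif(J) < pi1
  tau_k <- ifelse(spike, 0, abs(rnorm(J, 0, sqrt(s2k))))
  # Reparameterization (Eq. 5): Lambda_k = V_k^{1/2} b_k
  tau_k * b_k
}

set.seed(2)
draw_loading_vector(J = 18, pi0 = 0.5, pi1 = 0.5, s2k = 0.25)
```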

For hyperparameters π0 and π1, conjugate hyper-priors can be used:

$${\pi}_0\sim \mathrm{Beta}\left({a}_{01},{a}_{02}\right),\kern0.5em {\pi}_1\sim \mathrm{Beta}\left({a}_{11},{a}_{12}\right),$$
(8)

where uniform hyper-priors are applied with a01 = a02 = a11 = a12 = 1. For the covariance matrix Ψ, an inverse gamma prior can be applied to its j-th diagonal term ψjj:

$${\psi}_{jj}\sim \mathrm{Inv}-\mathrm{Gamma}\left({d}_{j1},{d}_{j2}\right),$$
(9)

where the hyperparameters are set as dj1 = 1 and dj2 = .01. Note that dj1 and dj2 determine the density of the inverse gamma distribution and, accordingly, the scale of ψjj. With larger values (i.e., stronger prior knowledge), the importance of the data likelihood in the full conditional distribution declines.

The correlation matrix Φ can be obtained following the approach of Liu and colleagues (Liu, 2008; Liu & Daniels, 2006): first, a covariance matrix Φ* is sampled with a conjugate prior \(\mathrm{Inv}-\mathrm{Wishart}\left({\mathbf{S}}_0^{-1},{q}_0\right)\); then Φ* is transformed into a correlation matrix with a Metropolis–Hastings acceptance probability (see Appendix A for more details).

Model identification and stableness

The above model is not identifiable, or becomes unstable during the resampling process, without additional constraints imposed on the parameters. The identification issues mainly concern the uniqueness of the following decomposition of the covariance structure:

$$\mathrm{COV}\left({\mathbf{y}}_i\right)=\boldsymbol{\Lambda} \boldsymbol{\Phi} {\boldsymbol{\Lambda}}^{\mathrm{T}}+\boldsymbol{\Psi} .$$
(10)

Similar to traditional EFA (Anderson & Rubin, 1956), two types of issues are of concern, and related constraints are imposed: the identifiability of the residual covariance matrix Ψ and that of the loading matrix Λ. However, the FEFA with bi-level regularization is less restrictive with respect to some of these issues and accordingly provides more flexibility in terms of model identification.

Full rank matrix

The (true) factorial covariance matrix needs to be of full rank, which implies that no two true factors can be highly correlated; otherwise, the two factors are indistinguishable. Similarly, the true loading matrix should be of full (column) rank to be identifiable.

Scale and sign indeterminacy

It is evident that Eq. (4) remains unchanged when \(\widetilde{\boldsymbol{\Lambda}}=\boldsymbol{\Lambda} \mathbf{A}\) and \(\widetilde{\boldsymbol{\Omega}}={\mathbf{A}}^{-1}\boldsymbol{\Omega}\), where A can be an arbitrary nonzero scalar or an invertible matrix of dimension K × K. This implies that the scale of the factors is indeterminate. In this research, Φ is constrained to be a correlation matrix to fix the scale. Alternatively, one can estimate a covariance matrix while fixing one loading per factor. However, such a constraint is difficult to implement when both the number of factors and the allocation of items to factors are unknown. Similarly, the signs of Λk and of the factorial correlations are indeterminate, since A can switch among the \(2^K\) diagonal matrices diag(±1, …, ±1)K×K, which can be problematic during the resampling process. A constraint is therefore imposed on the mean of each loading vector, \({\bar {\boldsymbol{\Lambda}}}_k\ge 0\), which still allows the factorial correlations to be positive or negative.
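A minimal R sketch of the sign constraint follows: after each draw, any loading column with a negative mean is flipped, together with the corresponding factor scores and the related rows and columns of Φ (so the diagonal of Φ stays at 1). The object names are illustrative:

```r
# A minimal sketch of resolving sign indeterminacy after each draw,
# assuming the constraint mean(Lambda_k) >= 0: when a column mean is
# negative, the column, the factor scores, and the corresponding rows
# and columns of Phi are all flipped (all names are illustrative).
fix_signs <- function(Lambda, Phi, Omega) {
  flip <- colMeans(Lambda) < 0
  Lambda[, flip] <- -Lambda[, flip]
  Omega[, flip]  <- -Omega[, flip]
  Phi[flip, ] <- -Phi[flip, ]
  Phi[, flip] <- -Phi[, flip]   # double flip restores the diagonal to 1
  list(Lambda = Lambda, Phi = Phi, Omega = Omega)
}
```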

Rotational indeterminacy and column switch

Since A can also be a rotation matrix, special modeling constraints are needed to maintain rotational invariance in conventional Bayesian EFA (e.g., Conti et al., 2014; Frühwirth-Schnatter & Lopes, 2018). This issue is of little concern for the FEFA, however, since the bi-level regularization always strives for a simpler model at both the factor and loading levels; the simplest solution prevails with or without rotational invariance, which is consistent with what we found in both the simulation and real-data analyses.

Another issue is column switching in the loading matrix during the resampling process, which is especially of concern under the Bayesian setting (e.g., Conti et al., 2014; Frühwirth-Schnatter & Lopes, 2018). We found the issue more salient with a single shrinkage parameter shared by all factors, but it can be greatly alleviated by imposing an individual shrinkage parameter for each factor. Accordingly, the FEFA with a separate sk for each factor k is largely immune to column switching, except for the trivial issue of column permutation in the loading matrix; that is, the positions of the true and estimated factors might not match each other.

Uniqueness prerequisite

To achieve uniqueness, there should be at least three nonzero loadings per factor in the true model (Anderson & Rubin, 1956). Moreover, for the uniqueness of Ψ, the maximum number of possible factors should be within the Ledermann bound, \({K}_{\mathrm{M}}\le \left(2J+1-\sqrt{8J+1}\right)/2\) (Ledermann, 1937). Both constraints can be readily implemented in the algorithm, as sketched below.
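Both checks are one-liners in practice; the following R sketch (with illustrative names) computes the Ledermann bound and flags loading columns meeting the three-nonzero-loadings rule:

```r
# A small sketch of the two uniqueness checks: the Ledermann bound on
# the number of factors and the at-least-three-nonzero-loadings rule.
ledermann_bound <- function(J) floor((2 * J + 1 - sqrt(8 * J + 1)) / 2)
ledermann_bound(18)   # maximum admissible K_M for J = 18 items

enough_loadings <- function(Lambda, min_nonzero = 3) {
  colSums(Lambda != 0) >= min_nonzero   # TRUE for columns meeting the rule
}
```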

Cut-off value for factor selection

Strictly speaking, factor k should be considered spurious, or unselected, only if bk = 0 or the eigenvalue of the factor is exactly zero (i.e., \({\left\Vert {\boldsymbol{\Lambda}}_k\right\Vert}_2^2=0\)). In practice, however, this is not a good idea for two reasons: 1) it is important to ignore minor factors with small eigenvalues for the purpose of dimension reduction (note that it is conventional to ignore factors with eigenvalues smaller than 1 under classic EFA settings); and 2) estimates of a zero loading vector can fluctuate during the resampling process due to various interferences (e.g., local dependence), making the associated eigenvalue larger than zero. A more practical rule is to consider factor k spurious if \({\left\Vert {\hat{\boldsymbol{\Lambda}}}_k\right\Vert}_2^2<{\varepsilon}_0\) for some cut-off value ε0, which can be empirically determined.
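The selection rule can be expressed compactly; the R sketch below (illustrative names only) computes the eigenvalue of each factor as the squared L2 norm of its estimated loading vector and retains factors reaching the cut-off:

```r
# A minimal sketch of the empirical selection rule: a factor is kept
# only if the squared L2 norm of its estimated loading vector (its
# eigenvalue) reaches the cut-off eps0; names are illustrative.
select_factors <- function(Lambda_hat, eps0 = 1) {
  eig <- colSums(Lambda_hat^2)   # ||Lambda_k||_2^2 per factor
  which(eig >= eps0)             # indices of retained factors
}
```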

Local dependence and parameter overflow

In most EFA settings, Ψ is assumed to be a diagonal matrix under local independence. In practice, however, correlated residuals can be inevitable for applied researchers. Local dependence can create spurious factors and interfere with the resampling process. We also found that loading estimates can overflow (i.e., become larger than one) under local dependence, further adding to model instability during the resampling process. When overflow occurs, one can roll back the associated loading estimates to those from a previous draw, assuming the previous draw is admissible. One possible pitfall is that the Markov chain might get stuck and stop moving, which should be extremely rare in practice and can be avoided by starting with different initial values. Together with the constraint of at least three nonzero loadings per factor and an appropriate cut-off value for factor selection, the FEFA is expected to be robust to the interference of local dependence, which usually involves an item pair. However, this also means that factors with only two items will be ignored.
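The roll-back device amounts to a guard inside the sampling loop. The following R sketch illustrates the logic only; draw_lambda() is a hypothetical stand-in for the actual full-conditional sampling step, not part of any package:

```r
# A minimal sketch of the roll-back device for loading overflow inside a
# resampling loop: if any |loading| in the current draw exceeds 1, revert
# to the previous admissible draw. draw_lambda() is purely illustrative.
draw_lambda <- function(J, K) matrix(rnorm(J * K, sd = 0.5), J, K)

set.seed(3)
Lambda_prev <- draw_lambda(18, 8)
for (t in 1:100) {
  Lambda <- draw_lambda(18, 8)
  if (any(abs(Lambda) > 1)) {
    Lambda <- Lambda_prev   # roll back; the chain rarely sticks in practice
  } else {
    Lambda_prev <- Lambda   # accept and remember the admissible draw
  }
}
```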

Posterior analysis and MCMC estimation

A graphical representation of the model structure for the different parameters is presented in Fig. 1. With the above model specification, the joint and conditional posteriors can be obtained as shown in Appendix A. Although the joint posterior distribution has a complicated form that is difficult to handle directly, most of the resulting full conditionals are standard distributions that can be sampled from directly with the Gibbs or block Gibbs sampler.

Fig. 1 A directed acyclic graph for the FEFA. Note. SSP = spike-and-slab prior; IW = inverse Wishart; IG = inverse gamma

Fig. 2 A directed acyclic graph for the PEFA. Note. SSP = spike-and-slab prior; IW = inverse Wishart; IG = inverse gamma

Note that for simultaneous selection and estimation in spike-and-slab-type models, the PME has been suggested as a soft thresholding estimator with the oracle properties of variable-selection consistency and asymptotic normality (Xu & Ghosh, 2015). The eigenvalue based on the PME is also a soft thresholding estimator. The spike part leads to a median thresholding estimator that selects the loading vector and the individual loadings automatically, with soft thresholds depending on π0 and π1, respectively, while the hyperparameter in the slab part, \({s}_k^2\), determines the shrinkage of the individual loadings.

Multiple chains with different initial values can be run to monitor the convergence of the algorithm. After the burn-in period, convergence for the parameters of interest can be assessed using the estimated potential scale reduction (EPSR) value (Gelman, 1996). The standard errors of the estimates can be characterized with highest posterior density (HPD) intervals, or more specifically the 100(1−α)% HPD interval (Box & Tiao, 1973). For model comparison, the deviance information criterion (DIC; Spiegelhalter et al., 2002) can be used.
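For readers implementing the diagnostics themselves, the R sketch below computes the EPSR for one parameter from multiple chains and a 100(1−α)% HPD interval from sorted draws; this is a minimal illustration under the standard definitions, not code from LAWBL, and all names are illustrative:

```r
# EPSR (potential scale reduction) for one parameter, assuming `chains`
# is an iterations-by-chains matrix of post-burn-in draws.
epsr <- function(chains) {
  n <- nrow(chains); m <- ncol(chains)
  W <- mean(apply(chains, 2, var))      # within-chain variance
  B <- n * var(colMeans(chains))        # between-chain variance
  v_hat <- (n - 1) / n * W + B / n      # pooled posterior variance estimate
  sqrt(v_hat / W)                       # EPSR; < 1.1 suggests convergence
}

# Narrowest interval containing 100*prob% of the sorted draws (HPD).
hpd <- function(draws, prob = 0.95) {
  d <- sort(draws); n <- length(d)
  k <- floor(prob * n)
  widths <- d[(k + 1):n] - d[1:(n - k)] # widths of all candidate intervals
  i <- which.min(widths)                # the narrowest one is the HPD interval
  c(lower = d[i], upper = d[i + k])
}
```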

Extension to partially exploratory factor analysis

In the FEFA above, no substantive knowledge is needed, as all elements in the loading matrix are unspecified (i.e., subject to selection). In practice, however, one might want to incorporate partial knowledge. The FEFA can be extended into the PEFA, where the loading pattern can be partially specified together with an unknown number of factors. The extension provides more flexibility on the exploratory end to accommodate partial knowledge.

Specifically, elements in the loading matrix can be fixed at zero, specified as free to estimate based on substantive knowledge (specified loadings), or left unspecified. When there is at least one specified loading per factor, no factor selection is needed, and the reduced model is similar to other Bayesian CFA regularization methods (e.g., Lu et al., 2016; Muthén & Asparouhov, 2012).

Estimation of the specified loadings can proceed item by item. For a specific item, one can rearrange the factors so that the specified and unspecified loadings are partitioned. For item j, denote the number of specified loadings as \({K}_j^{\sim }\), the corresponding factor matrix as \({\boldsymbol{\Omega}}_j^{\sim }=\left({\boldsymbol{\Omega}}_1,\dots, {\boldsymbol{\Omega}}_{K_j^{\sim }}\right)\), and the specified loading vector as \({\boldsymbol{\uplambda}}_j^{\sim }={\left({\lambda}_{j1},\dots, {\lambda}_{j{K}_j^{\sim }}\right)}^{\mathrm{T}}\). With a conjugate normal prior, \({\boldsymbol{\uplambda}}_j^{\sim}\sim {N}_{K_j^{\sim }}\left({\boldsymbol{\uplambda}}_{0j},{\mathbf{H}}_{0j}\right)\), the full conditional posterior of the specified loading vector can be derived as:

$${\boldsymbol{\uplambda}}_j^{\sim}\mid \mathrm{rest}\sim {N}_{K_j^{\sim }}\left({\boldsymbol{\Sigma}}_j^{\sim}\left({\psi}_{jj}^{-1}{\left({\boldsymbol{\Omega}}_j^{\sim}\right)}^{\mathrm{T}}{\mathbf{Y}}_j+{\mathbf{H}}_{0j}^{-1}{\boldsymbol{\uplambda}}_{0j}\right),{\boldsymbol{\Sigma}}_j^{\sim}\right),$$
(11)

where \({\boldsymbol{\Sigma}}_j^{\sim }={\left({\psi}_{jj}^{-1}{\left({\boldsymbol{\Omega}}_j^{\sim}\right)}^{\mathrm{T}}{\boldsymbol{\Omega}}_j^{\sim }+{\mathbf{H}}_{0j}^{-1}\right)}^{-1}\).
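One draw from this full conditional is a standard multivariate normal step. The R sketch below assumes MASS for the multivariate normal generator; Yj, Omega_j, and the prior quantities are illustrative objects matching the symbols above:

```r
# A minimal sketch of one draw from Eq. (11), the full conditional of
# item j's specified loading vector under the conjugate normal prior
# N(lambda0j, H0j); all object names are illustrative.
library(MASS)

draw_specified_loadings <- function(Yj, Omega_j, psi_jj, lambda0j, H0j) {
  H0_inv  <- solve(H0j)
  Sigma_j <- solve(crossprod(Omega_j) / psi_jj + H0_inv)  # posterior covariance
  mu_j    <- Sigma_j %*% (crossprod(Omega_j, Yj) / psi_jj + H0_inv %*% lambda0j)
  mvrnorm(1, mu = drop(mu_j), Sigma = Sigma_j)            # one posterior draw
}
```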

Estimation of the unspecified loadings can proceed factor by factor. For factors with specified loadings, sparsity at the factor level is not of concern, and some priors and related posterior distributions need to be modified accordingly. This section presents only those that are adjusted; the others are similar to those for the FEFA. For a specific factor, one can also rearrange the items so that the specified and unspecified loadings are partitioned. Assume there are \({K}^{\ast}\le K\) factors with at least one specified loading. For factor k, where k = 1, …, K*, denote the number of unspecified loadings as \({J}_k^{\ast }\) and the corresponding loading vector as \({\boldsymbol{\Lambda}}_k^{\ast }={\left({\lambda}_{1k},\dots, {\lambda}_{J_k^{\ast }k}\right)}^{\mathrm{T}}\). The factor-level loading vector can be reparameterized similarly to the above, as:

$$\boldsymbol{\Lambda}_k^{\ast }=\mathbf{V}_k^{\ast 1/2}\mathbf{b}_k^{\ast },$$
(12)

where \({\mathbf{b}}_k^{\ast }={\left({b}_{1k},\dots, {b}_{J_k^{\ast }k}\right)}^{\mathrm{T}}\), \({\mathbf{V}}_k^{\ast 1/2}=\operatorname{diag}\left\{{\tau}_{1k},\cdots, {\tau}_{J_k^{\ast }k}\right\}\), and τjk ≥ 0. The prior distribution for \({\mathbf{b}}_k^{\ast }\) can be adjusted as:

$${\mathbf{b}}_k^{\ast}\sim {N}_{J_k^{\ast }}\left(\mathbf{0},{\mathbf{I}}_{J_k^{\ast }}\right).$$
(13)

The corresponding conditional posterior is changed as:

$${\mathbf{b}}_k^{\ast}\mid \mathrm{rest}\sim {N}_{J_k^{\ast }}\left({\boldsymbol{\upmu}}_k^{\ast },{\boldsymbol{\Sigma}}_k^{\ast}\right),$$
(14)

where \({\boldsymbol{\upmu}}_k^{\ast }={\boldsymbol{\Psi}}^{-1}{\boldsymbol{\Sigma}}_k^{\ast }{\mathbf{V}}_k^{\ast 1/2}{\mathbf{Y}}^{\mathrm{T}}{\boldsymbol{\Omega}}_k\) and \({\boldsymbol{\Sigma}}_k^{\ast }={\left({\mathbf{I}}_{J_k^{\ast }}+{\boldsymbol{\Psi}}^{-1}{\mathbf{V}}_k^{\ast 1/2}\left({\boldsymbol{\Omega}}_k^{\mathrm{T}}{\boldsymbol{\Omega}}_k\right){\mathbf{V}}_k^{\ast 1/2}\right)}^{-1}\). The conditional posterior of \({s}_k^2\) for factor k with specified loadings can be adjusted as:

$${s}_k^2\mid \mathrm{rest}\sim \mathrm{Inv}-\mathrm{Gamma}\left(1+\frac{1}{2}\#\left({\tau}_{jk}=0\right),{r}_k+\frac{1}{2}\left(\sum_{j=1}^{J_k^{\ast }}{\tau}_{jk}^2+J-{J}_k^{\ast}\right)\right).$$
(15)

For factors without specified loadings, the posterior probability of bk = 0 (for k ∉ K*) given the remaining parameters can be adjusted as:

$${l}_k=p\left({\mathbf{b}}_k=\mathbf{0}|\mathrm{rest}\right)=\frac{\pi_0}{\pi_0+\left(1-{\pi}_0\right){\left|{\boldsymbol{\Sigma}}_k\right|}^{1/2}\mathrm{exp}\left\{\mathrm{T}\mathrm{r}\left[{\left({\boldsymbol{\upmu}}_k^{\sim}\right)}^{\mathrm{T}}{\boldsymbol{\Sigma}}_k^{-1}{\boldsymbol{\upmu}}_k^{\sim}\right]/2\right\}},$$
(16)

where \({\boldsymbol{\upmu}}_k^{\sim }={\boldsymbol{\Psi}}^{-1}{\boldsymbol{\Sigma}}_k{\mathbf{V}}_k^{1/2}{\left(\mathbf{Y}-{\sum}_{k^{\prime} \in {K}^{\ast }}{\boldsymbol{\Omega}}_{k^{\prime} }{\boldsymbol{\Lambda}}_{k^{\prime}}^{\mathrm{T}}\right)}^{\mathrm{T}}{\boldsymbol{\Omega}}_k\). The conditional posteriors of all other parameters are the same as those for FEFA. A graphical representation of the model structure of different parameters can be found in Fig. 2.
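At each iteration, factor selection thus reduces to evaluating Eq. (16). A minimal R sketch follows (illustrative names; the scalar trace is written directly as a quadratic form):

```r
# A minimal sketch of Eq. (16): the posterior probability that b_k = 0,
# given pi0, the posterior covariance Sigma_k, and the mean term mu_k;
# all object names are illustrative.
prob_bk_zero <- function(pi0, Sigma_k, mu_k) {
  quad <- drop(t(mu_k) %*% solve(Sigma_k) %*% mu_k)  # (mu_k)' Sigma_k^{-1} mu_k
  pi0 / (pi0 + (1 - pi0) * sqrt(det(Sigma_k)) * exp(quad / 2))
}
```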

Existing literature (Chen et al., 2021; Kyung et al., 2010; Wang, 2012) can be followed in assigning the prior and hyper-prior values for both the FEFA and PEFA: a01 = a02 = a11 = a12 = 1, dj1 = 1, dj2 = .01, λ0j = 0, H0j = 4I, S0 = I + .1od, and q0 = K + 2, where I is the identity matrix and I + .1od denotes the matrix with diagonal elements equal to 1 and off-diagonal elements equal to .1. These hyperparameters are consistent with those described above. Finally, the identification conditions are similar to, but less restrictive than, those of the FEFA.

Simulation studies

Simulation studies were employed to evaluate the performance of the proposed models across different settings. Factor extraction and parameter recovery were evaluated through two studies. For data generation, data were simulated with three true factors, Kt = 3, and six items per factor, namely J = 18. The factor variances were set to 1, and three cases of factorial correlations were considered: ϕ12 = 0, .4, or .6, whereas ϕ13 = ϕ23 = .4 in all cases. The true loading matrix was simulated as:

$${\boldsymbol{\Lambda}}_{\mathrm{t}}^{\mathrm{T}}=\left[\begin{array}{cccccccccccccccccc} {\lambda}_{11} & {\lambda}_{21} & {\lambda}_{31} & {\lambda}_{41} & {\lambda}_{51} & {\lambda}_{61} & {\lambda}_0 & {\lambda}_0 & {\lambda}_0 & {\lambda}_0 & {\lambda}_0 & {\lambda}_0 & {\lambda}_0 & {\lambda}_0 & {\lambda}_0 & {\lambda}_0 & {\lambda}_{17,1} & {\lambda}_{18,1}\\ {\lambda}_0 & {\lambda}_0 & {\lambda}_0 & {\lambda}_0 & {\lambda}_{52} & {\lambda}_{62} & {\lambda}_{72} & {\lambda}_{82} & {\lambda}_{92} & {\lambda}_{10,2} & {\lambda}_{11,2} & {\lambda}_{12,2} & {\lambda}_0 & {\lambda}_0 & {\lambda}_0 & {\lambda}_0 & {\lambda}_0 & {\lambda}_0\\ {\lambda}_0 & {\lambda}_0 & {\lambda}_0 & {\lambda}_0 & {\lambda}_0 & {\lambda}_0 & {\lambda}_0 & {\lambda}_0 & {\lambda}_0 & {\lambda}_0 & {\lambda}_{11,3} & {\lambda}_{12,3} & {\lambda}_{13,3} & {\lambda}_{14,3} & {\lambda}_{15,3} & {\lambda}_{16,3} & {\lambda}_{17,3} & {\lambda}_{18,3}\end{array}\right],$$

where the following major loadings were set to .7: λ11 to λ61, λ72 to λ12,2, and λ13,3 to λ18,3; the following minor loadings were set to .3: λ17,1, λ18,1, λ52, λ62, λ11,3, and λ12,3; and all other loadings were set to λ0 = 0. Namely, there were six major loadings and two minor loadings per factor. In psychological or educational contexts, a major loading connects an item to the factor the item is designed to measure, whereas a minor loading is a cross-loading on a factor the item is not intended to measure. For the case of local independence, all off-diagonal elements in Ψ were set to 0. Local dependence was also evaluated by setting ψ4,3 = ψ10,9 = ψ16,15 = ψ14,1 = ψ7,2 = ψ13,8 = .2, with a symmetric upper triangle. The sample size was set to N = 200 or 1000, and the number of replications was 200 for each simulation cell.

Both the FEFA and PEFA models were evaluated for factor extraction and parameter recovery, with eight possible factors (i.e., K = 8). For the PEFA, the first three major loadings of each of the first two factors were specified, while all other loadings were unspecified. For each replication, the computation time was about 1.5 and 1.6 min when N = 200, and about 2.8 and 3 min when N = 1000, for the FEFA and PEFA, respectively, on a laptop with an Intel Core i7 CPU. Note that a larger number of factors (e.g., K = 10) was also tried, with only trivial differences except for a longer running time.

For performance assessment, the bias of the parameter estimates (BIAS), the mean of the standard error estimates (SE), and the root mean square error (RMSE) between the estimates and the true values were computed. The true positive rate (TPR) and false positive rate (FPR) for loading and factor selection were also reported. For loading and correlation estimation, the TPR and FPR indicate the proportions of times the true nonzero and zero values, respectively, were estimated as significantly different from zero at α = .05 based on the HPD interval. For factor extraction, the TPR and FPR indicate the proportions of times a true factor and a spurious factor, respectively, were selected (i.e., \({\left\Vert {\hat{\boldsymbol{\Lambda}}}_k\right\Vert}_2^2\ge {\varepsilon}_0\)), for k = 1, …, K.
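For concreteness, the loading-level rates can be computed directly from the HPD bounds; the following R sketch uses illustrative names throughout:

```r
# A minimal sketch of loading-level TPR/FPR, assuming `hpd_lo` and
# `hpd_hi` are J-by-K matrices of HPD bounds and `Lambda_true` holds
# the true loadings (all names are illustrative).
loading_rates <- function(Lambda_true, hpd_lo, hpd_hi) {
  sig <- hpd_lo > 0 | hpd_hi < 0        # interval excludes zero
  tpr <- mean(sig[Lambda_true != 0])    # nonzero loadings flagged significant
  fpr <- mean(sig[Lambda_true == 0])    # zero loadings flagged significant
  c(TPR = tpr, FPR = fpr)
}
```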

The prior and hyper-prior values were set as described in the previous section. In a preliminary study, the author found that a small cut-off (e.g., ε0 = .1) can be used when there is no local dependence. With local dependence, or to ignore minor factors, a larger cut-off is needed to obtain a stabilized solution. When the cut-off is too large, however, one might ignore substantive factors or combine multiple factors into one. The sample size also plays a role, with smaller samples preferring smaller cut-offs. The good news is that different cut-offs usually converge to the same solution, provided the Markov chain can be stabilized and the cut-off is not too large. To be consistent with traditional EFA, ε0 was set to one in the simulation, which appeared to work well.

With the above settings, most Markov chains reached stationarity (i.e., EPSR < 1.1) within 15,000 iterations, and the burn-in was set at 20,000 iterations. After the burn-in phase, parameters were estimated based on an additional 20,000 iterations. All programming was conducted on the R platform (R Development Core Team, 2020) using the LAWBL package (Chen, 2021c). Data were simulated and analyzed using the sim_lvm and pefa functions, respectively, in the package. More information and a tutorial about the functions and the package can be found online at https://jinsong-chen.github.io/LAWBL/.
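A hedged sketch of the workflow is given below. The function names sim_lvm and pefa are those cited above, but the argument names shown are assumptions for illustration only; the online tutorial documents the actual interfaces.

```r
# Assumed LAWBL workflow; the argument names in the commented calls are
# guesses for illustration only -- see https://jinsong-chen.github.io/LAWBL/
# for the actual signatures of sim_lvm and pefa.
library(LAWBL)

# dat <- sim_lvm(...)            # simulate responses per the design above
# fit <- pefa(dat = dat, K = 8)  # assumed: fit the PEFA with eight candidate factors
# summary(fit)                   # assumed: loadings, eigenvalues, HPD intervals
```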

Study 1: Evaluation of factor extraction

The first step was to evaluate whether the number of factors could be correctly identified. Two traditional methods were also evaluated for comparison: the Hull method (Lorenzo-Seva et al., 2011), which is based on the comparative fit index (Bentler, 1990) to assess the fit of each factor solution and was superior to the other implementations in Lorenzo-Seva et al. (2011), and the revised parallel analysis (PA-R; Green et al., 2015), implemented with the eigenvalues obtained from an EFA and the 95th percentile of 100 random samples as the reference eigenvalue. Both methods were also evaluated in Auerswald and Moshagen (2019)'s study.

As shown in Table 1, both the FEFA and PEFA extracted the right number of factors most of the time, even under local dependence, and the latter was slightly better. Hull was also satisfactory in general and robust to the interference of local dependence, but could be problematic when the factorial correlation was high and the sample size small. In contrast, PA-R was problematic whenever there was local dependence, regardless of the factorial correlation and sample size.

Table 1 Accuracy of different factor extraction methods

In addition to identifying the right number of factors, both the FEFA and PEFA can extract the true and spurious factors based on their eigenvalues. As mentioned, however, an FEFA model is identified only up to column permutation of the loading matrix, which means the positions of the true and selected factors might not match. For each true factor k = 1, …, Kt, one can minimize the following to find its match among the selected factors k′:

$$\underset{k^{\prime}\in {K}_{\mathrm{s}}}{\mathit{\min}}\left[{\sum}_{j=1}^J{\left({\lambda}_{jk}-{\hat{\lambda}}_{jk^{\prime}}\right)}^2\right],$$
(17)

where \({K}_{\mathrm{s}}\subseteq \left\{1,\dots, K\right\}\) contains all the selected factors.
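The matching step in Eq. (17) can be implemented in a few lines; the R sketch below (illustrative names) returns, for each true factor, the index of the closest selected factor in squared-error terms:

```r
# A minimal sketch of the column-matching rule in Eq. (17): each true
# factor is matched to the selected factor whose estimated loading
# vector is closest in squared error (all names are illustrative).
match_factors <- function(Lambda_true, Lambda_hat, selected) {
  vapply(seq_len(ncol(Lambda_true)), function(k) {
    sse <- colSums((Lambda_hat[, selected, drop = FALSE] - Lambda_true[, k])^2)
    selected[which.min(sse)]   # index of the best-matching selected factor
  }, integer(1))
}
```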

As shown in Tables 2 and 3, the extraction of the true and spurious factors was satisfactory in terms of bias, RMSE, and SE, although the estimates tended to be slightly worse with a high factorial correlation, local dependence, or a smaller sample size. The TPRs and FPRs were also satisfactory in all cases. One concern, however, was that two factors could be mixed into one, which was more likely with higher factorial correlations. Another concern was that a spurious factor could be inflated and identified as true (i.e., the FPR was larger than zero) due to local dependence. Fortunately, the problematic cases amounted to no more than a few percent under both concerns.

Table 2 Factor extraction based on eigenvalues for FEFA, N = 200
Table 3 Factor extraction based on eigenvalues for FEFA, N = 1000

Study 2: Evaluation of parameter recovery

As shown in Tables 4 and 5, the estimates of the factorial correlations were satisfactory regardless of the factorial correlation, local dependence, or sample size. It is interesting to note that, once a spurious factor was identified as true, the factorial correlations among the true factors tended to be lower than the true values, which might need more investigation in future studies.

Table 4 Estimation of factorial correlations for FEFA, N = 200
Table 5 Estimation of factorial correlations for FEFA, N = 1000

The loading estimates were close across the different factorial correlations (i.e., ϕ12 = 0, .4, or .6), and only the cases with ϕ12 = .4 are reported, as shown in Tables 6 and 7 (the PEFA results were slightly better but still similar and can be found in Appendix Table 12). As shown, the estimates were satisfactory in general, although they tended to be slightly worse with local dependence or a smaller sample size. The TPRs and FPRs were also satisfactory in general, but the FPR for λ0 was slightly higher with local dependence and a larger sample size, suggesting a slightly larger chance of inflating the zero loadings as significant. We also found that when two factors were mixed into one, the estimated loading vector for the mixed factor was similar to the sum of the two true loading vectors. However, more studies are needed on this issue, because there were only a few cases of mixed factors in the present research.

Table 6 Loading estimates for FEFA with ϕ12 = .4 and N = 200
Table 7 Loading estimates for FEFA with ϕ12 = .4 and N = 1000

Real-life example

Study 3: Questionnaire of mathematics learning

Table 8 gives the potential items for a questionnaire measuring enjoyment, confidence (i.e., self-efficacy), utility (i.e., valuing), and anxiety in mathematics learning. Similar background questionnaires can be found in large-scale assessments such as the Trends in International Mathematics and Science Study (IEA, 2011). Based on the design, there should be four factors measured by 18 items, with the major loading for each item also given in the table.

Table 8 Student questionnaire of mathematics learning in school

Responses from 218 Chinese fifth-grade students were collected and analyzed using the FEFA and PEFA. For the PEFA, the first three major loadings of each of the first three factors were specified, whereas all other loadings were unspecified. All other settings and prior values were the same as those in the simulation study. For both models, the Markov chains reached stationarity within 10,000 iterations. The burn-in was set at 20,000 iterations, and 20,000 more iterations were drawn for parameter estimation, which took about 2 and 3 min for the FEFA and PEFA, respectively, on a similar laptop.

Estimation results (Table 9) showed that only three factors were selected by the FEFA, whereas the PEFA successfully recovered all four factors with the corresponding major loadings. The DICs for the FEFA and PEFA were 2643 and 2483, respectively, confirming that the latter fit better. Combined with the high correlation between the first two factors (~.76) in the PEFA (Table 10), it is clear that the first two factors were mixed into one in the FEFA. One can also see that the loading estimates for Factor 1 in the FEFA were close to those for Factors 1 and 2 in the PEFA. Other than that, the two solutions were not far apart in terms of the point and interval estimates and the number of estimates that were significant or marginally significant. Meanwhile, the author obtained essentially the same results with a lower cut-off (e.g., ε0 = .3). Finally, the numbers of factors extracted using Hull and PA-R were one and three, respectively. The small number of factors based on Hull was consistent with the simulation finding for a small sample size and a high factorial correlation.

Table 9 Loading estimates for the questionnaire of mathematics learning
Table 10 Factorial correlation for the questionnaire of mathematics learning

Study 4: Humor styles questionnaire

The Humor Styles Questionnaire (HSQ; Martin et al., 2003), with four factors and 32 items, can be found in Appendix Table 13. Although the major loading of each item has been confirmed, researchers have been concerned about cross-loadings for some items, as the related behaviors tend to be multidimensional (Heintz, 2017). Public data for the HSQ are available online at https://openpsychometrics.org/_rawdata/, and complete responses from 993 individuals were analyzed using the FEFA and PEFA, with the other settings similar to those above.

The burn-in was set at 20,000 iterations, and 20,000 more iterations were drawn for parameter estimation, which took about 6 and 6.5 min for the FEFA and PEFA, respectively. The DICs for the FEFA and PEFA were 17,757 and 17,761, respectively, implying that they were essentially equivalent. Estimation results (Table 11) showed that both models recovered all four factors with the corresponding major loadings successfully, with similar point and interval estimates for most loadings. It is worth noting that most of the minor loadings found significant were smaller than .2, suggesting the influence of the sample size. In brief, the two models were similar in terms of loadings, eigenvalues, and factorial correlations (Appendix Table 14). Meanwhile, the author was not able to obtain a stable solution with a substantially lower cut-off (e.g., ε0 = .6). Finally, the numbers of factors extracted using Hull and PA-R were 4 and 13, respectively.

Table 11 Estimation results for the Humor Styles Questionnaire

Discussion

With bi-level Bayesian regularization, the FEFA and PEFA offer a series of benefits: factor extraction and model sparseness in one step, simultaneous estimation of all model parameters including the shrinkage parameters, both point and interval estimates, and the incorporation of partial knowledge. Taken together, these benefits can substantially improve the flexibility towards the exploratory end of factor analysis. Simulation studies and real-data analyses demonstrated that both models performed satisfactorily under reasonable conditions and were robust to the interference of local dependence, while the PEFA with appropriate information can outperform the FEFA and work well under more extreme conditions. Even considering the running time needed, the proposed methodology provides a viable alternative to traditional EFA when the number of factors is unknown.

On the other hand, more work is needed to fully understand the performance and utility of the proposed models across a wider range of scenarios, such as high dimensionality, more complex loading patterns including irrelevant items, and different amounts of prespecified information. In the simulation study, the design of the loading structure was balanced and succinct so that factor extraction and parameter recovery could be evaluated simultaneously. Future research can consider more complex scenarios; for instance, one can decrease the magnitude of the cross-loadings to .1 and/or allow the loading magnitudes to vary considerably. Although it would be challenging to evaluate the accuracy of parameter recovery in such cases, the performance of factor extraction can still be assessed.

In future studies, it would also be desirable to extend both models to address categorical data and missingness, both of which are widely encountered in social and behavioral research. Moreover, extensions to include covariates, a structural model, or more complex data structures such as multilevel or multiple samples are also worth exploring. Finally, a thorough comparison between the proposed and other EFA approaches for different purposes (e.g., extracting the number of factors, model simplicity, identifying relevant and irrelevant variables) can guide applied researchers in choosing the appropriate one across different practical settings.

Practical implications and suggestions

The proposed models can offer several advantages in practice. One advantage of the Bayesian approach is the availability of interval estimates: under traditional EFA, one can rely only on conventional cut-offs (e.g., .3 or .4) to determine significant loadings, whereas, as illustrated in the real-life examples, the interval estimates offer additional flexibility in separating significant loadings from insignificant ones. For factor extraction, this research adopted the conventional rule of an eigenvalue larger than one, which worked well. It appeared, though, that a substantially smaller cut-off (e.g., .6) usually provided similar results, given that the Markov chain stabilized (note that the chain can be unstable when the cut-off is too small). In practice, it might not be a bad idea to try multiple cut-offs, which can give more confidence once they converge to the same solution. Nevertheless, more investigation is desirable in this respect.

The PEFA can be more valuable due to its combination of factor-level simplicity and model interpretability. In an era not lacking in research findings, an approach capable of incorporating existing knowledge and exploring data-driven patterns at the same time can always offer more flexibility for scale or model development. When developing a new scale or revising an existing one, for instance, it is useful to keep some parts of the scale more confirmatory, namely fully or partially consistent with existing knowledge, while allowing other parts to be more exploratory with dimension reduction.

Since the FEFA can be regarded as a special case of the PEFA, the two constitute a partially exploratory approach to factor analysis in which local dependence is sacrificed in return for factor extraction and loading regularization. When the number of factors is given and a few loadings per factor can be pre-specified, one can resort to the partially confirmatory approach, where the loading structure and local dependence can be regularized simultaneously (Chen, 2022). Specifically, the partially confirmatory factor analysis (PCFA) is available for continuous data (Chen et al., 2021), the partially confirmatory item response model for dichotomous data (Chen, 2020), and the generalized PCFA for data mixed with categorical and continuous responses (Chen, 2021b). With these two approaches, we are now equipped with a set of Bayesian regularization tools to address issues related to the factors, loadings, and local dependence in factor analysis. It would be interesting to see how they can complement each other and work in concert in practice.