In social and behavioral research involving latent variables or factors, factor analysis (FA) models are widely adopted with two typical modeling approaches. The exploratory approach, characterized by exploratory FA (EFA; Jennrich & Sampson, 1966), is data-driven and requires little substantive knowledge before modeling, whereas the confirmatory approach, represented by confirmatory FA (CFA; Jöreskog, 1969), is theory-driven with a confirmatory structure. In general, however, there is a vast space between the fully exploratory end, where even the number of factors is unclear, and the fully confirmatory end, where every loading is either fixed or specified. Recent developments in regularization methods enable more flexibility to cover a wider space between the two ends. Statistical regularization imposes a penalty, or cost, on the optimization function to achieve a sparse model that is simpler and more interpretable. One typical example is the L1-norm penalty, or the least absolute shrinkage and selection operator (Lasso), which can shrink unspecified parameters towards zero to obtain a sparse model (Tibshirani, 1996).

Statistical regularization has been more successful towards the confirmatory end, where a confirmatory part and the number of factors are known, within either the frequentist (e.g., Huang et al., 2017; Jacobucci et al., 2016) or the Bayesian (e.g., Feng et al., 2017; Lu et al., 2016; Muthén & Asparouhov, 2012) framework. Under both frameworks, coefficient or loading estimation is usually conceptualized as a regularized variable-selection problem under the assumption of local independence. Both can result in sparse models and are complementary to each other to some extent (Jacobucci & Grimm, 2018). More recently, a partially confirmatory approach offers more flexibility by regularizing the loading pattern and local dependence (i.e., correlated residuals) simultaneously (Chen et al., 2021).

On the exploratory end, where the number of factors is unclear, however, the benefit of regularization has been relatively limited. The frequentist framework is often adopted in EFA regularization (e.g., Choi et al., 2010; Hirose & Yamamoto, 2015; Trendafilov et al., 2017). A principal component analysis (PCA)-like decomposition of orthonormal factors with penalized likelihood-type methods is usually adopted to shrink the loading vector, rather than individual loadings, towards zero. However, their advantages over non-regularized EFA remain unclear in either the frequentist (see Auerswald & Moshagen, 2019 for more details) or the Bayesian (e.g., Conti et al., 2014; Frühwirth-Schnatter & Lopes, 2018) framework. For instance, sparse EFA (SEFA; Trendafilov et al., 2017), based on a special reparameterization and an L1-norm penalty on the loading vector, can achieve a sparse model and factor selection at the same time, representing the current frontier of the field. However, it offers only orthogonal solutions, which require post hoc rotation for correlated factors. Moreover, the shrinkage parameters need to be tuned separately, usually by cross-validation or information criteria, and interval estimates of model parameters are not available.

Recently, a one-step Bayesian regularized exploratory factor analysis (BREFA) model with bi-level Bayesian regularization was introduced (Chen, 2021a). It demonstrated clear advantages over traditional multi-step EFA, where the number of factors is first extracted, followed by factor rotation and parameter estimation. Based on an idea similar to bi-level Bayesian sparse group selection (Xu & Ghosh, 2015) with spike-and-slab priors (Ishwaran & Rao, 2005; Mitchell & Beauchamp, 1988), BREFA enables factor selection with a sparse model by conceptualizing the factor and the loading as the group and individual levels, respectively. It can extract factors and estimate model parameters in one step. The tuning parameters can also be estimated at the same time, and interval estimates are available for significance testing when needed.

However, BREFA can be unidentified or unstable due to issues such as sign indeterminacy, column switching, and local dependence, which will be explained in more detail in the Model Identification and Stableness subsection. In this research, the BREFA will be revised to address these issues, and the revised model will be called fully EFA (FEFA). Moreover, the FEFA will be further extended into a partially EFA (PEFA) model, where the loading pattern can be partially specified together with an unknown number of factors. The extension provides more flexibility on the exploratory end to accommodate partial knowledge. Simulation studies and real-data analyses will be conducted to evaluate the performance of both models under different conditions, including the interference of local dependence. Comparisons between the PEFA and the FEFA (and hence the BREFA) will be considered to give applied researchers more guidance in exploratory settings. Finally, both models have been implemented in the free-of-charge R package LAWBL (Chen, 2021c), making them more accessible to applied researchers.

For both models, the loading vector is reparameterized to tackle model sparsity at the factor and loading levels. Then, multivariate spike-and-slab priors (Yuan & Lin, 2005) with the posterior median estimator (PME), which can produce exactly zero estimates, are used at both levels. When factor selection is irrelevant, the approach reduces to single-level regularization on the loading pattern and is similar to Lu et al.'s (2016) sparse CFA with a spike-and-slab prior on the confirmatory end (although the estimation of factorial correlations and shrinkage parameters differs). The full Bayesian hierarchical formulation can be implemented with resampling-based Markov chain Monte Carlo (MCMC; Gilks et al., 1996) estimation, which mixes Gibbs sampling (Casella & George, 1992; Geman & Geman, 1984), empirical Bayes Gibbs sampling (Casella, 2001; Park & Casella, 2008), and the Metropolis–Hastings algorithm (Hastings, 1970; Metropolis et al., 1953).

Theoretical framework

Bayesian sparse group selection

The solution for factor and loading selection in EFA is associated with sparse Lasso-type variable selection and estimation for grouped data at both the group and individual levels.

Consider the following linear regression with G groups, each with a group coefficient vector βg of length mg, as:

$${\mathbf{y}}_{n\times 1}=\sum\nolimits_{g=1}^G{\mathbf{X}}_g{\boldsymbol{\upbeta}}_g+\boldsymbol{\upepsilon},$$
(1)

where the error term is ε<sub>n×1</sub> ~ N<sub>n</sub>(0, σ<sup>2</sup>I<sub>n</sub>), I<sub>n</sub> is the n-dimensional identity matrix, and X<sub>g</sub> is an n × m<sub>g</sub> covariate matrix corresponding to the group coefficient vector β<sub>g</sub>, g = 1, 2, …, G. Extending the group-level Lasso method (Yuan & Lin, 2006), the sparse group Lasso estimator was proposed (Simon et al., 2013) to shrink the coefficient vector towards zero at both the group and individual levels, as:

$$\hat{\beta}=\underset{\beta }{\mathrm{argmin}}\left({\left\Vert y-{\sum}_{g=1}^G\mathbf{X}_g\mathbf{\upbeta}_g\right\Vert}_2^2+{\gamma}_1{\left\Vert \beta \right\Vert}_1+{\gamma}_2{\sum}_{g=1}^G{\left\Vert \mathbf{\upbeta}_g\right\Vert}_2\right),$$
(2)

where β denotes the full coefficient vector stacking all the βg's, and γ1 and γ2 are the shrinkage parameters at the individual and group levels, respectively. According to Xu and Ghosh (2015), Eq. (2) can be reformulated as the Bayesian sparse group Lasso with the prior:

$$p\left(\beta \right)\propto \exp \left(-{\gamma}_1{\left\Vert \beta \right\Vert}_1-{\gamma}_2{\sum}_{g=1}^G{\left\Vert \mathbf{\upbeta}_g\right\Vert}_2\right),$$
(3)

which is essentially a scale mixture of normals with an exponential mixing density.
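As a concrete illustration, the penalized objective in Eq. (2) can be evaluated directly. The following R sketch is for intuition only; all object names are illustrative rather than taken from any package:

```r
# A minimal sketch of the sparse group Lasso objective in Eq. (2),
# assuming X is a list of G covariate matrices and beta a list of
# G coefficient vectors (all names here are illustrative).
sgl_objective <- function(y, X, beta, gamma1, gamma2) {
  fitted <- Reduce(`+`, Map(`%*%`, X, beta))       # sum_g X_g beta_g
  rss    <- sum((y - fitted)^2)                    # ||y - sum_g X_g beta_g||_2^2
  l1     <- gamma1 * sum(abs(unlist(beta)))        # individual-level L1 penalty
  l2grp  <- gamma2 * sum(vapply(beta, function(b) sqrt(sum(b^2)), numeric(1)))  # group-level penalty
  rss + l1 + l2grp
}

# Toy usage: two groups of three covariates each
set.seed(1)
X <- replicate(2, matrix(rnorm(50 * 3), 50, 3), simplify = FALSE)
beta <- list(c(1, 0, 0), c(0, 0, 0))
y <- X[[1]] %*% beta[[1]] + rnorm(50, sd = 0.5)
sgl_objective(y, X, beta, gamma1 = 0.1, gamma2 = 0.5)
```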

The Bayesian sparse group Lasso can be extended with spike-and-slab priors (SSP), which are multivariate zero-inflated mixture priors mixing point-mass priors for the spike part with Lasso-type (heavy-tailed) priors for the slab part (Yuan & Lin, 2005). The SSP can combine the power of point-mass mixture priors and double-exponential distributions in variable selection and estimation. As a result, the Bayesian sparse group Lasso with SSP (BSGL-SSP) is found to have desirable properties such as group-wise and within-group sparsity with exactly zero estimates, an optimal asymptotic estimation rate, and a lower false-positive rate (Xu & Ghosh, 2015). Moreover, it can help eliminate the interference of spurious factors and maintain the stability of the resampling process under the EFA setting (Chen, 2021a). Accordingly, it will be employed and extended to construct the FEFA and PEFA in this study. Readers can refer to the literature for more information about the BSGL-SSP (e.g., Liquet et al., 2017; Xu & Ghosh, 2015).

Fully exploratory factor analysis with spike-and-slab prior

Formulation of the FEFA with SSP has been discussed in previous research (Chen, 2021a) and will be introduced here with related modifications for model identification and stableness. For a psychological test or scale with J items, K factors, and N examinees, the data matrix can be denoted as Y = (Y1, …, YJ) = (yij)N×J with vector yi = (yi1, …, yiJ)T, where examinee i = 1, …, N and item j = 1, …, J. Underlying the response data, there are assumed to be latent factors in the K-dimensional Euclidean space, Ω = (Ω1, …, ΩK) = (ωik)N×K with vector ωi = (ωi1, …, ωiK)T and k = 1, …, K. Note that the K factors include both true and spurious factors. A general FA model satisfies the following equation:

$${\boldsymbol{y}}_i=\boldsymbol{\Lambda} {\boldsymbol{\upomega}}_i+{\boldsymbol{\upepsilon}}_i,$$
(4)

where Λ = (Λ1, …, ΛK) = (λjk)J×K is the J×K loading matrix, and εi ~ NJ(0, Ψ) is the error term with diagonal covariance matrix Ψ = diag(ψjj)J×J. Moreover, the factors are assumed to follow a multivariate normal distribution, ωi ~ NK(0, Φ). As in classic factor analysis, it is assumed that Φ = (ϕkk′)K×K is a correlation matrix for scale determinacy and that the data have been standardized and centered (i.e., there is no intercept term in the equation).

Under the FEFA setting, all elements in the loading matrix Λ are considered unspecified and subject to selection. Accordingly, two levels of regularization are of concern in Λ: 1) the individual loading λjk and 2) the loading vector Λk, which is at the factor level. The situation is comparable to sparse group selection in the BSGL-SSP, but with an important difference: in sparse group selection, the regularized coefficients are regressions on observed variables (i.e., data), whereas the regularized loadings are regressions on latent variables (i.e., factors). Similar to the BSGL-SSP, Λk can be reparameterized to tackle the two concerns separately, as:

$${\boldsymbol{\Lambda}}_k={\mathbf{V}}_k^{1/2}{\mathbf{b}}_k,$$
(5)

where \({\mathbf{V}}_k^{1/2}=\operatorname{diag}\left\{{\tau}_{1k},\dots, {\tau}_{Jk}\right\}\) with τjk ≥ 0 is responsible for the magnitude of individual loading, and bk = (b1k, …, bJk)T controls the entire loading vector. To select the loading vector at the factor level, multivariate SSP can be assumed for bk, as:

$${\mathbf{b}}_k\sim \left(1-{\pi}_0\right){N}_J\left(\mathbf{0},{\mathbf{I}}_J\right)+{\pi}_0{\delta}_0\left({\mathbf{b}}_k\right),$$
(6)

where δ0(bk) represents a point mass at 0 for bk. This implies that bk is either zero (i.e., the factor is spurious) or follows a multivariate normal distribution with zero mean vector and the identity matrix as its covariance matrix.

To select individual loading λjk, the following SSP can be assumed for τjk:

$${\tau}_{jk}\sim \left(1-{\pi}_1\right){N}^{+}\left(0,{s}_k^2\right)+{\pi}_1{\delta}_0\left({\tau}_{jk}\right),$$
(7)

where \({N}^{+}\left(0,{s}_k^2\right)\) represents the normal distribution \(N\left(0,{s}_k^2\right)\) truncated from below at 0 (i.e., a half-normal distribution). It implies that the magnitude of an individual loading is either zero or half-normally distributed above zero. To some extent, the shrinkage parameter in the SSP, \(1/{s}_k^2\), is equivalent to a composite of the individual-level and group-level shrinkage parameters in the sparse group Lasso, namely, γ1 and γ2 in Eq. (3). Interested readers may refer to the literature for more details (e.g., Kuo & Mallick, 1998; Xu & Ghosh, 2015). Note that, unlike the BSGL-SSP, which has an orthogonal design, one shrinkage parameter per factor (i.e., per loading vector) is adopted here to accommodate correlated factors. A hyper-prior can be assigned as \({s}_k^2\sim \mathrm{Inv}-\mathrm{Gamma}\left(1,{r}_k\right)\), and empirical Bayes Gibbs sampling can be adopted to estimate rk (Appendix A).
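To make the bi-level prior concrete, the following R sketch draws one loading vector from Eqs. (5)–(7), taking π0, π1, and s_k² as given; it is a minimal illustration, and all names are assumptions rather than package code:

```r
# A minimal sketch of one prior draw of the loading vector Lambda_k
# under the bi-level spike-and-slab priors in Eqs. (5)-(7); pi0, pi1,
# and s2k are taken as given, and all object names are illustrative.
draw_loading_vector <- function(J, pi0, pi1, s2k) {
  # Factor level (Eq. 6): b_k is either a point mass at 0 or N_J(0, I_J)
  b_k <- if (runif(1) < pi0) rep(0, J) else rnorm(J)
  # Loading level (Eq. 7): tau_jk is 0 or half-normal N+(0, s2k)
  spike <- runif(J) < pi1
  tau_k <- ifelse(spike, 0, abs(rnorm(J, 0, sqrt(s2k))))
  # Reparameterization (Eq. 5): Lambda_k = V_k^{1/2} b_k
  tau_k * b_k
}

set.seed(2)
draw_loading_vector(J = 18, pi0 = 0.5, pi1 = 0.5, s2k = 0.25)
```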

For hyperparameters π0 and π1, conjugate hyper-priors can be used:

$${\pi}_0\sim \mathrm{Beta}\left({a}_{01},{a}_{02}\right),\kern0.5em {\pi}_1\sim \mathrm{Beta}\left({a}_{11},{a}_{12}\right),$$
(8)

where uniform hyper-priors are applied with a01 = a02 = a11 = a12 = 1. For the covariance matrix Ψ, an inverse gamma prior can be applied to its j-th diagonal term ψjj:

$${\psi}_{jj}\sim \mathrm{Inv}-\mathrm{Gamma}\left({d}_{j1},{d}_{j2}\right),$$
(9)

where the hyperparameters are set as dj1 = 1 and dj2 = .01. Note that dj1 and dj2 determine the density of the inverse gamma distribution and, accordingly, the scale of ψjj. With larger values (i.e., stronger prior knowledge), the importance of the data likelihood in the full conditional distribution declines.

The correlation matrix Φ can be obtained following the approach of Liu and colleagues (Liu, 2008; Liu & Daniels, 2006): first, a covariance matrix Φ* is sampled with a conjugate prior \(\mathrm{Inv}-\mathrm{Wishart}\left({\mathbf{S}}_0^{-1},{q}_0\right)\); then Φ* is transformed into a correlation matrix with a Metropolis–Hastings acceptance probability (see Appendix A for more details).

Model identification and stableness

The above model is not identifiable, or becomes unstable during the resampling process, without additional constraints imposed on the parameters. The identification issues mainly concern the uniqueness of the following decomposition of the covariance structure:

$$\mathrm{COV}\left({\mathbf{y}}_i\right)=\boldsymbol{\Lambda} \boldsymbol{\Phi} {\boldsymbol{\Lambda}}^{\mathrm{T}}+\boldsymbol{\Psi} .$$
(10)

Similar to traditional EFA (Anderson & Rubin, 1956), two types of issues are of concern, and related constraints are imposed: the identifiability of the residual covariance matrix Ψ and that of the loading matrix Λ. However, the FEFA with bi-level regularization is less restrictive with respect to some of these issues and accordingly provides more flexibility in terms of model identification.

Full rank matrix

The (true) factorial covariance matrix needs to be of full rank, which implies that no two true factors can be highly correlated; otherwise, the two factors are indistinguishable. Similarly, the true loading matrix should be of full (column) rank to be identifiable.

Scale and sign indeterminacy

It is evident that Eq. (4) remains unchanged when \(\widetilde{\boldsymbol{\Lambda}}=\boldsymbol{\Lambda} \mathbf{A}\) and \(\widetilde{\boldsymbol{\Omega}}={\mathbf{A}}^{-1}\boldsymbol{\Omega}\), where A can be an arbitrary nonzero scalar or an invertible matrix of dimension K × K. This implies that the scale of the factors is indeterminate. In this research, Φ is constrained to be a correlation matrix to fix the scale. Alternatively, one can estimate a covariance matrix while fixing one loading per factor. However, such a constraint is difficult to implement when both the number of factors and the allocation of items to factors are unknown. Similarly, the signs of Λk and of the factorial correlations are indeterminate, since A can switch among the \(2^K\) diagonal matrices diag(±1, …, ±1)K×K, which can be problematic during the resampling process. A constraint is therefore imposed on the mean of each loading vector, \({\bar {\boldsymbol{\Lambda}}}_k\ge 0\), which still allows the factorial correlations to be positive or negative.
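A minimal R sketch of the sign constraint follows: after each draw, any loading column with a negative mean is flipped, together with the corresponding factor scores and the related rows and columns of Φ (so the diagonal of Φ stays at 1). The object names are illustrative:

```r
# A minimal sketch of resolving sign indeterminacy after each draw,
# assuming the constraint mean(Lambda_k) >= 0: when a column mean is
# negative, the column, the factor scores, and the corresponding rows
# and columns of Phi are all flipped (all names are illustrative).
fix_signs <- function(Lambda, Phi, Omega) {
  flip <- colMeans(Lambda) < 0
  Lambda[, flip] <- -Lambda[, flip]
  Omega[, flip]  <- -Omega[, flip]
  Phi[flip, ] <- -Phi[flip, ]
  Phi[, flip] <- -Phi[, flip]   # double flip restores the diagonal to 1
  list(Lambda = Lambda, Phi = Phi, Omega = Omega)
}
```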

Rotational indeterminacy and column switch

Since A can also be a rotation matrix, special modeling constraints are needed to maintain rotational invariance in conventional Bayesian EFA (e.g., Conti et al., 2014; Frühwirth-Schnatter & Lopes, 2018). This issue is of little concern for the FEFA, however, since the bi-level regularization always strives for a simpler model at both the factor and loading levels; the simplest solution prevails with or without rotational invariance, which is consistent with what we found in both the simulation and real-data analyses.

Another issue is column switching in the loading matrix during the resampling process, which is especially of concern under the Bayesian setting (e.g., Conti et al., 2014; Frühwirth-Schnatter & Lopes, 2018). We found the issue more salient with a single shrinkage parameter shared by all factors, but it can be greatly alleviated by imposing an individual shrinkage parameter for each factor. Accordingly, the FEFA with a separate sk for each factor k is largely immune to column switching, except for the trivial issue of column permutation in the loading matrix; that is, the positions of the true and estimated factors might not match each other.

Uniqueness prerequisite

To achieve uniqueness, there should be at least three nonzero loadings per factor in the true model (Anderson & Rubin, 1956). Moreover, for the uniqueness of Ψ, the maximum number of possible factors should be within the Ledermann bound, \({K}_{\mathrm{M}}\le \left(2J+1-\sqrt{8J+1}\right)/2\) (Ledermann, 1937). Both constraints can be readily implemented in the algorithm, as sketched below.
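Both checks are one-liners in practice; the following R sketch (with illustrative names) computes the Ledermann bound and flags loading columns meeting the three-nonzero-loadings rule:

```r
# A small sketch of the two uniqueness checks: the Ledermann bound on
# the number of factors and the at-least-three-nonzero-loadings rule.
ledermann_bound <- function(J) floor((2 * J + 1 - sqrt(8 * J + 1)) / 2)
ledermann_bound(18)   # maximum admissible K_M for J = 18 items

enough_loadings <- function(Lambda, min_nonzero = 3) {
  colSums(Lambda != 0) >= min_nonzero   # TRUE for columns meeting the rule
}
```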

Cut-off value for factor selection

Strictly speaking, factor k should be considered spurious, or unselected, only if bk = 0 or the eigenvalue of the factor is exactly zero (i.e., \({\left\Vert {\boldsymbol{\Lambda}}_k\right\Vert}_2^2=0\)). In practice, however, this is not a good idea for two reasons: 1) it is important to ignore minor factors with small eigenvalues for the purpose of dimension reduction (note that it is conventional to ignore factors with eigenvalues smaller than 1 under classic EFA settings); and 2) estimates of a zero loading vector can fluctuate during the resampling process due to various interferences (e.g., local dependence), making the associated eigenvalue larger than zero. A more practical rule is to consider factor k spurious if \({\left\Vert {\hat{\boldsymbol{\Lambda}}}_k\right\Vert}_2^2<{\varepsilon}_0\) for some cut-off value ε0, which can be empirically determined.
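The selection rule can be expressed compactly; the R sketch below (illustrative names only) computes the eigenvalue of each factor as the squared L2 norm of its estimated loading vector and retains factors reaching the cut-off:

```r
# A minimal sketch of the empirical selection rule: a factor is kept
# only if the squared L2 norm of its estimated loading vector (its
# eigenvalue) reaches the cut-off eps0; names are illustrative.
select_factors <- function(Lambda_hat, eps0 = 1) {
  eig <- colSums(Lambda_hat^2)   # ||Lambda_k||_2^2 per factor
  which(eig >= eps0)             # indices of retained factors
}
```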

Local dependence and parameter overflow

In most EFA settings, Ψ is assumed to be a diagonal matrix under local independence. In practice, however, correlated residuals can be inevitable for applied researchers. Local dependence can create spurious factors and interfere with the resampling process. We also found that loading estimates can overflow (i.e., become larger than one) under local dependence, further adding to model instability during the resampling process. When overflow occurs, one can roll back the associated loading estimates to those from a previous draw, assuming the previous draw is admissible. One possible pitfall is that the Markov chain might get stuck and stop moving, which should be extremely rare in practice and can be avoided by starting with different initial values. Together with the constraint of at least three nonzero loadings per factor and an appropriate cut-off value for factor selection, the FEFA is expected to be robust to the interference of local dependence, which usually involves an item pair. However, this also means that factors with only two items will be ignored.
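The roll-back device amounts to a guard inside the sampling loop. The following R sketch illustrates the logic only; draw_lambda() is a hypothetical stand-in for the actual full-conditional sampling step, not part of any package:

```r
# A minimal sketch of the roll-back device for loading overflow inside a
# resampling loop: if any |loading| in the current draw exceeds 1, revert
# to the previous admissible draw. draw_lambda() is purely illustrative.
draw_lambda <- function(J, K) matrix(rnorm(J * K, sd = 0.5), J, K)

set.seed(3)
Lambda_prev <- draw_lambda(18, 8)
for (t in 1:100) {
  Lambda <- draw_lambda(18, 8)
  if (any(abs(Lambda) > 1)) {
    Lambda <- Lambda_prev   # roll back; the chain rarely sticks in practice
  } else {
    Lambda_prev <- Lambda   # accept and remember the admissible draw
  }
}
```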

Posterior analysis and MCMC estimation

A graphical representation of the model structure for the different parameters is presented in Fig. 1. With the above model specification, the joint and conditional posteriors can be obtained as shown in Appendix A. Although the joint posterior distribution has a complicated form that is difficult to handle directly, most of the resulting full conditionals are standard distributions that can be sampled from directly with the Gibbs or block Gibbs sampler.

Fig. 1 A directed acyclic graph for the FEFA. Note. SSP = spike-and-slab prior; IW = inverse Wishart; IG = inverse gamma

Fig. 2 A directed acyclic graph for the PEFA. Note. SSP = spike-and-slab prior; IW = inverse Wishart; IG = inverse gamma

Note that for simultaneous selection and estimation in spike-and-slab-type models, the PME has been suggested as a soft thresholding estimator with the oracle properties of variable-selection consistency and asymptotic normality (Xu & Ghosh, 2015). The eigenvalue based on the PME is also a soft thresholding estimator. The spike part leads to a median thresholding estimator that selects the loading vector and the individual loadings automatically, with soft thresholds depending on π0 and π1, respectively, while the hyperparameter in the slab part, \({s}_k^2\), determines the shrinkage of the individual loadings.

Multiple chains with different initial values can be run to monitor the convergence of the algorithm. After the burn-in period, convergence for the parameters of interest can be assessed using the estimated potential scale reduction (EPSR) value (Gelman, 1996). The standard errors of the estimates can be characterized with highest posterior density (HPD) intervals, or more specifically the 100(1−α)% HPD interval (Box & Tiao, 1973). For model comparison, the deviance information criterion (DIC; Spiegelhalter et al., 2002) can be used.
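For readers implementing the diagnostics themselves, the R sketch below computes the EPSR for one parameter from multiple chains and a 100(1−α)% HPD interval from sorted draws; this is a minimal illustration under the standard definitions, not code from LAWBL, and all names are illustrative:

```r
# EPSR (potential scale reduction) for one parameter, assuming `chains`
# is an iterations-by-chains matrix of post-burn-in draws.
epsr <- function(chains) {
  n <- nrow(chains); m <- ncol(chains)
  W <- mean(apply(chains, 2, var))      # within-chain variance
  B <- n * var(colMeans(chains))        # between-chain variance
  v_hat <- (n - 1) / n * W + B / n      # pooled posterior variance estimate
  sqrt(v_hat / W)                       # EPSR; < 1.1 suggests convergence
}

# Narrowest interval containing 100*prob% of the sorted draws (HPD).
hpd <- function(draws, prob = 0.95) {
  d <- sort(draws); n <- length(d)
  k <- floor(prob * n)
  widths <- d[(k + 1):n] - d[1:(n - k)] # widths of all candidate intervals
  i <- which.min(widths)                # the narrowest one is the HPD interval
  c(lower = d[i], upper = d[i + k])
}
```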

Extension to partially exploratory factor analysis

In the FEFA above, no substantive knowledge is needed, as all elements in the loading matrix are unspecified (i.e., subject to selection). In practice, however, one might want to incorporate partial knowledge. The FEFA can be extended into the PEFA, where the loading pattern can be partially specified together with an unknown number of factors. The extension provides more flexibility on the exploratory end to accommodate partial knowledge.

Specifically, elements in the loading matrix can be fixed at zero, specified as free to estimate based on substantive knowledge (specified loadings), or left unspecified. When there is at least one specified loading per factor, no factor selection is needed, and the reduced model is similar to other Bayesian CFA regularization methods (e.g., Lu et al., 2016; Muthén & Asparouhov, 2012).

Estimation of the specified loadings can proceed item by item. For a specific item, one can rearrange the factors so that the specified and unspecified loadings are partitioned. For item j, denote the number of specified loadings as \({K}_j^{\sim }\), the corresponding factor matrix as \({\boldsymbol{\Omega}}_j^{\sim }=\left({\boldsymbol{\Omega}}_1,\dots, {\boldsymbol{\Omega}}_{K_j^{\sim }}\right)\), and the specified loading vector as \({\boldsymbol{\uplambda}}_j^{\sim }={\left({\lambda}_{j1},\dots, {\lambda}_{j{K}_j^{\sim }}\right)}^{\mathrm{T}}\). With a conjugate normal prior, \({\boldsymbol{\uplambda}}_j^{\sim}\sim {N}_{K_j^{\sim }}\left({\boldsymbol{\uplambda}}_{0j},{\mathbf{H}}_{0j}\right)\), the full conditional posterior of the specified loading vector can be derived as:

$${\boldsymbol{\uplambda}}_j^{\sim}\mid \mathrm{rest}\sim {N}_{K_j^{\sim }}\left({\boldsymbol{\Sigma}}_j^{\sim}\left({\psi}_{jj}^{-1}{\left({\boldsymbol{\Omega}}_j^{\sim}\right)}^{\mathrm{T}}{\mathbf{Y}}_j+{\mathbf{H}}_{0j}^{-1}{\boldsymbol{\uplambda}}_{0j}\right),{\boldsymbol{\Sigma}}_j^{\sim}\right),$$
(11)

where \({\boldsymbol{\Sigma}}_j^{\sim }={\left({\psi}_{jj}^{-1}{\left({\boldsymbol{\Omega}}_j^{\sim}\right)}^{\mathrm{T}}{\boldsymbol{\Omega}}_j^{\sim }+{\mathbf{H}}_{0j}^{-1}\right)}^{-1}\).
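One draw from this full conditional is a standard multivariate normal step. The R sketch below assumes MASS for the multivariate normal generator; Yj, Omega_j, and the prior quantities are illustrative objects matching the symbols above:

```r
# A minimal sketch of one draw from Eq. (11), the full conditional of
# item j's specified loading vector under the conjugate normal prior
# N(lambda0j, H0j); all object names are illustrative.
library(MASS)

draw_specified_loadings <- function(Yj, Omega_j, psi_jj, lambda0j, H0j) {
  H0_inv  <- solve(H0j)
  Sigma_j <- solve(crossprod(Omega_j) / psi_jj + H0_inv)  # posterior covariance
  mu_j    <- Sigma_j %*% (crossprod(Omega_j, Yj) / psi_jj + H0_inv %*% lambda0j)
  mvrnorm(1, mu = drop(mu_j), Sigma = Sigma_j)            # one posterior draw
}
```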

Estimation of the unspecified loadings can proceed factor by factor. For factors with specified loadings, sparsity at the factor level is not of concern, and some priors and related posterior distributions need to be modified accordingly. This section presents only those that are adjusted; the others are similar to those for the FEFA. For a specific factor, one can also rearrange the items so that the specified and unspecified loadings are partitioned. Assume there are \({K}^{\ast}\le K\) factors with at least one specified loading. For factor k, where k = 1, …, K*, denote the number of unspecified loadings as \({J}_k^{\ast }\) and the corresponding loading vector as \({\boldsymbol{\Lambda}}_k^{\ast }={\left({\lambda}_{1k},\dots, {\lambda}_{J_k^{\ast }k}\right)}^{\mathrm{T}}\). The factor-level loading vector can be reparameterized similarly to the above, as:

$$\boldsymbol{\Lambda}_k^{\ast }=\mathbf{V}_k^{\ast 1/2}\mathbf{b}_k^{\ast },$$
(12)

where \({\mathbf{b}}_k^{\ast }={\left({b}_{1k},\dots, {b}_{J_k^{\ast }k}\right)}^{\mathrm{T}}\), \({\mathbf{V}}_k^{\ast 1/2}=\operatorname{diag}\left\{{\tau}_{1k},\cdots, {\tau}_{J_k^{\ast }k}\right\}\), and τjk ≥ 0. The prior distribution for \({\mathbf{b}}_k^{\ast }\) can be adjusted as:

$${\mathbf{b}}_k^{\ast}\sim {N}_{J_k^{\ast }}\left(\mathbf{0},{\mathbf{I}}_{J_k^{\ast }}\right).$$
(13)

The corresponding conditional posterior is changed as:

$${\mathbf{b}}_k^{\ast}\mid \mathrm{rest}\sim {N}_{J_k^{\ast }}\left({\boldsymbol{\upmu}}_k^{\ast },{\boldsymbol{\Sigma}}_k^{\ast}\right),$$
(14)

where \({\boldsymbol{\upmu}}_k^{\ast }={\boldsymbol{\Psi}}^{-1}{\boldsymbol{\Sigma}}_k^{\ast }{\mathbf{V}}_k^{\ast 1/2}{\mathbf{Y}}^{\mathrm{T}}{\boldsymbol{\Omega}}_k\) and \({\boldsymbol{\Sigma}}_k^{\ast }={\left({\mathbf{I}}_{J_k^{\ast }}+{\boldsymbol{\Psi}}^{-1}{\mathbf{V}}_k^{\ast 1/2}\left({\boldsymbol{\Omega}}_k^{\mathrm{T}}{\boldsymbol{\Omega}}_k\right){\mathbf{V}}_k^{\ast 1/2}\right)}^{-1}\). The conditional posterior of \({s}_k^2\) for factor k with specified loadings can be adjusted as:

$${s}_k^2\mid \mathrm{rest}\sim \mathrm{Inv}-\mathrm{Gamma}\left(1+\frac{1}{2}\#\left({\tau}_{jk}=0\right),{r}_k+\frac{1}{2}\left(\sum_{j=1}^{J_k^{\ast }}{\tau}_{jk}^2+J-{J}_k^{\ast}\right)\right).$$
(15)

For factors without specified loadings, the posterior probability of bk = 0 (for k ∉ K*) given the remaining parameters can be adjusted as:

$${l}_k=p\left({\mathbf{b}}_k=\mathbf{0}|\mathrm{rest}\right)=\frac{\pi_0}{\pi_0+\left(1-{\pi}_0\right){\left|{\boldsymbol{\Sigma}}_k\right|}^{1/2}\mathrm{exp}\left\{\mathrm{T}\mathrm{r}\left[{\left({\boldsymbol{\upmu}}_k^{\sim}\right)}^{\mathrm{T}}{\boldsymbol{\Sigma}}_k^{-1}{\boldsymbol{\upmu}}_k^{\sim}\right]/2\right\}},$$
(16)

where \({\boldsymbol{\upmu}}_k^{\sim }={\boldsymbol{\Psi}}^{-1}{\boldsymbol{\Sigma}}_k{\mathbf{V}}_k^{1/2}{\left(\mathbf{Y}-{\sum}_{k^{\prime} \in {K}^{\ast }}{\boldsymbol{\Omega}}_{k^{\prime} }{\boldsymbol{\Lambda}}_{k^{\prime}}^{\mathrm{T}}\right)}^{\mathrm{T}}{\boldsymbol{\Omega}}_k\). The conditional posteriors of all other parameters are the same as those for FEFA. A graphical representation of the model structure of different parameters can be found in Fig. 2.
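At each iteration, factor selection thus reduces to evaluating Eq. (16). A minimal R sketch follows (illustrative names; the scalar trace is written directly as a quadratic form):

```r
# A minimal sketch of Eq. (16): the posterior probability that b_k = 0,
# given pi0, the posterior covariance Sigma_k, and the mean term mu_k;
# all object names are illustrative.
prob_bk_zero <- function(pi0, Sigma_k, mu_k) {
  quad <- drop(t(mu_k) %*% solve(Sigma_k) %*% mu_k)  # (mu_k)' Sigma_k^{-1} mu_k
  pi0 / (pi0 + (1 - pi0) * sqrt(det(Sigma_k)) * exp(quad / 2))
}
```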

Existing literature (Chen et al., 2021; Kyung et al., 2010; Wang, 2012) can be followed in assigning the prior and hyper-prior values for both the FEFA and PEFA: a01 = a02 = a11 = a12 = 1, dj1 = 1, dj2 = .01, λ0j = 0, H0j = 4I, S0 = I + .1od, and q0 = K + 2, where I is the identity matrix and I + .1od denotes the matrix with diagonal elements equal to 1 and off-diagonal elements equal to .1. These hyperparameters are consistent with those described above. Finally, the identification conditions are similar to, but less restrictive than, those of the FEFA.

Simulation studies

Simulation studies were employed to evaluate the performance of the proposed models across different settings. Factor extraction and parameter recovery were evaluated through two studies. For data generation, data were simulated with three true factors, Kt = 3, and six items per factor, namely J = 18. The factor variances were set to 1, and three cases of factorial correlations were considered: ϕ12 = 0, .4, or .6, whereas ϕ13 = ϕ23 = .4 in all cases. The true loading matrix was simulated as:

$${\boldsymbol{\Lambda}}_{\mathrm{t}}^{\mathrm{T}}=\left[\begin{array}{cccccccccccccccccc} {\lambda}_{11} & {\lambda}_{21} & {\lambda}_{31} & {\lambda}_{41} & {\lambda}_{51} & {\lambda}_{61} & {\lambda}_0 & {\lambda}_0 & {\lambda}_0 & {\lambda}_0 & {\lambda}_0 & {\lambda}_0 & {\lambda}_0 & {\lambda}_0 & {\lambda}_0 & {\lambda}_0 & {\lambda}_{17,1} & {\lambda}_{18,1}\\ {\lambda}_0 & {\lambda}_0 & {\lambda}_0 & {\lambda}_0 & {\lambda}_{52} & {\lambda}_{62} & {\lambda}_{72} & {\lambda}_{82} & {\lambda}_{92} & {\lambda}_{10,2} & {\lambda}_{11,2} & {\lambda}_{12,2} & {\lambda}_0 & {\lambda}_0 & {\lambda}_0 & {\lambda}_0 & {\lambda}_0 & {\lambda}_0\\ {\lambda}_0 & {\lambda}_0 & {\lambda}_0 & {\lambda}_0 & {\lambda}_0 & {\lambda}_0 & {\lambda}_0 & {\lambda}_0 & {\lambda}_0 & {\lambda}_0 & {\lambda}_{11,3} & {\lambda}_{12,3} & {\lambda}_{13,3} & {\lambda}_{14,3} & {\lambda}_{15,3} & {\lambda}_{16,3} & {\lambda}_{17,3} & {\lambda}_{18,3}\end{array}\right],$$

where the following major loadings were set to .7: λ11 to λ61, λ72 to λ12,2, and λ13,3 to λ18,3; the following minor loadings were set to .3: λ17,1, λ18,1, λ52, λ62, λ11,3, and λ12,3; and all other loadings were set to λ0 = 0. Namely, there were six major loadings and two minor loadings per factor. In psychological or educational contexts, a major loading connects an item to the factor the item is designed to measure, whereas a minor loading is a cross-loading on a factor the item is not intended to measure. For the case of local independence, all off-diagonal elements in Ψ were set to 0. Local dependence was also evaluated by setting ψ4,3 = ψ10,9 = ψ16,15 = ψ14,1 = ψ7,2 = ψ13,8 = .2, with a symmetric upper triangle. The sample size was set to N = 200 or 1000, and the number of replications was 200 for each simulation cell.

Both the FEFA and PEFA models were evaluated for factor extraction and parameter recovery, with eight possible factors (i.e., K = 8). For the PEFA, the first three major loadings of each of the first two factors were specified, while all other loadings were unspecified. For each replication, the computation time was about 1.5 and 1.6 min when N = 200, and about 2.8 and 3 min when N = 1000, for the FEFA and PEFA, respectively, on a laptop with an Intel Core i7 CPU. Note that a larger number of factors (e.g., K = 10) was also tried, with only trivial differences except for a longer running time.

For performance assessment, the bias of the parameter estimates (BIAS), the mean of the standard error estimates (SE), and the root mean square error (RMSE) between the estimates and the true values were computed. The true positive rate (TPR) and false positive rate (FPR) for loading and factor selection were also reported. For loading and correlation estimation, the TPR and FPR indicate the proportions of times the true nonzero and zero values, respectively, were estimated as significantly different from zero at α = .05 based on the HPD interval. For factor extraction, the TPR and FPR indicate the proportions of times a true factor and a spurious factor, respectively, were selected (i.e., \({\left\Vert {\hat{\boldsymbol{\Lambda}}}_k\right\Vert}_2^2\ge {\varepsilon}_0\)), for k = 1, …, K.
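For concreteness, the loading-level rates can be computed directly from the HPD bounds; the following R sketch uses illustrative names throughout:

```r
# A minimal sketch of loading-level TPR/FPR, assuming `hpd_lo` and
# `hpd_hi` are J-by-K matrices of HPD bounds and `Lambda_true` holds
# the true loadings (all names are illustrative).
loading_rates <- function(Lambda_true, hpd_lo, hpd_hi) {
  sig <- hpd_lo > 0 | hpd_hi < 0        # interval excludes zero
  tpr <- mean(sig[Lambda_true != 0])    # nonzero loadings flagged significant
  fpr <- mean(sig[Lambda_true == 0])    # zero loadings flagged significant
  c(TPR = tpr, FPR = fpr)
}
```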

The prior and hyper-prior values were set as described in the previous section. In a preliminary study, the author found that a small cut-off (e.g., ε0 = .1) can be used when there is no local dependence. With local dependence, or to ignore minor factors, a larger cut-off is needed to obtain a stabilized solution. When the cut-off is too large, however, one might ignore substantive factors or combine multiple factors into one. The sample size also plays a role, with smaller samples preferring smaller cut-offs. The good news is that different cut-offs usually converge to the same solution, provided the Markov chain can be stabilized and the cut-off is not too large. To be consistent with traditional EFA, ε0 was set to one in the simulation, which appeared to work well.

With the above settings, most Markov chains reached stationarity (i.e., EPSR < 1.1) within 15,000 iterations, and the burn-in was set at 20,000 iterations. After the burn-in phase, parameters were estimated based on an additional 20,000 iterations. All programming was conducted on the R platform (R Development Core Team, 2020) using the LAWBL package (Chen, 2021c). Data were simulated and analyzed using the sim_lvm and pefa functions, respectively, in the package. More information and a tutorial about the functions and the package can be found online at https://jinsong-chen.github.io/LAWBL/.
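A hedged sketch of the workflow is given below. The function names sim_lvm and pefa are those cited above, but the argument names shown are assumptions for illustration only; the online tutorial documents the actual interfaces.

```r
# Assumed LAWBL workflow; the argument names in the commented calls are
# guesses for illustration only -- see https://jinsong-chen.github.io/LAWBL/
# for the actual signatures of sim_lvm and pefa.
library(LAWBL)

# dat <- sim_lvm(...)            # simulate responses per the design above
# fit <- pefa(dat = dat, K = 8)  # assumed: fit the PEFA with eight candidate factors
# summary(fit)                   # assumed: loadings, eigenvalues, HPD intervals
```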

Study 1: Evaluation of factor extraction

The first step was to evaluate whether the number of factors could be correctly identified. Two traditional methods were also evaluated for comparison: the Hull method (Lorenzo-Seva et al., 2011), which is based on the comparative fit index (Bentler, 1990) to assess the fit of each factor solution and was superior to the other implementations in Lorenzo-Seva et al. (2011), and the revised parallel analysis (PA-R; Green et al., 2015), implemented with the eigenvalues obtained from an EFA and the 95th percentile of 100 random samples as the reference eigenvalue. Both methods were also evaluated in Auerswald and Moshagen (2019)'s study.

As shown in Table 1, both the FEFA and PEFA extracted the right number of factors most of the time, even under local dependence, and the latter was slightly better. Hull was also satisfactory in general and robust to the interference of local dependence, but could be problematic when the factorial correlation was high and the sample size small. In contrast, PA-R was problematic whenever there was local dependence, regardless of the factorial correlation and sample size.

Table 1 Accuracy of different factor extraction methods

In addition to identifying the right number of factors, both the FEFA and PEFA can extract the true and spurious factors based on their eigenvalues. As mentioned, however, an FEFA model is identified only up to column permutation of the loading matrix, which means the positions of the true and selected factors might not match. For each true factor k = 1, …, Kt, one can minimize the following to find its match among the selected factors k′:

$$\underset{k^{\prime}\in {K}_{\mathrm{s}}}{\mathit{\min}}\left[{\sum}_{j=1}^J{\left({\lambda}_{jk}-{\hat{\lambda}}_{jk^{\prime}}\right)}^2\right],$$
(17)

where \({K}_{\mathrm{s}}\subseteq \left\{1,\dots, K\right\}\) contains all the selected factors.
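The matching step in Eq. (17) can be implemented in a few lines; the R sketch below (illustrative names) returns, for each true factor, the index of the closest selected factor in squared-error terms:

```r
# A minimal sketch of the column-matching rule in Eq. (17): each true
# factor is matched to the selected factor whose estimated loading
# vector is closest in squared error (all names are illustrative).
match_factors <- function(Lambda_true, Lambda_hat, selected) {
  vapply(seq_len(ncol(Lambda_true)), function(k) {
    sse <- colSums((Lambda_hat[, selected, drop = FALSE] - Lambda_true[, k])^2)
    selected[which.min(sse)]   # index of the best-matching selected factor
  }, integer(1))
}
```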

As shown in Tables 2 and 3, the extraction of the true and spurious factors was satisfactory in terms of bias, RMSE, and SE, although the estimates tended to be slightly worse with a high factorial correlation, local dependence, or a smaller sample size. The TPRs and FPRs were also satisfactory in all cases. One concern, however, was that two factors could be mixed into one, which was more likely with higher factorial correlations. Another concern was that a spurious factor could be inflated and identified as true (i.e., the FPR was larger than zero) due to local dependence. Fortunately, the problematic cases amounted to no more than a few percent under both concerns.

Table 2 Factor extraction based on eigenvalues for FEFA, N = 200
Table 3 Factor extraction based on eigenvalues for FEFA, N = 1000

Study 2: Evaluation of parameter recovery

As shown in Tables 4 and 5, the estimates of the factorial correlations were satisfactory regardless of the factorial correlation, local dependence, or sample size. It is interesting to note that, once a spurious factor was identified as true, the factorial correlations among the true factors tended to be lower than the true values, which might need more investigation in future studies.

Table 4 Estimation of factorial correlations for FEFA, N = 200
Table 5 Estimation of factorial correlations for FEFA, N = 1000

The loading estimates were close across the different factorial correlations (i.e., ϕ12 = 0, .4, or .6), and only the cases with ϕ12 = .4 are reported, as shown in Tables 6 and 7 (the PEFA results were slightly better but still similar and can be found in Appendix Table 12). As shown, the estimates were satisfactory in general, although they tended to be slightly worse with local dependence or a smaller sample size. The TPRs and FPRs were also satisfactory in general, but the FPR for λ0 was slightly higher with local dependence and a larger sample size, suggesting a slightly larger chance of inflating the zero loadings as significant. We also found that when two factors were mixed into one, the estimated loading vector for the mixed factor was similar to the sum of the two true loading vectors. However, more studies are needed on this issue, because there were only a few cases of mixed factors in the present research.

Table 6 Loading estimates for FEFA with ϕ12 = .4 and N = 200
Table 7 Loading estimates for FEFA with ϕ12 = .4 and N = 1000

Real-life example

Study 3: Questionnaire of mathematics learning

Table 8 gives the potential items for a questionnaire measuring enjoyment, confidence (i.e., self-efficacy), utility (i.e., valuing), and anxiety in mathematics learning. Similar background questionnaires can be found in large-scale assessments such as the Trends in International Mathematics and Science Study (IEA, 2011). Based on the design, there should be four factors measured by 18 items, with the major loading for each item also given in the table.

Table 8 Student questionnaire of mathematics learning in school

Responses from 218 Chinese fifth-grade students were collected and analyzed using the FEFA and PEFA. For the PEFA, the first three major loadings of each of the first three factors were specified, whereas all other loadings were unspecified. All other settings and prior values were the same as those in the simulation study. For both models, the Markov chains reached stationarity within 10,000 iterations. The burn-in was set at 20,000 iterations, and 20,000 more iterations were drawn for parameter estimation, which took about 2 and 3 min for the FEFA and PEFA, respectively, on a similar laptop.

Estimation results (Table 9) showed that only three factors were selected by the FEFA, whereas the PEFA successfully recovered all four factors with the corresponding major loadings. The DICs for the FEFA and PEFA were 2643 and 2483, respectively, confirming that the latter fit better. Combined with the high correlation between the first two factors (~.76) in the PEFA (Table 10), it is clear that the first two factors were mixed into one in the FEFA. One can also see that the loading estimates for Factor 1 in the FEFA were close to those for Factors 1 and 2 in the PEFA. Other than that, the two solutions were not far apart in terms of the point and interval estimates and the number of estimates that were significant or marginally significant. Meanwhile, the author obtained essentially the same results with a lower cut-off (e.g., ε0 = .3). Finally, the numbers of factors extracted using Hull and PA-R were one and three, respectively. The small number of factors based on Hull was consistent with the simulation finding for a small sample size and a high factorial correlation.

Table 9 Loading estimates for the questionnaire of mathematics learning
Table 10 Factorial correlation for the questionnaire of mathematics learning

Study 4: Humor styles questionnaire

The Humor Styles Questionnaire (HSQ; Martin et al., 2003), with four factors and 32 items, can be found in Appendix Table 13. Although the major loading of each item has been confirmed, researchers have been concerned about cross-loadings for some items, as the related behaviors tend to be multidimensional (Heintz, 2017). Public data for the HSQ are available online at https://openpsychometrics.org/_rawdata/, and complete responses from 993 individuals were analyzed using the FEFA and PEFA, with the other settings similar to those above.

The burn-in was set at 20,000 iterations, and 20,000 more iterations were drawn for parameter estimation, which took about 6 and 6.5 min for the FEFA and PEFA, respectively. The DICs for the FEFA and PEFA were 17,757 and 17,761, respectively, implying that they were essentially equivalent. Estimation results (Table 11) showed that both models recovered all four factors with the corresponding major loadings successfully, with similar point and interval estimates for most loadings. It is worth noting that most of the minor loadings found significant were smaller than .2, suggesting the influence of the sample size. In brief, the two models were similar in terms of loadings, eigenvalues, and factorial correlations (Appendix Table 14). Meanwhile, the author was not able to obtain a stable solution with a substantially lower cut-off (e.g., ε0 = .6). Finally, the numbers of factors extracted using Hull and PA-R were 4 and 13, respectively.

Table 11 Estimation results for the Humor Styles Questionnaire

Discussion

With bi-level Bayesian regularization, the FEFA and PEFA offer a series of benefits: factor extraction and model sparseness in one step, simultaneous estimation of all model parameters including the shrinkage parameters, both point and interval estimates, and the incorporation of partial knowledge. Taken together, these benefits can substantially improve the flexibility towards the exploratory end of factor analysis. Simulation studies and real-data analyses demonstrated that both models performed satisfactorily under reasonable conditions and were robust to the interference of local dependence, while the PEFA with appropriate information can outperform the FEFA and work well under more extreme conditions. Even considering the running time needed, the proposed methodology provides a viable alternative to traditional EFA when the number of factors is unknown.

On the other hand, more work is needed to fully understand the performance and utility of the proposed models across a wider range of scenarios, such as high dimensionality, more complex loading patterns including irrelevant items, and different amounts of prespecified information. In the simulation study, the design of the loading structure was balanced and succinct so that factor extraction and parameter recovery could be evaluated simultaneously. Future research can consider more complex scenarios; for instance, one can decrease the magnitude of the cross-loadings to .1 and/or allow the loading magnitudes to vary considerably. Although it would be challenging to evaluate the accuracy of parameter recovery in such cases, the performance of factor extraction can still be assessed.

In future studies, it would also be desirable to extend both models to address categorical data and missingness, both of which are widely encountered in social and behavioral research. Moreover, extensions to include covariates, a structural model, or more complex data structures such as multilevel or multiple samples are also worth exploring. Finally, a thorough comparison between the proposed and other EFA approaches for different purposes (e.g., extracting the number of factors, model simplicity, identifying relevant and irrelevant variables) can guide applied researchers in choosing the appropriate one across different practical settings.

Practical implications and suggestions

The proposed models can offer several advantages in practice. One advantage of the Bayesian approach is the availability of interval estimates: under traditional EFA, one can rely only on conventional cut-offs (e.g., .3 or .4) to determine significant loadings, whereas, as illustrated in the real-life examples, the interval estimates offer additional flexibility in separating significant loadings from insignificant ones. For factor extraction, this research adopted the conventional rule of an eigenvalue larger than one, which worked well. It appeared, though, that a substantially smaller cut-off (e.g., .6) usually provided similar results, given that the Markov chain stabilized (note that the chain can be unstable when the cut-off is too small). In practice, it might not be a bad idea to try multiple cut-offs, which can give more confidence once they converge to the same solution. Nevertheless, more investigation is desirable in this respect.

The PEFA can be more valuable due to its combination of factor-level simplicity and model interpretability. In an era not lacking in research findings, an approach capable of incorporating existing knowledge and exploring data-driven patterns at the same time can always offer more flexibility for scale or model development. When developing a new scale or revising an existing one, for instance, it is useful to keep some parts of the scale more confirmatory, namely fully or partially consistent with existing knowledge, while allowing other parts to be more exploratory with dimension reduction.

Since the FEFA can be regarded as a special case of the PEFA, the two constitute a partially exploratory approach to factor analysis in which local dependence is sacrificed in return for factor extraction and loading regularization. When the number of factors is given and a few loadings per factor can be pre-specified, one can resort to the partially confirmatory approach, where the loading structure and local dependence can be regularized simultaneously (Chen, 2022). Specifically, the partially confirmatory factor analysis (PCFA) is available for continuous data (Chen et al., 2021), the partially confirmatory item response model for dichotomous data (Chen, 2020), and the generalized PCFA for data mixed with categorical and continuous responses (Chen, 2021b). With these two approaches, we are now equipped with a set of Bayesian regularization tools to address issues related to the factors, loadings, and local dependence in factor analysis. It would be interesting to see how they can complement each other and work in concert in practice.