Factor Analysis Procedures Revisited from the Comprehensive Model with Unique Factors Decomposed into Specific Factors and Errors

Factor analysis (FA) procedures can be classified into three types (Adachi in WIREs Comput Stat https://onlinelibrary.wiley.com/doi/abs/10.1002/wics.1458, 2019): latent variable FA (LVFA), matrix decomposition FA (MDFA), and its variant (Stegeman in Comput Stat Data Anal 99: 189–203, 2016) named completely decomposed FA (CDFA) through the theorems proved in this paper. We revisit those procedures from the Comprehensive FA (CompFA) model, in which a multivariate observation is decomposed into common factor, specific factor, and error parts. These three parts are separated in MDFA and CDFA, while the specific factor and error parts are not separated, but their sum, called a unique factor, is considered in LVFA. We show that the assumptions in the CompFA model are satisfied by the CDFA solution, but not completely by the MDFA one. Then, how the CompFA model parameters are estimated in the FA procedures is examined. The study shows that all parameters can be recovered well in CDFA, while the sum of the parameters for the specific factor and error parts is approximated by the LVFA estimate of the unique factor parameter and by the MDFA estimate of the specific factor parameter. More detailed results are given through our subdivision of the CompFA model according to whether the error part is uncorrelated among variables or not.

Factor analysis (FA) was originally conceived of by Spearman (1904) and developed toward its modern form by Thurstone (1947). FA is performed for a multivariate data set in order to extract two types of mutually uncorrelated factors: common factors and all others. The common factors, whose number is much less than that of observed variables, serve to explain the variation of all variables. On the other hand, the other factors explain the variations in the variables that remain unaccounted for by the common factors. The latter factors are referred to as unique factors in a prevalent FA model, while those factors are called specific factors in another FA model. Those two FA models are introduced in the following two paragraphs. The prevalent FA model is expressed as for a p-variate observed vector x( p × 1), withẽ = u and the expectations of the elements in x being zero (e.g., Bartholomew, Knott, & Moustaki, 2011;Mulaik, 2010;Yanai & Ichikawa, 2007). Here, vectors f (m × 1) and u ( p × 1) contain the common and unique factor scores, respectively, with m < p. The scores in f and u are treated as random latent variables, while ( p × m) and ( p × p) are the nonrandom parameter matrices to be estimated. The elements of are called factor loadings, as they describe how the p variables in x load the m common factors in f. In contrast to being unconstrained, is restricted to a diagonal matrix. This implies that the p variables in x have one-to-one correspondences to the p unique factors in u: its jth element 968 PSYCHOMETRIKA uniquely affects the jth one of x with the jth diagonal element of being a coefficient. We refer to the FA procedure based on (1) as latent variable FA (LVFA) following Adachi (2019), in order to distinguish it from a procedure to be introduced later.
The vectorẽ = u in (1) is sometimes referred to as an error vector; however, the classic literature such as Harman (1976), Reyment & Jöreskog (1993), and Thurstone (1947) includes the description thatẽ = u is divided into two vectors as u = s + e. (2) Here, e (rather thanẽ) is referred to as an error vector, and the elements of s ( p × 1) are called specific factor scores, with ( p × p) being diagonal. That is, the FA model is also introduced by incorporating (2) into (1), i.e., Here, s is found to perform a role similar to the unique factor part u in (1): is diagonal as is , which implies that the jth specific factor score in s specifically (i.e., uniquely) affects the jth variable of x, with the jth diagonal element of a coefficient. As found in this sentence, the adjective "specific" used for s in (3) has the same implication as the "unique" for u in (1): "specific" and "unique" merely serve to avoid the confusion between u in (1) and s in (3). However, the error vector e is included in (3) in addition to s, while the error vectorẽ in (1) equals u. That is, each error inẽ is assumed to uniquely affect the corresponding variable in (1), but s is separated from e. In this point, (3) is more generalized or comprehensive than (1). We thus refer to (3) as a comprehensive FA (CompFA) model. This model can also be considered compatible with Spearman's (1904) original conception of FA (Anderson & Rubin, 1956, p. 112;Yanai, Shegemasu, Mayekawa, & Ichikawa, 1990, p. 4).
However, the CompFA model has been left out of consideration in recent FA studies, as found in the fact that (3) is not treated in the major introductions to FA published in this century (e.g., Bartholomew, Knott, & Moustaki, 2011;Mulaik, 2010). That can be attributed to the following two points seen in the above classic literature: First, the CompFA model has been very briefly mentioned on only a couple of pages in Harman (1976, pp. 19-20), Reyment and Jöreskog, (1993, pp. 75-76), and Thurstone (1947, pp. 74-75); thus, model (3) was not impressed as fulfilling a certain role in FA. Second, in that literature, (3) has been introduced merely as a model; no procedure is described for estimating parameters based on (3).
However, a recently proposed FA procedure, which is called matrix decomposition FA (MDFA) to distinguish it from LVFA (Adachi, 2019;Adachi & Trendafilov, 2018), can be viewed as a parameter estimation procedure for the CompFA model. In more exactness, MDFA can be modeled as a nonrandom matrix version of (3):

Comprehensive Factor Analysis Model
In Sects. 1.1 and 1.2, we review the standard assumptions for CompFA model (3) and its nonrandom matrix version (4), respectively. Those standard assumptions are not involved with the inter-variable correlations of errors. We discuss in Sect. 1.3 that the consideration of the error correlations allows the CompFA model to be subdivided. The prospects for the following sections are given in the final subsection.

Random Version of Standard Assumptions
The CompFA model is expressed as (3), i.e., x = f + s + e, for the random observation vector x, whose expectation E[x] is the p × 1 zero vector 0 p . We review the assumptions for the expectations and covariances of f, s, and e in the classic literature (Harman, 1976;Reyment & Jöreskog, 1993;Thurstone, 1947). 970 PSYCHOMETRIKA In line with E[x] = 0 p , the expectations of the common factor, specific factor, and error vectors are supposed as The covariance matrices for the factor score vectors are assumed to satisfy Here ] denotes the m × p covariance matrix between f and s, I m is the m × m identity matrix, and m O p expresses the m × p zero matrix. The factor score vectors are assumed to be uncorrelated to the error vector: The standard constraints for model (3) consist of (5)- (7). From (6), the covariance matrix of f and that of s are found to be C and C[ s, s] = C[s, s] = 2 , respectively. Here, the diagonal elements of 2 are called specific variances, as they stand for the variances of the specific factor part s. Further, (6) and (7) (3) is found to be expressed as

Nonrandom Matrix Version of Standard Assumptions
The nonrandom matrix version of the CompFA model can be expressed as (4), i.e., X = F + S + E for the n × p data matrix X. Here, X is column-centered with 1 n X = 0 p and supposed to have full column rank with rank(X) = p, with 1 n and rank(X) denoting the n × 1 vector of ones and the rank of X, respectively. We summarize the assumptions for (4), i.e., the matrix versions of the assumptions in Sect. 1.1.
The versions of (5) and (6) are expressed as respectively. The two equations in (7) can be changed into the matrix forms n −1 F E = m O p and n −1 S E = p O p ,which are equivalent to respectively. The standard assumptions for model (4) consist of (9)-(12).

971
The p × p inter-variable matrices for X and E can be expressed as C XX = n −1 X X and C EE = n −1 E E, because of 1 n X = 0 p and (9). Then, (9)-(12) lead to the nonrandom matrix version of (8): where F + S + E is column-centered because of (9).

Uncorrelated Error and Correlated Error Assumptions
In the classic literature (Harman, 1976;Reyment & Jöreskog 1993;Thurstone, 1947), the elements of e in (3) are particularly referred to as measurement errors. If such errors are considered to be uncorrelated among variables, we can add the assumption that the off-diagonal elements of C[e, e] are zeros, i.e., to those in Sect. 1.1, with D[e, e] = diag(C[e, e]). Here, diag(N) denotes the diagonal matrix whose diagonal elements are those of a square matrix N. In the classic literature, (14) is not explicitly presented, but might be implicitly supposed for the following reasons: In that literature, only latent variable FA (LVFA) is described for estimating parameters, and the constraints considered in LVFA follow from adding (14) to the standard ones in Sect. 1.1, as shown in Sect. 2.2. We can also add the nonrandom matrix version of (14), i.e., to those in Sect. 1.2, with D EE = diag(C EE ). We can also consider a version of the CompFA model, in which (14) and (15) are not assumed, that is, the errors are allowed to be correlated among variables. In this correlated error version, C[e, e] in (8) and C EE in (13) are merely supposed to be unconstrained covariance matrices that are nonnegative-definite.
We should note the difference between uncorrelated error constraints (14) and (15), the latter being stronger, as explained next. The rows of the error matrix E in C EE = n −1 E E can be considered the realizations of the transpose of e in (14). This consideration leads to C[e, e] in (14) equaling E[C EE ], as detailed in Appendix 1. Thus, we can rewrite (14) as the expectation of C EE being diagonal: (14) requires the diagonality of E[C EE ], but not that of C EE itself. This diagonality is required by (15), in contrast. To emphasize the strength of (15), we call this the strong uncorrelated error condition. We can consider that data meeting (15) is hardly encountered, i.e., is unusual. However, in the later sections, such data would be noted: The FA procedures are shown to perfectly fit the data meeting strong condition (15), which is a motivation to study the related properties of the FA procedures. The next theorem gives the foundation for the perfect fit to be shown later.
Theorem 1. If the error matrix E in CompFA model (4) satisfies (15), (4) can be rewritten as the error-free model: Here, can be regarded as the specific variance and factor score matrices, respectively.

Prospects for Relating the CompFA Model to LVFA, MDFA, and CDFA
As the assumptions in the CompFA model have been specified, they can now be used for providing the prospects for the following sections, where the relationships of the CompFA model to LVFA, MDFA, and CDFA will be studied.
Among the relationships, those independent of the uncorrelated and correlated error assumptions (in Sect. 1.3) can be summarized as in Table 1. Its left-hand "Model" and "Note" columns merely present the facts described before Sect. 1: (2) links LVFA model (1) to CompFA model (3), and this matrix version is model (4) underlying MDFA and CDFA.
The right-hand columns in Table 1 present the key points in the relationships to be found. The lower cells in the "Specific Factor & Errors" column show the facts to be found in Sects. 3.2 and 4.2: The MDFA solution does not meet (12) in the CompFA assumptions, but only satisfies its diagonal part diag(S E) = p O p , but (12) is satisfied by the CDFA solution. On the other hand, the specific factor and error parts ( s and e) are unseparated in LVFA.
The column furthest right in Table 1 shows how 2 , 2 , and are usually estimated in the FA procedures for the CompFA data matrix i.e., the observations underlaid by (4) with F, , S, , and E set to particular matrices F, , S, , and E, respectively. Here, the latter matrices have been underlined for the sake of indicating that particular values are substituted into the elements of those matrices. The final (third) subsections in Sect. 2-4 are particularly concerned with how the true and D EE in (17) are related to LVFA, MDFA, and CDFA estimates, respectively, as outlined in the following two paragraphs. In Sects. 2.3 and 3.3, we will discuss that the LVFA estimate of 2 approximates 2 +D EE and the MDFA one of 2 approximates 2 + D EE , respectively. The latter MDFA property of 2 ≈ 2 + D EE is restated as 2 being contaminated by D EE . In Sect. 4.3, CDFA will be shown to provide the estimate 2 ≈ 2 differently from MDFA, with a discussion of how this difference follows from the CDFA solution satisfying (12), which is not met by the MDFA solution.
In Sects. 2.3, 3.3, and 4.3, we will also discuss that the uncorrelated error assumption (in Sect. 1.3) leads to the facts that are not covered in Table 1. The facts are summarized as follows. Two of the formulas with "≈" in Table 1 are replaced by equalities only in strong uncorrelated error condition (15): 2 = 2 +D EE in LVFA and 2 = 2 +D EE in MDFA and CDFA, with all procedures fitting data perfectly. However, apart from that strong condition, the CDFA estimate of 2 can approximate 2 , as described above.
The good recovery of loadings with ≈ in all procedures will be shown numerically in Sect. 5.

Latent Variable Factor Analysis
In Sect. 2.1, we review the formulation of latent variable FA (LVFA) with the assumptions added to (1). Then, in Sect. 2.2, we show how those LVFA assumptions can follow from the CompFA ones. In Sect. 2.3, we discuss how the estimate of 2 approximates 2 + D EE , as shown in Table 1.

Formulation
LVFA is modeled as (1), i.e., x = f +ẽ = f + u, where the expectations and covariances for f andẽ = u are assumed to satisfy (e.g., Bartholomew, Knott, & Moustaki, 2011;Yanai & Ichikawa, 2007). Here, the diagonal elements of 2 are called unique variances, as they stand for the variances of the unique factor part u. LVFA assumptions (18)-(20) imply that inter-variable covariance matrix C[x, x] for (1) is expressed as Thus, and 2 can be estimated so that model-based (21) approximates its data-based counterpart C XX = n −1 X X. This estimation can be attained by minimizing over and 2 (Harman & Jones, 1966), with M 2 = trM M denoting the squared Frobenius norm of a matrix M. Another estimation procedure is to minimize the function which is the negative of the log likelihood following from (21) and the additional normality assumptions for f and u (e.g., Bartholomew, Knott, & Moustaki, 2011;Yanai & Ichikawa, 2007).
This theorem shows that the LVFA assumptions follow from adding uncorrelated error assumption (14) to the standard CompFA ones in Sect. 1.1. However, LVFA can be performed for CompFA data (17), independently of whether its errors satisfy (14) or not. In the next subsection, how LVFA behaves for data (17) is discussed.

Behaviors for CompFA Data
CompFA data matrix (17) leads to the covariance matrix i.e., (13) with , 2 , and C EE set to particular matrices , 2 , and C EE = n −1 E E, respectively. Let us consider the LVFA solution for (24). Loss functions (22) and (23) are known to be minimized for for a given (e.g., Mulaik, 2010, (8.47), (8.80)). Using (24) in (25), this can be rewritten as 2 = diag( for ≈ ; if estimated approximates the true counterpart , each of the diagonal elements in 2 , i.e., a unique variance, can be the estimate of the sum of the corresponding true specific and error variances, though these two cannot be estimated separately. The next corollary, which follows from Theorem 1, shows that formula (26) with "≈" is replaced by the equality 2 = 2 + D EE , if (24) satisfies strong uncorrelated error condition (15): Corollary 1. For (15), (24) is restricted to C XX = + 2 + D EE . For this matrix, LVFA loss functions (22) and (23) can attain their lower limit zero with the solution of 2 given by 2 = 2 + D EE and that of satisfying = .

975
Proof. The matrix difference C XX − ( + 2 ) in (22) and the product C XX ( + 2 ) −1 in (23) are rewritten, using C XX = + 2 + D EE , as The former difference can be p O p , and the latter product can be I p , which allows (22) and (23) to attain zeros, for = and 2 = 2 + D EE .
The result in this corollary is referred to in Sect. 3.3.

Matrix Decomposition Factor Analysis
In Sect. 3.1, we review the original formulation of matrix decomposition FA (MDFA). Then, in Sects. 3.2 and 3.3, we discuss two properties of the MDFA solution shown in Table 1. One property is the solution's satisfying diag(S E) = p O p but not (12), which implies that MDFA can be reformulated with constraints that are less restrictive than the CompFA assumptions, as discussed in Sect. 3.2. The other property 2 ≈ 2 + D EE is suggested by a theorem to be presented in Sect. 3.3.

Original Formulation
For the data matrix X with 1 n X = 0 p , MDFA is formulated as minimizing the least squares function for (4), i.e., over F, , S, and subjec to constraints (9) and (10). In the MDFA literature (e.g., Adachi & Trendafilov, 2018;Sočan, 2003;Stegeman, 2016), 1 n E = 0 p in (9) has not been described, but this is trivial, since 1 n E = 0 p follows from the other equations in (9), (4), and 1 n X = 0 p . One difference in (27) from the LVFA loss functions, besides the former being based on (4), is that (22) and (23) in LVFA do not include the factor scores, while (27) includes those scores as F and S to be estimated. However, infinite solutions of F and S exist that minimize (27); the optimal F and S are not unique (Adachi & Trendafilov, 2018, Sect. 4). Thus, only the solutions of and are interpreted in MDFA, as are those of and in LVFA.
Though (27) is defined using data matrix X, the MDFA solution of and can be obtained only if the covariance matrix C XX = n −1 X X is available, i.e., even if the original X is unavailable (Adachi, 2012(Adachi, , 2020. This can be captured in the fact that (27) can be expanded as n −1 tr(X X + F F + S S ) − 2n −1 tr(X F + X S − F S ) and simplified using (10) as trC XX + tr + tr 2 − 2trC XF − 2trC XS , in which X does not appear. Here, C XF = n −1 X F and C XS = n −1 X S contain the covariances of variables to factors and are uniquely determined for given and (Adachi & Trendafilov, 2018, p. 411). On the other hand, the optimal and can be obtained for given C XF and C XS (Adachi & Trendafilov, 2018, p. 410). Thus, the optimal updates of the block matrices [C XF ,C XS ] and [ , ] are iterated alternately to provide the solution of [ , ] in Adachi's (2012Adachi's ( , 2020) MDFA algorithm.

Reformulation from Properties of the Solution
Besides (9) and (10) imposed as constraints in MDFA, (11) and (12) are included in the standard CompFA assumptions (Sect. 1.2). Thus, we must note whether (11) and (12) are satisfied by the MDFA solution, i.e., the parameter estimates and E resulting in the minimization of (27) under (9) and (10). Adachi and Trendafilov (2018, Theorem 4.1) show that the solution satisfies (11) and the diagonal part of (12), i.e., but does not meet the off-diagonal part of (12) with S Ediag(S E) = p O p in general. That is, the MDFA solution satisfies standard CompFA assumptions (9)-(11) but does not meet (12). The MDFA solution satisfying (11) and (28) implies that these equations can be included in the constraints. That is, MDFA can be reformulated as minimizing (27) over F, , S, and subject to constraints (9)-(11) and (28). Here, (28) being only the diagonal part of (12) implies that the MDFA formulation is less restrictive than the CompFA model, in that (12) is relaxed as (28). In contrast, the theorems presented in Sect. 4.2 show that the CDFA solutions satisfy CompFA constraints (9)-(12) completely.

Behaviors for CompFA Data
In this section, we consider how MDFA behaves for CompFA data matrix (17). The next corollary, which follows from Theorem 1, shows the MDFA solution in strong uncorrelated error condition (15). (17)

Corollary 2. For data matrix
Proof. Theorem 1 shows that (17) can be rewritten into the error-free form X = F +S˜ for This can attain the lower limit zero for F = F , S =S, and =˜ = ( 2 + D EE ) 1/2 . By comparing Corollaries 1 and 2, we can find that both LVFA and MDFA perfectly fit the CompFA data with strong condition (15), then 2 = 2 + D EE in LVFA, but 2 = 2 + D EE in MDFA; its estimates of the specific variances are not their true values, and they are contaminated by the error variances in D EE . This shows an undesirable property of MDFA.
Though the data with (15) are unusual, as mentioned in Sect. 1.3, whether the above contamination can occur for usual CompFA data (17) that are not restricted by (15) is to be considered. For this consideration, we reparameterize the error matrix in (17) as and being p × p. The next theorem suggests that the MDFA estimate of 2 can be contaminated by D EE , i.e., 2 can be close to 2 + D EE : Theorem 3. Consider the case where MDFA is performed for the data matrix specified as (17) with (29) and rank(E) = rank( ) = p. Let two matrices be defined as with a p× p diagonal matrix. Then, the matrices¯ andS in (30) can be substituted into and S in function (27), respectively, andS can be substituted into S in (9) and (10). Moreover, if = diag( ) and F satisfies F S = F E = m O p , thenS in (30) can be substituted into S in (28), and we can use =¯ , S =S, and (17) in (27) to rewrite this function as Proof. See Appendix 2. Theorem 3 suggests that the MDFA estimate of 2 can be contaminated by D EE with as explained next. The theorem allows us to consider that¯ andS in (30) can be the MDFA estimates of and S, respectively. By substituting those¯ andS into and S in the loss function (27) to be minimized, it can be rewritten as (31). This value can be small for F ≈ F and 2 ≈ D EE . This use in (30) and =¯ lead to (32). It will be confirmed in Sect. 5 that (32) actually arises.

Completely Decomposed Factor Analysis
In Sect. 4.1, we review Stegeman's (2016) restrictive variant of MDFA that has been rephrased as completely decomposed FA (CDFA) through the theorems present in Sect. 4.2. They also allow CDFA to be reformulated so that it is perfectly matched with the CompFA model. In Sect. 4.3, we argue how the CDFA estimate of 2 approximates 2 as shown in Table 1.

Stegeman's Factor Analysis Procedure
Stegeman's (2016) original formulation of CDFA is to add the constraint to MDFA formulated as in Sect. 3.1: In CDFA, (27) is minimized over F, , S, and subject to (9), (10), and (33), for the data matrix X with 1 n X = 0 p . As described in Stegeman (2016, p. 196), the CDFA solution can be obtained through the following three sequential steps: First, the optimal subject to (33) can be obtained by performing ten Berge and Kiers' (2001) minimum rank factor analysis (MRFA) for C XX = n −1 X X. Then, the resulting provides the solution of S. Finally, loss function (27), whose and S are fixed to the ones resulting so far, is minimized over F and subject to (9) and (10). This minimization is attained for through the singular value decomposition (SVD) of X − S defined as Here, V V = W W = I q , and is the q × q diagonal matrix whose diagonal elements are arranged in decreasing order, with q = rank(X − S ) and q ≥ m supposed. The matrices V m (n × m) and W m ( p × m) in (34) contain the first m columns of V and W, respectively, with m the upper-left m× m diagonal block of .
The optimal F and in (34) are found to satisfy since we can use (10), (34), (35), and As in MDFA, the optimal F and S cannot be uniquely determined, but their estimation can be skipped to obtain the optimal and , only if the covariance matrix C XX is available. This is shown using the fact that (10) and (33) lead to X S = n 2 . This, (33), and (35) imply that the SVD of (X − S ) (X − S ) =X X −n 2 = n(C XX − 2 ) can be defined as n(C XX − 2 ) = W 2 W with 2 given by MRFA for C XX . This SVD can provide with (34).

Behaviors for CompFA Data
In this section, we consider how CDFA behaves for CompFA data (17). At first, the following corollary shows the CDFA solution in strong uncorrelated error condition (15):

Corollary 3. CDFA can be substituted for MDFA in Corollary 2.
Proof. This is the same as the proof for Corollary 2, since (27) is also the CDFA loss function.
This corollary shows that the CDFA estimate of 2 (specific variances) is contaminated by D EE (error variances) for the CompFA data with (15), as is the MDFA estimate. However, we can argue that the contamination is less likely to occur in CDFA than in MDFA for the usual CompFA data that are not restricted by (15). This argument follows from the fact that constraint (28), i.e., diag(S E) = p O p , in MDFA is strengthened as (12), i.e., S E = p O p , in CDFA, as explained in the following paragraph.
In Sect. 3.3, we discussed that the MDFA estimate of 2 can be contaminated as (32), which follows from (30) withS = (S + G )¯ −1 . This matrix and E lead toS E = n¯ −1 ( − 2 ) as shown by (A6) in Appendix 2. Here, is a p × p diagonal matrix and is defined as in (29). In CDFA, the above equation forS E must be substituted into S E in (12) as since (12) is included in the CDFA constraints as described in Sect. 4.2. The equivalence of (37) to strong uncorrelated error condition (15) is shown next: (15), it is rewritten as C EE = D EE . This is equivalent to (37).
Proof. The last identity in (37) holds if and only if is a diagonal matrix: = . This equivalence to C EE = D EE is shown using (29) rewritten as C EE = n −1 E E = . That is, = implies that C EE = is also diagonal: C EE = D EE . On the other hand, C EE = D EE , i.e., C EE = being diagonal, implies that is diagonal: = .
This theorem shows that contamination (32) is less likely to occur in CDFA for the data that do not satisfy (15). On the other hand, even for such data, (32) can occur in MDFA, where S E may not be p O p , but only diag(S E) = m O p is required. These arguments are empirically supported in the next section.

Simulation Study
We assess the performance of the FA procedures for the CompFA data in a simulation study. Its purposes and the data types to be simulated are detailed in Sect. 5.1. Data analysis and assessment procedures are described in Sect. 5.2, and the results are reported in Sects. 5.3-5.5.

Purposes and Data Synthesis Procedures
In this study, the FA procedures are carried out for the CompFA data synthesized with the true loading matrix and specific variance matrix 2 . The major purpose of this study is to numerically assess the following hypotheses:  (32) and (26), respectively. [H 4 ] has not been discussed, but rather has been presupposed for the discussions in Sects. 2-4. The estimates in the hypotheses are obtained, given the covariance matrix C XX = n −1 X X for the CompFA data. Thus, we describe below how the true and 2 are set and how they lead to C XX .
Let U (α, β) denote the uniform distribution for the interval [α, β]. Each element of and each diagonal one of 2 are drawn from U (−1, 1) and U (0.1, 0.8), respectively, subject to rank( ) = m. From the resulting and 2 , we generate 12 types of C XX with rank(C XX ) constrained to be p. Here the 12 (= 3 × 2 × 2) types are defined by combining the three levels of error correlations, two levels of error magnitudes, and two versions of the CompFA model, which are explained in the following paragraphs.
The two versions of the model correspond to its nonrandom (N) version (4) and random (R) version (3). The covariance matrix for the N version is given by (24) for (17) with C EE = n −1 n i=1 e i e i : Here, error vectors e i (i = 1, …, n) are chosen as e i = αε i , with ε i drawn from N p (0 p , ), i.e., the p-variate normal distribution whose mean vector is 0 p and covariance matrix is ( p × p). How α and are defined is described later. The matrix for the R version is given by This follows from (3), whose random vectors are followed by the observation-number subscript i. The factor score vectors in (39) are sampled with [f i , s i ] ∼ N m+ p (0 m+ p , I m+ p ). The three levels of error correlations can be referred to as no, low, and high levels (C N , C L , and C H ), while the two levels of error magnitudes can be called low and high levels (E L and E H ), respectively. Here, C, E, N, L, and H in the parentheses are abbreviations of correlation, error, no, low, and high, respectively. At the C N level, is set to the diagonal matrix D R whose diagonal elements are drawn from U (0.1, 0.8). At the C L and C H levels, is set to D 1/2 R RD 1/2 R , with R = (r jk ) the p × p symmetric nonnegative-definite matrix whose elements are chosen as r j j = 1 and r jk = τ jkr jk for j = k. Here, τ jk is randomly set to 1 or −1, andr jk is drawn from N 1 (0.2ρ, 0.05 2 ρ 2 ) subject to −1 <r jk < 1, with ρ = 1 for C L and ρ = 2 for C H . The α value is set so that tr /tr = 0.1 and 0.2 for the E L and E H levels, respectively. We should notice that is diagonal at the C N level, but C EE = n −1 n i=1 e i e i obtained with e i ∼ N p (0 p , ) is not necessarily diagonal; the C N level does not exactly match (15). In Appendix 3, it is reported that a simulation study for C XX satisfying (15) numerically demonstrated the facts in Corollaries 1-3.
The procedures in the last three paragraphs were replicated 500 times with the settings n = 200, p = 12, and m = 3. Thus, we had 6000 (=12 types × 500 times) C XX .

Data Analysis and Assessment Procedures
For each of the 6000 C XX , we performed MDFA, CDFA, and the two types of LVFA, i.e., the least squares LVFA (LS-LVFA) and maximum likelihood LVFA (ML-LVFA) for minimizing (22) and (23), respectively. Their algorithms are described in Appendix 4. In every procedure, has rotational indeterminacy; thus, it was rotated by the orthogonal Procrustes method (e.g., Adachi, 2020, p. 206), to optimally approximate the true counterpart in a least squares sense. This approximation allows to be comparable to ; thus, the Procrustes method has been typically used in previous simulation studies for FA (e.g., Adachi, 2013, Appendix D;Stegeman, 2016, Sect. 4.2).
The similarities between estimated parameters and their true values can be assessed with the smallness of a mean absolute difference (MAD). The MAD for loadings is defined as MAD( ) = − 1 /(pm), where − 1 denotes the L 1 norm of − , i.e., the sum of the absolute values of the elements in − . The denominator pm in this definition is replaced by p in the following two MAD for p × p diagonal matrices: Here, MAD in (40) and (41) for LVFA differs from those for MDFA and CDFA, as 2 corresponds to 2 in LVFA. We obtain (41) in addition to (40) Now, we have the MAD values for the three matrices (M), , 2 , and 2 + D EE , which were obtained from each solution of the three FA procedures (P) performed for the 6000 (= 3 × 2 × 2 × 500) C XX . Here, these C XX are classified by combining the three levels of error correlations (C), two levels of error magnitudes (E), two versions (V) of the model, and 500 replications (R). In order to assess the hypotheses in Sect. 5.1, we performed analysis of variance of the randomized block design (ANOVA-RBD) (e.g., Kirk, 2013) for the MAD values. Here, P, M, C, E, and V were treated as treatments and R was treated as a block factor; the factorial design can be expressed as P × M × C × E × V with R consisting of 500 blocks, with the sets of the levels in P, M, C, E, and V being {LVFA, MDFA, CDFA}, { , 2 , 2 + D EE }, {E L E H }, and {N,R}, respectively.
The above ANOVA-RBD provided the F values for main and interaction effects in Table 2. We do not use those values for statistical hypothesis testing, since this is senseless for our simulated data, whose sample size is too large to reject null hypotheses. But rather, we consider the F values as standing for the sizes of the effects, as the number of the levels in each treatment is restricted to two or three; thus, it makes sense to compare the F values. We regard the effects with F ≥ 24,000 as substantial enough, though the lower limit of 24,000 is a benchmark threshold chosen because 24,000 is far greater than the largest one (= 14,491.0) among the F-values less than 24,000. Thus, the five effects boldfaced in Table 2 are regarded as substantial.

Averages for Substantial Effects
The averages of MAD for the levels associated with the substantial effects are presented in Table 3. The averages for the main effect of V in Panel (C) show that the MAD for R version (39) is greater than that for N one (38); this can be attributable to the fact that (39) has higher randomness. The averages for the other main effects in the bottom rows of Panels (A) and (B) can be interpreted from the interactions considered next.
The  (40) and (41) values are much larger than those in MAD( ). This difference in the increments can be interpreted by taking into account the related three-way interaction, as explained in the next subsection.  Figure 1 shows the averages associated with the P × M × C and P × M × E interactions, whose F values are the largest among those for the three-way interactions (Table 2). Although those values are not substantial, we note Fig. 1 for exploring the mechanisms that underlie the interactions treated in Sect. 5.3.

Three-way Interactions Related to the Substantial Two-way Interactions
Panel (A) in Fig. 1 shows that the changes of C N to C L and C L to C H increase MAD ( ) for all procedures, but decrease the (40) value for CDFA. This result can be explained by the similarity of the matrices C EE = n −1 n i=1 e i e i at C N to (15) and the deviation of C EE at C H from (15). Here, (15) leads to the solution of = and 2 = 2 = 2 + D EE , as shown in Corollaries 1-3. However, for C EE deviating from (15), the CDFA estimate of 2 can approximate 2 , as discussed with Theorem 6. Thus, the deviation of C EE from (15) increases MAD( ), but decreases (40) only in CDFA. The above explanation can also be used for the increase in the CDFA average of (41) with the deviation from (15). On the other hand, the deviation is not found to decrease the LVFA and MDFA values of (40). This is congruous with (26) and (32). Further, the LVFA and MDFA averages of (40) rather increase with the change from C L to C H . These increases may be correlated to those in MAD( ) from C L to C H , as is jointly estimated with 2 and 2 in (40). The LVFA and MDFA averages of (41) being smaller than those of (40) at all C levels is also congruous with (26) and (32).
Panel (B) shows that the increments in MAD with the change from E L to E H differ among the procedures and matrices. For explaining the differences, we must consider error magnitudes (EM) and D EE effects, where the EM effect stands for the errors with greater EM disturbing parameter estimation more deeply, and the D EE effect refers to the fact that the diagonal elements of D EE for E H have greater values than those for E L . We can consider that the increase in MAD( ) for every procedure follows only from the EM effect, while those in the LVFA and MDFA values of (40) are affected by the EM plus D EE effects, because of (26) and (32). The increments in (40) being greater than their CDFA counterpart can be explained with Theorem 6, which shows that (32) is less likely to occur in CDFA for C EE deviating from (15). On the other hand, (41) values and their increments for MDFA and LVFA are found to be smaller than their (40) counterparts. This result can be attributable to the fact that (41) is affected only by the EM effect in LVFA and MDFA, as the use of their properties (26) and (32) in (41) allows us to find that MAD( 2 + D EE ) can approximate zero, but this approximation can be deteriorated by the EM effect.

Additional Results
The results considered so far show the advantage of CDFA over the two other procedures. Which of those two is better may be answered from the following result: The panels for 2 in Fig. 1 show that the MDFA estimate of 2 is slightly closer to 2 than the LVFA estimate of 2 on average. The generality of this relationship was found: The value, which is defined as the LVFA value of (40) minus its MDFA value, was positive for the 5819 pairs of LVFA-MDFA solutions among the 6000 for all C XX . Further, we performed ANOVA-RBD for with C, E, and V treated as treatments, and R treated as blocks. Among all resulting F values, the one for the main effect of C (3651.7) and the value for the effect of E (3238.4) were the largest, with the other F values less than 940. However, the two effects for the largest F values are not considered substantial, with the averages at levels C N , C L , C H , E L , and E H being 0.003, 0.008, 0.019, 0.006, and 0.014, respectively; even the largest of those cannot be regarded as substantial. In conclusion, the MDFA estimate of 2 is significantly closer to 2 than the LVFA estimate of 2 , but not substantially closer.

Real Data Illustration
The results in the previous sections can be summarized as follows: In CDFA whose formulation is exactly matched to the CompFA model, its parameters (factor loadings and specific variances) are well recovered for the CompFA data. In MDFA and LVFA, the loadings can be recovered as well as in CDFA, but the specific variances cannot be recovered, so their MDFA estimates are contaminated by the error variances, and the LVFA estimates of the unique variances approximate the sums of the specific and error variances. These results suggest that CDFA should be used for the CompFA data, particularly for the purpose of estimating the specific variances. Therefore, in this section, we use two real data sets to illustrate how the CDFA estimates should be interpreted, and we show the merits of using CDFA through comparisons among CDFA, MDFA, and LVFA solutions, on the supposition that the data sets are underlaid by the CompFA model. Here, LVFA is restricted to ML-LVFA, as its solutions were almost equivalent to LS-LVFA.
One of the data sets is that from Yanai and Ichikawa (2007) for personality test scores of n = 200 students for p = 12 items. The other data set, which we obtained from Izenman's (2008) website, is known as Holzinger and Swineford's (1939) 24 psychological tests data and contains intelligence test scores of n = 301 participants for p = 24 items. We performed the FA procedures for the correlation matrices for the personality and intelligence test scores, with m set at 3 and 5, respectively. In every procedure, was rotated by the varimax method (Kaiser, 1958) which is typically used in FA for real data. Tables 4 and 5 present the resulting = [λ 1 , ..., λ p ], λ 1 2 , ..., λ p 2 , ψ j , θ j , and ev j , with the last three terms being the jth diagonal elements of 2 , 2 , and C EE , respectively, and ev an abbreviation of error variance. In the tables, the sub-/superscripts of L, M, and C, which stand for LVFA, MDFA, and CDFA, respectively, have been attached to , λ j , θ j , and ev j , for distinguishing the solutions for different procedures. We use the above notation with sub-/superscripts in this section.
Tables 4 and 5 show that the loadings resulting in all procedures are mutually similar and lead to the identical interpretation of common factors. Although this result does not show the merits of using CDFA, the merits are shown by the other results, as described in the following paragraphs. There, an important role is fulfilled by the fact that holds for v j = n −1 X 2 , which denotes the variance for variable j and is also the jth diagonal element of C XX . Here, the first identity follows from the fact that the CDFA solution meets (9)-(12) and thus (13), the last identity follows from (25), and the second identity is derived as follows: (4), (10), and (11) imply diag(C XX ) = diag( (28). Each of the terms λ j 2 (with superscripts) in (42) can be called a common variance (or communality) by abbreviating the variance of the common factor part affecting variable j, as found from λ j 2 equaling n −1 Fλ j 2 and the variance of λ j f under (6) and (10). When C XX is a correlation matrix as in our case, v j = 1; thus, λ j 2 , θ 2 j , and ev j (with superscripts) in (42) stand for the proportions of the common, specific, and error variances in the variance of variable j, respectively, with ψ 2 j the proportion of the unique variance.
Let us note the CDFA solution in Table 4 with keeping (42) in mind. For example, the results λ C 1 2 = 0.44, C2 1 = 0.39, and ev C 1 = 0.176 for the variable extraversion show that 44 percent of the variations in extraversion are explained by the common factors, 39 percent are accounted for by the factor specific to extraversion, and 17.6 percent of the variations remain unexplained by common and specific factors. The comparison of C2 j across j (= 1, …, p) shows that C2 3 = 0.45 for empathy is the largest and empathy is affected most by the corresponding specific factor among all variables. On the other hand, C2 2 = 0.18 for activity is the smallest among all C2 j , but λ C 2 2 = 0.70 is the largest among all λ C j 2 , implying that activity is affected least by the corresponding specific factor, but is explained best by the common factors, among all variables.
MDFA and LVFA solutions can be interpreted in a parallel manner, except that the term "specific" is replaced by "unique" and an error variance is not obtained in LVFA. However, we can find in Tables 4 and 5 that M2 j and 2 j are much greater than C2 j and rather close to C2 j +ev C j for almost all variables. This finding is congruous with (26) and (32). For example, C2 1 = 0.60 and 2 1 = 0.61 > C2 1 = 0.39 for the variable extraversion in Table 4. We can consider that C2 1 = 0.60 and 2 1 = 0.61 are contaminated by the true error variance for extraversion and greater than its true specific variance.
We can also find that the error variance ev M j resulting in MDFA is far smaller than its CDFA counterpart ev C j for every variable. This result follows from the fact that MDFA is less restrictive than CDFA; (27) value n −1 E 2 = trC EE = p j=1 ev j in MDFA cannot be greater than that in CDFA, which suggests that ev M j < ev C j tends to occur. This property does not imply goodness of the MDFA solution but is rather congruous with its undesirable property shown by (32); comparing (42) allows us to find that M2 j being larger than its true value decreases ev M j .

Conclusion
In this paper, latent variable factor analysis (LVFA), matrix decomposition factor analysis (MDFA), and its variant from Stegeman (2016) were revisited from the comprehensive FA (CompFA) model. The variant of MDFA was reformulated to be called CDFA and exactly underlaid by the CompFA model. On the other hand, MDFA was reformulated as the procedure with (12) in the CompFA model assumptions relaxed as (28). We also showed how the model for LVFA can be related to the CompFA model.   Table   5.
Solutions for intelligence test data. A goal of the revisit was to show how LVFA, MDFA, and CDFA behave for the CompFA data based on the CompFA model. Except for the unusual case where the data satisfy strong condition (15), the following results were theoretically and numerically found: The CDFA estimates of the specific variances can approximate their true values, but the MDFA estimates are contaminated by the error variances, and the LVFA estimates of the unique variances approximate the sum of the true specific and error variances. It was also shown numerically that the factor loadings can be recovered well in all three procedures. On the supposition that the data to be analyzed by FA are underlaid by the CompFA model, the above results have the practical implications described in the following paragraph.
When only factor loadings are of interest, LVFA, MDFA, and CDFA are equally useful. However, if the specific variances are also interesting, CDFA is to be used. The LVFA estimate of unique variances and the MDFA estimates of specific variances must be considered as larger than the true specific variances.
However, a problem remains for showing the above implications to FA users, who are interested in only the factor loadings, not the specific variances. This problem would be dealt with, if psychometricians could enlighten the users about the importance of the specific variances. For this enlightenment, CompFA model (4) can be used as follows: By removing the specific factor part by setting = p O p in (4), this model and corresponding least squares function (27) are rewritten as X = F + E and X − F 2 , respectively. Minimizing this function gives the formulation of principal component analysis (PCA) as approximating X by reduced rank matrix F (Eckart & Young, 1936). This fact demonstrates that FA can be distinguished from PCA simply by the fact that the former has specific factor part S . That is, the significance of using FA (rather than PCA) is in obtaining S , which convinces the users of how it is necessary to interpret the specific variances in 2 together with loading matrix in the FA solution.
Finally, we should remember that MDFA can be regarded as a procedure for a relaxed variant of the CompFA model with (12) replaced by (28). To study such a relaxed CompFA model is beyond the scope of the present study; thus, it remains for future approaches.
Third, the transpose ofS in (30) post-multiplied by (A4) can be rewritten as where the first identity follows from F S = m O p . Fourth, we can use (A1), (A2), and (29) to further rewrite (A5) asS Finally, the substitution of (A6) into S E in (28) leads to diag(S E) = ndiag(¯ −1 ( The remaining task is to show that substituting =¯ , S=S and (17) in (27) allows this function to be rewritten as (31), for = diag( ) and F satisfying F E = m O p . Function (27) after the substitution is given by the use of (A4) in (27): f (F, ,S,¯ ) = n −1 F − F + E − G 2 .
Further, this can be decomposed as since (A1), (A2), F E = m O p , and the equation F G = m O p following from F E = m O p and G = E −1 . The last term in (A7) can be rewritten as n −1 E − G 2 = trD EE + tr 2 −2n −1 trE G = trD EE + tr 2 − 2tr = trD EE + tr 2 − 2tr diag( ) = trD EE − tr 2 , where we have used n −1 trE E = trD EE , = diag( ), (29), and this implying E G = n . This completes the proof for Theorem 3.

Appendix 3. Demonstration for Corollaries
We performed a simulation study. Its procedures are the same as those in Sects. 5.1 and 6.2, except that simulated data satisfy (15): C XX are restricted to (38) with its C EE replaced by the diagonal matrix for the C N level. For all of the resulting 1000 C XX (= 500 replications × 2 levels of E L and E H ), every procedure provided the solutions with the loss function value = 0, MAD( ) < 0.0006, and the (41) value < 0.0015.

Appendix 4. Algorithms
Here, we describe the FA algorithms used in this paper. Harman and Jones' (1966) MINRES and Adachi's (2013) version of Rubin and Thayer's (1982) EM algorithm are used in LS-and ML-LVFA, respectively. Adachi's (2012Adachi's ( , 2020 algorithm is used for MDFA. Bertisimas, Cohenhaver, and Mazumder's (2017) algorithm is used for the first MRFA step in CDFA, and Stegeman's (2016) algorithm is used for the remaining steps in CDFA. The above LS-LVFA algorithm can give an improper solution including a negative unique variance; such a solution cannot be given by the algorithms for the other procedures (Adachi, 2013(Adachi, , 2019.