Factor Uniqueness of the Structural Parafac Model

Factor analysis is a well-known method for describing the covariance structure among a set of manifest variables through a limited number of unobserved factors. When the observed variables are collected at various occasions on the same statistical units, the data have a three-way structure and standard factor analysis may fail. To overcome these limitations, three-way models, such as the Parafac model, can be adopted. The Parafac model is often seen as an extension of principal component analysis able to discover unique latent components. Its structural version, i.e., as a reparameterization of the covariance matrix, has also been formulated but rarely investigated. In this article, such a formulation is studied by discussing under what conditions factor uniqueness is preserved. It is shown that, under mild conditions, such a property holds even if the specific factors are assumed to be within-variable, or within-occasion, correlated and the model is modified to become scale invariant.


Introduction
Factor analysis (FA) (Bartholomew, Knott, & Moustaki, 2011) is a well-known method explaining the relationships among a set of manifest variables, observed on a sample of statistical units, in terms of a limited number of latent variables. In FA, data are stored in a matrix, say X, of order (I × J), where I and J are the numbers of statistical units and variables, respectively. Thus, FA deals with two-way two-mode data, where the modes are the entities of the data matrix, i.e., statistical units and manifest variables, and the ways are the indexes of the elements of X, i.e., i = 1, . . . , I and j = 1, . . . , J. In many practical situations, it may occur that the scores on the same manifest variables with respect to a sample of statistical units are replicated across K different occasions, e.g., time points, locations or conditions. Examples can be found in several domains. In medicine, the daily measures of some vital characteristics on a set of patients. In chemistry, samples measured at different emission and excitation wavelengths. In social sciences, the yearly evaluation of the main cities of a given area in terms of indicators assessing the quality of life. In marketing, ratings on some goods expressed by a sample of consumers. In psychology, the scores for a group of children on a set of personality variables (traits) rated by different judges (methods). In the previous examples, there are three sets of entities (statistical units, manifest variables and occasions), hence three modes. For instance, in the latter one, these are children, traits and methods. The available information is stored in a three-way array X of order (I × J × K), whose statistical unit-mode matricization X_A, of order (I × JK), is obtained by juxtaposing next to each other the K slices of X of order (I × J). Such data can be modeled by the Tucker3 model,

X_A = A G_A (C ⊗ B)′ + E_A,    (1)

where the symbol '⊗' denotes the Kronecker product of matrices, E_A is an error matrix and A, B, C have order (I × P), (J × Q), (K × R), respectively.
The Tucker3 model can be seen as a generalization of PCA where the JK columns of X_A, the J variables measured at the K different occasions, are modeled through linear combinations of P latent components for the statistical units (the columns of A), with weights G_A(C ⊗ B)′. By exploiting the symmetry of the model with respect to the modes, we derive that P, Q and R are the numbers of components for the statistical units, the manifest variables and the occasions, respectively, while the elements of A, B and C are the scores of the entities of the various modes on the corresponding components. Finally, G_A is the statistical unit-mode matricization, of order (P × QR), of the so-called core tensor G, of order (P × Q × R), expressing the strength of the triple interactions among the components of the three modes. Like PCA, the model parameters are estimated in the OLS sense by minimizing ||E_A||², where ||·|| denotes the Frobenius norm of matrices. The solutions are not obtained by the singular value decomposition (SVD, Eckart & Young, 1936) of a particular matrix, as is the case for PCA. For this purpose, alternating least squares (ALS) algorithms can be applied. See, for instance, Kroonenberg & De Leeuw (1980). As a possible extension of SVD to the multiway case, we mention the higher-order singular value decomposition (De Lathauwer, De Moor, & Vandewalle, 2000).
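As an illustration, the matricized Tucker3 model in (1) can be checked numerically. The following numpy sketch is mine, not from the paper: dimensions and seed are arbitrary, and the core is matricized so that its column ordering matches `np.kron` (Kronecker ordering conventions vary across texts). It builds the noise-free part of (1) and verifies one entry against the elementwise form x_{ijk} = ∑_{pqr} a_ip g_pqr b_jq c_kr.

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, K = 6, 4, 5        # units, variables, occasions
P, Q, R = 2, 3, 2        # components per mode

A = rng.standard_normal((I, P))
B = rng.standard_normal((J, Q))
C = rng.standard_normal((K, R))
G_A = rng.standard_normal((P, R * Q))    # unit-mode matricized core (ordering matches np.kron)

X_A = A @ G_A @ np.kron(C, B).T          # noise-free part of (1): X_A = A G_A (C kron B)'

# check one entry against the elementwise triple sum
i, j, k = 1, 2, 3
val = sum(A[i, p] * G_A[p, r * Q + q] * B[j, q] * C[k, r]
          for p in range(P) for q in range(Q) for r in range(R))
ok = np.isclose(val, X_A[i, k * J + j])   # column k*J + j holds variable j at occasion k
```

With `np.kron(C, B)`, the JK columns of X_A are ordered with variables nested within occasions, which is the convention used implicitly above.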
The applicability of the Tucker3 model may be limited for several reasons. First of all, the interpretability of the solution, in particular of the elements of the core tensor, is usually a very complex issue. Moreover, differently from standard PCA, the solutions are not nested. Finally, the obtained solution is not unique. In fact, it is straightforward to see that we can post-multiply A, B and C by square non-singular transformation matrices of appropriate order, say P, Q and R, obtaining the new component matrices A_T = AP, B_T = BQ and C_T = CR. If we compensate such transformations in the core by setting G_AT = P⁻¹G_A(R⁻¹ ⊗ Q⁻¹)′, we get an equally well-fitting solution because

A_T G_AT (C_T ⊗ B_T)′ = APP⁻¹G_A(R⁻¹ ⊗ Q⁻¹)′(R ⊗ Q)′(C ⊗ B)′ = A G_A (C ⊗ B)′.    (2)

To overcome the possible limitations of the Tucker3 model, it may be convenient to consider the Parafac one, independently proposed by Carroll & Chang (1970) and Harshman (1970). The Parafac model can be seen as a constrained version of the Tucker3 one where the same number of components, say S, is sought for all the modes (P = Q = R = S) and the core tensor is equal to the identity tensor (G = I, i.e., g_pqr = 1 if p = q = r, and g_pqr = 0 otherwise). We have

X_A = A(C • B)′ + E_A,    (3)

where the symbol '•' denotes the Khatri-Rao product of matrices, i.e.,

C • B = [c_1 ⊗ b_1, . . . , c_S ⊗ b_S],

where b_s and c_s are the sth columns of B and C, respectively (s = 1, . . . , S).
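The Khatri-Rao product is just a column-wise Kronecker product, so it is easy to verify the Parafac structure in (3) numerically. The sketch below is illustrative (the helper `khatri_rao` and all values are mine): it confirms that, entrywise, the model part of (3) is the trilinear form x_{ijk} = ∑_s a_is b_js c_ks.

```python
import numpy as np

def khatri_rao(C, B):
    """Khatri-Rao product: column-wise Kronecker, [c_1 kron b_1, ..., c_S kron b_S]."""
    assert C.shape[1] == B.shape[1]
    return np.column_stack([np.kron(C[:, s], B[:, s]) for s in range(C.shape[1])])

rng = np.random.default_rng(1)
I, J, K, S = 6, 4, 5, 3
A = rng.standard_normal((I, S))
B = rng.standard_normal((J, S))
C = rng.standard_normal((K, S))

X_A = A @ khatri_rao(C, B).T             # noise-free part of (3): X_A = A (C . B)'
# elementwise, x_{ijk} = sum_s a_is b_js c_ks
i, j, k = 0, 1, 2
ok = np.isclose(X_A[i, k * J + j], np.sum(A[i] * B[j] * C[k]))
```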
As for the Tucker3 case, the parameter estimates are found in the OLS sense by ALS algorithms. The most interesting feature of Parafac is that, under mild conditions, the solution is unique. This point has been deeply investigated by Kruskal (1977), who found the following result. Let us denote by k-rank(Z) the so-called k-rank of a matrix Z, defined as the largest number k such that every subset of k columns of Z is linearly independent. Moreover, let (A, B, C) and (A_T, B_T, C_T) be two Parafac solutions. Kruskal (1977) has shown that if

k-rank(A) + k-rank(B) + k-rank(C) ≥ 2S + 2,    (4)

then, by considering (3),

A(C • B)′ = A_T(C_T • B_T)′    (5)

implies that there exist a permutation matrix P and three diagonal matrices D_A, D_B and D_C, for which D_A D_B D_C = I, where I denotes the identity matrix, such that

A_T = APD_A, B_T = BPD_B, C_T = CPD_C.    (6)

In other words, if (4) holds, then the solution (A, B, C) is unique up to scaling and a simultaneous column permutation. Although Kruskal's condition has been extended by some other authors (Jiang & Sidiropoulos, 2004; Stegeman, ten Berge, & De Lathauwer, 2006; Stegeman, 2009; Domanov & De Lathauwer, 2013a; Domanov & De Lathauwer, 2013b), what follows is based on such a condition because practitioners mainly refer to it in their applications. In particular, without loss of generality, we set the scaling of the factor loading matrices by assuming that B and C are column-wise normalized, i.e.,

(B′B) ∗ I = (C′C) ∗ I = I,    (7)

where '∗' denotes the Hadamard product, i.e., the element-wise product of matrices. Therefore, (4) and (5) imply that there exists a permutation matrix P such that

A_T = AP, B_T = BP, C_T = CP.    (8)

It is important to note that the parameter estimates are not scale invariant: a rescaling of the data does not simply imply an analogous rescaling of the estimates. It follows that some ways of rescaling the data are better, or more appropriate, than others, making the choice crucial. This is witnessed by a remarkably large number of papers devoted to the preprocessing step. See, for instance, Harshman & Lundy (1984), Bro & Smilde (2003), Kiers (2006) and references therein.
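Kruskal's condition can be checked by brute force for small matrices. The following sketch uses a helper of my own, `k_rank`, which tests every column subset (exponential cost, so only for small examples); generic random matrices attain the maximal k-rank with probability one, so the sufficient condition typically holds when the modes are large enough relative to S.

```python
import numpy as np
from itertools import combinations

def k_rank(Z, tol=1e-10):
    """Largest k such that EVERY subset of k columns of Z is linearly independent."""
    n = Z.shape[1]
    k = 0
    for size in range(1, n + 1):
        if all(np.linalg.matrix_rank(Z[:, list(cols)], tol=tol) == size
               for cols in combinations(range(n), size)):
            k = size
        else:
            break
    return k

rng = np.random.default_rng(0)
S = 3
A = rng.standard_normal((6, S))   # generic matrices have k-rank = min(#rows, #columns)
B = rng.standard_normal((4, S))
C = rng.standard_normal((5, S))

# Kruskal's sufficient condition for uniqueness of the Parafac solution
kruskal_holds = k_rank(A) + k_rank(B) + k_rank(C) >= 2 * S + 2
```

A matrix with two proportional columns has k-rank 1 no matter how large its ordinary rank contribution, which is exactly what the subset-wise definition captures.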
The goal of preprocessing is similar to that for standard PCA. Namely, data are standardized in order to eliminate unwanted differences among the variables. Difficulties arise because it is no longer obvious what the best strategy to adopt is. For instance, a three-way three-mode array can be normalized within one of the three modes or even within a combination of two modes. Different ways of preprocessing the data lead to different solutions, and a wrong preprocessing may lead to unfeasible results (see, e.g., Kiers, 2006). Obviously, a scale-invariant model would be simpler to apply in practice, especially for non-expert users, because such a crucial decision on the best preprocessing strategy would no longer be necessary.

Models
Starting from the original formulations of the models, we can derive the covariance structure corresponding to each model. Let us consider the Tucker3 model in (1), limiting our attention to the ith row of X_A, say x_Ai′, pertaining to the ith statistical unit. x_Ai is the vector of length JK containing the scores of statistical unit i on the J manifest variables during the K occasions. We get

x_Ai′ = a_i′G_A(C ⊗ B)′ + e_Ai′,    (9)

where, with obvious notation, e_Ai′ represents the ith row of E_A and a_i′ is the ith row of A containing the component scores for statistical unit i. To simplify the notation, we omit the subscripts 'A' and 'i,' and we rewrite (9) in terms of column vectors, explicitly considering a vector of intercepts µ:

x = µ + (C ⊗ B)G_A′a + e.    (10)

As usual in standard FA, we assume that the common factors a and the specific factors e are random with E(a) = 0 and E(e) = 0, without loss of generality because of µ, and E(ae′) = 0.
If E(aa′) = Φ and E(ee′) = Ψ, then the covariance matrix of x is given by

Σ = (C ⊗ B)G_A′ΦG_A(C ⊗ B)′ + Ψ.    (11)

In what follows, Φ and Ψ will be assumed to be positive definite. The generic element of the matrix Σ (of order JK × JK), σ_jk,j′k′, holds the covariance between manifest variable j at occasion k and manifest variable j′ at occasion k′ (j, j′ = 1, . . . , J; k, k′ = 1, . . . , K). Bearing in mind the standard FA model, it should be clear that the Tucker3 model is a constrained version of standard FA. If we set Λ = (C ⊗ B)G_A′, then (11) can be rewritten as

Σ = ΛΦΛ′ + Ψ,    (12)

which is the oblique FA model, where Λ = (C ⊗ B)G_A′ is the matrix of factor loadings having a particular form taking into account the three-way three-mode structure of the data. Several papers available in the literature investigate the model in (11). Interested readers may refer to Bloxom (1968), Bentler & Lee (1978), Bentler & Lee (1979), Lee & Fong (1983), Bloxom (1984), Bentler, Poon, & Lee (1988), Kroonenberg & Oort (2003) and references therein. Other multilinear decompositions with Kronecker-structured covariance matrices are presented and investigated by, for instance, Gerard & Hoff (2015) and Hoff (2016).
The Parafac model can be derived from (10) by setting G = G_A = I_A as in (3). We have

x = µ + (C • B)a + e.    (13)

This shows that even the Parafac model is the same as the classical FA model, where the observed variables are expressed as a linear combination of a limited number of common factors (a) and specific factors (e). In order to take into account the three-way three-mode structure of the data, the loadings are constrained to be equal to C • B. Similarly, the structural version of Parafac can be derived from (11) by imposing the restriction Λ = C • B:

Σ = (C • B)Φ(C • B)′ + Ψ.    (14)

Comparing (10) with (13), we see that the restriction G = G_A = I_A greatly simplifies model interpretation. This aspect can be highlighted by focusing on a single occasion, say k. Let x_k be the vector of the J manifest variables measured at occasion k. With obvious notation, the Tucker3 and Parafac models assume the form

x_k = µ_k + B(∑_r c_kr G_r)a + e_k    (15)

and

x_k = µ_k + B diag(c_k)a + e_k,    (16)

respectively, where diag(z) is the diagonal matrix with the elements of z on the main diagonal and G_r denotes the rth slice of the core tensor. Under the Tucker3 model, B may be interpreted as the matrix of factor loadings for the variables on the common factors (∑_r c_kr G_r)a, whose covariance structure varies across the occasions. Under the Parafac model, the factor loadings for the variables are likewise the same (B) in each occasion, but the common factors diag(c_k)a vary only in variance. By exploiting the symmetry of the models with respect to the modes, we may interpret C as the factor loading matrix for the occasions.
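To make the covariance structure in (14) concrete, the following numpy sketch (illustrative values and variable names of my own) builds Σ = (C • B)Φ(C • B)′ + Ψ and verifies that its kth diagonal block equals B diag(c_k)Φ diag(c_k)B′ plus the corresponding block of Ψ, which is the per-occasion Parafac form discussed above.

```python
import numpy as np

rng = np.random.default_rng(2)
J, K, S = 4, 3, 2
B = rng.standard_normal((J, S))
C = rng.standard_normal((K, S))
H = rng.standard_normal((S, S))
Phi = H @ H.T + S * np.eye(S)                 # positive definite common-factor covariance
Psi = np.diag(rng.uniform(0.5, 1.5, J * K))   # diagonal specific-factor covariance

# Khatri-Rao loadings C . B = [c_1 kron b_1, ..., c_S kron b_S]
Lam = np.column_stack([np.kron(C[:, s], B[:, s]) for s in range(S)])
Sigma = Lam @ Phi @ Lam.T + Psi               # structural Parafac covariance, cf. (14)

# the kth diagonal block equals B diag(c_k) Phi diag(c_k) B' + Psi_kk
k = 1
block = Sigma[k * J:(k + 1) * J, k * J:(k + 1) * J]
common = B @ np.diag(C[k]) @ Phi @ np.diag(C[k]) @ B.T
ok = np.allclose(block - common, Psi[k * J:(k + 1) * J, k * J:(k + 1) * J])
```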
In the following sections, we reconsider the structural Parafac model by analyzing whether the constraint Λ = C • B affects the parameter identifiability under different covariance structures of the specific factors.

Scale Invariance
The practical applicability of FA is favored by the property of scale invariance. As is well known, the factorial structure of the standard FA model is not destroyed by variable rescaling. For example, if x is rescaled as x_L = Lx, where L is a diagonal matrix, then the covariance matrix of x_L, Σ_L, is equal to

Σ_L = LΣL = Λ_LΦΛ_L′ + Ψ_L,

with Λ_L = LΛ and Ψ_L = LΨL. Hence, the factor loadings are the same but expressed in the new units of measurement. This guarantees that when the model is estimated by a scale-invariant method, like maximum likelihood, a rescaling of the data implies an analogous rescaling of the estimates. If L contains the reciprocals of the standard deviations, Σ_L is the correlation matrix of x, and standard FA applied to covariance or correlation matrices leads to the same conclusions. This is not true for PCA. The components extracted by using Σ_L or Σ differ, and therefore, a crucial question to answer before applying PCA is how to preprocess the data.
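The invariance of the standard FA structure under diagonal rescaling can be checked in a few lines. The sketch below is a minimal numerical illustration with arbitrary values of my own choosing.

```python
import numpy as np

rng = np.random.default_rng(3)
p, m = 5, 2                                   # manifest variables, common factors
Lam = rng.standard_normal((p, m))             # factor loadings
Phi = np.eye(m)                               # common-factor covariance
Psi = np.diag(rng.uniform(0.2, 1.0, p))       # specific variances
Sigma = Lam @ Phi @ Lam.T + Psi               # FA covariance structure

L = np.diag(rng.uniform(0.5, 2.0, p))         # diagonal rescaling of the variables
Sigma_L = L @ Sigma @ L
# the rescaled covariance keeps the same FA structure,
# with Lam_L = L Lam and Psi_L = L Psi L
same = np.allclose(Sigma_L, (L @ Lam) @ Phi @ (L @ Lam).T + L @ Psi @ L)
```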
In the structural case, the Parafac model introduced in (13), and the resulting covariance matrix given in (14), is generally not scale invariant in the sense given by Browne (1982). In fact, if x in (13) is rescaled as x_L = Lx, then the structural Parafac model in (14) is, in general, not scale invariant because the equation

L(C • B) = C_L • B_L    (19)

does not always have a solution. In other terms, the matrix L usually destroys the Khatri-Rao structure of the factor loadings unless L has a particular structure. Bearing in mind that the same variables are collected at different occasions, it may be reasonable to perform the same scaling of the variables in the various occasions and the same scaling of the occasions across the different variables. This is equivalent to saying that

L = L_O ⊗ L_V,    (20)

where L_V = diag(l_V1, . . . , l_VJ) and L_O = diag(l_O1, . . . , l_OK). The generic element l_Vj gives the scaling of variable j (j = 1, . . . , J). Similarly, l_Ok represents the scaling of occasion k (k = 1, . . . , K). It follows that x_jk, i.e., variable j at occasion k, is rescaled as l_Vj l_Ok x_jk. In this particular case, by substituting (20) into (19), we get

L(C • B) = (L_O ⊗ L_V)(C • B) = (L_OC) • (L_VB) = C_L • B_L,    (21)

with B_L = L_VB and C_L = L_OC. Therefore, the scaling is absorbed by the Khatri-Rao product and the particular factor loading structure is not destroyed. However, in general, the structural Parafac model is not scale invariant, and a researcher may be interested in knowing how to modify its structure to derive a scale-invariant version. Following Lee & Fong (1983), the Parafac formulation in (13) can be modified by incorporating a positive definite diagonal matrix D able to absorb scale changes. We get

x = µ + D[(C • B)a + e].    (22)

From (22), the covariance matrix of x is

Σ = D[(C • B)Φ(C • B)′ + Ψ]D.    (23)

In this way, the Parafac model becomes scale invariant. In fact, when x is rescaled as x_L = Lx, (22) becomes

x_L = µ_L + D_L[(C • B)a + e],    (24)

with µ_L = Lµ and D_L = LD. The covariance matrix of x_L is

Σ_L = D_L[(C • B)Φ(C • B)′ + Ψ]D_L.    (25)

Hence, the model structure is not affected by L because the scaling is absorbed in the diagonal matrix D_L.
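The absorption of a Kronecker-structured scaling by the Khatri-Rao product is easy to verify numerically. The sketch below (arbitrary values, helper `kr` of my own) checks that L(C • B) = (L_O C) • (L_V B) whenever L = L_O ⊗ L_V.

```python
import numpy as np

rng = np.random.default_rng(4)
J, K, S = 4, 3, 2
B = rng.standard_normal((J, S))
C = rng.standard_normal((K, S))
l_V = rng.uniform(0.5, 2.0, J)            # one scaling per variable
l_O = rng.uniform(0.5, 2.0, K)            # one scaling per occasion
L = np.kron(np.diag(l_O), np.diag(l_V))   # L = L_O kron L_V

def kr(C, B):
    """Khatri-Rao (column-wise Kronecker) product."""
    return np.column_stack([np.kron(C[:, s], B[:, s]) for s in range(C.shape[1])])

# the scaling is absorbed by the Khatri-Rao product
absorbed = np.allclose(L @ kr(C, B), kr(np.diag(l_O) @ C, np.diag(l_V) @ B))
```

For a generic diagonal L without this Kronecker structure, the product L(C • B) is no longer a Khatri-Rao product of two matrices, which is the failure of scale invariance discussed above.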
It is important to note that if we impose a particular structure on D, then the same structure should hold for D_L = LD for every possible L. It follows that we cannot impose any particular structure on D.
In order to identify the diagonal matrix D, either the constraint ((C • B)Φ(C • B)′ + Ψ) ∗ I = I (Lee & Fong, 1983) or Ψ = I (Bentler, Poon, & Lee, 1988) is imposed. The two formulations are not equivalent. The former allows us to interpret the model as a reparameterization of the correlation matrix, while the latter has a less clear interpretation but avoids complex constraints on the parameter estimates, apart from those on Ψ.

Factor Uniqueness of the Structural Parafac Model
A fundamental property of the Parafac model is its factor uniqueness. In this section, we start by showing that Parafac maintains this property even in the structural formulation when, as in the standard FA model in (12), the specific factors are assumed to be uncorrelated, i.e., the matrix Ψ is diagonal. However, in three-way three-mode FA, the uncorrelatedness of the specific factors might be a too restrictive assumption. For instance, it might be reasonable to assume that Ψ is block diagonal, i.e., that the specific factors of the different variables are correlated within the same occasion:

Ψ = blockdiag(Ψ_11, . . . , Ψ_kk, . . . , Ψ_KK),    (26)

where blockdiag(Z_1, . . . , Z_K) denotes the block-diagonal matrix with diagonal elements equal to the square matrices Z_1, . . . , Z_K and Ψ_kk denotes the covariance matrix of order (J × J) for the specific factors at occasion k, k = 1, . . . , K. The Parafac covariance model in (14) with the correlation structure of the specific factors given in (26) represents a more realistic model, able to fit reasonably well in many practical situations.
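A block-diagonal Ψ as in (26) is straightforward to construct. The sketch below (helper `blockdiag` and all values are mine; `scipy.linalg.block_diag` would do the same job) builds one positive definite (J × J) block per occasion and checks that specific factors from different occasions are uncorrelated.

```python
import numpy as np

def blockdiag(blocks):
    """Block-diagonal matrix from a list of square matrices."""
    n = sum(b.shape[0] for b in blocks)
    out = np.zeros((n, n))
    i = 0
    for b in blocks:
        j = i + b.shape[0]
        out[i:j, i:j] = b
        i = j
    return out

rng = np.random.default_rng(5)
J, K = 4, 3
blocks = []
for _ in range(K):
    H = rng.standard_normal((J, J))
    blocks.append(H @ H.T + J * np.eye(J))   # positive definite Psi_kk
Psi = blockdiag(blocks)   # specific factors correlated within occasions only
```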
To interpret the correlations between specific factors, we note that they are allowed only between specific factors in the same occasion. In other words, the idea is that correlations between variables measured in different occasions are due only to common factors, while correlations between variables in the same occasion can also be due to other factors (i.e., the specific factors in that occasion) that are not present in other occasions and are therefore out of our interest. An accurate estimation of the 'true' and interesting common factors should consider explicitly the presence of such factors, which are in 'common' only with the variables measured in the same occasion. From a statistical point of view, such covariances are only nuisance parameters. In the literature, the assumption of correlated specific factors is not new. See, e.g., Browne (1984a) and Kroonenberg & Oort (2003).
In the following, we investigate under what conditions factor uniqueness holds even with a block-diagonal structure for Ψ. Finally, we address the scale invariance problem by studying the factor uniqueness property of the scale-invariant version of Parafac in (22)–(23).
It is important to note that what follows can be extended to the case where the specific factors of the same variable are correlated across the different occasions. Such an extension can be easily obtained by exploiting the symmetry of the model with respect to variables and occasions.

Diagonal Ψ
Let us start by showing the factor uniqueness property of the structural Parafac when Ψ is diagonal. Note that disjoint submatrices are submatrices having no entry in common.
Result 1. Suppose that the following hold:

(a) (C • B)Φ(C • B)′ + Ψ = (C_T • B_T)Φ_T(C_T • B_T)′ + Ψ_T;
(b) Φ and Φ_T are positive definite;
(c) if any row of C • B is deleted, there remain two disjoint submatrices of rank S;

where Ψ and Ψ_T are diagonal matrices and all the remaining matrices have S columns. Under the scaling set in (7), there exists a permutation matrix P such that

B_T = BP, C_T = CP, Φ_T = P′ΦP, Ψ_T = Ψ.

Proof. Let us rewrite equality (a) in terms of standard FA models with uncorrelated common factors:

(C • B)Φ^{1/2}[(C • B)Φ^{1/2}]′ + Ψ = (C_T • B_T)Φ_T^{1/2}[(C_T • B_T)Φ_T^{1/2}]′ + Ψ_T.    (31)

Condition (c) implies that if any row of (C • B)Φ^{1/2} is deleted, there remain two disjoint submatrices of rank S, since rank(Φ) = rank(Φ^{1/2}) = S. From Theorem 5.1 of Anderson & Rubin (1956), we know that when this is true, Ψ_T = Ψ and there exists a column-wise orthonormal matrix T such that

(C_T • B_T)Φ_T^{1/2} = (C • B)Φ^{1/2}T,

indicating that Kruskal's condition is met and the two decompositions in (31) are equal up to a column permutation under constraints (7). Let us indicate with P such a permutation matrix; we have

B_T = BP, C_T = CP, (Φ_T^{1/2})′ = T′Φ^{1/2}P ⇒ Φ_T = P′ΦP.

Finally, we note that, in the previous result, Kruskal's condition can be replaced with any other sufficient condition available in the literature.

Block-Diagonal Ψ
When Ψ is block diagonal, the conditions of Anderson & Rubin (1956) can no longer be applied, since they assume a diagonal structure for Ψ. To prove the factor uniqueness, the following proposition is considered.
Proposition (Browne, 1980). Let us consider the decomposition Σ = ΛΛ′ + Ψ, where Λ is partitioned as Λ′ = [Λ_1′, . . . , Λ_k′, . . . , Λ_K′] and Ψ has the corresponding block-diagonal structure Ψ = blockdiag(Ψ_11, . . . , Ψ_kk, . . . , Ψ_KK). The identification of Λ and Ψ up to post-multiplication of Λ by an orthogonal matrix T holds if at least three of the submatrices Λ_k are of full column rank.
The above proposition has been formulated in the context of the FA model for multiple batteries of tests. The matrices Λ_k and Ψ_kk refer to the kth battery. The factor loading matrix Λ is obtained by juxtaposing the factor loading matrices of the various batteries, and the specific factors are uncorrelated between the batteries, but correlated within the batteries. The proposition can be applied in order to prove the factor uniqueness of the structural Parafac model when the specific factors are correlated within the occasions. A sufficient condition for the factor uniqueness of the Parafac solution is given in Result 2.
Result 2. Suppose that the following hold:

(a) (C • B)Φ(C • B)′ + Ψ = (C_T • B_T)Φ_T(C_T • B_T)′ + Ψ_T;
(b) Φ and Φ_T are positive definite;
(c) rank(B diag(c_k)) = S for at least three occasions;

where Ψ = blockdiag(Ψ_11, . . . , Ψ_kk, . . . , Ψ_KK) is block diagonal with blocks defined by the occasions, Ψ_T has the same structure, all the remaining matrices have S columns, and c_k is the vector containing the elements of the kth row of C.
Under the scaling set in (7), there exists a permutation matrix P such that B_T = BP, C_T = CP, Φ_T = P′ΦP.
Proof. Let us rewrite equality (a) in terms of standard FA models with uncorrelated common factors:

(C • B)Φ^{1/2}[(C • B)Φ^{1/2}]′ + Ψ = (C_T • B_T)Φ_T^{1/2}[(C_T • B_T)Φ_T^{1/2}]′ + Ψ_T,    (38)

where the matrix of factor loadings (C • B)Φ^{1/2}, if partitioned according to the blocks of Ψ, takes the form

(C • B)Φ^{1/2} = [(B diag(c_1)Φ^{1/2})′, . . . , (B diag(c_K)Φ^{1/2})′]′.

Condition (c) implies rank(B diag(c_k)Φ^{1/2}) = S for at least three occasions, since rank(Φ) = rank(Φ^{1/2}) = S. This allows us to use Browne's proposition in order to show that (38) implies the existence of a column-wise orthonormal matrix T such that

(C_T • B_T)Φ_T^{1/2} = (C • B)Φ^{1/2}T,    (39)

indicating that Kruskal's condition is met and the two decompositions in (39) are equal up to a column permutation under constraints (7). Let us indicate with P such a permutation matrix; we have B_T = BP, C_T = CP, (Φ_T^{1/2})′ = T′Φ^{1/2}P ⇒ Φ_T = P′ΦP.
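Condition (c) of Result 2 is easy to check in practice. The sketch below (helper name and values are mine) counts the occasions for which B diag(c_k) has full column rank; for generic B and C the condition holds whenever J ≥ S and at least three rows of C have no zero entries.

```python
import numpy as np

def condition_c_result2(B, C, S):
    """Count occasions k with rank(B diag(c_k)) = S; Result 2 requires at least three."""
    full = sum(int(np.linalg.matrix_rank(B @ np.diag(C[k])) == S)
               for k in range(C.shape[0]))
    return full >= 3

rng = np.random.default_rng(6)
J, K, S = 4, 5, 3
B = rng.standard_normal((J, S))
C = rng.standard_normal((K, S))
# generically rank(B diag(c_k)) = S whenever rank(B) = S and c_k has no zeros
holds = condition_c_result2(B, C, S)
```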

Scale Invariance
It remains to prove that the scale-invariant versions of the structural Parafac model in (25), with constraint (a) ((C • B)Φ(C • B)′ + Ψ) ∗ I = I (Lee & Fong, 1983) or (b) Ψ ∗ I = I (Bentler, Poon, & Lee, 1988), satisfy the factor uniqueness property. Note that the latter constraint is modified with respect to the original version to also handle the case where Ψ is block diagonal. The following results hold for either of the two cases where Ψ is diagonal or block diagonal.
Let us consider the equality

D[(C • B)Φ(C • B)′ + Ψ]D = D_T[(C_T • B_T)Φ_T(C_T • B_T)′ + Ψ_T]D_T.    (42)

In order to show that the previous results hold even in this case, it is sufficient to show that the use of either constraint (a) or (b) implies D = D_T.
In case (a), by focusing on the main diagonal elements of the covariance matrices on the left- and right-hand sides of equality (42), we deduce that D^2 = D_T^2 and hence D = D_T, because D and D_T are required to be positive definite. Case (b) is more complex to show because now the constraint applies only to the covariance matrix of the specific factors, and it becomes effective in the identification only if the decomposition of the total covariance is unique. To this end, we rewrite (42) in terms of FA models. Namely, setting Λ = D(C • B)Φ^{1/2} and Z = DΨD, with obvious notation, we get

ΛΛ′ + Z = Λ_TΛ_T′ + Z_T.

If the decomposition of the total covariance into variability due to the common factors and variability due to the specific factors is unique, then it must be Z = Z_T, i.e., DΨD = D_TΨ_TD_T; hence, D = D_T because Ψ ∗ I = Ψ_T ∗ I = I. It remains to discuss under what conditions the total covariance decomposition is unique. In particular, we are going to show that this is obtained when condition (c) in either Result 1 or 2 is satisfied.
In the case of a diagonal Ψ, when condition (c) of Result 1 is met, if any row of Λ = D(C • B)Φ^{1/2} is deleted, there remain two disjoint submatrices of rank S. This allows us to apply Theorem 5.1 of Anderson & Rubin (1956), obtaining the required factor uniqueness.
In the case of a block-diagonal Ψ, when condition (c) of Result 2 is met, at least three submatrices of Λ = D(C • B)Φ^{1/2}, partitioned according to the occasions with D = blockdiag(D_1, . . . , D_k, . . . , D_K), i.e., at least three of the blocks D_kB diag(c_k)Φ^{1/2}, are of full column rank. This allows us to apply Browne's proposition, obtaining the required factor uniqueness.

Estimation of the Structural Parafac Model
The parameter estimation of the structural Parafac model can be done in several ways: ordinary least squares (OLS), generalized least squares (GLS) and maximum likelihood (ML) approaches can be applied. They differ in the discrepancy function, between the sample covariance matrix and the one induced by the Parafac model, to be minimized. If the observations are independent and identically distributed as a multivariate normal distribution, the estimate is obtained by minimizing

F(θ) = (1/2) tr{[(S − Σ(θ))W⁻¹]²},    (45)

where θ = {B, C, Φ, Ψ} denotes the set of parameters, S is the sample covariance matrix of order (JK × JK), Σ(θ) is the estimated covariance matrix according to the Parafac model, and W is a weight matrix. If W = I in (45), then the OLS estimate is found. The OLS approach is, however, not recommended because it is not scale invariant, regardless of the model's properties. If we set W = S, then the so-called best GLS estimate is found, while the ML estimate is obtained, to a close approximation when I is large enough, by setting W = Σ(θ) (Browne, 1974). The 'best' GLS and ML estimators are asymptotically equivalent and therefore share the same asymptotic properties. The most relevant difference is that the GLS approach does not exactly reproduce the sample variances of the manifest variables (see, e.g., Bentler & Lee, 1979). If the multivariate normality assumption is violated, the asymptotically distribution-free (ADF) estimation method (Browne, 1984b) can be adopted. The ADF method minimizes the discrepancy function in (45) with a particular choice of W involving the sample fourth-order moments about the mean (see, e.g., Lee, 2007, p. 45). A Bayesian strategy can also be adopted (see, e.g., Hoff, 2016). A closed-form solution to the minimization of (45) does not exist. However, Bentler, Poon, & Lee (1988) show that well-known software packages designed for structural equation models (for a review, see Narayanan, 2012) can be adapted to solve three-way three-mode FA estimation problems such as the Parafac one in (45).
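The discrepancy function in (45) is simple to evaluate for a candidate Σ(θ). The sketch below (function and variable names are mine; the matrices stand in for real sample and model-implied covariances) computes the loss for the OLS and 'best' GLS choices of W.

```python
import numpy as np

def discrepancy(S, Sigma, W):
    """F_W(theta) = (1/2) tr{[(S - Sigma(theta)) W^{-1}]^2}, cf. (45)."""
    M = (S - Sigma) @ np.linalg.inv(W)
    return 0.5 * np.trace(M @ M)

rng = np.random.default_rng(7)
p = 6
H = rng.standard_normal((p, p))
S = H @ H.T + p * np.eye(p)        # stands in for a sample covariance matrix
Sigma = S + 0.1 * np.eye(p)        # stands in for a model-implied covariance

f_ols = discrepancy(S, Sigma, np.eye(p))   # W = I   -> OLS
f_gls = discrepancy(S, Sigma, S)           # W = S   -> 'best' GLS
```

A perfect fit (Σ(θ) = S) gives zero loss for any admissible W, while the two weightings penalize the same misfit differently, which is the source of their different invariance properties.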

Related Models
The structural Parafac model is estimated by using a scale-invariant method of estimation, i.e., GLS or ML, and it is not the only attempt to go beyond the component Parafac and its OLS estimation. In the component approach, Bro, Sidiropoulos, & Smilde (2002) address the estimation problem of various models for which OLS fitting procedures exist, such as the Parafac one, when the errors are no longer homoscedastic and uncorrelated. They minimize a discrepancy function similar to (45) with a weight matrix reflecting the error covariance structure (see, e.g., Wentzell, 2006). In between the component and structural approaches, we find the proposal of Harshman (1972), named Parafac2, modeling the within-occasion covariance matrices as, with obvious notation,

S_k = B diag(c_k)Φ diag(c_k)B′ + E_k, k = 1, . . . , K,    (48)

where S_k is the sample covariance matrix of the manifest variables at occasion k and E_k is an error term. The model is estimated by OLS, i.e., by minimizing the sum of the squared elements of the error terms E_k. It is important to note that (48) differs from (14) because, in the former, covariances among different occasions and among specific factors are not considered. Mayekawa (1987) extends the Parafac2 model by introducing the variances of the specific factors and providing the ML estimation. As regards the structural approach, it is worth mentioning the work of Stegeman & Lam (2014), where an estimation procedure for the structural Parafac model in (14), based on minimum rank FA, is proposed under the assumption of uncorrelated specific factors. In that proposal, the properties of the estimators are not studied and their standard errors are not derived. Moreover, the adequacy of the model is not evaluated by goodness-of-fit tests.
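The within-occasion structure fitted by Parafac2 in (48) can be sketched directly. The example below (my own illustrative values) builds the model part B diag(c_k)Φ diag(c_k)B′ for each occasion and checks that each fitted block is a symmetric (J × J) matrix, as a covariance structure must be.

```python
import numpy as np

rng = np.random.default_rng(8)
J, K, S = 4, 3, 2
B = rng.standard_normal((J, S))
C = rng.uniform(0.5, 1.5, (K, S))     # occasion weights
H = rng.standard_normal((S, S))
Phi = H @ H.T + S * np.eye(S)         # positive definite common-factor covariance

# model part of (48) for each occasion k: B diag(c_k) Phi diag(c_k) B'
fitted = [B @ np.diag(C[k]) @ Phi @ np.diag(C[k]) @ B.T for k in range(K)]
sym = all(np.allclose(F, F.T) for F in fitted)
```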

Application
The type (b) scale-invariant structural Parafac model is applied to the data introduced by Bentler & McClain (1976). The data refer to the correlation matrix observed on a sample of I = 68 fifth-grade children with respect to J = 4 personality variables (E: Extraversion; A: Test anxiety; I: Impulsivity; M: Academic achievement motivation) measured by Peer report (P), Teacher rating (T) and Self-rating (S), hence K = 3. They are reported in Table 1. The data in Table 1 represent an example of a multitrait-multimethod matrix. The traits are the personality variables, and the methods refer to the judge ratings.
Several scale-invariant structural Parafac models are estimated from the correlation data in Table 1. They are obtained by varying the number of factors (either S = 2 or S = 3), the covariance structure of the common factors (either orthogonal or oblique) and the covariance matrix of the specific factors ((i) diagonal, the specific factors are uncorrelated; (ii) block diagonal, the specific factors are correlated only within the methods; (iii) banded, the specific factors are correlated only within the traits). The scale invariance is obtained by adding the matrix D as in (23) and the constraint Ψ ∗ I = I for identification purposes. Estimation is carried out by ML, assuming that the vectors x_i, i = 1, . . . , I, are independent and identically distributed as a multivariate normal. The analysis is performed by using the procedure CALIS for structural equation models available in the SAS software. After some experimentation, we found that the best strategy to limit the risk of non-convergence, or of convergence to a local optimum, is to adopt the Newton-Raphson algorithm and operate as follows. For each model with Ψ diagonal, the best-fitting solution is determined by considering 100 runs of the Newton-Raphson algorithm. For every run, the starting point is the OLS solution, i.e., the minimizer of (45) with W = I, randomly initialized. This choice limits the risk of attaining local optima. When S = 2, the global optimum is found 82 (orthogonal common factors) and 83 (oblique common factors) times out of 100, whilst when S = 3, such numbers decrease to 60 and 46, respectively. Such calculations are made by considering the first four decimal digits. The solution corresponding to the minimum loss function value is considered the best-fitting solution. Such best-fitting solutions are then used as starting points for the corresponding models with Ψ block diagonal or banded. We observed that alternative starting points led to local optima.
In the scientific community, there is no agreement on what is the best way to select a model. The suggestion is to use several different methods and to choose the model on which there is maximum agreement. Following this, we consider the χ2 test of overall fit (Browne, 1982), AIC (Akaike, 1974) and BIC (Schwarz, 1978), which are widely used for model selection in this domain. See, for instance, Akaike (1987), Bartholomew, Knott, & Moustaki (2011) and, in the three-way three-mode context, Kroonenberg & Oort (2003). The χ2 test refers to the null hypothesis H0 : Σ = Σ(θ) against the general alternative H1 : Σ is any positive definite matrix different from Σ(θ). The summary of the twelve models considered in terms of p-value of the χ2 test, AIC, BIC and number of parameters is reported in Table 2. By inspecting Table 2, we conclude that there is an agreement toward the use of S = 3 factors. In particular, at the level α = 0.05, the only four models for which the null hypothesis of the χ2 test is not rejected are the ones where Ψ is non-diagonal and the common factors are either orthogonal or oblique. The first of these models is also the best one according to the AIC criterion. We thus investigate the solution of such a model.
First of all, we report the estimates of the diagonal scaling matrix D in Table 3. All these estimates appear to be significant. The estimated factor loadings for the traits and the methods are given in Tables 4 and 5. Note that, without loss of fit, in the estimation process the elements of the first rows of the factor loading matrices for the traits and the methods are constrained to 1. Factor 1 is related to the duality between Impulsivity (with positive sign) and Academic achievement motivation (with negative sign), mainly measured by Peer report. Factor 2 depends on Test anxiety and Academic achievement motivation (both with negative sign) with respect to the traits, and on Teacher rating and Peer report (both with positive sign) with respect to the methods. Finally, Factor 3 is essentially related to the measurements by Teacher rating for Extraversion. (All such factor loadings are positive.) The square root of the estimated covariance matrix for the common factors, Φ^{1/2}, and of the banded covariance matrix for the unique factors, Ψ^{1/2}, are given in Tables 6 and 7, respectively. From Table 7, we can see that several off-diagonal elements are rather different from zero. The banded structure of Ψ implies differences in the estimated common factors in comparison with the model where Ψ is assumed to be diagonal. To clarify this point, we briefly compare the previously described solution with the corresponding one where Ψ is diagonal. Although the null hypothesis of the χ2 test is rejected, such a model is characterized by the lowest BIC. The interpretation is slightly modified due to the loading of Impulsivity, which remarkably increases (3.66). The most relevant differences for Factor 2 involve Academic achievement motivation (loading from −2.92 to −1.35) and Teacher rating (from 1.70 to 1.26). With respect to Factor 3, we observe differences in the methods. In detail, the loadings of Teacher rating and Self-rating reduce to 2.13 and 1.03, respectively.
Obviously, these differences are related to the fact that assuming Ψ to be diagonal leads to a model in which the common factors fit some amount of error, i.e., the correlations between the specific factors within the traits. Finally, by looking at the standard errors reported in Tables 3, 4, 5, 6 and 7, we note that in some cases, especially for the variable factor loadings, some estimates are not significantly different from zero. This is not surprising when 42 parameters are estimated by using only 68 observations.

Discussion
The structural Parafac model has been discussed in this paper. The focus has been on its major property: the uniqueness of the factor loading matrices. We have shown that, under mild conditions, such a property, previously proved only for the component version of the model, holds. In particular, we have shown that it holds even if the specific factors are assumed to be within-variable or within-occasion correlated, and when the model is modified to become scale invariant. In this respect, a consideration is in order. Our uniqueness proofs are based on Kruskal's (1977) sufficient condition, formulated for the component version of the model. However, this could be substituted with any other sufficient condition that exploits the particular features of the structural model. As an example, in this context, Kruskal's condition is always applied by having one of the three matrices, namely Φ^{1/2}, square and of full rank. Conditions taking into account such a feature (e.g., Jiang & Sidiropoulos, 2004) would probably lead to more powerful results. Furthermore, we have focused our attention on the structural Parafac model with unconstrained matrices B and C. However, especially in the component-based version, such matrices and A can be constrained (e.g., column-wise orthonormality, linear dependency, nonnegativity, unimodality, symmetry). In general, imposing constraints implies replacing Kruskal's condition with more relaxed uniqueness conditions.
For instance, in the case of linear dependency, see Stegeman & Lam (2012). Our results might be improved by using such relaxed conditions.