1. Introduction

The Canonical Decomposition (CANDECOMP) (Carroll & Chang, 1970) and the Parallel Factor Analysis (PARAFAC) model (Harshman, 1970) are identical methods for component analysis of three-way arrays. The CANDECOMP/PARAFAC (CP) model assumes that a three-way array containing, for example, scores of cases on variables measured at several occasions is the sum of a systematic part and a residual part, where the former is the sum of R factors. The CP model has been applied in various disciplines such as linguistics (Harshman, Ladefoged, & Goldstein, 1977), psychology (Meyer, 1980; Krijnen & Ten Berge, 1992), marketing (Harshman & DeSarbo, 1984), chemometrics (Smilde, 1992; Leurgans & Ross, 1992), and neuroimaging (Andersen & Rayens, 2004; Beckmann & Smith, 2005).

Let ○ denote the outer vector product, i.e., for vectors x and y we define x ○ y = xy′. For three vectors x, y, and z, the product x ○ y ○ z is a three-way array with elements x_i y_j z_k. The CP model can be written as

$$\underline{X} = \sum\limits_{r = 1}^R {a_r} \circ {b_r} \circ {c_r} + \underline{E}$$
(1)

where _X is the I × J × K three-way data array; a_r, b_r, and c_r are the vectors of the rth factor in each of the three modes; and _E is the residual array. The vectors a_r, b_r, and c_r are found by minimizing the sum of squares of _E; we refer to this sum of squares as the CP criterion function. A CP solution is usually denoted by a triplet (A, B, C), where the parameter matrices contain the vectors a_r, b_r, and c_r as their rth columns.

In this paper, we consider the real-valued CP model. The three-way rank of _X is usually defined as the minimal number of rank-1 arrays whose sum equals _X, where a rank-1 array is the outer product of three vectors. Hence, it follows from (1) that the CP model assumes _X is the sum of R rank-1 arrays and a residual array. The smallest R for which _X satisfies the CP model with residuals _E equal to zero is by definition equal to the three-way rank of _X. Moreover, CP tries to find the best three-way rank-R approximation of _X.
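To make the model (1) and the CP criterion concrete, the following is a minimal numpy sketch (our own illustration, not code from the cited literature) that builds the sum of R rank-1 arrays from a triplet (A, B, C) and evaluates the residual sum of squares.

```python
import numpy as np

def cp_reconstruct(A, B, C):
    """Sum of R rank-1 arrays a_r o b_r o c_r, as in (1)."""
    R = A.shape[1]
    X_hat = np.zeros((A.shape[0], B.shape[0], C.shape[0]))
    for r in range(R):
        X_hat += np.einsum('i,j,k->ijk', A[:, r], B[:, r], C[:, r])
    return X_hat

def cp_criterion(X, A, B, C):
    """The CP criterion: the sum of squared residuals ||E||^2."""
    return np.sum((X - cp_reconstruct(A, B, C)) ** 2)
```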

For estimating the CP parameters (A, B, C), several algorithms of the alternating least squares type are available (Harshman, 1970; Carroll & Chang, 1970; Ten Berge, Kiers, & Krijnen, 1993; Krijnen & Ten Berge, 1992). Other CP algorithms can be found in Hopke, Paatero, Jia, Ross, and Harshman (1998) and Tomasi and Bro (2006). See also the Multilinear Engine of Paatero (1999).
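To illustrate the alternating least squares idea, the sketch below (our own plain version, not a reimplementation of any of the algorithms cited above) updates each parameter matrix in turn by solving a linear least squares problem while the other two matrices are held fixed; each such update cannot increase the criterion.

```python
import numpy as np

def cp_als(X, R, n_iter=500, seed=0):
    """Plain alternating least squares for the CP criterion (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.standard_normal((I, R))
    B = rng.standard_normal((J, R))
    C = rng.standard_normal((K, R))
    for _ in range(n_iter):
        # each update is a linear least squares solution with the other two matrices fixed;
        # e.g. (C'C)*(B'B) is the cross-product matrix of the columns c_r kron b_r
        A = np.einsum('ijk,jr,kr->ir', X, B, C) @ np.linalg.pinv((C.T @ C) * (B.T @ B))
        B = np.einsum('ijk,ir,kr->jr', X, A, C) @ np.linalg.pinv((C.T @ C) * (A.T @ A))
        C = np.einsum('ijk,ir,jr->kr', X, A, B) @ np.linalg.pinv((B.T @ B) * (A.T @ A))
        # rescale so that A and B have unit-length columns; C absorbs the lengths
        for M in (A, B):
            s = np.linalg.norm(M, axis=0)
            M /= s
            C *= s
    return A, B, C
```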

One of the most attractive features of the CP model is the rotational uniqueness of its solutions. Kruskal (1977, 1989) showed that, for a fixed residual array _E, a CP solution (A,B,C) is unique up to rescaling/counterscaling and jointly permuting columns of the three parameter matrices if

$${k_A} + {k_B} + {k_C} \ge 2R + 2$$
(2)

where k_A, k_B, and k_C denote the k-ranks of the component matrices. The k-rank of a matrix is the largest number x such that every subset of x columns of the matrix is linearly independent. For an accessible proof of (2), see Stegeman and Sidiropoulos (2007).
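For small component matrices, the k-rank appearing in condition (2) can be computed directly from its definition. The brute-force sketch below is our own helper (not taken from the cited papers) and simply checks every subset of x columns.

```python
import numpy as np
from itertools import combinations

def k_rank(M, tol=1e-10):
    """Largest x such that every subset of x columns of M is linearly independent."""
    n_cols = M.shape[1]
    k = 0
    for x in range(1, n_cols + 1):
        all_independent = all(
            np.linalg.matrix_rank(M[:, list(cols)], tol=tol) == x
            for cols in combinations(range(n_cols), x)
        )
        if not all_independent:
            break
        k = x
    return k

def kruskal_condition_holds(A, B, C):
    """Sufficient uniqueness condition (2): k_A + k_B + k_C >= 2R + 2."""
    R = A.shape[1]
    return k_rank(A) + k_rank(B) + k_rank(C) >= 2 * R + 2
```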

To avoid the scaling indeterminacy in a CP solution, the columns of two component matrices can be set to unit length. Throughout the paper, we impose this restriction on A and B.

It is well known that the practical use of the CP model is complicated by the occurrence of “degeneracies” while running a CP algorithm. In such cases, the CP criterion function decreases very slowly, some factor magnitudes seem to increase without bound, and the parameter matrices become nearly rank deficient (Harshman & Lundy, 1984, p. 271; Kruskal, Harshman, & Lundy, 1983, 1985, 1989; Mitchell & Burdick, 1994). Such degeneracies are a problem for the analysis of three-way arrays, since the obtained CP solution is hardly interpretable. Degeneracies can be avoided by imposing orthogonality and non-negativity restrictions on the parameter matrices; see Theorem 2 and Lim (2005).

Synthetic data for which degeneracies occur in the CP model were considered by Kruskal et al. (1983) and Paatero (2000). Stegeman (2006, 2008a, 2008b) analysed the structure of degeneracies for all I × J × 2 arrays and several I × J × 3 arrays. It is claimed but not generally proven that in case of degeneracy the CP criterion function does not have a global minimum, that is, does not attain its infimum (Kruskal et al., 1983, 1985). For a synthetic 2 × 2 × 2 array it is shown that this is indeed true (Ten Berge, Kiers, & De Leeuw, 1988; Stegeman, 2006). De Silva and Lim (2006) showed that for R = 1 there always exists an optimal CP solution, while for 2 ≤ R ≤ min(I, J, K) there always exists an array _X of three-way rank R + 1 which has no optimal CP solution. Also, the same authors showed that all 2 × 2 × 2 arrays of three-way rank 3 have no optimal CP solution for R = 2.

Apart from the (unrestricted) CP model, degeneracies also occur in other component models (DeSarbo & Carroll, 1985; Krijnen & Ten Berge, 1992; Stegeman, 2008b). Zijlstra and Kiers (2002) showed that degeneracies do not occur in component models which yield rotationally indeterminate components.

Here we show that there is a close relation between the occurrence of CP degeneracies and the non-existence of an optimal CP solution. In Section 3, we investigate the situation where the CP criterion function does not attain its infimum. We show that any sequence (A, B, C)_n which monotonically decreases the CP criterion function to its infimum will exhibit the features of a degeneracy. This implies that any CP algorithm minimizing the CP criterion function will yield a degeneracy if the CP model does not have an optimal solution for a particular array _X. In Section 4, we consider orthogonality and non-negativity restrictions under which the CP criterion function attains its infimum. Hence, under these restrictions degeneracies do not occur in the CP model and (we hope) an interpretable CP solution is obtained. Section 5 contains a discussion of our results. In the next section, we introduce some notation.

2. Notation

In matrix notation, the CP model is

$${X_k} = A{D_k}B' + {E_k},\quad {\rm{for\ }}k = 1, \ldots ,K$$
(3)

where X_k is the kth I × J frontal slice of _X, E_k the kth I × J residual matrix, and D_k the diagonal matrix with row k of the matrix C as its diagonal. For our purposes, it is convenient to rewrite the model. Let ⊗ be the Kronecker product and “vec” the operator that stacks the columns of a matrix one underneath the other. Let x = vec(vec X_1, …, vec X_K) contain the data, e = vec(vec E_1, …, vec E_K) the residuals, and θ = vec(vec A, vec B, vec C) the q = R(I + J + K) parameters, where θ lies in ℝ^q. We denote the CP criterion function by f(θ).

Let the Euclidean norm of a vector and the Frobenius norm of a matrix both be denoted by ‖ · ‖. As mentioned earlier, we restrict A and B to have columns of length 1. Let ˜C have the unit-length columns \({{\tilde c}_r} = {c_r}{\left\| {{c_r}} \right\|^{ - 1}}\), r = 1, …, R. The factors \({f_r} = ({{\tilde c}_r} \otimes {b_r} \otimes {a_r})\), r = 1, …, R, which have unit length, are collected as columns in the matrix F. The magnitudes, i.e., the Euclidean lengths d_r = ‖c_r‖, r = 1, …, R, of the factors are collected in d = (d_1, …, d_R)′. It follows that

$${\left\| \theta \right\|^2} = {\left\| d \right\|^2} + 2R$$
(4)

By (3) and vec(a_r b_r′) = b_r ⊗ a_r, we have
$$x = Fd + e$$
(5)
For the CP criterion function f, we have

$$f(\theta ) = {\left\| {x - Fd} \right\|^2}$$
(6)

An optimal CP solution is defined as a vector \(\hat \theta \in \mathbb{R}^q\) which globally minimizes f(θ). Various alternating least squares algorithms have been constructed to minimize f. These yield a sequence {θ_1, θ_2, …} = {θ_n} of parameter vectors which monotonically decreases the CP criterion function, i.e., f(θ_n) ≥ f(θ_{n+1}). The monotonicity of the sequence {f(θ_n)} is assumed throughout this paper. It is guaranteed to hold for alternating least squares algorithms, and practical experience shows that many other CP algorithms also yield monotonically decreasing sequences {f(θ_n)}. For an element θ_n of a sequence of CP updates, we denote the corresponding parameter matrices by A_n, B_n, ˜C_n, and the factors and their lengths by F_n and d_n, respectively.
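The vectorized notation can be checked numerically. The sketch below is our own illustration (the Kronecker ordering follows the definition of f_r above, and A and B are given unit-length columns as imposed earlier); it builds F and d from a triplet (A, B, C) and verifies x = Fd for a noise-free array, as well as the norm identity (4).

```python
import numpy as np

def build_F_d(A, B, C):
    """Unit-length factors f_r = c~_r kron b_r kron a_r (columns of F) and lengths d_r = ||c_r||."""
    d = np.linalg.norm(C, axis=0)
    C_tilde = C / d
    F = np.column_stack([np.kron(C_tilde[:, r], np.kron(B[:, r], A[:, r]))
                         for r in range(A.shape[1])])
    return F, d

rng = np.random.default_rng(0)
I, J, K, R = 4, 3, 5, 2
A = rng.standard_normal((I, R)); A /= np.linalg.norm(A, axis=0)
B = rng.standard_normal((J, R)); B /= np.linalg.norm(B, axis=0)
C = rng.standard_normal((K, R))
X = np.einsum('ir,jr,kr->ijk', A, B, C)              # noise-free CP array
x = X.transpose(2, 1, 0).reshape(-1)                 # stack vec(X_1), ..., vec(X_K)
F, d = build_F_d(A, B, C)
theta = np.concatenate([A.T.reshape(-1), B.T.reshape(-1), C.T.reshape(-1)])
print(np.allclose(x, F @ d))                                   # x = F d (here e = 0)
print(np.isclose(np.sum(theta ** 2), np.sum(d ** 2) + 2 * R))  # identity (4)
```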

3. When an Optimal CP Solution Does Not Exist

Here, we present a result for the case where the CP model does not have an optimal solution, that is, the CP criterion function f does not attain its infimum. We have

Theorem 1. If the CP criterion function f does not attain its infimum and f(θ_n) ↓ inf f, as n → ∞, then ‖θ_n‖ → ∞.

Proof: Suppose that f does not attain its infimum, that {θ_n} is a sequence such that f(θ_n) ↓ inf f, and that {θ_n} has a bounded subsequence \(\{\theta_{n_k}\}\). It follows that the latter has a further subsequence \(\{\theta_{n_{k_i}}\}\) such that \({\lim _{i \to \infty }}{\theta _{{n_{{k_i}}}}} = \hat \theta \) for a certain limit point \(\hat \theta \in \mathbb{R}^q\) (Rudin, 1976, p. 51). Hence, by continuity of f,

$$\mathop {\lim }\limits_{i \to \infty } f({\theta _{{n_{{k_i}}}}}) = f\left( {\mathop {\lim }\limits_{i \to \infty } {\theta _{{n_{{k_i}}}}}} \right) = f(\hat \theta ) = \inf f$$

That is, f attains its infimum, a contradiction. It follows that {θ_n} does not have a bounded subsequence. Therefore, the infimum over all subsequential limits of ‖θ_n‖ is infinite, so that ‖θ_n‖ → ∞, as n → ∞. □

Suppose an optimal CP solution does not exist. If a CP algorithm is used which is designed to minimize the CP criterion function (and does not terminate with a suboptimal solution), then the Euclidean norm of the parameter vector θ_n diverges to infinity as the number of iterations of this CP algorithm increases without bound. It follows from (4) that ‖d_n‖ → ∞, as n → ∞. Hence, there are factor magnitudes which diverge to infinity as the number of iterative steps increases without bound. Equivalently, this means that for any fixed large number M, there exists a finite number of iterative steps N such that ‖d_n‖ > M for all n ≥ N. This is also observed when degeneracies occur while running a CP algorithm, and was proven analytically for degeneracies occurring for I × J × 2 arrays and some I × J × 3 arrays by Stegeman (2006, 2008a, 2008b).

Next, we relate the non-existence of an optimal CP solution to near linear dependency of the factors and near rank deficiency of the individual parameter matrices. Let ev_min(F_n′F_n) denote the smallest and ev_max(F_n′F_n) the largest eigenvalue of F_n′F_n, and let κ(F_n) = ev_max^{1/2}(F_n′F_n)/ev_min^{1/2}(F_n′F_n) denote the condition number of F_n (Ortega & Rheinboldt, 1970, p. 42). We have the following corollary to Theorem 1.

Corollary 1. If the CP criterion function f does not attain its infimum and f(θ_n) ↓ inf f, as n → ∞, then ev_min(F_n′F_n) → 0, as n → ∞.

Proof: From the triangle inequality and f(θ_n) ↓ inf f, it follows that

$$ \left\| {F_n d_n } \right\| \leqslant \left\| {x - F_n d_n } \right\| + \left\| x \right\| = f(\theta _n )^{1/2} + \left\| x \right\| \downarrow (\inf f)^{1/2} + \left\| x \right\|. $$
(7)

Hence, the sequence {‖F_n d_n‖^2} is bounded. That is, there exists a positive number M such that ‖F_n d_n‖^2 ≤ M for all n. From

$$ 0 \leqslant ev_{\min} (F_n^\prime F_n )\left\| {d_n } \right\|^2 \leqslant \left\| {F_n d_n } \right\|^2 \leqslant M $$
(8)

and (4), it follows that

$$ 0 \leqslant ev_{\min} (F_n^\prime F_n ) \leqslant \frac{M}{{\left\| {d_n } \right\|^2 }} = \frac{M}{{\left\| {\theta _n } \right\|^2 - 2R}} $$
(9)

By Theorem 1, ‖θ_n‖^2 → ∞, so that ev_min(F_n′F_n) → 0, as n → ∞. □

Since F_n′F_n is positive semidefinite with unit diagonal elements, 1 ≤ ev_max(F_n′F_n) ≤ R. Hence, by Corollary 1, κ(F_n) → ∞. Furthermore, Corollary 1 implies the following corollary.

Corollary 2. If the CP criterion function f does not attain its infimum and f(θ_n) ↓ inf f, as n → ∞, then the smallest singular value of each column-wise normalized parameter matrix tends to zero, as n → ∞.

Proof: Suppose that f does not attain its infimum. From the definition of F_n′F_n and elementary properties of the Kronecker product, it follows that \( F_n^\prime F_n = A_n^\prime A_n *B_n^\prime B_n *\tilde C_n^\prime \tilde C_n \), where * denotes the element-wise (Hadamard) matrix product. Since A_n′A_n, B_n′B_n, and ˜C_n′˜C_n are positive semidefinite with unit diagonal elements, it follows for all n that

$$ \max \left\{ {ev_{\min} (A_n^\prime A_n ),\, ev_{\min} (B_n^\prime B_n ),\, ev_{\min} (\tilde C_n^\prime \tilde C_n )} \right\} \leqslant ev_{\min} (A_n^\prime A_n *B_n^\prime B_n *\tilde C_n^\prime \tilde C_n ) $$
(10)

(Schur, 1911; Styan, 1973). An application of Corollary 1 completes the proof. □

Since the smallest singular value of each normalized parameter matrix tends to zero, it follows that the smallest singular value is arbitrarily small if the number of iterations of the CP algorithm is sufficiently large. In this sense the normalized parameter matrices are nearly rank deficient for n sufficiently large. Since the largest singular value of each column-wise normalized parameter matrix lies between 1 and R^{1/2}, Corollary 2 implies that κ(A_n) → ∞, κ(B_n) → ∞, and \(\kappa ({{\tilde C}_n}) \to \infty \). That is, the condition number of each normalized parameter matrix tends to infinity as the number of iterative steps increases without bound. These phenomena were proven analytically for degeneracies occurring for I × J × 2 arrays and some I × J × 3 arrays by Stegeman (2006, 2008a, 2008b).
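The quantities appearing in Theorem 1 and Corollaries 1 and 2 are easy to monitor during a run. The sketch below is our own diagnostic helper (assuming A and B already have unit-length columns, as imposed in Section 1); it uses the Hadamard-product identity from the proof of Corollary 2 to compute ev_min(F_n′F_n) and κ(F_n) without forming F_n explicitly.

```python
import numpy as np

def degeneracy_diagnostics(A, B, C):
    """Quantities that diverge or vanish along a degenerate sequence:
    ||d_n||, ev_min(F_n'F_n), kappa(F_n), and the condition numbers of the
    column-wise normalized parameter matrices (Theorem 1, Corollaries 1-2)."""
    d = np.linalg.norm(C, axis=0)
    C_tilde = C / d
    # F'F = (A'A) * (B'B) * (C~'C~), with * the element-wise (Hadamard) product
    FtF = (A.T @ A) * (B.T @ B) * (C_tilde.T @ C_tilde)
    ev = np.linalg.eigvalsh(FtF)          # eigenvalues in ascending order
    return {
        "d_norm": np.linalg.norm(d),
        "ev_min_FtF": ev[0],
        "cond_F": np.sqrt(ev[-1] / ev[0]),
        "cond_A": np.linalg.cond(A),
        "cond_B": np.linalg.cond(B),
        "cond_C_tilde": np.linalg.cond(C_tilde),
    }
```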

If R = 2, a geometric interpretation of Corollary 1 can be given as follows. Since we may assume d_{1n} > 0 and d_{2n} > 0 without loss of generality, and F_n has unit-length columns, it follows that

$${\left\| {{F_n}{d_n}} \right\|^2} = {\left\| {{d_n}} \right\|^2} + 2{d_{1n}}{d_{2n}}\cos ({f_{1n}},{f_{2n}})$$
(11)

Therefore, {‖F_n d_n‖^2} bounded, {‖d_n‖^2} unbounded, d_{1n} > 0, d_{2n} > 0, and ev_min(F_n′F_n) = 1 − |cos(f_{1n}, f_{2n})| → 0, as n → ∞, together imply that cos(f_{1n}, f_{2n}) → −1, as n → ∞. (If cos(f_{1n}, f_{2n}) → 1, as n → ∞, it would follow that ‖F_n d_n‖^2 → ∞, which is a contradiction.) Hence, the angle between the two factors tends to 180°. More specifically, the two factors may be represented as two vectors on the boundary of a unit “ball” in ℝ^q whose end points tend to positions on a straight line through the center. This is in line with what is usually observed when a degeneracy occurs while running a CP algorithm: the factors involved nearly cancel each other out but still contribute to a better fit of the CP model.
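This behavior can be reproduced on a small example. The sketch below is our own construction in the spirit of the rank-3 2 × 2 × 2 arrays mentioned in Section 1 (which have no optimal CP solution for R = 2): along a two-factor sequence indexed by n, the fit keeps improving while the factor magnitudes diverge and the cosine between the two unit-length factors tends to −1.

```python
import numpy as np

def outer3(u, v, w):
    return np.einsum('i,j,k->ijk', u, v, w)

e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
# a 2 x 2 x 2 array of three-way rank 3 (no optimal CP solution for R = 2)
Y = outer3(e2, e1, e1) + outer3(e1, e2, e1) + outer3(e1, e1, e2)

for n in [10, 100, 1000, 10000]:
    u = e1 + e2 / n                                   # same vector in all three modes here
    Z = n * outer3(u, u, u) - n * outer3(e1, e1, e1)  # rank-2 approximation indexed by n
    u_t = u / np.linalg.norm(u)
    f1 = np.kron(u_t, np.kron(u_t, u_t))              # unit-length factor 1
    f2 = np.kron(-e1, np.kron(e1, e1))                # unit-length factor 2 (sign in c-mode)
    d1, d2 = n * np.linalg.norm(u) ** 3, float(n)     # factor magnitudes
    print(n, np.sum((Y - Z) ** 2), d1, d2, float(f1 @ f2))
# the fit error tends to 0, d1 and d2 diverge, and cos(f1, f2) tends to -1
```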

In case the CP criterion function f does not attain its infimum, Theorem 1 and its corollaries show the “degenerate” behavior of any sequence {θ_n} such that f(θ_n) ↓ inf f. These results seem to provide a mathematical basis for the detection of cases where the criterion function does not attain its infimum. Indeed, if, for a large number of runs of a CP algorithm, the magnitudes of some factors and the condition numbers of the parameter matrices increase to arbitrarily large values, then the conclusion that the CP criterion function does not attain its infimum seems inevitable. However, such reasoning need not be valid for a small number of such sequences. For example, cases are known where these phenomena occur in locally optimal neighborhoods while the CP model does have an optimal solution. See, for example, Paatero (2000), who showed this for a class of 2 × 2 × 2 arrays, and Stegeman (2008b), who showed this for 5 × 3 × 3 arrays of rank 5. In these cases, running the CP algorithm several times with different starting values results in both degeneracies and optimal CP solutions.

4. Restrictions under which an Optimal CP Solution Exists

As mentioned earlier, De Silva and Lim (2006) showed that for 2 ≤ R ≤ min(I,J,K) there always exists an array _X of three-way rank R + 1 which has no optimal CP solution. In this section we consider orthogonality and non-negativity restrictions under which the CP model always has an optimal solution. Such constraints have been included in alternating least squares CP algorithms (Kiers, 1989a, 1989b, 1991; Krijnen & Kiers, 1993; Bro & De Jong, 1997) and can be included in the Multilinear Engine (Paatero, 1999).

To show the existence of an optimal CP solution, we make use of level sets. Let L(γ) = {θ ∈ ℝ^q : f(θ) ≤ γ} be a level set of the CP criterion function f (Ortega & Rheinboldt, 1970, p. 98). Theorem 1 gives a condition under which f does not have a bounded level set. We need the following lemmas, proven by Ortega and Rheinboldt (1970, p. 104).

Lemma 1. Let g : D ⊂ ℝ^q → ℝ^1, where D is unbounded. Then all level sets of g are bounded if and only if lim_{n→∞} g(θ_n) = ∞ whenever {θ_n} ⊂ D and lim_{n→∞} ‖θ_n‖ = ∞.

Lemma 2. Let g : D ⊂ ℝ^q → ℝ^1 be continuous on the closed set D. Then g has a bounded level set if and only if the set of global minimizers of g is nonempty and bounded.

Note that the CP criterion function f is continuous with an unbounded domain. The continuity of f implies that the level sets L(γ) are closed. We have the following result.

Theorem 2. If one of the parameter matrices (A,B,C) is constrained to be column-wise orthonormal, then all level sets of f are bounded and the CP model has an optimal solution.

Proof: Suppose that A_n′A_n = I for all n. Then it follows that \( F_n^\prime F_n = A_n^\prime A_n *B_n^\prime B_n *\tilde C_n^\prime \tilde C_n = I \), so that ‖F_n d_n‖ = ‖d_n‖. Hence, by the triangle inequality (Luenberger, 1969, p. 22),

$$ f^{\frac{1} {2}} \left( {\theta _n } \right) = \left\| {x - F_n d_n } \right\| \geqslant \left| {\left\| x \right\| - \left\| {F_n d_n } \right\|} \right| = \left| {\left\| x \right\| - \left\| {d_n } \right\|} \right| \to \infty $$
(12)

whenever ‖d_n‖ → ∞, which is equivalent to ‖θ_n‖ → ∞ by (4). By Lemma 1, all level sets of f are bounded. Let L(γ) be a nonempty level set of f. Then L(γ) is bounded and closed and, hence, compact. Restricting f to L(γ), it follows from Lemma 2 that f attains its infimum on L(γ). □
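A quick numerical check of the key step in the proof (our own snippet): if A is column-wise orthonormal and B and the normalized ˜C have unit-length columns, the Hadamard product of the three Gram matrices is the identity, so that ‖Fd‖ = ‖d‖.

```python
import numpy as np

rng = np.random.default_rng(1)
I, J, K, R = 6, 5, 4, 3
A = np.linalg.qr(rng.standard_normal((I, R)))[0]       # column-wise orthonormal A
B = rng.standard_normal((J, R)); B /= np.linalg.norm(B, axis=0)
C = rng.standard_normal((K, R))
C_tilde = C / np.linalg.norm(C, axis=0)
FtF = (A.T @ A) * (B.T @ B) * (C_tilde.T @ C_tilde)    # Hadamard products, equals F'F
print(np.allclose(FtF, np.eye(R)))                     # True: F'F = I, as in the proof
```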

Also using level sets, Lim (2005) showed that the CP model has an optimal solution if the parameter matrices are constrained to have non-negative elements. We state this result here without proof.

Theorem 3. If each of the parameter matrices (A, B, C) is constrained to have non-negative elements, then all level sets of f are bounded and the CP model has an optimal solution.
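For the non-negativity constraints of Theorem 3, one way to keep the alternating least squares idea is to replace each unconstrained update by a non-negative least squares problem solved row by row. The sketch below is our own illustration using SciPy's NNLS solver (not the faster schemes of Bro & De Jong, 1997, cited above) and shows the update of A with B and C held fixed.

```python
import numpy as np
from scipy.optimize import nnls

def nonneg_update_A(X, B, C):
    """One non-negatively constrained least squares update of A, with B and C fixed."""
    I, J, K = X.shape
    R = B.shape[1]
    # column r of Z is c_r kron b_r; row i of the mode-1 unfolding is regressed on Z
    Z = np.column_stack([np.kron(C[:, r], B[:, r]) for r in range(R)])
    A_new = np.zeros((I, R))
    for i in range(I):
        xi = X[i].T.reshape(-1)          # entries x_{ijk} with j running fastest
        A_new[i], _ = nnls(Z, xi)
    return A_new
```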

Using the theory of level sets, Krijnen (2006) analyzed the existence of optimal solutions for various factor models related to the CP model.

5. Conclusions and Discussion

We have analyzed the situation where the CP model does not have an optimal solution, i.e., the CP criterion function f does not attain its infimum. We showed that for any sequence of CP updates {θ_n} such that f(θ_n) ↓ inf f, it holds that ‖θ_n‖ → ∞, ‖d_n‖ → ∞, κ(F_n) → ∞, and, by Corollary 2, κ(A_n) → ∞, κ(B_n) → ∞, and \(\kappa ({{\tilde C}_n}) \to \infty \). Hence, the sequence of parameter vectors diverges to a CP “degeneracy”, i.e., the factors become nearly linearly dependent and the individual parameter matrices become nearly rank deficient. Our result provides a general proof of the claim by Kruskal et al. (1983, 1985) that degeneracies occur when no optimal CP solution exists. Hence, any CP algorithm minimizing the CP criterion function will yield a degeneracy if the CP model does not have an optimal solution for the particular array _X at hand. Moreover, our result can be used to detect a degeneracy while running a CP algorithm, e.g., by monitoring the smallest singular values of the parameter matrices together with the factor lengths.

For I × J × 2 arrays and some I × J × 3 arrays, the occurrence of degeneracies while running a CP algorithm was described mathematically by Stegeman (2006, 2008a, 2008b). In those cases, the number of factors involved in the degeneracy and the type of rank deficiencies in the parameter matrices follow from the characteristics of the limit point of the sequence (A, B, C)_n.

Apart from the work of Stegeman (2006, 2008a, 2008b) and De Silva and Lim (2006), no general criteria are known to indicate whether degeneracies will occur while running a CP algorithm for a particular array _X. However, our result may help research in this direction, since it shows the importance of determining whether the CP model has an optimal solution or not.

For the general situation considered in this paper, we may distinguish the following four cases with respect to Corollary 2 and the k-ranks k_A, k_B, and k_C in Kruskal's condition (2). Case 1, k_A = k_B = k_C = R, and Case 2, k_A = k_B = R, k_C < R, are frequently encountered in empirical applications of CANDECOMP/PARAFAC. In Case 1, all parameter matrices are nearly rank deficient for large n by Corollary 2. In Case 2, A_n and B_n are nearly rank deficient for large n by Corollary 2. Case 3, k_A = R, k_B < R, k_C < R, is less common. In this case Corollary 2 is nontrivial in the sense that it implies that A_n is nearly rank deficient for large n. To the best of our knowledge, Case 4, k_A < R, k_B < R, k_C < R, has not been encountered in an empirical setting, but it was considered numerically (Harshman, 1972) and algebraically (Kruskal, 1976). In this case it is obvious that the parameter matrices are rank deficient and that the conclusion of Corollary 2 is trivial.

Note that, by multiplication with orthonormal commutation matrices, the order of the Kronecker product in (5) can be altered (Magnus & Neudecker, 1979), so that the roles played by A_n, B_n, and C_n are interchanged. Hence, the four cases above cover all possibilities.

In order to guarantee that CP has an optimal solution, one can impose orthogonality or non-negativity constraints on the parameter matrices (see Theorem 2 and Lim, 2005). Also, leaving out one data slice in one of the modes or changing the preprocessing scheme may overcome the problem of degeneracies.