1 Introduction

Carroll and Chang (1970) developed Candecomp as a method of fitting scalar products derived from INDSCAL. Specifically, they sought to minimize the function

$$ g\left( {X,D} \right) = \sum\limits_{i = 1}^m {\left\| {S_i - XD_i X'} \right\|^2 } \tag{1} $$

, where S_i is a symmetric p × p matrix of (true or pseudo) scalar products, X is a p × r matrix of components, and D_i is a diagonal r × r matrix of saliencies, with the elements of row i of D in the diagonal, i = 1, …, m. Because direct minimization of g seems difficult, Carroll and Chang instead proposed Candecomp, to minimize

$$ f\left( {X,Y,D} \right) = \sum\limits_{i = 1}^m {\left\| {S_i - XD_i Y'} \right\|^2 } \tag{2} $$

. They assumed that, at the minimum, the symmetry of the slices S_i will cause X and Y to be equal or at least column-wise proportional. In the latter case, columns of Y can be rescaled to become equal to those of X, the inverse scaling being applied to the columns of D. The Candecomp algorithm minimizes f by alternately optimizing X conditionally for fixed Y and D, optimizing Y conditionally for fixed X and D, and optimizing D conditionally for fixed X and Y (a numerical sketch of these conditional updates is given at the end of this section). In practice, the claim that symmetry of the slices will render X and Y column-wise proportional at the minimum seems warranted. However, for contrived data, counterexamples do exist. In particular, the single-component case (r = 1) has played a role in contriving counterexamples.

Ten Berge and Kiers (1991) examined the single-component case, where X, Y, and D can be represented as vectors x, y, and d. They constructed Gramian matrices S_1 and S_2 and a solution {x, y, d} for which the derivatives of f vanish even though x and y are not proportional. They also showed that such cases could not possibly represent a global minimum for f, and conjectured that they could arise only at local minima. However, they seem to have ignored saddle points as another possibility. In the present note it is shown that, when S_1, …, S_m are Gramian, nonproportional components in the single-component case can be obtained only at saddle points. This is more than merely a theoretical exercise: because saddle points seem far more difficult to attain by Candecomp than local minima, the practical relevance of counterexamples where x and y are nonproportional is thus further reduced. We start by examining derivatives. Throughout, we use Σ to denote \( \sum\nolimits_{i = 1}^{m} \).
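As an illustration of the alternating algorithm described above, the following sketch (Python with NumPy) applies the three conditional updates in turn; each update is the least-squares solution of one of the stationarity equations (3)–(5) derived in Section 2. The code is our own illustration, not Carroll and Chang's implementation, and the name candecomp_als is ours.

```python
# A minimal sketch of the Candecomp alternating least-squares updates for
# f(X, Y, D) = sum_i || S_i - X D_i Y' ||^2; assumes NumPy, not the authors' code.
import numpy as np

def candecomp_als(S, r, n_iter=500, seed=0):
    """S: (m, p, p) array of slices; returns X (p x r), Y (p x r), D (m x r)."""
    S = np.asarray(S, dtype=float)
    m, p, _ = S.shape
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((p, r))
    Y = rng.standard_normal((p, r))
    D = rng.standard_normal((m, r))                   # row i holds diag(D_i)
    for _ in range(n_iter):
        # X-update, from (3): X = (sum_i S_i Y D_i)(sum_i D_i Y'Y D_i)^{-1}
        A = sum(S[i] @ Y @ np.diag(D[i]) for i in range(m))
        B = sum(np.diag(D[i]) @ Y.T @ Y @ np.diag(D[i]) for i in range(m))
        X = A @ np.linalg.pinv(B)
        # Y-update, from (4): Y = (sum_i S_i X D_i)(sum_i D_i X'X D_i)^{-1}
        A = sum(S[i] @ X @ np.diag(D[i]) for i in range(m))
        B = sum(np.diag(D[i]) @ X.T @ X @ np.diag(D[i]) for i in range(m))
        Y = A @ np.linalg.pinv(B)
        # D-update, from (5): diag(D_i) = (X'X * Y'Y)^{-1} diag(X' S_i Y)
        G = np.linalg.pinv((X.T @ X) * (Y.T @ Y))
        D = np.array([G @ np.diag(X.T @ S[i] @ Y) for i in range(m)])
    return X, Y, D
```

For symmetric slices S_i, the question examined in this note is whether the X and Y returned by such an algorithm are column-wise proportional.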

2 First- and Second-Order Derivatives of f

The conditionally optimal solutions for X, Y, and D are those that satisfy the equations

$$ \frac{\partial f}{\partial X} = 2X\sum D_i Y'YD_i - 2\sum S_i YD_i = O \tag{3} $$

,

$$ \frac{\partial f}{\partial Y} = 2Y\sum D_i X'XD_i - 2\sum S_i XD_i = O \tag{4} $$

, and

$$ \frac{\partial f}{\partial D_i} = 2\left( X'X * Y'Y \right)\mathrm{Vec}\left( \mathrm{Diag}\left( D_i \right) \right) - 2\,\mathrm{Vec}\left( \mathrm{Diag}\left( X'S_i Y \right) \right) = O \tag{5} $$

, where * denotes the element-wise (Hadamard) product of matrices, and Vec(Diag(D_i)) is the vector holding the diagonal elements of D_i. In the r = 1 case, these equations simplify to

$$ 2\sum d_i^2 \left( y'y \right)x - 2\sum d_i S_i y = 0 \tag{6} $$

,

$$ 2\sum d_i^2 \left( x'x \right)y - 2\sum d_i S_i x = 0 \tag{7} $$

, and

$$ 2d_i x'xy'y - 2x'S_i y = 0 \tag{8} $$

, where d_i is the only element of D_i. The second-order derivative matrix for the r = 1 case is

$$ \left[ \begin{array}{ccc} 2(y'y)(d'd)I_p & 4(d'd)xy' - 2\sum d_i S_i & 4(y'y)xd' - 2\left[ S_1 y \ldots S_m y \right] \\ 4(d'd)yx' - 2\sum d_i S_i & 2(x'x)(d'd)I_p & 4(x'x)yd' - 2\left[ S_1 x \ldots S_m x \right] \\ 4(y'y)dx' - 2\left[ S_1 y \ldots S_m y \right]' & 4(x'x)dy' - 2\left[ S_1 x \ldots S_m x \right]' & 2(x'x)(y'y)I_m \end{array} \right] \tag{9} $$

, where d is the vector holding d_1, …, d_m. When this matrix is positive definite (i.e., having all eigenvalues positive), we have a local or global minimum. When it is indefinite (i.e., having positive and negative eigenvalues), we have a saddle point. Consider the following contrived data and solution of Ten Berge and Kiers (1991), with x and y nonproportional. Let

$$ S_1 = \left[ \begin{array}{ccc} 3 & 1 & 0 \\ 1 & 3 & 0 \\ 0 & 0 & 0 \end{array} \right],\quad S_2 = \left[ \begin{array}{ccc} 3 & -1 & 0 \\ -1 & 3 & 0 \\ 0 & 0 & 1 \end{array} \right],\quad x = \left[ \begin{array}{c} 1 \\ 0 \\ 0 \end{array} \right],\quad y = \left[ \begin{array}{c} 0 \\ 1 \\ 0 \end{array} \right],\quad \text{and}\quad d = \left[ \begin{array}{c} 1 \\ -1 \end{array} \right] \tag{10} $$

. It can be verified that the first-order derivatives vanish. Ten Berge and Kiers (1991, p. 324) conjectured from their Result 6 that such cases with x and y nonproportional occur only at local minima of f. However, the matrix of second-order derivatives (9) is

$$\left[ {\matrix{ 4 & 0 & 0 & 0 & 4 & 0 & 2 & { - 2} \cr 0 & 4 & 0 & { - 4} & 0 & 0 & { - 6} & { - 6} \cr 0 & 0 & 4 & 0 & 0 & 2 & 0 & 0 \cr 0 & { - 4} & 0 & 4 & 0 & 0 & { - 6} & { - 6} \cr 4 & 0 & 0 & 0 & 4 & 0 & 2 & { - 2} \cr 0 & 0 & 2 & 0 & 0 & 4 & 0 & 0 \cr 2 & { - 6} & 0 & { - 6} & 2 & 0 & 2 & 0 \cr { - 2} & { - 6} & 0 & { - 6} & { - 2} & 0 & 0 & 2 \cr } } \right]$$

with eigenvalues 13.04, 10, 8, 6, 2, 0, 0, and −11.04, so it is indefinite. Therefore, we are looking at a saddle point. This is not just a property of this particular example. All stationary points of f with x and y nonproportional are saddle points. To prove this, we might try to show that (9) is indefinite whenever (6), (7), and (8) are satisfied for a nonproportional pair x and y. It is, however, much easier to use an alternative approach to minimizing f.
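The computations above are easy to reproduce. The following self-contained sketch (Python with NumPy; our own code, not part of the original analysis) builds the data of (10), checks that the first-order derivatives (6)–(8) vanish, assembles the second-order derivative matrix (9) block by block, and inspects its spectrum.

```python
# Numerical check of the Ten Berge and Kiers (1991) example: a stationary
# point of f with nonproportional x and y that is a saddle point, not a minimum.
import numpy as np

S = [np.array([[3., 1., 0.], [1., 3., 0.], [0., 0., 0.]]),
     np.array([[3., -1., 0.], [-1., 3., 0.], [0., 0., 1.]])]
x = np.array([1., 0., 0.])
y = np.array([0., 1., 0.])
d = np.array([1., -1.])
m, p = len(S), 3
dS = sum(d[i] * S[i] for i in range(m))              # sum_i d_i S_i
Sy = np.column_stack([S[i] @ y for i in range(m)])   # [S_1 y ... S_m y], p x m
Sx = np.column_stack([S[i] @ x for i in range(m)])   # [S_1 x ... S_m x], p x m

# First-order derivatives (6)-(8): all three should be zero.
g_x = 2 * (d @ d) * (y @ y) * x - 2 * dS @ y
g_y = 2 * (d @ d) * (x @ x) * y - 2 * dS @ x
g_d = 2 * d * (x @ x) * (y @ y) - 2 * np.array([x @ S[i] @ y for i in range(m)])
print(g_x, g_y, g_d)

# Second-order derivative matrix (9), assembled block by block.
H = np.block([
    [2*(y@y)*(d@d)*np.eye(p), 4*(d@d)*np.outer(x, y) - 2*dS, 4*(y@y)*np.outer(x, d) - 2*Sy],
    [4*(d@d)*np.outer(y, x) - 2*dS, 2*(x@x)*(d@d)*np.eye(p), 4*(x@x)*np.outer(y, d) - 2*Sx],
    [4*(y@y)*np.outer(d, x) - 2*Sy.T, 4*(x@x)*np.outer(d, y) - 2*Sx.T, 2*(x@x)*(y@y)*np.eye(m)],
])
print(np.round(np.linalg.eigvalsh(H), 2))   # both positive and negative eigenvalues
```

Up to rounding, the printed spectrum is the one listed above, with eigenvalues of both signs, so the indefiniteness of (9) at this solution is confirmed.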

3 An Alternative Approach for the r = 1 Case

Instead of using Candecomp to minimize f, the r = 1 case can be solved by constraining x and y to be of unit length and expressing d_i as x′S_i y, i = 1, …, m; see (8). Then minimizing f is reduced to maximizing

$$ h\left( x,y \right) = \sum \left( x'S_i y \right)^2 \tag{11} $$

subject to the constraint x′x = y′y = 1. Clearly, for y fixed, x can be optimized conditionally as the eigenvector associated with the largest eigenvalue of ∑S_i yy′S_i and, for x fixed, the conditionally optimal y is the eigenvector associated with the largest eigenvalue of ∑S_i xx′S_i. Because d is updated implicitly, this algorithm optimizes {x, d} and {y, d} iteratively (a numerical sketch of this scheme is given below, following the second-order derivative test). The first-order derivatives of the Lagrangian \( L\left( x,y \right) = \sum \left( x'S_i y \right)^2 - \lambda _1 \left( x'x - 1 \right) - \lambda _2 \left( y'y - 1 \right) \) are

$$ \frac{\partial L}{\partial x} = 2\sum S_i yy'S_i x - 2\lambda _1 x \tag{12} $$

, and

$$ \frac{\partial L}{\partial y} = 2\sum S_i xx'S_i y - 2\lambda _2 y \tag{13} $$

, and the matrix of second-order derivatives is

$$ \frac{\partial ^2 L}{\partial x\partial y} = 2\left[ \begin{array}{cc} \sum S_i yy'S_i - \lambda _1 I_3 & \sum \left( x'S_i y \right)S_i + \sum S_i yx'S_i \\ \sum \left( x'S_i y \right)S_i + \sum S_i xy'S_i & \sum S_i xx'S_i - \lambda _2 I_3 \end{array} \right] \tag{14} $$

. It is immediate from (12) and (13) that λ_1 = λ_2 = ∑(x′S_i y)² when the first-order derivatives vanish (premultiply (12) by x′ and (13) by y′, and use x′x = y′y = 1). Let the columns of a 2p × (2p − 2) matrix W span the subspace orthogonal to the columns of \(\left[ \begin{array}{cc} 2x & 0 \\ 0 & 2y \end{array} \right]\). When the first-order derivatives of L vanish, we have a maximum if \( W'\frac{\partial ^2 L}{\partial x\partial y}W \) is negative definite, and a saddle point if it is indefinite. For the data of (10), that matrix is

$$\left[ {\matrix{ {32} & 0 & {40} & 0 \cr 0 & { - 4} & 0 & { - 2} \cr {40} & 0 & {32} & 0 \cr 0 & { - 2} & 0 & { - 4} \cr } } \right]$$

, with eigenvalues 72, −2, −6, and −8. It is indefinite, implying a saddle point, as before. It will now be shown that all cases where x and y are nonproportional are saddle points of h.
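To illustrate Section 3 numerically, the sketch below (Python with NumPy; our own illustration, not the authors' code, and the names maximize_h and dominant_eigvec are ours) first implements the alternating eigenvector scheme for maximizing h, and then reproduces the projected second-derivative test at the nonproportional stationary point of (10): the Lagrange multipliers, the matrix of (14), an orthonormal basis W of the complement of the constraint gradients, and the eigenvalues of W′(∂²L)W.

```python
# Alternating eigenvector algorithm for h(x, y) = sum_i (x'S_i y)^2 with unit-
# length x and y, plus the projected second-derivative test; assumes NumPy only.
import numpy as np

def dominant_eigvec(A):
    """Eigenvector of a symmetric matrix A for its largest eigenvalue."""
    return np.linalg.eigh(A)[1][:, -1]

def maximize_h(S, n_iter=200, seed=0):
    S = np.asarray(S, dtype=float)
    m, p, _ = S.shape
    rng = np.random.default_rng(seed)
    y = rng.standard_normal(p); y /= np.linalg.norm(y)
    for _ in range(n_iter):
        x = dominant_eigvec(sum(Si @ np.outer(y, y) @ Si for Si in S))
        y = dominant_eigvec(sum(Si @ np.outer(x, x) @ Si for Si in S))
    d = np.array([x @ Si @ y for Si in S])    # d_i = x'S_i y, see (8)
    return x, y, d

S = [np.array([[3., 1., 0.], [1., 3., 0.], [0., 0., 0.]]),
     np.array([[3., -1., 0.], [-1., 3., 0.], [0., 0., 1.]])]
print(maximize_h(S))             # x and y come out proportional (equal up to sign)

# Projected second-derivative test at the nonproportional stationary point of (10).
x = np.array([1., 0., 0.])
y = np.array([0., 1., 0.])
lam = sum((x @ Si @ y) ** 2 for Si in S)      # lambda_1 = lambda_2
Ay = sum(Si @ np.outer(y, y) @ Si for Si in S)
Ax = sum(Si @ np.outer(x, x) @ Si for Si in S)
C = sum((x @ Si @ y) * Si + Si @ np.outer(y, x) @ Si for Si in S)
H = 2 * np.block([[Ay - lam * np.eye(3), C], [C.T, Ax - lam * np.eye(3)]])

G = np.zeros((6, 2))                          # constraint gradients [2x 0; 0 2y]
G[:3, 0], G[3:, 1] = 2 * x, 2 * y
W = np.linalg.svd(G.T)[2][2:].T               # orthonormal basis of their complement
print(np.round(np.linalg.eigvalsh(W.T @ H @ W), 2))   # -8, -6, -2, 72: indefinite
```

Note that the eigenvalues of W′(∂²L)W do not depend on which orthonormal basis W of the complement is chosen, so any such basis yields the same spectrum as the 4 × 4 matrix displayed above.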

Result. When a solution for h(x, y) is a local maximum, it has conditional optimality. Such a solution has x ∝ y when any subset of S_1, …, S_m are Gramian and admit a nonsingular linear combination.

Proof: Suppose the solution (x, y) is a local maximum of h. Then x is a local maximum of h_y(z) = ∑ z′S_i yy′S_i z subject to z′z = 1, with y fixed. Define A_y = ∑S_i yy′S_i, with eigendecomposition A_y = KΛK′, where Λ is the diagonal matrix of eigenvalues λ_1 ≥ λ_2 ≥ … ≥ λ_p. Define u = K′x. Then u is a local maximum of u′Λu subject to u′u = 1. Because the first-order derivatives vanish we have

$$ \Lambda u = \lambda u $$

for some eigenvalue λ and eigenvector u of Λ. The second-derivatives matrix is 2(Λ − λI). At a local maximum, it is at least negative semidefinite on the subspace orthogonal to u. That is, for all h orthogonal to u we have h′(Λ − λI)h ≤ 0. When we pick the largest eigenvalue and its associated eigenvector e_1, the first column of the identity matrix, this inequality is satisfied. However, when we pick any other eigenvalue and an associated eigenvector, e_1 is orthogonal to u and e′_1(Λ − λI)e_1 = λ_1 − λ is strictly positive. It follows that a local maximum has u = e_1, hence x equal to the principal eigenvector of A_y. That is, x is conditionally optimal for h given y. The entire derivation can be repeated with the roles of x and y reversed, showing that we also have conditional optimality for y at local maxima of h. Finally, Result 6 of Ten Berge and Kiers (1991) implies that every solution with conditional optimality for x and y has (x′S_i x)² = (y′S_i y)². When any subset of S_1, …, S_m are Gramian and admit a nonsingular linear combination, it follows that x ∝ y.

The meaning of this result is that, at stationary points of h in the case of Gramian matrices, x and y can be nonproportional only at saddle points. This also extends to the more general function f: when that function has a local minimum, h has a local maximum, which means that x ∝ y is guaranteed. As a result, a comment in Ten Berge and Kiers (1991, p. 324) can now be sharpened: where it was stated that asymmetry in the present examples can occur only at local minima, it is now clear that asymmetry can occur only at saddle points. This fact reduces the probability of ever finding asymmetry in practical applications. Still, it does happen now and then that Candecomp converges to saddle points. For instance, when the data are S_1 and S_2 of (10), we have a saddle point at

$$ x_0 = \left[ \begin{array}{c} 1 \\ 0 \\ 0 \end{array} \right],\quad y_0 = \left[ \begin{array}{c} 0 \\ 1 \\ 0 \end{array} \right],\quad \text{and}\quad d_0 = \left[ \begin{array}{c} 1 \\ -1 \end{array} \right] $$

. When Candecomp is started at points where x is replaced by x_0 + t × rand(−0.5, +0.5), updating y and d first, and t is picked as small as 0.0001, it converges to the saddle point in 7% of the cases.
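The perturbation experiment can be mimicked as follows (Python with NumPy; our own reconstruction under the stated reading of the experiment, not the authors' code).

```python
# Start the r = 1 Candecomp updates from x_0 + t * rand(-0.5, +0.5), updating
# y and d first, and count how often the run stays at the saddle point: at the
# saddle x and y remain (nearly) orthogonal, at a minimum x is proportional to y.
import numpy as np

S = [np.array([[3., 1., 0.], [1., 3., 0.], [0., 0., 0.]]),
     np.array([[3., -1., 0.], [-1., 3., 0.], [0., 0., 1.]])]
x0, d0 = np.array([1., 0., 0.]), np.array([1., -1.])
t, n_trials, hits = 1e-4, 200, 0
rng = np.random.default_rng(1)

for _ in range(n_trials):
    x, d = x0 + t * rng.uniform(-0.5, 0.5, 3), d0.copy()
    for _ in range(5000):
        # conditional updates from (6)-(8), in the order y, d, x
        y = sum(d[i] * S[i] @ x for i in range(2)) / ((d @ d) * (x @ x))
        d = np.array([x @ S[i] @ y for i in range(2)]) / ((x @ x) * (y @ y))
        x = sum(d[i] * S[i] @ y for i in range(2)) / ((d @ d) * (y @ y))
    cos_xy = abs(x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
    hits += cos_xy < 0.5          # still at the saddle point
print(hits / n_trials)            # the note reports roughly 7% for t = 0.0001
```

The 7% figure depends on the perturbation size t, the iteration limit, and the random numbers drawn, so a re-run of this sketch need not reproduce it exactly.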

4 Discussion

The assumption that Candecomp, when applied to symmetric slices, will converge to solutions with the two component matrices equal up to column scaling remains unsettled for r > 1. However, the counterexamples presented by Ten Berge and Kiers are saddle points rather than local minima. This is reassuring, because Candecomp, for r = 1, tends to get stuck at local minima far more easily than at saddle points. Still, the probability of convergence to a saddle point, when Candecomp starts very close to one, is strictly positive for r = 1.