1 Introduction

Symmetric matrices with non-negative off-diagonal elements and zero diagonal elements arise as data in many experimental sciences. They occur when the values are measurements of squared distances between points in a Euclidean space (e.g. atoms, stars, cities); such a matrix is referred to as a Euclidean distance matrix. Because of data errors the measured matrix may not be exactly Euclidean, and it is then desirable to find the Euclidean distance matrix that best approximates it. The aim of this paper is to study a new method for solving this Euclidean distance matrix problem and to compare it with older methods [1].

An important application arises in the conformation of molecular structures from nuclear magnetic resonance data (see [2] and [3]). Here a Euclidean distance matrix is used to represent the squares of distances between the atoms of a molecular structure. An attempt to determine such a structure by nuclear magnetic resonance experiments gives rise to a distance matrix F which, because of data errors, may not be Euclidean. There are many other applications in subjects as diverse as archeology, cartography, genetics, geography, and multivariate analysis. Pertinent references are given by Al-Homidan [4, 5].

Characterization theorems for the Euclidean distance matrix have been given in many forms. In Section 2 we show a very important characterization which brings out the underlying structure and is readily applicable to the algorithms that follow.

This paper addresses a non-smooth optimization problem in which some matrix, defined in terms of the problem variables, has to be positive semidefinite. One way to handle this problem is to impose a functional constraint in which the least eigenvalue of the matrix is non-negative. However, if there are multiple eigenvalues at the solution, which is usually the case, such a constraint is non-smooth, and this non-smoothness cannot be modeled by a convex polyhedral composite function. An important factor is the determination of the multiplicity of the zero eigenvalues, or alternatively the rank of the matrix at the solution. If this rank is known it is usually possible to solve the problem by conventional techniques.

Glunt et al. [6] formulate the Euclidean distance matrix problem as a constrained least distance problem in which the constraint is the intersection of two convex sets. The Dykstra-Han alternating projection algorithm can then be used to solve the problem. This method is globally convergent but the rate of convergence is very slow. However, the method does have the capability to determine the correct rank of the solution matrix.

Recently, there has been much interest in interior point methods applied to problems with semidefinite matrix constraints (see e.g. the survey papers [7] and [8] and the references therein). Semidefinite programming optimizes a linear function subject to the constraint that a symmetric matrix is positive semidefinite; it is a convex programming problem since the objective and constraints are convex. The problem dealt with in this paper is a little different since the objective is quadratic; moreover, an additional rank constraint is added, which makes the problem non-convex and harder to solve. Here we use an approach different from interior point methods. If the correct rank of the solution matrix is known, it is shown in Section 3 how to formulate the problem as a smooth unconstrained minimization problem, for which rapid convergence can be obtained by, for example, the BFGS method. We give expressions for the objective function and its first derivatives.

In [1] a hybrid method combining a projection method and a quasi-Newton method is studied; a similar study, covering all of its features, can be carried out for the method proposed here. Finally, in Section 4, numerical comparisons are carried out.

2 The Euclidean distance matrix problem

In this section the definition of the Euclidean distance matrix is given, and the relationship between points and distances is summarized. A characterization theorem for the Euclidean distance matrix is proved in a concise way that brings out the underlying structure and is readily applicable to the algorithms that follow.

It is necessary to distinguish between distance matrices that are obtained in practice and those that can be derived exactly from n vectors in an affine subspace.

Definition 2.1 A matrix $F \in \mathbb{R}^{n \times n}$ is called a distance matrix iff it is symmetric, the diagonal elements are zero,

$$f_{ii} = 0, \qquad i = 1, \dots, n,$$

and the off-diagonal entries are non-negative

$$f_{ij} \ge 0, \qquad i \ne j.$$

Definition 2.2 A matrix $D \in \mathbb{R}^{n \times n}$ is called a Euclidean distance matrix iff there exist $n$ vectors $x_1, \dots, x_n$ in an affine subspace of $\mathbb{R}^r$ of dimension $r$ ($r \le n-1$) such that

$$d_{ij} = \| x_i - x_j \|_2^2, \qquad \forall i, j.$$
(2.1)

The Euclidean distance problem can now be stated as follows: Given a distance matrix $F \in \mathbb{R}^{n \times n}$, find the Euclidean distance matrix $D \in \mathbb{R}^{n \times n}$ that minimizes

$$\| F - D \|_F,$$
(2.2)

where $\| \cdot \|_F$ denotes the Frobenius norm.

The following theorem is essentially due to Schoenberg [9].

Theorem 2.3 The distance matrix $D \in \mathbb{R}^{n \times n}$ is a Euclidean distance matrix if and only if the $(n-1) \times (n-1)$ symmetric matrix $A$ defined by

$$a_{ij} = \tfrac{1}{2} \bigl[ d_{s1} + d_{t1} - d_{st} \bigr] \qquad (1 \le i, j \le n-1)$$
(2.3)

is positive semidefinite, where $s = i+1$, $t = j+1$, and $D$ is irreducibly embeddable in $\mathbb{R}^r$ ($r < n$), where $r = \operatorname{rank}(A)$. Moreover, consider the spectral decomposition

$$A = U \Lambda U^T.$$
(2.4)

Let $\Lambda_r$ be the matrix of non-zero eigenvalues in $\Lambda$ and define $X$ by

$$X = U_r, \qquad \text{then} \quad A = X \Lambda_r X^T,$$
(2.5)

where $\Lambda_r \in \mathbb{R}^{r \times r}$ is a diagonal matrix and $U_r \in \mathbb{R}^{(n-1) \times r}$ consists of the corresponding columns of $U$.
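
As an illustration of Theorem 2.3, the following sketch in Python/NumPy (the paper's own experiments use Matlab; the function name embed_check and the tolerance are illustrative, not from the paper) forms the matrix $A$ of (2.3) from a given distance matrix, tests positive semidefiniteness, and recovers an embedding of dimension $r = \operatorname{rank}(A)$:

```python
import numpy as np

def embed_check(D, tol=1e-10):
    """Form the matrix A of (2.3) from a distance matrix D, test whether D is
    Euclidean (A positive semidefinite), and return the embedding dimension
    r = rank(A) together with generating points (relative to x_1 at the origin)."""
    n = D.shape[0]
    d1 = D[1:, 0]                                      # d_{s1}, s = 2, ..., n
    A = 0.5 * (d1[:, None] + d1[None, :] - D[1:, 1:])  # a_{ij} = (d_{s1} + d_{t1} - d_{st}) / 2
    lam, U = np.linalg.eigh(A)                         # spectral decomposition A = U diag(lam) U^T
    is_euclidean = lam.min() >= -tol                   # all eigenvalues (numerically) non-negative
    r = int(np.sum(lam > tol))                         # rank of A = embedding dimension
    idx = np.argsort(lam)[::-1][:r]
    points = U[:, idx] * np.sqrt(np.clip(lam[idx], 0.0, None))  # rows are x_2, ..., x_n
    return is_euclidean, r, points
```

Applied to an exact Euclidean distance matrix, the recovered points (together with $x_1$ at the origin) reproduce $D$; applied to a perturbed matrix, the negative eigenvalues of $A$ indicate how far $D$ is from being Euclidean.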

3 The method

In this section we consider a different approach to the Euclidean distance matrix problem (2.2). The main idea is to replace (2.2) by a smooth unconstrained optimization problem in order to use superlinearly convergent quasi-Newton methods. To do this it is necessary to estimate the rank $r$, since this piece of information is not generally known. Once a value of $r$ is chosen, the problem (2.2) is solved by the BFGS method. We give the relevant formulas for the derivatives. At the end of the section we discuss details of the initialization and implementation.

If the rank $r$ is known, it is possible to express (2.2) as a smooth unconstrained optimization problem in the following way. The unknowns in the problem are chosen to be the elements of the matrices $X$ and $\Lambda_r$ introduced in (2.5). We take $X$ to have $r$ columns and $\Lambda_r$ to be a diagonal matrix, as shown below. This gives an unconstrained optimization problem in $r(n-1) - \frac{r(r+1)}{2}$ unknowns. We therefore parametrize $X$ and $\Lambda_r$ in the following way:

$$A = X \Lambda_r X^T, \qquad \text{where } X = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ x_{21} & 1 & & 0 \\ x_{31} & x_{32} & \ddots & \vdots \\ \vdots & & & 1 \\ x_{r+1,1} & x_{r+1,2} & \cdots & x_{r+1,r} \\ \vdots & & & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mr} \end{bmatrix}, \qquad \Lambda_r = \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_r \end{bmatrix},$$

with $m = n-1$.
(3.1)

The objective function $\phi$ is readily calculated by first forming $D$ from $X$ and $\Lambda_r$ as indicated by (2.1), after which $\phi$ is given by $\phi(X, \Lambda_r) = \| D - F \|_F^2$. When $s = t$ we have $d_{st} = 0$, so (2.3) gives $a_{ii} = \tfrac{1}{2}[ d_{s1} + d_{s1} - 0 ] = d_{s1}$. The elements of the matrix $D$ therefore take the form

$$\begin{aligned} d_{s1} = a_{ii} &= \sum_{k=1}^{i-1} x_{ik}^{2} \lambda_k + \lambda_i, && i = 1, \dots, r, \ s = i+1, \\ d_{s1} = a_{ii} &= \sum_{k=1}^{r} x_{ik}^{2} \lambda_k, && i = r+1, \dots, n-1, \\ d_{st} = a_{ii} + a_{jj} - 2 a_{ij}, \quad a_{ij} &= \sum_{k=1}^{i-1} x_{ik} x_{jk} \lambda_k + x_{ji} \lambda_i, && i \le r \text{ or } j \le r, \\ a_{ij} &= \sum_{k=1}^{r} x_{ik} x_{jk} \lambda_k, && i > r \text{ and } j > r, \end{aligned}$$

where t=j+1. Hence

$$\begin{aligned} \phi &= \sum_{s,t=1}^{n} ( d_{st} - f_{st} )^2 = 2 \sum_{s=2}^{n} ( d_{s1} - f_{s1} )^2 + 2 \sum_{\substack{s,t=2 \\ s<t}}^{n} ( d_{st} - f_{st} )^2 \\ &= 2 \sum_{i=1}^{r} \Bigl[ \sum_{k=1}^{i-1} x_{ik}^{2} \lambda_k + \lambda_i - f_{s1} \Bigr]^2 + 2 \sum_{i=r+1}^{n-1} \Bigl[ \sum_{k=1}^{r} x_{ik}^{2} \lambda_k - f_{s1} \Bigr]^2 \\ &\quad + 2 \sum_{\substack{i,j=1 \\ i<j}}^{r} \Bigl[ \sum_{k=1}^{i-1} x_{ik}^{2} \lambda_k + \lambda_i + \sum_{k=1}^{j-1} x_{jk}^{2} \lambda_k + \lambda_j - 2 \Bigl( \sum_{k=1}^{i-1} x_{ik} x_{jk} \lambda_k + x_{ji} \lambda_i \Bigr) - f_{st} \Bigr]^2 \\ &\quad + 2 \sum_{i=1}^{r} \sum_{j=r+1}^{n-1} \Bigl[ \sum_{k=1}^{i-1} x_{ik}^{2} \lambda_k + \lambda_i + \sum_{k=1}^{r} x_{jk}^{2} \lambda_k - 2 \Bigl( \sum_{k=1}^{i-1} x_{ik} x_{jk} \lambda_k + x_{ji} \lambda_i \Bigr) - f_{st} \Bigr]^2 \\ &\quad + 2 \sum_{\substack{i,j=r+1 \\ i<j}}^{n-1} \Bigl[ \sum_{k=1}^{r} x_{ik}^{2} \lambda_k + \sum_{k=1}^{r} x_{jk}^{2} \lambda_k - 2 \sum_{k=1}^{r} x_{ik} x_{jk} \lambda_k - f_{st} \Bigr]^2. \end{aligned}$$
(3.2)
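
For concreteness, the following Python/NumPy sketch (the names build_X and phi are illustrative, not from the paper) evaluates $\phi$ by assembling $X$ from its free entries as in (3.1), forming $A = X \Lambda_r X^T$ and rebuilding $D$ from $A$; this is equivalent by construction to the expanded sum (3.2):

```python
import numpy as np

def build_X(params, n, r):
    """Assemble the structured (n-1) x r matrix X of (3.1): unit diagonal in the
    top r x r block, zeros above it, and free entries (taken row by row) elsewhere."""
    X = np.zeros((n - 1, r))
    pos = 0
    for i in range(n - 1):
        ncols = min(i, r)                  # number of free entries in row i
        X[i, :ncols] = params[pos:pos + ncols]
        pos += ncols
        if i < r:
            X[i, i] = 1.0                  # fixed unit diagonal
    return X

def phi(params, F, r):
    """phi(X, Lambda_r) = ||D - F||_F^2, with D rebuilt from A = X Lambda_r X^T
    using d_{s1} = a_{ii} and d_{st} = a_{ii} + a_{jj} - 2 a_{ij}."""
    n = F.shape[0]
    nx = (n - 1) * r - r * (r + 1) // 2    # free entries of X; the last r parameters are Lambda_r
    X = build_X(params[:nx], n, r)
    A = (X * params[nx:]) @ X.T            # A = X Lambda_r X^T
    a = np.diag(A)
    D = np.zeros((n, n))
    D[1:, 1:] = a[:, None] + a[None, :] - 2.0 * A
    D[1:, 0] = D[0, 1:] = a                # d_{s1} = a_{ii}
    return np.sum((D - F) ** 2)
```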

Our chosen method to minimize $\phi$ is the BFGS quasi-Newton method (see, for example, [10]). This requires expressions for the first partial derivatives of $\phi$, which are obtained from (3.2) as

$$\frac{\partial \phi}{\partial \lambda_i} = 2 \Biggl\{ 2 \sum_{l=1}^{i} \sum_{k=l+i}^{n-1} ( d_{k+1,l} - f_{k+1,l} ) \, x_{ik}^{2} + 2 \sum_{k=i+1}^{n-1} ( d_{k+1,i+1} - f_{k+1,i+1} ) \bigl( 1 + x_{ik}^{2} - 2 x_{ik} \bigr) + 2 \sum_{l=i+2}^{r} \sum_{k=l}^{n-1} ( d_{k+1,l} - f_{k+1,l} ) \bigl( x_{i,l-1}^{2} + x_{ik}^{2} - 2 x_{i,l-1} x_{ik} \bigr) \Biggr\},$$
(3.3)

for all $i = 1, \dots, r$. For $j = 1, \dots, r$ and $i = j+1, \dots, n-1$:

$$\frac{\partial \phi}{\partial x_{ij}} = 4 \sum_{k=0}^{i-1} ( d_{j+1,k+1} - f_{j+1,k+1} ) ( 2 x_{ij} \lambda_j ) + 4 \sum_{k=i}^{n-1} ( d_{j+1,k+1} - f_{j+1,k+1} ) ( 2 x_{ij} \lambda_j - 2 x_{kj} \lambda_j ).$$
(3.4)
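
When implementing the analytic derivatives (3.3) and (3.4), it is prudent to verify them against finite differences. A small sketch of such a check (reusing the phi sketch above; fd_check and the user-supplied routine grad_analytic are illustrative, hypothetical names) is:

```python
import numpy as np

def fd_check(grad_analytic, params, F, r, h=1e-7):
    """Compare an analytic gradient of phi (e.g. an implementation of (3.3)-(3.4))
    with central finite differences of the phi sketch given earlier."""
    g = np.zeros_like(params)
    for i in range(params.size):
        e = np.zeros_like(params)
        e[i] = h
        g[i] = (phi(params + e, F, r) - phi(params - e, F, r)) / (2.0 * h)
    return np.max(np.abs(g - grad_analytic(params, F, r)))
```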

The BFGS method also requires the Hessian approximation to be initialized. Where necessary, we do this using a unit matrix.
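
In practice the minimization can be carried out with any quasi-Newton implementation. The sketch below uses SciPy's BFGS routine, which likewise starts from a unit-matrix Hessian approximation; for brevity it lets SciPy estimate the gradient by finite differences instead of supplying (3.3)-(3.4), and it reuses the phi sketch above (solve_fixed_rank is an illustrative name):

```python
import numpy as np
from scipy.optimize import minimize

def solve_fixed_rank(F, r, x0=None):
    """Minimize phi(X, Lambda_r) for a fixed trial rank r with the BFGS method."""
    n = F.shape[0]
    nvars = (n - 1) * r - r * (r + 1) // 2 + r   # free entries of X plus the r eigenvalues
    if x0 is None:
        x0 = np.ones(nvars)   # crude fallback; the initialization based on (3.5) below is better
    res = minimize(phi, x0, args=(F, r), method='BFGS')
    return res.x, res.fun
```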

Some care has to be taken when choosing the initial values of the matrices $X$ and $\Lambda_r$; in particular, the rank must be $r$. If not, the minimization method may not be able to increase the rank of $X$. An extreme case occurs when the initial matrices $X = 0$ and $\Lambda_r = 0$ are chosen, and $F \ne 0$. It can be seen from (3.3) and (3.4) that the components of the gradient vector are then all zero, so that $X = 0$, $\Lambda_r = 0$ is a stationary point, but not a minimizer. A gradient method will usually terminate in this situation and so fail to find the solution.

A reliable method for initializing $X$ and $\Lambda_r$ is to use the construction suggested by (3.1) and (2.3). Thus we define the elements of $A$ in terms of those of $F$ by

$$a_{ij} = -\tfrac{1}{2} ( f_{ij} - f_{1i} - f_{1j} ), \qquad i \ge 2, \ j \ge 2.$$
(3.5)

The first row and column of $A$ are zero and are ignored. We then find the spectral decomposition $U \Sigma U^T$ of the nontrivial part of $A$. Finally, the nontrivial part of $X$ and $\Lambda_r$ in (3.1) is initialized to the matrix $\Sigma_r^{1/2} U_r^T$, where $\Sigma_r = \operatorname{diag}(\sigma_i)$, $i = 1, \dots, r$, is composed of the $r$ largest eigenvalues in $\Sigma$, and the columns of $U_r$ are the corresponding eigenvectors. When $\Sigma_r$ is positive definite, this procedure ensures that $A$ has the correct rank $r$. Otherwise the process must be modified in some way, for example by ensuring that the diagonal elements of $\Sigma_r$ lie above a positive threshold.
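
A sketch of this initialization (again Python/NumPy, with the illustrative name initial_point) builds $A$ from $F$ by (3.5), keeps the $r$ largest eigenvalues floored to a positive threshold, and then maps the resulting rank-$r$ approximation into the structured form (3.1) by an LDL-type elimination; this last step is an implementation detail not spelled out in the text and assumes the leading $r \times r$ block is nonsingular:

```python
import numpy as np

def initial_point(F, r, floor=1e-6):
    """Starting X and Lambda_r of the correct rank r, constructed from F via (3.5)."""
    n = F.shape[0]
    f1 = F[1:, 0]
    A = 0.5 * (f1[:, None] + f1[None, :] - F[1:, 1:])   # (3.5); the trivial first row/column is dropped
    sig, U = np.linalg.eigh(A)
    idx = np.argsort(sig)[::-1][:r]                      # r largest eigenvalues of A
    sig_r = np.maximum(sig[idx], floor)                  # keep them above a positive threshold
    Ar = (U[:, idx] * sig_r) @ U[:, idx].T               # rank-r approximation of A
    X, lam = np.zeros((n - 1, r)), np.zeros(r)
    M = Ar.copy()
    for k in range(r):                                   # LDL-type elimination into the pattern (3.1)
        lam[k] = M[k, k]
        X[:, k] = M[:, k] / lam[k]
        M -= lam[k] * np.outer(X[:, k], X[:, k])
    return X, lam
```

The free entries of X (taken row by row, below its unit diagonal) and the entries of lam can then be concatenated into the parameter vector used by the phi and solve_fixed_rank sketches above.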

An advantage of this method is that it allows the spatial dimension to be chosen by the user. This is useful when the rank is already known. For example, if the entries in $F$ are derived from distances between cities, then the dimension will be no higher than $r = 2$. Likewise, if the entries are derived from distances between atoms in a molecule or stars in space, then the maximum dimension is $r = 3$.

In general, however, the rank is not known; for example, the atoms in a molecule may turn out to be collinear or coplanar. We must therefore consider an algorithm in which we are prepared to revise our estimate of $r$. A simple strategy is to repeat the entire method for different values of $r$. If $r^*$ denotes the correct value of $r$ which solves (2.2), then it is observed that the BFGS method converges rapidly if $r \le r^*$, and that it exhibits superlinear convergence. On the other hand, if $r > r^*$ then slow convergence is observed. One reason is that there are more variables in the problem; redundancy in the parameter space may also have an effect. Thus it makes sense to start with a small value of $r$ and increase it by one until the solution is recognized. One way to recognize termination is when $D^{(r)}$ agrees sufficiently well with $D^{(r+1)}$, where $D^{(r)}$ denotes the Euclidean distance matrix obtained by minimizing $\phi$ when $\Lambda_r$ in (3.1) has $r$ diagonal elements. Numerical experience with solving various test problems by other methods, which will be compared with this method, is reported in [4].
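
This rank-increasing strategy can be summarized in a short driver loop. The sketch below reuses the earlier build_X, initial_point and solve_fixed_rank sketches; solve_edm and dist_from_params are illustrative names, not routines from the paper:

```python
import numpy as np

def dist_from_params(params, n, r):
    """Rebuild the Euclidean distance matrix D from a parameter vector, as in phi."""
    nx = (n - 1) * r - r * (r + 1) // 2
    X = build_X(params[:nx], n, r)
    A = (X * params[nx:]) @ X.T
    a = np.diag(A)
    D = np.zeros((n, n))
    D[1:, 1:] = a[:, None] + a[None, :] - 2.0 * A
    D[1:, 0] = D[0, 1:] = a
    return D

def solve_edm(F, r_max, tol=1e-5):
    """Increase the trial rank r by one until successive solutions D^(r-1), D^(r) agree."""
    n = F.shape[0]
    D_prev = None
    for r in range(1, r_max + 1):
        X0, lam0 = initial_point(F, r)
        x0 = np.concatenate([X0[i, :min(i, r)] for i in range(n - 1)] + [lam0])
        params, _ = solve_fixed_rank(F, r, x0)
        D_r = dist_from_params(params, n, r)
        if D_prev is not None and np.linalg.norm(D_r - D_prev) < tol:
            return D_r, r        # successive distance matrices agree: accept the solution
        D_prev = D_r
    return D_prev, r_max
```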

An obvious alternative to using the BFGS method is to evaluate the Hessian matrix of second derivatives of $\phi$ and use Newton's method. This would likely reduce the number of iterations required. However, there is the disadvantage of increased complexity and increased housekeeping at each iteration. Moreover, it is possible that the Hessian has some negative eigenvalues, so a modified form of Newton's method would be required. A simple example serves to illustrate the possibility of a negative eigenvalue. Take $n = 2$, $r = 1$, and let $F = \bigl[\begin{smallmatrix} 0 & 1 \\ 1 & 0 \end{smallmatrix}\bigr]$, $X = [1]$, and $\Lambda_r = [\lambda_1]$. Then $\phi = 2 ( 1 - \lambda_1^2 )^2$. This has global minimizers at $\lambda_1 = \pm 1$, a local maximizer at $\lambda_1 = 0$, and the Hessian is negative for all $\lambda_1$ such that $3 \lambda_1^2 < 1$.

This method and the projection method [6] have entirely different features, some good, some bad, which suggests that a combination of the two might be successful. Projection methods are globally convergent and hence potentially reliable, but the rate of convergence is first order or slower, which can be very inefficient. Quasi-Newton methods are reliable and locally superlinearly convergent, but they require that the correct rank is known. Hybrid methods should therefore be established along the lines of [1], in which the projection algorithm is used sparingly as a way of establishing the correct rank, while the BFGS method is used to provide rapid convergence.

4 Numerical results

In this section, we compare three methods: our method, the hybrid method of [1], and the unconstrained method from the same reference. The algorithms have been tested on randomly generated distance matrices $F$ with values distributed between $10^{-3}$ and $10^{3}$. All calculations were performed with Matlab 8. Figure 1 compares the line searches and CPU time of the three methods. The termination criterion for all methods is $\| D^{(k)} - D^{(k-1)} \| < 10^{-5}$. All methods converge to essentially the same values.
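
One plausible way to generate such test matrices (the paper does not give its exact recipe, so the construction below is an assumption) is to draw log-uniform off-diagonal entries, symmetrize, and zero the diagonal:

```python
import numpy as np

def random_distance_matrix(n, lo=1e-3, hi=1e3, seed=0):
    """Random (generally non-Euclidean) distance matrix with entries spread between lo and hi."""
    rng = np.random.default_rng(seed)
    F = np.exp(rng.uniform(np.log(lo), np.log(hi), size=(n, n)))   # log-uniform values
    F = 0.5 * (F + F.T)           # symmetrize
    np.fill_diagonal(F, 0.0)      # zero diagonal
    return F
```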

Figure 1
Comparing the line searches and CPU time of the three methods for the Euclidean distance matrix problem.

In Figure 1, the upper panel shows that the number of line searches for our method is slightly lower than for the unconstrained method and higher than for the hybrid method. However, the lower panel makes it clear that our method is much faster; this is because it works with $\frac{r(r+1)}{2}$ fewer unknowns, which reduces the CPU time. The hybrid method uses far fewer line searches than either of the other two methods, but it consumes much more time than our method because it starts with a projection method. This makes our method the most efficient of the three.

The housekeeping associated with each line search is $O(n^2)$. Also, if care is taken, it is possible to calculate $\phi(X)$ and $\nabla \phi(X)$ in $O(n^2)$ operations. The initial value $r^{(0)}$ is tabulated, and $r$ is increased by one until the solution is found. The total number of line searches is also tabulated, and in the figure it is found that fewer line searches are required as $r$ increases. The initial value $r^{(0)} = 6$ is rather arbitrary: a smaller value of $r^{(0)}$ would have given an even larger number of line searches.