Computational complexity of inferring phylogenies from dissimilarity matrices

William H. E. Day
Department of Computer Science, Memorial University of Newfoundland

Received: 09 December 1986
DOI: 10.1007/BF02458863

Cite this article as: Day, W. H. E. Bulletin of Mathematical Biology (1987) 49: 461. doi:10.1007/BF02458863

Abstract
Molecular biologists strive to infer evolutionary relationships from quantitative macromolecular comparisons obtained by immunological, DNA hybridization, electrophoretic or amino acid sequencing techniques. The problem is to find unrooted phylogenies that best approximate a given dissimilarity matrix according to a goodness-of-fit measure, for example the least-squares-fit criterion or Farris's f statistic. Computational costs of known algorithms guaranteeing optimal solutions to these problems increase exponentially with problem size; practical computational considerations limit the algorithms to analyzing small problems. It is established here that problems of phylogenetic inference based on the least-squares-fit criterion and the f statistic are NP-complete and thus are so difficult computationally that efficient optimal algorithms are unlikely to exist for them.
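
As a concrete illustration of the two goodness-of-fit measures named in the abstract, the sketch below scores one candidate unrooted tree against a small dissimilarity matrix. It assumes that the least-squares criterion is the sum of squared differences between observed dissimilarities and tree path lengths, and that Farris's f statistic is the corresponding sum of absolute differences; the abstract does not spell out either formula, and the original formulations may add constraints not modeled here. The function names, the example tree and the matrix values are all hypothetical.

```python
from itertools import combinations


def path_lengths(edges, leaves):
    """Path-length distances between leaves of a weighted tree.

    edges is a list of (node_u, node_v, weight) triples.
    """
    adjacency = {}
    for u, v, w in edges:
        adjacency.setdefault(u, []).append((v, w))
        adjacency.setdefault(v, []).append((u, w))
    dist = {}
    for src in leaves:
        # Depth-first walk from src; a tree has exactly one path to each node.
        seen = {src: 0.0}
        stack = [src]
        while stack:
            node = stack.pop()
            for nxt, w in adjacency[node]:
                if nxt not in seen:
                    seen[nxt] = seen[node] + w
                    stack.append(nxt)
        for dst in leaves:
            dist[(src, dst)] = seen[dst]
    return dist


def least_squares_fit(d, p, leaves):
    """Assumed form: sum of squared differences over unordered leaf pairs."""
    return sum((d[pair] - p[pair]) ** 2 for pair in combinations(leaves, 2))


def f_statistic(d, p, leaves):
    """Assumed form: sum of absolute differences over unordered leaf pairs."""
    return sum(abs(d[pair] - p[pair]) for pair in combinations(leaves, 2))


# Hypothetical data: four taxa, one unrooted tree with internal nodes x and y.
leaves = ["A", "B", "C", "D"]
edges = [("A", "x", 1), ("B", "x", 2), ("x", "y", 1), ("C", "y", 3), ("D", "y", 1)]
d = {("A", "B"): 3, ("A", "C"): 5, ("A", "D"): 5,
     ("B", "C"): 6, ("B", "D"): 4, ("C", "D"): 3}

p = path_lengths(edges, leaves)
print(least_squares_fit(d, p, leaves), f_statistic(d, p, leaves))
```

Scoring a single tree in this way is cheap. The difficulty described in the abstract lies in finding, among all tree shapes and edge lengths, one that optimizes the score.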

The Natural Sciences and Engineering Research Council of Canada partially supported this research through an individual operating grant (A4142) to W.H.E. Day.

Literature

Bandelt, H.-J. and A. Dress. 1986. “Reconstructing the Shape of a Tree from Observed Dissimilarity Data”. Adv. appl. Math. 7, 309–343.

Buneman, P. 1971. “The Recovery of Trees from Measures of Dissimilarity”. In Mathematics in the Archaeological and Historical Sciences, F. R. Hodson, D. G. Kendall and P. Tautu (Eds), pp. 387–395. Edinburgh: Edinburgh University Press.

Cavalli-Sforza, L. L. and A. W. F. Edwards. 1965. “Analysis of Human Evolution”. In Genetics Today: Proceedings of the XI International Congress of Genetics, Vol. 3, S. J. Geerts (Ed.), pp. 923–933. Oxford: Pergamon Press.

Cavalli-Sforza, L. L. and A. W. F. Edwards. 1967. “Phylogenetic Analysis: Models and Estimation Procedures”. Am. J. hum. Genet. 19, 233–257; Evolution 21, 550–570.

Day, W. H. E. 1983. “Computationally Difficult Parsimony Problems in Phylogenetic Systematics”. J. theor. Biol. 103, 429–438.

Day, W. H. E., D. S. Johnson and D. Sankoff. 1986. “The Computational Complexity of Inferring Rooted Phylogenies by Parsimony”. Math. Biosci. 81, 33–42.

Day, W. H. E. and D. Sankoff. 1986. “Computational Complexity of Inferring Phylogenies by Compatibility”. Syst. Zool. 35, 224–229.

Day, W. H. E. and D. Sankoff. 1987. “Computational Complexity of Inferring Phylogenies from Chromosome Inversion Data”. J. theor. Biol. 124, 213–218.

Dobson, A. J. 1974. “Unrooted Trees for Numerical Taxonomy”. J. appl. Probab. 11, 32–42.

Farris, J. S. 1972. “Estimating Phylogenetic Trees from Distance Matrices”. Am. Nat. 106, 645–668.

Farris, J. S. 1981. “Distance Data in Phylogenetic Analysis”. In Advances in Cladistics: Proceedings of the First Meeting of the Willi Hennig Society, V. A. Funk and D. R. Brooks (Eds), pp. 3–23. Bronx: New York Botanical Garden.

Fitch, W. M. and E. Margoliash. 1967. “Construction of Phylogenetic Trees”. Science 155, 279–284.

Foulds, L. R. and R. L. Graham. 1982. “The Steiner Problem in Phylogeny is NP-complete”. Adv. appl. Math. 3, 43–49.

Garey, M. R. and D. S. Johnson. 1979. Computers and Intractability: A Guide to the Theory of NP-Completeness. San Francisco: W. H. Freeman.

Graham, R. L. and L. R. Foulds. 1982. “Unlikelihood that Minimal Phylogenies for a Realistic Biological Study can be Constructed in Reasonable Computational Time”. Math. Biosci. 60, 133–142.

Hakimi, S. L. and S. S. Yau. 1965. “Distance Matrix of a Graph and its Realizability”. Quart. appl. Math. 22, 305–317.

Harary, F. 1969. Graph Theory. Reading, Massachusetts: Addison-Wesley.

Hartigan, J. A. 1967. “Representation of Similarity Matrices by Trees”. J. Am. Statist. Ass. 62, 1140–1158.

Jardine, N. and R. Sibson. Mathematical Taxonomy. London: John Wiley.

Křivánek, M. 1986. “On the Computational Complexity of Clustering”. In Data Analysis and Informatics IV, E. Diday et al. (Eds), pp. 89–96. Amsterdam: Elsevier Science.

Křivánek, M. and J. Morávek. 1984. “On NP-hardness in Hierarchical Clustering”. In Compstat 1984, T. Havránek, Z. Šidák and M. Novák (Eds), pp. 189–194. Wien: Physica-Verlag.

Křivánek, M. and J. Morávek. 1986. “NP-hard Problems in Hierarchical-tree Clustering”. Acta Inform. 23, 311–323.

Prager, E. M. and A. C. Wilson. 1976. “Congruency of Phylogenies Derived from Different Proteins”. J. mol. Evol. 9, 45–57.

Sattath, S. and A. Tversky. 1977. “Additive Similarity Trees”. Psychometrika 42, 319–345.

© Society for Mathematical Biology 1987