Journal of Molecular Evolution, Volume 19, Issue 2, pp 153–170

Accuracy of estimated phylogenetic trees from molecular data

II. Gene frequency data


  • Masatoshi Nei
    • Center for Demographic and Population Genetics, The University of Texas at Houston
  • Fumio Tajima
    • Center for Demographic and Population Genetics, The University of Texas at Houston
  • Yoshio Tateno
    • Center for Demographic and Population Genetics, The University of Texas at Houston

DOI: 10.1007/BF02300753

Cite this article as:
Nei, M., Tajima, F. & Tateno, Y. J Mol Evol (1983) 19: 153. doi:10.1007/BF02300753


The accuracies and efficiencies of three different methods of making phylogenetic trees from gene frequency data were examined by using computer simulation. The methods examined are UPGMA, Farris' (1972) method, and Tateno et al.'s (1982) modified Farris method. In the computer simulation, eight species (or populations) were assumed to evolve according to a given model tree, and the evolutionary changes of allele frequencies were followed by using the infinite-allele model. At the end of the simulated evolution, five genetic distance measures (Nei's standard and minimum distances, Rogers' distance, Cavalli-Sforza's fθ, and the modified Cavalli-Sforza distance) were computed for all pairs of species, and the distance matrix obtained for each distance measure was used for reconstructing a phylogenetic tree. The phylogenetic tree obtained was then compared with the model tree. The results indicate that, in all tree-making methods examined, the accuracies of both the topology and the branch lengths of a reconstructed tree (rooted tree) are very low when the number of loci used is less than 20 but gradually increase with increasing number of loci. When the expected number of gene substitutions (M) for the shortest branch is 0.1 or more per locus and 30 or more loci are used, the topological error as measured by the distortion index (dT) is not great, but the probability of obtaining the correct topology (P) is less than 0.5 even with 60 loci. When M is as small as 0.004, P is substantially lower. In obtaining a good topology (small dT and high P), UPGMA and the modified Farris method generally show a better performance than the Farris method. The poor performance of the Farris method is observed even when Rogers' distance, which obeys the triangle inequality, is used. The main reason for this seems to be that the Farris method often gives overestimates of branch lengths. For estimating the expected branch lengths of the true tree, UPGMA shows the best performance. For this purpose, Nei's standard distance gives a better result than the others because of its linear relationship with the number of gene substitutions. Rogers' or Cavalli-Sforza's distance gives a phylogenetic tree in which the parts near the root are condensed and the other parts are elongated. It is recommended that more than 30 loci, including both polymorphic and monomorphic loci, be used for making phylogenetic trees. The conclusions from this study seem to apply also to data on nucleotide differences obtained by the restriction enzyme techniques.
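
The distance-then-tree pipeline summarized above can be illustrated with a minimal Python sketch. It is not the authors' program: the function names and the input layout (one list of allele frequencies per locus, with alleles in the same order in both populations) are assumptions made for illustration. The formulas are the commonly cited forms of Nei's standard distance and Rogers' distance, and the clustering is plain UPGMA. The simulation itself, the Farris and modified Farris methods, and the distortion index are not covered.

```python
import math


def nei_standard_distance(x, y):
    """Nei's standard distance D = -ln(J_xy / sqrt(J_x * J_y)).

    x, y: lists of loci; each locus is a list of allele frequencies
    (assumed layout, same allele order in both populations).
    """
    n_loci = len(x)
    jx = sum(sum(p * p for p in locus) for locus in x) / n_loci
    jy = sum(sum(q * q for q in locus) for locus in y) / n_loci
    jxy = sum(sum(p * q for p, q in zip(lx, ly)) for lx, ly in zip(x, y)) / n_loci
    if jxy == 0:
        return math.inf  # no shared alleles at any locus
    return -math.log(jxy / math.sqrt(jx * jy))


def rogers_distance(x, y):
    """Rogers' distance: mean over loci of sqrt(0.5 * sum((x_i - y_i)^2))."""
    per_locus = [
        math.sqrt(0.5 * sum((p - q) ** 2 for p, q in zip(lx, ly)))
        for lx, ly in zip(x, y)
    ]
    return sum(per_locus) / len(per_locus)


def upgma(names, d):
    """Plain UPGMA on a symmetric distance matrix d (list of lists).

    Returns (tree, root_height), where tree is a nested tuple of labels.
    """
    n = len(names)
    clusters = {i: (names[i], 1) for i in range(n)}  # id -> (subtree, size)
    dist = {(i, j): d[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    height = 0.0
    while len(clusters) > 1:
        # Join the pair of clusters with the smallest average distance.
        (a, b), dab = min(dist.items(), key=lambda kv: kv[1])
        del dist[(a, b)]
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        for c in clusters:
            dac = dist.pop((min(a, c), max(a, c)))
            dbc = dist.pop((min(b, c), max(b, c)))
            # Size-weighted average keeps every original pair equally weighted.
            dist[(c, next_id)] = (size_a * dac + size_b * dbc) / (size_a + size_b)
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        height = dab / 2  # the new node sits at half the joining distance
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree, height
```

A distance matrix built by calling one of the distance functions on every pair of populations can be passed directly to `upgma`. Because UPGMA places each node at half the joining distance, the resulting tree is rooted and ultrametric, which is the sense in which the abstract speaks of estimating branch lengths of a rooted tree.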

Key words

  • UPGMA
  • Farris' method
  • Modified Farris method
  • Genetic distance
  • Topological errors
  • Errors in branch length
  • Triangle inequality

Copyright information

© Springer-Verlag 1983