Effectiveness of measures requiring and not requiring prior sequence alignment for estimating the dissimilarity of natural sequences

Journal of Molecular Evolution

Summary

Various measures of sequence dissimilarity have been evaluated by how well the additive least squares estimation of edges (branch lengths) of an unrooted evolutionary tree fit the observed pairwise dissimilarity measures and by how consistent the trees are for different data sets derived from the same set of sequences. This evaluation provided sensitive discrimination among dissimilarity measures and among possible trees. Dissimilarity measures not requiring prior sequence alignment did about as well as did the traditional mismatch counts requiring prior sequence alignment. Application of Jukes-Cantor correction to singlet mismatch counts worsened the results. Measures not requiring alignment had the advantage of being applicable to sequences too different to be critically alignable. Two different measures of pairwise dissimilarity not requiring alignment have been used: (1) multiplet distribution distance (MDD), the square of the Euclidean distance between vectors of the fractions of base singlets (or doublets, or triplets, or…) in the respective sequences, and (2) complements of long words (CLW), the count of bases not occurring in significantly long common words. MDD was applicable to sequences more different than was CLW (noncoding), but the latter often gave better results where both measures were available (coding). MDD results were improved by using longer multiplets and, if the sequences were coding, by using the larger amino acid and codon alphabets rather than the nucleotide alphabet. The additive least squares method could be used to provide a reasonable consensus of different trees for the same set of species (or related genes).
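
As an illustration of the MDD definition given above (the squared Euclidean distance between vectors of multiplet fractions), the following is a minimal Python sketch, not the author's implementation. The function names `multiplet_fractions` and `mdd`, the use of overlapping windows, the default nucleotide alphabet, and the choice to skip windows containing symbols outside the alphabet are all assumptions made for this example; the summary does not specify these details.

```python
from collections import Counter
from itertools import product


def multiplet_fractions(seq, k, alphabet="ACGT"):
    """Return the fraction of each possible k-multiplet in seq.

    Assumption: multiplets are counted in overlapping windows, and
    windows containing symbols outside `alphabet` are skipped.
    """
    seq = seq.upper()
    allowed = set(alphabet)
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    valid = [w for w in windows if set(w) <= allowed]
    if not valid:
        raise ValueError("sequence has no valid multiplets of length k")
    counts = Counter(valid)
    total = len(valid)
    # Fixed ordering of all |alphabet|**k words so vectors are comparable.
    words = ("".join(p) for p in product(alphabet, repeat=k))
    return [counts[w] / total for w in words]


def mdd(seq_a, seq_b, k=1, alphabet="ACGT"):
    """Squared Euclidean distance between the two multiplet-fraction vectors."""
    fa = multiplet_fractions(seq_a, k, alphabet)
    fb = multiplet_fractions(seq_b, k, alphabet)
    return sum((x - y) ** 2 for x, y in zip(fa, fb))


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    print(mdd("AAAA", "AACC", k=1))
    print(mdd("AAAA", "ACAC", k=2))
```

Passing a 20-letter amino acid alphabet as `alphabet` would give the protein analogue of the same measure; larger `k` corresponds to the longer multiplets mentioned in the summary.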

Cite this article

Blaisdell, B.E. Effectiveness of measures requiring and not requiring prior sequence alignment for estimating the dissimilarity of natural sequences. J Mol Evol 29, 526–537 (1989). https://doi.org/10.1007/BF02602924
