Van de Peer, Y., Neefs, JM., De Rijk, P. et al. J Mol Evol (1993) 37: 221. doi:10.1007/BF02407359
The detailed descriptions now available for the secondary structure of small-ribosomal-subunit RNA, including areas of highly variable primary structure, facilitate the alignment of nucleotide sequences. However, for optimal exploitation of the information contained in the alignment, a method must be available that takes into account the local sequence variability in the computation of evolutionary distance. A quantitative definition for the variability of an alignment position is proposed in this study. It is a parameter in an equation which expresses the probability that the alignment position contains a different nucleotide in two sequences, as a function of the distance separating these sequences, i.e., the number of substitutions per nucleotide that occurred during their divergence. This parameter can be estimated from the distance matrix resulting from the conversion of pairwise sequence dissimilarities into pairwise distances. Alignment positions can then be subdivided into a number of sets of matching variability, and the average variability of each set can be derived. Next, the conversion of dissimilarity into distance can be recalculated for each set of alignment positions separately, using a modified version of the equation that corrects for multiple substitutions and changing for each set the parameter that reflects its average variability. The distances computed for each set are finally averaged, giving a more precise distance estimation.
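The last two steps of the procedure above (per-set conversion of dissimilarity into distance, followed by averaging) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes a Jukes-Cantor-type relation p = 3/4 (1 - exp(-4/3 v d)) between the probability p of observing a difference, the distance d, and a relative variability v; the abstract does not give the exact form of the equation. The function names are hypothetical, the variability of each set is taken as given rather than estimated from a distance matrix, and weighting the average by set size is an assumption, since the text only states that the per-set distances are averaged.

```python
import math


def corrected_distance(f, v=1.0):
    """Convert a dissimilarity f into a distance, assuming the
    Jukes-Cantor-type relation p = 3/4 * (1 - exp(-4/3 * v * d)).

    v is the assumed relative variability of the position set;
    v = 1.0 gives the ordinary Jukes-Cantor correction.
    """
    if not 0.0 <= f < 0.75:
        raise ValueError("dissimilarity must lie in [0, 0.75)")
    if v <= 0.0:
        raise ValueError("variability must be positive")
    return -0.75 / v * math.log(1.0 - 4.0 * f / 3.0)


def dissimilarity(seq_a, seq_b, positions):
    """Fraction of the given alignment positions at which two sequences differ."""
    positions = list(positions)
    differences = sum(1 for i in positions if seq_a[i] != seq_b[i])
    return differences / len(positions)


def calibrated_distance(seq_a, seq_b, variability_sets):
    """Average the per-set distances over all variability sets.

    variability_sets is a list of (positions, v) pairs. Each set is
    weighted by its number of positions (an assumption of this sketch).
    """
    sets = [(list(positions), v) for positions, v in variability_sets]
    total = sum(len(positions) for positions, _ in sets)
    weighted = sum(
        len(positions) * corrected_distance(dissimilarity(seq_a, seq_b, positions), v)
        for positions, v in sets
    )
    return weighted / total
```

For example, `calibrated_distance("AAAACCCC", "AAAACCGG", [(range(0, 4), 0.5), (range(4, 8), 2.0)])` treats the first four positions as a slowly varying set and the last four as a rapidly varying set, so the differences concentrated in the variable set contribute less distance than they would under a single uniform correction.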
Trees constructed by the algorithm based on variability calibration have a topology markedly different from that of trees constructed from the same alignments in the absence of calibration. This is illustrated by means of trees constructed from small-ribosomal-subunit RNA sequences of Metazoa. A reconstruction of vertebrate evolution based on calibrated alignments matches the consensus view of paleontologists, contrary to trees based on uncalibrated alignments. In trees derived from sequences covering several metazoan phyla, artefacts in topology that are probably due to a high clock rate in certain lineages are avoided.