Statistical Evaluation of Genome Rearrangement
Genomic distances based on the number of rearrangement steps (inversions, transpositions, reciprocal translocations) necessary to convert the gene or segment order of one genome to that of another are potentially meaningful measures of evolutionary divergence. The significance of a comparison between two genomes, however, depends on how much it differs from the case where the order of the n segments constituting one genome is randomized with respect to the other. In this presentation, we discuss the comparison of randomized segment orders from a probabilistic and statistical viewpoint as a basis for evaluating the relationships among real genomes. The combinatorial structure containing all the information necessary to calculate the genomic distance d is the bicoloured “breakpoint graph”, essentially the union of two bipartite matchings within the set of 2n segment ends: a red matching induced by segment endpoint adjacencies in one genome and a black matching similarly determined by the other genome. The number c of alternating-colour cycles in the breakpoint graph is the key component in formulae for d. Indeed, d ≥ n − c, where equality holds for the most inclusive repertory of rearrangement types postulated to account for evolutionary divergence.
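The construction of the breakpoint graph and the bound d ≥ n − c can be illustrated with a minimal sketch. It assumes, for simplicity, that each genome is a single circular chromosome written as a list of signed integers (sign = orientation), so that each genome induces a perfect matching on the 2n segment ends. The encoding and the function names (`adjacencies`, `count_cycles`, `lower_bound`, `random_genome`) are hypothetical choices made for this illustration, not taken from the text.

```python
import random


def adjacencies(genome):
    """Matching on segment ends induced by a circular signed segment order.

    Each segment g has a tail end (g, "t") and a head end (g, "h").
    A positive segment is read tail-to-head, a negative one head-to-tail.
    """
    match = {}
    n = len(genome)
    for i in range(n):
        a, b = genome[i], genome[(i + 1) % n]
        right_of_a = (abs(a), "h") if a > 0 else (abs(a), "t")
        left_of_b = (abs(b), "t") if b > 0 else (abs(b), "h")
        match[right_of_a] = left_of_b
        match[left_of_b] = right_of_a
    return match


def count_cycles(red, black):
    """Number c of alternating-colour cycles in the union of two matchings."""
    seen = set()
    cycles = 0
    for start in red:
        if start in seen:
            continue
        cycles += 1
        v = start
        while v not in seen:
            seen.add(v)
            w = red[v]      # follow the red edge
            seen.add(w)
            v = black[w]    # then the black edge
    return cycles


def lower_bound(genome_a, genome_b):
    """The quantity n - c, a lower bound on the rearrangement distance d."""
    c = count_cycles(adjacencies(genome_a), adjacencies(genome_b))
    return len(genome_a) - c


def random_genome(n, rng):
    """Uniformly random order and orientation of segments 1..n (assumed null model)."""
    order = list(range(1, n + 1))
    rng.shuffle(order)
    return [g if rng.random() < 0.5 else -g for g in order]


identity = [1, 2, 3]
print(lower_bound(identity, identity))      # identical genomes: c = n
print(lower_bound([1, -2, 3], identity))    # one inverted segment
```

Repeating `lower_bound(random_genome(n, rng), list(range(1, n + 1)))` over many draws gives an empirical distribution of n − c under randomization, against which the value for a pair of real genomes could be compared.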