New Genome Similarity Measures Based on Conserved Gene Adjacencies
- Cite this paper as:
- Kowada L.A.B., Doerr D., Dantas S., Stoye J. (2016) New Genome Similarity Measures Based on Conserved Gene Adjacencies. In: Singh M. (eds) Research in Computational Molecular Biology. RECOMB 2016. Lecture Notes in Computer Science, vol 9649. Springer, Cham
Many important questions in molecular biology, evolution and biomedicine can be addressed by comparative genomics approaches. One of the basic tasks when comparing genomes is to define measures of similarity (or dissimilarity) between two genomes, for example, to elucidate the phylogenetic relationships between species.
The power of a genome comparison method depends on the underlying formal model of a genome. The simplest models impose the strong restriction that all genomes under study contain the same genes, each in exactly one copy. More realistic models allow several copies of a gene in a genome. One then speaks of gene families, and comparative genomics methods that accept this kind of input, where genes must first be assigned to families in a preprocessing step, are called gene family-based. The most powerful, but also most complex, models avoid this preprocessing of the input data and instead integrate the family assignment into the comparative analysis. Such methods are called gene family-free.
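To make the notion of a conserved gene adjacency concrete in the simplest model, the following is a minimal sketch and not the method of this paper. It assumes each genome is a single linear chromosome in which every gene occurs exactly once, and it ignores gene orientation. Under these assumptions, an adjacency is an unordered pair of consecutive genes, and a conserved adjacency is one that occurs in both genomes. The function names are hypothetical.

```python
def adjacencies(genome):
    """Return the unordered adjacencies of a linear genome (orientation ignored)."""
    return {frozenset(pair) for pair in zip(genome, genome[1:])}


def conserved_adjacencies(a, b):
    """Count adjacencies shared by two genomes over the same single-copy gene set."""
    return len(adjacencies(a) & adjacencies(b))


A = ["g1", "g2", "g3", "g4", "g5"]
B = ["g3", "g2", "g1", "g5", "g4"]
print(conserved_adjacencies(A, B))  # 3: {g1,g2}, {g2,g3}, {g4,g5}
```

Because each gene is unique, a set intersection suffices here. The multi-copy and family-free settings discussed in the text break this one-to-one correspondence, which is what makes them harder.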
In this paper, we study an intermediate approach between family-based and family-free genomic similarity measures. The model, called gene connections, is on the one hand more flexible than the family-based model; on the other hand, the resulting data structure is less complex than in the family-free approach. This intermediate status allows us to achieve results comparable to those of family-free methods, but at running times similar to those of the family-based approach.
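As a rough illustration of how connections between genes could generalize the single-copy count, the sketch below is hypothetical and is not one of the three measures defined in the paper. It assumes connections are given as a set of (gene in A, gene in B) pairs. It counts pairs of adjacencies, one from each genome, whose genes are linked by connections in either straight or crossed order. No one-to-one constraint is imposed, so a gene may take part in several counted pairs.

```python
def connected_adjacency_pairs(a, b, connections):
    """Hypothetical, unconstrained count of adjacency pairs preserved through connections.

    a, b: genes of two linear genomes, in order (orientation ignored).
    connections: set of (gene_in_a, gene_in_b) tuples.
    """
    count = 0
    for a1, a2 in zip(a, a[1:]):
        for b1, b2 in zip(b, b[1:]):
            # Adjacent genes in A linked to adjacent genes in B, in either order.
            straight = (a1, b1) in connections and (a2, b2) in connections
            crossed = (a1, b2) in connections and (a2, b1) in connections
            if straight or crossed:
                count += 1
    return count


A = ["a1", "a2", "a3"]
B = ["b1", "b2", "b3", "b4"]
# Gene a2 is connected to two genes in B, mimicking a multi-copy situation.
C = {("a1", "b1"), ("a2", "b2"), ("a2", "b3"), ("a3", "b4")}
print(connected_adjacency_pairs(A, B, C))  # 2
```

The naive double loop takes time proportional to the product of the genome lengths. Restricting the count so that each gene is used at most once would require choosing a consistent subset of connections, a much harder combinatorial problem.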
Within the gene connection model, we define three variants of genomic similarity measures that have different expressive power. We give polynomial-time algorithms for two of them and show that the third, most powerful one is NP-hard. We also generalize the measures and algorithms to make them more robust against recent local disruptions in gene order. Our theoretical findings are supported by experimental results that demonstrate the applicability and performance of our newly defined similarity measures.