# An approximation algorithm for genome sorting by reversals to recover all adjacencies

## Abstract

Genome rearrangement problems have been studied extensively for more than two decades, with the aim of understanding evolutionary relationships among species in terms of long-range genetic mutations at the genome level. While most earlier studies focused on simplified genomes without gene duplicates, thousands of whole-genome sequencing projects have revealed that a genome typically carries multiple gene duplicates distributed in various ways along the genome. Given a source genome and a target genome such that one is a re-ordering of the genes in the other, we measure the evolutionary distance by the minimum number of reversals that must be applied to the source genome to recover all the gene adjacencies in the target genome. We define this optimization problem as *sorting by reversals to recover all adjacencies*, or SBR2RA for short. We show that SBR2RA is APX-hard and uncover some similarities to and differences from its classic counterpart, the *sorting by reversals* problem. From the approximability perspective, we present a \(2 \alpha \)-approximation algorithm, where \(\alpha \in [1, 2]\) is the best approximation ratio for a related optimization problem that is suspected to be NP-hard.
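To make the distance measure concrete, the following is a minimal sketch rather than the algorithm from the paper. It assumes that a genome is an unsigned sequence of gene labels and that an adjacency is an unordered pair of neighbouring genes, counted with multiplicity; the abstract does not spell out these details. The function names `adjacencies`, `reverse`, and `min_reversals` are hypothetical, and the search is an exponential-time breadth-first search that is only practical for very short genomes.

```python
from collections import Counter, deque


def adjacencies(genome):
    """Multiset of unordered pairs of neighbouring genes (assumed definition)."""
    return Counter(tuple(sorted(pair)) for pair in zip(genome, genome[1:]))


def reverse(genome, i, j):
    """Return a copy of genome with the segment genome[i..j] (inclusive) reversed."""
    return genome[:i] + genome[i:j + 1][::-1] + genome[j + 1:]


def min_reversals(source, target):
    """Brute-force BFS for the fewest reversals after which the source
    has exactly the adjacency multiset of the target."""
    if sorted(source) != sorted(target):
        raise ValueError("source must be a re-ordering of the genes in target")
    goal = adjacencies(target)
    start = tuple(source)
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        genome, dist = queue.popleft()
        if adjacencies(genome) == goal:
            return dist
        n = len(genome)
        # Try every reversal of a segment of length at least 2.
        for i in range(n):
            for j in range(i + 1, n):
                nxt = reverse(genome, i, j)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))


print(min_reversals("acb", "abc"))    # one reversal of "cb" suffices
print(min_reversals("abab", "aabb"))  # genomes with duplicated genes
print(min_reversals("cba", "abc"))    # reversed genome already has all adjacencies
```

Under these assumptions the last call needs no reversal at all, because a fully reversed genome has the same adjacencies as the target. This is one way in which recovering adjacencies can differ from sorting the genome into the target order.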

## Keywords

Genome rearrangement, Sorting by reversals, Gene adjacency, Maximum matching, Alternating cycle

## Notes

### Acknowledgements

PZ is partially supported by NNSF China Grant 61672323 and NSF of Shandong Province Grant ZR2016AM28; DZ is partially supported by NNSF China Grants 61472222, 61732009, and 61761136017; WT is partially supported by funds from the Office of the Vice President for Research and Economic Development at Georgia Southern University; GL is supported by NSERC.
