The Similarity Distribution of Paralogous Gene Pairs Created by Recurrent Alternation of Polyploidization and Fractionation
We study modeling and inference problems around fractionation, the genome-wide process of losing one gene per duplicate pair following whole genome doubling (WGD). The work is motivated by the evolution of plants over many tens of millions of years, with their repeated cycles of genome doubling and fractionation. We focus on the frequency distribution of similarities between the two genes of a pair, taken over all the duplicate pairs in the genome. Our model is fully general, accounting for repeated duplication, triplication or other k-tupling events (all subsumed under the term WGD), as well as a general fractionation rate in any time period among multiple progeny of a single gene. It also has a biologically and combinatorially well-motivated way of handling the tendency for at least one sibling to survive fractionation. We show how the method reduces to previously proposed models in special cases, and how it settles unresolved questions about the expected number of gene pairs tracing their ancestry back to each WGD event.
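To make the alternation of k-tupling and fractionation concrete, the following is a minimal simulation sketch and not the model of the paper itself. It assumes a simplified retention rule that the abstract does not specify: at each event one randomly chosen copy always survives, and each of the other copies is kept independently with a per-event probability. The function names `simulate_gene_family` and `pairs_by_event` and the parameters `ploidies` and `retention` are hypothetical, introduced only for illustration.

```python
import random
from collections import Counter
from itertools import combinations


def simulate_gene_family(ploidies, retention, rng):
    """Follow one ancestral gene through successive WGD + fractionation rounds.

    ploidies[i]  : multiplicity of event i (2 = doubling, 3 = tripling, ...)
    retention[i] : assumed probability that a non-guaranteed copy survives
                   the fractionation that follows event i
    Each surviving gene is a tuple of copy indices, one per event, so the
    first position where two tuples differ identifies the event that
    created that pair.
    """
    lineages = [()]
    for r, u in zip(ploidies, retention):
        next_generation = []
        for lineage in lineages:
            # Assumption: one copy, chosen at random, always survives, which
            # enforces "at least one sibling survives fractionation".
            keeper = rng.randrange(r)
            for copy in range(r):
                if copy == keeper or rng.random() < u:
                    next_generation.append(lineage + (copy,))
        lineages = next_generation
    return lineages


def pairs_by_event(lineages):
    """Count surviving paralog pairs by the index of the event where they diverged."""
    counts = Counter()
    for a, b in combinations(lineages, 2):
        event = next(i for i, (x, y) in enumerate(zip(a, b)) if x != y)
        counts[event] += 1
    return counts


if __name__ == "__main__":
    rng = random.Random(1)
    totals = Counter()
    for _ in range(10000):  # 10,000 independent ancestral genes
        family = simulate_gene_family([3, 2], [0.4, 0.6], rng)
        totals.update(pairs_by_event(family))
    print(dict(totals))
```

Averaging the per-event counts over many ancestral genes gives an empirical estimate of the number of gene pairs tracing back to each event under the assumed retention rule. With every retention probability set to 1 no copy is lost, and a tripling followed by a doubling yields 6 genes with 12 pairs from the first event and 3 from the second.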
Keywords: Whole genome duplication, Gene loss, Birth and death process, Multinomial model, Paralog gene tree, Sequence divergence
Research supported in part by grants from the Natural Sciences and Engineering Research Council of Canada. DS holds the Canada Research Chair in Mathematical Genomics.