The Similarity Distribution of Paralogous Gene Pairs Created by Recurrent Alternation of Polyploidization and Fractionation

  • Yue Zhang
  • David Sankoff
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10562)


We study modeling and inference problems around fractionation, the genome-wide process of losing one gene per duplicate pair following whole genome doubling (WGD). The work is motivated by the evolution of plants over many tens of millions of years, with their repeated cycles of genome doubling and fractionation. We focus on the frequency distribution of similarities between the two genes of a pair, taken over all the duplicate pairs in the genome. Our model is fully general: it accounts for repeated duplication, triplication or other k-tupling events (all subsumed under the term WGD), and it allows a general fractionation rate in any time period among the multiple progeny of a single gene. It also has a biologically and combinatorially well-motivated way of handling the tendency for at least one sibling to survive fractionation. We show how the method reduces to previously proposed models in special cases, and how it settles unresolved questions about the expected number of gene pairs tracing their ancestry back to each WGD event.


Whole genome duplication, Gene loss, Birth and death process, Multinomial model, Paralog gene tree, Sequence divergence



Research supported in part by grants from the Natural Sciences and Engineering Research Council of Canada. DS holds the Canada Research Chair in Mathematical Genomics.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. University of Ottawa, Ottawa, Canada
