Reconstructing Histories of Complex Gene Clusters on a Phylogeny

  • Tomáš Vinař
  • Broňa Brejová
  • Giltae Song
  • Adam Siepel
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5817)


Clusters of genes that have evolved by repeated segmental duplication present difficult challenges throughout genomic analysis, from sequence assembly to functional analysis. These clusters are one of the major sources of evolutionary innovation, and they are linked to multiple diseases, including HIV and a variety of cancers. Understanding their evolutionary histories is key to applying comparative genomics methods in these regions of the genome. We propose a probabilistic model of gene cluster evolution on a phylogeny, and an MCMC algorithm for reconstructing duplication histories from genomic sequences in multiple species. Several projects are underway to obtain high-quality BAC-based assemblies of duplicated clusters in multiple species, and we anticipate that our methods will be used in their analysis. Supplementary materials are located at


Branch Length, Atom Type, Guide Tree, Ancestral Sequence, Segment Type
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Tomáš Vinař (1)
  • Broňa Brejová (1)
  • Giltae Song (2)
  • Adam Siepel (3)

  1. Faculty of Mathematics, Physics and Informatics, Comenius University, Bratislava, Slovakia
  2. Center for Comparative Genomics and Bioinformatics, 506B Wartik Lab, Penn State University, University Park, USA
  3. Dept. of Biological Statistics and Comp. Biology, Cornell University, Ithaca, USA
