Journal of Molecular Evolution, Volume 69, Issue 1, pp 81–93

Evolutionary Dynamics of Recently Duplicated Genes: Selective Constraints on Diverging Paralogs in the Drosophila pseudoobscura Genome


DOI: 10.1007/s00239-009-9254-1

Cite this article as:
Meisel, R.P. J Mol Evol (2009) 69: 81. doi:10.1007/s00239-009-9254-1


Duplicated genes produce genetic variation that can influence the evolution of genomes and phenotypes. In most cases, for a duplicated gene to contribute to evolutionary novelty it must survive the early stages of divergence from its paralog without becoming a pseudogene. I examined the evolutionary dynamics of recently duplicated genes in the Drosophila pseudoobscura genome to understand the factors affecting these early stages of evolution. Paralogs located in closer proximity have higher sequence identity. This suggests that gene conversion occurs more often between duplications in close proximity or that there is more genetic independence between distant paralogs. Partially duplicated genes have a higher likelihood of pseudogenization than completely duplicated genes, but no single factor significantly contributes to the selective constraints on a completely duplicated gene. However, DNA-based duplications and duplications within chromosome arms tend to produce longer duplication tracts than retroposed and inter-arm duplications, and longer duplication tracts are more likely to contain a completely duplicated gene. Therefore, the relative position of paralogs and the mechanism of duplication indirectly affect whether a duplicated gene is retained or pseudogenized.


Keywords: Drosophila · Gene duplication · Pseudogene · Copy number polymorphism

Supplementary material

239_2009_9254_MOESM1_ESM.pdf (303 kb)
Supplementary Figure S1. Relative rates of amino acid evolution for ancestral and derived paralogs. Estimated amino acid substitutions along the ancestral (dark gray) and derived (white) lineages are graphed for each duplicated gene in the dataset; each pair of bars represents the ancestral and derived copy of a duplicated gene, respectively. Paralogs for which the ancestral and derived copy cannot be distinguished are indicated by two gray bars. Paralogs for which the derived copy contains (A) a partial coding sequence and (B) a complete coding sequence are shown, and the status of the open reading frame of the derived copy is indicated below the X-axis. A relative rate test based on a chi-square test was used to determine if the difference in amino acid substitutions in the ancestral and derived copy of each duplicated gene is significant (Tajima 1993). Paralogs for which the test could not be performed (because of too few substitutions between the paralogs) are indicated by the black bars labeled “N/A”. Paralogs for which the test is significant at P < 0.05 are indicated by a single asterisk and paralogs for which the test is significant at P < 0.005 are indicated by two asterisks. Two completely duplicated genes in which the derived copy is evolving significantly faster than the ancestral copy are labeled. Supplementary material 1 (PDF 304 kb)
239_2009_9254_MOESM2_ESM.pdf (271 kb)
Supplementary Figure S2. Tissue expression of D. melanogaster orthologs of completely duplicated genes. Tissue expression data were retrieved for the D. melanogaster orthologs of each completely duplicated gene in the D. pseudoobscura genome (Chintapalli et al. 2007). For each tissue, the number of non-degenerated and degenerated genes expressed and not expressed in that tissue is graphed. A single asterisk indicates a significant departure from independence between degeneration and expression using a G test with P < 0.05. Supplementary material 2 (PDF 272 kb)
239_2009_9254_MOESM3_ESM.pdf (27 kb)
Supplementary material 3 (PDF 27 kb)
239_2009_9254_MOESM4_ESM.xls (16 kb)
Supplementary Table S1. Copy number polymorphism primers. Supplementary material 4 (XLS 17 kb)
239_2009_9254_MOESM5_ESM.xls (66 kb)
Supplementary Table S2. Data on each duplicated gene. Supplementary material 5 (XLS 67 kb)
(198 kb)
Supplementary Data. Annotated alignments of duplicated genes. Alignments are in the MEGA format, with coding and non-coding sequences annotated. Ancestral copies are indicated by “anc” in the sequence name, and derived copies are indicated by “dup” in the sequence name. For duplicated genes where the ancestral and derived copies could not be determined, the two copies are named “copyA” and “copyB”. In some alignments containing multiple genes whose coding sequences lie in opposite orientations, coding sequences were reverse complemented relative to the rest of the aligned sequence so that all open reading frames are kept in the proper orientation. The following genes were reverse complemented relative to the rest of the aligned sequence: CG15287, CG14860, CG8016, CG8589, CG13190, CG13063, CG11070, CG16734, CG16983. Supplementary material 6 (ZIP 198 kb)
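
The captions of Supplementary Figures S1 and S2 name two tests: a chi-square relative rate test (Tajima 1993) comparing substitutions on the ancestral and derived paralog, and a G test of independence between degeneration and tissue expression. The Python sketch below illustrates how each statistic can be computed; it is not the author's code. It assumes an aligned outgroup sequence is available for the relative rate test (the captions do not say which outgroup was used), it uses the one-degree-of-freedom "1D" form of Tajima's test, and it skips gap sites. The function names (`tajima_1d_test`, `g_test_2x2`, `chi2_sf_1df`) and the example sequences are hypothetical.

```python
import math
from typing import List, Optional, Tuple


def chi2_sf_1df(x: float) -> float:
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    # For 1 df, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))


def tajima_1d_test(
    seq1: str, seq2: str, outgroup: str
) -> Optional[Tuple[int, int, float, float]]:
    """Sketch of Tajima's (1993) 1D relative rate test on three aligned sequences.

    m1 counts sites where seq1 carries a unique state (seq2 == outgroup != seq1);
    m2 counts sites where seq2 carries a unique state (seq1 == outgroup != seq2).
    Returns (m1, m2, chi2, p), or None when there are no informative sites,
    i.e. the test cannot be performed.
    """
    if not (len(seq1) == len(seq2) == len(outgroup)):
        raise ValueError("sequences must be aligned to equal length")
    m1 = m2 = 0
    for a, b, c in zip(seq1, seq2, outgroup):
        if "-" in (a, b, c):
            continue  # assumption: alignment gaps are excluded
        if a != b and b == c:
            m1 += 1  # change on the seq1 lineage
        elif a != b and a == c:
            m2 += 1  # change on the seq2 lineage
    if m1 + m2 == 0:
        return None
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    return m1, m2, chi2, chi2_sf_1df(chi2)


def g_test_2x2(table: List[List[int]]) -> Tuple[float, float]:
    """G test of independence for a 2 x 2 table of observed counts (1 df).

    Rows could be degenerated / non-degenerated genes and columns expressed /
    not expressed in a tissue (a hypothetical layout).
    """
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    g = 0.0
    for i in range(2):
        for j in range(2):
            obs = table[i][j]
            if obs > 0:  # empty cells contribute 0 to the sum
                expected = rows[i] * cols[j] / n
                g += obs * math.log(obs / expected)
    g *= 2.0
    return g, chi2_sf_1df(g)


if __name__ == "__main__":
    # Hypothetical derived copy, ancestral copy, and outgroup protein fragments.
    result = tajima_1d_test("MKVLSAGTPY", "MKVLAAGTTY", "MKVLAAGTTY")
    print(result)  # m1 = 2, m2 = 0, chi2 = 2.0
    print(g_test_2x2([[10, 0], [0, 10]]))
```

Both statistics are compared against a chi-square distribution with one degree of freedom, so one survival function serves both; for 1 df it reduces to the complementary error function, which keeps the sketch free of third-party dependencies.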

Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  1. Intercollege Graduate Program in Genetics and Department of Biology, The Pennsylvania State University, University Park, USA
  2. Department of Molecular Biology and Genetics, Cornell University, Ithaca, USA