Journal of Molecular Evolution, Volume 74, Issue 1–2, pp 96–111

Evolutionary Genomics of Colias Phosphoglucose Isomerase (PGI) Introns



Little is known about intron sequence variation in cases where the coding regions of eukaryotic genes undergo strong balancing selection. Phosphoglucose isomerase (PGI) of Colias butterflies offers such a case. Its 11 introns include many point mutations, insertions, and deletions. This variation changes with intron position and length, and may leave little evidence of homology within introns except for their first and last few basepairs. Intron position is conserved between PGIs of Colias and the silkmoth, but no intron sequence homology remains. Percent GC content and length are functional properties of introns that can affect whole-gene transcription; we find a relationship between these properties which may indicate selection on transcription speed. Intragenic recombination is active in these introns, as in coding sequences. The small extent of linkage disequilibrium (LD) in the introns decays over a few hundred basepairs. Subsequences of Colias introns match subsequences of other introns, untranslated regions of cDNAs, and insect-related transposons and pathogens, showing that a diverse pool of sequence fragments is the source of intron contents via turnover due to deletion, recombination, and transposition. Like Colias PGI’s coding sequences, the introns evolve reticulately with little phylogenetic signal. Exceptions are coding-region allele clades defined by multiple amino acid variants in strong LD, whose introns are closely related but less so than their exons. Similarity of GC content between introns and flanking exons, lack of small introns despite mutational bias toward deletion, and findings already mentioned suggest constraining selection on introns, possibly balancing transcription performance against advantages of higher recombination rate conferred by intron length.
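For readers unfamiliar with the two quantities the abstract relies on, percent GC content and linkage disequilibrium, the following is a minimal illustrative sketch in Python. It is not the authors' analysis pipeline. The function names `gc_percent` and `r_squared`, the toy haplotypes, and the choice of r² as the LD statistic are all assumptions made here for illustration only.

```python
def gc_percent(seq):
    """Percent G+C among the unambiguous bases (A, C, G, T) of a sequence.

    Gap characters and ambiguity codes are ignored (an assumption of this sketch).
    """
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def r_squared(haplotypes, i, j):
    """LD statistic r^2 between biallelic sites i and j of phased haplotypes.

    Returns None if either site is monomorphic in the sample.
    """
    n = len(haplotypes)
    a = haplotypes[0][i]  # allele at site i in the first haplotype, used as reference
    b = haplotypes[0][j]  # allele at site j in the first haplotype, used as reference
    p_a = sum(h[i] == a for h in haplotypes) / n
    p_b = sum(h[j] == b for h in haplotypes) / n
    p_ab = sum(h[i] == a and h[j] == b for h in haplotypes) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None
    d = p_ab - p_a * p_b  # coefficient of disequilibrium D
    return d * d / denom


# Hypothetical toy data: four phased haplotypes, two variable sites each.
linked = ["AC", "AC", "GT", "GT"]    # alleles always co-occur
unlinked = ["AC", "AT", "GC", "GT"]  # alleles assort independently
```

With these toy inputs, `r_squared(linked, 0, 1)` describes complete association and `r_squared(unlinked, 0, 1)` describes none. Plotting r² against the distance in basepairs between site pairs is one common way to visualize LD decay of the kind described above.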


Keywords: Molecular polymorphism; Complex haplotypes; Natural variation; Intron evolution; Glycolysis; Linkage disequilibrium; Intragenic recombination; Transposable elements



We thank Carol Boggs, Mike Bramson, Jason Hill, Jen Johnson, Martin Kreitman, Mark Longo, Dmitri Petrov, Steve Palumbi, and Chris Wheat for comments on the paper or other helpful discussions. We also thank Chris Aakre, Will Bassett, Nina Duong, Daniel Herrador, Alejandro Perez, and Eddie Wang for technical assistance. This work was supported by US National Science Foundation grants DEB 05-20315 and MCB 08-46870 to WBW. Our results do not represent official policy of any agency or corporate entity.




Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  1. Department of Biology, Stanford University, Stanford, USA
