Analysis of phylogenetic signal in protostomial intron patterns using Mutual Information

Abstract

Many deep evolutionary divergences remain unresolved, such as those among the major taxa of the Lophotrochozoa. The intron–exon structure of eukaryotic genomes and the patterns of presence and absence of spliceosomal introns appear to be promising alternative phylogenetic markers. However, because intron presence is potentially homoplastic, the phylogenetic analysis of these data with standard evolutionary approaches has remained a challenge. Here, we used Mutual Information (MI) to estimate the phylogeny of Protostomia from gene structure data, and we compared the results with those obtained with Dollo Parsimony. Using full genome sequences from nine Metazoa, we identified 447 groups of orthologous sequences with 21,732 introns in 4,870 unique intron positions. We determined the shared presence and absence of introns in the corresponding sequence alignments and have made these data available in “IntronBase”, a web-accessible and downloadable SQLite database. The results obtained with Dollo Parsimony are clearly misled by systematic errors that arise from multiple intron loss events, although extensive filtering of the data improved the quality of the estimated phylogenies. Mutual Information, in contrast, performs better with larger data sets, but it requires a complete data set, which is difficult to obtain for orthologs from a large number of taxa. Nevertheless, Mutual Information-based distances proved useful for analyzing this kind of data, in part because their estimation is independent of evolutionary models, so that ancestral and derived character states do not have to be defined in advance.
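To make the idea of an MI-based distance on intron presence/absence data concrete, the following is a minimal Python sketch. It is an illustration only, not the procedure used in the article: the abstract does not state the distance formula, so the normalization d = 1 − I(X;Y)/H(X,Y) is an assumption, as are the function names `entropy`, `mi_distance` and `complete_columns`, the taxon labels, and the use of `None` for unscored positions. The helper `complete_columns` reflects the requirement for a complete data set by keeping only intron positions scored in every taxon.

```python
import math
from collections import Counter


def entropy(counts, total):
    """Shannon entropy in bits of a distribution given as raw counts."""
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def mi_distance(x, y):
    """Assumed distance 1 - I(X;Y)/H(X,Y) between two equal-length 0/1 vectors.

    x and y hold intron presence (1) or absence (0) at the same positions.
    """
    if len(x) != len(y):
        raise ValueError("vectors must cover the same intron positions")
    n = len(x)
    h_x = entropy(Counter(x).values(), n)
    h_y = entropy(Counter(y).values(), n)
    h_xy = entropy(Counter(zip(x, y)).values(), n)
    if h_xy == 0:
        # Both vectors are constant, so nothing separates them.
        return 0.0
    mutual_information = h_x + h_y - h_xy
    return 1.0 - mutual_information / h_xy


def complete_columns(matrix):
    """Keep only positions that are scored (not None) in every taxon."""
    taxa = list(matrix)
    n_positions = len(matrix[taxa[0]])
    keep = [
        i for i in range(n_positions)
        if all(matrix[t][i] is not None for t in taxa)
    ]
    return {t: [matrix[t][i] for i in keep] for t in taxa}


if __name__ == "__main__":
    # Hypothetical toy data: three taxa, six intron positions, None = unscored.
    toy = {
        "taxon_A": [1, 0, 1, 1, None, 0],
        "taxon_B": [1, 0, 1, 0, 1, 0],
        "taxon_C": [0, 1, 0, 1, 1, None],
    }
    filtered = complete_columns(toy)
    names = list(filtered)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            print(a, b, round(mi_distance(filtered[a], filtered[b]), 3))
```

Because this measure depends only on how the states co-occur, relabeling 0 and 1 in one vector leaves the distance unchanged. This is one way to see why no ancestral or derived state has to be designated in advance. A matrix of such pairwise distances could then be passed to any distance-based tree-building method.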

Acknowledgments

We thank Jörg Lehmann for discussion and Ingo Ebersberger for providing core orthologs and HMM profiles for the HaMStR search.

Author information

Corresponding author

Correspondence to Stefanie Hartmann.

Additional information

The work was supported by the German Science Foundation (DFG) special priority program “Deep Metazoan Phylogeny” SP1174 HA5744/1-1.

About this article

Cite this article

Hill, N., Leow, A., Bleidorn, C. et al. Analysis of phylogenetic signal in protostomial intron patterns using Mutual Information. Theory Biosci. 132, 93–104 (2013). https://doi.org/10.1007/s12064-012-0173-0

  • DOI: https://doi.org/10.1007/s12064-012-0173-0

Keywords

  • Mutual Information
  • Evolution
  • Gene structure