Abstract
Many deep evolutionary divergences remain unresolved, such as those among the major taxa of the Lophotrochozoa. The intron–exon structure of eukaryotic genomes, and in particular the pattern of presence and absence of spliceosomal introns, appears to be a promising alternative source of phylogenetic markers. However, because intron presence is potentially homoplastic, analyzing these data with standard phylogenetic approaches has remained a challenge. Here, we used Mutual Information (MI) to estimate the phylogeny of Protostomia from gene structure data, and we compared the results with those obtained with Dollo Parsimony. Using complete genome sequences from nine metazoan species, we identified 447 groups of orthologous sequences containing 21,732 introns at 4,870 unique intron positions. We scored the shared presence and absence of introns in the corresponding sequence alignments and have made these data available in “IntronBase”, a web-accessible and downloadable SQLite database. Our Dollo Parsimony results are evidently misled by systematic errors that arise from multiple intron loss events, although extensive filtering of the data improved the quality of the estimated phylogenies. Mutual Information, in contrast, performs better with larger datasets, but it requires a complete data set, which is difficult to obtain for orthologs from a large number of taxa. Nevertheless, MI-based distances proved useful for analyzing this kind of data, not least because their estimation is independent of evolutionary models and therefore requires no prior definition of ancestral and derived character states.
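To illustrate how a distance can be derived from mutual information between intron presence/absence patterns, here is a minimal Python sketch, not the authors' implementation. The abstract does not give the exact MI-based distance, so this sketch assumes the normalized variation of information, 1 - I(X;Y)/H(X,Y). It also assumes introns are coded 1 (present) and 0 (absent), with None marking a position that cannot be scored. The function names `mi_distance` and `distance_matrix` and the toy taxa are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def entropy(counts, n):
    """Shannon entropy (bits) of the empirical distribution in `counts`."""
    return -sum((c / n) * math.log2(c / n) for c in counts.values() if c > 0)


def mi_distance(x, y):
    """Hypothetical MI-based distance between two intron patterns.

    x, y: sequences over the same intron positions, coded 1 (present),
    0 (absent) or None (not scorable). Positions with None in either taxon
    are dropped, so only complete pairs are used.
    Returns the normalized variation of information 1 - I(X;Y)/H(X,Y),
    or NaN when both patterns are constant over the shared positions.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n == 0:
        raise ValueError("no intron position is scored in both taxa")
    h_x = entropy(Counter(a for a, _ in pairs), n)
    h_y = entropy(Counter(b for _, b in pairs), n)
    h_xy = entropy(Counter(pairs), n)
    if h_xy == 0:
        return math.nan  # only one joint state observed: normalized MI undefined
    mutual_info = h_x + h_y - h_xy  # I(X;Y) = H(X) + H(Y) - H(X,Y)
    # Clamp floating-point rounding so the result stays within [0, 1].
    return min(1.0, max(0.0, 1.0 - mutual_info / h_xy))


def distance_matrix(patterns):
    """Symmetric matrix (dict of dicts) of pairwise MI distances between taxa."""
    taxa = sorted(patterns)
    dist = {t: {t: 0.0} for t in taxa}
    for a, b in combinations(taxa, 2):
        dist[a][b] = dist[b][a] = mi_distance(patterns[a], patterns[b])
    return dist


# Toy presence/absence patterns for three hypothetical taxa.
toy = {
    "taxon_A": [1, 1, 0, 0, 1, None],
    "taxon_B": [1, 1, 0, 0, 1, 1],
    "taxon_C": [1, 0, 1, 0, 0, 1],
}
for row, cols in distance_matrix(toy).items():
    print(row, {k: round(v, 3) for k, v in cols.items()})
```

Mutual information measures statistical dependence rather than identity, so swapping the 0/1 coding leaves the distance unchanged. For example, [1, 0, 1, 0] and [0, 1, 0, 1] are at distance 0. This matches the point that no ancestral or derived state has to be defined. Dropping positions with missing entries also shows why a complete data set matters: every gap shrinks the sample from which the entropies are estimated. The resulting matrix could then be passed to a distance-based tree-building method such as neighbor joining.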

Acknowledgments
We thank Jörg Lehmann for discussions and Ingo Ebersberger for providing core orthologs and HMM profiles for the HaMStR search.
Additional information
This work was supported by the German Research Foundation (DFG) special priority program “Deep Metazoan Phylogeny” SP1174 HA5744/1-1.
Cite this article
Hill, N., Leow, A., Bleidorn, C. et al. Analysis of phylogenetic signal in protostomial intron patterns using Mutual Information. Theory Biosci. 132, 93–104 (2013). https://doi.org/10.1007/s12064-012-0173-0