Improved Core Genes Prediction for Constructing Well-Supported Phylogenetic Trees in Large Sets of Plant Species

  • Bassam AlKindy
  • Huda Al-Nayyef
  • Christophe Guyeux
  • Jean-François Couchot
  • Michel Salomon
  • Jacques M. Bahi
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9043)

Abstract

Inferring well-supported phylogenetic trees that precisely reflect the evolutionary process is a challenging task that depends entirely on how the related core genes have been found. Previous computational biology studies have proposed many similarity-based algorithms for finding them, most of which rely on computing sequence alignment matrices. In such approaches, a significantly high similarity score between two coding sequences extracted by a given annotation tool is taken to mean that they are the same gene. In a previous article, we presented a quality test approach (QTA) that improves the quality of core genes by combining two annotation tools: NCBI, a partially human-curated database, and DOGMA, an efficient annotation algorithm for chloroplasts. This method takes advantage of both sequence similarity and gene features to guarantee that the core genome contains correct and well-clustered coding sequences (i.e., genes). In this article, we show how useful such well-defined core genes are for biomolecular phylogenetic reconstruction by investigating various subsets of core genes at the family or genus level, which leads to subtrees with strong bootstrap support that are finally merged into a well-supported supertree.
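
The idea of keeping only genes on which two annotation sources agree, and then intersecting the result across genomes, can be illustrated with a minimal Python sketch. This is not the QTA implementation from the paper: the function names, the 0.9 threshold, the toy sequences, and the use of `difflib.SequenceMatcher` as a stand-in for an alignment-based similarity score are all assumptions made for illustration only.

```python
from difflib import SequenceMatcher


def similarity(seq_a: str, seq_b: str) -> float:
    """Crude similarity in [0, 1]; a stand-in for a real alignment score."""
    return SequenceMatcher(None, seq_a, seq_b).ratio()


def quality_filtered_genes(annot_a: dict, annot_b: dict, threshold: float = 0.9) -> set:
    """Gene names present in both annotations whose sequences are similar enough.

    annot_a and annot_b map gene name -> coding sequence for one genome,
    as produced by two hypothetical annotation sources.
    """
    shared_names = annot_a.keys() & annot_b.keys()
    return {
        name for name in shared_names
        if similarity(annot_a[name], annot_b[name]) >= threshold
    }


def core_genes(genomes: dict, threshold: float = 0.9) -> set:
    """Intersect the quality-filtered gene sets of every genome."""
    per_genome = [
        quality_filtered_genes(annot_a, annot_b, threshold)
        for annot_a, annot_b in genomes.values()
    ]
    return set.intersection(*per_genome) if per_genome else set()


# Toy data: gene names are real chloroplast gene names, sequences are made up.
toy_genomes = {
    "species_A": (
        {"rbcL": "ATGTCACCACAAACAGAG", "psbA": "ATGACTGCAATTTTAGAG", "matK": "ATGGAGGAATTC"},
        {"rbcL": "ATGTCACCACAAACAGAG", "psbA": "ATGACTGCAATTTTAGAG", "matK": "TTTTTTTTTTTT"},
    ),
    "species_B": (
        {"rbcL": "ATGTCACCACAAACTGAG", "psbA": "ATGACTGCAATTTTAGAA"},
        {"rbcL": "ATGTCACCACAAACTGAG", "psbA": "ATGACTGCAATTTTAGAA"},
    ),
}

print(sorted(core_genes(toy_genomes)))  # ['psbA', 'rbcL']
```

In this toy case `matK` is dropped because the two annotations of species_A disagree on its sequence and species_B does not annotate it at all, so only `psbA` and `rbcL` remain in the core set.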

Keywords

Quality test, Phylogenetic tree, Bootstrap, RAxML, Core genome, Core genes, Supertree

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Bassam AlKindy (1, 2)
  • Huda Al-Nayyef (1, 2)
  • Christophe Guyeux (1)
  • Jean-François Couchot (1)
  • Michel Salomon (1)
  • Jacques M. Bahi (1)

  1. FEMTO-ST Institute, UMR 6174 CNRS, Department of Computer Science for Complex Systems (DISC), University of Franche-Comté, France
  2. Department of Computer Science, University of Mustansiriyah, Baghdad, Iraq