
Bioinformatics Protocols for Quickly Obtaining Large-Scale Data Sets for Phylogenetic Inferences

  • Hugo López-Fernández
  • Pedro Duque
  • Sílvia Henriques
  • Noé Vázquez
  • Florentino Fdez-Riverola
  • Cristina P. Vieira
  • Miguel Reboiro-Jato
  • Jorge Vieira
Original Research Article

Abstract

Analysing all available genome datasets, rather than only the few usually analysed (which typically belong to model species), can provide useful insight into the evolution of genes and gene families. However, handling these datasets and transforming them into the format required for downstream analyses is often a difficult and time-consuming task for researchers without a background in informatics. We therefore present two simple and fast data-preparation protocols that use SEDA (http://www.sing-group.org/seda/index.html), an easy-to-install, open-source, cross-platform application with a rich, user-friendly graphical user interface. The first protocol substantially improves on one we recently published (López-Fernández et al. Practical applications of computational biology and bioinformatics, 12th International conference. Springer, Cham, pp 88–96 (2019) [1]), which was used to study the evolution of GULO, the gene encoding the enzyme that catalyses the last step of vitamin C synthesis. Here we show that the sequence file used for the phylogenetic analyses can now be obtained much faster by changing the way coding sequence isoforms are removed, using the newly implemented SEDA operation “Remove isoforms”. With this protocol it is easy to show that putative functional GULO genes are present in several Protostomian groups, such as Molluscs, Priapulida and Arachnida. Such findings could easily have been missed if only a few Protostomian model species had been used. The second protocol allowed us to identify positively selected amino acid sites in a set of 19 primate HLA immunity genes. Interestingly, the proteins encoded by MHC class II genes can show as many positively selected amino acid sites as those encoded by classical MHC class I genes. Although a significant percentage of codons (up to 14.8%) evolve under positive selection, the main mode of evolution of HLA immunity genes is purifying selection.
Using a large number of primate species lowers the probability of failing to identify positively selected amino acid sites. Both projects were completed in less than one week, most of which was spent running the analyses rather than preparing the files. These protocols can easily be adapted to answer many other questions using a phylogenetic approach.
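The abstract does not describe how the “Remove isoforms” operation decides which coding sequence to keep for each gene. Purely as an illustration of the general idea of collapsing isoforms before a phylogenetic analysis, the sketch below assumes that every FASTA header carries a hypothetical `gene=` tag and keeps the longest coding sequence per gene. This is a simplified stand-in written for this page, not SEDA's implementation or its selection criteria; the function names `parse_fasta` and `keep_longest_isoform` are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical header convention (an assumption, not SEDA's input format):
# a token such as "gene=GULO" names the gene each CDS isoform belongs to.
GENE_TAG = re.compile(r"gene=(\S+)")

Record = Tuple[str, str]  # (header without ">", sequence)


def parse_fasta(text: str) -> List[Record]:
    """Parse FASTA text into (header, sequence) pairs, joining wrapped lines."""
    records: List[Record] = []
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def keep_longest_isoform(records: List[Record]) -> List[Record]:
    """Keep one record per gene: the longest sequence (the first one seen wins ties).

    Genes are reported in the order in which they first appear in the input.
    """
    best: Dict[str, Record] = {}
    order: List[str] = []
    for header, seq in records:
        match = GENE_TAG.search(header)
        # Records without a gene tag are treated as a group of their own.
        gene = match.group(1) if match else header
        if gene not in best:
            order.append(gene)
            best[gene] = (header, seq)
        elif len(seq) > len(best[gene][1]):
            best[gene] = (header, seq)
    return [best[g] for g in order]


if __name__ == "__main__":
    fasta = """>XM_1.1 gene=GULO
ATGGCC
>XM_2.1 gene=GULO
ATGGCCTTTAAA
>XM_3.1 gene=ACTB
ATGTAA
"""
    for header, seq in keep_longest_isoform(parse_fasta(fasta)):
        print(header, len(seq))
    # prints:
    # XM_2.1 gene=GULO 12
    # XM_3.1 gene=ACTB 6
```

Keeping the longest isoform is only one plausible rule; real pipelines may instead prefer a reference-length isoform or a curated transcript, so the selection step is isolated in a single function that can be swapped out.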

Keywords

Large scale analyses; GULO; HLA; Animals; SEDA; Positive selection

Notes

Acknowledgements

This article is a result of the project Norte-01-0145-FEDER-000008-Porto Neurosciences and Neurologic Disease Research Initiative at I3S, supported by Norte Portugal Regional Operational Programme (NORTE 2020), under the PORTUGAL 2020 Partnership Agreement, through the European Regional Development Fund (FEDER). SING group thanks CITI (Centro de Investigación, Transferencia e Innovación) from University of Vigo for hosting its IT infrastructure. This work was partially funded by Consellería de Cultura, Educación e Ordenación Universitaria (Xunta de Galicia) and FEDER (European Union). H. López-Fernández is supported by a post-doctoral fellowship from Xunta de Galicia (ED481B 2016/068-0).

Supplementary material

Supplementary material 1: 12539_2018_312_MOESM1_ESM.pdf (PDF, 730 KB)

References

  1. López-Fernández H, Duque P, Henriques S, Vázquez N, Fdez-Riverola F, Vieira CP, Reboiro-Jato M, Vieira J (2019) A bioinformatics protocol for quickly creating large-scale phylogenetic trees. In: Fdez-Riverola F, Mohamad MS, Rocha M, De Paz JF, González P (eds) Practical applications of computational biology and bioinformatics, 12th International conference. Springer, Cham, pp 88–96
  2. Wintergerst ES, Maggini S, Hornig DH (2006) Immune-enhancing role of vitamin C and zinc and effect on clinical conditions. Ann Nutr Metab 50:85–94
  3. Englard S, Seifter S (1986) The biochemical functions of ascorbic acid. Annu Rev Nutr 6:365–406
  4. Hansen S, Tveden-Nyborg P, Lykkesfeldt J (2014) Does vitamin C deficiency affect cognitive development and function? Nutrients 6:3818–3846
  5. Drouin G, Godin J-R, Page B (2011) The genetics of vitamin C loss in vertebrates. Curr Genom 12:371–378
  6. Klein J, Huigin C, Deutsch J (1994) MHC polymorphism and parasites. Philos Trans R Soc Lond B Biol Sci 346:351–358
  7. Hedrick PW (2002) Pathogen resistance and genetic variation at MHC loci. Evolution 56:1902–1908
  8. Pyo C-W, Williams LM, Moore Y, Hyodo H, Li SS, Zhao LP, Sageshima N, Ishitani A, Geraghty DE (2006) HLA-E, HLA-F, and HLA-G polymorphism: genomic sequence defines haplotype structure and variation spanning the nonclassical class I genes. Immunogenetics 58:241–251
  9. Pierini F, Lenz TL (2018) Divergent allele advantage at human MHC genes: signatures of past and ongoing selection. Mol Biol Evol 35:2145–2158
  10. Vandiedonck C, Knight JC (2009) The human major histocompatibility complex as a paradigm in genomics research. Brief Funct Genom Proteom 8:379–394
  11. Hewitt EW (2003) The MHC class I antigen presentation pathway: strategies for viral immune evasion. Immunology 110:163–169
  12. Roche PA, Furuta K (2015) The ins and outs of MHC class II-mediated antigen processing and presentation. Nat Rev Immunol 15:203–216
  13. Leferink NGH, Jose MDF, van den Berg WAM, van Berkel WJH (2009) Functional assignment of Glu386 and Arg388 in the active site of L-galactono-γ-lactone dehydrogenase. FEBS Lett 583:3199–3203
  14. Reboiro-Jato D, Reboiro-Jato M, Fdez-Riverola F, Vieira CP, Fonseca NA, Vieira J (2012) ADOPS–automatic detection of positively selected sites. J Integr Bioinform 9:200
  15. Kumar S, Stecher G, Tamura K (2016) MEGA7: Molecular Evolutionary Genetics Analysis Version 7.0 for bigger datasets. Mol Biol Evol 33:1870–1874
  16. Vázquez N, Vieira CP, Amorim BSR, Torres A, López-Fernández H, Fdez-Riverola F, Sousa JLR, Reboiro-Jato M, Vieira J (2018) Large scale analyses and visualization of adaptive amino acid changes projects. Interdiscip Sci Comput Life Sci 10:24–32
  17. Geraghty DE (1990) Human leukocyte antigen F (HLA-F). An expressed HLA gene composed of a class I coding sequence linked to a novel transcribed repetitive element. J Exp Med 171:1–18

Copyright information

© International Association of Scientists in the Interdisciplinary Areas 2018

Authors and Affiliations

  • Hugo López-Fernández (1, 2, 3, 4, 5)
  • Pedro Duque (4, 5, 6)
  • Sílvia Henriques (4, 5)
  • Noé Vázquez (1, 2)
  • Florentino Fdez-Riverola (1, 2, 3)
  • Cristina P. Vieira (4, 5)
  • Miguel Reboiro-Jato (1, 2, 3)
  • Jorge Vieira (4, 5)
  1. ESEI - Escuela Superior de Ingeniería Informática, Universidad de Vigo, Ourense, Spain
  2. Centro de Investigaciones Biomédicas (Centro Singular de Investigación de Galicia), Vigo, Spain
  3. SING Research Group, Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Vigo, Spain
  4. Instituto de Investigação e Inovação em Saúde (I3S), Universidade do Porto, Porto, Portugal
  5. Instituto de Biologia Molecular e Celular (IBMC), Porto, Portugal
  6. Faculdade de Ciências, Universidade do Porto, Porto, Portugal
