Abstract
Advances in next-generation sequencing technologies and falling sequencing costs have increased the amount of transcriptome data generated each year. These data have great potential for identifying genes and molecular pathways of interest across a wide range of organisms. However, navigating these resources requires some skills in bioinformatics and evolutionary biology. Here, we describe a protocol for mining transcriptome data for genes of interest, from the creation of a protein database to the inference of phylogenetic trees. The protocol was applied to marine protists but can serve as a general pipeline across different taxa.
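As a rough illustration of the stages named in the abstract (protein database, homology search, alignment, trimming, tree inference), the sketch below only assembles command lines and parses tabular BLAST output. The abstract does not name tools or settings, so BLAST+, MUSCLE, trimAl, and IQ-TREE, along with every file name, the E-value cutoff, and the helper names `pipeline_commands` and `hit_ids`, are assumptions made for this example and may differ from the chapter's own protocol.

```python
from pathlib import Path


def pipeline_commands(proteins, query, workdir="mining"):
    """Build (but do not run) one command per pipeline stage.

    Tool choice, file names and thresholds are illustrative assumptions.
    """
    w = Path(workdir)
    db, hits, ids = w / "protein_db", w / "hits.tsv", w / "hit_ids.txt"
    seqs, aln = w / "homologs.fasta", w / "homologs.aln"
    trimmed = w / "homologs.trimmed.aln"
    return [
        # 1. Local protein database built from translated transcripts.
        ["makeblastdb", "-in", str(proteins), "-dbtype", "prot",
         "-parse_seqids", "-out", str(db)],
        # 2. Search the database with the protein(s) of interest.
        ["blastp", "-query", str(query), "-db", str(db),
         "-evalue", "1e-5", "-outfmt", "6", "-out", str(hits)],
        # 3. Retrieve hit sequences; the ID file must be written first,
        #    for example from hit_ids() below.
        ["blastdbcmd", "-db", str(db), "-entry_batch", str(ids),
         "-out", str(seqs)],
        # 4. Multiple sequence alignment (MUSCLE v3 syntax; v5 uses
        #    -align/-output instead).
        ["muscle", "-in", str(seqs), "-out", str(aln)],
        # 5. Remove poorly aligned columns.
        ["trimal", "-in", str(aln), "-out", str(trimmed), "-automated1"],
        # 6. Maximum-likelihood tree with model selection and
        #    1000 ultrafast bootstrap replicates.
        ["iqtree2", "-s", str(trimmed), "-m", "MFP", "-B", "1000"],
    ]


def hit_ids(blast_tabular, max_evalue=1e-5):
    """Return unique subject IDs from BLAST tabular (outfmt 6) text.

    Column 2 is the subject ID and column 11 the E-value in the default
    12-column layout; IDs are kept in first-seen order.
    """
    seen = []
    for line in blast_tabular.splitlines():
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        if float(fields[10]) <= max_evalue and fields[1] not in seen:
            seen.append(fields[1])
    return seen


if __name__ == "__main__":
    for cmd in pipeline_commands("proteins.fasta", "query.fasta"):
        print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the corresponding tool is installed. Keeping command construction separate from execution makes it easy to inspect or adapt the steps before running anything.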
Acknowledgments
The authors thank the Servier Medical Art (SMART) website (https://smart.servier.com/) by Servier for the elements of Fig. 1.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
De Luca, D., Lauritano, C. (2022). Transcriptome Mining to Identify Genes of Interest: From Local Databases to Phylogenetic Inference. In: Verde, C., Giordano, D. (eds) Marine Genomics. Methods in Molecular Biology, vol 2498. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2313-8_3
DOI: https://doi.org/10.1007/978-1-0716-2313-8_3
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-2312-1
Online ISBN: 978-1-0716-2313-8
eBook Packages: Springer Protocols