Transcriptome Mining to Identify Genes of Interest: From Local Databases to Phylogenetic Inference

De Luca, Daniele; Lauritano, Chiara

doi:10.1007/978-1-0716-2313-8_3

Part of the book series: Methods in Molecular Biology (MIMB, volume 2498)

Abstract

Advances in next-generation sequencing technologies and falling sequencing costs have increased the amount of transcriptome data generated each year. These data have great potential for identifying genes and molecular pathways of interest across a wide range of organisms. However, navigating these resources requires some skills in bioinformatics and evolutionary analysis. Here, we describe a protocol for mining transcriptome data for genes of interest, from the creation of a protein database to the inference of phylogenetic trees. The protocol was used for marine protists but can serve as a general pipeline across different taxa.
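
As a rough illustration of the first steps of such a pipeline, the Python sketch below assembles command lines for building a local protein database and searching it, then filters the tabular search results by E-value. It assumes the BLAST+ command-line tools (makeblastdb and blastp) and their default tabular output (-outfmt 6); the abstract does not name specific tools or thresholds, so the file names, the database name, the 1e-5 cutoff, and the helper functions build_commands and filter_hits are all hypothetical. The commands are only constructed here, not executed.

```python
def build_commands(proteins_fasta, query_fasta, db_name="local_proteins", evalue=1e-5):
    """Return BLAST+ command lines as argument lists (nothing is executed here)."""
    # Build a protein database from the translated transcriptome (assumed input).
    make_db = ["makeblastdb", "-in", proteins_fasta, "-dbtype", "prot", "-out", db_name]
    # Search the database with the protein query of interest; tabular output.
    search = ["blastp", "-query", query_fasta, "-db", db_name,
              "-evalue", str(evalue), "-outfmt", "6", "-out", "hits.tsv"]
    return make_db, search


def filter_hits(tabular_text, max_evalue=1e-5):
    """Keep unique subject IDs from -outfmt 6 text whose E-value passes the cutoff."""
    kept = []
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split("\t")
        # Default -outfmt 6 columns: subject ID is column 2, E-value is column 11.
        subject_id, hit_evalue = fields[1], float(fields[10])
        if hit_evalue <= max_evalue and subject_id not in kept:
            kept.append(subject_id)
    return kept


if __name__ == "__main__":
    for command in build_commands("transcriptome_proteins.fasta", "query.fasta"):
        print(" ".join(command))
```

The retained subject IDs would then be the candidate sequences passed on to alignment and tree inference; the argument lists could be handed to subprocess.run once the tools are installed.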

Acknowledgments

The authors thank the Servier Medical Art (SMART) website (https://smart.servier.com/) by Servier for the elements of Fig. 1.

Author information

Authors and Affiliations

Department of Biology, University of Naples Federico II, Botanic Garden of Naples, Naples, Italy
Daniele De Luca
Department of Ecosustainable Marine Biotechnology, Stazione Zoologica Anton Dohrn, Naples, Italy
Chiara Lauritano

Corresponding authors

Correspondence to Daniele De Luca or Chiara Lauritano.

Editor information

Editors and Affiliations

Institute of Biosciences and BioResources (IBBR), National Research Council, Naples, Italy
Cinzia Verde
Institute of Biosciences and BioResources (IBBR), National Research Council, Naples, Italy
Daniela Giordano

About this protocol

Cite this protocol

De Luca, D., Lauritano, C. (2022). Transcriptome Mining to Identify Genes of Interest: From Local Databases to Phylogenetic Inference. In: Verde, C., Giordano, D. (eds) Marine Genomics. Methods in Molecular Biology, vol 2498. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2313-8_3

DOI: https://doi.org/10.1007/978-1-0716-2313-8_3
Published: 22 June 2022
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-2312-1
Online ISBN: 978-1-0716-2313-8
eBook Packages: Springer Protocols
