Transcriptome Mining to Identify Genes of Interest: From Local Databases to Phylogenetic Inference

  • Protocol
Marine Genomics

Part of the book series: Methods in Molecular Biology (MIMB, volume 2498)

Abstract

Advances in next-generation sequencing technologies and falling sequencing costs have increased the amount of transcriptome data generated each year. These data hold great potential for identifying genes and molecular pathways of interest across a wide range of organisms. However, navigating these resources requires some bioinformatics skills and familiarity with evolutionary analysis. Here, we describe a protocol for mining transcriptome data for genes of interest, from the creation of a protein database to the inference of phylogenetic trees. The protocol was used for marine protists but can serve as a general pipeline across different taxa.
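
As a rough illustration of the kind of workflow the abstract outlines (local protein database, similarity search, alignment, trimming, tree inference), the following Python sketch assembles command lines without running them and filters tabular BLAST hits by e-value. It is not the authors' protocol: the choice of tools (BLAST+, MUSCLE 3.8 syntax, trimAl, IQ-TREE 2), the e-value cutoff, the bootstrap count, and all file names and function names are assumptions made for the example.

```python
from typing import List


def build_commands(proteins: str, query: str, hits_fasta: str,
                   prefix: str = "run") -> List[List[str]]:
    """Return command lines (not executed) for a database-to-tree workflow.

    All file names are hypothetical. `hits_fasta` is a FASTA file of the
    retained hit sequences, which the user assembles after the search step.
    """
    db = f"{prefix}_db"
    aln = f"{prefix}_aln.fasta"
    trimmed = f"{prefix}_trimmed.fasta"
    return [
        # 1. Build a local protein database from translated transcripts.
        ["makeblastdb", "-in", proteins, "-dbtype", "prot", "-out", db],
        # 2. Search it with the protein of interest; tabular output (format 6).
        ["blastp", "-query", query, "-db", db, "-evalue", "1e-5",
         "-outfmt", "6", "-out", f"{prefix}_hits.tsv"],
        # 3. Align the retained sequences (MUSCLE 3.8 flag syntax assumed).
        ["muscle", "-in", hits_fasta, "-out", aln],
        # 4. Trim poorly aligned columns.
        ["trimal", "-in", aln, "-out", trimmed, "-automated1"],
        # 5. Select a model and infer a tree with ultrafast bootstrap support.
        ["iqtree2", "-s", trimmed, "-m", "MFP", "-B", "1000"],
    ]


def hit_ids(blast_tab: str, max_evalue: float = 1e-5) -> List[str]:
    """Return unique subject IDs from BLAST tabular text, in order of appearance.

    Assumes the default 12-column layout of output format 6, where column 2
    is the subject ID and column 11 is the e-value.
    """
    ids: List[str] = []
    for line in blast_tab.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if float(fields[10]) <= max_evalue and fields[1] not in ids:
            ids.append(fields[1])
    return ids
```

Each command list can be passed to `subprocess.run(cmd, check=True)` once the tools are installed. Keeping command construction separate from execution makes it easy to inspect and adjust the parameters before anything is run.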

Acknowledgments

The authors thank the Servier Medical Art (SMART) website (https://smart.servier.com/) by Servier for the elements used in Fig. 1.

Author information

Corresponding authors

Correspondence to Daniele De Luca or Chiara Lauritano.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

De Luca, D., Lauritano, C. (2022). Transcriptome Mining to Identify Genes of Interest: From Local Databases to Phylogenetic Inference. In: Verde, C., Giordano, D. (eds) Marine Genomics. Methods in Molecular Biology, vol 2498. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2313-8_3

  • DOI: https://doi.org/10.1007/978-1-0716-2313-8_3

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-2312-1

  • Online ISBN: 978-1-0716-2313-8

  • eBook Packages: Springer Protocols
