A Comprehensive Guide to Potato Transcriptome Assembly

Protocol, in: Solanum tuberosum

Part of the book series: Methods in Molecular Biology (MIMB, volume 2354)

Abstract

High-throughput genome sequencing has advanced rapidly, and long-read technologies have matured. Nevertheless, accurate assembly of polyploid potato genomes remains challenging. Sequencing the double-monoploid genome of Solanum tuberosum Group Phureja (Xu et al., Nature 475:189–195, 2011) has enabled functional studies of polyploid potato cultivars using RNA sequencing (RNA-Seq) technologies, although with the limitation that cultivar-specific gene expression is not covered. The accumulated RNA-Seq datasets from these cultivars can be leveraged to assemble tetraploid potato transcriptomes, enabling the analysis of genes beyond those annotated in the reference genome. To improve transcriptome quality, short-read assemblies are now complemented with full-length transcriptome sequencing on Pacific Biosciences or Oxford Nanopore platforms. In this chapter, we provide a detailed guide to a pipeline for the de novo assembly of transcriptomes from polyploid potato genotypes and their integration into a pan-transcriptome.

References

  1. Zhao L, Zhang H, Kohnen MV et al (2019) Analysis of transcriptome and epitranscriptome in plants using PacBio Iso-Seq and nanopore-based direct RNA sequencing. Front Genet 10:1–14

  2. Xu X, Pan S, Cheng S et al (2011) Genome sequence and analysis of the tuber crop potato. Nature 475:189–195

  3. Denisov G, Walenz B, Halpern AL et al (2008) Consensus generation and variant detection by Celera assembler. Bioinformatics 24:1035–1040. https://doi.org/10.1093/bioinformatics/btn074

  4. de Bruijn NG (1946) A combinatorial problem. Proc Sect Sci K Ned Akad van Wet te Amsterdam 49:758–764

  5. Moreton J, Izquierdo A, Emes RD (2016) Assembly, assessment, and availability of de novo generated eukaryotic transcriptomes. Front Genet 6:361

  6. Hölzer M, Marz M (2019) De novo transcriptome assembly: a comprehensive cross-species comparison of short-read RNA-Seq assemblers. Gigascience 8:1–16

  7. Wang S, Gribskov M (2017) Comprehensive evaluation of de novo transcriptome assembly programs and their effects on differential gene expression analysis. Bioinformatics 33:327–333

  8. Zhao QY, Wang Y, Kong YM et al (2011) Optimizing de novo transcriptome assembly from short-read RNA-Seq data: a comparative study. BMC Bioinformatics 12(Suppl 14):S2

  9. Zhang G, Sun M, Wang J et al (2019) PacBio full-length cDNA sequencing integrated with RNA-seq reads drastically improves the discovery of splicing transcripts in rice. Plant J 97:296–305

  10. Shirley M, Ma Z, Pedersen B, Wheelan S (2015) Efficient “pythonic” access to FASTA files using pyfaidx. PeerJ PrePrints 3:e970v1

  11. Ewels P, Magnusson M, Lundin S, Käller M (2016) MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics 32:3047–3048

  12. Bushmanova E, Antipov D, Lapidus A, Prjibelski AD (2019) rnaSPAdes: a de novo transcriptome assembler and its application to RNA-Seq data. Gigascience 8:1–13

  13. Zerbino DR (2010) Using the Velvet de novo assembler for short-read sequencing technologies. Curr Protoc Bioinformatics 31:11.5.1–11.5.12

  14. Schulz MH, Zerbino DR, Vingron M, Birney E (2012) Oases: robust de novo RNA-seq assembly across the dynamic range of expression levels. Bioinformatics 28:1086–1092

  15. Grabherr MG, Haas BJ, Yassour M et al (2011) Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol 29:644–652

  16. Fu L, Niu B, Zhu Z et al (2012) CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28:3150–3152

  17. Camacho C, Coulouris G, Avagyan V et al (2009) BLAST+: architecture and applications. BMC Bioinformatics 10:421

  18. Gilbert DG (2019) Genes of the pig, Sus scrofa, reconstructed with EvidentialGene. PeerJ 7:e6374

  19. Dobin A, Davis CA, Schlesinger F et al (2013) STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29:15–21

  20. Li H, Handsaker B, Wysoker A et al (2009) The sequence alignment/map format and SAMtools. Bioinformatics 25:2078–2079

  21. Smith-Unna R, Boursnell C, Patro R et al (2016) TransRate: reference-free quality assessment of de novo transcriptome assemblies. Genome Res 26:1134–1144

  22. Jones P, Binns D, Chang HY et al (2014) InterProScan 5: genome-scale protein function classification. Bioinformatics 30:1236–1240

  23. Schäffer AA, Nawrocki EP, Choi Y et al (2018) VecScreen-plus-taxonomy: imposing a tax(onomy) increase on vector contamination screening. Bioinformatics 34:755–759

  24. Buchfink B, Xie C, Huson DH (2014) Fast and sensitive protein alignment using DIAMOND. Nat Methods 12:59–60

  25. Simão FA, Waterhouse RM, Ioannidis P et al (2015) BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31:3210–3212

  26. Waterhouse RM, Seppey M, Simao FA et al (2018) BUSCO applications from quality assessments to gene prediction and phylogenomics. Mol Biol Evol 35:543–548

  27. Kim D, Song L, Breitwieser FP, Salzberg SL (2016) Centrifuge: rapid and sensitive classification of metagenomic sequences. Genome Res 26:1721–1729

  28. Breitwieser FP, Salzberg SL (2020) Pavian: interactive analysis of metagenomics data for microbiome studies and pathogen identification. Bioinformatics 36:1303–1304

  29. Nakamura T, Yamada KD, Tomii K, Katoh K (2018) Parallelization of MAFFT for large-scale multiple sequence alignments. Bioinformatics 34:2490–2492

  30. Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32:1792–1797

  31. Sievers F, Higgins DG (2018) Clustal Omega for making accurate alignments of many protein sequences. Protein Sci 27:135–145

  32. Kimura M (1980) A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J Mol Evol 16:111–120

  33. Saitou N, Nei M (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 4:406–425

  34. Larkin MA, Blackshields G, Brown NP et al (2007) Clustal W and Clustal X version 2.0. Bioinformatics 23:2947–2948

  35. Brown NP, Leroy C, Sander C (1998) MView: a web-compatible database search or multiple alignment viewer. Bioinformatics 14:380–381

  36. Sansone SA, Rocca-Serra P, Field D et al (2012) Toward interoperable bioscience data. Nat Genet 44:121–126

  37. Del Fabbro C, Scalabrin S, Morgante M, Giorgi FM (2013) An extensive evaluation of read trimming effects on Illumina NGS data analysis. PLoS One 8:e85024

  38. Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J 17:10

  39. Bolger AM, Lohse M, Usadel B (2014) Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30:2114–2120

  40. Cozzuto L, Liu H, Pryszcz LP et al (2020) MasterOfPores: a workflow for the analysis of Oxford nanopore direct RNA sequencing datasets. Front Genet 11:211

  41. Li B, Fillmore N, Bai Y et al (2014) Evaluation of de novo transcriptome assemblies from RNA-Seq data. Genome Biol 15:553

  42. Bushmanova E, Antipov D, Lapidus A et al (2016) rnaQUAST: a quality assessment tool for de novo transcriptome assemblies. Bioinformatics 32:2210–2212

  43. Eddy SR (2011) Accelerated profile HMM searches. PLoS Comput Biol 7:e1002195

  44. Finn RD, Bateman A, Clements J et al (2014) Pfam: the protein families database. Nucleic Acids Res 42:D222–D230

  45. Ashburner M, Ball C, Blake J et al (2000) Gene ontology: tool for the unification of biology. Nat Genet 25:25–29

  46. Kanehisa M, Goto S, Sato Y et al (2012) KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res 40:D109–D114

  47. Thimm O, Bläsing O, Gibon Y et al (2004) MAPMAN: a user-driven tool to display genomics data sets onto diagrams of metabolic pathways and other biological processes. Plant J 37:914–939

  48. Csárdi G, Nepusz T (2006) The igraph software package for complex network research. Int J Complex Syst 1695:1695

  49. Crusoe MR, Alameldin HF, Awad S et al (2015) The khmer software package: enabling efficient nucleotide sequence analysis. F1000Research 4:900

  50. Xie Y, Wu G, Tang J et al (2014) SOAPdenovo-trans: de novo transcriptome assembly with short RNA-Seq reads. Bioinformatics 30:1660–1666

  51. Robertson G, Schein J, Chiu R et al (2010) De novo assembly and analysis of RNA-seq data. Nat Methods 7:909–912

  52. Peng Y, Leung HCM, Yiu SM, Chin FYL (2011) T-IDBA: a de novo iterative de Bruijn graph assembler for transcriptome (extended abstract). In: Lecture notes in computer science, Lecture notes in artificial intelligence and lecture notes in bioinformatics. Springer, Berlin, pp 337–338

  53. Liu J, Yu T, Mu Z, Li G (2019) TransLiG: a de novo transcriptome assembler that uses line graph iteration. Genome Biol 20:81

  54. Zhang Y, Sun Y, Cole JR (2014) A scalable and accurate targeted gene assembly tool (SAT-assembler) for next-generation sequencing data. PLoS Comput Biol 10:e1003737

  55. Geniza M, Jaiswal P (2017) Tools for building de novo transcriptome assembly. Curr Plant Biol 11–12:41–45


Acknowledgments

The authors thank the coworkers from the Department of Biotechnology and Systems Biology of the National Institute of Biology for providing raw RNA-Seq datasets, Henrik Krnec for the BLAST output parser, and the Omics bioinformatics team for critical discussions. This project was supported by the Slovenian Research Agency (grants P4-0165, J4-1777, J4-4165, J4-7636, J4-8228, J4-9302, and Z7-1888) and by COST Actions CA15110 (CHARME) and CA15109 (COSTNET).

Author information

Corresponding author

Correspondence to Maja Zagorščak.

Copyright information

© 2021 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Zagorščak, M., Petek, M. (2021). A Comprehensive Guide to Potato Transcriptome Assembly. In: Dobnik, D., Gruden, K., Ramšak, Ž., Coll, A. (eds) Solanum tuberosum. Methods in Molecular Biology, vol 2354. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-1609-3_8

  • DOI: https://doi.org/10.1007/978-1-0716-1609-3_8

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-1608-6

  • Online ISBN: 978-1-0716-1609-3

  • eBook Packages: Springer Protocols
