Abstract
High-throughput genome sequencing has advanced rapidly and long-read technologies have matured, yet accurate assembly of polyploid potato genomes remains challenging. Sequencing the double-monoploid genome of Solanum tuberosum Group Phureja (Xu et al., Nature 475:189–195, 2011) has enabled functional studies of polyploid potato cultivars using RNA sequencing (RNA-Seq) technologies, but this reference does not cover cultivar-specific gene expression. The RNA-Seq datasets accumulated from these cultivars can be leveraged to assemble tetraploid potato transcriptomes, which enable the analysis of genes beyond those in the reference genome annotations. To improve transcriptome quality, short-read assemblies are now complemented with full-length transcriptome sequencing on Pacific Biosciences or Oxford Nanopore platforms. In this chapter we give a detailed guide to a pipeline for de novo transcriptome assembly of polyploid potato genotypes and the integration of the resulting assemblies into a pan-transcriptome.
References
Zhao L, Zhang H, Kohnen MV et al (2019) Analysis of transcriptome and epitranscriptome in plants using PacBio Iso-Seq and Nanopore-based direct RNA sequencing. Front Genet 10:1–14
Xu X, Pan S, Cheng S et al (2011) Genome sequence and analysis of the tuber crop potato. Nature 475:189–195
Denisov G, Walenz B, Halpern AL et al (2008) Consensus generation and variant detection by Celera assembler. Bioinformatics 24:1035–1040. https://doi.org/10.1093/bioinformatics/btn074
de Bruijn NG (1946) A combinatorial problem. Proc Sect Sci K Ned Akad van Wet te Amsterdam 49:758–764
Moreton J, Izquierdo A, Emes RD (2016) Assembly, assessment, and availability of de novo generated eukaryotic transcriptomes. Front Genet 6:361
Hölzer M, Marz M (2019) De novo transcriptome assembly: a comprehensive cross-species comparison of short-read RNA-Seq assemblers. Gigascience 8:1–16
Wang S, Gribskov M (2017) Comprehensive evaluation of de novo transcriptome assembly programs and their effects on differential gene expression analysis. Bioinformatics 33:327–333
Zhao QY, Wang Y, Kong YM et al (2011) Optimizing de novo transcriptome assembly from short-read RNA-Seq data: a comparative study. BMC Bioinformatics 12(Suppl 14):S2
Zhang G, Sun M, Wang J et al (2019) PacBio full-length cDNA sequencing integrated with RNA-seq reads drastically improves the discovery of splicing transcripts in rice. Plant J 97:296–305
Shirley M, Ma Z, Pedersen B, Wheelan S (2015) Efficient “pythonic” access to FASTA files using pyfaidx. PeerJ PrePrints 3:e970v1
Ewels P, Magnusson M, Lundin S, Käller M (2016) MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics 32:3047–3048
Bushmanova E, Antipov D, Lapidus A, Prjibelski AD (2019) RnaSPAdes: a de novo transcriptome assembler and its application to RNA-Seq data. Gigascience 8:1–13
Zerbino DR (2010) Using the Velvet de novo assembler for short-read sequencing technologies. Curr Protoc Bioinformatics 31:11.5.1–11.5.12
Schulz MH, Zerbino DR, Vingron M, Birney E (2012) Oases: robust de novo RNA-seq assembly across the dynamic range of expression levels. Bioinformatics 28:1086–1092
Grabherr MG, Haas BJ, Yassour M et al (2011) Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol 29:644–652
Fu L, Niu B, Zhu Z et al (2012) CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28:3150–3152
Camacho C, Coulouris G, Avagyan V et al (2009) BLAST+: architecture and applications. BMC Bioinformatics 10:421
Gilbert DG (2019) Genes of the pig, Sus scrofa, reconstructed with EvidentialGene. PeerJ 7:e6374
Dobin A, Davis CA, Schlesinger F et al (2013) STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29:15–21
Li H, Handsaker B, Wysoker A et al (2009) The sequence alignment/map format and SAMtools. Bioinformatics 25:2078–2079
Smith-Unna R, Boursnell C, Patro R et al (2016) TransRate: reference-free quality assessment of de novo transcriptome assemblies. Genome Res 26:1134–1144
Jones P, Binns D, Chang HY et al (2014) InterProScan 5: genome-scale protein function classification. Bioinformatics 30:1236–1240
Schäffer AA, Nawrocki EP, Choi Y et al (2018) VecScreen-plus-taxonomy: imposing a tax(onomy) increase on vector contamination screening. Bioinformatics 34:755–759
Buchfink B, Xie C, Huson DH (2014) Fast and sensitive protein alignment using DIAMOND. Nat Methods 12:59–60
Simão FA, Waterhouse RM, Ioannidis P et al (2015) BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31:3210–3212
Waterhouse RM, Seppey M, Simao FA et al (2018) BUSCO applications from quality assessments to gene prediction and phylogenomics. Mol Biol Evol 35:543–548
Kim D, Song L, Breitwieser FP, Salzberg SL (2016) Centrifuge: rapid and sensitive classification of metagenomic sequences. Genome Res 26:1721–1729
Breitwieser FP, Salzberg SL (2020) Pavian: interactive analysis of metagenomics data for microbiome studies and pathogen identification. Bioinformatics 36:1303–1304
Nakamura T, Yamada KD, Tomii K, Katoh K (2018) Parallelization of MAFFT for large-scale multiple sequence alignments. Bioinformatics 34:2490–2492
Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32:1792–1797
Sievers F, Higgins DG (2018) Clustal Omega for making accurate alignments of many protein sequences. Protein Sci 27:135–145
Kimura M (1980) A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J Mol Evol 16:111–120
Saitou N, Nei M (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 4:406–425
Larkin MA, Blackshields G, Brown NP et al (2007) Clustal W and Clustal X version 2.0. Bioinformatics 23:2947–2948
Brown NP, Leroy C, Sander C (1998) MView: a web-compatible database search or multiple alignment viewer. Bioinformatics 14:380–381
Sansone SA, Rocca-Serra P, Field D et al (2012) Toward interoperable bioscience data. Nat Genet 44:121–126
Del Fabbro C, Scalabrin S, Morgante M, Giorgi FM (2013) An extensive evaluation of read trimming effects on Illumina NGS data analysis. PLoS One 8:e85024
Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J 17:10
Bolger AM, Lohse M, Usadel B (2014) Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30:2114–2120
Cozzuto L, Liu H, Pryszcz LP et al (2020) MasterOfPores: a workflow for the analysis of Oxford nanopore direct RNA sequencing datasets. Front Genet 11:211
Li B, Fillmore N, Bai Y et al (2014) Evaluation of de novo transcriptome assemblies from RNA-Seq data. Genome Biol 15:553
Bushmanova E, Antipov D, Lapidus A et al (2016) RnaQUAST: a quality assessment tool for de novo transcriptome assemblies. Bioinformatics 32:2210–2212
Eddy SR (2011) Accelerated profile HMM searches. PLoS Comput Biol 7:e1002195
Finn RD, Bateman A, Clements J et al (2014) Pfam: the protein families database. Nucleic Acids Res 42:D222–D230
Ashburner M, Ball C, Blake J et al (2000) Gene ontology: tool for the unification of biology. Nat Genet 25:25–29
Kanehisa M, Goto S, Sato Y et al (2012) KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res 40:D109–D114
Thimm O, Bläsing O, Gibon Y et al (2004) MAPMAN: a user-driven tool to display genomics data sets onto diagrams of metabolic pathways and other biological processes. Plant J 37:914–939
Csárdi G, Nepusz T (2006) The igraph software package for complex network research. Int J Complex Syst 1695:1695
Crusoe MR, Alameldin HF, Awad S et al (2015) The khmer software package: enabling efficient nucleotide sequence analysis. F1000Research 4:900
Xie Y, Wu G, Tang J et al (2014) SOAPdenovo-trans: de novo transcriptome assembly with short RNA-Seq reads. Bioinformatics 30:1660–1666
Robertson G, Schein J, Chiu R et al (2010) De novo assembly and analysis of RNA-seq data. Nat Methods 7:909–912
Peng Y, Leung HCM, Yiu SM, Chin FYL (2011) T-IDBA: a de novo iterative de Bruijn graph assembler for transcriptome (extended abstract). In: Lecture notes in computer science, Lecture notes in artificial intelligence and lecture notes in bioinformatics. Springer, Berlin, pp 337–338
Liu J, Yu T, Mu Z, Li G (2019) TransLiG: a de novo transcriptome assembler that uses line graph iteration. Genome Biol 20:81
Zhang Y, Sun Y, Cole JR (2014) A scalable and accurate targeted gene assembly tool (SAT-assembler) for next-generation sequencing data. PLoS Comput Biol 10:e1003737
Geniza M, Jaiswal P (2017) Tools for building de novo transcriptome assembly. Curr Plant Biol 11–12:41–45
Acknowledgments
The authors would like to thank their coworkers from the Department of Biotechnology and Systems Biology of the National Institute of Biology for providing raw RNA-Seq datasets, Henrik Krnec for the BLAST output parser, and the Omics bioinformatics team for critical discussions. This project was supported by the Slovenian Research Agency (grants P4-0165, J4-1777, J4-4165, J4-7636, J4-8228, J4-9302, and Z7-1888) and by COST actions CA15110 (CHARME) and CA15109 (COSTNET).
Copyright information
© 2021 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Zagorščak, M., Petek, M. (2021). A Comprehensive Guide to Potato Transcriptome Assembly. In: Dobnik, D., Gruden, K., Ramšak, Ž., Coll, A. (eds) Solanum tuberosum. Methods in Molecular Biology, vol 2354. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-1609-3_8
DOI: https://doi.org/10.1007/978-1-0716-1609-3_8
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-1608-6
Online ISBN: 978-1-0716-1609-3
eBook Packages: Springer Protocols