A Comprehensive Guide to Potato Transcriptome Assembly

Zagorščak, Maja; Petek, Marko

doi:10.1007/978-1-0716-1609-3_8

Part of the book series: Methods in Molecular Biology (MIMB, volume 2354)

Abstract

High-throughput genome sequencing has advanced rapidly and long-read technologies have matured, yet accurate assembly of polyploid potato genomes remains challenging. Sequencing the double-monoploid genome of Solanum tuberosum Group Phureja (Xu et al., Nature 475:189–195, 2011) has enabled functional studies of polyploid potato cultivars using RNA sequencing (RNA-Seq) technologies, with the limitation that cultivar-specific gene expression is not covered. The RNA-Seq datasets accumulated from these cultivars can be used to assemble tetraploid potato transcriptomes, which allow the analysis of genes beyond those in the reference genome annotations. To improve transcriptome quality, short-read assemblies are now complemented with full-length transcriptome sequencing on Pacific Biosciences or Oxford Nanopore platforms. In this chapter, we provide a detailed guide to a pipeline for de novo transcriptome assembly of polyploid potato genotypes and the integration of these assemblies into a pan-transcriptome.

References

Zhao L, Zhang H, Kohnen MV et al (2019) Analysis of transcriptome and epitranscriptome in plants using PacBio Iso-Seq and nanopore-based direct RNA sequencing. Front Genet 10:1–14
Xu X, Pan S, Cheng S et al (2011) Genome sequence and analysis of the tuber crop potato. Nature 475:189–195
Denisov G, Walenz B, Halpern AL et al (2008) Consensus generation and variant detection by Celera assembler. Bioinformatics 24:1035–1040. https://doi.org/10.1093/bioinformatics/btn074
de Bruijn NG (1946) A combinatorial problem. Proc Sect Sci K Ned Akad van Wet te Amsterdam 49:758–764
Moreton J, Izquierdo A, Emes RD (2016) Assembly, assessment, and availability of de novo generated eukaryotic transcriptomes. Front Genet 6:361
Hölzer M, Marz M (2019) De novo transcriptome assembly: a comprehensive cross-species comparison of short-read RNA-Seq assemblers. Gigascience 8:1–16
Wang S, Gribskov M (2017) Comprehensive evaluation of de novo transcriptome assembly programs and their effects on differential gene expression analysis. Bioinformatics 33:327–333
Zhao QY, Wang Y, Kong YM et al (2011) Optimizing de novo transcriptome assembly from short-read RNA-Seq data: a comparative study. BMC Bioinformatics 12(Suppl 14):S2
Zhang G, Sun M, Wang J et al (2019) PacBio full-length cDNA sequencing integrated with RNA-seq reads drastically improves the discovery of splicing transcripts in rice. Plant J 97:296–305
Shirley M, Ma Z, Pedersen B, Wheelan S (2015) Efficient “pythonic” access to FASTA files using pyfaidx. PeerJ PrePrints 3:e970v1
Ewels P, Magnusson M, Lundin S, Käller M (2016) MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics 32:3047–3048
Bushmanova E, Antipov D, Lapidus A, Prjibelski AD (2019) RnaSPAdes: a de novo transcriptome assembler and its application to RNA-Seq data. Gigascience 8:1–13
Zerbino DR (2010) Using the Velvet de novo assembler for short-read sequencing technologies. Curr Protoc Bioinformatics 31:11.5.1–11.5.12
Schulz MH, Zerbino DR, Vingron M, Birney E (2012) Oases: robust de novo RNA-seq assembly across the dynamic range of expression levels. Bioinformatics 28:1086–1092
Grabherr MG, Haas BJ, Yassour M et al (2011) Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol 29:644–652
Fu L, Niu B, Zhu Z et al (2012) CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28:3150–3152
Camacho C, Coulouris G, Avagyan V et al (2009) BLAST+: architecture and applications. BMC Bioinformatics 10:421
Gilbert DG (2019) Genes of the pig, Sus scrofa, reconstructed with EvidentialGene. PeerJ 7:e6374
Dobin A, Davis CA, Schlesinger F et al (2013) STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29:15–21
Li H, Handsaker B, Wysoker A et al (2009) The sequence alignment/map format and SAMtools. Bioinformatics 25:2078–2079
Smith-Unna R, Boursnell C, Patro R et al (2016) TransRate: reference-free quality assessment of de novo transcriptome assemblies. Genome Res 26:1134–1144
Jones P, Binns D, Chang HY et al (2014) InterProScan 5: genome-scale protein function classification. Bioinformatics 30:1236–1240
Schäffer AA, Nawrocki EP, Choi Y et al (2018) VecScreen-plus-taxonomy: imposing a tax(onomy) increase on vector contamination screening. Bioinformatics 34:755–759
Buchfink B, Xie C, Huson DH (2014) Fast and sensitive protein alignment using DIAMOND. Nat Methods 12:59–60
Simão FA, Waterhouse RM, Ioannidis P et al (2015) BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31:3210–3212
Waterhouse RM, Seppey M, Simao FA et al (2018) BUSCO applications from quality assessments to gene prediction and phylogenomics. Mol Biol Evol 35:543–548
Kim D, Song L, Breitwieser FP, Salzberg SL (2016) Centrifuge: rapid and sensitive classification of metagenomic sequences. Genome Res 26:1721–1729
Breitwieser FP, Salzberg SL (2020) Pavian: interactive analysis of metagenomics data for microbiome studies and pathogen identification. Bioinformatics 36:1303–1304
Nakamura T, Yamada KD, Tomii K, Katoh K (2018) Parallelization of MAFFT for large-scale multiple sequence alignments. Bioinformatics 34:2490–2492
Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32:1792–1797
Sievers F, Higgins DG (2018) Clustal Omega for making accurate alignments of many protein sequences. Protein Sci 27:135–145
Kimura M (1980) A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J Mol Evol 16:111–120
Saitou N, Nei M (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 4:406–425
Larkin MA, Blackshields G, Brown NP et al (2007) Clustal W and Clustal X version 2.0. Bioinformatics 23:2947–2948
Brown NP, Leroy C, Sander C (1998) MView: a web-compatible database search or multiple alignment viewer. Bioinformatics 14:380–381
Sansone SA, Rocca-Serra P, Field D et al (2012) Toward interoperable bioscience data. Nat Genet 44:121–126
Del Fabbro C, Scalabrin S, Morgante M, Giorgi FM (2013) An extensive evaluation of read trimming effects on Illumina NGS data analysis. PLoS One 8:e85024
Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J 17:10
Bolger AM, Lohse M, Usadel B (2014) Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30:2114–2120
Cozzuto L, Liu H, Pryszcz LP et al (2020) MasterOfPores: a workflow for the analysis of Oxford nanopore direct RNA sequencing datasets. Front Genet 11:211
Li B, Fillmore N, Bai Y et al (2014) Evaluation of de novo transcriptome assemblies from RNA-Seq data. Genome Biol 15:553
Bushmanova E, Antipov D, Lapidus A et al (2016) RnaQUAST: a quality assessment tool for de novo transcriptome assemblies. Bioinformatics 32:2210–2212
Eddy SR (2011) Accelerated profile HMM searches. PLoS Comput Biol 7:e1002195
Finn RD, Bateman A, Clements J et al (2014) Pfam: the protein families database. Nucleic Acids Res 42:D222–D230
Ashburner M, Ball C, Blake J et al (2000) Gene ontology: tool for the unification of biology. Nat Genet 25:25–29
Kanehisa M, Goto S, Sato Y et al (2012) KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res 40:D109–D114
Thimm O, Bläsing O, Gibon Y et al (2004) MAPMAN: a user-driven tool to display genomics data sets onto diagrams of metabolic pathways and other biological processes. Plant J 37:914–939
Csárdi G, Nepusz T (2006) The igraph software package for complex network research. Int J Complex Syst 1695:1695
Crusoe MR, Alameldin HF, Awad S et al (2015) The khmer software package: enabling efficient nucleotide sequence analysis. F1000Research 4:900
Xie Y, Wu G, Tang J et al (2014) SOAPdenovo-trans: de novo transcriptome assembly with short RNA-Seq reads. Bioinformatics 30:1660–1666
Robertson G, Schein J, Chiu R et al (2010) De novo assembly and analysis of RNA-seq data. Nat Methods 7:909–912
Peng Y, Leung HCM, Yiu SM, Chin FYL (2011) T-IDBA: a de novo iterative de Bruijn graph assembler for transcriptome (extended abstract). In: Lecture notes in computer science, Lecture notes in artificial intelligence and lecture notes in bioinformatics. Springer, Berlin, pp 337–338
Liu J, Yu T, Mu Z, Li G (2019) TransLiG: a de novo transcriptome assembler that uses line graph iteration. Genome Biol 20:81
Zhang Y, Sun Y, Cole JR (2014) A scalable and accurate targeted gene assembly tool (SAT-assembler) for next-generation sequencing data. PLoS Comput Biol 10:e1003737
Geniza M, Jaiswal P (2017) Tools for building de novo transcriptome assembly. Curr Plant Biol 11–12:41–45

Acknowledgments

The authors would like to thank their coworkers at the Department of Biotechnology and Systems Biology of the National Institute of Biology for providing raw RNA-Seq datasets, Henrik Krnec for the BLAST output parser, and the Omics bioinformatics team for critical discussions. This project was supported by the Slovenian Research Agency (grants P4-0165, J4-1777, J4-4165, J4-7636, J4-8228, J4-9302, and Z7-1888) and by COST actions CA15110 (CHARME) and CA15109 (COSTNET).

Author information

Authors and Affiliations

Department of Biotechnology and Systems Biology, National Institute of Biology, Ljubljana, Slovenia
Maja Zagorščak & Marko Petek

Corresponding author

Correspondence to Maja Zagorščak.

Editor information

Editors and Affiliations

Department of Biotechnology and Systems Biology, National Institute of Biology, Ljubljana, Slovenia
David Dobnik, Kristina Gruden, Živa Ramšak & Anna Coll

About this protocol

Cite this protocol

Zagorščak, M., Petek, M. (2021). A Comprehensive Guide to Potato Transcriptome Assembly. In: Dobnik, D., Gruden, K., Ramšak, Ž., Coll, A. (eds) Solanum tuberosum. Methods in Molecular Biology, vol 2354. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-1609-3_8

DOI: https://doi.org/10.1007/978-1-0716-1609-3_8
Published: 27 August 2021
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-1608-6
Online ISBN: 978-1-0716-1609-3
eBook Packages: Springer Protocols