Abstract
To obtain intact, full-length RNA transcripts of onion (Allium cepa), long-read sequencing technology was applied for the first time. Total RNA extracted from four tissues (flowers, leaves, bulbs and roots) of red–purple and yellow onions (A. cepa) was sequenced on the PacBio RS II platform (P4-C2 chemistry). Sequence correction, comprising consensus calling, quality filtering, orientation verification, misread-nucleotide correction and dot-matrix inspection, yielded 99,247 polished high-quality isoforms (PHQIs). The dot-matrix view was used to identify artificial inverted repeats (IRs), and the 421 isoforms containing them were removed. Removal of redundant isoforms then condensed the remaining 98,826 isoforms to 35,505. To assess the completeness of these 35,505 isoforms, the proportion of full-length isoforms, the mapping of short reads to the isoforms, and the differentially expressed genes (DEGs) among the four tissues were analyzed, together with gene ontology (GO) annotation across the tissues. These analyses confirmed that the 35,505 isoforms form a highly complete set, which was designated as the draft reference transcripts (DRTs, ver 1.0) of onion constructed by long-read sequencing.
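The abstract does not say how the dot-matrix view flags an artificial inverted repeat. As an illustration only, the sketch below assumes that such an artifact appears as a read whose k-mers recur in reverse-complemented form within the same read, which draws an anti-diagonal line in a self-versus-self dot plot. The function names, k = 11 and the 0.3 coverage threshold are hypothetical choices, not parameters reported in the study.

```python
# Hypothetical sketch: k-mer dot matrix for spotting inverted repeats in a read.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def inverted_repeat_dots(seq: str, k: int = 11) -> list[tuple[int, int]]:
    """Dot-matrix hits (i, j) where the k-mer at i is the reverse
    complement of the k-mer at j in the same sequence.

    In a self-versus-self dot plot these hits lie on anti-diagonals
    (i + j roughly constant), the signature of an inverted repeat.
    With odd k, no A/C/G/T k-mer equals its own reverse complement,
    so there are no trivial i == j hits.
    """
    # Index every k-mer start position by its sequence.
    index: dict[str, list[int]] = {}
    for j in range(len(seq) - k + 1):
        index.setdefault(seq[j:j + k], []).append(j)
    # For each k-mer, look up where its reverse complement occurs.
    dots = []
    for i in range(len(seq) - k + 1):
        for j in index.get(reverse_complement(seq[i:i + k]), []):
            dots.append((i, j))
    return dots


def is_artificial_ir(seq: str, k: int = 11, min_fraction: float = 0.3) -> bool:
    """Flag a read when at least `min_fraction` of its k-mer positions
    take part in an inverted-repeat hit (hypothetical threshold)."""
    n_kmers = len(seq) - k + 1
    if n_kmers <= 0:
        return False  # read shorter than k: nothing to compare
    covered = {i for i, _ in inverted_repeat_dots(seq, k)}
    return len(covered) / n_kmers >= min_fraction
```

For a read built as X + reverse_complement(X), the whole read equals its own reverse complement, so every k-mer has a partner and `is_artificial_ir` returns True; a typical transcript has almost no such hits. Real artifacts are rarely perfect palindromes, which is why this sketch uses a coverage fraction rather than an exact test.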
Abbreviations
- DEG: Differentially expressed genes
- DRTs: Draft reference transcripts
- GO: Gene ontology
- ICE: Iterative clustering for error correction
- PHQIs: Polished high-quality isoforms
- SMRT: Single-molecule real-time DNA sequencing
Acknowledgments
The authors gratefully acknowledge the National Agricultural Biotechnology Information Center, RDA (NABIC, http://www.nabic.rda.go.kr) for its dedicated analysis, and DNA Link (Korea, http://www.dnalink.com) for misread correction and Trinity assembly, which helped establish the long-read sequencing workflow and improve the completeness of the transcripts. This study was funded by the National Agricultural Genome Program (NAGP, PJ010449) of the Rural Development Administration (RDA), Republic of Korea. The onion DRTs (ver 1.0) have been deposited in NABIC under registration No. NU-0651. Raw data, including long- and short-read sequences, PHQI annotations and detailed DEG results, were deposited in the Reference Genome Analysis System (RGAS), a closed system of NABIC at the National Institute of Agricultural Sciences (NAS, RDA).
About this article
Cite this article
Sohn, SH., Ahn, YK., Lee, TH. et al. Construction of a draft reference transcripts of onion (Allium cepa) using long-read sequencing. Plant Biotechnol Rep 10, 383–390 (2016). https://doi.org/10.1007/s11816-016-0409-4
DOI: https://doi.org/10.1007/s11816-016-0409-4