Construction of a draft reference transcripts of onion (Allium cepa) using long-read sequencing

  • Original Article
  • Plant Biotechnology Reports

Abstract

To obtain intact, full-length RNA transcripts of onion (Allium cepa), long-read sequencing technology was applied for the first time. Total RNA extracted from four tissues (flowers, leaves, bulbs and roots) of red–purple and yellow onions (A. cepa) was sequenced using long-read sequencing (RSII platform, P4-C2 chemistry). A total of 99,247 polished high-quality isoforms (PHQIs) were produced through the sequence-correction steps of consensus calling, quality filtering, orientation verification, misread-nucleotide correction and dot-matrix view. The dot-matrix view was then used to identify artificial inverted repeats (IRs), and 421 IRs were removed. The remaining 98,826 isoforms were condensed to 35,505 by removing redundant isoforms. To assess the completeness of these 35,505 isoforms, the proportion of full-length isoforms, the mapping of short reads to the isoforms, and the differentially expressed genes (DEGs) among the four tissues were analyzed, together with gene ontology (GO) across the tissues. The 35,505 isoforms were thereby verified as a highly complete collection and designated as draft reference transcripts (DRTs, ver 1.0) constructed by long-read sequencing.
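The removal of artificial inverted repeats (IRs) can be illustrated with a minimal Python sketch. It is not the dot-matrix procedure used in the study: it swaps in a simple k-mer heuristic that asks how much of the first half of a sequence reappears as its reverse complement in the second half, which is the pattern a dot-matrix plot would show as an anti-diagonal. The function names, the k-mer size (`k=8`) and the cutoff (`threshold=0.8`) are illustrative assumptions, not values from the article.

```python
"""Sketch: flag sequences that look like artificial inverted repeats (IRs).

Assumption: an artificial IR is approximated as a sequence whose second half
is largely the reverse complement of its first half. This k-mer heuristic is
a stand-in for visual dot-matrix inspection, not the authors' method.
"""

# Translation table for complementing DNA bases (other symbols pass through).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> set:
    """Return the set of all k-mers in seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def inverted_repeat_score(seq: str, k: int = 8) -> float:
    """Fraction of first-half k-mers whose reverse complement occurs in the
    second half. Values near 1.0 suggest a palindromic molecule."""
    seq = seq.upper()
    mid = len(seq) // 2
    first_kmers = kmers(seq[:mid], k)
    if not first_kmers:
        return 0.0
    second_kmers = kmers(seq[mid:], k)
    hits = sum(1 for km in first_kmers if reverse_complement(km) in second_kmers)
    return hits / len(first_kmers)


def is_artificial_ir(seq: str, k: int = 8, threshold: float = 0.8) -> bool:
    """Flag a sequence as a likely artificial IR (threshold is hypothetical)."""
    return inverted_repeat_score(seq, k) >= threshold


if __name__ == "__main__":
    # Toy example: a short made-up sequence joined to its reverse complement.
    unit = "ATGGCGTACCTTGAGCATTCGGATCAAGTCCGTA"
    chimera = unit + reverse_complement(unit)
    print(inverted_repeat_score(chimera), is_artificial_ir(chimera))
```

In practice, such a filter would be run over every isoform and the flagged sequences inspected before removal, since genuine transcripts can also contain short palindromic regions.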

Abbreviations

DEG: Differentially expressed genes
DRTs: Draft reference transcripts
GO: Gene ontology
ICE: Iterative clustering for error correction
PHQIs: Polished high-quality isoforms
SMRT: Single molecule real-time DNA sequencing

Acknowledgments

The authors gratefully acknowledge the National Agricultural Biotechnology Information Center, RDA (NABIC, http://www.nabic.rda.go.kr) for its dedicated analysis, and DNA Link (Korea, http://www.dnalink.com) for misread correction and Trinity assembly, both of which helped establish the long-read sequencing workflow and raise the completeness of the transcripts. This study was funded by the National Agricultural Genome Program (NAGP, PJ010449) of the Rural Development Administration (RDA), Republic of Korea. The DRTs (ver 1.0) of onion are deposited at NABIC under registration No. NU-0651. Raw data, including long- and short-read sequences, annotation of PHQIs and detailed DEG results, were deposited in the Reference Genome Analysis System (RGAS), a closed system of NABIC at the National Institute of Agricultural Sciences (NAS, RDA).

Corresponding author

Correspondence to Seong-Han Sohn.

Cite this article

Sohn, SH., Ahn, YK., Lee, TH. et al. Construction of a draft reference transcripts of onion (Allium cepa) using long-read sequencing. Plant Biotechnol Rep 10, 383–390 (2016). https://doi.org/10.1007/s11816-016-0409-4

  • DOI: https://doi.org/10.1007/s11816-016-0409-4
