Construction of a draft reference transcripts of onion (Allium cepa) using long-read sequencing

Sohn, Seong-Han; Ahn, Yul-Kyun; Lee, Tae-Ho; Lee, Jong-Eun; Jeong, Min-Hee; Seo, Chae-Hwa; Chandra, Romika; Kwon, Young-Seok; Kim, Cheol-Woo; Kim, Do-Sun; Won, So-Youn; Kim, Jung Sun; Choi, Dongsu

doi:10.1007/s11816-016-0409-4

Original Article
Published: 12 October 2016

Plant Biotechnology Reports, volume 10, pages 383–390 (2016)

Seong-Han Sohn¹,
Yul-Kyun Ahn²,
Tae-Ho Lee¹,
Jong-Eun Lee³,
Min-Hee Jeong¹,
Chae-Hwa Seo³,
Romika Chandra¹,⁴,
Young-Seok Kwon²,
Cheol-Woo Kim²,
Do-Sun Kim²,
So-Youn Won¹,
Jung Sun Kim¹ &
…
Dongsu Choi⁴

Abstract

To obtain intact, full-length RNA transcripts of onion (Allium cepa), long-read sequencing technology was applied for the first time. Total RNA extracted from four tissues (flowers, leaves, bulbs and roots) of red–purple and yellow onions was sequenced using long-read sequencing (RSII platform, P4-C2 chemistry). Sequence correction by consensus calling, quality filtering, orientation verification, misread-nucleotide correction and dot-matrix view produced 99,247 polished high-quality isoforms. The dot-matrix view was subsequently used to identify artificial inverted repeats (IRs), and 421 IRs were removed. The remaining 98,826 isoforms were condensed to 35,505 by removing redundant isoforms. To assess the completeness of the 35,505 isoforms, the ratio of full-length isoforms, short-read mapping to the isoforms, and differentially expressed genes among the four tissues were analyzed, along with the gene ontology across the tissues. As a result, the 35,505 isoforms were verified as a highly complete collection and designated as draft reference transcripts (DRTs, ver 1.0) constructed by long-read sequencing.
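
The inverted-repeat filtering step can be illustrated with a minimal sketch. This is not the authors' dot-matrix procedure: it assumes an artificial IR looks like a sequence whose second half is largely the reverse complement of its first half, and it scores that with shared k-mers. The function names, the k-mer size and the 0.8 threshold are hypothetical choices made for illustration only. The counts at the end are the ones reported in the abstract.

```python
"""Toy check for artificial inverted repeats (IRs) in transcript sequences.

Illustrative only: k, threshold and all names are assumptions, and the
k-mer comparison stands in for the dot-matrix view used in the study.
"""

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement; unrecognised bases become N."""
    return "".join(_COMPLEMENT.get(base, "N") for base in reversed(seq.upper()))


def inverted_repeat_fraction(seq: str, k: int = 8) -> float:
    """Fraction of first-half k-mers whose reverse complement
    occurs somewhere in the second half of the sequence."""
    seq = seq.upper()
    half = len(seq) // 2
    left, right = seq[:half], seq[half:]
    right_kmers = {right[i:i + k] for i in range(len(right) - k + 1)}
    left_kmers = [left[i:i + k] for i in range(len(left) - k + 1)]
    if not left_kmers:
        return 0.0  # sequence too short to score
    hits = sum(reverse_complement(kmer) in right_kmers for kmer in left_kmers)
    return hits / len(left_kmers)


def is_candidate_ir(seq: str, k: int = 8, threshold: float = 0.8) -> bool:
    """Flag a sequence as a candidate artificial IR (hypothetical cutoff)."""
    return inverted_repeat_fraction(seq, k) >= threshold


# Bookkeeping with the isoform counts reported in the abstract.
polished = 99_247       # polished high-quality isoforms
removed_irs = 421       # artificial IRs removed
remaining = polished - removed_irs
nonredundant = 35_505   # isoforms left after redundancy removal
retained_ratio = nonredundant / remaining
```

With these numbers, `remaining` equals the 98,826 isoforms stated in the abstract, and `retained_ratio` shows that roughly a third of them survive redundancy removal. A sequence built as an arm followed by its own reverse complement scores 1.0 in `inverted_repeat_fraction`, whereas an ordinary transcript would be expected to score near zero.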

Abbreviations

DEG: Differentially expressed genes
DRTs: Draft reference transcripts
GO: Gene ontology
ICE: Iterative clustering for error correction
PHQIs: Polished high-quality isoforms
SMRT: Single molecule real-time DNA sequencing

Acknowledgments

The authors gratefully acknowledge the dedicated analysis by the National Agricultural Biotechnology Information Center, RDA (NABIC, http://www.nabic.rda.go.kr) and the misread correction and Trinity assembly work by DNA Link (Korea, http://www.dnalink.com), which helped establish the long-read sequencing workflow and raise the completeness of the transcripts. This study was funded by the National Agricultural Genome Program (NAGP, PJ010449) of the Rural Development Administration (RDA), Republic of Korea. The DRTs (ver 1.0) of onion are deposited at NABIC under registration No. NU-0651. Raw data, including long- and short-read sequences, the annotation of PHQIs and detailed DEG results, were deposited in the Reference Genome Analysis System (RGAS), a closed system of NABIC at the National Institute of Agricultural Sciences (NAS, RDA).

Author information

Authors and Affiliations

Genomics Division, NAS, RDA, Jeonju, 55365, Republic of Korea
Seong-Han Sohn, Tae-Ho Lee, Min-Hee Jeong, Romika Chandra, So-Youn Won & Jung Sun Kim
Vegetable Division, NIHH, RDA, Jeonju, 55365, Republic of Korea
Yul-Kyun Ahn, Young-Seok Kwon, Cheol-Woo Kim & Do-Sun Kim
DNA Link, Inc., 150 Bugahyeon-ro, Seodaemun-gu, Seoul, 03759, Republic of Korea
Jong-Eun Lee & Chae-Hwa Seo
Department of Biology, Kunsan National University, Gunsan, 54150, Republic of Korea
Romika Chandra & Dongsu Choi

Corresponding author

Correspondence to Seong-Han Sohn.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (DOCX 52 kb)

Supplementary material 2 (PPTX 355 kb)

Supplementary material 3 (XLSX 3089 kb)

About this article

Cite this article

Sohn, SH., Ahn, YK., Lee, TH. et al. Construction of a draft reference transcripts of onion (Allium cepa) using long-read sequencing. Plant Biotechnol Rep 10, 383–390 (2016). https://doi.org/10.1007/s11816-016-0409-4

Received: 17 August 2016
Accepted: 20 September 2016
Published: 12 October 2016
Issue Date: November 2016
DOI: https://doi.org/10.1007/s11816-016-0409-4
