Abstract
Kenaf (Hibiscus cannabinus) is one of the world's important natural fiber crops. However, the EST sequence resources available for kenaf are still very limited. In this study, 131,262,686 clean reads (13.12 Gb) generated by Illumina paired-end sequencing were assembled. De novo assembly yielded 90,175 unigenes with an average length of 700 bp. Sequence similarity searches against known proteins assigned a putative function to 46,165 (51.19 %) unigenes. Of these annotated unigenes, 37,080 (41.12 % of all unigenes) showed significant similarity to genes of a diploid cotton (Gossypium raimondii). Searches against the Kyoto encyclopedia of genes and genomes (KEGG) mapped 23,051 unigenes to 254 KEGG pathways, and 317 genes were assigned to the starch and sucrose metabolic pathway, which is related to cellulose biosynthesis. Furthermore, a total of 52,521 putative gene-associated SNPs and 11,083 SSRs (designated as HcEMS) were identified from the assembled unigenes. Among these HcEMS markers, mono-, di-, and trinucleotide repeats were the most abundant types (40.3, 22.3, and 34.7 %, respectively). The AG-rich nucleotide repeats, including the (AG)n, (AAG)n, and (AAAG)n types, were the most abundant motifs and could be considered the dominant types in kenaf. A sample of 835 HcEMS markers was used to survey polymorphism in eight kenaf accessions. Of these, 753 (90.1 %) successfully amplified at least one fragment and 450 (53.9 %) detected polymorphism. These results will accelerate both the understanding of the genetic basis of fiber development and marker-assisted breeding in kenaf.
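To illustrate what classifying SSRs into mono-, di-, and trinucleotide repeat types involves, the following is a minimal sketch of a perfect-repeat scan over a nucleotide sequence. It is not the pipeline used in this study: the function names and the minimum repeat counts in `MIN_REPEATS` are hypothetical values chosen for illustration, since the abstract does not state the software or thresholds that were applied.

```python
from collections import Counter

# Hypothetical minimum repeat counts per motif length (1 = mono, 2 = di, ...).
# These thresholds are assumptions for illustration, not values from the paper.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'AA')."""
    return (motif + motif).find(motif, 1) == len(motif)


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return a list of (start, motif, repeat_count) for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        found = None
        for k in sorted(min_repeats):  # try the shortest motif length first
            motif = seq[i:i + k]
            if len(motif) < k or not set(motif) <= set("ACGT"):
                continue
            if not is_primitive(motif):
                continue
            # Count how many consecutive copies of the motif start at position i.
            count = 1
            while seq[i + count * k:i + (count + 1) * k] == motif:
                count += 1
            if count >= min_repeats[k]:
                found = (i, motif, count)
                break
        if found:
            hits.append(found)
            i += len(found[1]) * found[2]  # jump past the repeat tract
        else:
            i += 1
    return hits


def tally_by_motif_length(hits):
    """Count SSRs per motif length, e.g. {2: 1, 3: 1}."""
    return Counter(len(motif) for _, motif, _ in hits)


example = "TTGC" + "AG" * 8 + "CCT" + "AAG" * 5 + "GT"
ssrs = find_ssrs(example)
print(ssrs)
print(dict(tally_by_motif_length(ssrs)))
```

Dividing each tally by the total number of SSRs gives type percentages of the kind reported above. A production workflow would normally rely on a dedicated microsatellite-mining tool and would also handle compound and imperfect repeats, which this sketch ignores.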
Abbreviations
- NGS: Next-generation sequencing
- CDS: Coding sequence
- GO: Gene ontology
- KEGG: Kyoto encyclopedia of genes and genomes
- EST: Expressed sequence tag
- nr: Non-redundant protein sequences
- KOG: Eukaryotic ortholog groups
- BLAST: Basic local alignment search tool
- SNP: Single nucleotide polymorphisms
- SSR: Simple sequence repeat
Acknowledgments
This project was sponsored by funds from the Distinguished Young Research Fund in Fujian Agriculture and Forestry University (xjq201401; 2012xjj01), China National Scholarship (201408350022), Introduction Breeding and Varieties Demonstration of Featured Crops between China and Benin (2015I0001), National Agri-Industry Technology Research System for Crops of Bast and Leaf Fiber, China (nycytx-19-E06), and Experiment Station of Kenaf and Kenaf of the Ministry of Agriculture in Southeast China (Nongkejiaofa 2011).
Author contributions
J.Q. and L.Z. conceived the project and its components. X.W. and L.Z. contributed to RNA isolation and cDNA library construction. X.W. and L.Z. performed SSR development and polymorphism evaluation. J.X., L.L., and J.Q. selected the germplasms. X.W. and L.Z. prepared figures and tables. L.Z. analyzed all the data and wrote the paper.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical standard
The experiment conducted complies with the laws of China.
Cite this article
Zhang, L., Wan, X., Xu, J. et al. De novo assembly of kenaf (Hibiscus cannabinus) transcriptome using Illumina sequencing for gene discovery and marker identification. Mol Breeding 35, 192 (2015). https://doi.org/10.1007/s11032-015-0388-0
DOI: https://doi.org/10.1007/s11032-015-0388-0
Keywords
- Kenaf (Hibiscus cannabinus L.)
- Transcriptome sequencing
- SNP
- SSR