De novo assembly of kenaf (Hibiscus cannabinus) transcriptome using Illumina sequencing for gene discovery and marker identification

Abstract

Kenaf (Hibiscus cannabinus) is an important natural fiber crop worldwide. However, the expressed sequence tag (EST) resources available for kenaf are still very limited. In this study, 131,262,686 clean reads (13.12 Gb) generated by Illumina paired-end sequencing were assembled. De novo assembly yielded 90,175 unigenes with an average length of 700 bp. Based on sequence similarity searches against known proteins, 46,165 (51.19 %) unigenes were functionally annotated. Of these annotated unigenes, 37,080 (41.12 % of all unigenes) showed significant similarity to genes of a diploid cotton (Gossypium raimondii). A search against the Kyoto Encyclopedia of Genes and Genomes (KEGG) mapped 23,051 unigenes to 254 KEGG pathways, and 317 unigenes were assigned to the starch and sucrose metabolism pathway, which is related to cellulose biosynthesis. Furthermore, a total of 52,521 putative gene-associated SNPs and 11,083 SSRs (designated HcEMS) were identified in the assembled unigenes. Among these HcEMS markers, mono-, di-, and trinucleotide repeats were the most abundant types (40.3, 22.3, and 34.7 %, respectively). AG-rich repeats, including (AG)n, (AAG)n, and (AAAG)n, were the most abundant and can be considered the dominant repeat types in kenaf. A sample of 835 HcEMS markers was used to survey polymorphism among eight kenaf accessions. Of these, 753 (90.1 %) amplified at least one fragment and 450 (53.9 %) detected polymorphism. These results should advance understanding of the genetic basis of fiber development and accelerate marker-assisted breeding in kenaf.
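The abstract does not say how the HcEMS markers were mined from the unigenes. As a rough illustration of what SSR identification in assembled transcripts involves, the Python sketch below scans a sequence for perfect mono- to hexanucleotide tandem repeats. It then tallies the hits by motif length (mono, di, tri, ...) and by motif class. The minimum repeat counts in `MIN_REPEATS`, the greedy non-overlapping scan, and the helper names (`find_ssrs`, `canonical`, `summarize`) are assumptions made for this example and are not the authors' pipeline. Grouping a motif with its rotations and reverse complement (so that AG, GA, CT and TC form one class) follows a common reporting convention. This is one reason AG-rich classes can appear dominant.

```python
from collections import Counter
from dataclasses import dataclass

# Assumed minimum numbers of tandem copies per motif length (MISA-style values
# chosen for illustration; the study's actual search criteria are not given here).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


@dataclass
class SSR:
    start: int    # 0-based start position in the unigene
    motif: str    # repeat unit as it appears on this strand
    repeats: int  # number of perfect tandem copies

    @property
    def length(self) -> int:
        return len(self.motif) * self.repeats


def is_primitive(motif: str) -> bool:
    """False if the motif is itself a repeat of a shorter unit (e.g. 'AGAG' or 'AA')."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def canonical(motif: str) -> str:
    """Smallest rotation of the motif or of its reverse complement, so that
    AG, GA, CT and TC all map to the single class 'AG'."""
    rc = motif.translate(COMPLEMENT)[::-1]
    return min(m[j:] + m[:j] for m in (motif, rc) for j in range(len(m)))


def find_ssrs(seq: str, min_repeats: dict = MIN_REPEATS) -> list:
    """Greedy left-to-right scan for perfect repeats; keeps the longest hit at each start."""
    seq = seq.upper()
    hits, i = [], 0
    while i < len(seq):
        best = None
        for k in sorted(min_repeats):
            motif = seq[i:i + k]
            # Skip truncated motifs, ambiguous bases and non-primitive units.
            if len(motif) < k or "N" in motif or not is_primitive(motif):
                continue
            count = 1
            while seq[i + count * k:i + (count + 1) * k] == motif:
                count += 1
            if count >= min_repeats[k] and (best is None or count * k > best.length):
                best = SSR(i, motif, count)
        if best:
            hits.append(best)
            i += best.length  # jump past the repeat so hits do not overlap
        else:
            i += 1
    return hits


def summarize(hits: list) -> tuple:
    """Counts by motif length (1 = mono, 2 = di, ...) and by canonical motif class."""
    by_length = Counter(len(h.motif) for h in hits)
    by_class = Counter(canonical(h.motif) for h in hits)
    return by_length, by_class
```

For example, on a toy sequence `"GC" + "AG" * 7 + "T" + "AAG" * 5 + "C" + "A" * 10 + "GC"`, `find_ssrs` reports (AG)7 at position 2, (AAG)5 at position 17 and (A)10 at position 33. Percentages such as those in the abstract would then be the counts from `summarize` divided by the total number of hits.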

Fig. 1
Fig. 2

Abbreviations

NGS: Next-generation sequencing
CDS: Coding sequence
GO: Gene Ontology
KEGG: Kyoto Encyclopedia of Genes and Genomes
EST: Expressed sequence tag
nr: Non-redundant protein sequences
KOG: Eukaryotic orthologous groups
BLAST: Basic local alignment search tool
SNP: Single nucleotide polymorphism
SSR: Simple sequence repeat

Acknowledgments

This project was sponsored by funds from the Distinguished Young Research Fund in Fujian Agriculture and Forestry University (xjq201401; 2012xjj01), China National Scholarship (201408350022), Introduction Breeding and Varieties Demonstration of Featured Crops between China and Benin (2015I0001), National Agri-Industry Technology Research System for Crops of Bast and Leaf Fiber, China (nycytx-19-E06), and Experiment Station of Kenaf and Kenaf of the Ministry of Agriculture in Southeast China (Nongkejiaofa 2011).

Author contributions

J.Q. and L.Z. conceived the project and its components. X.W. and L.Z. performed RNA isolation and cDNA library construction. X.W. and L.Z. performed SSR development and polymorphism evaluation. J.X., L.L., and J.Q. selected the germplasm accessions. X.W. and L.Z. prepared the figures and tables. L.Z. analyzed all the data and wrote the paper.

Author information

Corresponding authors

Correspondence to Liwu Zhang or Jianmin Qi.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical standards

The experiments conducted comply with the laws of China.

About this article

Cite this article

Zhang, L., Wan, X., Xu, J. et al. De novo assembly of kenaf (Hibiscus cannabinus) transcriptome using Illumina sequencing for gene discovery and marker identification. Mol Breeding 35, 192 (2015). https://doi.org/10.1007/s11032-015-0388-0

  • DOI: https://doi.org/10.1007/s11032-015-0388-0

Keywords

  • Kenaf (Hibiscus cannabinus L.)
  • Transcriptome sequencing
  • SNP
  • SSR