Abstract
High-quality rice reference genomes have accelerated the comprehensive identification of genome-wide variations and research on functional genomics and breeding. Tian-you-hua-zhan has been a leading rice hybrid in China over the past decade. Here, we optimized the de novo genome assembly strategy for its parents, the indica rice lines Huazhan (HZ) and Tianfeng (TF), comparing sequencing platforms, assembly pipelines and sequencing depths. We used the PacBio and Nanopore long-read sequencing platforms with the Canu, wtdbg2, SMARTdenovo, Flye, Canu-wtdbg2, Canu-SMARTdenovo and Canu-Flye assemblers. The combination of PacBio and Canu was optimal with respect to contig N50 length, contig number, assembled genome size and the polishing process. The assembled contigs were scaffolded with Hi-C data, yielding two “golden quality” rice reference genomes, which were evaluated using scaffold N50, BUSCO and the LTR Assembly Index. Furthermore, 42,625 and 41,815 non-transposable element genes were annotated for HZ and TF, respectively. Based on our assemblies of HZ and TF, together with those of Zhenshan97, Minghui63, Shuhui498 and 9311, we comprehensively identified genomic variations using Nipponbare as the reference. The optimized de novo assembly strategy for rice and the “golden quality” genomes produced for HZ and TF will benefit rice genomics and breeding research, especially in uncovering the genomic basis of the elite traits of HZ and TF.
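Contig N50, contig number and assembled genome size are the contiguity metrics named above for comparing assemblies. As a minimal sketch of how they are commonly computed from a list of contig lengths (the function name `assembly_stats` and the toy lengths are illustrative assumptions, not taken from the paper's pipeline):

```python
def assembly_stats(contig_lengths):
    """Return (contig_count, total_size, n50) for contig lengths given in bp.

    N50 is the length of the shortest contig in the smallest set of longest
    contigs that together cover at least half of the total assembly size.
    """
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    n50 = 0
    for length in lengths:
        running += length
        # Stop at the first contig where the cumulative length reaches 50%.
        if running * 2 >= total:
            n50 = length
            break
    return len(lengths), total, n50


# Hypothetical toy contig lengths, for illustration only.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(assembly_stats(toy_contigs))  # (7, 300, 70)
```

A higher N50 with fewer contigs and a total size close to the expected genome size generally indicates a more contiguous assembly, which is why these values are reported together.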
Acknowledgements
This work was supported by the Agricultural Science and Technology Innovation Program, the Elite Young Scientists Program of CAAS, the Science Technology and Innovation Committee of Shenzhen Municipality (KQJSCX20180323140312935, AGIS-ZDKY202004) and the Dapeng New District Special Fund for Industrial Development (KY20150113).
Ethics declarations
Compliance and ethics: The author(s) declare that they have no conflict of interest.
About this article
Cite this article
Zhang, H., Wang, Y., Deng, C. et al. High-quality genome assembly of Huazhan and Tianfeng, the parents of an elite rice hybrid Tian-you-hua-zhan. Sci. China Life Sci. 65, 398–411 (2022). https://doi.org/10.1007/s11427-020-1940-9
DOI: https://doi.org/10.1007/s11427-020-1940-9