Abstract
Coffea arabica, C. canephora and C. excelsa, which differ in morphological traits and are grown under distinct agro-climatic conditions, account for most of the world's coffee plantations. A comprehensive understanding of their genetic diversity and divergence, needed for future genetic improvement, requires high-density markers. Here, we sequenced 93 accessions spanning these three Coffea species and uncovered 15,367,960 single-nucleotide polymorphisms (SNPs). These SNPs are unevenly distributed across genomic regions and gene families; two disease-resistance gene families show the highest SNP density, suggesting strong balancing selection. The allotetraploid C. arabica exhibits the greatest nucleotide diversity, followed by C. canephora and C. excelsa. Population divergence (FST), population stratification and phylogeny all support strong divergence among species, with C. arabica genetically closest to its parental species C. canephora. Scanning for genomic islands with elevated FST and for structure-disruptive SNPs contributing to species divergence revealed that most genes under selection in each lineage are independent, with a few selected in parallel in two or three species, such as genes involved in root hair cell development, flavonol accumulation and disease resistance. Moreover, some SNPs associated with coffee lipids show significantly biased allele frequencies among species, making them valuable for interspecific breeding. Overall, our study not only uncovers key population genomic patterns among the species but also contributes a substantial genomic resource for coffee breeding.
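As a rough illustration of the divergence statistic mentioned above, the sketch below computes Hudson's FST estimator from alternate-allele counts in two populations and combines the SNPs of a window as a ratio of sums. It is not the pipeline used in this study: the function names, the input layout and the choice of Hudson's estimator are assumptions made for illustration, and allele counts in the allotetraploid C. arabica would need ploidy-aware handling that is not shown here.

```python
import math


def hudson_fst_terms(ac1, n1, ac2, n2):
    """Numerator and denominator of Hudson's FST for one biallelic SNP.

    ac1, ac2: alternate-allele counts in populations 1 and 2 (hypothetical input).
    n1, n2:   total allele copies sampled in each population (must be > 1).
    """
    p1 = ac1 / n1
    p2 = ac2 / n2
    # Squared frequency difference, corrected for finite sample size.
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    # Probability that two alleles drawn from different populations differ.
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num, den


def window_fst(snps):
    """FST for a genomic window as the ratio of summed numerators and denominators.

    snps: iterable of (ac1, n1, ac2, n2) tuples, one per SNP.
    Returns NaN when no SNP in the window is informative.
    """
    num_sum = 0.0
    den_sum = 0.0
    for ac1, n1, ac2, n2 in snps:
        num, den = hudson_fst_terms(ac1, n1, ac2, n2)
        num_sum += num
        den_sum += den
    return num_sum / den_sum if den_sum > 0 else math.nan


if __name__ == "__main__":
    # Made-up counts for three SNPs, 20 allele copies sampled per population.
    example_window = [(20, 20, 0, 20), (12, 20, 9, 20), (3, 20, 17, 20)]
    print(window_fst(example_window))
```

A SNP fixed for different alleles in the two populations yields an FST of 1, while nearly identical frequencies yield values near zero (slightly negative values are possible because of the sample-size correction). Windows whose value lies far above the genome-wide background correspond to the "genomic islands" of elevated FST referred to in the abstract.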
Key message
Whole-genome resequencing of 93 coffee accessions unveils the diversity and genetic relationships of three important Coffea species. Independent and parallel selection of genes is identified during the divergence of the three species.
Data availability
All Illumina sequencing data have been submitted to NCBI and are available in the SRA database under accession number PRJNA505204.
Acknowledgements
We thank all members of the Yan lab for discussion and comments on the manuscript. This work was supported by the National Natural Science Foundation of China (31501364) and Hainan Provincial Natural Science Foundation of China (2018CXTD342). We thank Razgar Seyed Rahmani and Robeto Bartolome from Ghent University for improving the language of the manuscript.
Author information
Contributions
LH and LY conceived and led the project. LH, XW and LY developed and performed genome assembly and analysis. LH, YD, YL and CH performed genomic sequencing. LH, TS and LY wrote the manuscript. TS revised the manuscript.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Electronic supplementary material
Below is the link to the electronic supplementary material.
11103_2020_974_MOESM1_ESM.pdf
Supplementary material 1 (PDF 3571.4 kb). Supplementary Figure S1. Eigenvalue from PC1 to PC100 in PCA study of 93 coffee individuals. Supplementary Figure S2. Approximately-maximum-likelihood SNP tree with branch length resembling relative divergence. Supplementary Figure S3. Cross-entropy from K=1 to K=10 in population STRUCTURE analysis in LEA. Supplementary Figure S4. Venn diagram of genomic regions being selected by FST analyses. Supplementary Figure S5. Venn diagram of GO terms of genes being selected by FST analyses. Supplementary Figure S6. Top 20 enriched GO terms of commonly selected genes. Supplementary Figure S7. Top 20 enriched GO terms of selected genes between C. arabica and C. canephora using SNPs of disruptive impact. Supplementary Figure S8. Top 20 enriched GO terms of selected genes between C. arabica and C. excelsa using SNPs of disruptive impact. Supplementary Figure S9. Top 20 enriched GO terms of selected genes between C. canephora and C. excelsa using SNPs of disruptive impact. Supplementary Figure S10. Proportions of gene members selected by FST analysis based on SNPs of disruptive impact for pairwise comparisons of all three Coffea species.
11103_2020_974_MOESM2_ESM.xlsx
Supplementary material 2 (XLSX 4429.7 kb). Supplementary Table S1. Sample names, country of origin, climate and traits of 93 coffee accessions. Supplementary Table S2. Mapping rate, depth, coverage and fraction of missing SNPs in 93 coffee accessions on the reference genome. Supplementary Table S3. Comparison of allele consistency with known SNPs. Supplementary Table S4. Pfam annotation of coffee reference genome. Supplementary Table S5. Gene Ontology enrichment analysis of candidate genes under selection after split of C. arabica and C. canephora. Supplementary Table S6. Gene Ontology enrichment analysis of candidate genes under selection after split of C. arabica and C. excelsa. Supplementary Table S7. Gene Ontology enrichment analysis of candidate genes under selection after split of C. canephora and C. excelsa. Supplementary Table S8. FST of lipid-associated SNPs during species divergence.
About this article
Cite this article
Huang, L., Wang, X., Dong, Y. et al. Resequencing 93 accessions of coffee unveils independent and parallel selection during Coffea species divergence. Plant Mol Biol 103, 51–61 (2020). https://doi.org/10.1007/s11103-020-00974-4
DOI: https://doi.org/10.1007/s11103-020-00974-4