Abstract
Coffee is one of the world’s most important agricultural commodities. Coffee belongs to the Rubiaceae family in the euasterid I clade of dicotyledonous plants, to which the Solanaceae family also belongs. Two bacterial artificial chromosome (BAC) libraries of a homozygous doubled haploid plant of Coffea canephora were constructed using two enzymes, HindIII and BstYI. A total of 134,827 high quality BAC-end sequences (BESs) were generated from the 73,728 clones of the two libraries, and 131,412 BESs were conserved for further analysis after elimination of chloroplast and mitochondrial sequences. This corresponded to almost 13 % of the estimated size of the C. canephora genome. 6.7 % of BESs contained simple sequence repeats, the most abundant (47.8 %) being mononucleotide motifs. These sequences allow the development of numerous useful marker sites. Potential transposable elements (TEs) represented 11.9 % of the full length BESs. A difference was observed between the BstYI and HindIII libraries (14.9 vs. 8.8 %). Analysis of BESs against known coding sequences of TEs indicated that 11.9 % of the genome corresponded to known repeat sequences, like for other flowering plants. The number of genes in the coffee genome was estimated at 41,973 which is probably overestimated. Comparative genome mapping revealed that microsynteny was higher between coffee and grapevine than between coffee and tomato or Arabidopsis. BESs constitute valuable resources for the first genome wide survey of coffee and provide new insights into the composition and evolution of the coffee genome.
This is a preview of subscription content,
to check access.





References
Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25(17):3389–3402. doi:10.1093/nar/25.17.3389
Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408(6814):796–815
Berardini TZ, Mundodi S, Reiser L, Huala E, Garcia-Hernandez M, Zhang P, Mueller LA, Yoon J, Doyle A, Lander G, Moseyko N, Yoo D, Xu I, Zoeckler B, Montoya M, Miller N, Weems D, Rhee SY (2004) Functional annotation of the Arabidopsis genome using controlled vocabularies. Plant Physiol 135(2):745–755. Epub 2004 Jun 1
Blanc G, Barakat A, Guyot R, Cooke R, Delseny M (2000) Extensive duplication and reshuffling in the Arabidopsis genome. Plant Cell 12:1093–1101. doi:10.2307/3871257
Cavagnaro PF, Chung SM, Szklarczyk M, Grzebelus D, Senalik D, Atkins AE, Simon PW (2009) Characterization of a deep coverage carrot (Daucus carota L.) BAC library and initial analysis of BAC-end sequences. Mol Genet Genomics 281:273–288. doi:10.1007/s00438-008-0411-9
Cenci A, Combes MC, Lashermes P (2010) Comparative sequence analyses indicate that Coffea (Asterids) and Vitis (Rosids) derive from the same paleo-hexaploid ancestral genome. Mol Genet Genomics 283:493–501. doi:10.1007/s11103-011-9852-3
Cenci A, Combes MC, Lashermes P (2012) Genome evolution in diploid and tetraploid Coffea species as revealed by comparative analysis of orthologous genome segments. Plant Mol Biol 78:135–145. doi:10.1007/s11103-011-9852-3
Cenci A, Combes MC, Lashermes P (2013) Differences in evolution rates among eudicotiledon species observed by analysis of protein divergence. J Hered. doi:10.1093/jhered/est025
Cheng X, Xu J, Xia S, Gu J, Yang Y, Fu J, Qian X, Zhang S, Wu J, Liu K (2009) Development and genetic mapping of microsatellite markers from genome survey sequences in Brassica napus. Theor Appl Genet 118:1121–1131. doi:10.1007/s00122-009-0967-8
Cheung F, Town CD (2007) A BAC-end view of the Musa acuminata genome. BMC Plant Biol 7:29. doi:10.1186/1471-2229-7-29
Conesa A, Götz S (2008) Blast2GO: a comprehensive suite for functional analysis in plant genomics. Int J Plant Genomics 2008:1–13
Couturon E, Berthaud J (1982) Présentation d’une méthode de récupération d’haploïde spontanés découverts chez le Coffea canephora var. robusta. Café Cacao Thé 19(3):267–270
Datema E, Mueller LA, Buels R, Giovannoni JJ, Visser RGF, Stiekema WJ, van Ham CHJ (2008) Comparative BAC-end sequence analysis of tomato and potato reveals overrepresentation of specific gene families in potato. BMC Plant Biol 8:34. doi:10.1186/1471-2229-8-34
Davis AP, Tosh J, Ruch N, Fay MF (2011) Growing coffee: Psilanthus (Rubiaceae) subsumed on the basis of molecular and morphological data; implications for the size, morphology, distribution and evolutionary history of Coffea. Bot J Linn Soc 167(4):357–377. doi:10.1111/j.1095-8339.2011.01177.x
Febrer M, Cheung F, Town CD, Cannon SB, Young ND, Abberton MT, Jenkins G, Milbourne D (2007) Construction, characterization and preliminary BAC-end sequencing analysis of a bacterial artificial chromosome library of white clover (Trifolium repens L.). Genome 50:412–421. doi:10.1139/G07-013
Feuillet C, Leach JE, Rogers J, Schnable PS, Eversole K (2011) Crop genome sequencing: lessons and rationales. Trends Plant Sci 16(2):77–88
Frelichowski JE Jr, Palmer MB, Main D, Tomkins JP, Cantrell RG, Stelly DM, Yu J, Kohel RJ, Ulloa M (2006) Cotton genome mapping with new microsatellites from Acala ‘Maxxa’ BAC-ends. Mol Genet Genomics 275:479–491. doi:10.1007/s00438-006-0106-z
Goff SA, Ricke D, Lan T-H, Presting G, Wang R, Dunn M, Glazebrook J, Sessions A, Oeller P, Varma H et al (2002) A draft sequence of the rice genome (Oryza sativa L. ssp. japonica). Science 296:92–100. doi:10.1126/science.1068275
Guyot R, de la Mare M, Viader V, Hamon P, Coriton O, Bustamante-Porras J, Poncet V, Campa C, Hamon S, de Kochko A (2009) Microcollinearity in an ethylene receptor coding gene region of the Coffea canephora genome is extensively conserved with Vitis vinifera and other distant dicotyledonous sequenced genomes. BMC Plant Biol 9(1):22. doi:10.1186/1471-2229-9-22
Guyot R, Lefebvre-Pautigny F, Tranchant-Dubreuil C, Rigoreau M, Hamon P, Leroy T, Hamon S, Poncet V, Crouzillat D, de Kochko A (2012) Ancestral synteny shared between distantly-related plant species from the asterid (Coffea canephora and Solanum Sp.) and rosid (Vitis vinifera) clades. BMC Genomics 13:103. doi:10.1186/1471-2164-13-103
Han Y, Korban SS (2008) An overview of the apple genome through BAC-end sequence analysis. Plant Mol Biol 67:581–588. doi:10.1007/s11103-008-9321-9
Hong CP, Lee SJ, Park JY, Plaha P, Park YS, Lee YK, Choi JE, Kim KY, Lee JH, Lee J, Jin H, Choi SR, Lim YP (2004) Construction of a BAC library of Korean ginseng and initial analysis of BAC-end sequences. Mol Genet Genomics 271:709–716. doi:10.1007/s00438-004-1021-9
Hong CP, Plaha P, Koo DH, Yang TJ, Choi SR, Lee YK, Uhm T, Bang JW, Edwards D, Bancroft I, Park BS, Lee J, Lim YP (2006) A survey of the Brassica rapa genome by BAC-end sequence analysis and comparison with Arabidopsis thaliana. Mol Cells 22(3):300–307
Hsu CC, Chung YL, Chen TC, Lee YL, Kuo YT, Tsai WC, Hsiao YY, Chen YW, Wu WL, Chen HH (2011) An overview of the Phalaenopsis orchid genome through BAC-end sequence analysis. BMC Plant Biol 11:3. doi:10.1186/1471-2229-11-3
Huang XQ, Madan A (1999) CAP3: a DNA sequence assembly program. Genome Res 9(9):868–877. doi:10.1101/gr.9.9.868
Huo N, Lazo GR, Vogel JP, You FM, Ma Y, Hayden DM, Coleman-Derr D, Hill TA, Dvorak J, Anderson OD, Luo MC, Gu YQ (2008) The nuclear genome of Brachypodium distachyon: analysis of BAC-end sequences. Funct Integr Genomics 8:135–147. doi:10.1007/s10142-007-0062-7
Jaillon O, Aury J, Noel B, Policriti A, Clepet C, Casagrande A, Choisne N, Aubourg S, Vitulo N, Jubin C, Vezzi A, Legeai F, Hugueney P, Dasilva C, Horner D, Mica E, Jublot D, Poulain J, Bruyère C, Billault A, Segurens B, Gouyvenoux M, Ugarte E, Cattonaro F, Anthouard V, Vico V, Del Fabbro C, Alaux M, Di Gaspero G, Dumas V, Felice N, Paillard S, Juman I, Moroldo M, Scalabrin S, Canaguier A, Le Clainche I, Malacrida G, Durand E, Pesole G, Laucou V, Chatelet P, Merdinoglu D, Delledonne M, Pezzotti M, Lecharny A, Scarpelli C, Artiguenave F, Pè ME, Valle G, Morgante M, Caboche M, Adam-Blondon A, Weissenbach J, Quétier F, Wincker P (2007) The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature 449:463–467. doi:10.1038/nature06148
Jansen RK, Kaittanis C, Saski C, Lee SB, Tomkins J, Alverson AJ, Daniell H (2006) Phylogenetic analyses of Vitis (Vitaceae) based on complete chloroplast genome sequences: effects of taxon sampling and phylogenetic methods on resolving relationships among rosids. BMC Evol Biol 6:32. doi:10.1186/1471-2148-6-32
Jetty SS, Luo M, Goicoechea J, Wang W, Kudrna D, Mueller C, Talag J, Kim H, Sisneros N, Blackmon B, Fang E, Tomkins J, Brar D, MacKil D, McCouch S, Kurata N, Lambert G, Galbraith G, Arumuganathan K, Rao R, Walling J, Gill N, Yu Y, SanMiguel P, Soderlund C, Jackson S, Wing R (2006) The Oryza BAC library resource: construction and analysis of 12 deep-coverage large-insert BAC libraries. Genome Res 16(1):140–147
Jurka J, Kapitonov VV, Pavlicek A, Klonowski P, Kohany O, Walichiewicz J (2005) Repbase update, a database of eukaryotic repetitive elements. Cytogenet Genome Res 110(1–4):462–467. doi:10.1159/000084979
Kohany O, Gentles AJ, Hankus L, Jurka J (2006) Annotation, submission and screening of repetitive elements in Repbase: Repbase submitter and censor. BMC Bioinformatics 7:474. doi:10.1186/1471-2105-7-474
Lai CWJ, Yu Q, Hou S, Skelton RL, Jones MR, Lewis KLT, Murray J, Eustice M, Guan P, Agbayani R, Moore PH, Ming R, Presting GG (2006) Analysis of papaya BAC-end sequences reveals first insights into the organization of a fruit tree genome. Mol Genet Genomics 276:1–12. doi:10.1007/s00438-006-0122-z
Lashermes P, Couturon E, Charrier A (1994) Doubled haploids of Coffea canephora—development, fertility and agronomic characteristics. Euphytica 74:149–157. doi:10.1007/BF00033781
Lefebvre-Pautigny F, Wu F, Philippot M, Rigoreau M, Priyono, Zouine M, Frasse P, Bouzayen M, Broun P, Pétiard V et al (2010) High resolution synteny maps allowing direct comparisons between the coffee and tomato genomes. Tree Genet Genomes 6(4):565–577. doi:10.1007/s11295-010-0272-3
Mahé L, Combes MC, Lashermes P (2007) Comparison between a coffee single copy chromosomal region and Arabidopsis duplicated counterparts evidenced high level synteny between the coffee genome and the ancestral Arabidopsis genome. Plant Mol Biol 64:699–711. doi:10.1007/s11103-007-9191-6
Mao L, Wood TC, Yu Y, Budiman MA, Tomkins JP, Woo S-S, Sasinowski M, Presting G, Frisch D, Goff S, Dean RA, Wing RA (2000) Rice transposable elements: a survey of 73000 sequence tagged connectors. Genome Res 10:982–990. doi:10.1101/gr.10.7.982
Messing J, Bharti AK, Karlowski KM, Gundlach H, Kim HR, Yu Y, Wei F, Fuks G, Soderlund CA, Mayer KFX, Wing RA (2004) Sequence composition and genome organization of maize. Proc Natl Acad Sci USA 101:14349–14354. doi:10.1073/pnas.0406163101
Noirot M, Poncet V, Barre P, Hamon P, Hamon S, de Kochko A (2003) Genome size variations in diploid African Coffea species. Ann Bot London 92(5):709–714. doi:10.1093/aob/mcg183
Paux E, Roger D, Badaeva E, Gay G, Bernard M, Sourdille P, Feuillet C (2006) Characterizing the composition and evolution of homoeologous genomes in hexaploid wheat through BAC-end sequencing on chromosome 3B. Plant J 48:463–474. doi:10.1111/j.1365-313X.2006.02891.x
Pertea G, Huang X, Liang F, Antonescu V, Sultana R, Karamycheva S, Lee Y, White J, Cheung F, Parvizi B, Tsai J, Quackenbush J (2003) TIGR Gene Indices clustering tools (TGICL): a software system for fast clustering of large EST datasets. Bioinformatics 19(5):651–652. doi:10.1093/bioinformatics/btg034
Poncet V, Rondeau M, Tranchant C, Cayrel A, Hamon S, de Kochko A, Hamon P (2006) SSR mining in coffee tree EST databases: potential use of EST–SSRs as markers for the Coffea genus. Mol Genet Genomics 276:436–449. doi:10.1007/s00438-006-0153-5
Poncet V, Dufour M, Hamon P, Hamon S, de Kochko A, Leroy T (2007) Development of genomic microsatellite markers in Coffea canephora and their transferability to other coffee species. Genome 50(12):1156–1161. doi:10.1139/G07-073
Robbrecht E, Manen JF (2006) The major evolutionary lineages of the coffee family (Rubiaceae, angiosperms). Combined analysis (nDNA and cpDNA) to infer the position of Coptosapelta and Luculia, and supertree construction based on rbcl, rps16, trnL-trnF and atpB-rbcL data. A new classification in two subfamilies, Cinchonoideae and Rubioideae. Syst Geogr 76:85–146
Shultz JL, Kazi S, Bashir R, Afzal JA, Lightfoot DA (2007) The development of BAC-end sequence based microsatellite markers and placement in the physical and genetic maps of soybean. Theor Appl Genet 114:1081–1090. doi:10.1007/s00122-007-0501-9
Terol J, Naranjo MA, Ollitrault P, Talon M (2008) Development of genomic resources for Citrus clementina: characterization of three deep coverage BAC libraries and analysis of 46,000 BAC-end sequences. BMC Genomics 9:423. doi:10.1186/1471-2164-9-423
Tomato Genome Consortium (2012) The tomato genome sequence provides insights into fleshly fruit evolution. Nature 485(7400):635–641. doi:10.1038/nature11119
Vidal RO, Costa Mondego JM, Pot D, Ambrósio AB, Andrade AC, Protasio Oereira LF, Colombo CA, Esteves Vieira LG, Carazzolle MF, Pereira G (2010) A high-throughput data mining of single nucleotide polymorphisms in Coffea species expressed sequence tags suggests differential homeologous gene expression in the allotetraploid Coffea arabica. Plant Physiol 154:1053–1066. doi:10.1104/pp.110.162438
Wikström N, Savolainen V, Chase MW (2001) Evolution of the angiosperms: calibrating the family tree. Proc R Soc B Biol Sci 268(1482):2211–2220. doi:10.1098/rspb.2001.1782
Acknowledgments
This research was supported by a grant from the Agence Nationale de la Recherche (ANR; Genoplante ANR-08-GENM-022-001).
Author information
Authors and Affiliations
Corresponding author
Electronic supplementary material
Below is the link to the electronic supplementary material.
Rights and permissions
About this article
Cite this article
Dereeper, A., Guyot, R., Tranchant-Dubreuil, C. et al. BAC-end sequences analysis provides first insights into coffee (Coffea canephora P.) genome composition and evolution. Plant Mol Biol 83, 177–189 (2013). https://doi.org/10.1007/s11103-013-0077-5
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11103-013-0077-5