Introduction

The tribe Coffeeae belongs to the monophyletic subfamily Ixoroideae of the family Rubiaceae and is close to the tribes Gardenieae and Pavetteae (Bremer and Jansen 1991; Davis et al. 2007). The coffee species share the typical coffee bean morphology, i.e. a groove on the flat side of the seed. They have been described in two genera, Coffea L. and Psilanthus Hook. f., which differ in their flower morphology (Leroy 1980; Bridson 1987; Davis et al. 2005). Each genus has been divided into two subgenera: Coffea subgenus Coffea (95 species), Coffea subgenus Baracoffea (J.-F. Leroy) J.-F. Leroy (nine species), Psilanthus subgenus Psilanthus (two species) and Psilanthus subgenus Afrocoffea (Moens) (20 species) (Bridson 1988; Davis et al. 2005, 2006; Davis and Rakotonasolo 2008). Both genera occur naturally in tropical Africa; Coffea also occurs in Madagascar, Grande Comore and the Mascarenes, and Psilanthus in south-east Asia, Oceania and northern Australia. Research has mainly focused on the Coffea subgenus Coffea, which comprises the majority of coffee species, including those of economic importance, C. arabica L. (65% of world production) and C. canephora Pierre ex A. Froehner (35%) (more details at www.ico.org).

Coffea subgenus Coffea is represented by 41 species in Africa, 58 in Madagascar, one in Grande Comore and three in the Mascarenes, each area having 100% endemicity for its species (Davis et al. 2006). All species are perennial woody bushes or trees that differ greatly in morphology, size and ecological adaptation. They can constitute valuable markers of evolution in the African rain forest since coffee trees have colonised various types of forest, including humid evergreen forest, evergreen forest, mixed evergreen-deciduous forest sometimes seasonally dry, deciduous forest, savannah woodland, gallery forest, coastal forest and temporarily flooded riparian forest (Davis et al. 2006). Except for C. canephora and C. liberica Bull. ex Hiern from West and Central Africa and C. eugenioides S. Moore from East Africa, coffee species have a rather restricted distribution, sometimes only a few square kilometres. Three centres of species diversity have been identified in Madagascar (mainly in the evergreen humid forests of eastern Madagascar), Cameroon (14 species) and Tanzania (16 species, mainly in the Eastern Arc Mountains) (Davis et al. 2006). However, many characters considered in taxonomy are weak and variable, and many species have not been fully characterized, so it is hard to draw valid conclusions about their relationships (Bridson 1982; Stoffelen 1998; Davis et al. 2005). All species are diploid (2n = 2x = 22), except C. arabica, which is tetraploid (2n = 4x = 44) (Charrier and Berthaud 1985). They are self-incompatible, except for the tetraploid species C. arabica and the diploid species C. heterocalyx Stoff. (Coulibaly et al. 2002) and C. anthonyi Stoff. & F. Anthony (Stoffelen et al. 2009), which are self-compatible.

Molecular phylogenies of coffee species have been established based on variations in intergenic spacer sequences (Lashermes et al. 1996; Cros et al. 1998; Tesfaye et al. 2007) and introns (Tesfaye et al. 2007) of plastid DNA, internal transcribed spacer (ITS) sequences of rDNA (Lashermes et al. 1997) and a combination of four plastid regions and ITS (Maurin et al. 2007). Low sequence divergence was found between Coffea and Psilanthus, indicating that molecular data do not support the recognition of two genera (Lashermes et al. 1997; Cros et al. 1998). Enlarging the number of Coffea species and Psilanthus species did not resolve the relationship between the two genera (Maurin et al. 2007). At species level, a small number of parsimony-informative characters were found in molecular studies and the primary clades were weakly supported in the trees. This was attributed to the recent origin of the genus Coffea and a radial mode of speciation (Lashermes et al. 1997; Cros et al. 1998). All the studies pointed to a correspondence between the main groups of species and their geographical origin. In Africa, groups of species were identified in West Africa, West and Central Africa, East-Central Africa and East Africa (Lashermes et al. 1997; Cros et al. 1998; Maurin et al. 2007). A lack of sequence divergence was found in the Madagascan species, and consequently their position has remained unresolved (Maurin et al. 2007).

The major objectives of the present study were to (1) reconstruct the phylogenetic relationships within Coffea subgenus Coffea using sequence data from non-coding regions of plastid DNA, (2) determine the relationships of new species from Central Africa, (3) investigate divergence times within Coffea subgenus Coffea and (4) propose a chronological history of coffee radiation using biogeographic data of African flora.

Materials and methods

Species sampling and outgroup selection

Sequences of the intergenic spacers trnL-F, trnT-L and atpB-rbcL were produced for 24 Coffea subgenus Coffea species, 2 Psilanthus species and 1 outgroup plant (Table 1). The sampling scheme covered the biogeographic diversity of the Coffea subgenus Coffea in Africa as shown by previous studies (Cros et al. 1998; Maurin et al. 2007). All coffee accessions were collected during IRD (formerly ORSTOM) missions in Africa (Anthony et al. 2007) and maintained in greenhouses at the IRD centre in Montpellier. One species from Cameroon (Anthony et al. 1985) and four species from Congo (de Namur et al. 1987) could not be identified and provisional names were used. Gardenia jasminoides J. Ellis was chosen as the outgroup based on previous molecular studies in the Rubiaceae family (Cros et al. 1998; Bremer et al. 1999; Andreasen and Bremer 2000).

Table 1 Accessions used in cpDNA analysis and their geographical distribution according to Davis et al. (2006)

Biogeographic groupings

The African coffee species were grouped according to their biogeographic origin using the terminology of Maurin et al. (2007): Upper Guinea (UG), Lower Guinea/Congolian region (LG/C), East-Central Africa (E-CA) and East Africa (EA). Based on a chorological analysis, Upper Guinea, Lower Guinea and Congolia were recognised as sub-centres of endemism in the Guineo-Congolian Regional Centre of Endemism (G-C) (White 1979, 1983).

Sequence generation

Total DNA was obtained from fresh leaves using the method of Lashermes et al. (1993), modified by Paillard et al. (1996). DNA samples were purified using QIAquick columns (QIAGEN). Target regions were amplified in 25 μl reactions with approximately 20–25 ng of total DNA, 1× colorless GoTaq Flexi Buffer [50 mM KCl, 10 mM Tris-HCl (pH 9.0 at 25°C) and 0.1% Triton X100], 1.5 mM of MgCl2, 0.2 mM of each dNTP, 0.25 μM of each primer and 0.75 U of GoTaq DNA polymerase (Promega). The PCR program consisted of 5 min at 95°C followed by 34 cycles of 1 min at 94°C, 1 min at 50°C and 1 min at 72°C, and a final extension of 72°C for 8 min. The primers used are listed in Table 2. Amplified products were cleaned using the GFX PCR kit (GE Healthcare). Sequencing reactions were performed by Cogenics using Sanger technology, separately for each strand to obtain independent forward and reverse sequences. Forward and reverse fragments were assembled and, in the case of differences, new reactions were performed. All sequences were deposited in GenBank (Table 1).

Table 2 Amplification primers for trnL-F, trnT-L and atpB-rbcL

Sequence comparisons

Sequences were obtained from GenBank for 45 species of Coffea subgenus Coffea and 24 species belonging to other Rubiaceae genera (Table 3). The Coffea species originated from Cameroon (C. bakossi Cheek & Bridson, C. mayombensis A. Chev., C. montekupensis Stoff.), Grande Comore (C. humblotiana Baill.), the Mascarenes (C. macrocarpa A. Rich., C. mauritiana Lam., C. myrtifolia (A. Rich. ex DC.) J.-F. Leroy) and Madagascar (38 species). The other Rubiaceae accessions were considered as representative of the subfamilies Ixoroideae, Cinchonoideae and Rubioideae, according to results of previous studies (Natali et al. 1995; Bremer et al. 1999; Andreasen and Bremer 2000). These subfamilies are generally recognised as being the three major lineages within Rubiaceae (Rydin et al. 2008).

Table 3 Sequences of the intergenic spacers trnL-F, trnT-L and atpB-rbcL obtained from GenBank for 45 species of Coffea subgenus Coffea and 24 species belonging to other Rubiaceae genera

Phylogenetic analyses

Sequence alignments were initially performed with CLUSTAL W (Thompson et al. 1994) and manually adjusted using the MegAlign program of the DNASTAR package (Lasergene v7.2) without difficulty due to low levels of nucleotide variation. Sequence divergence (distance) between accessions was calculated by the DNADIST program in PHYLIP (Felsenstein 1995) using the Kimura-2-parameter model. The data on the intergenic spacers trnL-F, trnT-L and atpB-rbcL were not analysed separately because they all exhibited low levels of sequence divergence. Nucleotide diversity of combined sequences was estimated for biogeographic regions using the Arlequin v3.1 software package (Excoffier et al. 2005).
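The Kimura two-parameter distance mentioned above can be illustrated with a minimal Python sketch. It is not the DNADIST implementation: the function name `k2p_distance` is hypothetical, and the choice to skip gap and ambiguous positions pairwise is an assumption made for this illustration.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned sequences.

    Assumption of this sketch: positions with a gap or an ambiguous
    base in either sequence are ignored.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared    # proportion of transitions
    q = transversions / compared  # proportion of transversions
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

For two sequences that differ by a single transition over ten compared sites, the function returns about 0.112, slightly above the uncorrected proportion of 0.1.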

Phylogenetic analyses were conducted using maximum parsimony (MP) and maximum likelihood (ML) methods implemented in PAUP* 4.0b10 (Swofford 2001). Parsimony analyses (Swofford et al. 1996) were performed using the heuristic search method with a random addition sequence of ten replicates, tree-bisection-reconnection (TBR) branch swapping, and the MULTREES option. All nucleotide substitutions were weighted equally. Branch support was examined in the maximally parsimonious trees (MPTs) with the bootstrap method (Felsenstein 1985) using PAUP* 4.0b10. Bootstrap values were calculated from 10,000 replicates with the random addition and heuristic search option. Gaps were treated either as missing data or as additional characters. As MP analysis ignores information on branch lengths, we also used maximum likelihood (ML), which includes an estimation of branch length and assumes that changes are more likely along long branches than short ones. The ML heuristic analysis was run for 10 random-addition-sequence replicates with TBR branch swapping and the HKY85 sequence evolution model. Bootstraps were calculated using 10,000 replicates.
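The bootstrap procedure (Felsenstein 1985) resamples alignment columns with replacement and records how often a clade is recovered. The Python sketch below shows only that resampling logic; it is not how PAUP* implements it, and `clade_recovered` is a hypothetical callable standing in for the tree search and clade check on each pseudo-replicate.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one pseudo-replicate: columns resampled with replacement."""
    n_sites = len(alignment[0])
    if any(len(seq) != n_sites for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")
    # Draw as many column indices as there are sites, with replacement
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return ["".join(seq[i] for i in cols) for seq in alignment]


def bootstrap_support(alignment, clade_recovered, n_replicates=1000, seed=0):
    """Percentage of pseudo-replicates in which a clade is recovered.

    clade_recovered is a hypothetical user-supplied function: it takes a
    resampled alignment, builds a tree by any method, and returns True
    if the clade of interest is present in that tree.
    """
    rng = random.Random(seed)
    hits = sum(
        bool(clade_recovered(bootstrap_alignment(alignment, rng)))
        for _ in range(n_replicates)
    )
    return 100.0 * hits / n_replicates
```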

Bayesian inference of phylogeny was implemented using MrBayes v3.1.2 (Huelsenbeck and Ronquist 2001; Ronquist and Huelsenbeck 2003). MrBayes was run with two simultaneous analyses with four parallel chains in each, starting with a random tree and sampling one tree every 100 generations. The temperature of the chains and other parameters were left at their default values. The program was run for 1–3 × 10⁶ Markov Chain Monte Carlo (MCMC) generations to reach the stationary phase (average standard deviation of split frequencies <0.01).

Divergence time estimation

Divergence time was estimated using two calibration dates: (1) Colonisation of volcanic islands by Coffea species is assumed to have followed their emergence from the Indian Ocean, 8 mya for Mauritius (MacDougal and Chauman 1969), 2 mya for Reunion Island (Emerick and Duncan 1982) and 0.5 mya for the Grande Comore (Nougier et al. 1986; Rocha et al. 2005). (2) The origin of the genus Rubia is assumed to be coincident with the first occurrence of fossil pollen records, dated from the Upper Miocene (Muller 1981). Numbers of substitutions per site were calculated using branch lengths for Rubia in the ML analysis and the estimated age of this genus, and compared to those found in the Coffea clades.
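The second calibration amounts to a simple rate calculation, sketched below in Python under a strict molecular clock. The sketch is an illustration only and not necessarily the exact procedure used: the function names are hypothetical, and both branch lengths are invented placeholder values, not results of this study. Only the Upper Miocene bounds (11.6 and 5.3 mya) come from the calibration used in Table 5.

```python
def substitution_rate(branch_length, age_years):
    """Substitutions per site per year from a dated branch."""
    return branch_length / age_years


def divergence_time(clade_branch_length, rate):
    """Years needed to accumulate a branch length at a constant rate."""
    return clade_branch_length / rate


# Upper Miocene bounds used for calibration (years before present)
UPPER_MIOCENE_START = 11.6e6
UPPER_MIOCENE_END = 5.3e6

# Placeholder branch lengths (substitutions per site), NOT measured values
rubia_branch = 0.20
coffea_branch = 0.005

for label, age in (("start", UPPER_MIOCENE_START), ("end", UPPER_MIOCENE_END)):
    rate = substitution_rate(rubia_branch, age)
    years = divergence_time(coffea_branch, rate)
    print(f"Upper Miocene {label}: rate = {rate:.2e} subst./site/year, "
          f"divergence = {years:,.0f} years")
```

An older calibration date gives a slower rate and therefore an older divergence estimate, which is why a range of dates is obtained rather than a single value.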

Results

Coffee sequence characteristics

The trnL-F sequences generated for 26 coffee species ranged in length from 331 to 356 bp. Aligned sequences contained two deletions, one 8 bp and one 11 bp in length. Aligned with the outgroup sequence, the coffee sequences presented an insertion of 10 bp and a deletion of 1 bp. The trnL-F matrix was composed of 357 aligned positions, 21 (5.9%) of which were variable, with 7 (2.0%) parsimony-informative (Table 4). The maximum divergence was 2.4% between coffee sequences [i.e. C. liberica var. dewevrei (De Wild & T. Durand) Lebrun-C. stenophylla G. Don] and 3.6% between the outgroup and coffee (i.e. C. stenophylla) sequences.

Table 4 Characteristics of the cpDNA regions used in the phylogenetic analyses: number of aligned, variable and parsimony-informative positions, consistency index (CI), retention index (RI) and rescaled consistency index (RC) of MP analyses

The trnT-L sequences ranged from 348 to 511 bp in length across the coffee species. Two ambiguous regions containing variable numbers of A and T repeats were removed because of possible sequencing errors caused by Taq polymerase stuttering. Five indels ranging from 1 to 172 bp were required to align the coffee sequences. Their alignment with the outgroup sequence required six additional indels of 1–6 bp. G. jasminoides presented a 32-bp region with a 7-bp inverted repeat at each end. The trnT-L matrix was composed of 524 aligned positions, 17 (3.2%) of which were variable, with three (0.6%) parsimony-informative (Table 4). The maximum divergence was 0.9% between coffee sequences [i.e. C. kapakata (A. Chev.) Bridson-C. pocsii Bridson] and 2.1% between the outgroup and coffee (three species) sequences.

The length of atpB-rbcL sequences ranged from 684 to 723 bp. A variable region of 2 bp randomly containing A, T, C or G was found between a 7-bp inverted repeat sequence. This variable region was not used for subsequent analyses. Eight indels ranging from 1 to 32 bp were required to align the coffee sequences. Two additional indels of 1 bp were included in the alignment of the outgroup sequence. The atpB-rbcL matrix was composed of 757 aligned positions, 27 (3.6%) of which were variable, with 11 (1.5%) parsimony-informative (Table 4). The maximum divergence was 0.7% between coffee sequences (three pairs of species) and 2.4% between the outgroup and coffee (i.e. P. mannii) sequences.
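The counts of variable and parsimony-informative positions reported for each matrix follow the usual definitions: a position is variable if at least two nucleotides occur, and parsimony-informative if at least two nucleotides each occur in at least two sequences. A minimal Python sketch is given below; the function name is hypothetical, sequences are assumed to be upper-case, and gaps are treated as missing data.

```python
from collections import Counter


def classify_sites(alignment):
    """Return (n_variable, n_parsimony_informative) for aligned sequences.

    Gaps and ambiguity codes are ignored (treated as missing data).
    """
    n_variable = n_informative = 0
    for column in zip(*alignment):
        counts = Counter(base for base in column if base in "ACGT")
        if len(counts) < 2:
            continue  # constant (or entirely missing) position
        n_variable += 1
        # Informative: at least two states, each present in >= 2 sequences
        if sum(1 for c in counts.values() if c >= 2) >= 2:
            n_informative += 1
    return n_variable, n_informative
```

For the toy alignment `["AAGT", "AAGA", "ACTA", "ACTA"]`, three positions are variable and two of them are parsimony-informative; the last position is variable but carries a change in a single sequence only.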

Analysis of African species

The combined plastid data comprised 1,638 bp of aligned sequence of 26 coffee species and G. jasminoides as outgroup (Table 4). With gaps treated as missing data, parsimony analysis produced 61 MPTs, with a consistency index (CI) of 0.942 (0.840 excluding uninformative characters), a retention index (RI) of 0.941, and a rescaled consistency index (RC) of 0.887. The topology of the MP analysis of Coffea species supported two sister clades and several subclades that were consistent with biogeographic regions (Fig. 1). Clade I comprised four subclades corresponding to one species from East Africa (subclade Ia), the remaining East African species (Ib), C. arabica and two species from East-Central Africa and Lower Guinea (Ic), and two species from Upper Guinea (Id). Clade II comprised species exclusively native to the Guineo-Congolian region. Three subclades were supported corresponding to three species from the Lower Guinea/Congolia region, closely related to C. canephora (subclade IIa), three species from the Lower Guinea/Congolia region (IIb), and the unidentified species from Congo (IIc). C. charrieriana Stoff. & F. Anthony and C. liberica var. liberica Bull. ex Hiern were included in clade II, but they were weakly supported as subclades IId and IIe respectively. The Psilanthus species formed two sister clades but their position was weakly supported by bootstrap values. Conflicts between MPTs lay in the position of C. liberica var. liberica and the Psilanthus species, which were placed either in clade II or as sisters to clades I and II. The topology of the Bayesian majority rule consensus tree (Fig. 1) was identical to that of the MP analysis, except for the position of the Psilanthus species, which were grouped as sister clades of clades I and II in the MP analysis, and of clade II in the Bayesian analysis. As phylogeny of the Psilanthus species remained uncertain, P. ebracteolatus and P. mannii were not included in subsequent analyses. 
Except for subclade IId, all clades and subclades were supported by at least one synapomorphy (Fig. 1). A maximum of four synapomorphies was observed for subclade Id. Ten Coffea species were characterised by at least one autapomorphy. By contrast, the outgroup presented many more specific characters (25). Few indels were present in subclades, but there were many more in the outgroup.

Fig. 1

Phylogenies of 24 Coffea subgenus Coffea species and two Psilanthus species from Africa, using sequences of intergenic spacers trnL-F, trnT-L and atpB-rbcL, with gaps coded as missing data. G. jasminoides was used as the outgroup. Substitutions and indels appearing once are represented by solid boxes and open rhombi respectively. For geographical groupings of species, see White (1979) and Maurin et al. (2007). UG Upper Guinea, LG Lower Guinea, C Congolia, E-CA East-Central Africa, EA East Africa. Left Strict consensus tree generated by MP analysis (CI = 0.942, RI = 0.941, RC = 0.887), with bootstrap values (>50%) listed above branches. Right Bayesian majority rule phylogeny with posterior probabilities (>50%) listed above branches

With gaps treated as new characters, MP analysis yielded 54 MPTs (CI = 0.948, RI = 0.905, RC = 0.858), the consensus of which resembled the tree found with gaps treated as missing data (data not shown). Bootstrap values increased in general, except for C. liberica var. dewevrei, C. heterocalyx and C. kapakata.
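The three indices reported for the MP analyses are linked: the rescaled consistency index is the product of the consistency and retention indices. The Python sketch below states the standard definitions (function names are ours, chosen for illustration) and checks them against the values reported for the two gap treatments (CI = 0.942, RI = 0.941, RC = 0.887; CI = 0.948, RI = 0.905, RC = 0.858), allowing for rounding.

```python
def consistency_index(min_steps, observed_steps):
    """CI: minimum possible number of steps divided by observed steps."""
    return min_steps / observed_steps


def retention_index(min_steps, max_steps, observed_steps):
    """RI: (max - observed) / (max - min) steps on the tree."""
    return (max_steps - observed_steps) / (max_steps - min_steps)


def rescaled_consistency_index(ci, ri):
    """RC: product of the consistency and retention indices."""
    return ci * ri


# Reported (CI, RI, RC) for gaps as missing data and gaps as characters
for ci, ri, rc in ((0.942, 0.941, 0.887), (0.948, 0.905, 0.858)):
    product = rescaled_consistency_index(ci, ri)
    # Inputs are rounded to three decimals, so allow a small tolerance
    print(f"CI x RI = {product:.4f}, reported RC = {rc}")
```

The small discrepancy in the first case is what one would expect when the product is computed from indices already rounded to three decimals.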

Considering the species grouped in biogeographic regions, estimates of nucleotide diversity (Nei 1987) were 0.026 ± 0.013 in Lower Guinea and Congolia (13 species), 0.021 ± 0.012 in East Africa (6 species), 0.019 ± 0.012 in East-Central Africa (4 species) and 0.008 ± 0.005 in Upper Guinea (4 species).
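Nucleotide diversity can be understood as the average proportion of differing sites over all pairs of sequences in a region. The Python sketch below computes that simplified quantity. It does not reproduce the Arlequin estimator or its standard deviation, and it assumes one upper-case sequence per species with equal weight; the function name is hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(alignment):
    """Mean pairwise proportion of differing sites (simplified pi).

    Positions with a gap or ambiguity code in either sequence of a pair
    are skipped for that pair.
    """
    if len(alignment) < 2:
        raise ValueError("at least two sequences are required")
    total = 0.0
    n_pairs = 0
    for s1, s2 in combinations(alignment, 2):
        compared = diffs = 0
        for a, b in zip(s1, s2):
            if a not in "ACGT" or b not in "ACGT":
                continue
            compared += 1
            diffs += a != b
        if compared:
            total += diffs / compared
            n_pairs += 1
    if n_pairs == 0:
        raise ValueError("no comparable sites between any pair")
    return total / n_pairs
```

With the toy sequences `["AAAA", "AAAT", "AATT"]`, the pairwise proportions are 1/4, 2/4 and 1/4, giving a diversity of 1/3.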

Analysis of African and Madagascan species

The trnL-F sequences of African species and of the outgroup were aligned with those of three species from Central Africa and 42 species from the Madagascar region, available in GenBank (Table 3). One 1-bp deletion was required to align the sequences of three Madagascan species [i.e. C. augagneurii Dubard, C. pervilleana (Baill.) Drake, C. ratsimamangae J.-F. Leroy ex A.P. Davis & Rakotonas.] and one species from Grande Comore (i.e. C. humblotiana) with those of the remaining species. The matrix was composed of 299 aligned positions, 24 (8.0%) of which were variable, with 8 (2.7%) parsimony-informative. Maximum divergence was 2.5% between coffee sequences (C. leroyi A.P. Davis-C. liberica var. dewevrei) and 3.6% between the outgroup and coffee (i.e. C. leroyi, C. stenophylla) sequences.

With gaps treated as missing data, MP analysis produced a single MPT (CI = 1.0, RI = 1.0, RC = 1.0), the topology of which was identical to that of the Bayesian analysis (Fig. 2). All species from the Madagascar region fitted in clade I. No difference was detected among 10 species from Madagascar, 2 species from the Mascarenes (i.e. C. macrocarpa, C. myrtifolia) and those from East Africa. Twenty-seven species formed a subclade, in which three species from northern Madagascar (i.e. C. augagneurii, C. pervilleana, C. ratsimamangae) and one from Grande Comore (i.e. C. humblotiana) grouped together. The species from Cameroon (i.e. C. bakossi, C. mayombensis, C. montekupensis) were placed in clade II. Referring to their distribution, clades I and II were consequently named clades A-IO (Africa-Indian Ocean) and G-C (Guineo-Congolian) respectively.

Fig. 2

Single MPT resulting from analysis of trnL-F sequences of 69 Coffea subgenus Coffea species from Africa (27), Madagascar (38), Grande Comore (1) and the Mascarenes (3), with gaps coded as missing data (CI = 1.0, RI = 1.0, RC = 1.0). G. jasminoides was used as the outgroup. Bayesian posterior probabilities are listed above branches, parsimony bootstrap values below. Clades and subclades are identified in Fig. 1

Analysis of Rubiaceae species

The trnL-F sequences of African and Madagascan species and of the outgroup were aligned with those of 12 Rubiaceae species belonging to other genera than Coffea, and available in GenBank (Table 3). Several indels (1–19 bp) were required to align the new sequences with those of coffee. The matrix was composed of 323 aligned positions, 103 (31.9%) of which were variable, with 62 (19.2%) parsimony-informative. Divergence was found to be high between Rubia and Coffea (24.5–26.1%) and other Ixoroideae species (23.8–24.6%). By contrast, divergence was only 1.1–4.3% within the Ixoroideae subfamily.

The HKY85+G model (Hasegawa et al. 1985) was identified by Modeltest as the best nucleotide substitution model. Base frequencies were A = 0.341, C = 0.195, G = 0.108 and T = 0.356, the ti/tv ratio was 0.767, and the estimated value of the gamma shape parameter was 0.826. ML and NJ analyses produced the same phylogenetic relationships supporting the monophyly of Rubiaceae (Fig. 3). The tree of the trnL-F sequences showed three main strongly supported lineages, corresponding to the subfamilies Rubioideae, Cinchonoideae and Ixoroideae. The distribution of branch lengths was variable among branches. Long branches were observed for subfamily branching while short branches were observed within subfamilies. The clades A-IO and G-C were closely grouped with the other Ixoroideae species.

Fig. 3

Maximum likelihood tree of 69 Coffea subgenus Coffea species and 13 species belonging to other Rubiaceae genera based on analysis of trnL-F sequences with gaps coded as missing data. Numbers indicate bootstrap support values above 50% in 10,000 replicates. Clades A-IO and G-C are identified in Fig. 2

Divergence time

The molecular phylogenetic trees generated here and in previous studies did not show any relation between phylogenetic topology and the age of emergence of the volcanic islands that Coffea species have colonised in the Indian Ocean. On one hand, no difference was detected in the trnL-F region among 2 species from Mauritius (i.e. C. macrocarpa and C. myrtifolia), 10 species from Madagascar and the East African species of our study. On the other hand, the species from Grande Comore (i.e. C. humblotiana) presented a sequence identical to that of three species from Madagascar. Such similarities among species from East Africa, Madagascar, Mascarenes and Grande Comore indicate that dispersal of the Coffea subgenus Coffea in the Indian Ocean occurred after the emergence of volcanic islands. Given the age of the youngest island (i.e. Grande Comore), dispersal of Coffea subgenus Coffea species from mainland Africa probably occurred during the last 500,000 years.

Based on the origin of the Rubia genus, substitution rates estimated in the ML analyses varied from 15.5 × 10⁻⁹ subst. per site per year to 99.6 × 10⁻⁹ subst. per site per year (Table 5). The Coffea subgenus Coffea could thus have diverged about 460,000 years BP or as recently as the last 100,000 years, depending on the cpDNA region considered and calibration.

Table 5 Substitution rate estimated for the Rubia species in the ML analyses, using Upper Miocene start (11.6 mya) and end (5.3 mya) for calibration, and corresponding divergence time estimated for Coffea subgenus Coffea

Discussion

General findings

The present study provided new plastid sequences from Coffea subgenus Coffea species. The intergenic spacer trnT-L was sequenced for the first time and new species were included in the phylogenetic analysis. Non-coding regions were chosen rather than coding regions because they are under lower selection pressure and reveal more divergence among related species (Dixon and Hillis 1993; Gielly and Taberlet 1994). However, non-coding cpDNA regions present variable evolutionary rates and bring variable numbers of potentially informative characters (Shaw et al. 2005, 2007). In coffee, sequences of the intergenic spacers trnL-F and atpB-rbcL were successfully used in previous phylogenetic studies (Cros et al. 1998; Maurin et al. 2007; Tesfaye et al. 2007). The substitutions identified in this study were confirmed by separating forward and reverse sequencing reactions. Our sequences showed a nucleotide composition and a transition/transversion rate similar to those observed for angiosperms in the intergenic regions trnL-F (Bakker et al. 2000) and atpB-rbcL (Manen and Natali 1995; Morton and Clegg 1995; Hoot and Douglas 1998). Sequence divergence was low in the Coffea subgenus Coffea (≤2.4%), as shown in previous studies of the trnL-F region (Cros et al. 1998), other plastid regions (trnL-F intron, rpl16 intron and accD-psaI) and the internal transcribed spacer (ITS1/5.8S/ITS2) of nuclear rDNA (Maurin et al. 2007). Divergence was, however, higher in this study than in the intergenic spacers (atpB-rbcL, trnS-G, rpl2-rps19 and rps19-rpl22), introns (atpF, trnG and trnK) and genes (matK, rpl2, rps19 and rpl22) of chloroplast genomes sequenced by Tesfaye et al. (2007). Few parsimony-informative characters were found, only 21 in 1,638 bp (1.3%), which explains why the main branches of the phylogenetic trees were supported by a low number of characters.

Phylogenetic relationships

The results of our analysis of African species are congruent with those previously published based on plastid and ITS sequences (Lashermes et al. 1997; Cros et al. 1998; Maurin et al. 2007; Tesfaye et al. 2007). Phylogenetic analyses of our dataset revealed two lineages in Coffea subgenus Coffea. Clade A-IO spans the entire geographical range of Coffea subgenus Coffea while clade G-C is restricted to the Guineo-Congolian region. Within clades, species were classified in subclades according to their biogeographic origin (i.e. EA, E-CA, C, LG, UG). Similar groupings were found by Maurin et al. (2007) who included 83% of Coffea species in their study, but the main clades were erroneously named EA-IO (East Africa-Indian Ocean) and LG/C (Lower Guinea/Congolia). These names did not reflect the biogeographical origin of studied material since clade EA-IO included a subclade from Upper Guinea and a species (i.e. C. anthonyi) from Lower Guinea/Congolia. Similarly clade LG/C included C. canephora and C. liberica var. liberica which can be found in Upper Guinea.

Our molecular analysis resolved the species from Cameroon (i.e. C. charrieriana) and Congo (i.e. Coffea sp. ‘Congo’, Coffea sp. ‘Ngongo 3’), studied here for the first time, to two distinct subclades of clade G-C, thus increasing known diversity in Lower Guinea. High levels of similarity were observed in the trnL-F sequences of Coffea sp. ‘Mayombe’, Coffea sp. ‘Ngongo 2’ and Coffea sp. ‘Ngongo 3’, all from the south-west of the Mayombe Mountains in Congo. Moreover, their sequences were identical to that of C. mayombensis whose distribution covers west equatorial Africa, from southern Nigeria to Cabinda, including the Mayombe Mountains (Stoffelen 1998). Such grouping resembled that observed around C. canephora, a widely distributed species, grouped with species with limited distribution (i.e. C. congensis A. Froehner, C. brevipes Hiern, Coffea sp. ‘Nkoumbala’). This confirmed previous observations on the high level of endemicity in the Mayombe Mountains (Cusset 1981, 1989).

Centre of origin

Nucleotide diversity was higher in Lower Guinea and Congolia than in any other biogeographic region, as a consequence of overlap of clades A-IO and G-C in west equatorial Africa. This suggests that Lower Guinea could be the centre of origin of Coffea subgenus Coffea. The origin may thus not be in Kenya as suggested by a biogeographic analysis (Leroy 1982), but in West-Central Africa. According to floristic records, Lower Guinea is the richest sub-centre of endemism of the Guineo-Congolian Region (White 1979). Diversity in Coffea subgenus Coffea has, however, been underestimated for a long time as shown by the case of Cameroon. In the early 1990s, only 5 species were known whereas now 15 species are recognised (Anthony et al. 2006), not including the new species of this study. Sequence diversity appeared maximal in west equatorial Africa, suggesting that Lower Guinea constitutes a major centre of speciation for Coffea subgenus Coffea. This region likely played the role of refuge for coffee trees during the last arid maximum (18,000 years BP) and previous arid phases. In Central Africa, a chain of small refuges has been located near the Atlantic Ocean: in west and south Cameroon, in the Crystal and Chaillu Mountains in Gabon and in the Mayombe Mountains in Congo (Maley 1987, 1996). These areas rich in coffee species are known to be hotspots of biodiversity (Küper et al. 2004). Forest patches could also have survived between refuges and formed forest islands in a grassy sea (Leal 2004).

Radiation in Coffea subgenus Coffea

The low rate of homoplasy and the low number of characters supporting the main branches confirmed the hypothesis of a rapid and radial mode of speciation in Coffea subgenus Coffea (Lashermes et al. 1997; Cros et al. 1998). Judging from genetic distances, the origin of Coffea subgenus Coffea is recent. For example, trnL-F uncorrected pairwise sequence divergence was only 0–2.4% within Coffea species while that between Coffea and Rubia was 24.5–26.1%. Another fact in favour of a recent origin of Coffea subgenus Coffea is the low number of insertions and deletions that were required for plastid sequence alignment. To align the trnL-F sequences of 42 Madagascar species with those of 26 African species, only one short deletion (1 bp) was required. Few indels were also reported for sequence alignment of cpDNA intergenic spacers (Cros et al. 1998) and introns (Tesfaye et al. 2007), and none in coding regions (Tesfaye et al. 2007). Moreover, the sequences of Madagascar species showed high similarities with those of species from the surrounding islands and from East Africa, suggesting a common origin. Biodiversity is, however, considerable in Madagascar (Myers et al. 2000), in particular for coffee trees since the region contains 60% of Coffea subgenus Coffea species (Davis et al. 2006). The majority of Madagascan species have rather limited distribution (Davis et al. 2006), which corresponds to radial and rapid speciation.

Coffee would have spread radially from the centre of origin located in Lower Guinea, westwards up to Upper Guinea and eastwards through Central Africa (Fig. 4). Dispersal could have benefitted from several putative refuges in the Congo-Zaire Basin (Maley 1996; Colyn et al. 1991), in East Central Africa (Lovett 1993) and in East Africa (Fjeldså and Lovett 1997; Roy 1997) where montane regions offered a great range of habitats. Colonisation of Madagascar was doubtless the result of a single dispersal event from the African mainland, followed by insular speciation. Such a scenario has already been proposed to explain speciation in the genera Begonia (Plana et al. 2004) and Gaertnera (Malcomber 2002) in Madagascar. High similarity between C. humblotiana from Grande Comore and three species from north Madagascar (i.e. C. augagneurii, C. pervilleana, C. ratsimamangae) indicates that Grande Comore was colonised by coffee trees from north Madagascar (Maurin et al. 2007) or, more likely according to the geographic position of Grande Comore, in one step when coffee trees crossed the Mozambique channel. Lastly, the species from the Mascarenes showed a common origin with the East African species in our study and with ten Madagascar species, suggesting rapid colonisation of Mauritius and Reunion Island from Madagascar.

Fig. 4

Reconstruction of the dispersal of Coffea subgenus Coffea from its centre of origin in Lower Guinea. Current distribution of Coffea subgenus Coffea and putative forest refuges during the last major arid phase (18,000 years BP) (Maley 1996; Roy 1997) are in grey and black respectively. Biogeographic regions in Africa are outlined by marks according to White (1979) and Maurin et al. (2007). UG Upper Guinea, LG Lower Guinea, C Congolia, E-CA East-Central Africa, EA East Africa

The Dahomey Gap has recently (ca. 4,000 years BP) fragmented the rain forest in the Guineo-Congolian region, isolating Upper Guinea from Lower Guinea over a distance of some 200 km in Togo and Benin (Salzmann and Hoelzmann 2005). This savannah barrier is believed to have occupied a far larger area during previous drier phases and to have separated the forest refuges of Ivory Coast-Ghana and west Cameroon by at least 1,200 km. In C. canephora, the existence of genetic groups distributed in Upper Guinea and Central Africa (Berthaud 1986; Dussert et al. 2003), which are easily distinguishable and strongly differentiated (Leroy et al. 1993), suggests that colonisation of Upper Guinea by coffee trees occurred before formation of the Dahomey Gap, as was the case for the shea tree (Fontaine et al. 2004).

Divergence time

The radiation observed in Madagascar and the surrounding islands demonstrated that coffee trees are not remnants of a putative Cretaceous Gondwana flora (Guillaumet and Mangenot 1975; Leroy 1978). Their origin is much more recent than the Gondwana dislocation mentioned by Leroy (1982) and even more recent than the appearance of the volcanic islands around Madagascar. Given the age of the youngest island (i.e. Grande Comore), coffee dispersal occurred within the last 500,000 years. Sequence comparison between Coffea subgenus Coffea species and Rubia, whose origin has been dated to the Upper Miocene (Muller 1981), enabled us to estimate the divergence time at about 100,000–450,000 years BP. Even though these ages should be considered preliminary estimates, radiation in Coffea subgenus Coffea probably occurred in the second half of the Middle Pleistocene (780,000–126,000 years BP). Coffee dispersal could have benefitted from humid conditions during interglacials of the past 200,000 years (Dupont et al. 2001). The rapidity of colonisation points to the effective dispersal of coffee seeds, probably by monkeys in Africa and lemurs in Madagascar. The role of primates in seed dispersal has already been put forward to explain the rapid radiation of the genus Aframomum in Africa (Harris et al. 2000).
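
As a rough illustration of how a sequence comparison can be converted into an age, the relation below is a sketch under a strict molecular clock, i.e. assuming that substitutions accumulate at the same constant rate in all lineages. The calculation behind the estimate given above is not detailed here, so this should not be read as the method actually used. The symbols are introduced only for illustration: d_C is the pairwise divergence between two Coffea species, d_CR the divergence between Coffea and the calibration taxon, T_CR the age assigned to the split from the calibration taxon, t_C the divergence time sought, and r the substitution rate per site per year.

```latex
d_{CR} = 2\,r\,T_{CR}, \qquad d_{C} = 2\,r\,t_{C}
\quad\Longrightarrow\quad
t_{C} = T_{CR}\,\frac{d_{C}}{d_{CR}}
```

Uncorrected divergences saturate as sequences diverge, so d_CR tends to be underestimated and t_C overestimated; ages obtained in this way are best read as orders of magnitude.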

Adaptation and speciation

The phylogenetic relationships based on nucleotide sequences were congruent neither with morphological and biochemical classifications (Stoffelen 1998; Dussert et al. 2008) nor with the adaptive capacity to grow in specific environments. For example, only three small-leaved species are known in Central Africa: C. anthonyi (Stoffelen et al. 2009), C. charrieriana (Stoffelen et al. 2008) and C. kapakata (Chevalier 1947; Bridson 1994). In our study, the first species was placed in clade A-IO while the two others were placed in clade G-C. Another example is the absence of caffeine in coffee seeds. Two caffeine-free species have been reported in Africa up to now, C. pseudozanguebariae Bridson (Hamon et al. 1984) and C. charrieriana (Stoffelen et al. 2008). They were classified in clades A-IO and G-C, respectively. These species also occupy very different habitats: coastal dry forest on a coral reef substrate close to the Indian Ocean in the case of C. pseudozanguebariae (Anthony et al. 1987) and rain forest in west Cameroon in the case of C. charrieriana (Stoffelen et al. 2008). On the other hand, all caffeine-free species from Madagascar (e.g. C. homollei J.-F. Leroy) (Anthony et al. 1993) were grouped in clade A-IO together with Madagascan species containing caffeine (e.g. C. lancifolia A. Chev.) (Rakotomalala et al. 1992). The absence of caffeine in seeds and leaves, where the compound is synthesised, therefore does not appear to be associated with one or more particular lineages in Coffea subgenus Coffea. Similarly, the self-compatibility of C. anthonyi (Stoffelen et al. 2009) and C. heterocalyx (Coulibaly et al. 2002) is a character that appeared independently in clades A-IO and G-C. These characters are examples of convergent evolution on the scale of equatorial Africa and the islands in the Indian Ocean. Finally, the high adaptive capacity of Coffea subgenus Coffea probably originates in variations in gene expression mechanisms rather than in the nucleotide composition of the genes themselves.

Accelerated rates of regulatory gene evolution could accompany rapid morphological diversification in adaptive radiation (Barrier et al. 2001). Phenotypic plasticity has been shown to affect plant morphology, anatomy and physiology (Walbot 1996; Sultan 2000) as well as the ecological organisation of populations (Miner et al. 2005). Plasticity of coffee trees could be the key to rapid colonisation of African forests from Guinea to Mozambique and, farther away, of islands in the Indian Ocean. This would explain the restricted distribution of the majority of coffee species and the number of species described up to now using morphological criteria. Further studies on the evolution of Coffea subgenus Coffea should include regulatory genes whose divergence could correlate better with phenotypic evolution than molecular evolution did.