Background

Recent empirical data and theoretical advances suggest that the diversity of complex species assemblages is based at least partially on the hybridization of numerous ancestral allopatric lineages that formed hybrids upon invasion of new environments [15]. A growing amount of studies show that cytoplasmatic (mitochondrial or chloroplast) gene phylogenies of recent diverse species radiations often conflict with phylogenies based on numerous nuclear genes [3, 59]. Theoretical arguments as well as empirical evidence from hybrid zones predict that in newly colonized habitats the effect of transgressive segregation, i.e. the generation of extreme traits in hybrid populations, may lead to a drastically increased phenotypic variation. This effect may in turn serve as a substrate for evolution of novel adaptive traits [2, 1012]. These arguments in combination with an increasing body of evidence showing that species resulting from interspecific hybridization are common in plants [13] and highly probable in animals [5, 7, 914] gave rise to hypotheses about a prominent role of hybridization for the evolution of adaptive radiations [6]. Both, the initial formation of hybrid swarms of originally allopatric populations meeting in a newly colonized habitat ("hybrid swarm origin hypothesis") as well as secondary hybridization of in situ diverged lineages ("syngameon hypothesis") could possibly explain the rapid formation of megadiverse species flocks. The scenario may either involve secondary localized hybridization, i.e. hybridization of parapatric ("microallopatric") lineages within the geographical range of the primary radiation, or alternatively hybridization of sympatrically diverged lineages. Distinguishing between these two alternatives is central for the understanding of the processes that lead to the evolution of megadiversity. The first alternative predicts that increased species richness due to hybridization is dependent primarily on the spatial scale and the accompanied possibility to establish localized metapopulations. The second alternative predicts that hybridisation can aid the build-up of diversity even under fully sympatric conditions. Several studies, some published, some in preparation support the hybrid swarm origin hypothesis for some species assemblages endemic to comparatively large areas [35, 8] but a critical evaluation of the syngameon hypotheses rests on the ability to test for sympatric hybrid speciation.

However, despite increasing evidence for sympatric speciation, uncontested examples remain rare and rarely go beyond the formation of single species-pairs [15, 16]. In addition, evidence for sympatric speciation of complex species-assemblages is often based on mitochondrial phylogenies of limited taxon-sampling. This is problematic as mitochondrial phylogenies or those based on few nuclear loci may obscure true species phylogenies either due to introgressive hybridization among already established species or due to incomplete lineage sorting during rapid speciation. Hence it is not surprising, that studies applying several nuclear markers occasionally yield phylogenetic hypotheses about the origin and pattern of sympatric species assemblages which contrast with mitochondrial hypotheses. As a consequence of the uncertainty about phylogenetic relationships among members of complex species flocks questions about the processes that contribute to sympatric speciation remain difficult to test due to the lack of appropriate model systems.

Until recently the phylogeny of mitochondrial lineages of the cichlid species flock of crater lake Barombi Mbo (Cameroon) [17] was considered as one of the best examples for sympatric speciation [15]. According to this phylogeny which was based on haplotypes of single specimens, the monophyly of the 11 endemic species suggested strongly that they had formed after a single colonisation by a riverine founder species, Sarotherodon galilaeus. Because the lake's conical basin is only 2.15 km in diameter, because there are no migration-barriers along the shore, and because the lake is isolated from nearby river systems by cataracts of its outflow, allopatric scenarios for the origin and diversification of the flock were ruled out. However, in the light of the aforementioned methodological drawbacks of mitochondrial phylogenies, both the monophyly-hypothesis for the Barombi flock and the relationships among its 11 endemic species are worth to be reevaluated.

The ability to score thousands of amplified fragment-length polymorphisms (AFLPs) has created a powerful possibility for the phylogenetic reconstruction of rapidly originated species flocks. It has been successfully applied to a limited number of taxa belonging to the Lake Malawi and Lake Victoria haplochromine species flocks and revealed previously undetectable phylogenetic patterns including those supporting the hybrid swarm origin hypothesis [4, 18, 19]. In this study, we tested hypotheses about sympatric speciation with a focus on hybridization by applying a combination of mitochondrial DNA-sequencing and AFLP-genotyping as well as a set of recently proposed analytical tools [20] to the phylogenetic analysis of a complete and complex species flock.

Results

Mitochondrial phylogenetic inferences

We obtained a DNA sequence-alignment with 2553 bp including two complete mitochondrial genes, NADH dehydrogenase subunit 2 (ND2) and cytochrome b (cytb), partial proline tRNA as well as from part of the control region from all Barombi species (two samples per species) and relevant S. galilaeus populations (one to two samples per population). 2191 sites of the alignment were constant, 198 variable characters were parsimony-uninformative and the number of parsimony-informative characters was 164. Empirical base frequencies in this data set were A = 0.2721; C = 0.3271; G = 0.1268; T = 0.2740. Bootstrapped Maximum Parsimony (MP), Maximum Likelihood (ML) and Neighbour Joining (NJ) trees all recovered identical 50%-majority rule consensus-trees (figure 1). As the sistergroup to the monophyletic Barombi flock a Sarotherodon galilaeus clade was recovered which includes all west African populations except S. galilaeus sanagaensis which emerged as the sistergroup to all other ingroup taxa. Within the Barombi Mbo flock four lineages were recovered with high bootstrap support, one containing the predators of genus Stomatepia, one combining the fine-particel feeders of the genus Sarotherodon, one consisting only of the dwarf zooplanctivore Myaka and one containing the macro-invertebrate or eggfeeding sistertaxa of the genus Konia plus the highly specialized spongivore Pungu. For taxa represented by more than one sample, all conspecific samples grouped together except those of the morphologically merely distinguishable S. caroli and S. linnellii. A rough time estimate as deduced from the ultrametric tree (chronogram) [21] derived from non parametric rate smoothening (NPRS) of bootstrapped ML-distances suggests that all four lineages almost simultaneously came into existence, which must have taken place approx. 1 myr years ago. Soon after this primary radiation, the divergence of the Pungu haplotypes from Konia took place, while all other clades radiated into several species much later. A 94 sample data set with only cytochrome b and partial proline tRNA sequences (3 to 7 samples for the Barombi taxa, and 1 to 7 for all other; 1212 bp with 1003 constant characters, 59 parsimony-uninformative and 150 parsimony-informative characters) confirmed the previous findings for the Barombi flock and suggests that lineage sorting between the four large clades is complete (for a Neighbour Joining Tree see Additional File 1). However, between species within the clades lineage sorting was only complete for a subset of taxa (Pungu, Myaka, Konia ssp, S. lohbergeri and S. steinbachi), but not within Stomatepia ssp., S. caroli and S. linnellii.

AFLP based phylogenetic inferences

The phylogenetic reconstruction based on the same individuals and using 22 restrictive primer combinations with 3489 AFLP size fragments (3004 variable) confirmed the monophyletic origin of the Barombi Mbo flock and the monophyly of the Stomatepia and Konia clades (figure 1). However, several other phylogenetic groupings were recovered with high bootstrap support, which contrasted conspicuously with the mitochondrial phylogenetic hypothesis. S. gal. multifasciatus and S. gal. "Niger" were recovered as sistergroups to the clade containing all other ingroups including S. gal. sanagaensis. Within the Barombi flock Pungu is now sistergroup to a Sarotherodon subclade (S. steinbachi and S. lohbergeri), Myaka is sistergroup to the other Sarotherodon subclade consisting of the pelagic species S. caroli and S. linnellii. The Konia clade is resolved as the sistergroup to the rest of the flock. In contrast to the unresolved basal topology in the mitochondrial tree, the two species from Lake Ejagham are resolved as a monophylum which is the sistergroup to the geographically closest S. galilaeus population from the river Cross. An additional data set with 3 selective amplifications but 80 samples recovered only three supraspecific nodes with moderately high bootstrap support, the monophyly of the Barombi flock, Konia and the node containing the rest of the Barombi flock without Pungu. The latter was placed intermediate between the two well supported nodes.

Figure 1
figure 1

Phylogenetic tree (a) based on AFLP-data and chronogram (b) based mtDNA data of all Barombi Mbo cichlids (inside blue box), reference specimens of the closest riverine ancestor of the Barombi flock, Sarotherodon galilaeus, and two undescribed Sarotherodon from Lake Ejagham (S. sp. "bighead" and S. sp. "mudfeeder"). Photographs refer to the taxon to the left, different populations of S. galilaeus are depicted by two identical photographs only. (a) Numbers at nodes in the AFLP-tree are bootstrap-values (%) of tree reconstructions using the pruned (above) and unpruned (below) AFLP data set. Topologies were identical except for the position of S. g. "Niger", which was sistergroup to S. g. multifasciatus when using the unpruned data (bootstrap value: 86). Long terminal branches in the original phylogram were cut to identical lengths for graphical reasons; interior branch lengths are as in the original phylogram. (b) Numbers at nodes in the mtDNA-chronogram refer to bootstrap-values of the ML (above) and MP (below) tree reconstructions. The absolute time scale above the tree is based on the maximum age of the Barombi Mbo crater formation of 1.0 mya [56]. Encircled numbers mark nodes referring to phylogenetic groups tested with PCO (see Table 1). Red, blue and green shaded boxes unite hypothetical ancient syngameons as deduced from the conflict between mtDNA- and AFLP-based phylogenetic hypotheses. Further details of tree reconstruction see Methods.

Testing for sympatric reticulate speciation

Both the Shimodaira-Hasegawa and Templeton's test confirmed significantly the difference for the alternative tree topologies for each sequence and AFLP data sets, respectively (figure 1). These discordant phylogenies suggested strongly that hybridization among previously evolved lineages had taken place and that at least one taxon of the Barombi Mbo flock, Pungu maclareni, is the result of speciation by hybridization. By identifying the clades which contain taxa with discordant phylogenies we hypothesized that traces of three ancient hybridization events are still detectable in the multilocus AFLP data. To test for the presence of the respective phylogenetic signal for these three hypothetical ancient syngameons in the large AFLP data set, we used the recently developed method of Canonical Phylogenetic Ordination (CPO) [20]. In addition, this method is useful for differentiating between contributions to variation in the observed AFLP character pattern that were generated by the segregation of ancestral polymorphisms inherited from a common ancestor due to incomplete lineage sorting rather than by the contribution of derived characters of hybridizing lineages. This, as the contribution to the variation that is assignable to the phylogenetic group uniting the common ancestor of the hybridizing lineages (coded as phylogenetic variables) is partialled out in the CPO separately from the contribution of the phylogenetic groups characterizing the derived lineages that may have hybridized (see also Methods section).

All phylogenetic groups detected either by the mitochondrial or the AFLP data set, or formulated according to the three hypothetical syngameons were coded as phylogenetic variables and their explanatory value for the variance in the AFLP data set was tested (table 1). As a result, almost all supraspecific clades which were not conflicting among the two data sets were recovered as contributing significantly to the variation in the data set. Conflicting clades, which were either found in the mitochondrial or AFLP phylogenetic hypothesis, yielded non-significant results except for the AFLP based nodes uniting (1) all S. galilaeus and lake taxa without S. gal multifasciatus and S. gal "Niger" and (2) the node of the AFLP based hypothesis of the monophyly of the Lake Ejagham sister pair. Of the three hypothetical syngameons, the one uniting Pungu with its potential ancestor clades represented now by the Konia ssp and S. steinbachi and S. lohbergeri contributed significantly to the variation in the AFLP data set, whereas the other two did not.

Table 1 Results of canonical phylogenetic ordination of AFLP data

As a final test for the putative hybrid origin of Pungu, a tree-based method as outlined in Seehausen [6] was performed in order to test for homoplasy excess introduced by this potential hybrid taxon in the AFLP-data set (figure 2). Only the removal of Pungu resulted in far outlying higher support values for the respective nodes, whereas all other removals resulted in much lower values (Fig. 2). In addition, the other nodes tested were not affected by the removal of Pungu. Interestingly, the removal of several other taxa produced far outlier values in other nodes. However, these removals concerned part of the two hypothetical syngameons that were not supported significantly in the CPO, which were therefore not explicitly tested. The removal of Konia dikume resulted in an increased bootstrap support for the clade uniting Konia and Pungu in the analysis of the 530 loci data set and an accompanying though weaker signal in the S. caroli/S.linnellii clade. The removal of S. caroli resulted in a distinct although weak increase of the bootstrap support for the Myaka/S. linnellii node, the exclusion of both Myaka and Konia eisentrauti increased the bootstrap support for the S. linnellii/S. caroli node. Finally, the removal of S. mariae increased the support for the S. pindu/S. mongo split strongly. Among the riverine populations of S. galilaeus, the removal of S. gal. "Meme" increased strongly the value for the clade uniting the S. gal. "Cross" specimen with the two endemic species from Lake Ejagham (Details in Additional File 2). These additional findings suggest that the comparative phylogenetic analysis of mtDNA and AFLP data alone was not sufficient to uncover all possible hybridization events.

Figure 2
figure 2

Box-plots of the distribution of %-bootstrap support values for the nodes uniting the two Konia species (Figs. 2a) or Sarotherodon lohbergeri and S. steinbachi (Figs. 2b) after iterative removal of single species or taxon groups. Values are based on 2000 bootstrap replicates in the AFLP-based tree reconstruction using the Link et al algorithm [48]; they are based on either the 32-sample/2355 loci dataset for the S. lohbergeri/S. steinbachi split or on the 80 sample / 530 loci dataset for the Konia split. Outside (*) and far outside values (°) are plotted as asterisks and circles, respectively. Arrows denote far outside values resulting in a distinctly higher bootstrap support for the two clades after exclusion of Pungu maclareni. n refers to the number out of 18 maximum possible removal experiments. Removal of Pungu did not result in outside or far outside bootstrap support values for any other node out of 30 nodes tested (Additional information concerning far outside values yielded for other nodes see Additional File 2).

Discussion

Our results demonstrate that the sympatric origin of a diverse and complex species flock was aided substantially by reticulate evolution among lineages that emerged in a much smaller primary radiation. Our data suggest that at least one out of 11 taxa of the species flock in Lake Barombi Mbo, Pungu maclareni, is the result of speciation by hybridization. On the other hand, conflicts among mitochondrial and nuclear data sets as well as results of the homoplasy excess tests suggest that in the course of the evolution of the flock hybridization must have taken place among several additional Barombi taxa, too.

According to the ultrametric time-calibrated tree ("chronogram") based on the well supported phylogeny of mitochondrial haplotypes in the lake, the primary radiation in Barombi Mbo resulted into the almost instantaneous split into four distinct lineages approximately one million years ago. The accumulation of numerous apomorphic characters that support these mitochondrial lineages suggests strongly that they represented reproductively isolated species at that time. Only in an advanced stage of species flock formation and after considerable time had elapsed, their cohesion was broken partially by hybridization events between these lineages. However, according to the chronogram the ancestral mitochondrial clades that contributed, for example, to the hybrid origin of Pungu continued to accumulate apomorphic characters well after the origin of Pungu. This suggests that the species status of the ancient hybridizing lineages in terms of sufficient reproductive isolation must have allowed for their ongoing genetic cohesion and accompanied coalescence of haplotypes before additional speciation events took place.

Traditionally the rapid origin of megadiverse species flocks of extremely closely related species was explained by the combinatory action of three factors: Disruptive natural selection, disruptive sexual selection and partial isolation by distance [2224]. Although introgression among species is known for many fish species [7, 9, 2527] and although reticulate evolution and hybrid origins of species are well documented in plants [14], it is only of recent that hybridization has been proposed to play a major role in generating diversity in animals in general [2730] and in "explosive" speciation in species flocks in particular [6]. Especially in newly colonized habitats with increased ecological opportunities, secondary hybridization of primarily diverged lineages may provide rapidly sources of heritable advantageous variation by producing additional adaptive diversity through recombination of functional genotypes. Interestingly, the species with the most likely hybrid origin in Lake Barombi Mbo, Pungu maclareni, represents an ecologically highly specialized ecotype. Both its peculiar dentition and the accompanying hypertrophic jaw-muscles are unique not only in Barombi Mbo but in cichlids in general [31]. Accordingly, one putative second species with a hybrid genome in the lake, Konia dikume, ranks among the most unusual cichlids as it is the single species which is able to exploit chironimid larvae in the almost oxygen-free deep water due to its extremely high haemoglobin concentration in its blood [32]. In the light of our findings we hypothesize that hybridization produced these extreme phenotypes by transgressive segregation which allowed the exploitation of extreme niches. This supports the notion that speciation by hybridization is not only able to produce additional random variation but may significantly increase the ecological complexity in a rapidly evolving species community by providing extraordinary genetic opportunities. If indeed transgressive segregation of hybrid genotypes plays a major role in the evolution of evolutionary novelties, members of complex species-assemblages with unusual ecological adaptations should predictably turn out to be of hybrid origin more often than species with common adaptations.

Methods

Taxon sampling, collection of samples and deposition of vouchers

We obtained genetic data from relevant Sarotherodon galilaeus populations and from all species endemic to crater lakes Barombi Mbo and Ejagham, which are related to S. galilaeus. Oreochromis niloticus and Sarotherodon melanotheron were used as outgroups based on published information and pilot-study data confirming their outgroup status with respect to S. galilaeus and the investigated species flocks.

Adult specimens from Lake Barombi Mbo were collected by UKS during field visits to the lake in 2001 and 2002 and from Lake Ejagham in 1994. Identification to species is according to Trewavas et al. [33] and was straightforward except for Sarotherodon caroli and S. linnellii. Black adult breeding males of the S. caroli/S. linnellii-phenotype from the deepwater were identified as S. caroli, whereas golden males from the shallow inshore area were identified as S. linnellii. Fin-clips were taken from the right pectoral fin in the field directly after collection and preserved in 96% Ethanol p.A.. Samples from additional specimens belonging to different populations of Sarotherodon galilaeus and outgroups were either collected by UKS in the field in Cameroon or donated by others. Vouchers are deposited in the ichthyological collection of Bavarian State Collection of Zoology (ZSM). Of few S. galilaeus-samples, only photographs of the specimens are available and are deposited in the ZSM, too. Informations about specimens and their species identifications, geographic origin, accession numbers and information about which specimens were sequenced and AFLP typed in different data sets are provided in Additional file 3.

Molecular Methods

DNA Preparation

DNA samples were isolated from approx 10 mm2 fin tissue with the DNeasy™ Tissue Kit (Qiagen). DNA-quality was visually inspected under UV-light on a 0.8% agarose-gel stained with ethidium bromide. For subsequent AFLP-analysis only samples with a clearly visible high-molecular band were used. DNA-concentration was determined using the VersaFluor-Fluorometer-System (BioRad) using the stain Picogreen® dsDNA Quantitation Kit (Molecular Probes). All samples were adjusted to 60 ng/μl.

Mitochondrial DNA Amplification and Sequencing

Two different data sets were assembled, one long data set with approx. 2550 bp including the complete NADH dehydrogenase subunit 2 (ND2), the complete cytochrome b (cytb) gene and part of the proline tRNA, and finally, one part of the control region. For this long data set only two samples per species were used, but a short data set with more individuals but only the cytochrome b and the partial proline tRNA genes was generated, too. ND2 was PCR-amplified using the primers "ND2Met" 5'-CATACCCCAAACATGTTGGT-3'"ND2Trp" 5'-GTSGSTTTTCACTCCCGCTTA-3'; a second fragment containing the cytb, proline and threonine tRNAs and the 5'-end of the control region was amplified with the primers "L14725" 5'-TGACTTGAAAAACCATCGTTG and "H16498" 5'-CCTGAAGTAGGAACCAGATG [34] ; internal sequencing primers were the newly designed "cytL640" 5'-CACGAAACCGGATCAAAC-3' for cytochrome b and "L71" 5'-TACCCCTAGCTCCCAAAGCT-3'7 for the 5'-end of the control region. PCR was performed by using a PTC 220 DYAD thermocycler (MJ Research) in a 25 μl reaction volume using the Expand PCR system (Roche Diagnostics) with 25 pmol of each primer, 20 pmol of dNTPs, 12.5 pmol MgCl2 and 0.88 units of Taq polymerase. PCR parameters were 94°C for 4 min, 35 cycles with 94°C for 1.5 min, 55°C for 1 min, 72°C for 1.5 min, followed by a final elongation at 72°C for 3 min. PCR products were cleaned by using MinElute PCR purification kit (QIAGEN) and their DNA-concentration adjusted to 100 ng/μl. PCR-Products were then used as templates for cycle-sequencing reaction using the "Ready Reaction DyeDeoxy Terminator Cycle Sequencing Kit" (Applied Biosystems) with each of the PCR primers or internal primers. Cycle parameters were the following: 94°C, 2 min; 25 cycles of 94°C, 20 s; 52°C, 10 s; 60°C, 4 min. The sequenced product was filtered through Sephadex-G50 fine (Fluka) packed spin columns (Amersham) to remove unincorporated dye terminators, primers, and salts, and finally dried in a speed-vac. These products were resuspended, electrophoresed and analysed with an ABI PRISM™ 377XL-96 automated sequencer using a 4.25 % polyacrylamid gel (BioRad). Electrophoretic information was transcribed to sequence data using the program Genescan (PE Applied Biosystems).

Individual sequence files were edited and contigs assembled using Sequence Navigator™ (PE Applied Biosystems). Homologous protein-coding regions (ND2, cytb) were aligned manually and confirmed by translating DNA data into amino acid sequences in BioEdit [35]. The short fragment of the control-region was first aligned with default settings in Clustal W as implemented in Sequence Navigator™. No indels larger than 1 basepair (bp) were detected and alignment therefore was straightforward. All sequences were tested for an anti-G bias characteristic of the mitochondrial DNA to confirm that we have collected genuine mitochondrial DNA data [36]. Sequence data have been deposited in GenBank (for accession numbers se Additional File 3).

Amplified Fragment-Length Polymorphisms (AFLPs)

We followed the original protocol of the AFLP-method [37] using the AFLP™ Plant Mapping Kit (Applied Biosystems) with slight modifications of the accompanied protocol: Restriction and ligation were carried out in a single step under standardized conditions in a thermocycler (2 h at 37°C and 8 h at 16°C). 1,5 μl of the preselective amplification product were used in only 10 μl total reaction volume of the selective amplifications. The restriction enzymes used were EcoRI and MseI. Primer sequences for preselective PCR were GACTGCGTACCAATTCA and GATGAGTCCTGAGTAAC. An additional two bases were added to the 3' end for selective PCR. Analogous to the mitochondrial data set we assembled two datasets. For the long one in total 22 primer pairs were used in the following combinations and fluorescent dye-labelling (MseI-primer/ EcoRIDYE): TC-CAFAM, AT-CAFAM, AC-CCNED, TA-CAFAM, AA-CTFAM; AA-GGJOE; AC-CAFAM; AA-CAFAM; AG-CAFAM; AA-CGJOE; AC-CTFAM; AG-CTFAM; AT-CTFAM; AG-GGJOE; TC-CTFAM; TA-GGJOE; AG-ACNED; AT-ACNED; AG-CCNED; AC-ACNED; AT-CCNED; TA-CGNED. For the short data only the five primer pairs TC-CAFAM, TA-GGJOE, AC-CAFAM, AC-CCNED, AT-CAFAM were used but typed for 80 individuals (see Additional File 3).

Selectively amplified fragments were separated on 6% LongRanger polyacrylamid gels (FMC BioProducts) with an ABI PRISM™ 377XL-96 sequencer. Fluorescent signals were detected using the GENESCAN software (Applied Biosystems) with internal size standard (GS-500 ROX; Applied Biosystems). The fluorescent threshold was set to 50 units and the correct identification of ROX-marker bands by GENESCAN was checked for all electropherograms.

Bands between 100.5 bp and 499.5 bp were scored in a first step for presence or absence using the software Binthere (developed by N. Garnhart and available through the T. Kocher laboratory http://hcgs.unh.edu. The program generates aligned spreadsheets from GENESCAN-sized AFLP-data by assigning each sized fragment to a size-class of user-defined distance to the next size-class. Using a spreadsheet routine, fragments were inferred in a second step to be homologous if they differed by no more than 1.00 bp, and if the scoring procedure identified the same size-classes whether scored from small to large size-fragments (forward) or vice versa (backward). Size-classes with inconsistent allocation of fragments according to forward and backward scoring were excluded, as well as adjacent size-classes differing by less than 0.35 bp, which corresponds to the double standard deviation of 0.15 bp of the sequencer [38]. As a result of this procedure a final 0/1 data-matrix for all scored individuals was prepared. All samples of the same primer combination were run on same gel for the 33 specimen dataset and on two gels for the 80 taxon dataset. Single unsuccessful amplifications were repeated and fitted to the data matrix using the size-class assignment criteria as outlined above.

Phylogenetic Analyses

Mitochondrial Sequence Data

MrModeltest 1.1b [39], a simplified version of David Posada's "Modeltest 3.06" [40] was used to perform hierarchical likelihood ratio tests (HLRT) and to calculate approximate Akaike Information criteria (AIC) to determine the optimal nucleotide substitution models for the dataset. If the two tests did not select the same model, we chose AIC over HLRT, as AIC is a useful measure that rewards models for good fit but imposes a penalty for unnecessary parameters [41, 42], which may cause erroneous phylogenetic conclusions especially in Bayesean phylogenetic analyses [43]. For the 33 sequence dataset the HLRT selected the HK+I+Γ model (α = 0.2781; proportion of invariable sites = 0) a transition/transversion (Ti/tv) ratio of 8.5662 was calculated. The AIC selected the HKY+I+ Γ model (α = 0.9468; proportion of invariable sites = 0.4264) with a transition/transversion ratio of 8.6616. For the 94 sequence dataset, the HLRT selected the GTR+ Γ model (α = 0.4014; proportion of invariable sites = 0), whereas AIC selected the GTR+I model (proportion of invariable sites = 0.5817). Empirical base frequencies in the data 2553/1212 data sets were A = 0.2443/0.2440; C = 0.3264/0.3261; G = 0.1443/0.1446; T = 0.2581/0.2852.

The AIC settings were subsequently used for Maximum Likelihood (ML) analyses and to estimate ML distances for minimum evolution (ME) analyses in the program PAUP* 4.0b1.0 (PPC/Altivec) [44] and for the 94 sequence dataset in Treefinder [45]. Maximum Parsimony (MP) analyses were conducted with heuristic searches (TBR branch swapping and MULTREES option effective; 10 random stepwise additions of taxa for 33 sequence set and simple addition for the 94 sequence set; gaps in the control region treated as a 5th character). Non-parametric bootstrapping with 1000 (ME or MP analyses) or 100 (ML) pseudoreplicates was used for testing the robustness of the inferred trees. Tree topologies using the HLRT settings under ME were not different from topologies gained with AIC settings (data not shown).

A LRT [46] as implemented in PAUP* was performed with the respective 33 sequence ME tree under the AIC model assumptions with (-ln L = 6770.47061) and without (-ln L = 7120.92833) molecular clock enforced. Overall constancy of rates of evolution was rejected (chi2 = 701.0154, df = 32, p = 0.001). To date cladogenetic events in the absence of rate constancy, the nonparametric rate smoothing (NPRS) method [21] as implemented in Treefinder [45] was used to construct an ultrametric tree ("chronogram") using the bootstrapped 33 sequence ML derived tree-topology and associated bootstrapped branch lengths as input.

AFLP Data

We used PAUP* [44] to calculate the skewness parameter g1 [47] to test for adequacy of phylogenetic signal in the 0/1-data set. g1 calculated from 1000000 random trees revealed significant non-random structure under the parsimony optimality criterion in the 22 restrictive amplifications for the complete data set: g1 -0.805, 33 samples, 3004 variable sites out of 3489 scored); g1 values were lower in the 3 restrictive amplifications data set used for some homoplasy excess tests (see below): g1 -0.298, 80 samples, 717 variable sites out of 859 scored. 2355 and 530 loci respectively were parsimony informative within Sarotherodon galilaeus sensu lato (excluding Oreochromis niloticus and Sarotherodon melanotheron). "Pruned" data matrices using only those parsimony informative sites were constructed for Principal Canonical Ordination and homoplasy excess tests (see below) in order to account for noise in the data potentially introduced by the distant outgroup. Pairwise genetic distances were calculated from the binary data-matrix with two different algorithms: One developed by Link et al. [48] as implemented in TREECON v.1.3b [49], which is based on shared and unique characters and ignores shared absence. This algorithm is adequate for AFLP data, since noise in the data may often be created by weak signal intensities and hence absence of band-detection despite a possible weak presence of signal. Alternatively, we calculated pair-wise distance matrices with the restriction-site program RESTDIST within the PHYLIP 3.6A2 package [50]. Trees were constructed from the Link et al.-distances with the neighbour joining (NJ) algorithm as implemented within TREECON or from the RESTDIST-distances using the Fitch-and-Margoliash-algorithm [51] with unconstrained branch length as implemented in the program FITCH within the PHYLIP 3.6A2 package [50]. Non-parametric bootstrapping was performed with 100 bootstrapped data sets analyzed 10 times with random input orders, and with local and global optimization.

Hypothesis testing

Alternative phylogenetic hypotheses produced as described by tree topologies based on mtDNA and AFLP data were compared with each other and statistically evaluated using either the Shimodaira-Hasegawa LRT [52] for mitochondrial data or the Templeton's Wilcoxon signed-rank test [53] for AFLP-data (both as implemented in PAUP*).

In order to test for the presence of a phylogenetic signal that possibly reflects reticulate events in the AFLP-data, two methods were applied.

First, a canonical correspondence analysis (CCA) was performed using CANOCO 4.0 [54]. This method has previously been used successfully for testing the effect of tree-like hydrogeographic data and supplementary ecological data on microsatellite allele-frequencies in freshwater fishes [55], as well as an alternative to traditional phylogenetic comparative methods [20]. A presence/absence matrix with the 2355 AFLP-characters which were parsimony-informative within the Sarotherodon galilaeus-clade (S. galilaeus sensu lato and lake endemics) provided the data-matrix to be tested. Phylogenetic hypotheses derived from mtDNA- and AFLP-analyses as well as hypothetical syngameons as derived from the conflict between the two data-sets were translated into a phylogenetic matrix by assigning binary indicator variables, each coding for the membership of investigated samples to phylogenetic groups (e.g. nodes in phylogenetic trees or hypothetical syngameons) [20, 54]. 9999 full model Monte Carlo (MC) permutations were used to test whether a given phylogenetic group as coded by the indicator variables and identified by automatic forward selection of variables was significantly related to the AFLP-data pattern.

Second, a tree-based method as outlined in Seehausen [6] was performed in order to test for homoplasy excess introduced by potential hybrid taxa in the AFLP-data as suggested by the mtDNA-AFLP-phylogeny conflict and the CCA. Theoretically, hybrid taxa are overall intermediate to the parental taxa because they carry a mosaic of parental characters. Consequently, the inclusion of a hybrid taxon into a multilocus based phylogeny estimate introduces an excess of homoplasies and therefore conflict in the subset of clades that contributed to hybridization. Removal of the putative hybrid taxon should therefore decrease the amount of homoplasies and hence increase support for those nodes that unite descendants from taxa which gave rise to a hybrid taxon. In contrast, removal of a non-hybrid taxon should not affect support for the respective nodes. We computed Link et al bootstrap-supports (2000 replicates) for the nodes uniting Sarotherodon steinbachi and S. lohbergeri in the 2355 loci data with n = 16 experiments (each taxon removed once). Analogous support values for the node uniting the Konia eisentrauti and K. dikume were computed with the 530 loci data set with n = 14 experiments, because bootstrap support in the larger data set was always larger than 98.85% and identical runs yielded values differing by more than 1.15 % (data not shown). By reducing the number of loci but increasing the number of samples we obtained a meaningful distribution of bootstrap support values for that node.