Background

Distichodus, the type genus of the endemic African characiform family Distichodontidae, is a morphologically distinctive and moderately speciose lineage of freshwater fishes. Distichodus species are distributed across the continent, occurring throughout the freshwaters of most of sub-Saharan Africa and the river basins of the Nilo-Sudan, with representation in six of the nine ichthyofaunal provinces of continental Africa (Fig. 1). Although general aspects of the biology of the genus are poorly documented, a few studies indicate that most species are typically diurnally active and found primarily in lentic habitats, shoaling in and around grasses along vegetated river banks and swamps [4]. Most species are primarily herbivorous, feeding almost entirely on periphyton, macrophytes, and detritus [5,6,7] [pers. obs.], although some, such as D. lusosso, have been characterized as dietary generalists feeding on a range of both plant and animal materials [6]. Besides playing an important role as a major constituent of the ecologically important herbivore/detritivore guilds in African freshwaters [8], Distichodus is also of considerable socio-economic importance: many species constitute a highly valued, but increasingly over-exploited, component of artisanal and commercial fisheries across the continent [9] and, owing to their high fecundity and herbivorous diet, are increasingly being cultured in fish farms and lentic water bodies, particularly in western Africa [4].

Fig. 1

Geographic distribution and variation in external morphology of Distichodus species. Map of Africa divided into ichthyofaunal provinces (originally defined by Roberts [1], modified by Lévêque [2], and redrawn according to new hydrological basin mapping published by FAO [3]): Congo Basin (CB), East Africa (EA), Nilo-Sudan (NS), Lower Guinea (LG), South Africa (SA), and West Africa (WA). Shaded area represents Distichodus extent of occurrence. Inset bar charts indicate number of Distichodus species present in each ichthyofaunal province: endemic (red) and total (blue) (when more than endemics). Inset frame fish photographs illustrate the extent of variation in body shape, size, and coloration in Distichodus species (from top to bottom: D. hypostomatus, D. sexfasciatus, D. lusosso, D. antonii, D. affinis, D. schenga, D. decemmaculatus)

Currently, the genus contains 25 valid species [10,11,12], most of which are found in the Congo River basin with species diversity decreasing with distance from that central African center of diversity (Fig. 1). Although no morphological synapomorphies have yet been identified for Distichodus, the genus can be distinguished from all other distichodontid genera by the combination of: an upper jaw only slightly mobile with respect to the cranium; an edentulous maxilla not tightly applied posteriorly to the premaxilla; two rows (generally) of gracile, long stalked, bicuspid teeth in each jaw; a highly mobile joint between the angulo-articular and dentary (i.e., a Distichodus-type lower jaw [13]); a reduced dentary portion of the mandibular sensory canal; and a completely pored lateral line [13, 14].

Morphological variation within the genus includes notable differences in overall body size, which spans more than an order of magnitude, from over 1 m in the largest species (D. nefasch, D. langi) to ~ 5 cm in the smallest (D. decemmaculatus, D. teugelsi), as well as in lateral line scale counts (large- vs. small-scaled), the position of the mouth (terminal vs. inferior), coloration (including presence and number of dark vertical bands and spots), tooth number in the oral jaws, and fin ray counts, among others [14,15,16] (Fig. 1).

The genus Distichodus was erected in the mid-nineteenth century [17] and much of the currently recognized taxonomic diversity had been described by the early twentieth century. As is typical of the taxonomic literature prior to the mid-twentieth century, these older descriptions are highly abbreviated, usually lacking anatomical or ecological detail, and often based on examination of little or no comparative material. In one of the earliest attempts at providing a classification scheme for Distichodus, Boulenger [15] divided the genus into two major groups based on the number of lateral line scales. Boulenger’s classification scheme and the monophyletic status of the genus, however, were not tested until the cladistic study of Vari [13], in which the phylogenetic relationships of the Distichodontidae were investigated using comparative anatomical data. Although only five species of Distichodus were included in his study, Vari’s findings failed to support the hypothesis of Distichodus monophyly, resolving some species as more closely related to a clade formed by the diminutive distichodontid genera Nannocharax and Hemigrammocharax.

Contrary to Vari’s work [13], the first molecular phylogenetic study focused on the Distichodontidae [18] found strong support for the monophyly of Distichodus, and while this study did not focus on the genus and sampling of Distichodus species was not exhaustive, it provided the first picture of Distichodus relationships. Despite this recent contribution to understanding of distichodontid relationships, taxonomic problems within Distichodus persist, and ongoing morphometric and morphological studies (Vreven, pers. comm.) indicate that considerable cryptic diversity remains unrecognized by current taxonomy [14, 16, 19]. Because the taxonomy of Distichodus has only been incidentally examined since the work of Boulenger [12, 13, 16, 18, 20], a comprehensive and focused phylogenetic treatment of the genus (including sampling of multiple individuals per species from a broad geographic range) is needed to test the current classification and to lay essential foundations for future investigations of this socio-economically important genus.

Therefore, to advance our understanding of the systematics and evolutionary history of Distichodus, in addition to providing insights into the processes generating fish diversity in freshwater environments of continental Africa, this study investigates the phylogenetic, biogeographic, and chronological framework for the diversification of the genus based on multi-locus comparative DNA sequence data. The study provides a robust phylogenetic framework for testing the adequacy of the current Distichodus taxonomy, informing future revisionary studies and conservation actions, as well as addressing an array of questions about the evolutionary history of the genus. Furthermore, given its pan-African distribution, knowledge of the temporal and geographic context for the diversification of Distichodus holds considerable promise for shedding light on the very poorly understood biogeographic history of the continent’s riverine networks.

Results

Sequence data summary statistics, partitioning scheme and substitution models

The concatenated alignment of eight genes consisted of 6824 sites, of which 1581 were variable and 1339 parsimony-informative. The few instances of failed DNA amplification and/or sequencing resulted in < 2% of missing data. The best partitioning scheme according to the PartitionFinder analysis comprised four partitions: 1) the entire mtDNA control region (cr), 2) 3rd codon positions of the protein-coding mitochondrial genes [co1, cytb, and nd], 3) 1st and 2nd codon positions of the nuclear genes [enc1, glyt, myh6, sh3px3] plus 2nd codon positions of the mitochondrial protein-coding genes, and 4) 3rd codon positions of nuclear genes plus 1st codon positions of mitochondrial protein-coding genes. The best-fit substitution models for these partitions were HKY + G + X, TrN + G + X, TrN + I + X, and TrNef + I + G, respectively. Models that include +X are those in which base frequencies are estimated using maximum likelihood rather than using the empirical frequency distributions.
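How the counts of variable and parsimony-informative sites are defined can be illustrated with a minimal Python sketch. This is not the pipeline used in this study: the function name `site_summary`, the toy alignment, and the treatment of `-`, `?`, and `N` as missing data are assumptions made purely for illustration. A site is counted as variable if it shows at least two states, and as parsimony-informative if at least two states each occur in at least two sequences.

```python
from collections import Counter

# Characters treated as missing data (an assumption of this sketch).
MISSING = ("-", "?", "N")


def site_summary(alignment):
    """Return (variable sites, parsimony-informative sites, missing fraction)
    for a list of aligned sequences of equal length."""
    n_sites = len(alignment[0])
    if any(len(seq) != n_sites for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")

    variable = informative = missing = 0
    for column in zip(*alignment):
        states = Counter(base.upper() for base in column)
        # Remove missing-data characters from the tally and count them.
        missing += sum(states.pop(ch, 0) for ch in MISSING)
        if len(states) >= 2:
            variable += 1
            # Informative: at least two states, each seen in >= 2 sequences.
            if sum(1 for count in states.values() if count >= 2) >= 2:
                informative += 1
    return variable, informative, missing / (n_sites * len(alignment))


# Hypothetical four-sequence alignment, not data from this study.
toy_alignment = [
    "ACGTAC",
    "ACGTTC",
    "ATGATC",
    "ATG-AC",
]
print(site_summary(toy_alignment))
```

In the toy alignment, the second and fifth columns are parsimony-informative, while the fourth is variable but uninformative because one of its two states occurs only once.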

For the BEAST2 analyses, all model parameter ESS values were greater than 200 and effective topological approximate ESS was always > 570. All best-fit codon models for individual gene trees input to ASTRAL-III were Muse and Gaut’s [21] (MG94) + M0 + F3x4 codon frequency models, with the exception of myh6, for which an MG94 + M3 + F3x4 model was inferred. Terminology for the number of omega (ω) classes follows Yang et al. [22].
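The effective sample size (ESS) threshold mentioned above reflects how strongly successive MCMC samples are autocorrelated. The sketch below is a simplified estimator and an assumption of this illustration, not the algorithm implemented in the software used for this study: it divides the chain length by one plus twice the sum of lag autocorrelations, truncating the sum at the first non-positive autocorrelation. The function name and the toy traces are hypothetical.

```python
def effective_sample_size(trace):
    """Simplified ESS estimate: n / (1 + 2 * sum of positive-lag
    autocorrelations), truncated at the first non-positive lag."""
    n = len(trace)
    mean = sum(trace) / n
    centered = [x - mean for x in trace]
    variance = sum(c * c for c in centered) / n
    if variance == 0:
        raise ValueError("a constant trace has no defined ESS")

    rho_sum = 0.0
    for lag in range(1, n):
        # Biased autocorrelation estimate at this lag.
        rho = sum(centered[i] * centered[i + lag] for i in range(n - lag))
        rho /= n * variance
        if rho <= 0:
            break
        rho_sum += rho
    return n / (1 + 2 * rho_sum)


# Hypothetical traces: one that flips sign every step, one that is "sticky".
alternating = [1.0, -1.0] * 50
sticky = [0.0] * 50 + [1.0] * 50
print(effective_sample_size(alternating), effective_sample_size(sticky))
```

The sticky trace yields a far smaller ESS than its 100 samples would suggest, which is the situation the > 200 threshold is meant to guard against.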

Distichodus phylogeny

The phylogeny derived from ML analysis (RAxML tree) of the concatenated alignment of all eight markers is presented in Fig. 2. A summarized version of this phylogeny, highlighting interspecific relationships, is illustrated in Fig. 3. Single-locus phylogenies (enc1, glyt, myh6, sh3px3, mtDNA) are presented in Figs. S1, S2, S3, S4 and S5, respectively. As expected, partially because of variation in substitution rates, single-locus phylogenies differed in the level of resolution and nodal support, with ncDNA markers resulting in less resolved and supported phylogenies when compared to the mtDNA locus.

Fig. 2

Total-evidence Distichodus phylogeny as inferred by likelihood in RAxML. Colored circles on nodes indicate degree of clade support as determined by bootstrap values (BS). Nodes labeled A and B represent the two main infrageneric clades. Outgroup taxon (Paradistichodus dimidiatus) not shown

Fig. 3

Summary tree of the RAxML Distichodus phylogeny highlighting interspecific relationships

Results of the species-tree analyses (SVDquartets and ASTRAL-III) are presented in Figs. 4 and 5, respectively. BEAST2 analyses yielded very similar topologies (Figs. 6 and S6–S19), differing only slightly in resolution within one of the two main clades recovered. The RAxML, SVDquartets, and BEAST2 phylogenies exhibit largely congruent topologies with comparable nodal support, resolving the genus into two strongly supported major clades of roughly equivalent species diversity and with the same limits and composition (clades A and B in Figs. 2, 3, 4, 6, and S6–S19). These three analytical methods revealed the same general pattern of relationships in clade A and disagreed only inside clade B (notably among D. engycephalus, D. kasaiensis, D. lusosso, and D. atroventralis), whereas the ASTRAL-III analysis produced a considerably different topology (Fig. 5). The source of this disagreement with the other methods is unclear.

Fig. 4

Distichodus species tree generated using the coalescence-based method SVDquartets

Fig. 5

Distichodus species tree generated using the coalescence-based method ASTRAL-III

Fig. 6

A time-scaled phylogeny of Distichodus. Chronogram resulting from BEAST2 analysis 8, intermediate in terms of calibration node (D, crown) and P95 SMB (30 Ma). Divergence-time estimates are represented by the mean ages of clades. Light red bars correspond to 95% highest posterior density (HPD) intervals of mean node ages. Calibration (fossil-based) node indicated by a dagger (†). Colored circles on nodes indicate degree of clade support as determined by posterior probabilities: black > 0.95, 0.95 ≥ blue ≥ 0.75, red < 0.75. Outgroup taxon (Nannocharax ansorgii) not shown

Regardless of inference method, and conforming to expectation, nodal support was greater at deeper divergences, while weaker (BS < 75; PP < 0.75) at nodes corresponding to more recent divergences, likely reflecting intraspecific population-level structuring (when sampling multiple individuals per species). Nonetheless, for the most part, interspecific relationships are well supported, with the exception of a subclade of clade B.

Monophyly of Distichodus species

Sampling of multiple individuals per species allowed testing of the monophyletic status of most morphologically diagnosed Distichodus species, and the resulting total evidence phylogeny (Figs. 2 and 3) strongly supports the monophyly of most of the species for which multiple individuals were sampled. However, there are two notable exceptions: the species pairs D. teugelsi / D. decemmaculatus and D. fasciolatus / D. schenga, in each of which one member was resolved as paraphyletic with respect to the other. Specifically, the phylogenetic placement of all sampled individuals of morphologically determined D. decemmaculatus renders D. teugelsi paraphyletic, and similarly, the phylogenetic placement of the two sampled individuals of D. schenga renders D. fasciolatus paraphyletic (Fig. 2). Although based on considerably fewer comparative data, the mtDNA phylogeny agreed, for the most part, with the concatenated phylogeny in the monophyly of most sampled species. Most ncDNA single-locus phylogenies, on the contrary, exhibited lower degrees of resolution and support than the total evidence and mtDNA trees, failing to support the monophyletic status of several of the species evaluated.
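The logic of such a monophyly test can be sketched in a few lines of Python. The example is illustrative only: the nested-tuple tree encoding, the function names, and the tip labels are hypothetical, and the toy topology is merely a simplified rendering of a pattern in which one species is nested within another, not the tree inferred in this study. A set of individuals is monophyletic on a rooted tree if it corresponds exactly to the set of tips descending from some node.

```python
def clade_tip_sets(tree):
    """Return the set of tip sets (one frozenset per node) of a rooted tree
    encoded as nested tuples with string tip labels."""
    clades = set()

    def visit(node):
        if isinstance(node, str):
            tips = frozenset([node])
        else:
            tips = frozenset().union(*(visit(child) for child in node))
        clades.add(tips)
        return tips

    visit(tree)
    return clades


def is_monophyletic(tree, tips):
    """True if `tips` is exactly the set of tips below some node of `tree`."""
    return frozenset(tips) in clade_tip_sets(tree)


# Hypothetical tree: one teugelsi sample is sister to everything else.
toy_tree = (
    "teugelsi_a",
    ("teugelsi_b", ("decemmaculatus_a", "decemmaculatus_b")),
)
teugelsi = ["teugelsi_a", "teugelsi_b"]
decemmaculatus = ["decemmaculatus_a", "decemmaculatus_b"]

print(is_monophyletic(toy_tree, decemmaculatus))  # nested species
print(is_monophyletic(toy_tree, teugelsi))        # paraphyletic species
```

In this toy case the nested species forms a clade while the other does not, although the two together do; that combination is what identifies the latter as paraphyletic rather than polyphyletic.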

Timescale of Distichodus diversification

The chronograms resulting from the BEAST2 analyses are presented in Figs. 6 and S6–S19, and a summary of the results, including age estimates and associated HPD intervals of select nodes, is given in Table 1. A number of findings are apparent regardless of calibration strategy, and therefore, of absolute times of divergence. Notable among these are that Distichodus (crown group) originated shortly after its divergence from Paradistichodus, and that the two major components of the Distichodus radiation (clades A and B) started diversifying roughly concurrently. However, despite this initial chronological correspondence, a large subclade of clade B consisting of seven species (the MRCA of D. kasaiensis and D. atroventralis and all of its descendants) is, for the most part, of comparatively more recent origin.

Table 1 Results from alternative BEAST2 analyses. Estimated mean ages (in Ma) and associated 95% HPD intervals of select nodes: D + P = MRCA of Distichodus & Paradistichodus; D = MRCA of Distichodus species; Dne + Dro = MRCA of D. nefasch & D. rostratus; DA = Distichodus subclade A; DB = Distichodus subclade B. P95 SMB = 95th percentile soft maximum bound (in Ma), as a proxy for the maximum node age constraint

Of the main variables defining calibration strategy (i.e., calibration node and P95 SMB), selection of calibration node appears to have the strongest effect on estimates of divergence times, with node Dne + Dro resulting in the oldest node age estimates (substantially older than those based on any of the other calibration nodes used), irrespective of P95 SMBs. However, node age estimates based on calibration node D + P did not differ considerably from those based on calibration node D, especially under equivalent P95 SMBs. This trend can be explained by the fact that the age difference between these nodes is relatively small, as previously mentioned. Unsurprisingly, older P95 SMBs resulted in older node age estimates, although perhaps not as much as anticipated.
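To make the notion of a 95th-percentile soft maximum bound concrete, the following Python sketch shows how a P95 SMB translates into a calibration prior under one common choice, an exponential distribution offset by the minimum (fossil) age. The offset-exponential form is an assumption of this illustration, since the shape of the calibration prior is not stated in this section, and the minimum age of 18 Ma and the function names are likewise illustrative. Under this prior, the mean of the exponential component is (P95 − minimum age) / ln 20.

```python
import math


def exponential_mean_for_p95(min_age, p95_age, quantile=0.95):
    """Mean of an exponential prior, offset by `min_age`, whose `quantile`
    falls at `p95_age` (both ages in Ma)."""
    if p95_age <= min_age:
        raise ValueError("the soft maximum must be older than the minimum age")
    return (p95_age - min_age) / -math.log(1.0 - quantile)


def offset_exponential_quantile(min_age, mean, quantile):
    """Age (in Ma) at a given quantile of the offset exponential prior."""
    return min_age - mean * math.log(1.0 - quantile)


# Illustrative values only: hypothetical minimum age and two candidate SMBs.
minimum_age = 18.0
for soft_max in (30.0, 40.0):
    mean = exponential_mean_for_p95(minimum_age, soft_max)
    median = offset_exponential_quantile(minimum_age, mean, 0.5)
    print(f"P95 SMB {soft_max} Ma: mean offset {mean:.2f}, median {median:.2f} Ma")
```

Because most of the prior mass of an exponential distribution sits close to the minimum age, raising the soft maximum shifts the prior median by far less than it shifts the bound itself. Under the stated assumption, this is one way an older P95 SMB could produce only moderately older node ages.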

According to the results of analysis 8 (Fig. 6), under what could be considered a “midway” calibration strategy, intermediate in terms of calibration node (D, crown) and P95 SMB (30 Ma), the origins of the Distichodus crown group date to the late Oligocene (24.1 Ma; 95% HPD = 19.44–30.20). Conforming to expectation, this estimate is older (by ~ 7 Ma) than the only previously published estimate, inferred in the context of a time-scaled phylogeny of the Distichodontidae (17.22 Ma; 95% HPD = 12–23) [18]. The results from analysis 8 also indicate that by the late Miocene/early Pliocene (~ 5 Ma) the bulk of species diversity in the genus was already present. Furthermore, this chronogram implies that stem lineages leading to the modern species D. hypostomatus, D. maculatus, and D. engycephalus appeared around 21–18 Ma, while most remaining modern diversity likely originated during the late Miocene. Notably, the most recent divergences (~ 1 Ma) correspond to the seemingly paraphyletic species pairs fasciolatus/schenga and teugelsi/decemmaculatus mentioned above, an observation that supports the notion that each of these pairs may correspond to lineages at the early stages of differentiation and speciation.
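For readers unfamiliar with HPD intervals, a minimal Python sketch of how one can be computed from posterior samples of a node age is given here. It is a generic illustration, not the routine used by the software in this study; the function name and the sample values are hypothetical. The 95% HPD interval is taken as the narrowest window that contains 95% of the sorted samples.

```python
import math


def hpd_interval(samples, mass=0.95):
    """Narrowest interval containing a fraction `mass` of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    k = max(1, math.ceil(mass * n))  # number of samples inside the interval
    # Slide a window of k consecutive sorted samples and keep the narrowest.
    start = min(range(n - k + 1), key=lambda i: ordered[i + k - 1] - ordered[i])
    return ordered[start], ordered[start + k - 1]


# Hypothetical posterior sample of node ages (Ma) with a long upper tail.
ages = [1, 2, 3, 4, 5, 6, 50, 60]
print(hpd_interval(ages, mass=0.75))
```

Unlike an equal-tailed interval, the HPD interval need not be symmetric about the mean, which is why an interval such as 19.44–30.20 Ma can sit unevenly around a mean of 24.1 Ma.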

Geographic range evolution on the Distichodus phylogeny

Model comparison using AIC and AIC weights (Table 2) indicates support for the M1 model (CB-as-source) over the M2 model (CB-as-sink), while the unconstrained (M0) model received negligible support, regardless of absolute times of divergence (input chronogram). Likewise, the pattern of range shifts out of and expansions from the Congo Basin (the ancestral area) implied by the preferred model (M1) was equivalent across analyses, irrespective of absolute node ages and despite minor topological differences between input chronograms (particularly with respect to the relative placement of D. engycephalus). Specifically, the M1 model inferred six range shifts for Distichodus out of the Congo Basin and three different range expansions from the Congo Basin to include adjacent ichthyofaunal provinces (Figs. 7, S20, and S21). Support/signal for model M1, however, appears to be stronger when based on older times of divergence (Table 2).
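The model comparison described above rests on two standard formulas: AIC = 2k − 2 lnL, and Akaike weights obtained by normalizing exp(−ΔAIC/2) across the candidate models. The Python sketch below illustrates the calculation; the log-likelihoods and parameter counts are hypothetical placeholders and are not the values reported in Table 2.

```python
import math


def aic(log_likelihood, k):
    """Akaike information criterion for a model with k free parameters."""
    return 2 * k - 2 * log_likelihood


def akaike_weights(aic_by_model):
    """Relative support for each model given a dict of AIC values."""
    best = min(aic_by_model.values())
    relative = {m: math.exp(-0.5 * (a - best)) for m, a in aic_by_model.items()}
    total = sum(relative.values())
    return {m: r / total for m, r in relative.items()}


# Hypothetical (lnL, k) pairs for three competing biogeographic models.
fits = {"M0": (-63.0, 2), "M1": (-58.0, 2), "M2": (-60.0, 2)}
aic_values = {model: aic(lnl, k) for model, (lnl, k) in fits.items()}
weights = akaike_weights(aic_values)
for model in sorted(weights, key=weights.get, reverse=True):
    print(model, aic_values[model], round(weights[model], 3))
```

Because the weights depend only on AIC differences, two models with equal numbers of parameters are ranked purely by their log-likelihoods.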

Table 2 Results from DEC* analysis of geographic range evolution on the Distichodus phylogeny. Results are presented for each of the three analyses based on different BEAST2 input chronograms (derived from analyses 5, 8, and 14). Comparison of alternative models (biogeographic hypotheses) and their support as assessed via Akaike weights. M0 (unconstrained, dispersal to and from the Congo Basin); M1 (allowing only dispersal out of the Congo Basin); M2 (allowing only dispersal into the Congo Basin); dispersal (d); extinction (e); number of parameters (k); Akaike information criterion (AIC); Akaike weights (AW)

Fig. 7

A spatiotemporal reconstruction of Distichodus range evolution. Based on the optimal DEC* model (M1; CB-as-source) and the input chronogram resulting from BEAST2 analysis 8 (Fig. 6). Ichthyofaunal provinces color-coded and abbreviated as in Fig. 1. Probabilities of ancestral areas at each node are presented in Table S1

Discussion

Distichodus relationships and taxonomic implications

Here we present the first comprehensive phylogenetic, temporal, and biogeographic framework for examination of the current taxonomy and evolutionary history of Distichodus and for future evolutionary studies of the genus. Regardless of analytical method (except for ASTRAL-III, but see below), our results, based on a dataset with considerably more inclusive taxon, character, and geographic sampling for Distichodus than previous works, strongly support the existence of two roughly equal-sized, reciprocally monophyletic lineages within the genus, while corroborating the monophyletic status of most currently recognized species. RAxML, SVDquartets, and BEAST2 topologies are largely congruent, with some swapping of taxa inside the two-clade Distichodus structure that is also supported by morphology [13]. Only the ASTRAL-III analysis did not conform to this general picture of Distichodus relationships, but there are confounding issues at play. In particular, ASTRAL-III may be sensitive to gene tree estimation error. The codon model approach to gene tree inference used here should, in principle, be the most accurate method for gene tree inference (among currently available reversible Markov models), given its hierarchical modeling structure [23]. However, it is still not immune to the need for large amounts of data [24]. Some close-to-zero-length branches in the individual gene trees may be either (1) a true signature of ILS, or (2) a consequence of insufficient data (in terms of gene length, or in terms of evolutionary rate distribution). Distinguishing between real ILS and insufficient data, however, is not possible from the current analysis. We therefore focus our attention on, and base our conclusions on, the results with the largest overall congruence.

Conforming to expectation, nodal support tends to be higher at more basal nodes (deeper divergences), whereas more recent divergences are, on average, less strongly supported. Low nodal support and instances of conflict between analytical methods of phylogenetic inference are particularly evident for the clade consisting of the predominantly large-bodied species (D. atroventralis, D. lusosso, D. kasaiensis, D. antonii, and D. sexfasciatus), suggesting that additional data will be necessary to resolve interspecific relationships for this particular section of the tree. Disagreement between methods in this part of the Distichodus tree is an aspect worth revisiting in the future with genome-wide NGS-generated data, as larger amounts of DNA sequence data might be capable of better resolving and supporting these divergences. Besides the obvious reasons for wanting to unambiguously resolve this part of the Distichodus tree, such an endeavor is of special interest because the highly disparate trophic-related morphologies displayed by members of this clade are undoubtedly an interesting character system from both evolutionary ecological and functional morphological perspectives.

Despite some disagreements between inference methods, our results offer a general working hypothesis of Distichodus relationships and, with few exceptions, are consistent with the current species-level morphology-based taxonomy of the group. Instances in which species monophyly is questionable, and which therefore conflict with the current classification, are discussed below.

Paraphyly of D. teugelsi with respect to D. decemmaculatus

Problems with the species recognition of the two dwarf species, D. decemmaculatus and D. teugelsi, have been noted by Verheyen et al. [19], and are confirmed here. Species identification has previously been based on the presence (decemmaculatus) or absence (teugelsi) of a series of dark spots or bars along the flanks, and 20 (decemmaculatus) versus 16 (teugelsi) scales around the caudal peduncle [11]. While our study finds strong support for a teugelsi/decemmaculatus clade, samples tentatively identified as D. teugelsi from the Kwilu River in the Kasai basin (with 16–17 scales around the caudal peduncle and variously marked spots or bars along the flanks) form a well-supported sister clade to the remaining samples. In contrast, samples of D. teugelsi from the type locality, the Lefini River, a right bank tributary of the Congo River upstream of Pool Malebo (lacking spots or bars on the flanks and with 16 scales around the caudal peduncle), form a clade sister to the D. decemmaculatus samples, all of which are tentatively identified here as D. decemmaculatus or D. cf. decemmaculatus. Among these we record caudal peduncle scale counts ranging from 18 to 20, and flank pigmentation ranging from virtually absent to clearly marked and strongly spotted. While no taxonomic solution is proposed here, the molecular analysis presented and the observation of high variability in both pigmentation and scale counts in geographically disparate samples of both “species” indicate that further study of the teugelsi/decemmaculatus clade, including representatives of populations across the range of each putative taxon, is needed. We note further that, as for the fasciolatus/schenga species pair discussed below, our estimate of the divergence time for the teugelsi/decemmaculatus pair (based on analysis 8; Fig. 6) is among the most recent (~ 1 Ma). The fact that this is a very recent divergence might explain the resulting paraphyletic pattern. It is well known that genetic variation shared between closely related species can be due to retention of ancestral genetic polymorphisms resulting from incomplete lineage sorting (ILS) [25], a process that can confound phylogenetic inference and hinder robust tests of monophyly in recently diverged species pairs. Whereas mtDNA introgressive hybridization has also been recognized as one cause of misleading inferences of paraphyly, the overall congruence between the nc- and mtDNA signal involving the teugelsi/decemmaculatus pair supports ILS instead of introgression as a probable explanation for the observed pattern of paraphyly [26].

Paraphyly of D. fasciolatus with respect to D. schenga

Representatives of the widespread species D. fasciolatus are rendered paraphyletic by the placement of the two sampled individuals of the southern African species D. schenga, a middle and lower Zambezi endemic, which are nested well within a strongly supported D. fasciolatus clade (Fig. 2). Distichodus schenga (type locality Tete, Zambesi River) was described by Peters in 1852 and D. fasciolatus by Boulenger in 1898 (type localities in the lower Congo River region), and the descriptions of both are minimal, not allowing for morphological species discrimination. Possibly because of this, Boulenger [15] did not include D. schenga in his key to Distichodus, and by implication did not recognize it as distinct from D. fasciolatus. Our molecular data clearly suggest that the synonymy of D. fasciolatus with D. schenga is in order; however, a morphometric and morphological study of the entire “fasciolatus-complex” is currently underway (Vreven, pers. comm.), and pending the results of that study we defer proposing a formal taxonomic synonymy based solely on our molecular data and minimal sampling of putative D. schenga from across the Zambezi basin.

We do note, however, that the phylogenetic and chronological pattern revealed by our study (Figs. 2, 4, and 6), coupled with the allopatric distribution of these two taxa, suggests that populations currently recognized as D. schenga could have diverged from a lineage/population of D. fasciolatus that colonized the Upper Zambezi headwaters from the Kasai during the Pleistocene, when the two river systems shared a past connection [27,28,29]. This chronological and geographic dispersal scenario out of the Congo Basin is consistent with our estimated divergence time for this species pair (~ 1 Ma) (Fig. 6) and the inferred range shift involving D. schenga (Fig. 7), and has been hypothesized for various other fish taxa across the headwaters of the Congo-Zambezi watersheds [30,31,32].

In any case, a phylogenetic pattern of short, poorly supported branches is an indicator of recent species divergence that precluded mtDNA lineages from sorting to reciprocal monophyly [33]. Therefore, as for the teugelsi/decemmaculatus pair, we cannot rule out the possibility that the inferred paraphyly of D. fasciolatus with respect to D. schenga is an artifact of ILS. However, signal discordance between mtDNA and nuclear markers involving the fasciolatus/schenga pair strongly suggests that post-divergence introgressive hybridization could also explain the inferred paraphyly of D. fasciolatus with respect to D. schenga.

A spatiotemporal framework for Distichodus diversification

In the context of their time-scaled phylogeny of the suborder Citharinoidei, Arroyave et al. [18] were among the first authors to estimate an age for the origin of Distichodus and the timing of diversification within the genus. Their chronogram suggested that the Distichodus crown group appeared in the Miocene (~ 17 Ma), but that most of the species diversity likely originated during the past 5 Ma. These inferences were based on a molecular clock calibrated using ~ 7.5 Ma Distichodus fossilized dentition [34], which at the time was the oldest known fossil assignable to the genus. The recent discovery of a considerably older (18–19 Ma) Distichodus fossil [35], however, prompted our reexamination of the timescale of Distichodus diversification in the context of a larger dataset, both in terms of molecular markers (8 vs. 7 loci) and taxon sampling (20 vs. 16 spp.). In turn, this older fossil presented us with the challenge of accurately assigning it to a node for the purpose of calibrating the molecular clock and estimating absolute times of divergence in the phylogeny of Distichodus.

Whereas the approach devised herein to address the uncertainties associated with the fossil-based calibration of the molecular clock resulted in multiple alternative chronograms, from our knowledge of the study subject we believe that some of these alternative calibration scenarios might be either overly conservative (e.g., D + P stem) or too liberal (e.g., Dne + Dro crown), therefore possibly resulting in under- or overestimation of node ages, respectively. Nonetheless, because we have no means to empirically falsify any of these alternative calibration scenarios, we consider it important and valuable to offer the reader the possibility of choosing among alternative scenarios (including those we think too extreme) based on their own knowledge of the study subject and their personal beliefs regarding best practices for justifying fossil calibrations [36].

For the most part, our findings imply a temporal framework for the diversification of Distichodus older than previously reported [18], irrespective of calibration strategy. Only analyses 1 and 2, based on calibration node D + P (stem), resulted in younger divergence time estimates (Table 1). While at present we feel more comfortable grounding our discussion of the temporal and geographic context for the diversification of Distichodus in the results from analysis 8 (intermediate in terms of calibration node [D, crown] and P95 SMB [30 Ma]), we acknowledge that, should this calibration fossil be eventually confirmed as D. nefasch, D. rostratus, or their MRCA (a possibility due to fossil tooth shape, size, and geographic distribution), a reinterpretation of the biogeographic history will be necessary to reconcile the inferred patterns of geographic range evolution with a chronological framework more than twice as old as the one discussed below (Fig. S21).

Despite the high ichthyofaunal diversity of Afrotropical continental waters, few studies have investigated the chronological and biogeographic context for the diversification of African freshwater fish clades, among which only a handful have focused on pan-African riverine genus-level radiations, namely Hydrocynus [30], Mastacembelus [37, 38] and the species-rich Synodontis [32, 39]. Notably, a Miocene diversification for Distichodus, as implied by the chronogram resulting from analysis 8 (Fig. 6), broadly concurs with previous findings for both Mastacembelus and Synodontis [32, 38, 39]. Similarly, a concurrence of Miocene diversification among various lineages of fishes, frogs, and crabs has been pointed out by Daniels et al. [40], who suggest this likely reflects a shared response to mesic climatic shifts resulting in marked allopatric differentiation among each of these freshwater lineages during the mid- to late Miocene. While to our knowledge there are no empirical studies proving a causal relationship between particular paleohydrological events and diversification patterns in African freshwater fishes, some authors have suggested that Miocene tectonic and climatic upheaval may have influenced or even triggered diversification [31, 38]. The Miocene geological epoch was the setting for widespread epeirogenic uplift in Africa and global climate change that profoundly contributed to shaping the modern African hydrological landscape [41,42,43], which in turn, it is believed, promoted diversification in freshwater fishes as a consequence of river discharge shifts (due to climate change) and drainage disruption and modification (due to rifting) [44]. Our findings about the timing of Distichodus diversification add to instances of Miocene continent-wide freshwater radiations, and therefore to a growing body of evidence in support of a “hydrogeological” hypothesis, that paleohydrological and paleoclimatic changes promote landscape evolution which in turn promotes cladogenesis in freshwater organisms [45, 46]. Further research, however, ideally in a multi-taxon comparative framework, is certainly needed to robustly test hypotheses of concerted responses to paleogeologic and paleoclimatic scenarios.

Analysis of geographic range evolution on the phylogeny of Distichodus favored a biogeographic model in which the Congo Basin (CB) is the center of origin (ancestral area) and source of the geographic diversity of the genus, irrespective of absolute times of divergence. In particular, the biogeographic reconstruction based on the chronogram resulting from analysis 8 (Fig. 7), implies that most cladogenetic events occurred in lineages still confined to the CB throughout most of the Miocene, but also multiple lineage range shifts out of and expansions from the CB into adjacent ichthyofaunal provinces at different times during the evolutionary history of the group. Only in the late Miocene (~ 9–7 Ma) are the first recorded instances of range shifts out of the CB and of cladogenesis occurring in other ichthyofaunal provinces, namely the Nilo-Sudan (NS) and Lower Guinea (LG). The remaining instances of range shifts and expansions are more recent, dating back to the Pliocene. While most ichthyofaunal provinces appear to have been colonized only once (or twice in the case of UG), our results indicate that LG was independently colonized by five different lineages, mostly during the Pliocene.

Our reconstruction of the biogeographic history of Distichodus suggests a central role of the CB in the distribution of the continent’s freshwater ichthyofauna during the late Cenozoic, offering support to the hypothesis that the CB is the source of the ichthyofauna of less diverse river basins throughout continental Africa [44]. While a CB origin has also been postulated for the African tigerfish Hydrocynus [30], other continent-wide African freshwater fish genera such as Synodontis [32] and Mastacembelus [38] do not conform to this pattern and instead suggest repeated independent colonization of the CB. Considering the vast geographic area under study, and that complex evolutionary histories of dispersal and vicariance are likely to exist among the different fish lineages, these conflicting biogeographic histories suggest a complex scenario for the assemblage of the continent’s ichthyofaunal communities.

Conclusions

The spatiotemporal framework for the diversification of the African freshwater fish genus Distichodus presented herein provides a significant advance in our knowledge of the evolutionary history of this ecologically and socio-economically important group of fishes. With few exceptions, the resulting phylogeny is consistent with the current species-level taxonomy of the group, offering a working hypothesis of Distichodus relationships that will serve as a phylogenetic framework for future evolutionary studies involving phenotypic and genomic systems. The few instances of species paraphyly (D. teugelsi, D. fasciolatus) revealed in our favored phylogeny are likely a consequence of introgression and/or incomplete lineage sorting due to recent speciation. Therefore, we refrain from making taxonomic/nomenclatural changes pending further morphological assessment based on a larger sample of comparative material. While analysis of geographic range evolution favored a biogeographic scenario in which the Congo Basin is the source of geographic diversity of the genus, this finding agrees with some previous studies of other continent-wide African freshwater fish genera and conflicts with others, suggesting a complex scenario for the assemblage of Africa’s continental ichthyofaunal communities.

Methods

Taxon sampling

Ingroup sampling consisted of 133 specimens representing 20 of the 25 valid Distichodus species, thereby encompassing 80% of the currently recognized diversity of Distichodus (Table 3). Distichodus brevipinnis, D. langi, D. mossambicus, D. rufigiensis, and the newly described D. ingae [12] were not included in analyses due to unavailability of tissues. With the exception of D. altus, D. nefasch, D. rostratus, and D. petersii, for which only a single tissue sample was available, multiple individuals per species were sequenced to sample as large a portion of each species’ range as possible (Table 3). In addition to increasing geographic coverage, inclusion of multiple individuals per species allowed for testing the monophyletic status (and therefore species limits under the phylogenetic species concept [47, 48]) of nominal species from which more than one individual was available for sequencing. Sampling of multiple individuals per species, however, was not aimed at making inferences about tokogenetic (intraspecific) relationships and/or phylogeographic patterns. Paradistichodus dimidiatus was included as outgroup based on the findings from a relatively recent molecular phylogenetic study that investigated relationships of the Distichodontidae [18], which resolved the monotypic genus Paradistichodus as the sister group of Distichodus. Similarly, Nannocharax ansorgii was included as an additional and outermost outgroup for molecular dating and inference of geographic range evolution analyses.

Table 3 Taxa, voucher specimens (catalog and tissue numbers), and GenBank accession numbers for the gene sequences included in the analyses. Institutional abbreviations: AMCC (Ambrose Monell CryoCollection, AMNH), AMNH (American Museum of Natural History), CU (Cornell University Museum of Vertebrates), SAIAB (South African Institute for Aquatic Biodiversity), MRAC (Royal Museum for Central Africa)

Most tissue samples were obtained from specimens collected during recent expeditions in West and West-Central Africa by a research team from the American Museum of Natural History (AMNH) (led by co-author MLJS). Specimens were handled and euthanized prior to preservation in accordance with recommended guidelines for the use of fishes in research [49], and stress was ameliorated by minimizing handling and through the use of the anesthetic Tricaine mesylate (MS-222) for euthanasia. Tissue samples were taken in the field and immediately preserved in 95% ethanol. Voucher specimens were fixed in formalin and subsequently transferred to 70% ethanol for long-term storage. Data for specimens cataloged and stored in the ichthyology collection of the AMNH are available online at http://sci-web-001.amnh.org/db/emuwebamnh/index.php.

Specimen collection was made in accordance with ethical and legal guidelines for international animal research approved by the AMNH Institutional Animal Care and Use Committee (IACUC) (approval #36/06). The AMNH IACUC has guidelines relating to studies involving its members in different countries, and this study conforms to those guidelines. Specimen collection and exportation of samples used in this study follow institutional and national ethical and legal guidelines of the Ministry of Fishery and Aquaculture, Republic of Guinea, No. 65/MPA/DGAGSP/11; the Ministry of Scientific Research and Technical Innovation, Republic of Congo, No. 031/MRSIT/DGRST/GERBID.06.13; and the Ministry of Agriculture and Fisheries, Democratic Republic of Congo, No. 037/DP/SG/AGRIPEL/16.

Additional samples were obtained from colleagues at the Cornell University Museum of Vertebrates (CUMV), the Royal Museum for Central Africa (MRAC), and the South African Institute for Aquatic Biodiversity (SAIAB). Voucher specimens are deposited in the ichthyology collections of the AMNH, CUMV, MRAC, and SAIAB. Species identity of non-AMNH vouchers was confirmed either by direct examination of loaned specimens, photographs provided, or on taxonomic authority of the loaning institution. Voucher catalog numbers and GenBank accession numbers for the gene sequences generated and included in this study are listed in Table 3.

Gene sampling and nucleotide data collection

Eight gene fragments, including the seven protein-coding loci sampled by Arroyave et al. [18] to address distichodontid interrelationships (co1, cytb, enc1, glyt, myh6, nd2, and sh3px3) were sequenced. Additionally, a faster-evolving mitochondrial non-coding marker, control region (cr), was added to address more recent divergences within the genus. DNA sequence data was generated from a total of 133 Distichodus individuals. General procedures for DNA extraction, amplification, and purification, along with primers and thermal profiles for sequencing the protein-coding genes used in this study follow Arroyave and Stiassny [50] and Arroyave et al. [18]. Distichodus-specific primers for cr (cr_Dist_f: 5′-AGCGCCGGTCTTGTAATCCG-3′; cr_Dist_r: 5′-TGCTTGTGGAACTTTCTAGGGTCCAT-3′) were designed using the software Primer3 [51] from conserved flanking regions of aligned mtDNA control region sequences extracted from the two distichodontid complete mitochondrial genomes available in GenBank (Distichodus sexfasciatus AB070242 and Ichthyborus sp. AP011993). Amplification of cr via PCR was carried out using the following thermal profile: 5-min initial denaturation at 95 °C, followed by 35 cycles of denaturation at 95 °C for 60 s, annealing at 58 °C for 60 s, and extension at 72 °C for 120 s, followed by a 10-min final extension at 72 °C.

Sequence editing and partitioning scheme/substitution model selection

Contig assembly and sequence editing were performed using Geneious v.11.0.2 [52]. IUPAC nucleotide ambiguity codes were used to represent heterozygous sites. The resulting sequences were trimmed to exclude primer regions and examined for appropriateness/homology using BLASTx [53]. Each gene was aligned using MUSCLE [54] under default parameters as implemented in Geneious, followed by concatenation of individual alignments. All sequences were checked for stop codons and for miscalled amino acids by examining translation alignments.

Best-fit partitioning schemes and models of molecular evolution for the nucleotide data were determined using PartitionFinder2 [55] based on 22 pre-defined data blocks: the non-coding mtDNA control region (1 block) plus the 1st, 2nd, and 3rd codon positions of the seven protein-coding genes (3 positions × 7 genes). The PartitionFinder2 greedy algorithm was employed to search for an optimal scheme under the assumption of independent model parameters and branch lengths for each partition. Selection of the partitioning scheme and models over the set of schemes and models produced during greedy search was accomplished using the Schwarz/Bayesian Information Criterion (BIC) [56].
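The block count and selection criterion described above can be sketched in a few lines. This is only an illustration of the bookkeeping, not the PartitionFinder2 implementation: the gene names come from the text, whereas the block-naming scheme and the helper names (`data_blocks`, `bic`) are assumptions introduced here.

```python
import math

# Gene names are taken from the text; the "<gene>_pos<n>" labels are hypothetical.
PROTEIN_CODING = ["co1", "cytb", "enc1", "glyt", "myh6", "nd2", "sh3px3"]
NON_CODING = ["cr"]


def data_blocks():
    """One block per non-coding marker plus one per codon position of each coding gene."""
    blocks = list(NON_CODING)
    for gene in PROTEIN_CODING:
        for position in (1, 2, 3):
            blocks.append(f"{gene}_pos{position}")
    return blocks


def bic(log_likelihood, n_params, n_sites):
    """Schwarz/Bayesian Information Criterion: k * ln(n) - 2 * ln(L). Lower is better."""
    return n_params * math.log(n_sites) - 2.0 * log_likelihood


blocks = data_blocks()
print(len(blocks))  # 1 + 3 * 7 = 22 pre-defined data blocks
```

A greedy search then merges blocks step by step and keeps a merge only when it lowers the BIC of the overall scheme.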

Phylogenetic, biogeographic, and chronological analyses

Various analytical approaches were employed to infer phylogenetic relationships in Distichodus from the multilocus dataset generated in this study, one of which also simultaneously estimates absolute times of divergence in the resultant phylogeny. The results from the latter approach were subsequently used in analyses for testing historical biogeographic hypotheses of geographic range evolution in Distichodus.

Maximum likelihood (ML) estimation of phylogeny

Phylogenetic analysis of the concatenated alignment of the eight sampled genes under a Total Evidence/Simultaneous Analysis [57, 58] approach was performed using the ML optimality criterion. Furthermore, to examine the degree of variation in topology, resolution, and clade support among the individual sampled loci, and to complement the inferences made from the simultaneous analysis of all markers, each of the nuclear genes (enc1, glyt, myh6, sh3px3) and a concatenated alignment of the mitochondrial genes (co1, cr, cytb, nd2; effectively inherited as a single locus), were independently analyzed, also using the ML optimality criterion. ML phylogenetic analyses were conducted with RAxML v.8 [59] through the CIPRES Science Gateway v.3.3 [60] as a single partition under the GTRGAMMA model with four rate classes using full ML optimization for the tree search and 1000 rapid bootstrap (BS) searches to assess nodal support [61].

Species-tree approaches

Although concatenation methods have been suggested to often perform well when incomplete lineage sorting (ILS) levels are low [24], the degree of ILS in Distichodus is unknown. To explore the outcomes of ILS-aware species-tree analyses relative to concatenation, both SVDquartets [62] and ASTRAL-III [63] were employed. SVDquartets has been suggested to perform well with low ILS and small numbers of sites per gene, and ASTRAL methods have been suggested to perform well under high ILS conditions, but may be sensitive to small numbers of sites per gene [24]. SVDquartets analysis was conducted in PAUP* v4.0a164 [64] sampling all ~ 8.6 million quartets under the multispecies coalescent on the full dataset, using the default QFM quartet assembly method. Bootstrap support values were assembled onto the SVDquartets tree using the sumtrees command in the DendroPy package [65]. Gene trees input to ASTRAL-III were estimated from best-fit codon models inferred in codonPhyML [66] under default search intensity, using custom R scripts written by the authors. Because the mitochondrial genome does not undergo recombination and is inherited as a single locus, the three protein-coding mitochondrial genes were fit with a single codon model and inferred gene tree. Gene trees for each autosomal locus were inferred separately.

Bayesian co-estimation of phylogeny and divergence times

Prior to co-estimation of phylogeny and divergence times, a new data matrix was created from the original multi-individual, multi-locus matrix, by including DNA sequence data from only a single individual per species, from or near the type locality whenever possible (for each sampled species, the first individual listed in Table 3). The resulting reduced matrix was analyzed in BEAST v.2.5.0 [67] under the optimal partitioning scheme and substitution models suggested by the PartitionFinder2 analysis. Node ages were estimated using a Bayesian relaxed-clock method [68] under the uncorrelated lognormal (UCLN) rate variation model, and assuming a birth-death process prior for topology and divergence times. By default, the prior on the mean parameter of the UCLN clock model (ucldMean.c) is a uniform distribution on the interval (0,∞), which is an uninformative and improper prior (it does not integrate to 1). Although improper priors can sometimes lead to proper posterior distributions, they may also have undesired effects and cause problems with mixing and convergence [69]. Based on previous findings regarding substitution rates in Distichodus [18], we assumed a log-normally distributed prior for the clock rate (ucldMean.c) with hyperparameters μ = 0.003 and σ = 0.5. On the other hand, the standard deviation parameter of the UCLN clock model (ucldStdev.c) is by default assigned a gamma distribution prior. Variation in substitution rates among branches in Distichodus, however, appears to be low in general [18]. Accordingly, we assumed an exponential prior distribution with 95% of the probability density on values < 1 for the standard deviation of the UCLN (ucldStdev.c).
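As a worked illustration of the last point, the mean of an exponential distribution that places 95% of its probability density on values below 1 follows directly from the exponential cumulative distribution function. The sketch below is a derivation under that stated condition only; the function names are hypothetical, and the resulting value is not reported in the text.

```python
import math


def exponential_mean_for_quantile(upper, prob):
    """Mean of an exponential distribution such that P(X < upper) = prob.

    From the CDF 1 - exp(-x / mean) = prob, it follows that
    mean = -upper / ln(1 - prob).
    """
    return -upper / math.log(1.0 - prob)


def exponential_cdf(x, mean):
    """P(X < x) for an exponential distribution with the given mean."""
    return 1.0 - math.exp(-x / mean)


mean = exponential_mean_for_quantile(upper=1.0, prob=0.95)
print(round(mean, 4))                       # 1 / ln(20), about 0.3338
print(round(exponential_cdf(1.0, mean), 2))  # 0.95
```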

The molecular clock was calibrated based on early Miocene (ca. 18 Ma) fossilized dentition attributable to Distichodus recovered from deposits of the Maradah Formation in Jabal Zaltan, Libya, by far the oldest fossil unambiguously assignable to the genus [35]. In fact, this fossil pushes back the first known appearance of Distichodus in the fossil record by 10 million years with respect to the Distichodus calibration fossil used by Arroyave et al. [18] to infer a time-scaled phylogeny of citharinoid fishes. Although the Maradah fossil is unquestionably diagnostic of Distichodus (tall, slender-necked tooth with a bifid apex bearing characteristically short and rounded lobes) and could potentially be ascribed to either Distichodus nefasch or D. rostratus on the basis of size and geographic distribution, its exact phylogenetic placement is unknown. The absence of relevant comparative morphological data in a phylogenetic context into which the fossil taxon could be integrated, coupled with its fragmentary nature, makes it difficult to confidently assign it to a particular node and to determine whether it should be used to constrain the age of the stem or the crown group of the calibration node. Because of this phylogenetic uncertainty, along with the challenge of objectively establishing a maximum age constraint for the calibration node, we conducted a series of analyses (Table 4) to assess the robustness of node ages to analytical ambiguity and to offer alternative output scenarios based on a variety of reasonable input parameters, particularly with respect to the phylogenetic placement of the calibration node and its maximum age constraint. Specifically, we used three alternative calibration nodes: 1) MRCA of Distichodus and Paradistichodus (D + P), 2) MRCA of Distichodus (D), and 3) MRCA of D. nefasch and D. rostratus (Dne + Dro).
The rationale behind this proposal is that, at the very least, the calibration fossil could be used to constrain the age of divergence between Distichodus and its sister group, Paradistichodus, but under more liberal phylogenetic designations, it could also be used to constrain the age of the entire genus or even the divergence between the species D. nefasch and D. rostratus. Furthermore, each calibration node was constrained both as stem and as crown group. Additionally, the temporal uncertainty of calibration nodes was modeled using log-normally distributed priors with a hard minimum bound set by the age of the fossil (18 Ma) and one of three alternative 95th percentile soft maximum bounds (P95 SMBs): 20, 30, and 40 Ma (Fig. 8; Table 4). The combinatorial exercise of choosing one of three alternative calibration nodes, constrained as stem or crown, and modeled by a log-normally distributed prior characterized by one of three alternative P95 SMBs, resulted in 18 different analyses (although effectively 15, since the node representing the MRCA of Distichodus as stem is equivalent to the node representing the MRCA of Distichodus and Paradistichodus as crown) (Table 4). In each analysis, root age was indirectly constrained (as an implied prior) by the combined effects of the calibration prior on another internal node and the prior for topology and divergence times (birth-death process).
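The count of 18 nominal and 15 effectively distinct analyses can be reproduced by enumerating the combinations. The sketch below is illustrative only: the labels are abbreviations from the text, the `canonical` helper is hypothetical, and the resulting order does not correspond to the analysis numbering in Table 4.

```python
from itertools import product

NODES = ["D + P", "D", "Dne + Dro"]   # alternative calibration nodes
PLACEMENTS = ["stem", "crown"]        # stem vs. crown constraint
P95_SMB_MA = [20, 30, 40]             # soft maximum bounds in Ma


def canonical(node, placement):
    """Map equivalent calibrations onto a single label.

    The stem of Distichodus is the same node as the crown of
    Distichodus + Paradistichodus.
    """
    if (node, placement) == ("D", "stem"):
        return ("D + P", "crown")
    return (node, placement)


all_combos = list(product(NODES, PLACEMENTS, P95_SMB_MA))
distinct = sorted({canonical(node, placement) + (smb,)
                   for node, placement, smb in all_combos})

print(len(all_combos), len(distinct))  # 18 15
```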

Table 4 Alternative BEAST2 analyses (1–15) for co-estimating phylogeny and divergence times in Distichodus resulting from variable calibration strategies (calibration node, stem vs. crown group, and 95th percentile [P95] soft maximum bound [SMB] of calibration prior)
Fig. 8

Alternative log-normally distributed priors used to account for temporal uncertainty of calibration nodes. Each prior probability density function (PDF) is characterized by a hard minimum bound of 18 Ma (set by the age of the calibration fossil), a standard deviation (σ) of 0.5, and a variable mean (μ) (in real space) that probabilistically models the extent to which the node age spreads into the past: μ = 19 (black), μ = 24 (blue), and μ = 29 (red). The lower limit of the x-axis interval defining the area shaded under each curve corresponds to its 95th percentile soft maximum bound (P95 SMB): 20 Ma (black), 30 Ma (blue), and 40 Ma (red)
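The correspondence between the prior means and the soft maximum bounds can be checked numerically. The sketch below assumes that the values of μ given in the caption are expressed on the absolute time axis, so that the mean of the log-normal component itself is μ minus the 18 Ma offset; this interpretation, and the function name `p95_soft_max`, are assumptions made for illustration rather than statements taken from the analysis files.

```python
import math
from statistics import NormalDist

OFFSET_MA = 18.0  # hard minimum bound set by the calibration fossil
SIGMA = 0.5       # standard deviation of the log-normal prior (log space)


def p95_soft_max(mean_real_ma, offset=OFFSET_MA, sigma=SIGMA, q=0.95):
    """Quantile q of an offset log-normal prior specified by its real-space mean."""
    # Mean of the log-normal component after removing the offset (assumed convention).
    m = mean_real_ma - offset
    # Convert the real-space mean to the log-space location parameter.
    mu_log = math.log(m) - 0.5 * sigma ** 2
    # Standard normal quantile, about 1.645 for q = 0.95.
    z = NormalDist().inv_cdf(q)
    return offset + math.exp(mu_log + z * sigma)


for mean in (19.0, 24.0, 29.0):
    # Yields values close to 20, 30, and 40 Ma respectively.
    print(mean, round(p95_soft_max(mean), 2))
```

Under this convention the three means recover the three P95 SMBs to within about 0.1 Ma, which suggests the caption values were rounded to whole numbers.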

BEAST2 analyses were implemented using the Markov Chain Monte Carlo algorithm (MCMC) run for 50 million generations sampled every 1000 generations, under default proposal mechanisms and default priors for the parameters of the birth-death branching process used to provide the prior distribution for the non-calibration nodes (speciation and extinction rates) and the model of molecular evolution for each gene (substitution rates, base frequencies, gamma shape, and proportion of invariant sites). Convergence of model parameter estimates was assessed in Tracer v.1.7 [70], with effective sample size (ESS) values over 200 taken as adequate. Sufficient sampling of the estimate of the tree topology (ESS > 200) was determined by dividing the topological approximate ESS by the generation number of the approximate earliest stationary value in the topological autocorrelation plot, generated in the R package rwty [71]. Further assessment of MCMC convergence was undertaken by examination of the average standard deviation of split frequencies, with values << 0.01 taken as indicative of stationarity. All analyses used a 10% burn-in. A maximum clade credibility (MCC) topology was inferred using TreeAnnotator v.2.5 [67], resulting in a chronogram indicating posterior probabilities (PP) and mean ages of all nodes with their associated 95% highest posterior density (HPD) intervals.

Inference of geographic range evolution

The evolution of geographic ranges in Distichodus was investigated using the null-range-excluded dispersal-extinction-cladogenesis model (DEC*) [72], a modified version of the original likelihood-based dispersal-extinction-cladogenesis (DEC) model [73, 74]. The set of discrete geographic areas for the DEC* analysis consisted of the six Afrotropical ichthyofaunal provinces of Roberts [1] (modified by Lévêque [2]) with presence of Distichodus species: Congo Basin (CB), Zambezi (Z), Nilo-Sudan (NS), Upper Guinea (UG), Lower Guinea (LG), and East Coast (EC) (Fig. 1). African ichthyofaunal provinces were delimited on the basis of current and historical patterns of drainage connectivity and the composition of the fish fauna, and therefore represent regions with a distinctive evolutionary history and a more or less characteristic biota at the species and higher taxonomic levels [1, 2]. To assess the relative fits of alternative models of faunal assemblage in the Congo Basin, three variants of the DEC* model were fit to the data in the BioGeoBEARS R package [75], following the parameterization of dispersal multipliers from Day et al. [38]: M0, an unconstrained multiplier matrix allowing for dispersal to and from the Congo Basin; M1, an asymmetric multiplier matrix allowing only dispersal out of the Congo Basin (CB-as-source); M2, an asymmetric multiplier matrix allowing only dispersal into the Congo Basin (CB-as-sink). Tip-state ranges were assigned based on the presence of species in different ichthyofaunal provinces. In several cases, species spanned multiple provinces. The maximum range size was set to six areas, so that a fully widespread range (all six ichthyofaunal provinces) was allowed. Given the high dimensionality of the transition matrix resulting from the combination of different provinces (areas) into ranges of sizes up to six, relative to the size of the dataset, 14 disjunct ranges of differing sizes were pruned from analysis, reducing the dimensionality of the matrix from 64 × 64 to 50 × 50.
To assess the stability of numerical optimization, analysis was run five times from fresh R sessions. Model fits of the M0, M1, and M2 variants were compared using the Akaike information criterion [76] and supports were assessed using Akaike weights [77]. In an effort to take account of chronological uncertainty due to alternative molecular clock calibration scenarios, inference of geographic range evolution in Distichodus was conducted on three of the 15 time-scaled phylogenies previously inferred with BEAST2, namely the chronograms resulting from analyses based on each alternative calibration node constrained as crown and by a relatively moderate soft maximum bound (P95 SMB = 30 Ma) (analyses 5, 8, and 14 in Table 4).
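Two pieces of arithmetic underlie the model comparison described above: the size of the range state space and the conversion of AIC scores into Akaike weights. The sketch below illustrates both. It is not the BioGeoBEARS implementation; the log-likelihoods and parameter counts are hypothetical placeholders rather than results of this study, and counting the empty range among the 64 states is an assumption made to match the matrix dimensions given in the text.

```python
import math

AREAS = ["CB", "Z", "NS", "UG", "LG", "EC"]  # the six provinces named in the text


def n_ranges(n_areas):
    """Number of subsets of n areas, counting the empty (null) range."""
    return 2 ** n_areas


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L). Lower is better."""
    return 2.0 * n_params - 2.0 * log_likelihood


def akaike_weights(aic_values):
    """Relative support for each model, normalized to sum to 1."""
    best = min(aic_values)
    rel = [math.exp(-0.5 * (value - best)) for value in aic_values]
    total = sum(rel)
    return [r / total for r in rel]


print(n_ranges(len(AREAS)), n_ranges(len(AREAS)) - 14)  # 64 50

# Hypothetical (log-likelihood, free parameters) pairs, for illustration only.
models = {"M0": (-60.0, 2), "M1": (-58.0, 2), "M2": (-65.0, 2)}
aics = {name: aic(lnl, k) for name, (lnl, k) in models.items()}
weights = dict(zip(aics, akaike_weights(list(aics.values()))))
for name in models:
    print(name, round(aics[name], 1), round(weights[name], 3))
```

With equal parameter counts, as in this placeholder example, the ranking of models by Akaike weight simply follows the ranking by log-likelihood.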