Cyprinid phylogeny based on Bayesian and maximum likelihood analyses of partitioned data: implications for Cyprinidae systematics

Cyprinidae is the largest family of freshwater fishes, but the phylogenetic relationships among its higher-level taxa are not yet fully resolved. In this study, we used the nuclear recombination activating gene 2 and the mitochondrial 16S ribosomal RNA and cytochrome b genes to reconstruct cyprinid phylogeny. Our aims were to (i) demonstrate the effects of partitioned phylogenetic analyses on phylogeny reconstruction of cyprinid fishes and (ii) provide new insights into the phylogeny of cyprinids. Our study indicated that the unpartitioned strategy was optimal for our analyses; partitioned analyses did not provide better-resolved or better-supported estimates of cyprinid phylogeny. Bayesian analyses support the following relationships among the major monophyletic groups within Cyprinidae: (Cyprininae, Labeoninae), ((Acheilognathinae, ((Leuciscinae, Tincinae), Gobioninae)), Xenocyprininae). The placement of Danioninae was poorly resolved. Estimates of divergence dates within the family showed that radiation of the major cyprinid groups occurred during the Late Oligocene through the Late Miocene. Our phylogenetic analyses improved our understanding of the evolutionary history of this important fish family.

The family Cyprinidae is the largest freshwater fish family and includes an estimated 2420 species in about 220 genera [1]. The large number of species, wide geographic distribution, and considerable morphological diversity make the cyprinid fishes taxonomically difficult [2] and a challenge for cladistic analysis. The history of Cyprinidae classification was well documented by Hensel [3], and numerous efforts have been made to partition cyprinids into subfamilies using morphological or anatomical characteristics [2,4–8]. However, the systematic relationships among many cyprinid subfamilies are poorly understood, because the subfamilies are vaguely defined or supported by few morphological characteristics [2].
Cyprinidae has conventionally been divided into two major lineages, the cyprinine (barbine) and the leuciscine groups. Overall, morphology has provided few insights into cyprinid relationships below the family level and has failed to produce agreement on the number and the monophyly of subfamilies within Cyprinidae. Chen et al. [8] published a cladistic evaluation of cyprinid subfamily relationships, and additional morphological studies by Cavender and Coburn [9] and Howes [2] attempted a coherent classification of all cyprinid groups, including the monotypic genus Tinca. These previous studies proposed conflicting arrangements of the subfamilies Tincinae, Rasborinae, and Gobioninae.
Recently, molecular phylogenetic analyses have been performed on Cyprinidae. In general, most of the molecular studies of European cyprinids [10–16] have been phylogenetically congruent. For example, all of these studies supported the nesting of Alburninae [2] within a paraphyletic Leuciscinae, but not the usual dichotomy between barbelled cyprinines (subfamilies Cyprininae, Gobioninae, and Rasborinae) and leuciscines lacking or sporadically possessing barbels (subfamilies Acheilognathinae, Leuciscinae, Cultrinae, and Alburninae) [2]. Because cyprinids are most diverse in Asiatic waters [17], phylogenetic studies that include Asian species would greatly advance cyprinid systematics [18–20]. Cunha et al. [18] identified an Asian group consisting of cultrins+acheilognathins+gobionins+xenocyprinins within the Cyprinidae using cytochrome b (Cytb) gene sequences. Other molecular phylogenies of East Asian cyprinids indicated two principal lineages within Cyprinidae and provided phylogenetic evidence for the monophyly of cultrins-xenocyprins and affiliated groups [19,20]. However, these molecular analyses were based heavily on partial mtDNA sequences and resulted in phylogenetic trees with limited resolution and little discrimination among alternative phylogenetic hypotheses.
The current cyprinid classification developed in the absence of a strong phylogenetic framework and is largely morphology based; few revisions have resulted from recent molecular evidence, owing to the limited taxon sampling in those studies. Some critical areas of Cyprinidae phylogeny and systematics remain unresolved. First, a majority of designated cyprinid subfamilies have not been tested for monophyly with either molecular or morphological data, and molecular data [18–20] have failed to support the monophyly of many morphologically-defined subfamilies, e.g., Rasborinae [8] and Leuciscinae [2,4]. Second, previous analyses have not agreed on the phylogenetic positions of Rasborinae, Tincinae, and Acheilognathinae. In recent molecular phylogenies, relationships among these subgroups remained unclear, because the corresponding nodes were generally not statistically supported. Third, although the leuciscine and cyprinine subdivisions of Cyprinidae are widely accepted, the higher-level taxonomic relationships within these clades remain unresolved.
Molecular phylogenetic analyses of East Asian cyprinids resulted in substantial disagreement on the classification of subfamilies compared with the traditional taxonomy [19,20]. Therefore, extensive sampling of Asian cyprinids would provide further insights into the phylogenetic systematics of this family. The present paper used extensive taxon sampling and concatenated sequence data for the nuclear recombination activating gene 2 (RAG2) and the mitochondrial 16S ribosomal RNA (16S rRNA) and Cytb genes to reconstruct the phylogeny of cyprinids.
To analyze DNA sequence data-sets with multiple genes, partitioned phylogenetic analyses have become increasingly popular in recent years. Partitioned phylogenetic analyses use separate nucleotide substitution models (and associated parameters) for subsets of the data, to better explore partition-specific models of evolution and to reduce systematic error, thus yielding more accurate phylogenies. Generally, partitioned phylogenetic analyses are undertaken in a Bayesian framework [21], but recently, mixed-model search methods using maximum likelihood (ML) have become available [22]. Furthermore, an appropriately-partitioned data-set should be well modeled but not over-partitioned, because over-parameterization (including over-partitioning) could result in parameter nonidentifiability, increased variance, improper posterior distributions, and undue influence of the priors [23]. Alternatives to Bayes factors for phylogenetic model selection that use explicit parameterization penalties are now available for partitioned analyses [23]. We performed ML and Bayesian analyses of partitioned data to reconstruct the phylogeny of cyprinid fishes, and also used relaxed molecular clock approaches to estimate the dates of cladogenetic events within the family. Our main aims were (i) to demonstrate how partitioning concatenated data affected phylogenetic reconstruction of cyprinid fishes; (ii) to test the monophyly of the currently-recognized subfamilies within Cyprinidae; and (iii) to discuss the taxonomic implications of the recovered clades.

Taxon sampling and total DNA isolation
Our samples included 103 cyprinid species representing all major morphological groups and all 12 subfamilies within Cyprinidae [4]. Outgroup taxa were selected based on the consensus that Cypriniformes is a monophyletic group [24,25]. Therefore, six cypriniform fishes outside Cyprinidae, representing Catostomidae, Balitoridae, Cobitidae, and Gyrinocheilidae, were included in our analyses (Table 1).
Field-collected fish muscle or fin tissues were fixed in 95% ethanol and kept at 20°C in the laboratory until DNA extraction. Total genomic DNA was isolated from muscle or fin tissues using the phenol/chloroform extraction procedure [26].

DNA sequence collection and alignment
The nuclear RAG2 gene and mitochondrial genes were amplified from total DNA extracts via polymerase chain reaction (PCR) using published and/or optimized primers [27–29]. Reaction mixtures contained approximately 100 ng of DNA template, 5 µL of 10× reaction buffer, 2 µL dNTPs (each 2.5 mmol L−1), 2.0 U Taq polymerase, and 1 µL of each oligonucleotide primer (10 µmol L−1 each), in a final volume of 50 µL. The PCR amplification profile included an … Nucleotide sequences were determined using purified PCR product. We generated most of the sequences used in this study, and some sequences for the 16S rRNA and Cytb genes were obtained from GenBank (Table 1). For the two protein-coding genes, RAG2 and Cytb, multiple sequence alignments were performed using CLUSTAL X [30]. For the 16S rRNA gene, sequences were initially aligned using CLUSTAL X, then manually aligned based on secondary structural elements and conserved motifs by comparing to existing models of 16S rRNA secondary structure for cyprinid fishes [31–33]. All data-sets analyzed for this study are available on request from the first author.

Data partitions and model selection
We performed partitioned analyses using a different nucleotide substitution model and associated parameters for each data subset. We evaluated ten distinct partitioning strategies ranging from unpartitioned to a maximum of eight partitions (Table 2). Each partitioning strategy was denoted with the letter P followed by the number of data partitions. The unpartitioned (P1) analyses applied a single model of sequence evolution to all the data. The eight-partition (P8) analyses included separate substitution models for the stems and loops of the 16S rRNA gene and for each codon position of Cytb and of RAG2.
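The two extreme strategies described above can be written down as explicit maps from partitions to data subsets. The sketch below is illustrative only: the subset labels (`stems`, `pos1`, and so on) and the helper names are hypothetical, and the intermediate strategies defined in Table 2 are deliberately left out because their composition is not given in the text.

```python
# Hypothetical encoding of the two extreme partitioning strategies.
# Subset labels are illustrative names, not identifiers from the study.
GENES = {
    "16S": ["stems", "loops"],          # rRNA split by secondary structure
    "Cytb": ["pos1", "pos2", "pos3"],   # protein-coding, split by codon position
    "RAG2": ["pos1", "pos2", "pos3"],
}

def p1_strategy():
    """Unpartitioned: one model shared by every subset of every gene."""
    return {"all": [(g, s) for g, subsets in GENES.items() for s in subsets]}

def p8_strategy():
    """Maximally partitioned: one model per gene-specific subset."""
    return {f"{g}_{s}": [(g, s)] for g, subsets in GENES.items() for s in subsets}

def count_partitions(strategy):
    """Number of independently modeled partitions in a strategy."""
    return len(strategy)

print(count_partitions(p1_strategy()))  # 1
print(count_partitions(p8_strategy()))  # 8
```

Counting the entries confirms the naming convention: two rRNA subsets plus three codon positions for each of the two coding genes give the eight partitions of P8.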
Model selection was undertaken using PAUP [34] and ModelTest 3.7 [35]. The Akaike information criterion (AIC) weighting [36] determined the best-fit nucleotide model for each data partition. The initial tree used in ModelTest was drawn arbitrarily from a set of equally-parsimonious trees obtained with the complete data [23]. Because MrBayes 3.1.2 [21] only allows models with one, two, or six substitution rates, the AIC-selected model was often impossible to implement. Consequently we were forced to choose between under- and over-parameterized models. In these situations, a feasible solution is to select the best over-parameterized model to avoid the possible negative consequences of under-parameterization, e.g., underestimated branch lengths and consequent long-branch attraction [23]. This recommendation has been verified by several simulation studies that found few costs associated with model over-parameterization, at least within the framework of the general time-reversible (GTR) family of models [37,38]. The model GTR+I+Γ was applied to all partitions in our Bayesian and ML phylogenetic analyses.
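As a rough illustration of AIC-based model ranking, the following sketch computes AIC scores and Akaike weights for a few candidate models. It is not ModelTest itself: the log-likelihoods are made-up numbers, the parameter counts cover only substitution-model parameters (branch lengths are ignored, which is an assumption), and the function names are hypothetical.

```python
import math

def aic(lnL, k):
    """Akaike information criterion for log-likelihood lnL and k free parameters."""
    return -2.0 * lnL + 2.0 * k

def akaike_weights(aic_scores):
    """Convert a dict of AIC scores into Akaike weights that sum to 1."""
    best = min(aic_scores.values())
    rel = {m: math.exp(-0.5 * (a - best)) for m, a in aic_scores.items()}
    total = sum(rel.values())
    return {m: r / total for m, r in rel.items()}

# Made-up (lnL, k) pairs for illustration only; not values from the study.
candidates = {
    "HKY+G": (-10250.0, 5),
    "GTR+G": (-10240.0, 9),
    "GTR+I+G": (-10238.5, 10),
}
scores = {m: aic(lnL, k) for m, (lnL, k) in candidates.items()}
weights = akaike_weights(scores)
best_model = min(scores, key=scores.get)
print(best_model, round(weights[best_model], 3))
```

The model with the lowest AIC receives the largest weight; models whose AIC lies far above the minimum receive weights close to zero.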

Phylogenetic analyses of the Cyprinidae
Bayesian phylogenetic analyses were performed using the software MrBayes 3.1.2 [21]. A Metropolis-coupled Markov chain Monte Carlo (MCMC) process was undertaken for each data partition, with one cold chain and three incrementally heated chains running simultaneously. The default setting for the heating parameter (T=0.2) in our preliminary analyses resulted in no or infrequent state exchanges between chains. When an alternative temperature regime (T=0.02) was used, the proportion of successful state exchanges between chains improved to 40%–80%.
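The effect of the temperature setting can be seen from the incremental heating scheme documented for MrBayes, in which chain i (with i = 0 for the cold chain) samples the posterior raised to the power 1/(1 + T·i). The sketch below simply evaluates that expression for the two settings mentioned above; the function name is hypothetical.

```python
def chain_heats(temp, n_chains=4):
    """Power applied to the posterior for each chain; chain 0 is the cold chain."""
    return [1.0 / (1.0 + temp * i) for i in range(n_chains)]

default_heats = chain_heats(0.2)   # default temperature
cooler_heats = chain_heats(0.02)   # alternative regime used in the text

for label, heats in (("T=0.2", default_heats), ("T=0.02", cooler_heats)):
    print(label, [round(h, 3) for h in heats])
```

With T=0.02 the heated chains differ only slightly from the cold chain, so proposed swaps between adjacent chains are accepted more often, which is consistent with the improved exchange rates reported above.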
MCMC analyses of each data partition were run for 2×10^7 generations, with sampling every 1000 generations. We employed two strategies to confirm stationarity. First, we plotted log-likelihood scores, tree lengths, and all model parameter values against generation number using Tracer v. 1.4 (http://tree.bio.ed.ac.uk/software/tracer/) to graphically evaluate "burn-in". Second, MCMC convergence was assessed graphically using the cumulative function of AWTY [39]. The cumulative function was used to analyze the posterior probability (PP) support values for each clade to verify that these values were stable across all post-burn-in generations within each analysis. The PPs should stabilize once the Markov chain reaches stationarity, and substantial deviation of PPs from equilibrium values over time would indicate a lack of chain convergence. Our diagnoses suggested that chain convergence generally occurred within the first two million generations of each analysis. We followed a conservative approach by discarding the first 10 million generations as burn-in and using the remaining 10 million generations (10000 sampled trees) in all subsequent analyses. The 50% majority-rule consensus trees were generated with mean branch-length estimates, PP values for each node, credible sets of trees, and parameter estimates. Trees resulting from our partitioned analyses that explicitly accommodated among-partition rate variation (APRV) had greater harmonic mean log likelihoods (HMLi) than those from equivalent analyses that did not accommodate APRV. Therefore, we employed the "prset ratepr=variable" option in MrBayes in all partitioned analyses. In all MCMC runs, we assigned uniform priors to trees and parameters of models of sequence evolution, and an exponential prior to branch lengths.
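The sampling arithmetic and the idea behind a cumulative posterior-probability plot can be sketched as follows. This is a simplified illustration rather than the AWTY implementation: the function names are hypothetical, and the extra sample that a program may record for the starting state is ignored.

```python
def n_samples(generations, sample_freq):
    """Number of samples drawn when sampling every `sample_freq` generations."""
    return generations // sample_freq

def post_burnin(samples, burnin_generations, sample_freq):
    """Drop the samples that fall inside the burn-in period."""
    return samples[burnin_generations // sample_freq:]

def cumulative_clade_pp(has_clade):
    """Running frequency of a clade across successive sampled trees."""
    running, hits = [], 0
    for i, present in enumerate(has_clade, start=1):
        hits += bool(present)
        running.append(hits / i)
    return running

total = n_samples(20_000_000, 1000)
kept = len(post_burnin(list(range(total)), 10_000_000, 1000))
print(total, kept)  # 20000 10000
```

A clade whose running frequency from `cumulative_clade_pp` is still drifting at the end of the post-burn-in sample would suggest that the chain has not converged for that node.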
Partitioned and unpartitioned ML analyses were performed in RAxML [22]. Following the recommendation of McGuire et al. [23], we performed two sets of analyses for each partitioning strategy. In the first set of analyses, we estimated the ML value of the P8 Bayesian topology under each partitioning strategy; these values were used in the ML strategy-selection procedure. In the second set of analyses, we searched for the ML topology with the highest likelihood during 200 runs on distinct starting trees, then used 500 bootstrap replicates to measure support for the recovered clades. We employed the GTRGAMMAI substitution model in both sets of analyses.
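The logic of nonparametric bootstrap support can be illustrated without reference to any particular program. In the sketch below, alignment columns are resampled with replacement, and support for a clade is the percentage of replicate trees that contain it. The tree search itself is omitted, and the toy alignment, clade sets, and function names are all hypothetical.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    n_sites = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[c] for c in cols) for taxon, seq in alignment.items()}

def bootstrap_support(clade, replicate_clades):
    """Percentage of replicate trees whose clade set contains `clade`."""
    hits = sum(1 for clades in replicate_clades if clade in clades)
    return 100.0 * hits / len(replicate_clades)

# Toy data for illustration only.
toy = {"A": "ACGTAC", "B": "ACGTTC", "C": "TCGTAC"}
replicate = bootstrap_alignment(toy, random.Random(1))

ab = frozenset({"A", "B"})
# Clade sets that a tree search might return for four replicates (made up).
replicate_trees = [{ab}, {ab}, {frozenset({"B", "C"})}, {ab}]
print(bootstrap_support(ab, replicate_trees))  # 75.0
```

In a real analysis each replicate alignment would be passed to the ML search, and the resulting trees would be summarized in the same way across all 500 replicates.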

Evaluation of alternative partitioning strategies
Alternative partitioning strategies were evaluated using four different criteria [23]: standard Bayes factors [40,41], a modified AICc [23], the Bayes information criterion (BIC) [42], and a decision-theoretic (DT) approach [43–45]. The Bayes factor for any pair of partitioned models was the ratio of their marginal likelihoods. Marginal likelihoods are difficult to calculate, but can be approximated by the HMLi [46]. Using ln-transformed Bayes factors, we accepted values of 2ln Bayes factor greater than 10 as strong support for the more partitioned model. Because the relationships of HMLi values under alternative partitioning strategies are similar to the relationships of ML values [23,47,48], we substituted the HMLi values for ML values in the AICc, BIC, and DT tests of partitioning strategies under the Bayesian framework. The partitioning strategy preferred by AICc, BIC, or DT had the minimum observed value. To estimate branch lengths on a fixed topology in MrBayes, the program's branch-swapping functionality was disabled and node-slider was enabled (by resetting props).
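Three of the four criteria reduce to short formulas once a log-likelihood (or HMLi), a parameter count k, and a sample size n are available. The sketch below uses the standard textbook forms of AICc and BIC, which may differ from the modified AICc of McGuire et al. [23]; taking n as the number of aligned sites is an assumption, the input values are made up, and the DT criterion is omitted because it requires branch-length comparisons.

```python
import math

def two_ln_bayes_factor(hml_complex, hml_simple):
    """2ln Bayes factor from two harmonic mean log-likelihoods."""
    return 2.0 * (hml_complex - hml_simple)

def aicc(lnL, k, n):
    """Standard small-sample corrected AIC (not necessarily the modified form in [23])."""
    return -2.0 * lnL + 2.0 * k + (2.0 * k * (k + 1)) / (n - k - 1)

def bic(lnL, k, n):
    """Bayes (Schwarz) information criterion."""
    return -2.0 * lnL + k * math.log(n)

N_SITES = 4257  # aligned sites; using this as the sample size is an assumption

# Made-up log-likelihoods and parameter counts, for illustration only.
strategies = {"P1": (-60000.0, 10), "P8": (-58000.0, 80)}

bf = two_ln_bayes_factor(strategies["P8"][0], strategies["P1"][0])
best_aicc = min(strategies, key=lambda s: aicc(*strategies[s], N_SITES))
best_bic = min(strategies, key=lambda s: bic(*strategies[s], N_SITES))
print(bf > 10, best_aicc, best_bic)
```

Because BIC multiplies k by ln(n) rather than by 2, it penalizes additional partitions more heavily than AICc whenever n exceeds about seven sites.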
We compared the optimal partitioning strategies selected by Bayes factors, AICc, BIC, and DT tests in Bayesian analyses with those preferred by hierarchical LRT (hLRT), AICc, BIC, and DT in ML analyses. To better compare ML and Bayesian strategy-selection procedures, RAxML and Bayesian analyses employed only the GTR+I+Γ substitution model. All ML comparisons were based on likelihoods calculated for the eight-partition Bayesian consensus tree, whereas Bayesian model criteria were computed in the context of optimized trees for each partitioning strategy (except DT). To apply these partition selection criteria to our ML and Bayesian analyses, we calculated the number of parameters in each data partition following the recommendation of McGuire et al. [23].

Testing alternative cyprinid phylogenetic hypotheses
Bayesian hypothesis testing [49] was used to test whether alternative hypotheses of higher-level cyprinid relationships recovered in our partitioned Bayesian analyses could be rejected by the combined data. Because Bayesian analysis infers the distribution of trees proportional to their PPs, commonly used statistical methods to compare alternative topologies, such as the approximately unbiased test [50], are not applicable under the Bayesian framework. The 95% credible set of trees (sampled at stationarity) was built by using the "sumt" command in MrBayes. All trees were imported into PAUP and filtered by the phylogenetic hypothesis of interest; that hypothesis could not be rejected statistically when one or more trees in the 95% credible set were compatible with the hypothesis.
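The decision rule described above can be sketched in a few lines. Here a credible set is built by adding topologies in order of decreasing sampling frequency until the cumulative frequency reaches 95%, and a hypothesis is rejected only if no tree in that set is compatible with it. Topologies are represented as opaque labels, and the compatibility test is a user-supplied function; both are simplifications, and all names are hypothetical.

```python
from collections import Counter

def credible_set(sampled_topologies, level=0.95):
    """Topologies in decreasing frequency whose cumulative frequency reaches `level`."""
    counts = Counter(sampled_topologies)
    total = sum(counts.values())
    kept, cumulative = [], 0.0
    for topology, count in counts.most_common():
        kept.append(topology)
        cumulative += count / total
        if cumulative >= level:
            break
    return kept

def hypothesis_rejected(cred_set, is_compatible):
    """Reject only if no tree in the credible set is compatible with the hypothesis."""
    return not any(is_compatible(t) for t in cred_set)

# Made-up posterior sample of three topologies, for illustration only.
samples = ["T1"] * 90 + ["T2"] * 6 + ["T3"] * 4
cs = credible_set(samples)
print(cs)                                              # ['T1', 'T2']
print(hypothesis_rejected(cs, lambda t: t == "T3"))    # True
print(hypothesis_rejected(cs, lambda t: t == "T2"))    # False
```

In practice the compatibility check would be a topological constraint filter applied to each tree, as done in PAUP in the analysis above.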

Molecular dating of cyprinids
Rate heterogeneity among lineages in the concatenated dataset was evaluated using likelihood ratio tests (LRTs) comparing the log likelihoods of clock-constrained and unconstrained trees. We used the GTR+I+Γ substitution model. A strict molecular clock was rejected (P<0.005, degrees of freedom=107). Therefore, Sanderson's nonparametric rate smoothing (NPRS) method [51], a relaxed molecular clock approach, was used to estimate divergence dates.
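For a strict-clock LRT the test statistic is twice the difference in log-likelihood between the unconstrained and clock-constrained trees, and the degrees of freedom are conventionally the number of taxa minus two. With 103 ingroup and six outgroup taxa this gives 107, in agreement with the value reported above. The sketch below shows the arithmetic; the log-likelihoods are made up and the function name is hypothetical.

```python
def clock_lrt(lnL_unconstrained, lnL_clock, n_taxa):
    """Return (test statistic, degrees of freedom) for a strict-clock LRT."""
    statistic = 2.0 * (lnL_unconstrained - lnL_clock)
    df = n_taxa - 2  # conventional degrees of freedom for the clock test
    return statistic, df

n_taxa = 103 + 6  # ingroup species plus outgroup taxa
# Made-up log-likelihoods, for illustration only.
statistic, df = clock_lrt(-58000.0, -58150.0, n_taxa)
print(statistic, df)  # 300.0 107
```

The statistic would then be compared against a chi-square distribution with `df` degrees of freedom to obtain the P value.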
The NPRS implemented in the program r8s [52] was used to produce ultrametric trees. Divergence date estimates were based on the topology resulting from the unpartitioned Bayesian analysis. Powell's algorithm for optimizing the objective function and the additive penalty function were used. The 95% confidence intervals for the estimated ages were determined using 100 bootstrap pseudoreplicates of the combined data matrix using SEQBOOT in PHYLIP 3.5c [53]. While keeping the tree fixed, nodal depth (hence age estimates) of each pseudoreplicate was estimated by ML with the preferred model of molecular evolution [51]. For each node, the mean age was calculated from 100 bootstrap ages.
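Summarizing the bootstrap ages for a node amounts to taking their mean and an interval that covers the central 95% of values. The text does not state how the interval was derived, so the percentile approach below is an assumption, and the ages and function name are hypothetical.

```python
import statistics

def summarize_bootstrap_ages(ages, level=0.95):
    """Mean and approximate percentile interval of bootstrap age estimates."""
    ordered = sorted(ages)
    n = len(ordered)
    lo_idx = int((1.0 - level) / 2.0 * n)  # number of values trimmed from each tail
    hi_idx = n - 1 - lo_idx
    return statistics.mean(ordered), ordered[lo_idx], ordered[hi_idx]

# 100 made-up ages (Mya) standing in for one node's bootstrap estimates.
toy_ages = [float(a) for a in range(1, 101)]
mean_age, lower, upper = summarize_bootstrap_ages(toy_ages)
print(mean_age, lower, upper)  # 50.5 3.0 98.0
```

With 100 pseudoreplicates this trims the two youngest and two oldest estimates, so the interval is only an approximation of the nominal 95% range.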
To estimate divergence times, we applied multiple fossil calibration points: (i) the root node of Cyprinidae was constrained to a maximum of 55.8 million years ago (Mya) because the oldest reliable known fossils of Cyprinidae are from the Eocene [54]; (ii) the split between Tinca and the modern leuciscins was constrained to a maximum of 18.0 Mya, because Tinca was described from the Middle Miocene [55,56] and a prominent turnover of European freshwater fish faunas, represented by the appearance of modern Palaeoleuciscus sp. and Palaeocarassius sp. (=aff. Abramis sp. vel aff. Alburnus sp.), happened about 17–18 Mya (the late Early Miocene) [57]; (iii) a minimum of 1.81 Mya was assigned to the node subtending silver (Hypophthalmichthys molitrix) and bighead (Aristichthys nobilis) carp, and to the node subtending grass carp (Ctenopharyngodon idella) [58]; (iv) a minimum age of 3 Mya was used to define the origin of Pseudorasbora [58]; and (v) a fixed date of 13 Mya was used to define the lineage leading to modern European Barbus barbus according to the fossil records of Barbus [10].

Results
We generated 4257 aligned base pairs (bp) of DNA sequence data representing three genes: the nuclear RAG2 (1287 aligned bp), the 16S rRNA (1830 bp), and Cytb (1140 bp). Of those sites, 2209 were variable and 1797 were parsimony informative. In the 16S rRNA gene, 190 sites were variable and 106 parsimony informative. The Cytb gene had 160 variable and 140 parsimony informative sites. The remaining 290 variable sites, 270 of which were parsimony informative, occurred in the RAG2 gene.

Effects of alternative partitioning strategies
The HMLi and lnL were used to evaluate partitioning strategies in the Bayesian and ML analyses, respectively. In the present study, adding partitions substantially improved the HMLi and -lnL scores ( Table 2), suggesting that simpler partitioning strategies were poorer fits to the data than more complicated partitioning strategies. For example, partitioning Cytb and RAG2 by codon positions dramatically improved HMLi and -lnL. Partitioning the 16S rRNA gene by stems and loops improved HMLi and -lnL by about 360 and 330 log-units respectively (P4 vs. P3). Comparing the strategies with the same numbers of partitions (P5a vs. P5b, P6a vs. P6b) indicated that partitioning Cytb alone by codon was better than partitioning only RAG2 by codon. The P8 strategy, which partitioned the rRNA gene by stems and loops and the coding genes by codon position proved best in both Bayesian and ML analyses. Despite differences in model fit, tree topologies inferred by Bayesian and ML methods using the ten partition strategies were almost identical to one other; the differences involved alternative placements of weakly supported nodes (PP<0.90 and bootstrap support<70%). The most dramatic difference in topology occurred in the position of Danio within Cyprinidae; the Bayesian P7 analysis weakly supported a basal position for Danio, unlike the other analyses. Tree length estimates varied only slightly across partitioning strategies, and no notable differences in PPs (Bayesian analyses) or bootstrap supports (ML analyses) were found among strategies. The number of strongly-supported ingroup nodes (PP values0.95) decreased between the unpartitioned (73 of 95 ingroup nodes with PP values0.95) and the maximally partitioned (63 of 93). Our analyses suggested that highly-partitioned Bayesian analyses had relatively poor performance in recovering well-supported cyprinid nodes.

Selecting the optimal partitioning strategy
The selection criteria favored two extreme partitioning strategies. The DT selected the unpartitioned P1 strategy; in contrast, the other criteria (Bayes factors/hLRT, AICc, and BIC) preferred the most partitioned (P8) strategy. The more-partitioned strategies considered in this study did not provide better-resolved or better-supported estimates of cyprinid phylogeny, because all strategies resulted in similar topologies and node support values. The partitioning strategy employed in this analysis was not as critical as expected. However, the phylogenetic analysis based on P1 (preferred by DT in both Bayesian and ML frameworks) required fewer parameters to be estimated, and we inferred that the P1 strategy was optimal for our Bayesian and ML analyses. Although adding partitions improved likelihood scores, partitioning had little effect on topology or node support. Much of the improvement in likelihood scores obtained with more extensive partitioning was probably associated with substitution model and base frequency parameter estimates (nuisance parameters in this context) rather than with more critical topology and branch-length estimates [23].

Phylogeny of the Cyprinidae
Bayesian analyses of a combined molecular dataset resulted in informative phylogenetic estimates for cyprinids (Figure 1). The monophyly of Cyprinidae was strongly supported with a PP of 1.0 in all analyses. The unpartitioned Bayesian analysis resulted in a well-resolved and well-supported cyprinid phylogeny, with 73 of 95 ingroup nodes receiving PP values ≥0.95 and two additional nodes with PP values of 0.90–0.95.
The unpartitioned Bayesian analysis strongly supported several important clades within Cyprinidae. However, the position of Danio at the base of the leuciscines was poorly supported (PP=0.61), and the genus was basal to the entire family in the P7 analyses. Within the cyprinine lineage (Clade I) (sensu Howes [2]), the monophyly of labeonine fishes (Clade B) and sister relationship between labeonine and non-labeonine cyprinine clades (except Procypris, Clade A) were both strongly supported (PP=1.0), whereas the relationships within non-labeonine cyprinine clade (Clade A) were less well supported, with several unresolved relationships.
The unpartitioned ML analysis (the strategy preferred by DT in an ML framework) and the more complex partitioning strategies resulted in phylogenetic trees highly similar to Figure 1, with the following exceptions: (i) the phylogenetic position of Danio; (ii) the deep branching pattern within Clade A; and (iii) support for the node subtending Clades D, E, and F and Tinca. All of the important cyprinid clades recovered in Figure 1 were also well supported (bootstrap ≥70%) in the ML analysis, except that the strongly-supported (PP=0.95) node for Clade A in the Bayesian analysis had relatively low bootstrap support (66%) in the ML analysis. Table 3 lists the divergence times (with 95% confidence intervals) estimated in r8s for the main nodes marked in Figure 2. As estimated by our combined data, Cyprinidae appeared in East Asia around 42

Performance of alternative partitioning strategies
For datasets composed of multiple genes and/or gene regions, partitioned phylogenetic analyses may greatly reduce mismodeling and systematic errors relative to analyses specifying a single model. Comparison of the 95% credible intervals of parameters sampled from the posterior distributions of the P1 and P8 analyses found significant heterogeneity, indicating that including more partitions greatly improved the Bayesian and ML likelihoods in this study. Numerous instances of non-overlap could be found in the credible intervals for different partitioning schemes. Based on the parameter estimates, partitioning the Cytb codon positions improved the HMLi and -lnL scores more substantially than partitioning the RAG2 or 16S rRNA genes.
Although partitioning substantially improved likelihood scores, its effect on topology and node support was minimal. In our Bayesian analyses, increased partitioning decreased the estimated PPs of some nodes relative to the P1 strategy. For example, seven of the ingroup nodes that had PP values of 1.0 in the P1 tree had lower support values under the most partitioned (P8) strategy. Phylogeneticists are concerned about appropriate partitioning in their analyses, because poor topology and confidence estimates can result from poorly- or overly-partitioned models. Although improved modeling could decrease the amount of systematic error under a given partitioning strategy, random error could significantly impact phylogeny and confidence estimates. The ideal partition size for optimal phylogenetic estimates is still unclear. For our cyprinid dataset, we concluded that most of the improvement in HMLi and lnL estimates with greater partitioning was associated with better estimation of nuisance parameters, such as base frequencies and substitution rates.
Figure 2 Phylogeny of cyprinid fishes with divergence time estimates. The chronogram is the tree from the unpartitioned Bayesian analysis with dates estimated using nonparametric rate smoothing in the program r8s. Node labels are defined in Table 3, where mean divergence dates and 95% confidence intervals for key nodes are listed.
We compared ten partitioning strategies in both Bayesian and ML frameworks, and four alternative model-selection criteria were employed to identify the best-fitting strategy. The standard Bayes factor/hLRT and AICc imposed relatively weak penalties for additional parameterization and consequently selected the most complex partitioning strategy, whereas the more stringent BIC and DT criteria preferred the most- and least-partitioned models, respectively. The DT method incorporates relative branch-length error as a performance measure. Therefore, under the DT framework, if a less-partitioned model returned nearly identical branch length estimates to those of a model with more partitions, there would be little difference in phylogenetic estimates between the models. The performance-based DT criterion selected the unpartitioned strategy in our analyses, indicating that there were probably no improvements in branch length estimates in our partitioned (P2-P8, Table 2) analyses compared with unpartitioned analyses.

Phylogenetic framework for systematics of the Cyprinidae
As expected, the monophyly of the family Cyprinidae was recovered with strong Bayesian PP and ML bootstrap support. Our phylogeny established a higher-level framework for Cyprinidae and revealed several well-supported groupings.
One large clade within Cyprinidae was the well-supported cyprinine lineage (Clade I, Figure 1). All taxa in this clade were members of the previously-recognized subfamilies Barbinae, Cyprininae, Labeoninae, and Schizothoracinae [4]. Although the basal relationships within this clade have been contentious due to disagreement between molecular and morphological phylogenetic studies, our data consistently supported the monophyly of the cyprinine clade. Within the cyprinines, our analyses provided robust evidence for the monophyly of Labeoninae as currently recognized. However, in all of our analyses, the cyprinine, barbine, and schizothoracine fishes (except Procypris) were nested within one clade (Clade A) sister to the labeonine clade. In another analysis with more cyprinine samples (unpublished data), two clades were strongly recovered: the Labeoninae and the Cyprininae, containing the barbins, cyprinins (including Procypris), and schizothoracins.
Another well-supported primary clade of Cyprinidae resolved in all analyses was the leuciscine lineage (Clade II, Figure 1). Within this clade, all of our analyses provided substantial resolution and support for the monophyly of Gobioninae (including Gobiobotia), Acheilognathinae, Leuciscinae, and Xenocyprininae, the last of which is endemic to East Asia. Although Gobioninae, Acheilognathinae, and Leuciscinae were each strongly supported, the relationships among them were weakly resolved and differed among analyses. These three clades, together with Tinca, formed a clade sister to the Xenocyprininae. The placement of Tinca within Cyprinidae has proved to be taxonomically problematic in previous studies [2]. In contrast to studies based on morphological [8,9] and molecular [10–12,20] data, our analyses strongly supported a clade comprising Tinca, Leuciscini, Gobionini, and Acheilognathini, within which Tinca was weakly supported as sister to Leuciscini.
Not surprisingly, the monophyly of the danionine (rasborine sensu Howes [2]) fishes was rejected by the present analyses. Morphologically, "Danioninae" contains a large assemblage of taxa, most of which cannot be accommodated by other subfamilies [2]. Furthermore, a recent molecular phylogeny indicated that Danioninae was not monophyletic; putative members were scattered throughout Cyprinidae [59]. Thus, we suggest that the East Asian endemics, such as Zacco, Opsariichthys, and Nicholsicypris, should be excluded from a redefined Danioninae.
In the recent taxonomic revision of cyprinid (or cyprinoid) fishes by Chen and Mayden [60], the recognition of 10 families (including the Psilorhynchidae) was recommended. Of these groups, six (i.e., Acheilognathinae, Leuciscinae, Gobioninae, Cultrinae, Tincinae, and Rasborinae) were also supported in the present analyses (Figure 1). The Psilorhynchidae and Leptobarbidae were not included in our analyses, and the Cultrinae and Rasborinae correspond to Xenocyprininae and Danioninae, respectively, in our study. Our data suggested that the Cyprinae recognized by Chen and Mayden [60] could be further divided into two clades, Cyprininae and Labeoninae, and that the Tanichthyidae should be included in the Acheilognathinae. Unlike Chen and Mayden, we do not recommend that these groups be elevated from subfamily to family level, but prefer to retain these clades within Cyprinidae.
Previous morphological studies consistently supported two major lineages within Cyprinidae, i.e., barbeled cyprinines and (usually) non-barbeled leuciscines, although the subgroups included in each lineage and the relationships among subgroups have differed among studies [2,8,9]. However, recent molecular studies have disagreed with the morphological placement of Danioninae (Rasborinae). The placement of Danioninae in the leuciscine clade was indicated in some prior morphological and molecular phylogenies [8,9,60], and was weakly supported in our cyprinid phylogeny (Figure 1). Another study placed Danioninae within the cyprinine lineage [2], while other molecular phylogenetic analyses placed it at the base of the cyprinids [12,13,61]. The disputed phylogenetic placement of Danioninae may be mainly due to different taxon sampling in these previous molecular phylogenies. Our data indicated that Danioninae represents a lineage within Cyprinidae that is distinct from the well-accepted cyprinine and leuciscine lineages. A basal position for Danioninae within Cyprinidae (as recovered in the P7 Bayesian analysis) could not be rejected by Bayesian hypothesis testing of alternative phylogenies generated from our combined data. A total of 6414 of 18710 trees in the 95% credible set were congruent with the hypothesis that Danioninae was basal within cyprinids.

Phylogenetic history of cyprinid clades
Based on the distribution of fossil cyprinids, an Eocene origin for cyprinids was proposed [54]. Consistent with this hypothesis, our molecular dating analyses also indicated that cyprinids originated in the Middle Eocene (around 42 Mya). Within the family, the cyprinine lineage appeared in the early Late Oligocene (around 27 Mya) and the leuciscine lineage in the Late Oligocene (about 26–25 Mya).
The radiation of Labeoninae, the major cyprinine clade, occurred in the early Middle Miocene, and Cyprininae was estimated to have diversified in the late Early Miocene. Within the leuciscine lineage, the divergence between Xenocyprininae and the lineage comprising Leuciscinae, Tincinae, Gobioninae, and Acheilognathinae occurred in the Early Miocene (about 20 Mya). According to our age estimates, the Acheilognathini, Gobionini, and Leuciscini radiated during the Middle Miocene (around 18–12 Mya).