Background

The vertebrate mitogenome is a small (16–17 kb), circular, double-stranded molecule [1]. It contains 37 genes: 22 tRNA genes, 13 protein-coding genes (PCGs) and two rRNA genes [1]. It also has two noncoding regions, the origin of light-strand replication (OL) and the control region (CR); the latter contains regulatory elements that control the transcription and replication of the mtDNA molecule [2, 3]. Owing to its distinctive features, such as high copy number in tissues, simple genomic organization, maternal and haploid inheritance, almost unambiguous orthology and high nucleotide substitution rate [4,5,6], the mitogenome has been widely applied in species identification (i.e., DNA barcoding), as well as in population genetics, conservation biology, molecular phylogenetics and studies of evolutionary processes [7,8,9,10,11,12,13]. Gene arrangements of fish mitogenomes are generally conserved, with only a few exceptions [1]. However, genome length, base composition bias, start/stop codon usage, gene overlaps and intergenic spacers (IGSs) vary among species [14].

Cobitinae is a subfamily of Cobitidae that was first identified by Hora (1932). To date, FishBase records 214 species in 21 genera, such as Cobitis, Misgurnus and Paramisgurnus [15]. Loaches of the subfamily Cobitinae are bottom-dwelling fishes widely distributed across the Eurasian continent. They usually have high economic, ornamental and scientific research value. Commercial farming of loaches, including the cobitid loach (M. anguillicaudatus) and the large-scale loach (P. dabryanus), occupies a significant position in Asian freshwater aquaculture because of their enjoyable taste, high nutritional value, rapid growth and strong adaptability [16,17,18]. In China, loach is used as a diet therapy or folk remedy for patients' recovery or for the treatment of many diseases, such as hepatitis, osteomyelitis, carbuncles and cancers. Many Cobitis populations are mixed diploid-polyploid, with bisexual and unisexual forms even co-existing in the same niche [19,20,21]. They are suitable models for revealing the relationships among hybridization, polyploidization, reproduction, speciation and evolution [21,22,23]. Owing to their great diversity, they are also used to trace the biogeographic history of freshwater systems and to reflect geologic events [24]. Cobitinae fishes usually inhabit various benthic habitats in rivers, lakes, streams and ponds [25]. However, deterioration of the ecological environment has led to a decrease in benthic organisms [26, 27]. Cobitinae fishes are seriously threatened and their wild populations are gradually decreasing [28]. For this reason, the diversity of these benthic fishes has been used as a bioindicator to assess the quality of the ecological environment [29, 30]. In addition, many Cobitinae species, such as the “kuhli loaches”, are well known in Southeast Asia and Europe as ornamental fish for their varied morphological patterns and their ability to ingest organic residues on the bottom.

Cobitinae fishes are difficult to classify because of their morphological similarity and high morphological plasticity [31]. Although secondary sexual dimorphism is used to define genera, it is not always congruent with the current generic definitions. The molecular phylogeny of Cobitinae fishes has been studied at the genus or family level using one or two mitochondrial and/or nuclear genes [24, 31,32,33,34,35,36], and remains complex and controversial. For example, based on the mitochondrial gene cytb and the nuclear gene rag-1, Perdices et al. (2016) [37] reconstructed the phylogenetic relationships of the Northern Clade of the family Cobitidae, which inhabits Europe and the northern and northwestern parts of Asia. The subfamily Cobitinae was divided into the Cobitis sensu lato group (Cobitis, Iksookimia, Niwaella and Kichulchoia), the Misgurnus sensu lato group (Misgurnus, Paramisgurnus and Koreocobitis), Microcobitis, and Sabanejewia. Although the monophyly of the groups was resolved, the relationships within the groups are discordant with the current taxonomic status.

To date, about 60 mitogenomes, covering more than 40 species of Cobitinae, have been deposited in GenBank [38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55]. Although the characteristics of a few mitogenomes have been described, the overall characteristics of Cobitinae mitogenomes are still not well known. In this study, we sequenced the mitogenome of C. macrostigma, the type species of the genus Cobitis [25], and compared it with those of 41 other species (57 individuals) to characterize the features of Cobitinae mitogenomes in detail. Additionally, we assembled a large sequence matrix (11,442 bp) of 58 Cobitinae mitogenomes and two outgroups to investigate the phylogenetic status and the origin time of Cobitinae fishes.

Results

General features of C. macrostigma mitogenome

The mitogenome of C. macrostigma was sequenced, annotated and compared with 57 other Cobitinae mitogenomes (Table 1). It contains 13 PCGs (nd1–6, nd4l, cox1–3, cytb, atp6 and atp8), 22 tRNA genes, two rRNA genes (12S rRNA and 16S rRNA) and two non-coding regions (OL and CR) (GenBank: MT259034). Gene order and orientation are the same as in most teleost mitogenomes (Fig. 1, Table 2). PCGs range from 168 bp (atp8) to 1551 bp (cox1) in size, with a total length of 11,427 bp. tRNAs vary from 66 bp (tRNACys(C)) to 76 bp (tRNALys(K)) in size, with a total length of 1557 bp. The small-subunit 12S rRNA and large-subunit 16S rRNA genes are 952 bp and 1675 bp long, respectively. They are flanked by tRNAPhe and tRNALeu(UUR) and separated by tRNAVal. Among the 58 mitogenomes analyzed, the entire mitogenome of C. macrostigma has the highest similarity (99.6%) with C. granoei and the lowest (88.2%) with C. sinensis.

Table 1 Species, GenBank accession number and length of mitogenomes used in this study
Fig. 1

Circular sketch map of the C. macrostigma mitogenome. Different colors represent different gene blocks

Table 2 Annotation of the C. macrostigma mitogenome

Highly conserved tRNAs secondary structure, overlaps and non-coding intergenic spacers among Cobitinae mitogenomes

Cobitinae mitogenomes range from 16,337 bp (L. annandalei) to 16,647 bp (M. anguillicaudatus and C. takatsuensis) in length (Table 1). Their gene composition, gene arrangement and strand bias are highly conserved (Fig. 1 and Table 2). Among the 22 tRNAs, tRNASer(AGN) (S1) is the only one that does not fold into the typical clover-leaf secondary structure, because it lacks the DHU arm (Fig. 2a). In the Cobitinae mitogenomes, unmatched base pairs are widespread among tRNAs. Taking C. macrostigma as an example, there are 446 base pairs among the 22 tRNAs, and only one gene (tRNALeu(CUN)) possesses fully paired stems. Among the 425 base pairs of the other 21 tRNAs, 43 (10.1%) are unmatched: 28 noncanonical G-U pairs and 15 other mismatches, namely A-C (7), A-A (1), C-C (2), C-U (2) and U-U (3) (Fig. 2a). Most of them are located in the acceptor, DHU and anticodon stems.
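The tally of unmatched base pairs can be reproduced with a few lines of Python. This is a minimal sketch, not the annotation pipeline used in the study: the function name `classify_pair` is hypothetical, and the counts are copied from the paragraph above.

```python
# Sketch: classify RNA stem pairs and re-check the mismatch tally reported for C. macrostigma.
WATSON_CRICK = {frozenset("AU"), frozenset("GC")}
WOBBLE = frozenset("GU")

def classify_pair(a: str, b: str) -> str:
    """Return 'watson-crick', 'wobble' (G-U) or 'mismatch' for two paired bases."""
    pair = frozenset((a.upper().replace("T", "U"), b.upper().replace("T", "U")))
    if pair in WATSON_CRICK:
        return "watson-crick"
    if pair == WOBBLE:
        return "wobble"
    return "mismatch"

# Counts taken from the text (21 tRNAs, 425 stem base pairs).
gu_pairs = 28
other_mismatches = {"A-C": 7, "A-A": 1, "C-C": 2, "C-U": 2, "U-U": 3}
unmatched = gu_pairs + sum(other_mismatches.values())
unmatched_percent = round(unmatched / 425 * 100, 1)
```

With these inputs, `unmatched` is 43 and `unmatched_percent` is 10.1, matching the figures given in the text.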

Fig. 2

Putative secondary structure of tRNAs (a) and OL (b) of Cobitinae mitogenomes. The C. macrostigma mitogenome is taken as an example. tRNAs are labeled with their corresponding amino acids. Dashes (−) indicate Watson–Crick bonds, and dots (·) indicate mispaired nucleotide bonds

We also compared the gene overlaps and IGSs among the 58 Cobitinae mitogenomes. Two long overlaps (atp8-atp6 and nd4l-nd4) and two long IGSs (OL and tRNAAsp-cox2) were found. The highly conserved motifs “ATGCTAA” and “ATGGCAATAA” were found at the overlapping junctions between nd4l and nd4, and between atp8 and atp6, respectively (Fig. 3a). There are also several small overlaps between adjacent tRNA genes, such as tRNAIle-tRNAGln and tRNAThr-tRNAPro. OL is located within the cluster of five tRNA genes (WANCY) (Table 2, Fig. 1), and its secondary structure is a stable stem-loop hairpin strengthened by six C-G base pairs (Fig. 2b). Among the 31 bp of OL, the C-G base pairs in the stem are highly conserved, while the loop in the middle is variable (Fig. 3b). The other long IGS, between tRNAAsp and cox2, is likewise conserved at the 5′ and 3′ ends and highly variable in the middle.

Fig. 3

Sequence logo of gene overlaps in atp8-atp6 (a), non-coding intergenic spacers in tRNAAsp-cox2 (b) and short conserved motif in CR of Cobitinae mitogenomes (c)

CR, located between tRNAPro and tRNAPhe, is the most variable region in Cobitinae mitogenomes and ranges from 872 bp (Lepidocephalus macrochir) to 990 bp (C. takatsuensis) (Supplementary Table 2) [44]. Three conserved domains can be recognized in Cobitinae mitogenomes (Fig. 3c): the termination-associated sequences (TAS), the central conserved blocks (CSB-D, CSB-E and CSB-F) and the conserved sequence blocks (CSB-1, CSB-2 and CSB-3).

Usage bias of start and stop codon, codon distributions and relative synonymous codons in Cobitinae mitogenomes

The typical start codon ATG is conserved and used in 12 PCGs, whereas GTG is used only in cox1, in 98% (57/58) of the analyzed Cobitinae mitogenomes; the exception is one individual of M. anguillicaudatus (No. 11) (Fig. 4, Supplementary Table 3). Five types of stop codons were found: three complete (TAA, TAG and AGA) and two truncated (TA- and T--) (Fig. 4). The two truncated stop codons are used in nd2, cox2, atp6, cox3, nd3, nd4 and cytb, whose 3′ ends are followed by a tRNA gene encoded on the same strand.
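How a coding sequence is assigned to one of these stop-codon types can be illustrated with a short Python sketch. It assumes that the annotated coding sequence ends at the last templated base (so that a truncated stop leaves one or two trailing bases) and uses the stop codons of the vertebrate mitochondrial code; the function name `classify_stop` is hypothetical, and AGG is included only because that code defines it as a stop, not because it was observed here.

```python
# Sketch: classify the 3' end of a PCG as a complete or truncated stop codon.
# Stops of the vertebrate mitochondrial genetic code (NCBI translation table 2).
COMPLETE_STOPS = {"TAA", "TAG", "AGA", "AGG"}

def classify_stop(cds: str) -> str:
    """Return the stop type: a complete codon, 'TA-', 'T--', or 'none'."""
    cds = cds.upper()
    remainder = len(cds) % 3
    if remainder == 0:
        last = cds[-3:]
        return last if last in COMPLETE_STOPS else "none"
    tail = cds[-remainder:]  # one or two bases left after the last full codon
    if tail == "T":
        return "T--"
    if tail == "TA":
        return "TA-"
    return "none"

def start_codon(cds: str) -> str:
    """Return the first three bases of the coding sequence."""
    return cds[:3].upper()
```

For example, `classify_stop("ATGGCCT")` returns `"T--"`. Truncated stops of this kind are generally thought to be completed to TAA by polyadenylation of the transcript.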

Fig. 4

Usage bias of start and stop codons of 13 PCGs in Cobitinae mitogenomes. Pie graphs show the use frequency of start and stop codons. Gene abbreviations are the same as Table 2

The codon distribution and relative synonymous codon usage (RSCU) of the 58 Cobitinae mitogenomes were analyzed. Codon distribution is largely consistent among these mitogenomes (Supplementary Figure S1). As shown by six representative species of Cobitinae, the codons encoding Leu(CUN), Ala and Thr are the three most frequent, while those encoding Cys are rare (Fig. 5a). Compared with the other five Cobitinae species, P. anguillaris uses more Leu(CUN) codons and fewer Leu(UUR) codons. The patterns of RSCU are also consistent among the analyzed species (Fig. 5b). Degenerate codons preferentially use A/T rather than G/C at the 3rd position, so the A + T content is higher than the G + C content at the 3rd codon position of Cobitinae PCGs. For example, the arginine codon CGA and the tryptophan codon UGA are prevalent, while their synonymous codons are used less often.
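RSCU is the observed count of a codon divided by the count expected if all synonymous codons of that amino acid were used equally. The following Python sketch illustrates the calculation under the vertebrate mitochondrial code (NCBI translation table 2). It is not the DNAStar/MEGA procedure used in this study; grouping codons by amino acid (for instance, treating all six Leu codons as one family) is an assumption, and the function name `rscu` is hypothetical.

```python
# Sketch: relative synonymous codon usage under the vertebrate mitochondrial code.
from collections import Counter
from itertools import product

BASES = "TCAG"
# Amino acids of NCBI translation table 2, in TCAG codon order ('*' = stop).
AA_TABLE2 = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA_TABLE2)}

def rscu(cds: str) -> dict[str, float]:
    """Return RSCU for every sense codon of each amino acid present in cds."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a truncated stop at the 3' end
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if CODE.get(c, "*") != "*")
    families: dict[str, list[str]] = {}
    for codon, aa in CODE.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)
    values: dict[str, float] = {}
    for members in families.values():
        total = sum(counts[c] for c in members)
        if total == 0:
            continue  # amino acid absent from this sequence
        expected = total / len(members)
        for c in members:
            values[c] = counts[c] / expected
    return values
```

An RSCU above 1 means a codon is used more often than expected under equal usage, and a value below 1 means it is used less often. In the toy sequence `"CGACGACGACGT"`, CGA accounts for three of four arginine codons and therefore has an RSCU of 3.0.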

Fig. 5

Codon distribution (a) and relative synonymous codon usage (b) of PCGs in C. macrostigma and other five representative species of Cobitinae. CDpT = codons per thousand codons

A + T %, AT-skew and their linear correlations of Cobitinae mitogenomes

The A + T content and AT-skew of whole mitogenomes, PCGs, tRNAs, rRNAs and CR were calculated (Fig. 6a-b). All 58 Cobitinae mitogenomes exhibit an AT bias; the A + T content is lowest in tRNAs (54.8 ± 0.6%) and highest in CR (66.3 ± 0.9%) (Fig. 6a, Supplementary Table 2). The AT-skew values are largest, and positive, in rRNAs. They are smallest in PCGs, where most are negative; the exceptions are Canthophrys gongota, Acantopsis choirorhynchos, P. cuneovirgata, P. kuhlii, P. oblonga and Kottelatlimia pristes (Fig. 6, Supplementary Table 2). These results indicate that PCGs favor T over A in most Cobitinae mitogenomes. To examine whether the A + T content and AT-skew differ among the three codon positions of PCGs, we selected the six representative Cobitinae species for a more detailed analysis. The A + T content increases in the order 1st < 2nd < 3rd codon position in all analyzed fishes. Meanwhile, the AT-skew is positive at the 1st and 3rd positions and negative at the 2nd (Table 3). This is due to the biased usage of synonymous codons (Fig. 5b). In all analyzed Cobitinae mitogenomes, CRs contain more A and C, with all AT-skew values positive (0.002–0.112) and all GC-skew values negative (−0.341 to −0.101) (Supplementary Table 2).

Fig. 6

Base compositions and AT-skew in Cobitinae mitogenomes. a. A + T content of different regions in Cobitinae mitogenomes. b. AT-skew of different regions in Cobitinae mitogenomes. c. The correlations between A + T% and AT-skew in 13 PCGs of Cobitinae mitogenomes. d. The correlations between G + C% and GC-skew in 13 PCGs of Cobitinae mitogenomes

Table 3 Base composition and skewness of the mitogenomes in C. macrostigma and other five representative species of Cobitinae

The linear correlations between A + T % and AT-skew were calculated for all Cobitinae mitogenomes (yA1 = −0.0166x − 0.9047, R2 = 0.5991), the genus Cobitis (yA2 = −0.012x + 0.5786, R2 = 0.5197) and the genus Pangio (yA3 = −0.0466x + 2.5813, R2 = 0.5486). All of them are negative, implying that AT-skew decreases as A + T content increases (Fig. 6c). Similar negative linear correlations were also found between G + C % and GC-skew (Fig. 6d).
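A line of this form and its R2 can be obtained by ordinary least squares. The sketch below assumes that a simple OLS fit was used, which the text does not state. The function name `linear_fit` is hypothetical, and the four data points are invented for illustration; they are not values from this study.

```python
# Sketch: ordinary least-squares fit of AT-skew (y) against A+T % (x).
def linear_fit(xs, ys):
    """Return (slope, intercept, r_squared) of the least-squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy) if syy else 1.0
    return slope, intercept, r_squared

# Invented toy values (NOT data from this study): A+T % and AT-skew of four PCG sets.
toy_at_percent = [54.0, 56.0, 58.0, 60.0]
toy_at_skew = [0.02, -0.01, -0.04, -0.07]
slope, intercept, r_squared = linear_fit(toy_at_percent, toy_at_skew)
```

A negative slope, as in this toy example, means that sequences with higher A + T content tend to have lower (more negative) AT-skew.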

Non-synonymous and synonymous substitutions

To better understand the selective pressure on, and the evolutionary relationships of, Cobitinae fishes, the ω (dN/dS) value of each PCG was calculated (Fig. 7). All PCGs evolved under purifying selection (ω < 0.5). The atp8 gene showed the highest ω value (ω = 0.12) and the cox family genes the lowest (ω = 0.02 ± 0.01). This pattern is also found in most Metazoa [56], but the fold change (> 10 fold) is particularly high in Cobitinae. A lower ω value indicates less variation in amino acids. Thus, cox1, cox3 and cytb are potential barcoding markers for Cobitinae species identification.

Fig. 7

Nonsynonymous/synonymous ratios (ω = dN/dS) of the 13 PCGs of Cobitinae mitogenomes

Phylogenetic analysis of Cobitinae fishes

Molecular phylogenetic analyses were performed using 13 PCGs from 58 Cobitinae mitogenomes, belonging to 41 species from 14 genera. The ML and BI analyses generated similar topologies with high bootstrap support / posterior probability values. Each tree was divided into two main clades: Cobitis-Misgurnus-other genera (clade I) and Pangio-Lepidocephalichthys-other genera (clade II) (Fig. 8 and Supplementary Figure S2). Clade I included all analyzed species of Cobitis, Paramisgurnus and Misgurnus, and five species from other genera (I. longicorpa, K. multifasciata, N. delicata, K. naktongensis, and Microcobitis sp.). Four Pangio species, five Lepidocephalichthys species and five other species (K. pristes, A. choirorhynchos, A. gracilentus, L. macrochir, and C. gongota) were clustered into clade II, in which the analyzed species of Pangio and Lepidocephalichthys each formed a well-supported (pp = 1.00) monophyletic group. In addition, Pangio is the sister genus to Lepidocephalichthys.

Fig. 8

Phylogenetic tree constructed by BI methods, based on 13 PCGs of 58 Cobitinae mitogenomes. Sinorhodeus microlepis and Rhodeus shitaiensis were chosen as outgroups. Node numbers represent the values of posterior probability

The BI phylogenetic tree confirmed that Cobitis is a paraphyletic group, since Misgurnus clade A, N. delicata, I. longicorpa and K. multifasciata share a common ancestor with all 15 Cobitis species analyzed in this study, with high posterior probability (pp = 1.00). The species of Misgurnus were separated into two independent lineages: the majority of M. anguillicaudatus individuals (12/14) and M. bipartitus clustered with the Cobitis species (Misgurnus clade A), whereas two M. anguillicaudatus individuals, M. mizolepis, M. mohoity and M. nikolskyi grouped with P. dabryanus and K. naktongensis (Misgurnus clade B).

Divergence time estimation of Cobitinae fishes

The combination of a strict clock model and a Yule process tree prior provided the best fit to the data sets (Supplementary Table 4). The chronogram of Cobitinae lineages was estimated based on the cytb mutation rate (0.68% per million years) (Fig. 9). The first split of Cobitinae lineages, which separated clade I (northern clade) from clade II (southern lineages), was estimated to have occurred in the late Eocene (42.11 Ma, 95% HPD: 36.35–47.86 Ma). The Cobitis-Iksookimia-Kichulchoia-Niwaella lineage diverged from the rest of the northern clade during the Oligocene (30.07 Ma, 95% HPD: 25.55–34.69 Ma), similar to a previous estimate [35], then diversified and further radiated after 4.94 Ma. The mtDNA introgression between ancestral species of Cobitis and ancestral species of Misgurnus seems to have taken place in the Middle Miocene (14.40 Ma, 95% HPD: 12.30–16.54 Ma). C. macrostigma appeared about 0.36 Ma (95% HPD: 0.06–0.55 Ma), in the Pleistocene. The Pangio-Lepidocephalichthys-other genera group (southern lineages) might have originated about 40.45 Ma. Within the southern lineages, Pangio was estimated to have originated about 20.14–29.88 Ma, and the divergence times of the four species analyzed in this study are congruent with previously published dating [24].

Fig. 9

The divergence times of Cobitinae fishes. The ranges of 95% HPD intervals are represented by the blue bars

Discussion

In this study, we conducted a comparative mitogenome analysis and revealed the conserved and unique characteristics of 58 Cobitinae mitogenomes. Cobitinae mitogenomes display highly conserved tRNA secondary structures, overlaps and non-coding intergenic spacers. Among the 22 tRNAs, tRNASer(AGN) (S1) is the only one that does not fold into the typical clover-leaf secondary structure (Fig. 2a). Loss of the stem in S1 is a common character among Cobitinae and other metazoan mitogenomes [57, 58]. Similarly, the widespread occurrence of unmatched base pairs among Cobitinae tRNAs is a conserved feature of eukaryote mitogenomes [59,60,61]. Although their functions in fish are not clear, the unmatched base pairs are considered to reflect the current state of an irreversible evolutionary process, which might be caused by tRNA editing [62].

As in other cyprinid fishes [14, 63], two long overlaps and two long IGSs were found in Cobitinae mitogenomes. The motif “ATGCTAA” in nd4l-nd4 is conserved in vertebrates, including fish, turtle and human [14, 63,64,65,66]. However, compared with the conserved motif (ATGATAA) of other Cypriniformes fishes, the atp8-atp6 overlap motif of Cobitinae and other loaches contains a specific 3 bp insertion (GCA) [67,68,69], indicating that this insertion is a characteristic feature of loaches. IGSs are important for transcription and are associated with gene rearrangement in insects [70,71,72]. It is commonly assumed that IGSs have a rapid nucleotide substitution rate under relaxed selection [73]. Moreover, Cobitinae mitogenomes share highly conserved sequences in the IGSs immediately adjacent to tRNAs, such as “CTTTCCCGCC”, “AAGGCGGGA” and “AGC”. Whether these conserved sequences have a function, and how they act, awaits further investigation. As the longest IGS, the CR plays an important role in controlling the transcription and replication of the mtDNA molecule through several domains and motifs [74, 75]. Although significant length variation has been found in the CR of vertebrates [76], the three domains can still be recognized in Cobitinae mitogenomes. Furthermore, the AT-skew and GC-skew of the CR might reflect strand asymmetry [77,78,79]. In teleosts, skew inversion of the CR has been found only in the mitogenomes of Albula glossodonta and Bathygadus antrodes, which show a reversed strand asymmetry [75]. The normal CR skewness of Cobitinae mitogenomes indicates that their strand asymmetry is not reversed.
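The relationship between the two atp8-atp6 overlap motifs can be checked directly. The sketch below locates a single contiguous insertion by trimming the shared prefix and suffix of the two motifs; the function name `find_insertion` is hypothetical, and it assumes that the motifs differ by exactly one insertion.

```python
# Sketch: locate a single contiguous insertion that turns `short` into `long`.
def find_insertion(short: str, long: str):
    """Return (position, inserted_bases), or None if one insertion cannot explain the difference."""
    if len(long) <= len(short):
        return None
    prefix = 0  # length of the shared prefix
    while prefix < len(short) and short[prefix] == long[prefix]:
        prefix += 1
    suffix = 0  # length of the shared suffix that does not overlap the prefix
    while suffix < len(short) - prefix and short[-1 - suffix] == long[-1 - suffix]:
        suffix += 1
    if prefix + suffix != len(short):
        return None  # the motifs differ by more than one insertion
    return prefix, long[prefix:len(long) - suffix]

# Motifs quoted in the text: other Cypriniformes vs. Cobitinae and other loaches.
result = find_insertion("ATGATAA", "ATGGCAATAA")
```

Here `result` is `(3, "GCA")`: the three extra bases sit directly after the ATG shared by both motifs.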

The phylogenetic analyses show the monophyly of the genus Pangio and Lepidocephalichthys, consistent with the previous study [35]. However, Cobitis, the biggest genus of Cobitinae [15], is a complex and controversial paraphyletic group. Similar to the trees constructed by cyt b [25, 80, 81], Iksookimia, Kichulchoia and Niwaella species were nested within Cobitis, implying a close relationship among them. Perdices [37] proposed that these species of Iksookimia, Kichulchoia, and Niwaella might belong to genus Cobitis, as morphologically specialized species derived from a local Cobitis species. However, this assumption awaits more morphological, karyological and molecular investigation. In addition, our phylogenetic analysis confirmed the assumption that M. mizolepis and P. dabryanus are conspecific [33, 80] and the different lineages under the species name C. striata and C. takatsuensis might actually represent different species.

The species of Misgurnus were separated into two independent clades, which clustered with the Cobitis species and with P. dabryanus-K. naktongensis, respectively. The same results were observed in trees based on cyt b [80] and on 13 PCGs from 28 Cobitidae species [47]. However, all Misgurnus and Koreocobitis species were grouped into a monophyletic clade when their phylogenetic relationships were reconstructed using the nuclear gene rag-1 [80]. This incongruity between mitochondrial and nuclear gene trees has been explained by different evolutionary rates of the markers, hybridization or introgression [82]. It is commonly believed that hybridization and subsequent mtDNA introgression might have occurred between ancestral species of Cobitis and ancestral species of Misgurnus [35, 80]. In this study, we collected 14 mitogenomes from M. anguillicaudatus, which were divided into two genetically divergent clades. A similar phenomenon has been reported in several previous studies and explained by hybridization and mtDNA introgression [34, 35, 47, 83, 84]. Considering that M. anguillicaudatus clustered into the clade of Misgurnus and Koreocobitis in nuclear analyses [80], we suppose that the 12 mitogenomes (No. 1–12) of M. anguillicaudatus in Misgurnus clade A represent the introgressed mtDNA type because of their close relationship with Cobitis species, whereas the other two individuals in Misgurnus clade B retain the original M. anguillicaudatus mitogenome. M. anguillicaudatus with the introgressed mtDNA type is spread over most of East Asia, including China, Japan and Korea. M. anguillicaudatus shows extensive ploidy variability in nature. Besides the most common diploid individuals (2n = 50), triploids (3n = 75) and tetraploids (4n = 100) have been frequently recorded in some localities of China and Japan [21, 47, 85, 86]. Rare pentaploid (5n = 125) and even hexaploid (6n = 150) individuals have been found in the Yangtze River basin [87]. All of the M. anguillicaudatus polyploids analyzed in this study belonged to the introgressed mtDNA type. Since mtDNA is inherited maternally, these polyploids might have originated from diploid M. anguillicaudatus with introgressed mtDNA. Further analyses, based on large-scale sampling with quantitative morphological features, definite ploidy and more genes from both mitochondrial and nuclear genomes, are needed to confirm this hypothesis of inter-genus mtDNA introgression.

The first split of Cobitinae lineages, which separated the northern clade from the southern lineages, was estimated to have occurred in the late Eocene (42.11 Ma, 95% HPD: 36.35–47.86 Ma), consistent with the reconstructed dates of the paleo-drainages of East Asia [35, 88]. Cobitinae fishes in clade I and clade II, designated the “northern clade” and the “southern lineages” respectively, show a distinctly disjunct distribution with a small area of sympatry in Vietnam [35]. Consistent with their locations, after their isolation the northern clade spread to most of East Asia, Siberia and Europe, while the southern lineages became distributed across the Indian subcontinent and Southeast Asia. The nodes within the northern clade and the southern lineages appear asynchronous, implying that local factors, rather than large-scale events, might have shaped evolution within each of them.

Conclusions

This study represents the first comparative mitogenome and phylogenetic analysis within Cobitinae. The conserved and unique characteristics of 58 Cobitinae mitogenomes were revealed. We observed distinct base compositions among different genera and identified a specific 3 bp insertion (GCA) in the atp8-atp6 overlap as a unique feature of loaches. ML and BI analyses both strongly support the paraphyly of Cobitis and the polyphyly of Misgurnus. In addition, Cobitinae might have split into northern and southern lineages in the late Eocene (42.11 Ma), and mtDNA introgression between Cobitis and Misgurnus might have occurred about 14.40 Ma. The current study provides new insights into the mitogenome features and evolution of Cobitinae fishes.

Methods

Sampling, sequencing and assembly

The C. macrostigma analyzed in this study were caught in the Yangtze River in Yibin City, Sichuan Province, China (N: 28°46′6.01″, E: 104°38′13.99″) in October 2018, and five individuals were transported to the laboratory (National Aquatic Biological Resource Center, NABRC) in oxygen-rich water. The species possesses 5–9 large, round spots along the midline of the lateral body side [89] (Fig. 1). Before sampling, the fish were reared in square glass recirculating freshwater tanks with a volume of about 100 L, at 22 °C on a 14 h light/10 h dark cycle, for morphological identification. After deep anesthesia with an overdose of styrylpyridine (a common anaesthetic used in fish, 30–50 mg/L; Aladdin, China), one healthy one-year-old female fish, 7 cm in length and 1.8 g in weight, was euthanized by immediately severing the spinal cord adjacent to the head. Total DNA was extracted according to the technical manual of the Ezup Column Animal Genomic DNA Kit (Sangon, Shanghai, China). PCR primers were designed based on the conserved sequences between the mitogenomes of C. granoei (GenBank: NC_023473.1) and C. sinensis (GenBank: NC_007229.1). DNA fragments of 742–2495 bp were amplified using High Fidelity DNA Polymerase (Yeasen, Shanghai) (Supplementary Table 1). To obtain accurate sequences, we chose a cloning strategy. Following the manufacturer's manual, each PCR amplicon was purified, ligated into the ESI-Blunt vector (Yeasen, Shanghai) and transformed into 5α Chemically Competent Cells (Tsingke Biological Technology, Beijing). The positive clones were sequenced by Quintara Biosciences (Wuhan, China). Segments longer than 1500 bp were sequenced using a primer-walking strategy. The resulting DNA sequences were assembled using DNAStar (DNASTAR Inc., USA) [90]. The other 57 Cobitinae mitogenomes were downloaded from the NCBI GenBank database [38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55].

Gene annotation and bioinformatic analyses

tRNA genes and their secondary structures were predicted with MITOS [91] and tRNAscan-SE 2.0 with default parameters [92]. All 13 PCGs and two rRNA genes were annotated by comparison with the sequences of other Cobitinae fishes in GenBank (https://blast.ncbi.nlm.nih.gov/). The mtDNA maps were drawn using CGView Server V1.0 [93]. The sequence logos of gene overlaps and non-coding IGSs were drawn using WebLogo 3.7.4 [94]. The base composition, codon distribution and relative synonymous codon usage were calculated using DNAStar (DNASTAR Inc., USA) [90], MEGA 7.0 [95] and Microsoft Excel 2010. Skewness was measured using the formulas AT-skew = (A% - T%) / (A% + T%) and GC-skew = (G% - C%) / (G% + C%) [79]. Sequence similarity was calculated in MEGA 7.0 [95] under the p-distance model and with NCBI-BLAST (https://blast.ncbi.nlm.nih.gov/Blast.cgi).
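The skew formulas translate directly into code. The following Python sketch is illustrative and is not the DNAStar/MEGA/Excel workflow described above; the function names `base_stats` and `codon_position` are hypothetical. Because the percentages share one denominator, raw base counts give the same skew values.

```python
# Sketch: A+T content, AT-skew and GC-skew of a nucleotide sequence.
from collections import Counter

def _skew(x: int, y: int) -> float:
    """(x - y) / (x + y), or NaN when both counts are zero."""
    return (x - y) / (x + y) if (x + y) else float("nan")

def base_stats(seq: str) -> dict[str, float]:
    """Return A+T % and skews; ambiguous bases (e.g. N) are ignored."""
    counts = Counter(seq.upper())
    a, t, g, c = (counts[b] for b in "ATGC")
    total = a + t + g + c
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {
        "at_content": (a + t) / total * 100,
        "at_skew": _skew(a, t),   # AT-skew = (A - T) / (A + T)
        "gc_skew": _skew(g, c),   # GC-skew = (G - C) / (G + C)
    }

def codon_position(cds: str, position: int) -> str:
    """Return the bases found at codon position 1, 2 or 3 of an in-frame CDS."""
    return cds[position - 1::3]
```

For example, `base_stats("AAATGC")["at_skew"]` is 0.5. Passing `codon_position(cds, 3)` to `base_stats` gives third-position values of the kind reported in Table 3.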

Phylogenetic analyses

The phylogenetic analysis was performed based on 13 PCGs of 58 Cobitinae mitogenomes. Sinorhodeus microlepis and Rhodeus shitaiensis were chosen as the outgroups (Table 1). Each of the 13 gene sequences was separately aligned using Muscle v3.8.31 [96], and the alignments were concatenated into a sequence matrix with PhyloSuite v1.2.2 [97]. PartitionFinder2 [98] was then used to find the best partitioning strategy and to calculate the best-fit evolutionary model for each subset. For the alignment, a scheme with eight partitions was selected and GTR + G + I was chosen as the best-fit evolutionary model for each partition. Phylogenetic trees were constructed by the maximum likelihood (ML) method and Bayesian inference (BI). The ML method was implemented in RAxML v8.2.12 [99]. Each partition scheme was run with the GTRGAMMAI model, and 1000 rapid bootstrap replicates were used to evaluate the bootstrap support values and to search for the best-scoring ML tree. The BI phylogeny was inferred in MrBayes v3.1.2 [100] with the “unlink” and “prset ratepr = variable” model parameters. Two independent runs, each with four Markov chain Monte Carlo (MCMC) chains, were run for 10,000,000 generations and sampled every 1000 generations. The convergence of the BI analyses was checked using Tracer v1.7.1. The first 2500 trees were discarded as a conservative burn-in, and the rest were used to generate a majority-rule consensus tree.
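The burn-in settings can be put in proportion with a short calculation. This sketch assumes that the 2500 discarded trees are counted per run and ignores the starting tree that MrBayes also writes at generation 0; the function name `mcmc_samples` is hypothetical.

```python
# Sketch: how many trees an MCMC analysis samples and how many remain after burn-in.
def mcmc_samples(generations: int, sample_every: int, burnin_trees: int, runs: int = 1) -> dict:
    """Summarize sampling for `runs` independent runs with identical settings."""
    sampled = generations // sample_every      # trees sampled per run
    kept = sampled - burnin_trees              # trees retained per run
    return {
        "sampled_per_run": sampled,
        "burnin_fraction": burnin_trees / sampled,
        "kept_total": kept * runs,
    }

# Settings quoted in the text for the BI analysis.
bi_summary = mcmc_samples(10_000_000, 1000, 2500, runs=2)
```

Under these assumptions each run samples 10,000 trees, the burn-in corresponds to 25% of them, and 15,000 trees from the two runs contribute to the consensus.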

For cobitid fishes, a mutation rate of 0.680% (divergence per pairwise comparison per Ma) has been calculated and suggested for the cytb gene [32]. In this study, BEAST v1.10.4 [101] was used to estimate divergence times with this rate (0.68%). GTR + G + I was chosen as the best-fit model by PartitionFinder2 [98]. The best-fit clock type and tree prior were selected from two clock models (strict clock and uncorrelated relaxed clock) and four tree priors (Yule process, Exponential growth, Constant size and Bayesian skyline) by comparing the marginal likelihood values estimated by path sampling [102]. The analyses were run for 20,000,000 generations, with parameters sampled every 1000 generations, and the first 25% of the trees were discarded as burn-in. Tracer v1.5 [103] and FigTree were used to assess convergence and to view trees, respectively.
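A rate expressed as divergence per pairwise comparison per Ma relates an uncorrected pairwise distance to a rough split time by simple division. The sketch below is a back-of-the-envelope illustration only and is not what BEAST does, since BEAST co-estimates the tree and node ages under a substitution model. The function names are hypothetical, and the sequences in the usage note are invented.

```python
# Sketch: uncorrected p-distance and a rough divergence time from a pairwise rate.
def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of differing sites between two aligned sequences (gaps/ambiguities skipped)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            differences += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared

def divergence_time_ma(distance: float, pairwise_rate_per_ma: float = 0.0068) -> float:
    """Rough split time in Ma: pairwise distance divided by pairwise divergence rate."""
    return distance / pairwise_rate_per_ma
```

For instance, two invented 10 bp sequences that differ at one site have a p-distance of 0.1. At 0.68% per Ma, a pairwise cytb distance of 6.8% would correspond to roughly 10 Ma, ignoring multiple substitutions at the same site.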