Characterization of the complete mitogenome of the endangered freshwater fish Gobiobotia naktongensis from the Geum River in South Korea: evidence of stream connection with the Paleo-Huanghe

Background The freshwater fish Gobiobotia naktongensis (Teleostei, Cypriniformes, and Gobionidae) is an endangered class I species whose population size has been greatly reduced. Objective To successfully protect and restore the highly endangered freshwater fish G. naktongensis from the Geum River in South Korea. Methods The mitogenome was characterized using the primer walking method with phylogenetic relationships. Results The complete mitogenome of G. naktongensis Geum River was 16,607 bp, comprising 13 protein-coding genes, 2 ribosomal RNA genes, and 22 transfer RNA (tRNA) genes. Seventeen substitutions were found by comparing the tRNA regions between G. naktongensis Geum and Nakdong Rivers and G. pappenheimi; most were specific to G. naktongensis Nakdong River, with changes in their secondary structures. The comparison between G. naktongensis Geum River and G. pappenheimi revealed differences in the lengths of the D-loop and two tRNAs (tRNAArg and tRNATrp) and the secondary structures in the TΨC-arm of tRNAHis. In the phylogenetic tree, G. naktongensis Geum River did not cluster with its conspecific specimen from the Nakdong River in South Korea, but showed the closest relationship to G. pappenheimi in mainland China. Conclusions Our results support the existence of the Paleo-Huanghe River connecting the Korean peninsula and mainland China, suggesting that G. naktongensis in the Geum River should be treated as a different evolutionarily significant unit separated from that in the Nakdong River. The complete mitogenome of G. naktongensis Geum River provides essential baseline data to establish strategies for its conservation and restoration.


Introduction
The freshwater fish Gobiobotia naktongensis (Teleostei, Cypriniformes, Gobionidae) is a small species with a total length of 6-8 cm (www.fishbase.org, 2022). It was first reported in the Nakdong River system in South Korea and was classified as a novel species by Mori (1935). Since then, ecological studies by Jeon and Son (1983) and Choi (1985) have shown that its distribution extends to the Geum and Han River systems in South Korea, respectively. This species is endemic to the Korean peninsula and has been designated and protected as endangered class I species since 2005 by the Wildlife Protection Act of the Ministry of Environment in South Korea. It is also classified as a vulnerable species in the Red Data Book of Endangered Fishes Online ISSN 2092-9293 Print ISSN 1976 1 3 in Korea (NIBR 2014). Its population size has been greatly reduced by large-scale river engineering projects, such as the Four Major Rivers Restoration Project (2009)(2010)(2011). There have been no reports of G. naktongensis in the Geum River since the construction of large weirs in 2013 (MOE/NIE 2013-2018a-2018bMOE/NIBR 2013. However, after the complete reopening of the Sejong-weir at the Geum River in October 2018, G. naktongensis was found 200 m downstream of the weir in April 2019. To protect and restore endangered species successfully, it is necessary to establish effective strategies based on their population genetic structure. Mitochondrial genomes (mitogenomes) are powerful phylogenetic markers (Avise et al. 1987) or golden regions of DNA barcoding markers and are frequently used in ecological, evolutionary, and systematic studies, as well as in conservation studies of diverse vertebrate taxa. G. naktongensis is distributed in only four major river systems, the Nakdong, Geum, Han, and Imjin Rivers in South Korea. The Korean Peninsula is divided into three subdistricts based on the geological and biogeographical separation of freshwater fish fauna ( Fig. 1) (Kim et al. 2005;. Given this biogeography, a high level of genetic variation is expected to exist between the G. naktongensis populations that inhabit different major river systems in the two different subdistricts, that is, the West Korea Subdistrict and South Korea Subdistrict. However, only one mitogenomic sequence of G. naktongensis is available for the Nakdong River (Hwang et al. 2013a) in the Gen-Bank database (accession number KC353467).
The aim of this study was to analyze and characterize the complete mitogenome of G. naktongensis from the Geum River and to reveal the phylogenetic relationship by including two populations from different major river systems in South Korea.

Materials and methods
Specimens of G. naktongensis were caught in 2020 from the Geum River in South Korea using a scoop net (mesh size: 4 × 4 mm). The captured individuals had a total length of 56 mm, a body length of 49 mm, and a weight of 1.09 g. They were anesthetized by submersion in an anesthetic agent (MS222; Aqualife TMS, Syndel Laboratories, Ltd., Canada). A small piece of the caudal fin was excised with sterile scissors, and the endangered fish was released after recovery from the anesthetic in clean water. All sampling was conducted with the permission of the Ministry of the Environment of Korea. Genomic DNA (gDNA) was extracted using TNES-urea buffer according to Asahida et al. (1996). The gDNA was stored in a voucher (NeF-00001) at the Research Center for Endangered Species. The mitogenome was divided into two regions and amplified by overlapping PCR amplification, according to . The PCR products were sequenced by the primer walking method using 25 sequencing primers (available upon request) on a 3730xl DNA Analyzer (Applied Biosystems, Foster City, CA, USA). The sequence was deposited in GenBank under accession number MT539708.
The sequence data were assembled in a complete circular contig using DNA sequence analysis software (Sequencher 5.0; Gene Codes Corp., Ann Arbor, MI, USA). The mitogenomic sequence was annotated using the MITOS web server (Bernt et al. 2013) and MitoFish (Iwasaki et al. 2013) with those of other gobionid species publicly available in the GenBank database to determine the gene boundaries of protein-coding genes (PCGs) and ribosomal RNA (rRNA) genes. Transfer RNA (tRNA) genes were identified by tRNAscan-SE 1.21 (Lowe and Eddy 1997) to compare their secondary structures among Fig. 1 Major river systems in the Korean peninsula. The Geum and Nakdong River systems, in which two Gobiobotia naktongensis populations were reported, are indicated by thick lines. Three biogeographical areas based on freshwater fish assemblages were adopted from Kim et al. (2005) 1 3 G. naktongensis from Geum and Nakdong rivers in South Korea and G. pappenheimi from mainland China.
All mitochondrial genes, including PCGs, rRNA, and tRNA, were rearranged in the H-strand for further analysis. The identification of the exact start and stop codons of all PCGs was carried out after alignment using ClustalX 2.0 (Larkin et al. 2007) in MEGA-X software (Kumar et al. 2018). Nucleotide compositions were estimated and compared for all species in the genus Gobiobotia using the MEGA-X software (Kumar et al. 2018). To estimate the base composition bias, the strand asymmetry of the mitogenome of G. naktongensis Geum River was calculated using the following formulas: AT skew = [A − T]/ [A + T] and GC skew = [G − C]/[G + C] (Perna and Kocher 1995). We also calculated the values of relative synonymous codon usage (RSCU) of the mitogenome of G. naktongensis Geum River using MEGA-X software (Kumar et al. 2018). The Ka/Ks ratio of 13 PCGs, excluding the stop codon of four Gobiobotia species, was calculated using DnaSP v5 (Librado and Rozas 2009).
The mitogenomic sequences of 77 species belonging to the family Gobionidae, including G. naktongensis Nakdong River, were retrieved from GenBank. They were aligned with the sequence of G. naktongensis Geum River in this study and manually refined for phylogenetic analysis. For phylogenetic analysis, the nucleotide matrix was partitioned into four groups, according to Inoue et al. (2005). The sequences of the 12 PCSs, excluding nad6, were divided according to codon position (i.e., the first and second positions of codon triplets), excluding the third codon position. Unambiguously aligned regions from 2 rRNA and 22 tRNA genes were obtained after eliminating divergent regions using Gblocks Server (http:// molev ol. cmima. csic. es/ castr esana/ Gbloc ks_ server. html) with default settings. Nucleotide matrices of 3617, 3617, 2527, and 1498 bp for the first and second codon positions of the PCSs, and rRNA and tRNA genes, respectively, were obtained. The alignment information is available upon request in the FASTA format.
Bayesian inference (BI) analysis was conducted using MrBayes 3.1.2 (Ronquist and Huelsenbeck 2003) for all representative species of the Gobionidae, including G. naktongensis. Two leuciscid species (Leuciscus waleckii and Tribolodon hakonensis) were used as outgroups. We selected the nucleotide substitution models that best fit each partitioned nucleotide matrix using jModelTest 2 (Darriba et al. 2012) based on the Bayesian information criterion (BIC). The general time-reverse (GTR) model, allowing invariant sites and a gamma distribution (the GTR + I + Γ model), was selected for all partitions. Four independent Markov chains were simultaneously used at 1,000,000 generations with sampling every 100 generations, and the first 25% was discarded as burn-in.
Maximum likelihood (ML) analysis was performed with RAxML 7.0.4 (Stamatakis 2006;Stamatakis et al. 2008). The concatenated nucleotide matrix was divided into four partitions. A RAxML search was executed for the bestscoring ML tree in a single program run (the "-fa" option), instead of the default maximum parsimony starting tree. The best-scoring ML tree of a thorough ML analysis was determined using the GTRGAMMAI model based on 200 inferences. Statistical support was evaluated using 1000 nonparametric bootstrap inferences. The resultant tree was illustrated using TreeView 1.6.6.

Results and discussion
The complete mitogenome of G. naktongensis Geum River in South Korea is a circular molecule 16,607 bp in total length (Table 1), which is similar to the total length of other Gobiobotia species (16,609-16,637 bp; Table 2). It comprises 13 PCGs, two rRNA (12S and 16S rRNAs), 22 tRNA genes, and one control region (Table 1). Its gene content and order were identical not only to those of other congeneric species but also to those of other typical vertebrates. Twelve PCGs, excluding nad6 and eight tRNA genes (tRNA Gln , tRNA Ala , tRNA Asn , tRNA Cys , tRNA Tyr , tRNA Ser2 , tRNA Glu , and tRNA Pro ), were positioned on the heavy strand (H-strand) and the origin of replication on the light strand (L-strand) (Hwang et al. 2013a;Kim et al. 2020;Kwak et al. 2021). Overlaps in sequences among the mitochondrial genes were found in ten genes with total of 27 bp and a range of 1-7 bp. The most prominent overlaps were detected between atp8 and atp6 and between nd4L and nd4, and the others were frequently found between PCGs and tRNA genes or between two adjoining tRNA genes. Although most PCGs started with ATG (a putative start codon), cox1 started with GTG, the result of which is identical to that of most typical vertebrates (Tzeng et al. 1992;Miya et al. 2003;Shan et al. 2016). Ten out of 13 PCGs had complete stop codons (TAA or TAG), while the other three genes (cox2, cox3, and cob) had incomplete stop codons, such as T or TA. These incomplete stop codons can be converted to TAA by polyadenylation after transcription during mRNA maturation (Ojala et al. 1981).
We compared the nucleotide composition of the mitogenomes of five Gobiobotia species, including two G. naktongensis populations ( Table 2). The nucleotide composition of G. naktongensis Geum River was A = 30.3%, G = 16.8%, T = 26.3%, and C = 26.3%, showing a bias toward A + T (56.9%), similar to other Gobiobotia species, which is similar to the results of most fish mitogenomes (Wang et al. 2020;Yang et al. 2018).
AT/GC skew is a method used to evaluate the excess of A and/or C nucleotides based on the H-strand; a positive skew value indicates that the T and/or G nucleotides consist of a relatively small number. Thus, AT/GC skew is a measure of compositional asymmetry. Owing to asymmetrical directional mutation pressure (Francino and Ochman 1997;Perna and Kocher 1995;Yang et al. 2018), such asymmetry is reflected in the codon usage of genes in different directions.
For example, H-strand encoded genes show a clear preference for C in the codon wobble position, whereas L-strand encoded genes for G or T. The PCGs of the mitogenome of G. naktongensis Geum River had a slightly higher AT content (56.9%) than the rRNA genes (55.2%) ( Table 3). The control region, which occupies most of the non-coding region, showed a significantly higher AT content (66.5%), similar to the mitochondrial genes of other fishes (Wang et al. 2020;Yang et al. 2018;Zhou et al. 2017). AT skews by gene regions were all positive, except for PCGs, and all GC skews were negative, except for tRNAs. Particularly, rRNA regions had a highly A-biased nucleotide composition, and 13 PCGs regions had a highly C-biased composition. The AT skews among the 13 PCGs in G. naktongensis Geum River waved near zero, ranging from -0.072 to 0.117, except for nad6 (-0.462), and the values of negative and positive AT skews were similar (Fig. 2). All GC skews in the 13 PCGs ranged from -0.428 to -0.179 except for nad6 (0.423). This result suggests that more C nucleotides are present in most PCGs, and nad6 only presented negative AT and positive GC skews, which is consistent with most previous reports of strand asymmetry in freshwater fish (Hwang et al. 2013b, c). Among 3,857 codons encoded by 13 PCGs, the amino acids Ala, Arg, Gly, Leu1, Pro, Ser1, Ser2, Thr, and Val were utilized by four different codons, and the other amino acids were encoded by either one or three. Figure 3 shows the amino acid codon usage by relative synonymous codon were the least frequently present. The only exception was G. intermedia, in which the proportion of Ala (GCC) was the most frequently used codon. The A + T content and AT/ GC skew of the 13 PCGs are closely related to codon usage (Chao et al. 2014;Shi et al. 2016). The results of the RSCU analysis showed that the most frequently used codons at the 3rd position were A, and the least frequently used codons were G, indicating that A or C were used more frequently than T or G in codons at the 3rd position, indicating saturation (Yamanoue et al. 2007). The Ka (non-synonymous)/Ks (synonymous) ratio is particularly useful for determining the evolutionary relationship between PCGs in the mitogenomes of closely related species (Fay and Wu 2003). This ratio is an indicator of the selective pressure on PCGs: negative selection (Ka/Ks < 1), positive selection (Ka/Ks > 1), and the balance of both selections (Ka/Ks = 1) (Meganathan et al. 2011;Li et al. 2012). The Ka/Ks ratios of all PCGs among Gobiobotia species were < 1 (range 0.024-0.100), suggesting that they were under strong negative (purifying) selection and environmental changes were not large enough to change genetic function (Fig. 4). Additionally, no Ks sites (synonymous) were detected in atp8 between G. naktongensis Geum River and G. pappenheimi.
The mitogenome of G. naktongensis Geum River contains 22 tRNA genes that are typically found in vertebrates. Their lengths ranged from 68 bp (tRNA Cys ) to 76 bp (tRNA Lys ). Most tRNAs, except for tRNA Ser1 , were predicted to be folded into typical cloverleaf secondary structures, and like most vertebrates, tRNA Ser1 lacked a recognizable D-arm with a loop (Cui et al. 2007;Zhou et al. 2009). Twelve mismatched pairs (mainly A-C, and rarely U-U, A-A, C-C, and U-C) were predicted in nine genes (tRNA Phe , tRNA Val , tRNA Met , tRNA Trp , tRNA Arg , tRNA His , tRNA Ser1 , tRNA Thr , and tRNA Ser2 ) among the 22 tRNA genes, and G was inserted at the 3′-end (upward direction) of the TΨC-stem of tRNA Ser1 . The mismatch of the stem region can be corrected through a post-transcriptional RNA editing mechanism (Masta and Boore 2004).
The prediction of the secondary structure of tRNA genes among G. naktongensis Geum and Nakdong Rivers and G. pappenheimi revealed 17 base substitutions (including indels) in 11 of 22 tRNA genes (data not shown). Among them, nine substitutions were specific to G. naktongensis Nakdong River, three specific to G. naktongensis Geum River, and the other five substitutions were specific to G. pappenheimi. tRNA genes showing structural differences are indicated in Fig. 5. In tRNA Arg and tRNA Ala , structural differences in the acceptor and anticodon stems were found because of specific base substitutions in G. naktongensis Nakdong River. The D-loop showed differences in the lengths of the two G. naktongensis populations (7 bp) and G. pappenheimi (8 bp) due to the base indels of tRNA Arp and tRNA Trp . Moreover, the lengths of tRNA His were the same, but showed a length difference in the variable region between the anticodonstem, TΨC-stems, and TΨC-loops. All tRNAs were more conserved than synonymous regions, except for the variable loops, and the TΨC-and D-stem regions were more conserved than the loop regions (Vilmi et al. 2005). Point mutations within the tRNA gene have the potential to affect the tRNA metabolic fate structurally, functionally (or both) critical positions for the tRNA metabolic fate (Helm et al. 2000). Therefore, G. naktongensis Geum and Nakdong Rivers may be taxonomically different from G. pappenheimi.
Phylogenetic trees were reconstructed based on the BI and ML methods using the mitogenomic sequence matrix from 12 PCGs, 2 rRNA genes, and 22 tRNA genes of the representative gobionid species, including G. naktongensis Geum River. In the resulting trees, all the gobionid species formed a monophyletic group with respect to the two outgroups. All Gobiobotia species clustered with Xenophysogobio nudicorpa and were clearly distinct from the other gobionid genera (Fig. 6). Within the lineage, G. naktongensis Geum River showed a closer relationship to G. pappenheimi in mainland China with 100% bootstrap value in ML analysis and 0.94 posterior probability value in BI analysis, rather than its conspecific population reported from the Nakdong River, its type locality (Mori 1935).
Recent phylogenetic studies have revealed significant genetic differences in freshwater fish fauna between major river systems in South Korea Won et al. 2020). In South Korea, many independent rivers have developed because of the large number of mountain ranges, which is known to have occurred due to large-scale events between the Early Triassic and Early Miocene (Chough et al. 2000;Won et al. 2020). They have served as vicariant barriers that have shaped the current biogeography and allopatric speciation of fish assemblages in the Korean Peninsula ( Fig. 1) (Kim et al. 2005). Previous phylogenetic studies of two Coreoleuciscus species ) and two Koreocobitis species , endemic to the Korean peninsula, suggested that their biogeography was clearly divided into two disjunct areas, the West Korea and South Korea subdistricts, by the major vicariant barriers along the Noryeong/Sobaek mountain ranges and Baekdudaegan mountain range. The two G. naktongensis populations from the two major river systems, the Geum and Nakdong rivers in the West Korea and South Korea subdistricts, respectively, also showed the same biogeographical distribution pattern as those of Coreoleuciscus and Koreocobitis . Each of the two species in both genera was erected as a novel species based on further taxonomic studies, as well as significant genetic divergence (Kim et al. 2000;Song and Bang 2015). According to Lindberg (1972) and Nishimura (1974), when sea levels fell during the Ice Age, the Han (including the Imjin River) and Geum River systems in the Korean Peninsula were connected to the Yellow River system (Huanghe) in mainland China (the Paleo-Huanghe River). Thus, our results showing a closer relationship of G. naktongensis Geum River to G. pappenheimi in mainland China than its conspecific population from the Nakdong River provides important evidence for the existence of the Paleo-Huanghe River system that had shaped the distinct fish assemblages Red arrows indicate point mutations, and red boxes indicate structural changes. The numbers in the D-loop region indicate their lengths of nucleotide sequences in the West Korea Subdistrict, including the Han, Imjin, and Geum River systems.
The G. naktongensis populations of the Imjin, Han, and Geum Rivers should be taxonomically separated from the population of the Nakdong River, and their taxonomic status should be considered in future studies. In addition to taxonomic issues between the two G. naktongensis populations, the existence of the two distinct lineages strongly suggests Maximum-likelihood (ML) phylogenetic tree based on the complete mitogenomes from gobionid species belonging to the order Cypriniformes. The nucleotide sequence matrix included the first and second codon positions of 12 protein-coding genes and unambiguously aligned regions of two ribosomal RNA and 22 transfer RNA genes. Bootstrap value above 50% in the ML analysis and posterior probability value above 0.90 in the Bayesian inference analysis are indicated at each node. Gobiobotia naktongensis analyzed in this study is shown in bold 1 3 that both should be protected because populations from separate biogeographical regions with significant genetic variation may be considered separate evolutionarily significant units.

Conclusions
In this study, we analyzed the complete mitogenome of the G. naktongensis population from the Geum River to construct a phylogenetic tree of gobionid species. This study provides baseline data for the molecular identification, geographical distribution, and phylogenetic study of endangered freshwater fish species in the Korean peninsula. These will also be essential for establishing plans and strategies for the conservation and restoration of this evolutionarily significant unit.