Introduction

For the proper conservation and management of natural populations, it is essential to examine genetic population structures in addition to the genetic characteristics of the population, such as genetic diversity and population demography (Coates et al. 2018). A growing number of studies have suggested that genetic diversity is an important resource for adapting to environmental changes (Barrett and Schluter 2008; Bernatchez 2016; Bitter et al. 2019). Therefore, appropriate assessment of genetic diversity is fundamental for establishing management and conservation units, evaluating the genetic health of current populations, and assessing the adaptive potential of populations. Additionally, examining population dynamics in relation to past geographic history provides not only insights into the underlying evolutionary processes of current population genetic structures but also valuable information for appropriate fisheries management and conservation measures of populations (Benestan 2020).

Analysis of mitochondrial DNA (mtDNA) is beneficial for addressing evolutionary, phylogenetic, and phylogeographic questions owing to its unique characteristics compared with nuclear DNA, including its haploid nature, maternal inheritance, and higher substitution rates (Avise 2000). In population genetic and phylogeographic studies, including those of fish species, the mtDNA control region and the cytochrome b gene (Cyt b) are frequently used as genetic markers (Avise 2000). The control region of the mitochondrial genome is a noncoding region that comprises a series of variable regions and conserved sequence blocks (Satoh et al. 2016). The variable region, also called the “hypervariable region” (HVR), harbors dozens of times more substitutions than mtDNA coding regions (Aquadro and Greenberg 1983; Stewart and Baker 1994; McMillan and Palumbi 1997), making it the most suitable region for evaluating recent evolutionary history, including population demographics.

Despite its advantages for population genetic studies, the HVR tends to accumulate multiple substitutions, even at the same sites, within a relatively short time frame (Heyer et al. 2001), often resulting in homoplasy. Sequence data are generally interpreted under the assumption of identity by descent (IBD): in the absence of recombination, individuals that share the same nucleotide sequence are assumed to have inherited it from a common ancestor. However, recurrent substitutions (e.g., C → G → C) can result in homoplasy, in which multiple individuals share apparently “IBD” sequences that are merely identical in state (identity by state, IBS) and have arisen through different evolutionary processes (or different substitution steps). Such convergent molecular evolution leads to misinterpretation of the substitution history of sequences (Excoffier et al. 1992; Phillips et al. 2009); that is, IBS sequences caused by homoplasy are unintentionally treated as IBD sequences (false haplotyping).
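The recurrent-substitution case described above can be illustrated with a toy example. The following is a minimal sketch, not part of the analyses in this study; the ancestral sequence, the substitution steps, and the function name `apply_substitutions` are all hypothetical and chosen only for illustration.

```python
def apply_substitutions(seq, steps):
    """Apply (position, new_base) substitutions in order and return the result."""
    bases = list(seq)
    for pos, new_base in steps:
        bases[pos] = new_base
    return "".join(bases)


# Hypothetical ancestral sequence and two hypothetical lineages.
ancestor = "ACCTGA"
lineage_a_steps = []                    # no substitution along lineage A
lineage_b_steps = [(1, "G"), (1, "C")]  # C -> G -> C at the same site along lineage B

seq_a = apply_substitutions(ancestor, lineage_a_steps)
seq_b = apply_substitutions(ancestor, lineage_b_steps)

# The two sequences are identical in state although lineage B carries two
# substitution events that are invisible in the observed data.
identical_by_state = seq_a == seq_b
observed_differences = sum(x != y for x, y in zip(seq_a, seq_b))
hidden_steps = len(lineage_b_steps) - observed_differences

print(identical_by_state, observed_differences, hidden_steps)
```

In this sketch, haplotyping based on the observed sequences alone would assign both lineages to the same haplotype even though their substitution histories differ.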

Distinguishing IBD from IBS sequences in the haplotyping process (hereafter referred to as haplotyping accuracy) is essential for accurately evaluating genetic diversity. Particularly in populations with high haplotype or nucleotide diversity, homoplasy is likely to affect polymorphism evaluation and potentially leads to underestimation of genetic diversity indices such as haplotype diversity (h) (Nei 1987) and nucleotide diversity (π) (Nei and Li 1979). Studies have shown that such false haplotyping could also introduce biases in population genetic statistics, including FST (Bradman et al. 2011; Ando et al. 2016; Verma et al. 2016); however, little is known about the effects of homoplasy on inferences of population demography. An empirical study of shark species implicated homoplasy in the HVR (Tavares et al. 2013) even though the genetic diversity of these species is not high among marine species, which highlights the need for close attention to homoplasy in the HVR when studying marine fish. To avoid false haplotyping resulting from homoplasy, it is imperative to analyze the HVR concatenated with other regions that exhibit lower substitution rates (Ando et al. 2016).

The marbled flounder Pseudopleuronectes yokohamae is distributed across the Japanese Archipelago, Korean Peninsula, and China, from the Gulf of Pohai southward of the Yellow Sea to the northern part of the East China Sea (Nakabo 2002; Amaoka 2016; Tanda et al. 2008). This species is an important target of coastal fisheries. The biological characteristics of this species are unique among flatfish species; it spawns adhesive eggs and has limited adult migration (Minami 2008; Amaoka 2016). Owing to this limited dispersal potential, populations are expected to be genetically structured even when they are geographically close, and this has indeed been shown in previous studies (Park et al. 1990; Fujii 2010; Tsukagoshi et al. 2015; Sato et al. 2018). Several studies have analyzed the population genetic structure of P. yokohamae using the HVR, or the first half of the control region, and found substantial genetic differences among sampled populations (Fujii 2010; Ikeda et al. 2010; Lee et al. 2012; Tsukagoshi et al. 2015). These studies also revealed that P. yokohamae exhibits considerably high genetic diversity; therefore, the HVR in P. yokohamae might be saturated with substitutions owing to its highly variable nature.

In the present study, we investigated the possibility of homoplasy at the HVR and evaluated haplotyping accuracy in P. yokohamae using common mtDNA markers: the first half of the control region (HVR), the NADH-dehydrogenase subunit 2 gene (ND2), and the Cyt b gene. Specifically, considering the differences in substitution rates among regions, we examined the haplotype linkages of these regions, patterns in haplotype networks, and site-specific substitution rates to make inferences regarding evolutionary history. The haplotyping accuracy was evaluated by comparing polymorphisms in each region with those in the concatenated sequences. Additionally, we investigated the demographic history of P. yokohamae and discussed the potential pitfalls of inferring population history using the HVR owing to its highly variable nature. On the basis of the findings, we identify the challenges of conducting genetic diversity assessments and evolutionary demographic analyses using only the HVR and provide solutions to these problems.

Materials and methods

Samples and mtDNA sequencing

A total of 120 individuals of P. yokohamae (standard length 18–223 mm) were collected from Otsuchi Bay in Iwate Prefecture, Japan, between 2012 and 2013. Whole specimens were preserved in 99.5% ethanol until DNA extraction. Genomic DNA was extracted from fin clips using the QuickGene DNA tissue kit S and Mini80 (KURABO, Osaka, Japan), following the manufacturer’s instructions. The target fragments of the mtDNA markers, namely the first half of the control region (i.e., HVR, 378 bp), ND2 (1045 bp), and Cyt b (1141 bp), were amplified via polymerase chain reaction (PCR) using the following custom primers: HVR (5′-TCT TAC CCC TAA CTC CCA AAG C-3′ and 5′-GAA GTA GGA ACC AAA TGC CA-3′), ND2 (5′-AGA TCA AAA CTC TTC GTG CT-3′ and 5′-CAT GCA GAA GAT GTG GGA TA-3′), and Cyt b (5′-ACC CCA ACA CAA GAG AAA AT-3′ and 5′-AGA GCA TGC ATT ACA AGA CA-3′). For amplification, 1 μL of the template DNA, 0.125 μL of Blend Taq -Plus- (TOYOBO, Osaka, Japan), 1.25 μL of buffer for Blend Taq -Plus- (10 ×) (TOYOBO), 1.25 μL of 2 mM dNTP, 0.25 μL of each primer (25 μM), and 8.88 μL of sterile deionized H2O were added to each microtube to adjust the final volume to 13 μL. The PCR amplifications were carried out in a Veriti™ thermal cycler (Applied Biosystems, Waltham, MA, USA) under the following cycling conditions: initial denaturation at 94 °C for 2 min, followed by 35 cycles of denaturation at 94 °C for 30 s, annealing at 50 °C (for the HVR primers) or 54 °C (for the ND2 and Cyt b primers) for 30 s, and extension at 72 °C for 1 min, with a final extension at 72 °C for 5 min. The PCR products were purified using the AMPure XP kit (Beckman Coulter, Brea, CA, USA) according to the manufacturer’s instructions. PCR products were bidirectionally sequenced using the BigDye® Terminator v3.1 cycle sequencing kit (Applied Biosystems) following the manufacturer’s instructions, using the same primers as used for amplification. Reactions were purified using the ethanol/EDTA/sodium acetate method described in the above-mentioned manual and then dissolved in 10 μL of Hi-Di formamide (Applied Biosystems). Samples were sequenced using a 3500XL genetic analyzer capillary sequencer (Applied Biosystems-HITACHI) at Tohoku University, Japan. All sequences obtained in this study were deposited in the DDBJ/EMBL/GenBank databases under accession numbers LC754366–LC754404 and LC754730–LC754784.

Data analyses

All sequences were edited manually using FinchTV 1.4.0 (Geospiza, Inc.; Seattle, WA, USA; http://www.geospiza.com) and aligned using ClustalW (Thompson et al. 1994) implemented in Molecular Evolutionary Genetics Analysis version 7.0 (MEGA7) for bigger datasets (Kumar et al. 2016). Using the HVR data, Tsukagoshi et al. (2015) suggested that P. yokohamae and the related species cresthead flounder P. schrenki hybridized primarily in the northern part of the Japanese Archipelago; they found that some P. yokohamae samples were placed in the P. schrenki clade in the neighbor-joining (NJ) tree (Saitou and Nei 1987), and vice versa. Thus, by combining data from Tsukagoshi et al. (2015) deposited in GenBank (acc. nos. AB979354.1–AB979435.1), an NJ tree was constructed using MEGA7 to exclude samples with haplotypes belonging to the P. schrenki clade.

To evaluate DNA haplotyping accuracy, the following variability indices were estimated for each region and for the concatenated sequences using Arlequin version 3.5.1 (Excoffier and Lischer 2010): number of polymorphic sites and haplotypes, haplotype diversity (h), and nucleotide diversity (π). All concatenated sequences were used to estimate site-specific substitution rates using Parat 0.9.1 (Meyer and Von Haeseler 2003) with default parameters to assess substitution rate variation among the sites. Haplotype networks were constructed based on each HVR, ND2, and Cyt b sequence data via the statistical parsimony method using TCS v1.21 (Clement et al. 2000) with default parameters in support of homoplasy detection; the number of reticulations in the haplotype network correspondingly increases with recurrent substitutions (Bandelt et al. 1995).
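For readers unfamiliar with the variability indices, the following minimal sketch shows how haplotype diversity (h) and nucleotide diversity (π) can be computed from aligned sequences using the standard estimators, h = n/(n − 1) × (1 − Σ p_i²) and π as the mean number of pairwise differences per site. It is an illustration only and does not reproduce the Arlequin implementation used in this study; the toy sequences and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(seqs):
    """h = n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))


def nucleotide_diversity(seqs):
    """pi = mean number of pairwise differences per site (aligned, equal length)."""
    n = len(seqs)
    length = len(seqs[0])
    total = sum(
        sum(a != b for a, b in zip(s, t)) for s, t in combinations(seqs, 2)
    )
    n_pairs = n * (n - 1) / 2
    return total / n_pairs / length


# Hypothetical toy alignment of four individuals.
toy = ["ACGT", "ACGT", "ACGA", "TCGA"]
h = haplotype_diversity(toy)
pi = nucleotide_diversity(toy)

# Hypothetical two-region example: concatenation can split a shared haplotype.
hvr = ["AC", "AC", "AG"]
cytb = ["T", "C", "T"]
concatenated = [a + b for a, b in zip(hvr, cytb)]
n_hap_hvr = len(set(hvr))
n_hap_concat = len(set(concatenated))

print(round(h, 4), round(pi, 4), n_hap_hvr, n_hap_concat)
```

In the two-region example, the first two individuals share an HVR haplotype but differ at the second region, so the number of haplotypes increases after concatenation.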

To detect the incidence of homoplasy, haplotype linkages observed between different regions were compared in light of the evolutionary pathways of the mtDNA regions. According to the variability indices and haplotype networks, we assumed that Cyt b was the most evolutionarily conserved among the three regions (see also the “Results” section). Based on the evolutionary relevance of the different mtDNA regions, haplotype linkages between HVR or ND2 and Cyt b were examined to discriminate between IBD and IBS (homoplasy) following assumptions according to Phillips et al. (2009). Briefly, if a group of samples shares an identical HVR or ND2 haplotype but has multiple linkages to Cyt b haplotypes, homoplasy possibly occurs in the HVR/ND2, considering the differences in nucleotide substitution rates among the three regions.

  1. The haplotype network traces its ancestry to a single ancestor possessing a linkage of plesiomorphic HVR, ND2, and Cyt b haplotypes.

  2. New linkages appear as a result of new mutations at each locus.

  3. Because of the hypervariable nature of the control region, homoplastic mutations are much more likely in the HVR, followed by ND2, and less likely in Cyt b.

  4. Plesiomorphic (ancestral) haplotypes are common and exhibit a greater number of haplotype linkages than derived or tip haplotypes.

  5. Two or more HVR/ND2 haplotypes can have no more than one Cyt b linkage in common, except by homoplasy.

  6. Linkages of a given HVR/ND2 haplotype to multiple Cyt b haplotypes involve homoplasy if the HVR/ND2 haplotype is not linked to all Cyt b haplotypes in the series of mutational steps at Cyt b (except when involving an inferred Cyt b haplotype).

The demographic histories of the P. yokohamae population were inferred using the network analysis described above and a Bayesian skyline plot (BSP, Drummond et al. 2005). Since substitution saturation potentially affects phylogenetic inferences (Strimmer et al. 2009; Xia and Lemey 2009) and demographic inferences, the saturation of substitution was evaluated by plotting the number of transitions and transversions versus the F84 distance using DAMBE7 (Xia 2018). Haplotype networks provide a good overview of population demography in addition to the genealogical relationships of haplotypes. A star-like haplotype network indicates a recent population expansion, whereas a scattered network with loop structures indicates a constant population size over time (Koizumi and Ikeda 2013). Skyline-plot methods are based on coalescent theory and reconstruct fluctuations in effective population size (Ne) (female effective population size Nef in the case of mtDNA) over time from the information contained in haplotype genealogies of DNA sequences (Ho and Shapiro 2011). BSPs were depicted using the sequence data from each region as well as the concatenated data. For the concatenated data, the sequences were partitioned by region, assigning priors and parameters individually. The best nucleotide substitution models estimated in MEGA7 were HKY for the HVR and TN93 for both ND2 and Cyt b regions. The BSP analyses were performed under these substitution models, a strict clock model, and the coalescent Bayesian skyline model using BEAST v2.7.6 (Bouckaert et al. 2014). A mutation rate of 7.5%/million years (Myr), or 0.0375 substitutions per site per lineage per Myr, was applied to the HVR as the average of the 3–12%/Myr mutation rate range for this region in flatfish (Xiao et al. 2011). As the specific mutation rates of ND2 and Cyt b in flatfish were unavailable, a general divergence rate of 2%/Myr (i.e., 0.01 substitutions per site per lineage per Myr) in bony fish (Bermingham et al. 1997) was applied. The clock rate prior was set to a log-normal distribution with M = 3.75 × 10⁻⁸ and S = 0.35 for the HVR, and M = 1.0 × 10⁻⁸ and S = 0.25 for both ND2 and Cyt b. The Markov chain Monte Carlo analysis was run for 1.0 × 10⁸ generations for each region and 3.0 × 10⁸ generations for the concatenated data, sampling every 1000 generations and discarding the first 10% as burn-in. The convergence of sampling processes was monitored using Tracer v1.7.2 (Rambaut et al. 2018), ensuring that all effective sample size values were above 200. The BSP was then generated using Tracer. To calculate the effective female population size, a generation time of 3 years was assumed according to the age at which more than 50% of the adults are sexually mature (Takahashi et al. 1983).
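The rate conversions in this paragraph can be checked with a short calculation. The sketch below is illustrative only: it assumes that a divergence rate in %/Myr is halved to obtain a per-lineage rate, and that the skyline population-size axis is in units of Ne multiplied by generation time in years (a common convention when the clock is calibrated in years, but an assumption here). The function names and the example value of 3.0 × 10⁵ are hypothetical.

```python
def divergence_to_lineage_rate(percent_per_myr):
    """Convert a divergence rate (%/Myr) to substitutions/site/lineage/year."""
    per_site_per_myr = percent_per_myr / 100.0    # percent -> proportion
    per_lineage_per_myr = per_site_per_myr / 2.0  # divergence involves two lineages
    return per_lineage_per_myr / 1e6              # per Myr -> per year


def scaled_size_to_nef(ne_tau, generation_time_years=3.0):
    """Convert a skyline estimate in units of Ne * generation time to Nef.

    Assumption: the plotted quantity is Ne multiplied by generation time (years).
    """
    return ne_tau / generation_time_years


hvr_rate = divergence_to_lineage_rate(7.5)     # HVR, 7.5%/Myr
coding_rate = divergence_to_lineage_rate(2.0)  # ND2 and Cyt b, 2%/Myr

# Hypothetical skyline value, for illustration only.
nef_example = scaled_size_to_nef(3.0e5)

print(hvr_rate, coding_rate, nef_example)
```

Under these assumptions, 7.5%/Myr and 2%/Myr correspond to 3.75 × 10⁻⁸ and 1.0 × 10⁻⁸ substitutions per site per lineage per year, matching the means of the clock rate priors given above.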

Results

One individual belonged to the P. schrenki clade in the NJ tree; therefore, the remaining 119 samples were used for subsequent analyses. Table 1 presents the mtDNA diversity of P. yokohamae in each region and the concatenated regions. As documented in previous studies, high haplotype diversity (h) and nucleotide diversity (π) were observed in the HVR in this study. Comparing the variability indices among the three mtDNA regions, the genetic diversity of the HVR was the highest with h = 0.960 and π = 0.0176, followed by ND2 (h = 0.922 and π = 0.0027) and Cyt b (h = 0.873 and π = 0.0031). When the sequences were concatenated, the number of haplotypes and the haplotype diversity increased, reaching a maximum of 49 haplotypes when all the sequence data were concatenated. Haplotyping accuracy was greatly improved by concatenating sequences, which increased the number of haplotypes by seven (HVR + ND2), seven (HVR + Cyt b), and ten (all regions combined) compared with that of the HVR alone.

Table 1 Summary of mitochondrial DNA diversity of each region and concatenated regions of the Pseudopleuronectes yokohamae population

Substitution sites were widely distributed in the three regions; however, sites with high site-specific substitution rates were concentrated in the HVR (Fig. 1). Specifically, many of the highest site-specific substitution rate category sites (i.e., sites #9, #176, #304, and #372 in the HVR; #306 and #424 in ND2; #471 and #639 in Cyt b) occurred in the HVR, but some others were also observed in the protein-coding regions.

Fig. 1

Histogram of relative site-specific substitution rate across the three mitochondrial DNA regions, a the first half of the control region (HVR, 378 bp), b NADH-dehydrogenase 2 gene (1045 bp), and c cytochrome b gene (1141 bp), estimated using 119 Pseudopleuronectes yokohamae samples. The lengths of the bars reflect the relative substitution rate at each site, where the average substitution rate is normalized to 1 using the concatenated sequence data

The haplotype networks of the HVR and ND2 presented a large number of reticulations, whereas that of Cyt b exhibited a relatively simple shape without a loop structure (Fig. 2). Considering that reticulate networks are characteristic of recurrent substitutions (including homoplasy) and that genetic diversity was relatively low in Cyt b (Table 1), Cyt b was taken to represent the most conserved and plausible phylogeny among the three regions. Therefore, the haplotype linkages of HVR–Cyt b and ND2–Cyt b were investigated to distinguish IBD from IBS (homoplasy) under assumptions 1–6 described in the “Materials and methods” section. A total of six HVR haplotypes had multiple linkages to Cyt b haplotypes (i.e., samples that had an identical HVR haplotype carried more than one Cyt b haplotype), and the linked Cyt b haplotypes differed by 1–5 bp. In ND2, a total of four haplotypes had multiple linkages to Cyt b haplotypes that differed by 1–5 bp. These HVR/ND2 haplotypes, highlighted with specific colors in Fig. 2c, d, were analyzed in conjunction with the observed evolutionary pathways of the Cyt b haplotypes. Inferences regarding specific haplotypes are outlined below and summarized in Table 2 (see also Online Resource, Table S1 for the relationships between sample IDs and haplotype IDs in each region).

Fig. 2

Haplotype networks of a the first half of the control region (HVR), b NADH-dehydrogenase 2 (ND2), and cytochrome b (Cyt b) gene, depicted using 119 Pseudopleuronectes yokohamae samples. Haplotype linkages of c HVR–Cyt b and d ND2–Cyt b are illustrated within the Cyt b haplotype network. For each linkage, bold white letters indicate multiple haplotype linkages (i.e., individuals that have an identical HVR or ND2 haplotype carry more than one Cyt b haplotype), where each color represents a specific haplotype linkage. The size of the circles corresponds to the number of samples. The small filled circles represent missing inferred haplotypes. See also Online Resource, Table S1 for the relationships between sample IDs and haplotype IDs

Table 2 Summary of the observed linkages between the control region (HVR) or NADH-dehydrogenase 2 (ND2) gene haplotypes and multiple cytochrome b (Cyt b) gene haplotypes in the Pseudopleuronectes yokohamae population

(1) Haplotype cr-6

The haplotype was linked to two Cyt b haplotypes (cb-24 and cb-25) that differed by a single transitional substitution at a site with a low site-specific substitution rate. The terminal Cyt b haplotype (cb-25) was found in only one individual. In this case, the haplotype linkages can be explained by a single mutational step in Cyt b at a site where substitutions rarely occur; thus, they were probably the result of a mutation at Cyt b.

(2) Haplotypes cr-8 and nd-8

These haplotypes had two linkages with Cyt b haplotypes (cb-6 and cb-15) that differed by a single transitional substitution at a site with a low site-specific substitution rate. The common Cyt b haplotype cb-6 appeared to be an ancestral haplotype, whereas cb-15 was a relatively minor terminal haplotype. Individuals having the cb-15 haplotype carried multiple HVR and ND2 haplotypes; thus, it is unclear whether this is the result of retention of a particular plesiomorphic HVR/ND2 haplotype or homoplasy. However, both cr-8 and nd-8 shared the same Cyt b haplotype linkages; thus, the linkages may be the result of a single mutational event at Cyt b.

(3) Haplotypes cr-9 and nd-11

The haplotypes cr-9 and nd-11 were linked to two (cb-1 and cb-20) and three Cyt b haplotypes (cb-5, cb-12, and cb-19), respectively. All these haplotypes differed by transitional substitutions at sites with low site-specific substitution rates. For neither the HVR nor the ND2 haplotype did the linked Cyt b haplotypes form a series of mutational steps at Cyt b (i.e., no individuals had the cb-6 haplotype). Therefore, these two HVR/ND2 haplotypes are probably the consequence of homoplastic mutations at HVR/ND2. The numbers of mutational events necessary to explain the HVR–Cyt b and ND2–Cyt b linkages were two and three, respectively.

(4) Haplotype nd-12

The haplotype was linked to two Cyt b haplotypes (cb-6 and cb-23) that differed by four transitional substitutions. In addition, individuals having these major Cyt b haplotypes also carried multiple ND2 haplotypes. The linked Cyt b haplotypes varied at sites including those in the highest site-specific substitution rate category, such as #471 and #639, which indicates that substitutions are likely to occur at such Cyt b sites. However, it is plausible that homoplasy occurred at ND2 because four mutational steps are needed to resolve the ND2–Cyt b linkage.

(5) Haplotypes cr-11 and nd-3

The cr-11 and nd-3 haplotypes were linked to three (cb-6, cb-19, and cb-23) and five Cyt b haplotypes (cb-2, cb-4, cb-6, cb-10, and cb-14), respectively, and are possibly the results of complex evolutionary histories. The three Cyt b haplotypes linked to the cr-11 haplotype differed by a maximum of five transitional substitutions at sites including those in the highest site-specific substitution rate category. The linkage between cb-6 and cb-23 was probably the consequence of homoplasy at the HVR, as described in case (4). On the other hand, cb-19 was a minor tip haplotype found in only one individual; thus, it could have been produced by a single transitional mutation from the cb-6 haplotype. For the nd-3 haplotype, five mutational events were required to fully resolve the total of five ND2–Cyt b linkages, and most of the linkages involved the core haplotype cb-6. The haplotypes cb-4 and cb-10 were minor terminal haplotypes that each differed from the cb-6 haplotype by a single transition at a site with a low site-specific substitution rate, suggesting that each of these haplotypes was probably the result of a single mutational event from the cb-6 haplotype. The cb-14 haplotype was also linked to the hub haplotype cb-6 by a single transitional substitution at a site with a low site-specific substitution rate; however, it contained individuals having multiple ND2 haplotypes. Therefore, it is difficult to distinguish whether this is the result of a mutation at Cyt b or homoplasy at ND2. Conversely, the cb-2 haplotype lacked consecutive Cyt b linkages that can be explained by a series of mutational steps at Cyt b. Additionally, the cb-2 and cb-6 haplotypes differed at two sites, including a transversional substitution, which is less likely to occur than a transitional substitution. Therefore, homoplasy at ND2 is probably involved in the evolutionary pathway of the cb-2 haplotype.

(6) Haplotypes cr-20 and cr-25

These haplotypes each had two linkages with Cyt b haplotypes (cb-6 and cb-14 for cr-20, and cb-6 and cb-12 for cr-25) that differed by a single transitional substitution at a site with a low site-specific substitution rate. Both haplotypes were linked with the core haplotype cb-6, whereas the other linked Cyt b haplotypes were relatively minor terminal haplotypes shared by individuals having multiple HVR haplotypes. Either a single mutational event at Cyt b or homoplasy at the HVR could reasonably explain these evolutionary pathways, and it is difficult to determine which occurred.

Because homoplasy was suggested in the HVR and ND2, substitution saturation was assessed for these regions. No saturation was observed in either region because transitions outnumbered transversions regardless of pairwise genetic distances (Online Resource, Fig. S1).
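The logic of this check, counting transitions and transversions between pairs of sequences, can be sketched as follows. This is a simplified illustration with hypothetical sequences and a hypothetical function name; it counts raw differences only and does not compute the F84 distance or reproduce the DAMBE7 analysis used in this study.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_ts_tv(seq1, seq2):
    """Count transitions and transversions between two aligned sequences.

    Sites with identical bases or with non-ACGT characters are skipped.
    """
    ts = tv = 0
    for a, b in zip(seq1, seq2):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            ts += 1  # purine <-> purine or pyrimidine <-> pyrimidine
        else:
            tv += 1  # purine <-> pyrimidine
    return ts, tv


# Hypothetical pair of aligned sequences.
ts_count, tv_count = count_ts_tv("ACGTAC", "GTGTCC")
print(ts_count, tv_count)
```

In a saturation plot, such counts are computed for every pair of sequences and plotted against a corrected genetic distance; transversions approaching or exceeding transitions at larger distances would suggest saturation.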

The haplotype networks represented contrasting population demographics (Fig. 2); the HVR showed a diffuse pattern with loop structures, suggesting a constant population size over time, while the star-like networks of both ND2 and Cyt b implied a recent population expansion. The BSPs showed similar demography across the three regions: a constant population size followed by a recent decline from approximately 10–20 kiloyears ago (kya) (Fig. 3a–c). Additionally, the estimated median Nef from the time series was nearly identical. The BSPs from the HVR and Cyt b traced a longer history (approximately 135 ky) than that from ND2 (only approximately 80 ky). The concatenated sequence data traced the longest demographic history, to approximately 165 kya, and showed a population expansion at approximately 125 kya that was not detected using any of the single regions alone (Fig. 3d). After the population growth, the BSP exhibited a demographic transition similar to those from each region: a constant population size and a recent decline from approximately 10 kya.

Fig. 3

Demographic history of Pseudopleuronectes yokohamae population inferred by Bayesian skyline plot (BSP) using a the first half of the control region (HVR), b NADH-dehydrogenase 2, c cytochrome b gene, and d the concatenated sequences. The solid line in the BSP shows median estimates for female effective population size (Nef) over time (kiloyears ago) with 95% central posterior density intervals (blue colored area). Note that Nef is plotted on a log scale. The black and gray bars above the x-axis represent glacial and interglacial periods (Machida et al. 2003), respectively

Discussion

To evaluate the possibility of homoplasy in the HVR and haplotyping accuracy in P. yokohamae, we investigated the evolutionary relationships of multiple mtDNA regions. Compared with ND2 and Cyt b regions, genetic variability in the noncoding HVR was particularly high, with more sites in the highest site-specific substitution rate categories (Table 1, Fig. 1). In contrast, Cyt b was the most conserved among the three regions, showing the lowest haplotype diversity (Table 1), a relatively low substitution rate (Fig. 1), and a simple haplotype network (Fig. 2). On the basis of the haplotype network of Cyt b, multiple non-IBD substitution patterns were observed in the HVR and ND2, which was also supported by the reticulate structure of the haplotype networks (Fig. 2). Considering these results, homoplastic mutations are highly likely to occur in the P. yokohamae population. A remarkable finding of this study was that homoplasy could be detected not only in the HVR but also in the ND2 gene, which is a protein-coding region with a relatively lower substitution rate than that of the HVR. Homoplasy in the HVR has been reported in a wide variety of marine organisms, such as the Steller sea lion Eumetopias jubatus (Phillips et al. 2009), shark species (Tavares et al. 2013), and Japanese flounder Paralichthys olivaceus (Ando et al. 2016); however, few cases of homoplasy at ND2 gene have been documented (Podlesnykh et al. 2020).

Considering our results and those of previous case studies, homoplasy in the HVR may occur in other fish species owing to the highly variable nature of this region. Indeed, homoplasy in the HVR has been implicated in some flatfish species, such as the willowy flounder Tanakius kitaharai (Xiao et al. 2008), blackfin flounder Glyptocephalus stelleri (Xiao et al. 2010), and point-head flounder Cleisthenes pinetorum (Xiao et al. 2011). Although these studies did not assess the evolutionary history of the HVR, the genetic diversity of these flatfishes (i.e., h = 0.93–0.99, π = 0.009–0.017) was similar to or even higher than that of P. yokohamae. In such species or populations having numerous substitution sites, some identical haplotypes may indeed have resulted from IBS owing to recurrent substitutions. Thus, it is not surprising that homoplasy also occurs in these fish. Furthermore, high genetic variability in the HVR has been reported in many fish species across wide taxa: h = 0.776–0.981, π = 0.0178–0.0298 in European plaice Pleuronectes platessa (Hoarau et al. 2004); h = 0.968–1.000, π = 0.008–0.014 in red porgy Pagrus pagrus (Ball et al. 2007); h = 0.913–0.993, π = 0.0073–0.0099 in yellow drum Nibea albiflora (Han et al. 2008). It should be noted that homoplastic substitution was suggested in shark species even though their genetic diversity was much lower, with h = 0.540 and π = 0.00216 in Carcharhinus limbatus, h = 0.880 and π = 0.00436 in C. porosus, and h = 0.879 and π = 0.00405 in Rhizoprionodon porosus (Tavares et al. 2013). The level of genetic variability indeed varies among species; however, the control region, particularly the HVR, is widely recognized as a hypervariable region (Lee et al. 1995). Considering that the control region has a universal structure across a diverse group of fish species, as evidenced by Satoh et al. (2016) in their study of 250 species, it is natural to consider that homoplasy occurs at the HVR in other fish species. Therefore, it is worth investigating the evolutionary relationships of the HVR with other mtDNA regions to recognize the potential risk caused by homoplasy.

To estimate genetic diversity (i.e., h and π) and population genetic statistics (i.e., FST) more accurately, mtDNA sequences from different regions should be concatenated. This study revealed that the haplotyping accuracy was greatly improved by concatenating the HVR and coding region(s) (Table 1). A previous study also reported that haplotyping accuracy was improved by concatenating the HVR and ND2 in P. olivaceus (Ando et al. 2016). Some genetic diversity and population genetic analyses of P. yokohamae have been conducted using the HVR alone (Fujii 2010; Ikeda et al. 2010; Lee et al. 2012; Tsukagoshi et al. 2015); thus, these analyses may be biased owing to false haplotyping resulting from homoplasy. We were unable to assess the potential bias of homoplasy in population genetic statistics because only one sample population was used in this study; however, previous studies have reported overestimation or underestimation of FST owing to false haplotyping resulting from homoplasy (Bradman et al. 2011; Ando et al. 2016; Verma et al. 2016). Therefore, multiple mtDNA regions, ideally the whole mitochondrial genome, should be analyzed to obtain robust estimates and reduce the risk of false haplotyping.

Considering the non-IBD substitution patterns (Table 2) and the high site-specific substitution rate in the HVR (Fig. 1), homoplasy may have occurred at such mutation hotspots. Nevertheless, it is difficult to conclude that these cases resulted from homoplasy because recurrent mutation events could not be directly observed. This study showed that the Cyt b region was evolutionarily the most conserved among the three regions but still possessed sufficient variability for genetic markers. Additionally, some substitution sites in the Cyt b region exhibited high site-specific substitution rates; thus, we cannot rule out the possibility that homoplasy may occur, even in the Cyt b region. To comprehensively understand the evolutionary history of Cyt b, further analysis with evolutionarily more conserved mtDNA region(s) is required. Using whole mitochondrial genome data obtained by next-generation sequencing technology may provide a realistic choice, which may also contribute to minimizing biases caused by homoplasy.

Although homoplastic mutations were indicated in the HVR and ND2, no substitution saturation that would bias demographic inferences appeared to have occurred in the P. yokohamae population. If sequences were fully saturated with substitutions, coalescent-based analyses such as BSP would be severely biased (Xia and Lemey 2009); however, the HVR and ND2 sequences had not reached substitution saturation (Online Resource, Fig. S1). The demographic histories estimated by BSPs were highly consistent among the three regions (Fig. 3), indicating that homoplasy in the HVR and ND2 in the P. yokohamae population would not greatly affect demographic inferences based on BSP analysis. The substitution models assumed in the BSP analysis account for multiple substitutions at the same site, which may minimize the effect of homoplasy.
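To illustrate how a substitution model can account for multiple substitutions, the following Python sketch compares the observed proportion of differing sites (p-distance) with the Jukes-Cantor (JC69) corrected distance, d = −(3/4) ln(1 − 4p/3). JC69 is used here only because it is the simplest such model; it is an assumption for illustration and is not necessarily the model applied in the BSP analysis. The function names are hypothetical.

```python
import math


def p_distance(seq_a, seq_b):
    """Observed proportion of differing sites between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def jc69_distance(p):
    """Jukes-Cantor corrected distance for an observed p-distance.

    The correction grows as p increases and is undefined at p >= 0.75,
    the value expected for fully saturated (effectively random) sequences.
    """
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)


# The corrected distance exceeds the observed one, increasingly so near saturation.
for p in (0.01, 0.10, 0.30, 0.60):
    print(f"p = {p:.2f}  JC69 d = {jc69_distance(p):.4f}")
```

When p is small, the corrected and observed distances are nearly equal, which is consistent with the expectation that unsaturated sequences are only mildly affected by hidden recurrent substitutions.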

The observed BSPs also imply that concatenated sequence data potentially improve the resolution of demographic inference, particularly for relatively ancient periods. The BSP with concatenated data revealed a longer population history than any of the individual regions (Fig. 3). It also showed a signal of past population expansion that none of the single regions detected. This is probably due to the increase in the number of coalescent events when using concatenated sequences. A simulation study showed that the use of longer sequences indeed contributes to improving accuracy and reducing estimation error (Heled and Drummond 2008). The BSP suggests that P. yokohamae experienced a population expansion approximately 125 kya, which corresponds to the last interglacial period, followed by a recent population decline after the Last Glacial Maximum. Paleogeographic changes in marine environments were remarkable owing to Pleistocene glacial–interglacial cycles (Chappell 1994; Lambeck et al. 2002), which impacted the demographic history of a wide variety of marine species (Matsui 2022). Some coastal marine fish species in the Northwestern Pacific also experienced population expansion during the same interglacial period (Kokita and Nohara 2011; Ni et al. 2014). Previous population genetic studies of P. yokohamae either did not conduct demographic analysis or used only the HVR; thus, by using multiple mtDNA regions, we may have reconstructed a more plausible population history of this species, one that is consistent with past geographical events.

The population demography inferred from the haplotype networks differed among regions; ND2 and Cyt b showed past population expansion, whereas the HVR indicated a constant population size over time (Fig. 2). The BSP using concatenated sequences exhibited both past population expansion in the relatively distant past and a constant population size, followed by population decline in recent years (Fig. 3d). Considering the differences in substitution rates between the noncoding HVR and the coding regions ND2 and Cyt b, each haplotype network may reflect the population history at a different time scale. Specifically, recent population demography may be reflected by the faster-evolving HVR, while relatively ancient histories are represented by coding regions with a lower substitution rate. This suggests that, by using multiple mtDNA regions with different substitution rates, it is possible to continuously capture population dynamics from the relatively distant past to recent years. Meanwhile, estimating population demography from a haplotype network based solely on the HVR carries a risk of misinterpreting past population history, particularly when genetic diversity is high, because the rapidly evolving HVR may mask the signature of past demography. Indeed, contrasting haplotype network patterns between the noncoding and coding regions are frequently observed in species with high genetic diversity in the HVR: h = 0.963, π = 0.0179 in yellowtail snapper Ocyurus chrysurus (da Silva et al. 2015); h = 0.9955–0.9997, π = 0.0343–0.0782 in endemic Hawaiian damselfishes (Tenggardjaja et al. 2016); h = 0.995, π = 0.0245 in European eel Anguilla anguilla (Ragauskas et al. 2017); h = 0.997, π = 0.0227 in longtail tuna Thunnus tonggol (Kasim et al. 2020); h = 0.941, π = 0.0083 in catfish species Pangasius krempfi (Duong et al. 2023).
In contrast, species whose genetic diversity is moderate to relatively low tend to show fairly consistent haplotype networks across mtDNA regions: h = 0.5626, π = 0.0022 in lane snapper Lutjanus synagris (Silva et al. 2018); h = 0.875, π = 0.00860 in silver carp Hypophthalmichthys molitrix (Chen et al. 2019); h = 0.928, π = 0.00578 in naked carp Gymnocypris przewalskii (Fang et al. 2022); h = 0.164, π = 0.006 in ten spotted live-bearer fish Cnesterodon decemmaculatus (Rautenberg et al. 2022); h = 0.968, π = 0.0087 in Chinese gizzard shad Clupanodon thrissa (Zhang et al. 2023).

In summary, the HVR and the ND2 gene in P. yokohamae were highly suggestive of homoplasy, based on non-IBD substitution patterns, high genetic variability and substitution rates, and reticulate haplotype network structures. Given the hypervariable nature of the control region in fish, our results imply that homoplasy has occurred in other fish species, including flatfish. Our study showed that the HVR still has great potential for population genetic studies; however, researchers should be aware of the risks and biases that homoplasy may cause owing to the elevated substitution rate of this region. We also showed that homoplasy may not always affect demographic inferences, but that longer sequences potentially increase the resolution of population demographic history. Hence, considerable caution is required when using mtDNA regions for population genetic studies, and the parallel use of other sequence regions is highly recommended.