Abstract
The hypervariable region (HVR) in the control region of the mitochondrial DNA has frequently been used for population genetics and phylogeographic studies because of its highly variable nature. Although the HVR is beneficial for evaluating recent evolutionary history, including population demography, recent studies have implied the incidence of homoplasy in this region. To assess the accuracy of relying solely on the HVR for population genetics studies, molecular evolutionary analyses of the HVR, NADH-dehydrogenase subunit 2 (ND2), and cytochrome b genes were performed using 120 individuals of marbled flounder Pseudopleuronectes yokohamae. The HVR exhibited the highest genetic variability among the three regions, with sites showing high site-specific substitution rates. Considering the reticulate haplotype network structure and evolutionary linkages between regions, homoplastic mutations were indicated in the HVR in addition to ND2, leading to underestimation of genetic diversity. We found that homoplasy was less likely to affect coalescent-based demographic inferences in the population; however, there is still a potential risk of misinterpretation of population demography when solely using the HVR owing to its hypervariable nature. Collectively, we suggest analyzing other regions in addition to the HVR in fish population genetic research to improve accuracy and eliminate biases caused by homoplasy.
Introduction
For the proper conservation and management of natural populations, it is essential to examine genetic population structures in addition to the genetic characteristics of the population, such as genetic diversity and population demography (Coates et al. 2018). A growing number of studies have suggested that genetic diversity is an important resource for adapting to environmental changes (Barrett and Schluter 2008; Bernatchez 2016; Bitter et al. 2019). Therefore, appropriate assessment of genetic diversity is fundamental for establishing management and conservation units, evaluating the genetic health of current populations, and assessing the adaptive potential of populations. Additionally, examining population dynamics in relation to past geographic history provides not only insights into the underlying evolutionary processes of current population genetic structures but also valuable information for appropriate fisheries management and conservation measures of populations (Benestan 2020).
Analysis of mitochondrial DNA (mtDNA) is beneficial for addressing evolutionary, phylogenetic, and phylogeographic questions owing to its unique characteristics compared with nuclear DNA, including its haploid nature, maternal inheritance, and higher substitution rates (Avise 2000). In population genetics and phylogeographic studies including fish species, the mtDNA control region and the cytochrome b gene (Cyt b) are frequently used as genetic markers (Avise 2000). The control region of the mitochondrial genome is a noncoding region that comprises a series of variable regions and conserved sequence blocks (Satoh et al. 2016). The variable region, also called the “hypervariable region” (HVR), harbors dozens of times more substitutions than other mtDNA coding regions (Aquadro and Greenberg 1983; Stewart and Baker 1994; McMillan and Palumbi 1997), thereby making this region the most suitable for the evaluation of recent evolutionary history, including population demographics.
Despite its advantages for population genetics studies, the HVR tends to accumulate multiple substitutions, even at the same sites, within a relatively short time frame (Heyer et al. 2001), often resulting in homoplasy. When analyzing sequence data, sequences are generally interpreted under the assumption of identity by descent (IBD): multiple individuals that share the same nucleotide sequence are assumed to have inherited it from a common ancestor without recombination. However, recurrent substitutions (e.g., C → G → C) can result in homoplasy: multiple individuals share apparently “IBD” sequences, but these are simply identical nucleotide sequences (identity by state, IBS) that arose through different evolutionary processes (or different substitution steps). Such convergent molecular evolution leads to a misunderstanding of the substitution history of sequences (Excoffier et al. 1992; Phillips et al. 2009); that is, IBS sequences caused by homoplasy are unintentionally treated as IBD sequences (false haplotyping).
Distinguishing IBD from IBS sequences (hereafter referred to as haplotyping accuracy) in the haplotyping process is essential for accurately evaluating genetic diversity. Particularly in populations with high haplotype or nucleotide diversity, homoplasy is likely to affect polymorphism evaluation and potentially leads to underestimation of genetic diversity such as haplotype diversity (h) (Nei 1987) and nucleotide diversity (π) (Nei and Li 1979). Studies have shown that such false haplotyping could also introduce biases in population genetic diversity statistics, including FST (Bradman et al. 2011; Ando et al. 2016; Verma et al. 2016); however, little is known about the effects of homoplasy on population demographics. An empirical study of shark species implicated homoplasy in the HVR (Tavares et al. 2013) even though the genetic diversity of these species is not high among marine species, which highlights the need for close attention to homoplasy in the HVR when studying marine fish. To avoid false haplotyping resulting from homoplasy, it is imperative to analyze the concatenated sequences with other regions that exhibit lower substitution rates than the HVR (Ando et al. 2016).
The marbled flounder Pseudopleuronectes yokohamae is distributed across the Japanese Archipelago, Korean Peninsula, and China, from the Gulf of Pohai southward of the Yellow Sea to the northern part of the East China Sea (Nakabo 2002; Amaoka 2016; Tanda et al. 2008). This species is an important target of coastal fisheries. The biological characteristics of this species are unique among flatfish species; it spawns adhesive eggs and has limited adult migration (Minami 2008; Amaoka 2016). Owing to the limited dispersal potential, even geographically close populations are expected to be structured, and this indeed has been shown in previous studies (Park et al. 1990; Fujii 2010; Tsukagoshi et al. 2015; Sato et al. 2018). Several studies have analyzed the population genetic structure of P. yokohamae using the HVR, or the first half of the control region, and found substantial genetic differences among sampling populations (Fujii 2010; Ikeda et al. 2010; Lee et al. 2012; Tsukagoshi et al. 2015). These studies also revealed that P. yokohamae exhibits considerably high genetic diversity; therefore, it is predicted that the HVR in P. yokohamae might be saturated with substitutions owing to its highly variable nature.
In the present study, we investigated the possibility of homoplasy at the HVR and evaluated haplotyping accuracy in P. yokohamae using common mtDNA markers: the first half of the control region (HVR), NADH-dehydrogenase subunit 2 (ND2), and Cyt b genes. Specifically, considering the differences in substitution rates among regions, we examined the haplotype linkages of these regions, patterns in haplotype networks, and site-specific substitution rates to make inferences regarding evolutionary history. The haplotyping accuracy was evaluated by comparing polymorphisms in each region with concatenated sequences. Additionally, we investigated the demographic history of P. yokohamae and discussed the potential pitfalls of inferring population history using the HVR owing to its highly variable nature. On the basis of the findings, we identify the challenges of conducting genetic diversity assessments and evolutionary demographic analyses only using the HVR and provide solutions to the problems.
Materials and methods
Samples and mtDNA sequencing
A total of 120 individuals of P. yokohamae (standard length 18–223 mm) were collected from Otsuchi Bay in Iwate Prefecture, Japan, between 2012 and 2013. Whole specimens were preserved in 99.5% ethanol for DNA extraction. Genomic DNA was extracted from the fin clips using the QuickGene DNA tissue kit S and Mini80 (KURABO, Osaka, Japan), following the manufacturer’s instructions. The target fragments of the mtDNA markers, the first half of the control region (i.e., HVR, 378 bp), ND2 (1045 bp), and Cyt b (1141 bp), were amplified via polymerase chain reaction (PCR) using the following custom primers: HVR (5′-TCT TAC CCC TAA CTC CCA AAG C-3′ and 5′-GAA GTA GGA ACC AAA TGC CA-3′), ND2 (5′-AGA TCA AAA CTC TTC GTG CT-3′ and 5′-CAT GCA GAA GAT GTG GGA TA-3′), and Cyt b (5′-ACC CCA ACA CAA GAG AAA AT-3′ and 5′-AGA GCA TGC ATT ACA AGA CA-3′). For amplification, 1 μL of the template DNA, 0.125 μL of Blend Taq -Plus- (TOYOBO, Osaka, Japan), 1.25 μL of buffer for Blend Taq -Plus- (10 ×) (TOYOBO), 1.25 μL of 2 mM dNTP, 0.25 μL of each primer (25 μM), and 8.88 μL of sterile deionized H2O were added to each microtube to adjust the final volume to 13 μL. The PCR amplifications were carried out in a Veriti™ thermal cycler (Applied Biosystems, Waltham, MA, USA) under the following PCR cycling conditions: initial denaturation at 94 °C for 2 min, followed by 35 cycles of denaturation at 94 °C for 30 s, annealing at 50 °C (for the HVR primers) and 54 °C (for the ND2 and Cyt b primers) for 30 s, extension at 72 °C for 1 min, and final extension at 72 °C for 5 min. The PCR products were purified using the AMPure XP kit (Beckman Coulter, Brea, CA, USA) according to the manufacturer’s instructions. PCR products were bidirectionally sequenced using the BigDye® Terminator v3.1 cycle sequencing kit (Applied Biosystems) following the manufacturer’s instructions, using the same primers as used for amplification. 
Reactions were purified using the ethanol/EDTA/sodium acetate method described in the above-mentioned manual and then dissolved in 10 μL of Hi-Di formamide (Applied Biosystems). Samples were sequenced using a 3500XL genetic analyzer capillary sequencer (Applied Biosystems-HITACHI) at Tohoku University, Japan. All sequences obtained in this study were deposited in the DDBJ/EMBL/GenBank databases under accession numbers LC754366–LC754404 and LC754730–LC754784.
Data analyses
All sequences were edited manually using FinchTV 1.4.0 (Geospiza, Inc.; Seattle, WA, USA; http://www.geospiza.com) and aligned using ClustalW (Thompson et al. 1994) implemented in Molecular Evolutionary Genetics Analysis version 7.0 (MEGA7) for bigger datasets (Kumar et al. 2016). Using the HVR data, Tsukagoshi et al. (2015) suggested that P. yokohamae and the related species cresthead flounder P. schrenki hybridized primarily in the northern part of the Japanese Archipelago; they found that some P. yokohamae samples were placed in the P. schrenki clade in the neighbor-joining (NJ) tree (Saitou and Nei 1987), and vice versa. Thus, by combining data from Tsukagoshi et al. (2015) deposited in GenBank (acc. nos. AB979354.1–AB979435.1), an NJ tree was constructed using MEGA7 to exclude samples with haplotypes belonging to the P. schrenki clade.
To evaluate DNA haplotyping accuracy, the following variability indices were estimated for each region and for the concatenated sequences using Arlequin version 3.5.1 (Excoffier and Lischer 2010): number of polymorphic sites and haplotypes, haplotype diversity (h), and nucleotide diversity (π). All concatenated sequences were used to estimate site-specific substitution rates using Parat 0.9.1 (Meyer and Von Haeseler 2003) with default parameters to assess substitution rate variation among the sites. Haplotype networks were constructed based on each HVR, ND2, and Cyt b sequence data via the statistical parsimony method using TCS v1.21 (Clement et al. 2000) with default parameters in support of homoplasy detection; the number of reticulations in the haplotype network correspondingly increases with recurrent substitutions (Bandelt et al. 1995).
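The two diversity indices named above can be illustrated with a minimal Python sketch. It assumes the textbook definitions (Nei's haplotype diversity and the mean proportion of pairwise differences); it is not a reproduction of Arlequin's implementation, which offers additional distance corrections, and the toy alignment is made up for illustration.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(seqs):
    """Nei's haplotype diversity: h = n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(seqs)
    counts = Counter(seqs)
    sum_p2 = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_p2)


def nucleotide_diversity(seqs):
    """Mean proportion of differing sites over all pairs of aligned sequences."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * length)


# Toy alignment (hypothetical data): four sequences, three distinct haplotypes.
toy = ["ACGTACGT", "ACGTACGT", "ACGTACGA", "ACTTACGA"]
h = haplotype_diversity(toy)
pi = nucleotide_diversity(toy)
print(f"h = {h:.3f}, pi = {pi:.4f}")
```

Because identical sequences are pooled into one haplotype before h is computed, any homoplastic sequences that are identical by state are counted as a single haplotype, which is how false haplotyping lowers the estimate.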
To detect the incidence of homoplasy, haplotype linkages observed between different regions were compared in light of the evolutionary pathways of the mtDNA regions. According to the variability indices and haplotype networks, we assumed that Cyt b was the most evolutionarily conserved among the three regions (see also the “Results” section). Based on the evolutionary relevance of the different mtDNA regions, haplotype linkages between HVR or ND2 and Cyt b were examined to discriminate between IBD and IBS (homoplasy) following assumptions according to Phillips et al. (2009). Briefly, if a group of samples shares an identical HVR or ND2 haplotype but has multiple linkages to Cyt b haplotypes, homoplasy possibly occurs in the HVR/ND2, considering the differences in nucleotide substitution rates among the three regions.
1. The haplotype network traces its ancestry to a single ancestor possessing a linkage of plesiomorphic HVR, ND2, and Cyt b haplotypes.
2. New linkages appear as a result of new mutations at each locus.
3. Because of the hypervariable nature of the control region, homoplastic mutations are much more likely in the HVR, followed by ND2, and less likely in Cyt b.
4. Plesiomorphic (ancestral) haplotypes are common and exhibit a greater number of haplotype linkages than derived or tip haplotypes.
5. Two or more HVR/ND2 haplotypes can have no more than one Cyt b linkage in common, except by homoplasy.
6. Linkages of a given HVR/ND2 haplotype to multiple Cyt b haplotypes involve homoplasy if the HVR/ND2 haplotype is not linked to all Cyt b haplotypes in the series of mutational steps at Cyt b (except when involving an inferred Cyt b haplotype).
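The linkage screening described above can be sketched in Python as follows. This is a simplified reading of assumption 6 only: it treats a set of linked Cyt b haplotypes as consistent with mutation at Cyt b when they are joined to one another by single steps in the Cyt b network, and it ignores the exception for inferred (unsampled) haplotypes. All haplotype labels, the toy network, and the function names are hypothetical.

```python
from collections import defaultdict


def linked_cytb(samples):
    """Map each HVR/ND2 haplotype to the set of Cyt b haplotypes it co-occurs with."""
    links = defaultdict(set)
    for fast_hap, cytb_hap in samples:
        links[fast_hap].add(cytb_hap)
    return dict(links)


def forms_connected_series(haps, network):
    """True if the haplotypes are joined by single steps that pass only through themselves."""
    haps = set(haps)
    start = next(iter(haps))
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in network.get(node, set()) & haps:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen == haps


def flag_candidates(samples, network):
    """Label every HVR/ND2 haplotype that is linked to more than one Cyt b haplotype."""
    result = {}
    for hap, cytb in linked_cytb(samples).items():
        if len(cytb) > 1:
            ok = forms_connected_series(cytb, network)
            result[hap] = "Cyt b mutation plausible" if ok else "homoplasy candidate"
    return result


# Hypothetical Cyt b network: cb-A and cb-C are each one step from cb-B.
network = {"cb-A": {"cb-B"}, "cb-B": {"cb-A", "cb-C"}, "cb-C": {"cb-B"}}
# Hypothetical samples as (HVR haplotype, Cyt b haplotype) pairs.
samples = [("hv-1", "cb-A"), ("hv-1", "cb-B"),
           ("hv-2", "cb-A"), ("hv-2", "cb-C"),
           ("hv-3", "cb-B")]
flags = flag_candidates(samples, network)
print(flags)
```

In this toy case hv-2 is flagged because it is linked to cb-A and cb-C but not to the intermediate cb-B, whereas hv-1 spans two adjacent Cyt b haplotypes and is left as a plausible Cyt b mutation.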
The demographic histories of the P. yokohamae population were inferred using the network analysis described above and a Bayesian skyline plot (BSP, Drummond et al. 2005). Since substitution saturation potentially affects phylogenetic inferences (Strimmer et al. 2009; Xia and Lemey 2009) and demographic inferences, the saturation of substitution was evaluated by plotting the number of transitions and transversions versus the F84 distance using DAMBE7 (Xia 2018). Haplotype networks provide a good overview of population demography in addition to the genealogical relationships of haplotypes. A star-like haplotype network indicates a recent population expansion, whereas a haplotype-scattered shape with a loop structure indicates a constant population size over time (Koizumi and Ikeda 2013). Skyline-plot methods are based on the coalescent theory to reconstruct fluctuations in effective population size (Ne) (female effective population size Nef in the case of mtDNA) over time with information contained in haplotype genealogies of DNA sequences (Ho and Shapiro 2011). BSPs were depicted using the sequence data from each region as well as the concatenated ones. For the concatenated data, the sequences were partitioned by region, assigning priors and parameters individually. The best nucleotide substitution models estimated in MEGA7 were HKY for the HVR and TN93 for both ND2 and Cyt b regions. The BSP analyses were performed under these substitution models, a strict clock model, and the coalescent Bayesian skyline model using BEAST v2.7.6 (Bouckaert et al. 2014). A mutation rate of 7.5%/million years (Myr), or 0.0375 substitutions per site per lineage per Myr, was applied to the HVR based on the average of the 3–12%/Myr mutation rates of this region in flatfish (Xiao et al. 2011). As the specific mutation rates of ND2 and Cyt b in flatfish were unavailable, a general 2%/Myr divergence rate (i.e., 0.01 substitutions per site per lineage per Myr) in bony fish (Bermingham et al. 1997) was applied. The clock rate prior was set to a log-normal with M = 3.75 × 10⁻⁸ and S = 0.35 for the HVR, and M = 1.0 × 10⁻⁸ and S = 0.25 for both ND2 and Cyt b. The Markov chain Monte Carlo analysis was run for 1.0 × 10⁸ generations for each region and 3.0 × 10⁸ generations for the concatenated data, sampling every 1000 generations, and discarding the first 10% as burn-in. The convergence of sampling processes was monitored using Tracer v1.7.2 (Rambaut et al. 2018), ensuring that all effective sample size values were above 200. The BSP was then generated using Tracer. To calculate the effective female population size, a generation time of 3 years was assumed according to the age at which more than 50% of the adults are sexually mature (Takahashi et al. 1983).
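The rate and chain-length arithmetic in this paragraph can be checked with a short Python sketch. The function names are hypothetical, and the last helper assumes that the skyline population parameter is expressed as Ne multiplied by generation time (a common convention when clock rates are given per year); the paper does not state this scaling explicitly.

```python
def per_lineage_rate_per_year(divergence_pct_per_myr):
    """Convert a pairwise divergence rate (%/Myr) to substitutions/site/lineage/year."""
    pairwise_per_myr = divergence_pct_per_myr / 100.0
    lineage_per_myr = pairwise_per_myr / 2.0  # divergence accrues along two lineages
    return lineage_per_myr / 1e6


def retained_samples(chain_length, sample_every, burnin_percent):
    """Logged states kept after burn-in (ignores the initial state at step 0)."""
    total = chain_length // sample_every
    return total - total * burnin_percent // 100


def female_effective_size(ne_tau, generation_time_years):
    """Assumed scaling: skyline value = Nef * generation time (in years)."""
    return ne_tau / generation_time_years


hvr_rate = per_lineage_rate_per_year(7.5)     # prior mean M used for the HVR
coding_rate = per_lineage_rate_per_year(2.0)  # prior mean M used for ND2 and Cyt b
kept_single = retained_samples(100_000_000, 1000, 10)
kept_concat = retained_samples(300_000_000, 1000, 10)
example_nef = female_effective_size(3.0e6, 3)  # 3.0e6 is a made-up skyline value
print(hvr_rate, coding_rate, kept_single, kept_concat, example_nef)
```

Under these assumptions the conversions recover the prior means quoted in the text (3.75 × 10⁻⁸ and 1.0 × 10⁻⁸ per site per year), and each single-region run retains roughly 90,000 post-burn-in samples.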
Results
One individual belonged to the P. schrenki clade in the NJ tree; therefore, the remaining 119 samples were used for subsequent analyses. Table 1 presents the mtDNA diversity of P. yokohamae in each region and the concatenated regions. As documented in previous studies, high haplotype diversity (h) and nucleotide diversity (π) were observed in the HVR in this study. Comparing the variability indices among the three mtDNA regions, the genetic diversity of the HVR was the highest with h = 0.960 and π = 0.0176, followed by ND2 (h = 0.922 and π = 0.0027), and Cyt b (h = 0.873 and π = 0.0031). When the sequences were concatenated, the number of haplotypes and the haplotype diversity increased. Both were maximized when all the sequence data were concatenated, yielding 49 haplotypes. Haplotyping accuracy was greatly improved by concatenating sequences: the number of haplotypes increased by seven (HVR + ND2), seven (HVR + Cyt b), and ten (all regions combined) compared with that of the HVR alone.
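Why concatenation raises the haplotype count can be shown with a minimal Python sketch on made-up data: two individuals that are indistinguishable at one region are separated once a second region is appended. The sequences and variable names are hypothetical and much shorter than the real fragments.

```python
def count_haplotypes(seqs):
    """Number of distinct sequences in a sample."""
    return len(set(seqs))


# Hypothetical data for four individuals (same order in both lists).
hvr = ["AAT", "AAT", "AGT", "AGT"]
cytb = ["CC", "CT", "CC", "CC"]

# Concatenate the two regions individual by individual.
concatenated = [a + b for a, b in zip(hvr, cytb)]

n_hvr = count_haplotypes(hvr)
n_cytb = count_haplotypes(cytb)
n_concat = count_haplotypes(concatenated)
print(n_hvr, n_cytb, n_concat)
```

Here the first two individuals share an HVR sequence but differ at the second region, so the concatenated data resolve three haplotypes where either region alone resolves two.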
Substitution sites were widely distributed in the three regions; however, sites with high site-specific substitution rates were concentrated in the HVR (Fig. 1). Specifically, many of the highest site-specific substitution rate category sites (i.e., sites #9, #176, #304, and #372 in the HVR; #306 and #424 in ND2; #471 and #639 in Cyt b) occurred in the HVR, but some others were also observed in the protein-coding regions.
The haplotype networks of the HVR and ND2 presented a large number of reticulations, whereas that of Cyt b exhibited a relatively simple shape without a loop structure (Fig. 2). Considering that reticulate networks are characterized by recurrent substitutions (including homoplasy) and that genetic diversity in Cyt b was relatively low (Table 1), Cyt b represents the most conserved and plausible phylogeny among the three regions. Therefore, the haplotype linkages of HVR–Cyt b and ND2–Cyt b were investigated to distinguish IBD from IBS (homoplasy) under the assumptions 1–6 described in the “Materials and methods” section. A total of six HVR haplotypes had multiple linkages to Cyt b haplotypes (i.e., samples with an identical HVR haplotype carried more than one Cyt b haplotype), with the linked Cyt b haplotypes differing by 1 to 5 bp. In ND2, a total of four haplotypes had such multiple linkages, with differences of 1–5 bp in Cyt b. Such HVR/ND2 haplotypes, highlighted with specific colors in Fig. 2c, d, were analyzed in conjunction with the observed evolutionary pathways of their Cyt b haplotypes. Inferences regarding specific haplotypes are outlined below and summarized in Table 2 (see also Online Resource, Table S1 for the relationships between sample IDs and haplotype IDs in each region).
(1) Haplotype cr-6
The haplotype was linked to two Cyt b haplotypes (cb-24 and cb-25) that differed by a single transitional substitution at a site with a low site-specific substitution rate. The terminal Cyt b haplotype (cb-25) consisted of only one individual. In this case, the haplotype linkages can be explained by a single mutational step of Cyt b at a site where substitution rarely occurs; thus, it was probably the result of mutation at Cyt b.
(2) Haplotypes cr-8 and nd-8
These haplotypes had two linkages with Cyt b haplotypes (cb-6 and cb-15) that differed by a single transitional substitution at a site with a low site-specific substitution rate. The common Cyt b haplotype cb-6 appeared to be an ancestral haplotype, while cb-15 was a relatively minor terminal haplotype. Individuals having the cb-15 haplotype carried multiple HVR and ND2 haplotypes; thus, it is unclear whether this is the result of retention of a particular plesiomorphic HVR/ND2 haplotype or homoplasy. However, both cr-8 and nd-8 haplotypes shared the same Cyt b haplotype linkages; thus, the linkages may be a result of a single mutational event at Cyt b.
(3) Haplotypes cr-9 and nd-11
The haplotypes cr-9 and nd-11 were linked to two (cb-1 and cb-20) and three Cyt b haplotypes (cb-5, cb-12, and cb-19), respectively. All these haplotypes differed by transitional substitutions at sites with low site-specific substitution rates. For neither the HVR nor the ND2 haplotype did the linked Cyt b haplotypes form a series of mutational steps at Cyt b (i.e., no individuals had the cb-6 haplotype). Therefore, these two HVR/ND2 haplotypes are probably the consequence of homoplastic mutations at HVR/ND2. The numbers of mutational events necessary to explain the HVR–Cyt b and ND2–Cyt b linkages were two and three, respectively.
(4) Haplotype nd-12
The haplotype was linked to two Cyt b haplotypes (cb-6 and cb-23) that differed by four transitional substitutions. In addition, individuals having these major Cyt b haplotypes also carried multiple ND2 haplotypes. The linked Cyt b haplotypes varied at sites including those in the highest site-specific substitution rate category, such as #471 and #639, which indicates that substitutions are likely to occur at such Cyt b sites. However, it would be plausible to consider that homoplasy occurred at ND2, as four mutational steps were needed to resolve the ND2–Cyt b linkage.
(5) Haplotypes cr-11 and nd-3
The cr-11 and nd-3 haplotypes were linked to three (cb-6, cb-19, and cb-23) and five Cyt b haplotypes (cb-2, cb-4, cb-6, cb-10, and cb-14), respectively, possibly as a result of complex evolutionary histories. The three Cyt b haplotypes linked to the cr-11 haplotype differed by a maximum of five transitional substitutions, at sites including those in the highest site-specific substitution rate category. The linkage between cb-6 and cb-23 was probably the consequence of homoplasy at the HVR, as described in case (4). On the other hand, cb-19 was a minor tip haplotype that consisted of only one individual; thus, it could have been produced by a single transitional mutation from the cb-6 haplotype. For the nd-3 haplotype, five mutational events were required to fully resolve the total of five ND2–Cyt b linkages, and most of the linkages involved the core haplotype cb-6. The haplotypes cb-4 and cb-10 were terminal and minor haplotypes that each differed from the cb-6 haplotype by a single transition at a site with a low site-specific substitution rate, suggesting that these haplotypes were probably the result of a single mutational event from the cb-6 haplotype. The cb-14 haplotype was also linked to the hub haplotype cb-6 by a single transitional substitution at a site with a low site-specific substitution rate; however, it contained individuals having multiple ND2 haplotypes. Therefore, it is difficult to distinguish whether this is the result of mutation of Cyt b or homoplasy at ND2. Conversely, the cb-2 haplotype lacked consecutive Cyt b linkages that can be explained by a series of mutational steps at Cyt b. Additionally, the cb-2 and cb-6 haplotypes differed at two sites, including a transversional substitution, which is less likely than a transitional substitution. Therefore, homoplasy at ND2 is probably involved in the evolutionary pathway of the cb-2 haplotype.
(6) Haplotypes cr-20 and cr-25
These haplotypes each had two linkages with Cyt b haplotypes (cb-6 and cb-14 for cr-20, and cb-6 and cb-12 for cr-25) that differed by a single transitional substitution at a site with a low site-specific substitution rate. Both haplotypes were linked with the core haplotype cb-6, whereas the other Cyt b haplotypes were relatively minor terminal haplotypes carried by individuals with multiple HVR haplotypes. Either a single mutational event at Cyt b or homoplasy at the HVR could reasonably explain these evolutionary pathways, so it is difficult to reach a conclusion.
Because homoplasy was suggested in the HVR and ND2, substitution saturation was assessed for these regions. No saturation was observed in either region because transitions outnumbered transversions regardless of pairwise genetic distances (Online Resource, Fig. S1).
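The quantities behind this saturation check can be sketched in Python. The example only counts transitions and transversions for each pair of aligned sequences; it does not compute the F84 distance used on the x-axis of the DAMBE plot, and the toy alignment and function names are hypothetical.

```python
from itertools import combinations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_ts_tv(seq_a, seq_b):
    """Count transitions and transversions between two aligned sequences."""
    ts = tv = 0
    for a, b in zip(seq_a, seq_b):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue  # skip identical sites, gaps, and ambiguous bases
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            ts += 1  # purine <-> purine or pyrimidine <-> pyrimidine
        else:
            tv += 1  # purine <-> pyrimidine
    return ts, tv


def pairwise_ts_tv(seqs):
    """(transitions, transversions) for every pair of sequences."""
    return [count_ts_tv(s, t) for s, t in combinations(seqs, 2)]


# Toy alignment (hypothetical data).
toy = ["ACGTAC", "ATGTAC", "ACGTGA"]
points = pairwise_ts_tv(toy)
print(points)
```

Plotting such counts against a model-based distance is one way to look for saturation: when multiple hits accumulate, observed transitions tend to level off and can be overtaken by transversions at larger distances.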
The haplotype networks represented contrasting population demographics (Fig. 2); the HVR showed a diffuse pattern with loop structures, suggesting a constant population size over time, while the star-like networks of both ND2 and Cyt b implied a recent population expansion. The BSPs showed similar demography across the three regions: a constant population size followed by a recent decline from approximately 10–20 kiloyears ago (kya) (Fig. 3a–c). Additionally, the estimated median Nef from the time series was nearly identical. The BSPs from the HVR and Cyt b traced a longer history (approximately 135 ky) than that from ND2 (approximately 80 ky). The concatenated sequence data traced the longest demographic history, to approximately 165 ky, showing a population expansion at approximately 125 kya, which was not detected using any single region alone (Fig. 3d). After the population growth, the BSP exhibited a demographic transition similar to those from each region: a constant population size and a recent decline from approximately 10 kya.
Discussion
To evaluate the possibility of homoplasy in the HVR and haplotyping accuracy in P. yokohamae, we investigated the evolutionary relationships of multiple mtDNA regions. Compared with ND2 and Cyt b regions, genetic variability in the noncoding HVR was particularly high, with more sites in the highest site-specific substitution rate categories (Table 1, Fig. 1). In contrast, Cyt b was the most conserved among the three regions, showing the lowest haplotype diversity (Table 1), a relatively low substitution rate (Fig. 1), and a simple haplotype network (Fig. 2). On the basis of the haplotype network of Cyt b, multiple non-IBD substitution patterns were observed in the HVR and ND2, which was also supported by the reticulate structure of the haplotype networks (Fig. 2). Considering these results, homoplastic mutations are highly likely to occur in the P. yokohamae population. A remarkable finding of this study was that homoplasy could be detected not only in the HVR but also in the ND2 gene, which is a protein-coding region with a relatively lower substitution rate than that of the HVR. Homoplasy in the HVR has been reported in a wide variety of marine organisms, such as the Steller sea lion Eumetopias jubatus (Phillips et al. 2009), shark species (Tavares et al. 2013), and Japanese flounder Paralichthys olivaceus (Ando et al. 2016); however, few cases of homoplasy at the ND2 gene have been documented (Podlesnykh et al. 2020).
Considering our results and those of previous case studies, homoplasy in the HVR may occur in other fish species owing to the highly variable nature of this region. Indeed, homoplasy in the HVR has been implicated in some flatfish species, such as the willowy flounder Tanakius kitaharai (Xiao et al. 2008), blackfin flounder Glyptocephalus stelleri (Xiao et al. 2010), and point-head flounder Cleisthenes pinetorum (Xiao et al. 2011). Although these studies did not assess the evolutionary history of the HVR, the genetic diversity of these flatfishes (i.e., h = 0.93–0.99, π = 0.009–0.017) was similar to or even higher than that of P. yokohamae. In such species or population(s) having numerous substitution sites, some identical haplotypes may indeed have resulted from IBS owing to the recurrent substitutions. Thus, it is not surprising that homoplasy also occurs in these fish. Furthermore, high genetic variability in the HVR has been reported in many fish species across wide taxa: h = 0.776–0.981, π = 0.0178–0.0298 in European plaice Pleuronectes platessa (Hoarau et al. 2004); h = 0.968–1.000, π = 0.008–0.014 in red porgy Pagrus pagrus (Ball et al. 2007); h = 0.913–0.993, π = 0.0073–0.0099 in yellow drum Nibea albiflora (Han et al. 2008). It should be noted that homoplastic substitution was suggested in shark species even though their genetic diversity was much lower, with h = 0.540 and π = 0.00216 in Carcharhinus limbatus, h = 0.880 and π = 0.00436 in C. porosus, and h = 0.879 and π = 0.00405 in Rhizoprionodon porosus (Tavares et al. 2013). The level of genetic variability indeed varies among species; however, the control region, particularly the HVR, is widely recognized as a hypervariable region (Lee et al. 1995). Considering that the control region has a universal structure across a diverse group of fish species, as evidenced by Satoh et al. (2016) in their study of 250 species, it is natural to consider that homoplasy occurs at the HVR in other fish species. Therefore, it is worth investigating the evolutionary relationships of the HVR with other mtDNA regions to recognize the potential risk caused by homoplasy.
To more accurately estimate genetic diversity (i.e., h and π) and population genetic diversity statistics (i.e., FST), mtDNA sequences across different regions should be concatenated. This study revealed that the haplotyping accuracy was greatly improved by concatenating the HVR and coding region(s) (Table 1). A previous study also reported that haplotyping accuracy was improved by concatenating the HVR and ND2 in P. olivaceus (Ando et al. 2016). Some genetic diversity and population genetic analyses of P. yokohamae have been implemented using the HVR alone (Fujii 2010; Ikeda et al. 2010; Lee et al. 2012; Tsukagoshi et al. 2015); thus, these analyses may be biased owing to false haplotyping resulting from homoplasy. We were unable to assess the potential bias of homoplasy in population genetic statistics because only one sample population was used in this study; however, previous studies have reported overestimation or underestimation of FST owing to false haplotyping resulting from homoplasy (Bradman et al. 2011; Ando et al. 2016; Verma et al. 2016). Therefore, multiple mtDNA regions, ideally the whole mitochondrial genome, should be analyzed to obtain robust estimates and reduce the risk of false haplotyping.
Considering the non-IBD substitution patterns (Table 2) and the high site-specific substitution rate in the HVR (Fig. 1), homoplasy may have occurred at such mutation hotspots. Nevertheless, it is difficult to conclude that these cases resulted from homoplasy because recurrent mutation events could not be directly observed. This study showed that the Cyt b region was evolutionarily the most conserved among the three regions but still possessed sufficient variability for genetic markers. Additionally, some substitution sites in the Cyt b region exhibited high site-specific substitution rates; thus, we cannot rule out the possibility that homoplasy may occur, even in the Cyt b region. To comprehensively understand the evolutionary history of Cyt b, further analysis with evolutionarily more conserved mtDNA region(s) is required. Using whole mitochondrial genome data obtained by next-generation sequencing technology may provide a realistic choice, which may also contribute to minimizing biases caused by homoplasy.
Although homoplastic mutations were indicated in the HVR/ND2, no substitution saturation that would bias demographic inferences in the P. yokohamae population seemed to have occurred. If sequences fully experience substitution saturation, coalescent-based phylogenetic analysis such as BSP would be severely biased (Xia and Lemey 2009); however, sequences did not attain substitution saturation at the HVR and ND2 (Online Resource, Fig. S1). The demographic histories estimated by BSPs were highly consistent among the three regions (Fig. 3), indicating that homoplasy in the HVR and ND2 in the P. yokohamae population would not greatly affect population demography inferences using BSP analysis. The substitution models assumed in the BSP analysis account for multiple substitutions, which possibly minimizes the effect of homoplasy.
The observed BSPs also imply that concatenated sequence data potentially improve the resolution of demographic inference, particularly for relatively ancient periods. The BSP with concatenated data revealed a longer population history than any of the individual regions (Fig. 3). It also showed a signal of past population expansion that none of the single regions detected. This is probably due to the increase in the number of coalescent events when using concatenated sequences. A simulation study showed that the use of longer sequences indeed contributes to improving accuracy and reducing estimation error (Heled and Drummond 2008). The BSP suggests that P. yokohamae experienced a population expansion at approximately 125 kya, which corresponds to the last interglacial period, followed by a recent population decline after the Last Glacial Maximum. Paleogeographic changes in marine environments were remarkable owing to Pleistocene glacial-interglacial cycles (Chappell 1994; Lambeck et al. 2002), which impacted the demographic history of a wide variety of marine species (Matsui 2022). Some coastal marine fish species in the Northwestern Pacific also experienced population expansion during the same interglacial period that we observed (Kokita and Nohara 2011; Ni et al. 2014). Previous population genetic studies of P. yokohamae either did not conduct demographic analysis or used only the HVR; thus, by using multiple mtDNA regions, we may have reconstructed a more plausible population history of this species, one that is consistent with past geographical events.
The population demography inferred from the haplotype networks differed among regions: ND2 and Cyt b showed past population expansion, whereas the HVR indicated a constant population size over time (Fig. 2). The BSP using concatenated sequences exhibited both past population expansion in relatively ancient years and a constant population size, followed by population decline in recent years (Fig. 3d). Considering the differences in substitution rates between the noncoding HVR and the coding regions ND2 and Cyt b, each haplotype network may reflect the population history at a different time scale. Specifically, recent population demography may be reflected by the faster-evolving HVR, while relatively ancient histories are represented by coding regions with lower substitution rates. This suggests that, by using multiple mtDNA regions with different substitution rates, it is possible to continuously capture population dynamics from the relatively distant past to recent years. Meanwhile, estimating population demography from a haplotype network based solely on the HVR poses a risk of misinterpreting past population history, particularly when genetic diversity is high, because the rapidly evolving HVR may mask the signature of past demography. Indeed, contrasting haplotype network patterns between the noncoding and coding regions are frequently observed in species with high genetic diversity in the HVR: h = 0.963, π = 0.0179 in yellowtail snapper Ocyurus chrysurus (da Silva et al. 2015); h = 0.9955–0.9997, π = 0.0343–0.0782 in endemic Hawaiian damselfishes (Tenggardjaja et al. 2016); h = 0.995, π = 0.0245 in European eel Anguilla anguilla (Ragauskas et al. 2017); h = 0.997, π = 0.0227 in longtail tuna Thunnus tonggol (Kasim et al. 2020); h = 0.941, π = 0.0083 in catfish species Pangasius krempfi (Duong et al. 2023).
In contrast, species whose genetic diversity is moderate to relatively low tend to show fairly consistent haplotype networks across mtDNA regions: h = 0.5626, π = 0.0022 in lane snapper Lutjanus synagris (Silva et al. 2018); h = 0.875, π = 0.00860 in silver carp Hypophthalmichthys molitrix (Chen et al. 2019); h = 0.928, π = 0.00578 in naked carp Gymnocypris przewalskii (Fang et al. 2022); h = 0.164, π = 0.006 in ten spotted live-bearer fish Cnesterodon decemmaculatus (Rautenberg et al. 2022); h = 0.968, π = 0.0087 in Chinese gizzard shad Clupanodon thrissa (Zhang et al. 2023).
In summary, the HVR and ND2 gene in P. yokohamae were highly suggestive of homoplasy, owing to non-IBD substitution patterns, high genetic variability and substitution rates, and reticulate haplotype network structures. Given the hypervariable nature of the control region in fish, our results imply that homoplasy may also have occurred in other fish species, including flatfish. Our study showed that the HVR still has great potential for population genetic studies; however, users should acknowledge the risks and biases that homoplasy may cause, owing to the elevated substitution rate of the HVR. We also showed that homoplasy may not always affect population demographic inferences, but longer sequences potentially increase the resolution of population demographic history. Hence, considerable caution is required when using mtDNA regions for population genetics studies, and the use of other sequence regions in parallel is highly recommended.
References
Amaoka K (2016) Flatfishes of Japan (Citharidae, Paralichthyidae, Bothidae, Pleuronectidae, Poecilopsettidae, Samaridae). Tokai University Press, Kanagawa (in Japanese)
Ando D, Ikeda M, Sekino M, Sugaya T, Katamachi D, Yoseda K, Kijima A (2016) Improvement of mitochondrial DNA haplotyping in Japanese flounder populations using the sequences of control region and ND2 gene. Nippon Suisan Gakkaishi 82:712–719 (in Japanese with English abstract)
Aquadro CF, Greenberg BD (1983) Human mitochondrial DNA variation and evolution: analysis of nucleotide sequences from seven individuals. Genetics 103:287–312
Avise JC (2000) Phylogeography: the history and formation of species. Harvard University Press, Cambridge
Ball AO, Beal MG, Chapman RW, Sedberry GR (2007) Population structure of red porgy, Pagrus pagrus, in the Atlantic Ocean. Mar Biol 150:1321–1332
Bandelt HJ, Forster P, Sykes BC, Richards MB (1995) Mitochondrial portraits of human populations using median networks. Genetics 141:743–753
Barrett RDH, Schluter D (2008) Adaptation from standing genetic variation. Trends Ecol Evol 23:38–44
Benestan L (2020) Population genomics applied to fishery management and conservation. In: Oleksiak MF, Rajora OP (eds) Population genomics: marine organisms. Springer Nature Switzerland AG, Cham, pp 399–421
Bermingham E, McCafferty S, Martin A (1997) Fish biogeography and molecular clocks: perspectives from the Panamanian Isthmus. In: Kocher T, Stepien C (eds) Molecular systematics of fishes. Academic Press, London, pp 113–128
Bernatchez L (2016) On the maintenance of genetic variation and adaptation to environmental change: considerations from population genomics in fishes. J Fish Biol 89:2519–2556
Bitter MC, Kapsenberg L, Gattuso JP, Pfister CA (2019) Standing genetic variation fuels rapid adaptation to ocean acidification. Nat Commun 10:5821
Bouckaert R, Heled J, Kühnert D, Vaughan T, Wu CH, Xie D, Suchard MA, Rambaut A, Drummond AJ (2014) BEAST 2: a software platform for Bayesian evolutionary analysis. PLOS Comput Biol 10:e1003537
Bradman H, Grewe P, Appleton B (2011) Direct comparison of mitochondrial markers for the analysis of swordfish population structure. Fish Res 109:95–99
Chappell J (1994) Upper Quaternary sea levels, coral terraces, oxygen isotopes and deep-sea temperatures. J Geogr (Chigaku Zasshi) 103:828–840
Chen H, Wang D, Guo J, Duan X, Liu S, Chen D, Li Y (2019) Monitoring the genetic effects of broodstock enhancement of silver carp (Hypophthalmichthys molitrix) in middle Yangtze River based on cytb gene and D-loop sequences. J Freshw Ecol 34:323–332
Clement M, Posada D, Crandall KA (2000) TCS: a computer program to estimate gene genealogies. Mol Ecol 9:1657–1659
Coates DJ, Byrne M, Moritz C (2018) Genetic diversity and conservation units: dealing with the species-population continuum in the age of genomics. Front Ecol Evol 6:1–13
da Silva R, Veneza I, Sampaio I, Araripe J, Schneider H, Gomes G (2015) High levels of genetic connectivity among populations of yellowtail snapper, Ocyurus chrysurus (Lutjanidae-Perciformes), in the western South Atlantic revealed through multilocus analysis. PLoS ONE 10:e0122173
Drummond AJ, Rambaut A, Shapiro B, Pybus OG (2005) Bayesian coalescent inference of past population dynamics from molecular sequences. Mol Biol Evol 22:1185–1192
Duong TY, Nguyen NTT, Tran DD, Le TH, Nor SAM (2023) Multiple genetic lineages of anadromous migratory Mekong catfish Pangasius krempfi revealed by mtDNA control region and cytochrome b. Ecol Evol 13:e9845
Excoffier L, Lischer HE (2010) Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows. Mol Ecol Resour 10:564–567
Excoffier L, Smouse PE, Quattro JM (1992) Analysis of molecular variance inferred from metric distances among DNA haplotypes: application to human mitochondrial DNA restriction data. Genetics 131:479–491
Fang DA, Luo H, He M, Mao C, Kuang Z, Qi H, Xu D, Tan L, Li Y (2022) Genetic diversity and population differentiation of naked carp (Gymnocypris przewalskii) revealed by cytochrome oxidase subunit I and D-loop. Front Ecol Evol 10:1–11
Fujii T (2010) Genetic differentiation of marbled flounder (Pleuronectes yokohamae) around the Japanese Archipelago. Fish Oceanogr Tokyo Bay 1:8 (in Japanese)
Han ZQ, Gao TX, Yanagimoto T, Sakurai Y (2008) Genetic population structure of Nibea albiflora in Yellow Sea and East China Sea. Fish Sci 74:544–552
Heled J, Drummond AJ (2008) Bayesian inference of population size history from multiple loci. BMC Evol Biol 8:1–15
Heyer E, Zietkiewicz E, Rochowski A, Yotova V, Puymirat J, Labuda D (2001) Phylogenetic and familial estimates of mitochondrial substitution rates: study of control region mutations in deep-rooting pedigrees. Am J Hum Genet 69:1113–1126
Ho SY, Shapiro B (2011) Skyline-plot methods for estimating demographic history from nucleotide sequences. Mol Ecol Resour 11:423–434
Hoarau G, Piquet AM-T, van der Veer HW, Rijnsdorp AD, Stam WT, Olsen JL (2004) Population structure of plaice (Pleuronectes platessa L.) in northern Europe: a comparison of resolving power between microsatellites and mitochondrial DNA data. J Sea Res 51:183–190
Ikeda M, Endo W, Fujii T, Ishii M, Akabane S, Nakamura M, Issiki T, Katayama S (2010) Genetic characteristics and population formation processes of the marbled flounder (Pleuronectes yokohamae) in the Tokyo Bay and Ise-Mikawa Bay. Fish Oceanogr Tokyo Bay 1:7 (in Japanese)
Kasim NS, Jaafar TNAM, Piah RM, Arshaad WM, Nor SAM, Habib A, Ghaffar MA, Sung YY, Danish-Daniel M, Tan MP (2020) Recent population expansion of longtail tuna Thunnus tonggol (Bleeker, 1851) inferred from the mitochondrial DNA markers. PeerJ 8:1–23
Koizumi I, Ikeda H (2013) Population demographic history inferred from haplotype networks. In: Ikeda H, Koizumi I (eds) Phylogeography: the natural histories of species revealed by DNA. Bun-ichi Sogo Shuppan, Tokyo, pp 30–32 (in Japanese)
Kokita T, Nohara K (2011) Phylogeography and historical demography of the anadromous fish Leucopsarion petersii in relation to geological history and oceanography around the Japanese Archipelago. Mol Ecol 20:143–164
Kumar S, Stecher G, Tamura K (2016) MEGA7: molecular evolutionary genetics analysis version 7.0 for bigger datasets. Mol Biol Evol 33:1870–1874
Lambeck K, Esat TM, Potter EK (2002) Links between climate and sea levels for the past three million years. Nature 419:199–206
Lee WJ, Conroy J, Howell WH, Kocher TD (1995) Structure and evolution of teleost mitochondrial control regions. J Mol Evol 41:54–66
Lee SJ, Lee SG, Gwak WS (2012) Population genetic structure and genetic variability of the marbled sole Pleuronectes yokohamae on the coast of Gyeongsangnam-do, Korea. Anim Cells Syst 16:498–505
Machida H, Oba T, Ono A, Yamazaki H, Kawamura Y, Momohara A (2003) The quaternary. Asakura Publishing, Tokyo (in Japanese)
Matsui S (2022) Phylogeography of coastal fishes of Japan. In: Kai Y, Motomura H, Matsuura K (eds) Fish diversity of Japan: evolution, zoogeography, and conservation. Springer Nature Singapore, Singapore, pp 177–204
McMillan WO, Palumbi SR (1997) Rapid rate of control-region evolution in Pacific butterflyfishes (Chaetodontidae). J Mol Evol 45:473–484
Meyer S, Von Haeseler A (2003) Identifying site-specific substitution rates. Mol Biol Evol 20:182–189
Minami T (2008) Ecology of flatfishes. Review. In: National Association for the Promotion of Productive Seas (ed), Review in early-life ecology of major commercial fishes. National Association for the Promotion of Productive Seas, Tokyo, pp 263–297 (in Japanese)
Nakabo T (2002) Pleuronectidae. In: Nakabo T (ed) Fishes of Japan with pictorial keys to the species, English. Tokai University Press, Tokyo, pp 1371–1379
Nei M (1987) Molecular evolutionary genetics. Columbia University Press, New York
Nei M, Li WH (1979) Mathematical model for studying genetic variation in terms of restriction endonucleases. Proc Natl Acad Sci U S A 76:5269–5273
Ni G, Li Q, Kong L, Yu H (2014) Comparative phylogeography in marginal seas of the northwestern Pacific. Mol Ecol 23:534–548
Park JY, Kijima A, Fujio Y (1990) Genetic differentiation and variability between and within brown sole and marbled sole in the genus Limanda. Tohoku J Agric Res 41:9–23
Phillips CD, Trujillo RG, Gelatt TS, Smolen MJ, Matson CW, Honeycutt RL, Patton JC, Bickham JW (2009) Assessing substitution patterns, rates and homoplasy at HVRI of Steller sea lions, Eumetopias jubatus. Mol Ecol 18:3379–3393
Podlesnykh AV, Kukhlevsky AD, Brykov VA (2020) A comparative analysis of mitochondrial DNA genetic variation and demographic history in populations of even- and odd-year broodline pink salmon, Oncorhynchus gorbuscha (Walbaum, 1792), from Sakhalin Island. Environ Biol Fish 103:1553–1564
Ragauskas A, Butkauskas D, Bianchini ML (2017) Distinct matrilines in the panmictic population of the European eel Anguilla anguilla? Aquat Living Resour 30:21
Rambaut A, Drummond AJ, Xie D, Baele G, Suchard MA (2018) Posterior summarization in Bayesian phylogenetics using Tracer 1.7. Syst Biol 67:901–904
Rautenberg GE, Bonifacio AF, Chiappero MB, Amé MV, Hued AC (2022) Genetic structure of a native Neotropical fish species: new insights about a South American bioindicator. Arch Environ Contam Toxicol 83:168–179
Saitou N, Nei M (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 4:406–425
Sato M, Kitanishi S, Ishii M, Hamaguchi M, Kikuchi K, Hori M (2018) Genetic structure and demographic connectivity of marbled flounder (Pseudopleuronectes yokohamae) populations of Tokyo Bay. J Sea Res 142:79–90
Satoh TP, Miya M, Mabuchi K, Nishida M (2016) Structure and variation of the mitochondrial genome of fishes. BMC Genomics 17:719
Silva D, Martins K, Oliveira J, da Silva R, Sampaio I, Schneider H, Gomes G (2018) Genetic differentiation in populations of lane snapper (Lutjanus synagris – Lutjanidae) from western Atlantic as revealed by multilocus analysis. Fish Res 198:138–149
Stewart DT, Baker AJ (1994) Patterns of sequence variation in the mitochondrial D-loop region of shrews. Mol Biol Evol 11:9–21
Strimmer K, von Haeseler A, Salemi M (2009) Genetic distances and nucleotide substitution models. In: Lemey P, Salemi M, Vandamme A-M (eds) The phylogenetic handbook: a practical approach to phylogenetic analysis and hypothesis testing. Cambridge University Press, Cambridge, pp 111–141
Takahashi T, Saito S, Maeda T, Kimura H (1983) Annual life period of adult righteye flounders Limanda herzensteini and L. yokohamae. Nippon Suisan Gakkaishi 49:663–670 (in Japanese with English abstract)
Tanda M, Gorie S, Nakamura Y, Okamoto S (2008) Age and growth of marbled sole Pleuronectes yokohamae in Harima Nada and Osaka Bay, the Seto Inland Sea, Japan. Nippon Suisan Gakkaishi 74:1–7 (in Japanese with English abstract)
Tavares W, da Silva Rodrigues-Filho LF, Sodré D, Souza RFC, Schneider H, Sampaio I, Vallinoto M (2013) Multiple substitutions and reduced genetic variability in sharks. Biochem Syst Ecol 49:21–29
Tenggardjaja KA, Bowen BW, Bernardi G (2016) Reef fish dispersal in the Hawaiian Archipelago: comparative phylogeography of three endemic damselfishes. J Mar Biol 2016:1–17
Thompson JD, Higgins DG, Gibson TJ (1994) Clustal W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22:4673–4680
Tsukagoshi H, Takeda K, Kariya T, Ozaki T, Takatsu T, Abe S (2015) Genetic variation and population structure of marbled sole Pleuronectes yokohamae and cresthead flounder P. schrenki in Japan inferred from mitochondrial DNA analysis. Biochem Syst Ecol 58:274–280
Verma R, Singh M, Kumar S (2016) Unraveling the limits of mitochondrial control region to estimate the fine scale population genetic differentiation in anadromous fish Tenualosa ilisha. Scientifica 2016:2035240
Xia X (2018) DAMBE7: new and improved tools for data analysis in molecular biology and evolution. Mol Biol Evol 35:1550–1552
Xia X, Lemey P (2009) Assessing substitution saturation with DAMBE. In: Lemey P, Salemi M, Vandamme A-M (eds) The phylogenetic handbook: a practical approach to DNA and protein phylogeny, 2nd edn. Cambridge University Press, Cambridge, pp 615–630
Xiao Y, Takahashi M, Yanagimoto T, Zhang Y, Gao T, Yabe M, Sakurai Y (2008) Genetic variation and population structure of willowy flounder Tanakius kitaharai collected from Aomori, Ibaraki and Niigata in northern Japan. Afr J Biotechnol 7:3836–3844
Xiao Y, Gao T, Zhang Y, Yanagimoto T (2010) Demographic history and population structure of blackfin flounder (Glyptocephalus stelleri) in Japan revealed by mitochondrial control region sequences. Biochem Genet 48:402–417
Xiao Y, Zhang Y, Yanagimoto T, Li J, Xiao Z, Gao T, Xu S, Ma D (2011) Population genetic structure of the point-head flounder, Cleisthenes herzensteini, in the northwestern Pacific. Genetica 139:187–198
Zhang CP, Chen X, Yuan L, Wu Y, Ma Y, Jie W, Jiang Y, Guo J, Qiang L, Han C, Shu H (2023) Genetic diversity and population structure of Chinese gizzard shad Clupanodon thrissa in South China based on morphological and molecular markers. Glob Ecol Conserv 41:e02367
Acknowledgements
We are grateful to the late Professor Goto (Iwate University Sanriku Fisheries Research Center) for his invaluable contributions to the conduct of this study. We extend our deepest condolences and gratitude to him.
Funding
This work was partially supported by JSPS KAKENHI grant number JP19K06202.
Author information
Contributions
All authors contributed to the conception and design of this study. Genetic experiments, data analyses, and manuscript writing were performed by Yuki Yamamoto and Minoru Ikeda. Airi Takanashi and Yuji Yokosawa contributed to the sampling and study design. All the authors have read and approved the final version of the manuscript.
Ethics declarations
Conflicts of interest
The authors have no relevant financial or nonfinancial interests to disclose.
Consent to participate
Not applicable.
Consent to publish
Not applicable.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Yamamoto, Y., Takanashi, A., Yokosawa, Y. et al. Implication of homoplasy in hypervariable region (HVR) of mitochondrial DNA in a population of marbled flounder Pseudopleuronectes yokohamae: consideration for conducting population genetic analyses using the HVR. Fish Sci 90, 701–712 (2024). https://doi.org/10.1007/s12562-024-01792-z