Distinct genetic clustering in the weakly differentiated polar cod, Boreogadus saida Lepechin, 1774 from East Siberian Sea to Svalbard

The cold-adapted polar cod Boreogadus saida, a key species in Arctic ecosystems, is vulnerable to global warming and ice retreat. In this study, 1257 individuals sampled in 17 locations within the latitudinal range of 75–81°N from Svalbard to East Siberian Sea were genotyped with a dedicated suite of 116 single-nucleotide polymorphic loci (SNP). The overall pattern of isolation by distance (IBD) found was driven by the two easternmost samples (East Siberian Sea and Laptev Sea), whereas no differentiation was registered in the area between the Kara Sea and Svalbard. Eleven SNP under strong linkage disequilibrium, nine of which could be annotated to chromosome 2 in Atlantic cod, defined two genetic groups of distinct size, with the major cluster containing seven-fold larger number of individuals than the minor. No underlying geographic basis was evident, as both clusters were detected throughout all sampling sites in relatively similar proportions (i.e. individuals in the minor cluster ranging between 4 and 19% on the location basis). Similarly, females and males were also evenly distributed between clusters and age groups. A differentiation was, however, found regarding size at age: individuals belonging to the major cluster were significantly longer in the second year. This study contributes to increasing the population genetic knowledge of this species and suggests that an appropriate management should be ensured to safeguard its diversity.


Introduction
Polar cod, Boreogadus saida, Lepechin, 1774 is a small gadid fish whose ability to synthesize antifreeze glycoproteins (Chen et al. 1997) enable survival in the freezing environments of the Arctic Ocean and surroundings seas. In the Atlantic Ocean, it occurs as far south as the Gulf of St Lawrence (Canada), whereas in the Pacific Ocean, it is found as far south as the Sea of Okhotsk and Norton Sound (Western Alaska). This species performs a crucial role in the Arctic food web where it is estimated to funnel 75% of lower trophic energy to upper marine and near-shore predators including birds, seals, beluga whales and eventually polar bears and humans (Ponomarenko 1968;Bradstreet et al. 1986;Crawford and Jorgenson 1996;Hop and Gjøsaeter 2013). Polar cod is a relatively short-lived species (maximum 7-8 years), reaching maturity at the age of 2-3 years for males and females, respectively (Craig et al. 1982). This species mainly spawns once due to its high post-spawning 1 3 mortality (Altukhov 1979;Hop et al. 1995;Nahrgang et al. 2016).
B. saida occupies a wide range of habitats encompassing semi-enclosed brackish bays, under landfast and pack ice, as well as throughout the water column from near-shore shallow water to hundreds of kilometres off-shore (Drolet et al. 1991;David et al. 2016;Robertis et al. 2016;Thorsteinson and Love 2016). It is sometimes found in very dense shoals (Crawford and Jorgenson 1996;Melnikov and Chernova 2013) and known to undertake extensive migrations (Ponomarenko 1968;Hop and Gjøsaeter 2013). Polar cod aggregates in the areas of the Barents Sea where ice forms during late autumn. Hence, from September onwards, polar cod from the Kara Sea migrates to breed in the Barents Sea passing through all the straits of Novaya Zemlya and around its northern tip to the west (Manteifel 1943). It spawns at near-bottom depths in areas near the ice edge from November to February (Ponomarenko 1968(Ponomarenko , 2008 producing positively buoyant eggs (Spencer et al. 2020) that drift for two to four months before hatching . The exact spawning grounds seem to be difficult to identify due to the ice cover; however, the discontinuous distribution of age 0-group individuals reported in some years suggests two different spawning sites (Eriksen et al. 2015): a western component around Svalbard and an eastern component along the western coast of Novaya Zemlya and near the Timan coast in the south-eastern Barents Sea (Ponomarenko 2000), respectively. A recent study applying a coupled ocean-sea ice and particle tracking model in Svalbard waters reported that particles released in the western fjords were mostly retained in the fjords and suggested that spawning in Svalbard area probably occurs on the southern and eastern sides and later in the south-eastern Barents Sea . However, the revision of Russian historical data  combined with literature published in English reported particularly large abundance of polar cod in the eastern Barents Sea and adjacent waters and revealed crucial knowledge gaps including the uncertainty as to what degree the species is dependent on sea ice, its genetic structure or the distribution of the age 0-group in the NE Barents Sea (Aune 2021).
The first attempt of assessing genetic population structure in this species, using random amplified polymorphic DNA (RAPDs), concluded the existence of a panmictic population in the North Atlantic Ocean (Fevolden et al. 1999). Later on, two lineages of mitochondrial DNA were revealed around Greenland waters, although no strong population differentiation was detected (Pálsson et al. 2009). A more recent study showed that polar cod in Svalbard and NE Greenland was distributed between fjord and oceanic populations, the latter one being a panmictic entity, whereas weak differentiation was observed amongst fjords (Madsen et al. 2016). Polar cod across the Pacific Arctic (Bering, Chukchi and Beaufort Seas) was found to comprise a single panmictic population with high genetic diversity compared to other gadoids, as indicated across all 13 protein-coding genes in the mitogenome (Wilson et al. 2017). Small but significant differentiation in polar cod along Russian Arctic coast (from the Kara, Laptev and East Siberian seas) was found with seven microsatellite loci (Gordeeva and Mishin 2019). Lack of spatial population structure and isolation by distance in juveniles sampled in the region between the fjords off West Svalbard, the northern Sophia Basin and the Eurasian Basin of the Arctic Ocean was also reported by Maes et al. (2021) using a suite of nine microsatellites. Finally, a circumpolar study (Nelson 2020) conducted with the same set of nine microsatellite markers detected significant differentiation (F ST = 0.01, p < 0.01) with a geographically underlying basis across the species' entire range. Four genetic groups were thus identified: (i) Canada East (from Resolute Bay to the Gulf of St. Lawrence), (ii) Canada West (Canadian Beaufort Sea and Amundsen Gulf), (iii) Europe (Greenland Sea, Iceland and the Laptev Sea) and (iv) USA (north Bering, Chukchi and western Beaufort seas). In this case, genetic divergence was thought to be influenced by a combination of physical distance, geophysical structure and oceanography, whereas very little genetic differentiation was detected within the identified groups (Nelson et al. 2020). In addition, a number of genomic resources have been made available in the last years (e.g. Breines et al. 2008;Árnason and Halldórsdóttir 2015;Wilson et al. 2018).
Growing concentrations of greenhouse gases warm the atmosphere and the oceans resulting in sea level rise due to thermal expansion of waters and melting of ice sheets and glaciers (Nicholls and Cazenave 2010). The Arctic amplification of global warming is provoking the retreat in the ice edge of the Arctic-Atlantic at unprecedented pace, thereby negatively affecting this cold-adapted species (Bouchard et al. 2016;Steiner, 2019) with narrow thermal plasticity (Drost et al. 2016;Laurel et al. 2018). Concurrently, boreal species extend their habitats into the northern Barents Sea (Fossheim et al. 2015) and sea ice retreat opens up vast areas for human activities such as petroleum, deep-sea mining and fishing, both adding to the pressures on these pristine Arctic ecosystems (see also Christiansen 2017). As a consequence, the abundance of polar cod, a keystone species of the iceassociated food web, has diminished (Divoky et al. 2015). The lower levels that polar cod stock experienced during the last years together with the fluctuating recruitment cannot be attributed to harvesting and temporally coincide with the rise of sea temperatures in the Barents Sea. A strong mechanistic link between survival in the early stages of the life cycle and changes in ice cover and temperature suggests an imminent collapse of recruitment if the observed ice-reduction and warming continue (Huserbråten et al. 2019). A simulation study indicated that the environmental conditions needed 1 3 to account for a successful recruitment to 0-group consist of high ice concentration in summer and low temperatures throughout the pelagic juvenile phase; therefore, the perturbations from the Arctic ocean climate typically found in the northern and eastern Barents Sea appear to be detrimental . In contrast, a 30-year data series on the length-at-age of polar cod cohorts (ages 0-4) analysed in relation to sea surface temperature, sea ice concentration, prey biomasses, predator indices and length-at-age the previous year using multiple linear regression suggested that retreating sea ice had positive effects on the growth of polar cod in the Barents Sea as only length-at-age for age 1 showed a positive significant relationship with prey biomass. High length-at-age was significantly associated with low sea ice concentration and high length-at-age the previous year (Dupont et al. 2020).
This study aims to characterize the population genetic structure of polar cod in the geographic region ranging from Svalbard to the East Siberian Sea using a panel of specifically developed SNP markers. The deeper genetic knowledge this new panel provides can give useful guidance for conservation and sustainable management of this species.

Sampling
A total of 1316 individuals was sampled from 17 geographic locations ranging from 75 to 81°N, from East Siberian Sea to Svalbard (Fig. 1). Full biometric data including length, weight, age, stage and sex were available for 596 individuals. Age was determined by counting winter zones from whole otoliths submerged in water and under binoculars as described in Gjøsaeter and Ajiad (1994). All samples were collected by pelagic trawl with the exception of ESS, LS and KS (see Table 1), which were sampled using Bongo net. In all cases, sampling was framed within surveys aiming to assess fish stocks.

DNA isolation, SNP panel selection and genotyping
DNA was extracted from ethanol-preserved fin clips using OMEGA E-Z 96™ Tissue DNA kit (Promega). A panel of single-nucleotide polymorphism (SNP) markers was developed from a restriction site-associated DNA sequencing (RADseq) data set of 270 polar cod individuals from oceanic and fjord populations from the North East Atlantic Ocean. RADseq (Baird 2008;Emerson et al. 2010;Hohenlohe et al. 2010) can identify and score thousands of genetic markers, randomly distributed across the target genome and has become an important tool for ecological population genomics (e.g. Davey et al. 2011;Bergey et al. 2013;Ogden, 2013). The initial dataset used for SNP panel selection consisted of 141,178 SNP markers that fulfilled the following filtering criteria: > 1% major allele frequency (MAF), > 65% presence per population, only a single SNP per sequence tag and removal of suspected paralogues. SNPs were then ranked based on the F ST computed between oceanic and fjord populations, and the 613 SNPs displaying the highest values were selected as a panel for screening two Svalbard samples (63 individuals), which were also part of the initial dataset. Out of those 613 SNPs, 316 were detected in the two Svalbard samples and F ST based on these 316 SNPs was computed between the samples. The top 192 SNPs showing significant differentiation were used to develop the final SNP panel used in all the downstream analysis. A subset of 174 of those SNPs could be distributed on seven multiplex reactions and amplified on a Sequenom MassARRAY iPLEX Platform, as described by Gabriel et al. (2009).

Population genetic analyses
The genotype frequency per locus and its direction (heterozygote deficit or excess) against the expectations under Hardy-Weinberg equilibrium (HWE) as well as linkage disequilibrium (LD) between pairs of loci were estimated using the programme GENEPOP v7 (Rousset 2008). HWE and LD were examined using the Markov chain parameters: 10,000 steps of dememorization, 1000 batches and 10,000 iterations per batch, and significance was assessed after the post hoc sequential Bonferroni correction (Holm 1979). Observed (H o ) and unbiased expected heterozygosity (uH e ), and also the inbreeding coefficient (F IS ), were computed for each sample with GenAlEx 6.503 (Peakall and Smouse 2006).
Genetic relationships amongst samples were examined using principal component analysis (PCA) (Pearson 1901;Hotelling 1933;Jolliffe 2002), as implemented in adegenet (Jombart 2008). Population genetic structure was examined using the analysis of molecular variance (AMOVA) and by estimating pairwise F ST (Weir and Cockerham 1984) using ARLEQUIN v.3.5.1.2 (Excoffier et al. 2005) with 10,000 permutations. The Bayesian clustering approach implemented in STRU CTU RE v. 2.3.4 (Pritchard et al. 2000) was used to identify genetic groups under a model assuming admixture and correlated allele frequencies without using population information. The analyses were conducted using the programme ParallelStructure (Besnier and Glover 2013) to reduce the computational time. Ten runs with a burn-in period consisting of 100,000 replications and a run length of 1,000,000 Markov chain Monte Carlo (MCMC) iterations were performed for K = 1 to K = 10 clusters. To determine the number of genetic groups in the data, STRU CTU RE output was analysed using two approaches. Firstly, the ad hoc summary statistic ΔK of Evanno et al. (2005) was calculated. Secondly, StructureSelector (Li and Liu 2018) was used to estimate Puechmaille (2016) statistics (MedMed, MedMean, MaxMed and MaxMean), which have been described as more accurate than the previously used methods to determine the best fit number of clusters, for both even and uneven sampling data. Finally, the ten runs for the selected Ks were averaged with CLUMPP v.1.1.1 (Jakobsson and Rosenberg 2007) using the FullSearch algorithm and the G' pairwise matrix similarity statistic and graphically displayed using barplots. Genetic clustering using sampling sites as prior was also examined and visualized using discriminant analysis of principal components (DAPC) (Jombart et al. 2010) in adegenet (Jombart 2008). A number of up to 116 principal components (PCs) were tested to determine the optimal number to avoid overfitting of the data and creating artificially large separation between groups (Jombart and Collins 2015;Miller et al. 2020). The trade-off between power of discrimination and overfitting was measured using the a-score, which is the difference between the proportion of successful reassignment of the analysis (observed discrimination) and values obtained using random groups (random discrimination), i.e. the proportion of successful reassignment corrected for the number of retained PCs. Loci departing from neutrality and therefore reflecting possible selective responses or linkage disequilibrium with genes under divergent selection (Lewontin and Krakauer 1973) were statistically identified using two outlier analytic approaches: BayeScan  and LOSITAN (Antao et al. 2008). BayeScan was run by setting sample size to 10,000 and the thinning interval to 50 as suggested by . The loci with a posterior probability above 0.99 were retained as outliers, corresponding to a Bayes Factor (BF) > 2, i.e. "decisive selection" (Foll and Gaggiotti 2006). In LOSITAN, a neutral distribution of F ST with 100,000 iterations was simulated, with a forced mean F ST at a significance level of 0.05 under an infinite allele model.
Allele frequency shifts at outlier loci are expected to be driven by selective responses towards strong ecological gradients leading to local adaptation, either due to directly associated genes or through hitchhiking (linkage) with associated genes (Gagnaire 2015). Major allele frequencies (MAF) per sample were displayed as appropriate.
Finally, the relationship between genetic distance, i.e. F ST /(1−F ST ), and geographic distance was examined using Mantel (1967) test to investigate if the correlation conformed the expectations of "Isolation by Distance" (IBD), a pattern of increasing genetic differentiation with geographic distance as a result of restricted gene flow and drift (Wright 1943;Slatkin 1993;Rousset 1997). Putative patterns IBD were investigated not only using the full set of SNPs but also with loci decomposed into candidates to selection (i.e. SNPs flagged by any of the outlier analyses) and neutrals (i.e. SNPs that failed to be identified by any of the outlier detection approaches), both for the total set of samples as well as after excluding the farthest ones (East Siberian Sea, ESS and Laptev Sea, LS). The matrix of geographic distance was calculated using the R-package marmap (Pante and Simon-Bouhet 2013) by computing the shortest pairwise distances by water, i.e. avoiding land masses. Twotailed Mantel (1967) tests were conducted using PASSaGE v2 (Rosenberg and Anderson 2011) and significance was assessed via 10,000 permutations.

Genetic variation and overall population structure
The SNP loci and their corresponding flanking regions together with the raw data are available in Online Resource 1 (Tables S1 and S2, respectively). A significant deficit of heterozygotes was detected for 11 of the 127 loci, which were thus discarded for further analyses. Table 1 shows the summary statistics across the 116 remaining loci for each sample. The observed (H o ) and unbiased expected heterozygosity (uH e ) ranged from 0.114 to 0.138 and 0.112 to 0.141, respectively (Table 1). In both cases, the highest values were reported for sample ESS, taken in the East Siberian Sea.
Principal components analyses (PCA) revealed two main groups (Fig. 2), which were confirmed by the first level of division in STRU CTU RE (Fig. 3a). Each genotype was assigned to the two inferred clusters based on the individual proportion of membership (q i ) using a threshold of ancestry of q > 0.8, which allowed the identification of a major cluster hosting 1098 individuals and a minor cluster hosting 151, thus leaving out eight admixed individuals. The number of individuals belonging to the minor cluster ranged from 2 (ESS, st611) to 17 (TB784) per sample. These low numbers hampered to conduct any further geographically explicit analysis due to the lack of statistical power. The ratio between females and males of adult fish was around 1 in both clusters.
The subsequent analyses were conducted separately for the total set of individuals as well as for the major cluster. However, given the consistency between both patterns, and unless otherwise stated, only the results corresponding to the full dataset will be presented here, whereas the results for the major cluster can be found in the Online Resource 2. The AMOVA revealed an overall significant genetic structure (F ST = 0.0045, p < 0.0001), with 99.5% of the variation hosted within populations. The pairwise F ST comparisons showed that the easternmost sample (ESS) was significantly different from all other ones (F ST ranging from 0.013 to 0.038, see also Table S4 for major allele frequency per sample) followed by the Laptev Sea, LS (range from 0 to 0.007), which differed from six of the samples (although four of them would lose significance after Bonferroni correction) ( Table 2). No differentiation was observed amongst the samples collected between the Kara Sea and Svalbard.
STRU CTU RE analysis after the removal of the individuals from the minor cluster revealed K = 2 as the most likely number of clusters (Fig. 3b). The ancestry to cluster on a sampling site basis confirmed the results from the pairwise F ST . These patterns were further corroborated by the discriminant analysis of principal components using the full set of individuals (Fig. S1 in Online Resource 2), where the 36 PCs retained allowed to identify a first axis, which accounted for 43.8% of the variation and isolated sample ESS. Axis 2, responsible for 11.4% of the variation, did not fully separate sample LS, which overlapped to a great extent with the remaining.
LOSITAN identified seven SNPs deviating from neutrality, two of which were confirmed by BayeScan (Table 3). Patterns of IBD were recorded for the full set of samples (and individuals) irrespective of the type of markers used (neutrals, candidate to selection or altogether) albeit the strength of the correlation was weaker at neutral loci (Table 4). The significance of IBD was due to the easternmost samples (ESS, LS), as patterns faded away when these samples were excluded from the analyses (see Fig. S3 in Online Resource 2 for illustration).

Study of the genetic clusters identified by principal component analysis
The strongest genetic signal in the dataset was revealed by PCA (Fig. 2) and STRU CTU RE (Fig. 3a) as two significantly distinct groups of individuals (F ST = 0.139, p < 0.0001) were found in every sampling site, i.e. the proportion of individuals belonging to the minor cluster ranged between 4 and  19% per site. The individual's axis position for PC1 was strongly correlated with STRU CTU RE ancestry coefficient Q (r = 0.983, p < 0.0001). To rule out that the clustering pattern was driven by the SNP cross-amplification of any taxonomically related species that could be morphologically confounded with polar cod, a subset of individuals was barcoded. The resulting COI fragment (654 bp) confirmed that samples belonged to B. saida (see Online Resource 3 for a detailed description).
The F ST per locus computed between clusters reached statistical significance in 15 out of 116 SNPs (13%); ten of which showed F ST > 0.6 ( Table 5). Amongst the loci revealing significant structure, the eleven SNPs displaying the largest differentiation in major allele frequency (MAF) between clusters were found to be linked (Fisher's method, p < 0.05), all of them being annotated to chromosome 2 in Atlantic cod, Gadus morhua genome assembly with the exception of two for which no match could be found. Furthermore, the allelic patterns showed by nine of those eleven loci (i.e. OMLE01011103, OMLE01085950, OMLE01122866, OMLE01075874, OMLE01065648, OMLE01120889, OMLE01059372, OMLE01120901, OMLE01025315) Table 2 Heatmap of pairwise F ST for collections including all the individuals (below diagonal) and for those containing only individuals belonging to the "major" cluster (above diagonal) T B792 TB783 TB813 TB782 st676 TB786 TB785 TB784 st629 st623 st632 st597 st611  Greener colours indicate lowest differentiation (F ST closer to zero) whereas yellow towards red colours represent larger differentiation were depicted by a homozygote in the major cluster and conversely to a heterozygote in the minor one. As biometric information was available for only half of the individuals belonging to the minor cluster, an identical number of individuals (N = 74) was randomly sampled from the major cluster to compare length-at-age under even conditions. The distribution of individuals per age was very similar in both cases, but only age classes 1 and 2 had large enough sampling size (n ≥ 25) to conduct statistical analysis. No differentiation in length was registered when the individuals were one year old (t = − 0.686, p = 0.496); however, individuals from the major cluster reached a larger size at the second year of age (t = − 2.169, p = 0.036), i.e. average length of 22.1 cm versus the 16.0 cm in the minor cluster (Fig. 4).

Discussion
The current study, investigating the population genetic structure of polar cod from Svalbard to the East Siberian Sea using a SNP panel, revealed two main results. Firstly, the existence of two significantly distinct genetic groups: a major cluster that contained 87.35% of the individuals and a minor cluster hosting 12%. Geography does not seem to account for the clustering as both genetic groups were represented in every sampling site, i.e. individuals belonging to the minor cluster were distributed across all samples in proportions ranging from 4 to 19%. Secondly, the pattern of isolation by distance found was driven by the two easternmost samples (ESS and LS), whereas a lack of differentiation was reported in the geographic area ranging from the Kara Sea to Svalbard.

Population structure in polar cod
The suite of 116 polymorphic loci revealed a significant pattern of IBD driven by the easternmost Russian samples (East Siberian Sea and Laptev Sea), whilst no spatial population structure was detected in the region between the Kara Sea and Svalbard. This result agrees with the absence of spatial population structure and isolation by distance reported by Maes et al. (2021) using nine microsatellites, which suggests ongoing gene flow in the region between the fjords off West Svalbard, the northern Sophia Basin and the Eurasian Basin of the Arctic Ocean. Likewise, previous studies showed small, but significant, differentiation along the Russian Arctic coast (Gordeeva and Mishin 2019) and lack of genetic structure within the Bering, Chukchi and Beaufort seas of Alaska, despite a consistent isolation by distance pattern (Wilson et al. 2017). The lack of genetic differentiation between the Kara Sea and Svalbard, in agreement with the genetic profile of a panmictic population, typically occurs in large and genetically diverse populations (Frankham 1996). Polar cod is the most widespread and abundant species in the Arctic Ocean, with a stock estimated of almost Eight of the SNPs showing the highest F ST values were annotated to chromosome 2 in Gadus morhua genome assembly. The eleven SNPs highlighted in boldface type were found to be significantly linked to each other in the full dataset across all samples (Fisher's method, p < 0.05), nine of them being annotated to chromosome 2 in cod. The allelic patterns for the nine SNPs in italics correspond to a homozygote in the major cluster and a heterozygote in the minor one 2 million tonnes (only in the Barents Sea) in some years , and has been shown to display large genetic diversity (Wilson et al. 2017). This absence of structure might also result from the significant gene flow due to passive dispersal during early stages and migrations during the adult stage (Ponomarenko 1968;Hop and Gjøsaeter 2013;Gordeeva and Mishin 2019;Nelson et al. 2020), i.e. advection by sea-ice drift has been hypothesized to play an important role for the genetic connectivity of circum-Arctic populations (David et al. 2016). However, the results by Huserbråten et al. (2019) indicate that dispersal features segregate ichthyoplankton drift from the two main spawning grounds of B. saida in the Barents Sea, towards the southeast as well as a northward retreat of one of two clearly defined spawning assemblages, possibly in response to warming.

Putative origin of polar cod clustering
The individuals examined in this study displayed a two-clustering pattern irrespective of their geographic explicit locations, as it has been shown in Atlantic cod, G. morhua (Berg 2016;Kirubakaran 2016); three-spine stickleback, Gasterosteus aculeatus ( therefore, body colour has been invoked to play a pivotal role in facilitating sympatric speciation in the absence of impermeable geographic barriers (Casas et al. 2021). In contrast, for other fish species, time of spawning seems to be a crucial evolutionary driver. Genomic data collected from spawning aggregations of Pacific herring (Clupea pallasii) across 1600 km of coastline showed that reproductive timing drives population structure in these pelagic fish (Petrou, 2021). On Atlantic herring (C. harengus), genomic regions have also been identified in association with spawning season (Martínez Barrio, 2016;Lamichhaney 2017), some of which are located on a supergene maintained by a chromosomal inversion (Pettersson 2019). In both species, spawning phenology was highly correlated with polymorphisms in several genes, in particular SYNE2, which is probably within a chromosomal rearrangement in Pacific herring and known to be associated with spawn timing in Atlantic herring (Lamichhaney et al. 2017;Petrou et al. 2021). A group of eleven SNPs under strong LD, which were mostly annotated to chromosome 2 (LG02) in Atlantic cod genome assembly, accounted for the clustering pattern in polar cod. This linkage group 2 has been shown to be highly divergent between spring and winter spawners in Atlantic cod within the Gulf of Maine (Barney et al. 2017). Genomic data would be needed to test the hypothesis if the different spawning times for polar cod in Svalbard area and the south-eastern Barents Sea proposed after a particle tracking Box and whiskers plot for length (cm) of the individuals belonging to minor and major cluster at ages 1 and 2 years, respectively. The box represents the interquartile range with the notch showing the median and the red cross depicting the mean. The upper whisker represents the largest value within 1.5 times interquartile range above 75 th percentile, whereas the lower whisker represents the smallest value within 1.5 times interquartile range below 25 th percentile. Outliers depict those values > 1.5 and < 3 times the interquartile range beyond either end of the box model study  align with this clustering pattern. Likewise, rearrangements in LG2, 7 and 12 have been identified within coastal and offshore samples of Atlantic cod on the Norwegian coast (Sodeland 2016;Johansen 2020). In Atlantic cod, cryptic sympatric populations have been described in coastal areas. Thus, in a 14-year period, stable coexistence of two genetically divergent Atlantic cod ecotypes were reported along the Norwegian Skagerrak coast: a "fjord" ecotype that dominates in numbers deep inside fjords, whereas the "North Sea" ecotype was the only type found in offshore North Sea (Knutsen 2018). Likewise, polar cod sampled inside fjords in Svalbard and north-east Greenland has been reported to be different from polar cod sampled over the shelf. The observed genetic variation did not adhere to an isolation by distance pattern and it was speculated that B. saida populations inhabiting fjords may have become reproductively isolated from shelf-dwelling B. saida during the last post-glacial recolonization (Madsen et al. 2016), whereas the particle tracking study of Eriksen et al. (2020) revealed that particles released in the fjords of western Svalbard were mostly retained in the fjords, thus suggesting that spawning in these areas might lead to isolation. Besides, polar cod is suggested to display several intraspecific forms differing in proportions of body, size and position of fins and in coloration (Chernova 2018). Two ecotypes (coastal and oceanic) of polar cod have also been described in the Russian Arctic, which differ in age of maturation, body shape and colour of larvae and adult fish (Moskalenko 1964). The oceanic one undertakes long migrations, its distribution is associated with ice cover and it reaches the highest latitudes (Moskalenko 1964). Likewise, juveniles sampled in the Russian Arctic seas showed a trend of decreasing body size with increasing latitude (Mishin et al. 2018). Whilst we cannot make the differentiation between such ecotypes in the present study, the average length recorded at the second year of age was larger for the individuals belonging to the major cluster (22.1 vs. 16 cm in the minor cluster). However, sampling sizes urge for a cautious interpretation of this result. Likewise, a differential length in ecotypes was also reported between the "North Sea" and the "fjord" ecotype of Atlantic cod that coexist along the Skagerrak coast, i.e. the "North Sea" type grows faster during the summer and is, on average, 2 cm longer in autumn, when they reach 9 and 11 cm, respectively (Knutsen et al. 2018). The settling period in cod occurs within the size range of 40−110 mm, and it was shown that the aforementioned differences in growth rate were found to be small until the juveniles reached ~ 40 mm and disappeared again in juveniles larger than 110 mm, thus highlighting how this vital rate may change rapidly at ontogenetic transition points leading to different phenotypic trajectories in co-existing ecotypes (Jørgensen et al. 2020). Likewise, migratory NE Arctic cod (NEAC) and stationary Norwegian (NCC) and Russian coastal cod have been reported to display both genetic and phenotypic (growth, maturation, counts of vertebrae) differences (Nordeide et al. 2011;Johansen et al. 2020). Thus, for example, BB homozygotes for Pantophysin, Pan I (typical of NEAC), were of significantly longer size than the Pan I AA homozygotes (typical of NCC) (Fevolden et al. 2012).
Chromosomal inversions underlie all four supergenes (i.e. sets of genes that are inherited as a single marker and encode complex phenotypes through their joint action) recently identified in Atlantic cod (Matschiner et al. 2021), which have been shown to originate separately between 0.40 and 1.66 million years ago and be linked to migratory lifestyle and environmental adaptations such as to salinity. Such a phenomenon cannot be excluded in the phylogenetically close B. saida. Thus, future studies may benefit from expanding the genomic coverage of SNPs to investigate whether structural genomic variants, ecotype specific haplotypes and/or other genomic signatures can shed light on the occurrence of the two discrete clusters observed here.

Management implications
A notable effect of global warming is the poleward shift of the distribution range in marine species (Poloczanska 2016). Over the past 30 years, at least 19 fish species of major economic importance (e.g. cod, herring, saithe, sprat, hake and sole) have experienced changes in distribution in some parts of their range (Baudron 2020). Polar cod is one of the 21 commercially targeted species in the Barents Sea although its economic importance has been relatively low. In the last years, Russia was harvesting this species at very low levels (Popov and Zeller 2018;Wienerroither et al. 2011) and nowadays it is not directly targeted (Aune et al. 2021) although taken as by-catch in the shrimp fishery (Jacques et al. 2019). However, in the light of climate change, the facilitated access to Arctic waters due to the ice retreat might lead to increased exploitation of polar cod, either as a target species or as bycatch in pelagic fisheries and bottom trawling (Christiansen et al. 2014). Additionally, changed physiological constraints or increased competition from other fish species such as the subarctic capelin, Mallotus villosus (Hop and Gjøsaeter 2013), or through predation from boreal species like Atlantic cod and haddock, Melanogrammus aeglefinus, expanding their range into the polar cod habitat (Renaud et al. 2012;Christiansen 2016;Andrews et al. 2019) might represent further challenges for the persistence of its populations in the future.
The sustainable management of this species would benefit from the use of comprehensive molecular tools to disentangle the underlying basis of the observed clustering reported in this study. Furthermore, a verification of the existence of the alleged spawning grounds (around Svalbard and along Novaya Zemlya) is needed, as well as a deeper understanding of the genetic structure of fjord versus oceanic polar cod to clarify if they represent two different management units as it occurs in Atlantic cod.
Acknowledgements We would like to thank Georg Skaret, Jostein Røttingen, Elena Eriksen, the staff of the Polar branch of VNIRO and the crew on board all research vessels in Norway and Russia for collecting the samples. The crew of RV Helmer Hanssen is thanked for providing samples from NE Greenland and Svalbard waters for SNPs development. Georg Skaret and Stine Karlson are acknowledged for performing the age determination of the fish. We are grateful to Bjørghild Breistein, Geir Dahle, Per Erik Jorde and François Besnier for inspiring discussions on an early stage of this manuscript. The following entities are thanked for providing financial support: Equinor through the Grant No. 4590100459; VIGG RAS and IORAS Russian Government basic research programmes for 2019-2021 (# 0112-2019-0001 and # 0149-2018-0008); the Institute of Marine Research through funding from the Norwegian Ministry for Industry and Fisheries (NFD) and the projects "Stock complexes in the Barents Sea" (the Barents Sea program) and "Polar cod" ("Økologiske prosesser") programme; the UiT-The Arctic University of Norway through the TUNU Programme. The Research Council of Norway is thanked for financial support through the project "Arctic ecosystem impact assessment of oil in ice under climate change" (ACTION -RCN no. 314449/E40). Great thanks are also due to reviewer Sharon Wildes and to the two anonymous referees for their insightful comments to the first version of the manuscript.
Author Contributions TJ, FV, KP, DZ, MQ conceived and designed the research. TJ, NG, DZ conducted the data collection. KP, SB, C-HCC contributed the analytical tools (SNPs). GWS, TH, AMR conducted the laboratory work. MQ performed the statistical analysis and wrote the first draft of the manuscript. All authors contributed, read and approved the final version of the manuscript.
Funding Open access funding provided by Institute Of Marine Research. The funding for the present research has been provided by the following institutions: Equinor through the Grant No. 4590100459; VIGG RAS and IORAS Russian Government basic research programmes for 2019-2021 (# 0112-2019-0001 and # 0149-2018-0008); the Institute of Marine Research through funding from the Norwegian Ministry for Industry and Fisheries (NFD) and the projects "Stock complexes in the Barents Sea" (the Barents Sea program) and "Polar cod" ("Økologiske prosesser") programme; the UiT-The Arctic University of Norway through the TUNU Programme. The Research Council of Norway is thanked for financial support through the project "Arctic ecosystem impact assessment of oil in ice under climate change" (ACTION -RCN no. 314449/E40).

Conflict of interest
The authors declare that no competing interest exists, as well as that there is no financial support or relationships that may pose any kind of conflict. Likewise, the authors declare that contributed to the text, agreed with its content and approved it for submission.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/.