Background

The divergence of polar bears (Ursus maritimus) from brown bears (Ursus arctos) and their adaptation to a novel environment and lifestyle in the Arctic is a well-known example of rapid evolution [1, 2]. Despite having diverged relatively recently (ca. 479–343 thousand years ago (kya)) [1], polar bears evolved a novel and distinct ecology, behaviour, and morphology. The pace of this evolution appears even more striking in light of stable isotope analysis of an ancient polar bear jawbone from Svalbard, which indicates that the species had already adapted to a marine diet and life in the High Arctic by at least 110 kya [3]. Therefore, in perhaps as little as 20,000 generations, polar bears evolved a suite of unique adaptations that enable them to maintain homeostasis under low temperatures, occupy a hypercarnivorous niche, subsist on a diet of primarily seals and their blubber [4], process lipids as their predominant energy source [5, 6], and blend into their surroundings with pigment-free fur [7].

The rapid adaptation of such an iconic species has sparked a number of studies seeking to unravel its genomic underpinnings using various datasets and analyses [1, 2, 8]. Both Miller et al. [2] and Liu et al. [1] investigated whole-genome datasets to uncover genomic regions showing signatures of selection. Together, these studies revealed different yet complementary sets of genes, pathways, and phenotypes that were likely shaped by natural selection during polar bear evolution. Genes showing the strongest signal of positive selection are involved in adipose tissue development, fatty acid metabolism, heart function, and fur pigmentation [1], indicating these are key processes in polar bear adaptation to the Arctic.

Using all currently available nuclear genomes from polar bears (n = 109) and brown bears (n = 33), we set out to determine the nature of the polar bear-specific amino acid changes in the genes showing the strongest signal of positive selection in the polar bear lineage as reported by Liu et al. [1] (Table 1 and Supplementary Table S1). This increase in sample number and spatial coverage enabled us to more comprehensively investigate the origins of polar bear-specific substitutions. By comparing the target gene regions across the two species based on this comprehensive dataset, we (i) determine the number and location of missense substitutions specific to the polar bear lineage, (ii) address whether the polar bear variants derive from standing variation or de novo mutations, and (iii) evaluate whether the variants fixed in polar bears may lead to functional changes in the genes.

Results

Population structure and admixture

To gain a better understanding of whether selection acted upon standing variation in the polar/brown bear ancestral lineage or on de novo mutations in the polar bear lineage, we needed to determine whether derived variants were already present in the ancestral gene pool. However, as admixture from polar bears into brown bears has been well documented [1, 2, 9, 10, 11], we first investigated admixture at our selected genes. If any introgression from polar bears persists in the brown bear gene pool, inferences of the origins of these variants (standing variation or de novo) may be biased. To test for admixture, we ran independent principal component analyses (PCAs) on each of the twelve genes analysed, including the 50 kb flanking regions up and downstream of the genes.

Clear genetic differentiation between polar bears and brown bears has been shown at a genome-wide scale [1, 2]. For eleven of the twelve genes, polar bears and brown bears form separate clusters, which we expect if there is no introgressive admixture or incomplete lineage sorting (ILS) at these loci (Supplementary Figs. 1–5 and 7–12). However, for EHD3, seven of the brown bears cluster within the diversity of polar bears, indicating some level of introgression or ILS around this gene (Supplementary Fig. 6). The seven brown bears were all from the Admiralty, Baranof, and Chichagof islands in Alaska, an area known to be inhabited by introgressed brown bears [9]. We therefore excluded EHD3 from further analysis.

Non-synonymous differences between polar bears and brown bears

For this analysis, we only considered biallelic sites coding for non-synonymous amino acid changes. First, we calculated the percentage of sites fixed within polar bears (Table 1), which ranged from 10 of 8424 coding sites (0.12%) in FCGBP to 18 of 3957 coding sites (0.45%) in LAMC3. We compared this to the number of sites also fixed in brown bears for an alternative allele. Seven of the eleven genes contained sites showing this pattern (Table 1). More specific information regarding gene positions, amino acid changes, and allele counts can be found in Supplementary Table S2.
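The percentages above are the number of fixed non-synonymous sites divided by the number of coding sites in each gene. A minimal Python sketch of that arithmetic follows, using only the two genes whose counts are quoted in the text; the function name `percent_fixed` is illustrative and not part of the original analysis.

```python
def percent_fixed(fixed_sites: int, coding_sites: int) -> float:
    """Return fixed sites as a percentage of coding sites, rounded to 2 decimals."""
    if coding_sites <= 0:
        raise ValueError("coding_sites must be positive")
    return round(100.0 * fixed_sites / coding_sites, 2)


# (fixed sites, coding sites) as quoted in the text for the two ends of the range.
counts = {"FCGBP": (10, 8424), "LAMC3": (18, 3957)}
percentages = {gene: percent_fixed(*c) for gene, c in counts.items()}
```

With these inputs, `percentages` reproduces the 0.12% and 0.45% reported for FCGBP and LAMC3.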

Table 1 Number of sites in candidate genes showing the strongest signal of selection in polar bears

Fixed derived missense mutations in polar bear

To understand the origins of the variants fixed in the polar bear lineage relative to brown bears (i.e. whether they are derived or ancestral), we compared the polar bear and brown bear alleles to the giant panda (Ailuropoda melanoleuca) reference sequence, assuming the ancestral variant is retained in the panda. To reduce noise caused by changes occurring in the brown bear lineage after its divergence from polar bears, we only included sites where polar bears acquired a derived allele while brown bears retained the ancestral state. Seven of the eleven genes of interest had at least one site showing this pattern (Table 1, Fig. 1). The number of such sites varied from zero (CUL7, FCGBP, LAMC3, and XIRP1) to twelve (TTN). More specific information regarding the gene positions and allele counts can be found in Supplementary Table S3.
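The polarisation logic described here can be sketched as a small classifier. This is an illustration under stated assumptions, not the authors' code: the function name, the category labels, and the representation of each population as a collection of single-letter alleles are all hypothetical, and the outgroup (giant panda) allele is assumed to be ancestral, as in the text.

```python
from typing import Iterable, Optional


def classify_site(
    polar_alleles: Iterable[str],
    brown_alleles: Iterable[str],
    outgroup_allele: str,
) -> Optional[str]:
    """Classify one site using the outgroup allele as the assumed ancestral state.

    Returns "putative_de_novo" when polar bears are fixed for a derived allele
    and brown bears are fixed for the ancestral allele,
    "putative_standing_variation" when brown bears carry both alleles,
    and None for every other pattern.
    """
    polar = set(polar_alleles)
    brown = set(brown_alleles)
    if len(polar) != 1:
        return None  # not fixed in polar bears
    (polar_allele,) = polar
    if polar_allele == outgroup_allele:
        return None  # polar bears carry the ancestral allele
    if brown == {outgroup_allele}:
        return "putative_de_novo"
    if brown == {outgroup_allele, polar_allele}:
        return "putative_standing_variation"
    return None  # e.g. a third allele, or no ancestral allele in brown bears
```

For example, `classify_site("TTTT", "CCCC", "C")` returns `"putative_de_novo"`, while `classify_site("TTTT", "CTCC", "C")` returns `"putative_standing_variation"`.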

Fig. 1

Relative proportions of genotype frequencies at biallelic sites across nine genes with fixed variants in the polar bear lineage. Only sites which are fixed in polar bears for the derived allele and are biallelic or fixed in brown bears for the ancestral allele are shown. Sites fixed for the derived allele in the polar bear lineage and fixed for the ancestral allele in the brown bear lineage (putative de novo mutations) are indicated with an asterisk. Gene length and the position of each site along the gene are indicated, as is the number of individuals (n = X) with more than 4x coverage at each site.

Influence on protein structure

To evaluate whether the fixed derived amino acid changes in polar bears (at sites also fixed for the ancestral allele in brown bears) influence protein function, we implemented three independent analyses that predict the functional effects of each amino acid change in the homologous human protein. Amino acid changes that appeared to have a functional influence according to at least one of the three tests were found in four genes, with varying numbers of such sites in each gene: TTN (9 sites), LYST (3 sites), AIM1 (2 sites), and COL5A3 (1 site) (Supplementary Table S3). However, none of these sites showed consistent deleterious effects across all three analyses. All other amino acid changes were deemed benign, neutral, or tolerated. It should be noted that these amino acid changes may have a different influence on polar bear proteins compared to human proteins, and this should be taken into account in the interpretation of these results.

Discussion

To increase our understanding of how polar bears adapted to the Arctic, we investigated the origins of the genomic variants in eleven candidate genes previously found to show the strongest signals of positive selection in the polar bear lineage [1], nine of which have functions that could be linked to Arctic-specific adaptations, including adipose tissue development, fatty acid metabolism, heart function, and fur pigmentation (Table 1). We analysed a comprehensive panel of polar bear and brown bear nuclear genomes, and identified biallelic sites in which alternative alleles result in an amino acid change within these genes.

We find the majority of sites fixed in polar bears are biallelic in brown bears (Table 1, Fig. 1). This result may reflect that natural selection more readily acted upon standing variation already in the ancestral polar/brown bear gene pool, allowing for more rapid adaptation compared to selection on de novo mutations. However, we also identify a number of sites in seven of the eleven genes analysed, in which polar bears are fixed for the derived allele and brown bears are fixed for the alternative, ancestral allele, relative to the outgroup giant panda. Although the absolute number of sites showing this pattern is relatively low (ranging from 1 to 12 sites in each gene), these findings suggest selection may also have acted on de novo mutations in the polar bear lineage.

However, our results may also reflect the relatively low genetic diversity in polar bears compared to brown bears (Supplementary Figs. 1–12), which is due to the long-term, low effective population size of polar bears [1, 2]. Such low diversity would have exacerbated the impact of genetic drift on standing variation, potentially leading to the fixation of variable sites in the polar bear lineage. In contrast, it is unlikely that de novo mutations reached fixation by genetic drift alone, even in the fairly homogeneous polar bear gene pool. The initial low frequency of a de novo mutation would more likely lead to the mutation being purged from the gene pool rather than being fixed via genetic drift. Therefore, it is more probable that the de novo polar bear-specific substitutions reached fixation due to strong selective pressures, as opposed to simply reflecting the relatively limited genetic diversity of polar bears relative to brown bears.
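The argument that a new mutation is unlikely to fix by drift alone can be made concrete with the standard diffusion approximation for fixation probability (Kimura's formula). The sketch below is an illustration only: it assumes a diploid population whose census and effective sizes are equal, additive selection with coefficient s, and a hypothetical effective population size of 10,000, none of which is taken from this study.

```python
import math


def fixation_probability(n_e: float, s: float = 0.0) -> float:
    """Fixation probability of a single new mutation (initial frequency 1/(2*n_e)).

    Uses the diffusion approximation
        u = (1 - exp(-4*n_e*s*p0)) / (1 - exp(-4*n_e*s)),
    which reduces to u = p0 = 1/(2*n_e) for a neutral mutation (s = 0).
    """
    if n_e <= 0:
        raise ValueError("n_e must be positive")
    p0 = 1.0 / (2.0 * n_e)
    if s == 0.0:
        return p0
    # expm1 keeps precision when the exponent is close to zero.
    return math.expm1(-4.0 * n_e * s * p0) / math.expm1(-4.0 * n_e * s)


# Hypothetical values chosen for illustration, not estimates for polar bears.
neutral = fixation_probability(10_000)          # drift alone
selected = fixation_probability(10_000, 0.01)   # 1% selective advantage
```

Under these assumed values the neutral mutation fixes with probability 1 in 20,000, whereas the advantageous mutation fixes with a probability of roughly 2%, several hundred times higher.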

Moreover, at both variant types (standing variation and de novo), we find evidence for protein structure-altering substitutions in the polar bear lineage (Supplementary Table S3), potentially leading to phenotypic change in the species. This adds weight to our hypothesis that selection acted on both standing variation and on de novo mutations. However, we cannot exclude that the fixation of some of these phenotypically relevant mutations may be due to genetic hitchhiking from one causative mutation in the gene. Regardless, the fact that putatively phenotypic changes occur at both variant types suggests natural selection acted upon both standing variation and de novo mutations.

The genes with the highest proportion of sites indicating de novo mutations were APOB and ABCC6 (with four and two such sites, respectively), which are associated with the cardiovascular system, and LYST and AIM1 (with seven and two such sites, respectively), which are associated with pigmentation (Table 1).

The APOB gene codes for apolipoprotein B (apoB), the primary lipid-binding protein of chylomicrons and low-density lipoproteins (LDL), which enables the transport of fat molecules around the body [12, 13]. The ABCC6 gene encodes a protein belonging to the superfamily of ATP-binding cassette (ABC) transporters and is involved in transporting various molecules across extra- and intra-cellular membranes. Diseases associated with ABCC6 include Pseudoxanthoma elasticum (PXE), a neurocutaneous disorder that affects the elastic tissue of the cardiovascular system, causes arterial calcification, and increases the risk of coronary artery disease [14, 15, 16]. Selection on the APOB and ABCC6 genes may have played a role in the novel adaptation of polar bears to a lipid-rich diet, and increased the efficacy of cholesterol clearance from the blood [1].

The LYST gene codes for the lysosomal trafficking regulator Lyst. Mutations in the LYST gene have been reported to cause hypopigmentation, a melanosome defect characterized by light coat colour [17, 18]. AIM1 is also associated with colouration, as variable expression of AIM1 has been associated with tumour suppression in human melanoma, influencing melanin pigment production [19]. Selection on LYST and AIM1 may have led to the lack of fur pigmentation in polar bears, resulting in the characteristic white phenotype of the species that may confer a selective advantage in the Arctic. Detrimental amino acid changes may have significantly hindered the function of these genes, resulting in the lack of pigmentation for natural selection to act upon.

Conclusion

Although genes involved in the adaptation of polar bears to their Arctic lifestyle have previously been uncovered [1, 2], a comprehensive assessment of whether selection on these genes acted upon standing variation or de novo mutations has been lacking. In the present study, through the analysis of a comprehensive data set of polar bear and brown bear genomes, we were able to address this question and provide new insights into the origins of variants found in genes under selection, namely whether they derived from standing variation in the ancestral gene pool or from de novo mutation. These variants putatively enabled the rapid adaptation of polar bears to the Arctic.

Methods

For this study, we analysed publicly available whole-genome data from 109 polar bears and 33 brown bears (Supplementary Table S4). This included the data set from Liu et al. [1] of 89 genomes and an additional 30 polar bear and 23 brown bear genomes published elsewhere [2, 9, 11, 20, 21]. We downloaded SRA files from NCBI (BioProject IDs: PRJNA169236, PRJNA196978, PRJNA210951, PRJNA271471, PRJNA395974, and PRJEB27491). Polar bear individuals originated from West Greenland, East Greenland, Canada, Siberia, USA (Alaska), and Svalbard (genome-wide coverage of 1.7x–114.1x). Brown bear individuals originated from the USA (Montana, Alaska), Sweden, Finland, Italy, Greece, Slovakia, Spain, Slovenia, Georgia, and Russia (genome-wide coverage of 2.7x–58.2x). The recently published brown bear reference genome individual was excluded from the analysis due to unknown provenance [22].

Raw data processing

We processed all raw sequencing reads with the PALEOMIX [23] pipeline. Within the pipeline, adapter sequences, stretches of Ns, and low-quality bases were trimmed and filtered with AdapterRemoval v2 [24] using default parameters. BWA-backtrack v0.7.15 [25] was used to map the cleaned reads to the pseudo-chromosomal polar bear genome (GenBank accession: GCA_000687225.1) from Liu et al. [1], with default parameters. Reads with a mapping quality of less than 30 were filtered using SAMtools v1.6 [26]. Duplicates were removed with Picard v2.6.0 [27]. Possible paralogs were filtered using SAMtools. Finally, local realignment around indels was performed using GATK v3.3 [28].

Population structure and admixture

To investigate whether admixture or incomplete lineage sorting (ILS) may be present between polar bears and brown bears at the genes of interest, we performed independent principal component analyses (PCAs) for each gene including the 50 kb regions up and downstream of the gene. For this, we included all polar and brown bear individuals. We used a genotype likelihood approach to construct the PCAs: input genotype likelihood files were constructed using ANGSD v0.929 [29], with the SAMtools genotype likelihood algorithm (-GL 1), and specifying the following parameters: remove reads that have multiple mapping best hits (-unique_only), remove reads with a flag above 255/secondary hits (-remove_bads), include only read pairs with both mates mapping correctly (-only_proper_pairs), adjust mapQ for reads with excessive mismatches (-C 50), adjust quality scores around indels (-baq 1), a minimum mapping quality of 20 (-minMapQ 20), a minimum base quality of 20 (-minQ 20), discard sites where there is no data on at least 95% of the individuals (-minInd), skip tri-allelic sites (-skipTriallelic), and remove SNP sites with a p-value larger than 1e-6 (-SNP_pval 1e-6). The ANGSD output beagle file was run through PCAngsd v0.95 [30].
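Two quantities in this step are simple to derive: the analysed window around each gene (gene body plus 50 kb on each side) and the number of individuals corresponding to the 95% data threshold. The Python sketch below illustrates both under stated assumptions. The helper names, the scaffold name, and the gene coordinates are hypothetical. The `chr:start-end` string follows the region syntax ANGSD accepts through its `-r` option. Reading the -minInd threshold as 95% of the 142 individuals, rounded up, is an interpretation of the text rather than a value it reports.

```python
def flanked_region(chrom: str, gene_start: int, gene_end: int,
                   flank: int = 50_000) -> str:
    """Return a 1-based 'chrom:start-end' window with `flank` bp on each side."""
    if gene_start > gene_end:
        raise ValueError("gene_start must not exceed gene_end")
    start = max(1, gene_start - flank)  # clamp at the scaffold start
    end = gene_end + flank
    return f"{chrom}:{start}-{end}"


def min_individuals(n_individuals: int, percent: int = 95) -> int:
    """Smallest whole number of individuals that is at least `percent`% of the panel."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-n_individuals * percent // 100)


# Hypothetical gene coordinates; 142 = 109 polar bears + 33 brown bears.
region = flanked_region("scaffold1", 120_000, 150_000)
threshold = min_individuals(109 + 33)
```

With these inputs the window is `scaffold1:70000-200000` and the threshold is 135 individuals.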

Gene investigation

We analysed twelve of the genes previously found to show the strongest signal of positive selection in the polar bear [1]. These included ABCC6, AIM1, APOB, COL5A3, CUL7, EHD3, FCGBP, LAMC3, LYST, POLR1A, TTN, and XIRP1. The phenotypes putatively associated with these genes can be found in Table 1 and supplementary Table S1.

Genotypes were called using ANGSD, specifying the same parameters as for the PCAs with the following additional parameters: write major and minor alleles and the genotype directly (-doGeno 5), estimate the posterior genotype probability based on the allele frequency as a prior (-doPost 1), use the reference allele as the major allele (-doMajorMinor 4), output as beagle likelihood file (-doGlf 2), and calculate allele frequencies assuming a fixed major allele and an unknown minor allele (-doMaf 2). To decrease biases that could arise when calling heterozygous alleles from the low-coverage genomes, we only called genotypes from individuals that had at least 4x coverage at the site of interest (-geno_minDepth 4). To investigate which alleles represent the ancestral variant, we downloaded each of the relevant giant panda gene transcript sequences from GenBank (Supplementary Table S5). Although assembled genome data are available from Ursidae species more closely related to the polar bear and brown bear (e.g. black bear [31]), we specifically avoided the use of other Ursine bears to determine the ancestral state, as introgressive gene flow has been identified across nearly all species of Ursine bears [32], which may lead to the misidentification of the ancestral variant. We additionally included genomic data from an approximately 110,000-year-old polar bear sample from Svalbard [3] (NCBI BioProject ID: PRJNA169236) in an attempt to understand when the variants may have arisen. However, due to the very low coverage (~0.34x) of this sample (the Poolepynten bear), we did not recover any sites with enough data to incorporate into our analyses.
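The effect of the 4x depth threshold can be illustrated with a small post hoc filter. In the analysis itself this filtering is performed by ANGSD through -geno_minDepth, so the sketch below only shows the logic. The function names, the two-letter genotype strings, and the example depths are hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def mask_low_depth(genotypes: Sequence[str], depths: Sequence[int],
                   min_depth: int = 4) -> List[Optional[str]]:
    """Replace genotypes from individuals below `min_depth` coverage with None."""
    if len(genotypes) != len(depths):
        raise ValueError("genotypes and depths must have the same length")
    return [g if d >= min_depth else None for g, d in zip(genotypes, depths)]


def allele_counts(genotypes: Sequence[Optional[str]]) -> Dict[str, int]:
    """Count alleles across all retained (non-None) diploid genotypes."""
    counts: Dict[str, int] = {}
    for genotype in genotypes:
        if genotype is None:
            continue
        for allele in genotype:
            counts[allele] = counts.get(allele, 0) + 1
    return counts


# Hypothetical site: the second individual has only 2x coverage.
calls = mask_low_depth(["TT", "CT", "TT", "TT"], [12, 2, 5, 4])
counts = allele_counts(calls)
n_retained = sum(call is not None for call in calls)
```

Here the heterozygous call from the 2x individual is dropped, leaving three individuals and six T alleles, so the site would be scored as fixed among the retained individuals.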

We used PolyPhen-2 [33], SIFT [34], and PROVEAN [35] to predict the effects of the amino acid changes at putative de novo mutations in the polar bear lineage on the function of the homologous human proteins. For this analysis, we downloaded the relevant human gene transcript sequences from GenBank (Supplementary Table S5), aligned the human and polar bear protein sequences, and selected only the positions that were fixed for the derived allele in polar bears and fixed for the ancestral allele (the allele shared with the giant panda) in brown bears. Positions and amino acid changes were submitted to the PolyPhen-2 batch web service using the HumDiv and HumVar model classifiers, and to the PROVEAN human protein batch web service for the PROVEAN and SIFT analyses.
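Submitting polar bear substitutions to human-based predictors requires translating each polar bear residue position into the corresponding human position through the pairwise alignment. A minimal Python sketch of that coordinate mapping follows, assuming the two sequences have already been aligned to equal length with "-" as the gap character. The function name and the toy sequences are hypothetical and do not come from the study.

```python
from typing import Optional


def map_position(aligned_query: str, aligned_target: str,
                 query_pos: int, gap: str = "-") -> Optional[int]:
    """Map a 1-based residue position in the query to the target sequence.

    Both inputs are rows of the same pairwise alignment. Returns None when
    the query residue is aligned to a gap in the target.
    """
    if len(aligned_query) != len(aligned_target):
        raise ValueError("aligned sequences must have equal length")
    if query_pos < 1:
        raise ValueError("query_pos is 1-based")
    q = t = 0  # ungapped residue counters for query and target
    for q_char, t_char in zip(aligned_query, aligned_target):
        if q_char != gap:
            q += 1
        if t_char != gap:
            t += 1
        if q_char != gap and q == query_pos:
            return t if t_char != gap else None
    raise IndexError("query_pos is beyond the end of the query sequence")


# Toy alignment: the target has one extra residue (S) relative to the query.
polar_row = "MK-TLVA"
human_row = "MKSTLIA"
human_pos = map_position(polar_row, human_row, 5)  # query residue V
```

In this toy alignment, query position 5 (V) maps to target position 6 (I), which is the position that would be reported for the human protein.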