Introduction

The blue whale (Balaenoptera musculus) is the largest animal that has ever lived, with individuals reaching 30 m in length and weighing up to 150 tonnes. They are found in oceans across the globe but were historically most abundant in the Southern Ocean (Sears and Perrin 2018). These whales were too fast and powerful for early whalers to catch using traditional methods and it was not until technological advancements in the 1860s that commercial exploitation of the large “rorqual” whales became possible. Whaling for blue whales began in the Northeast Atlantic (NEA) and subsequently spread rapidly to all other oceans on an industrial scale (Thomas et al. 2016).

The blue whales’ large size made them a lucrative target as they provided a more profitable yield per unit of hunting effort than other whales. By the second half of the 20th century, commercial whaling had brought this species to the brink of extinction. A worldwide pause on the hunting of blue whales was put into effect by the International Whaling Commission (IWC) in 1966. Although a complete moratorium on all commercial whaling was implemented in 1985, some illegal hunting persisted. It has been estimated that between 1900 and the late 1970s, over 379,000 blue whales were harvested globally (Rocha et al. 2014).

Blue whales are currently classified as Endangered by the International Union for Conservation of Nature (Cooke 2019) and in North America they are listed under the Canadian Species at Risk Act (SARA) and the US Endangered Species Act (ESA). While no longer hunted, they continue to be threatened by ship strikes, fishing gear entanglement, marine noise, pollutants, and climate change (COSEWIC 2012). Though blue whale numbers are thought to be slowly increasing globally, this growth has been described as spotty and equivocal (Branch et al. 2004; Thomas et al. 2016; Cooke 2019). Today there are an estimated 5,000–15,000 blue whales, only 3–11% of the 1926 global species estimate (Cooke 2019). Despite the modest growth for the species, several aggravating factors continue to challenge recovery of some blue whale populations. For North Atlantic (NA) blue whale stocks these challenges include reduced calving and recruitment rates, persistent ecological disruptions (Beauchamp et al. 2009; Koubrak et al. 2022), as well as lagging governance on recovery efforts (Koubrak et al. 2022). In fact, there is no evidence of recovery from the impact of whaling on this species in the NA (Ramp et al. 2006). Given these regional trends, understanding population structure is of critical value for the conservation of blue whales.

Knowledge of the distribution, genetic structure, and population ecology of blue whales is essential for the protection of this vulnerable species. Blue whales are currently divided into at least four distinct subspecies (Balaenoptera musculus brevicauda, B. m. intermedius, B. m. indica and B. m. musculus). Population structure, genetic diversity and connectivity in the Pacific and Southern oceans have been studied extensively using genetic markers (Attard et al. 2010, 2016, 2018; Barlow et al. 2018; Costa-Urrutia et al. 2013; Leduc et al. 2017; Sremba et al. 2012; Torres-Florez et al. 2014). No similar genetic data has been generated thus far for blue whales in the NA (Balaenoptera m. musculus). However, numerous studies have looked at contemporary blue whale movements, distribution and population size within the NA using sightings, photoidentification, vocalizations, satellite tracking and isotope analyses (Davis et al. 2020; Delarue et al. 2022; Lesage et al. 2017; Pike et al. 2019; Silva et al. 2013, 2019; Storrie et al. 2018; Trueman et al. 2019). While data from tagged animals helps provide information on contemporaneous movements (Lesage et al. 2017), a large gap remains in our understanding of blue whale migrations, population ecology and interconnectivity, particularly across the NA.

One outstanding conservation question is whether blue whales in the eastern and western portions of the NA comprise a single population, as homogeneous versus discrete populations may merit different recovery strategies. Early whalers thought that there were two distinct stocks of blue whales in the NA (Lesage et al. 2017). Long-term photo-identification data coupled with satellite telemetry data suggests a low degree of admixture between eastern and western NA blue whales (Ramp and Sears 2013; Sears and Calambokidis 2002; Sears and Perrin 2018; Silva et al. 2013). However, blue whale songs recorded from the Northeast Atlantic (NEA) and Northwest Atlantic (NWA) are similar, yet distinct from blue whales in other oceans, suggesting that population structure, if it exists, is likely minor and recently evolved (Clark 1994). In the NWA, blue whales presently only number about 250 adults (COSEWIC 2012), whereas in the NEA there are ~ 3,000 individuals (Pike et al. 2019). Currently, the National Oceanic and Atmospheric Administration’s (NOAA) revised Recovery Plan for the Blue Whale (2020) (https://repository.library.noaa.gov/view/noaa/27399) makes clear that the question of whether eastern and western whales in the NA constitute one population or two is unresolved. However, in that Recovery Plan, all blue whales in the NA are considered a single management unit based on the International Whaling Commission’s (IWC) blue whale stock definition and the similarities in blue whale song found across the NA, pending further evidence.

An additional conservation issue for blue whales is introgression and hybridization (Allendorf et al. 2001; Rhymer and Simberloff 1996). Purported blue whale/fin whale hybrids have been reported by whalers from the coast of Lapland and Alaska for more than a century (Cocks 1887; Doroshenko 1970). In the last few decades, hybrids caught near Iceland and Spain have been verified using molecular evidence (Árnason et al. 1991; Bérubé and Aguilar 1998; Spilliaert et al. 1991). It is not clear whether these hybridization events represent regular and sustained gene flow between the two species (Westbury et al. 2019; but see Árnason et al. 2018). However, if gene flow is occurring from the far more abundant fin whales (population size > 80,000, Pampoulie et al. 2020) to blue whales, this presents a threat to NA blue whale population(s) due to loss of genetic integrity. This is of particular concern if opportunities for contact between competing populations and species occur (Rhymer and Simberloff 1996).

Herein, we constructed the first de novo assembly of a NWA blue whale genome and used it as a reference for assembling the nuclear and mitochondrial genomes of a collection of present-day and historical blue whales from across the NA, as well as an Antarctic blue whale and NA fin whales. Our objectives are to use these data to advance the current understanding of blue whale population structure across the NA and provide information on population size changes and migrations through time. Additionally, these genomes can inform conservation issues, such as quantifying the potential genetic threat posed to blue whales through introgression with fin whales, seen in both contemporary and historical samples, and providing genetic evidence to determine the number of distinct populations in the NA.

Methods

Samples

We sampled 26 blue whales from across the NA, one from the Southern Atlantic and one from the Antarctic. Of the NA samples, four were from strandings along the east coast of Canada between 2014 and 2019 and seven were sampled using biopsy darts near Svalbard, Norway between 2014 and 2017. The remaining samples were from historical museum skeletons in Canada, the USA, Iceland and Norway, with collection dates ranging from 1876 to 1975 (Table 1). Seven present-day fin whales from Norwegian waters were also sampled using biopsy darts. In addition, archived genomic sequences from a NA blue whale (SRR5665644) and two sei whales (SRR5665645 and SRR5665646) were downloaded from GenBank.

Table 1 Present-day and historical museum whale samples with the average coverage of sequenced short reads from nuclear (N) or mitochondrial (Mito) genomes

De novo genome assembly

The DNA used for the de novo blue whale genome assembly came from muscle taken from a female blue whale (NW-M6, ROMM125066; Table 1) salvaged from Newfoundland in 2014 by the Royal Ontario Museum (ROM), under permit (SARA permit ref: NLSAR-003-14). The Illumina and PacBio reads from the NA blue whale were assembled using the hybrid assembler MASURCA v3.2.8 (Zimin et al. 2017). Assembly completeness was assessed using BUSCO (Simão et al. 2015) and genome size was also estimated independently from Illumina short reads using PREQC (Simpson 2014). Sex chromosome linked contigs were detected by aligning to the cow X chromosome (CM008197.2). The genome was screened for repeat regions using REPEATMASKER v4.0.7 and REPEATMODELER (see Supporting Information for more assembly details).

Transcriptome assembly

RNA for the transcriptome assembly was extracted from skin/blubber tissue, collected from a whale sampled in the Svalbard Archipelago (79°N), Norway (Fig. 1A). Transcripts were assembled from paired-end RNAseq data using TRINITY (Grabherr et al. 2011) and TOPHAT (Trapnell et al. 2009) assemblers and redundancies in the predicted transcripts were removed using CD-HIT (Fu et al. 2012). The masked NA blue whale genome was annotated with the predicted transcripts using the MAKER2 pipeline (Holt and Yandell 2011) (for details see Supporting Information). The annotation quality was assessed using BUSCO and INTERPROSCAN v5.23-62.0 (Mulder and Apweiler 2007).

Mitochondrial genome assembly

Mitochondrial genomes for present-day and historical whales were assembled from the trimmed and merged Illumina paired-end reads by mapping them to a reference mitochondrial genome (NC_00160136) using BOWTIE2 2.3.3.1 (Langmead and Salzberg 2012). This analysis had a larger sample size because it included several individuals from which mitochondrial DNA was successfully recovered but whose DNA concentration was too low for whole-genome analyses (Table 1).

Whole genome sequencing

The DNA extraction from present-day samples (after 2010) was done from frozen tissue and short reads were sequenced as detailed in Supporting Information. For the historical samples (1876–1975) the DNA was extracted from the bones at the ROM’s ancient DNA facility following the methodology detailed in the Supporting Information. The paired-end reads were trimmed and merged using SeqPrep v 1.1 (https://github.com/jstjohn/SeqPrep) with default settings and quality score cut-off for mismatches in overlap > 20 (-q 20). The first and last two bases of the merged reads were trimmed, removing potentially damaged sites in ancient DNA (Dabney et al. 2013).
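The end-trimming step described above can be illustrated with a short Python sketch. This is not the authors' pipeline: the function name `trim_read_ends` and its interface are hypothetical, and the sketch only shows the idea of dropping the first and last two bases (and their quality scores) from a merged read.

```python
def trim_read_ends(seq, qual, n=2):
    """Remove n bases, and the matching quality characters, from both ends of a read.

    Hypothetical helper for illustration only; n=2 mirrors the two terminal
    bases that the text says were trimmed from merged reads.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings must have equal length")
    if len(seq) <= 2 * n:
        # Nothing would remain after trimming, so return an empty read.
        return "", ""
    end = len(seq) - n
    return seq[n:end], qual[n:end]


if __name__ == "__main__":
    print(trim_read_ends("ACGTACGT", "IIIIHHHH"))
```

Reads that are too short to survive trimming come back empty here; a real pipeline would more likely discard them.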

The trimmed sequences from the present-day and historical samples were mapped to the masked de novo assembled NA blue whale genome autosome contigs (Table 1) using BWA 0.7.17 (Li and Durbin 2009) and genome-wide variants were detected for analyses. The SNPs for all the whole genome analyses for present-day samples were filtered for quality score and mapping quality > 30, depth 10X to 130X, and MAF > 0.1 (see Supporting Information for further details).
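As a rough illustration of the SNP filters listed above, the following Python sketch applies the stated thresholds to one record at a time. The `Snp` record and `passes_filters` function are hypothetical names, and treating the depth bounds as inclusive is an assumption; the actual filtering tools and settings are described in the Supporting Information.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    """Hypothetical minimal SNP record used only for this illustration."""
    qual: float      # variant quality score
    map_qual: float  # mapping quality
    depth: int       # read depth at the site
    alt_freq: float  # frequency of the alternate allele


def passes_filters(snp, min_qual=30, min_mapq=30,
                   min_depth=10, max_depth=130, min_maf=0.1):
    """Return True if the SNP meets the thresholds quoted in the text."""
    # Minor allele frequency is the smaller of the two allele frequencies.
    maf = min(snp.alt_freq, 1.0 - snp.alt_freq)
    return (snp.qual > min_qual
            and snp.map_qual > min_mapq
            and min_depth <= snp.depth <= max_depth  # inclusive bounds assumed
            and maf > min_maf)


if __name__ == "__main__":
    print(passes_filters(Snp(qual=50, map_qual=60, depth=40, alt_freq=0.25)))
```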

Population structure analysis

Genomic population structure analyses were conducted on blue whales from present-day NA, historic NWA and Antarctica (Table 1). The six historic NWA blue whales (NWa-R4, NWa3, NWa4, NWa5, NWa6, and NWa-CM1) dated from the early days of whaling of this species through to post-whaling. Population structure was investigated using principal component analysis (PCA), Hudson’s Fst (Bhatia et al. 2013), and Jost’s D (Jost 2008). The blue whales were also checked for kinship using PLINK (--genome) (Purcell et al. 2007). The phylogenetic relationships between individuals were explored using RAxML-NG (Kozlov et al. 2019) and SVDQuartets (Chifman and Kubatko 2014). Additionally, phylogenetic relationships among blue whale samples were examined using maternally inherited mitochondrial genomes from a larger set of blue whales from NA, Antarctic and the Pacific.

Principal component analysis

PCA was performed using LASER v 2.04 (Wang et al. 2015) which uses projection Procrustes analysis for samples with low depth of coverage. The samples with low depth (< 10X) of coverage were placed in the context of a reference PCA space constructed using genotypes of reference individuals with higher coverage depth. The first PCA analysis included present-day and historical blue whales from NA and Antarctica (Table 1) along with seven NA fin whales, and two sei whales.

A second PCA analysis visualized the genetic relationship among just the blue whales, which included present-day and historical blue whale samples (NWa-R4, NWa3 and NWa4). The SNPs for the PCA were also filtered for being genotyped in > 50% samples and r2 > 0.8 within a 1 kb window to filter for linkage disequilibrium. The first and second PCA analyses included 4,136,458 and 2,517,406 SNPs, respectively. The reference PCA space for the first PCA analysis was constructed using present-day NA blue whales (> 15X) and two present-day fin whales (~ 15X) and two sei whales (~ 10X). The reference space for the second PCA analysis was computed from the present-day NA blue whales (> 15X).

Genomic phylogenetic analysis

Phylogenetic relationships were estimated for the blue whales from present-day, historical (NWa3, NWa4 and NWa-R4) and Antarctica using RAxML-NG and SVDQuartets with the NA fin whale as the root for an alignment of 176,382 SNPs. SNPs were filtered as in PCA analysis. For RAxML-NG, the substitution model GTR + G was used with the Lewis ascertainment bias correction and bootstrapped 500 times. The best maximum likelihood tree was visualized using FigTree v.1.4.4 (http://tree.bio.ed.ac.uk/software/figtree). SVDQuartets was run in PAUP* v.4a168 (Swofford 2002) sampling all quartets and inferred with 1,000 nonparametric bootstraps. These analyses were repeated for a reduced subset of the blue whales, excluding the lower coverage assemblies below 6X, which included the NA historical samples, to assess their impact on the accuracy and resolution of the reconstructed trees.

Whole genome Fst and population statistics estimates

Hudson’s Fst was estimated for the present-day (> 15X coverage) NEA and NWA blue whale samples. Hudson’s Fst estimate is not biased by differing sample size between populations (Bhatia et al. 2013). Fst was estimated for 906,598 genome-wide biallelic SNPs that were also filtered for being present in > 50% of samples and for being 1 kb apart. Fst was estimated in R using a custom script and the 95% confidence interval was estimated using 1,000 bootstrap replicates. Jost’s D, another measure of genetic differentiation among populations, was estimated using the basic.stats function of hierfstat in R.
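For readers unfamiliar with the estimator, the sketch below shows one common form of Hudson's Fst (the ratio-of-averages estimator described by Bhatia et al. 2013) with a percentile bootstrap over SNPs. It is written in Python for illustration and is not the authors' custom R script. The function names are hypothetical, and resampling individual SNPs (rather than genomic blocks) is an assumption.

```python
import random


def hudson_fst(p1, p2, n1, n2):
    """Ratio-of-averages Hudson Fst.

    p1, p2: per-SNP allele frequencies in populations 1 and 2.
    n1, n2: number of sampled chromosomes in each population.
    """
    num = 0.0
    den = 0.0
    for a, b in zip(p1, p2):
        # Numerator: squared frequency difference with a sample-size correction.
        num += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        # Denominator: probability that alleles drawn from each population differ.
        den += a * (1 - b) + b * (1 - a)
    return num / den


def bootstrap_ci(p1, p2, n1, n2, reps=1000, seed=0):
    """Percentile 95% interval from resampling SNPs with replacement."""
    rng = random.Random(seed)
    m = len(p1)
    estimates = []
    for _ in range(reps):
        idx = [rng.randrange(m) for _ in range(m)]
        estimates.append(hudson_fst([p1[i] for i in idx],
                                    [p2[i] for i in idx], n1, n2))
    estimates.sort()
    return estimates[int(0.025 * reps)], estimates[int(0.975 * reps) - 1]


if __name__ == "__main__":
    freqs_a = [0.10, 0.50, 0.80, 0.30]
    freqs_b = [0.20, 0.40, 0.60, 0.35]
    print(hudson_fst(freqs_a, freqs_b, n1=14, n2=16))
    print(bootstrap_ci(freqs_a, freqs_b, n1=14, n2=16, reps=200))
```

Summing the numerator and denominator across SNPs before dividing is what keeps the estimate stable when many sites have low minor allele frequency.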

Mitochondrial genomes: genetic diversity and matrilineal population structure

The 27 mitochondrial genomes sequenced here, along with the four that are publicly available (CM018075, MF409242, X72204 and one assembled from SRR5665644), include 28 unique mitochondrial haplotypes. The unique haplotypes were aligned with a sei whale sequence (NC_006929), the sister group to blue whales (Árnason et al. 2018), using MAFFT v7 (Katoh and Standley 2013). Gene regions (including codon positions) were identified and PartitionFinder v.2.1.1 (Guindon et al. 2010; Lanfear et al. 2017) was used to determine the best partitioning scheme and substitution models with the Bayesian Information Criterion. MrBayes v3.2.7 (Ronquist et al. 2012) was used to reconstruct the phylogenetic relationships between the blue whales with the sei whale as the outgroup. From the resultant phylogenetic tree, one blue whale sequence from each of the three clades descended from the two deepest nodes in the tree was selected, and these sequences were combined with the mitochondrial genome sequences of the sei, fin (NC_001321), humpback (Megaptera novaeangliae, NC_006927) and minke (Balaenoptera acutorostrata, NC_005271) whales. This second mitochondrial dataset was also partitioned using PartitionFinder2 but excluded the control region due to alignment ambiguity. A time-scaled phylogeny was estimated using the species divergence dates from Árnason et al. (2018) and the program BEAST v2.6.6 (Bouckaert et al. 2019) utilizing a relaxed lognormal clock with a Calibrated Yule model tree prior. The ages determined for the two earliest blue whale divergences then served as calibration points for another BEAST analysis using a lognormal relaxed clock and a coalescent constant population tree prior for the 28 blue whale mitochondrial haplotypes, producing a time-scaled blue whale phylogeny. All Bayesian analyses were run for 100 million generations multiple times and Tracer v1.7.1 (Rambaut et al. 2018) was used to check convergence and that effective sample sizes (ESS) were greater than 200.

A haplotype median-joining network was created using the program POPART (Leigh and Bryant 2015) with the default settings for the 31 mitochondrial genomes. Descriptive statistics including the number of segregating sites, haplotypes, haplotype diversity, nucleotide diversity and Tajima’s D were calculated for the mitochondrial genomes using DnaSP v. 6.12 (Rozas et al. 2017).
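Two of the descriptive statistics named above can be sketched in a few lines of Python. This is an illustration of the standard definitions (Nei's haplotype diversity and per-site nucleotide diversity), not a reimplementation of DnaSP. It assumes equal-length aligned sequences with no gaps or missing data, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(seqs):
    """Nei's haplotype diversity: n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    counts = Counter(seqs)
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))


def nucleotide_diversity(seqs):
    """Average number of pairwise differences per site (pi)."""
    n = len(seqs)
    length = len(seqs[0])
    # Count mismatching positions for every pair of sequences.
    total_diffs = sum(sum(a != b for a, b in zip(s, t))
                      for s, t in combinations(seqs, 2))
    n_pairs = n * (n - 1) / 2
    return total_diffs / n_pairs / length


if __name__ == "__main__":
    toy_alignment = ["AAAA", "AAAT", "AAAT", "TAAT"]  # made-up sequences
    print(haplotype_diversity(toy_alignment))
    print(nucleotide_diversity(toy_alignment))
```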

Population history and gene flow analysis with demographic model fitting

Three demographic models were tested using whole-genome data and FASTSIMCOAL v 2.7 (Excoffier et al. 2021) with each model estimating the time in generations of the split between the NEA and NWA blue whale populations. Separate effective population size parameters were estimated for both populations as well as for the common ancestor. The first model (“no gene flow”) had no gene flow between the populations and had a total of four parameters. The second model (“symmetric gene flow”) estimated a fifth parameter, the per capita rate of gene flow. The third model (“asymmetric gene flow”) had six parameters and allowed asymmetric rates of gene flow. Autosomal biallelic SNPs from the noncoding region (112,902) of the genome were utilized to generate the site frequency spectrum (SFS) for FASTSIMCOAL2. The SNPs were also filtered for any missing data, 10 kb apart and 10 kb from coding regions. ARLECORE from the ARLEQUIN software suite (Excoffier and Lischer 2010) was used to generate a folded minor allele SFS together with 200 bootstrapped SFS (for the non-parametric bootstrap analysis below). The number of invariant sites was estimated from our SNP dataset and manually inserted into SFS. We used a mutation rate of 1.39 × 10⁻⁸ substitutions per nucleotide per generation (Árnason et al. 2018) to calibrate the models. For each model we ran 100 independent runs of FASTSIMCOAL2, each with different initial starting parameters. Starting parameters were drawn from log-uniform distributions ranging from 100 to 1,500,000 generations for the three effective population size parameters and 1 × 10⁻⁸ to 0.01 for per capita migration rate parameters. The time of divergence was drawn from a uniform distribution ranging from 100 to 1,500,000 generations. All parameters were set to be unconstrained allowing FASTSIMCOAL2 to explore parameter values outside the initial parameter ranges.
Each run proceeded through 40 ECM cycles of hill climbing, with 200,000 coalescent simulations used to estimate the likelihood corresponding to each set of parameter value combinations tested. We re-estimated the likelihood of the best supported set of parameter values for each run using 10 million coalescent simulations. Simulation estimates were then used to choose the best set of parameter values and to determine the maximum likelihood value for each model. Support for the three models was assessed using Akaike weights. For the best supported model, 200 non-parametric bootstraps were used to estimate the 95% confidence intervals for model parameters. Each bootstrap had 100 independent runs, each starting from the initial maximum-likelihood parameter values. Fewer ECM cycles (20 rather than 40) were thus required as initial parameters were already in the right region of parameter space. Due to the computationally intensive nature of bootstrapping, fewer simulations (100,000 per set of parameter combinations) were used. Values of effective population size reported by fastsimcoal2 are for haploid individuals. We divided these values by two to transform them to effective population size for diploid individuals.
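The model comparison and the haploid-to-diploid conversion described above can be sketched as follows. This Python example is illustrative only. It assumes the maximum likelihoods are available as log10 values (hence the conversion to natural logs before computing AIC), and all function names are hypothetical.

```python
import math


def aic(log10_likelihood, n_params):
    """AIC = 2k - 2 ln(L), with the likelihood supplied in log10 units (assumption)."""
    ln_likelihood = log10_likelihood * math.log(10)
    return 2 * n_params - 2 * ln_likelihood


def akaike_weights(aic_values):
    """Relative support for each model: exp(-0.5 * delta AIC), normalised to sum to 1."""
    best = min(aic_values)
    rel = [math.exp(-0.5 * (a - best)) for a in aic_values]
    total = sum(rel)
    return [r / total for r in rel]


def haploid_to_diploid_ne(ne_haploid):
    """Halve a haploid effective size to express it in diploid individuals."""
    return ne_haploid / 2


if __name__ == "__main__":
    # Made-up log10 likelihoods for models with 4, 5 and 6 parameters.
    aics = [aic(-1250.0, 4), aic(-1240.5, 5), aic(-1239.0, 6)]
    print(akaike_weights(aics))
    print(haploid_to_diploid_ne(10000))
```

Because weights depend only on AIC differences, a model whose AIC exceeds the best one by 2 ln 9 (about 4.4) receives one ninth of its relative likelihood, which gives a 0.9 to 0.1 split in a two-model comparison.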

Heterozygosity

Genome-wide heterozygosity for present-day NA blue whale samples (> 15X) and the Antarctic historical sample was estimated in ANGSD (Korneliussen et al. 2014) based on the site frequency spectrum (SFS) using the infinite sites model. The SNPs for the present-day samples were filtered as in PCA and the SNPs from the historical sample were also filtered for deaminated cytosine residues (-noTrans). The inbreeding coefficients were estimated using PLINK (--het) in present-day NA blue whale samples (> 15X). The population-level mean diversity (total population heterozygosity) in the NA blue whale population (HT) was estimated using the basic.stats function of hierfstat in R.
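As a minimal illustration of how per-individual heterozygosity follows from a site frequency spectrum, the Python sketch below assumes a single-sample SFS with three categories (sites carrying 0, 1 or 2 copies of the derived allele). The function name and the example counts are hypothetical, and the sketch does not reproduce the ANGSD workflow itself.

```python
def heterozygosity_from_sfs(sfs):
    """Proportion of heterozygous sites from a one-individual SFS.

    sfs: three expected site counts for 0, 1 and 2 derived alleles.
    """
    if len(sfs) != 3:
        raise ValueError("a single diploid sample has exactly three SFS categories")
    total = sum(sfs)
    if total <= 0:
        raise ValueError("SFS must contain a positive number of sites")
    # The middle category holds sites where the individual is heterozygous.
    return sfs[1] / total


if __name__ == "__main__":
    print(heterozygosity_from_sfs([9975.0, 25.0, 0.0]))  # made-up counts
```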

Introgression

Gene flow between blue whales (present-day and historical) and fin, humpback and sei whales was investigated using D-statistics (Green et al. 2010), and the direction of gene flow and percentage of introgression were estimated with Dfoil (Pease and Hahn 2014), with minke whale as the outgroup (SRS439234). D-statistics were estimated using the four-taxon phylogeny of (((Antarctic Blue, NA Blue), Fin), Minke). The ABBA/BABA tests, where “A” is the ancestral allele and “B” is the derived allele, were performed in ANGSD (Korneliussen et al. 2014) with the option -doAbbababa, with mapping quality > 30 and quality score > 20. This test avoids bias due to differences in depth of coverage by sampling sites at each position of the genome (Korneliussen et al. 2014). The historical samples were also filtered for deaminated cytosine (-rmTrans). The jackknife procedure was used for standard error estimations. Similarly, to study blue-humpback (SRS4201634) whale and blue-sei (SRR5665645) whale introgression, analyses were conducted for (((Antarctic Blue, NA Blue), Humpback), Minke) and (((Antarctic Blue, NA Blue), Sei), Minke).
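The logic of the ABBA/BABA test can be sketched in Python as below. The example takes per-block counts of ABBA and BABA sites, computes D = (ABBA − BABA) / (ABBA + BABA), and derives a standard error and Z-score with a simple unweighted delete-one block jackknife. This is a simplification for illustration: it is not the ANGSD implementation, the function names are hypothetical, and the sign convention (positive D meaning excess allele sharing between the second and third taxa) is an assumption.

```python
import math


def d_statistic(abba, baba):
    """Patterson's D from per-block ABBA and BABA site counts."""
    total_abba = sum(abba)
    total_baba = sum(baba)
    return (total_abba - total_baba) / (total_abba + total_baba)


def jackknife_z(abba, baba):
    """Return (D, standard error, Z) using an unweighted delete-one block jackknife."""
    g = len(abba)
    total_abba = sum(abba)
    total_baba = sum(baba)
    d_full = d_statistic(abba, baba)
    # Recompute D with each block left out in turn.
    leave_one_out = []
    for i in range(g):
        a = total_abba - abba[i]
        b = total_baba - baba[i]
        leave_one_out.append((a - b) / (a + b))
    mean = sum(leave_one_out) / g
    variance = (g - 1) / g * sum((x - mean) ** 2 for x in leave_one_out)
    se = math.sqrt(variance)
    z = d_full / se if se > 0 else float("inf")
    return d_full, se, z


if __name__ == "__main__":
    # Made-up counts for four genomic blocks.
    print(jackknife_z([30, 28, 33, 31], [10, 12, 9, 11]))
```

A |Z| above 3 is the threshold used in the Results to call introgression significant.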

Dfoil was estimated using the phylogenetic relationship of (((Sei, present-day NA Blue), (Fin, Humpback)), Minke). Sites for these analyses were also filtered for being genotyped in > 50% of samples. The genetic identity of the fin whale used in the introgression analysis was verified by testing against another known fin whale and by its clustering in the first PCA analysis (Fig. 1C).

Results

Genome assembly

The NA blue whale genome assembly was 2.49 Gbp in 11,400 contigs (N50 of 1.46 Mb and L50 of 449). The genome contained 94.8% complete and 2.6% fragmented mammalian single copy genes. The total genome size estimated from Illumina reads was ~ 2.7 Gb, indicating that 92.6% had been assembled. A total of 255 contigs mapped to the sex chromosome and one contig aligned to the mitochondrial DNA (Árnason and Gullberg 1993). Repetitive elements comprised 46.2% of the genome. The predicted transcriptome included 30,867 genes, which represented 82.7% of complete mammalian single copy genes; 65.7% of the predicted genes had known Pfam domains.
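For reference, the N50 and L50 contiguity statistics quoted above can be computed from a list of contig lengths as in the Python sketch below. The function name and the toy lengths are hypothetical; the sketch follows the usual definitions (N50 is the length of the contig at which the cumulative length, sorted from longest to shortest, first reaches half the assembly size, and L50 is the number of contigs needed to get there).

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a collection of contig lengths."""
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            # 'length' is the N50 contig; 'count' contigs were needed (L50).
            return length, count
    raise ValueError("no contig lengths supplied")


if __name__ == "__main__":
    print(n50_l50([2, 2, 2, 3, 3, 4, 8, 8]))  # toy lengths, not real data
```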

Population structure

Principal component analysis

The first PCA grouped blue, fin and sei whales into species specific clusters (Fig. 1C), except for the NWa-CM1 which was intermediate between the blue and fin whale clusters, suggesting a blue-fin hybrid. The D-statistics tests also revealed that two other historical blues, NWa5 and NWa6, had significant fin whale introgression. All three whales were removed from further population analyses using nuclear sequences.

The kinship analysis did not identify any closely related blue whales; the most closely related pair was NW9/M6 (pi-hat = 0.11). Most NA samples clustered together on PC1 in the second PCA (Fig. 1D), except for NW9/M6 and NE-Ar/NE3. On PC2, NE4/NE73 was an outlier to the main cluster of NA samples. PC1 and PC2 accounted for 11.42% and 10.80% of total variability, respectively.

Phylogenetic analysis

The genetic relationships estimated using RAxML-NG and SVDQuartets, while not fully resolved, did indicate that all the NA blue whales were allied and distinct from the Antarctic blue whale (Fig. 1B and Fig. S2 for higher coverage whales only). The resultant tree placed the NEA whales nested within NWA whales. The bootstrap support values were strong for the basal nodes in the tree for several NWA branches in both analyses but were weaker for the shallower nodes including all the NEA whales and two NWA whales. For the analyses using a subset of the whales with the lower coverage historical NA whales removed, both resultant tree topologies again placed the NEA whales nested within the NWA whales but did not agree on which NWA whale sequence was the more divergent (Fig. S2).

Fst and population differentiation statistics

Genetic divergence estimated using Hudson’s Fst suggested moderate (0.21) genome-wide differentiation between NEA and NWA blue whales, with a 95% confidence interval of 0.21 to 0.22. Population allelic differentiation (Jost’s D) between NEA and NWA blue whales was 0.0077.

Fig. 1

Population structure of present-day and historical blue whales from the North Atlantic based on whole-genome sequencing data. (A) Map of blue whale sampling locations. (B) Phylogenetic trees estimated using RAxML-NG and SVDQuartets with only bootstrap support values above 50% presented (RAxML-NG above and SVDQuartets below). (C) Principal component analysis (PCA) of seven Northwest Atlantic (NWA) blue whales, eight Northeast Atlantic (NEA) blue whales, one Antarctic blue whale, seven fin whales from the North Atlantic and two sei whales using LASER. The circled sample represents the blue-fin hybrid historical sample (NWa-CM1). PC1 and PC2 accounted for 43.89% and 7.63% of the variability, respectively. (D) PCA of seven NWA blue whales and eight NEA blue whales. PC1 and PC2 accounted for 11.42% and 10.80% of the variability, respectively (see Supplement Figure S1 for PCA with NA blue whales and the Antarctic whale)

Mitochondrial genetic diversity and matrilineal population structure

We examined the complete mitochondrial genome sequences from 31 blue whales (Table 2). Genetic relationships among the blue whales reconstructed using these sequences revealed a tree distinct from that estimated using the nuclear sequences in that the Antarctic whale nested in amongst the NA blue whales (Fig. 2A). The tree has potentially five haplogroups, with the majority of the NEA and NWA whales sampled being in haplogroup A. Sister to haplogroup A were two poorly resolved clades consisting of an Antarctic, a Pacific, and a Southern Atlantic whale and one NWA whale (haplogroups C & B). Haplogroups D and E represented much more divergent mitochondrial lineages sharing successive common ancestors with haplogroups A, B and C dated at 167,000 and 201,000 years ago, respectively. The mitochondrial genome median joining network revealed consistent results to the mitochondrial phylogenetic tree with widely separated clusters among the blue whales sampled (Fig. 2B). Due to variation in mitochondrial genome coverage between modern and historical samples, sequence coverage was mapped onto the tree and network to verify clustering was not due to low coverage (see Figure S3).

Fig. 2

(a) Time-calibrated phylogeny of 25 unique North Atlantic, one South Atlantic, one Antarctic and one Pacific blue whale mitochondrial genome haplotypes. Scale bar represents 95% highest posterior density (HPD) for divergence estimates. Numbers at nodes denote posterior probabilities (only support for the basal nodes is reported). Lineages are coloured according to the region where the whale was sampled. (b) Mitochondrial haplotype median-joining network for blue whales with ticks along the branch lengths denoting nucleotide differences between haplotypes. The two longest branches have 30 and 32 substitutions respectively

Table 2 Summary of genetic diversity statistics for 31 blue whale mitochondrial genomes including segregating sites (S), number of haplotypes (h), haplotype diversity (Hd), nucleotide diversity (π) and Tajima’s D. NA means not applicable

Population history and gene flow analysis with demographic model fitting

Among the three gene flow models (‘no gene flow’, ‘symmetric gene flow’ and ‘asymmetric gene flow’) compared between blue whales from NEA and NWA, the model with no gene flow received almost no support (Table 3) and was rejected in favour of models with gene flow. A model with symmetric gene flow received an Akaike Weight of only 0.10 while a model with asymmetric gene flow received the greatest support with an Akaike Weight of 0.90. This best fit model suggested that the populations diverged 499 generations ago and that subsequent gene flow has been primarily unidirectional with substantial gene flow from the northwest to the northeast (Fig. 3) but very little gene flow in the reverse direction. The per capita rates of gene flow reported in Fig. 3 translate to 21.8 (18.6 to 25.5, 95% CI) individuals migrating from the west to the east each generation and 0.039 (0.00046 to 1.49; 95% CI) from the east to the west. Effective population sizes between the northwest and northeast were comparable at around 5,000 individuals while ancestral values were about four times these values.

Fig. 3

The best fit demographic model for the blue whale samples from the Northwest Atlantic (NW) and Northeast Atlantic (NE). Effective population sizes (Ne), divergence in generations ago (Tdiv), and rates of gene flow per 1000 individuals per generation are shown. The 95% confidence intervals of model parameters, based on 200 bootstrap replicates, are shown in brackets. Time flows from the top of the diagram to the bottom

Table 3 Support for three demographic models for northwest and northeast blue whales from the North Atlantic. ΔAIC = delta Akaike Information Criterion. The best supported model has a ΔAIC of 0

Heterozygosity and population history of NA blue whales

Genome-wide heterozygosity for the present-day NA and historical Antarctic blue whales was ~ 0.0025 (0.00250–0.00254) and 0.0053, respectively. The inbreeding coefficients for the present-day whales were low (< 0.02). The total population heterozygosity (HT) estimated for the NA population was 0.3952.

Introgression

All present-day NA blue whale samples showed significant fin whale introgression (Z-score > 3) (Table 4). Four of six historical blue whale samples also had evidence of fin whale introgression. The sample NWa-CM1 (from 1974) had a D-statistic value of 0.94, indicating a recent hybridization between fin and blue whale. Likewise, NWa5 and NWa6 (from the early 1900s) also had high D-statistic values, indicating they were hybrids. None of these hybrids carried fin whale mitochondria, indicating their mothers were blue whales. None of the blue whales sampled showed significant introgression with humpback whales. This is also the case with blue and sei whales, except for NE-Ar (D-statistic = 0.013; Z-score = 5.71). Dfoil statistics revealed unidirectional gene flow from fin to blue whale (Table S1) and that fin whale sequences constitute ~ 3.5% of the genomes of these NA blue whales.

Table 4 D-statistics analysis to detect presence of gene flow between the blue and fin whales with the four-taxon phylogeny (((Antarctic Blue, NA Blue), Fin), Minke)

Discussion

Our results provide the first insights into the population structure and demographic history of blue whales from the NA, and document levels of introgression with fin whales.

North Atlantic blue whale population structure

There has been uncertainty about whether NA blue whales consist of one or more populations. Photoidentification studies from both sides of the NA over the last several decades support more than one population, with only a single occurrence of an overlap between eastern and western whales (Sears and Calambokidis 2002; Ramp and Sears 2013; Sears and Perrin 2018; see also Silva et al. 2013). However, similarities in whale song across the NA, relative to blue whale calls in other ocean basins, suggest a single population or at most two with minimal differentiation (Clark 1994). Our findings based on the nuclear sequences examined here showed a moderate and statistically significant Fst between eastern and western blue whales but a low Jost’s D value indicating low allele differentiation between the two. Fst is influenced by within-subpopulation heterogeneity whereas Jost’s D relies on numbers of alleles within and among subpopulations (Alcala and Rosenberg 2019). The phylogenetic reconstructions using nuclear SNPs for both modern (high coverage) NA whales alone and for modern and historical samples combined do not support the premise of two reciprocally monophyletic clusters separated by geography; rather, the NEA blue whales sampled here were nested within the NWA whales. This was supported by PCA analyses, which showed overlap between the sampled whales, and by modelling using FASTSIMCOAL2, which rejected a scenario of no gene flow between whales from both sides of the NA.

The model favoured in FASTSIMCOAL2 has asymmetric gene flow, with many more whales migrating from west to east each generation than in the opposite direction. This asymmetry is intriguing in an environment with no known barriers and suggests an underlying driver. One possibility is oceanic circulation. The North Atlantic Current is a strong ocean current flowing west to east. Whales conceivably use ocean currents to conserve energy during long migrations (Lesage et al. 2017), though they are unlikely to be limited by them. However, oceanic circulation also contributes to clinal plankton biogeography in the NA, including a west-east zonal gradient, with the NWA less biodiverse, notably for calanoid species (Kléparski et al. 2021), which are important in the feeding ecology of NA krill (Schmidt 2010). Blue whales of the northeast Pacific are known to track spring/summer plankton blooms over time as they forage northward along the coast (Abrahms et al. 2019). It is possible that blue whales in the NA similarly track resource blooms spatially over time, contributing to patterns of gene flow over generations. Recent warming in the Barents Sea has led to an increase in krill biomass, secondary to changes in Atlantic advection (Eriksen et al. 2017), which could have implications for population recovery. Additional research is required to explore this further.

The phylogeny reconstructed from the maternally inherited mitochondrial genomes agreed with the tree estimated from the nuclear sequences in that there was no clear distinction between eastern and western whales. It differed in that the Antarctic blue whale sample was not basal to the NA whales but nested among them. This may reflect either unsorted ancestral polymorphism in the population or periodic long-range dispersal by females, including individuals moving into the NA from other blue whale populations.

Heterozygosity

We observed relatively high genome-wide heterozygosity in present-day NA blue whales, as was reported by Árnason et al. (2018), and in the one historical Antarctic whale sample. Population-level heterozygosity was also high, consistent with reports of high heterozygosity in Chilean and other Antarctic whales (Torres-Florez et al. 2014). Although blue whales were hunted to the brink of extinction, they currently exhibit high genetic variability. However, these animals are long-lived, and because the bottleneck happened so recently, a reduction in heterozygosity would likely not be observable for several generations. High variability along with low inbreeding has been associated with healthier reproductive outcomes and greater adaptability (but see Teixeira and Huber 2021), which could assist in the species’ recovery, as long as low population numbers do not persist for an extended number of generations.
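The expectation that heterozygosity declines slowly after a recent bottleneck follows from the standard drift approximation H_t = H_0 (1 - 1/(2Ne))^t, where H_0 is the initial heterozygosity, Ne is the effective population size and t is the number of generations. The sketch below applies this formula under idealized assumptions (constant Ne, random mating, no mutation or migration). The Ne values are purely hypothetical and are not estimates from this study.

```python
from math import ceil, log

def retained_heterozygosity(ne, generations):
    """Fraction of initial heterozygosity expected to remain after drift.

    Uses H_t / H_0 = (1 - 1 / (2 * Ne)) ** t for an idealized population.
    """
    if ne <= 0.5:
        raise ValueError("Ne must be greater than 0.5")
    return (1.0 - 1.0 / (2.0 * ne)) ** generations

def generations_to_lose(fraction, ne):
    """Smallest whole number of generations after which at least
    `fraction` of the initial heterozygosity is expected to be lost."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be between 0 and 1")
    per_generation = 1.0 - 1.0 / (2.0 * ne)
    return ceil(log(1.0 - fraction) / log(per_generation))

if __name__ == "__main__":
    # Hypothetical effective population sizes, for illustration only.
    for ne in (100, 500, 2000):
        kept = retained_heterozygosity(ne, generations=3)
        wait = generations_to_lose(0.05, ne)
        print(f"Ne={ne}: {kept:.4f} retained after 3 generations; "
              f"{wait} generations to lose 5%")
```

Under these assumptions, a population held at a hypothetical Ne of 500 would be expected to lose less than 1% of its heterozygosity over three generations, which illustrates why a bottleneck only a few generations old can leave genome-wide heterozygosity largely unchanged.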

Blue whale / fin whale hybridization

Blue and fin whales are the two largest animals on Earth, and hybrids between these species have been observed on multiple occasions (Árnason et al. 1991; Pampoulie et al. 2020). This is remarkable as they are not sibling species and diverged ~8.35 million years ago (Árnason et al. 2018). We demonstrated gene flow between NA blue and fin whales in all our present-day and four (of six) historical samples, including a first-generation hybrid. The recent gene flow that we recorded between fin whales and NA blue whales was not detected in the previous study by Westbury et al. (2019). Our findings are partially in agreement with Árnason et al. (2018), but we detected gene flow only between blue and fin whales, not between humpbacks and blues. Also, the contribution of fin whale DNA to the blue whale genome that we found (3.5%) was larger than that reported in Árnason et al. (2018), and the gene flow was unidirectional, from fin whale to blue whale. Important differences in our analyses are a larger sample size, the use of a de novo assembled blue whale genome as the reference, and comparisons between different blue whale populations, namely the Antarctic and NA, rather than between different whale species as was done in these other studies. Only two of our whale samples from the early 20th century lacked any sign of introgression. Our results indicate that the recent introgression observed in NA blue whales took place after the separation between the NA and Antarctic subspecies. Hybridization between species can occur naturally or result from breeding disruptions promoted by anthropogenic activities and effects. While the abundance of fin whales was negatively impacted by whaling (Wolf et al. 2022), their numbers still greatly exceed those of blue whales globally. While male fin whales are smaller than their blue whale counterparts, they have comparable cruising and sprinting speeds (Sears and Perrin 2018; Aguilar & García-Vernet 2018), which could make male fin whales competitive during courtship chases where blue and fin whales are sympatric.

Hybridization can in some cases lead to the extinction of distinct species through introgressive swamping of the genome (Rhymer and Simberloff 1996). Our relatively small sample sizes from early whaling and post-whaling blue whales preclude drawing conclusions on whether hybridization rates are changing. However, the high frequency of occurrence indicates a need to examine greater numbers of both recent and historical samples to ascertain if there is a threat of genetic swamping of NA blue whales by fin whales.

Conclusions

Here we present the first collection of present-day and historical blue whale genomes, both nuclear and mitochondrial, from samples across the NA. Results of our study indicate that the genetic structuring of blue whales in the NA, although statistically significant, is more nuanced than simple models of one or two populations. Fst analyses show statistically significant genetic structuring between eastern and western blue whales; however, asymmetric gene flow from west to east is occurring across the NA, and phylogenetic reconstructions place the eastern whales nested within the western whales. Future conservation actions and management policies should be informed by this complexity. The NEA appears to be an area for potential blue whale population recovery given the dramatic increases in krill stocks in the Barents Sea. Hybridization of blue whales with fin whales appears frequent in the NA, and future rates of hybridization should be monitored. While genetic variability in NA blue whales is high, this is likely due to the recency of the bottleneck and the species’ long generation time, since it takes time for the impact of population reduction to become visible in the genomes. If blue whale populations do not increase from their current low levels, a reduction in heterozygosity will probably occur over time. Further studies with larger sample sizes for present-day and historical blue whales from around the world should be conducted to better define populations and subspecies and to examine gene flow, in order to help plan global conservation efforts for this endangered species.