1 Introduction

Humpback whale (Megaptera novaeangliae) populations in the North Pacific Ocean were severely overexploited by whaling and were reduced by over 90 % from an estimated pre-exploitation population size of 15,000 individuals (Johnson and Wolman 1984; Rice 1978). By the end of commercial whaling in the 1960’s, this population was at an all-time low, estimated at 1,200 by Johnson and Wolman (1984), and 1,400 by Gambell (1976). Various protective measures were implemented to support recovery of this overexploited population, including: an International Whaling Commission (IWC) ban on humpback whale harvest in 1965, humpback whales were listed under the Mammal Protection Act (MMPA) of 1972, and listed under the Endangered Species Act (ESA) in 1973. Today, four decades later, humpback whales are still listed under the MMPA and ESA and are protected from commercial harvest by the IWC; however, abundance estimates of this species in the North Pacific Ocean are rare and often rely on limited data.

Since the end of large-scale industrial whaling, humpback whales in the North Pacific have been increasing in abundance. Using mark-resighting from 1990 to 1993, it was estimated that there were 10,000 individual humpback whales in the central and eastern North Pacific Ocean (Calambokidis et al. 1997). In 2004–2006 an international, collaborative research effort known as SPLASH (Structure of Populations, Levels of Abundance and Status of Humpbacks) was conducted throughout the entire North Pacific Basin (Calambokidis et al. 2008). As part of this study, population estimates were made using photographic identification in a mark-recapture analysis. They concluded that, at the end of their study in 2006, there were nearly 20,000 humpback whales in the North Pacific. Given a historic estimate of 15,000 individuals, the SPLASH abundance estimate suggests that there were more humpback whales in the North Pacific in 2006 than there had been prior to the onset of exploitation. In addition to estimating population size, SPLASH also calculated annual rate of increase (ROI) (Calambokidis et al. 2008). When they contrasted their estimates to that of 1993 of approximately 10,000 in the North Pacific (Calambokidis et al. 1997), they found a 4.9 % ROI over the 13 year period. However, when they instead compared against the 1966 population estimate of 1,400 (Gambell 1976), they found a 6.8 % ROI over the 39 year period. Given these results, SPLASH concluded that humpback whale abundance in the North Pacific is increasing at 5–7 % annually. In a different study, the maximum theoretical biological ROI was estimated for humpback whales using best-case scenario values for adult survival, young-of-the-year survival, pregnancy rates, and age at first reproduction. Here, it was found that the maximum plausible ROI due to biological reproduction is 11.8 % (Zerbini et al. 2010). While these studies arrived at different estimates of ROI, it still can be concluded that the humpback whale population of the North Pacific Ocean is increasing and may have reached or exceeded pre-exploitation abundance levels.

Humpback whales are migratory; in the North Pacific Ocean they generally spend summers in northern latitudes feeding and winters in tropical waters breeding and calving (Johnson and Wolman 1984). While on the feeding grounds, humpback whales use their baleen to filter small schooling forage fish and invertebrates, such as Pacific herring (Clupea pallasii), capelin (Mallotus villosus), the juvenile stages of Pacific salmon (Oncorhynchus spp.), and euphausiids (Nemoto 1970; Straley 2009, pers. comm.). Based on energetic analyses it was estimated that humpback whales consume approximately 0.4 tons of prey each day while on the feeding grounds (Witteveen et al. 2006). While humpback whales can be found feeding in Alaska every month of the year, their abundance peaks in summer and fall months and they are least abundant in the late winter and spring (Baker et al. 1992; Rice et al. 2011; Straley 1990, 1994).

One important feeding ground for humpback whales in the North Pacific is Prince William Sound (PWS). At present, very little information exists on the current and historic abundances of humpback whales in PWS. Only a few historic point estimates of humpback whales in PWS have been made including 84 (SE \(=\) 11) individuals in 1980–1983 and 100 (SE \(=\) 20) individuals in 1983–1984, both found using a Chapman mark-recapture estimator (Matkin and von Ziegesar 1985) and an estimate of 19 individuals in 1985 using a Schnabel estimator in a low effort year with few whales encountered and identified (von Ziegesar and Matkin 1986). From these estimates, the population in PWS was estimated to increase at 6 % annually, which is in the range of estimates for the entire North Pacific Basin determined by SPLASH (von Ziegesar and Matkin 1986). No other estimates of abundance were made until 2007–2009, when a field study was undertaken to investigate the magnitude and composition of prey consumption by humpback whales in PWS (Rice et al. 2011). Using mark-recapture methods, late-season abundance of humpback whales was estimated at 135 (SE \(=\) 12) whales in September-March of 2009 (Rice et al. 2011).

In this paper we use a comprehensive dataset of 30 years, from 1978 to 2009, of humpback whale surveys in PWS from Eye of the Whale Research (EOW) and North Gulf Oceanic Society (NGOS) to examine trends in abundance and develop estimates of humpback whale population size. This dataset has been used to catalog individual humpback whales in the area and for various other analyses (von Ziegesar and Matkin 1985; von Ziegesar 1992; von Ziegesar et al. 2001; von Ziegesar 2010, 2012); however, no recent estimates of humpback whale population size or population change over time have been generated from these data.

This project uses this dataset to develop the first time series of humpback whale abundance in western PWS. Specifically, we employ sightings per unit of effort and mark-recapture to evaluate relative and absolute abundance, respectively. Estimates are then used to characterize the trends in abundance for this area and compare them to the trends documented for the entire North Pacific Basin over the same time period. Given that such long datasets of mark-recapture data are rarely available, this presents an opportunity for evaluation of the performance of a commonly used open population model. To our knowledge, this is the longest time series used in any mark-recapture analysis of humpback whale abundance.

2 Methods

2.1 Study area

Prince William Sound is located in the Gulf of Alaska (\(60^{\circ } 35^{\prime }\,\hbox {N}, 147^{\circ } 10^{\prime }\,\hbox {W}\)), and is comprised of numerous islands, glacial fjords, and bays. The western portion of PWS was designated as our study area (Fig. 1), as this is where the majority of surveys from the dataset were conducted. The dataset also includes some data from neighboring areas; however, because these areas are not consistently surveyed over the time series, these data were removed from our analyses.

Fig. 1
figure 1

Map of Prince William Sound, Alaska, showing the designated study area of western Prince William Sound (circled)

2.2 Effort

Three measures of effort were available from the dataset: nautical miles traveled, hours spent searching, and number of boat-days. For early years, where track lines were recorded by hand on photocopied maps, track lines were manually digitized. In more recent years, many of the surveys had digital track lines from handheld GPS units. For surveys conducted within the defined study area, each measure of effort (nautical miles, hours, and days) was totaled by year, and the different measures were analyzed to determine which was most appropriate for this study (Table 1). Many surveys had no record of the track line taken and/or the number of hours spent searching; years known to have \(>\)10 % missing effort data are indicated in Table 1.

Table 1 Days, nautical miles (nm), and hours of effort in western Prince William Sound by year

2.3 Description of available data

The dataset includes humpback whale sightings and identification photos of the ventral surface of humpback whale flukes, taken annually between April and September from 1978 to 2009. The documentation of the unique pigmentation patterns of the ventral flukes is a commonly used method to identify and track individual humpback whales over time (Katona et al. 1979). No data were collected in 1982, and data from 1981 was thrown out because effort in this year was relatively small and data collection was compromised by weather complications. Photos were taken using single-lens reflex cameras from 1978 to 2002 and using digital cameras from 2003 to 2009. Photo identification records include \(>\)5,000 photographic sightings of \(\sim 575\) individual humpback whale flukes over the time series. Depending on the year, between one and three research teams collected data on separate boats. Research teams however had different specific goals; namely, some specifically targeted humpback whales (von Ziegasar, EOW), while others targeted killer whales (Orcinus orca) and gathered humpback whale data only opportunistically (Matkin and Saulitis, NGOS). The total number of whales sighted, unique identifications, and newly identified individuals are summarized by year (Table 1).

2.4 Data organization

Data describing effort, humpback whale sightings, and individual characteristics in Microsoft Excel files (EOW) were transferred into the Alaska Humpback Whale Database, a Microsoft Access database designed specifically for storage and querying of humpback whale photographic data (developed by co-author Straley and colleagues). The database contains tables to hold information pertaining to a given whale at the time of the sighting, unique and long-lasting characteristics of that whale, and information specific to the survey effort. During this entry process, sighting histories were error-checked by verifying Excel entries with actual photos and with hard-copy datasheets. The best photograph of each individual whale each year was identified and, for years where photos were archived as negatives (1987–2002), the photos were digitized by scanning the negatives.

2.5 Humpback whale time series: count per unit effort

All recorded sightings of humpback whales—both those whales that were photographed and those that were sighted and noted only—were tallied for each year. An annual index of relative abundance was calculated as the number of sightings divided by the amount of effort. A quadratic regression line was fitted to determine if there was a linear or curvilinear trend over time. Additionally, an index of the number of unique whales identified each year divided by the amount of effort was also constructed. Again, a quadratic regression line was fitted to assess any trend over time.

2.6 Humpback whale time series: mark-recapture modeling

2.6.1 Mark recapture modeling consideration: photo quality

Quality of identification photos was considered for the mark-recapture portion of this analysis, as poor quality photos can cause misidentification of individuals and result in bias in a mark-recapture setting. Typically, in the case of photo identification of humpback whales, it is unlikely to have false positive errors; namely, instances where photos are wrongly designated as re-sightings. However, the occurrence of false negative errors-instances where photos are wrongly designated as new sightings when they are actually resightings-increases with the inclusion of poor quality photos (Friday et al. 2000, 2008; Stevick et al. 2001). Efforts were taken to reduce the occurrence of this type of error by removing the photographs determined to be of poor quality. This was done by ranking the quality of the photo with a value of 1 through 3 in six categories. Values from each of the six categories were then summed to generate an overall index of quality for each photo; thus, index values could range from 6 to 18, with lower values indicating a better quality photo. Criteria used to rank these photos are a modified version of the protocols described by othersFootnote 1 (Calambokidis et al. 2008; Friday et al. 2000). We deemed all photos with a total photo quality index greater than or equal to 9 to be “poor” and removed them from the analysis.

2.7 Mark recapture modeling consideration: fluke distinctiveness

Fluke distinctiveness could also play a role in producing false negative errors, instances where photos are wrongly designated as new sightings, which would introduce positive bias in abundance in a mark-recapture analysis (Friday et al. 2000, 2008; Stevick et al. 2001). Flukes bear wildly different patterns and markings, making some individuals more identifiable than others. Thus, it is more likely that a match of an individual with nondescript flukes would be overlooked resulting in a false negative error. To evaluate the importance of fluke distinctiveness in the sighting histories of this dataset, we evaluated all flukes in this analysis (4306 sightings of 356 individual whales) by ranking them with four criteria. The ranking system usedFootnote 2 is a modified version of that by Friday et al. (2000). Each pair of flukes was assigned a value, 1 through 3, in each category. Values from each of the four categories were then summed to generate an overall index of fluke distinctiveness for each pair of flukes; index values could therefore range from 4 to 12, with lower values indicating more distinct flukes. This index was then used to measure the importance of fluke distinctiveness in recapture probability (after the photo quality standard was applied) by assessing the relationship, if any, between total number of sightings for each individual and the distinctiveness index of its flukes. Additionally, this index was used as an individual covariate to the probability of capture in mark recapture models as explained in later sections.

2.7.1 Mark recapture modeling

For this study, we chose only open-population models, because population-level changes of humpback whales over many years result from recruitment (births or immigration) and loss (death and emigration). The mark-recapture program MARK was used to obtain maximum likelihood parameter estimates (MLE’s) and to compare various models using the POPAN implementation of the Jolly-Seber model (Schwarz and Arnason 1996; White and Burnham 1999). Like most commonly-used mark-recapture models, the Jolly-Seber model generates estimates of parameters by modeling sighting histories as a multinomial distribution. Its structure allows several model configurations to be estimated, permits the inclusion of covariates, and allows for parameters to be added, fixed, and set equal across time periods, groups, and strata. Sightings of all first year calves were removed from the mark-recapture analyses as their probability of capture is complicated by co-occurrence with their mothers.

The POPAN procedure in program MARK runs a variety of the widely used and accepted models for open populations (Schwarz and Arnason 1996; Schwarz 2001) and was chosen because it generates annual estimates of abundance \((\hat{N}_{t})\) to form a time series. This model is a modified version of the traditional open Jolly-Seber model and, based on the sighting histories, estimates survival, \(\phi _{t}\), and probability of capture, \(p_{t}\), “births” or net number of new entrants into the population (pent) in year \(t\), and \(\hat{N}\), the overall population size. Estimates of standard error come from the variance-covariance matrix, and the estimates of the lower and upper 95 % confidence intervals of the \(\phi \)’s and\( p\)’s are made using the likelihood profile for the MLE (Lebreton et al. 1992).

We also examined a commonly-used alternative model from Pradel (1996), because of the addition of a population growth rate parameter. However, we limit our reporting on the Pradel model, because its estimates are very similar to those of the Jolly-Seber model yet less precise.

The assumptions for the Jolly-Seber model are as follows Seber (1982):

  1. 1.

    Each animal in the population has the same probability of capture, \(p_{t}\), at sampling occasion \(t\).

  2. 2.

    Each animal has the same probability of survival, \(\phi _{t}\), at a given interval between sampling occasions \(t\) and \(t+1\).

  3. 3.

    Marks are not lost or misidentified, and all marks are reported.

  4. 4.

    Sampling is instantaneous, relative to the interval between sampling occasions.

Various Jolly-Seber model configurations were developed within MARK and compared to identify the best model. The \(\phi \) parameters were modeled as dependent on time period, denoted \(\phi (t)\); or constant, \(\phi (.)\). The \(p\) parameters were modeled with the following scenarios: time-dependent, \(p(t)\); constant, \(p(.)\); a linear function of effort, \(p\)(eff); dependent on fluke distinctiveness through use of an individual covariate, \(p\)(FD); and dependent on a combination of both time and fluke distinctiveness, \(p(t,\hbox {FD})\). It was expected that there would be variability in capture, but unknown if relative effort or the more flexible time effects would best explain this variability. Further, fluke distinctiveness was suspected to affect sightability (see Sect. 4.3.1). Therefore, models with all possible combinations of \(\phi \) and \(p\) configurations were constructed, because no combination could be ruled out in terms of biological plausibility. The parameter for the probability of new recruits (entrants), pent, was modeled dependent on time in all cases because it is well known that the number of births varies from year to year. The overall population estimate, \(\widehat{N}\), was estimated, which is the superpopulation size or the number of whales available at some point throughout the duration of the study. From \(\widehat{N}\), annual estimates of abundance, \(\widehat{N}_t\) were derived within the MARK software.

Competing models were evaluated using the Akaike’s Information Criterion (AIC) adjusted for small sample sizes, AICc (Burnham and Anderson 2002, pg 128). The best model was selected using the guidelines offered by Burnham and Anderson (2002), pg 128: the model with the lowest AICc value is considered “best” and other models are given in terms of the difference, \(\Delta \)AICc with the best model. Any model with \(\Delta \)AICc \(\le \) 2 has “...no credible evidence that the model should be ruled out”. Models with \(\Delta \)AICc of 2–4 offer “weak evidence of not being the best model”, models with \(\Delta \)AICc of 4–7 offer “definite evidence”, \(\Delta \)AICc of 7–10 offer “strong evidence”, and \(\Delta \)AICc \(>\)10 offer “very strong evidence of not being the best model”. Goodness of fit was examined through deviance residuals and no unwanted patterns were seen.

2.7.2 Sensitivity analysis: early year photo quality

Photos from some early years were not available to be assessed for quality (specifically: 1980, 1983–1985, and 1987). Because there is no means of measuring or compensating for this, we anticipated that there would be some level of bias introduced by data from these years. To address this, we conducted a sensitivity analysis to determine the effect of these early years on the remainder of the time series. We did this by re-running the selected Jolly-Seber model using only data from 1988 to 2009, years where all photos were assessed for quality. Estimates from this model were then compared to estimates from the complete time series both graphically and by calculating the correlation coefficient for the overlapping portion of both time series.

2.7.3 Sensitivity analysis: survival

To assess the impact of emigrating whales on the abundance estimates, another sensitivity analysis was conducted. Survival probability fixed equal to 0.98 in the selected model and the other parameters were re-estimated. A survival probability of 0.98 was selected because this is the rate that was found in the only other assessment of survival of humpback whales in PWS (Mizroch et al. 2004). The results were then compared to those from the base Jolly-Seber model to assess how a different survival would affect the model outputs.

2.8 Comparison of time series approaches

Abundance time series results from all four time series models, the two count per effort models and the abundance estimates from Jolly-Seber and Pradel mark-recapture models, were compared in a correlation analysis to evaluate their similarity for this dataset.

2.9 Rate of increase

The ROI, frequently used in demography, is defined as the geometric mean of the average annual population growth rate minus 1 (expressed as a percentage rate of increase). It is calculated in Eq. (1) from the changes in logarithms of abundance estimates \(\left\{ {\Delta _y =\Delta \, \hbox {ln}(\widehat{N}_y)}\right\} \) produced by the Jolly-Seber model

$$\begin{aligned} ROI=\hbox {exp}\left( {\frac{1}{Y}\sum \Delta \,\hbox {ln}\left( {\widehat{N_y}}\right) } \right) -1=\exp \left( {\overline{\Delta }}\right) -1, \end{aligned}$$
(1)

in which \(\overline{\Delta } = \sum \Delta _y/Y\) and Y is the number of years. The approximate SE of ROI (equal to the SE of ROI + 1) was found from the delta method (Seber 1982, pp. 7–9). It is calculated as in Eq. (2)

$$\begin{aligned} SE(ROI)=\overline{\Delta } SE\left( {\overline{\Delta }}\right) =\left( {ROI+1}\right) SE\left( {\overline{\Delta }}\right) , \end{aligned}$$
(2)

for which the standard error \(SE(\overline{\Delta })\) is found empirically in Eq. (3):

$$\begin{aligned} SE\left( {\overline{\Delta }}\right) =\left( {\frac{1}{Y}} \right) \sqrt{\sum SE^{2}(\Delta _y )}, \end{aligned}$$
(3)

from the estimates \(\{SE(\Delta _y)\}\) produced by the MARK software. This calculation ignores estimated covariances between the \(\{\Delta _y\}\) values, but these were small compared to the variances of \(\hat{N}\). Equation (4) was used to approximate a 95 % confidence interval for ROI.

$$\begin{aligned} \textit{LCI}(\textit{ROI}),\textit{UCI} (\textit{ROI})=\textit{ROI}\pm 1.96 \textit{SE} (\textit{ROI}) \end{aligned}$$
(4)

This method is the same used by Calambokidis et al. (2008) to generate ROI estimates for the SPLASH study, and was employed to make direct comparisons possible.

3 Results

3.1 Effort

From examination of effort data (Table 1), our preferred effort measure for this study is in days, as it was available consistently for all years. Furthermore, effort in days is comparable to effort in nautical miles, which normally would be thought to be a more accurate measure.Footnote 3 In years where nautical miles traveled were consistently recorded, 1984–2002 and 2004, nautical miles and number of days on the water were significantly correlated (Pearson \(r = 0.99, p < 0.001\)).

3.2 Humpback whale time series: count per unit effort

Counts of humpback whale sightings from this study ranged from 0.58 whales per day in 1984 to 5.03 whales per day in 1980 (Fig. 2). There was no significant trend in this time series (\(p = 0.19\)). Number of unique individuals identified each year divided by days of effort ranged from 0.23 in 1985 to 2.00 in 1979 with a trend that was similar to the average rate.Footnote 4 As in the sightings per effort time series, there was no significant trend in the unique individual per effort time series (\(p = 0.48\)).

Fig. 2
figure 2

Average humpback whale count per day of effort for each year; all whales seen and recorded were included in the count, regardless of whether or not the whale was photographically identified. Error bars are 1 standard deviation. Trend line is a linear regression

3.3 Humpback whale time series: mark-recapture modeling

3.3.1 Mark recapture modeling consideration: fluke distinctiveness

The number of sightings of all individuals as a function of their fluke distinctiveness index had a slight, yet significantly increasing trend (\(p = 0.015\), Fig. 3). This indicates that some level of heterogeneity bias may be present in the dataset based on recognizability of flukes.

Fig. 3
figure 3

Number of sightings of each individual showed plotted by fluke distinctiveness index value

3.3.2 Mark recapture modeling

Models including \(\phi (t)\) were unrealistic due to parameter estimates converging too close to the boundary of 1. This is likely indicative of too sparse data and the difficulty in detecting survival for a species with such high survival rates; results from these models are not presented. Model \(\phi (.), p(t), pent(t), N\) was selected as the best Jolly-Seber model (Table 2). This model has constant survival, \(\phi \), over the time series, estimated at 0.94 (95 % CI 0.93–0.95), and time dependent probability of capture, \(p\), with estimates ranging from 0.16 to 0.57 (Table 3), and no obvious trend over years (Fig. 4). Further, there were no issues with parameters converging on the boundary in this model, so there was no need to fix parameters equal and we had separate estimated parameters for \(p\) in each year of the time series. The mark-recapture model with an effort covariate had a significantly poorer fit with \(\Delta \)AIC much greater than the criterion of 10. Estimates of \(p\), did, however, show a significant linear relationship (\(p = 0.034\)) with effort (Fig. 5), which was not improved by adding a quadratic term. The addition of fluke distinctiveness index values as an individual covariate did not improve the model fit (Table 2). The general trend in abundance \((\hat{N}_{t})\) was quadratic and both variables in the polynomial were significant (Year, \(p = 0.0030, \hbox {Year}^{2}, p = 0.0031\); Fig. 6). The overall abundance estimate, \(\hat{N}\), was 396 (95 % CI 371–431). Estimates of the pent parameter are not reported in this paper, but are available from the senior author.

Table 2 Summary of Jolly-Seber mark-recapture model fits, analyzing sighting histories from 1978–2009, based on survival (\(\phi \)), and probability of capture \((p)\)
Table 3 Estimates and corresponding estimated standard error (SE) for probability of capture, \(\hat{{p}}\), and the derived estimates of abundance, \(\hat{N} \), by year in the selected Jolly-Seber model, \(\phi (.), p(t),\, pent(t), N\)
Fig. 4
figure 4

Capture probability parameter estimates and corresponding approximate 95 % confidence intervals the parameters in the selected Jolly-Seber model, \(\phi (.), p(t), pent(t), N\)

Fig. 5
figure 5

Estimates of probability of capture, \(p\), from the selected Jolly-Seber model versus the number of effort days. Error bars indicate approximate 95 % confidence intervals

Fig. 6
figure 6

Abundance estimates from the selected Jolly-Seber model (black squares) by year. The number of unique individuals for each year is shown in gray diamonds. No data were available for 1981 or 1982. Error bars indicate the approximate 95 % confidence intervals

3.3.3 Sensitivity analysis: early year photo quality

Abundance estimates from the same Jolly-Seber model, \(\phi \)(.) \(p(t), pent(t), N\) that was re-run using only years with complete assessments of photo quality (1988–2009), showed a statistically significant correlation between paired points in overlapping years of 0.99 (\(p < 0.001\); Fig. 7). This high degree of correlation indicates that photographs from early years were not introducing bias in estimates of later years.

Fig. 7
figure 7

Abundance estimates of humpback whales in PWS from Jolly-Seber models using all available data, 1978–2009 (black squares), or using only 1988–2009 data (gray diamonds). The gray diamonds are offset to make them separate from the black squares. No data were available for 1981 and 1982

3.3.4 Sensitivity analysis: survival

The Jolly-Seber model, \(\phi \)(.), \(p(t), pent(t), N\), was rerun with a pre-defined constant survival (making the model, \(\phi = 0.98, p(t), pent(t), N\); Fig. 8). There was a high correlation between the paired annual abundance estimate points of these model results and the original Jolly-Seber model, however, it had a much poorer fit (\(\Delta \) AICc = 114.6).

Fig. 8
figure 8

Abundance of humpback whales in Prince William Sound from Jolly-Seber models where survival is estimated (black squares) and where survival is fixed to 0.98 (gray diamonds). The gray diamonds are offset to make them separate from the black squares. Polynomial trend lines for both model results are displayed on the graph; the black solid line shows the trend for the model with estimated survival and the gray dashed line shows the trend for the model with fixed survival

3.4 Comparison of time series approaches

Neither of the count per effort time series was correlated with either the Jolly-Seber or Pradel mark-recapture time seriesFootnote 5 (Table 4). The count per effort and the number of unique IDs per effort time series had a small amount of correlation. The time series from the two mark-recapture approaches were highly correlated (\(p < 0.001\)), showing that they produced extremely similar trends (Table 4).

Table 4 Correlation matrix comparing the different time series models

3.5 Rate of increase

The ROI estimate from the Jolly-Seber model results was 4.53 % (95 % CI 3.28–5.79 %). This estimate of ROI is comparable to the SPLASH ROI estimate (Calambokidis et al. 2008) with UCI overlapping with the lower end of the SPLASH estimate of 5–7 %. There was no significant change in the individual annual rate of population change over time (\(p = 0.68\)). Our estimates are feasible as they are well within the estimated maximum theoretical biological ROI of 11.8 % per year (Zerbini et al. 2010).

4 Discussion

4.1 Effort

Although we found the number of survey days to be the best effort measure for this analysis, a finer scale would have been preferable because the time and distance covered in a given survey day are expected to be variable. However, due to missing data values, the number of survey days per year was the only complete measure of effort available over the complete time period. In addition, we were able to show that this measure of effort was a reasonable proxy by demonstrating that it was highly correlated with nautical miles for those years where nautical miles was available. We attempted to include effort as a covariate to probability of sighting as this would have decreased the overall number of parameters in the model; however, effort data was not related to probability of sighting in the mark-recapture models. Therefore, we did not select models that included effort covariates for this dataset.

4.2 Humpback whale time series: count per unit effort

Count per unit effort time series, in the form of number of sightings and number of unique individuals sighted per unit of effort, did not appear to provide reliable information about population trends for this dataset. Based on local knowledge, and known broader scale population trends (specifically, the North Pacific wide study, SPLASH), we would have expected to see a generally increasing trend in abundance over time. That we did not is likely due to a saturation effect inherent in the data collection procedures. Each whale seen in a survey requires an appreciable dedication of time in order to successfully photograph the ventral flukes of the animal for identification. This generally limits a researcher to a maximum of 10–20 individual whale encounters in a day. So, even if whale densities were increasing, it would not be possible to detect these changes beyond a certain density.

Another contributing feature to the implausibility of the count per effort series is that field sightings are complicated by many factors aside from the actual time or length of the survey in our study, including: weather conditions, sightability, prior knowledge of humpback whale locations, logistical issues, and some research teams gathering humpback whale data opportunistically only. Without the addition of more personnel, any combination of these factors would make the traditional time and distance measurements less reliable. To avoid these issues it would be necessary to design a rigorous study with predefined transects, weather and sightability constraints, and consistent on- and off-effort time and distance measurements.

The large variability in estimates is likely reflective of the nature of the survey logistics. First, the complex geography of the study area makes surveys more challenging as whales can be found deep in bays and inlets that are not observable from the main water channels. Secondly, the study area is larger than could be covered in a single day’s survey, and because humpback whales are not found homogeneously throughout the study area, some survey days had relatively high whale counts, whereas others yielded zero whale counts.

4.3 Humpback whale time series: mark-recapture modeling

4.3.1 Mark recapture modeling consideration: fluke distinctiveness

We know from the relationship between sighting frequency and the fluke distinctiveness index that there is some bias toward more distinct flukes in the sighting histories. There are two possible mechanisms in the photo identification process that could have contributed to this bias: (1) removal by photo quality cutoff and (2) misidentification errors. Despite attempts to evaluate photo quality and distinctiveness of flukes independently, it is possible that more distinct flukes were more likely to meet the photo quality criteria than less distinct flukes since these measures are inherently subjective and rankers easily mistake recognizability with photo quality. In a mark-recapture setting, this bias works to give more distinct flukes a higher probability of capture and thus is a violation of the homogeneity assumption for capture probability of all animals. The over-dispersion parameter, \(\widehat{\hbox {c}}\), is estimated in program MARK, but we were not confident in its reliability because there are many capture histories with low frequencies of occurrence. Therefore we did not attempt to correct for over-dispersion in this manner. Further, heterogeneity usually creates a negative bias, which is not of much concern here.

Alternatively, the relationship between sighting frequencies and the fluke distinctiveness index could occur by false negative misidentification errors, or instances where matches are overlooked. This could be caused by a significant change in pattern on the flukes between sightings, or may simply be the result of a matching error. Both are more likely to occur if the fluke is originally less distinct (Friday et al. 2000). One study of Atlantic humpback whales demonstrated that over eight years, 4.6 % of the 152 flukes analyzed showed a major pattern change, 31.8 % showed a moderate change, and 63.6 % showed little or no change (Carlson et al. 1990). Given the particularly long temporal scale of the sightings used in this analysis, it is likely that some portion of these flukes underwent a degree of change in fluke pattern, contributing to the misidentification error. Misidentification is equivalent to a tag loss in a mark-recapture setting, which works to inflate population size estimates. If the degree of bias toward more distinct flukes was consistent over the time series, our ROI estimates would not be affected.

Studies that have specifically addressed bias from humpback whale photo identification have shown that misidentification and heterogeneity error is small, and can be mitigated for. A double tagging study by Stevick et al. (2001) showed that false positive errors were absent altogether and false negative errors were negligible (0.1 %). Heterogeneity was assessed in a study by Friday et al. (2008) where the authors concluded that the most accurate results could be found by simply removing poor quality photos. In the present study, we implemented a photo quality standard and built models including fluke distinctiveness covariates and found that they were not selected based on AICc values. Consequently, we conclude that the overall influence from fluke distinctiveness and heterogeneity in this study was minimal.

4.3.2 Jolly-Seber models

In our Jolly-Seber mark-recapture model, an overall increasing trend in abundance demonstrates that the western PWS humpback population is in recovery from the past over-exploitations of whaling with a 5.0-fold increase in abundance. However, there appears to be a decline from approximately 2000 to 2006. The cause of this decline is unknown, though it could represent a temporary emigration from the study area. Other studies have noted that while humpback whales demonstrate a general fidelity to the feeding grounds, it is common for them to move between feeding areas with the magnitude and frequency of these movements varying from individual to individual (Baker et al. 1992; Calambokidis et al. 1996, 2001, 2008; Straley et al. 2008; Waite et al. 1999). Further, humpback whales are known to be generalist feeders (Nemoto and Kasuya 1965; Witteveen et al. 2008), and it is commonly believed that their movements within feeding grounds can be linked to food availability. In a Gulf of Maine study (Weinrich et al. 1997), movements in Atlantic humpback whales were directly linked to food availability. Humpback whale abundance was documented to decline by over 95 % in one area and increase dramatically in a neighboring area, simultaneously to the same geographic shift in a humpback whale prey species.

The abundance estimate for 2009 is much higher than the overall trend in the model, but at this point it is unclear if this is a reflection of variability in the data or a surge in immigration into the area. There was an exceptionally high number of unique individuals identified in 2009 (\(n = 89\)), which leads us to believe that this high abundance estimate is, at least in part, driven by an increased number of individuals coming into the study area (immigration). Sightings of new individuals were seen in each year.Footnote 6 In a closed population, sightings of new individuals would decrease over time as all individuals are discovered. However, in this open population, we see that the discovery of new individuals persists through the time series. Particularly, in this last year, 2009, we see an exceptionally high number of new sightings.

The overall abundance, estimated by the MARK software (P) was 396 (95 % CI 371–431). The number of unique individuals that went into the analysis was 322, indicating that the model estimates approximately 74 whales that were in the area at some point in the study, but were undetected.

4.3.3 Sensitivity analysis: early year photo quality

Because of the high correlation found in this sensitivity analysis, we concluded that the influence of poor quality photos in some early years is negligible in their effect on later years in the full time series models. However, it is possible that photos from these years are altering the results of the years they are in (1980, 1983, 1984, 1985, and 1987). In these years, inclusion of poor quality photos may have increased the risk of false negative errors-instances where photos are wrongly designated as new sightings—which may have led to an inflation of the population estimates. Though, without a way to estimate any direct or proxy measure of false negative errors in these years, we cannot measure the magnitude of this effect. Inflated estimates in these years would lead to an overall underestimation of ROI in this study. This means that our estimate of ROI could be biased low for this reason.

4.3.4 Sensitivity analysis: survival

In the selected model, the estimate of \(\phi \) (set constant throughout the time series) was 0.94. This is lower than the only other humpback whale survival rate estimate from PWS made by Mizroch et al. (2004) of 0.98 (95 % CI 0.95–1.0). However, it is consistent with other estimates made by the same research group, namely: 0.93 (95 % CI 0.91–0.95) for all of Alaska, 0.94 (95 % CI 0.92–0.95) for Southeast Alaska, and 0.96 (95 % CI 0.94–0.98) for Hawaii (Mizroch et al. 2004). Further, our estimates are consistent with a survival estimate of 0.95 (95 % CI 0.93–0.97) made for North Atlantic humpback whales (Buckland 1990), and within range of the 0.96 (SE = 0.008) survival estimate also made for North Atlantic humpback whales (Barlow and Clapham 1997). While humpback whales exhibit some level of feeding site fidelity, many studies have documented movements that contradict complete fidelity (Baker et al. 1992; Calambokidis et al. 1996, 2001, 2008; Straley et al. 2008; Waite et al. 1999). This is true for our study area as well; other analyses from our same dataset show clear evidence of exchange of individuals with adjacent areas (von Ziegesar and Matkin 1986; von Ziegesar 2012). This means that because we have no way of distinguishing between death and emigration in our models, our estimates of survival may be incorporating emigration caused by movements in addition to actual mortalities. That said, we believe our estimate to be a good representation of the averaged survival over the entire time series for western PWS.

4.3.5 Comparison to previous abundance estimates

Humpback whales in PWS were estimated using a Chapman estimator at 100 (SE = 20) individuals in 1984 (Matkin and von Ziegesar 1985). Our Jolly-Seber abundance estimate from the same year was slightly lower, at 73 (95 % CI 54–99), although the confidence intervals for these estimates nearly overlap. The difference in these estimates is most likely due to the fact that photo quality was not considered in the previous study, and the inclusion of poor quality photos leads to inflated population estimates.

We estimated humpback whale abundance at 134 (95 % CI 114–158) and 194 (95 % CI 162–231), for the summer of 2008 and 2009, respectively, while an independent study estimated 135 (SE = 12) humpback whales in the 2008–2009 winter season using a Huggins closed capture model (Rice et al. 2011). It is important to note that these abundance estimates are not directly comparable because our data are from the summer and only from western PWS, while their data were collected in the winter and represent sound-wide observations. We would expect abundance to drop substantially in winter months as the vast majority of humpback whales are thought to spend winter months in the tropical breeding grounds (Darling and McSweeney 1985; Johnson and Wolman 1984). So, despite the difference in spatial coverage, it is somewhat surprising that estimates from both studies are of similar magnitude given the seasonal differences. This suggests that a large portion of summer whales are staying late into the winter season, and is supported anecdotally by sightings of individuals seen in both summer and into fall and winter (Moran 2011, pers. comm.). One theory is that these late-season humpback whales are staying in PWS to capitalize on the local winter stocks of Pacific herring (Rice et al. 2011).

4.3.6 Model performance

In our analysis of this humpback whale dataset of exceptionally long time-frame, various limitations in model estimation were discovered. First, we were unsuccessful in estimating time-dependent survival, \(\phi _{t}\), in our mark-recapture models due to a high portion of the parameters converging on the boundary of one rather than producing realistic estimates. This is likely due to the particularly long life span of humpback whales (Gabriele et al. 2010). Because humpback whales are living so much longer than the time period between sampling intervals, the model is not able to generate fine-scale estimates of \(\varphi \) during these times.

After managing the specific problematic parameters, these long-term data were successfully described by realistic parameter estimates. Therefore, we feel that the Jolly-Seber model is an appropriate option for analyzing long-term humpback whale datasets. With this study, the first time series of humpback whale abundance from 1978–2009 has been constructed and should be of use for further investigations of ecological processes in PWS.

4.4 Rate of increase

As previously mentioned, the SPLASH study generated estimates of abundance for humpback whales in the North Pacific Basin (Calambokidis et al. 2008). These estimates are biologically feasible given they are below the maximum theoretical ROI estimate of 11.8 % (Zerbini et al. 2010). Given the earlier estimates of humpback whale abundance in the North Pacific, the population has been recovering at a rate of 5–7 % annually in order to reach current North Pacific population estimates. In PWS, we found a slightly lower ROI, 4.53 % (95 % CI 3.28–5.79 %) where the 95 % CI of overlaps with the lower end of the SPLASH ROI estimate (4.9 %). The lower end of the SPLASH ROI estimate was found using more recent data, 1993–2006; because these years are overlapping with our time series, it makes sense that our estimate would align closer with the lower end of the SPLASH estimate. The upper end (6.8 %), on the other hand, uses data spanning back to 1966—over a decade before the start of our time series. Further, estimates of ROI from SPLASH are generally higher for the western Pacific regions, and between 5.5 and 6.0 % when assessing the population without considering the western Pacific (Calambokidis et al. 2008). Therefore, the ROI results are consistent with the broader trends of humpback whale recovery seen for the North Pacific. Thus, like the North Pacific population, the western PWS population has been rapidly recovering from past over-exploitation and may be reaching levels at or even exceeding those before commercial exploitation.