Introduction

A variety of methods are available for estimating the abundance of marine mammals (and other species) including correcting and extrapolating counts, transect sampling, spatial modelling, and mark-recapture approaches (Hammond et al. 2021). The individual recognition data obtained by identifying and following individual animals used in mark-recapture approaches to estimate abundance can also be used to estimate survival rates (e.g., Lebreton et al. 1992; Ramp et al. 2014; Arso Civil et al. 2019) and reproductive rates (e.g., Barlow and Clapham 1997; Arso Civil et al. 2017; Coxon et al. 2022), which are essential parameters when modelling the dynamics and assessing the conservation status of animal populations.

Photo-identification has become widely used to follow marine mammals since researchers first noticed that some individuals possessed naturally occurring, identifiable, and persistent features. The unique markings of bottlenose dolphins (Tursiops truncatus) were recorded and tracked as early as the 1950s (Caldwell 1955). Photo-identification of killer whales in the northeastern Pacific Ocean began in the 1970s (Bigg 1982), and the resulting demographic records now span decades. Other examples of photo-identification studies that have generated long-term datasets include bottlenose dolphins in Sarasota Bay, Florida (Wells and Scott 1990), North Atlantic right whales (Eubalaena glacialis; Pace et al. 2017), North Atlantic humpback whales (Megaptera novaeangliae; Stevick et al. 2003), blue (Balaenoptera musculus) and fin (B. physalus) whales in the Gulf of St Lawrence (Ramp et al. 2006, 2014); and southern right whales off Argentina (Eubalaena australis; Agrelo et al. 2021).

Data used in mark-recapture analysis must meet a number of assumptions if reliable parameter estimates are to be made: (1) marks are unique; (2) marks cannot be lost or missed; (3) all marks are correctly recorded and reported (Hammond 2018). The purpose of this study is to explore the variability inherent in correctly matching individuals among sampling occasions.

Errors in individual identification are known to occur (Payne et al. 1983; Langtimm et al. 2004), and the few studies that have explored effects of misidentification have found that, even at small rates, errors in identification can bias parameter estimates (Stevick et al. 2001; Lukacs and Burnham 2005; Yoshizaki et al. 2009). Misidentification involves many factors, but a recurring theme is the importance of choosing the right features to use as a natural mark in order to satisfy assumption 2, above. Anatomical features should be chosen so that the natural markings used in a mark-recapture experiment will last longer than the experiment and will not change in a way that might affect the ability to recognize the individual in future (Wilson et al. 1999). For killer whales, the shape of the dorsal fin and patterns in the saddle patch are most often used as natural marks (Bigg 1982; Kuningas et al. 2014). For humpback whales, pigmentation patterns on the underside of the flukes, as well as the edge of the flukes themselves, are used to identify individuals (Stevick et al. 2001, 2003). In this study, naturally occurring nicks and notches in the dorsal fin of a Pacific white-sided dolphin (Lagenorhynchus obliquidens; Fig. A1) were used as natural marks, such that an individual could be recognized from both left- and right-side photographs.

Observers tend to conflate photo-quality with animal distinctiveness because a well-marked individual is more easily recognized than a subtly marked individual in a poor-quality photograph (Urian et al. 2015). As a result, previous cetacean studies have relied on strict protocols when gauging whether two photographs are a match (Wilson et al. 1999; Read et al. 2003). However, the final dataset used to estimate population parameters is still subject to human error, because it is dependent on a somewhat subjective decision about whether a human observer is convinced that a pair of photographs represents two encounters of the same individual or two different individuals. Little attention has been paid to the process by which researchers reach a final decision about whether two photographs constitute a match, but a survey has shown that researchers vary widely in their approach to defining a match (Urian et al. 2015). Historically, cetacean studies have used “conservative” protocols and, after seeking advice from experienced colleagues in the case of any ambiguity, only scored two photographs as a match if there was consensus among observers (Friday et al. 2000; Stevick et al. 2001; Urian et al. 2015). Most protocols reviewed by Urian et al. (2015) are inherently averse to false positives; the corollary to this is that false negatives will arise as a result (Stevick et al. 2001). Not all researchers use protocols that are averse to false positives. Urian and colleagues reported “an unsettling degree of variation among researchers in the evaluation of image quality, distinctiveness, images selected and matches. Participants from the same institution generally had similar results, suggesting that most variation was due to the different methods used by each laboratory.” Many researchers may be trained to quantify their level of certainty that two photographs do or do not represent a match, but there is little guidance from statisticians about how to incorporate that uncertainty into the binary framework of conventional mark-recapture models.

Erring on the side of false negatives is not always a precautionary approach. Deciding always to call ambiguous matches a non-match will cause recapture rates to be biased low, which will cause estimates of abundance to be positively biased and estimates of survival rates to be negatively biased (Hammond 1986; Hammond et al. 1990; Friday et al. 2008). For management procedures that set allowable harm limits based on abundance (e.g., Wade 1998; Winship et al. 2006; Genu et al. 2021) a positively biased abundance estimate could lead to overexploitation. The extent to which this is a problem for real-world conservation and management decisions is case-specific, but few studies have estimated the magnitude of bias in abundance and survival estimates depending on matching uncertainty.

There are two primary reasons for misidentification errors: (1) errors in identification due to changes in the natural markings; and (2) misidentification as a result of variation at the level of the matching process. The first can occur if individuals acquire new marks such as scars or damage due to predation or intra-specific interactions (Gordon 1987; Steiger et al. 2008), or if marks such as scratches or pigmentation patterns heal and subsequently disappear (Dufault and Whitehead 1995). Dufault and Whitehead (1995) found that mark acquisition occurred at a higher rate than mark loss. Mark acquisition is presumed less likely to cause misidentification, especially in small populations (Urian et al. 2015), but it is easy to imagine a scenario in which mark acquisition may lead to changes that are substantial enough for false negative errors to occur. For larger, wide-ranging populations, it is recommended that mark acquisition rates are estimated and that strict animal distinctiveness criteria that rely on markings that are unlikely to change over time are used (Urian et al. 2015).

The second misidentification process, errors that occur at the level of the matching process, has received comparatively little attention. Previous analyses have shown that conflating photo-quality with individual distinctiveness biases the matching process and, subsequently, the parameter estimates from mark-recapture analyses (Arnbom 1987; Friday et al. 2000, 2008). False rejections of true matches, and field protocols that photograph individuals using non-symmetrical markings on left and right sides, can result in a dataset containing multiple encounter histories for an individual (Hiby et al. 2012). In addition, many animals may simply have similar markings. As the number of individuals in a population increases, so too does the difficulty in distinguishing individuals with similar natural markings. The extent to which matching uncertainty biases resulting estimates of population parameters requires investigation for each study. Differences in protocols among individuals and laboratories will likely result in different biases in parameter estimates, as long as protocols require investigators to force an inherently subjective matching process into a binary (match/not-a-match) outcome (Urian et al. 2015). This issue has become increasingly important as long-term cetacean studies have switched from film to digital photography, which may introduce heterogeneity in matches (Urian et al. 2015). Ideally, the level of uncertainty associated with any given match should be quantified and incorporated into resulting population parameter estimates (Urian et al. 2015).

Acknowledging explicitly the uncertainty in the photo matching process, the aim of this study was to use 6 years of photographic data on Pacific white-sided dolphins to quantify the extent to which matching uncertainty affects the bias and precision of abundance and survival estimates. The study also explores the challenges inherent in datasets with relatively low rates of recapture and the effect of matching uncertainty in these cases.

Methods

Study area

The study took place in the waters between northeastern Vancouver Island, British Columbia (BC), Canada and the Broughton Archipelago and Knight Inlet on BC’s mainland coast. The study area is characterized by a complex geography of numerous islands, narrow inlets, and fjords (Fig. 1).

Fig. 1 Map of the study area. The blue polygon represents the study area of the Broughton Archipelago, British Columbia, Canada and adjacent waters

Data collection

To ensure consistency in data collection with an existing, long-term catalogue and to maximise the number of long-term resightings, field protocols in the current study followed those of a previous study as closely as possible (Morton 2000). Photo-identification surveys for Pacific white-sided dolphins in the Broughton Archipelago were conducted from 2008 to 2013. Photo-ID effort was distributed throughout the year but was restricted by weather conditions. Groups of dolphins were found using a combination of boat-based searches and radio reports and communication from local mariners. Reports from a stationary hydrophone network (OrcaLab), monitored 24 h/day (Morton and Symonds 2002; Deecke et al. 2010), were also used to direct dolphin searches. Searches and photographic encounters were limited to sea conditions of Beaufort scale 2 or lower for reasons of safety and sightability.

For each encounter, a GPS position was recorded and group size was estimated in the field. Total group size was estimated by tallying the number of individuals in smaller subgroups (typically 2–8 individuals) at intervals throughout the encounter (Morton 2000). A group was defined as all of the dolphins encountered in a discrete location in a day. Finer scale information (e.g., groups defined using a 15 m ‘chain rule’; Smolker et al. 1992) on group composition was collected from 2011 onwards to inform studies of sociality. Encounters with dolphins lasted a minimum of 20 min during which the following data were recorded: an estimate of group size (minimum, maximum, and best estimate); location; predominant group activity state (although scan-sample data were collected at 5-min intervals during longer encounters); and number of calves in the group (minimum, maximum, and best estimate). Photographs were collected with digital SLR cameras.

Groups of dolphins were approached slowly in an effort to reduce the probability of bow-riding behaviour, which brings some individuals, especially juveniles, very close to the boat and makes other individuals less available for photographic capture. Large groups were generally traversed in two passes to try to obtain photographs from both sides. In the first pass, individuals were photographed in sub-groups as each sub-group came into photographing range until the far edge of the group was reached. In the second pass, the group was traversed in the opposite direction and at the same angle as the first pass and individuals were photographed in sub-groups in the same manner as the first pass. Dolphins were photographed almost exclusively while engaged in slow, milling (non-directional) behaviour in tight groups (behaviour typically observed following medium to high speed travel). The non-directional/milling behaviour facilitated photographing both right and left sides of the dorsal fin.

Photo-ID efforts ended when dolphins engaged in activity states (e.g., high-speed travel) that caused water to splash around the dorsal fin, resulting in poor quality photographs. An encounter ended when all of the members of the group had been approached, when weather conditions changed, or when the time limit of the research permit (30 min per sub-group) was reached.

Photo-processing methods

All photographs of a dorsal fin were graded for quality of the image and distinctiveness of the markings in two independent stages (Urian et al. 2015). Information on quality, distinctiveness and other attributes was entered into Photo Mechanic 5 (Camera Bits) photo-processing software. First, photographs were graded for photographic quality using a standardized set of photographic quality criteria ranging from 1 (poor quality) to 3 (high quality) following the image quality scoring criteria used in studies of bottlenose dolphins in Scotland (Wilson et al. 1999). Dorsal fins of Pacific white-sided dolphins varied among individuals from extremely well marked with nicks and scars, to completely clean, unmarked fins. Thus, not all dolphins were distinctive enough to be included in mark-recapture analyses. A separate photo reviewer scored each quality 3 photograph to grade the distinctiveness of each individual. The distinctiveness score ranged from D1 (Highly distinctive) to D4 (Unmarked). A distinctiveness score of D2 (Moderately distinctive) included fins with intermediate features such as a small nick, or many small nicks that are detectable from both sides, and D3 (Somewhat distinctive) included fins with subtle features such as black scratches or other long-lasting distinguishing marks that are only identifiable from one side. The D3 score category does not include nicks and notches on the trailing edge of the dorsal fin. A separate set of photo reviewers conducted the matching step (see below).

Photo-matching

A team of six photo reviewers (including EA) conducted the photographic matching of the current study’s catalogue in Photo Mechanic 5. The pattern of nicks on the trailing edge of the dorsal fin was the primary mode of identification. Fin shape provided a secondary indicator. Dorsal notches had to match in size, angle of tear and other details. The definition of a match allowed for acquisition of marks over time, but no loss of nicks; that is, if there was an additional notch on the more recent photo, but the original nick or notch was present in both photographs, then this was scored as a potential match. Nicks, notches and tears in the fin, along with the shape of the fin itself, are detectible from both sides, so the decision to include these features in the distinctiveness scoring and thresholds meant that both left- and/or right-side photographs could be used to identify individuals.

Only photographs of quality 3 and with a distinctiveness score of D1 or D2 (i.e., symmetrical markings that would be recognized from both sides) were included in the analysis. This protocol was chosen to reduce misidentification errors, while allowing both left- and right-side photographs to be included in the analysis. Photographs of individuals believed to be calves (i.e., small, ruffled dorsal fin, orange colouration, foetal fold marks on the body, photographed alongside mother) were excluded from the analysis. The high-quality subset of photographs was then matched within each photographic encounter, and each individual was assigned a preliminary identification code. Identified individuals within an encounter were matched and a certainty score of “Certain” (100% confident), “Likely” (< 100% but ≥ 90% confident), or “Possible” (< 90% but ≥ 50% confident) was assigned to putative matches between pairs of photographs based on the degree of confidence in each match.

An encounter history of 1s and 0s, corresponding to whether a putative individual was or was not detected (i.e., captured) during each sampling encounter, respectively, was created for each individual for each matching certainty level for analysis.
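The construction of encounter histories at cumulative certainty levels can be illustrated with a minimal Python sketch. It is not the procedure used in Photo Mechanic for this study: the function names, the input layout (a dictionary of preliminary identification codes and a list of scored putative matches) and the example data are all hypothetical, and the sketch assumes that matches are transitive, so that photographs linked through a chain of accepted matches are pooled into one putative individual.

```python
from collections import defaultdict

# Hypothetical ordering of certainty labels, strictest first.
LEVELS = {"Certain": 0, "Likely": 1, "Possible": 2}


def find(parent, x):
    """Return the root of x's cluster, compressing the path on the way."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def encounter_histories(sightings, matches, threshold, occasions):
    """Build 0/1 capture histories for one cumulative certainty level.

    sightings: dict of preliminary ID -> sampling occasion label
    matches:   list of (id_a, id_b, certainty) putative matches
    threshold: most permissive certainty label accepted as a match
    occasions: ordered list of occasion labels
    """
    parent = {pid: pid for pid in sightings}
    for a, b, certainty in matches:
        # Accept the match only if it is at least as certain as the threshold.
        if LEVELS[certainty] <= LEVELS[threshold]:
            parent[find(parent, a)] = find(parent, b)  # merge the two clusters

    # Collect the occasions on which each putative individual was seen.
    seen = defaultdict(set)
    for pid, occasion in sightings.items():
        seen[find(parent, pid)].add(occasion)

    return sorted(
        "".join("1" if occ in occs else "0" for occ in occasions)
        for occs in seen.values()
    )


# Hypothetical data: five preliminary IDs photographed over three years.
sightings = {"A1": 2009, "B1": 2010, "B2": 2010, "C1": 2011, "C2": 2011}
matches = [
    ("A1", "B1", "Certain"),
    ("B1", "C1", "Likely"),
    ("B2", "C2", "Possible"),
]
occasions = [2009, 2010, 2011]

for level in LEVELS:
    print(level, encounter_histories(sightings, matches, level, occasions))
```

In this toy example the number of putative individuals falls from four to three to two as the threshold is relaxed, while the number of histories containing a recapture rises, which is the pattern a more permissive threshold is expected to produce.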

Available data

The number of sampling occasions and the months in which sampling took place each year varied widely throughout this study. Between 2008 and 2013, a total of 34 photographic encounters with dolphins occurred. Of these, 32 encounters contained photographs of sufficient quality and distinctiveness to enter the analysis for the “Certain” and “Likely” certainty levels, whereas all 34 encounters contained photographs that were of sufficient quality to create encounter histories at the “Possible” certainty level. The frequency of capture of individuals for the three certainty levels is shown in Fig. 2. The single encounter from 2008 was not included in the analysis due to low sample size (only 2 individuals were identified).

Fig. 2 Number of times an individual was seen, tallied for each certainty level

Estimation of abundance

The encounter histories for each of the three certainty levels were analysed to produce three estimates of abundance. Chapman’s modification to the Lincoln–Petersen two-sample estimator to account for small sample bias was used to estimate abundance (Hammond 1986; Seber 2002).

$$\hat{N} = \frac{{(n_{1} + 1)(n_{2} + 1)}}{{(m_{2} + 1)}} - 1,$$

where \(\widehat{N}\) is the estimate of abundance (population size), \({n}_{1}\) is the number of individuals captured during the first sampling occasion, \({n}_{2}\) is the number of individuals captured during the second sampling occasion, and \({m}_{2}\) is the number of individuals recaptured, that is, the number of animals captured during the first sampling occasion that were also captured during the second sampling occasion.

For this analysis, each year was treated as a sampling occasion, and recaptures were restricted to individuals seen in adjacent pairs of years. Given the low number of recaptures in adjacent pairs of years and the comparatively large number of photographs taken in 2010, a separate within-year analysis was conducted for 2010 (see “Results”).

Variance was estimated as:

$$\mathrm{var}\left(\widehat{N}\right)=\frac{\left({n}_{1}+1\right)\left({n}_{2}+1\right)\left({n}_{1}-{m}_{2}\right)\left({n}_{2}-{m}_{2}\right)}{{\left({m}_{2}+1\right)}^{2}\left({m}_{2}+2\right)}.$$

Log-normal 95% confidence intervals were calculated (Borchers et al. 2002) as \(\widehat{N}/d\) to \(\widehat{N} \times d\), where

$$d={e}^{\left({z}_{\alpha }\sqrt{\mathrm{var}\left[\mathrm{ln}\left(\widehat{N}\right)\right]}\right)},$$

with \({z}_{\alpha } = {z}_{0.025} = 1.96\) for a 95% CI, and

$$\mathrm{var}\left[\mathrm{ln}\left(\widehat{N}\right)\right]=\mathrm{ln}\left(1+\frac{\mathrm{var}(\widehat{N})}{{\widehat{N}}^{2}}\right).$$
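The estimator, its variance and the log-normal interval defined above can be combined in a few lines of Python. This is a minimal sketch rather than the code used in the study; the function name `chapman_estimate` and the example counts are hypothetical and are not taken from Table 1.

```python
import math


def chapman_estimate(n1, n2, m2, z=1.96):
    """Chapman-modified Lincoln-Petersen estimate with a log-normal CI.

    n1, n2: individuals captured on the first and second occasions
    m2:     individuals captured on both occasions
    z:      standard normal quantile (1.96 for a 95% CI)
    """
    n_hat = (n1 + 1) * (n2 + 1) / (m2 + 1) - 1
    var = ((n1 + 1) * (n2 + 1) * (n1 - m2) * (n2 - m2)) / (
        (m2 + 1) ** 2 * (m2 + 2)
    )
    cv = math.sqrt(var) / n_hat
    var_ln = math.log(1 + var / n_hat ** 2)  # variance of ln(N-hat)
    d = math.exp(z * math.sqrt(var_ln))      # multiplicative CI factor
    return {"N": n_hat, "var": var, "cv": cv,
            "lower": n_hat / d, "upper": n_hat * d}


# Hypothetical counts, for illustration only.
strict = chapman_estimate(n1=50, n2=40, m2=5)
permissive = chapman_estimate(n1=50, n2=40, m2=10)
print(round(strict["N"], 1), round(permissive["N"], 1))
```

With these illustrative counts, halving the number of recaptures from 10 to 5 raises the estimate from about 189 to about 348, which shows how sensitive a two-sample estimate is to the number of accepted matches when recaptures are few.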

The simple two-sample estimator used makes a number of assumptions regarding population closure and capture probabilities (Hammond 2018), violation of which can introduce bias in abundance estimates. However, the objective of this exercise was to assess the relative importance of matching uncertainty on the resulting abundance estimates, not to generate a robust abundance estimate for use in decision-making. Any such violations should affect estimates in a comparable way and therefore are unlikely to compromise these results.

Estimating adult annual survival rate

Annual encounter histories from 2008 to 2013 were created for each certainty level to estimate annual apparent survival rate using a Cormack-Jolly-Seber model (Cormack 1964; Jolly 1965; Seber 1965; Amstrup et al. 2005). Models were explored that allowed survival and recapture probability to vary over time or to be constant, and the model with the lowest value of Akaike’s Information Criterion (AIC) was selected as that which had the most support from the data. Analysis was carried out in software MARK (White and Burnham 1999) version 6.1. No goodness of fit tests were conducted to test for lack of fit. However, similar to estimating abundance, the objective of this exercise was to assess the relative importance of matching uncertainty on the resulting estimates of survival, rather than generating the best estimates for wider use, and any assumption violations are unlikely to compromise these results.

Results

The number of individual dolphins photographed and recaptured in each sampling period and at each matching certainty level is shown in Table 1. Including less certain matches resulted in fewer individual dolphin identifications overall, because a more permissive matching threshold will decrease the number of putative individuals in n1 and n2, and increase the number of matches in m2. Sampling effort was greatest during 2010 and high in 2011, resulting in the greatest number of recaptures between these years. The lack of recaptures in pairs of years not including 2010 precluded estimation of abundance. Consequently, data from 2012 to 2013 were pooled to boost the number of recaptures with 2011 (Table 1). Compared to other years, 2010 had a substantially higher number of within-year recaptures, so data were also analysed using two sampling occasions within 2010 (2010a: April–June; 2010b: July–November; Table 1).

Table 1 Number of individual dolphins photographed in the two sampling periods (n1, n2), and the number of matches between these (m2), for each of three matching certainty levels

Two-sample estimates at each matching certainty level produced abundance estimates of individually identifiable dolphins in the population for 2009–2010, 2010–2011, 2011–2012 + 2013 (data pooled for 2012 and 2013), and for time periods within 2010 (Table 2). Abundance of marked dolphins at the “Certain” matching level ranged from a low of 985 (CV = 0.55) in the “2011–2012 + 2013” sample to a high of 2005 (CV = 0.31) in 2010 (Table 2). As expected, the “Certain” matching level produced abundance estimates that were greater than the estimates at the “Certain + Likely” or “Certain + Likely + Possible” levels in the same pairs of samples (Table 2). In 2010–2011, for example, abundance was estimated as 1404 (CV = 0.34) at the highest level of certainty (“Certain”) and as 1159 (CV = 0.31) at the “Certain + Likely + Possible” matching level. The greater the matching certainty level, the lower the precision of the abundance estimates (Table 2).

Table 2 Two-sample estimates of abundance (\(\widehat{N}\)) for pairs of years from 2009 to 2011, within 2010 (2010a: April–June; 2010b: July–November), and for 2011 with pooled data for 2012 and 2013

Estimates of annual survival rate

Annual apparent survival rate for well-marked adult dolphins was estimated using a CJS model at each matching certainty level for the period 2008–2013. The model for constant survival and time-varying recapture probability had the best support from the data at all certainty levels. Annual survival for the “Certain” matching level was estimated as 0.458 (SE = 0.288) rising slightly to 0.468 (SE = 0.276) for the “Certain + Likely + Possible” matching level (Table 3). Thus, as less certain matches were included in the analyses, the apparent survival estimates increased slightly and the precision of the estimates also increased slightly. The very low estimates of apparent survival are likely a result of emigration out of the study area over the period of the study.

Table 3 Apparent survival rate estimates derived from three different levels of certainty in photographic matching from 2008 to 2013

Discussion

The study demonstrates that matching uncertainty has the potential to introduce substantial degrees of both bias and uncertainty in a real-world photo-identification study of a dolphin species with a low capture probability. The extent to which this translates into conservation or management risk hinges on the extent to which the less-than-certain matches are actually true matches rather than false positives. If all the less-than-certain matches are false positives, then using the highest certainty level as a threshold to define a match will give the least biased abundance estimate. But if all or some of the less-than-certain matches are actually true matches, then using the highest certainty level as the threshold to define a match means that the abundance estimates are positively biased.

This pattern shows up in our abundance estimates (Table 2) and, with a smaller effect, in our survival estimates (Table 3). Abundance estimates were found to vary among years and matching certainty levels. Because of inter-annual variation in effort and the low rate of recapture in 2009 and 2011–2013, it is most informative to focus on a comparison of within-year estimates from 2010. Within 2010, the abundance estimates ranged from 2005 (95% CI 1103–3645) for “Certain” matches to 1335 (95% CI 962–1852) for “Certain + Likely + Possible” matches. The “Possible” and “Likely” categories may contain false positive matches, which will cause negative bias in abundance estimates (Yoshizaki et al. 2009). However, false positive errors typically arise from inclusion of poor-quality photographs (Stevick et al. 2001; Friday et al. 2008; Barlow et al. 2011) and only the highest quality photographs were included in this analysis.

Notably, there is still much variation in abundance estimates depending on matching certainty level, despite restricting our analyses to photographs of the highest quality. Even when analysis is restricted to the best quality photographs of marked individuals, there may still be substantial false positive errors in species in which individuals may share similar marks. As the number of individuals in the population increases, so too will the probability of seeing two dolphins with very similar markings on their dorsal fins. As more tenuous matches were categorised as recaptures in the analyses, the abundance estimates decreased (i.e., they would become negatively biased if the lower certainty level matches were not true matches) and the apparent precision increased. But if some of these less than certain matches actually were matches, the estimates from the “Certain + Likely” and “Certain + Likely + Possible” scenarios are less biased than those from the “Certain” scenarios, and the abundance estimates from the “Certain” scenarios were biased high. Although the number of recaptures in this study is low, this effect was most obvious in the within-2010 analysis, when effort was highest and there were a substantial number of matches. In terms of survival, estimates of survival rate and their precision increased only slightly as less certain matches were categorized as a match (Table 3).

Although the direction of these trends can be predicted from first principles, the magnitude of the effect found in our study was unexpectedly high, at least for abundance. The “Certain” abundance estimates were ~ 50% higher than estimates derived from “Certain + Likely + Possible” matches in the “Within 2010” and the “2011–2012 + 2013” scenarios (Table 2). The “Certain” abundance estimates were 25 and 21% higher than estimates derived from “Certain + Likely + Possible” matches in the “2009–2010” and “2010–2011” scenarios, respectively (Table 2). While reliance on absolute certainty is often described as a “conservative” and “recommended” feature of any photo matching protocol (e.g., Friday et al. 2008), our results suggest that use of an overly conservative threshold to define a match could result in substantial bias (20–50%) in abundance estimates.
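The relative differences quoted above can be checked directly from the point estimates reported in the text for the within-2010 and 2010–2011 comparisons. The short Python sketch below is only an arithmetic check; the function name `percent_higher` is hypothetical.

```python
def percent_higher(strict, permissive):
    """Percentage by which the strict estimate exceeds the permissive one."""
    return 100 * (strict - permissive) / permissive


# Point estimates reported in the text:
# "Certain" vs "Certain + Likely + Possible".
within_2010 = percent_higher(2005, 1335)
years_2010_2011 = percent_higher(1404, 1159)

print(round(within_2010, 1))      # 50.2
print(round(years_2010_2011, 1))  # 21.1
```

Expressed relative to the “Certain” estimate instead, the within-2010 gap is (2005 − 1335)/2005, or roughly a third, so the size of the effect depends on which estimate is taken as the baseline.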

The trade-off illustrated in this study exemplifies the need for researchers to decide whether it is better to include uncertain matches to increase sample size to get a more precise (but potentially biased) estimate, or to prioritise accuracy over precision. This decision is inherently case-specific. If monitoring for overall trends in abundance (Wilson et al. 1999; Gerrodette and Forcada 2005), precision may be paramount and bias less of a concern, as long as the bias remains consistent over the period of interest (Taylor and Gerrodette 1993; Taylor et al. 2007; Williams et al. 2016a).

Pacific white-sided dolphins are known to be caught in salmon gillnet fisheries in British Columbia, and our ability to assess the sustainability of that bycatch hinges on improving the accuracy and precision of both the estimates of dolphin abundance and bycatch rate (Williams et al. 2008). A new trade rule requires countries wishing to export seafood to US markets to demonstrate that their management schemes are comparable in effectiveness to those under the US Marine Mammal Protection Act (Williams et al. 2016b). We anticipate investments in filling data gaps in many understudied species and regions to facilitate compliance with this new rule (Ashe et al. 2021a,b; Hammond et al. 2021; Punt et al. 2021). Given the potential positive bias in abundance estimates using overly strict matching criteria, assuming that some likely or possible matches were actually true matches (Table 2), it will be useful to investigate the potential for matching uncertainty to bias abundance estimates in new studies, because positively biased abundance estimates could lead to overestimates of allowable harm limit for assessing sustainability of bycatch (Wade 1998; Punt et al. 2020).

When the trade-off makes the difference between having a biased estimate versus no estimate at all, then it is better to report an estimate, as long as the sources of bias are acknowledged. One could potentially quantify that bias using an approach such as the one presented here. Selecting an acceptable trade-off between bias and precision may be more challenging to consider with small sample sizes.

It is a well-known problem that traditional mark-recapture methods are sensitive to misidentification of animals that are recognised from natural markings (Link et al. 2010). Previous studies have considered effects of photo-quality and animal distinctiveness on bias and precision in abundance and survivorship estimates (Friday et al. 2000; Stevick et al. 2001). Although matching certainty is clearly confounded with photographic quality and animal distinctiveness, the current study examined matching uncertainty with a dataset of high-quality photographs of well-marked individual dolphins. The aim was to evaluate the effect on estimates of population parameters caused by uncertainty in individual identification. Results showed that while this issue had relatively modest impact on estimates of apparent survival, abundance estimates could vary by 20–50% as a result of this source of uncertainty (Table 2). It is difficult to evaluate the effect of matching uncertainty on precision of estimates, however, due to low number of recaptures in our study. Nevertheless, it is worth noting that in the within-2010 abundance estimates, which had the largest number of recaptures, there is a big difference in CV among certainty levels. The next problem, of course, is how to resolve this.

This study has looked at misidentification by examining the impact on abundance and survival estimates that arises at the processing level. Conventionally, photo reviewers processing ID photographs are instructed to assert that two photographs are or are not a match. Depending on the protocols used by a given research team, most researchers will be averse to false positives and will default to a non-match in the case of < 100% certainty, whereas others may be equally averse to false negatives (Urian et al. 2015). Nonetheless, the conventional mark-recapture models require a binary decision to be made. The current study shows that there is value to having a number of photo reviewers, matchers, and experienced researchers record their level of certainty that two photographs represent a match, because there is useful information contained in that matching certainty level (Urian et al. 2015). The size of effect due to matching uncertainty found here, as high as a third on abundance, may not be typical, but the approach could easily be incorporated in other photo-ID studies (Urian et al. 2015). In many cases, the effect on estimates of abundance or survival of matching uncertainty may be negligible, but it will be impossible to know this unless matching protocols instruct matchers to record their level of confidence in a match so that this information can be used at the analysis stage.

Our study confirms much of what has been said in other studies of misidentification in mark-recapture arising from issues of data quality, and the advice for coping with this source of misidentification. Mark misidentification can arise when the sampling method (e.g., photo-ID) introduces heterogeneity in the observer’s ability to recognise marks. One study, which used two kinds of tags (i.e., photo-ID and genetics), identified a minimum quality level for including photographic data in a mark-recapture analysis, placed bounds on the uncertainty, and incorporated that uncertainty in a bootstrap estimate of the variance around the abundance estimate (Stevick et al. 2001). Conceptually, the recommendation of Stevick et al. (2001) could apply here as well: encounter histories corresponding to the three matching certainty levels could be resampled via a bootstrap to incorporate this source of uncertainty in estimates of abundance, survival, and their variances. This will be more pragmatic in the short term than the suggestion to use genetic double-tagging to minimise or avoid misidentification in the first place (Lukacs and Burnham 2005; Link et al. 2010). For some cetacean species, it may be possible to use natural markings on two morphological features as another way to investigate misidentification, e.g., northern bottlenose whales (Hyperoodon ampullatus; Gowans and Whitehead 2001) and bottlenose dolphins (Tursiops truncatus; Genov et al. 2018).
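
As a purely illustrative sketch of how such a bootstrap across matching certainty levels might be set up, the Python code below resamples two-occasion encounter histories and summarises the resulting abundance estimates. Several elements are assumptions made for illustration and are not taken from this study: the encounter histories are hypothetical, the three certainty levels are given equal weight, and Chapman's bias-corrected Lincoln-Petersen estimator stands in for whichever mark-recapture model would actually be fitted. The function names (chapman, make_histories, bootstrap_abundance) are likewise hypothetical.

```python
import random
import statistics


def chapman(histories):
    """Chapman's bias-corrected Lincoln-Petersen estimate for two occasions."""
    n1 = sum(h[0] for h in histories)                  # seen on occasion 1
    n2 = sum(h[1] for h in histories)                  # seen on occasion 2
    m2 = sum(1 for h in histories if h[0] and h[1])    # seen on both
    return (n1 + 1) * (n2 + 1) / (m2 + 1) - 1


def make_histories(first_only, second_only, both):
    """Build a list of (occasion 1, occasion 2) capture indicators."""
    return [(1, 0)] * first_only + [(0, 1)] * second_only + [(1, 1)] * both


# Hypothetical data: accepting less certain matches merges pairs of
# apparent single-capture individuals into recaptured individuals.
histories_by_level = {
    "high": make_histories(40, 35, 10),
    "medium": make_histories(37, 32, 13),
    "low": make_histories(34, 29, 16),
}


def bootstrap_abundance(histories_by_level, n_boot=2000, seed=1):
    """Bootstrap over certainty levels (assumed equally weighted) and individuals."""
    rng = random.Random(seed)
    levels = sorted(histories_by_level)
    estimates = []
    for _ in range(n_boot):
        # Step 1: draw a certainty level, so matching uncertainty is propagated.
        histories = histories_by_level[rng.choice(levels)]
        # Step 2: resample individuals with replacement within that level.
        sample = rng.choices(histories, k=len(histories))
        estimates.append(chapman(sample))
    estimates.sort()
    mean = statistics.fmean(estimates)
    sd = statistics.stdev(estimates)
    # Percentile interval taken from the sorted bootstrap estimates.
    lower = estimates[int(0.025 * n_boot)]
    upper = estimates[int(0.975 * n_boot) - 1]
    return {"mean": mean, "cv": sd / mean, "ci95": (lower, upper)}


if __name__ == "__main__":
    print(bootstrap_abundance(histories_by_level))
```

Because each replicate first draws a certainty level and then resamples individuals, the spread of the bootstrap estimates reflects both sampling variation and the disagreement among certainty levels. Weighting the levels unequally, or replacing the estimator with an open-population model to obtain apparent survival, would follow the same pattern.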

Misidentification can also arise when marks change through time. With a sufficiently large number of individuals followed through time, it may be possible to build mechanistic models to understand how marks evolve. Quantifying this effect could then allow misidentification to be accounted for in the resulting demographic parameters. Simulation studies suggest that these mechanistic models will only work when capture probability is higher than that observed during this study (i.e., > 0.2) and when the absolute number of resightings is large enough to estimate demographic parameters and changes in marks simultaneously (Yoshizaki et al. 2009).

Statistically, the process of misidentification that this study discusses is quite challenging to address using traditional likelihood methods, but could be handled in a straightforward way using Bayesian methods (Link et al. 2010). Bayesian mark-recapture methods have been an active research area for some time (Schofield et al. 2009). As Bayesian survival estimators become more commonly used and the associated code is made accessible to population ecologists (Gimenez et al. 2009), addressing misidentification in a Bayesian framework may be the next logical step. By integrating all known sources of uncertainty into a single analytical framework, Bayesian methods offer the potential to address what is perhaps the most important question of all: does misidentification (or any other source of uncertainty) affect the ultimate category of risk to which we assign a species (Brooks et al. 2008)?