Evaluating the use of thermal imagery to count harbor seals in aerial surveys

Aerial surveys of pinnipeds are often used to estimate abundance, a critical component of stock assessments and management decisions. In Alaska, USA, aerial surveys of harbor seals ( Phoca vitulina ) have historically relied on visual detections by human observers, a method which works well on large groups of seals at predictable haul-out locations, or when seals are located on a visually uniform substrate such as a sandy beach or exposed mudflat. However, regions such as the Aleutian Islands of Alaska, where harbor seals haul out in small numbers at variable locations and are inconspicuous on shore, are challenging to survey accurately. To determine whether the use of thermal imaging techniques would improve detections of harbor seals in the Aleutian Islands, we conducted a study to compare counts derived from visual detections documented by color photographs with those derived from thermal detections documented by infrared images. In 2019, we conducted 15 flights in the Aleutian Islands, completing 129 experimental trials. We manually reviewed color and thermal images to count harbor seals and used a Bayesian analysis to explore the effects of several covariates on seal detections. The thermal method detected more harbor seals than the visual method early in the day, when cloud cover was greater, and when observers had more experience operating the thermal imaging equipment. The relative improvement in performance of the thermal method was particularly notable when surveys occurred four or more hours prior to solar noon. We discuss the costs and benefits of incorporating thermal technology as part of the existing monitoring program for harbor seals in the Aleutian Islands, including the need to control for differences if incorporating new survey methods based on thermal detection.


Introduction
Estimates of population size for pinniped species are critical components of stock assessments, can reveal trends in the health and status of a population, and are necessary to inform sound management decisions, particularly where animals occupy areas that overlap with human activity. To acquire information about pinnipeds at a population level, researchers have relied on manned aerial surveys to conduct broadscale distribution and abundance surveys (Kenyon and Rice 1961; Pearson and Verts 1970; Stirling et al. 1977; Kingsley et al. 1985; Payne and Selzer 1989; Bester et al. 1995; Lonergan et al. 2011; Lowry 2014). Aerial surveys allow researchers to collect data over large geographic areas in relatively short periods of time and to obtain information from remote locations that may not be accessible by other means. While aerial techniques offer many advantages, they remain subject to imperfect detection, that is, the need to account for animals that are present during surveys but are not detected (Caughley 1974).
Aerial surveys of harbor seals (Phoca vitulina) are typically designed to target known haul-out sites and are timed to coincide with specific life history stages, such as reproduction or molting, when the majority of individuals are expected to be hauled out of the water and, thus, detectable for counts (Thompson and Harwood 1990; Boveng et al. 2003). Telemetry-informed correction factors to account for seals that are in the water during surveys (Huber et al. 2001; Harvey and Goley 2011) and temporal, environmental, and behavioral covariates (such as time of day, tidal height, and disturbance) that may influence haul-out numbers (Watts 1996; Simpkins et al. 2003; London et al. 2012) are subsequently applied to raw counts to correct for imperfect detection when estimating abundance or population trend.
Conventional aerial survey methods for harbor seals require observers to visually locate seals and to take a series of oblique photographs that are reviewed after the flight to count individuals. This approach is suitable for harbor seals as they are typically non-migratory and relatively faithful to haul-out locations throughout their adult lifespan (Scheffer and Slipp 1944; Fisher 1952; Bigg 1981; Hastings et al. 2004). Aerial surveys that rely on visual cues to detect seals are most effective when the seals haul out in larger aggregations and on uniform substrates that provide a marked contrast, such as a sandy beach or exposed mudflat. There is greater potential for seals to go undetected if they haul out individually or in small groups, at new or unknown locations, or occupy substrates where they are well camouflaged.
In Alaska, the Aleutian Islands stock of harbor seals is one of 12 designated management stocks that have been identified in the state (Muto et al. 2020). Small et al. (2008) documented a 67% decline in the stock across a 20-year period between 1977 and 1999. Recent trends in abundance for this stock have not shown any significant recovery from this dramatic decline (Muto et al. 2020), highlighting these seals as an at-risk population and a high priority for monitoring. The Aleutian Islands, however, pose several challenges for aerial survey coverage. Flights in this area are difficult due to the large geographic extent of the island chain, logistical constraints on operating an aircraft in a remote region with limited fueling and servicing resources, and weather patterns that frequently impede flight operations. When flights are possible, harbor seals are often difficult to locate as they are well camouflaged against the rocky substrates that are characteristic of the region. Unlike most parts of mainland Alaska, the Aleutian Islands have no large terrestrial predators such as bears or wolves. This allows harbor seals in the Aleutian Islands to occupy more areas along the coastline and to disperse in lower densities, making scanning for and detecting seals challenging.
One approach to improve the detectability of objects is the use of thermal, or infrared, imaging. The approach is based on the physical principle that all objects with a temperature above absolute zero emit infrared radiation. Infrared radiation that is emitted by an object is invisible to the human eye but can be felt as heat and detected by cameras with appropriate sensors. Thermal imaging has been used for a wide range of applications in studies of terrestrial mammals (McCafferty 2007; Cilulko et al. 2013). While its use in marine mammal studies, specifically to detect wild populations, has been less extensive, thermal imaging has been explored to study species such as walrus (Barber et al. 1991; Burn et al. 2006), gray whales (Perryman et al. 1999), and polar bears (Amstrup et al. 2004). For phocids in particular, the use of infrared cameras as an alternative method to detect seals during large-scale aerial surveys is a relatively new approach. Studies of seals in Arctic regions of Russia and Alaska (USA) that incorporate thermal imaging have focused on ice-associated seals that provide a stark thermal contrast to their sea-ice habitat (Chernook et al. 2014; Conn et al. 2014; Sigler et al. 2015). Surveys of seals in the United Kingdom, however, have shown infrared technology to be highly effective at detecting harbor seals against rocky substrates (Cunningham et al. 2010; Thompson et al. 2019).
To determine whether the use of thermal imaging techniques would improve detections of harbor seals in the Aleutian Islands region, we conducted an experimental study to compare counts derived from visual detections and color images with those derived from thermal detections and infrared images. In this paper, we compare conventional aerial survey methods for harbor seals in Alaska, USA, to aerial methods that incorporate thermal imaging techniques to evaluate whether counts differed between the two modes of detection, while taking into consideration environmental and behavioral covariates.

Study area
The Aleutian Islands are a chain of volcanic islands and islets that span a broad arc between the Alaska Peninsula in the United States and the Kamchatka Peninsula in Russia, delineating the Bering Sea to the north and the Pacific Ocean to the south (Fig. 1). In the United States, the island chain extends from Attu Island in the west to the Alaska Peninsula in the east, stretching over a length of 1800 km and covering an area of approximately 17,666 km². The islands are characterized by steep mountains and rocky shores, and the climate is distinguished by strong winds, heavy rainfall, and persistent fog.

Survey design
Our survey design followed an existing scheme that was previously developed to divide all coastal areas of Alaska inhabited by harbor seals into discrete geospatial survey units with unique alphanumeric identifiers. Each survey unit encompasses locations where harbor seals are known to haul out, based on historical knowledge gathered from local residents and surveys that have been conducted since the late 1980s by federal, state, and local agencies. The dataset of harbor seal haul-out locations in Alaska is currently maintained by the NOAA Alaska Fisheries Science Center and is reviewed and updated annually (Alaska Fisheries Science Center 2020).
A subset of 34 survey units within the Aleutian Islands stock was selected for our study (Fig. 1). These units were selected due to their relatively high abundance of seals and/or high numbers of haul-out sites encountered on previous surveys, while also spreading effort throughout the study area and taking into account aircraft logistical constraints. Our goal was to survey each unit twice over the two-month survey period; however, there was no prescribed time between surveys. During each survey of a unit, we conducted two independent passes, one that consisted of a visual detection pass and one that consisted of a thermal detection pass (hereafter referred to as a visual trial and a thermal trial). Trials for each survey unit were conducted in succession except when aircraft endurance or weather made this impractical; however, trials that we were not able to conduct in succession were still conducted relatively close in time (< 1 h).
Fig. 1 Map of study area in the Aleutian Islands, Alaska, showing the subset of survey units selected for this study in red
For visual trials, our target survey altitude was 229 m (750 ft), which was our standard for visual/photographic surveys of harbor seals in Alaska. This survey altitude, along with our target survey speed of 100 knots, allowed the science team to detect seals visually while minimizing disturbance from the plane. For thermal trials, several altitudes were selected to test the performance of the thermal imaging equipment and to identify an optimal altitude for future thermal surveys. Based on detection results from preliminary flights that we conducted over harbor seal haul-out locations on different substrates in Cook Inlet, Alaska, the target altitudes that we selected for thermal trials included 229 m (750 ft), 396 m (1300 ft), 549 m (1800 ft), and 701 m (2300 ft). Altitudes for the thermal trials were randomized and assigned to trial pairs prior to conducting surveys. If clouds precluded surveys at an assigned altitude, the next lower surveyable altitude was used for the trial.
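The altitude protocol above (random assignment of a target thermal altitude to each trial pair, with a fall-back to the next lower surveyable altitude under cloud) can be sketched in Python as follows. This is a minimal illustration, not the script used in the study: the function names, the use of a uniform random choice (rather than, for example, a balanced allocation), and the representation of cloud conditions as a single cloud-base height in metres are all assumptions.

```python
import random

# Target thermal-trial altitudes (m) from the survey design; 229 m is also the visual standard.
THERMAL_ALTITUDES_M = (229, 396, 549, 701)

def assign_altitudes(n_pairs, seed=None):
    """Draw a target thermal altitude for each trial pair before surveys begin.

    Assumes a simple uniform random choice among the four targets.
    """
    rng = random.Random(seed)
    return [rng.choice(THERMAL_ALTITUDES_M) for _ in range(n_pairs)]

def surveyable_altitude(assigned_m, cloud_base_m):
    """Return the assigned altitude if it lies below the (hypothetical) cloud base,
    otherwise the next lower target altitude that does; None if no target is flyable."""
    flyable = [a for a in THERMAL_ALTITUDES_M if a <= assigned_m and a < cloud_base_m]
    return max(flyable) if flyable else None

plan = assign_altitudes(6, seed=1)
print(plan)
print(surveyable_altitude(701, cloud_base_m=600))  # falls back to 549
```

Drawing the whole plan from a seeded generator before the field season keeps the assignment reproducible and independent of conditions on the day, while the fall-back rule is applied only at flight time.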
The science team consisted of three biologists who rotated positions every third flight, acting as either a navigator, visual scanner and photographer, or thermal imaging operator. Because the thermal imaging equipment was relatively new to all team members, we rotated positions every third flight to increase each person's exposure to the equipment while allowing time to make individual improvements. During trials, the navigator assisted each observer with descriptions of haul-out locations (marked by GPS waypoints), but the photographer and thermal imaging operator were isolated from one another on separate aviation intercom channels. The navigator was under strict instructions not to relay any information gained from the previous trial. Likewise, the pilots were instructed to follow similar tracklines during each trial to the extent possible.
Visual trials followed typical protocols for harbor seal aerial surveys in Alaska. Specifically, the designated photographer visually scanned the area for seals (with assistance from the designated navigator) and took color photographs with a hand-held digital SLR camera obliquely through an open window when seals were located. During the thermal trial, the designated operator used the thermal imaging equipment to scan the area for seals and recorded the image stream to a video file. Although thermal scans could occur directly overhead, they usually were performed at oblique angles. To account for potential disturbance by the aircraft and any unintended learning of where seals were located on the part of the science team, we randomized the order of the trial type for each trial pair, prior to conducting surveys.
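Randomizing which trial of each pair is flown first can be done ahead of time with a seeded generator, so the schedule is reproducible. In the sketch below, the pair identifiers are hypothetical, and `random.Random.shuffle` is simply one convenient way to draw the order; it is not a description of the tool the team actually used.

```python
import random

def randomize_trial_order(pair_ids, seed=None):
    """Return {pair_id: (first_trial, second_trial)} with the order drawn at random.

    pair_ids are hypothetical identifiers, one per planned visit to a survey unit.
    """
    rng = random.Random(seed)
    schedule = {}
    for pair_id in pair_ids:
        order = ["visual", "thermal"]
        rng.shuffle(order)  # each of the two orders has probability 1/2
        schedule[pair_id] = tuple(order)
    return schedule

# Example: two planned visits to each of three hypothetical units.
pairs = [f"{unit}-visit{v}" for unit in ("U01", "U02", "U03") for v in (1, 2)]
print(randomize_trial_order(pairs, seed=2019))
```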
We chose to conduct experimental trials at the survey unit level instead of the individual haul-out level (within each survey unit) because the complex habitat inhibited identical matching of sites or groups of seals between the two imaging methods. The degree of optical zoom and details in how habitat features were captured in the infrared imagery compared to the color imagery would require an impractical level of effort to determine the appropriate pairwise match between the two trials. Additionally, the survey unit approach was consistent with our standard monitoring protocol, so this option provided us with a better indication of how differences in detectability might affect our survey results and long-term monitoring.

Aerial surveys and equipment
During August and September of 2019, we completed 15 flights during daylight hours in the Aleutian Islands. Of those, seven flights occurred in August and eight in September. The months were chosen to correspond with the period when harbor seals in Alaska undergo their annual molt and, consequently, when we expected the greatest proportion of seals to be hauled out. Our survey platform was a fixed-wing DeHavilland DHC-6-300 Twin Otter aircraft, equipped with removable windows for oblique digital photography and infrared equipment for thermal imaging mounted under the nose of the aircraft. When weather allowed, surveys were timed to occur within a 4-h window centered on low tide. Weather variables recorded during surveys included sky cover, precipitation, and air temperature.
To locate each known harbor seal haul-out location, we used tablet computers with the ForeFlight integrated application for aviation navigation, which we preloaded with custom map layers and waypoints. To maintain a GPS location on our tablets, record flight tracks, and embed the location where each digital photograph was taken into the image metadata, we used a Bad Elf GPS Pro+ Bluetooth GPS receiver and data logger. For digital photographs, we used a Nikon D700 single-lens reflex camera equipped with an 80-400 mm Nikon Nikkor zoom lens and a Foolography Unleashed D200+ Bluetooth Module.
The survey aircraft was equipped with a Star SAFIRE HD unit, an advanced all-digital, high-definition, gyrostabilized, multi-sensor imaging system designed by FLIR Surveillance, Inc. (Wilsonville, Oregon, USA). The system featured a high-definition color sensor with a resolution of 1280 × 720 pixels, NTSC format, optical fields of view from 29° to 0.25°, and a maximum digital zoom option of 120×. The system's high-definition infrared sensor contained a 640 × 512 InSb focal plane array sensor with a 3-5 µm wavelength response, a resolution of 1280 × 720 pixels, NTSC format, optical fields of view from 30° to 0.25°, and a maximum digital zoom option of 120×. The system components consisted of a stabilized turret forward-looking infrared (FLIR) unit capable of 360° azimuthal coverage and +30° to −120° elevation coverage, a central electronics unit, and a system control unit (Fig. A1). While the system displayed differences in temperature on a grayscale, it did not provide a temperature reference for measurements of absolute, or apparent, temperature.
To view and archive the imagery, the unit was connected to a computer monitor and a Churchill Navigation geospatial video recorder that were inside the aircraft. During flights, FLIR operators scanned areas with the high-definition infrared (IR) sensor, switching between different optical fields of view (e.g., wide, medium, and narrow) as needed, depending on the survey altitude. If necessary, we used the high-definition color sensor to identify thermal sources that were ambiguous. We used two primary modes of operation: Inertial Point, where the system tracks continuously over a geographic area as the aircraft moves, and GeoPoint, where the system points to and holds a target location. Video from each thermal trial was saved to a memory card in the recording device.

Image analysis
To analyze our color imagery from the visual detection method, we used the geographic information system application QGIS (QGIS Development Team 2020) to view the spatial location of each photograph along with our survey trackline. In conjunction with the geospatial information, we reviewed all images in ACDSee Pro 10 on high-resolution computer monitors and selected the best image, or series of images, to count seals at each haul-out location. During visual surveys, we took 2178 color photographs of harbor seals, and after review, selected 559 of those images to use for counts.
To analyze our thermal imagery, we reviewed each video recording in VLC media player (version 3.0.8) and created individual snapshots when seal heat signatures were encountered (Fig. 2). We created 2380 snapshots to use for review and, of those, used 1120 that were non-duplicative for counts. Geographic coordinates were extracted from text on each snapshot using an optical character recognition script so that we could view them geospatially to determine which snapshots should be used to count seals.
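Once OCR has turned the on-screen overlay into text, the remaining step is parsing coordinates into decimal degrees for display in a GIS. The sketch below assumes a hypothetical overlay format in degrees and decimal minutes (the actual overlay layout and OCR script are not described here), and it works only on text that has already been recognized.

```python
import re

# Hypothetical on-screen coordinate format after OCR, in degrees and decimal minutes,
# e.g. "N52 12.345 W174 05.678" or "N 52°12.345' W 174°05.678'".
COORD_RE = re.compile(
    r"([NS])\s*(\d{1,2})[°\s]+(\d{1,2}(?:\.\d+)?)'?\s+"
    r"([EW])\s*(\d{1,3})[°\s]+(\d{1,2}(?:\.\d+)?)'?"
)

def ddm_to_decimal(hemisphere, degrees, minutes):
    """Convert degrees + decimal minutes to signed decimal degrees (S and W negative)."""
    value = int(degrees) + float(minutes) / 60.0
    return -value if hemisphere in ("S", "W") else value

def parse_overlay(text):
    """Return (lat, lon) from OCR'd overlay text, or None if no coordinate pair is found."""
    match = COORD_RE.search(text)
    if match is None:
        return None
    lat = ddm_to_decimal(*match.group(1, 2, 3))
    lon = ddm_to_decimal(*match.group(4, 5, 6))
    return lat, lon

print(parse_overlay("TGT N52 12.345 W174 05.678 ALT 1800"))
```

Eastern longitudes are kept positive, which matters in this study area because the western end of the chain (Attu Island) lies east of the 180° meridian.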
We imported each image or snapshot that we selected into a custom map template connected to a PostgreSQL (PostgreSQL Global Development Group 2020) database. One biologist with extensive experience reviewing aerial images of harbor seals was selected to review the images; color images from all visual trials were reviewed first, and infrared images from all thermal trials were reviewed second. Points were manually digitized on each seal and saved to the PostgreSQL database, which we subsequently queried to summarize counts. Because surveys occurred in August and September, when most harbor seal pups are independent of and spaced farther from their mothers, pups are difficult to distinguish from juvenile seals born in the previous year. Consequently, we counted all harbor seals present and did not distinguish between different age classes.
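The counting step amounts to storing one point per digitized seal and taking a grouped count per trial. The sketch below uses Python's built-in `sqlite3` as a stand-in for PostgreSQL, and the table and column names are hypothetical rather than the study's schema; the SELECT statement itself is also valid in PostgreSQL. The LEFT JOIN matters because trials in which no seals were found (zero counts occurred with both methods) would otherwise drop out of the summary.

```python
import sqlite3

def demo_connection():
    """Build a toy in-memory database with a hypothetical schema (trials + digitized points)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE trials (
            trial_id    INTEGER PRIMARY KEY,
            survey_unit TEXT NOT NULL,
            method      TEXT NOT NULL CHECK (method IN ('visual', 'thermal'))
        );
        CREATE TABLE seal_points (
            point_id INTEGER PRIMARY KEY,
            trial_id INTEGER NOT NULL REFERENCES trials (trial_id)
        );
    """)
    conn.executemany("INSERT INTO trials VALUES (?, ?, ?)",
                     [(1, "U01", "visual"), (2, "U01", "thermal"), (3, "U02", "visual")])
    # One row per digitized seal: 3 in trial 1, 4 in trial 2, none in trial 3.
    conn.executemany("INSERT INTO seal_points (trial_id) VALUES (?)",
                     [(1,)] * 3 + [(2,)] * 4)
    return conn

def summarize_counts(conn):
    """Count seals per trial; the LEFT JOIN keeps trials with zero digitized seals."""
    return conn.execute("""
        SELECT t.trial_id, t.survey_unit, t.method, COUNT(p.point_id) AS n_seals
        FROM trials AS t
        LEFT JOIN seal_points AS p ON p.trial_id = t.trial_id
        GROUP BY t.trial_id, t.survey_unit, t.method
        ORDER BY t.trial_id
    """).fetchall()

print(summarize_counts(demo_connection()))
```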

Statistical analysis
We conducted a Bayesian hierarchical analysis of detection data that included visual survey data from the four years prior to 2019 (i.e., 2015-2018) and detection data from the visual and thermal trials in 2019. Visual surveys prior to 2019 were conducted under the same survey protocols as our visual trials, and their counts were included to increase sample size. We restricted these counts to survey units that were also surveyed in 2019, and we excluded any instances where there was incomplete coverage of a survey unit (e.g., due to localized fog or turbulence).
We assembled covariates that we thought might be important predictors of survey counts, either through an impact on the number of seals hauled out or through an impact on detection probability. These included altitude relative to the standard level (229 m), FLIR operator experience, sky cover, precipitation, time from the closest low tide, temperature, and hour-of-day in solar time (as the difference from solar noon). FLIR operator experience was calculated as the sequential number of survey units scanned by each operator. Sky cover was determined by the science team in the field as one of five categories (0-5%, > 5-30%, > 30-55%, > 55-95%, or > 95-100%) that were chosen based on Meteorological Aerodrome Report (METAR) cloud classifications, and precipitation was recorded as categories of none, fog, drizzle, and rain. The time from the closest low tide was calculated by extracting the nearest low tide time from XTide (Flater 2020) based on the tide station assigned to each survey unit and the time when the area was surveyed. Temperature in degrees Celsius was recorded from a sensor on the aircraft during visual trials. Solar time was calculated using the R package solaR (Perpiñán 2012; R Core Development Team 2017).
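The covariate definitions above can be made concrete with a few helper functions. The sketch below is illustrative only: it assumes sky-cover categories are converted to the midpoint of their range, that times are naive UTC `datetime` objects, and that low-tide times have already been extracted (for example, from XTide output). The solar-time helper is a swapped-in approximation (a widely used equation-of-time formula), not the solaR algorithm used in the analysis, so its results may differ slightly.

```python
import math
from datetime import datetime

# Field sky-cover categories (fractions) after METAR classes; using the midpoint as a
# single proportion is an assumption about how a "mean value" was taken.
SKY_COVER_BINS = {
    "0-5%": (0.00, 0.05),
    ">5-30%": (0.05, 0.30),
    ">30-55%": (0.30, 0.55),
    ">55-95%": (0.55, 0.95),
    ">95-100%": (0.95, 1.00),
}

def sky_cover_proportion(category):
    low, high = SKY_COVER_BINS[category]
    return (low + high) / 2.0

def hours_from_low_tide(survey_time, low_tides):
    """Signed hours from the nearest low tide (negative = before low tide)."""
    nearest = min(low_tides, key=lambda lt: abs(survey_time - lt))
    return (survey_time - nearest).total_seconds() / 3600.0

def operator_experience(operators_in_order):
    """Number of survey units each operator had already scanned with the FLIR (0 at first)."""
    seen = {}
    experience = []
    for op in operators_in_order:
        experience.append(seen.get(op, 0))
        seen[op] = seen.get(op, 0) + 1
    return experience

def hours_from_solar_noon(utc_time, longitude_deg):
    """Approximate local solar time minus 12 h (east longitude positive, UTC input).

    Uses a common equation-of-time approximation, not the solaR algorithm.
    """
    day = utc_time.timetuple().tm_yday
    b = math.radians(360.0 / 365.0 * (day - 81))
    eot_min = 9.87 * math.sin(2 * b) - 7.53 * math.cos(b) - 1.5 * math.sin(b)
    utc_hours = utc_time.hour + utc_time.minute / 60.0 + utc_time.second / 3600.0
    solar_hours = utc_hours + longitude_deg / 15.0 + eot_min / 60.0
    return solar_hours % 24.0 - 12.0  # wrap to [-12, 12)

print(sky_cover_proportion(">55-95%"))
print(hours_from_low_tide(datetime(2019, 8, 10, 9, 0),
                          [datetime(2019, 8, 10, 6, 30), datetime(2019, 8, 10, 18, 45)]))
print(operator_experience(["A", "B", "A", "A", "B"]))
```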
Our model for count data obtained from visual detection surveys in 2015-2018 and the 2019 visual-FLIR surveys is described in Eq. 1. For each unit surveyed, $C_{i,j,k}$ gives the count of seals obtained at trial $k$ ($k \in \{1, 2\}$) of visit $j$ to survey unit $i$ according to the formulation

$$C_{i,j,k} \sim \mathrm{Poisson}(\lambda_{i,j,k}),$$
$$\log \lambda_{i,j,k} = \beta_0 + (k-1)\beta_1 + \gamma_0 I_{i,j,k} + \gamma_1 I_{i,j,k} a_{i,j,k} + \gamma_2 I_{i,j,k} a_{i,j,k}^2 + \eta I_{i,j,k} O_{i,j,k} + \delta_1 h_{i,j,k} + \delta_2 h_{i,j,k}^2 + \delta_3 I_{i,j,k} h_{i,j,k} + \kappa I_{i,j,k} s_{i,j,k} + \theta_1 t_{i,j,k} + \theta_2 t_{i,j,k}^2 + \epsilon_{p(i)} + \epsilon_{t(i,j,k)}. \qquad (1)$$

Note that for 2015-2018 surveys, there was only one trial (i.e., the visual trial) of each survey unit; that is, $k = 1$ for all 2015-2018 surveys. Notation is defined as follows:
• $\beta_0$ is an overall intercept (expected log-scale count),
• $\beta_1$ is a fixed effect for second trials of a survey unit,
• $\gamma_0$, $\gamma_1$, and $\gamma_2$ are parameters describing the difference in expected counts between the thermal method and visual method, which may vary based on the survey altitude,
• $a_{i,j,k}$ is the altitude of the aircraft in thermal surveys relative to the altitude of visual surveys,
• $I_{i,j,k}$ is an indicator taking on 1.0 if the detection method on trial $k$ of visit $j$ to survey unit $i$ was the thermal method (and 0 otherwise),
• $\eta$ is an observer experience effect,
• $O_{i,j,k}$ is the number of survey units an observer has previously surveyed with the thermal imaging equipment (with the idea that more experience may lead to higher counts),
• $\delta_1$, $\delta_2$, and $\delta_3$ are hour-of-day effects on survey counts; $\delta_1$ and $\delta_2$ are linear and quadratic effects describing variation attributable to changes in seal haul-out behaviour, so they apply to both survey methods; $\delta_3$ gives an additional effect of hour-of-day on thermal counts (with the thought that the efficacy of thermal counts may decrease throughout the day as the background medium warms),
• $h_{i,j,k}$ is the difference between the solar hour-of-day and solar noon,
• $\kappa$ is a linear effect of sky cover on thermal survey counts (with the assumption that sky cover would not affect visual detections),
• $s_{i,j,k}$ is the proportion of sky cover noted when beginning a survey of a particular survey unit (a mean value was used if a range was provided),
• $\theta_1$ and $\theta_2$ are linear and quadratic tide effects,
• $t_{i,j,k}$ is the difference between the hour surveyed and the hour of the closest low tide,
• $\epsilon_{p(i)} \sim \mathrm{N}(0, 1/\tau_p)$ is random, normally distributed error associated with counts of survey unit $i$, and
• $\epsilon_{t(i,j,k)} \sim \mathrm{N}(0, 1/\tau_t)$ is random, normally distributed error associated with trial $k$ of visit $j$ to survey unit $i$ (this was intended to permit overdispersion relative to the Poisson distribution).

We used JAGS (Plummer 2003) to fit the hierarchical model to count data using Markov chain Monte Carlo (MCMC). Prior to conducting surveys, we first used simulation to verify that we could recover true parameters using our hierarchical model; in this way, experimental design was tailored to our desired estimation model. Because it was a Bayesian analysis, we had to specify prior distributions for model parameters (i.e., $\beta_0$, $\beta_1$, $\gamma_0$, $\gamma_1$, $\gamma_2$, $\eta$, $\delta_1$, $\delta_2$, $\delta_3$, $\kappa$, $\theta_1$, …) … in 30,000 samples from the joint posterior. We examined trace plots and monitored Gelman-Rubin diagnostics to ensure convergence to a stationary distribution.
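The pre-survey simulation check described above starts from being able to generate data from the count model. The sketch below does that with the Python standard library only; it is not the authors' simulation code. All parameter values, the covariate distributions, the scaling of altitude (hundreds of metres above 229 m), the sharing of covariates between the two trials of a visit, and the use of a single running counter for operator experience are hypothetical simplifications. Parameter names follow the roles described in the model definition (intercept and second-trial effect, thermal offset and altitude terms, experience, hour-of-day, sky cover, and tide effects). Fitting would still be done in JAGS or another MCMC engine; recovering the values in `PARAMS` from many simulated datasets is the kind of check the text describes.

```python
import math
import random

# Hypothetical log-scale parameter values, for illustration only.
# beta0/beta1: intercept, second-trial effect; gamma0-2: thermal offset and altitude terms;
# eta: operator experience; delta1-3: hour-of-day; kappa: sky cover; theta1-2: tide.
PARAMS = {
    "beta0": math.log(80.0), "beta1": 0.15,
    "gamma0": 0.3, "gamma1": -0.05, "gamma2": 0.0,
    "eta": 0.01,
    "delta1": -0.1, "delta2": -0.025, "delta3": -0.2,
    "kappa": 0.4,
    "theta1": 0.02, "theta2": -0.01,
}

def log_mean(p, k, thermal, a, O, h, s, t, eps_unit=0.0, eps_trial=0.0):
    """Linear predictor (log expected count) of the count model for one trial."""
    I = 1.0 if thermal else 0.0
    return (p["beta0"] + (k - 1) * p["beta1"]
            + I * (p["gamma0"] + p["gamma1"] * a + p["gamma2"] * a ** 2)
            + I * p["eta"] * O
            + p["delta1"] * h + p["delta2"] * h ** 2 + I * p["delta3"] * h
            + I * p["kappa"] * s
            + p["theta1"] * t + p["theta2"] * t ** 2
            + eps_unit + eps_trial)

def poisson(lam, rng, chunk=30.0):
    """Poisson draw by Knuth's method, summed over chunks of the mean so that
    exp(-mean) never underflows (a sum of independent Poissons is Poisson)."""
    total = 0
    while lam > 0:
        step = min(lam, chunk)
        lam -= step
        limit, k, prod = math.exp(-step), 0, rng.random()
        while prod > limit:
            k += 1
            prod *= rng.random()
        total += k
    return total

def simulate_survey(p, n_units=34, visits=2, tau_unit=4.0, tau_trial=25.0, seed=0):
    """Simulate paired visual/thermal trials with random covariates."""
    rng = random.Random(seed)
    rows, flir_units_scanned = [], 0
    for i in range(n_units):
        eps_unit = rng.gauss(0.0, 1.0 / math.sqrt(tau_unit))  # N(0, 1/tau): precision -> SD
        for j in range(visits):
            # Covariates shared by both trials of a visit (a simplifying assumption).
            h, s, t = rng.uniform(-7, 3), rng.random(), rng.uniform(-5, 5)
            order = rng.sample([False, True], 2)  # random trial order; True = thermal
            for k, thermal in enumerate(order, start=1):
                # Altitude above 229 m in hundreds of metres (hypothetical scaling).
                a = rng.choice([0.0, 1.67, 3.2, 4.72]) if thermal else 0.0
                eps_trial = rng.gauss(0.0, 1.0 / math.sqrt(tau_trial))
                lin_pred = log_mean(p, k, thermal, a, flir_units_scanned, h, s, t,
                                    eps_unit, eps_trial)
                rows.append({"unit": i, "visit": j, "trial": k, "thermal": thermal,
                             "count": poisson(math.exp(lin_pred), rng)})
                if thermal:
                    flir_units_scanned += 1  # one running counter (simplification)
    return rows

rows = simulate_survey(PARAMS, seed=1)
print(len(rows), rows[:2])
```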

Results
In 2019, we flew 129 experimental trials at 33 survey units. The odd number of trials was due to incomplete coverage of a survey unit within a trial, and one survey unit that we selected was not surveyed due to deteriorating weather and constraints on aircraft endurance. Of the 129 trials conducted, we completed 66 trials using the visual detection method and 63 trials using the thermal detection method (Table 1); 61 trial pairs consisted of both a complete visual trial and a complete thermal trial of a survey unit. All visual trials were conducted at the standard survey altitude of 229 m, while thermal trials were slightly skewed towards higher altitudes (> 549 m). Two thermal trials were conducted under varying altitudes due to weather, so the altitudes were averaged within each of those trials. Surveys from each method were similarly distributed among each category of sky cover and precipitation, with approximately half of the surveys occurring when sky cover was > 55% and with most (90%) of the surveys occurring with no precipitation.
Survey times relative to the nearest low tide time were also similar between the two methods. The maximum time that surveys occurred prior to low tide was 7 h, while the maximum time that surveys occurred after low tide was 5 h. Temperature during visual trials ranged from 8 to 16 °C. Survey timing ranged from 7 h prior to 3 h after solar noon for the visual method, and 7 h prior to 1 h after solar noon for the thermal method. Both methods had an average survey time of 3 h prior to solar noon.
For the 61 trial pairs with complete visual and thermal trials of a survey unit, counts of harbor seals per survey unit derived from the visual detection method ranged from 0 to 351, with an average count of 89 seals (standard deviation, SD = 83 seals), while counts derived from the thermal detection method ranged from 0 to 388, with an average count of 102 seals (SD = 89 seals).
Effects of all covariates described below were analyzed relative to baseline conditions where survey altitude was 229 m, skies were clear, hour-of-day was at solar noon, and observers had no experience operating the thermal imaging equipment. Because most surveys had no precipitation, we did not consider it in further analyses.

Effects of trial order on survey counts
The posterior distribution for β1 had 95% of its mass above zero, indicating moderate support that the second trial had higher counts than the first trial. The posterior mean predicted ratio of second-trial to first-trial counts was 1.18 (i.e., an increase of 18%).

Effects of altitude, operator experience, and sky cover on thermal survey counts
At baseline conditions, FLIR survey counts appeared to be slightly higher at low altitudes than at high altitudes (Fig. 3a). However, there was substantial imprecision in this relationship and the quadratic shape does not appear intuitive. Our interpretation is that either FLIR detection rates do not change or they decline slightly as altitude increases.
The posterior distribution of the operator experience effect was centered near zero but was slightly positive, providing weak evidence that counts increased with FLIR operator experience (Fig. 3b). The mean prediction was ≈ 15% higher counts with 12 survey units of FLIR experience compared to no FLIR experience.
The posterior distribution for κ had a mode above zero, but there was also some mass (≈ 5%) below zero indicating that there was moderate evidence of higher counts from the thermal method when there was greater sky cover (Fig. 3c). The mean prediction was ≈ 48% higher counts with complete cloud cover compared to no cloud cover.
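Because the count model uses a log link, each reported percentage change corresponds to a coefficient through exp(coefficient × change in covariate). The short calculation below converts the reported mean predictions (18% for the second trial, 15% over 12 survey units of experience, and 48% from clear to fully overcast skies, a change of 1 in the sky-cover proportion) back to approximate log-scale values. These are back-of-the-envelope figures, not posterior summaries: the posterior mean of exp(·) is not exactly exp of the posterior mean, and the variable names are illustrative.

```python
import math

def multiplier(coef, delta_x=1.0):
    """Multiplicative change in expected count for a change delta_x in a covariate
    under a log link: exp(coef * delta_x)."""
    return math.exp(coef * delta_x)

# Posterior mean predictions reported in the Results (as ratios).
SECOND_TRIAL_RATIO = 1.18   # second vs first trial
EXPERIENCE_RATIO = 1.15     # 12 survey units of FLIR experience vs none
CLOUD_RATIO = 1.48          # complete (s = 1) vs no (s = 0) cloud cover

# Rough log-scale equivalents (log of a mean is not the mean of a log).
beta1_approx = math.log(SECOND_TRIAL_RATIO)
eta_approx = math.log(EXPERIENCE_RATIO) / 12   # per survey unit of experience
kappa_approx = math.log(CLOUD_RATIO)

print(f"beta1 ~ {beta1_approx:.3f}, experience ~ {eta_approx:.4f}/unit, kappa ~ {kappa_approx:.3f}")
```

On these rough figures, the second-trial effect is about 0.17 on the log scale, the experience effect about 0.012 per survey unit, and the sky-cover effect about 0.39.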

Effects of tide on survey counts
Although tide is not related to the performance of the FLIR, we accounted for it (as time from the nearest low tide) because it has been shown to affect harbor seal aerial survey counts (Ver Hoef and Frost 2003). Thus, tide effects apply to both thermal and visual detections of seals. There was weak support for counts being lower close to low tide (Fig. 3d).

Effects of hour-of-day on survey counts
Peaks in harbor seal counts were predicted to occur approximately 2 h prior to solar noon for the visual method and 6 h prior to solar noon for the thermal method (Fig. 4). Predicted thermal counts were notably higher at the beginning of the survey day (i.e., ≥ 4 h prior to solar noon) but steadily declined thereafter. For example, thermal counts were predicted to be roughly 200% higher than visual counts 7 h before solar noon, but 63% lower 3 h after solar noon.
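Because the hour-of-day terms in the count model are quadratic, each predicted peak is the vertex of a parabola, and the gap between the visual and thermal peaks is governed by the thermal-only term. Writing δ1, δ2, and δ3 for the linear, quadratic, and thermal-specific hour-of-day effects, the visual curve is δ1·h + δ2·h² and the thermal curve is (δ1 + δ3)·h + δ2·h², so peaks at h = −2 and h = −6 imply δ1 = 4δ2 and δ3 = 8δ2. The sketch below works through that arithmetic with a hypothetical curvature value; since the reported peaks are posterior mean predictions, the relations are approximate.

```python
def peak_hour(linear, quadratic):
    """Vertex of linear*h + quadratic*h**2; it is a maximum only if quadratic < 0."""
    if quadratic >= 0:
        raise ValueError("no interior maximum unless the quadratic term is negative")
    return -linear / (2.0 * quadratic)

# Hour-of-day part of log expected counts:
#   visual:  delta1*h + delta2*h**2
#   thermal: (delta1 + delta3)*h + delta2*h**2
delta2 = -0.03            # hypothetical curvature, for illustration only
delta1 = 4 * delta2       # places the visual peak at h = -2
delta3 = 8 * delta2       # places the thermal peak at h = -6

print(peak_hour(delta1, delta2), peak_hour(delta1 + delta3, delta2))
```

With a negative curvature, δ3 must also be negative, which matches the pattern described above: thermal counts fall relative to visual counts as the day progresses.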

Relative detection efficiency of the thermal method
We compared the relationship between visual counts and FLIR counts as a function of time of day, where time was relative to solar noon. We binned time into three categories where "early" represented surveys that occurred ≥ 4 h prior to solar noon, "middle" represented 2-3 h prior to solar noon, and "late" represented any time after that (ranging from 1 h before to 3 h after solar noon). Counts derived from the thermal method were higher than those from the visual method during the early solar period, but lower than the visual method during the late solar period (Fig. 5). After adding a fixed correction of 0.5 to all counts to avoid dividing by visual counts of zero, we found that the detection efficiency (ratio of FLIR to visual counts) of the FLIR was consistently higher during the early solar period (Fig. 5).
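The detection-efficiency summary reduces to a ratio with a fixed additive correction, grouped by solar period. The sketch below mirrors the 0.5 correction and the period definitions in the text; the choice of the median as the per-period summary, the boundary handling for fractional hours, and the example counts are assumptions for illustration, not the values plotted in Fig. 5.

```python
from statistics import median

def solar_period(hours_from_noon):
    """Bin hours relative to solar noon (negative = before noon) into the text's periods.

    Boundary handling for fractional hours is an assumption: <= -4 is "early",
    (-4, -2] is "middle", and anything later is "late".
    """
    if hours_from_noon <= -4:
        return "early"
    if hours_from_noon <= -2:
        return "middle"
    return "late"

def detection_efficiency(flir_count, visual_count, correction=0.5):
    """Ratio of FLIR to visual counts after adding a fixed correction to both."""
    return (flir_count + correction) / (visual_count + correction)

def efficiency_by_period(trial_pairs):
    """trial_pairs: iterable of (hours_from_noon, flir_count, visual_count)."""
    groups = {}
    for h, flir, visual in trial_pairs:
        groups.setdefault(solar_period(h), []).append(detection_efficiency(flir, visual))
    return {period: median(ratios) for period, ratios in groups.items()}

# Hypothetical paired counts, for illustration only.
example = [(-6, 30, 10), (-5, 20, 10), (-3, 12, 11), (2, 5, 10)]
print(efficiency_by_period(example))
```

The correction keeps a trial pair with zero visual seals in the summary: a pair with 10 FLIR seals and no visual seals yields a finite ratio of 21 rather than an undefined value.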

Discussion
Trial order, altitude, and FLIR operator experience
Our analysis provided moderate support for higher harbor seal counts during the second trial compared to the first trial. Since we would expect disturbance from the aircraft to decrease seal counts from one trial to the next, this effect initially seemed counterintuitive. While this result may be statistical noise, it is possible that disturbance had little or no impact or that second trials frequently occurred closer to solar noon when we would expect more seals to haul out and, thus, be available for detection. We also considered the possibility that the observers or pilots benefitted from knowledge gained during first trials that resulted in higher detections during second trials. However, we believe that this explanation is unlikely given the protocols that we put in place to minimize this "learning" effect (e.g., isolating communications between the FLIR operator and photographer and asking pilots to follow similar tracklines during each trial to the extent possible).
Altitude appeared to have little to no impact on counts of harbor seals from the thermal method. However, based on our experience, we determined that 549 m (1800 ft) was an ideal altitude at which we could effectively scan an area with sufficient time to fine-tune settings on the thermal camera and investigate ambiguous thermal sources by zooming in and/or switching to the color sensor. Thus, considering the practical operations during manned aerial surveys, 549 m would be our recommended altitude for future FLIR surveys with the same system configuration.
Our Bayesian model predicted higher thermal counts when individuals had more experience operating the thermal imaging equipment. This was logical considering it required practice to adjust camera settings while moving rapidly and continuously in an airplane with environmental conditions constantly changing. While training for many aspects of these surveys can often occur on the ground, we recommend a preliminary period of instructional flights that specifically focus on the operation of the FLIR system at survey altitude and speed prior to conducting surveys.

Tide
While support was weak, counts of harbor seals were predicted to be lower close to low tide. This prediction initially seemed counterintuitive given that higher proportions of harbor seals in other regions of Alaska tend to haul out near low tide. The Aleutian Islands, however, are different in a few key aspects. First, unlike mainland Alaska, the Aleutian Islands are free of large terrestrial predators. Consequently, harbor seals have greater flexibility in the haul-out locations that they choose, which may include isolated habitats as well as the shorelines of large islands due to the lack of predation risk there. Second, the tidal range (i.e., difference in height between low and high tide) is roughly 2-5 times smaller in the Aleutian Islands compared with the Gulf of Alaska or Bristol Bay in mainland Alaska (Fett et al. 1993) and therefore has much less impact on haul-out habitat availability in the intertidal zone. Analysis of haul-out behavior from biologgers recently deployed on harbor seals in the Aleutian Islands seems to support this with a relatively flat response to time from low tide (AFSC, unpublished data). This, along with considerable uncertainty in the tide effect in our analysis, makes it difficult to say anything definitive about the relationship between tide and counts of harbor seals from this study.
Fig. 5 Relationships between visual and FLIR counts (top) and FLIR detection efficiency (ratios of FLIR to visual counts) and FLIR counts (bottom) as a function of the solar period for counts of harbor seals in the Aleutian Islands, Alaska. Here, "early" corresponds to ≥ 4 h prior to solar noon, "middle" corresponds to 2-3 h prior to solar noon, and "late" is any time after that (ranging from 1 h before to 3 h after solar noon). To account for zeros in detection efficiency ratios, we added a fixed correction of 0.5 to all counts.

Hour-of-day and sky cover
Thermal counts were higher than visual counts at the beginning of the day (i.e., ≥ 4 h before solar noon) but steadily declined later in the day (up to 3 h after solar noon). To a lesser extent, thermal counts were also predicted to be higher when cloud cover was greater. The better performance of the thermal method early in the day and under greater cloud cover is reasonable, given that solar radiation increases throughout the day, especially when skies are clear. As the surrounding environment warms, particularly the rocky substrates on which harbor seals haul out, the thermal signatures of seals become less distinguishable (McCafferty et al. 2007; Gooday et al. 2018). Although the body temperature of a hauled-out seal may also increase throughout the day, the rates at which it absorbs and radiates thermal energy likely differ from those of its surroundings. Additionally, as solar radiation increases and the contrast between the heat signatures of objects decreases, the thermal camera may require more adjustments to detect and display differences in temperature effectively.
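The solar-period bins used to summarize this time-of-day pattern can be expressed as a small classification step. The sketch below assumes that each trial's timing is recorded as a whole number of hours before solar noon (negative after noon), since the text does not say how fractional offsets were rounded; the function name, the trial tuples, and the counts are hypothetical and for illustration only.

```python
from collections import defaultdict


def solar_period(hours_before_noon: int) -> str:
    """Classify a trial by whole hours before solar noon (negative = after noon).

    Bins follow the study's definitions: early is >= 4 h before solar noon,
    middle is 2-3 h before, and late is anything later (1 h before to 3 h after).
    """
    if hours_before_noon >= 4:
        return "early"
    if hours_before_noon >= 2:
        return "middle"
    return "late"


# Hypothetical trials: (hours before solar noon, FLIR count, visual count).
trials = [(5, 8, 3), (4, 4, 2), (3, 5, 4), (2, 2, 2), (0, 3, 3), (-2, 1, 2)]

totals = defaultdict(lambda: [0, 0])  # period -> [FLIR total, visual total]
for hours, flir, visual in trials:
    period = solar_period(hours)
    totals[period][0] += flir
    totals[period][1] += visual

for period in ("early", "middle", "late"):
    flir_total, visual_total = totals[period]
    print(period, flir_total, visual_total)
```

Pooling counts by period like this is only a descriptive summary; the Bayesian model described in the study treats hour-of-day as a covariate rather than relying on these coarse bins.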
Evaluating temperature effects during this study presented its own challenges. The FLIR system that we used was designed solely for object detection and did not provide a temperature scale or reference. Even if the system had provided a temperature reference, it would not have yielded meaningful temperature measurements, because atmospheric conditions affect the transmission of thermal radiation from objects on the ground to the sensor on the airplane: water molecules in humid air and fog absorb infrared radiation; clouds reduce the amount of solar radiation that reaches the ground and act as insulators; precipitation cools objects, an effect further amplified by evaporation; and wind transfers heat through convection (see Burke et al. 2019 for a more detailed overview of environmental factors that affect thermal infrared data). In addition to these atmospheric effects, thermal radiation from each seal would vary depending on whether its coat was dirty or clean, wet or dry, molted or molting, and light or dark in color. Although we recorded ambient air temperature, those measurements were taken at the altitude of the aircraft and therefore may not reflect conditions at haul-out sites. We considered extracting surface temperatures from weather reanalysis models but determined that the resolution of these products was too coarse to provide meaningful temperatures at seal haul-out locations.

Species misclassification
Although most survey conditions provided views of heat signatures on the thermal video that were clearly identifiable as harbor seals, species misclassification is a potential source of error. Steller sea lions (Eumetopias jubatus) and sea otters (Enhydra lutris) also haul out in the Aleutian Islands, sometimes in close proximity to harbor seals, and their thermal-image characteristics are similar to those of seals (see Supplementary Material). In most cases, the features used to classify a species were either identifiable directly on the thermal view or could be verified by temporarily switching to the color view. However, in some rare instances when environmental conditions, distances to thermal targets, or camera settings were suboptimal, it was difficult to classify species with complete confidence. Similar issues have been identified in other wildlife studies that use thermal imaging, where animals can be classified only at a higher taxonomic level (Lethbridge et al. 2019) or where counts must be restricted to single-species colonies (Seymour et al. 2017).

Limitations with direct comparisons
Because of limitations in crew size and the configuration of our platform, we were unable to conduct the visual and thermal trials simultaneously. Instead, we tried to anticipate possible sources of variation in counts and controlled for these factors either through design (e.g., via randomization, different treatment levels) or through analysis (e.g., via covariates). Simultaneous operations are important when the number of individuals available for detection is expected to change between trials. Two factors that may affect the number of hauled-out harbor seals are changes in environmental conditions and disturbance from the aircraft. To control for environmental changes, we conducted the two trials of each pair close together in time (< 1 h apart). To control for disturbance effects and for unintentional learning by the science team between trials, we randomized the order of the two methods for all trial pairs before the survey. Although harbor seals may still move to and from the haul-out location between trials, we expected the overall number of seals visible during the two trials to be similar.
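Pre-generating a randomized method order for each trial pair might look like the sketch below. The text does not describe the randomization procedure, so this assumes an independent coin flip per pair using a seeded generator for reproducibility; the function name `randomize_trial_order`, the method labels, and the seed are hypothetical.

```python
import random


def randomize_trial_order(n_pairs: int, seed: int) -> list[tuple[str, str]]:
    """Pre-generate the order of the two survey methods for each trial pair.

    Each pair is shuffled independently, so either method is equally likely
    to be flown first; a fixed seed makes the schedule reproducible.
    """
    rng = random.Random(seed)
    schedule = []
    for _ in range(n_pairs):
        methods = ["visual", "thermal"]
        rng.shuffle(methods)
        schedule.append(tuple(methods))
    return schedule


for i, order in enumerate(randomize_trial_order(5, seed=2019), start=1):
    print(f"pair {i}: {order[0]} first, then {order[1]}")
```

Independent shuffles can leave the two orders unevenly represented in a small number of pairs; an alternative design would assign exactly half of the pairs to each order and then shuffle that list.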

Implications for existing monitoring programs
It is worth considering the possible costs and benefits of incorporating thermal technology into the existing monitoring program for harbor seals in Alaska and, specifically, for the Aleutian Islands stock. It is apparent that detection of seals using the visual method alone can be poor and variable depending on time of day and other environmental conditions that influence visibility. In addition, other visibility factors, such as characteristics of the haul-out substrate (darker, rocky reefs vs. mud and sand) and pelage coloration patterns (i.e., higher frequency of dark pelage in the Aleutian Islands; Shaughnessy and Fay 1977), may impact visual detectability. As such, counts of harbor seals from the visual method may be smaller than the actual number of seals hauled out, and current estimates of absolute abundance are therefore likely to be biased low.
Inferences from current harbor seal monitoring programs in Alaska are made largely with respect to trends in abundance (increasing or decreasing numbers of individuals) rather than absolute abundance (exact numbers of individuals). As long as detection probabilities remain constant (an untestable assumption under current data collection protocols), trends are indicative of the overall pattern even if absolute abundance is underestimated, especially when environmental and detection factors are controlled for in the analysis (Eberhardt et al. 1999). However, absolute abundance is still an important component of the conservation and management of harbor seals. For instance, metrics such as potential biological removal (Wade 1998) are derived from abundance estimates; lower estimates could lead to regulatory restrictions on commercial fisheries that cause incidental mortality or serious injuries to harbor seals. To estimate absolute abundance, detection probability needs to be explicitly accounted for in surveys, and future efforts should devote resources to improving its estimation. Using FLIR detections in optimal conditions may provide one approach for estimating visual detection probabilities, assuming FLIR detection probability is close to 100% in perfect conditions; if it is lower, such estimates would instead represent an upper bound. Other methods for estimating detection probability, such as mark-recapture distance sampling (Borchers et al. 2006; Burt et al. 2014), may also work, although it would be beneficial to use more than one detection method (e.g., visual and thermal instead of two visual observers) to reduce detection heterogeneity (e.g., human observers cueing in on the same distinctive features and disproportionately detecting the same animals).
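One way to turn paired counts from optimal conditions into a visual detection estimate is a simple ratio of totals, with a percentile bootstrap over trials for uncertainty. This is a minimal sketch under stated assumptions, not the authors' method: it treats FLIR counts as the reference, so the ratio estimates visual detection probability only if FLIR detection is complete and is otherwise an upper bound. The bootstrap is a swapped-in technique, and all names and counts are hypothetical.

```python
import random


def visual_detection_ratio(pairs: list[tuple[int, int]]) -> float:
    """Ratio-of-totals estimate of visual detection relative to FLIR.

    pairs: (visual_count, flir_count) for trials flown in optimal thermal
    conditions. If FLIR detection were 100%, this estimates the visual
    detection probability; if FLIR also misses seals, it is an upper bound.
    """
    total_visual = sum(v for v, _ in pairs)
    total_flir = sum(f for _, f in pairs)
    if total_flir == 0:
        raise ValueError("no FLIR detections; ratio undefined")
    return total_visual / total_flir


def bootstrap_interval(pairs, n_boot=2000, seed=1, level=0.95):
    """Percentile bootstrap interval, resampling whole trials with replacement."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        sample = [rng.choice(pairs) for _ in pairs]
        if sum(f for _, f in sample) > 0:  # skip resamples with no FLIR detections
            estimates.append(visual_detection_ratio(sample))
    if not estimates:
        raise ValueError("no usable bootstrap resamples")
    estimates.sort()
    lo = estimates[int((1 - level) / 2 * len(estimates))]
    hi = estimates[int((1 + level) / 2 * len(estimates)) - 1]
    return lo, hi


# Hypothetical paired counts (visual, FLIR); not data from this study.
pairs = [(3, 5), (2, 4), (6, 7), (0, 2), (4, 6)]
p_hat = visual_detection_ratio(pairs)
print(f"estimated visual detection (upper bound): {p_hat:.2f}")
print("bootstrap 95% interval:", bootstrap_interval(pairs))
# Dividing a visual count by an upper-bound detection estimate gives a lower bound.
print(f"corrected count for 40 seals seen visually: {40 / p_hat:.1f}")
```

Resampling whole trials keeps each visual count with its paired FLIR count, which preserves the pairing that the study design was built around.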
To incorporate thermal counts into existing monitoring programs, it would be important to control for differences between present and future survey methods, taking into account time of day, experience operating thermal imaging equipment, and sky cover when estimating trends. Although this could be done in theory (e.g., using controlled calibration; Racine-Poon 1988), it would introduce additional variance, because uncertainty in such relationships would need to be propagated into final trend estimates. It would also make the underlying count data model more complicated. Nevertheless, infrared technology appears able to improve seal detections under early-day survey conditions and should be a component of future harbor seal survey protocols in the Aleutian Islands. Likewise, thermal imaging could be applied to improve the detection of other mammals, marine or terrestrial, that are well camouflaged in their habitat or that are dispersed at low densities.
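A minimal worked equation shows where the additional variance comes from. Assume, for illustration only, that a thermal count $C$ is converted to the visual scale by dividing by a single estimated correction factor $\hat{\theta}$ (FLIR relative to visual), and that $\hat{\theta}$ is estimated independently of $C$; these symbols are introduced here and do not come from the study.

```latex
\begin{aligned}
\tilde{C} &= \frac{C}{\hat{\theta}},
  \qquad \log \tilde{C} = \log C - \log \hat{\theta}, \\
\operatorname{Var}\!\left(\log \tilde{C}\right)
  &= \operatorname{Var}\!\left(\log C\right)
   + \operatorname{Var}\!\left(\log \hat{\theta}\right).
\end{aligned}
```

On the log scale the two variances simply add. Because the same $\hat{\theta}$ would be applied to every converted count, its error does not average out across survey years; it shifts all thermal-era counts together relative to the historical visual series, which is why it must be carried into the trend estimate rather than ignored.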

Incorporating new technology while maintaining comparisons with long time-series data
Although our study demonstrates that thermal imaging technology can improve detections of harbor seals during aerial surveys, we acknowledge that those improvements occurred under specific circumstances. It is clear that there is no one-size-fits-all approach; new technology may improve our understanding of a population when used in conjunction with, and as a complement to, conventional methods. Perhaps the bigger challenge, then, is how to integrate new techniques while maintaining relevance to historical datasets (e.g., Womble et al. 2020). For remote sensing techniques in particular, ecologists recognize their importance in existing wildlife monitoring programs (Marvin et al. 2016; Stephenson 2019), but the framework to make comparisons with long time-series data collected using conventional techniques is less clear.
As with the introduction of any new data collection method, its effects must be considered carefully in the survey design and analysis of a study, and analytical tools must also be developed to accommodate the transition. For instance, it is important to calibrate new and existing methods using correction factors derived from experimental data (such as the data gathered in this study). Our results suggest that such correction factors should vary according to environmental conditions and factors such as hour-of-day. In addition to point estimates, we believe it is important to propagate uncertainty about correction factors into future harbor seal trend analyses. A Bayesian approach to controlled calibration (e.g., Racine-Poon 1988) is therefore attractive, as it provides a straightforward way to include both correction factors and their associated uncertainties in subsequent analyses.
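The idea of carrying correction-factor uncertainty into a trend estimate can be illustrated with a simple Monte Carlo sketch. This is not Racine-Poon's (1988) procedure: it assumes that posterior draws of a log-scale correction factor (FLIR relative to visual) are already available from some fitted calibration model, and here it simulates those draws from a normal distribution purely for illustration. All years, counts, and function names (`log_linear_slope`, `propagate_calibration`) are hypothetical.

```python
import math
import random
import statistics


def log_linear_slope(years, counts):
    """Least-squares slope of log(count) on year, i.e., an annual log growth rate."""
    logs = [math.log(c) for c in counts]
    y_bar = statistics.fmean(years)
    l_bar = statistics.fmean(logs)
    sxy = sum((y - y_bar) * (l - l_bar) for y, l in zip(years, logs))
    sxx = sum((y - y_bar) ** 2 for y in years)
    return sxy / sxx


def propagate_calibration(visual_years, visual_counts, flir_years, flir_counts,
                          log_factor_draws):
    """Recompute the trend once per posterior draw of the log correction factor.

    Each draw converts FLIR counts to the visual scale (count / exp(draw)), so
    the spread of the resulting slopes reflects correction-factor uncertainty.
    """
    years = list(visual_years) + list(flir_years)
    slopes = []
    for log_factor in log_factor_draws:
        factor = math.exp(log_factor)
        adjusted = [c / factor for c in flir_counts]
        slopes.append(log_linear_slope(years, list(visual_counts) + adjusted))
    return slopes


# Hypothetical inputs; real draws would come from a fitted calibration model.
rng = random.Random(42)
draws = [rng.gauss(math.log(1.6), 0.15) for _ in range(4000)]
slopes = sorted(propagate_calibration(
    visual_years=[2011, 2013, 2015, 2017],
    visual_counts=[410, 395, 420, 400],
    flir_years=[2021, 2023],
    flir_counts=[660, 650],
    log_factor_draws=draws,
))
lo, mid, hi = slopes[100], statistics.median(slopes), slopes[3899]
print(f"annual log trend: {mid:.4f} (95% interval {lo:.4f} to {hi:.4f})")
```

This sketch propagates only correction-factor uncertainty. A fuller analysis would also carry count sampling error and would let the factor depend on covariates such as hour-of-day and sky cover, drawing a separate factor for each survey's conditions.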