Landscape influence on biophony in an urban environment in the European Alps

Biophony is the acoustic manifestation of biodiversity, and humans interact with biophony in many ways. However, quantifying biophony across urban landscapes has proven difficult in the presence of anthrophony, or sounds generated by humans. Improved assessment methods are required to progress our understanding of the processes influencing biophony across a variety of spatial–temporal scales. We aimed to identify how the landscape influences biophony, as well as the total acoustic environment, along an urban to rural gradient. We designed the study to quantify how soundscape–landscape relationships change across a variety of spatial–temporal scales. We recorded the afternoon acoustic environment during the spring of 2016 at 30 locations in the city of Innsbruck, Austria using a spatially balanced random sampling design. We quantified the total acoustic environment with the sound exposure level (SEL) metric, and developed a new metric, percent biophony (PB), to quantify biophony while avoiding noise bias. We quantified relationships with land cover (LC) classes, as well as a landscape index, distance to nature (D2N), across 10 scales. D2N within 1280 m best predicted PB, while both the LC class trees and D2N within 40 m best predicted SEL. PB increased more throughout the spring at locations with more natural surrounding LC, while PB did not change significantly at locations with more urban surrounding LC. LC and composite indices can serve as reasonable predictors for the acoustic environment; however, the relationships are scale dependent. Mapping soundscapes can help to illustrate possible driving mechanisms and provide a valuable tool for urban management and planning.


Introduction
Sound is a natural property of landscapes and plays a significant role in ecosystems (Pijanowski et al. 2011b). While many studies have investigated the propagation (Garg and Maji 2014) and adverse impacts (Fritschi 2011) of noise-defined as undesirable sound produced by machines-modeling natural and potentially desirable sound across urban landscapes is a recent and developing area of focus (Hong and Jeon 2017). As natural sound can be beneficial to public health and its measurement can serve as an indicator for urban biodiversity (Zhang et al. 2016), methods to understand its spatial variation and factors influencing its characteristics are required by both ecologists and urban planners (Hao et al. 2016). While noise attenuation measures remain important, research from soundscape ecology argues for an integrated management approach, shifting the noise-only focus to the broader interaction of all sounds and receivers across the landscape (Smith and Pijanowski 2014;Kang et al. 2016).
Soundscape ecology separates sounds into three components: biophony-sounds created by biological organisms (e.g., birdsong, insect stridulations, and frog choruses), geophony-sounds from non-biological origin (e.g., wind, running water, and rain), and anthrophony-sounds created by human activities (e.g., traffic noise, construction noise, and human speech) (Pijanowski et al. 2011a). Anthrophony can be subdivided, where sound from machines (e.g., cars, trains, and planes) can be classified as technophony and is usually considered to be noise, while other anthropogenic sound (e.g., music and human speech) is generally not considered noise (Qi et al. 2008). For this study, we focus on the quantitative measure of the acoustic environment (i.e. the physical sounds in the landscape) as opposed to the soundscape that includes how animals perceive the acoustic environment (ISO 2014).
While anthropogenic noise has been studied in urban landscapes (e.g., Xie et al. 2011) and biophony has been studied in natural landscapes (e.g., Fuller et al. 2015;Gasc et al. 2018), work on biophony in urban areas so far has focused on the impacts of noise on animal vocalizations (see Slabbekoorn and Ripmeester 2008). Only Joo et al. (2011) and Kuehne et al. (2013) attempted to quantitatively measure biophony across noisy urban environments. Joo et al. (2011) modeled the acoustic environment from sounds recorded at 17 locations on two transects along the urban to rural gradient of East Lansing, Michigan and found strong correlations between land cover and biophony. Kuehne et al. (2013) modeled the soundlandscape relationship at 10 lakes along the urban to rural gradient of Seattle, Washington and found a negative relationship between biophony and the degree of urbanization. However, Kuehne et al. (2013) cautioned that existing biophony metrics can be biased in the presence of noise. Fairbrass et al. (2017) corroborated this concern, finding that four common acoustic indices used to quantify biophony were biased by noise in the urban environment around Greater London, UK.
An accurate assessment of how biophony varies across urban environments is needed to support soundscape planning ) and could help improve the understanding of how urban development impacts biodiversity (Rajan et al. 2019). Mapping desirable components of the acoustic environment could support holistic soundscape policies (Smith and Pijanowski 2014;Nugent et al. 2016); however, a better understanding of the relationship between sound and the landscape is required (Lomolino et al. 2015).
Furthermore, the impact of landscape scale on acoustic metrics has not been studied, as previous studies have assumed different distances at which the area surrounding a particular location contributes to its acoustic environment. For example, Joo et al. (2011) used percent urbanization within a radius of 300 m to explain observed sounds, while Kuehne et al. (2013) used a radius of 10 km. Therefore, we aimed to: (1) develop a method to quantify biophony in the presence of noise; (2) analyze the relationship between the total acoustic environment, biophony, and land cover (LC) characteristics at different spatial scales; and (3) map the total acoustic environment and biophony along an urban to rural gradient through a case study in the European Alps.

Study area and recording site selection
This study was conducted in Innsbruck, Austria, a city in the central European Alps with a population of 131,009 in 2015 (Statistics Austria 2015) (Fig. 1a). Innsbruck lies at the intersection of two important alpine trade routes: the urbanized Inn valley and the Brenner pass, the lowest elevation transportation crossing in the Alps. Innsbruck is surrounded by large natural areas, including the 727 km 2 Karwendel Nature Park (VdNÖ 2019). These characteristics result in a pronounced gradient from urban to natural or near-natural landscapes over approximately 5 km.
The study area encompassed 42.8 km 2 and ranged in elevation between 560 and 1009 m.a.s.l. We selected the recording locations (n = 30) using a spatially balanced survey design (Theobald et al. 2007). The method is based on the Reversed Randomized Quadrant-Recursive Raster (RRQRR) algorithm, which allows for the implementation of probability-based designs (e.g., stratified random sampling) while maintaining spatial balance between selected locations. We assigned LC less common in the study area with a higher probability to ensure that they were represented in the sampling, and excluded impossible recording locations such as rivers, buildings, and roads. If we could not record at the exact locations determined by the RRQRR algorithm due to practical reasons (e.g., private properties), we shifted the location to the nearest possible site. The mean distance between the actual recording locations and the locations generated by the RRQRR algorithm was 90 m. We recorded the coordinates of each location with a GPS (Apple iPhone 5s) (Fig. 1b).

Sound recording scheme
We conducted the field recordings during the spring of 2016, beginning on January 27th and ending on May 2nd. We collected between 3 and 5 recordings at each location between the hours of 14:00 and 18:00 Monday through Friday (total number of recordings = 125) for 5 min during each visit. We used a handheld recorder (model: H2n, manufacturer: ZOOM Corporation, Japan). We held the gain on the recorder constant (? 23.4 dB), positioned the microphone 1 m above the ground, and set the recorder to capture sound in all directions (4-channel mode on the device). The recorder mixed the signal into a stereo waveform audio file (WAV format), and we downsampled each file to a sampling rate of 24 kHz/s and a 16-bit sample depth. The recorder had a flat frequency response from 100-10,000 Hz, with a slight attenuation below 100 Hz (ZOOM 2015). The reported dB values are negative in reference to 0, which is the microphone-specific threshold value where sound becomes too loud and starts to get distorted.

Acoustic metrics
To avoid potential noise bias found in existing metrics as described by Fairbrass et al. (2017), we used two metrics to quantify the acoustic environment: 1. Sound exposure level (SEL) for quantifying the total acoustic energy in dB/min (Merchant et al. 2015), and 2. Percent biophony (PB) for quantifying biological sounds, defined as: where A bio is the spectrogram area of biological sounds and A total is the total spectrogram area ( Fig. 2a). As a spectrogram is represented digitally as a matrix of amplitude values, with rows arranged in increasing frequency and columns arranged in increasing time, A total is the total number of values, or cells, in the matrix, and A bio is the number of values containing any amplitude of biophony. This metric equally counts any identified biological sound, regardless of the distance between the sound source and the microphone.
To compute A bio , we implemented a biophony isolation algorithm that first adaptively removes background noise, leaving only distinct foreground acoustic events, and then secondly removes all nonbiophony foreground events resulting in a spectrogram containing only biophony. To adaptively remove background noise, we applied a modified adaptive level equalization (ALE) algorithm, which was originally developed for speech endpoint detection by Lamel et al. (1981), but later applied to remove background noise in recordings of the acoustic environment by Towsey andPlanitz (2011) andTowsey (2013).
To isolate biophony after removing background noise, we identified each foreground event with a region-labelling algorithm (implemented in the SciPy label function) (Jones et al. 2001). Then, all events containing energy below 1000 Hz were removed as they were likely technophony or geophony. All events containing energy above 11,000 Hz were also removed, as they were likely caused by small vibrations in the microphone housing. Lastly, with only biophony remaining in the spectrogram, PB was computed (Fig. 2b).
To assess the performance of the biophony isolation algorithm, we manually inspected all resulting biophony-only spectrograms and classified each into one of three categories: level 1, level 2, or level 3. We considered level 1 to contain the best results with the algorithm performing as intended, level 2 to contain minor errors though good overall performance, and level 3 to contain frequent errors though still acceptable performance.

LC characteristics
Around each sampling location, we computed the percent area of individual LC classes, as well as the average value of a composite landscape index, distance to nature (D 2 N), within 10 radii (5, 10, 20, 40, 80, 160, 320, 640, 1280, and 2560 m). Each successive radius doubles the previous, and the range represents different landscape scales (Fig. 3). LC data were derived from the Land Information System Austria (LISA), which provides a current and detailed vector-based dataset with 13 LC classes (Banko et al. 2014). We only included 4 LC classes in our analyses as we excluded LC classes that covered less than 10% of the study area (Fig. 1c). D 2 N represents the amount of anthropogenic influence on the landscape that incorporates both the degree of naturalness of a location, as well as distance to natural habitat, with values ranging from 0 (completely natural) to 1 (completely urbanized) (Rüdisser et al. 2012). D 2 N was calculated by multiplying two indices: degree of naturalness (N d ) and distance to natural habitat (D n ), which are both normalized to range between 0 and 1. N d aims to reflect biodiversity relevant anthropogenic interferences on plants, animals, and ecosystems and hence classifies LC categories along a seven-staged naturalness scale, namely natural, near-natural, semi-natural, altered, cultural, artificial with natural elements, and artificial. The seven categories-within a theoretically continuous interval scale-are defined by specific thresholds with proportional stretches. D n is defined as the distance to the nearest natural or near-natural patch, as defined by N d . When calculating D 2 N, a cutoff value of 1000 m is applied to D n where all values above 1000 m are counted equally (Rüdisser et al. 2012).

Modeling the relationship between LC characteristics and the acoustic environment
We modeled the acoustic environment with Bayesian multilevel regression, allowing us to estimate both effects across all observations, as well as across all locations. We first defined an empty model with no predictors to see how mean SEL and PB varied between recording locations. We defined level 1 (the measurement level) of the model as: where y tl is the normally-distributed estimate of PB or SEL measured at time t and location l, a l is the mean Fig. 2 a Example of a spectrogram from a section of an original recording, and b a spectrogram of biophony after applying the biophony isolation algorithm by first removing background noise and then removing foreground events with energy below 2000 Hz and above 11,000 Hz PB or SEL of all measurements at location l (location mean), e t is the difference of SEL or PB measured at time t from the location mean, and r 2 y is the withinlocation variance in SEL or PB. We defined level 2 (the location level) of the model as: where M is the mean of all SEL or PB location means (grand mean), e l is the difference between the grand mean and the location mean for location l, and r 2 a is the variance among the mean SEL or PB of each location. The empty model provided a baseline comparison for all other models and allowed us to test our assumption that parameters influencing SEL and PB vary between locations. With the variance variables defined in the empty model, we computed the intraclass correlation (ICC), defined as: The ICC quantifies the informational value of grouping the measurements by location and ranges from 0, meaning the measurements are completely independent from the location, to 1, meaning that all measurements at a location are equal (Gelman and Hill 2007).
We fit several models with a set of measurementlevel and location-level variables as predictors (Table 1) with non-informative priors. Because the LC classes are correlated, we fit separate models for each location-level variable. All models can be generally defined as: location-level where U and V are vectors of location-level and measurement-level variables, respectively, a and b represent vectors of regression parameters of the measurement-level variables at each location l, and all c symbols represent vectors of regression parameters of the location-level variables. The weather-related measurement-level variables were taken from a weather station located near the center of the study area that records observations every 10 min (station id: 11,302, name: INNSBRUCK-UNIV, maintained by the Austrian federal agency ''Zentralanstalt für Meteorologie und Geodynamik'').
To assess the performance of each model, we computed the widely applicable information criterion (WAIC) as defined in Vehtari et al. (2017), where lower WAIC values indicate better performance. To compare models, we set DWAIC = 0 for the best performing model and computed DWAIC for all remaining models, where larger DWAIC values indicate reduced performance. In addition, we computed R 2 as defined in Gelman and Hill (2007) for the location-level parameters to assess the amount of variance explained by the LC classes and D 2 N. Gelman and Pardoe (2006) caution that R 2 is not the best measure of model fit; however, they recognize the utility for communicating the variance explained by the model. Therefore, we included R 2 to assist with model interpretation.
To check the validity of the best performing models, we performed visual predictive checks of the posterior distribution. While not a fully objective and quantitative approach, Gelman and Shalizi (2013) argue that visual model checking is currently the most reliable for Bayesian multilevel models, as well as simple and intuitive. Following this approach, we simulated data (representing future or alternative observations) by drawing from the posterior distribution of the fitted models to verify that the models were able to replicate datasets visually similar to the observations, meaning they captured the structure of the observed data.

Mapping the acoustic environment
To create maps across the entire study area, we used the parameter values from the best performing models to compute the estimated PB and SEL at grid points spaced at the scale identified as the most relevant, and then applied a regularized spline interpolation following Papadimitriou et al. (2009).

Implementation and software
We performed the RRQRR location-selection algorithm using the create sample network tool in ArcGIS 10 Desktop, and we implemented all other analyses in the Python programming language (Python 3.6). We performed the modeling in the PyMC3 package (Salvatier et al. 2016) and saved all code in Jupyter notebooks for reproducibility (Kluyver et al. 2016).

Results
The results from the manual classification of the biophony isolation algorithm performance show that the approach performed as intended on 51% of the recordings, contained minor errors on 34% of the recordings, and contained frequent errors on 15% of the recordings (Table 2). We considered the performance to be acceptable as it was greatly improved over earlier approaches implementing a constant frequency to isolate biophony (e.g., as in Joo et al. (2011) and Kuehne et al. (2013)). The performance levels were associated with the content of the recordings, with level 1 containing light to moderate background technophony and occasional foreground technophony, level 2 containing areas of strong technology-biophony overlap, and level 3 containing strong wind, light rain, or near constant foreground technophony (Fig. 4).
Comparing mean PB and SEL at each location with the surrounding LC illustrates the variability in the results, although a clear trend is discernable: PB generally decreased with increasing urban LC while SEL increased (Fig. 5). This trend was further demonstrated by the negative correlation between PB and SEL (r = -0.69, p \ 0.001) (Fig. 6). PB increased during the recording period (spring) (r = 0.38, p \ 0.001), especially at locations with mean SEL \ -55 dB ref . This is illustrated by the higher variation in PB at relatively quiet locations (r PB = 9.6 when SEL \ -55 dB ref ) compared to relatively loud locations (r PB = 0.7 when SEL [ -40 dB ref ). SEL remained nearly constant during the recording period (r = 0.04, p = 0.7).
Comparing the predictive accuracy of the models suggests relative performance of landscape variables across scales, though the results are not conclusive because the standard error (SE) of the WAIC estimates were large relative to the DWAIC between models. WAIC SE ranged between 28 and 31 for PB models and 17 and 21 for SEL models, while the maximum DWAIC was 11 between PB models and 9 between SEL models. Based on DWAIC, the composite landscape index, D 2 N, served as the best predictor for PB and the second-best, behind trees LC, for SEL (Table 3). Because D 2 N and trees LC had the same DWAIC, we considered both equally as the best performing models for SEL. Unsurprisingly, developed LC classes (buildings and other constructed areas) and D 2 N had a negative effect on PB and a positive effect on SEL, while trees had a positive effect on biophony and a negative effect on SEL. Herbaceous vegetation, which includes land uses ranging from intense agriculture to small residential lawns, served as the weakest predictor for both PB and SEL. The full model results are provided in the Appendix.
The performance of the models across the range of scales resulted in similar trends among the predictors, expect for herbaceous vegetation, which was a consistently weak predictor for both PB and SEL (Fig. 7). The best performing PB models contained predictors from larger scales (mean = 772 m for the five highest ranked models) than for the best performing SEL models (mean = 192 m for the five highest ranked models). The best predictor for PB, D 2 N, was at a scale of 1280 m, while the best predictors, trees LC and D 2 N, were both at scales of 40 m (Fig. 8).
Time was the only measurement-level variable included in the final models, as the remaining variables did not increase model performance. An ICC of 0.62 for the empty SEL model and 0.39 for the empty PB model validated the assumption that observations within locations were not independent. R 2 values of the location-level variables indicated that for PB, D 2 N within 1,280 m was able to explain 91% of variation in starting PB (i.e. the intercept a) and 70% of the difference in time effects (i.e. the slope b). D 2 N within 40 m was able to explain 56% of the starting SEL and 69% of the difference in time effects (62% and 28%, respectively, for trees LC within 40 m). Comparing the ratios of the mean slope to the mean intercept for the best performing models shows that time had a large effect on PB but not on SEL, with respective ratios of 0.778 and 0.004. This observation was true for all models. The models selected to map PB and SEL across the study area are defined as: where y t is PB or SEL for observation t, T is time (weeks) since recording began, and X is surrounding D 2 N within 1,280 m and 40 m for PB and SEL, respectively, at location l.

Discussion
Implementing the PB metric was necessary to accurately assess biophony within the noisy study area. In addition to SEL, we could have separately measured the energy of biophony from our derived biophonyonly spectrograms; however, initial tests indicated that even a small amount of residual noise could strongly bias the results as the noise contained orders of magnitude greater energy than biophony. Using existing soundscape metrics, like the normalized difference soundscape index where all sound above 2 kHz is counted as biophony (Kasten et al. 2012), would have been reasonable at the locations surrounded by more natural LC but largely biased by noise at locations surrounded by more developed LC. The PB metric is simple, as well as easy to interpret and compare. While the biophony isolation algorithm proved to be effective for this study, scaling it to larger recording datasets from different landscapes would likely require significant tuning. Therefore, the challenge of automatically and reliably identifying biophony remains a valuable pursuit. Recently developed techniques have demonstrated promising results and could be applied to multiple environments (e.g., Grill and Schlüter 2017;Bellisario et al. 2019;Fairbrass et al. 2019). The modified ALE algorithm that we implemented could be incorporated as a preprocessing step to remove background noise and further improve the performance of biophony identification algorithms.
The results of this study provide evidence to the relevant scales on which high-resolution LC and composite landscape indices explain variation in components of the acoustic environment in the region Fig. 4 Representative performance of the biophony isolation algorithm for a level 1, b level 2, and c level 3 around a mid-sized city. We found that larger scales better explained the spatial variation of biophony, as measured by PB, while smaller scales better explained total sound, as measured by SEL. In line with previous sound-landscape relationship studies (e.g., Joo et al. 2011;Kuehne et al. 2013;Mennitt et al. 2014;Mullet et al. 2016), we found that landscape variables, namely individual LC classes and the composite D 2 N landscape index, can serve as important predictors of the acoustic environment. However, our findings provide a basis to further investigate the contribution from processes on specific scales.
The most frequent sound sources in the recordings provide possible explanations for the landscape scales identified as having the strongest relationships to both PB and SEL. By listening to all the field recordings, we observed that nearly all technophony stemmed from automobiles, with occasional appearances of trains and airplanes. Because the sound power of technophony was orders of magnitude higher than that for biophony-an observation also apparent in the results found by Joo et al. (2011) and Kuehne et al. (2013)we are not surprised to find a strong relationship between SEL and the landscape variables at a scale of 40 m, as it aligns with the scale of the road network in the study area. Turner et al. (2018) also identified a strong relationship between anthrophony and distance to roads, and we believe that traffic noise should be a major consideration in sampling designs and model specifications for further studies in urban environments.
While we observed that traffic noise was a dominate component at many of our recording locations, other foreground and background sources were also present. As demonstrated by Job et al. (2016), urban acoustic environments can be highly variable at small scales (i.e. street scales), although global trends have also been found on large scales (i.e. city scales) (Warren et al. 2006). We believe that new approaches should be developed that integrate contributions from multiple scales that could likely improve the predictive power of spatial acoustic models.
We observed that nearly all biophony stemmed from birdsong, which was also reported by Joo et al. (2011) in their study along an urban-rural gradient in East Lansing, Michigan. The better performance of LC variables on larger scales for predicting PB makes sense based on findings from studies of bird abundance and diversity in urban areas, where patch area and connectivity of bird habitat were identified as the most important variables (Shanahan et al. 2011). However, birdsong and bird density cannot be directly compared as singing behavior varies by species (Marler and Slabbekoorn 2004), but a better understanding of this relationship could enable approaches that incorporate bird behavior into models for urban biophony.
While LC variables can serve as important factors of biophony at large scales, thereby representing global trends, the weak relationship between LC variables and PB at smaller scales suggests that other landscape information is needed to help explain local heterogeneity. For example, Pekin et al. (2012) demonstrated that acoustic diversity was strongly correlated with LiDAR-derived metrics of vertical vegetation structure at a small scale (10 m) at a natural preserve in Costa Rica. In urban environments, anthropogenic features could both increase and decrease birdsong. For example, Cox et al. (2016) found that urban morphology may play a role in which backyard feeders birds visit.
Understanding the spatial variation of biophony is required to support efforts to quantify its value as an ecosystem service in urban areas (Liu et al. 2014;Hao Fig. 8 Predicted a percent biophony (PB) and b sound exposure level (SEL) across the study area using the best performing model et al. 2016). We likely could have produced more accurate maps by using machine learning modeling approaches (Mennitt et al. 2014;Mullet et al. 2016); however, our objective of inferring LC-sound relationships rather than out of sample prediction placed value on model simplicity and interpretability. Therefore, the maps we created serve to visually illustrate the inferred relationships and suggest possible mechanisms that may be driving each component of the acoustic environment. Implementing a multilevel modeling approach allowed us to take advantage of pooling information from all observations to improve parameter estimates within each location; however, we did not leverage the ability of our Bayesian approach to assign informative priors, which could be justified and may have increased model performance. We recommend that the impact of different priors should be explored in similar future studies using this modeling approach.
While we chose the recording scheme to maximize the number of recording locations (n = 30), we found that additional observations and recording locations would be required to enable an analysis that simultaneously considers multiple landscape variables. We believe the composite landscape index D 2 N serves as an acceptable alternative to including multiple LC classes; however, the maps resulting from the selected models illustrate where additional landscape information is needed. For example, the model estimations for PB over LC classes of bare soil and bare rock are likely too high, resulting from the relatively high D 2 N values in these areas. However, bare soil and bare rock LC are unlikely to contain high biophony. Similarly, potential spatial autocorrelation and effects of additional environmental factors should be incorporated. We were surprised that including wind speed as a predictor in SEL models did not lead to increased predictive performance; however, it is likely that wind conditions often vary greatly across the study area and some recording locations were more than 5 km from the weather station. Collecting wind speed at each location should be a consideration for similar future studies.
Mapping noise has been an effective attenuation tool employed by the European Union (EU) that could be extended to other components of the acoustic environment. While the EU Environmental Noise Directive does not require specific attenuation measures, member countries are required to map noise in urban areas. The maps are intended to increase noise awareness and organically inspire solutions. New policies could leverage the same philosophy for conserving low-noise areas or promoting natural sounds (Directive 2002/49/EC 2002Nugent et al. 2016). Towards that effort, the results of this study suggest that scale is an important component of analyzing the composition of acoustic environments and should assume a larger role in developing quantitative methods within soundscape ecology.