2.1 Introduction

Until a few decades ago, progress in bioacoustic and then ecoacoustic research was severely limited by available equipment. Over time, technological advances and the availability of user-friendly analysis software have made bioacoustics research more commonplace. The advantage of passive bioacoustic studies (in which sounds are often remotely recorded) is that the methods are non-invasive and anyone with a minimal amount of equipment can record animal sounds. However, this advantage diminishes if a researcher is not knowledgeable about the characteristics and limitations of the equipment being used. Given the rapid pace of digital technology, bioacousticians are often challenged to keep up. Appropriate selection and usage of sensors, amplifiers, filters, and recorders, and proper usage of analysis software are key to valid studies on animal sounds. This chapter guides bioacoustics researchers in selecting appropriate gear for maximizing the outcomes of their research.

To record, store, and play back sounds, there are two types of devices: analog and digital. Analog recording devices, such as cassette recorders and reel-to-reel tape recorders, are now obsolete and almost completely replaced by digital recording devices. However, many researchers over time have made phonograph, reel-to-reel, or cassette recordings, which provide historical data. So, when reading an older research article in bioacoustics, one may have to consider the potential limitations of the specific equipment used at the time and their ramifications on the reported findings. Chapter 1 provides an overview of older and historic equipment.

2.2 Basic Concepts of Sound Recording

The acquisition, storage, and playback of sounds in digital systems involve the interoperation of a few independent components (Fig. 2.1). Bioacoustics researchers may choose to source the necessary components and assemble a setup themselves. The practical considerations for selecting these components will be covered in Sect. 2.3. Alternatively, researchers may opt for pre-assembled equipment. The growing market has made available a wide variety of programmable, and often customizable, autonomous recorders. Section 2.4 discusses a few of the widely used terrestrial and underwater autonomous recorders. Organizations developing autonomous recorders often invest in the necessary trial-and-error experimentation for arriving at optimal combinations of components for different applications. The use of such pre-assembled equipment allows bioacoustics researchers to circumvent the associated efforts (financial and labor). However, unique demands of specific studies may not always be addressed by existing autonomous recorders. Before diving into details of each component, we provide a quick recap of the overarching concepts and terminologies.

Fig. 2.1

Signal chain of a typical digital recording setup in bioacoustics studies showing the different components involved in the collection, analysis, and transmission of sounds

2.2.1 Sampling Rate and Bandwidth

The sampling rate used when converting analog electronic signals to digital signals limits the maximum frequency that can be recorded. The sampling frequency is measured in hertz, and the sampling rate (which has the same value but different unit) is measured in samples/s. The frequency range is limited by the Nyquist frequency, which is ½ of the sampling frequency (see Chap. 4). Sampling frequency for the standard CD is 44.1 kHz (i.e., high enough to match the full human hearing range). An 8-kHz sampling frequency suffices to understand the human voice. Nowadays, digital recorders easily sample up to 192 kHz and higher, with the flexibility to choose lower sampling frequencies (32, 44.1, 48, 88.2, and 96 kHz are common). Instrumentation recorders can have sampling frequencies up to 1 MHz.

Despite the available sampling frequencies, the actual recording bandwidth of a recorder is dictated by the analog electronics before the analog-to-digital (AD) converter. Because most commercial recorders are designed for the recording of music or human speech, the upper frequency is often limited to 20 kHz and the electronics do not have a flat frequency response beyond this limit, even if selecting a high sampling frequency such as 192 kHz. For professional recorders, the real frequency response (i.e., the output amplitude across frequencies as a function of input amplitude) is usually stated in the equipment specifications (e.g., flat to within ±3 dB between 10 Hz and 60 kHz). If the frequency response is not specified, it is important to make some tests using a frequency-generator as a sound source. It is also important to consider that the frequencies close to the Nyquist frequency might be affected by artifacts such as aliasing.

2.2.2 Aliasing

According to sampling theory, to preserve all information in an analog signal, a sampling frequency at least twice the highest frequency in the signal (including harmonics) should be used. A non-optimal sampling frequency can produce misrepresentations of components in the original waveform, which often manifest as artifacts in a spectrographic display but are not actually present in the original signal (see Chap. 4, section on aliasing). In a spectrogram, the alias is mostly in the higher frequency region and appears as the mirror-image of the actual signals beyond the Nyquist frequency (Fig. 2.2). In digital recording, anti-aliasing filters (Sect. 2.3.2.2) are required before the sampling stage to prevent aliasing from sounds that have components higher than the Nyquist frequency.

Fig. 2.2

Spectrogram (top) and oscillogram (bottom) of an AD-converter with a sinusoidal frequency sweep from 40 kHz to 100 kHz as input. Sampling frequency 96 kHz, and thus Nyquist frequency 48 kHz. In an ideal system with a sharp anti-aliasing filter, the spectrogram would only go up to 48 kHz and show nothing once the signal frequency went beyond Nyquist. In this real-world example, however, as the signal frequency f exceeds the Nyquist frequency fN, the alias (appearing as the downsweep) is created with frequency 2fN − f. As such, a 50-kHz input produces a 46-kHz alias and a 52-kHz input produces a 44-kHz alias, etc. The amplitude of the alias depends on the attenuation of the anti-aliasing filter at the input frequency. An attenuation of −10 dB at 50 kHz produces an alias at 46 kHz with a level of −10 dB relative to the input level. Spectrogram generated by SeaPro (http://www.unipv.it/cibra/seapro.html; accessed 15 Mar. 2021) software
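The folding behavior seen in Fig. 2.2 can be illustrated with a minimal sketch. It assumes an idealized sampler with no anti-aliasing filter at all, so it predicts only where an alias would appear, not how strong it would be; the function name is hypothetical.

```python
def alias_frequency(f_hz, fs_hz):
    """Apparent frequency of a pure tone at f_hz after sampling at fs_hz.

    Assumes an ideal sampler with no anti-aliasing filter.
    """
    nyquist = fs_hz / 2
    folded = f_hz % fs_hz        # the sampled spectrum repeats every fs_hz
    if folded > nyquist:         # the upper half mirrors back below Nyquist
        folded = fs_hz - folded
    return folded


# Tones from the 40-100 kHz sweep of Fig. 2.2, sampled at 96 kHz
for f in (40_000, 50_000, 52_000, 100_000):
    print(f, "Hz appears at", alias_frequency(f, 96_000), "Hz")
```

With a 96-kHz sampling frequency, the 50-kHz and 52-kHz inputs map to 46 kHz and 44 kHz, matching the values given in the caption.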

2.2.3 Amplitude Sensitivity

Amplitude sensitivity, expressed as the ratio of output voltage to input pressure, indicates how many volts are produced from a sound with a root-mean-square (rms) sound pressure of 1 Pa in air or 1 μPa in water. More commonly, sensor sensitivity is given in decibels: dB re 1 V/Pa for microphones and dB re 1 V/μPa for hydrophones. To convert the linear sensitivity to dB, one takes 20 log10 of the linear value. So, a microphone sensitivity of 1 mV/Pa (=0.001 V/Pa) can be expressed as −60 dB re 1 V/Pa. Note that an rms sound pressure of 1 Pa is equal to a sound pressure level (SPL) of 94 dB re 20 μPa, because

$$ \begin{array}{l} 1\ \mathrm{Pa} = 1{,}000{,}000\ \upmu\mathrm{Pa} = 50{,}000 \times 20\ \upmu\mathrm{Pa}; \\ \mathrm{apply}\ 20\log_{10}\ \mathrm{and\ get}{:}\ 20\log_{10}\left(50{,}000\right) \approx 94. \end{array} $$
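As a quick check of these conversions, the following sketch reproduces the two worked values (a 1 mV/Pa microphone and an rms pressure of 1 Pa in air). The function names are illustrative only.

```python
import math


def sensitivity_db(sensitivity_v_per_pa):
    """Convert a linear sensitivity in V/Pa to dB re 1 V/Pa."""
    return 20.0 * math.log10(sensitivity_v_per_pa)


def spl_db(p_rms_pa, p_ref_pa=20e-6):
    """Sound pressure level in dB re p_ref_pa (default: 20 uPa, used in air)."""
    return 20.0 * math.log10(p_rms_pa / p_ref_pa)


print(round(sensitivity_db(0.001), 1))  # microphone with 1 mV/Pa
print(round(spl_db(1.0), 1))            # rms pressure of 1 Pa in air
```

For hydrophones, the same arithmetic applies with the sensitivity expressed in V/μPa and a reference pressure of 1 μPa.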

The most sensitive sensor is not necessarily the “best” sensor. When attempting to capture very loud sound, less sensitive equipment should be chosen to avoid signal distortion or, in extreme cases, damaging the equipment. If only a sensor of low sensitivity is available, then an amplifier may be used in the recording chain, but self-noise may become an issue. High sensitivity allows lower gain settings to promote a good recording.

2.2.4 Bit-Resolution and Dynamic Range

The dynamic range is the difference between the highest and lowest sound levels that can be recorded. Digital recorders usually operate with 16- or 24-bit resolution; 16 bits guarantee a dynamic range of about 96 dB (unipolar, 90 dB bipolar) and 24 bits theoretically produce a dynamic range of 144 dB (unipolar, 138 dB bipolar) thus encompassing the dynamic range of human hearing. However, even the best analog circuits rarely exceed 110 dB of dynamic range. This means that of the available 24 bits, only 20 bits are effectively used to encode the sound and the others are dominated by noise. In many conditions, the real dynamic range is limited to 70–80 dB by the noise of the sensor and preamplifier. An accurate setting of the recording levels can allow effective use of 16-bit recorders, without wasting the extra storage space required for 24-bit recording. However, when incoming sound levels cannot be predicted, the 24-bit setting allows additional dynamic range for unpredictable sound events (e.g., high-intensity impulsive noises such as from pile driving). The recorded volume should be set at a particular level to exploit the dynamic range of the recording setup: high enough to rise above the equipment self-noise during quiet times, but not too high to cause clipping of loud sounds. Recently introduced recorders allow 32-bit floating-point recording by combining the output of two 24-bit converters working with different signal gains. This simplifies the setting of recording levels but cannot yet overcome the dynamic range limitations of the microphones and of associated preamplifiers.
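The theoretical figures quoted above follow from roughly 6 dB per bit. A minimal sketch is given below; it assumes that the "bipolar" values correspond to one bit being spent on the sign, which is an interpretation rather than something stated explicitly in the text.

```python
import math


def dynamic_range_db(bits):
    """Theoretical dynamic range of an ideal converter with `bits` of resolution."""
    return 20.0 * math.log10(2 ** bits)


for bits in (16, 24):
    unipolar = dynamic_range_db(bits)
    bipolar = dynamic_range_db(bits - 1)  # assumption: one bit encodes the sign
    print(f"{bits}-bit: {unipolar:.1f} dB unipolar, {bipolar:.1f} dB bipolar")
```

These are upper bounds for an ideal converter; as noted above, the analog stages usually limit the dynamic range that is actually achieved.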

2.2.5 Self-Noise

All components of the signal chain suffer from self-noise, which is additive across the signal chain. Self-noise and dynamic range are the two critical specifications that affect amplitude response. For example, when recording in very quiet locations or to pick up very low-level sounds, the self-noise generated by the components of a signal chain must be taken into consideration, along with dynamic range. Self-noise limits the spatial range of bioacoustic sampling. It may also be an issue in playback, when self-noise is amplified and broadcast in addition to the intended signal. The circuits inside sensors can generate broadband background noise with various spectral shapes (i.e., not necessarily flat across the frequency band, like white noise, but worse at higher frequencies). The level of this noise is expressed in decibels (e.g., dB(A) after frequency weighting, dB re 20 μPa unweighted in air, or dB re 1 μPa unweighted in water) to indicate the equivalent sound level of noise as if generated by the environment. The self-noise of a sensor is almost always declared in its technical specifications; the same is true for professional recorders. In contrast, for many consumer recorders, even of high quality, self-noise measurements are rarely available. A useful comparison of the self-noise of consumer recorders available on the market is presented on the website of Avisoft Bioacoustics.

The noisiest component of the chain determines the quality of the recording. This is particularly important when recording low-level sounds (Fig. 2.3). The input self-noise is expressed as the Equivalent Input Noise (EIN) measured in an open or unloaded circuit and expressed in dBU (the “U” stands for “unloaded”). Very good values range from −130 dBU to −120 dBU, and poor recorders have a −100 dBU EIN.
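To give a feel for these EIN figures, the sketch below converts them to rms voltages. It assumes the usual audio convention (not stated in the text) that 0 dBU, more commonly written dBu, corresponds to about 0.775 V rms, i.e., the voltage that dissipates 1 mW in a 600-Ω load.

```python
import math

# Assumed reference: 0 dBu = sqrt(0.6) V rms (about 0.775 V)
DBU_REFERENCE_V = math.sqrt(0.6)


def dbu_to_volts(level_dbu):
    """Convert a level in dBu to an rms voltage in volts."""
    return DBU_REFERENCE_V * 10 ** (level_dbu / 20.0)


for ein in (-130, -120, -100):
    print(f"EIN {ein} dBu = {dbu_to_volts(ein) * 1e6:.2f} microvolts rms")
```

Under this assumption, the gap between a very good input (−130 dBU) and a poor one (−100 dBU) is a factor of about 32 in noise voltage.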

Fig. 2.3

Spectrogram depicting high self-noise versus low self-noise output by three microphone/recorder combinations. In the left section, a low-noise system was used and the signal clearly emerged from the environment background. In the following sections, noisier systems were used; the sounds appear unclear and listening was unpleasant

2.3 Instrumentation of Signal Chain Components

To ensure that proper equipment is used for recording, analysis, and playback, researchers must consult manuals for each piece of equipment in the signal chain before conducting research. In some cases, laboratory tests may be required to verify the real performance or to calibrate equipment (Sect. 2.6). While recording, researchers must ensure that the frequency response (and, in turn, bandwidth), self-noise, and dynamic range (in particular, the maximum recording level) of the overall recording system do not end up deleting or significantly distorting a portion of the signal. Otherwise, a researcher can miss part of an animal’s sound that is outside the recording system’s sensitivity or frequency range. This is especially likely if the sound is above or below the human hearing range. For example, elephants communicate with conspecifics using infrasounds (Payne et al. 1986), and rodents and bats produce ultrasounds for communication and foraging (see Chap. 12 on echolocation).

Other features to consider when purchasing equipment for fieldwork are the construction quality, weather proofing, reliability, visibility of the display, and ease of use in harsh conditions (see Chap. 3 on practical considerations). Powering the instruments might be a major issue with regard to practicality, cost, and safety. For example, low-noise preamplifiers generally require higher operating currents. Large-capacity batteries increase the risk of fire. During long field trips, internal rechargeable batteries may be difficult to recharge; replaceable batteries may be easier to manage, and external powering options could become a necessity (e.g., to power a recorder with a standard 5 V USB source or with a 6- or 12-V battery pack). For extended autonomous deployments, the cost of the power source might end up exceeding the cost of the recording equipment.

2.3.1 Sensors

Microphones and hydrophones convert sound pressure signals into electrical signals. The electrical signal, which is representative of the original sound waveform, can be amplified, filtered, recorded, visualized, and further analyzed or converted back to sound for playback or projection. Speakers work in the reverse and convert the electrical signal into sound for broadcast. A transducer converts a signal from one form (of energy) to another. So microphones, hydrophones, and speakers are all transducers. Usually, microphones and hydrophones, as long as they do not have a built-in preamplifier, can be used as both sound sensors and sound projectors. But their receiving and projecting amplitude sensitivities, frequency responses, and directionalities may differ.

Each microphone and hydrophone has a unique amplitude sensitivity, frequency response, and directivity pattern. These are specified in the specification sheets of high-quality sound sensors. A flat frequency response gives the least distorted audio-signal; however, during signal calibration, a non-flat response can be accounted for. The sensor size influences amplitude sensitivity, frequency response, and directionality. A sound sensor, to be omnidirectional, should be smaller than the minimum wavelength of the signal to be received. Large sensors are more sensitive but tend to limit responses at high frequencies. Large sensors become directional at lower frequencies than small sensors do.

2.3.1.1 Microphones

Microphones convert sound energy (from sound waves) into an electrical audio-signal using a moving diaphragm or membrane. Two main types of microphones are common: dynamic microphones and electrostatic microphones (condenser and electret microphones) (Brüel and Kjær 1982). Some microphones are sensitive to particle motion, as well as sound pressure, which results in them being very sensitive to sounds very close to the microphone (i.e., in the near-field). This often exaggerates the low-frequency components of the received sound.

In dynamic microphones, a coil on the back of the diaphragm is immersed in a magnetic field and generates a current by electromagnetic induction when the membrane moves (Fig. 2.4). Such microphones do not require external power, but they have limited sensitivity, making them most useful for loud signals or at close range to the sound source. The delicate mechanical suspension in dynamic microphones may warrant gentle handling.

Fig. 2.4

Schematic of a dynamic microphone (left) and a condenser microphone (right) showing the conversion of sound waves into electrical audio-signal outputs. Microphone schematic components: 1. vibrating diaphragm, 2. coil attached to the diaphragm, 3. magnet, 4. backplate, 5. battery, 6. resistor, 7. output

Electrostatic microphones are based on a condenser with a thin moving diaphragm (Fig. 2.4). Movement of the diaphragm changes capacitance in the condenser. Capacitance changes are then converted to voltage. Condenser microphones need a high voltage to polarize the condenser. In contrast, electret microphones are permanently polarized as their diaphragms are made of metallic-coated, pre-polarized, plastic membrane. Both condenser and electret microphones need power for their integrated preamplifier, with condenser microphones requiring additional power to polarize the condenser. This power may be supplied by an internal 3–5 V battery, 48-V phantom power (P48), or a Power-In-Plug (PIP) unit. P48 is a standard means of feeding power to a condenser microphone with 48 Vdc and is commonly used in professional recorders. Modern pocket digital recorders use PIP units for powering their microphones. The membranes in electrostatic microphones are delicate and sensitive to humidity, which can be problematic in humid environments. The lower mass of electrostatic elements generally yields superior high-frequency response. However, electrostatic sensors may be noisier than dynamic sensors. For studies involving low-frequency sounds, dynamic sensors may be a better choice.

A radio-frequency microphone is a special type of condenser microphone, developed by Sennheiser in its MKH series. With this type of microphone, variations of the capacitor modulate the frequency of a radio-frequency oscillator, and then a demodulator extracts the audio-signal to be transmitted over a cable. The radio-frequency oscillator and the demodulator are both housed inside the microphone, and these microphones are less prone to problems of interference and humidity.

The more recently developed Micro-Electro-Mechanical System (MEMS) microphones have pressure-sensitive elements integrated directly into a silicon chip (as found in most cell phones), using fabrication technologies similar to those used to make semi-conductor devices. Some integrate an AD-converter to produce a digital output. Their development resulted from the need for tiny microphones for cell phones. Because of the small size and low inertia of their sensors, MEMS microphones are sensitive to high frequencies and consequently are used in ultrasonic microphones, such as in bat detectors. Because of their low cost, they are well suited to array applications, including “acoustic cameras” that overlay the image taken by a video-camera with a map of the sound sources generated by a matrix of tens or hundreds of MEMS microphones.

Most condenser microphones have a self-noise lower than 20 dB(A), which is sufficient to record music or speech at a close distance, but not suited to record faint animal sounds and noises in a quiet environment. The quietest studio microphones have a self-noise below 10 dB(A); among these microphones is the Rode NT1A, a cardioid microphone that has an excellent self-noise of only 5.5 dB(A). Even quieter microphones exist in the category of instrumentation microphones, but only a few, very expensive models are available. Lynch et al. (2011) and Pavan (2017) used very quiet instruments to show that noise in natural environments can be as low as 10 dB re 20 μPa and even go below 0 dB re 20 μPa below 1 kHz. Of course, a quiet microphone must be connected to a quiet recorder!

Sometimes, microphone specifications are difficult to read or self-noise is not provided. One must then examine the parameters that are given, such as amplitude sensitivity and the signal-to-noise ratio (SNR). Unless stated otherwise, the SNR is relative to 94 dB re 20 μPa (i.e., 1 Pa) at 1 kHz, and thus the self-noise can be obtained by subtracting the given SNR from 94. If properly measured and reported, an SNR of 80 dB(A) means a self-noise of 14 dB(A), which is pretty good. In other cases, the sensitivity, the maximum allowed SPL, and the dynamic range are presented. In this case, the self-noise can be obtained by subtracting the dynamic range from the maximum allowed SPL.
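The two subtractions described above can be written as a small sketch. The function names are illustrative, and the maximum SPL and dynamic range in the second example are hypothetical values chosen only to show the arithmetic.

```python
def self_noise_from_snr(snr_db, reference_spl_db=94.0):
    """Self-noise when the SNR is specified relative to 94 dB re 20 uPa (1 Pa)."""
    return reference_spl_db - snr_db


def self_noise_from_dynamic_range(max_spl_db, dynamic_range_db):
    """Self-noise when maximum SPL and dynamic range are specified instead."""
    return max_spl_db - dynamic_range_db


# Example from the text: an SNR of 80 dB(A) implies 14 dB(A) of self-noise
print(self_noise_from_snr(80.0))

# Hypothetical data sheet: maximum SPL 130 dB, dynamic range 115 dB
print(self_noise_from_dynamic_range(130.0, 115.0))
```

Both results are only as reliable as the data sheet; if the SNR is referenced to a different level or weighting, the reference value must be changed accordingly.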

2.3.1.1.1 Ultrasonic and Infrasonic Microphones

Microphones for ultrasounds are typically small, with a small membrane with very low inertia. Ultrasonic microphones are usually condenser microphones developed for measurement purposes, not for recording music; however, the increasing interest in ultrasonic communication and echolocation in animals (mainly bats and rodents, but also insects) has fostered the development of a wide range of sensors for ultrasounds. Ultrasonic microphones for measurement purposes need to have a flat frequency response; usually they also have high self-noise and are very expensive. If the flatness of the frequency response is not a necessity, other, lower-cost microphones can be used instead (e.g., low-cost small condenser microphones and tiny MEMS microphones). Considering that ultrasonic microphones need high sampling rates, often beyond those available in consumer digital recorders or AD-converters (see Sect. 2.3.4), ultrasonic sensors with integrated AD-converter and USB interface have been developed. In bioacoustic studies, these are mainly used for detecting and recording bats (Sect. 2.3.5), insects (Buzzetti et al. 2020), and rodents either in the wild or in etho-pharmacological studies (Buck et al. 2014).

Infrasonic microphones are specially designed for low-frequency recording, down to 1 Hz or even 0.1 Hz. Until a few decades ago, Sennheiser produced the MKH 110, a condenser microphone with 12-V powering. Now discontinued, it is still appreciated in the used equipment market. These microphones have been widely used to record elephant communication (Payne et al. 1986; Poole et al. 1988). Currently, microphones designed for infrasonic applications are largely limited to measurement (instrumentation) microphones.

2.3.1.1.2 Measurement and Specialty Microphones

Measurement microphones (or, instrumentation microphones) are a special class of microphones designed to make accurate measurements of sound amplitude within a specified frequency range, which could be infrasound to ultrasound, to accurately characterize a sound field or a sound source. These microphones comply with specific and rigid requirements. They need to have a well-defined and stable frequency response to sound (ideally flat). They usually appear as cylinders with diameters ranging from 1/8 inch for very high frequencies (but with low sensitivity) to 2 inches for high sensitivity and low noise (but limited extension to high frequencies). Normally based on condenser sensors, these microphones are often powered at 200 V. Measurement microphones are usually connected to specific digital recorders and analyzers, or integrated into a sound level meter (also known as phonometer). Usually dedicated to noise measurement, these microphones are also used to calibrate other types of instruments (see Sect. 2.6) and to record sounds for analysis and listening with great accuracy. Brüel & Kjær are well known for their measurement microphones; however, other manufacturers exist as well, providing a wide range of sensors for applications of sound recording, acoustic measurements, noise monitoring, building acoustics, cinema calibration, occupational health, and live sound broadcasts.

Optical microphones are a very special category of measurement microphones. A laser beam is reflected by a very tiny low-inertia sound-sensing membrane, and the reflected beam is then detected by an optical sensor to extract the modulation given by the membrane moved by sound waves. Their advantage is the direct optical output that is conducive for long-range transmission over optical cables and their insensitivity to electric and electromagnetic fields.

Wireless microphones transmit the received sound by a radio signal that can be either a standard AM- or FM-transmission or a digital format to ensure signal quality and privacy. Wireless microphones allow the cable-less transmission in situations where cables are problematic. Wireless microphones connected to a multi-channel receiver allow a wide area to be monitored. In some cases, the wireless microphones used for television interviews can be used successfully (e.g., by placing the microphone close to or inside a nest and then recording from a distance). A traditional microphone can also be equipped with a radio transmitter and a battery that powers both. The limitations include powering the transmitters (in particular, in field and long-term deployments), limited dynamic range, compromised self-noise, and radio-frequency interference during transmission.

2.3.1.1.3 Microphone Directionality

Directionality is an important characteristic of a microphone. Omnidirectional microphones detect sound from all directions and can be appropriately used for recording a soundscape (i.e., the combination of all sounds generated in an environment; see Chap. 7). Directional microphones are good for making recordings of a selected animal in a specific direction (e.g., a particular individual in a colony) and for attenuating noise coming from directions other than the signal direction (e.g., the noise of a nearby river or road). Directional microphones thus improve the SNR by reducing background sounds and noise coming from other directions in the environment. In indoor applications, directional microphones are used to focus on a performer and to attenuate reverberation from the hall. Widely available types of directional microphones include cardioid, hypercardioid, bidirectional, and unidirectional (Fig. 2.5). Cardioid microphones exhibit a heart-shaped directivity (i.e., they are less sensitive at 180° from the sound source) and they are often used with parabolic reflectors. The hypercardioid microphone is less sensitive at ±120° from the direction to the sound source. Bidirectional microphones pick up sound in a figure-of-8 pattern equally from two, opposite directions.

Fig. 2.5

Polar patterns of directionality of different microphones. With microphones facing the top of the page, these patterns extend from the axis of the microphones, and thus present directivity in the vertical plane. In the horizontal plane, these patterns are symmetrical (i.e., they rotate about the vertical axis). (a) omnidirectional, (b) cardioid, (c) bidirectional (figure-of-8), and (d) shotgun (lobar)

Shotgun microphones (Fig. 2.5d) are the most directional and commonly used for recording a specific animal. Their use is desirable when it is necessary to improve the recording level of a specific sound source, or to attenuate unwanted sound coming from other directions. The design of shotgun microphones (such as the Sennheiser K6/ME66 or the MKH 8070) is based on the interference tube principle; usually a cardioid condenser microphone is placed at the end of a tube with slits on the sides, canceling off-axis signals (Fig. 2.6). The directivity increases with the length of the interference tube and with the frequency of incoming signals, so that at high frequency (> 4 kHz), the receiving lobe is quite narrow. For lower frequencies, the directivity decreases. This also means that off-axis sounds are not only attenuated, but also have a modified frequency spectrum, with high frequencies more attenuated than low frequencies. At wavelengths longer than the tube length, off-axis attenuation is null. For recordings of higher-frequency sounds, such as bird songs above 1 kHz, high-quality microphones offer a high-pass filter to cut off low frequencies (e.g., to attenuate wind noise or traffic noise below 150 Hz).
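The relation between tube length and wavelength can be made concrete with a short sketch. It assumes a sound speed in air of 343 m/s (roughly 20 °C), and the tube lengths are hypothetical examples rather than the dimensions of any particular microphone.

```python
SPEED_OF_SOUND_AIR_M_S = 343.0  # assumption: dry air at about 20 degrees C


def wavelength_m(frequency_hz, sound_speed_m_s=SPEED_OF_SOUND_AIR_M_S):
    """Wavelength of a sound of the given frequency."""
    return sound_speed_m_s / frequency_hz


def frequency_where_wavelength_equals(length_m, sound_speed_m_s=SPEED_OF_SOUND_AIR_M_S):
    """Frequency whose wavelength equals the given length (e.g., a tube length)."""
    return sound_speed_m_s / length_m


# Hypothetical interference-tube lengths, in meters
for tube_length in (0.15, 0.25, 0.40):
    f = frequency_where_wavelength_equals(tube_length)
    print(f"{tube_length:.2f} m tube: little off-axis rejection below ~{f:.0f} Hz")

print(f"Wavelength at 4 kHz: {wavelength_m(4000.0) * 100:.1f} cm")
```

The same wavelength calculation applies to the earlier rule of thumb that a sensor should be smaller than the shortest wavelength of interest to remain omnidirectional.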

Fig. 2.6

Photograph (left) of a modular microphone (Sennheiser K6/ME66) with the preamplifier body that hosts a battery to power the microphone in case the P48 powering is not available; the sensing capsule is interchangeable (omni ME62, cardioid ME64, short shotgun ME66, shotgun ME67). Polar pattern (top-right) of the microphone at different frequencies and the frequency response (bottom-right) on axis and at 90° from the sound. Reprinted with permission from Sennheiser

2.3.1.1.4 Monophonic and Stereophonic Recording

Monaural recordings are made with a single microphone. Stereo recordings are made with two microphones and provide a sense of depth or movement through space in recordings. Stereo recording offers spatial information, which helps better discriminate sound sources in the surrounding space. Three primary setups are used for stereo recordings (Fig. 2.7): XY, binaural, and MS (middle-side). A common setup for the XY stereo recording uses two cardioid or super-cardioid microphones placed at 60° or 90° angles, nose-to-nose. The two microphones can be coincident or spaced. In some cases, the left microphone points in the left direction; in other cases, the microphones are crossed, with the left microphone pointing to the right and the right one pointing to the left.

Fig. 2.7

XY recording configuration (left) using two cardioid microphones, and MS recording configuration (right) which typically combines a cardioid microphone in the middle and a bidirectional microphone taking the sounds coming from the sides (figure-of-8 polar pattern)

In the binaural stereo recording configuration, two omnidirectional microphones are placed approximately the distance between the ears of a typical human head (16–18 cm spacing) through the use of a mannequin head that simulates a human head and ears. This creates a three-dimensional (3D) sound experience, as listeners with headphones have the sensation “to be there,” with their ears in the same position as the microphones. The microphones can also be separated with nothing in-between, or with just a generic separation, such as a sphere of foam, or a Jecklin disk. Another special binaural configuration is called the Stereo Ambient Sampling System (SASS) design that simulates a human head. Compared with other techniques, with exception of the true binaural, this type of recording produces the best spatial image when heard through headphones. In some setups, cardioid microphones angled at 60°–90°, like in the XY configuration, are used to enhance left-right separation.

In the MS microphone stereo recording setup, a cardioid microphone is piggy-backed on top of a bidirectional microphone. The cardioid picks up frontal information, whereas the bidirectional microphone gets sounds coming from the sides only. This type of recording requires specific electronics, or signal processing, to combine the signals to produce a traditional stereo image. In essence, the side signal is added to the mid (mono) signal to form one channel and added with inverted polarity to form the other. This computation allows the recordist to control the width of the stereo spread and make other adjustments in post-processing. In the early stages of the sound industry, this helped to maintain compatibility between mono and stereo recordings. Several microphone arrangements have been developed for stereophonic recording; for a comprehensive review, see Rayburn (2011) or Streicher and Everest (1998).
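A minimal sketch of the sum-and-difference decoding commonly used for MS recordings is given below. The `width` parameter and the function name are illustrative assumptions; practical decoders often apply additional scaling (e.g., a factor of 0.5) to avoid clipping.

```python
def ms_to_left_right(mid, side, width=1.0):
    """Decode mid/side sample sequences into left and right channels.

    left  = mid + width * side
    right = mid - width * side
    width = 0 collapses the image to mono; larger values widen it.
    """
    left = [m + width * s for m, s in zip(mid, side)]
    right = [m - width * s for m, s in zip(mid, side)]
    return left, right


# Tiny made-up example: two samples per channel
mid_samples = [1.0, 0.5]
side_samples = [0.25, -0.5]

print(ms_to_left_right(mid_samples, side_samples))
print(ms_to_left_right(mid_samples, side_samples, width=0.0))  # mono-compatible
```

Setting `width` to zero returns the mid signal on both channels, which illustrates why the MS format remains compatible with mono playback.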

Latest developments, mainly driven by the film industry to produce an immersive 3D (full-sphere, surround-sound) acoustic environment, capture sound not only in the horizontal plane, but also above and below the listener. Surround-sound recording requires several microphones in a 3D configuration, whose signals (channels) are electronically or digitally combined to produce both stereo and multi-channel surround-sound experiences, or to create specific receiving beams (e.g., to focus on a sub-space or on a specific source). The Ambisonics system allows recording of sound pressure on 3 axes with 4 microphone capsules mounted as a small tetrahedron (first order Ambisonics) (Zotter and Frank 2019). Higher-order Ambisonics microphones can have up to 32 capsules on a small sphere to achieve higher directional details and to simulate virtual directional microphones to be oriented in any direction during post-processing.

2.3.1.1.5 Microphone Arrays

Arrays of sound sensors are used to monitor animals across habitats, locate and track sound sources (such as individual animals), and study environmental noise. Arrays may be stationary (fixed in location), freely drifting (e.g., suspended from balloons), or towed. Ambisonic microphones are a special case of microphone arrays. The sensors in an array operate in tandem. Their signals are combined in digital signal processing. A number of requirements need to be met for successful array processing (e.g., to track a bat by its biosonar). Sensor locations need to be known accurately. Sensor directionality needs to be known. Sensor spacing must be such that the target signal can be detected on multiple sensors. These sensors need to be matched and their eccentricities need to be computed. Time differences of arrival (TDOA) need to be computed between sensors. An overview of digital signal processing algorithms to locate and track sound sources is given in Chap. 4.

While the complexity of meeting the above requirements has limited the application of microphone arrays for animal localization and tracking in terrestrial environments, Mennill et al. (2012) successfully deployed an array of wireless microphones with integrated Global Positioning System (GPS) time synchronization to make accurate measurements of the position of a sound source by computing TDOAs of the same sound at different microphones. They discuss how this system may be implemented to monitor frogs, birds, and mammals. Jensen and Miller (1999) used a 13.5-m vertical, linear microphone array that allowed for simultaneous recordings of bat signals at three different heights of vegetation. With this design, they were able to calculate flight direction, altitude, and distance from the array.
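To make the TDOA concept concrete, the following minimal sketch computes the time difference expected at two sensors for a source at a known position. It is a forward calculation only (localization inverts it, see Chap. 4). It assumes straight-line propagation at a constant sound speed of 343 m/s, and all names and coordinates are hypothetical.

```python
import math

SPEED_OF_SOUND_AIR = 343.0  # m/s, assumed nominal value for air at about 20 °C


def tdoa_seconds(source, sensor_a, sensor_b, c=SPEED_OF_SOUND_AIR):
    """Arrival time at sensor_a minus arrival time at sensor_b, in seconds.

    Positions are (x, y) or (x, y, z) tuples in meters. A positive value
    means the sound reaches sensor_b first.
    """
    return (math.dist(source, sensor_a) - math.dist(source, sensor_b)) / c


def max_tdoa_seconds(sensor_a, sensor_b, c=SPEED_OF_SOUND_AIR):
    """Largest possible |TDOA| for a sensor pair (source on the line through both)."""
    return math.dist(sensor_a, sensor_b) / c


# Hypothetical geometry: two microphones 50 m apart, a bird 10 m from the first
dt = tdoa_seconds((10.0, 0.0), (0.0, 0.0), (50.0, 0.0))
```

Because the largest possible TDOA equals the sensor spacing divided by the sound speed, clock errors between recorders need to be small compared with this value, which is why accurate time synchronization matters.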

The literature sometimes presents arrays of sensors that do not operate in tandem. Rather, sensors are widely spaced over a potentially large area, sampling independently without synchronization. The applications are not to locate and track individual sound sources, but rather to monitor a soundscape, compare animal presence/absence across sites, or evaluate environmental noise over a large area. During digital signal processing, noise levels might be compared across sites and perhaps interpolated to produce a noise map. For example, the Cornell Lab of Ornithology uses an array of 30 recorders to monitor animal habitat use on a wide spatial scale and to assess anthropogenic impacts (Fig. 2.8).

Fig. 2.8

Noise pattern observed at Sapsucker Woods (Ithaca, NY, USA), caused by a jet plane taking off from a nearby runway. Receiver locations are denoted by white circles. Regions shaded red show high noise levels and follow parallel to the flight path. The ambient noise levels are raised by about 15 dB at the frequencies of chorusing birds (2 to 4 kHz). Image courtesy of Dimitri Ponirakis, K. Lisa Yang Center for Conservation Bioacoustics, Cornell Lab of Ornithology

2.3.1.1.6 Do-it-Yourself (DIY) Microphones

Microphones well-suited for bioacoustic studies can be built with microphone capsules costing only a few US dollars. Examples are the omnidirectional electret capsules from Primo Microphones Inc. (EM models; Footnote 4) or the PUI Audio Inc. AOM-5024 L model (Footnote 5). These capsules can be powered directly by PIP when connected to a handheld digital recorder, or powered with a battery and a simple electronic circuit. Adapters can be easily built to power PIP microphones with the P48 powering provided by professional recorders that do not provide PIP (Footnote 6). DIY microphones can be easily assembled to experiment with different spatial configurations, even in the focus of a parabolic reflector, or to have low-cost expendable microphones for very specific field tasks.

2.3.1.1.7 Deployment Considerations

In open-field environments, wind can affect signal reception by a microphone by causing non-acoustic noise, which is an artifact of turbulent pressure fluctuations at the external surface of the microphone. Such turbulent pressure fluctuations may be caused by the obstruction that the microphone itself presents. Turbulent air flow may also be caused elsewhere and produce noise artifacts in recordings as the perturbations travel past the microphone. Even a light breeze can produce strong low-frequency noise artifacts, which can overload the internal electronics or the recorder. Microphones can be fitted with a windsock to reduce wind noise. A windsock can be easily made with commercially available open-cell foam, which limits air flow but allows sound waves to reach the microphone membrane. For severe wind conditions, a fur-like cover is preferable (Fig. 2.9).

Fig. 2.9

Photograph of a microphone setup with pistol grip and elastic suspension, foam windsock, and additional furry windsock for maximum wind protection. Reprinted with permission from Sennheiser

When aiming to record animals in a specific direction (e.g., a bird calling from a tree), a directional microphone should be used and pointed at the bird. It will focus sound recording in the direction of the bird and limit background noise from other directions. An alternative to a highly directional shotgun microphone is a cardioid microphone placed in the focus of a parabolic reflector (Fig. 2.10). The microphone is pointed toward the parabolic reflector, facing into the dish, not toward the animal. Ideally, the microphone’s beam pattern would be matched to the solid angle subtended by the reflector. The diameter of the parabolic reflector determines which frequency range of incoming sounds will be amplified (Fig. 2.11). To be reflected, the wavelength of the incoming sound must fit inside the dish. The lowest frequency a parabola can reflect, and thus focus on the microphone, depends on the dish diameter (Wahlstrom 1985). For a 1-kHz signal, a 30.5 cm diameter dish is fine, and for a 500-Hz signal, a dish of 61 cm in diameter is required. The very low frequency of a lion roar (40–200 Hz) would require a dish about 10 m in diameter.
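The relationship between dish size and the lowest usable frequency can be approximated by comparing the dish diameter with the acoustic wavelength. The sketch below assumes a nominal sound speed in air of 343 m/s and treats "diameter of about one wavelength" as the limit; this is a rule of thumb for illustration, not a formula taken from Wahlstrom (1985).

```python
SPEED_OF_SOUND_AIR = 343.0  # m/s, assumed nominal value for air at about 20 °C


def wavelength_m(frequency_hz, c=SPEED_OF_SOUND_AIR):
    """Acoustic wavelength in meters for a given frequency in hertz."""
    return c / frequency_hz


def lowest_focused_frequency_hz(dish_diameter_m, c=SPEED_OF_SOUND_AIR):
    """Frequency whose wavelength equals the dish diameter (rule of thumb)."""
    return c / dish_diameter_m


for f in (1000.0, 500.0, 40.0):
    print(f"{f:6.0f} Hz -> wavelength {wavelength_m(f):.2f} m")
```

Under these assumptions, the wavelengths at 1 kHz and 500 Hz are of the same order as the 30.5-cm and 61-cm dishes quoted above, and the wavelength at 40 Hz is several meters, which illustrates why parabolas are impractical for very-low-frequency sounds.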

Fig. 2.10

Diagram of a parabolic dish and microphone used to record a bird on a tree. The parabolic solution gives added amplification and directivity, which helps in recording a single animal, a quiet animal, or animals at a distance

Fig. 2.11

Sketch of frequency response and gain of a generic microphone placed in parabolas of different diameters. The red lines show the frequency response of an ideal microphone, with the option of a high-pass filter to reduce low-frequency noise below 80 Hz. The blue lines show the theoretical gain of three parabolas of different sizes. The gain is proportional to frequency and to the parabola diameter. Actual response may vary depending on the shape and depth of the parabola and on the response and positioning of the microphone

Compared to shotgun microphones, parabolic reflectors intercept a much larger amount of acoustic energy (proportional to the diameter and surface of the reflector) and concentrate it on the microphone, thus providing a high gain. However, this gain is proportional to the frequency and the parabola diameter, thus producing a recording with increased high-frequency levels that requires equalization in post-processing (some parabolas can have equalization built-in). As a rule of thumb, the more wavelengths are contained in the parabola diameter, the higher the gain and the greater the directionality. Because of these features, parabolas, with the right choice of microphones, can provide excellent recordings of very quiet, distant sources. For example, in a taxonomic and behavioral study of chipmunks (Neotamias spp.), Gannon and Lawlor (1989) used a 51-cm parabolic reflector with a Sennheiser ME-20 omnidirectional microphone and K3U preamplifier. Chipmunk calls were in the range of 4 kHz to 15 kHz, so this size dish was adequate for detecting this range of mid-frequency calls.
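The high-frequency emphasis that needs equalization can be estimated with a short calculation. The sketch below assumes that the proportionality to frequency refers to pressure amplitude (i.e., about 6 dB per octave) and that both frequencies lie well above the low-frequency limit of the dish; actual reflectors deviate from this idealized behavior.

```python
import math


def relative_gain_db(frequency_hz, reference_hz):
    """Idealized parabola gain at frequency_hz relative to reference_hz, in dB.

    Assumes pressure gain proportional to frequency (an idealization).
    """
    return 20 * math.log10(frequency_hz / reference_hz)


# Emphasis across the 4-15 kHz band of the chipmunk calls mentioned above
emphasis_db = relative_gain_db(15000.0, 4000.0)
```

Under these assumptions, the upper end of the 4–15 kHz band would be emphasized by roughly 11.5 dB relative to the lower end, which gives an idea of the equalization that may be needed before comparing levels across frequencies.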

To produce a more pleasant recording, it is possible to record in stereo by using two microphones in the focus, separated by a thin plate. This way, sounds coming from the frontal axis of the parabola reach both microphones with the same level, while off-axis sounds are focused more on one side. Another option is to place an MS microphone combination in the focus of the parabola. Listening with headphones helps in pointing the parabola on the source of interest and gives immediate feedback on the quality of the sounds being recorded. When analyzing recordings made with a parabola, it is important to take into account that the frequency response is not flat as it increases with frequency (Fig. 2.11). In some cases, slightly moving the microphone out of focus reduces the high-frequency emphasis and produces a more pleasant sound.

2.3.1.2 Hydrophones

A hydrophone is a piezoelectric transducer that converts sound waves in water to electrical signals. Hydrophones can receive sound in air, but the sound has to be of very high amplitude. Because the acoustic impedances of the medium and the sensor match much better in water than in air, hydrophones have to be less sensitive, or they would easily overload. The underwater sensor usually is sealed in a resin package with a waterproof connector and needs to be handled with care. After use in saltwater, a hydrophone should be rinsed with freshwater or else connections are likely to corrode.

A piezoelectric transducer can be used as a sensor or projector; however, when the transducer has a built-in preamplifier, it can no longer be used as a projector, but only as a sensor. Hydrophones are much less sensitive, and a great deal of power is needed (from an external amplifier) to drive a hydrophone as a projector. As a sensor, a hydrophone can have a built-in preamplifier that matches the frequency response, dynamic range, and high impedance of the transducer. A few hydrophones on the market with built-in preamplifier (Fig. 2.12) can be powered directly by a recorder, computer, or analysis system (e.g., either by P48 or by PIP at 2–5 Vdc). Most preamplified hydrophones require powering through dedicated cables and can require single or dual powering (e.g., +12 V, or −12 V and +12 V) to be provided by a battery box (Fig. 2.12). A popular low-cost hydrophone is the H2c from Aquarian Audio (Footnote 7), which allows PIP powering. The DolphinEar (Footnote 8) is an inexpensive, lightweight, battery-operated hydrophone with an external amplifier and headset that is good for ecotourism or classroom use. Other relatively low-cost hydrophones well suited for marine mammal studies are produced by Cetacean Research Technology (Footnote 9).

Fig. 2.12

Photographs of an ITC 6050C hydrophone with built-in preamplifier and external battery power (left) and a Cetacean Research Technology C57 hydrophone with cable and battery box (right; courtesy of J R Olson)

To record underwater sound in open water from a distant source, a sensitive hydrophone is needed. Good sensitivity would be −160 dB re 1 V/μPa. Such a hydrophone produces 1 V when receiving 160 dB re 1 μPa of acoustic pressure and 1 mV for a signal of 100 dB re 1 μPa. If used for recording a signal at 180 dB re 1 μPa, it will produce a 10-V output and may overload the connected electronics. To record underwater sound at close distance (e.g., in front of an echolocating dolphin which can produce pulses with source levels above 220 dB re 1 μPa m pk-pk), a low-sensitivity hydrophone is needed (e.g., one that has a sensitivity of −210 dB re 1 V/μPa). Very likely, such a hydrophone cannot be used for recording low-level sounds from a distant source because it requires high amplification and consequently produces high electronic noise. However, using hydrophones with built-in preamplifiers when powerful signals can occur risks overloading of the preamplifier, thus producing distorted signals. Erbe (2009) used four different hydrophone systems (differing in amplitude sensitivity) to record impulsive pile driving at ranges from 14 m to 1330 m.
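The voltage figures above follow directly from adding the received level and the sensitivity in decibels. A minimal sketch of this arithmetic is given below; the function name is illustrative, and the calculation ignores any preamplifier gain and the frequency dependence of the sensitivity.

```python
def hydrophone_output_volts(level_db_re_1upa, sensitivity_db_re_1v_per_upa):
    """Output voltage for a received level (dB re 1 µPa) and a
    sensitivity (dB re 1 V/µPa); no preamplifier gain is included."""
    level_db_re_1v = level_db_re_1upa + sensitivity_db_re_1v_per_upa
    return 10 ** (level_db_re_1v / 20)


# The three cases discussed in the text, for a -160 dB re 1 V/µPa hydrophone
for level in (160.0, 100.0, 180.0):
    volts = hydrophone_output_volts(level, -160.0)
    print(f"{level:.0f} dB re 1 µPa -> {volts:g} V")
```

The same function shows why a low-sensitivity hydrophone suits very loud sources: with −210 dB re 1 V/μPa, a received level of 200 dB re 1 μPa yields only about 0.3 V.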

Hydrophones can vary considerably in their frequency response; some are used specifically for low-frequency, mid-frequency, or high-frequency reception. Typically, hydrophones are smaller than the wavelengths that are being recorded. But, with the smaller sensor comes a lower energy input. This results in lowered sensitivity. Generally, the smaller the piezoelectric element, the broader the frequency range, but the lower the amplitude sensitivity. Lower sensitivity can require higher amplification, and thus can produce higher electronic noise. Piezoelectric hydrophones usually have a resonance peak in the upper part of their bandwidth, so that optimum operation of the hydrophone is along the flat portion of the frequency response curve below resonance. Reception at other frequencies could be used, but the difference in response of the hydrophone needs to be accounted for during analyses. Some studies require the use of multiple hydrophones to cover the entire frequency range of the animal’s sounds.

2.3.1.2.1 Hydrophone Directionality

Hydrophones, much like microphones, have directional receiving and transmitting characteristics, depending on the size and shape of the transducer (Fig. 2.13). Spherical transducers receive and transmit signals uniformly in all directions. With a cylindrical transducer, sounds are received and projected uniformly in the horizontal plane, assuming the transducer is suspended vertically. In the vertical plane, the transducer will have a directivity pattern. If the transducer has a planar shape, it will have two beams on its opposite faces as shown in the left polar plot in Fig. 2.13. When used as a sensor, a spherical hydrophone is typically omnidirectional (receives sounds equally from all directions) as shown by the right polar plot of Fig. 2.13. Used as a projector, the directivity pattern of a hydrophone changes depending on the frequency being projected (directivity increases with frequency).

Fig. 2.13

Specifications and polar plot of directional ITC 3003D transducer (left) and omnidirectional ITC 1007 transducer (right). Reprinted with permission from Gavial ITC (https://www.gavial.com/itc-products; accessed 22 Aug. 2021)

2.3.1.2.2 Sonobuoys

A sonobuoy is a canister housing a hydrophone, dampening cable, battery, recording/transmitting electronics, and a transmitting antenna. Navies of the world use sonobuoys for underwater listening by deploying them from aircraft or ships. These devices also may be used for bioacoustic studies. Once a sonobuoy is deployed in saltwater, a battery is activated, which triggers the inflation (CO2) of a flotation balloon and antenna. The hydrophone and associated dampening cables can be set to drop to a pre-selected water depth (i.e., 30, 60, 120, or 300 m). During operation, the sonobuoy canister floats at the water surface with the antenna in the air and transmits acoustic data in real-time to a receiver onboard a vessel or aircraft or to a receiver at a station onshore. After a preset time (e.g., 1, 2, 4, or 8 h), a burn-wire penetrates the flotation balloon, and the sonobuoy fills with water and sinks to the seafloor.

Analog sonobuoys (Fig. 2.14) are available in two common configurations: omnidirectional sonobuoys (with a frequency response of up to 20 kHz) and DIrectional Frequency Analysis and Recording (DIFAR) sonobuoys, which provide bearing information on incoming signals. The latter type has been used to determine source levels and calling rates in cetaceans (e.g., Miller et al. 2015). The most recent generation of sonobuoys features a digital recording system and is equipped with GPS technology.

Fig. 2.14

Photograph of a sonobuoy deployed from a ship to monitor whale sounds in the Mediterranean Sea (SOLMAR Project, http://www.unipv.it/cibra/res_solmar_uk.html)

2.3.1.2.3 Stationary Hydrophone Arrays

Stationary hydrophone array configurations include moorings (with or without surface buoy), seafloor packages, or cabled systems. Arrays of permanent, stationary hydrophones can be placed on the seafloor and connected via cables, either electrical or electro-optical, to processing centers located on shore. Multi-channel receivers allow listening or recording of sounds from multiple hydrophones. Typically, the array is optimized for long-range acoustic reception by using very-low-frequency sensors. Some bottom-mounted arrays are equipped with wideband hydrophones to allow scientists to monitor a wide variety of marine species, as well as ambient noise levels (e.g., Caruso et al. 2015; Favali et al. 2013; Nosengo 2009; Sciacca et al. 2015). Usually, these arrays are installed and maintained by navies, oceanographic organizations, or research centers for many years (see Chap. 1 for a list of past and current bottom-mounted hydrophone arrays deployed around the world).

2.3.1.2.4 Towed Hydrophone Arrays

A towed array contains several hydrophones (not necessarily of the same type), commonly housed in an oil-filled sleeve (Fig. 2.15), where the oil matches the acoustic impedance of sea water. Originally developed for navies and geophysical survey companies, towed arrays were bulky and expensive, and mainly received low-frequency sound (<15 kHz). In more recent years, lightweight, wideband towed arrays sensitive up to 100 kHz and more have been developed to meet the requirements of researchers aiming to study marine mammals from small platforms, such as sailboats (Pavan and Borsani 1997; Pavan et al. 2013). By simultaneously processing sound from more than one hydrophone (or group of hydrophones), the bearing (or even location) of the vocalizing animal may be determined (see Chap. 4, section on sound localization). Towed arrays are used for line-transect surveys and to sample animals in their environment over a wide geographic range.

Fig. 2.15

Photograph of a towed array under water, developed by the University of Pavia (Italy), with the tow vessel in the background

A straight-line array cannot distinguish between signals arriving from the port and starboard sides without the vessel changing course or using multiple array deployments (Thode et al. 2010). Large arrays (sometimes hundreds of sensors, possibly with different frequency sensitivities and bandwidths) allow tracking of multiple sources simultaneously by selective beamforming (Zimmer 2011). More complex towed systems use a 3D hydrophone configuration called a volumetric array (Zimmer 2013) or vector sensors (Thode et al. 2010) to locate sound sources in three dimensions. Acoustic vector sensors are sensitive to particle velocity rather than to pressure and hence sense the direction of incoming sound waves and resolve the directional ambiguities. Thode et al. (2010) attached a vector sensor module to the end of an 800-m towed array to detect sperm whale clicks and compute unambiguous bearing estimates of whales over time.
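The left-right ambiguity of a straight-line array can be seen in a minimal two-hydrophone sketch. It assumes a distant source (plane-wave arrival), a nominal sound speed of 1500 m/s, and a hypothetical sign convention in which a positive time difference means the sound reaches the forward hydrophone first.

```python
import math

SPEED_OF_SOUND_WATER = 1500.0  # m/s, assumed nominal value for sea water


def bearing_candidates_deg(tdoa_s, spacing_m, c=SPEED_OF_SOUND_WATER):
    """Return the two candidate bearings (starboard, port) in degrees.

    tdoa_s    : arrival time at the aft hydrophone minus arrival time at
                the forward hydrophone (assumed convention)
    spacing_m : distance between the two hydrophones along the array
    Angles are measured from the tow direction (0 = ahead, 90 = abeam).
    """
    cos_theta = c * tdoa_s / spacing_m
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against timing noise
    theta = math.degrees(math.acos(cos_theta))
    return theta, -theta


# Hypothetical example: 3 m spacing, 1 ms time difference
starboard, port = bearing_candidates_deg(0.001, 3.0)
```

The time difference fixes only the angle from the array axis, so both mirror-image bearings fit the data equally well; a course change alters the geometry and lets the operator discard one of them.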

Many towed arrays have a depth sensor, so the operator knows the tow-depth in relation to the sound velocity profile in the water column. Such information allows the user to position the array either in a surface duct or below the thermocline to listen to sounds coming from deep water (see Chap. 6 on sound propagation under water). Additionally, the depth information enables subsequent array processing to exploit the surface effects on sound propagation to improve localization accuracy.

Array performance is degraded (in particular below ~1 kHz) by vessel self-noise, hydrodynamic noise artifacts (flow noise), and non-acoustic mechanical vibration, which reduce the ability to capture low-frequency animal sounds and which can cause an acoustic overload of the recording chain. To mitigate these issues, tow speed should usually not exceed 6 knots. A long cable with special elastic sections in the array can dampen vibrations. Flow- and vessel-noise can be mitigated with a smooth high-pass filter (e.g., 500 Hz, 12 dB/octave; see Sect. 2.3.2.1).

2.3.1.2.5 Deployment Considerations

To operate properly, hydrophones must have little vertical or horizontal movement. Water flow over the surface of the hydrophone generates pressure fluctuations, which appear as noise in spectrograms but which are not due to an acoustic wave. This flow noise is an artifact of deployment (see Chap. 3, section on flow noise). It is typically of low to mid frequencies (see, for example, the spectrogram in Fig. 3 in Erbe et al. (2015) showing flow noise in marine soundscape recordings) and thus can be filtered out with a high-pass filter, but this limits the recording of low-frequency sounds. Large or rapid vertical or horizontal movement of a hydrophone (e.g., if it is deployed over the side of a boat) may saturate the system so that no usable recordings are collected. It is very difficult to make good recordings in the open ocean; a hydrophone often needs to have its own flotation system, rather than be suspended from a boat; otherwise, the movement of the boat will translate into movement of the hydrophone. The horizontal component of water flow past a hydrophone may be minimized by deploying freely drifting hydrophone systems (e.g., suspended from a freely drifting buoy). The vertical component of water flow past a hydrophone may be minimized by dampening systems; for example, suspending the recorder on a bungee with a movement-dampening drogue, or by using a catenary flotation line (see Chap. 3 and Fig. 5 in Erbe et al. 2019). In towed arrays, long towing cables and specifically designed hydrophones (acceleration-compensated) are used to avoid saturation of the hydrophones from movement.

2.3.2 Filters

Filters are used to minimize unwanted noise from the environment (including other animals) or electronic self-noise. Filters can be used while recording or during post-processing. Filtering during recording facilitates conserving recorder dynamic range for signals in the frequency band of interest. A filter can be a stand-alone unit (some also have an amplifier) or filtering can be achieved using software, either in real-time or in post-processing. Note that filters are not a “magic wand” to make a bad recording clean. While recording, filters can be used to suppress unwanted noise without affecting the sounds of interest only when the noise and the sounds do not overlap in frequency. If noise and sounds do overlap (in frequency, or in time, or both), it is possible to perform some filtering or noise removal in post-processing. However, the settings need to be carefully chosen. Some microphones and digital recorders (Sect. 2.3.4) have built-in selectable filters, often with selectable attenuation rates.

2.3.2.1 Low- and High-Pass Filters

Using a low-pass filter, the recordist can set a frequency above which signals are attenuated. A high-pass filter attenuates signals below a selected frequency. High-pass filters are often used to reduce low-frequency noise generated by wind and road traffic in terrestrial recordings and flow noise in underwater recordings. For example, to record a bird singing in the 2–5 kHz range, a high-pass filter set at 1 kHz will suppress traffic noise (which is typically below 500 Hz). A band-pass filter combines low-pass and high-pass filters. All filters have a transition bandwidth at the intersection of the pass band and the attenuation band, where there is a roll-off in the attenuation amount (steepness), which is normally expressed in dB/octave (e.g., 6 dB/octave in a smooth filter, or 24 dB/octave for a steeper filter). The greater the roll-off, the sharper the filter. However, sharper filters have longer impulse responses and generate longer artifacts in the output waveforms.
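The effect of roll-off can be illustrated with the magnitude response of an idealized high-pass filter. The sketch below assumes a Butterworth design, in which each filter order contributes about 6 dB/octave of roll-off; the text does not specify a filter type, so this choice is only for illustration.

```python
import math


def butterworth_highpass_gain_db(frequency_hz, cutoff_hz, order):
    """Gain in dB of an ideal Butterworth high-pass filter.

    order 1 rolls off at about 6 dB/octave below the cutoff,
    order 2 at about 12 dB/octave, order 4 at about 24 dB/octave.
    """
    ratio = cutoff_hz / frequency_hz
    return -10 * math.log10(1 + ratio ** (2 * order))


# Bird-song example from the text: 1 kHz cutoff, assumed order 4 (24 dB/octave)
for f in (250.0, 500.0, 1000.0, 2000.0, 5000.0):
    gain = butterworth_highpass_gain_db(f, 1000.0, 4)
    print(f"{f:6.0f} Hz: {gain:7.2f} dB")
```

With these assumed settings, noise at 500 Hz is attenuated by about 24 dB while the 2–5 kHz song band is almost unaffected. A first-order filter with the same cutoff would attenuate 500 Hz by only about 7 dB.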

2.3.2.2 Anti-Aliasing Filters

Digital recorders and audio interfaces have built-in anti-aliasing filters with varied performances; whereas instrumentation recorders and instrumentation acquisition boards usually do not have built-in anti-aliasing filters and require a separate signal-conditioning device to perform filtering and adjust the signal level. The available filters have their specific shape and thus can influence the frequency response of the recording.

AD-converters (Sect. 2.3.4) in recording equipment (either stand-alone recorders or external converters connected to a computer) have relatively smooth anti-aliasing filters that attenuate frequencies starting somewhat below the Nyquist frequency, but do not completely cut out the signal at Nyquist. Attenuation at Nyquist is often in the range of 6–12 dB, and the maximum attenuation (the FZero of the filter) is located above the Nyquist frequency.

The anti-aliasing filter shape is rarely reported in equipment specifications; tests are required to evaluate the anti-aliasing performances of the AD-converter, in particular if wideband signals are to be recorded and analyzed. Concern for aliased components is required for any type of signal possibly exceeding the Nyquist frequency, including external interferences captured by the electronics and cables, as well as higher harmonics of the signals to be recorded. A laboratory test with a frequency-generator signal sweeping across the whole frequency range of the recorder and beyond the Nyquist frequency can reveal unexpected and unwanted performance by the converter.
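When interpreting such a sweep test, it helps to know where a component above the Nyquist frequency would appear in the recording if it were not sufficiently attenuated. The following minimal sketch computes this apparent frequency for an ideal sampler; the function name is illustrative.

```python
def aliased_frequency_hz(signal_hz, sampling_rate_hz):
    """Apparent frequency of a tone after ideal sampling without filtering.

    Frequencies above Nyquist (half the sampling rate) fold back into
    the range from 0 Hz to Nyquist.
    """
    folded = signal_hz % sampling_rate_hz
    return min(folded, sampling_rate_hz - folded)


# Hypothetical example: a 30 kHz component recorded at 48 kHz sampling
apparent = aliased_frequency_hz(30000.0, 48000.0)
```

In this example, the 30-kHz component would show up at 18 kHz, where it could be mistaken for a genuine signal unless the anti-aliasing filter attenuates it sufficiently.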

2.3.3 Amplifiers

A preamplifier conditions the incoming signal from a transducer and boosts the signal before it is recorded. A preamplifier converts a weak electrical signal into a stronger, noise-tolerant output signal for further processing. Without preamplification, the recorded signal could be noisy or distorted. The preamplifier has a high input-impedance (i.e., it requires only a small current to sense the input signal) and a low output-impedance (so that when a current is drawn from the output, the change in the output voltage is minimal). In other words, a preamplifier converts a high-impedance input signal from a transducer to a low-impedance output signal. Besides lowering impedance, some preamplifiers also provide amplification (typically 20 to 26 dB). This is not true for most preamplifiers and hence they are typically paired with amplifiers. Preamplification should be constant across the recording bandwidth so as not to distort the signal. The frequency range and dynamic range specifications of the preamplifier and amplifier need to match other electronics in the recording system. For recording faint animal sounds or quiet soundscapes, the quality of the preamplifier is often an issue and must be considered carefully relative to the required use and the transducer to be connected.

An amplifier increases the level of the signal after it is captured, so that the signal can be driven along a cable to the AD-converter without significantly degrading the SNR. Amplifiers can boost hydrophone signals by as much as 60 dB (1000x). However, amplifying a signal will also increase ambient background sounds and self-noise; very high amplification could inadvertently make the noise level so high that desired signals cannot be recorded with good fidelity. Many amplifiers for microphones are battery-powered and have high- and low-pass filters, which makes them useful for fieldwork.
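The correspondence between gain in decibels and the voltage multiplication factor can be checked with a two-line conversion. The sketch below applies the standard amplitude (voltage) definition of the decibel; the function names are illustrative.

```python
import math


def db_to_voltage_ratio(gain_db):
    """Voltage multiplication factor corresponding to a gain in dB."""
    return 10 ** (gain_db / 20)


def voltage_ratio_to_db(ratio):
    """Gain in dB corresponding to a voltage multiplication factor."""
    return 20 * math.log10(ratio)


for gain_db in (20.0, 26.0, 60.0):
    print(f"{gain_db:.0f} dB -> x{db_to_voltage_ratio(gain_db):.1f}")
```

A gain of 60 dB corresponds to a factor of 1000 in voltage, and the 20 to 26 dB typical of preamplifiers corresponds to factors of 10 to about 20.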

Speakers include power amplifiers that drive a projector to generate high-amplitude acoustic signals in air or under water. The power amplifier provides the higher current to drive the speaker. Most power amplifiers used in high-fidelity home-entertainment systems also can be used in bioacoustic research. However, in some cases, more power and bandwidth are needed so that commercial broadcast power amplifiers must be used. No matter what class of amplifier or preamplifier is used, one should always consult the manufacturer’s manual. Over-amplification can “blow” a loudspeaker or underwater projector.

2.3.4 Analog-to-Digital Converters and Digital Recorders

Despite declared sampling frequencies and bit-resolution, AD-converters, either in a stand-alone recorder or in a computer audio-interface, are based on diverse technologies and can affect the quality of a recording. For example, delta-sigma converters have high noise at high frequencies, beyond the human hearing limits, which becomes evident in wide-bandwidth power spectra and spectrograms. Another problem is jitter from instability of the clock driving the AD-converter and the digital stream. Excessive jitter can reduce the quality of recordings and can be seen easily by analyzing a clean test tone. Jitter can produce both random artifacts (Fig. 2.16) and periodic artifacts with well-defined frequencies. Jitter cannot be minimized by the user because it is characteristic of a given device. AD-converters can be divided into two main categories: for musical use, generally limited to the standard sampling frequencies of 44.1, 48, 96, and 192 kHz, or for instrumental measures, with sampling frequencies ranging from 100 Hz to 1 MHz and more. Converters for the consumer and prosumer musical market have smooth anti-aliasing filters included, suitable for musical signals, and a high-pass filter usually set below 10 Hz; instrumentation converters do not have any filter on their inputs and will sample any signal starting from 0 Hz (DC coupling). When using instrumentation converters, aliasing problems must be considered, and external anti-aliasing filters must be included in the recording chain (see Sect. 2.3.2.2).

Fig. 2.16

Spectrogram of a sinusoidal tone sampled at 44,100 Hz with a poor AD-converter (top panel). Note the low-intensity broadband noise (blue components) due to random jitter around the red line representing the tone’s central frequency. Spectrogram of the same sinusoidal tone sampled at 44,100 Hz with a good AD-converter (middle panel); the broad blue band is absent in this image. The bottom panel shows the constant amplitude of the signal waveform

An inexpensive and very portable AD-converter unit is PoScope’s Mega1 (Footnote 10), sampling at 500 kHz at 12 bit and recording directly to a PC in PCM files via USB interface. However, the PoScope, as most industrial data acquisition systems, including most National Instruments (Footnote 11) devices, has no anti-aliasing filter and the measurement needs to be sampled at a rate much higher than the highest frequency contained in the input signals. If the upper-frequency content of the signal (including any possible noise or interference such as those generated by video monitors, digital networks, and switching power supplies) is unknown, use a good-quality, low-pass external filter at the known or presumed upper cut-off frequency while recording and digitally filter and down-sample the recorded file thereafter. It is also important to consider that strong low-frequency sounds below the desired frequency range can limit the dynamic range at higher frequencies of interest, so using a high-pass filter at a selected low frequency while recording is recommended.

AD-converters are more commonly available in the consumer market as “digital recorders” that also include the circuitry to save recorded data to permanent storage (e.g., SD-cards or internal memory) and an interface for powering the other components (either from an external source or through internal batteries). Some digital recorders also offer built-in selectable high-pass filters, which can help reduce the low-frequency noises produced by handling and suppress wind or flow noises.

The frequency response of the digital recorder should be matched as closely as possible to the frequency response of the sensor–preamplifier–amplifier system and to the needs of the research. The component with the narrowest frequency response is the limiting factor in the recording chain. All AD-converters have a maximum voltage range at the input that can be converted without overloading or clipping. The trick is to stay below the clip-level and still have good dynamic range and SNRs. Other important features in selecting the appropriate recorder are: the number of channels (e.g., 2, 4, 8, or more), durability, reliability for field-use, battery duration, flexibility and ease of use, maximum storage, integrated sensors (omnidirectional or directional), inputs for external sensors, power options for the external sensors (P48 and/or PIP power), and the capability to connect a remote-control or a timer. Some recorders (especially many analog and digital tape recorders and video-cameras) use Automatic Gain Control (AGC) to keep the recorded volume within the same amplitude range. Other devices have an Auto Level Control (ALC) setting or a limiter function designed to avoid overloading or clipping. Some recorders indicate clipping either by a level-meter or with a flashing light. Any AGC, ALC, or limiter options should be disabled if different sounds or different recordings are to be compared, or if true sound level measurements are needed. The gain level should remain constant throughout a recording and should be noted; ideally, the sampling rate and gain settings should remain the same among recordings, at least for the same subject or context.
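Whether a recording chain will stay below the clip-level can be estimated before fieldwork from the sensor sensitivity, the total gain, and the maximum input voltage of the AD-converter. The sketch below is written for a hydrophone (levels in dB re 1 μPa); the function name and the example values are hypothetical, and the calculation assumes a flat frequency response throughout the chain.

```python
import math


def max_unclipped_peak_level_db(sensitivity_db_re_1v_per_upa, gain_db, adc_peak_volts):
    """Highest peak sound pressure level (dB re 1 µPa) that the chain
    can record without clipping at the AD-converter input.

    sensitivity_db_re_1v_per_upa : hydrophone sensitivity (dB re 1 V/µPa)
    gain_db                      : total preamplifier and amplifier gain
    adc_peak_volts               : largest input voltage the converter accepts
    """
    clip_level_db_re_1v = 20 * math.log10(adc_peak_volts)
    return clip_level_db_re_1v - gain_db - sensitivity_db_re_1v_per_upa


# Hypothetical chain: -160 dB re 1 V/µPa hydrophone, 20 dB gain, 1 V peak input
limit_db = max_unclipped_peak_level_db(-160.0, 20.0, 1.0)
```

For this hypothetical chain, peaks above 140 dB re 1 μPa would clip. Reducing the gain or choosing a less sensitive sensor raises the limit, at the cost of a poorer SNR for faint sounds.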

2.3.4.1 Recording Ultrasounds and Infrasounds

Ultrasonic recorders were developed mainly for bat and dolphin studies; however, other animal species also produce ultrasonic sounds (e.g., insects, frogs, and infant rodents). Recording ultrasound requires a sensor with a suitable frequency range and a recorder or an AD-converter with a high enough sampling frequency. An affordable solution is available in the form of ultrasonic microphones with an integrated high-speed AD-converter and USB interface (e.g., the Dodotronic Ultramic family with sampling frequencies ranging from 200 kHz to 384 kHz). Dodotronic microphones do not need specific drivers and can be used on Windows, MacOS, and Linux, and also on Android smartphones. Recent models include support for internal storage (miniSD card) and powering with a USB battery box. The internal recorder can be set by Bluetooth to record on trigger or on a time schedule. Other similar devices are the Wildlife Acoustics Echo Meter Touch and the Pettersson Ultrasound Microphone. Another option for recording at very high sampling frequency is to use an instrumentation AD-converter like the PoScope Mega1+.

Many recorders are not suited for very-low-frequency recording. Most have a lower limit of 10–20 Hz; others can record down to 7–10 Hz. Recording very-low-frequency animal signals is complicated because this frequency range also contains environmental and electronic noise, which typically would be filtered out. For recording infrasounds (e.g., calls of elephants or baleen whales), it is important to check the specifications of the recorder and eventually make a bench-test of the available frequency range using a signal generator (a tone sweeping through a wide range of frequencies is a good test signal). An option is to use an instrumentation AD-converter with DC coupling.

2.3.4.2 Special Features of Digital Recorders

Pre-recording buffer memory allows the user to save the few seconds of sound before pressing the record button. Auto-start initiates the recording automatically when a certain input level is exceeded. Double recording provides a lower-level backup copy in case some parts of the primary recording are overloaded. With this method, the incoming sound is recorded twice, in two different files; the second stereo file is stored at a level some dB below that of the first file. In terrestrial applications, a wired remote-control can be useful when the recorder must be hidden or protected (e.g., from rain). A wireless remote-control, by Bluetooth or by Wi-Fi (wireless fidelity), allows controlling the functions and levels by a smartphone application, but this consumes additional power and could impact energy budgets. File time-stamping inserts the date and time of the recording in the file name, rather than just a sequential number. This is extremely helpful when storing and cataloging the recordings. Some recorders have a computer audio-interface or the ability to connect to a computer to record directly on a laptop or a tablet. This option allows the same recording quality while using special software for managing files (e.g., to tag files with a time-stamp and GPS position, or to automatically start and stop the recording according to received signals or according to a user-defined schedule).

2.3.5 Equipment for Monitoring Bats

Acoustic detection of ultrasonic bat calls has emerged as the most commonly used method for monitoring bat presence and activity (Collins and Jones 2009; Gorresen et al. 2008; Weller and Baldwin 2012). Observing and recording bats, other than for scientific research, is a widespread hobby and a common topic of citizen science. This has resulted in a wide variety of bat detectors produced by small companies and of DIY bat detector kits. The common types of detectors are heterodyne, frequency-division, time-expansion, zero-crossing, and full-bandwidth digital recorders (Obrist et al. 2010). Some bat detectors have their own specific software, either free or to be purchased, for further processing of recorded data.

Heterodyning, the first system to be developed, is completely analog and shifts one frequency (the incoming signal) to another by multiplying it with a second frequency (set by the user). The user can tune the detector (similar to tuning a radio) to select a small portion of the available frequency range. For example, with a bat detector (e.g., Pettersson Elektronik D100) tuned to the 40–50 kHz range, the call of a bat at 45 kHz (such as a pipistrelle bat, Pipistrellus spp.) is multiplied (heterodyned) by a frequency (43 kHz) generated by an internal oscillator. This produces sidebands at 88 kHz and 2 kHz (which are the sum and the difference of the two frequencies); the higher frequency is eliminated with filters and the lower frequency is broadcast to the listener and available for recording. This makes for a tunable, inexpensive bat detector that will quickly indicate if bats are in the area. Heterodyning offers a limited view of the ultrasonic spectrum but is still appreciated by many bat specialists.
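
The arithmetic of the heterodyne example can be checked with a few lines of Python. This is only a minimal sketch: the function names are hypothetical, both signals are idealized as unit-amplitude cosines, and the filtering stage of a real detector is not modeled. It uses the identity cos(a)cos(b) = ½[cos(a + b) + cos(a − b)] to show that the product of the two tones is the same as the sum of the two sidebands.

```python
import math

def heterodyne_sidebands(call_khz, oscillator_khz):
    """Return the (sum, difference) frequencies in kHz of two mixed tones."""
    return call_khz + oscillator_khz, abs(call_khz - oscillator_khz)

def mixed_sample(call_khz, oscillator_khz, t_ms):
    """Product of two unit-amplitude cosines at time t_ms (kHz x ms = cycles)."""
    return (math.cos(2 * math.pi * call_khz * t_ms)
            * math.cos(2 * math.pi * oscillator_khz * t_ms))

def sideband_sample(call_khz, oscillator_khz, t_ms):
    """The same instant written as the half-amplitude sum of both sidebands."""
    high, low = heterodyne_sidebands(call_khz, oscillator_khz)
    return 0.5 * (math.cos(2 * math.pi * high * t_ms)
                  + math.cos(2 * math.pi * low * t_ms))

# Values from the text: a 45 kHz call mixed with a 43 kHz oscillator.
high_khz, low_khz = heterodyne_sidebands(45, 43)
print(high_khz, low_khz)  # 88 2
```

Only the 2 kHz difference tone remains after low-pass filtering, which is why the operator hears an audible signal.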

Frequency-division transforms the available frequencies and replicates the bat call by converting it into a square wave (sine wave also used) at its zero-crossing points. This wave is then divided by a preset factor (usually 10), creating another square (or sine) wave at a lower frequency (e.g., a 40-kHz call is converted to 4 kHz). All sounds in the environment are converted in this way. As such, masking of bat calls by noise, or overlapping of calls from different individuals, can produce results that could become difficult to interpret. Many devices have filters and ways to lower or otherwise adjust background noise. However, this recording option is now obsolete because modern digital ultrasound recorders are capable of recording at very high sampling frequencies (upward of 200 kHz) and capture the full bandwidth.
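
The principle of frequency division can be illustrated in software. The following is a sketch under simplifying assumptions, not a description of any particular detector: the input is a clean synthetic 40 kHz tone, the sampling rate of 400 kHz is chosen arbitrarily, and the function names are hypothetical. The output square wave flips its state after every tenth zero crossing of the input, so its frequency is one tenth of the input frequency.

```python
import math

def frequency_divide(samples, factor=10):
    """Return a +/-1 square wave that flips after every `factor` zero crossings."""
    output = []
    state = 1
    crossings = 0
    for previous, current in zip(samples, samples[1:]):
        if (previous < 0) != (current < 0):  # sign change marks a zero crossing
            crossings += 1
            if crossings == factor:
                state = -state
                crossings = 0
        output.append(state)
    return output

def square_wave_frequency(square, sample_rate_hz):
    """Estimate frequency from the number of state flips (two flips per cycle)."""
    flips = sum(1 for a, b in zip(square, square[1:]) if a != b)
    return flips / 2 / (len(square) / sample_rate_hz)

SAMPLE_RATE_HZ = 400_000  # assumed sampling rate
CALL_HZ = 40_000          # call frequency from the text
# 10 ms synthetic tone; the small phase offset avoids samples exactly at zero.
call = [math.sin(2 * math.pi * CALL_HZ * i / SAMPLE_RATE_HZ + 0.1)
        for i in range(4000)]
divided = frequency_divide(call)
estimated_hz = square_wave_frequency(divided, SAMPLE_RATE_HZ)  # close to 4 kHz
```

Because every sign change is counted regardless of its origin, noise or overlapping calls would alter the output, which is the interpretation problem noted above.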

Time-expansion bat detectors use an AD-converter to digitize sounds, convert them so that they are audible to the human operator, and store these digital signals to memory (usually SD-card). Reduction of the recorded frequencies expands the sounds in time (hence the name). Some modern digital bat detectors do convert ultrasounds to audible sounds in real-time by means of FFT processing (Pavan et al. 2001). However, there is a delay when the signals are retrieved and played back at a slower speed (so that they can be heard with some delay). A high-frequency modulated call that sounds like a quick click is heard as a descending note or whistle upon playback from time-expansion.
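
The relationship between expansion factor, frequency, and duration is simple enough to state as code. This is an illustrative sketch only; the factor of 10 and the example call are assumptions, and the function names are hypothetical.

```python
def time_expand(frequency_hz, duration_s, factor=10):
    """Frequency and duration heard when playback is slowed by `factor`."""
    return frequency_hz / factor, duration_s * factor

def playback_rate(sample_rate_hz, factor=10):
    """Sampling rate at which the stored samples are replayed."""
    return sample_rate_hz / factor

# Assumed example: a 5 ms call sweeping from 60 kHz down to 30 kHz.
start_hz, expanded_s = time_expand(60_000, 0.005)
end_hz, _ = time_expand(30_000, 0.005)
# The listener hears a sweep from start_hz to end_hz lasting expanded_s seconds.
```

With these assumed values, the call is heard as a descending sweep from 6 kHz to 3 kHz that lasts ten times longer than the original.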

Zero-crossing is an algorithm for extracting primary frequency information by tracking when the waveform crosses the zero-amplitude level at certain rates. Zero-crossing bat detectors run constantly, wake up when certain frequencies are detected, and save information on zero-crossings in storage. Some advanced bat detectors also retain the amplitude envelope of the original call; however, they only track the most intense component of the call. Using zero-crossing, a bat detector documents the dominant frequency, so if, for some reason, a harmonic is dominant over the fundamental or other signals overlap the fundamental of the call, only the most intense frequency is recorded. The operator needs to recognize this in order to represent the true nature of the bat’s signal. The recordings produced by zero-crossing detectors are usually small (e.g., 50 KB), whereas an equivalent recording of full-spectrum calls consumes considerable storage space (e.g., 5 MB per call).
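
A minimal zero-crossing frequency estimator is sketched below to show why only the dominant component is documented. It is an illustration under stated assumptions rather than the algorithm of any commercial detector: the signals are synthetic, the sampling rates are arbitrary, and the function name is hypothetical. In the second test signal, a 50 kHz harmonic is stronger than the 25 kHz fundamental, and the estimate follows the harmonic.

```python
import math

def zero_crossing_frequency(samples, sample_rate_hz):
    """Estimate frequency from the timing of upward zero crossings."""
    times = []
    for i in range(1, len(samples)):
        a, b = samples[i - 1], samples[i]
        if a < 0 <= b:
            # Linear interpolation locates the crossing between two samples.
            fraction = -a / (b - a)
            times.append((i - 1 + fraction) / sample_rate_hz)
    if len(times) < 2:
        return None  # not enough crossings to estimate a frequency
    return (len(times) - 1) / (times[-1] - times[0])

# Pure 45 kHz tone sampled at an assumed 384 kHz.
tone = [math.sin(2 * math.pi * 45_000 * i / 384_000 + 0.1) for i in range(3840)]
tone_hz = zero_crossing_frequency(tone, 384_000)

# Weak 25 kHz fundamental plus a stronger 50 kHz harmonic, sampled at 500 kHz.
mixture = []
for i in range(2500):
    x = 2 * math.pi * 25_000 * i / 500_000 + 0.1
    mixture.append(0.3 * math.sin(x) + 1.0 * math.sin(2 * x))
mixture_hz = zero_crossing_frequency(mixture, 500_000)  # near 50 kHz, not 25 kHz
```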

Full-spectrum digital bat detectors are digital recorders with high sampling frequency that capture the full bandwidth of the call (Dannhof and Bruns 1991; Moir et al. 2013). In some detectors, it is also possible to hear sounds in time-expansion while recording continuously. These bat detectors can record continuously or only when there are signals in a given frequency band set by the user (triggered recording); this solution reduces the storage size and shortens the time needed to analyze the recordings as only call series are recorded. Different trigger parameters allow selecting the frequency range to be recorded (spectral trigger) and the threshold level to activate the recorder. This technology is available in handheld and autonomous recorders (see Sect. 2.4.1), and computer-based bat detectors that use an external ultrasonic microphone. Some of the more advanced handheld digital bat detectors incorporate a display to visualize detected calls, and also include frequency-division, time-expansion, or frequency-shifting to provide acoustic feedback to the operator.

Some frequency-division detectors are combined with heterodyne and time-expansion capabilities into one unit. The Ciel CDB301 combines a heterodyne detector with a frequency-division detector, allowing the researcher to tune into the frequency of a known bat call and identify a bat by both its sound contour and frequency. At the same time, the detector monitors the whole frequency band and checks if there are any bats in the vicinity. The Pettersson D240, like many of these dual bat detectors, provides heterodyning on one channel and time-expansion on another. Connected to a voice-activated digital recorder, these detectors can be left in the field in monitor mode, and the retrieved data can be analyzed on a PC using the product’s software (e.g., BatSound). The Anabat Walkabout (Fig. 2.17) records bat signals using zero-crossing technology and also saves signals as full-spectrum WAV files compatible with SonoBat software. The calls can be heard and displayed at the same time and saved to disk, making species identification instantaneous. Units are compact, mobile, and well-suited for long-term monitoring. Solar-powered units with detachable solid-state hard drives allow for greater periods of use.

Fig. 2.17

Some of the detectors discussed in this section. (a) Dodotronic USB Ultramic 384BLE, (b) Wildlife Acoustics (http://www.wildlifeacoustics.com/; accessed 15 Mar. 2021) Echo Meter Touch 2 Pro connected to an iPad and to a smartphone, (c) Anabat Walkabout (Titley Scientific (http://www.titley-scientific.com/; accessed 15 Mar. 2021)), and (d) D1000X bat detector by Pettersson Elektronik. Permission given by the respective manufacturers

For teaching or demonstration, any detector is useful, but one may consider heterodyne types of detectors because of their low cost (i.e., every student could use one). An interesting and flexible option is represented by ultrasonic microphones that incorporate a high-speed AD-converter and can be connected by USB to any computer platform (Windows, MacOS, Linux, iOS, Android, or Raspberry). The Dodotronic Ultramic series, the Wildlife Acoustics Echo Meter Touch, and the Pettersson M500 are great devices for classroom demonstration. They allow recording ultrasounds continuously or on trigger with a companion tablet or smartphone, and provide full-spectrum recording capability, audio feedback, and real-time visualization. Some of these manufacturers also provide software for either basic operations, such as recording and display, or more advanced tasks such as bat species identification.

2.3.6 Projectors

Playback studies to investigate animal behavior have been used on many different taxa (see Chap. 3, section on playback methods). The projectors used for broadcasting in air and under water also have, like the sensors, their characteristic frequency response and operational frequency range. Equipment with suitable characteristics should be chosen appropriately based on the characteristics of the sounds to be transmitted. Usually, speakers are electrodynamic devices; however, for high frequencies, electrostatic speakers are also used. At high amplitudes, projected sounds can distort. One must look in the manufacturer’s manual to check maximum amplitude output of the projector and select a unit sufficiently capable of producing amplitude output similar to the level an animal would encounter. Generating sound in water requires more energy than in air, because of the higher impedance and density of water.

Among loudspeakers, some common names are used to describe their general operational frequency range: a tweeter is a high-frequency speaker typically small in diameter and a woofer is a low to very low frequency speaker that is much larger in diameter than a tweeter. A system with detachable loudspeakers can be convenient for placing speakers close to an animal or on opposing sides of an animal.

For underwater applications, there are two types of projectors: electrodynamic devices and transducers with piezoelectric elements. An electrodynamic device functions like an in-air speaker, but is watertight and can be used at shallow depths. For example, a swimming pool speaker (Lubell; Fig. 2.18) is an inexpensive electrodynamic device, but has a narrow frequency range that is relatively flat. On the other hand, piezoelectric projectors have projection sensitivity that varies with frequency. Note that many of the piezoelectric projectors are two-way or reciprocal devices that can also receive acoustic signals in water. The receiving sensitivity is fairly flat for a large portion of the operative frequency range; on the contrary, when working as a projector, the amplitude of the generated signal typically increases with frequency.

Fig. 2.18

Photograph of JA Thomas lowering a Lubell underwater speaker into a melt hole to play back underwater vocalizations to Weddell seals (Leptonychotes weddellii) in the Antarctic

2.4 Autonomous Recorders

Autonomous recorders combine the different components of the signal chain (sound sensing, amplifying, filtering, and digitization) to offer a packaged solution. A variety of autonomous passive acoustic monitoring (PAM) systems have been developed, which allow the documentation of acoustic activity from animals and the environment. Autonomous recorders (both terrestrial and aquatic) are programmable and can be set up to satisfy specific needs. These systems can obtain long-term (months to years) data from remote areas and operate independent of weather and light conditions (e.g., Lammers et al. 2008; McCauley et al. 2017; Obrist et al. 2010). Some recorders generate recordings in popular formats (e.g., WAV files) that are compatible across several analysis software packages, whereas others generate a device-specific file format requiring the use of a specific software program for analyses. Autonomous recorders eliminate the influence of an observer’s presence on the animal’s behavior, are non-invasive, operate remotely, allow systematic periodic sampling, and provide long-term recordings.

2.4.1 Terrestrial Recorders

Autonomous recorders are used to study airborne sounds from terrestrial animals on a long-term basis, during day and night, during any type of weather, and in areas where the animals might not be visible because of vegetation. They are low-power, digital recorders with extended data storage capabilities enabling the recording of sounds for extended periods, continuously, or on a pre-defined schedule (e.g., record x hours before and after sunset or sunrise, or for x min every y min). Important features of autonomous recorders in the field include: battery duration, total recording time, recorder reliability, programming capabilities, weatherproof construction, tamper-proof setup, ease of data-retrieval, and possible interface with video. The frequency response, dynamic range, and amplitude sensitivity of the unit are determined by the sound sensor, preamplifier, amplifier, and AD-converter used. By using a GPS or a highly precise internal clock, individual recorders can be time-synchronized. This allows measuring the TDOA of sounds among multiple recorders to triangulate and locate a sound source (see Chap. 4, section on localization). Another option is triggered recordings. For example, when the energy in certain frequency bands exceeds a preset threshold, data are recorded. This can reduce the amount of data to be stored onboard. Recorded data can be retrieved manually from the recorder or remotely via wireless methods. The more advanced units feature Wi-Fi, cellular network, or satellite communication interfaces for data transmission to a remote server. For instance, Pavan and team used autonomous recorders (Wildlife Acoustics SM3 and SM4) to document airborne sounds for six years at three locations with 10-min samples every 30 min (Fig. 2.19) (Pavan et al. 2015; Righini and Pavan 2019). 
Bat nocturnal activities were monitored via ultrasonic autonomous recorders (Wildlife Acoustics EM3+ and SM4BAT-FS) and an ultrasonic USB microphone (Dodotronic Ultramic 250 K) connected to a PC-tablet.
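
When planning a scheduled deployment, it helps to estimate the storage that a given duty cycle will require. The sketch below reuses the 10-min-every-30-min schedule from the example above; the sampling rate, bit depth, channel count, and deployment length are assumptions chosen for illustration, the data are taken to be uncompressed PCM (file headers ignored), and the function name is hypothetical.

```python
def recording_budget(sample_rate_hz, bit_depth, channels,
                     record_min, cycle_min, days):
    """Return (duty cycle, storage in GB) for uncompressed PCM recordings."""
    duty_cycle = record_min / cycle_min
    bytes_per_second = sample_rate_hz * (bit_depth // 8) * channels
    recorded_seconds = days * 24 * 3600 * duty_cycle
    return duty_cycle, bytes_per_second * recorded_seconds / 1e9

# Assumed settings: 48 kHz, 16-bit, 2 channels, 30-day deployment.
duty_cycle, storage_gb = recording_budget(48_000, 16, 2,
                                          record_min=10, cycle_min=30, days=30)
# duty_cycle is one third; storage_gb is roughly 166 GB
```

The same calculation at ultrasonic sampling rates quickly shows why triggered recording is attractive for bat monitoring.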

Fig. 2.19

(a) Photograph of autonomous acoustic recorders placed in the Sassofratino Nature Reserve, Italy. In the foreground, a Wildlife Acoustics Song Meter SM3. In the background, a custom recorder developed at the University of Pavia. (b) Wildlife Acoustics Song Meter SM4BAT-FS. (c) Titley Scientific Anabat Express. Permission to reprint by the respective manufacturers

The increasing interest in acoustic monitoring in the last few years has stimulated the development of many autonomous recorders; among these are the Wildlife Acoustics series, the Bioacoustic Audio Recorder (Frontier Labs, Brisbane, Queensland, Australia), the Swift (Cornell Lab of Ornithology, Cornell University, Ithaca, New York, USA), and the Anabat Express (Titley Scientific, Brendale, Queensland, Australia). Some recent open-source examples are built around the Raspberry Pi and similar small-board computers. In some cases, the projects are open access. However, these devices often require large batteries to sustain power over long periods. Examples include the Solo acoustic monitoring platform (Whytock and Christie 2017), based on the Raspberry Pi and an external microphone; the Bat Pi 2 for monitoring bats; and the AURITA system, which combines in a waterproof package the Solo recorder and a commercially available bat recorder, the Peersonic RPA2, to capture sounds from 60 Hz to 192 kHz (Beason et al. 2018). The AudioMoth, an open-source device that can also be purchased and assembled, employs a low-power microcontroller and an onboard MEMS microphone (Hill et al. 2018). It has very basic capabilities but allows remote data acquisition at very low cost on a single channel with sampling frequencies up to 384 kHz.

2.4.2 Underwater Recorders

Over the past few decades, interest in marine bioacoustics and in underwater noise monitoring has increased worldwide, and the market for underwater autonomous recorders is rapidly expanding. Autonomous recorders with a variety of features (such as operational longevity, high depth rating, onboard processing, and communication capabilities) are produced by several commercial organizations and academic entities. Examples of commercially available recorders are the AMAR from JASCO Applied Sciences, Snap from Loggerhead Instruments, AURAL from Multi-Électronique, icListen from Ocean Sonics, SoundTrap from OceanInstrumentsNZ, EAR from Oceanwide Science Institute (Lammers et al. 2008), and RESEA from RTSYS. Academic recorders include the Rockhopper by Cornell Lab of Ornithology (upgraded variant of MARU; Klinck et al. 2020), USR by Curtin University (McCauley et al. 2017), and HARP by Scripps Institution of Oceanography (Wiggins and Hildebrand 2007). Selection of a particular type of autonomous recorder is driven by the needs and limitations of the research project. Most of these modern recorders support recording at 16- and 24-bit resolutions and offer flexibility to record at different sampling frequencies and to program custom duty cycles. Some even offer the flexibility to easily switch components (e.g., choosing hydrophones with appropriate sensitivity or frequency range). With the market for these recorders expanding, there are numerous options available beyond the few products mentioned here.

In very shallow waters, at depths reachable by a diver, deployment and recovery operations can be relatively easy. At greater depths, specific additional equipment is needed to allow the recovery—typically, a ballast (to secure stability on the seafloor), an acoustic release, and floaters to retrieve the recorder at the surface once the releaser disconnects the recorder from the ballast (Fig. 2.20). Anchored units are sometimes also diver-recovered or programmed to surface at a set date and time. In ice-covered habitats, the equipment can be secured to fast- or pack-ice with the hydrophone in the water.

Fig. 2.20

Schematic of a mooring setup for the Rockhopper autonomous passive acoustic recorder (Klinck et al. 2020). The example includes a wide-bandwidth hydrophone from HighTech Inc. (http://www.hightechincusa.com/; accessed 15 Mar. 2021) (HTI-92-WB), but the recorder offers flexibility with hydrophone choices

2.5 Recording Directly to a Computer

Almost all computers, laptops, and tablets have an audio input and built-in microphone. Digital recording of sounds is controlled by the onboard soundcard. However, in most cases, the recording quality of the built-in microphone is only conducive for recording human voice or music and inadequate for animal sounds. For most animal recordings, an external sound sensor (microphone or hydrophone) connected to a high-quality audio input must be used with the computer or laptop. The recordist should consult the computer specifications to know the frequency range and dynamic range of the built-in soundcard. If the built-in sound system of a computer is not good enough, an external AD-converter can be easily connected by USB, or, for special devices, by other interface types. For fieldwork, it is preferable to choose converters with powering from the computer USB. The quality of recordings depends on the preamplifier noise and bandwidth, sampling rate, and bit-resolution of the soundcard or AD-converter. However, other features can drive the choice: number of channels, features of the AD-converter, the type of interface (USB, Firewire, Thunderbolt, or proprietary), availability of drivers for the computer, and power available for the sensors (P48 or PIP). For laptops used in fieldwork, their size, weight, ruggedness, power consumption, and reliability should be considered. Most USB-based converters for music recording are equipped with microphone preamplifiers with P48 power and offer good quality; some offer very high quality, comparable to the best digital recorder, with sampling frequencies up to 192 kHz with a number of channels ranging from 2 to 8; some external units provide up to 32 channels. Single-channel AD-converters are also available to be directly connected to a P48 microphone, to transform the microphone into a USB microphone. 
However, because some quality parameters are rarely described in official specifications (e.g., the self-noise, jitter-noise, and the anti-aliasing-filter used), conducting laboratory or bench tests to choose the best AD-converter can be necessary. For specific applications, the use of instrumentation AD-converters may be required.

2.6 Calibration

For quantitative animal bioacoustic studies, calibrated recording equipment needs to be used so that absolute sound pressure can be determined. This section deals with two types of calibration: calibrating the recording equipment and calibrating the recording. To calibrate the recording, the calibration of the recording equipment is applied to the recorded data.

Calibrating the recording system implies determining the frequency response and amplitude sensitivity of the recording system. The recording system consists of several components (e.g., sensor, amplifier, and AD-converter), each with its own frequency response and amplitude sensitivity. The recording system may be calibrated as a whole by presenting a calibration signal of known amplitude and measuring the output. From the difference between output and input, the frequency response and amplitude sensitivity may be calculated. Or, each piece of equipment may be calibrated separately, and the frequency responses and amplitude sensitivities may be joined (i.e., multiplied in linear terms or summed in logarithmic terms).

The simplest calibration signal is a sine wave (i.e., a pure tone; Fig. 2.21). While the rms value is typically used in equipment calibration sheets, the peak (pk) or peak-to-peak (pk-pk) values are more easily read off signal displays on a computer or oscilloscope. For a sine wave, the conversion is:

Fig. 2.21

Waveform of a sinusoidal signal (pressure p as a function of time) showing prms, ppk, and ppk-pk

$$ {\displaystyle \begin{array}{l}{p}_{rms}=\frac{p_{pk}}{\sqrt{2}}\approx 0.707\times {p}_{pk}\\ {}\iff 20{\log}_{10}\;\frac{p_{rms}}{p_0}=20{\log}_{10}\;\frac{p_{pk}}{p_0}-20{\log}_{10}\left(\sqrt{2}\right)\approx 20{\log}_{10}\;\frac{p_{pk}}{p_0}-3\;\mathrm{dB}\end{array}} $$

The variable p denotes pressure. The reference pressure p0 is 20 μPa in air (i.e., for microphone calibration) and 1 μPa in water (i.e., for hydrophone calibration); also see Chap. 4 on an introduction to quantities and units. To add to the confusion, the dynamic range of analog electronics and AD-converters is given in pk-pk values. The simple equation is only valid for sinusoidal signals.
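
The conversion can be verified numerically. The following sketch (function and constant names are hypothetical) applies the formula for a sine wave, computes the corresponding levels, and checks the factor 1/√2 against the rms value of a sampled sine wave; it applies to sinusoidal signals only, as stated above.

```python
import math

P_REF_AIR_PA = 20e-6    # reference pressure in air (20 uPa)
P_REF_WATER_PA = 1e-6   # reference pressure in water (1 uPa)

def pk_to_rms(p_pk):
    """rms value of a sine wave with peak amplitude p_pk."""
    return p_pk / math.sqrt(2)

def pkpk_to_rms(p_pkpk):
    """rms value of a sine wave with peak-to-peak amplitude p_pkpk."""
    return p_pkpk / (2 * math.sqrt(2))

def level_db(p, p_ref):
    """Level in dB of pressure p relative to p_ref."""
    return 20 * math.log10(p / p_ref)

# A 1 Pa peak sine wave in air: the rms level is about 3 dB below the peak level.
pk_level = level_db(1.0, P_REF_AIR_PA)
rms_level = level_db(pk_to_rms(1.0), P_REF_AIR_PA)

# Numerical check: rms of five full cycles of a unit-amplitude sampled sine.
n = 1000
sine = [math.sin(2 * math.pi * 5 * i / n) for i in range(n)]
rms_numeric = math.sqrt(sum(x * x for x in sine) / n)
```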

Using a sine wave yields an amplitude sensitivity at only one frequency. In order to measure the frequency response of the equipment, a series of sine waves at different frequencies needs to be presented. More commonly, white noise (i.e., a broadband signal of equal amplitude across frequency) is used and amplitude sensitivity is determined at all frequencies contained in the signal after Fourier transform of the output signal (see Chap. 4).

A simple recording setup is shown in Fig. 2.22. A calibration signal p(t) (i.e., pure tone or white noise of known amplitude) is presented to the sensor (i.e., microphone or hydrophone). The sensor has a sensitivity s, which relates the voltage V at its output to the pressure p at its input; so s has the unit V/Pa. The sensitivity can also be expressed in dB re 1 V/Pa: S = 20 log10 (s/(V/Pa)). The output voltage V of the sensor is typically passed to an amplifier. The amplifier gain g relates the voltage at its output to the voltage at its input and is thus unit-less: g = V2/V1. Expressed in dB, the amplifier gain is G = 20 log10 (g). The output voltage of the amplifier is then passed to an AD-converter such as a soundcard on a computer. The AD-converter has a digitization gain c, that relates the digital values d in the audio file to the voltage V at its input. The bit-depth of the AD-converter limits the maximum digital value (i.e., the full-scale value FS) that can be stored. The digitization gain is defined as the ratio of the full-scale value to the input voltage that produces the full-scale value: c = FS/Vmax. The digitization gain is expressed in dB re FS/V. The sensitivities (in linear terms) of each component in the recording system can be multiplied to yield the system sensitivity, which relates the digital values d in the audio file to the pressure p sensed by the sensor. In logarithmic terms, the overall system sensitivity is the sum of the sensitivities of each piece of equipment.

Fig. 2.22

Sketch of a generic recording system consisting of a sensor (i.e., microphone or hydrophone), amplifier, and AD-converter (e.g., a computer with soundcard). Each piece of equipment has its own sensitivity or gain (indicated by red letters). These sensitivities may be expressed in linear terms (small letters) or decibels (capital letters). The sensor converts the input pressure time series p(t) to a voltage time series V1(t), which is amplified to yield V2(t). The AD-converter produces a digital time series d(t)

Once the recording system has been calibrated, it can be used to record animals or other sound sources. To determine the calibrated pressure time series p(t) from the stored data d(t), divide by all the sensitivities and gains: p(t) = d(t) / (c g s). Alternatively, using the level quantities (in dB) for each piece of equipment, the received level RL (e.g., rms sound pressure level) is determined by subtracting all sensitivities and gains from the rms amplitude level D: RL = D − C − G − S. For example, somebody made a 10-minute recording of a singing bird. The microphone sensitivity was s = 50 mV/Pa, or S = 20log10(0.05) = −26 dB re 1 V/Pa. The amplitude at the output of the microphone was amplified by, let’s say, a factor g = 100, or G = 20log10(100) = 40 dB. The soundcard produced a full-scale amplitude at 2 V input: c = FS/2 V, or C = 20log10(1/2) = −6 dB re FS/V. A computer is used to process the data. If the data are read using the MATLAB (The MathWorks Inc., Natick, MA, USA) function audioread with the flag “native,” then the raw digital values are presented. With the flag “double,” the data are normalized by the full-scale value and so lie between −1 and +1. Computing the rms amplitude of the normalized digital time series yields a value of, let’s say, 0.06. In logarithmic terms, the rms amplitude level of the stored normalized data is D = 20log10(0.06) = −24 dB. What was the received sound pressure level of the bird song? Subtracting all the gains, the rms sound pressure level received at the microphone was −32 dB re 1 Pa (because −24 − (−6) − 40 − (−26) = −32). The standard reference pressure in air is, however, 20 μPa, which is equivalent to 20log10(20/1,000,000) = −94 dB re 1 Pa. So, the rms sound pressure level recorded from the bird was −32 − (−94) = 62 dB re 20 μPa. The researcher might further want to compute calibrated sound spectrograms of the bird song, and so the question is how to convert the digital values to pressure values. Using the linear sensitivities and gains, p(t) = d(t) / (FS / 2 V) / 100 / (0.05 V/Pa) yields pressure samples in units of Pa.
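
The bird-song example can be reproduced in a few lines. This is a minimal sketch with hypothetical function names, assuming that the digital values have already been normalized to the range −1 to +1 (so that FS = 1). Without the intermediate rounding used in the text, the result is about 61.6 dB re 20 μPa, which rounds to the 62 dB quoted above.

```python
import math

def db(x):
    """Convert a linear amplitude ratio to decibels."""
    return 20 * math.log10(x)

def received_level_db(d_rms, s_v_per_pa, gain, v_full_scale, p_ref_pa=20e-6):
    """rms sound pressure level from normalized digital rms (RL = D - C - G - S)."""
    S = db(s_v_per_pa)        # sensor sensitivity, dB re 1 V/Pa
    G = db(gain)              # amplifier gain, dB
    C = db(1 / v_full_scale)  # digitization gain, dB re FS/V
    D = db(d_rms)             # rms level of the normalized data, dB re FS
    level_re_1pa = D - C - G - S
    return level_re_1pa - db(p_ref_pa)  # re-reference from 1 Pa to p_ref_pa

def to_pascal(d, s_v_per_pa, gain, v_full_scale):
    """Convert one normalized digital value to pressure: p = d / (c g s)."""
    c = 1 / v_full_scale
    return d / (c * gain * s_v_per_pa)

# Values from the text: s = 50 mV/Pa, g = 100, full scale at 2 V, rms of 0.06.
rl = received_level_db(0.06, 0.05, 100, 2.0)
p_rms = to_pascal(0.06, 0.05, 100, 2.0)  # 0.024 Pa
```

Applying to_pascal to every sample of d(t) gives the calibrated pressure time series from which calibrated spectrograms can be computed.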

2.6.1 Microphone

To make accurate recordings of sound intensity in the laboratory or field, either from an animal or a different source, a researcher should always use a calibrated microphone. A commercial microphone is calibrated when received from the manufacturer and comes with specification sheets containing amplitude sensitivity, frequency response, and reception directionality as a function of frequency in the horizontal and vertical planes. For example, the ½-inch microphone shown in Fig. 2.23a has an amplitude sensitivity of 12.5 mV/Pa or −38 dB re 1 V/Pa and a flat frequency response (to within 3 dB) from about 3 Hz to 40 kHz (Fig. 2.23c). Given its cylindrical symmetry, it is omnidirectional about its vertical axis (Fig. 2.23b). In the vertical plane, its receiving directionality is steered toward its axis; in other words, it is most sensitive in the forward (i.e., vertical in Fig. 2.23b) direction. The lower the frequency, the more receptive it becomes from other directions. To check that the microphone maintains its sensitivity over time, a bioacoustician should periodically use a calibrator. For example, the calibrator shown in Fig. 2.24 is very stable and emits a 1 kHz tone at 94 dB re 20 μPa.

Fig. 2.23

Specifications of a Brüel & Kjær 1/2-inch free-field microphone type 4191. (a) Photo. (b) Polar plot of receiving directionality from 16 kHz to 40 kHz. (c) Graph of frequency response. Permission to reprint from Brüel & Kjær

Fig. 2.24

A sound level calibrator (LUTRON, model SC-941) that generates 94 dB re 20 μPa at 1 kHz. The microphone to be calibrated must be inserted in the hole (1/4 inch diameter) on the left side. Adapters are available to fit other microphone diameters

Provided there is a commercial, calibrated microphone available, a researcher can calibrate a microphone of unknown sensitivity by comparison with a calibrated microphone. Using a loudspeaker system to do this is a convenient option. Alternatively, signals of opportunity, like roadway or jet noise, may also be considered while ensuring that both microphones receive the same signals and levels. First, calibrate the sound field at the frequencies of interest with the calibrated microphone. Then, replace the calibrated microphone with the one of unknown sensitivity and record the output in the same frequency range. Do not place the two microphones side-by-side in the sound field since this could cause diffraction and distortion of the sound field. The sound field should not contain echoes, so choose an open space or an anechoic room for low frequencies. In the example of Fig. 2.25, the calibrated microphone has a sensitivity of 50 mV/Pa. In the given sound field, it produces an output signal with an amplitude of 0.3 voltage units. After the calibrated microphone has been removed and the to-be-calibrated microphone has been installed at exactly the same location, the latter produces an output signal of 0.7 voltage units. The sensitivity of the to-be-calibrated microphone is simply 0.7/0.3 × 50 mV/Pa = 117 mV/Pa.
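
The comparison method reduces to a single ratio, sketched below with the values from the example. The function name is hypothetical, and the calculation assumes that both microphones were exposed to exactly the same sound field and recorded with identical gain settings.

```python
import math

def sensitivity_by_comparison(s_ref_v_per_pa, v_ref, v_test):
    """Sensitivity of the test microphone from outputs in the same sound field."""
    return s_ref_v_per_pa * v_test / v_ref

# Values from the text: reference 50 mV/Pa, outputs of 0.3 and 0.7 voltage units.
s_test = sensitivity_by_comparison(0.050, 0.3, 0.7)  # about 0.117 V/Pa
s_test_db = 20 * math.log10(s_test)                  # about -18.7 dB re 1 V/Pa
```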

Fig. 2.25
Sketch of a setup to calibrate a microphone of unknown sensitivity with a microphone of known sensitivity in a constant sound field. Redrawn from a laboratory manual with permission from Lasse Jakobsen, Institute of Biology, University of Southern Denmark, Odense, Denmark

2.6.2 Hydrophone

High-quality commercial hydrophones are calibrated by the manufacturer with all pertinent information contained in the accompanying specification sheets. Many hydrophone types have built-in preamplifiers with amplification and impedance matching. Thus, these hydrophones come with a calibration sheet having one sensitivity value that includes the preamplifier. The sensitivity of a hydrophone is usually expressed in dB re 1 V/μPa, which is different from the expression for microphone sensitivity (dB re 1 V/Pa).

To use RESON hydrophones as examples, their most sensitive hydrophone (i.e., the one with the least negative sensitivity: TC4032; Fig. 2.26) has a sensitivity of −170 dB re 1 V/μPa (single ended). If the sound received by the hydrophone were 170 dB re 1 μPa rms, then the output from the hydrophone would be 1 V rms. To compare this to a microphone, add 120 dB, which corresponds to a factor of 10^6 in pressure (20 log10(10^6) = 120 and 10^6 μPa = 1 Pa). So, −170 dB + 120 dB yields −50 dB re 1 V/Pa. The most sensitive ½- or 1-inch microphone has a sensitivity of −26 dB re 1 V/Pa, which is 24 dB (i.e., about 16 times, because 20 log10(16) ≈ 24) more sensitive than the TC4032 hydrophone.
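
As a quick check of this unit conversion, the following Python sketch re-references a hydrophone sensitivity from 1 V/μPa to 1 V/Pa and expresses a decibel difference as a linear amplitude ratio. The function names are hypothetical and chosen only for illustration.

```python
def upa_to_pa_reference_db(sens_db_re_1v_per_upa):
    """Re-reference a sensitivity from dB re 1 V/uPa to dB re 1 V/Pa.

    1 Pa = 10^6 uPa, so the same sensitivity is 20*log10(10^6) = 120 dB higher.
    """
    return sens_db_re_1v_per_upa + 120.0

def db_to_amplitude_ratio(db_difference):
    """Linear amplitude ratio that corresponds to a level difference in dB."""
    return 10 ** (db_difference / 20.0)

hydrophone_db = upa_to_pa_reference_db(-170.0)   # TC4032 example from the text
microphone_db = -26.0                            # microphone example from the text
difference_db = microphone_db - hydrophone_db

print(hydrophone_db)                                  # -50.0
print(difference_db)                                  # 24.0
print(f"{db_to_amplitude_ratio(difference_db):.1f}")  # 15.8
```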

Fig. 2.26
Graph of amplitude sensitivity and frequency response for several RESON hydrophones with preamplifiers. The most sensitive is the TC4032; the least sensitive is the TC4035. Permission to reprint from RESON (http://www.teledyne-reson.com/; accessed 15 Mar. 2021)

Although most hydrophones are stable through time, it is wise to check the calibration periodically using a pistonphone. However, a pistonphone can determine the sensitivity of an uncalibrated hydrophone at only one frequency. The sound pressure of a pistonphone is extremely stable and is only affected by one factor: barometric pressure. For this reason, a special barometer is included with the pistonphone. For accurate calibrations, the barometric pressure should be checked, and sound pressure adjusted according to the scale on the barometer. For calibrations performed near sea level (as is often the case in marine bioacoustics), this error is negligible, but if one is working in an aquatic environment that is significantly above sea level, then this factor (which is −2 dB at 2000 m altitude) should be included. For hydrophones to be deployed at great depth in the ocean, the amplitude sensitivity (and pressure resistance) should be measured in a pressure chamber.

The frequency response of an uncalibrated hydrophone (for frequencies up to a few kHz) can be measured in air by using the same method as described for a microphone (Fig. 2.25). However, for higher frequencies, this should be done in open water (e.g., a deep lake). The method described for microphones can still be used, with a hydrophone of known sensitivity taking the place of the calibrated microphone and the hydrophone of unknown sensitivity taking the place of the microphone under test. An appropriate amplifier and an underwater projector are needed, but a hydrophone without a built-in preamplifier also can be used as a projector. First, the environment (lake, pool, or tank) should be checked for echoes and reverberations (see Popper and Hawkins 2018 for details). The projected calibration sound must be a pulse that ends before the first echo arrives at the sensor. This necessity restricts the frequency range that can be used for calibration since the projected pulse must be ramped up and down to reduce high-frequency artifacts caused by the onset and end of the pulse.
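
To illustrate why the echo-free window limits the usable frequency range, the sketch below estimates the delay between the direct arrival and the first surface echo, and the number of cycles that fit into that window. It rests on several assumptions that are not from the text: a nominal sound speed of 1500 m/s, a flat water surface as the only reflecting boundary (modeled with an image source), and hypothetical function names and geometry.

```python
import math

def echo_free_window_s(separation_m, source_depth_m, receiver_depth_m,
                       sound_speed_m_s=1500.0):
    """Delay (s) between the direct arrival and the surface echo.

    Assumption: the surface is flat and is the only boundary considered.
    """
    direct = math.hypot(separation_m, source_depth_m - receiver_depth_m)
    # Surface reflection modeled with an image source mirrored above the surface
    reflected = math.hypot(separation_m, source_depth_m + receiver_depth_m)
    return (reflected - direct) / sound_speed_m_s

def cycles_in_window(frequency_hz, window_s):
    """Number of full or partial cycles of a tone that fit into the window."""
    return frequency_hz * window_s

# Hypothetical geometry: projector and hydrophone 1 m apart, both at 2 m depth
window = echo_free_window_s(1.0, 2.0, 2.0)
print(f"{window * 1e3:.2f} ms")  # 2.08 ms
for frequency in (1_000, 10_000):
    print(frequency, "Hz:", round(cycles_in_window(frequency, window), 1), "cycles")
```

With this geometry, a 1-kHz pulse has room for only about two cycles including its ramps, whereas a 10-kHz pulse fits about twenty. The bottom and any walls would need the same treatment, and the shortest delay governs the pulse length.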

The next step is to determine the received level of an underwater sound. For example, a dolphin click is recorded with a TC4035 hydrophone, which has a sensitivity of −215 dB re 1 V/μPa (Fig. 2.26). If the output is amplified by 60 dB (1000×) and the recorded signal is 1.2 V pk-pk, then the received level is: 20 log10(1.2) − 60 − (−215) = 1.58 − 60 + 215 ≈ 157 dB re 1 μPa pk-pk. Usually, the analog voltage signal is converted to a digital signal by an AD-converter, which has a digitization gain that also needs to be accounted for (see above).
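
The same calculation can be written as a small Python function. This is a minimal sketch with a hypothetical function name; it assumes the voltage is measured after the amplifier and that no further gain stages (such as a digitization gain) intervene.

```python
import math

def received_level_db(voltage_v, gain_db, sensitivity_db_re_1v_per_upa):
    """Received level in dB re 1 uPa from a measured voltage.

    The level carries the same amplitude measure as the voltage
    (e.g., pk-pk voltage gives a pk-pk level).
    """
    if voltage_v <= 0:
        raise ValueError("voltage must be positive")
    return 20 * math.log10(voltage_v) - gain_db - sensitivity_db_re_1v_per_upa

# Dolphin click example: 1.2 V pk-pk, 60 dB gain, TC4035 at -215 dB re 1 V/uPa
level = received_level_db(1.2, 60.0, -215.0)
print(f"{level:.1f} dB re 1 uPa pk-pk")  # 156.6 dB re 1 uPa pk-pk
```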

2.6.3 AD-Converter

A 16-bit AD-converter resolves 2^16 = 65,536 counts peak-to-peak. Its full-scale value is 2^16 − 1 = 65,535 in unipolar mode, where the digital amplitude values lie between 0 and 65,535, or 2^15 = 32,768 in bipolar mode, where the digital amplitude values range from −32,768 through 0 to 32,767. In decibels, the dynamic range of a 16-bit AD-converter in bipolar mode is 20 log10(32,768) ≈ 90 dB. Every bit gives ~6 dB of dynamic range in the digital domain. But a 90-dB dynamic range rarely can be realized since most electronics used before AD-conversion do not have such a large dynamic range. A 24-bit converter in bipolar mode offers a theoretical dynamic range of about 138 dB; however, only the most sophisticated electronics can provide up to 115–120 dB of dynamic range. This means that there cannot be more than 19–20 bits of real dynamic range and the remaining bits (least significant bits) are just filled by noise. AD-converter specification sheets rarely show this, so there is a growing need for more realistic AD-converter specifications that account for the intrinsic AD-converter noise and its artifacts, which show up as distortion and jitter. In some recording systems, the least significant bits are used to encode complementary information; however, this practice is not standard.
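
The figures quoted above can be reproduced with a short Python sketch. The function names are hypothetical; the calculation assumes bipolar mode, in which the full-scale value is 2^(bits − 1).

```python
import math

DB_PER_BIT = 20 * math.log10(2)  # about 6.02 dB of dynamic range per bit

def bipolar_dynamic_range_db(bits):
    """Theoretical dynamic range of a bipolar AD-converter, in dB."""
    return 20 * math.log10(2 ** (bits - 1))

def usable_bits(analog_dynamic_range_db):
    """Number of bits that an analog front end of given dynamic range can fill."""
    return analog_dynamic_range_db / DB_PER_BIT

print(f"{bipolar_dynamic_range_db(16):.1f} dB")  # 90.3 dB
print(f"{bipolar_dynamic_range_db(24):.1f} dB")  # 138.5 dB
print(f"{usable_bits(115):.1f} to {usable_bits(120):.1f} bits")  # 19.1 to 19.9 bits
```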

AD-converters thus carry an intrinsic digitization gain, which is the ratio of the full-scale value to the input voltage that leads to full-scale. The digitization gain is expressed in dB re FS/V. For example, an AD-converter with a digitization gain of −6 dB re FS/V reaches its FS value at a peak input voltage of 2 V, because 20 log10(FS/2 V) = −6 dB re FS/V. AD-converters may be calibrated with a voltage signal generator. The peak voltage of the input signal has to be less than the maximum voltage range specified in the specification sheet; otherwise, the AD-converter will be overloaded and the signal clipped.
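
A minimal Python sketch of how the digitization gain might be used to convert recorded counts back to volts is given below. The function names are hypothetical, and the sketch assumes a bipolar converter whose full-scale count is 2^(bits − 1); the documentation of the specific device should always take precedence.

```python
def full_scale_voltage(digitization_gain_db_re_fs_per_v):
    """Peak input voltage (V) that drives the converter to full scale."""
    return 10 ** (-digitization_gain_db_re_fs_per_v / 20.0)

def counts_to_volts(counts, bits, digitization_gain_db_re_fs_per_v):
    """Convert a digital sample (counts) to volts.

    Assumption: bipolar converter with full scale at 2**(bits - 1) counts.
    """
    full_scale_counts = 2 ** (bits - 1)
    fraction_of_full_scale = counts / full_scale_counts
    return fraction_of_full_scale * full_scale_voltage(digitization_gain_db_re_fs_per_v)

# Example from the text: -6 dB re FS/V corresponds to about 2 V at full scale
print(f"{full_scale_voltage(-6.0):.2f} V")             # 2.00 V
# A 16-bit sample at half of full scale then corresponds to about 1 V
print(f"{counts_to_volts(16384, 16, -6.0):.2f} V")     # 1.00 V
```

The resulting voltage can then be converted to sound pressure by dividing out any amplifier gain and the sensor sensitivity.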

2.6.4 Autonomous Recorder

Off-the-shelf autonomous recorders are manufacturer-calibrated. The specification sheets typically give one overall amplitude sensitivity and frequency response for the entire system (including sensor, amplifier, and AD-converter). If the recorder allows variable gain settings, then the chosen gain will affect the amplitude sensitivity and needs to be accounted for. Some manuals (e.g., the SoundTrap User Guide) provide guidance on how to calibrate the recorded data if read by software packages such as MATLAB, PAMGuard, or Audacity.

2.6.5 Measuring Self-Noise

When intending to record quiet sounds or ambient sound levels in the absence of nearby sound sources, it is important to first measure the system self-noise to avoid confounding electronic noise with environmental noise. For this, the system should record in a quiet room and the sound sensor should be in a sound- and vibration-proof box (Fig. 2.27). If using an autonomous recorder, the entire system should rest in a sound-proof box.

Fig. 2.27
Diagram of equipment to measure underwater ambient noise. The RESON hydrophone with lowest self-noise is the TC4032. Prior to deployment, system self-noise may be determined by recording with the hydrophone in a sound- and vibration-proof box in the laboratory. Permission to reprint from RESON

To record quiet sounds under water or to accurately quantify ambient sea noise, a sensitive hydrophone with a wide frequency range is needed (e.g., the TC4032, Fig. 2.26). All of the system components should have low self-noise. A “wet-ground” ground-wire from the input equipment to the water might be necessary to reduce system noise. The amplifier should have an adjustable band-pass filter to avoid aliasing during direct digital recording. The AD-converter needs sufficient bit-resolution and sampling rate to cover the frequency band of interest. The system frequency response shown in Fig. 2.27 goes up to about 100 kHz. If the full bandwidth is desired, then the sampling frequency should be at least 200 kHz. When reporting measured levels, provide the frequency range over which sound was measured and the bandwidth over which sound levels were computed (e.g., per Hz or in 1/3-octave bands).

2.7 Other Gear

2.7.1 Sound Pressure Level Meter

SPL meters, also called phonometers, are used to measure ambient noise, including abiotic and biotic sounds. SPL meters have a variety of settings for transient vs. continuous sound, frequency range, amplitude range, and any weightings (Brüel and Kjær 2001). The microphone on an SPL meter is omnidirectional and can be covered with a windsock and mounted on a tripod. The fast setting is used for impulse or transient sounds. The slow setting is used for continuous sounds. Most SPL meters have a selectable frequency range. The user can select a flat setting, which collects dB measurements equally over the desired bandwidth (i.e., without weightings). The A-weighting applies a filter over the sampled frequency range to account for the relative loudness perceived by the human ear (see Chap. 4, section on weighting curves). However, it is important not to underestimate the impact of infrasounds, which the A-weighting strongly attenuates but which can be heard or perceived by animals. The C-weighting is selected when the user desires to measure the peak sound pressure level. Measurements with these filters are expressed as dB(lin), dB(A), or dB(C). To measure environmental noise over the whole spectrum (especially for species with unknown hearing curves), it is important to use the unweighted, flat setting. For low-frequency anthropogenic noise, the type of weighting used can make a large difference in the amplitude measurement.

Out of the various measures an SPL meter may report, the most common one is perhaps the Equivalent Continuous Sound Level (Leq), which is a time-average: the equivalent constant SPL that would produce the same energy as the fluctuating sound level measured over a given time interval (e.g., 60 s). The duration of the measure must be declared as Leq,T (e.g., Leq,60s), where T is the time interval of the measurement. The level may be weighted (e.g., A or C weighting). LAeq is often used in the assessment of noise dose or sound exposure in humans (Fig. 2.28). For example, LAeq,1s = 73 dB or Leq,1s = 73 dB(A) is a measurement taken with an A-weighting filter over 1 s and LCeq,1s indicates a measurement taken with a C-weighting filter for 1 s.
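
Because Leq is an energy average, it cannot be obtained by averaging decibel values directly. The following Python sketch illustrates the calculation under the assumption that the input is a series of short-term SPL readings taken at equal time intervals with the same weighting; the function name and the readings are hypothetical.

```python
import math

def leq_db(levels_db):
    """Equivalent continuous level (dB) of equally spaced SPL readings.

    Each level is converted to a relative energy, the energies are
    averaged, and the mean is converted back to decibels.
    """
    if not levels_db:
        raise ValueError("at least one level is required")
    mean_energy = sum(10 ** (level / 10.0) for level in levels_db) / len(levels_db)
    return 10 * math.log10(mean_energy)

# Two equally long periods at 60 dB and 70 dB
print(f"{leq_db([60.0, 70.0]):.1f} dB")  # 67.4 dB
```

Note that the result (67.4 dB) is higher than the arithmetic mean of the two levels (65 dB), since the louder period dominates the energy.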

Fig. 2.28
Recording and spectral analysis of noise in a residential area. Recording (top) of the overall sound level (A-weighted) with the LAeq level of the shown period. The unweighted spectrographic image (bottom), with frequency up to 20 kHz on a logarithmic scale, shows the spectral composition of the recorded period. At about 20 Hz is the noise generated by a truck engine. At about 16.53 occurs the noise of a passing airplane (50–1000 Hz). Bird songs appear at 1500–9000 Hz. Courtesy of Alberto Armani

Some SPL meters have a 60-s Leq setting used for short-term sampling. However, if the sound level varies randomly, calculating Leq is tricky, so integrating sound level meters are preferable (Fig. 2.29), as they determine Leq over a suitable time period. When more information on the statistics of sound levels is needed, in both time and frequency, noise-level analyzers are used (Fig. 2.29). They perform statistical analyses of sound levels over a specified period, either broadband or band-limited (e.g., in a 1-octave or 1/3-octave band). The most sophisticated, and expensive, noise-measuring systems can produce spectra in narrower bands (as fine as 1-Hz bands) and calculate spectral percentiles to show the level variation statistics for each frequency band. In other words, the percentile analysis of a 1/3-octave spectrum shows what percentage of time each level is reached or exceeded within the measurement period (see Chap. 4, section on power spectral density percentiles).
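
The idea of a level that is reached or exceeded for a given percentage of time can be sketched as follows. This is an illustrative nearest-rank implementation with a hypothetical function name and made-up readings; commercial analyzers may interpolate or bin levels differently. For a percentile spectrum, the same function would be applied to the level series of each frequency band.

```python
import math

def exceedance_level(levels_db, percent):
    """Level reached or exceeded in `percent` % of the readings (nearest rank)."""
    if not levels_db:
        raise ValueError("at least one level is required")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    ordered = sorted(levels_db, reverse=True)  # loudest first
    rank = max(1, math.ceil(percent * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical series of ten equally spaced band levels (dB)
band_levels = [51, 52, 53, 54, 55, 56, 57, 58, 59, 60]
for percent in (10, 50, 90):
    print(f"L{percent} = {exceedance_level(band_levels, percent)} dB")
# L10 = 60 dB, L50 = 56 dB, L90 = 52 dB
```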

Fig. 2.29
Photograph of Larson Davis SoundAdvisor 831C sound level meter with spectral analysis and sound recording capabilities (left; permission to reprint from Larson Davis (http://www.larsondavis.com/; accessed 5 Mar. 2021)) and of a simple noise-level analyzer with calibrator (right; shown being calibrated using a 1 kHz tone with 94 dB SPL)

All these devices need to be calibrated periodically with a known calibration tone. Calibrators are standardized at the factory and usually maintain calibration for a long time. Only specialized laboratories can certify calibrators. The calibrator signal is usually a 1-kHz sinusoidal tone at 94 dB re 20 μPa SPL rms (equivalent to a pressure of 1 Pa rms, or 1.41 Pa pk, which is about 97 dB re 20 μPa pk).
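
The relationship between the calibrator level and the corresponding sound pressure can be checked with a short Python sketch. The function names are hypothetical; the only physical assumption is that the tone is a pure sinusoid, for which the peak is √2 times the rms value (about 3 dB higher).

```python
import math

P_REF_AIR = 20e-6  # reference pressure in air, 20 uPa

def spl_to_pressure(spl_db, p_ref=P_REF_AIR):
    """Sound pressure (Pa) that corresponds to a level in dB re p_ref."""
    return p_ref * 10 ** (spl_db / 20.0)

def pressure_to_spl(pressure_pa, p_ref=P_REF_AIR):
    """Level in dB re p_ref that corresponds to a sound pressure (Pa)."""
    return 20 * math.log10(pressure_pa / p_ref)

p_rms = spl_to_pressure(94.0)   # calibrator tone, rms
p_pk = p_rms * math.sqrt(2)     # peak of a sinusoid

print(f"{p_rms:.2f} Pa rms")                             # 1.00 Pa rms
print(f"{p_pk:.2f} Pa pk")                               # 1.42 Pa pk
print(f"{pressure_to_spl(p_pk):.1f} dB re 20 uPa pk")    # 97.0 dB re 20 uPa pk
```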

2.7.2 Vibration Measurement

2.7.2.1 In Terrestrial Studies

In addition to communicating through sound (i.e., pressure waves propagating through air or liquid), animals ranging from elephants to insects communicate by producing waves that travel through solids (i.e., substrate-borne vibrations, also referred to as vibrational or seismic communication in the literature) (Cocroft et al. 2014a; Hill 2008; Hill et al. 2019; O’Connell-Rodwell 2010). Of insects alone, an estimated ~195,000 species communicate in part or whole via substrate-borne vibrations (Cocroft and Rodríguez 2005). Of these, the most species-rich group is plant-living insects, and so most examples in this section deal with invertebrate signalers and plant substrates.

Vibrational signals travel through various kinds of substrates (e.g., rod-like, such as plant stems; plate-like, such as leaf litter) as different types of waves (e.g., bending, Rayleigh) that vary in their direction of energy propagation (reviewed in Elias and Mason 2014; Mortimer 2017). In plant stems and leaves, substrate-borne vibrations travel as bending waves (Michelsen et al. 1982) and signal propagation is frequency-dispersive; in other words, energy at higher frequencies propagates faster than does energy at lower frequencies (Michelsen et al. 1982). Furthermore, each substrate acts as a unique filter, attenuating some frequencies more than others (reviewed in Elias and Mason 2014). Filtering varies among different plant species (Bell 1980; McNett and Cocroft 2008; Virant-Doberlet and Čokl 2004), different parts of the same plants (Čokl et al. 2005; McNett and Cocroft 2008), and even among different parts of the same leaves (Čokl et al. 2004; Magal et al. 2000).

Filtering is a key consideration for selecting a sensor for recording or playback (Cocroft et al. 2014b). Importantly, the transmission and filtering properties of a given substrate can be affected by a sensor if the sensor adds extra mass. If the aim is to characterize signal parameters of a given species, then to minimize filtering, one must choose a sensor that adds as little mass as possible and minimize the signal propagation distance between the source and the receiver. For example, one might affix a small and lightweight micro-accelerometer to the substrate, close to the signaling animal. Alternatively, one might use a laser-Doppler vibrometer to detect and record signals directly from the body of the signaling animal (Čokl et al. 2005).

The output of a sensor is proportional to the quantity (displacement, velocity or acceleration) that it detects – a sensor that detects displacement will be most sensitive to low-frequency signals, whereas a sensor that detects acceleration will be most sensitive to high-frequency signals. The consequence of this relationship between output and quantity is that the type of sensor used impacts the measurements that one makes of a signal and how that signal is characterized.
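
The frequency dependence described above follows from the kinematics of a sinusoidal vibration: for a displacement amplitude d at frequency f, the velocity amplitude is 2πf·d and the acceleration amplitude is (2πf)²·d. The Python sketch below illustrates this for a hypothetical 1-μm vibration; the function names and example values are assumptions made only for illustration.

```python
import math

def velocity_amplitude(displacement_m, frequency_hz):
    """Velocity amplitude (m/s) of a sinusoidal vibration."""
    return 2 * math.pi * frequency_hz * displacement_m

def acceleration_amplitude(displacement_m, frequency_hz):
    """Acceleration amplitude (m/s^2) of a sinusoidal vibration."""
    return (2 * math.pi * frequency_hz) ** 2 * displacement_m

displacement = 1e-6  # hypothetical 1-um displacement amplitude
for frequency in (50, 500):
    v = velocity_amplitude(displacement, frequency)
    a = acceleration_amplitude(displacement, frequency)
    print(f"{frequency} Hz: v = {v:.2e} m/s, a = {a:.2e} m/s^2")
```

For the same displacement, a tenfold increase in frequency raises the velocity tenfold and the acceleration a hundredfold, which is why the same signal can look quite different depending on the quantity that the sensor measures.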

Some of the key considerations for selecting a type of sensor include its sensitivity and power needs (all sensors require power), the frequency and amplitude ranges of the signals, equipment ruggedness and portability (if considered for fieldwork), and cost (Table 2.1). Research questions can be framed around the signaler or receiver, and the measurement of interest can vary widely (e.g., number of signals produced, signal parameters, etc.). Different sensor types function best in different frequency ranges, and the dominant frequency of a vibrational signal can vary widely, from <50 Hz for tremulating katydids (De Souza et al. 2011; Morris 1980; Morris et al. 1994; Sarria-S et al. 2016), to between 50 and 200 Hz for tremulating stinkbugs (reviewed in Čokl et al. 2014), to above 500 Hz for diverse kinds of plant-feeding insects (reviewed in Čokl et al. 2014). Vibrational signals can also be narrowband (McNett and Cocroft 2008) or broadband, with energy distributed over several kHz (Cocroft 1996; Hamel and Cocroft 2019).

Table 2.1 Examples of sensors that might be selected for vibrational communication studies, taking research aim, substrate type, signal frequency and amplitude ranges, ruggedness of equipment, and cost constraints into account

The amplitudes of vibrational signals also vary widely, even just within small arthropods. For example, large neotropical katydids produce substrate-borne vibrations by vertically oscillating their abdomens relative to the substrate (in other words, they bounce) and the amplitude of these oscillations can be large enough to observe with the naked eye (Belwood and Morris 1987; Morris et al. 1994; Rajaraman et al. 2015). In contrast, the amplitude of signals by tiny treehopper nymphs can be so low as to be difficult to detect without a very sensitive sensor, such as a laser-Doppler vibrometer (LDV) (JH, pers. obs.). The animal’s use of substrates is another key factor to consider: some vibrationally signaling animals, such as small, plant-feeding insects, are relatively sessile and signal from specific locations on plants of a single species (McNett and Cocroft 2008), whereas other vibrationally signaling animals are more motile and may signal on diverse substrate types (reviewed in Elias and Mason 2010).

2.7.2.1.1 Sensor Types Based on the Quantity Measured

Displacement: Phonocartridges and other piezoelectric sensors have greatest sensitivity at low frequencies. Phonocartridges can be quite good for detecting low-frequency, low-amplitude signals in plant substrates, but placement of the phonocartridge on the plant leaf or stem necessarily loads the substrate and changes its transmission properties (Fig. 2.30a). Additionally, amplitude measurements made with phonocartridges are variable and not repeatable, because amplitude varies with the pressure with which the stylus contacts the plant tissue.

Fig. 2.30
Sensors that detect and measure substrate-borne vibrations. (a) A phonocartridge attached to lab-hands or a thin wooden dowel. (b) Accelerometer. (c) Piezo disc or contact microphone for detecting substrate-borne vibrations. (d–f) Accelerometers affixed to substrates with a small amount of accelerometer wax or dental wax. Lightweight supports such as twist-ties and thin hair clips are used to reduce the likelihood of the accelerometer shifting position or detaching from a substrate

Velocity: LDVs use the reflection of a laser beam pointed at a reflective object or substrate to detect the velocity of its movement. (If a surface does not reflect enough of the laser for measurement, a small amount of reflective paint or tape can be applied to the substrate.) LDVs are highly sensitive and excellent for detecting and making measurements of low-amplitude signals that also have energy concentrated in low frequencies. They do not load any mass to a substrate, so they do not affect signal transmission in this way, and in fact, they can be used to characterize signals by recording from an animal itself (Čokl et al. 2005). LDVs provide repeatable measures of amplitude for vibrational signals. Unfortunately, LDVs can be expensive. Although they are fairly portable, they are still quite cumbersome compared with a micro-accelerometer. Additionally, because an LDV detects only the component of motion along the axis of the laser beam, the researcher must decide which plane is of interest (e.g., identify the major axis of motion). LDVs are not well-suited for high-amplitude signals, as a moving branch or stem will break the contact of the laser with the reflective surface and disrupt measurement.

Acceleration: Accelerometers can be purchased in a wide variety of sensitivities, frequency ranges, and sizes, and some models have the capacity for adjustable gain. For example, a commonly used micro-accelerometer in studies of small insects has a mass of 0.8 g and a frequency range of 0.8 Hz to 10 kHz. Accelerometers can generate repeatable measurements of amplitude, and because accelerometers are necessarily attached to a substrate, they can measure high-amplitude signals that move the substrate itself. Accelerometers are lightweight and small (Fig. 2.30b), can be rugged, and several commonly used models can be powered by one or more 9-V batteries. A drawback of accelerometers is that attaching a sensor to a substrate adds mass to the substrate; to avoid altering substrate transmission properties, it is recommended to limit sensor mass to <5% of the mass of the substrate (Cocroft and Rodríguez 2005). Because accelerometers detect acceleration, they are not as sensitive at low frequencies as they are at higher frequencies, and they generally have lower bandwidths than LDVs.

The study of animal vibrational communication is rapidly growing. In order to withstand the rigor of peer review, researchers must document the type, make, model, and sensitivity of the sensors used, and also document the factors likely to affect signal characteristics and propagation (e.g., substrate type and characteristics, position of the animal). The relative position of the sensor must be logical, consistent, and informative for the study. For sensors that attach to substrates (e.g., accelerometers), secure and even attachment will help achieve a good signal-to-noise ratio and minimize impedance mismatch (Fig. 2.30a, d–f).

2.7.2.2 In Underwater Studies

An important issue with respect to fishes and invertebrates is their sensitivity to particle motion that accompanies sound transmission, rather than to sound pressure. Particle motion comprises particle displacement, particle velocity, and particle acceleration (ISO 18405 2017) and differs from sound pressure in that it is a vector quantity. In contrast, sound pressure is a scalar quantity, acting in all directions.

Popper and Hawkins (2018) reported that it is commonplace to characterize underwater sound by the sound pressure alone, because it is easily measured by a hydrophone, and then to estimate the particle motion from the sound pressure measurements and the acoustic properties of the medium. This is relatively easy in an acoustic free-field (i.e., no nearby boundaries to sound propagation). However, near acoustic boundaries (like the seabed and the sea surface), the relationship between pressure and particle motion becomes complex and so, particularly in shallow waters that are inhabited by many fishes and invertebrates, measuring particle motion directly is necessary. The result is a dearth of data on particle motion and its importance to, and potential effects upon, animals. Although there are excellent hydrophones for monitoring sound pressure, there are far fewer devices for detecting and analyzing particle motion.
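
For the free-field case, the estimate of particle motion from sound pressure can be illustrated with the plane-wave relationship u = p/(ρc), where ρ is the density of the medium and c is the sound speed. The Python sketch below uses nominal values (ρ = 1000 kg/m³, c = 1500 m/s) that are assumptions rather than values from the text, and the function name is hypothetical. It does not apply near boundaries or close to the source.

```python
def plane_wave_particle_velocity(pressure_pa, density_kg_m3=1000.0,
                                 sound_speed_m_s=1500.0):
    """Particle velocity (m/s) of a free-field plane wave of given pressure.

    Valid only far from boundaries and from the source; the defaults are
    nominal values for water and should be replaced by measured ones.
    """
    return pressure_pa / (density_kg_m3 * sound_speed_m_s)

# A sound pressure of 1 Pa (i.e., 120 dB re 1 uPa) in water
u = plane_wave_particle_velocity(1.0)
print(f"{u:.2e} m/s")  # 6.67e-07 m/s
```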

Popper and Hawkins (2018) described the many problems with measuring particle motion in a tank and recommended that measurements be taken in the field, or at least in a specially designed sound exposure chamber to control the relative magnitudes of particle motion and sound pressure. To make particle motion measurements, it is necessary to mount three orthogonally orientated vector sensors together to monitor the three spatial components of particle motion. Any sound can thus be resolved into its directional components and the direction to the sound source may be determined. Calibrated particle motion measurement systems are commercially available, but expensive. An alternative approach is to measure the sound pressure gradient in the water to derive the particle motion in a particular direction.
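
The pressure-gradient approach can be sketched as follows. Along one axis, the particle acceleration is a = −(1/ρ) ∂p/∂x, and the gradient may be approximated by the pressure difference between two closely spaced hydrophones. The Python code below is only an illustration under stated assumptions: the two hydrophones have matched sensitivities and are sampled synchronously, their spacing is small compared with the wavelength, the density is a nominal 1000 kg/m³, and all function names and sample values are hypothetical.

```python
def particle_acceleration(p_near_pa, p_far_pa, spacing_m, density_kg_m3=1000.0):
    """Finite-difference particle acceleration (m/s^2), one value per sample.

    The positive direction points from the near to the far hydrophone.
    """
    if len(p_near_pa) != len(p_far_pa):
        raise ValueError("pressure series must have the same length")
    return [-(far - near) / (density_kg_m3 * spacing_m)
            for near, far in zip(p_near_pa, p_far_pa)]

def integrate(samples, sample_rate_hz):
    """Cumulative trapezoidal integral (e.g., acceleration to velocity)."""
    if not samples:
        return []
    dt = 1.0 / sample_rate_hz
    result = [0.0]
    for previous, current in zip(samples, samples[1:]):
        result.append(result[-1] + 0.5 * (previous + current) * dt)
    return result

# Hypothetical synchronous pressure samples (Pa) from two hydrophones 0.1 m apart
acceleration = particle_acceleration([1.0, 2.0], [0.5, 1.0], spacing_m=0.1)
velocity = integrate(acceleration, sample_rate_hz=1000.0)
print(acceleration)
print(velocity)
```

Real recordings would typically be high-pass filtered before integration, since any offset in the acceleration estimate accumulates as drift in the velocity.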

Many studies have used custom-built particle motion sensors for studying the impacts of anthropogenic activities on fish (e.g., Campbell et al. 2019; Solé et al. 2017; van der Knaap et al. 2021). GeoSpectrum Technologies Inc. offers a few choices for off-the-shelf particle motion sensors in their M20 line of products. Each device consists of an omnidirectional acoustic pressure sensor co-located with three (or two) dipole sensors that measure the amplitude and phase of particle motion in the three (or two) orthogonal directions. Being lightweight and having a small form factor (e.g., the M20–040 has a 64 mm diameter and is 179 mm tall; Fig. 2.31), they are preferred over traditional hydrophone arrays for assessing directionality, especially for use on small unmanned underwater vehicles (e.g., Stinco et al. 2019). The M20 devices support directionality assessments over a frequency range of 1 Hz to 3 kHz, and the bearing uncertainty increases with decreasing frequency and decreasing SNR. Erbe et al. (2017) used a GeoSpectrum M20 to determine sound pressure, particle displacement, particle velocity, and particle acceleration from recreational swimmers, kayakers, and divers.

Fig. 2.31
Photograph (left) and receiving frequency response (right) of GeoSpectrum M20–040. Note that the units of the calibration curve are in terms of particle velocity level (PVL): dBV re 1 m/s. Permission to reprint from GeoSpectrum Technologies Inc. (http://www.GeoSpectrum.ca/; accessed 15 Mar. 2021)

2.7.3 Smartphone Applications

Smartphone applications have put bioacoustic research in the hands of hobbyists and citizen scientists. Applications are inexpensive, rapidly evolving, and available on both Android-based phones and iPhones. These applications are well-suited for classroom and field demonstrations of bioacoustic research. The microphone and soundcard in cellphones from different manufacturers determine the frequency range and level of the sounds recorded and the type of analysis possible. A researcher needs to know the frequency range and amplitude sensitivity of the cellphone to ensure that the sounds of the target animals can be appropriately captured. Applications used in battery-operated cellphones provide the ability to select a recording time and duration for long-term, remote monitoring of ambient and animal sounds.

2.8 Summary

Technology used in bioacoustic research is changing rapidly. This chapter describes currently used equipment in bioacoustic studies, along with references and websites. The chapter starts with an introduction to the nomenclature used in the industry, describing the terms as they apply to animal bioacoustic research. An understanding of the terminology would assist a bioacoustician with choosing appropriate equipment with characteristics suitable for a particular study. Instruments that form a complete recording or playback setup are described in light of these characteristics, along with mentions of a few of the commonly used products available in the market. Considerations such as electronic noise, aliasing, sensitivity, resolution, and dynamic range are discussed for both terrestrial and underwater equipment. Autonomous recorders, which offer pre-packaged programmable solutions for passive acoustic monitoring, are also discussed. The discussions cover several indicative bioacoustic studies (targeting a wide variety of fauna) that highlight the use of specific equipment for different purposes and under different conditions. Other types of equipment used in closely related fields (such as biotremology, particle velocity measurement, etc.) are highlighted.

A priori knowledge of the target animal’s sounds is helpful in selecting appropriate equipment. Sensing and recording equipment needs to be appropriate for the environmental conditions being studied. This chapter summarizes how to select and operate microphones and hydrophones, digital recorders, automated recording systems, amplifiers, filters, sound pressure level meters, and cellphone applications. Knowing the equipment specifications and selecting components to match in frequency range and amplitude sensitivity is important. The dynamic range, amplitude sensitivity, and frequency response of each piece of equipment in a recording setup must match and suit the types of sound (i.e., their level and frequency range) intended to be recorded. Periodic calibrations of microphones and hydrophones are necessary to ensure accurate measurements are made, and the methods are described herein. With their wide availability and ease of use, smartphone-driven approaches have been gaining popularity. The chapter aims to offer the reader a firm grounding in the concepts and available equipment options in bioacoustics. Pointers for further reading are provided along with information about online resources that could offer more up-to-date information on the topic.

2.9 Additional Resources

Information about recording equipment:

Smartphone applications: