1.1 Introduction

For centuries, scientists have recognized the importance of documenting human, animal, and environmental sounds. However, in recent decades, the field of bioacoustics has experienced an exceptional period of growth, primarily boosted by the rapid development of new technologies and methods to record and analyze acoustic signals. The most significant revolution in the field was the introduction of digital recording, data storage, and analysis technologies that reached the consumer market around 1980 with the introduction of the compact disc (CD). In the “analog days,” researchers had to carry bulky and heavy equipment and batteries to field locations; recording duration was often limited by excessive tape and battery consumption.

Researchers produced hardcopies of sound displays using a Kay Sona-Graph™ machine and spliced together sonograms to generate figures for publication. Initially, frequency and time measurements were taken from these hardcopies using a regular ruler, and signals or sound events of interest were identified manually by listening human observers. As a result, studies using bioacoustics-based approaches were sparse. Now, researchers struggle to keep up with the ever-increasing number of studies using bioacoustics made possible by the accessibility, affordability, and extended recording capabilities of current equipment.

This chapter is a compilation of the authors’ collective experiences in the field of bioacoustics, with each author having considerable experience studying the sounds of vocal animals across a myriad of terrestrial and aquatic environments. Even considering the drawbacks of the “good old days” of bioacoustics research, the authors concur they were incredibly fortunate to have a career studying fascinating animal sounds. As recording and analysis technologies improved, the types of information that could be extracted from recordings of animal sounds increased. Presently, species-level identification is possible in most cases, and, depending on the focal animals, the age, sex, reproductive status, behavior, activity patterns, and even health of an individual may be estimated from acoustic recordings. Acoustic data can be used to estimate the population density of vocal animals, and dialects can indicate the geographic boundaries of a population. However, density estimation by acoustics is still in its infancy and will require further advances in the spatial analysis of the acoustic environment using multiple sensors before it becomes reliable and widely applicable. At the community level, the entire acoustic environment or soundscape can be used to estimate species abundance and biodiversity. Changes in vocal behavior can be indicative of environmental stressors, such as anthropogenic noise or habitat degradation (Pavan 2017).

Originally, sounds of terrestrial animals were studied with equipment and methods developed for military needs, human speech analysis, and music processing (Koenig et al. 1946; Potter et al. 1947; Marler 1955). Later, scientists became interested in the sounds of aquatic animals, and underwater research was facilitated by technologies used by the navies to monitor the noise made by ships and submarines. Because of the frequency limitations of transducers (i.e., microphones and hydrophones), recorders, and analysis equipment, most initial bioacoustic research was conducted in the sonic range (i.e., the frequency range audible to humans: 20 Hz–20 kHz). Even in the early stages of the digital revolution, both recorders and analysis equipment were generally limited to audible frequencies.

A major hurdle for collecting field recordings was the large size and weight of early analog equipment, along with high power consumption, which resulted in limited recording time. The development of smaller, lightweight recording devices made the collection of acoustic data significantly easier. Currently, with the advent of small digital recorders with large solid-state memories, anyone including researchers, professionals, and amateurs can collect large amounts of high-quality acoustic data continuously over extended periods. However, when using handheld recorders, the potential influence of the human observer on the animals’ acoustic behavior is a concern. Through the development and use of autonomous recorders, video cameras, and acoustic animal tags, human observer effects can be minimized, and unsupervised data collection over extended periods (days to months) and in remote locations is now possible.

In this chapter, we describe the history of the development of transducers, recorders, and sound analyzers, along with the advances that these developments facilitated in the field of bioacoustics. Recording equipment can now capture a wide range of frequencies, from infrasounds to ultrasounds (sounds below and above the range of human hearing, respectively), and is used in a wide range of applications, from the study of individuals and populations to entire soundscapes. The digital revolution in sound recording and analysis allowed for significant advances in the field of bioacoustics (Obrist et al. 2010) and resulted in the development of new disciplines, such as computational bioacoustics (Frommolt et al. 2008), acoustic ecology, soundscape ecology (Pijanowski et al. 2011a, b; Farina 2014), and ecoacoustics (Farina and Gage 2017). An overview of acoustic principles and the evolution of sound recording systems for musical applications is given in Rumsey and McCormick (2009) and in Rossing (2007).

1.2 Advances in Recorders

The most significant advancement in recording technology was the switch from analog to digital devices. A reduction in size and weight of the recorder, extended battery life, rechargeable batteries, more stable and larger capacity storage media, broader frequency range, and accessibility of a computer interface accompanied this transition. Together, these advances provided bioacousticians with an adaptable system for recording a variety of species, greater field portability, and generally more affordable high-quality equipment.

To understand the basic differences between analog and digital recorders, a clear explanation of the terms is necessary. Humans perceive the world in analog; this means that everything is seen and heard as a continuous flow of information. In contrast, digital information estimates analog data by taking samples at discrete intervals and describing the sample values as a finite number represented by binary coding (Pohlmann 1995). For instance, while a vinyl record player (phonograph) is analog, a CD player is digital. A phonograph converts groove modulation from a vinyl record into a continuous electrical signal, whereas a CD player reads a pit structure that is interpreted as a series of ones and zeros (bits) that is typical of binary coding. Likewise, a video cassette recorder (VCR) is analog, yet a digital videodisc (DVD) player is digital. A VCR reads audio and video data from a tape as a continuous variation of magnetic information, whereas a DVD player reads ones and zeros from a disc similar to a CD.

Digital devices can approximate analog audio or video signals with an accuracy level that is dependent on both sampling rate and bit depth (or the number of bits in each sample). The Shannon-Nyquist sampling theorem proves that, for a given frequency range, a sampling rate at least twice that of the highest frequency can capture all information in that frequency band, enabling perfect reconstruction of the analog waveform.
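The consequence of the sampling theorem can be illustrated with a short sketch. It is only an illustration under simple assumptions (pure tones, ideal sampling, no anti-aliasing filter); the function names `apparent_frequency` and `sample_tone` are hypothetical and do not come from any recorder software.

```python
import math


def apparent_frequency(tone_hz, sampling_rate_hz):
    """Frequency at which a pure tone appears after ideal sampling.

    Tones above half the sampling rate (the Nyquist frequency) fold back
    into the band below it, which is known as aliasing.
    """
    nyquist_hz = sampling_rate_hz / 2.0
    # Sampling cannot distinguish a tone from the same tone shifted by a
    # multiple of the sampling rate, so reduce it modulo the sampling rate.
    folded_hz = tone_hz % sampling_rate_hz
    return folded_hz if folded_hz <= nyquist_hz else sampling_rate_hz - folded_hz


def sample_tone(tone_hz, sampling_rate_hz, n_samples):
    """Samples of a cosine tone taken at discrete, equally spaced instants."""
    return [
        math.cos(2.0 * math.pi * tone_hz * n / sampling_rate_hz)
        for n in range(n_samples)
    ]


# A 10 kHz tone is below the Nyquist frequency of a 44.1 kHz system and is
# preserved; a 30 kHz tone is above it and is stored as a 14.1 kHz tone.
preserved_hz = apparent_frequency(10_000, 44_100)
aliased_hz = apparent_frequency(30_000, 44_100)
```

In this sketch, the samples of a 30 kHz tone taken at 44.1 kHz are numerically identical to those of a 14.1 kHz tone, which is why a recorder must sample at a rate of at least twice the highest frequency of interest.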

With proper sampling, analog signals can be transformed into the digital domain at a level that makes them indistinguishable from the original. A significant advantage of digital data is that it can be stored and manipulated more easily than analog recordings. With analog recorders, each copy produces a little degradation that accumulates through multiple successive copies. Analog tapes are also prone to degradation with time. Digital copies are a perfect duplication that is indistinguishable from the original, unless specific data codes are added to identify them. More importantly, digital recordings can be directly transferred to a computer for processing or transferred through the Internet to be shared among different laboratories. If researchers want to transfer audio or video files from old analog tapes so they can be recognized and processed by a computer, they must use a sound interface based on an analog-to-digital converter (AD-converter) to digitize the analog signal and transform it into a sequence of numbers (footnote 1). For playing back sounds from a computer, a sound interface with a digital-to-analog converter (DA-converter) is required. Next, we outline a brief history of the evolution of analog and digital recording devices. For more detail on digital recording technologies, see Pohlmann (1995).

1.2.1 Analog Recorders

The first purported sound recording was made by Édouard-Léon Scott de Martinville and dates back to 1860. The recording was just a few seconds in duration and was made using a phonautograph. The phonautograph has a vibrating stylus, which moves on soot-covered paper to draw the sound waveform (footnote 2). It was invented in 1857, and although it could record sounds, it never evolved to allow reproduction of the recorded sound.

In the 1870s, Thomas Edison invented the wax-cylinder recorder (Figs. 1.1 and 1.2), which had a vibrating diaphragm that was mechanically linked to a needle that sculpted grooves. It initially recorded on aluminum foil and later on a wax layer covering the cylinder, which was slowly rotated and translated along a screw axis. This device encoded the sound vibrations into modulations of the groove and then allowed playback of the recorded vibrations through the same needle-membrane system.

Fig. 1.1

Thomas Alva Edison and his phonograph. Image source: https://commons.wikimedia.org/wiki/File:Edison_and_phonograph_edit2.jpg, by Levin C. Handy (per http://loc.gov/pictures/resource/cwpbh.04044/), public domain, Wikimedia Commons

Fig. 1.2

Photographs of an Edison wax-cylinder player (left) and a wax-cylinder recording (right). Image sources: (left) https://commons.wikimedia.org/wiki/File:EdisonPhonograph.jpg, by Norman Bruderhofer, www.cylinder.de, CC BY-SA 3.0 http://creativecommons.org/licenses/by-sa/3.0/, via Wikimedia Commons; (right) https://commons.wikimedia.org/wiki/File:Bettini_1890s_brown_wax_cylinder.jpg, by Jalal Gerald Aro, CC BY-SA 2.0 https://creativecommons.org/licenses/by-sa/2.0, via Wikimedia Commons

According to Ranft (2001), the first known recordings of animal sounds (a caged Indian bird, the Common Shama) were made in Germany in 1889 on an Edison wax-cylinder. One of the first known scientific studies of animal sounds occurred in 1892 when Richard Lynch Garner recorded primates on wax cylinders at a zoo in the USA (Garner 1892). Garner also experimented with the playback of the recordings to observe the primates’ reactions.

The first flat disc was invented in the late 1870s, which provided an advantage over previous technology as the discs could be easily replicated. Then in 1887, Emile Berliner patented a variant of the phonograph, named the gramophone, which used flat discs instead of spinning cylinders (Fig. 1.3). Sounds were recorded on a disc as modulated grooves, with a system similar to the one developed by Edison for wax-cylinders. The first published recording of a bird sound was issued in 1910 in Germany, and the first radio broadcast of a singing bird was in Britain in 1927 (Ranft 2001).

Fig. 1.3

Emile Berliner with disc record gramophone – between 1910 and 1929. Image source: https://commons.wikimedia.org/wiki/File:Emile_Berliner_with_disc_record_gramophone_-_between_1910_and_1929.jpg, National Photo Company Collection (Library of Congress), public domain, via Wikimedia Commons

Valdemar Poulsen, a Danish engineer, invented the telegraphone or wire recorder in 1898 (Poulsen 1900). Wire recorders were the first magnetic recording devices, and they utilized a thin metallic wire, which passed across an electromagnetic recording head. Each point along the wire was magnetized based on the intensity and polarity of the signal in the recording head. Wire recorders often had problems with kinks in the wires, but editing was relatively easy as sections of wire could simply be cut out.

In the early 1900s, RCA Victor developed the Victrola, which played records or albums that were readily available to the general public. Sounds were recorded as modulated grooves on a disc, and this disc was used to produce a master metallic plate where the grooves appeared as ridges. Albums were then produced for distribution by molding copies using the master plate and Bakelite (or synthetic plastic) material. In 1920, AT&T invented the Vitaphone, which recorded and reproduced sounds as optical soundtracks on photographic film; the film impression was made with a thin beam of light modulated by the sound.

Arthur Allen, the founder of Cornell University’s Laboratory of Ornithology, and Peter Kellogg made the first recordings of wild birds in 1929 at a city park in Ithaca, NY, USA. Albert R. Brand (a graduate student of Allen) and M. Peter Keane built the first equipment for recording in the field. Together, they recorded over 40 bird species within the first two years. With World War I parabola molds available from the Physics Department, Keane and True McLean (a professor in Electrical Engineering at Cornell) constructed a parabolic reflector to improve recording of bird songs in the field (footnote 3; Ranft 2001). In those years, Theodore Case of Fox Case Corporation approached Arthur Allen to record singing wild birds and demonstrate the sound-synchronized film technology. Under the guidance of Allen, a Fox Case Corporation crew filmed and recorded the songs of wild birds in North America (Little 2003). Today, two of those recordings can be heard on the Macaulay Library website (footnote 4). After a successful campaign with the Fox Case film crew, Allen and his colleague Peter Paul Kellogg recorded the sounds of wildlife for research and education purposes. The Library of Natural Sounds (now known as the Macaulay Library) began in 1930 at the Cornell Laboratory of Ornithology. In 1932, Allen and Kellogg used visual and audio recordings to demonstrate to the American Ornithologists’ Union that the ruffed grouse (Bonasa umbellus) produced drumming sounds (Little 2003). In 1935, Cornell biologists carried out an expedition to record the sounds of vanishing bird species, including the ivory-billed woodpecker (Campephilus principalis), for which they used a mule-drawn wagon to transport recording equipment into the field (Fig. 1.4; footnote 5). Even with limited space and harsh conditions, Alton Lindsay, in 1934, took a phonograph recorder on the Little America Expedition to Antarctica and made recordings of airborne sounds from Weddell seals (Leptonychotes weddellii), available today at the Smithsonian Institution.

Fig. 1.4

Photograph of ornithologist Peter Paul Kellogg in 1935 in a mule-drawn wagon used to haul an amplifier (center) and optical film recorder (on the right) to capture the sounds of ivory-billed woodpeckers in the Singer Tract, Madison Parish, Louisiana. Image by Arthur A. Allen courtesy of the Cornell Laboratory of Ornithology

In the late 1930s, a German company invented the Magnetophon, which was based on the same principle as the magnetic wire recorder, but instead of wire, it had long, thin strips of paper impregnated with fine particles of iron oxide that were drawn across an electromagnetic head. After World War II, the American company Ampex perfected the German technology by replacing paper with a thin plastic film. For almost 50 years, reel-to-reel magnetic tape was the standard medium for use on recorder/playback devices (Fig. 1.5). Reel-to-reel recorders (or open-reel recorders) used variable tape speeds to record different frequency ranges, with faster recording speeds providing higher-frequency recordings. Another American company, a contemporary of Ampex, the Amplifier Corporation of America, was one of the first companies to develop a truly portable reel-to-reel recorder, the Magnemite 610, which was introduced in 1951 and was used by many pioneers in the field of bioacoustics. Figure 1.6 shows Peter Paul Kellogg using a 1950s Magnemite 610 recorder with a Western Electric 633 microphone mounted in a parabolic reflector.

Fig. 1.5

Open-reel recorder made by AEG (1939). Image source: https://commons.wikimedia.org/wiki/File:AEG_Magnetophon_K4_1939.jpg, by Friedrich Engel, CC BY-SA 3.0 https://creativecommons.org/licenses/by-sa/3.0, via Wikimedia Commons

Fig. 1.6

Photograph of an early 1950s field recording system. Peter Paul Kellogg with an Amplifier Corporation of America Magnemite 610 reel-to-reel tape recorder and a Western Electric 633 microphone mounted in a parabolic reflector. Courtesy of the Cornell Laboratory of Ornithology

Initially, tape recordings were mono recordings with one soundtrack on the tape. Stereo recording techniques (providing two record/playback channels) were developed in the 1960s. Initially, these recorders were bulky and not field portable. Then, portable open-reel recorders were developed for the rapidly developing outdoor recording needs of the radio, music, and film industries. Stereophonic recorders allowed the recording of two synchronous signals on parallel tracks onto one tape. In bioacoustics applications, often one track was used by the recordist for comments and the second track for recording animal sounds.

In the 1970s and 1980s, the most common reel-to-reel recorders used by bioacousticians were the Nagra III and IV series and the Uher 4000 series. They offered multiple recording and playback speeds (depending on the models, 3.75, 7.5, 15, or 30 inches per second), were relatively lightweight, ruggedized, and battery powered, which meant they were better suited for field studies. Eventually, recorders had even more channels (as many as 24 in some music-recording studios), which enabled scientists to record and playback signals simultaneously from more than one acoustic sensor.

Recorders were also developed to record a wide range of frequencies. Studies by Griffin (1944), Sales and Pye (1974), and Au (1993) provided evidence that animals (bats and dolphins) produce a wide range of ultrasonic signals. The first recordings of ultrasonic echolocation signals from bats and dolphins were made on expensive dedicated tape recorders at very fast tape speeds (60 and 120 inches per second). Among them, the RACAL Store4DS recorder was used in the 1980s and 1990s, and it provided tape speeds up to 60 inches per second to record frequencies up to 300 kHz. It was battery powered and reasonably portable. However, the limited data storage capacity of these magnetic reels meant that the recordings lasted only a few minutes.

In 1964, Philips introduced the compact cassette tape, which consisted of a small plastic case holding two small reels with 1/8-inch wide magnetic tape running at 4.75 cm/s (1.875 inches per second). In the 1970s, analog cassette recorders, which could easily record and play back sounds, became available at affordable prices, but were used primarily for music and human speech, and were thus limited in frequency to the human hearing range. These recorders (Fig. 1.7) were much smaller and less expensive than reel-to-reel devices. Cassette tapes could record up to one hour on each side of the cassette (typical total recording duration was either 60, 90, or 120 min), but tapes were very thin and fragile, which made them prone to print-through (the magnetic transfer of a recorded signal to adjacent layers of tape). In 1976, Sony introduced, with little success, the Elcaset, a bigger cassette with 1/4-inch tape running at 9.5 cm/s. Today, however, it is almost impossible to find new reel-to-reel or cassette tapes as there are very few manufacturers of these media.

Fig. 1.7

Left: Photograph of a semi-professional stereo cassette recorder Marantz CP430 used by nature recordists until the last decade of the twentieth century. Right: Photograph of a mono cassette recorder (Philips K7, 1968) with microphone and cassette inside. Image source: https://commons.wikimedia.org/wiki/File:Philips_EL3302.jpg, by mib18 at German Wikipedia, CC BY-SA 3.0 http://creativecommons.org/licenses/by-sa/3.0/, via Wikimedia Commons

One of the advantages of tape recording was the possibility to play back the tapes at a speed lower or higher than the original recording speed. This way it was possible to lower the frequency of recorded ultrasonic signals to the human hearing range, thus making them audible (and longer in duration); conversely, recordings of infrasounds were played at higher speed to make them audible (and shorter in duration). The same trick can now be done easily with digital systems. Playbacks are a commonly used experimental approach in bioacoustics, wherein previously recorded sounds are broadcast to the animals of interest. Many playback studies used magnetic tape recordings containing animal sounds as the stimuli.
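The digital equivalent of slowed-down tape playback can be sketched with Python's standard `wave` module: the samples are left untouched and only the sampling rate stored in the file header is changed. This is a minimal illustration, not a description of any particular bioacoustics package; the helper names `make_tone_wav` and `time_expand` and the example values (a 40 kHz tone sampled at 192 kHz, slowed tenfold) are assumptions chosen for the demonstration.

```python
import io
import math
import struct
import wave


def make_tone_wav(freq_hz, rate_hz, seconds):
    """Return the bytes of a mono 16-bit WAV file containing a sine tone."""
    n_frames = round(rate_hz * seconds)
    frames = b"".join(
        struct.pack("<h", int(16000 * math.sin(2 * math.pi * freq_hz * i / rate_hz)))
        for i in range(n_frames)
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes = 16 bits per sample
        w.setframerate(rate_hz)
        w.writeframes(frames)
    return buf.getvalue()


def time_expand(wav_bytes, factor):
    """Relabel the sampling rate so that playback is `factor` times slower.

    All frequencies are divided by `factor` and the duration is multiplied
    by `factor`, as with a tape played at reduced speed.
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        params = src.getparams()
        frames = src.readframes(params.nframes)
    out = io.BytesIO()
    with wave.open(out, "wb") as dst:
        dst.setnchannels(params.nchannels)
        dst.setsampwidth(params.sampwidth)
        dst.setframerate(round(params.framerate / factor))
        dst.writeframes(frames)
    return out.getvalue()


# A 40 kHz (ultrasonic) tone recorded at 192 kHz, slowed down tenfold, plays
# back as a 4 kHz (audible) tone that lasts ten times longer.
original = make_tone_wav(freq_hz=40_000, rate_hz=192_000, seconds=0.01)
expanded = time_expand(original, factor=10)
```

Using a factor smaller than 1 speeds playback up instead, which corresponds to the way infrasound recordings were made audible.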

Researchers could easily play the sound backward (by reversing the reading direction of a spliced tape) or insert a section of tape containing sounds of another species, individual, or noise as a control stimulus. Magnetic tape was also used to record live video images. The first practical video tape recorder (VTR) was built in 1956 by Ampex Corporation. The first VTRs were reel-to-reel recorders used in television studios, which made recording for television cheaper and easier.

VHS tape recorders, introduced in the 1970s, were the first compact analog devices to record both audio and video signals simultaneously on the same tape. Commercial video cameras quickly became available for home use. Battery power for cassette recorders and VHS cameras/recorders made this equipment popular for field studies of animal behavior and sounds.

Many magnetic analog recordings had problems because the media deteriorated when tapes were not stored under properly climate-controlled conditions. Unfortunately, some older analog recordings have been lost, or, in some cases, the players are not available to retrieve the recorded sounds. In the last decades, a great effort was made by major sound libraries to preserve old recordings (on wax-cylinders, discs, magnetic tapes, and cassettes) and to transfer them to safer digital storage (Ranft 1997, 2001, 2004). This was often not an easy task because magnetic tape recordings used a large variety of tape types, speeds, and track format arrangements. Unfortunately, many valuable tape recordings have yet to be converted to a digital format and archived. Without a long-term preservation strategy and support, it is possible that these media may be lost forever.

1.2.2 Digital Recorders

The introduction of the CD by the music industry in 1983 brought digital audio to the consumer market and started a new audio recording age (Pohlmann 1995). The ability to store sound in a digital format greatly improved acoustic data collection. It allowed easy and perfect replication of recordings, enabled accurate digital editing, and provided the means of more permanent data storage with direct access for processing and analysis by a computer.

In 1987, Rotary Digital Audio Tape (R-DAT or DAT) recorders were the first widely available digital recorders (Fig. 1.8). However, these devices still recorded on a thin magnetic tape encapsulated in a small cassette using a rotating helical-scanning magnetic head, which allowed for much faster head-tape speed and data density. Many R-DAT recorders allowed recording at different sampling rates of 32.0, 44.1, or 48.0 kHz and 16-bit resolution (the CD standard is 44.1 kHz, 16 bit) (Pohlmann 1995). The R-DAT format had little success in the consumer market because of the high cost but was used widely by professional recordists as a replacement for expensive and bulky open-reel recorders.

Fig. 1.8

(a) Photograph of a portable R-DAT recorder Sony TCD-D7 (1992) with a DAT cassette and the optical cable used to provide digital data transfer to a PC. (b) A MiniDisc recorder and disc (1997)

Some specialized R-DAT models allowed recording up to 100 kHz on a single channel (i.e., by using a 204.8 kHz sampling frequency and doubled tape speed). R-DAT offered recording quality that was comparable to open-reel recorders; however, the helical-scanning head proved problematic in humid conditions, and the thin tape used in R-DAT cassettes was easily damaged. An alternative to R-DAT was the digital compact cassette (DCC) introduced by Philips in 1992. DCC was compatible with the already existing analog cassette tapes but failed to gain commercial success.

Digital recorders with optical discs (CD-R and DVD-R) never gained popularity for field applications because the equipment had to remain stationary while recording. Also, at the same time, magnetic discs (hard drives) quickly became the state-of-the-art data storage media. In contrast, the MiniDisc (MD), a small optical disc developed and marketed by Sony in 1992, had more success among nature recordists, because the MD portable recorders were smaller, lighter, and much cheaper than DAT recorders. MD offered random access to the recordings (DAT and analog tape recorders allowed only sequential access), which made it much easier to find and listen to specific sections of a recording. These devices used the same sampling mode as the CD (44.1 kHz, 16 bit). The main disadvantage of the MD was the lossy signal compression based on Adaptive Transform Acoustic Coding (ATRAC), similar to the MP3 codec developed by the Moving Picture Experts Group (Budney and Grotke 1997). The compression fit 74 minutes of acoustic data onto a small digital disc with a nominal capacity of 140 megabytes (MB), a compression ratio of about 5:1. The precision of some measurements of the acoustic structure of animal sounds can be significantly affected by lossy data compression schemes (Araya-Salas et al. 2017).
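The compression figure quoted for the MiniDisc can be checked with simple arithmetic. The sketch below assumes a stereo recording (two channels) and 1 MB = 10^6 bytes; neither assumption is stated in the text, and the function name `pcm_bytes` is hypothetical.

```python
def pcm_bytes(sampling_rate_hz, bit_depth, channels, seconds):
    """Size in bytes of uncompressed (linear PCM) audio."""
    return sampling_rate_hz * (bit_depth // 8) * channels * seconds


# 74 minutes at the CD sampling mode (44.1 kHz, 16 bit), assumed stereo.
uncompressed_bytes = pcm_bytes(44_100, 16, 2, 74 * 60)

# Nominal MiniDisc capacity of 140 MB, taken here as 140 * 10**6 bytes.
minidisc_bytes = 140 * 10**6

compression_ratio = uncompressed_bytes / minidisc_bytes
print(f"Uncompressed size: {uncompressed_bytes / 1e6:.0f} MB")
print(f"Required compression ratio: {compression_ratio:.1f}:1")
```

Under these assumptions, the uncompressed audio would occupy roughly 783 MB, so the required ratio is between 5:1 and 6:1, consistent with the nominal figure given above.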

With hard drive recorders and the subsequent development of solid-state memory recorders, a new generation of high-quality equipment with unparalleled capacity became available in the early 2000s (Figs. 1.9 and 1.10). Solid-state memory recorders do not require mechanical moving parts for the storage and retrieval of digital information and instead use memory cards, such as Compact Flash (CF) or Secure Digital (SD and microSD) cards also used in the digital photography market.

Fig. 1.9

(a) Photograph of a professional portable high-quality recorder (Sound Devices, SD722) with both hard disc and solid-state memory recording capabilities, connected to two low noise microphones (Rode NT1A) for soundscape recording. (b) Photograph of SONY TC-510 open-reel recorder (1982) and a SONY PCM-M10 digital recorder with its microSD memory card. (c) Photograph of five widely used digital recorders lined-up for comparative testing. From left: Sony PCM-M10, Sony PCM-D50, Olympus LS-3, Roland R05, and Zoom H1. They feature internal microphones, but also can connect to external Plug-In-Power (PIP) microphones or hydrophones. Courtesy of M Pesente (2016)

Fig. 1.10

Left: Photograph of a portable digital recording and analysis system composed of a pair of microphones, an AD-converter with USB interface (Edirol UA25), a low-power notebook, and an additional battery (2004). Right: Photograph of an autonomous terrestrial recorder by Wildlife Acoustics (model SM3, 2014) with external battery deployed in a nature reserve in Italy

The subsequent development of pocket digital recorders for the consumer market allowed scientists and amateurs to record many hours of sounds with high quality. Portability and storage space increased while cost decreased. Today, tape recorders have been completely replaced by solid-state digital recorders with either external (Fig. 1.9a) or built-in microphones (Fig. 1.9c). Attempts to develop portable digital recorders based on handheld portable computers or pocket PCs never gained much popularity because of the rapid development of pocket recorders. Professional and semi-professional recorders (Fig. 1.9a) provide phantom powering at 48 V (P48) for professional condenser microphones, have quiet microphone preamplifiers, several types of powering options and can have up to 8 channels. Most pocket recorders lack the phantom powering required for professional microphones, but can power external microphones at low voltage (Plug-In-Power, or PIP; see Sect. 1.3.1).

Most digital recorders can sample at different sampling frequencies (e.g., 44.1, 48, 96, and 192 kHz) with either 16 or 24 bits of resolution, yielding very high sound quality. Some models can sample up to 192 kHz, but some of these have input electronics that limit the bandwidth to less than 60 kHz, well beyond human hearing limits, but not enough for recording animal ultrasounds. In the music industry, other standards have been developed to allow even higher acoustic quality (Melchior 2019), up to 384 kHz sampling with 32-bit depth, but they are not yet available in low-cost consumer recorders.
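The distinction between the sampling rate and the bandwidth actually recorded can be made concrete with a short sketch. It assumes that the usable bandwidth is the smaller of the Nyquist frequency and any limit imposed by the analog input stage; the function names and the 80 kHz example call are hypothetical.

```python
def usable_bandwidth_hz(sampling_rate_hz, input_limit_hz=None):
    """Highest frequency a recorder can capture.

    The sampling rate sets an upper bound at half its value (the Nyquist
    frequency); input electronics may impose a lower bound in practice.
    """
    nyquist_hz = sampling_rate_hz / 2
    if input_limit_hz is None:
        return nyquist_hz
    return min(nyquist_hz, input_limit_hz)


def can_record(signal_max_hz, sampling_rate_hz, input_limit_hz=None):
    """True if the whole signal lies within the recorder's usable bandwidth."""
    return signal_max_hz <= usable_bandwidth_hz(sampling_rate_hz, input_limit_hz)


# Hypothetical ultrasonic call with energy up to 80 kHz.
call_max_hz = 80_000

# A recorder sampling at 192 kHz with a wide-band input captures it ...
wideband_ok = can_record(call_max_hz, 192_000)

# ... but the same sampling rate behind a 60 kHz input stage does not.
limited_ok = can_record(call_max_hz, 192_000, input_limit_hz=60_000)
```

Checking the specified frequency response of the input stage, and not only the maximum sampling rate, is therefore advisable when selecting a recorder for ultrasonic work.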

1.2.3 Recording to a Computer

In the 1990s, the first sound-acquisition boards for personal computers became available, which revolutionized the way scientists collect and analyze acoustic data. Once a sound was recorded in a digital format, recordings could easily and without degradation be transferred to a computer, stored, edited, copied, distributed, played, processed, and analyzed with different algorithms. Software (either freeware or commercial) that can be used on a laptop provides scientists with “a bioacoustics laboratory in a bag.” The consumer and professional markets offer a large number of sound interfaces that connect to a PC by USB or other standards and can offer very high audio quality and multiple input/output channels. Smaller versions of such a setup, or compact single-board computers costing a few tens of US dollars, are being used in autonomous stationary and mobile recording systems, which allow data collection and real-time data processing in remote areas for months at a time (e.g., Klinck et al. 2012).

1.2.4 Autonomous Programmable Recorders

Researchers soon realized that their presence during recordings could influence the animals’ behavior, and that a remote system, which could be used in the absence of human observers, was needed. There was also an increasing interest in collecting samples of the acoustic environment over long periods of time. To address these new interests, off-the-shelf recorders were modified and connected to timers, enabling recording on a defined schedule. The use of portable computers also allowed scheduled recording in the field (Fig. 1.10). However, the main limitation was the need for external batteries, which allowed only a few days of operation. In addition, long-term recording required protection of the equipment in waterproof cases and additional batteries. Defense and research laboratories alike have interesting stories to tell about the evolution of their autonomous recording equipment (e.g., McCauley et al. 2017).

The first commercially available, programmable autonomous recorder, SongMeter 1 (SM1), was sold by Wildlife Acoustics in late 2007 and opened a rapidly developing market. Since then, new products have been proposed by companies and research groups, with increasing performance and autonomy. These can be programmed to record at defined intervals (e.g., every day across the dawn and dusk periods) or on more regular sampling schedules (e.g., 1 minute every 10 minutes, or 10 minutes every half-hour) to sample temporal patterns of variation in a soundscape. This way, the acoustic behavior of animals of interest can be recorded without disturbance by the recordist and for extended periods, both day and night. These recorders need to be rugged and reliable to be deployed in harsh environments. The period of time that recorders can collect data depends on the combination of available battery power and memory. Depending on these factors, terrestrial recorders can operate for weeks to months. A grid of autonomous recorders can be used for monitoring biodiversity over a large area (e.g., entire countries; Obrist et al. 2010), even in the ultrasonic range. Figure 1.10 (right) illustrates one type of autonomous recording system made by Wildlife Acoustics. A few different types of autonomous recorders are currently available. However, as interest in continuous, long-term acoustic monitoring of remote areas (Pavan et al. 2015; Righini and Pavan 2019) increases, new devices will continue to appear on the market and in the open-source arena. In some cases, audio recorders can be coupled with photo- and video traps to get images of the animals if they are at a close enough range.
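The effect of a sampling schedule on deployment time can be estimated with a rough calculation. The sketch below considers only the memory limit for uncompressed recordings and ignores battery capacity, file-system overhead, and compression; the function name `memory_limited_days` and the example values (a 128 GB card, 48 kHz, 16 bit, one channel) are assumptions chosen for illustration.

```python
def memory_limited_days(card_bytes, sampling_rate_hz, bit_depth, channels,
                        on_minutes, cycle_minutes):
    """Days of deployment before the memory card is full.

    `on_minutes` of recording are made in every `cycle_minutes`
    (e.g., 1 minute every 10 minutes gives a duty cycle of 0.1).
    """
    bytes_per_second = sampling_rate_hz * (bit_depth // 8) * channels
    duty_cycle = on_minutes / cycle_minutes
    bytes_per_day = bytes_per_second * 86_400 * duty_cycle
    return card_bytes / bytes_per_day


card = 128 * 10**9  # hypothetical 128 GB card

continuous = memory_limited_days(card, 48_000, 16, 1, on_minutes=10, cycle_minutes=10)
one_in_ten = memory_limited_days(card, 48_000, 16, 1, on_minutes=1, cycle_minutes=10)

print(f"Continuous recording: {continuous:.1f} days")
print(f"1 minute every 10 minutes: {one_in_ten:.1f} days")
```

With these example values, continuous recording fills the card in about two weeks, whereas recording 1 minute in every 10 extends the memory-limited deployment tenfold. In practice, the shorter of the memory-limited and battery-limited durations applies.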

Recent open-source autonomous recorders are built around the Raspberry Pi and similar single-board computers. However, these devices are often poorly optimized for power consumption and require large batteries to supply power over long periods. The Solo acoustic monitoring platform (Footnote 6), consisting of a Raspberry Pi plus an external microphone, needs a 12-V car battery to record for 40 days. Autonomous recorders need to be low-power to allow for extended periods of recording time with a manageable battery supply. The AudioMoth (Footnote 7) is an open-source device that can also be purchased assembled; it employs a low-power microcontroller with an onboard Micro Electro-Mechanical System (MEMS) microphone (Hill et al. 2018). MEMS microphones are very small and cheap and allow for production of autonomous recording devices at very low cost. Autonomous recorders can also be built around a wireless interface to send raw or processed data in real-time, in near real-time, or at scheduled intervals. However, data transmission requires power and the creation or use of a suitable wireless network (Sethi et al. 2018).

Smartphones with an external battery supply are another option used to explore animal sounds and soundscapes. The Automated Remote Biodiversity Monitoring Network (RFCx ARBIMON) can receive acoustic data from a remote recorder based on a cellphone that, if coverage is available, directly sends data to the central server with online access (Footnote 8). This system, coupled with Artificial Intelligence recognition algorithms, can identify sound categories to generate alerts to prevent poaching and deforestation. More information on autonomous recorders is available in Chap. 2.

1.2.5 Multi-Channel Recorders

Collecting multiple channels of acoustic data allows for acoustic localization of the sound source. Multi-channel recordings can help mitigate the Lloyd’s mirror effect, a phenomenon in which low-frequency sounds near the ground may not be recorded correctly because of the interference of direct and surface-reflected sound. Increased interest in collecting multiple channels of acoustic data coupled with environmental information has driven the development of new multi-channel, multi-parametric instrumentation. Multi-channel portable recorders and computer interfaces developed primarily for professional music recording can be used for bioacoustics applications; however, dedicated recorders with very high sampling rates are also being developed for specific study systems.
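Localization from multiple channels typically starts from the time difference of arrival (TDOA) between pairs of sensors. The Python sketch below is a minimal illustration under stated assumptions, not a field-ready method: it finds the lag that maximizes the cross-correlation of two synthetic channels and converts it to a bearing for a single two-microphone pair, assuming a distant (far-field) source, a nominal sound speed of 343 m/s in air, and a hypothetical 0.5 m microphone spacing. The function names are illustrative.

```python
import math
import random


def cross_correlation_lag(ch_a, ch_b, max_lag):
    """Lag in samples that maximizes the cross-correlation of two channels.

    A positive result means the signal reaches ch_b later than ch_a.
    """
    n = len(ch_a)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += ch_a[i] * ch_b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def bearing_degrees(delay_s, spacing_m, sound_speed_m_s=343.0):
    """Far-field angle from broadside for a two-sensor pair."""
    ratio = sound_speed_m_s * delay_s / spacing_m
    ratio = max(-1.0, min(1.0, ratio))   # guard against rounding overshoot
    return math.degrees(math.asin(ratio))


# Synthetic test signal: broadband noise arriving 12 samples later on B.
random.seed(1)
SAMPLE_RATE_HZ = 48_000
TRUE_DELAY_SAMPLES = 12
source = [random.gauss(0.0, 1.0) for _ in range(400)]
channel_a = source
channel_b = [0.0] * TRUE_DELAY_SAMPLES + source[:-TRUE_DELAY_SAMPLES]

lag = cross_correlation_lag(channel_a, channel_b, max_lag=40)
delay_s = lag / SAMPLE_RATE_HZ
angle_deg = bearing_degrees(delay_s, spacing_m=0.5)
print(f"lag = {lag} samples, delay = {delay_s * 1e3:.3f} ms, "
      f"bearing = {angle_deg:.1f} degrees from broadside")
```

A single pair only constrains the source to a cone around the sensor axis; resolving a position requires additional sensors with known geometry.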

The recently developed JASON Qualilife (Footnote 9) can record up to 5 data channels at a maximum sampling frequency of 800 kHz per channel, all featuring 16-bit resolution, a sharp filter to prevent aliasing, and an adjustable analog gain for a large range of uses (Fig. 1.11).

Fig. 1.11 The JASON Qualilife also hosts a high dynamic luxmeter in four different wavelengths and direct USB HDD or micro SD storage

Although the recorder is already designed for low power consumption (12 V, 100 mA), an extension board (Qualilife Wake-Up Detector; Fourniol et al. 2018; Glotin et al. 2018) can be used to trigger the recorder only when it receives a signal at a specified frequency. This further reduces power consumption and data storage, enables extended long-term recording, and avoids unnecessary post-processing work. Moreover, the recorder includes a high dynamic luxmeter (which works from sun zenith to lunar eclipse) that is synchronized with the acoustic recording.

1.3 Advances in Microphones

In the mid- to late-1800s, Johann Philipp Reis and Elisha Gray made several early attempts to develop the precursor to a microphone. Reis developed the sound transmitter, in which a metallic strip rested on a membrane; when the membrane vibrated, it caused intermittent contact between a metal point on the strip and an electrical circuit. Gray developed the liquid transmitter, consisting of a diaphragm connected to a moveable conductive rod, which was immersed in an acidic solution. In 1876, Alexander Graham Bell invented the magnetic transmitter, and Edison and Berliner developed a microphone based on loosely packed carbon granules (Fig. 1.12). David Edward Hughes coined the term “microphone” in 1878 for his microphone system based on carbon granules, which performed poorly by today’s standards (due to high self-noise and distortion). However, it was an important step forward, enabling technology for long-distance voice communication or telephony (for more details see Robjohns 2010; Footnote 10).

Fig. 1.12 Left: Drawing of a carbon-button microphone (1916). Image source: https://commons.wikimedia.org/wiki/File:Carbon_button_microphone_1916.png; unknown author, public domain, via Wikimedia Commons. Right: Sennheiser MKH416 directional microphone used for bioacoustics research; https://commons.wikimedia.org/wiki/File:Sennheiser_MKH416.jpg by Galak76, CC BY-SA 3.0 http://creativecommons.org/licenses/by-sa/3.0/, via Wikimedia Commons

In 1886, Thomas Alva Edison refined the carbon granule microphone and developed the carbon-button transmitter. This transmitter consisted of a compartment filled with granules of carbonized anthracite coal, which were confined between two electrodes. One electrode was connected to an iron diaphragm. Edison’s transmitter was durable, efficient, simple, and cheap to build. His transmitter became the basis for millions of telephone transmitters used around the world.

1.3.1 Microphones Used in Bioacoustics Research

At the beginning of the twentieth century, most microphones were carbon granule sensors. These early microphones were noisy and had limited sensitivity and frequency response. This meant these early microphones were suited only for recording human voices. In those early stages, dynamic microphones based on a membrane with a coil immersed in a magnetic field were difficult to produce because they required small but strong magnets.

In 1917, Edward Wente made a great stride forward by inventing the condenser microphone, which is still used in a wide variety of applications today. In the 1920s, with the significant increase in broadcast radio, there was a high demand for better quality microphones. The piezoelectric microphone was created based on piezoelectric crystals, which are sensitive to pressure changes and generate a voltage when compressed/decompressed; conversely, they vibrate and produce sound waves if excited by an electric signal. Early piezoelectric microphones used quartz or Rochelle salt crystals, but the sound quality was poor. With the development of strong magnets, dynamic microphones were then used for decades because of their simplicity and reliability. However, for bioacoustics studies, they were not sensitive enough, and their frequency response generally did not extend beyond the human hearing range. Today, almost 90% of the microphones manufactured annually are electret condenser microphones (Rossing 2007) because of their many advantages when compared with dynamic microphones, including higher sensitivity, higher fidelity, and wider frequency response. Piezoelectric transducers are now mainly used in hydrophones that have specialized ceramics that provide high sound quality. Robjohns (2010) provides a history of microphone evolution and outlines how advances in broadcast radio, telephones, television, and the music industry, along with the need for directional and ultrasonic recordings, drove the design of several new types of microphones (e.g., condenser, dynamic, ribbon, and carbon microphones).

The widely used condenser microphones are fairly sensitive, compared with dynamic microphones, and feature an extended frequency response, but they require external power. Professional condenser microphones are often powered through the signal cables with 48 V (phantom power, P48) provided by the recording device, by a preamplifier, or by a power unit. Consumer microphones usually use electret condenser capsules that require 3–5 Vdc powering (plug-in power, PIP) provided by the recorder via the microphone plug. Microphones well-suited for bioacoustics studies can be built with electret condenser capsules costing only a few US dollars (Fig. 1.13). For a detailed discussion of features and operation of microphones, see Chap. 2, section on selecting a microphone.

Fig. 1.13 Photograph of the PRIMO EM172 microphone capsule (left) used by many nature sound recordists for their custom-made microphones (center and right). Courtesy of M Pesente

Many animals including insects, frogs, bats, and other terrestrial and marine mammals emit ultrasonic sounds (Sales and Pye 1974). Studies of ultrasonic signals require a broadband microphone capable of responding to signals at very high frequencies. In contrast, some animals, such as elephants, produce very low-frequency sounds and require infrasonic microphones capable of detecting signals at or below 20 Hz (Payne et al. 1986). Previously, ultrasonic and infrasonic recording required very expensive and complex transducers, recorders, and analyzers. With the advent of broadband AD-converters in laptops and smartphones, ultrasonic and infrasonic animal sounds can now be recorded at a reasonable cost. Ultrasonic microphones may use small electret condenser capsules or MEMS, which are primarily used in smartphones. MEMS are small and inexpensive, feature an extended frequency response (including the ultrasonic frequency range), can include an AD-converter, and can be directly integrated into digital systems. Some microphones also incorporate a high-speed AD-converter and USB interface to be directly connected to a computer, a smartphone, or a tablet for recording and real-time display. The Dodotronic Ultramic series offers a range of USB ultrasonic microphones with sampling frequencies ranging from 192 kHz to 384 kHz (Buzzetti et al. 2020); the most advanced models also include the ability to record on an internal microSD memory card (Footnote 11).

In cases where researchers want to separate sounds coming from different directions, or target an individual animal for recording, a directional microphone, a parabolic reflector, or a microphone array can be used. One of the first documented attempts was in 1932, when Peter Paul Kellogg and Arthur Allen used a microphone installed in the focus of a parabolic reflector to record bird sounds (Wahlstrom 1985; Ranft 2001). Parabolic reflectors have been widely used to record animal sounds, capture distant speech, and detect the noise of incoming vehicles and airplanes during the First and Second World Wars (i.e., before the invention of radar; see Chap. 2 for a discussion of use and features of parabolic reflectors). As an alternative to parabolic reflectors, ultra-directional microphones, or so-called shotgun microphones, were developed. The design of shotgun microphones is based on the interference tube principle, which attenuates off-axis sounds and gives these microphones a narrow angle of forward reception. The shotgun was initially designed for use in a studio setting (as opposed to recording long-distance sounds) to minimize off-axis sounds (e.g., noise from the public and room reflections).

Single-microphone (i.e., monophonic) recordings cannot provide any spatial information. They are made with either an omnidirectional microphone, which captures sounds from all directions, or a directional microphone, which captures sounds from a specific source or direction. However, microphones can be paired to record sounds in stereo, providing a spatial sound image in which listeners can perceive the location of the sound source. Many different types of microphone configurations have been developed, mainly for recording music, but also for recording soundscapes.

A further development, mainly conceived for cinema and videogames, is the surround system, which is based on multi-microphone (i.e., microphone array) recordings and speakers placed around the listener to create a more immersive acoustic experience (Streicher and Everest 1998; Rayburn 2011). With 3D audio, a whole acoustic space is recorded with a microphone array. From this, it is possible to extract sound information to build a stereophonic, binaural, or surround program. Today, 3D audio is mainly used for 3D Virtual Reality in video games, cinema, and scientific applications, where the user is placed in a 3D audio and video environment (with special visors and headphones, or in special VR rooms) and can move inside it to look and listen in any direction. The most widely used 3D audio system at present is Ambisonics (Fig. 1.14), which is based on 4 (first order), 9 (second order), 16 (third order), or more channels (Zotter and Frank 2019).

Fig. 1.14 Ambisonic recorder with 4 microphones (first order) Zoom H3VR

Specific microphone array applications in bioacoustics include localizing sound sources, either static or moving, such as flying bats (Blumstein et al. 2011). Using specific algorithms, signals can be extracted from the microphone array, and the direction and intensity of sound sources can be identified by superimposing a sound map on top of an image taken by a video camera. This type of application is called an acoustic camera and is largely employed by the automotive industry to locate sources of noise in a vehicle. Acoustic cameras help visualize patterns of both indoor and outdoor noise (e.g., of a passing car, train, airplane, or around a wind turbine). Acoustic cameras have the potential to help in localizing biotic sound sources; however, they are expensive and have been rarely used for bioacoustics studies; an example is given by Stoeger et al. (2012) to identify the sound sources in elephants.

1.3.2 Measurement Microphones

Measurement microphones are a special class of microphones designed to make accurate amplitude measures of sounds, ranging from infrasound to ultrasound. Although measurement microphones can be used for recording, they are generally used to characterize the acoustic properties of a signal or of a location. Usually, measurement microphones are condenser microphones optimized for a specific frequency range and used to characterize a sound field or a sound level when connected to a sound level meter (or phonometer); see Chap. 2 for a discussion of measurement microphone features and operation. This microphone technology has not changed much over time; however, the measuring equipment to which microphones are connected has evolved within a few decades from bulky and expensive analog devices to small, powerful, and flexible digital devices also able to provide spectral analysis.

1.3.3 Accelerometers

An accelerometer measures the acceleration (i.e., the rate of change of velocity) of an object. Single- and multi-axis accelerometers can detect both the magnitude and the direction of the acceleration, as a vector quantity. They can thus measure the movements of an animal (e.g., when mounted in a collar) or sense the vibration of a body part. Tiny accelerometers are used to detect vibrations generated by insects and other animals for communication. The recently defined science of biotremology uses accelerometers and laser vibrometers to study vibrational communication in insects and other zoological groups (Hill et al. 2019) by detecting either their movements or the vibrations transmitted through the substrate. MEMS accelerometers are now very tiny and widely used in electronic devices, such as smartphones and game controllers, to sense their movement in space.

1.3.4 Laser and Optical Microphones

Laser microphones, also known as laser interferometers, laser accelerometers, or vibrometers, are designed to detect vibrations on a surface without any contact with the sound source. These microphones can detect vibrations over large distances, from a few centimeters to tens or hundreds of meters. For example, laser microphones can measure the vibration of a glass window to capture the sounds produced inside a room. These devices were developed for spying purposes and are now mostly used in industry to record vibration of machinery. In bioacoustics research, and biotremology studies in particular (Hill et al. 2019), this technology is used to record the vibration of animal body parts (e.g., wings or abdomen of insects producing sounds) or vibration of the substrates (e.g., plant stem, tree trunk, spider-web, and burrow-wall), which could indicate the presence of an animal. Current instruments are lightweight and easy to use; however, they require that the target being recorded is not moving and is on a stable platform. These devices should not be confused with optical microphones and hydrophones, which are being developed and have a completely optical chain, where the transducer directly produces an optical signal to be sent on an optical fiber cable, either analog or digital, from the transducer to the recorder.

1.3.5 Bat Detectors

In the eighteenth century, the Italian scientist Lazzaro Spallanzani recognized that bats were capable of navigating and capturing their prey in the dark. While Spallanzani hypothesized that this was related to their hearing, it was not until the development of ultrasonic recorders and microphones in the early 1940s (Fig. 1.15) that scientists were able to study the ultrasonic sounds produced by bats for echolocation (Griffin 1944). Donald Griffin was working with piezoelectric transducers connected to an oscilloscope when he observed high-frequency signals produced by bats flying outside his open laboratory window. This discovery opened an entirely new field of bat echolocation research.

Fig. 1.15 Left: Photograph of an early ultrasonic bat detector from the laboratory of Donald Griffin. Image courtesy of the Cornell Laboratory of Ornithology. Right: Photograph of an ultrasonic USB microphone UltraMic250k, based on MEMS, developed by Dodotronic in 2010, connected to a tablet computer that allows recording and display of ultrasounds in real-time

Early bat detectors were based on the heterodyne principle and on frequency-division counters (Obrist et al. 2010), which produced audible but highly distorted sounds when receiving ultrasonic calls. Heterodyne detectors allowed only a narrow frequency band, up to a few kHz wide, to be shifted down to the audible range. The user tuned the detector to the frequency of interest and listened to and recorded signals only around the tuned frequency. Information outside that frequency range was discarded.
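The heterodyne principle can be illustrated with a short simulation. Multiplying the incoming signal by a tuned oscillator produces components at the sum and the difference of the two frequencies, and a low-pass filter keeps only the audible difference. The Python sketch below uses hypothetical values (a 45 kHz call, a 43 kHz tuning frequency, a 250 kHz sampling rate) and a crude moving-average filter in place of the analog circuitry of a real detector.

```python
import cmath
import math

SAMPLE_RATE_HZ = 250_000
CALL_HZ = 45_000       # hypothetical bat call frequency
TUNED_HZ = 43_000      # hypothetical tuning (local oscillator) frequency
N_SAMPLES = 5_000      # 20 ms of signal


def tone_amplitude(samples, freq_hz):
    """Amplitude of the sinusoidal component at freq_hz (single-bin DFT)."""
    acc = sum(x * cmath.exp(-2j * math.pi * freq_hz * i / SAMPLE_RATE_HZ)
              for i, x in enumerate(samples))
    return 2.0 * abs(acc) / len(samples)


def moving_average(samples, width):
    """Crude low-pass filter: running mean of the last `width` samples."""
    out, running = [], 0.0
    for i, x in enumerate(samples):
        running += x
        if i >= width:
            running -= samples[i - width]
        out.append(running / min(i + 1, width))
    return out


def cosine(freq_hz):
    return [math.cos(2 * math.pi * freq_hz * i / SAMPLE_RATE_HZ)
            for i in range(N_SAMPLES)]


call = cosine(CALL_HZ)
oscillator = cosine(TUNED_HZ)

# Mixing: cos(a) * cos(b) = 0.5 * cos(a - b) + 0.5 * cos(a + b)
mixed = [c * o for c, o in zip(call, oscillator)]
audible = moving_average(mixed, width=25)

difference_hz = abs(CALL_HZ - TUNED_HZ)   # 2 kHz, within human hearing
sum_hz = CALL_HZ + TUNED_HZ               # 88 kHz, removed by the filter

for label, signal in (("mixed", mixed), ("filtered", audible)):
    print(label,
          f"difference: {tone_amplitude(signal, difference_hz):.3f}",
          f"sum: {tone_amplitude(signal, sum_hz):.3f}")
```

Because only calls within a few kHz of the tuning frequency land in the audible band after mixing, a call at a very different frequency would be filtered out along with the sum component, which is why information outside the tuned band is lost.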

Frequency division (or count-down) detectors cover a broad frequency range. They are based on zero-crossing detection. They count how many times the signal waveform crosses zero pressure and they produce a synthetic wave every n incoming waves. The output signal frequency is a fraction of the original frequency (i.e., 1/n), and advanced systems retain the amplitude envelope of the original signal. The frequency division method is much better than the heterodyne; however, both produce a distorted signal often not useful for scientific investigation. The first digital models, called time-expansion detectors, digitally recorded the incoming bat calls at a high sampling rate, and played them back at a reduced sampling rate, which allowed for human observers to hear the calls and record them on a conventional recorder (Obrist et al. 2010). This method preserves all acoustic features so that recordings can be used for scientific analysis.
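The difference between frequency division and time expansion can be sketched in a few lines of Python. The example below is a simplified illustration, not the circuit or firmware of any actual detector: a synthetic 40 kHz tone (a hypothetical call) sampled at an assumed 384 kHz is passed through a divide-by-10 zero-crossing counter, and the same samples are then reinterpreted at one tenth of the sampling rate to mimic time-expanded playback.

```python
import math

SAMPLE_RATE_HZ = 384_000   # hypothetical recording rate
CALL_HZ = 40_000           # hypothetical constant-frequency call
DIVISOR = 10               # n: one output cycle per n input cycles


def upward_zero_crossings(samples):
    """Indices at which the waveform crosses zero going upward."""
    return [i for i in range(1, len(samples))
            if samples[i - 1] < 0.0 <= samples[i]]


def frequency_division(samples, n):
    """Square wave that completes one cycle per n input cycles (n even).

    The output toggles after every n/2 upward zero crossings, so the
    amplitude and shape of the original waveform are discarded.
    """
    out, state, count = [], 1.0, 0
    for i, x in enumerate(samples):
        if i > 0 and samples[i - 1] < 0.0 <= x:
            count += 1
            if count % (n // 2) == 0:
                state = -state
        out.append(state)
    return out


def mean_frequency_hz(samples, sample_rate_hz):
    """Mean frequency from the spacing of upward zero crossings."""
    idx = upward_zero_crossings(samples)
    return (len(idx) - 1) * sample_rate_hz / (idx[-1] - idx[0])


# 10 ms tone; the phase offset keeps zero crossings off exact sample points.
call = [math.sin(2 * math.pi * CALL_HZ * i / SAMPLE_RATE_HZ + 0.5)
        for i in range(3840)]

divided = frequency_division(call, DIVISOR)

f_call = mean_frequency_hz(call, SAMPLE_RATE_HZ)
f_divided = mean_frequency_hz(divided, SAMPLE_RATE_HZ)
# Time expansion: identical samples, played back DIVISOR times slower.
f_expanded = mean_frequency_hz(call, SAMPLE_RATE_HZ / DIVISOR)

print(f"original: {f_call:.0f} Hz")
print(f"frequency division: {f_divided:.0f} Hz (square wave, real time)")
print(f"time expansion: {f_expanded:.0f} Hz (original waveform, 10x longer)")
```

Both outputs fall near 4 kHz, but only the time-expanded version retains every original sample, which is why it remains usable for measurements.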

Digital bat detectors include a built-in ultrasonic microphone, onboard signal sampling and processing, memory for digital data storage, a graphical display to show a spectrogram with related settings, and a speaker for monitoring incoming ultrasounds by either slowing them down or shifting them in frequency. Current models are completely digital: they record and store data continuously and can transpose ultrasounds into audible sounds in real-time by spectral shifting (or spectral compression), using a Fast Fourier Transform (FFT) algorithm (see Chap. 4 on signal processing). Some bat detectors can be used as autonomous recorders, which can selectively record ultrasounds from echolocating bats for many consecutive nights, with a programmable timer to start at sunset and stop at sunrise. Some also have analysis software that identifies the species, with a margin of error that varies by species (see Chap. 2, section on bat detectors). Given the computing and storage capabilities of current tablets and smartphones, dedicated ultrasonic microphones with an integrated AD interface are also available to record bat calls and display their features on the device screen (Fig. 1.15).

1.4 Advances in Hydrophones

In 1826, Jean-Daniel Colladon and Charles-François Sturm conducted an experiment in Lake Geneva, Switzerland, to determine the speed of sound in water (Colladon 1893). They used two small boats on opposite sides of the lake, ~14 km apart. On one boat, there was an underwater bell, which was struck at the same time that gunpowder was ignited, which resulted in a paired underwater sound and above-water gunpowder flash. The operator of the second boat used an underwater listening horn to detect the sound of the bell (Fig. 1.16). The time difference between seeing the gunpowder flash and hearing the bell allowed the scientists to compute the speed of sound in water. Their measurements were fairly accurate and indicated that the speed of sound in water is more than four times greater than the speed of sound in air.
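The arithmetic behind this experiment is simple enough to sketch in Python. The distance below is the approximate 14 km separation given in the text; the 9.6 s delay is an illustrative value chosen for this example, not the historical measurement. The light from the flash is treated as arriving almost instantly, and the tiny correction for its travel time is included only to show that it is negligible.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0


def sound_speed_from_flash_and_bell(distance_m, delay_s):
    """Sound speed from the delay between seeing a flash and hearing a bell.

    The observed delay equals the travel time of the sound minus the
    travel time of the light, so the light travel time is added back.
    """
    light_travel_s = distance_m / SPEED_OF_LIGHT_M_S
    sound_travel_s = delay_s + light_travel_s
    return distance_m / sound_travel_s


DISTANCE_M = 14_000.0   # approximate boat separation from the text
DELAY_S = 9.6           # illustrative delay, an assumption for this sketch

speed_m_s = sound_speed_from_flash_and_bell(DISTANCE_M, DELAY_S)
light_correction_s = DISTANCE_M / SPEED_OF_LIGHT_M_S

print(f"estimated sound speed in water: {speed_m_s:.1f} m/s")
print(f"light travel time: {light_correction_s * 1e6:.1f} microseconds")
```

Under these assumptions, an error of 0.1 s in timing the delay shifts the estimate by roughly 1%, which indicates how much the result depended on the observer's reaction time.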

Fig. 1.16 Experimental setup to determine the speed of sound underwater. Image Source: J. D. Colladon, Souvenirs et Memoires, Albert-Schuchardt, Geneva, 1893

Until the advent of hydrophones, it was assumed that oceans, rivers, and streams were quiet environments. Much of hydrophone development was driven by military needs during World Wars I and II, when the use of hydrophones and sonar projectors facilitated the detection of enemy vessels, particularly submarines, by listening to their sound (i.e., passive sonar) or by listening for the reflection of emitted sound pulses (i.e., active sonar). Sonar operators were some of the earliest bioacousticians who were able to distinguish sonar signals from marine animal sounds (Fish and Mowbray 1970). Today, hydrophones are used in a large variety of biological research applications to monitor population dynamics and behavior of marine invertebrates, fish, and mammals (Au and Hastings 2008; Tremblay et al. 2009). Hydrophones are also largely used to monitor the underwater noise produced by ship traffic and other invasive activities, such as seismic surveys with airguns and naval sonar (Pavan et al. 2004).

1.4.1 Single Hydrophones

Hydrophones are transducers used to receive underwater sound. They are usually built with a piezoelectric material that generates a voltage when compressed/decompressed; conversely, it can vibrate and produce sound waves if excited by an electric signal. Piezoelectric transducers can therefore be operated either as a receiver or as a transmitter. In 1917, Paul Langevin obtained a large 10 cm × 10 cm × 1.6 cm slice of a natural quartz crystal and used this to develop a transmitter capable of emitting sound so powerful it killed nearby fish. After World War II, other materials (potassium dihydrogen phosphate, ammonium dihydrogen phosphate, and barium titanate) were used instead of quartz to build hydrophone transducers (Rossing 2007).

As the Navies of the world began to recognize the utility of listening underwater, hydrophone technology developed fairly rapidly, and also was used for oceanographic and biological research (Wenz 1962; Munk and Wunsch 1979; Urick 1983; Naramoto 2000). Most of the early bioacoustics research on aquatic animals was conducted using a battery-operated single hydrophone (Fig. 1.17) suspended in the water from the shore, a small boat, or sea ice, and required the presence of a researcher.

Fig. 1.17 Simple piezoelectric hydrophone (Aquarian Audio HC2a) with PIP powering connected to a digital pocket recorder (SONY PCM-M10)

Traditional hydrophones feature an analog output (voltage or current) and are available with or without a front-end preamplifier. Hydrophones that feature an integrated AD-converter and digitize the analog signal directly at the sensor are now commercially available. Some digital hydrophones also integrate signal processing and storage capabilities (e.g., real-time reporting of noise levels). Because of the increased power consumption of digital hydrophones, these are primarily used in cabled sensor networks, such as seafloor sensors or sub-surface towed arrays.

1.4.2 Sonobuoys

Navies recognized the need for a hydrophone that could operate remotely, was mobile, and could monitor sounds at different water depths, which led to the development of sonobuoys. Sonobuoys are individual canisters that float at the water surface and house a hydrophone, dampening cable, battery, recording/transmitting electronics, and a transmitting antenna. See Chap. 2 for details of features and operation of sonobuoys. Navies deployed sonobuoys from airplanes or ships to listen underwater for submarines. A few labs were able to acquire military sonobuoys and used them for receiving and recording marine animal sounds.

1.4.3 Autonomous Underwater Acoustic Recorders

In recent years, a wide variety of stationary, autonomous passive acoustic monitoring (PAM) systems have been developed for the recording of acoustic activity from naturally occurring biological and geophysical sources, as well as from anthropogenic sources in marine environments (Figs. 1.19, 1.20, 1.21, and 1.22). These systems have an advantage over systems that rely on human observers as they are non-invasive and able to collect long-term data from remote areas independently of weather and light conditions (Mellinger et al. 2007; Lammers et al. 2008; Tremblay et al. 2009; Obrist et al. 2010; Sousa-Lima et al. 2013; Jacobson et al. 2016); see Chap. 2.

1.4.4 Towed Hydrophone Arrays

A towed array contains several hydrophones housed in an oil-filled plastic sleeve, which is pulled behind vessels of varying size. Towed arrays of hydrophones allow beamforming (a processing technique that combines time-delayed signals from multiple hydrophones to increase gain in a given direction) to improve signal-to-noise ratio and estimate bearings to specific sound sources. Consecutive bearing estimates allow a source to be localized and its range to be determined. A towed array in effect provides a high-gain, directional sensor that can be steered in different directions either in real-time or in the post-processing of recordings (see Chap. 2 for details of towed hydrophone arrays). During World War I, a towed sonar array (the first documented towed array) known as the Electric Eel was developed by the US Navy physicist Harvey Hayes (Naramoto 2000). Bill Watkins and William Schevill at Woods Hole Oceanographic Institution were among the first bioacousticians to use this technology to record and study the sounds of marine mammals (e.g., Watkins and Schevill 1977; Watkins et al. 1987). The original towed arrays focused on lower-frequency signals (i.e., frequencies typical of foreign vessel noise), but Schevill and Watkins developed new instruments to record the higher frequencies emitted by dolphins. Their recordings are of high scientific value and are available online in digital format at the WHOI Watkins Sound Library (Footnote 12).
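Delay-and-sum beamforming, the simplest form of the technique described above, can be sketched in Python as follows. This is a minimal simulation under stated assumptions rather than the processing chain of any real array: eight equally spaced hydrophones on a straight line, a distant source emitting a single 1 kHz tone from 30 degrees off broadside, a nominal sound speed of 1500 m/s, and steering delays rounded to whole samples. All parameter values and function names are hypothetical.

```python
import math

SOUND_SPEED_M_S = 1500.0   # nominal value for seawater (assumption)
SPACING_M = 0.75           # hypothetical spacing, half a wavelength at 1 kHz
N_SENSORS = 8
SAMPLE_RATE_HZ = 48_000
TONE_HZ = 1_000.0          # hypothetical narrowband source
TRUE_BEARING_DEG = 30.0    # measured from broadside


def inter_sensor_delay_s(bearing_deg):
    """Arrival-time difference between adjacent sensors for a plane wave."""
    return SPACING_M * math.sin(math.radians(bearing_deg)) / SOUND_SPEED_M_S


def simulate_array(n_samples):
    """One list of samples per hydrophone for the assumed plane wave."""
    tau = inter_sensor_delay_s(TRUE_BEARING_DEG)
    return [[math.sin(2 * math.pi * TONE_HZ * (i / SAMPLE_RATE_HZ - m * tau))
             for i in range(n_samples)]
            for m in range(N_SENSORS)]


def beam_power(channels, steer_deg, start, stop):
    """Mean power of the delay-and-sum output for one steering direction."""
    shifts = [round(m * inter_sensor_delay_s(steer_deg) * SAMPLE_RATE_HZ)
              for m in range(N_SENSORS)]
    total = 0.0
    for i in range(start, stop):
        # Advance each channel by its steering delay, then average.
        y = sum(ch[i + s] for ch, s in zip(channels, shifts)) / N_SENSORS
        total += y * y
    return total / (stop - start)


# Largest possible shift (steering to +/-90 degrees) sets the edge padding.
MAX_SHIFT = round((N_SENSORS - 1) * SPACING_M / SOUND_SPEED_M_S
                  * SAMPLE_RATE_HZ)
WINDOW = 480   # ten full cycles of the tone at this sampling rate

channels = simulate_array(2 * MAX_SHIFT + WINDOW)
scan = {deg: beam_power(channels, deg, MAX_SHIFT, MAX_SHIFT + WINDOW)
        for deg in range(-90, 91)}
best_bearing = max(scan, key=scan.get)

print(f"estimated bearing: {best_bearing} degrees")
print(f"power on target: {scan[best_bearing]:.3f}, at broadside: {scan[0]:.3f}")
```

The output power peaks when the steering delays match the true arrival delays, because the channels then add in phase; in other directions they partly cancel, which is the source of the array gain. A straight-line array cannot tell left from right, so the bearing here is an angle relative to the array axis.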

In 1983, Thomas et al. (1986, 1987) worked with a geophysical company to build a modified towed array specifically for the study of marine mammal sounds (Fig. 1.18), which was capable of capturing low- and medium-frequency underwater sounds (20 Hz–15 kHz). Depth and temperature sensors on the array measured the thermocline and sound propagation conditions in the area. Self-noise from the moving ship was present, but was filtered out as much as possible. Many species of marine mammals were heard, which helped the fishermen find tuna, because tuna tend to associate with dolphin pods.

Fig. 1.18 Left: Photograph of the topside electronics required to receive, record, and process data from a towed array in 1983. Right: Photograph of deploying a towed array from the deck of a tuna seiner, the MV Queen Mary, to listen for underwater sounds of marine mammals and fish in the Eastern Tropical Pacific. Photos by Jeanette Thomas

In recent years, lightweight towed arrays have been developed to meet the requirements of studying marine mammal sounds from small platforms, such as sailboats (Pavan and Borsani 1997). Deployment of the towed array from a sailboat minimizes recorded self-noise of the towing vessel. Current towed arrays can capture sounds over a large geographic area and cover a wide frequency range (from infrasound to ultrasound).

1.4.5 Seafloor Hydrophone Arrays

Arrays of bottom-mounted hydrophones were an important naval asset for the surveillance of oceans for the presence and movements of enemy vessels and submarines. In the 1950s, at the height of the Cold War, the US Navy launched a classified project known as the SOund SUrveillance System (SOSUS). The SOSUS large-aperture arrays allowed the Navy to detect signals at ranges of several hundred kilometers. SOSUS arrays were highly successful in detecting and tracking Soviet submarines of that era. The sailors operating the early SOSUS arrays also detected numerous biological sounds of unknown origin. An unknown low-frequency sound was attributed to the “Jezebel Monster,” yet later found to be from blue (Balaenoptera musculus) and fin whales (Balaenoptera physalus). After the end of the Cold War, the SOSUS system was made available to scientists (Nishimura and Conlon 1994; Stafford et al. 1998; Watkins et al. 2000), who monitored the presence of marine mammal sounds and tracked their long-range seasonal movements across the oceans. In one case, a blue whale was tracked for 80 days along the eastern seaboard of the USA using the 20-Hz signal the animal repeatedly produced.

At present, bottom-mounted arrays of hydrophones are deployed across oceans worldwide, with some strictly dedicated to military applications, and others dedicated to monitoring earthquakes or nuclear explosions, such as the array operated by the Comprehensive Nuclear Test Ban Treaty Organization (CTBTO). Over the last decade, multidisciplinary seafloor networks were established: the North-East Pacific Time-series Undersea Networked Experiments (NEPTUNE) and the Victoria Experimental Network Under the Sea (VENUS) in Canada (Footnote 13); the Controlled, Agile, and Novel Ocean Network (CANON) run by MBARI in the USA; the European Multidisciplinary Seafloor Observatory (EMSO) in Europe; the Submarine Multidisciplinary Observatory (SMO) managed by Italy; and the Neutrino Mediterranean Observatory (NEMO, also known as KM3net). Some of these arrays are equipped with wideband hydrophones, which allow scientists to monitor a variety of marine mammal species as well as ambient noise levels (Nosengo 2009; Favali et al. 2013; Caruso et al. 2015; Sciacca et al. 2015; Viola et al. 2017). NEPTUNE and VENUS also provide online public access to recorded data. The Listening Into the Deep Ocean (LIDO) project serves as a gateway to several underwater data acquisition systems and provides real-time streaming of their acoustic data (André et al. 2011).

1.4.6 Small Arrays

Novel hydrophone array configurations have recently been developed for a team led by François Sarano, which has conducted a longitudinal study on the same group of sperm whales since 2013 under the authority of the Marine Megafauna Conservation Organization and as part of the global program Maubydick. In 2017 and 2018, the team collected a set of audio-visual recordings using a custom acoustic antenna developed by the University of Toulon with the JASON Qualilife DAQ (Data AcQuisition) to record the animals in the near field at very high frequency (600 kHz sampling frequency, Fig. 1.19). A similar antenna has been deployed in Amazonia, allowing high-definition 3D tracking and click analysis of the Amazon river dolphin (Inia geoffrensis; Glotin et al. 2018).

Fig. 1.19

The JASON Qualilife DAQ 3x600 kHz in the custom array by H Glotin, recording sperm whales in the near field in 2018. Courtesy of V Sarano

1.5 Autonomous Mobile Systems

1.5.1 Aerial Mobile Systems

Autonomous mobile monitoring systems have been developed for terrestrial applications, such as the Autonomous Aerial Acoustic Recording Systems (AAARS) developed at the University of Tennessee (Buehler et al. 2014). This system is based on an altitude-controlled weather balloon with an acoustic recorder and a GPS unit with radio transmitter. It moves quietly according to local winds and can be tracked by a radio receiver. If ground anchored, this system allows the recording of sounds in a given location. Mobile systems based on drones, in contrast, can be stationary or can be programmed to survey a given area. However, they are very noisy, which can severely affect animal behavior and both the quality and usability of the recordings.

1.5.2 Underwater Mobile Systems

The high cost of visual and acoustic marine surveys conducted from large research vessels drove the development of new monitoring solutions using autonomous vehicles, moving either on the surface (Unmanned Surface Vessels, USVs) or underwater (Autonomous Underwater Vehicles, AUVs). These systems are remotely operated by an onshore pilot and can monitor offshore areas for weeks or months at a time (Klinck et al. 2012, 2015).

The most commonly used autonomous mobile systems to monitor the marine acoustic environment are underwater gliders (Baumgartner et al. 2013). These instruments (Fig. 1.20) use small changes in buoyancy, in conjunction with wings, to convert vertical motion to horizontal motion, and thereby propel themselves forward with very low power consumption. Gliders slowly dive (~0.25 m/s horizontal speed) in a saw-tooth pattern through the water. When surfacing after a dive, the glider communicates with an onshore base station to exchange data and commands (e.g., send position, remaining battery capacity, whale detections, and ambient noise levels, and receive new waypoints). The maximum operating depth of current models is about 1000 m. Therefore, these instruments are well suited for monitoring deep-diving odontocetes, such as beaked whales (Klinck et al. 2012).

Fig. 1.20

Left: Photograph of the passive acoustic seaglider™ developed by the Applied Physics Laboratory, University of Washington. Courtesy of G Shilling. Right: The Sphyrna ASV allows 3D passive acoustic tracking of diving cetaceans

Other instruments in this category include deep-diving (Matsumoto et al. 2013) and surface drifters (Griffiths and Barlow 2015). These instruments drift with the ocean current and cannot be programmed to navigate along a defined track-line. However, they are much cheaper than gliders. Recent Autonomous Surface Vehicles (ASV) can perform surveys along a pre-defined track; among these, the Sphyrna (Fig. 1.20) has advanced algorithms to allow 3D passive acoustic tracking of deep divers with four hydrophones fixed on the keel (Poupard et al. 2019).

1.5.3 Animal Acoustic Tags

A recent development for studying animals in-situ is the animal-worn acoustic tag. Such devices allow detailed observations of the movement and acoustic behavior of tagged animals. However, for some species, such as cetaceans, developing a reliable, long-term instrument attachment has been problematic.

Recorders in collars, similar to those used for radio tracking, have also been tested to record the sounds and activity of freely moving terrestrial animals, but with few applications. More successful was the Crittercam, developed and used by National Geographic primarily to provide amazing video (Footnote 14) of wild animals either on land or in water. Lynch et al. (2013) attached an inexpensive collar-mounted recording device to ten wild mule deer (Odocoileus hemionus) over two weeks in Colorado. Recorded sounds included rumination, which allowed the researchers to document foraging activities.

Video tags have been attached to whales, dolphins, sirenians, and penguins to document their underwater life. Sophisticated acoustic tags provided an important step forward in marine mammal bioacoustics. The development of these tags was primarily driven by the need to document and understand the reaction of cetaceans to underwater sounds such as naval sonars, airguns, and pile drivers. The D-TAG (Johnson and Tyack 2003), A-Tag (Akamatsu et al. 2007), Acousonde recorder (Burgess et al. 2011), and other similar instruments feature a variety of animal movement detectors (three-axis accelerometer, magnetometer, depth sensor, light sensor, etc.) and acoustic sensors (hydrophones). These tags are attached to the animals with non-invasive suction cups, and usually stay attached for a few hours, but can stay on the animal for up to a few days. Once detached, the tag floats to the surface and transmits a radio signal to aid recovery. This kind of technology (Fig. 1.21) has enabled important research on sound usage and behavioral responses of animals to anthropogenic sounds, such as naval sonars (Tyack 2009; Tyack et al. 2011).

Fig. 1.21

The evolution of the DTAG over fifteen years. Each design comprises electronics, batteries, suction cups, floatation material, and a VHF transmitter for retrieval when the tag is floating on the sea surface. The tags all record sound, depth, and motion to solid-state memory. However, the size, capabilities, and endurance have changed over the years. The earliest version developed in 2000 (a) had 400 MB of memory and could record a single sound channel at 16 kHz sampling frequency for a few hours. The most recent version developed in 2009 (b) records stereo sound at up to 500 kHz sampling frequency for almost two days. (c) is an intermediate version of the tag. Courtesy of P Tyack and M Johnson (2016)

Often a variety of sensors can be attached to the animal to provide additional environmental or behavioral data to accompany acoustic recordings. Evans et al. (2004) attached a waterproof video camera with a hydrophone, VHS recorder, and depth-sensor to examine vocal behavior during dives of Weddell seals in Antarctica. Each time the seal vocalized, the depth and time of the sound were documented, audio and video were recorded, and the call type was later analyzed in the laboratory. Researchers had to retrieve the VHS tapes, but this species remains close to a colony during the breeding season, hauls out on the ice daily, and is easily (re)captured for recovery of the tag and data. Current digital video equipment is highly miniaturized and allows new exciting options for exploring the life of animals in the wild.

1.6 Advances in Sound Analysis Hard- and Software

The most important advancements in sound analysis equipment were the transition from analog to digital systems, along with the transition from hardware to software signal processing. This provided lightweight, field-portable, battery-operated units with higher storage capacity, more stable storage media, and broadband analysis, often at a more affordable price than before. Now, even a smartphone can produce a spectrogram in real-time. Another important breakthrough was the ability of scientists to share digital data using the internet and shared storage in the cloud.

Initially, the basic analysis of acoustic signals was done using oscilloscopes. These instruments provided a visual representation of the waveform of acoustic signals known as oscillograms, which are plots with amplitude on the y-axis and time on the x-axis. Originally, oscilloscopes were large, heavy, expensive, AC powered, and used vacuum tubes. To obtain a hardcopy of the waveform, a camera was used to capture an image from the display. In some cases, the waveforms were traced on paper by an oscillating pen (similar to a seismometer).

The Kay Electric Company (later to become Kay Elemetrics) developed the Sona-Graph™ machine, which was a completely analog instrument and one of the first instruments to create an image of a sound, known as a SonaGram™. Developed primarily for navy applications and initially called the vibralyzer, this technology was applied successfully to the study of human speech and animal sounds (Koenig et al. 1946; Borror and Reese 1953; Thorpe 1954; Marler 1955; Fig. 1.22). A SonaGram (sometimes called a sonogram by biologists) is a visual representation of the frequencies (on the y-axis) and intensity (color or shades of gray as the z-axis) in a sound as they vary with time (on the x-axis). This type of image is also called a spectrogram. The Sona-Graph™ was very expensive and capable of analyzing a signal of only a few seconds in duration up to 8 or 16 kHz. The device offered two analysis settings, wideband (300 Hz) and narrowband (45 Hz). The wideband setting provided better time resolution, while the narrowband setting provided better frequency resolution (Beecher 1988). The sound could be played back from a reel-to-reel recorder and recorded on an iron oxide magnetic track, which ran the circumference of a large internal turntable. A special thermo-sensitive paper was wrapped around a drum mounted on top of the turntable. The drum spun synchronously with the turntable as the signal was played back through a variable band-pass filter or a filter bank, and a stylus burned the signal onto the paper on the rotating drum according to the level of sound at the frequencies given by the filter (Fig. 1.23).
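The trade-off between the wideband and narrowband settings can be illustrated with a short numerical sketch. This is not a model of the analog Sona-Graph filter; it is a digital short-time Fourier analysis in Python with NumPy, in which the window length is chosen so that the analysis bandwidth is roughly 300 Hz or 45 Hz (using the approximation bandwidth ≈ 1 / window duration). The sampling frequency, the two test tones spaced 200 Hz apart, the peak-counting rule, and all function names are assumptions made for illustration only.

```python
import numpy as np

def window_length(bandwidth_hz, fs):
    """Window length in samples for an approximate analysis bandwidth
    (assumes bandwidth ~ 1 / window duration)."""
    return int(round(fs / bandwidth_hz))

def mean_spectrum(x, n_win):
    """Average magnitude spectrum over half-overlapping Hann-windowed frames."""
    hop = n_win // 2
    win = np.hanning(n_win)
    starts = range(0, len(x) - n_win + 1, hop)
    frames = np.array([x[s:s + n_win] * win for s in starts])
    return np.abs(np.fft.rfft(frames, axis=1)).mean(axis=0)

def count_peaks(spec):
    """Count local maxima that exceed half of the largest spectral value."""
    mid = spec[1:-1]
    is_peak = (mid > spec[:-2]) & (mid >= spec[2:]) & (mid > 0.5 * spec.max())
    return int(is_peak.sum())

fs = 16_000                        # assumed sampling frequency in Hz
t = np.arange(fs) / fs             # 1 s of signal
# Two steady tones 200 Hz apart (assumed test signal)
x = np.sin(2 * np.pi * 1000 * t) + np.sin(2 * np.pi * 1200 * t)

n_wide = window_length(300, fs)    # short window: fine time resolution
n_narrow = window_length(45, fs)   # long window: fine frequency resolution

peaks_wide = count_peaks(mean_spectrum(x, n_wide))
peaks_narrow = count_peaks(mean_spectrum(x, n_narrow))

for name, n, peaks in [("wideband", n_wide, peaks_wide),
                       ("narrowband", n_narrow, peaks_narrow)]:
    print(f"{name}: window {n} samples ({1000 * n / fs:.1f} ms), "
          f"bin spacing {fs / n:.0f} Hz, spectral peaks found: {peaks}")
```

With these assumed values, the short window spans about 3 ms and the long window about 22 ms. The long window resolves the two tones as separate spectral peaks, whereas the short window merges them into one, at the benefit of locating brief events more precisely in time.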

Fig. 1.22

Photograph of L. Irby Davis using an early Kay Electric Co. Sona-Graph Sound Spectrograph analyzer (the late 1950s). Notice the sonogram on the paper wrapped around the drum on top of the analyzer. Courtesy of the Cornell Laboratory of Ornithology

Fig. 1.23

Two spectrograms by Ken Norris illustrating the wide-band (top) and narrow-band settings (bottom) of the Kay Sona-Graph 6061A spectrum analyzer. Note that the values of the x- and y-axes were not printed on the output. The x-axis is time in seconds and y-axis is the frequency in hertz. Courtesy of the Cornell Laboratory of Ornithology

This was a smelly, smoky process, which made the procedure unpleasant for researchers. To analyze a long sound recording, several short spectrogram sections had to be printed and taped together. The resulting sheets of paper often required a lot of wall or table space for review and further analysis. Because of the large size, these spectrograms were also difficult to reduce in size and adapt for inclusion in a publication.

In the 1970s, a camera using Kodak photographic paper (the size of 35-mm film) was attached to the screen of an advanced oscilloscope capable of performing real-time FFT spectrum analysis (Hopkins et al. 1974). As the sound played, a spectrogram image appeared on the screen and the camera photographed the resulting image in real-time. Measurements of frequency and time could be taken as the spectrograms were displayed. The photographic paper had to be developed in a dark room and produced a roll of 35-mm paper about 4 m long. One advantage of this system was the ability to view the sounds in real-time, which allowed scientists to study patterns of sounds. This system produced long-lasting spectrograms that are still usable 40 years later (see Thomas and Kuechle 1982 for samples of sonogram output).

Once thermal imaging paper (similar to the paper used in older fax machines) was developed, Kay, Unigon, and other companies developed real-time spectrogram imaging units, which had a continuous output using large rolls (8 inch wide) of thermal imaging paper. For further analysis, segments had to be cut with scissors. However, these data were difficult to analyze, store, and prepare for publication. Measurements of frequency and time could be taken as the images were displayed on the analyzer but were not provided on the output itself. If exposed to light or heat, the hardcopies gradually turned brown and were generally unusable after a few years.

In the mid-1970s, the first attempts were made to use general-purpose computers to analyze sounds, mainly for speech analysis. These attempts used the Fast Fourier Transform (Strong and Palmer 1975), an algorithm that decomposes a signal segment into a finite number of sinusoids, each one characterized by frequency, amplitude, and phase. This algorithm was successfully applied to the human voice and to animal sounds to produce spectrograms in different formats. The speed and data-handling capabilities of computers in subsequent years allowed for the implementation of more complex mathematical signal processing algorithms (see Chap. 4 on signal processing).
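As a minimal illustration of this decomposition, the following Python sketch (using NumPy, not the software of the period) builds a short signal segment from two cosines and recovers the frequency, amplitude, and phase of each one from the FFT. The sampling frequency, segment length, and component values are assumptions, chosen so that each frequency falls exactly on an FFT bin; real recordings generally require windowing and more careful peak estimation.

```python
import numpy as np

fs = 8000                      # assumed sampling frequency in Hz
n = 800                        # segment length: 0.1 s, so bins are 10 Hz apart
t = np.arange(n) / fs

# Assumed test components: (frequency in Hz, amplitude, phase in radians)
components = [(500.0, 1.0, 0.3), (1200.0, 0.5, -1.0)]
x = sum(a * np.cos(2 * np.pi * f * t + p) for f, a, p in components)

spectrum = np.fft.rfft(x)                  # one complex value per sinusoid
freqs = np.fft.rfftfreq(n, d=1 / fs)       # frequency of each bin in Hz
amps = 2 * np.abs(spectrum) / n            # amplitude scaling for non-DC bins
phases = np.angle(spectrum)                # phase relative to a cosine

# Report the two strongest sinusoids, strongest first
strongest = np.argsort(amps)[-2:][::-1]
recovered = [(freqs[k], amps[k], phases[k]) for k in strongest]
for f, a, p in recovered:
    print(f"{f:.0f} Hz, amplitude {a:.2f}, phase {p:.2f} rad")
```

A spectrogram is obtained by repeating this analysis on successive short segments of the recording and plotting the amplitudes against time.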

A few years later, in 1980, a computer-based digital spectrographic workstation was developed at the University of Pavia (Italy) that produced black-and-white spectrograms of animal sounds on a computer screen, with a moving cursor for taking measurements. The workstation produced and printed a spectrogram of a 1-s signal in about 40 minutes (Pavan 1983, 1985). The AD-converter allowed users to acquire and analyze sounds in the ranges of 5, 10, and 20 kHz with a sampling frequency of 51.2 kHz. Hardcopies of displays were made on the computer’s printer and then joined together (Fig. 1.24).

Fig. 1.24

Black-and-white spectrogram of a 2.4-s bird song (Thekla lark) produced in 1981 by joining three printouts of 800 ms each; the spectrogram generation required 2 hours. The x-axis is time in seconds and y-axis is the frequency in hertz. Frequency range 0–5 kHz, sampling frequency 20,480 Hz, and 12-bit resolution (72-dB dynamic range). From top: spectrogram, envelope, tracking of dominant frequency, and amplitude plot in dB

Around that same time, in 1984, a group of acousticians at The Rockefeller University and Engineering Design Inc. developed a software program, called Signal. This software was developed for computers and was able to control and communicate with the recording hardware. The system was able to display spectrograms in real-time, provide basic time-frequency information of recorded signals, and store data digitally on the computer’s hard disc. These developments revolutionized bioacoustics sound analysis; however, at the time, these units were expensive, custom-made, and had very little storage capacity (the typical storage available in 1985 was 5 MB on a 15-inch magnetic disc).

In 1985, the spectrographic workstation was upgraded to produce color spectrograms (Fig. 1.25; Pavan 1992) on a mainframe computer (HP 1000) interfaced to an AD-converter and to a graphic workstation (Footnote 15). Around this time, the first personal computers (PC) appeared, and the software was rewritten to produce real-time color spectrograms and signal envelopes using Intel 8086/8087 processors and a high-quality Audiologic Duetto sound board produced in Italy, with sampling frequency up to 48 kHz with 16-bit resolution, and later with a widely available and cheap Sound Blaster sound card. A mouse-driven cursor allowed accurate measurements to be taken directly on the computer screen, and printouts were possible in gray scales on standard dot-matrix printers or on thermal printers. By storing the recordings in a digital format, it was also possible to edit the recordings and to play them back at a different speed or even backward (e.g., to produce playback tapes for behavioral experiments).

Fig. 1.25

Photograph of an envelope-plot and color spectrogram generated by the digital signal processing workstation based on HP1000 mainframe in 1985. Recordings were of calls of a Barbary partridge (Alectoris barbara)

At the same time, other researchers started experimenting with digital signal processing. Aubin (France) and Specht (Germany) developed similar digital sound analysis systems that also included the synthesis of sounds for playback experiments (Bremond and Aubin 1989; Specht 1992; Aubin et al. 2000). Specialized AD-converters appeared on the market to sample analog signals at high rates, which allowed digital recording and analysis of frequencies up to 100 kHz. However, specialized processors (Digital Signal Processors, DSP) were required to process ultrasonic signals in real-time (Pavan 1992, 1994).

In 1987, new digital instruments dedicated to sound analysis became commercially available, among them the Kay Sona-Graph DSP 5500 (Fig. 1.26). This very expensive unit was able to analyze and display stereo signals in real-time up to 32 kHz. Either reel-to-reel or cassette recordings could be used as an input, and the unit had a thermal-paper printer for printing gray-shaded spectrograms.

Fig. 1.26

Photograph of the University of Pavia bioacoustic laboratory equipment in 1989 with a Kay Sona-Graph DSP 5500, color monitor, thermal printer, portable open-reel stereo recorder, cassette deck recorder, filter bank, speakers, and headphones

Digital sound storage and analysis became widespread given the improvements in digital computer technology and data storage, coupled with the proliferation of personal computers and the development of dedicated sound analysis software packages. These advances also fostered the development of high-quality electro-acoustic and musical equipment (microphones, recorders, and AD-converters) for a rapidly expanding consumer market of musicians and music enthusiasts. Among the first analysis software dedicated to bioacoustics, it is worth mentioning Canary, developed for Macintosh computers at Cornell University, then replaced by Raven (Footnote 16), a multi-platform software developed at the same university. For an overview of computer-based bioacoustics sound analysis and related algorithms, see Hopp et al. (1998), Zimmer (2011), and Sueur (2018). Many academic institutions and companies started to develop software programs for PC, Mac, and Linux computers (Footnote 17).

These software programs allowed for easy recording, manipulation, analysis, and display of signals. Now, researchers are able to collect huge acoustic datasets, and computational bioacoustics faces the Big Data problem. The latest software programs, either commercial or open source, also enable the user to run sophisticated detection/classification algorithms over long-term data sets for automated detection of occurrences of a target sound (see Chap. 8 on detection and classification methods). This saves much time and avoids having to view and listen to the entire recording manually. Scientists also can use readily available programming environments (including MATLAB, Octave, Python, R) to develop their own analyses, often facilitated by libraries of procedures dedicated to sound processing and bioacoustic analysis (e.g., Sueur et al. 2008; Sueur 2018; Ulloa et al. 2021).
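To give a flavor of such automated processing, the following Python sketch implements a very simple band-limited energy detector with NumPy. It is not the algorithm of any particular software package, and it is far simpler than the methods covered in Chap. 8: it flags analysis frames whose energy in a chosen frequency band exceeds the median level by a fixed margin and merges consecutive flagged frames into events. The function name, window length, threshold, and synthetic test recording are all assumptions made for illustration.

```python
import numpy as np

def band_energy_detector(x, fs, f_lo, f_hi, n_win=512, threshold_db=10.0):
    """Return (start_s, end_s) tuples for runs of frames whose energy in
    [f_lo, f_hi] exceeds the median frame level by threshold_db."""
    hop = n_win // 2
    win = np.hanning(n_win)
    freqs = np.fft.rfftfreq(n_win, d=1 / fs)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    starts = np.arange(0, len(x) - n_win + 1, hop)
    energy = np.array([
        np.sum(np.abs(np.fft.rfft(x[s:s + n_win] * win))[band] ** 2)
        for s in starts
    ])
    level_db = 10 * np.log10(energy + 1e-12)
    # The median is used as a crude estimate of the background level
    active = level_db > np.median(level_db) + threshold_db

    events, first = [], None
    for i, is_active in enumerate(active):
        if is_active and first is None:
            first = i                                   # event begins
        elif not is_active and first is not None:
            events.append((float(starts[first] / fs),
                           float((starts[i - 1] + n_win) / fs)))
            first = None                                # event ends
    if first is not None:                               # event runs to the end
        events.append((float(starts[first] / fs),
                       float((starts[-1] + n_win) / fs)))
    return events

# Synthetic 10-s test recording (assumed): weak noise plus two 2-kHz tone bursts
fs = 8000
rng = np.random.default_rng(0)
x = 0.1 * rng.standard_normal(10 * fs)
t = np.arange(len(x)) / fs
for onset in (2.0, 6.0):
    burst = (t >= onset) & (t < onset + 0.5)
    x[burst] += np.sin(2 * np.pi * 2000 * t[burst])

events = band_energy_detector(x, fs, f_lo=1800, f_hi=2200)
for start_s, end_s in events:
    print(f"event from {start_s:.2f} s to {end_s:.2f} s")
```

The reported start and end times are only accurate to within about one analysis window. In real recordings, a fixed threshold above the median is easily defeated by variable background noise, which is one reason why more sophisticated detectors and classifiers are used in practice.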

In the late 1990s, smartphone technology was developed, along with sound analysis software for these devices. Smartphones of the twenty-first century have computing power comparable to that of a desktop PC. Sound recording and visualization applications were developed for both Android and iPhone Operating System (iOS) platforms. In addition, the development of the Internet of Things and low-cost computer platforms (e.g., Arduino, Raspberry Pi, and others) has allowed scientists to build web-enabled data recording and analysis systems. These new technologies and analytical methods can be applied not only to audible sound but also to infrasonic and ultrasonic signals. For example, ultrasonic echolocation signals produced by bats can now easily be shifted into the human hearing range, visualized, and analyzed in real-time with handheld digital devices, with a smartphone equipped with an ultrasonic microphone, or remotely monitored with web-connected recorders (Footnote 18).
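One simple way to shift an ultrasonic signal into the human hearing range is time expansion: the recorded samples are left unchanged but played back at a lower sampling frequency, which lowers every frequency and stretches the duration by the same factor. The Python sketch below illustrates the principle with NumPy on a synthetic 45-kHz tone; it is only one of several possible approaches, and the sampling frequency, expansion factor, and helper function are assumptions made for illustration.

```python
import numpy as np

def dominant_frequency(x, fs):
    """Frequency in Hz of the largest peak in the Hann-windowed spectrum of x,
    interpreting the samples at sampling frequency fs."""
    spec = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    return float(np.fft.rfftfreq(len(x), d=1 / fs)[np.argmax(spec)])

fs_record = 250_000                 # assumed recording sampling frequency (Hz)
expansion = 10                      # assumed time-expansion factor
n = 2500                            # 10 ms of signal at fs_record

# Synthetic stand-in for a bat call: a constant 45-kHz tone
call = np.sin(2 * np.pi * 45_000 * np.arange(n) / fs_record)

# Time expansion: same samples, interpreted at a 10 times lower rate
fs_playback = fs_record // expansion

f_original = dominant_frequency(call, fs_record)
f_playback = dominant_frequency(call, fs_playback)
print(f"recorded: {f_original:.0f} Hz over {1000 * n / fs_record:.0f} ms")
print(f"played back: {f_playback:.0f} Hz over {1000 * n / fs_playback:.0f} ms")
```

In this sketch, the 45-kHz tone is heard at 4.5 kHz and lasts ten times longer, so the method cannot run continuously in real-time; techniques that do, such as heterodyning or frequency division, trade away some of the signal detail that time expansion preserves.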

1.7 Summary

Advances in electronic technology over the last 100 years, including the dramatic size reduction of equipment, increased battery life, increased data storage capacity, the switch from analog to digital recorders, along with the transition from analog to digital signal processing, have facilitated an explosion of research in the field of bioacoustics. Many of these advances were enabled by equipment developed for military use, professional music applications, human speech analysis, and for the radio, television, and film industries. Often an improvement in one type of equipment led to advancements in another. Analog devices, which stored data on magnetic tape, were replaced by digital devices using storage media such as optical discs, hard drives, and solid-state memory cards. Microphones and hydrophones are now used in arrays that allow long-term monitoring, localization of the sound-producing animals, and 3D acoustic recording. Towed hydrophone arrays allow mobile surveys of marine sounds, which can be coupled with animal sightings and environmental data. Autonomous transducer/recorder units can be deployed for long-term monitoring of biotic and abiotic sounds in both air and water in remote habitats. Recently, smartphone applications have provided an affordable and portable bioacoustics laboratory for use by hobbyists, citizen scientists, and researchers alike.

The digital revolution in sound recording and analysis has facilitated significant advances in the field of bioacoustics and enabled the development of ecoacoustics, which joins bioacoustics and ecology, and computational bioacoustics. Acousticians are now able to study the sounds from sound-producing species in a wide variety of locations, during day and night, year-round, and often remotely. Many free and commercially available software packages for recording and analyzing acoustic data have been developed for computers, tablets, and smartphones. Artificial Intelligence is now being applied to big data problems and to bioacoustic recordings, with the aim of classifying and recognizing sounds at the species level. It has never been easier or cheaper to study the acoustic world ranging from infrasounds to ultrasounds. However, it is always important to know the intrinsic limitations of each piece of equipment or software, the constraints given by the environmental context, and their potential impact on the final results. It is also worth considering that bioacoustics and ecoacoustics are now being widely used to study and monitor critical and endangered species and to monitor entire ecosystems to understand climate change impacts.