1 Introduction

Acoustic signal processing using multiple microphones has received significant attention due to its fundamental role in a number of applications such as assistive hearing with hearing aids or cochlear implants, teleconferencing, hands-free telephony, voice-controlled devices, spatial audio reproduction, and sound zoning, to name a few. Some of the specific tasks which can be accomplished with acoustic signal processing include speech enhancement and speech dereverberation [1,2,3,4,5,6,7,8], room parameter estimation [9], acoustic echo and feedback cancellation [10, 11], source localization [2, 5, 12], audio source separation [7, 8], sound field control [13, 14], and automatic speech recognition [15], all of which are pertinent to the aforementioned applications. One of the core phases in the development of acoustic signal processing algorithms is the evaluation phase, where the performance of a newly developed algorithm is compared to that of existing algorithms in various acoustic environments which are relevant for the application at hand. This is clearly challenging because the laboratory conditions under which the algorithm is evaluated rarely match the real-world conditions where the algorithm must perform. Additionally, recorded audio signals with the target microphone configurations and specified acoustic scenarios may be unavailable, resulting in the use of simulated data for evaluation. Although simulated data can be useful in the evaluation of initial proof-of-concept ideas, it does not necessarily provide an accurate indication of whether the algorithm will perform well in real-world conditions.
In an effort to overcome these challenges and to encourage the use of more realistic data, several high-quality acoustic databases containing room impulse responses (RIRs) [6, 9, 17,18,19,20,21,22,23,24,25,26,27], speech [6, 9, 10, 15, 20, 22, 23], music [20], and babble or cocktail party noise [22, 28, 29] have been developed over the years, which have played an important role in building confidence in the real-world performance of various acoustic signal processing algorithms.

In this paper, we present another complementary database of acoustic recordings from multiple microphones in various acoustic scenarios, referred to as the Multi-arraY Room Acoustic Database (MYRiAD). In comparison to the existing databases, the MYRiAD database is unique in its diversity of the employed microphone configurations suiting a wide range of applications, the acoustics of the recording spaces, and the variety of signals contained in the database, which includes RIRs, recordings of reproduced speech, music, and stationary noise, as well as recordings of live cocktail parties.

The database consists specifically of two different microphone configurations used across two different rooms. The first microphone configuration consists of a dummy head (DH) with in-ear omnidirectional microphones, two behind-the-ear (BTE) pieces mounted on the DH, each equipped with 2 omnidirectional microphones, as well as 5 external omnidirectional microphones (XMs) located at various distances and angles from the DH. This microphone configuration will be referred to as M1. The second microphone configuration consists of two concentric circular microphone arrays (CMAs) with in total 12 omnidirectional microphones, which will be referred to as M2. Audio recordings were made in the following two rooms: (i) the SONORA Audio Laboratory [35], located at the Department of Electrical Engineering (ESAT-STADIUS), KU Leuven, Belgium, which we will refer to as the SAL, and (ii) the Alamire Interactive Laboratory [35], located at the Park Abbey in Heverlee, Belgium, referred to as the AIL. The main acoustical difference between these two rooms is that the SAL is significantly more reverberant than the AIL, with reverberation times of 2.1 s and 0.5 s, respectively. In the SAL, the microphone configuration M1 was used in one position, and in the AIL, a combination of microphone configurations M1 and M2 was used in two positions. In terms of sound generation, 10 different movable loudspeakers were used as artificial sound sources in the SAL, while the AIL is equipped with an array of 24 loudspeakers.

Fig. 1 Fisheye view of the SAL and the AIL

Table 1 Equipment used for creating the database

The following audio signals were played back through the loudspeakers and recorded by the microphones: exponential sine sweeps used to compute RIRs [36] between source and microphone positions, resulting in 110 RIRs for the SAL and 1104 RIRs for the AIL, as well as three male and three female speech segments [37], a drum beat [38], a piano piece [39], and speech-shaped stationary noise. Additionally, in both rooms, several participants were invited to re-create a live cocktail party scenario. The resulting noise from the different cocktail parties held in each of the spaces was recorded for both microphone configurations.

In total, the MYRiAD database contains 76 h of audio data sampled at 44.1 kHz in 24 bit, which results in 36.2 GB. All computed RIRs and recorded signals are available in the database and can be downloaded [43]. MATLAB and Python scripts are included in the database for accessing the signals and corresponding microphone and loudspeaker coordinates.

The remaining sections of this paper provide a detailed overview of the database and are organized as follows. In Section 2, an overview of the two different rooms, the SAL and the AIL, is presented. In Section 3, a detailed description is given of the equipment used. In Section 4, the microphone and loudspeaker configurations within the two rooms are discussed. In Section 5, an overview is given of the recorded signals, details of the cocktail party, and the computed RIRs. In Section 6, practical instructions for using the database are provided, along with a description of relevant MATLAB and Python scripts, and some examples from the database are illustrated. In Section 7, the database is briefly summarized.

Fig. 2 Dummy BTE pieces used for creating the database. Each BTE piece consists of two omnidirectional microphones as indicated by the circles

Fig. 3 Plan view of the M1 microphone configuration and the LS-SAL loudspeaker configuration. A description of the microphone and loudspeaker labels is given in Table 2. The radial grid spacing of the polar plot is 0.25 m. The DH is placed at ear level, approximately 1.3 m from the floor, and all XMs are placed at a height of approximately 1 m from the floor. The trapezoidal shape is used to represent the M1 microphone configuration in the floor plans of Fig. 6. For extracting the coordinates of the microphone and loudspeaker positions, the MATLAB or Python scripts discussed in Section 6.2 should be used

2 Room description

In this section, we provide a brief overview of the characteristics of the two recording rooms. The SAL is described in Section 2.1 and the AIL in Section 2.2.

2.1 SONORA Audio Laboratory (SAL)

The SAL [35] is located at the Department of Electrical Engineering (ESAT-STADIUS), KU Leuven, Heverlee, Belgium. Figure 1 shows a fisheye view and Fig. 6 shows a floor plan of the L-shaped SAL with approximate dimensions. The height of the room is 3.75 m, yielding a volume of approximately 102 m\(^{3}\). The walls and ceiling are made of plasterboard covering mineral wool, while the floor is made of concrete covered with vinyl. Two windows, each of 4 m\(^{2}\), are located on one side of the room. Adjacent to the recording room, separated by glass of area 6.5 m\(^{2}\), is the control room, where all the acquisition equipment and a computer are located. From the RIRs measured in the SAL, we estimated the reverberation time \(\mathrm {T_{20}}\) to be 2.1 s as described in Section 6.4. Details on the audio hardware used in the SAL are given in Section 3, while the microphone and loudspeaker configuration and placement are described in Section 4.1.1, Section 4.2.1, and Section 4.3.

2.2 Alamire Interactive Laboratory (AIL)

The AIL [35] is located in a historic gate building, the Saint Norbert’s gate of the Park Abbey in Heverlee, Belgium. Figure 1 shows a fisheye view and Fig. 6 shows a floor plan of the room. Apart from a staircase leading to the floor above, the room is approximately shoe-box shaped, with a width of 6.4 m, a depth of 6.9 m, and a height of 4.7 m, yielding a volume of approximately 208 m\(^{3}\). The floor and ceiling are made of wood. The room is enclosed by thin plastered brick walls, with two windows of about 3.3 m\(^{2}\) each at both the front and the back, and wide passages to adjacent rooms, one of which is closed by a glass door. These passages were closed off with curtains during recording, except for a part of the cocktail party noise, cf. Section 5.3. The housing of the staircase is plastered, the stairs are wooden, and the railing is made of glass. From the RIRs measured in the AIL, the reverberation time \(\mathrm {T_{20}}\) is estimated to be 0.5 s, cf. Section 6.4. The AIL is equipped with a permanent, fixed array of 24 loudspeakers for spatial audio reproduction, as shown in Fig. 1. Further details on the audio hardware used in the AIL are given in Section 3, while the microphone and loudspeaker configuration and placement are described in Section 4.1.1, Section 4.1.2, Section 4.2.2, and Section 4.3.

3 Recording equipment

A list of the recording and processing equipment used to create the database is shown in Table 1. In regard to the microphones, the DH contains 2 in-ear omnidirectional microphones (one for each ear), and the two BTE pieces (one for each ear) are each equipped with 2 omnidirectional microphones. The BTE pieces and their proprietary pre-amplifier were provided by Cochlear Ltd. and are shown in Fig. 2. The specific loudspeaker and microphone configurations used for the various recordings in the database will be outlined in Section 4, and the file naming conventions will be defined in Section 6.

The recording chains were built as follows. As the digital audio workstations for sending and acquiring the signals, Logic Pro X and Adobe Audition on an iMac were used in the SAL and the AIL, respectively. In the SAL, the signals were sent from Logic Pro X via USB to the RME Digiface, then to the RME M-32 DA using the ADAT protocol, and finally to the respective Genelec 8030 CP loudspeakers. In the AIL, the signals were sent from Adobe Audition via the DANTE protocol to the Powersoft OTTOCANALI 4K4 DSP+D and finally to the Martin Audio CDD6 loudspeakers. In both rooms, all microphone signals were sent to an RME Micstasy (except for the BTE microphone signals, which were first routed to the proprietary pre-amplifier) and converted to ADAT. In the SAL, the ADAT signals were sent to the RME Digiface and finally recorded in Logic Pro X, whereas in the AIL, the ADAT signals were sent to the Ferrofish Verto 64 and via DANTE to Adobe Audition. The various types of recorded signals are outlined in Section 5. For post-processing (such as RIR computation, cf. Section 5), MATLAB and Python were used.

4 Microphone and loudspeaker configurations

This section describes the microphone configurations in Section 4.1, the loudspeaker configurations in Section 4.2, and the placement of these configurations within the SAL and AIL in Section 4.3. The exact coordinates of the loudspeaker and microphone positions within the SAL and AIL from the various configurations can be loaded from the database, but the details of this procedure will be elaborated upon in Section 6.

4.1 Microphone configurations

4.1.1 M1

The first microphone configuration, M1, consists of the in-ear microphones from the DH, the microphones from the BTE pieces, three AKG CK97-O microphones, and two AKG CK32 microphones. As the AKG CK97-O and AKG CK32 microphones are not mounted on the DH, they are considered to be “external” in relation to the DH and hence will be referred to as external microphones (XMs). This M1 configuration was used in both the SAL and the AIL, cf. Section 4.3. Figure 3 depicts the plan view of the measurement configuration of the loudspeakers and microphones used for the audio recordings made in the SAL. For now, however, we will focus only on the trapezoidal shape enclosing the microphones, which is a depiction of the M1 configuration. A description of the corresponding microphone labels is given in Table 2.

For this M1 configuration, the DH is placed at ear level, approximately 1.3 m from the floor. Each of the BTE pieces is mounted on the DH as shown in Fig. 2. The XMs are placed within a radius of 1 m from the DH as shown in Fig. 3. XM1, XM2, and XM3 are AKG CK97-O microphones, while XM4 and XM5 are AKG CK32 microphones. The XMs are all positioned 1 m above the floor.

Table 2 Microphone and loudspeaker labels
Fig. 4 Plan view of the M2 microphone configuration. A description of the microphone labels is given in Table 2. The radial grid spacing of the polar plot is 0.1 m. DPA 4060 microphones are used for the inner circular microphone array and AKG CK32 microphones are used for the outer circular microphone array. The circle drawn around the microphones represents the M2 microphone configuration in the floor plans in Fig. 6. For extracting more precise coordinates of the microphone and loudspeaker positions, the MATLAB or Python scripts discussed in Section 6.2 should be used

Fig. 5 View of the LS-AIL loudspeaker array in the AIL. A description of the loudspeaker labels is given in Table 2. The speakers are organized in three different height levels of about 1.5 m (lower level), 3.3 m (upper level), and 4.1 m (top level) above the floor. The axes limits coincide with the boundaries of the approximately shoe-box shaped room, cf. Section 2.2. On the horizontal axes, the approximate distance between neighboring speakers is indicated. The given dimensions are indicative and not exact; for extracting the coordinates of the microphone and loudspeaker positions, the MATLAB or Python scripts discussed in Section 6.2 should be used

4.1.2 M2

The second microphone configuration, M2, consists of two concentric circular microphone arrays (CMAs) composed of 4 DPA 4060 and 8 AKG CK32 microphones. Figure 4 shows a plan view of the M2 configuration, and a description of the microphone labels is given in Table 2. The inner circular microphone array has a radius of 10 cm and consists of 4 equidistantly placed DPA 4060 microphones. The outer circular microphone array has a radius of 20 cm and consists of 8 equidistantly placed AKG CK32 microphones. The microphones are all placed at a height of 1 m above the floor using a holder made of laser-cut acrylic glass, centered around the stand of the DH of the M1 configuration. This M2 configuration was used at two different positions within the AIL, always in combination with M1 as depicted in Fig. 6. It should be noted that since M2 was used in combination with M1, it is also possible to define arrays that contain microphones of both configurations, such as a linear array composed of CMA20_180, CMA10_180, XM1, CMA10_0, CMA20_0, XM2, and XM3.
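The nominal geometry of the two CMAs is simple to reproduce. The following sketch (with a hypothetical function name; the precise measured coordinates should be taken from the scripts discussed in Section 6.2) computes microphone positions equally spaced on the two concentric circles:

```python
import numpy as np

def cma_positions(radius_m, n_mics, height_m=1.0, offset_deg=0.0):
    """Coordinates of n_mics microphones equally spaced on a circle,
    centered at the origin, at the given height above the floor."""
    angles = np.deg2rad(offset_deg + 360.0 * np.arange(n_mics) / n_mics)
    return np.column_stack([radius_m * np.cos(angles),
                            radius_m * np.sin(angles),
                            np.full(n_mics, height_m)])

inner = cma_positions(0.10, 4)   # 4 DPA 4060 mics, 10 cm radius
outer = cma_positions(0.20, 8)   # 8 AKG CK32 mics, 20 cm radius
```

The angular offset of the physical arrays relative to the room axes may differ from zero; the offset_deg parameter can be adjusted accordingly.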

4.2 Loudspeaker configurations

4.2.1 LS-SAL

The loudspeaker configuration LS-SAL, as the name suggests, is used in the SAL only. It is defined relative to the M1 microphone configuration and consists of 10 loudspeakers. The loudspeakers are positioned at various spatial locations at a height such that the center of each woofer is approximately 1.3 m above the floor. Figure 3 is a plan view of this LS-SAL loudspeaker configuration along with the M1 microphone configuration. A description of the loudspeaker labels is also provided in Table 2. During recordings, the loudspeaker S0_1 was removed before recording the signals for the loudspeaker S0_2 so that there was a direct line of sight from the latter to the DH.

4.2.2 LS-AIL

The loudspeaker configuration LS-AIL is a 24-loudspeaker array, permanently installed in the AIL, cf. Fig. 1, which is typically used for spatial sound reproduction. Figure 5 shows the geometry of the loudspeaker array. The loudspeakers are labeled as described in Fig. 5 and Table 2. The width and depth of the array are approximately 5.6 m and 4.85 m, respectively, and the loudspeakers are arranged in three groups at different height levels, referred to as the lower, upper, and top levels. The lower level consists of 8 speakers located around the room along the walls at about 1.5 m height, the upper level, containing 12 speakers, is located above at about 3.3 m height, and the top level, containing 4 speakers, is located more centrally at about 4.1 m height. Note that for the sake of simplicity, the presented locations are only approximate. Using measurements of the distances between the speakers and a set of four reference points on the floor with known coordinates, the exact coordinates of the loudspeakers have been estimated based on the theory of Euclidean distance matrices [40]. All microphone and loudspeaker coordinates can be loaded from the database as discussed in Section 6.2.
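The core of such an estimation can be illustrated with classical multidimensional scaling. The sketch below is a simplified, hypothetical version: it recovers point coordinates (up to a rigid transform) from a complete matrix of pairwise distances, whereas the actual procedure in [40] additionally anchors the solution to the four known reference points on the floor:

```python
import numpy as np

def positions_from_edm(D, dim=3):
    """Classical multidimensional scaling: recover point coordinates
    (up to rotation, reflection, and translation) from an n-by-n
    Euclidean distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    G = -0.5 * J @ (D ** 2) @ J              # Gram matrix of centered points
    w, V = np.linalg.eigh(G)
    idx = np.argsort(w)[::-1][:dim]          # keep the largest eigenvalues
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))
```

Since the recovered coordinates are only determined up to a rigid transform, the known reference points are what fix the solution in the room's coordinate frame.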

4.3 Microphone and loudspeaker configuration placement

Figure 6 illustrates the placement of the M1 microphone configuration as well as the LS-SAL loudspeaker configuration within the SAL at a recording position near the corner of the L-shaped room.

Fig. 6 Microphone and loudspeaker configuration placement. (Left) Placement of the M1 microphone configuration and the LS-SAL loudspeaker configuration within the SAL. (Right) Placement of the M1 and M2 microphone configurations in P1 and P2 as well as the lower level of the LS-AIL loudspeaker configuration within the AIL. Details of the M1 and M2 microphone configurations and the LS-SAL and LS-AIL loudspeaker configurations can be seen in Figs. 3, 4, and 5. For extracting the coordinates of the microphone and loudspeaker positions, the MATLAB or Python scripts discussed in Section 6.2 should be used

Fig. 7 A combination of the microphone configurations M1 and M2 as used at the AIL

Figure 6 shows a floor plan of the setups M1 and M2 within the AIL, together with the lower speakers of the LS-AIL loudspeaker array. As can be seen, there are two recording positions in the AIL, referred to as P1 and P2, with the DH facing the speakers SU6 and SU7, located roughly below ST2 and ST1 (not shown in the figure), respectively. In both recording positions, both microphone configurations M1 and M2 are used, with the stand of the DH of M1 being the center of the circular microphone arrays of M2. Figure 7 shows a combination of M1 and M2 as used in position P2.

The coordinates of all speakers and microphones in both rooms can be loaded from the database using MATLAB or Python, cf. Section 6.2.

Table 3 Signals recorded and computed in the database
Fig. 8 Cocktail party recordings at the SAL and the AIL

5 Recorded signals

The MYRiAD database contains 76 h of audio data and has a size of 36.2 GB. All microphone signals in the database are provided at a sampling frequency of 44.1 kHz with a 24 bit resolution. Their gains are set such that the recording level across the different microphone models is approximately the same around 1 kHz in diffuse noise. For the sake of consistency, recordings were done simultaneously for all microphones in the SAL as well as in each of the two recording positions P1 and P2 in the AIL. A summary of the signals recorded and computed, along with the quantity of each (i.e., the number of different instances of that type of signal), their duration, their source, their acquisition method (i.e., how the signals were generated), the employed loudspeakers, and a signal label is provided in Table 3. In the remainder of this section, we discuss in more detail the RIR measurements in Section 5.1, the recorded speech, noise, and music signals in Section 5.2, and the recorded cocktail party in Section 5.3.

5.1 Room impulse responses

The database includes in total 110 RIRs from the SAL and 1104 RIRs from the AIL. To obtain the RIRs, two exponential sine sweep signals were played and recorded for each loudspeaker-microphone combination. In the AIL, the sides of the room were closed off with curtains during the recording. From these sine sweeps, the RIRs were computed by cross-correlation according to the procedure detailed in [36]. From each pair of recorded sine sweeps, one was selected for RIR estimation by visual inspection of the spectrograms (more specifically, spectrograms containing any type of non-stationary noise were discarded). In order to obtain RIRs that are as clean as possible, some of the recorded sine sweeps were post-processed so as to suppress low-level (stationary) harmonic noise components produced by the recording equipment. In this post-processing procedure, frequency bins containing harmonic noise components were identified during silence by comparing their magnitude to the median magnitude of neighboring frequency bins. If the difference was above the threshold of 4 dB, a Wiener filter [1] was applied in that frequency bin. The recorded signals were further post-processed to remove the input-output delay caused by the recording hardware.
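The sweep-based RIR computation can be sketched as follows, under the exponential sine sweep model of [36]. This is a simplified, hypothetical reimplementation (it omits the harmonic-noise suppression and delay-removal post-processing described above):

```python
import numpy as np

def exp_sweep(f1, f2, T, fs):
    """Exponential sine sweep from f1 to f2 Hz over T seconds."""
    t = np.arange(int(T * fs)) / fs
    L = T / np.log(f2 / f1)
    return np.sin(2 * np.pi * f1 * L * (np.exp(t / L) - 1.0))

def inverse_filter(sweep, f1, f2, fs):
    """Time-reversed sweep with an amplitude envelope that compensates
    the sweep's 1/f magnitude spectrum (low frequencies attenuated)."""
    t = np.arange(len(sweep)) / fs
    L = (len(sweep) / fs) / np.log(f2 / f1)
    return sweep[::-1] * np.exp(-t / L)

def fft_conv(x, y):
    """Linear convolution via FFT."""
    n = len(x) + len(y) - 1
    nfft = 1 << (n - 1).bit_length()
    return np.fft.irfft(np.fft.rfft(x, nfft) * np.fft.rfft(y, nfft))[:n]

def estimate_rir(recording, sweep, f1, f2, fs):
    """Deconvolve a recorded sweep; the RIR appears starting around
    sample index len(sweep) - 1 of the result."""
    h = fft_conv(recording, inverse_filter(sweep, f1, f2, fs))
    return h / np.max(np.abs(h))
```

One property of this method worth noting is that loudspeaker harmonic distortion products are mapped to negative times in the deconvolution result, ahead of the linear RIR, which is part of why the exponential sweep is preferred over other excitation signals.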

5.2 Speech, noise, music

Speech, stationary noise, and music signals were played through the loudspeakers indicated in Table 3 and recorded by all microphones. Three male and three female speech segments were chosen randomly from the Centre for Speech Technology Research (CSTR) Voice Cloning Toolkit (VCTK) corpus [37]. The stationary noise source signal has a speech-shaped spectrum and was generated in MATLAB based on speech spectra from the VCTK corpus. The drum piece was taken from the studio recording sessions in [38]. The piano piece is track 60 (Schubert) from the European Broadcast Union Sound Quality Assessment Material Recordings for Subjective Tests (EBU SQAM) [39]. In the AIL, the sides of the room were closed off with curtains during recording. These signals were acquired for all loudspeakers in the SAL, but only for the lower loudspeaker level in the AIL, i.e., SL1 to SL8 (in contrast to the RIRs, which were computed for all possible loudspeaker-microphone combinations, cf. Section 5.1). The recorded signals were post-processed to remove the input-output delay caused by the recording hardware. For the signals recorded in the SAL, a slow phase drift was observed between the recorded data and simulated data obtained by convolving the estimated RIR with the source signal, cf. Section 6.3. This phase drift can be attributed to hardware limitations in the recording setup and has been compensated for by time-shifting some of the recorded signals so as to minimize the error between the recorded and the convolved data. For the signals recorded in the AIL, no phase drift was observed. Both the source signals and the recorded signals are included in the database.
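A much simplified version of such a time-shift compensation can be sketched as follows: the integer lag maximizing the cross-correlation between the recorded and convolved signals is estimated and then undone. The function names are hypothetical, and unlike the paper's procedure this version applies a single whole-signal shift rather than tracking a slow drift:

```python
import numpy as np

def best_lag(x, y, max_lag):
    """Integer lag maximizing the cross-correlation of x and y over
    [-max_lag, max_lag]; positive means x is delayed relative to y."""
    c = np.correlate(x, y, mode="full")
    center = len(y) - 1                       # index of zero lag
    window = c[center - max_lag:center + max_lag + 1]
    return int(np.argmax(window)) - max_lag

def align(recorded, simulated, max_lag=2205):
    """Shift `recorded` to undo an integer-sample offset relative to
    `simulated` (2205 samples = 50 ms at 44.1 kHz, an arbitrary bound)."""
    lag = best_lag(recorded, simulated, max_lag)
    return np.roll(recorded, -lag), lag
```

A drift that varies over time would instead be estimated and compensated segment by segment, with sub-sample shifts if needed.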

5.3 Cocktail party

In addition to the aforementioned signals, a cocktail party scenario was re-created and recorded in both the SAL and the AIL. All participants gave informed consent. They were instructed to stay at least 1 m away from the DH in both rooms and to periodically move around in a random manner while engaging in conversation. Snacks and beverages in glasses were also served to the participants during the recordings. For the SAL cocktail party, at any given time, there were at least 15 people present in the room, whereas for the AIL cocktail party, there were at least 10 and at most 14 people present. In the SAL, the microphone configuration M1 located as shown in Fig. 6 was used (the loudspeakers were removed from the room). In the AIL, the microphone configurations M1 and M2 located in position P2 as shown in Fig. 6 were used. The curtains on the sides of the room in the AIL were closed during the recordings of CP1, CP2, and CP3 and open during CP4, CP5, and CP6. Photos from the cocktail parties in the SAL and AIL are shown in Fig. 8.

Table 4 File path structure of the database
Table 5 Scripts facilitating the use of the database
Fig. 9 Waveform and corresponding spectrogram of signals related to the SAL recordings. a First 10 s of the source signal corresponding to a female speaker, F1 (cf. Table 3), b computed RIR from the loudspeaker S0_1 to microphone BTELF (cf. Fig. 3), c recorded microphone BTELF signal after the signal from a was played through the loudspeaker S0_1, d simulated signal from the convolution of a and b, e error between signals c and d

Fig. 10 Waveform and corresponding spectrogram of signals related to the AIL recordings. a First 10 s of the source signal corresponding to a female speaker, F1 (cf. Table 3), b computed RIR from the loudspeaker SL5_1 to microphone BTELF, c recorded microphone BTELF signal after the signal from a was played through the loudspeaker SL5_1, d simulated signal from the convolution of a and b, e error between signals c and d

Fig. 11 Waveform and corresponding spectrogram for a 15 s sample of the cocktail party noise. (Left) Signal CP2 for XM2 in the SAL. (Right) Signal CP5 for XM2 in the AIL

Fig. 12 Reverberation time \(\mathrm {T_{20}}\) for the two rooms SAL and AIL at full bandwidth and in different octave bands. The error bars indicate the standard deviation of the estimate across all possible loudspeaker-microphone combinations

6 Using the database

In this section, we elaborate on the file path structure of the database in Section 6.1 as well as the code provided for loading audio signals and retrieving loudspeaker and microphone coordinates in Section 6.2, and present some examples of audio signals in Section 6.3 and reverberation time estimates in Section 6.4.

6.1 File path structure

Table 4 provides an overview of the directory tree for the database. Audio files are located in the root directory /audio/, with loudspeaker source signals in the subfolder SRC/ and recorded microphone signals in the subfolders SAL/ and AIL/. The recorded microphone signals are further organized by loudspeaker (except for cocktail party recordings) and microphone configuration placement (in the AIL). The file names encode both the microphone and signal type. Note that not all folders contain all possible combinations of microphones and signals. For instance, the folder /audio/SAL/CP/ contains only files of signal type CP\(^*\), and the folders in /audio/AIL/SU\(^*\)/ and /audio/AIL/ST\(^*\)/ only contain files of signal type RIR, cf. Section 5.2.

The folder /coord/ contains files with coordinates of all speakers and microphones in both the SAL and the AIL, and the folder /tools/ contains MATLAB and Python scripts for accessing audio data and coordinates, cf. Section 6.2.

6.2 Creating microphone signals and retrieving coordinates

The database comes with MATLAB and Python scripts intended to facilitate retrieving loudspeaker and microphone coordinates and generating signals, as listed in Table 5.

The script load_audio_data is an example script demonstrating how a .wav-file can be loaded given a list of loudspeaker, microphone, and signal labels provided by the user. This script also calls the function load_coordinates(), which reads corresponding coordinates from SAL.csv or AIL.csv (cf. Table 4) and optionally visualizes them.
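As an illustration of the kind of access these scripts provide, the following hypothetical sketch reads a coordinates file into a dictionary. The column names assumed here (label, x, y, z) are an assumption and may differ from the actual layout of SAL.csv and AIL.csv; in practice, the provided load_coordinates() function should be used:

```python
import csv

def load_coordinates_csv(path):
    """Read a coordinates CSV into {label: (x, y, z)} in meters.
    Assumes columns named 'label', 'x', 'y', 'z' (hypothetical)."""
    coords = {}
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            coords[row["label"]] = (float(row["x"]),
                                    float(row["y"]),
                                    float(row["z"]))
    return coords
```

The returned dictionary can then be indexed by the microphone and loudspeaker labels of Table 2, e.g., to compute source-receiver distances.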

6.3 Examples of the audio signals

In this section, we take a glimpse into the database by examining some of the signals from both the SAL and the AIL, which also makes evident the different acoustics of the two spaces.

Figure 9 displays the waveform (top of each sub-figure) and corresponding spectrogram (bottom of each sub-figure) for a number of signals related to the SAL. The colourmap in the spectrograms corresponds to the squared magnitude of the short-time Fourier transform coefficients and is plotted in dB. Figure 9a shows the first 10 s of the source signal corresponding to a female speaker, F1 (cf. Table 3). Figure 9b shows a computed RIR in the SAL from the loudspeaker S0_1 to microphone BTELF (cf. Fig. 3), where the reverberation time is seen to be quite long and highly frequency-dependent. Figure 9c shows the recorded signal at microphone BTELF after the source signal F1 (from Fig. 9a) was played through the loudspeaker S0_1. The effect of the reverberation is evident, as the spectrogram shows how the source signal has been distorted in both time and frequency. Figure 9d shows the result of a convolution between the RIR from loudspeaker S0_1 to microphone BTELF (Fig. 9b) and the F1 source signal (Fig. 9a). This signal is representative of how the recorded signal from Fig. 9c would typically be simulated. As should be expected, Fig. 9c and d appear quite similar. However, Fig. 9e illustrates the difference (error) between the waveform plots in Fig. 9c and Fig. 9d, with the corresponding spectrogram of this error, demonstrating that the simulated and recorded signals are not identical. The error may be due to a variety of reasons such as acoustic noise, loudspeaker non-linearities, recording hardware limitations including slow phase drifts, cf. Section 5.2, and slowly time-variant as well as not perfectly linear sound propagation.

Figure 10 displays signals from the AIL in a similar manner to Fig. 9. Figure 10a shows the first 10 s of the same source signal, F1 (cf. Table 3). Figure 10b shows a computed RIR in the AIL from the loudspeaker SL5_1 to microphone BTELF, where it can be observed that the reverberation time is significantly shorter than in the SAL and more uniform across frequency. Figure 10c shows the recorded signal at microphone BTELF after the source signal F1 (from Fig. 10a) was played through the loudspeaker SL5_1. Figure 10d shows the result of a convolution between the RIR from loudspeaker SL5_1 to microphone BTELF (Fig. 10b) and the F1 source signal (Fig. 10a). Figure 10e shows the difference (error) between the waveform plots in Fig. 10c and d. It can once again be observed that although the simulated and recorded signals are quite similar, they are not identical.
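The comparison underlying Figs. 9 and 10 can be reproduced along the following lines. This is a minimal sketch assuming the source signal, RIR, and recording are already loaded as NumPy arrays; the error-to-signal ratio is our own illustrative summary metric, not one reported in the paper:

```python
import numpy as np

def simulate_and_error(source, rir, recorded):
    """Simulate a microphone signal by convolving the source with the
    RIR, and return the residual against the actual recording along
    with the error-to-signal ratio in dB (more negative is better)."""
    simulated = np.convolve(source, rir)
    n = min(len(simulated), len(recorded))
    error = recorded[:n] - simulated[:n]
    esr_db = 10 * np.log10(np.sum(error ** 2) / np.sum(recorded[:n] ** 2))
    return simulated[:n], error, esr_db
```

For real recordings, a non-zero residual is expected for the reasons listed above (acoustic noise, loudspeaker non-linearities, hardware drift, and slowly time-variant propagation).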

Figure 11 depicts the waveform and corresponding spectrogram from a 15 s sample of the cocktail party noise. The left of Fig. 11 is the signal CP2 (cf. Table 3) for microphone XM2 in the SAL and the right of Fig. 11 is the signal CP5 from XM2 in the AIL. The non-stationary behavior of this type of noise over time and frequency is quite evident.

6.4 Reverberation times

The reverberation time \(\mathrm {T_{20}}\) for the two rooms SAL and AIL is estimated at full bandwidth as well as in different octave bands. The estimate is obtained from the slope of a line fitted to the decay curves of the RIRs according to the ISO standard [41] and using the code in [42]. Here, the line was fitted in the dynamic range between − 5 dB and − 25 dB of the decay curve. A plot of the estimated reverberation times is shown in Fig. 12. As can be seen, the full-band reverberation time is significantly higher in the SAL, at 2.1 s, than in the AIL, at 0.5 s. We further note that \({\mathrm T}_{20}\) in the SAL is largest between 1 and 2 kHz and decreases rapidly above this range, while it is less dependent on frequency in the AIL. While in the AIL the variance of the \(\mathrm {T_{20}}\) estimates continuously decreases with frequency, we observe that it increases again above 2 kHz in the SAL. This may be due to an observed magnitude decay of the SAL RIRs above 2 kHz, resulting in less accurate line fitting. In addition, the increased directivity of the loudspeakers at higher frequencies may result in stronger variations of the generated sound field with regard to the loudspeaker placement.
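A minimal sketch of this \(\mathrm {T_{20}}\) estimation (full-band, single RIR) follows. It uses Schroeder backward integration and a least-squares line fit between − 5 dB and − 25 dB, extrapolated to a 60 dB decay; the database results were instead obtained with the ISO-compliant code in [42] and averaged over loudspeaker-microphone combinations:

```python
import numpy as np

def t20_from_rir(h, fs):
    """T20 estimate from an RIR: Schroeder backward integration of the
    squared RIR, then a line fit to the decay curve between -5 dB and
    -25 dB, with the slope extrapolated to a 60 dB decay."""
    edc = np.cumsum(h[::-1] ** 2)[::-1]            # energy decay curve
    edc_db = 10 * np.log10(edc / edc[0])
    i5 = int(np.argmax(edc_db <= -5.0))            # start of fit range
    i25 = int(np.argmax(edc_db <= -25.0))          # end of fit range
    t = np.arange(len(h)) / fs
    slope, _ = np.polyfit(t[i5:i25], edc_db[i5:i25], 1)   # dB per second
    return -60.0 / slope
```

For octave-band estimates as in Fig. 12, the RIR would first be band-pass filtered before applying the same procedure, and truncation at the noise floor would need to be handled for measured (rather than synthetic) RIRs.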

7 Conclusion

In this paper, a database of acoustic recordings, referred to as the Multi-arraY Room Acoustic Database (MYRiAD), has been presented, which facilitates the recreation of noisy and reverberant microphone signals for the purpose of evaluating audio signal processing algorithms. Recordings were made in two different rooms, the SONORA Audio Laboratory (SAL) and the Alamire Interactive Laboratory (AIL), with significantly different reverberation times of 2.1 s and 0.5 s, respectively. In the SAL, a microphone configuration, M1, was used, which consists of in-ear dummy head microphones, microphones on behind-the-ear pieces placed on the dummy head, and external microphones (i.e., other microphones in the room). In the AIL, recordings were made in two different positions within the room using the microphone configuration M1 along with a second microphone configuration, M2, which consists of two concentric circular microphone arrays. In the SAL, 10 movable loudspeakers were used for sound generation, while in the AIL, a built-in array of 24 loudspeakers was used. The database contains room impulse responses, speech, music, and stationary noise signals, as well as recordings of a live cocktail party held in each room. MATLAB and Python scripts are included for accessing audio data and coordinates. The database is publicly available at [43].