Starting from classic listening experiments on stereo panning by Leakey [2], Wendt [3], and pairwise horizontal panning by Theile [4], this chapter explores the relevant perceptual properties for 3D amplitude panning and their models. Important experimental studies considered here are, for instance, those by Simon [5], Kimura [6], F. Wendt [7], Lee [8], Helm [9], and Frank [10, 11]. Based on the experimental results, it is possible to firmly establish Gerzon’s [1] E, \(\varvec{r}_\mathrm {E}\), and \(\Vert \varvec{r}_\mathrm {E}\Vert \) estimators for perceived loudness, direction, and width that apply to most stationary sounds in typical studio and performance environments.

2.1 Loudness

At a measurement point in the free field, the same signal fed to equalized loudspeakers of exactly the same acoustic distance would superimpose constructively (\(+6\) dB).

In a room with early reflections and a less strict equality of the incoming pair of sounds (typical, slight inaccuracy in loudspeaker/listener position, different mounting situations, different directions in the directivities of ears and loudspeakers), the superposition can be regarded as stochastically constructive (\(+3\) dB) in particular at frequencies that aren’t very low.

For the above reasoning, typical amplitude panning rules try to keep the weights distributing the signal to the loudspeakers normalized by root of squares instead of normalizing to the linear sum, in order to obtain constant loudness ([12], VBAP):

$$\begin{aligned} g_l\leftarrow \frac{g_l}{\sqrt{\sum _{l=1}^\mathrm {L} g_l^2}}. \end{aligned}$$
(2.1)

Loudness Model. If all loudspeakers are equalized, located at the same distance to the listener, and fed by the same signal with different amplitude gains \(g_l\), a constructive interference could be expected so that the amplitude becomes [1]

$$\begin{aligned} P=\sum _{l=1}^\mathrm {L} g_l. \end{aligned}$$
(2.2)

However, the interference ceases to be strictly constructive as soon as the room is not entirely anechoic, the sitting position is not exactly centered, or even for anechoic and centered conditions at high frequencies, when the superposition at the ears cannot be assumed to be purely constructive anymore. Then it is better to assume a less well-defined, stochastic superposition in which a squared amplitude is determined by the sum of the squared weights [1]:

$$\begin{aligned} E=\sum _{l=1}^\mathrm {L} g_l^2. \end{aligned}$$
(2.3)

Therefore, the most common amplitude panning rules use root-squares normalization to obtain a loudness impression that is as constant as possible.
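The two superposition assumptions and the normalization of Eq. (2.1) can be compared numerically. The following is a minimal sketch; the function names are chosen here for illustration and do not come from the cited works.

```python
import math

def amplitude_sum(gains):
    """P of Eq. (2.2): strictly constructive superposition."""
    return sum(gains)

def energy_sum(gains):
    """E of Eq. (2.3): stochastic superposition of squared weights."""
    return sum(g * g for g in gains)

def normalize_energy(gains):
    """Eq. (2.1): scale the gains so that E equals 1."""
    norm = math.sqrt(energy_sum(gains))
    if norm == 0.0:
        raise ValueError("at least one gain must be non-zero")
    return [g / norm for g in gains]

def amplitude_db(amplitude):
    """Level of an amplitude ratio in dB."""
    return 20.0 * math.log10(amplitude)

single = normalize_energy([1.0, 0.0])  # source on one loudspeaker
pair = normalize_energy([1.0, 1.0])    # source centered between two loudspeakers

# E stays constant, while P would differ by about 3 dB between both cases.
print(energy_sum(single), energy_sum(pair))
print(amplitude_db(amplitude_sum(pair)) - amplitude_db(amplitude_sum(single)))
```

With root-of-squares normalization, E is identical for both panning positions, whereas a strictly constructive superposition would make the centered source about 3 dB louder than the single loudspeaker.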

The measure E seems to be most useful when designing and evaluating amplitude-panning or coincident microphone techniques. It is not surprising that ITU-R BS.1770-4 uses the Leq(RLB) measure as a loudness model: it is essentially the RMS level after high-pass filtering, cf. [13], which is closely related to the E measure detected from loudspeaker signals.

An interesting refinement was proposed by Laitinen et al. [14], which uses a measure \(\root p \of {\sum _{l=1}^\mathrm {L}g_l^p}\) in which the exponent p is close to 1 at low frequencies under anechoic conditions and close to 2 at high frequencies/under reverberant conditions.
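A minimal sketch of such a p-norm normalization is given below. How the exponent p is chosen per frequency band is not specified here, so it is left as a plain parameter; the function names are illustrative assumptions.

```python
def p_norm(gains, p):
    """Generalized measure: p-th root of the sum of p-th powers of the gains."""
    return sum(abs(g) ** p for g in gains) ** (1.0 / p)

def normalize_p(gains, p):
    """Scale the gains so that their p-norm equals 1."""
    norm = p_norm(gains, p)
    if norm == 0.0:
        raise ValueError("at least one gain must be non-zero")
    return [g / norm for g in gains]

# p = 1 corresponds to the linear-sum normalization, p = 2 to Eq. (2.1).
low_frequency_gains = normalize_p([1.0, 1.0], p=1.0)
high_frequency_gains = normalize_p([1.0, 1.0], p=2.0)
print(low_frequency_gains, high_frequency_gains)
```

For two equal gains, p = 1 yields 0.5 per loudspeaker, whereas p = 2 yields the familiar value of about 0.707.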

2.2 Direction

In the early years of stereophony, researchers investigated the differences in delay times and amplitudes required to control the perceived direction. Below, only experiments are considered that did not use fixation of the listener’s head.

2.2.1 Time Differences on Frontal, Horizontal Loudspeaker Pair

The dissertation of K. Wendt in 1963 [3] shows notably accurate listening experiments done on \(\pm 30^\circ \) two-channel stereophony using time delays, in which listeners indicated from where they heard the sounds for each of the tested time differences. H. Lee revisited the properties in 2013 [8], but with musical sound material and an experiment, in which the listener adjusted the time differences until the perceived direction matched the one of a corresponding fixed reference loudspeaker, Fig. 2.1.

Fig. 2.1

K. Wendt’s experiment [3] used angular marks to help specify the localized direction (left). The right diagram shows results for time differences between impulse signals fed to the loudspeakers, without head fixation (means and standard deviation; the standard deviation was interpolated for the figure). In gray: results of the time-difference adjustment experiment of Lee [8] using musical material (25, 50, 75% quartiles, symmetrized diagram)

Time differences are seldom applicable to reliable angular auditory event placement: auditory images are strongly frequency-dependent (not shown here) and therefore unstable for narrow-band sounds. Leakey and Cherry showed in 1957 [2] that time-delay stereophony loses its effect in the presence of background noise.

2.2.2 Level Differences on Frontal, Horizontal Loudspeaker Pair

K. Wendt’s [3] and H. Lee’s [8] experiments also deliver insights into sound source positioning with \(\pm 30^\circ \) two-channel stereophony, this time with level differences.

Fig. 2.2

Wendt’s [3] results for crack (impulsive) signals with level differences and without head fixation (the figure shows means and standard deviation; the standard deviation was interpolated to plot this figure). In gray: results of Lee’s [8] level-difference adjustment experiment with musical sounds (25, 50, 75% quartiles, symmetrized diagram)

As opposed to Fig. 2.1, in which auditory image panning with time differences was characterized by statistical spreads of up to \(15^\circ \), level-difference-based panning yields a spread of perceived directions that is clearly smaller than \(10^\circ \), Fig. 2.2.

Signal dependency. Wendt [3] described the signal dependency of panning curves on various transient and band-limited sounds, and Lee [8] for musical sounds. A new comprehensive investigation on frequency dependency was carried out by Helm and Kurz [9]. With level differences \(\{0,\,3,\,6,\,9,\,12\}\) dB and third-octave filtered pulsed pink noise at \(\{125,250,500,1\mathrm {k},2\mathrm {k},4\mathrm {k}\}\) Hz, they showed that the perceived angle pointed at by the listeners using a motion-tracked pointer was similar between the broad-band case and third-octave bands below 2 kHz. In bands above 2 kHz, smaller level differences cause a larger lateralization, see interpolated curves in Fig. 2.3.

Fig. 2.3

Panning curve for a frontal \(\pm 30^\circ \) loudspeaker pair from [9], using the 500 Hz and 4 kHz third-octave bands as examples, and the slopes for different bands, based on the 3 and 6 dB conditions

2.2.3 Level Differences on Horizontally Surrounding Pairs

Successive pairwise panning on neighboring loudspeaker pairs is typically used to pan auditory events freely along the loudspeakers of a horizontally surrounding loudspeaker ring. The classical research specifically targeted at such applications was contributed by Theile and Plenge in 1977 [4]. They used a mobile reference loudspeaker with some reference sound that could be moved to match the perceived direction of a loudspeaker pair playing pink noise with level differences at different orientations with respect to the listener’s head. There is also the experiment of Pulkki [15] using a level-adjustment task, in which levels were adjusted so as to match the auditory event to that of a reference loudspeaker at three different reference directions and for different head orientations. A comprehensive experiment was done by Simon et al. [5], who used a graphical user interface displaying the floor plan of a \(45^\circ \)-spaced loudspeaker ring to have the listeners specify the perceived direction. Martin et al. in 1999 [16] used a graphical user interface showing the floor plan of a 5.1 ring in their experiment, and last but not least, Matthias Frank used a direct pointing method to enter the perceived direction [10] in one of his experiments.

As the experiments did not seem to yield consistent results, a comprehensive level-difference adjustment experiment with 24 loudspeakers arranged as a horizontal ring was done in [17] and partially repeated later in [11], see results in Fig. 2.4. In the repeated experiment [11] it became clear that in the anechoic room, a large part of the differently pronounced localization biases can be avoided by encouraging the listeners to do front-back and left-right head motion by a few centimeters whenever there is doubt.

Fig. 2.4

Medians and \(95\%\) confidence intervals for adjusted level differences to align amplitude-panned pink noise with a harmonic complex tone from \(\{\pm 15^\circ ,0^\circ \}\), for (a) a frontal and (b) a lateral \(60^\circ \) stereo pair; (a) uses data from [17] with 4 responses per direction from 5 listeners; (b) uses data from [11] with 20 responses per direction. Despite the considerably different spread, frontal and lateral stereo pairs seem to yield much the same tendency

2.2.4 Level Differences on Frontal, Horizontal to Vertical Pairs

T. Kimura investigated the localization of auditory events between frontal, vertical \(\pm 13.5^\circ \) loudspeaker pairs quite extensively in 2012 [6, 18]. The work of F. Wendt in 2013 [7, 19] also investigated a slant and a vertical loudspeaker pair, Fig. 2.5. Kimura used pulsed white noise, Wendt used pulsed pink noise.

Fig. 2.5

Mean values and \(95\%\) confidence intervals of the direct-pointing experiments of Kimura (top) with level differences on a vertical \(\pm 13.5^\circ \) loudspeaker pair and results of F. Wendt (bottom) on frontally arranged horizontal, slant, and vertical \(\pm 20^\circ \) loudspeaker pairs showing two-dimensional \(95\%\) confidence (solid) and standard deviation ellipses (dotted)

Obviously, the horizontal spread is always smaller than the vertical spread and the spread does not align with the direction of the loudspeaker pair. The largest vertical spread appears for the vertical loudspeaker pair.

2.2.5 Vector Models for Horizontal Loudspeaker Pairs

A weighted sum of the loudspeakers’ direction vectors \(\varvec{\uptheta }_1\), \(\varvec{\uptheta }_2\) could be conceived as a simple linear model of the perceived direction, using a linear blending parameter \(0\le q\le 1\)

$$\begin{aligned} \varvec{r}&=(1-q)\,\varvec{\uptheta }_1+q\,\varvec{\uptheta }_2. \end{aligned}$$
(2.4)

The parameter q adjusts where the resulting vector \(\varvec{r}\) is located on the connecting line between \(\varvec{\uptheta }_1\) and \(\varvec{\uptheta }_2\). On frontal loudspeaker pairs, localization curves typically run through the middle direction \(q=\frac{1}{2}\) for level differences of 0 dB. If only one loudspeaker is active, the result is either of the loudspeaker directions, thus the parameter is \(q=0\) or \(q=1\).

Classical definitions. As the simplest choice for q, one could insert \(q=\frac{g_2}{g_1+g_2}\) or \(q=\frac{g_2^2}{g_1^2+g_2^2}\) to get the vector definitions as weighted average using either the linear or squared gains according to [1]:

$$\begin{aligned} \varvec{r}_\mathrm {V}&=\frac{g_1\,\varvec{\uptheta }_1+g_2\,\varvec{\uptheta }_2}{g_1+g_2},&\varvec{r}_\mathrm {E}&=\frac{g_1^2\,\varvec{\uptheta }_1+g_2^2\,\varvec{\uptheta }_2}{g_1^2+g_2^2}. \end{aligned}$$
(2.5)

For both models, equal gains \(g_1=g_2\) yield \(q=\frac{1}{2}\), and also the endpoints with \(g_2=0\) or \(g_1=0\) correspond to \(q=0\) or \(q=1\), respectively. However, the slope of the \(\varvec{r}_\mathrm {E}\) vector is steeper than the one of the \(\varvec{r}_\mathrm {V}\). For instance, if \(g_2=2\,g_1\), the vector \(\varvec{r}_\mathrm {V}\) lies on \(q=2/3\) of the line between \(\varvec{\uptheta }_1\) and \(\varvec{\uptheta }_2\), while \(\varvec{r}_\mathrm {E}\) lies at \(q=4/5\) of the connecting line.

The \(\varvec{r}_\mathrm {V}\) vector for the \(\pm \alpha \) loudspeaker pair at the directions \(\varvec{\uptheta }_{1,2}^\mathrm {T}=(\cos \alpha ,\,\pm \sin \alpha )\) corresponds to the tangent law [20], whose formal origin lies in a model of summing localization based on a simple model of the ear signals, cf. Appendix A.7. The equivalence of this law to the vector model follows from the tangent \(\tan \varphi \) as the ratio of the y component to the x component of the \(\varvec{r}_\mathrm {V}\) vector, \(\tan \varphi =\frac{g_1\,\sin (\alpha )+g_2\,\sin (-\alpha )}{g_1\,\cos (\alpha )+g_2\,\cos (\alpha )}=\frac{g_1-g_2}{g_1+g_2}\tan \alpha \).
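The blending parameters and the tangent-law equivalence can be checked numerically. The sketch below assumes non-negative gains and the two-dimensional convention used above, with loudspeaker 1 at \(+\alpha \) and loudspeaker 2 at \(-\alpha \); the function names are illustrative.

```python
import math

def blend_q(g1, g2, squared):
    """Blending parameter q of Eq. (2.4) from linear or squared gains."""
    w1, w2 = (g1 * g1, g2 * g2) if squared else (g1, g2)
    return w2 / (w1 + w2)

def r_v_angle_deg(g1, g2, alpha_deg):
    """Angle of the r_V vector of Eq. (2.5) for loudspeakers at +alpha and -alpha."""
    a = math.radians(alpha_deg)
    x = (g1 * math.cos(a) + g2 * math.cos(a)) / (g1 + g2)
    y = (g1 * math.sin(a) - g2 * math.sin(a)) / (g1 + g2)
    return math.degrees(math.atan2(y, x))

def tangent_law_angle_deg(g1, g2, alpha_deg):
    """Angle from tan(phi) = (g1 - g2) / (g1 + g2) * tan(alpha)."""
    a = math.radians(alpha_deg)
    return math.degrees(math.atan((g1 - g2) / (g1 + g2) * math.tan(a)))

g1, g2 = 1.0, 2.0
print(blend_q(g1, g2, squared=False))  # q of r_V, 2/3 for g2 = 2 g1
print(blend_q(g1, g2, squared=True))   # q of r_E, 4/5 for g2 = 2 g1
print(r_v_angle_deg(g1, g2, 30.0), tangent_law_angle_deg(g1, g2, 30.0))
```

Both angle functions return the same value for any gain pair with a positive sum, which reflects the equivalence stated above.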

Adjusted slope. Differently steep curves were fitted by an adjustable-slope model [17]

$$\begin{aligned} \varvec{r}_\gamma&=\frac{|g_1|^\gamma \,\varvec{\uptheta }_1+|g_2|^\gamma \,\varvec{\uptheta }_2}{|g_1|^\gamma +|g_2|^\gamma }, \end{aligned}$$
(2.6)

which uses \(\gamma =1\) for \(\varvec{r}_\mathrm {V}\) and \(\gamma =2\) for \(\varvec{r}_\mathrm {E}\). Figure 2.6 compares the prediction by \(\varvec{r}_\mathrm {V}\), \(\varvec{r}_\mathrm {E}\), and \(\varvec{r}_\gamma \) to frequency-dependently perceived directions in frontal horizontal pairs, to perceived directions in a lateral stereo pair, and to perceived directions in a frontal pair that is either horizontal or vertical, using various studies mentioned above.
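As a rough illustration of how the exponent changes the slope of the panning curve, the sketch below evaluates Eq. (2.6) for a given level difference in dB. The sign convention (positive level difference means loudspeaker 1 at \(+\alpha \) is louder) and the function names are assumptions made for this example.

```python
import math

def r_gamma(g1, g2, alpha_deg, gamma):
    """r_gamma vector of Eq. (2.6) for loudspeakers at +alpha and -alpha."""
    a = math.radians(alpha_deg)
    w1, w2 = abs(g1) ** gamma, abs(g2) ** gamma
    x = math.cos(a)  # both loudspeakers share the same x component
    y = (w1 - w2) / (w1 + w2) * math.sin(a)
    return x, y

def predicted_angle_deg(level_diff_db, alpha_deg, gamma):
    """Predicted direction for a level difference of loudspeaker 1 over 2."""
    g1 = 10.0 ** (level_diff_db / 20.0)
    g2 = 1.0
    x, y = r_gamma(g1, g2, alpha_deg, gamma)
    return math.degrees(math.atan2(y, x))

for level_diff in (0.0, 3.0, 6.0, 9.0, 12.0):
    angle_v = predicted_angle_deg(level_diff, 30.0, gamma=1.0)
    angle_e = predicted_angle_deg(level_diff, 30.0, gamma=2.0)
    print(level_diff, round(angle_v, 1), round(angle_e, 1))
```

For every non-zero level difference, the exponent 2 predicts a larger displacement than the exponent 1, and both predictions approach the loudspeaker direction for large level differences.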

Practical choice \(\varvec{r}_\mathrm {E}\). While a specific exponent \(\gamma \) closely fitting the experimental data may vary, a constant value is preferable. Figure 2.6 indicates that in most cases focusing on \(\varvec{r}_\mathrm {E}\) is reasonable and sufficiently precise, see also [11].

Fig. 2.6

Fit of the \(\varvec{r}_\mathrm {V}\), \(\varvec{r}_\mathrm {E}\), and \(\varvec{r}_\gamma \) models for (a) third-octave noise on a frontal stereo pair using data from [9], and with data from [11]: (b) pink noise frontal and (c) lateral, cf. Figs. 2.3 and 2.4; (d) horizontal and vertical from [7], Fig. 2.5

2.2.6 Level Differences on Frontal Loudspeaker Triangles

V. Pulkki [21] and F. Wendt [7, 19] investigated localization properties for frontal loudspeaker triplets with level differences, see Fig. 2.7. Both used pulsed pink noise in their experiments.

Fig. 2.7

Indirect level-adjustment experiment of Pulkki [21] shows the spread and mean of the adjusted VBAP angles for frontal loudspeaker triplets, and the experiments of F. Wendt [7, 19] use a direct pointing method to obtain results in the shape of two-dimensional \(95\%\) confidence (solid) and standard deviation ellipses (dotted) for \(\{-\infty ,\,0,\,+11.71\}\) dB for the top loudspeaker (left diagram), or the right loudspeaker (center diagram) respectively, or \(\{-\infty ,\,0,\,+11.51\}\) dB for the bottom loudspeaker (right)

While V. Pulkki used an indirect adjustment task to evaluate VBAP control angles to obtain auditory events directionally matching the respective reference loudspeakers, F. Wendt used a direct pointing method. Wendt’s experiments indicate that loudspeaker triplets with three different azimuthal positions yield a smaller spread in the indicated direction than those with vertical loudspeaker pairs (not the case in Pulkki’s experiments).

Fig. 2.8

Wendt’s experiments about frontal loudspeaker rectangles showing two-dimensional \(95\%\) confidence (solid) and standard deviation ellipses (dotted). The experimental setup of this and above-mentioned experiments is shown. Left: each of the corner loudspeakers is raised once by \(+6\) dB in level, right: both left/right loudspeaker levels are raised once by \(\{+3,+6\}\) dB, and both top/bottom pairs are once raised by \(+6\) dB

2.2.7 Level Differences on Frontal Loudspeaker Rectangles

F. Wendt [7, 19] moreover presents experiments about frontal loudspeaker rectangles, again using a pointer method and pulsed pink noise, Fig. 2.8.

Again it seems that arrangements avoiding vertical loudspeaker pairs exhibit a smaller statistical spread in the responses.

2.2.8 Vector Model for More than 2 Loudspeakers

For more than two active loudspeakers and in 3D, a vector model based on the exponent \(\gamma =2\) yields the \(\varvec{r}_\mathrm {E}\) vector [1]

$$\begin{aligned} \varvec{r}_\mathrm {E}&=\frac{\sum _{l=1}^\mathrm {L}g_l^2\,\varvec{\uptheta }_l}{\sum _{l=1}^\mathrm {L}g_l^2}. \end{aligned}$$
(2.7)
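The energy vector of several loudspeakers in 3D can be sketched as follows. The coordinate convention (x to the front, y to the left, z up, directions given as azimuth and elevation) and the function names are assumptions made for this example.

```python
import math

def unit_vector(azimuth_deg, elevation_deg):
    """Unit direction vector; x points to the front, y to the left, z up."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (math.cos(az) * math.cos(el), math.sin(az) * math.cos(el), math.sin(el))

def energy_vector(gains, directions):
    """r_E of Eq. (2.7): squared-gain weighted average of the direction vectors."""
    energy = sum(g * g for g in gains)
    if energy == 0.0:
        raise ValueError("at least one gain must be non-zero")
    return tuple(
        sum(g * g * d[k] for g, d in zip(gains, directions)) / energy
        for k in range(3)
    )

def direction_deg(vector):
    """Azimuth and elevation in degrees of a 3D vector."""
    x, y, z = vector
    return math.degrees(math.atan2(y, x)), math.degrees(math.atan2(z, math.hypot(x, y)))

# Frontal triplet with equal gains: left, right, and an elevated center loudspeaker.
triplet = [unit_vector(30.0, 0.0), unit_vector(-30.0, 0.0), unit_vector(0.0, 30.0)]
r_e = energy_vector([1.0, 1.0, 1.0], triplet)
print(direction_deg(r_e), math.sqrt(sum(c * c for c in r_e)))
```

The length of the resulting vector is at most 1 and equals 1 only if a single loudspeaker is active; this length is used for the width estimator in Sect. 2.3.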

2.2.9 Vector Model for Off-Center Listening Positions

At off-center listening positions, the distances to the loudspeakers are not equal anymore, resulting in additional attenuation and delay for each loudspeaker depending on the position. For stationary sounds, this effect can be incorporated into the energy vector by additional weights \(w_{\mathrm {r},l}\) and \(w_{\uptau ,l}\)

$$\begin{aligned} \varvec{r}_\mathrm {E}&=\frac{\sum _{l=1}^\mathrm {L}(w_{\mathrm {r},l}\, w_{\uptau ,l}\, g_l)^2\,\varvec{\uptheta }_l}{\sum _{l=1}^\mathrm {L}(w_{\mathrm {r},l}\, w_{\uptau ,l}\, g_l)^2}. \end{aligned}$$
(2.8)

The weight \(w_{\mathrm {r},l}\) models the attenuation of a point-source-like propagation \(\frac{1}{r}\). The reference distance is the distance to the closest loudspeaker at the evaluated listening position, thus the weight of each loudspeaker results in

$$\begin{aligned} w_{\mathrm {r},l} = \frac{1}{r_l}. \end{aligned}$$
(2.9)

The incorporation of delays into the energy vector requires a transformation that yields the weights \(w_{\uptau ,l}\) for each loudspeaker. It is reasonable that these weights attenuate the lagging signals in order to reduce their influence on the predicted direction. An attenuation of \(\frac{1}{4}\frac{\mathrm {dB}}{\mathrm {ms}}\) is known from the echo threshold in [22], similarly [23], and has successfully been applied for the prediction of localization in rooms [24]. With the propagation delay of each loudspeaker \(\tau _l=\frac{r_l}{c}\) in seconds at the listening position under test, where c is the speed of sound, the weight is calculated as

$$\begin{aligned} w_{\uptau ,l} = 10^{\frac{-1000}{4\cdot 20}\tau _l}. \end{aligned}$$
(2.10)

Further weights can be applied in order to model the precedence effect in more detail, as proposed by Stitt [25, 26]. Listening test results in [27] compared the differently complex extensions of the energy vector and revealed that the simple weighting with \(w_{\mathrm {r},l}\) and \(w_{\uptau ,l}\) is sufficient for a rough prediction of the perceived direction in typical playback scenarios.

The left side of Fig. 2.9 shows the predicted directions by the energy vector for various listening positions when playing back the same signal on a standard stereo loudspeaker pair with a radius of 2.5 m. The absolute localization error can be calculated from the difference of the predicted direction and the desired panning direction. The right side of Fig. 2.9 depicts areas with localization errors within 4 ranges: \(0^\circ \dots 10^\circ \) (white, perfect localization), \(10^\circ \dots 30^\circ \) (light gray, plausible localization), \(30^\circ \dots 90^\circ \) (gray, rough localization), and \(>\!\!90^\circ \) (dark gray, poor localization).
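A minimal sketch of such a prediction in the horizontal plane is given below. It makes several assumptions that are not stated in the text: a speed of sound of 343 m/s, loudspeaker directions taken as seen from the listening position, distances and delays taken relative to the closest loudspeaker (a common factor cancels in the normalized vector), and a listening position that does not coincide with a loudspeaker. All function names are illustrative.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value

def speaker_positions(radius, azimuths_deg):
    """Positions on a circle around the origin; x points front, y points left."""
    return [(radius * math.cos(math.radians(a)), radius * math.sin(math.radians(a)))
            for a in azimuths_deg]

def predicted_azimuth_deg(listener, speakers, gains, c=SPEED_OF_SOUND):
    """Azimuth of the weighted energy vector of Eq. (2.8) at a listening position."""
    offsets = [(sx - listener[0], sy - listener[1]) for sx, sy in speakers]
    dists = [math.hypot(dx, dy) for dx, dy in offsets]
    r_min = min(dists)
    num_x = num_y = 0.0
    for (dx, dy), r, g in zip(offsets, dists, gains):
        w_r = r_min / r                                  # 1/r law, Eq. (2.9)
        tau = (r - r_min) / c                            # lag in seconds
        w_tau = 10.0 ** (-1000.0 / (4.0 * 20.0) * tau)   # Eq. (2.10)
        w = (w_r * w_tau * g) ** 2
        num_x += w * dx / r   # loudspeaker direction as seen by the listener
        num_y += w * dy / r
    # The common denominator of Eq. (2.8) does not affect the angle.
    return math.degrees(math.atan2(num_y, num_x))

def error_class(error_deg):
    """Localization error ranges as used for the gray-scale areas."""
    e = abs(error_deg)
    if e <= 10.0:
        return "perfect"
    if e <= 30.0:
        return "plausible"
    if e <= 90.0:
        return "rough"
    return "poor"

stereo = speaker_positions(2.5, [30.0, -30.0])
for position in [(0.0, 0.0), (0.0, 0.5), (0.0, -0.5)]:
    azimuth = predicted_azimuth_deg(position, stereo, [1.0, 1.0])
    print(position, round(azimuth, 1), error_class(azimuth - 0.0))
```

Here the desired panning direction is 0 degrees and is compared in the same coordinate frame as the prediction, which is a simplification; evaluating the function on a grid of positions gives a map of the kind described above.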

Fig. 2.9

Predictions of perceived directions by the energy vector for different listening positions in a standard stereo setup with two loudspeakers playing the same signal. Gray-scale areas on the right indicate listening areas with predicted absolute localization errors within different angular ranges

Concerning a single playback scenario, i.e. a single panning direction on a loudspeaker setup, the perceptual sweet area for plausible playback can be estimated by the area with localization errors below \(30^\circ \). For the prediction of a more general sweet area, the absolute localization errors can be computed for all possible panning directions in a fine grid of \(1^\circ \) and averaged at each listening position as shown in Fig. 2.10.

Fig. 2.10

Predictions of mean absolute localization errors by the energy vector in a standard stereo setup for panning directions between \(-30^\circ \) and \(30^\circ \)

2.3 Width

M. Frank [10] investigated the auditory source width for frontal loudspeaker pairs with 0 dB level difference and various aperture angles, as well as the influence of an additional center loudspeaker on the auditory source width. The response was given by reading numbers off a left-right symmetric scale written on the loudspeaker arrangement (Fig. 2.11).

Fig. 2.11

Experimental setup and results of experiments of M. Frank (confidence intervals) about auditory source width of frontal stereo pairs of the angles \(\pm 5^\circ ,\dots ,\pm 40^\circ \) and with an additional center loudspeaker (C)

Figure 2.11 (right) shows the statistical analysis of the responses. Obviously the additional center loudspeaker decreases the auditory source width.

Auditory source width is difficult to compare for different directions, and even single loudspeakers yield auditory source widths that vary with direction. Still, a relatively constant auditory source width is desirable for moving auditory events. For static auditory events, the narrowest-possible extent can be desirable.

Fig. 2.12

Cap size associated with \(\varvec{r}_\mathrm {E}\) length model for L\(+\)R (left plot) and L\(+\)R\(+\)C (right plot)

2.3.1 Model of the Perceived Width

The angle \(2\arccos \Vert \varvec{r}_\mathrm {E}\Vert \) describes the aperture, as seen from the origin, of a cap cut off the unit sphere perpendicular to the \(\varvec{r}_\mathrm {E}\) vector at its tip, see Fig. 2.12. As the \(\varvec{r}_\mathrm {E}\) vector length is between 0 (unclear direction) and 1 (only one loudspeaker active), this angle stays between \(180^\circ \) and \(0^\circ \).

M. Frank’s experiments about the auditory source width [10, 28] showed that stereo pairs of larger half angles \(\alpha \) were also heard as wider. The length of the \(\varvec{r}_\mathrm {E}\) vector gets shorter as the half angle \(\alpha \) increases. In a symmetrical loudspeaker pair \(\varvec{\uptheta }_{1,2}^\mathrm {T}=(\cos \alpha ,\,\pm \sin \alpha )\) with \(g_1=g_2=1\), the y coordinate of the \(\varvec{r}_\mathrm {E}\) vector cancels and its length is

$$\begin{aligned} \Vert \varvec{r}_\mathrm {E}\Vert =r_\mathrm {E,x}&=\cos \alpha . \end{aligned}$$

The corresponding spherical cap is the same size as the loudspeaker pair, \(2\arccos \Vert \varvec{r}_\mathrm {E}\Vert =2\alpha \). However, only \(\frac{5}{8}\) of the size was indicated by the listeners of the experiments, which yields the following estimator of the perceived width:

$$\begin{aligned} ASW&=\textstyle \frac{5}{8}\cdot \frac{180^\circ }{\pi }\cdot 2\arccos \Vert \varvec{r}_\mathrm {E}\Vert . \end{aligned}$$
(2.11)

For an additional center loudspeaker \(g_3=1\), \(\varvec{\uptheta }^\mathrm {T}=(1,0)\), the estimator yields

$$\begin{aligned} \Vert \varvec{r}_\mathrm {E}\Vert =r_\mathrm {E,x}&=\frac{1}{3}+\frac{2}{3}\cos \alpha , \end{aligned}$$

an increase in vector length, and thus a decrease in the estimated width, that matches the experiments as \(\arccos \Vert \varvec{r}_\mathrm {E}\Vert <\alpha \), see Figs. 2.12 and 2.13.
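The width estimator of Eq. (2.11) can be evaluated for both arrangements with a few lines of code. This is a minimal sketch for loudspeakers in the horizontal plane; the function names are illustrative, and the conversion factor \(\frac{180^\circ }{\pi }\) is handled by the conversion from radians to degrees.

```python
import math

def r_e_length(gains, azimuths_deg):
    """Length of the r_E vector for loudspeakers in the horizontal plane."""
    energy = sum(g * g for g in gains)
    x = sum(g * g * math.cos(math.radians(a)) for g, a in zip(gains, azimuths_deg)) / energy
    y = sum(g * g * math.sin(math.radians(a)) for g, a in zip(gains, azimuths_deg)) / energy
    return math.hypot(x, y)

def asw_deg(length):
    """Width estimator of Eq. (2.11) in degrees."""
    # Guard against rounding slightly above 1 before taking the arccos.
    return 5.0 / 8.0 * math.degrees(2.0 * math.acos(min(1.0, length)))

alpha = 30.0
width_lr = asw_deg(r_e_length([1.0, 1.0], [alpha, -alpha]))
width_lrc = asw_deg(r_e_length([1.0, 1.0, 1.0], [alpha, -alpha, 0.0]))
print(width_lr, width_lrc)
```

For a half angle of \(30^\circ \), the pair yields an estimate of \(\frac{5}{8}\cdot 60^\circ =37.5^\circ \), and the additional center loudspeaker yields a smaller value.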

Fig. 2.13

The model of the perceived width as \(\frac{5}{8}\) of the half-angle \(\arccos \Vert \varvec{r}_\mathrm {E}\Vert \) matches the half-angle of the experiment, except for a lower limit, which is determined by the apparent source width (ASW) due to the room acoustical setting

2.4 Coloration

Although research primarily focuses on the spatial fidelity of multi-loudspeaker playback, the overall quality of surround sound playback was found to be largely determined by timbral fidelity (\(70\%\)) [29]. Loudspeakers in a studio or performance space are often characterized by different colorations that are caused by different reflection patterns (most often the wall behind the loudspeaker). When changing the active loudspeakers, or their number, these differences become audible. On the one hand, static coloration, e.g. the frequency responses of the loudspeakers, can typically be equalized. On the other hand, changes in coloration during the movement of a source cannot be equalized easily and yield annoying comb filters.

Although coloration is often assessed verbally [30], we employ a simple technical predictor based on the composite loudness level (CLL) by Ono [31, 32]. The CLL spectrum predicts the perceived coloration and is calculated from the sum of the loudnesses of both ears in each third-octave band. Studies about loudspeaker and headphone equalization show that differences in third-octave band levels of less than 1 dB are inaudible by most listeners [33, 34]. This criterion can also be applied for the perception of coloration, i.e., differences between CLL spectra of less than 1 dB are assumed to be inaudible.

Pairwise panning between loudspeakers results in a single active loudspeaker for source directions that coincide with the direction of a loudspeaker and two equally loud loudspeakers for source directions exactly between two neighboring loudspeakers, cf. Fig. 2.14. In the second case, the different propagation paths from the two loudspeakers to the ears create a comb filter. This comb filter is not present for sources played from a single loudspeaker. Thus, moving a source between the two directions yields noticeable coloration. This is in contrast to static sources, for which Theile’s experiments [35] indicated that they are perceived without coloration.

Fig. 2.14

Coloration predicted by composite loudness levels for a single loudspeaker C (black), two equally loud loudspeakers C and R (light gray), and their difference (dashed dark gray)

The actual shape of the aforementioned comb filter depends on the angular distance between the loudspeakers. The frequency of the first notch and its depth decrease with the distance. This implies that coloration increases for playback with higher loudspeaker densities.
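The dependence of the first notch frequency on the loudspeaker spacing can be illustrated with a strongly simplified model. The sketch below assumes far-field loudspeakers at equal distance, a single ear displaced from the head center by 8.75 cm, a speed of sound of 343 m/s, and no head shadowing; it therefore does not model the notch depth observed at the ears and is not the CLL predictor used for the figures. All names are illustrative.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value
EAR_OFFSET = 0.0875     # m, assumed distance of one ear from the head center

def arrival_time_difference(az1_deg, az2_deg):
    """Far-field arrival-time difference at one ear, without head shadowing."""
    s1 = math.sin(math.radians(az1_deg))
    s2 = math.sin(math.radians(az2_deg))
    return EAR_OFFSET * abs(s1 - s2) / SPEED_OF_SOUND

def first_notch_hz(az1_deg, az2_deg):
    """First notch of the sum of a signal and its delayed copy: 1 / (2 dt)."""
    dt = arrival_time_difference(az1_deg, az2_deg)
    if dt == 0.0:
        raise ValueError("no time difference, hence no comb filter")
    return 1.0 / (2.0 * dt)

def comb_magnitude_db(freq_hz, dt, rel_gain=1.0):
    """Magnitude of 1 + rel_gain * exp(-j 2 pi f dt) in dB."""
    phase = 2.0 * math.pi * freq_hz * dt
    re = 1.0 + rel_gain * math.cos(phase)
    im = -rel_gain * math.sin(phase)
    return 20.0 * math.log10(max(math.hypot(re, im), 1e-12))

for spacing in (15.0, 30.0, 60.0):
    print(spacing, round(first_notch_hz(0.0, -spacing)))
```

Under these assumptions, a larger angular spacing produces a larger time difference and thus a lower first notch frequency, which is in line with the tendency described above.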

A similar comb filter is created when using a triplet of loudspeakers with the same loudspeaker density as the pair, e.g. L, C, R compared to C, R. In order to avoid a strong increase in source width or annoying phasing effects, the outermost loudspeakers L and R are strongly reduced in their level, typically to around \(-12\) dB compared to loudspeaker C. In doing so, the similarity of the comb filters yields barely any coloration when moving a source between the two directions, cf. Fig. 2.15.

Judging from what is shown above, it appears beneficial to always activate a few loudspeakers to stabilize the coloration, as opposed to using just one loudspeaker and moving the playback to another one. Keeping the number of simultaneously active loudspeakers more or less constant not only prevents coloration of source movements, it also yields a more constant source width. Because of this relation between coloration and source width, the fluctuation of \(\Vert \varvec{r}_\mathrm {E}\Vert \) is also a simple predictor of panning-dependent coloration.

In general, the strongest coloration is perceived under anechoic listening conditions. In reverberant rooms, the additional comb filters introduced by reflections help to conceal the comb filters due to multi-loudspeaker playback.

2.5 Open Listening Experiment Data

Experimental data from azimuthal localization in frontal and lateral loudspeaker pairs Figs. 2.3 and 2.4, azimuthal/elevational localization in horizontal, skew, and vertical frontal pairs Fig. 2.5, triangles Fig. 2.7, and quadrilaterals Fig. 2.8 are available online at https://opendata.iem.at in the listening experiment data project, as well as the data to the width experiment in Fig. 2.11.

Fig. 2.15

Coloration predicted by composite loudness levels for loudspeaker C with additional –12 dB from L and R (black), two equally loud loudspeakers C and R (light gray), and their difference (dashed dark gray)

The opendata.iem.at listening experiment data project contains evaluation routines to analyze the 95% confidence intervals: symmetrically, based on means, standard deviations, and the inverse Student’s t-distribution (CIMEAN.m); more robustly, based on medians, inter-quartile ranges, and Student’s t-distribution (CI2.m); or for two-dimensional data analysis (robust_multivariate_confidence_region.m). The MATLAB script plot_gathered_data.m reads the formatted listening experiment data, and its exemplary code generates figures like the ones above.

In order to support others in providing their own listening experiment data, the MATLAB functions write_experimental_data.m and read_experimental_data.m are provided on the website.