Vector-base amplitude panning (VBAP) was extensively described and investigated in [2], along with the stabilization of moving sources by adding spread with multiple-direction amplitude panning (MDAP) [3]. Since then, VBAP and MDAP have become the most common and popular amplitude panning techniques; they are particularly robust and can automatically adapt to specific playback layouts.

3.1 Vector-Base Amplitude Panning (VBAP)

Assuming that the \(\varvec{r}_\mathrm {V}\) model predicts the perceived direction, an intended auditory event at a panning direction \(\varvec{\theta }\), which we call the virtual source, can theoretically be controlled by the criterion of V. Pulkki [2]

$$\begin{aligned} \varvec{\theta }&=\sum _{l=1}^\mathrm {L}\tilde{g}_l\,\varvec{\uptheta }_l. \end{aligned}$$
(3.1)

Here, \(\varvec{\uptheta }_l\) are the direction vectors of the loudspeakers involved and the amplitude weights \(\tilde{g}_l\) need to be normalized for constant loudness

$$\begin{aligned} g_l&=\frac{\tilde{g}_l}{\sqrt{\sum _{l=1}^\mathrm {L}\tilde{g}_l^2}}. \end{aligned}$$
(3.2)

Moreover, the weights \(g_l\) should always stay positive to avoid in-head localization or other irritating listening experiences. For loudspeaker rings around the horizon, 1 or 2 loudspeakers contribute to the auditory event at any time; for loudspeakers arranged on a surrounding sphere, 1 to 3 loudspeakers are used, whose directions must enclose the direction of the desired auditory event, the virtual source. For the directional stability of the auditory event, the angle enclosed between the loudspeakers should stay smaller than \(90^\circ \).

The system of equations for VBAP [2] uses 3 loudspeaker directions and gains to model the panning direction \(\varvec{\theta }\)

$$\begin{aligned} \varvec{\theta }&=[\varvec{\uptheta }_1,\,\varvec{\uptheta }_2,\,\varvec{\uptheta }_3]\begin{bmatrix} \tilde{g}_1\\ \tilde{g}_2\\ \tilde{g}_3 \end{bmatrix}=\mathbf {L}\cdot \varvec{\tilde{g}}&\Rightarrow \varvec{\tilde{g}}&=\mathbf {L}^{-1}\,\varvec{\theta },&\varvec{g}&=\frac{\varvec{\tilde{g}}}{\Vert \varvec{\tilde{g}}\Vert }. \end{aligned}$$
(3.3)
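As a minimal sketch of Eqs. (3.2) and (3.3), the following Python code solves the \(3\times 3\) system for a single loudspeaker triplet by Cramer's rule and normalizes the result. The function names and the example triplet (one octant of an octahedral layout) are illustrative assumptions, not taken from [2].

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Dot product of two 3-vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def vbap_triplet_gains(l1, l2, l3, theta):
    """Solve theta = L g~ for one triplet (Eq. 3.3), then normalize (Eq. 3.2)."""
    det = dot(l1, cross(l2, l3))  # determinant of L = [l1, l2, l3]
    if abs(det) < 1e-12:
        raise ValueError("loudspeaker directions are (nearly) coplanar")
    # Cramer's rule yields the un-normalized weights g~ = L^-1 theta.
    g_tilde = (dot(theta, cross(l2, l3)) / det,
               dot(l1, cross(theta, l3)) / det,
               dot(l1, cross(l2, theta)) / det)
    norm = math.sqrt(sum(g * g for g in g_tilde))
    return tuple(g / norm for g in g_tilde)

# Hypothetical triplet: one octant of an octahedral layout.
l1, l2, l3 = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)
g_on_speaker = vbap_triplet_gains(l1, l2, l3, (1.0, 0.0, 0.0))
g_between = vbap_triplet_gains(l1, l2, l3, (1.0, 1.0, 0.0))
```

Because of the final normalization, the panning direction passed in does not need to have unit length.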

The selection of the activated loudspeaker triplet is preceded by forming all triplets of the convex hull spanned by all the given playback loudspeakers. To find the loudspeaker triplet that needs to be activated, the list of all triplets is searched for the one with all-positive weights, \( g_1\ge 0\), \(g_2\ge 0\), \(g_3\ge 0\).
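The triplet search can be sketched as follows. This is a simplified illustration under stated assumptions: the convex hull is not computed but hard-coded for an octahedral layout (its eight faces), and the helper names `solve_triplet` and `find_active_triplet` are hypothetical. A small tolerance `eps` accepts directions lying exactly on a triplet edge.

```python
import math
from itertools import product

def det3(a, b, c):
    """Determinant of the 3x3 matrix built from the vectors a, b, c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))

def solve_triplet(l1, l2, l3, theta):
    """Un-normalized weights g~ with theta = g1*l1 + g2*l2 + g3*l3 (Cramer's rule)."""
    d = det3(l1, l2, l3)
    return (det3(theta, l2, l3) / d, det3(l1, theta, l3) / d, det3(l1, l2, theta) / d)

def find_active_triplet(speakers, triplets, theta, eps=1e-9):
    """Return (triplet indices, normalized gains) of the first all-positive triplet."""
    for idx in triplets:
        g = solve_triplet(*(speakers[i] for i in idx), theta)
        if all(x >= -eps for x in g):
            g = [max(x, 0.0) for x in g]  # clip tiny negative rounding errors
            norm = math.sqrt(sum(x * x for x in g))
            return idx, tuple(x / norm for x in g)
    raise ValueError("no enclosing triplet found")

# Assumed layout: octahedron; its hull faces combine one speaker per axis.
speakers = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
triplets = list(product((0, 1), (2, 3), (4, 5)))

idx_a, gains_a = find_active_triplet(speakers, triplets, (1.0, 1.0, 1.0))
idx_b, gains_b = find_active_triplet(speakers, triplets, (-1.0, 0.5, -0.2))
```

For arbitrary layouts, the list `triplets` would come from a convex-hull triangulation of the loudspeaker directions instead of the hard-coded faces used here.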

Figure 3.1 shows the localization curve for VBAP between loudspeakers at \(0^\circ \) and \(45^\circ \) for a centrally seated listener and one shifted to the left. The experiment is described in [4]; results were gathered on a circle of 8 loudspeakers with 2.5 m radius, and listeners indicated the perceived direction by naming numbers from a \(5^\circ \) scale mounted on the loudspeaker setup. Black whiskers of the results (\(95\%\) confidence intervals and medians) for the centrally seated listener indicate a slope mismatch between the perceived angles and the ideal VBAP curve, which is represented by the dashed line; the mismatch can be understood from the better match of other exponents \(\gamma \) in Fig. 2.6. The directional spread is quite narrow. For an off-center left-shifted listening position, the perceived directions are shown in terms of a \(5^\circ \) histogram (gray bubbles) in Fig. 3.1. For this off-center position, it becomes clear that the closest loudspeaker dominates localization within a third of the panning directions. Still, the directional mapping seems to be monotonic with the panning angle, and the perceived direction stays within the loudspeaker pair, which is a robust result, at least.

Fig. 3.1

Perceived directions for VBAP between loudspeakers at 0\(^\circ \) and \(45^\circ \) from [4]. \(95\%\) confidence intervals and medians (black) are for a centrally seated listener in a circle of 2.5 m radius. Localization for left-shifted listener (1.25 m) can become bi-modal, so that \(5^\circ \) bubble histogram is shown (gray)

Fig. 3.2

VBAP angles on a \(60^\circ \)-spaced horizontal loudspeaker ring starting at \(0^\circ \) (a) or \(30^\circ \) (b), perceptually adjusted to match panned pink noise with harmonic-complex acoustic reference in \(15^\circ \) steps, from [5]; black curve shows \(\varvec{ r}_\mathrm {E}\) model prediction

In Fig. 3.2, responses from [5] are shown in which listeners adjusted the panning angle on amplitude-panned lateral loudspeaker pairs to match reference loudspeakers set up in steps of \(15^\circ \). With VBAP, the adjusted angles match the reference directions fairly well. The \(\varvec{ r}_\mathrm {E}\) vector model (black curve) delivers a better match with only one exception at \(105^\circ \). This motivates VBIP as an alternative strategy.

Vector-Base Intensity Panning (VBIP). With nearly the same set of equations, but using the squares of the weights to improve the perceptual mapping, the auditory event can be controlled according to the direction of the \(\varvec{ r}_\mathrm {E}\) vector

$$\begin{aligned} \varvec{\theta }&=[\varvec{\uptheta }_1,\,\varvec{\uptheta }_2,\,\varvec{\uptheta }_3]\begin{bmatrix} \tilde{g}_1^2\\ \tilde{g}_2^2\\ \tilde{g}_3^2 \end{bmatrix}=\mathbf {L}\cdot \varvec{\tilde{g}}_\mathrm {sq}&\Rightarrow \varvec{\tilde{g}}_\mathrm {sq}&=\mathbf {L}^{-1}\,\varvec{\theta },&\varvec{\tilde{g}}&=\begin{bmatrix} \sqrt{\tilde{g}_{\mathrm {sq}1}}\\ \sqrt{\tilde{g}_{\mathrm {sq}2}}\\ \sqrt{\tilde{g}_{\mathrm {sq}3}} \end{bmatrix},&\varvec{ g}&=\frac{\varvec{\tilde{g}}}{\Vert \varvec{\tilde{g}}\Vert }. \end{aligned}$$
(3.4)
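A minimal sketch of Eq. (3.4): the linear system is solved for the squared weights, the square roots are taken, and the result is normalized. The function name, the tolerance, and the example triplet are assumptions made for illustration only.

```python
import math

def det3(a, b, c):
    """Determinant of the 3x3 matrix built from the vectors a, b, c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))

def vbip_triplet_gains(l1, l2, l3, theta):
    """Solve theta = L g~_sq, take square roots, and normalize (Eq. 3.4)."""
    d = det3(l1, l2, l3)
    g_sq = (det3(theta, l2, l3) / d,
            det3(l1, theta, l3) / d,
            det3(l1, l2, theta) / d)
    if min(g_sq) < -1e-9:
        raise ValueError("panning direction lies outside this triplet")
    g_tilde = [math.sqrt(max(x, 0.0)) for x in g_sq]
    norm = math.sqrt(sum(x * x for x in g_tilde))
    return tuple(x / norm for x in g_tilde)

# Hypothetical example: panning direction proportional to (3, 1, 0).
g_vbip = vbip_triplet_gains((1, 0, 0), (0, 1, 0), (0, 0, 1), (3.0, 1.0, 0.0))
# The squared gains are proportional to (3, 1, 0), so the energy-weighted
# sum of loudspeaker directions points towards the panning direction.
```

In this example VBIP yields gains proportional to \((\sqrt{3},1,0)\), whereas VBAP for the same direction would yield gains proportional to \((3,1,0)\).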

This formulation appears more contemporary because of the excellent match between the \(\varvec{ r}_\mathrm {E}\) model and experimental results, as shown earlier.

Non-smooth VBAP/VBIP width. If one of the loudspeakers is exactly aligned with the virtual source for either VBAP or VBIP, e.g. \(\varvec{\uptheta }_1=\varvec{\theta }\), the resulting gains are \(g_{1,2,3}=(1,0,0)\), and therefore only 1 loudspeaker is activated. For a virtual source halfway between 2 loudspeakers, e.g. \(\varvec{\uptheta }_1+\varvec{\uptheta }_2\propto \varvec{ \theta }\), we obtain \(g_{1,2,3}=(1,1,0)/\sqrt{2}\), and only 2 loudspeakers are active. This behavior yields audible variation in the perceived width and coloration. For virtual source movements that cross a common edge of neighboring loudspeaker triplets, there will often be unexpectedly pronounced jumps.

Fig. 3.3

The width measure \(2\arccos \Vert \varvec{ r}_\mathrm {E}\Vert \) for a virtual source on a horizontal and a vertical trajectory (\(45^\circ \) azimuth) using VBAP on an octahedral arrangement

Figure 3.3 illustrates the variation of the perceived width with VBAP on an octahedral arrangement of loudspeakers in the directions \(\varvec{\uptheta }_l^\mathrm {T}\in \{[\pm 1,0,0],[0,\pm 1,0], [0,0,\pm 1]\}\).
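The width measure \(2\arccos \Vert \varvec{ r}_\mathrm {E}\Vert \) can be evaluated directly from a gain vector. The sketch below assumes unit-length loudspeaker directions and uses the energy-weighted average of these directions as the \(\varvec{ r}_\mathrm {E}\) vector; the function name is hypothetical. It is evaluated for one, two, and three equally active loudspeakers of an octahedral octant.

```python
import math

def re_width_deg(gains, directions):
    """Width measure 2*arccos(||r_E||) in degrees for unit-length directions."""
    energy = sum(g * g for g in gains)
    r_e = [sum(g * g * d[k] for g, d in zip(gains, directions)) / energy
           for k in range(3)]
    length = math.sqrt(sum(x * x for x in r_e))
    return 2.0 * math.degrees(math.acos(min(1.0, length)))  # clip rounding errors

octant = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
s2, s3 = 1 / math.sqrt(2), 1 / math.sqrt(3)

width_one = re_width_deg((1.0, 0.0, 0.0), octant)   # panned onto a loudspeaker
width_two = re_width_deg((s2, s2, 0.0), octant)     # halfway between two
width_three = re_width_deg((s3, s3, s3), octant)    # center of the triplet
```

Under these assumptions the measure grows from \(0^\circ \) for a single loudspeaker to \(90^\circ \) for two and roughly \(109.5^\circ \) for three equally active loudspeakers, which illustrates the non-smooth width on this layout.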

3.2 Multiple-Direction Amplitude Panning (MDAP)

In order to adjust the \(\varvec{ r}_\mathrm {E}\) or \(\varvec{ r}_\mathrm {V}\) vector not only in direction but also in length, and thus to control the number of active loudspeakers for moving sound objects, Pulkki extended VBAP to multiple-direction amplitude panning (MDAP [3]). In this way, not only the perceived width but also the coloration can be held constant.

Direction spread in MDAP. MDAP employs more than one virtual source distributed around the panning direction as a directional spreading strategy. For horizontal loudspeaker rings, MDAP can consist of a pair of virtual VBAP sources at the angles \(\varphi _\mathrm {s}\pm \alpha \) around the panning direction \(\varphi _\mathrm {s}\). In a ring of \(\mathrm {L}\) loudspeakers with uniform angular spacing of \(\frac{360^\circ }{\mathrm {L}}\), the angle \(\alpha =90\%\frac{180^\circ }{\mathrm {L}}\) yields an optimally flat width for all panning directions, as shown for \(\mathrm {L}=6\) in the comparison between MDAP and VBAP in Fig. 3.4. Moreover, MDAP seems to align the direction of the \(\varvec{ r}_\mathrm {E}\) measure with that of the \(\varvec{ r}_\mathrm {V}\) measure, which is the one controlled by VBAP and MDAP.
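The horizontal case can be sketched in a few lines. The code below is an illustration under assumptions: loudspeaker \(l\) of a uniform ring sits at azimuth \(l\cdot \frac{360^\circ }{\mathrm {L}}\), the two virtual sources are panned by normalized pairwise VBAP, and their gains are summed and normalized again. The function names are hypothetical, and the width-related angle is taken as \(\arccos \Vert \varvec{ r}_\mathrm {E}\Vert \).

```python
import math

def vbap_ring(phi_deg, n_spk):
    """Pairwise VBAP gains on a uniform ring with loudspeaker l at l*360/n_spk deg."""
    spacing = 360.0 / n_spk
    phi = phi_deg % 360.0
    k = int(phi // spacing)                  # lower loudspeaker of the active pair
    a = math.radians(phi - k * spacing)      # angle measured from that loudspeaker
    d = math.radians(spacing)
    g = [0.0] * n_spk
    g[k % n_spk] = math.sin(d - a)           # 2x2 solution up to a common factor
    g[(k + 1) % n_spk] = math.sin(a)
    norm = math.sqrt(sum(x * x for x in g))
    return [x / norm for x in g]

def mdap_ring(phi_deg, n_spk, alpha_deg):
    """Sum of two VBAP sources at phi +/- alpha, normalized to unit energy."""
    g_lo = vbap_ring(phi_deg - alpha_deg, n_spk)
    g_hi = vbap_ring(phi_deg + alpha_deg, n_spk)
    g = [x + y for x, y in zip(g_lo, g_hi)]
    norm = math.sqrt(sum(x * x for x in g))
    return [x / norm for x in g]

def re_angle_deg(gains):
    """Width-related angle arccos(||r_E||) in degrees on the uniform ring."""
    n = len(gains)
    energy = sum(g * g for g in gains)
    rx = sum(g * g * math.cos(2 * math.pi * l / n) for l, g in enumerate(gains)) / energy
    ry = sum(g * g * math.sin(2 * math.pi * l / n) for l, g in enumerate(gains)) / energy
    return math.degrees(math.acos(min(1.0, math.hypot(rx, ry))))

n_spk = 6
alpha = 0.9 * 180.0 / n_spk                  # 27 degrees for six loudspeakers
vbap_widths = [re_angle_deg(vbap_ring(p, n_spk)) for p in range(60)]
mdap_widths = [re_angle_deg(mdap_ring(p, n_spk, alpha)) for p in range(60)]
vbap_variation = max(vbap_widths) - min(vbap_widths)
mdap_variation = max(mdap_widths) - min(mdap_widths)
```

With these assumptions, the VBAP angle swings between \(0^\circ \) and \(30^\circ \) over one loudspeaker interval, while the MDAP angle stays within a narrow band just below \(30^\circ \).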

Fig. 3.4

Width-related angle \(\arccos \Vert \varvec{ r}_\mathrm {E}\Vert \) and angular error \(\angle \varvec{ r}_\mathrm {E}-\varphi _\mathrm {s}\) for VBAP and MDAP, with MDAP using two virtual sources at \(\pm 90\%\frac{180^\circ }{\mathrm {L}}\) around the intended panning direction. This not only achieves minimal angular errors but also panning-invariant width

Fig. 3.5

Perceived width differences when using VBAP and MDAP for panning onto or between loudspeaker directions from listening experiment in [4]

Listening experiment results. Experiments from [4] in Fig. 3.5 investigate the perceived width for two possible horizontal loudspeaker ring layouts, both with \(45^\circ \) spacing, but one starting at \(0^\circ \) (“0”) and the other at \(22.5^\circ \) (“1/2”). Widths of MDAP with a direction spread of \(\alpha =22.5^\circ \) are perceived as similar on both ring layouts, while VBAP yields significantly narrower results for panning onto the frontal loudspeaker in the “0” layout, which activates only a single loudspeaker. Note that VBAP1/2 and MDAP1/2 are identical with \(\alpha =22.5^\circ \) and were treated as one condition.

Moreover, a more constant width measure also implies a more constant number of activated loudspeakers while panning. Figure 3.6 shows that listeners can hear differences in coloration changes under rotatory panning of pink noise at a constant speed. The figure shows that coloration fluctuations of MDAP are always clearly smaller than with VBAP on similar loudspeaker rings. Moreover, coloration changes are more pronounced on rings of 16 loudspeakers than with 8 loudspeakers, which is explained by their faster fluctuation.

Fig. 3.6

Coloration change of horizontally moving sources with VBAP and MDAP on different loudspeaker aperture angles from listening experiment in [4]

Figure 3.7 shows the results from [6] for a central and left-shifted off-center listening position when using MDAP on an 8-channel ring of loudspeakers. At the central listening position, the perceived directional spread around the loudspeaker positions \(0^\circ \) and \(45^\circ \) increases as expected, as indicated by the whiskers (\(95\%\) confidence intervals and medians). Moreover, the spread of MDAP seems to slightly decrease the slope mismatch between the underlying VBAP algorithm and the perceptual curve around the \(22.5^\circ \) direction.

Although MDAP enforces a larger number of active loudspeakers, its localization is still similarly robust as that of VBAP, also at off-center listening positions. The perceived direction can be assumed to stay confined at least within the directionally limited set of activated loudspeakers. Correspondingly, the gray \(5^\circ \)-histogram bubbles of Fig. 3.7 indicate the perceived directions when the listener is shifted to the left, off-center. While localization is slightly attracted by the closer loudspeaker at \(0^\circ \), the larger spread causes a more monotonic outcome that is less split than with VBAP in Fig. 3.1.

Fig. 3.7

Perceived directions for MDAP panning on an 8-channel 2.5 m radius loudspeaker ring within the interval \([0^\circ ,\,45^\circ ]\) at a central (black medians and \(95\%\) confidence whiskers) and 1.25 m left-shifted off-center listening position (gray \(5^\circ \) bubble histogram); dashed line indicates ideal panning curve

Fig. 3.8

MDAP pink-noise directions on horizontal rings of \(60^\circ \)-spaced loudspeakers adjusted to perceptually match reference loudspeaker directions (harmonic complex) every \(15^\circ \). Markers and whiskers indicate \(95\%\) confidence intervals and medians, black curve the \(\varvec{ r}_\mathrm {E}\) vector model

For a more exhaustive study, Frank used 6 loudspeakers on the horizon and asked his listeners to align an MDAP pink-noise direction with acoustical references every \(15^\circ \) (harmonic complex) by adjusting the panning direction [5]. The results in Fig. 3.8 contain 24 answers from 6 subjects responding four times (by repetition and symmetrization). The black line shows directions indicated by the \(\varvec{ r}_\mathrm {E}\) vector model for the tested conditions. The confidence intervals of the adjusted MDAP angles match both the reference directions and the predictions by the \(\varvec{ r}_\mathrm {E}\) vector model quite well, in particular for angles between \(0^\circ \) and \(90^\circ \) (except \(75^\circ \)) for the ring starting at \(0^\circ \), and from \(0^\circ \) to \(120^\circ \) for the \(30^\circ \)-rotated ring. The mismatch is much less than \(4^\circ \) for panning angles \(\le 90^\circ \).

MDAP with 3D loudspeaker layouts. For more arbitrary 3D loudspeaker arrangements, the multiple directions could be arranged in a ring, see Fig. 3.9. This arrangement uses 8 additional virtual sources inclined by \(45^\circ \) with respect to the main virtual source.

Fig. 3.9

The width measure \(2\arccos \Vert \varvec{ r}_\mathrm {E}\Vert \) for virtual sources on a horizontal and vertical path on an octahedron setup using MDAP with additional 8 half-amplitude virtual sources at \(45^\circ \) distance to the main virtual source
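The ring-like arrangement of additional virtual sources can be generated as sketched below. The construction of the two auxiliary axes perpendicular to the main direction is an assumption of this sketch (any orthonormal pair would do), and the function name is hypothetical. Each returned direction would then be panned by VBAP, for instance with half amplitude as in Fig. 3.9, and summed with the gains of the main virtual source.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(v):
    """Scale a 3-vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return tuple(x / n for x in v)

def spread_ring(theta, incline_deg=45.0, count=8):
    """Directions on a ring inclined by incline_deg around the main direction."""
    t = unit(theta)
    # Pick a helper axis that is not parallel to the main direction.
    helper = (0.0, 0.0, 1.0) if abs(t[2]) < 0.9 else (1.0, 0.0, 0.0)
    u = unit(cross(helper, t))               # first axis perpendicular to t
    v = cross(t, u)                          # second axis, completes the basis
    c = math.cos(math.radians(incline_deg))
    s = math.sin(math.radians(incline_deg))
    ring = []
    for k in range(count):
        b = 2.0 * math.pi * k / count
        ring.append(tuple(c * t[i] + s * (math.cos(b) * u[i] + math.sin(b) * v[i])
                          for i in range(3)))
    return ring

main_direction = (1.0, 0.0, 0.0)
ring_directions = spread_ring(main_direction)
```

All generated directions have unit length and enclose the same angle with the main virtual source, so the spread is symmetric before it is mapped onto the loudspeaker layout.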

At least mathematically, however, the amplitudes and angles of the virtual sources need to be post-optimized in order to accurately match the desired \(\varvec{ r}_\mathrm {V}\) or \(\varvec{ r}_\mathrm {E}\) vector in direction and length on irregular loudspeaker arrangements, cf. [7]. Non-uniform \(\varvec{ r}_\mathrm {V}\) vector lengths of the individual virtual sources involved cause a distorted resultant vector. In particular, their superposition is distorted towards those of the multiple virtual source directions with the longest \(\varvec{ r}_\mathrm {V}\) vectors. Epain’s article [7] proposes an optimization that retrieves the optimal orientation and weighting of the multiple virtual sources for every panning direction.

3.3 Challenges in 3D Triangulation: Imaginary Loudspeaker Insertion and Downmix

Surrounding loudspeaker hemispheres typically exhibit the following two problems:

  • Loudspeaker rectangles at the sides of standard setups with height (ITU-R BS.2051-0 [8]) can be decomposed ambiguously into triangles at the sides, back, and top. This can yield noticeable ranges within loudspeaker quadrilaterals, in which auditory events are unexpectedly created by just two of the loudspeakers.

  • Signals of virtual sources below the horizon usually get lost.

The problem of unfavorable or ambiguous triangulations into loudspeaker triplets appears subtle; however, it can cause clearly audible deficiencies, especially when ambiguous triangulation yields asymmetric behavior between left and right, e.g., for the top, rear, and lateral directions, where we would manually define loudspeaker quadrilaterals instead of triangles, see [9].

As surrounding loudspeaker hemispheres are typically open by \(180^\circ \) towards below, VBAP/VBIP/MDAP is numerically unstable and theoretically useless for any panning direction below the horizon. Although the absence of loudspeakers below renders downwards amplitude panning theoretically infeasible, it is still reasonable to preserve signals of virtual sources that are meant for playback on spherically surrounding setups.

In the case of the asymmetric loudspeaker rectangles, see Fig. 3.10, and a missing lower hemisphere of surrounding loudspeakers, the insertion of one or more imaginary loudspeakers in the downward vertical direction (nadir) or in the middle of the rectangle (the average direction vector) has proven to be a useful strategy, e.g. in [10]. Any imaginary loudspeaker aims either at extending the admissible triangulation towards open parts of the surround loudspeaker setup, or at covering parts with potential asymmetry, see [9].

Fig. 3.10

VBAP on the ITU D \((4 + 5 + 0)\) setup [8]. Top row: Insertion of imaginary loudspeaker at nadir preserves loudness of downward-panned signals, shown for vertical path and E values in dB for factors \(\{\frac{1}{\sqrt{5}},\,\frac{1}{2\sqrt{5}},\,0\}\) to re-distribute the signal to the 5 existing horizontal loudspeakers. Middle row: Due to typical triangulation, two left-right mirrored vertical paths (\(45^\circ \) azimuth) yield asymmetric behavior, as shown by the \(2\arccos \Vert \varvec{ r}_\mathrm {E}\Vert \) measure. Bottom row: Insertion of imaginary loudspeaker at \(65^\circ \) fixes symmetry and feeds the signal with a factor \(\frac{1}{\sqrt{4}}\) to the 4 existing neighbor loudspeakers

The signal of the imaginary loudspeaker can be dealt with in two ways:

  • it can be dismissed, e.g., for an imaginary loudspeaker at nadir, this would still yield a signal near the closest horizontal pair of loudspeakers for virtual sources panned to below-horizontal directions, unless panned exactly to nadir

  • it can be down-mixed to the neighboring \(\mathrm {M}\) loudspeakers by a factor of \(\frac{1}{\sqrt{\mathrm {M}}}\), or less as in Fig. 3.10; alternatively for control yielding perfectly flat E measures, the resulting down-mixed gain vector can be re-normalized by Eq. (3.2).
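Both options can be sketched with one small helper. The function name, its parameters, and the example gain vectors are hypothetical; the example assumes 5 horizontal loudspeakers plus one imaginary loudspeaker at nadir, stored as the last entry of the gain vector.

```python
import math

def downmix_imaginary(gains, imag_index, neighbors, factor=None, renormalize=False):
    """Distribute the imaginary loudspeaker's gain to its neighbors and drop it.

    factor=None uses 1/sqrt(M) for M neighbors; factor=0 dismisses the signal.
    """
    if factor is None:
        factor = 1.0 / math.sqrt(len(neighbors))
    out = list(gains)
    for n in neighbors:
        out[n] += factor * gains[imag_index]
    real = [g for i, g in enumerate(out) if i != imag_index]
    if renormalize:
        norm = math.sqrt(sum(g * g for g in real))
        if norm > 0.0:                       # nothing to scale if all gains vanish
            real = [g / norm for g in real]  # re-normalization as in Eq. (3.2)
    return real

horizontal = list(range(5))                  # indices of the real neighbors
panned_to_nadir = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
panned_below = [0.6, 0.0, 0.0, 0.0, 0.0, 0.8]

mixed = downmix_imaginary(panned_to_nadir, 5, horizontal)
dismissed = downmix_imaginary(panned_to_nadir, 5, horizontal, factor=0.0)
flat = downmix_imaginary(panned_below, 5, horizontal, renormalize=True)
```

For a source panned exactly to nadir, the factor \(\frac{1}{\sqrt{\mathrm {M}}}\) already preserves unit energy, whereas intermediate directions add the down-mixed share to gains that are already non-zero, which is why the optional re-normalization is needed for a flat E measure.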

3.4 Practical Free-Software Examples

3.4.1 VBAP/MDAP Object for Pd

There is a classic VBAP/MDAP implementation by Ville Pulkki that is available as an external for Pure Data (Pd). The example in Fig. 3.11 illustrates its use together with some other useful externals in Pd. Software requirements are:

Fig. 3.11

Vector-Base/Multi-Direction Amplitude Panning (VBAP/MDAP) example in pure data (Pd) using Pulkki’s [vbap] external for an octahedral layout

3.4.2 SPARTA Panner Plugin

The SPARTA Panner under http://research.spa.aalto.fi/projects/sparta_vsts/plugins.html provides an interface for vector-base amplitude panning (VBAP) and multiple-direction amplitude panning (MDAP), see Fig. 3.12, with frequency-dependent loudness normalization by \(\root p \of {\sum _{l=1}^\mathrm {L}g^p_l}\) adjustable to the listening conditions, see Laitinen [11].

Fig. 3.12

The Panner VST plug-in from Aalto University’s SPARTA plug-in suite manages Vector-Base Amplitude Panning within sequencers supporting VST

The parameter DTT can be varied between 0 (standard, frequency-independent VBAP normalization, i.e. diffuse-field normalization), 0.5 for typical listening environments, and 1 for the anechoic chamber. The plugin allows the azimuth and elevation angles of multiple panning directions (if more than one input signal is used) and of the playback loudspeakers to be entered manually or to be imported from and exported to preset files. Of course, all panning directions can be time-varying and can be moved by mouse, automation, or controls.
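The normalization by a \(p\)-th root can be illustrated with a short sketch. It is not the plug-in's code: the function name is hypothetical, the exponent \(p\) is passed in directly, and the frequency-dependent mapping from DTT to \(p\) described in [11] is deliberately left out.

```python
def normalize_p(gains, p):
    """Divide non-negative gains by the p-th root of the sum of their p-th powers."""
    scale = sum(abs(g) ** p for g in gains) ** (1.0 / p)
    return [g / scale for g in gains]

pair = [1.0, 1.0, 0.0]
energy_normalized = normalize_p(pair, 2.0)     # p = 2 reproduces Eq. (3.2)
amplitude_normalized = normalize_p(pair, 1.0)  # p = 1 keeps the gain sum at one
```

For two equally active loudspeakers, \(p=2\) gives gains of \(1/\sqrt{2}\) each and \(p=1\) gives \(0.5\) each, so a smaller exponent lowers the level of sources panned between loudspeakers.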