1 Introduction

The steadily increasing requirements of the audio market – especially for in-ear devices – force manufacturers to reduce the size and power consumption of their loudspeakers while maintaining or even improving the quality of the frequency response. Classically, this has been achieved by, for example, new materials or, more recently, even a new manufacturing principle – so-called Micro-Electro-Mechanical-Systems (MEMS) [1]. A major advantage of MEMS-devices is that their manufacturing process is ideal for mass production, since true MEMS-devices allow complete integration of the mechanical part and the required electronics [2]. Additionally, a microspeaker based, e.g., on the electrostatic principle consumes very little power. In this case, the driving circuit is responsible for the majority of the energy consumption, which can be reduced with appropriate driving circuits [3]. It should be noted that the overall power consumption is not yet lower than that of classical analog loudspeakers; rather, the problem is shifted to the driving circuit. Nevertheless, smaller form factors and low power consumption pave the way for a new generation of micro-speakers [4].

Until now, the underlying principle of sound generation has not been altered in any commercial product. Regardless of whether a classical electrodynamic speaker or a MEMS-micro-speaker is used, one will always find a membrane which is excited with the desired audio signal via an amplifier. There have been approaches using Digital Sound Reconstruction (DSR), but no commercial product has been manufactured based on this principle [5–7]. As shown in [8], classical DSR does not provide the desired advantage of an increased sound pressure over the whole frequency range. Therefore, ADSR has been developed, which tackles the underlying problems of DSR and extends the principle with a mechanical redirection unit. With ADSR, the frequency response in a channel can be enhanced by a gain of +20 dB per decade for decreasing frequencies compared to the classical analog principle. Hence, the lower the frequency, the larger the benefit. Since the size of micro-speakers is severely limited by the application (e.g. in-ear speakers), low frequencies are particularly challenging. ADSR would therefore be very beneficial for these applications, either in a standalone form if higher frequencies (especially ultrasound applications) are less important, or in a hybrid form that combines the best of both worlds in one design.

In the following, the general principle of ADSR as well as its advantages and disadvantages are presented. Furthermore, new challenges and possible embodiments are shown. Based on analytical investigations from [8], the possibilities of ADSR – especially for in-ear applications – are discussed. Finally, a proof of concept based on MEMS-micro-pumps is presented.

2 ADSR – principle

The general principle of ADSR is based on classical Digital Sound Reconstruction (DSR), which is described in [5]. The main idea of DSR is to reconstruct the actual audio signal by superimposing time-delayed sound pulses. These sound pulses are generated by an array of miniature loudspeakers – so-called speaklets – which are either excited, thus producing the desired sound pulse, or inactive. The number of concurrent sound pulses directly corresponds to the discrete value of the amplitude of the audio signal. The main problem of DSR is the ideal (purely positive or negative) sound pulse, since it cannot be generated repeatedly with a classical actuator. Taking a channel as an example, one can excite the membrane and leave it at an elevated position, thus generating a purely positive or negative sound pulse due to the relation between the particle velocity \(v_{\mathrm{a}}\) and the sound pressure \(p_{\mathrm{a}}\) given by \(p_{\mathrm{a}}=v_{\mathrm{a}} \, \rho _{0} \, c_{0}\), where \(\rho _{0}\) denotes the mean density and \(c_{0}\) the speed of sound. In order to reuse the speaklet, it has to travel back to its equilibrium state, generating an equal sound pulse with opposite sign, which inherently contradicts the initial idea. Hence, the speaklet is latched and cannot be used until a pulse in the opposite direction is needed, which, for a simple sine wave, occurs only in the next half-wave. This latching phenomenon clearly poses a problem, since the speaklets get "used up". In the second case, where the membrane is brought back to its equilibrium state, a purely positive or negative pulse is inherently not possible with a classical actuator. Therefore, this variant does not lead to a sufficient technique either [8].
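To make the DSR mapping concrete, the following minimal Python sketch quantizes normalized audio samples into a signed number of concurrently active speaklets and converts a particle velocity into sound pressure via \(p_{\mathrm{a}} = v_{\mathrm{a}} \rho_0 c_0\). The array size, the sign convention and the function names are hypothetical and only serve as an illustration; the sketch ignores the pulse shape and the latching problem described above.

```python
import math

RHO_0 = 1.225  # mean density of air in kg/m^3 (value used in Section 5)
C_0 = 343.0    # speed of sound in m/s (value used in Section 5)

def active_speaklets(sample, n_speaklets):
    """Map a normalized sample in [-1, 1] to a signed count of active speaklets.

    Hypothetical convention: positive counts stand for purely positive pulses,
    negative counts for purely negative ones.
    """
    sample = max(-1.0, min(1.0, sample))  # clip to the representable range
    return round(sample * n_speaklets)

def pulse_pressure(v_a):
    """Plane-wave relation in a channel: p_a = v_a * rho_0 * c_0."""
    return v_a * RHO_0 * C_0

# One period of a sine sampled with 8 points, reproduced by a 16-speaklet array
samples = [math.sin(2 * math.pi * k / 8) for k in range(8)]
counts = [active_speaklets(s, 16) for s in samples]
print(counts)  # → [0, 11, 16, 11, 0, -11, -16, -11]
```

The quantized counts illustrate why DSR needs purely unipolar pulses: each count must be realized by that many speaklets emitting a pulse of one sign only.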

The main idea of ADSR is to redesign the actuator and include a redirection unit. This redirection unit separates positive and negative sound pulses, which allows the unit to be reused. A schematic representation of the overall unit is shown in Fig. 1, and its working principle can be described as follows:

  • Opening the front shutter and closing the side shutters (transition step).

  • Exciting the membrane with a purely positive or negative velocity signal and keeping the membrane at the elevated position until the shutter position has been changed (transmission step).

  • Closing the front shutter and opening the side shutters (transition step).

  • Releasing the membrane back to its equilibrium state, which generates a negative or positive sound pulse (opposite to the first one). This pulse is redirected to the side and damped out (damping step).

  • After returning to the equilibrium state, the process can be started again.
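The cycle listed above can be sketched as a small state model. This is a minimal, hypothetical illustration (the class `UnitCell` and its fields are not from the source): it only tracks the shutter and membrane states and records which pulse leaves through which outlet, assuming ideal shutters and instantaneous transitions.

```python
from dataclasses import dataclass

@dataclass
class UnitCell:
    """Hypothetical state model of one ADSR unit cell (UC)."""
    front_open: bool = False  # front shutter (towards the listener)
    side_open: bool = True    # side shutters (towards the damping channel)
    elevated: bool = False    # membrane displaced from its equilibrium position

    def cycle(self, sign):
        """Run one cycle; sign = +1 or -1 selects the transmitted pulse polarity.

        Returns a list of (step, pulse sign, outlet) tuples.
        """
        if sign not in (+1, -1):
            raise ValueError("sign must be +1 or -1")
        events = []
        # Transition step: open the front shutter, close the side shutters
        self.front_open, self.side_open = True, False
        events.append(("transition", 0, None))
        # Transmission step: move the membrane and keep it elevated
        self.elevated = True
        events.append(("transmission", sign, "front"))
        # Transition step: close the front shutter, open the side shutters
        self.front_open, self.side_open = False, True
        events.append(("transition", 0, None))
        # Damping step: release the membrane; the opposite pulse goes to the side
        self.elevated = False
        events.append(("damping", -sign, "side"))
        return events

uc = UnitCell()
for event in uc.cycle(+1):
    print(event)
```

After each cycle the cell is back in its initial state (front closed, sides open, membrane at equilibrium), so only pulses of the chosen sign ever reach the front outlet.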

Multiple of these so-called unit cells (UCs) can be combined into an array, which is denoted as a unit cell cluster (UCC). This UCC can be used to generate an audio signal according to the ADSR principle, as shown in Fig. 2.

Fig. 1.

Depiction of the redesigned actuator for ADSR. This unit serves as the smallest working part of an array and is therefore denoted as a unit cell (UC). The shutter gates can be set to redirect the sound waves generated by a classical loudspeaker through two structurally separated channels, which separates the pulses and thus allows purely positive or negative sound pulses to be generated. [8, 9]

Fig. 2.

Visualization of the sound generation process using ADSR. The solid lines represent the transmitted sound pulses, the dashed lines the sound pulses which are damped out, and the chain-dotted line the resulting sound pressure signal.

As can be seen, the individual UCs produce consecutive sound pulses proportional to the amplitude of the audio signal – in this case a simple sinusoidal signal. Since the audio frequency is determined by the number of pulses used for the reconstruction, the amplitude is independent of the audio frequency: a lower audio frequency is obtained by using more pulses, while the maximum sound pressure generated by each pulse, and therefore the amplitude of the audio signal, stays constant. It should be noted that, due to the required redirection channel, a similar reconstruction of the audio signal can be achieved at the "back side", which resembles the operating principle of a traditional speaker. Hence, it might be possible to use this acoustic wave to exploit the advantages of cabinet designs such as bass-reflex.
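The frequency independence of the amplitude can be illustrated with a minimal sketch. It assumes, hypothetically, that each pulse lasts `T_DIG` and carries at most the normalized pressure `P_PULSE`, and that the pulse amplitudes simply follow the audio samples; pulse shape, overlap and channel acoustics are ignored.

```python
import math

T_DIG = 1e-4    # hypothetical duration of one pulse in s
P_PULSE = 1.0   # hypothetical maximum pressure of a single pulse (normalized)

def pulse_amplitudes(f_a, n_periods=1):
    """Amplitudes of consecutive pulses reconstructing a sine of frequency f_a."""
    n_pulses = round(n_periods / (f_a * T_DIG))
    return [P_PULSE * math.sin(2 * math.pi * f_a * k * T_DIG) for k in range(n_pulses)]

for f_a in (100.0, 500.0):
    pulses = pulse_amplitudes(f_a)
    peak = max(abs(p) for p in pulses)
    print(f"{f_a:5.0f} Hz: {len(pulses):3d} pulses per period, peak {peak:.3f}")
```

Lowering the audio frequency from 500 Hz to 100 Hz raises the number of pulses per period from 20 to 100, while the peak of the reconstructed signal stays at `P_PULSE`.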

The advantage of ADSR is that the sound pressure, especially in the low- but also in the mid-frequency range, is much larger than with the classical excitation principle, called the analog mode. Since the underlying principle of the actuator differs from the classical loudspeaker in that sound pulses are "pumped" through the individual unit cells, the UCC can be regarded as an "acoustic pump". This comparison is in fact valid, since the UCC could be replaced with a micro-pump whose volume flow is modulated by the audio signal. For a frequency approaching zero, the similarity becomes evident: since the UCC constantly pumps sound pulses (and therefore a defined air volume) into a channel, a constantly running pump would have the same effect regarding the displaced volume. Based on this realization, a broad variety of embodiments of the actuator for ADSR can be found. Some based on the classical loudspeaker are given in [8]; others might be directly based on micro-pumps, as shown in [10–12].

3 MEMS-speakers using ADSR – comparison to classical speakers

MEMS-speakers in general already have many advantages over classical speakers such as the electrodynamic loudspeaker. Therefore, the general advantages of MEMS – as shown in [2–4, 13] – are summarized first.

3.1 MEMS-speakers vs classical speakers

Miniaturization

Probably one of the most important advantages is the possibility of miniaturization. Due to the inherent structure of MEMS-devices, it is possible to develop sensors and actuators whose size is reduced by orders of magnitude compared to their conventional counterparts.

Mass production and cost efficiency

MEMS are ideal for mass production, which reduces costs in general. It should be mentioned that the pre-production phase is more involved, since more complex machines are necessary, but the production phase itself is very efficient.

Integration

The production process enables a high degree of system integration, e.g. by combining a speaker with a suitable pre-amplifier in one system.

Advantageous scaling properties

Depending on the underlying principle, sensors and actuators can perform better or more efficiently at the micro-scale. This also applies to ADSR, which benefits greatly from the small form factor regarding acoustic and mechanical properties (small form factors reduce problems with travel time, shutters can operate much faster at the micro-scale, and thermo-viscous effects can be used for sealing to avoid contact problems).

Consistency

The manufacturing process itself enables producing very consistent devices which vary only to a small degree regarding their properties (e.g. mechanical displacement).

Power consumption

MEMS-loudspeakers (based on the electrostatic or piezoelectric principle) use very little power, since they mostly require reactive power, which can be recovered with an appropriate system architecture. The majority of the overall power is consumed by the driving circuit. In this respect, ADSR offers the additional advantage that the driving circuit can be optimised for a specific excitation signal rather than for the complete audible frequency range.

Faster response times

Smaller size also leads to smaller masses and faster response times. True MEMS-speakers (e.g. electrostatic speaker with a silicon membrane) have very little inertia which enables almost immediate responses. This can be especially advantageous for applications like Active Noise Control (ANC).

3.2 ADSR vs. analog MEMS-speakers

Advantages

As can be seen, MEMS-speakers already have many potential advantages compared to classical analog speakers. However, these advantages have not yet been exploited throughout the speaker market. ADSR has the potential to leverage them by offering key benefits for MEMS-systems, for example an increased sound pressure, especially for low-frequency audio signals. Due to its working principle, ADSR offers a gain of up to 20 dB per decade for decreasing frequency. Hence, one could think of an ADSR speaker as a MEMS-sub-woofer. In a channel application, the additional 20 dB per decade turn the frequency response into a completely flat one, so that the sound pressure in a channel is independent of the audio frequency. In the free field, the gain scales in the same way; hence, the sound pressure decreases by only 20 dB per decade towards lower frequencies instead of the 40 dB per decade of the classical analog mode. If high frequencies or even ultrasound are of great importance, hybrid systems incorporating both the classical analog scheme and ADSR can be used. Since true MEMS systems (no additional assembly of the membrane needed) tend to struggle with low-frequency audio signals, ADSR offers a valuable addition. A classical ADSR actuator, or even the hybrid version, would allow a true MEMS design, so no additional assembly would be required, reducing the overall number of handling steps. Furthermore, ADSR benefits greatly from a MEMS design in general. First of all, ADSR uses pulses which are varied only in their amplitude; hence, the actuator response is independent of the audio frequency, provided the pulse generation is consistent over the amplitude range. This enhances the audio quality in general while also enabling easy actuator tuning. Since the mechanical properties tuned in the high-frequency range define the audio quality in the low-frequency range, ADSR is ideally suited for MEMS applications.

Disadvantages

Besides offering benefits in the low- to mid-frequency range, ADSR requires a rather complex actuator. Due to the inherent design, some sort of pumping mechanism – as e.g. shown in [12] – is required, which increases the total number of moving parts. This not only increases the complexity of the manufacturing process but also makes the overall system more error-prone. Depending on the excitation variant, the actuator itself also requires a more complex excitation. For the first variant – the globally optimal excitation signal – the added complexity is reasonable, since the excitation signal (a hat function) is modulated only in its amplitude. With increasing audio frequency, however, the error caused by the linear approximation of the audio signal increases, leading to additional (non-hearable) total harmonic distortion (THD). In the more complex variant, the excitation signal is derived in a locally optimal manner, i.e. the exact form of the excitation pulse has to be calculated separately for each time step. This variant introduces no additional THD, but at the cost of a much higher computational effort. A comparison between the globally optimal and the locally optimal excitation signal is shown in Fig. 3. For the sake of brevity, we refer to [8] for the procedure of obtaining the locally optimal excitation signals.
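Why the linear approximation degrades with frequency can be seen in a minimal sketch. It assumes, as an idealization rather than the procedure of [8], that hat pulses with support \(2T\) are centred at multiples of a hypothetical spacing `T` and scaled by the audio samples; neighbouring pulses then overlap such that their sum is a piecewise linear interpolation of the audio signal.

```python
import math

T = 1e-4  # hypothetical spacing of the hat pulses in s

def hat(x):
    """Unit triangular (hat) pulse with support [-1, 1]."""
    return max(0.0, 1.0 - abs(x))

def reconstruct(f_a, t):
    """Sum of hat pulses centred at k*T, each scaled by the audio sample at k*T."""
    k0 = math.floor(t / T)
    # Only the two pulses around t are non-zero there; their sum interpolates linearly.
    return sum(math.sin(2 * math.pi * f_a * k * T) * hat(t / T - k) for k in (k0, k0 + 1))

def max_error(f_a, n_eval=2000):
    """Largest deviation from the ideal sine, evaluated over one audio period."""
    period = 1.0 / f_a
    return max(abs(reconstruct(f_a, period * i / n_eval)
                   - math.sin(2 * math.pi * f_a * period * i / n_eval))
               for i in range(n_eval))

for f_a in (100.0, 1000.0, 5000.0):
    print(f"{f_a:6.0f} Hz: max error {max_error(f_a):.2e}")
```

The error grows roughly with \((2\pi f_{\mathrm{a}} T)^2/8\); at 5 kHz only two pulses per period remain and they fall on the zero crossings, an extreme case in which the approximation collapses entirely.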

Fig. 3.

Comparison of the reconstruction with ADSR for the globally optimal excitation signal (linear approximation, dash-dotted line) as well as the locally optimal excitation signal (dashed line). The results are given for a generic channel where the excitation signals for the individual unit cells have been obtained with the procedure explained in [8]

Furthermore, ADSR is not very well suited for ultrasound applications due to its inherent operating principle. Although, to the authors' knowledge, there is no physical limit, the advantageous scaling of ADSR is most promising in the low- to mid-frequency range. In order to achieve similar benefits regarding the overall sound pressure amplitude at higher frequencies, the shutters in particular would have to exhibit extremely fast switching times, which would pose additional mechanical challenges.

4 In-ear application

Since the principle of ADSR is especially well suited for MEMS-applications, it makes sense to focus on in-ear rather than on free-field speakers. However, ADSR is not restricted to in-ear applications; they are merely the focus of this article. Hence, the general microspeaker market, including e.g. smartphones, should also be explored. In order to understand the capabilities of ADSR under ideal conditions, analytical investigations combined with a circuit model of the ear canal have been used. Based on the geometry and mechanical properties given in [14], an electrostatic actuator serves as an example for the comparison between ADSR and the analog mode. It should be mentioned that, due to the limited maximum displacement of an electrostatic speaker caused by the so-called snap-in effect, the maximum sound pressure is limited by the underlying actuator principle. Using a piezoelectric device would greatly increase the maximum stroke and therefore the achievable sound pressure [15].

4.1 Mechanical model

A single cell of the given actuator is characterized by a diameter of around 1 mm and a maximum displacement of about 400 nm. Although the speaklets are subject to an in-plane pre-stress as well as to a non-uniformly distributed membrane deflection, a simplified model of a piston-like movement is applied. Here, we assume that the electrostatic force is uniformly distributed and that the pre-stress is negligible. Although these assumptions influence the actual shape of the deflected membrane, the simplified model is sufficient for a first estimate.

Averaging the deflection of the membrane over its area leads to an effective displacement, which can be used in a 1D-model to estimate the acoustic response. Using the model given in [16], the averaged displacement \(\hat{w}_{\mathrm{avg}}\) over the whole membrane area is determined by the maximum displacement \(w_{\mathrm{max}}\) and is given by

$$ \hat{w}_{\mathrm{avg}} = \frac{w_{\mathrm{max}}}{3}. $$
(1)
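The factor \(1/3\) in Eq. (1) can be reproduced under an assumption: if the deflection follows the classical bending profile of a clamped circular plate under uniform load, \(w(r) = w_{\mathrm{max}} (1 - r^{2}/a^{2})^{2}\) with \(a\) the speaklet radius, the area average follows with the substitution \(u = r^{2}/a^{2}\), \(\mathrm{d}u = 2 r \, \mathrm{d}r / a^{2}\). Whether [16] uses exactly this profile is not stated here; the derivation only shows that this shape yields the stated factor.

$$ \hat{w}_{\mathrm{avg}} = \frac{1}{\pi a^{2}} \int_{0}^{a} w_{\mathrm{max}} \left(1 - \frac{r^{2}}{a^{2}}\right)^{2} 2 \pi r \, \mathrm{d}r = w_{\mathrm{max}} \int_{0}^{1} (1 - u)^{2} \, \mathrm{d}u = \frac{w_{\mathrm{max}}}{3}. $$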

4.2 Ear canal model

In order to estimate the acoustic response for ADSR for an in-ear application, an occluded ear canal simulator – in this case a generic 711 coupler – can be used. This coupler simulates the acoustics of a standardized human ear canal and can be modeled in a simplified fashion with a network model. The network used for the 711 coupler can be found in [17] and is also shown in Fig. 4 with the parameters given in Table 1. With the help of the network model, the sound pressure at the ear drum can be calculated as a function of the volume flow-rate.

Fig. 4.

Network model for the 711 coupler [17]

Table 1. Parameters for the network model of the 711 coupler [17]

4.3 Comparison of ADSR and the analog mode for in-ear applications

Based on the model of the electrostatic actuator, an effective overall displacement can be calculated. The area available for the speaker array is limited by the standardized coupler diameter \(D\) of 7.5 mm. With the available area and the size of one speaklet, the maximum number of speaklets \(n_{\mathrm{Speaklet}}\) amounts to 42, which corresponds to a fill factor of 74.7%. It should be noted that this fill factor is quite high and serves as an upper bound, since electrical connections between the speaklets and the circuitry will occupy additional space, thus reducing the fill factor. The effective displacement for a 1D-model can now be calculated by

$$ \hat{w}_{\mathrm{eff}} = \hat{w}_{\mathrm{avg}} \frac{n_{\mathrm{Speaklet}} a^{2}\pi }{D^{2} \frac{\pi }{4}}, $$
(2)

where \(a\) is the radius of a single speaklet. With the maximum displacement given earlier, the effective displacement finally amounts to \(\hat{w}_{\mathrm{eff}} = 99.56~\mbox{nm}\). Using the effective displacement to calculate the resulting volume flow rate gives

$$ \hat{Q}(f_{\mathrm{a}}) = 2 \pi f_{\mathrm{a}} \, \hat{w}_{\mathrm{eff}} \, \frac{D^{2} \pi }{4}, $$
(3)

where \(f_{\mathrm{a}}\) denotes the frequency of the audio signal. The volume flow can be used as a (current) source for the network model given in Fig. 4. Based on this model, the transfer impedance \(Z_{\mathrm{trans}}\) can be calculated, which is then used to calculate the sound pressure at the eardrum. For the analog mode, the sound pressure at the eardrum is given by

$$ \underline{\hat{p}}_{\mathrm{a,analog}}(f_{\mathrm{a}}) = \underline{Z}_{\mathrm{trans}}(f_{\mathrm{a}})\, \hat{Q}(f_{\mathrm{a}}). $$
(4)

For ADSR the gain relation presented in [8] is used for the calculation of the maximum achievable sound pressure, which results in

$$ \underline{\hat{p}}_{\mathrm{a,ADSR}}(f_{\mathrm{a}}) = \underline{Z}_{\mathrm{trans}}(f_{\mathrm{a}}) \, \hat{Q}(f_{\mathrm{a}}) \, G(f_{\mathrm{a}}). $$
(5)

According to [8] the gain \(G(f_{\mathrm{a}})\) for an overlap factor \(o_{f}=2\) and pause ratio \(p_{r}=2\) can be expressed in terms of the frequency as

$$ G(f_{\mathrm{a}}) = \frac{2}{4\pi \, T_{\mathrm{Dig}} \, f_{\mathrm{a}}}. $$
(6)

For this comparison, the period of one purely positive or negative sound pulse \(T_{\mathrm{Dig}}\) has been defined via the sampling frequency as . Finally, the comparison of ADSR vs. the analog mode over frequency is shown in Fig. 5. As can be seen, ADSR outperforms the analog mode in the low-frequency range, whereas a break-even point appears in the high-frequency range. This break-even point is a design variable and can be tuned, e.g., with the pulse width \(T_{\mathrm{Dig}}\).
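The numbers of this section can be checked with a short Python sketch of Eqs. (1), (2), (3) and (6). It assumes a speaklet radius of 0.5 mm (from the diameter of around 1 mm); the pulse period `T_DIG = 1e-4` s is a hypothetical value chosen only for illustration and is not the value used for Fig. 5. The transfer impedance of the 711 coupler is not modelled, so only the flow and the gain are computed.

```python
import math

W_MAX = 400e-9     # maximum speaklet displacement in m (Section 4.1)
A_RAD = 0.5e-3     # speaklet radius a in m (diameter of around 1 mm)
N_SPEAKLET = 42    # number of speaklets within the coupler diameter
D = 7.5e-3         # standardized coupler diameter in m
T_DIG = 1e-4       # hypothetical pulse period in s, for illustration only

def effective_displacement():
    """Eqs. (1) and (2): averaged displacement scaled by the fill factor."""
    w_avg = W_MAX / 3.0
    fill_factor = N_SPEAKLET * A_RAD**2 * math.pi / (D**2 * math.pi / 4.0)
    return w_avg * fill_factor, fill_factor

def volume_flow(f_a, w_eff):
    """Eq. (3): peak volume flow rate in m^3/s."""
    return 2.0 * math.pi * f_a * w_eff * D**2 * math.pi / 4.0

def adsr_gain(f_a, t_dig=T_DIG):
    """Eq. (6): gain of ADSR over the analog mode for o_f = 2 and p_r = 2."""
    return 2.0 / (4.0 * math.pi * t_dig * f_a)

def break_even_frequency(t_dig=T_DIG):
    """Frequency where G(f_a) = 1, i.e. where Eqs. (4) and (5) give equal pressure."""
    return 1.0 / (2.0 * math.pi * t_dig)

w_eff, ff = effective_displacement()
print(f"fill factor {ff:.1%}, w_eff = {w_eff * 1e9:.2f} nm")  # → fill factor 74.7%, w_eff = 99.56 nm
print(f"Q(100 Hz) = {volume_flow(100.0, w_eff):.3e} m^3/s")
print(f"break-even at {break_even_frequency():.0f} Hz for T_Dig = {T_DIG} s")
print(f"G(100 Hz) / G(1 kHz) = {20 * math.log10(adsr_gain(100.0) / adsr_gain(1000.0)):.1f} dB")
```

Since \(G \propto 1/(T_{\mathrm{Dig}} f_{\mathrm{a}})\), the break-even frequency \(1/(2\pi T_{\mathrm{Dig}})\) moves up when the pulses are made shorter, and the gain ratio between 100 Hz and 1 kHz reproduces the 20 dB per decade.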

Fig. 5.

Comparison of the acoustic response for ADSR and the analog mode for in-ear applications

5 Measurements

Developing a demonstrator for ADSR at MEMS level is quite time-consuming due to the prolonged initial production phase compared to macroscopic embodiments. One possibility to prove the overall concept of ADSR without stepping through a complete MEMS-development process is to use so-called micro-pumps. These actuators are often used in micro-dosing systems for liquids but are also available for applications using air as the transport medium [12]. In theory, a micro-pump can generate a completely constant flow as well as vary its flow rate over the audible frequency range, which would make it the ideal actuator for ADSR. Since a generic actuator for ADSR can be regarded as an acoustic pump, it is possible to use a pump and modulate its flow rate with the audio signal.

In a first step, an experimental, unidirectional micro-pump based on the piezoelectric effect has been used to prove the concept of ADSR. The excitation is based on a rectangular signal with a 3 kHz base frequency, which is modulated with the audio signal – in this case a simple sinusoidal one. Due to the non-linearity of the actuator response, the mean flow as a function of the driving voltage has been measured in a prior step.

The measured response has been used as a look-up table in order to pre-distort the excitation signal accordingly. The desired and the pre-distorted excitation signals are shown in Fig. 6 for an exemplary case.
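A minimal sketch of such a look-up-table pre-distortion is given below. The calibration points are hypothetical placeholders, not the measured pump data, and the 50 % duty cycle as well as the mapping of the sine to a non-negative target flow (the pump is unidirectional) are assumptions. The idea is to invert the monotonic flow-versus-voltage curve by piecewise linear interpolation and use the resulting voltage as the amplitude of the 3 kHz rectangular carrier.

```python
import bisect
import math

# Hypothetical calibration: driving voltage amplitude (V) vs. mean flow (ml/min).
CAL_VOLTAGE = [0.0, 20.0, 40.0, 60.0, 80.0, 100.0]
CAL_FLOW = [0.0, 0.3, 1.0, 2.0, 2.9, 3.6]

def voltage_for_flow(q):
    """Invert the monotonic calibration curve by piecewise linear interpolation."""
    if not CAL_FLOW[0] <= q <= CAL_FLOW[-1]:
        raise ValueError("requested flow outside the calibrated range")
    i = bisect.bisect_left(CAL_FLOW, q)
    if i == 0:
        return CAL_VOLTAGE[0]
    q0, q1 = CAL_FLOW[i - 1], CAL_FLOW[i]
    v0, v1 = CAL_VOLTAGE[i - 1], CAL_VOLTAGE[i]
    return v0 + (v1 - v0) * (q - q0) / (q1 - q0)

def excitation(t, f_a, f_c=3e3, q_max=CAL_FLOW[-1]):
    """Pre-distorted rectangular carrier whose mean flow follows a sine at f_a."""
    # Unidirectional pump: the target mean flow oscillates between 0 and q_max.
    q_target = 0.5 * q_max * (1.0 + math.sin(2.0 * math.pi * f_a * t))
    amplitude = voltage_for_flow(q_target)
    carrier_on = (t * f_c) % 1.0 < 0.5  # assumed 50 % duty cycle
    return amplitude if carrier_on else 0.0

print(voltage_for_flow(1.5))     # voltage needed for 1.5 ml/min
print(excitation(0.0, 200.0))    # carrier "on" at t = 0, target flow 1.8 ml/min
```

Because the interpolation is done on the inverse curve, regions where the pump responds weakly (low voltages here) automatically receive a larger share of the voltage swing.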

Fig. 6.

Excitation signals (pre-distorted and desired signal) for the case of \(f_{\mathrm{a}}= 200~\mbox{Hz}\) and two periods. For \(f_{\mathrm{a}}=200~\mbox{Hz}\) up to four periods can be used, but only two are displayed for improved readability

The final excitation has been carried out with an arbitrary waveform generator (AWG) from Keysight (33522B) and a high-voltage power amplifier for piezoelectric devices with an amplification factor of 70. The microphone used for the measurement was a Brüel & Kjaer 1/8-inch pressure-field microphone (Type 4138), mounted in a 3D-printed housing which also serves as an adapter from the micro-pump to the measurement channel. The measurement channel itself is 4 m long to avoid reflections of the low-frequency audio signal and is terminated with a porous absorber. The length of the channel serves as an additional measure against reflections, since for a single period of a sinusoidal signal with a frequency higher than 42.875 Hz the reflected wave arrives only after the period has passed the microphone and is therefore not measured at all.
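The 42.875 Hz limit follows from the round-trip time in the 4 m channel. The sketch below assumes that the microphone sits at the channel entrance, so the reflected wave travels twice the channel length; it then counts how many full audio periods fit into this reflection-free window.

```python
import math

C_0 = 343.0      # speed of sound in m/s
L_CHANNEL = 4.0  # length of the measurement channel in m

def reflection_free_time(length=L_CHANNEL, c0=C_0):
    """Round-trip time until the wave reflected at the channel end returns.

    Assumes the microphone sits at the channel entrance (path length 2 * length).
    """
    return 2.0 * length / c0

def max_periods(f_a, length=L_CHANNEL, c0=C_0):
    """Largest number of full audio periods that fit into the reflection-free window."""
    return math.floor(reflection_free_time(length, c0) * f_a)

print(f"{1.0 / reflection_free_time():.3f} Hz")  # → 42.875 Hz
for f_a in (100.0, 200.0, 250.0):
    print(f"{f_a:.0f} Hz: up to {max_periods(f_a)} periods")
```

Under this assumption, the window of about 23.3 ms admits two periods at 100 Hz and four periods at 200 Hz, consistent with the period counts mentioned for Figs. 6 and 9.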

A schematic representation of the measurement setup is displayed in Fig. 7. The frequency range from 100 Hz to 250 Hz has been investigated, as shown in Fig. 8. The measurements agree well with the presented theory, since the sound pressure in the investigated frequency range is indeed nearly constant. Furthermore, since the micro-pump used for this experiment exhibits a pulsed air flow due to the pulsed excitation signal as well as its underlying operating principle, the acoustic signal also contains the 3 kHz carrier frequency. Moreover, the side lobes caused by the modulation are visible, although less distinct than usual, because only one period of the audio signal has been measured, which also limits the frequency resolution of the discrete Fourier transform (DFT). Please note that, due to the fluctuating flow rate, a response covering only the desired audio range without a dominant carrier signal is not possible. Therefore, micro-pumps which are able to operate with a completely steady flow rate are necessary for an optimal realization of ADSR.
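The effect of the observation time on the DFT resolution can be sketched with an idealized signal. The sketch assumes a sinusoidal (instead of pulsed) 3 kHz carrier that is amplitude-modulated by a 100 Hz sine with a hypothetical modulation depth of 0.5, sampled at a hypothetical rate of 48 kHz; the bin spacing equals the inverse of the observation time, i.e. 100 Hz for one period and 50 Hz for two periods.

```python
import cmath
import math

FS = 48000.0  # hypothetical sampling rate in Hz
F_C = 3000.0  # carrier frequency of the micro-pump excitation in Hz

def am_signal(f_a, n_periods, depth=0.5):
    """Idealized signal: sinusoidal carrier F_C amplitude-modulated by a sine at f_a."""
    n = round(n_periods * FS / f_a)
    return [(1.0 + depth * math.sin(2 * math.pi * f_a * m / FS))
            * math.cos(2 * math.pi * F_C * m / FS) for m in range(n)]

def dft_amplitude(x, k):
    """Single-sided amplitude of DFT bin k (k > 0) of a real signal x."""
    n = len(x)
    acc = sum(xm * cmath.exp(-2j * math.pi * k * m / n) for m, xm in enumerate(x))
    return 2.0 * abs(acc) / n

for n_periods in (1, 2):
    x = am_signal(100.0, n_periods)
    df = FS / len(x)       # bin spacing = 1 / observation time
    k_c = round(F_C / df)  # bin of the carrier
    row = ", ".join(f"{(k_c + i) * df:.0f} Hz: {dft_amplitude(x, k_c + i):.2f}"
                    for i in range(3))
    print(f"{n_periods} period(s), df = {df:.0f} Hz -> {row}")
```

With one period, the side lobe at 3100 Hz sits in the bin directly next to the carrier; with two periods, an empty bin at 3050 Hz separates them, which corresponds to the clearer side lobes obtained with longer measurements.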

Fig. 7.

Schematic representation of the measurement setup used for the acoustic measurements with the micro-pump

Fig. 8.

Acoustic response of the micro-pump using ADSR. Due to the amplitude modulation of the mean flow and the excitation at 3 kHz the carrier frequency is clearly dominant

For selected frequencies, multiple periods have been used for the measurements while maintaining the restriction that the pulse train must be shorter than the travel time after which the wave reflected at the channel end reaches the microphone again. One exemplary measurement showing the effect of the increased frequency resolution is given in Fig. 9. As can be seen, the carrier and the side lobes caused by the modulation are clearly visible; using two periods instead of one increases the frequency resolution and gives a clearer picture of the side lobes. Regarding the absolute amplitude of the acoustic response, analytical estimates can be made. Based on the relation between the sound pressure and the particle velocity in a channel given by \(p_{\mathrm{a}} = v_{\mathrm{a}} \, \rho _{0} \, c_{0}\), the sound pressure amplitude of the desired audio signal can be expressed as

$$ \hat{p}_{\mathrm{audio}} = \frac{\dot{Q}_{\mathrm{max}}}{2 A} \rho _{0} c_{0}. $$
(7)

In this equation, \(\dot{Q}_{\mathrm{max}}\) denotes the maximum volume flow rate and \(A\) the cross-sectional area of the channel where the measurement is made. For the setup used, the area \(A\) can be approximated by

$$ A = A_{\mathrm{ch}} - A_{\mathrm{mic}} = 156.31~\mbox{mm}^{2}, $$
(8)

where \(A_{\mathrm{ch}}\) is the area of the channel (side length \(l=12.6~\mbox{mm}\)) and \(A_{\mathrm{mic}}\) the area of the microphone inserted in the channel (due to the protection grid, the microphone is not mounted flush with the channel wall). Using the maximum volume flow rate obtained from the measurements, \(\dot{Q}_{\mathrm{max}}=3.63~\mbox{ml}\,\mbox{min}^{-1}\), as well as the speed of sound \(c_{0}=343~\mbox{m}\,\mbox{s}^{-1}\) and the mean density \(\rho _{0}= 1.225~\mbox{kg}\,\mbox{m}^{-3}\), the sound pressure amplitude of the audio signal can be calculated, which results in \(\hat{p}_{\mathrm{audio}} = 0.0813~\mbox{Pa}\) (see Figs. 8 and 9). This is in good agreement with the measurements and shows that, even with a non-ideal excitation (non-zero carrier signal), the estimation of the final sound pressure amplitude works well. It should be noted that, due to the simplified 1D-model as well as some uncertainties regarding the parameters (e.g. residual support material from the 3D print reducing the effective area), the estimated sound pressure does not coincide perfectly with the measured one, but it gives a good estimate.
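The estimate of Eq. (7) can be reproduced in a few lines, using only the values stated in the text; the only step added here is the unit conversion of the flow rate from ml/min to m³/s. The factor 1/2 is taken directly from Eq. (7).

```python
RHO_0 = 1.225            # mean density in kg/m^3
C_0 = 343.0              # speed of sound in m/s
AREA = 156.31e-6         # effective cross-sectional area in m^2, Eq. (8)
Q_MAX = 3.63e-6 / 60.0   # 3.63 ml/min converted to m^3/s

def audio_pressure_amplitude(q_max=Q_MAX, area=AREA, rho0=RHO_0, c0=C_0):
    """Eq. (7): p_audio = Q_max / (2 A) * rho_0 * c_0 (1D plane-wave estimate)."""
    v_a = q_max / (2.0 * area)  # particle velocity amplitude of the audio component
    return v_a * rho0 * c0

print(f"{audio_pressure_amplitude():.4f} Pa")  # → 0.0813 Pa
```

Since the estimate scales with \(1/A\), a slightly smaller effective area (e.g. due to residual support material) directly raises the predicted pressure, which is one of the uncertainties mentioned above.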

Fig. 9.

Acoustic response of the micro-pump using ADSR. Comparison of two measurements in the frequency domain replicating the audio frequency \(f_{\mathrm{a}} = 100~\mbox{Hz}\) using one or two periods respectively

6 Conclusion

In this paper, a new method of sound generation – ADSR – was investigated regarding its potential benefits compared to the classical analog mode. Starting from a general description of ADSR, its application to MEMS-speakers has been elaborated, along with an overview of the advantages of MEMS-speakers in general. Furthermore, in-ear applications have been investigated with the help of a network model of an occluded ear-canal simulator. Based on this model, the benefits of ADSR in the low- to mid-frequency range have been demonstrated. Due to the gain of 20 dB per decade for decreasing frequencies, the benefit is largest at low frequencies. ADSR is therefore a promising addition to the micro-speaker market, where low-frequency sound is known to be challenging. Finally, measurements have been presented in which the principle of ADSR has been demonstrated with the help of a micro-pump. Although this micro-pump was not developed for ADSR, it serves as a suitable demonstrator for the proof of concept.