A Survey on Application Specific Processor Architectures for Digital Hearing Aids

Processors for hearing aids are highly specialized for audio processing, yet they have to meet challenging hardware restrictions. This paper aims to provide an overview of the requirements, architectures, and implementations of these processors. Special attention is given to the increasingly common application-specific instruction-set processors (ASIPs). The main focus of this paper lies on hardware-related aspects such as the processor architecture, the interfaces, the application-specific integrated circuit (ASIC) technology, and the operating conditions. The different hearing aid implementations are compared in terms of power consumption, silicon area, and computing performance for the algorithms used. Challenges for the design of future hearing aid processors are discussed based on current trends and developments.


Holger Blume (blume@ims.uni-hannover.de), Lukas Gerlach (gerlach@ims.uni-hannover.de), Guillermo Payá-Vayá (guipava@ims.uni-hannover.de)
Institute of Microelectronic Systems, Leibniz Universität Hannover, Cluster of Excellence Hearing4all, Appelstraße 4, 30167 Hannover, Germany
This article belongs to the Topical Collection: Survey Papers

Introduction
Modern hearing aids, like the one shown in Fig. 1, have to meet a variety of technical requirements. First of all, the power consumption of hearing aids is limited. To achieve an acceptable battery life, the average power consumption of hearing aids should be in the range of a few milliwatts. The reason for the low energy budget is the small physical size of battery-powered hearing aids. At the same time, the demand for more audio processing performance and memory capacity is steadily growing. There are newly developed algorithms with more or improved features and increasing demands on audio quality. In addition, hearing aids are required to be individually fitted to the hearing aid user, adapt to constantly changing environmental conditions, and connect wirelessly to other electronic devices. These requirements and the high degree of flexibility and programmability make the hardware design for hearing aid devices challenging. Figure 2 shows the system components and peripherals of a state-of-the-art hearing aid. Typically, hearing aids contain a central processing unit that provides the functionality and connects all other components such as the receiver and microphones. The design and implementation of the processor is challenging and involves numerous tradeoffs due to the wide range of requirements and the large design space. The implementation alternatives in a multidimensional design space are shown in Fig. 3. Various hearing aid processor architectures and implementations have been introduced in the literature. There are analog, mixed-signal, and purely digital hearing aids.
Some use hard-wired processing and control circuits, others use fully programmable application-specific instruction-set processors (ASIPs) with custom instructions and hardware accelerators. This survey compares these hearing aids from a hardware perspective.
The main focus of related work, i.e., technical studies, surveys, or overview papers addressing hearing aid signal processing, is on current and future hearing aid algorithms. The first related study [26] presents state-of-the-art signal processing in hearing aids. Among the studied techniques and algorithms are feedback reduction, [...] architectures. A particular focus is on application-specific instruction-set processors (ASIPs).
The structure of this survey is as follows: Section 2 provides a list of the algorithms that are implemented on the hearing aid processors covered by this survey. The hearing aid processors are described in detail in Section 3. The differences in the architecture of hard-wired and ASIP-based hearing aid processors are discussed. The remaining sections cover the ASIC technology and supply voltage (Section 4), power consumption (Section 5), silicon area (Section 6), operating clock frequency (Section 7), audio datapath width (Section 8), and on-chip memory in ASIP-based hearing aid systems (Section 9). Section 10 concludes this paper and points out possible future trends.

Algorithms for Hearing Aid Devices
A typical high-end hearing aid processing chain is shown as a block diagram in Fig. 4. Multiple microphones enable directional filtering. Therefore, beamforming (BMF) and adaptive directional microphone (ADM) algorithms are the first in the chain and aim to increase the signal-to-noise ratio (SNR) by performing directional filtering. Feedback is then suppressed with a feedback cancellation (FBC) algorithm by analyzing the output signal and detecting feedback loops. The algorithms that process frequency domain data, such as the noise reduction (NR) and dynamic range compression (DRC) algorithms, require an analysis and synthesis filter bank. Classification algorithms generally generate control signals for the processing chain. A list of algorithms is included in Table 1. This list contains exclusively algorithms that are part of a processing chain in state-of-the-art hearing aids. Publications with the implementation, optimization, and application of these algorithms on the state-of-the-art hearing aid processors are also included in Table 1. There is a trend towards algorithms that are computationally more demanding. In recent years, algorithms for machine learning and deep learning [40,44,52] and binaural processing algorithms [46] have been used. Recently proposed algorithms of this kind [57,67,69], for which no implementation details on a hearing aid processor are known, are not listed in Table 1.

Figure 2 System components and peripherals of a modern hearing aid device [9,63].

Figure 4 Block diagram of a typical hearing aid processing [22,55].

Hearing Aid Processor Architectures
In recent decades, various hearing aid processors have been proposed in the literature. All hearing aid processors are subject to comparably strict requirements regarding the limited energy budget, the available chip area, and the processing performance. However, a wide range of different architectures, algorithms, approaches, and technologies has been introduced to meet these stringent requirements. Thirty research and commercial processors published between 1996 and 2020 are listed in Table 2. This table provides a comparison of the architecture, ASIC technology, supply voltage, average power consumption, silicon area, and operating clock frequency of the various hearing aid systems.
The processor architectures are designed and optimized to efficiently execute particular hearing aid algorithms listed in Table 1. The architectures of these processors can be divided into three main classes: hard-wired with dedicated processing blocks, ASIPs, and ASIPs with hardware accelerators.
A digital hard-wired hearing aid is highlighted in the following. This dedicated architecture, originally proposed in [72] and published in 2014, is characterized by its flexibility compared to related architectures. It includes a core-based architecture consisting of a memory management unit for data exchange, a control unit, and an arithmetic unit for processing. Therefore, processing is easier to control [...]

Table 1 List of algorithms that are applied by the related work in hearing aid processors.

Application-Specific Instruction-Set Processors
Application-specific instruction-set processor (ASIP) architectures include a digital signal processor (DSP) for signal processing [35,47-50,56]. The DSP architecture is optimized for the typical hearing aid algorithms and is therefore also denoted as an ASIP here. The target algorithms can be modified or replaced by changing the program code. This offers greater flexibility compared to hard-wired architectures. However, due to the higher flexibility offered by the processor architecture and the increased memory requirements, the power consumption and silicon area requirements are generally higher compared to hard-wired architectures. Instruction-level and data-level optimizations improve the efficiency of signal processing. New custom instructions increase processing performance. A hearing aid with a DSP for signal processing is presented in [48], and its block diagram and photo are shown in Fig. 6. This hearing aid publication is highlighted because the authors propose an algorithm-to-silicon flow in addition to the proposed hearing aid chip. This flow is integrated into the chip design flow and supports accurate and fast simulations, ASIC synthesis, optimization, and verification [48]. These tools are useful for handling the overall complexity and drastically decrease the design time if the underlying ASIC technology is changed. The DSP architecture consists of a datapath with several general purpose execution units, a complex-valued multiplier, and a controller with a program read-only memory. The operating clock frequency of the DSP is reduced by increased parallelism and reduced memory accesses. A fast Fourier transform (FFT) algorithm case study for the architecture shows how a radix-8 implementation can minimize memory accesses and increase the number of parallel operations. Over 20 operations per cycle are achieved.
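To illustrate why a higher radix reduces memory traffic, the following minimal sketch counts the stages of an N-point FFT for different radices. It assumes a simple in-place model in which every stage reads and writes each of the N complex points exactly once. The FFT length of 512 points and the function names are illustrative assumptions and are not taken from [48].

```python
def fft_stages(n_points: int, radix: int) -> int:
    """Number of stages of a pure radix-`radix` FFT.

    n_points must be an exact power of the radix.
    """
    stages = 0
    n = n_points
    while n > 1:
        if n % radix != 0:
            raise ValueError("n_points must be a power of radix")
        n //= radix
        stages += 1
    return stages


def memory_accesses(n_points: int, radix: int) -> int:
    """Assumed model: each stage reads and writes all points once."""
    return 2 * n_points * fft_stages(n_points, radix)


# Hypothetical FFT length; 512 = 2**9 = 8**3
for radix in (2, 8):
    print(radix, fft_stages(512, radix), memory_accesses(512, radix))
```

Under this model, the radix-8 variant needs a third of the stages of the radix-2 variant and therefore a third of the data memory accesses, at the cost of a more complex butterfly per stage.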
In addition to clock gating and low voltage operation techniques, the authors propose to partition the datapath and the read-only memory (ROM) of the complete architecture. The underlying concept is that there are different types of operations that do not require the same [...] The analog front end, including a digital-to-analog converter (DAC), a programmable gain amplifier, and a serial interface, is integrated on a separate chip [34].

ASIPs with Hardware Accelerators
There are hearing aid processing architectures that combine ASIPs with dedicated hard-wired accelerators. These accelerators are used for frequent and computationally intensive tasks. The flexibility and complexity of these accelerators vary. A list of accelerators for hearing aids can be found in Table 3. Each hearing aid processing task is mapped to either the ASIP or an accelerator. The goal is to process the computationally intensive tasks on the accelerator, while the ASIP performs computations in parallel and controls the accelerator processing [54].
The block diagram of a highlighted ASIP with accelerators [19] is shown in Fig. 7. The corresponding layout view is shown in Fig. 8. This research hearing system on chip contains four ASIPs on one chip to test different processor and algorithm configurations for processing performance and power efficiency. The ten co-processors, which can be used in parallel, accelerate the computation of the coordinate rotation digital computer (CORDIC) algorithm for hyperbolic and trigonometric functions such as sine, cosine, square root, exponential, tangent, and division with an average speed-up of 28 compared to a software implementation on the ASIP. The ASIP can configure several accelerators with different operating modes to calculate different results in parallel.
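The principle behind such CORDIC co-processors can be sketched in a few lines. The following example is a floating-point model of the circular rotation mode, which yields sine and cosine; it is a generic textbook formulation and not the implementation of [19]. In hardware, the multiplications by powers of two are realized as shifts on fixed-point data. The function name and the iteration count of 16 are assumptions.

```python
import math


def cordic_sin_cos(angle: float, iterations: int = 16):
    """Return (sin, cos) of `angle` (radians, roughly |angle| <= 1.74)."""
    # Elementary rotation angles atan(2^-i), a small lookup table in hardware
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    # Pre-scale by the constant CORDIC gain so that no final multiply is needed
    gain = 1.0
    for i in range(iterations):
        gain /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = gain, 0.0, angle
    for i in range(iterations):
        d = 1.0 if z >= 0.0 else -1.0  # rotate towards residual angle zero
        shift = 2.0 ** -i  # a right shift by i bits in fixed-point hardware
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * angles[i]
    return y, x


s, c = cordic_sin_cos(math.pi / 6)
print(s, c)
```

Each iteration adds roughly one bit of accuracy, so the iteration count trades latency against precision.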
Using the same hardware accelerator for different audio signal processing tasks is also applied in other related work. In [40], an arithmetic unit with a dual MAC and butterfly unit can operate either in FFT mode or in CNN mode. By sharing hardware resources, 42% of hardware complexity can be saved. In [54], a streaming DSP hardware accelerator is introduced that can compute applications such as keyword recognition or other classification algorithms. Any of the co-processors in [19] can be disabled by clock gating; however, these operations are elementary and are often used in hearing aid applications. This also applies to the FIR filter accelerators presented in [4,51]. The accelerators presented in [4-6, 9, 25, 33, 63] are more complex and specific, because they implement complete algorithms, such as noise reduction (NR), feedback cancellation (FBC), or others, in hardware. One advantage is the efficiency gained by the hard-wired implementation. If an algorithm needs to be changed, it is possible to use ASIP processing resources instead of the accelerators.

Table 3 Accelerators for hearing aids.
Reference | Accelerator
Lee et al. [40] | Convolutional neural network (CNN) and fast Fourier transform (FFT) accelerators for speech enhancement
Pu et al. [54] | Streaming DSP for voice code word detection
Gerlach et al. [19] | Co-processors for hyperbolic and trigonometric functions
Lin et al. [41] | Noise reduction (NR) accelerator
Lin et al. [41] | Multiply-accumulate (MAC) unit accelerator
Lin et al. [41] | Fast Fourier transform (FFT) accelerator
[4-6, 9, 25, 33, 63] | Analysis filter bank (AFB) accelerator
[4-6, 9, 25, 33, 63] | Noise reduction (NR) accelerator
[4-6, 9, 25, 33, 63] | Feedback cancellation (FBC) accelerator
[4-6, 9, 25, 33, 63] | Wide dynamic range compression (WDRC) accelerator

ASIC Technology and Supply Voltage
The advantages of the steadily decreasing feature sizes of CMOS semiconductor technology are exploited in commercial and research hearing aids. The feature sizes of modern hearing aids from 1996 to 2020 are shown in Fig. 9.
Hearing aids with an analog front end (AFE), including analog-to-digital converters (ADCs), programmable gain amplifiers (PGAs), or digital-to-analog converters (DACs), are marked. These hearing aids are either mixed-signal or analog hearing aid designs, which have on average larger feature sizes due to more restrictive design rules and greater sensitivity to noise [61]. To overcome these limitations, the authors of [19,34,48,53] propose a chip-level integration with two separate chips. Each chip is integrated with a different ASIC technology in order to independently use the more appropriate feature size for both the digital and the analog components of the hearing aid. The rate at which the feature size shrinks has decreased significantly for hearing aid implementations in recent years. This is due to the higher costs for the design and manufacturing with smaller feature sizes [61]. The supply voltages of hearing aid implementations are shown in Fig. 10. Since the feature size has remained almost constant over the last years (Fig. 9), the supply voltage has also remained almost constant (Fig. 10). This is especially noticeable for hearing aids with analog components. The lowest supply voltages of 0.55 V to 0.8 V are used in digital hearing aid designs. Those hearing aid implementations that employ undervoltage techniques through dynamic voltage scaling and use voltages close to the threshold voltage are listed in Table 4.
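The benefit of operating close to the threshold voltage can be estimated with the first-order CMOS model, in which the dynamic energy per operation scales with the square of the supply voltage. This model is a standard approximation and is not stated in the surveyed papers; it ignores leakage and the reduced maximum clock frequency at low voltage. The sketch below applies it to the voltage pairs listed in Table 4, assuming that the two values per design are the lower and upper operating voltage.

```python
def dynamic_energy_ratio(v_low: float, v_high: float) -> float:
    """Relative dynamic energy per operation at v_low versus v_high.

    Assumption: E_dyn is proportional to C * V^2 with constant capacitance C.
    """
    return (v_low / v_high) ** 2


# (lower, upper) supply voltage in volts, as listed in Table 4
designs = {
    "[41]": (0.70, 0.90),
    "[71,72]": (0.60, 1.00),
    "[40]": (0.60, 0.90),
    "[54]": (0.55, 1.05),
}

for ref, (v_low, v_high) in designs.items():
    print(ref, round(dynamic_energy_ratio(v_low, v_high), 2))
```

Under these assumptions, scaling from 1.00 V down to 0.60 V reduces the dynamic energy per operation to about 36% of its original value.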

Power Consumption
The average power consumption determines the battery life of the hearing aids. During normal operation, all components of the hearing aid processing system are usually constantly active. The average power consumption for the hearing aid implementations is shown in Fig. 11. The computational complexity of the algorithms determines, among other things, the power consumption. The lowest achieved average power consumption for the given implementations is 10 μW. The hearing aids [74] and [51] consume this power for an adaptive signal-to-noise ratio (SNR) monitor based on an envelope detection and adaptive FIR and IIR filter calculations. On the other hand, when targeting hearables or smart headphones instead of hearing aid devices, deep-learning-based noise reduction techniques require an average power consumption of up to 4 mW [54]. Hard-wired architectures offer a lower power consumption than ASIP architectures. The power distribution for the hardware components of the mixed-signal hearing aid [5] is 36% for the analog front end, 39% for the digital signal processor (DSP), 11% for the power-on reset circuit, and 13% for the remaining components. The digital signal processor of the hearing aid presented in [9], on the other hand, consumes up to 71%, while the analog parts consume the remaining 29%.
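The relation between average power consumption and battery life can be made concrete with a short calculation. The battery parameters below (1.4 V, 150 mAh) are hypothetical values chosen for illustration and do not come from the surveyed publications; conversion losses and self-discharge are ignored.

```python
def battery_life_hours(capacity_mah: float, voltage_v: float,
                       avg_power_mw: float) -> float:
    """Idealized battery life: stored energy divided by average power."""
    energy_mwh = capacity_mah * voltage_v  # mAh * V = mWh
    return energy_mwh / avg_power_mw


# Hypothetical cell: 1.4 V, 150 mAh
for power_mw in (0.010, 1.0, 4.0):  # 10 uW, 1 mW, 4 mW
    print(power_mw, round(battery_life_hours(150, 1.4, power_mw), 1))
```

With these assumed values, an average power of 1 mW corresponds to roughly 210 hours of operation, whereas 4 mW reduces this to about 52 hours.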

Silicon Area
The silicon area for each hearing aid is shown in Fig. 12. The analog front end or wireless connection modules, which are not part of every hearing aid, require additional silicon area, which must be considered when comparing implementations. The area distribution for the mixed-signal hearing aid presented in [9,25] is 30% for the analog and 70% for the digital part. The digital part consists of a 24-bit application-specific instruction-set processor and five dedicated accelerators. The analog part consists of an audio front end with a programmable gain amplifier (PGA), an analog-to-digital converter (ADC), and a class-D amplifier for the pulse density modulation (PDM) output. The total size is 9.50 mm², which is the maximum chip size since 2004. The analog hearing aid presented in [70], which is manufactured using a 0.13 μm and a 0.35 μm technology, requires 66% of the area for the automatic gain control, 15% for the driver, and 20% for the filter circuit. The wireless control part of the analog hearing aid presented in [12], which is based on a dual-tone multi-frequency (DTMF) receiver, occupies 1.16×4.6 mm², which is 16% of the total chip size of 5.7×4.9 mm². The silicon area of a hearing aid may be pad limited. As a result, the total area is larger than effectively required for the digital or analog core parts. This is the case for the second largest ASIP-based hearing aid system in this study, which does not include an analog front end [48]. Its size is 20 mm².

Table 4 Hearing aid implementations that employ dynamic voltage scaling with supply voltages close to the threshold voltage.
Reference | Lower supply voltage | Upper supply voltage | ASIC technology
Lin et al. [41] | 0.70 V | 0.90 V | 40 nm CMOS
Wei et al. [71] | 0.60 V | 1.00 V | 90 nm CMOS
Wei et al. [72] | 0.60 V | 1.00 V | 90 nm CMOS
Lee et al. [40] | 0.60 V | 0.90 V | 40 nm CMOS
Pu et al. [54] | 0.55 V | 1.05 V | 28 nm CMOS

Operating Clock Frequency
The required operating clock frequency depends on the computing complexity of the hearing aid algorithms and the architecture-dependent processing power of the digital signal processing system (Fig. 13). Most hard-wired hearing aids operate at comparatively low operating clock frequencies of around 0.032 MHz to 8.000 MHz. The processing is sample-based, i.e., each processing unit or component, such as a digital filter or amplifier, processes one sample per clock cycle. In [71,72], a more computationally intensive sample-based processing is applied, using a noise reduction algorithm based on multiband spectral subtraction and an enhanced entropy voice activity detection. The audio samples are stored in a local ping-pong buffer and processed sequentially for each sub-band at a clock frequency of 3 MHz to 8 MHz for the various processing blocks. Digital hearing aids with an application-specific instruction-set processor as the central processing unit require on the order of a thousand instructions to process the algorithms. An implementation of a related noise reduction algorithm (mband) on an ASIP with hardware accelerators [33] needs 2176 cycles for computation. Parallelism at data or instruction level, or application-specific instructions [4,5,9,19,25,33,35,47,51,56], can reduce the clock frequency requirement. Accelerators are used for computing-intensive tasks for which a pure software implementation on an ASIP is not feasible.

Figure 11 Power consumption of commercial and research hearing aids.
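The link between cycle count and clock frequency can be sketched as follows: the minimum clock frequency is the number of cycles needed per processing block multiplied by the number of blocks per second. The sampling rates and the block size of 32 samples are hypothetical, and the source does not state whether the 2176 cycles reported for [33] refer to one sample or one block, so the second case is purely illustrative.

```python
def required_clock_hz(cycles_per_block: int, sample_rate_hz: float,
                      samples_per_block: int = 1) -> float:
    """Minimum clock frequency for real-time processing."""
    blocks_per_second = sample_rate_hz / samples_per_block
    return cycles_per_block * blocks_per_second


# Sample-based hard-wired processing: one sample per clock cycle
# (assumed 32 kHz sampling rate gives 0.032 MHz)
print(required_clock_hz(1, 32_000) / 1e6, "MHz")

# ASIP case: 2176 cycles, assumed to be per block of 32 samples at 16 kHz
print(required_clock_hz(2176, 16_000, samples_per_block=32) / 1e6, "MHz")
```

The same formula shows why parallelism helps: halving the cycles per block halves the required clock frequency for a given sampling rate.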

Audio Datapath Width
All digital hearing aids presented in this survey use fixed-point hardware architectures for signal processing, due to lower hardware cost in terms of area and power requirements compared to floating-point hardware [35]. The audio datapath width of the fixed-point data, i.e., the number of bits per audio sample, is a crucial parameter for the design and implementation of hearing aids, as it determines the maximum achievable signal-to-noise ratio (SNR). A high SNR value is a strict requirement for hearing aids [5,31]. Each additional datapath bit increases the SNR by about 6 dB. However, this parameter also affects the area, power consumption, and processing performance of all components in the processing chain: digital processing blocks, memories, ADCs, and DACs [4,19,25,56,62]. The authors of [41] present a word length optimization to reduce the area and power of their MAC unit accelerator. They propose to optimize the number of bits based on the results of short-time objective intelligibility (STOI) measurements. Alternatively, signal-to-noise ratio (SNR) measurements are used in [33]. In [56], a 16-bit processor is extended with specific functional units that use 32-bit and 40-bit intermediate results to improve the fixed-point accuracy. Two separate processors are used in [62]. The 32-bit Arm Cortex M3 processor is used for debugging and wireless connectivity, and the 24-bit ASIP processes the audio samples. In Table 5, a comparison of architectures implementing an audio datapath with fixed width is given. Most designs have a datapath width of 16 bits for both the digital and the analog parts. The datapath width can be switched in some ASIP-based architectures, which are listed in Table 6. This is possible by using different execution units with different datapath widths, microSIMD (single instruction multiple data) subword modes [39], or specialized accelerators. To take advantage of the increased dynamic range of floating-point data types, the architectures listed in Table 7 add hardware support for floating-point processing. The approaches used are block floating-point, static floating-point, or emulated floating-point.

Figure 13 Operating clock frequency of commercial and research hearing aids.

Table 7 (excerpt)
Reference | Floating-point approach | Architecture
Chang et al. [4] | static floating-point | ASIP+acc.
Gerlach et al. [18,19] | emulated floating-point | ASIP+acc.
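The rule of thumb of about 6 dB per datapath bit follows from the textbook expression for the quantization SNR of an ideal uniform quantizer driven by a full-scale sine wave, SNR ≈ 6.02·N + 1.76 dB. The sketch below evaluates this expression for common datapath widths. It is an idealized model (an assumption, not taken from the surveyed papers) and ignores analog noise and rounding in intermediate computations.

```python
import math


def quantization_snr_db(bits: int) -> float:
    """Ideal SNR of a full-scale sine quantized uniformly with `bits` bits."""
    # 20*log10(2^bits) is about 6.02 dB per bit; 10*log10(1.5) is about 1.76 dB
    return 20.0 * math.log10(2.0 ** bits) + 10.0 * math.log10(1.5)


for bits in (16, 24, 32):
    print(bits, round(quantization_snr_db(bits), 2), "dB")
```

For example, moving from a 16-bit to a 24-bit datapath raises the ideal SNR by about 48 dB, which is why wider intermediate results are attractive despite their area and power cost.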

Memory in Hearing Aid Systems
Due to strict power and area restrictions, on-chip memory is the only implementation option for the hearing aids listed in Table 8. On-chip area is limited and memory size is critical to the overall size of the chip. The area for the SRAM macros for the mixed-signal hearing aid presented in [9] is 1.35 mm². Compared to the logic size of 5.39 mm² and the analog size of 2.77 mm², the area of the SRAM is 14% of the total chip size for a 130 nm ASIC technology. The memory size depends on the complexity and type of the audio processing algorithms. Algorithms with comparably high memory requirements are those based on trained models or data. Among those are localization algorithms [46,60] and deep-learning-based speech enhancement and speech recognition algorithms [37,40,44,52]. As an example, the Gaussian mixture model (GMM) of the localization algorithm accounts for about 90% of the total memory requirement of this algorithm [46,60]. In this case, 44,400 of 48,816 words are required only for the trained model. Another example is the hearing aid with the highest amount of on-chip memory, which is designed for computing-intensive tasks such as neural networks for speech enhancement [40]. The hearing aid with the least amount of on-chip memory is designed for IIR filters [49,50].

Table 8 On-chip memory of the hearing aid systems.
Reference | Total on-chip memory | Memory organization
[...] | 1.23 kB | 0.62 kB data memory for mini-cores, 0.438 kB instruction memory, and 0.172 kB coefficient memory
Chang et al. [4] | 5.00 kB | 4 processing elements (PEs) with 512 B instruction memory, 512 B shared memory for inter-PE communication, and 2.5 kB local memory
Jia et al. [25] | 6.00 kB | 6 kB data memory
Moller et al. [47] | 22.50 kB | 6.125 kB RAM and 16.375 kB ROM
Mosch et al. [48] | 68.00 kB | 4 kB instruction ROM and 64 kB DSP parameter RAM
[2,62,63] | 110.00 kB | 6 separate logical memory banks, 24-bit data memory, 32-bit DSP instruction memory
Gerlach et al. [19] | 140.00 kB | 28 SRAMs, 65 kB instruction memory, 57 kB data memory, and 16 kB audio interface memory
Lee et al. [40] | 327.00 kB | 4 processing clusters, each with 64 kB for the CNNs and 2 kB for the FFT accelerators
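The two memory shares quoted in this section can be reproduced directly from the reported figures. The short check below uses only numbers stated in the text; the function name is an illustrative assumption.

```python
def share_percent(part: float, total: float) -> float:
    """Share of `part` in `total`, in percent."""
    return 100.0 * part / total


# GMM model size relative to the total memory of the localization algorithm
gmm_share = share_percent(44_400, 48_816)

# SRAM area relative to the total chip area of the hearing aid in [9] (mm^2)
sram, logic, analog = 1.35, 5.39, 2.77
sram_share = share_percent(sram, sram + logic + analog)

print(round(gmm_share, 1), round(sram_share, 1))
```

The results, about 91% for the trained model and about 14% for the SRAM area, agree with the rounded values given in the text.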

Conclusion and Future Trends
In this survey, the state-of-the-art processor architectures for hearing aids are presented. Among these architectures are analog, mixed-signal, and digital processors. The main focus is on application-specific instruction-set processors (ASIPs), which are compared to dedicated hardware architectures and hearing aid systems with hardware accelerators. Trends for the ASIC technologies, average power consumption, silicon area, and operating clock frequencies are presented. There is a clear trend towards more flexibility and growing complexity of the algorithms. In particular, deep neural network based speech enhancement and binaural processing algorithms for sound source localization are of current interest. These algorithms, which have higher processing performance requirements, have to be computed under the same strict constraints on power consumption and chip area.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.