Neuromorphic Sensors, Cochlea
Keywords: Outer Hair Cell; Basilar Membrane; Automatic Gain Control; Tectorial Membrane; Spiral Ganglion Cell
The biological cochlea is a bony, fluid-filled structure in the inner ear. It performs the transduction between the pressure signal representing the acoustic input and the neural signals that carry information to the brain. The cochlea is spiraled from the base to the apex, containing approximately 2.5 turns. The internal part of the cochlea is divided into three chambers (scalae): scala vestibuli, scala media, and scala tympani; Reissner’s membrane separates the first chamber from the second chamber, and the basilar membrane (BM) separates the second and third chambers. A specialized structure, the organ of Corti, sits atop the basilar membrane. It contains both the inner and outer hair cells (IHCs and OHCs, respectively). The tips of these cells have hairlike structures, called stereocilia. Deflections in the stereocilia of the IHCs generate neural signals that travel to the brain. Neural signals from the brain can alter the length and width of the OHC cell bodies, affecting the mechanical properties of the BM, and thus its response to sound vibrations.
Airborne sound enters the ear via the ear canal and vibrates the eardrum, which in turn causes vibration of the three middle ear bones, one of which causes the cochlear fluid to vibrate. These small bones perform impedance transformation through a levering action, so that airborne sound can be transmitted into the fluid-filled cochlea.
At the base of the cochlea, the physical characteristics of the BM (it is narrower and stiffer) are such that it responds with greater movement to high frequency stimuli, whereas at the apex, the BM is wider and more flexible, so that it responds better to low frequency stimuli. Each position along the BM can thus be assigned a characteristic frequency that produces the greatest deflection at that place.
When the BM moves, the organ of Corti is displaced. The stereocilia of the IHCs are then displaced by the viscous drag of the cochlear fluid, making the displacement of the IHC’s stereocilia proportional to the velocity of the basilar membrane motion. This movement of the stereocilia causes a change in membrane potential within the IHC, which in turn causes the release of neurotransmitter from the IHC. This release causes spiking activity in neurons, the spiral ganglion cells, whose axons make up the auditory nerve.
The stereocilia of the OHCs are attached to another structure, the tectorial membrane, and they are thus displaced due to the shear between the rigid upper surface of the organ of Corti and the tectorial membrane. The OHCs change their length as a function of their membrane potential and provide active undamping of the mechanical structure. This allows the selectivity of the cochlear filters to adapt as a function of the intensity of the input sound.
Biological cochleas use a space-to-rate encoding in which the input sound is encoded as trains of pulses created from the outputs of a set of broadly frequency-selective channels (Shamma 1985). The pulses are phase locked at low frequencies, and this phase locking disappears for frequencies above approximately 3 kHz (Palmer and Russell 1986). With this encoding, frequency information is sampled sparsely, at a rate set by the channels that are active, rather than at the maximal sampling rate required to capture all information from a single audio source.
Silicon cochleas have been developed for over 20 years starting with the work of Lyon and Mead (1988). These designs model the biophysics of the BM as a large number of coupled filter stages. The architecture of the silicon cochlea varies from the cascaded architectures (Watts et al. 1992; Sarpeshkar et al. 1998; van Schaik et al. 1996) first introduced by Lyon and Mead (1988), modeling the phenomenological output of the cochlea, to a resistively coupled bank of band-pass filters (Watts et al. 1992; Wen and Boahen 2006; Fragniere 2005; Hamilton et al. 2008) modeling the role of the BM and the cochlear fluid more explicitly (Fig. 1). The cascaded architecture is preferred over the coupled band-pass architecture for better matching, ease of implementation, and a sharp high frequency roll-off of the filters.
Filter Channel Circuits
In the cascaded architecture, each stage of the filter bank usually consists of a second-order section (SOS) filter. The characteristic frequencies of the filters along the cascade vary logarithmically with position, similar to the approximately logarithmic dependence of preferred frequency on position along the basilar membrane of the biological cochlea.
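The logarithmic spacing of characteristic frequencies can be sketched as a geometric series. The function name, the frequency range, and the stage count below are illustrative assumptions rather than values taken from a particular chip; the stages are ordered from high to low frequency, mirroring the base-to-apex ordering of the biological cochlea.

```python
def characteristic_frequencies(f_high, f_low, n_stages):
    """Return n_stages frequencies spaced geometrically from f_high down to f_low.

    Equal steps in stage index give equal steps in log(frequency),
    i.e. a constant frequency ratio between neighbouring stages.
    """
    if n_stages < 2:
        raise ValueError("need at least two stages")
    ratio = (f_low / f_high) ** (1.0 / (n_stages - 1))
    return [f_high * ratio ** k for k in range(n_stages)]


# Hypothetical example: 64 stages covering 20 kHz down to 100 Hz.
freqs = characteristic_frequencies(20000.0, 100.0, 64)
```

A constant ratio between neighbouring stages means that every octave is covered by the same number of stages, whatever the absolute frequency.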
Each SOS consists of two forward filters, each made from a transconductance amplifier and a capacitor, and one feedback transconductance amplifier. The difference of the outputs of the forward filters within a single SOS is the input to the subsequent IHC circuit. This difference readout adds a desirable zero to the transfer function without introducing undesirable gain proportional to frequency, as would occur with a temporal high-pass filter (van Schaik et al. 1996). It sharpens the filter response so that it more closely approximates a band-pass response, and it reduces the phase accumulation across the cascade.
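The effect of the difference readout can be illustrated numerically. The sketch below assumes that each SOS realizes the standard two-pole low-pass response 1/((τs)^2 + τs/Q + 1) and that the second forward filter is a first-order stage with time constant τ, so that the difference of the two forward outputs equals τs times the low-pass output. These transfer functions, the function names, and the example values are assumptions made for illustration; they are not taken from the cited circuits.

```python
def sos_lowpass(f, fc, q):
    """Assumed two-pole low-pass response of one SOS at frequency f (Hz)."""
    s_tau = 1j * f / fc  # tau * s, with tau = 1 / (2*pi*fc) and s = j*2*pi*f
    return 1.0 / (s_tau * s_tau + s_tau / q + 1.0)


def sos_difference(f, fc, q):
    """Difference of the two forward-filter outputs: tau*s times the low-pass output."""
    s_tau = 1j * f / fc
    return s_tau * sos_lowpass(f, fc, q)


def channel_response(f, fcs, q, k):
    """Response at the difference readout of stage k of a cascade.

    The signal passes through the low-pass outputs of stages 0..k-1
    and is then read out differentially at stage k.
    """
    h = 1.0 + 0.0j
    for fc in fcs[:k]:
        h *= sos_lowpass(f, fc, q)
    return h * sos_difference(f, fcs[k], q)


# Hypothetical three-stage cascade, read out at the last stage.
fcs = [4000.0, 2000.0, 1000.0]
gain_at_cf = abs(channel_response(1000.0, fcs, 0.8, 2))
gain_at_low_f = abs(channel_response(1.0, fcs, 0.8, 2))
```

Under these assumptions the readout has a zero at DC, so low frequencies are suppressed, while the gain at high frequencies still falls off rather than growing with frequency.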
The IHC is implemented by a half-wave rectification circuit, a simplified model of the response of the IHCs of the biological cochlea. The circuit output drives spiral ganglion cell (SGC) circuits, each implemented as an integrate-and-fire neuron model. The number of events created in a cycle is set by the amplitude and frequency of the input and the amount of charge needed to generate an event.
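The dependence of the event count on amplitude, frequency, and charge threshold can be sketched with an idealized discrete-time model. The code below assumes an ideal half-wave rectifier and a non-leaky integrate-and-fire unit without a refractory period; the function names, the sampling rate, and the threshold are hypothetical and do not describe the actual circuit.

```python
import math


def half_wave_rectify(x):
    """Pass positive values and block negative ones."""
    return x if x > 0.0 else 0.0


def integrate_and_fire(samples, dt, threshold):
    """Return event times (s) of a non-leaky integrate-and-fire unit.

    The rectified input is accumulated as charge; each time the charge
    reaches the threshold, one event is emitted and one threshold's
    worth of charge is removed.
    """
    charge = 0.0
    events = []
    for i, x in enumerate(samples):
        charge += half_wave_rectify(x) * dt
        while charge >= threshold:
            charge -= threshold
            events.append(i * dt)
    return events


# Hypothetical example: 1 s of a 100 Hz tone of amplitude 1, sampled at 100 kHz.
dt = 1.0 / 100000.0
tone = [math.sin(2.0 * math.pi * 100.0 * i * dt) for i in range(100000)]
# A half-wave rectified sine of amplitude A and frequency f delivers a charge
# of A / (pi * f) per cycle, so this threshold gives about four events per cycle.
threshold = 1.0 / (math.pi * 100.0 * 4.0)
events = integrate_and_fire(tone, dt, threshold)
```

In this idealized model the number of events per cycle is approximately A / (π f θ) for amplitude A, frequency f, and charge threshold θ, so doubling the amplitude roughly doubles the event count.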
Event Data from Cochlea
The AEREAR2 system implements a model of the BM, the IHC, and multiple SGCs with different thresholds driven by each IHC. It is the first fully integrated system that combines features of previous silicon cochlea designs that are robust to mismatch, along with novel features for easier programmability of the architecture and operating parameters. The chip includes integrated microphone preamplifiers (Baker and Sarpeshkar 2003), local gain adjustment, and on-chip digitally controlled biases (Delbruck and Lichtsteiner 2006). The individual basilar membrane and SGC circuit signals can also be digitized and read into the computer through the USB port. It has open-sourced host software APIs and algorithms which enable rapid development of application scenarios (Delbruck 2007). A bus-powered USB board enables easy interfacing of the AEREAR2 to standard PCs for control and processing.
The AER events, time-stamped with a 1 μs resolution, are sent to a PC where they are processed for applications. For natural sounds, off-chip microphone preamplifiers (MAX9814) with a 20 dB range of automatic gain control or on-chip microphone preamplifiers with an 18 dB range of digitally controllable gain can be used. Input can also be applied from a PC sound card directly to the filter cascade (i.e., bypassing the preamplifiers).
In Fig. 2a, the dots represent events recorded from the AEREAR2 in response to a spoken sentence. Some channels are more excitable than others and have background activity, leading to a sustained background event rate of about 4 keps (thousand events per second). The average event rate is 80 keps and the peak event rate is around 325 keps. The sampled microphone output recorded using the onboard ADC is displayed in Fig. 2b.
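Average and peak event rates of this kind can be estimated from a list of event timestamps. The sketch below assumes that the timestamps have already been converted to seconds and that the peak rate is measured over fixed-width bins; the function name and the bin width are hypothetical, and the code does not use the jAER API.

```python
from collections import Counter


def event_rates(timestamps_s, bin_s):
    """Return (mean_rate, peak_rate) in events per second.

    mean_rate is the event count divided by the recording duration;
    peak_rate is the largest count in any bin of width bin_s, divided by bin_s.
    """
    if not timestamps_s:
        return 0.0, 0.0
    t0 = min(timestamps_s)
    duration = max(timestamps_s) - t0
    counts = Counter(int((t - t0) / bin_s) for t in timestamps_s)
    peak = max(counts.values()) / bin_s
    mean = len(timestamps_s) / duration if duration > 0 else peak
    return mean, peak


# Synthetic example: a burst of 10 events in the first 10 ms, then two late events.
stamps = [k * 0.001 for k in range(10)] + [0.5, 1.0]
mean_rate, peak_rate = event_rates(stamps, 0.01)
```

The peak value depends on the chosen bin width, so rates from different recordings are comparable only when the same bin width is used.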
Several cochlea implementations, including the binaural AEREAR2, have been used for auditory tasks such as spatial audition (Chan et al. 2007; Abdalla and Horiuchi 2008; Chan et al. 2010; Finger and Liu 2011). Processing is done on the asynchronous spike events rather than on spectrograms generated from sampled raw audio inputs. Post-processing can be cheaper because only a sparse stream of events needs to be processed. This event-based system is also being tested for other auditory tasks such as speaker identification (Chakrabartty and Liu 2010; Liu et al. 2010b; Abdollahi and Liu 2011; Li et al. 2012) and multi-sensor fusion.
- Abdalla H, Horiuchi TK (2005) An ultrasonic filterbank with spiking neurons. IEEE Int Symp Circuits Syst 5:4201–4204
- Abdalla H, Horiuchi TK (2008) Binaural spectral cues for ultrasonic localization. IEEE Int Symp Circuits Syst 2110–2113
- Abdollahi M, Liu S-C (2011) Speaker-independent isolated digit recognition using an AER silicon cochlea. Biomed Circuits Syst Conf 269–272
- Chakrabartty S, Liu S-C (2010) Exploiting spike-based dynamics in a silicon cochlea for speaker identification. IEEE Int Symp Circuits Syst 513–516
- Chan V, Jin C, van Schaik A (2010) Adaptive sound localization with a silicon cochlea pair. Front Neurosci 4(196):1–11. doi:10.3389/fnins.2010.00196
- Delbruck T (2007) jAER open source project. Available http://jaerproject.net
- Delbruck T, Lichtsteiner P (2006) Fully programmable bias current generator with 24 bit resolution per bias. IEEE Int Symp Circuits Syst 2849–2852
- Finger H, Liu S-C (2011) Estimating the location of a sound source with a spike-timing localization algorithm. IEEE Int Symp Circuits Syst 2461–2464
- Fragniere E (2005) A 100-channel analog CMOS auditory filter bank for speech recognition. ISSCC Dig Tech Pap 140–589
- Li C-H, Delbruck T, Liu S-C (2012) Real-time speaker identification using the AEREAR2 event-based silicon cochlea. IEEE Int Symp Circuits Syst 1159–1162
- Liu S-C, van Schaik A, Minch B, Delbrück T (2010a) Event-based 64-channel binaural silicon cochlea with Q enhancement mechanisms. IEEE Int Symp Circuits Syst 2027–2030
- Liu S-C, Mesgarani N, Harris J, Hermansky H (2010b) The use of spike-based representations for hardware audition systems. IEEE Int Symp Circuits Syst 505–508
- van Schaik A, Fragnière E, Vittoz E (1996) Improved silicon cochlea using compatible lateral bipolar transistors. Adv Neural Inf Process Syst 8:671–677
- Wen B, Boahen K (2006) A 360-channel speech preprocessor that emulates the cochlear amplifier. ISSCC Dig Tech Pap 556–557