Interfacing PDM MEMS Microphones with PFM Spiking Systems: Application for Neuromorphic Auditory Sensors

Neuromorphic computation processes sensor output in the spiking domain, which in many cases imposes constraints when converting information to spikes, for example a loss of temporal accuracy. This paper presents a spike-based system that adapts audio information from low-power pulse-density modulation (PDM) microelectromechanical systems (MEMS) microphones into rate-coded spike frequencies. These spikes can be used directly by the neuromorphic auditory sensor (NAS) for frequency decomposition into different bands, avoiding the conversion of analog or digital samples into spike streams. This improves the time response of the NAS, allowing its use in more time-restrictive applications. The adaptation was implemented in VHDL as an interface for PDM microphones, known as the PDM to spikes interface (PSI), which converts their pulses into temporally distributed spikes following a pulse-frequency modulation (PFM) scheme with an accurate inter-spike interval. We introduce a new architecture of spike-based band-pass filter to reject DC components and distribute spikes in time. The PSI was tested in two scenarios: first as a stand-alone circuit for its characterization, and then integrated with a NAS for verification. The PSI achieves a total harmonic distortion of -46.18 dB and a signal-to-noise ratio of 63.47 dB, requires less than 1% of the resources of a Spartan-6 FPGA, and consumes around 7 mW.


I. INTRODUCTION
Pulse-density modulation (PDM) is a sigma-delta modulation technique used to digitize an analog signal with a 1-bit data stream and a high sample rate. In recent years, many low-power microelectromechanical systems (MEMS) microphones designed for mobile applications, such as tablets, laptops and cell phones, have appeared on the market. In PDM data streams, a logic 1 corresponds to a pulse of the maximum positive polarity (+A), and a logic 0 represents the maximum negative polarity (-A). A signal value of 0 is encoded as an alternation of 1s and 0s. This type of modulation is commonly associated with neuromorphic information coding, in the sense that it is a rate-coded signal [1]. Neuromorphic computation allows information to be processed only when needed, avoiding periodic or redundant data processing and thus saving power and computational resources [2].
Fig. 1 shows a global scheme of the NAS architecture. First, the audio information is provided by a digital audio codec, whose discrete audio samples are converted into spike streams following a pulse-frequency modulation (PFM) scheme. The NAS filters these spikes directly, spike by spike, using a set of Spike-based Low-Pass Filters (SLPF) connected in cascade. Finally, spikes are transmitted to the next layers using the Address-Event Representation (AER) protocol [13].
The NAS has been used in many practical applications, such as pitch frequency detection [14], musical tone identification [15], sound source localization [16], heart murmur diagnosis [17], and speech recognition [18], among others. Great effort has been dedicated to improving the NAS features, since it is the input layer of all these systems, in order to improve its response and extend this technology to new applications.
One main disadvantage of the NAS is the need for a discrete audio codec to capture analog audio. Audio codecs provide a set of periodic digital samples that must be converted into spikes. These devices have a sampling period between 22.67 µs and 10.41 µs, which limits the temporal capabilities in, e.g., sound localization applications [19]. However, PDM microphones provide a stream of rate-coded signals with a higher sample rate (3.125 MHz in this case, with a time resolution of 320 ns), which can serve as the NAS input and be processed directly as spiking information. This way, the need to generate spike streams synthetically, which was a restriction in previous NAS implementations [10], is avoided.
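The timing figures quoted above can be checked with a few lines of arithmetic. The sketch below is only illustrative: the 44.1 kHz and 96 kHz codec rates are inferred from the quoted sampling periods and are not stated in the text, and the helper name `period_ns` is hypothetical.

```python
# Sample periods implied by the rates discussed in the text.
# ASSUMPTION: the codec rates (44.1 kHz, 96 kHz) are inferred from the
# quoted periods of 22.67 us and 10.41 us; the text does not name them.

def period_ns(rate_hz: float) -> float:
    """Return the sample period in nanoseconds for a given sample rate."""
    return 1e9 / rate_hz

codec_slow_ns = period_ns(44_100)      # slowest codec period considered
codec_fast_ns = period_ns(96_000)      # fastest codec period considered
pdm_ns = period_ns(3_125_000)          # PDM clock period

print(f"44.1 kHz codec period: {codec_slow_ns / 1000:.2f} us")
print(f"96 kHz codec period:   {codec_fast_ns / 1000:.2f} us")
print(f"PDM clock period:      {pdm_ns:.0f} ns")
print(f"PDM resolution gain over the fast codec: {codec_fast_ns / pdm_ns:.1f}x")
```

Under these assumptions, the PDM stream offers a time resolution more than thirty times finer than the fastest codec setting.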

II. PDM TO SPIKES INTERFACE (PSI)
PDM information coding is substantially different from rate-coded spike-based signals. In rate-coded spike-based signals, the information is given by the spike frequency, which means that it is inversely proportional to the Inter-Spike-Interval (ISI). Thus, with only two spikes, it is possible to reconstruct the amplitude of the original signal. Spike-based systems use PFM to distribute the spikes properly in time, in order to represent the signal's information accurately. In PDM signals, the information is contained in the density of pulses, and one pulse is generated every clock cycle, where a logic '1' represents a positive value and a logic '0' a negative one. For example, when there are more 1s than 0s the information is positive, and the more 1s, the higher the amplitude. Thus, to reconstruct the signal's amplitude, it is necessary to collect PDM pulses over a temporal window, performing a downsampling operation. Digital systems convert PDM signals to digital values using pulse-code modulation (PCM). PCM is reconstructed from PDM with a digital decimation stage, commonly downsampling by a factor of 64 and providing a multi-bit word (e.g., 16 bits @ 48.8 kSamples/s) with added high-frequency noise. After this stage, an infinite impulse response (IIR) filter is commonly used as a band-pass filter (BPF) to remove DC components and high-frequency quantization noise.
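To make the windowed reconstruction concrete, the following Python sketch generates a PDM stream and recovers the amplitude by averaging pulses over windows of 64. It is a simplified illustration under stated assumptions: a first-order sigma-delta modulator stands in for the microphone (real devices typically use higher-order modulators), plain boxcar averaging stands in for the decimation filter, and the function names are hypothetical.

```python
# ASSUMPTION: first-order sigma-delta modulator and boxcar decimation are
# illustrative stand-ins, not the microphone's or the authors' circuits.

def sigma_delta_pdm(samples):
    """Encode floats in [-1, 1] as a list of PDM bits (1 = +A, 0 = -A)."""
    integrator = 0.0
    feedback = 0.0              # previous output mapped to +1 / -1
    bits = []
    for x in samples:
        integrator += x - feedback
        bit = 1 if integrator >= 0.0 else 0
        feedback = 1.0 if bit else -1.0
        bits.append(bit)
    return bits

def decimate_pdm(bits, factor=64):
    """Average the +1/-1 pulses over non-overlapping windows of `factor`."""
    out = []
    for start in range(0, len(bits) - factor + 1, factor):
        ones = sum(bits[start:start + factor])
        out.append((2 * ones - factor) / factor)
    return out

half_scale = sigma_delta_pdm([0.5] * 256)
print(decimate_pdm(half_scale))        # four windows of 64 pulses each
print(sigma_delta_pdm([0.0] * 8))      # zero input alternates 1s and 0s
```

The sketch shows why a temporal window is needed in the PDM domain: a single pulse carries only a sign, and the amplitude appears only in the ratio of 1s to 0s across many pulses.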
The main goal of this work is to design an HDL circuit able to read PDM pulses and redistribute them in time as rate-coded spikes, with an ISI inversely proportional to the sound pressure. Fig. 3 briefly shows how signals evolve from PDM pulses to PFM spikes.
To convert PDM information into rate-coded spikes, a two-stage circuit (Fig. 2) is proposed. The first stage is a finite state machine (FSM) that works as an edge detector and generates a single-clock-cycle spike for each PDM pulse. The next stage consists of one (monaural) or two (binaural) banks of spike-based band-pass filters (SBPF), which process the raw spikes from the FSM to produce a temporally distributed spike stream.
Since spikes can be both positive and negative, we use two wires to represent signed spikes. The FSM outputs a stream of signed spikes that are not yet distributed in time, with a constant ISI equal to the PDM clock period. Fig. 3 presents an example of a positive, increasing audio signal and how the spikes evolve.
Fig. 3: Filtered spikes evolving from an increasing PDM audio signal.

A. PDM front-end circuit
The PDM front-end circuit (PFC) has two main functions: generating the PDM clock and converting long PDM pulses into single-clock-cycle spikes. The hardware platform used to implement these blocks is the AER-Node [20], which has a clock frequency of 50 MHz. Dividing this clock by a factor of 16 gives a PDM clock of 3.125 MHz, which is the maximum value allowed by this kind of MEMS microphone. In every PDM clock cycle there is a PDM pulse on the PDM DAT line. If PDM DAT has a value of 1, a positive spike is transmitted to the next stage; if it has a value of 0, a negative spike is transmitted.
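The front-end behavior described above can be modeled in software roughly as follows. This is a behavioral Python sketch, not the authors' VHDL: the function names, the `(pos, neg)` tuple encoding of the two spike wires, and the choice to fire the spike on the first system-clock cycle of each PDM period are assumptions made for illustration.

```python
SYSTEM_CLOCK_HZ = 50_000_000   # AER-Node clock, from the text
CLOCK_DIVIDER = 16             # division factor, from the text

def pdm_clock_hz(system_clock_hz=SYSTEM_CLOCK_HZ, divider=CLOCK_DIVIDER):
    """PDM clock obtained by dividing the system clock."""
    return system_clock_hz / divider

def front_end_spikes(pdm_bits, divider=CLOCK_DIVIDER):
    """Yield one (pos, neg) pair per system-clock cycle.

    Each PDM bit lasts `divider` system cycles but produces a spike that is
    only one cycle wide: (1, 0) for a logic 1, (0, 1) for a logic 0.
    ASSUMPTION: the spike fires on the first cycle of the PDM period.
    """
    for bit in pdm_bits:
        yield (1, 0) if bit else (0, 1)
        for _ in range(divider - 1):
            yield (0, 0)           # both wires idle for the rest of the period

print(pdm_clock_hz())                       # PDM clock in Hz
cycles = list(front_end_spikes([1, 0, 1]))
print(len(cycles), cycles[0], cycles[16], cycles[32])
```

At this point the spikes are signed but still carry a constant ISI of one PDM clock period, so the information remains in the density of positive versus negative spikes.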

B. Second-order Spikes Band-Pass Filter (SBPF)
The next stage is a Spike Band-Pass Filter (SBPF), whose functionality is detailed in [21]. This filter is composed of two first-order Spike-based Low-Pass Filters (SLPF) and one Spike Hold & Fire (SH&F) block (see Fig. 4). The SH&F is an SSP building block that subtracts the spike rates of two spiking signals (detailed in [12]). The SLPF connected to the SH&F's positive input has a higher cut-off frequency than the SLPF connected to the negative input. By subtracting the outputs of the two spike-based filters, only the information in the middle band remains, and the DC and high-frequency components are rejected. These blocks are connected with 2-bit buses (one line for positive and one for negative spikes), using positive and negative activity to represent the bipolar nature of audio.
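The subtraction principle can be illustrated with conventional linear filters. The Python sketch below evaluates the frequency response of the difference between two first-order low-pass filters. It is an analogue of the idea only, not a model of the spike-based SLPF or SH&F blocks, and the two cut-off values are hypothetical placeholders, since the actual SLPF cut-off frequencies are not given here.

```python
# ASSUMPTION: continuous-time first-order low-pass responses are used as a
# stand-in for the spike-based filters; cut-off values are hypothetical.

LOW_CUT_HZ = 70.0        # hypothetical cut-off of the branch on the negative input
HIGH_CUT_HZ = 12_000.0   # hypothetical cut-off of the branch on the positive input

def lowpass_response(f_hz, cutoff_hz):
    """Complex frequency response of a first-order low-pass filter."""
    return 1.0 / complex(1.0, f_hz / cutoff_hz)

def bandpass_response(f_hz, low_cut_hz=LOW_CUT_HZ, high_cut_hz=HIGH_CUT_HZ):
    """High-cut-off branch minus low-cut-off branch."""
    return lowpass_response(f_hz, high_cut_hz) - lowpass_response(f_hz, low_cut_hz)

for f in (0.0, 10.0, 1_000.0, 100_000.0, 1_000_000.0):
    print(f"{f:>9.0f} Hz -> gain {abs(bandpass_response(f)):.3f}")
```

Both branches pass DC with unit gain, so their difference cancels it exactly, while far above the higher cut-off both branches attenuate the signal and the difference vanishes again.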

C. Hardware resources and power consumption
The PSI design was synthesized and implemented on a Xilinx Spartan-6 FPGA (XC6SLX150T) to measure the required resources and power consumption. Table I presents the resources needed to implement the PSI on the FPGA. As can be seen, the PSI needs less than 0.45% of the total slice registers and logic (LUTs) of the FPGA. The PSI can operate at a clock frequency of up to 147.18 MHz. After synthesis, the power consumption was estimated with the Xilinx XPower tool assuming a 50% signal transition rate, giving an estimate of 2.67 mW for the PSI. The power of the MEMS microphones, which depends on the model used, must be added to this figure. In our case, each microphone demands 0.98 mW (according to the manufacturer's documentation). Therefore, the whole system demands a total of 4.63 mW for a binaural NAS.
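The binaural power budget follows directly from the two figures above, as the short check below shows. The constant and function names are hypothetical, and the sketch only reproduces the arithmetic in the text.

```python
PSI_POWER_MW = 2.67   # XPower estimate for the PSI, from the text
MIC_POWER_MW = 0.98   # per-microphone power, from the text

def binaural_system_power_mw(psi_mw=PSI_POWER_MW, mic_mw=MIC_POWER_MW):
    """PSI power plus two microphones (binaural configuration)."""
    return psi_mw + 2 * mic_mw

print(f"{binaural_system_power_mw():.2f} mW")
```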

III. EXPERIMENTAL SETUP
For testing purposes, a scenario was built to analyze the PSI's stand-alone behavior. Fig. 5 presents the testing setup, where two PDM microphones from STMicroelectronics (MP34DT02) were connected to an AER-Node board, which was in turn connected to a USB-AERmini2 board. The MP34DT02 is an omnidirectional MEMS microphone with a PDM interface, an acoustic overload point of 120 dB SPL, an SNR of 60 dB, a dynamic range of 86 dB, and a maximum power consumption of 0.98 mW (as previously described). The AER-Node board has a Xilinx Spartan-6 FPGA (XC6SLX150T), which holds the PSI, a 128-channel binaural NAS, and a set of AER interfaces. Its parallel AER output was connected to the USB-AERmini2 board [22], which works as a bridge between AER buses and USB ports, allowing the AER events to be sent from the AER-Node board to a host computer. Two software tools were running on the computer: jAER [23], to visualize and log AER information, and MATLAB, to analyze and process the events. The sound used to excite the system was played through a flat-response audio speaker, in this case a BEHRITONE C5A from Behringer, placed 1 meter from the PDM microphones and set to a fixed gain in order to obtain an audio level of 65 dB SPL at the microphones. This kind of equipment was used to avoid the influence of the audio equalizers and compensation that domestic Hi-Fi equipment introduces. Thus, no preprocessed sounds were used; instead, we tried to reproduce sound waves as ideally as possible. This will potentially open our system to many stand-alone applications, such as robotics.

A. PSI Experimental results
For the first experiment, the system was stimulated with a 500 Hz pure tone played by the flat-response speaker. Fig. 6 represents the spikes from each stage of the PSI. Higher addresses (3 and 2) correspond to the spikes fired by the PDM front-end circuit, and lower addresses (1 and 0) to the SBPF output. Spike addresses 3 and 1 are positive, whilst 2 and 0 are negative. Fig. 6 shows that, at the output of the PDM front-end, positive and negative activity overlap in time, which does not happen after filtering with the SBPF. In PDM, information is meaningful only as the average activity over a temporal window. In the spike domain, however, the information is coded in the time between two consecutive spikes. From the point of view of the signal sign, a zero-crossing occurs when the polarity of the spikes changes (i.e., a positive spike is followed by a negative one, or vice versa). At the PDM front-end output, many spikes overlap between positive (address 3) and negative (address 2) activity. From the point of view of the ISI, this represents a considerable amount of high-frequency noise. At the SBPF output, however, there is no overlap between positive (address 1) and negative (address 0) activity, so the high-frequency noise is rejected.
Fig. 7 shows the reconstruction of the original signal using the spikes' ISI. The green signal represents the reconstruction from the PDM front-end's output. This signal is noisy and has an offset introduced by the PDM microphones. The blue signal, on the other hand, is the reconstruction from the SBPF's output. A clear tone with neither noise nor offset can be seen, improving on the previous audio signal quality. Analyzing this response, we obtain a Total Harmonic Distortion (THD) of -39.51 dB and a Signal-to-Noise Ratio (SNR) of 59.12 dB. To measure the number of zero-crossings, a one-second recording was analyzed and the number of changes from positive to negative spikes, and vice versa, was counted. In the PDM front-end's output, more than 80k zero-crossings were found. In the SBPF's output, however, 1k zero-crossings were found, which exactly matches a 500 Hz signal (two zero-crossings per period).
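The zero-crossing count described above amounts to counting polarity changes between consecutive spikes. The Python sketch below shows one way to do this on a synthetic, idealized spike train. The function names and the synthetic generator are hypothetical, the 50 kHz spike rate is an arbitrary choice for the example, and the recorded AER data and the MATLAB analysis used in the experiment are not reproduced here.

```python
def count_zero_crossings(polarities):
    """Count sign changes between consecutive signed spikes (+1 or -1)."""
    return sum(1 for a, b in zip(polarities, polarities[1:]) if a != b)

def tone_polarities(freq_hz, duration_s, spike_rate_hz):
    """Idealized spike polarities following the sign of a cosine tone.

    ASSUMPTION: spikes are evenly spaced and spike_rate_hz is a multiple of
    4 * freq_hz, so integer arithmetic gives exact half and quarter periods.
    """
    half_period = spike_rate_hz // (2 * freq_hz)   # spikes per half period
    quarter_period = half_period // 2
    total = int(duration_s * spike_rate_hz)
    return [1 if ((n + quarter_period) // half_period) % 2 == 0 else -1
            for n in range(total)]

clean = tone_polarities(freq_hz=500, duration_s=1, spike_rate_hz=50_000)
print(count_zero_crossings(clean))   # 1000 for this synthetic 500 Hz tone
```

A stream whose polarity alternates on almost every spike, as at the front-end output, would instead yield a count close to the number of spikes, which is why the raw count is so much higher than the filtered one.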
Our second experiment consisted of a frequency sweep from 20 Hz to 20 kHz to analyze the behavior of the system at different frequencies. Fig. 8 shows the frequency sweep results as a Bode diagram. The top curve in Fig. 8 presents the gain at each frequency. The PSI gain increases from 70 Hz up to 12 kHz, and then decreases, rejecting higher frequencies. This bandwidth is sufficient for many applications related to speech and speaker recognition. The spike-based filters in the PSI introduce a temporal deviation, which was measured as signal phase (in rad); the results are shown at the bottom of Fig. 8. The PSI has a mean phase of approximately -4.5 rad, which increases as the frequency approaches the cut-off frequency, as expected from a low-pass filter.

B. NAS integration
In order to validate the PSI in a real scenario, it was integrated into a 128-channel binaural NAS. This NAS was fed with a male voice saying "Si vis pacem, para bellum", and the output activity was recorded as an AER-DATA file using a USB-AERmini2 board. Fig. 9 shows the cochleogram and the sonogram of this recording. Each word is clearly distinguishable and activates the middle channels, between 200 Hz and 5 kHz. These figures were obtained using the NAVIS software [24].

IV. CONCLUSIONS
In this paper, a PDM to PFM spikes circuit is presented. PDM MEMS microphones are useful for low-power, stand-alone, embedded applications. Their output is based on pulse density, and it needs to be adapted in order to be used as input to the NAS. A two-stage circuit for FPGA was designed, which is able to convert PDM information into PFM spikes with a consistent ISI. The PSI was synthesized for a Spartan-6 FPGA with low resource and power requirements. It was then tested with real audio stimuli, analyzing its behavior in terms of temporal response and zero-crossings. The PSI was also integrated into a full NAS to demonstrate the viability of combining these kinds of systems. The use of PDM microphones with the NAS considerably simplifies the system, enabling compact and portable spike-based auditory systems with lower power consumption.

Fig. 5: Test scenario. Sound is played by a flat-response speaker, exciting the PDM microphones. Finally, the information is sent to a computer through an AER-to-USB interface.

TABLE I: PSI hardware requirements