Power-efficient gesture sensing for edge devices: mimicking Fourier transforms with spiking neural networks

One of the key design requirements for any portable or mobile device is low power consumption. To enable such a low-power device, we propose an embedded gesture detection system that applies spiking neural networks (SNNs) directly to the raw ADC data of a 60 GHz frequency-modulated continuous-wave (FMCW) radar. SNNs facilitate low-power systems because they are event-driven and sparse in both time and space. In contrast to earlier state-of-the-art methods, the proposed system relies solely on the target's raw ADC data, thus avoiding the overhead of slow-time and fast-time fast Fourier transform (FFT) processing. The proposed architecture mimics the discrete Fourier transform within the SNN itself, which avoids the need for FFT accelerators and tailors the FFT processing to the specific application, in this case gesture sensing. The experimental results demonstrate that the proposed system classifies 8 different gestures with an accuracy of 98.7%. This result is comparable to that of conventional approaches, yet the system offers lower complexity, lower power consumption and faster computation than the conventional approaches.


Introduction
Gesture sensing technology allows users to interact with machines more easily and intuitively than conventional mouse and touch-screen systems [1], enabling applications in gaming, TVs, smart homes, and automotive. Gesture sensing is primarily dominated by vision-based systems, in which large data volumes obtained with camera sensors are processed with advanced computer vision techniques [2][3][4][5]. However, camera-based solutions raise privacy problems because they operate on photos and videos. Furthermore, they require suitable lighting and, in outdoor scenarios, suitable weather conditions, which limits their application. Non-vision based solutions have been devised to overcome these constraints. In non-vision based systems, specialized sensors in the form of gloves or bands are commonly attached to human subjects to record hand motion, and gestures are recognized by analyzing the data received from these sensors [6][7][8][9][10]. Although these systems have been shown to overcome the limitations of vision-based systems, wearing such sensors is cumbersome for users.
To overcome the aforementioned limitations, contactless non-vision based systems are becoming more prevalent. Among them, radar-based technologies are favorable because they are insensitive to lighting conditions, invariant to occlusion of the hand, and have a simple signal processing pipeline [11][12][13]. Additionally, because radar preserves privacy, works inside enclosures and detects fine motions, it has become a sensor of choice for gesture sensing applications. There are two key research directions in radar-based gesture sensing systems: one is the development of efficient miniature hardware that can produce high-fidelity target data [14][15][16][17][18][19][20][21], and the other is the signal processing pipeline, mostly driven by deep learning techniques [22][23][24][25][26][27][28][29][30][31].
With the current adoption of radars in IoT devices, radar-based techniques have focused on energy efficiency, the key requirement in IoT [32][33][34]. Effective gesture identification has been demonstrated with a tiny radar chip incorporated into a mobile phone and a small CNN that fits resource-constrained CPUs [35]. TinyRadarNN [36] uses a 2D CNN paired with a temporal convolutional neural network (TCN) to detect gestures from range-frequency Doppler features for low-power wearable devices. Similarly, in [37], additional features including range, Doppler, elevation and azimuth are fed to a CNN paired with recurrent neural networks (RNNs) to improve robustness. In [38], a tiny CNN has been developed for an embedded hand gesture recognition solution to minimize power consumption and computational cost. Furthermore, to capture the micro-motion dynamics of gestures [39][40][41], micro-Doppler [42] signatures are used to classify minute gestures such as gestures made with the fingers. The potential of radar is not limited to the recognition of hand swipes: in [43][44][45] its capabilities have also been shown in air-writing applications, where characters or phrases drawn on a virtual board in front of the radar are recognized and categorized.
While traditional deep neural network (deepNet) approaches remain the best candidates for detection and recognition, the energy efficiency of these systems during inference remains a concern [46], particularly for edge devices. The multiply-accumulate (MAC) operations between the layers of deepNets account for the bulk of the energy cost, and research efforts focus mainly on reducing MACs by employing smaller networks, pruning approaches, and weight quantization.
Spiking neural networks (SNNs) [47] have recently gained popularity for their energy efficiency, helped by the availability of the resources needed to build hardware that runs SNNs efficiently. In contrast to deepNets, information in SNNs is communicated by spike timing, which includes latencies and spike rates. Communication in an SNN is highly sparse because information is transmitted only when the membrane potential of a node (neuron) reaches a specific threshold. Additionally, the sparse, 1-bit nature of this activity significantly reduces the volume of data communicated between nodes. Furthermore, since nodes only integrate the incoming spikes, the MAC arrays are replaced with adders, which significantly reduces the amount of computation. Despite the energy efficiency of SNNs [48][49][50][51], training them is challenging because of their non-differentiable transfer function.
Since conventional backpropagation cannot be applied directly, local unsupervised learning, mostly spike-timing-dependent plasticity (STDP) and its variants, is used [52]. However, such methods are only practical for small networks with few parameters. Although recent advances have shown very promising results with STDP [53], bigger networks mostly borrow the deepNet concept, where the network is trained with backpropagation by using differentiable approximations of the spiking neurons [54]. Among the existing SNN neuron models, the leaky integrate-and-fire (LIF) model [55] is the best known and is used in this paper. The LIF model is a good choice for developing SNN models because it is simple and easy to implement [56], requires little computation (floating-point operations), and has neuro-computational features. Applications of SNNs in radar-based gesture sensing systems have been shown in [57][58][59][60], where range-Doppler features are fed to SNNs for robust gesture recognition and classification.
In contrast to approaches that operate on range-Doppler features, in [61] we proposed an SNN-based gesture recognition system that works only on raw data, where the fast-time FFT is mimicked in the SNN itself. In this paper, we extend that approach so that both the fast-time and slow-time FFTs are mimicked in the SNN, allowing the system to classify up to 8 gestures. The main contributions of this paper are as follows:
• An end-to-end radar-based gesture sensing system is proposed in which the SNN takes the raw data and performs gesture recognition.
• Unlike [57][58][59][60], which work on Doppler images, the proposed approach relies only on raw ADC data. Pre-processing steps such as the slow-time and fast-time FFTs are not required, which removes their computational overhead and the need for additional computational units.
• As an advancement over [61], which classifies only 4 gestures, the proposed approach mimics not only the fast-time FFT but also the slow-time FFT in the SNN, enabling the system to classify up to 8 gestures.
• A novel SNN architecture is proposed in which the signal pre-processing (slow-time FFT, fast-time FFT) is mimicked in the SNN.

Hardware
To perform our experiments, we use the hardware platform developed by Infineon Technologies [62], shown in Fig. 1. The simplified internal circuitry of the radar chipset is shown in Fig. 2. It consists of 1 transmit (Tx) antenna path, 3 receive (Rx) paths, a mixer and analog-to-digital converters (ADCs). An external phase-locked loop is used for linear frequency sweeping. An 80 MHz reference oscillator is used to control the loop, and the finite state machine (FSM) is controlled by a reference clock running at 80 MHz [63]. The tuning voltage is varied from 1 to 4.5 V to drive a voltage-controlled oscillator (VCO) that generates linear frequency sweeps from 57 GHz to 63 GHz, enabling the chipset to transmit signals with up to 6 GHz bandwidth. A Serial Peripheral Interface (SPI) and a Queued Serial Peripheral Interface (QSPI) are provided for memory readout. The maximum data transfer rate for readout is 200 Mb/s (4 × 50 Mb/s). The data is streamed out when the FSM issues an interrupt (IRQ) flag, which happens once a threshold set by the host is reached. The mixers are followed by high-pass filters, a variable gain amplifier (VGA), an anti-aliasing filter (AAF) and an ADC driver. A 4 Mb/s, 12-bit successive approximation register (SAR) converter is used for the multi-channel ADC (MADC), and a static random-access memory (SRAM) of 196 k stores the raw data. The temperature and transmit power are read out by a sensor ADC (SADC) [63].
For further hardware details, readers can refer to [63].

System design
The experimental setup used in this paper is shown in Fig. 3. A 60 GHz radar, configured with the parameters shown in Table 1, collects the hand gesture data in the form of raw ADC samples. The raw ADC data is then fed to a PC via USB, where the signal processing steps take place. The start and the end of a gesture are determined automatically: the recording starts when the hand is detected by the radar, and the end of the gesture is marked when the recording has completed 32 consecutive frames. Shorter, quicker gestures that span fewer than 32 frames are padded with zero-valued chirps.
[Figure caption: Proposed processing chain where a radar is connected to a CPU. The CPU takes raw ADC data and performs signal processing such as MTI and target detection. The filtered data is fed to SNN hardware/software for gesture classification.]
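The fixed-length recording described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `pad_frames` is hypothetical, one chirp per frame is assumed, and truncating recordings longer than 32 frames is an added assumption.

```python
def pad_frames(frames, target_frames=32):
    """Pad (or truncate) a gesture recording to a fixed number of frames.

    frames: list of frames, each a list of raw ADC samples (one chirp).
    Truncation of longer recordings is an assumption made for this sketch.
    """
    if not frames:
        raise ValueError("need at least one frame to infer the chirp length")
    samples_per_chirp = len(frames[0])
    padded = [list(frame) for frame in frames[:target_frames]]
    while len(padded) < target_frames:
        # Append a zero-valued chirp for every missing frame.
        padded.append([0.0] * samples_per_chirp)
    return padded


# A short gesture of 20 frames with 4 samples per chirp (toy values).
short_gesture = [[0.1, -0.2, 0.3, 0.0] for _ in range(20)]
fixed = pad_frames(short_gesture)
print(len(fixed), len(fixed[0]))
```

Padding at the end keeps the recorded part of the gesture aligned with the start of the detection, so every input presented to the network has the same shape.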

Signal model
A frequency-modulated continuous-wave (FMCW) radar transmits chirps, which are waves of linearly increasing frequency. The chirps reflected from a target are collected at a receive antenna. A mixer at the receiver mixes the transmitted and received signals, and low-pass filtering the result yields the intermediate frequency (IF) signal. For a signal of bandwidth $B_w$ and duration $T$, the frequency of the FMCW waveform can be expressed as:

$$f(t) = \nu_c + \frac{B_w}{T}\,t$$

where $\nu_c$ is the ramp start frequency. The beat signal is formed by mixing the reflected signal with a replica of the transmitted signal. The down-converted IF signal is derived under the assumption $\tau_m / T \ll 1$, where $\tau_m = 2(R_m + v_m t)/c$ is the time taken by the transmitted signal to return to the receiver after reflecting from the $m$-th target, located at distance $R_m$ from the radar and moving with radial velocity $v_m$. The constant $c$ is the speed of light. The IF signal $IF(t)$ is sampled and fed to the subsequent processing steps.
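For reference, a commonly used textbook form of the IF signal that is consistent with the definitions above is sketched below. It is an assumption and not necessarily the authors' exact expression: the amplitudes $A_m$ and the number of targets $M$ are introduced here only for illustration.

```latex
% Phase of the transmitted chirp (integral of the linear frequency ramp)
\phi(t) = 2\pi\left(\nu_c\,t + \frac{B_w}{2T}\,t^2\right)

% Phase difference between transmitted and delayed received chirp
\phi(t) - \phi(t-\tau_m)
  = 2\pi\left(\nu_c\,\tau_m + \frac{B_w}{T}\,\tau_m\,t - \frac{B_w}{2T}\,\tau_m^2\right)

% Dropping the term quadratic in \tau_m (since \tau_m / T \ll 1)
IF(t) \approx \sum_{m=1}^{M} A_m \cos\!\left(2\pi\left(\nu_c\,\tau_m + \frac{B_w}{T}\,\tau_m\,t\right)\right)
```

In this form, each target contributes a tone whose beat frequency $\frac{B_w}{T}\tau_m$ grows with its range, which is why a fast-time Fourier transform separates targets by distance.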

Raw data
The IF signal $IF(t)$, i.e. the raw ADC data from the radar chipset, is collected chirp-wise (fast-time), and the chirps are stacked together as rows (slow-time). A frame is thus represented by a 2D array in which each row represents a chirp.

Moving target indication filtering
Radar signals reflected from stationary objects can have a greater magnitude than the target reflection and can thus mask the reflections of the target (in our case, the hand). To eliminate these reflections from stationary objects, we use moving target indication (MTI). In MTI, a moving average filter with a forget factor $\alpha$ set to 0.01 is applied at each frame $i$ to the fast-time data $f(i)$. Once the data is filtered using MTI, it is fed to the target detection block.
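One common way to realize such an MTI stage is to subtract an exponentially weighted moving average of earlier frames. The sketch below assumes that particular recursion, which may differ from the authors' filter; only the forget factor of 0.01 comes from the text, and `mti_filter` is a hypothetical name.

```python
def mti_filter(frames, alpha=0.01):
    """Suppress static clutter by subtracting a moving-average background.

    frames: iterable of chirps (lists of fast-time samples), one per frame.
    alpha:  forget factor of the moving average (0.01 in the text).
    The exact recursion is an assumption made for this sketch.
    """
    background = None
    filtered = []
    for chirp in frames:
        if background is None:
            background = [0.0] * len(chirp)
        # Remove the background estimated from the earlier frames.
        filtered.append([x - b for x, b in zip(chirp, background)])
        # Slowly pull the background estimate towards the current frame.
        background = [alpha * x + (1.0 - alpha) * b
                      for x, b in zip(chirp, background)]
    return filtered


# A perfectly static scene: the residual decays towards zero over time.
static_scene = [[1.0, 2.0] for _ in range(2000)]
residual = mti_filter(static_scene)
print(residual[0], residual[-1])
```

A small forget factor makes the background adapt slowly, so a hand moving through the scene is not absorbed into the clutter estimate during a gesture.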

Target detection
For hand detection and recognition, thresholding is employed on the filtered fast-time data. The threshold $\vartheta$ at frame $j$ is calculated as the mean value of the fast-time data $f_n$ over the $N_b$ range bins, indexed by $n$, multiplied by a scaling factor $\beta$ set to 3. The scaling factor was obtained empirically as a trade-off between the false positive rate and the probability of detection. $f_n$ is calculated when there is no target in the $j$-th bin. As soon as a moving target is detected, the filtered raw data is collected and fed to the SNN block for gesture classification.
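The thresholding step can be illustrated with the following sketch. Taking the mean of the magnitudes and declaring a detection when any bin exceeds the threshold are assumptions of this example; the function names are hypothetical, and only the scaling factor of 3 comes from the text.

```python
def detection_threshold(target_free_bins, beta=3.0):
    """Threshold = beta times the mean magnitude over the N_b range bins.

    target_free_bins: filtered fast-time values recorded without a target.
    Using the magnitude (abs) is an assumption made for this sketch.
    """
    n_b = len(target_free_bins)
    return beta * sum(abs(value) for value in target_free_bins) / n_b


def target_present(filtered_chirp, threshold):
    """Declare a detection if any bin exceeds the threshold (assumed rule)."""
    return any(abs(value) > threshold for value in filtered_chirp)


threshold = detection_threshold([1.0, -1.0, 1.0, -1.0])
print(threshold)
print(target_present([0.5, 3.5], threshold))
print(target_present([0.5, 2.9], threshold))
```

A larger scaling factor lowers the false positive rate at the cost of missing weak reflections, which is the trade-off the empirical choice of the factor reflects.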

Spiking neural networks
Spiking neural networks (SNNs) are the third generation of neural networks; they advance earlier neural networks by closely capturing the neurocomputational features of the human nervous system [47]. Neurons in SNNs are more computationally powerful because SNNs take into account not only spatial information but also temporal aspects (the precise timing of the spikes) [64]. SNNs are well suited to large-scale data processing due to their low power consumption and ability to perform parallel analysis [65]. Their low power consumption, quick inference, and event-driven information processing make them an ideal candidate for deep neural network and machine learning applications [57][58][59] where low energy consumption is desired.
Several neuron models exist, and their computational efficiency and biological plausibility are discussed briefly in [66]. Among them, LIF is popular because it is a simple model that requires few computations and is biologically plausible.
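To make the LIF dynamics concrete, the following is a minimal discrete-time sketch of a single neuron. The time step, membrane time constant, threshold and reset-to-zero rule are illustrative assumptions and not the neuron parameters used in the paper.

```python
def simulate_lif(input_current, dt=0.001, tau_rc=0.02, v_th=1.0):
    """Simulate one leaky integrate-and-fire neuron with Euler steps.

    input_current: sequence of input values, one per time step.
    Returns a list with 1 where the neuron spiked and 0 elsewhere.
    All parameter values are hypothetical and chosen for illustration.
    """
    v = 0.0
    spikes = []
    for current in input_current:
        # Leaky integration: the potential relaxes towards the input.
        v += (dt / tau_rc) * (current - v)
        if v >= v_th:
            spikes.append(1)   # emit a 1-bit event
            v = 0.0            # reset the membrane potential
        else:
            spikes.append(0)
    return spikes


sub_threshold = simulate_lif([0.5] * 1000)
medium = simulate_lif([2.0] * 1000)
strong = simulate_lif([4.0] * 1000)
print(sum(sub_threshold), sum(medium), sum(strong))
```

The neuron stays silent for inputs that never drive the potential to the threshold and fires more often for stronger inputs, which is the source of the sparse, event-driven activity discussed above.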

Proposed spiking neural network architecture
The aim of the proposed architecture is to be efficient in terms of computing power and energy. Therefore, we opted for LIF as the neuron model. The proposed SNN architecture is shown in Fig. 4. Since the LIF is not differentiable, backpropagation cannot be applied to it directly; we therefore use a differentiable approximation of the LIF.
To mimic the discrete Fourier transform (DFT) in the SNN layers, we exploit the fact that the DFT is a linear transformation and can be represented as successive matrix multiplications. Each DFT dimension is represented by a single Dense layer whose weights are the real and imaginary parts of the DFT coefficients. Let the radar operate with a total of $F_n$ frames and $S_n$ samples per chirp, so that the input data has dimension $S_n \times F_n$. To perform the DFT on this input data, the first layer contains $2 \times S_n$ nodes to compute the real and imaginary values. The connectivity between the input nodes and the layer nodes is therefore $2S_n \times S_n$. We use the trigonometric form of the DFT to calculate the weights of the connections, with the indices $q$ and $p$ taking values between 0 and $S_n - 1$. When applied to an input vector $Y$, the transform can be written in matrix form, where $C$ is the result of the transform and $W_R$ and $W_I$ are the real and imaginary coefficients. Since in radar processing the second FFT is applied across slow-time, the output of the first layer is reshaped and then transposed using the transpose layer. The same formulation is then applied in the 3rd layer, where the real and imaginary parts of the first layer's output are separately connected to real and imaginary weights calculated with the same trigonometric equation. Let $Y_r$ denote the transformation $Y^T W_R$ and $Y_i$ the transformation $Y^T W_I$; the 3rd layer then applies the real and imaginary weights to each of $Y_r$ and $Y_i$. The first and 3rd layers use LIF as an activation function, which transforms the output of each neuron into spikes. The outputs of the 1st and 3rd layers are appended to one another and provided to the convolutional layer, where each transformation is represented as a channel, making $\varphi$ a 6-dimensional vector. The convolutional layer has a total of 16 filters of size 3, with the stride set to 1.
The output of the convolutional layer is then fed to a fully connected layer with 64 neurons. Both the fully connected layer and the convolutional layer use LIF as an activation function. Finally, a fully connected layer with 8 neurons is used as the model output.
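The idea of expressing the two DFT stages as real-valued dense layers can be checked numerically. The sketch below uses the standard DFT sign convention (cosine for the real weights, negative sine for the imaginary weights), which is an assumption about the authors' weight equation; it also omits the LIF activations, so it only verifies the linear part. All function names are hypothetical.

```python
import cmath
import math


def dft_weights(n):
    """Return (W_R, W_I): n x n cosine and negative-sine DFT matrices."""
    w_r = [[math.cos(2 * math.pi * p * q / n) for q in range(n)] for p in range(n)]
    w_i = [[-math.sin(2 * math.pi * p * q / n) for q in range(n)] for p in range(n)]
    return w_r, w_i


def matmul(a, b):
    """Plain matrix product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def mimic_2d_dft(chirps):
    """chirps: rows are frames (slow-time), columns are samples (fast-time)."""
    n_frames, n_samples = len(chirps), len(chirps[0])
    w_r, w_i = dft_weights(n_samples)                  # layer 1: fast-time DFT
    real_1, imag_1 = matmul(chirps, w_r), matmul(chirps, w_i)
    y_r, y_i = transpose(real_1), transpose(imag_1)    # layer 2: transpose
    v_r, v_i = dft_weights(n_frames)                   # layer 3: slow-time DFT
    branches = {
        "rr": matmul(y_r, v_r), "ri": matmul(y_r, v_i),
        "ir": matmul(y_i, v_r), "ii": matmul(y_i, v_i),
    }
    return real_1, imag_1, branches


def combine(branches):
    """Recombine the four real-valued branches into a complex 2D spectrum."""
    rows, cols = len(branches["rr"]), len(branches["rr"][0])
    return [[complex(branches["rr"][k][d] - branches["ii"][k][d],
                     branches["ri"][k][d] + branches["ir"][k][d])
             for d in range(cols)] for k in range(rows)]


def reference_2d_dft(chirps):
    """Direct complex 2D DFT, used only to check the layered version."""
    n_frames, n_samples = len(chirps), len(chirps[0])
    return [[sum(chirps[m][n]
                 * cmath.exp(-2j * math.pi * (n * k / n_samples + m * d / n_frames))
                 for m in range(n_frames) for n in range(n_samples))
             for d in range(n_frames)] for k in range(n_samples)]


stack = [[1.0, 2.0, 0.5, -1.0], [0.0, 1.0, -0.5, 2.0], [3.0, -1.0, 0.0, 1.0]]
real_1, imag_1, branches = mimic_2d_dft(stack)
spectrum = combine(branches)
reference = reference_2d_dft(stack)
print(max(abs(spectrum[k][d] - reference[k][d]) for k in range(4) for d in range(3)))
```

The two first-layer outputs and the four third-layer branches together give six real-valued maps, which matches the six channels mentioned in the text. Because the weights are ordinary dense-layer parameters, they can also be fine-tuned during training, which is one way to read the statement that the FFT processing becomes tailored to the application.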

Training
The network is trained with conventional backpropagation using NengoDL [67], as it provides a differentiable approximation of the firing rate of the LIF neurons in the form of the SoftLIF [68] activation (an approximation of LIF). For calculating classification probabilities, we use multi-class cross-entropy as the objective function. As with ANNs, the learning performance of SNNs depends on a suitable optimization solver and weight initialization. We employ adaptive moment estimation (Adam) [69] as the optimization solver because of its computational efficiency and because it has proved to be a good candidate for large networks [70].
A softmax classifier with cross-entropy is used as the loss function. The cross-entropy is computed over $J$ training samples belonging to $K$ classes, where $z_{kj}$ is the true label of training example $j$ for class $k$, and $x$ is the input example to the model $H$ with weights $\theta$.
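The softmax cross-entropy described above can be written out in a few lines. This sketch averages the loss over the $J$ samples, which is an assumption about the normalization; the helper names are hypothetical and no deep learning framework is involved.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one list of class scores."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(batch_logits, batch_labels):
    """Mean multi-class cross-entropy; labels are one-hot lists z_kj."""
    loss = 0.0
    for logits, one_hot in zip(batch_logits, batch_labels):
        probs = softmax(logits)
        # Only the true class (z_kj = 1) contributes to the sum over k.
        loss -= sum(z * math.log(p) for z, p in zip(one_hot, probs) if z)
    return loss / len(batch_logits)


label = [1] + [0] * 7                      # one of K = 8 gesture classes
uninformed = cross_entropy([[0.0] * 8], [label])
confident = cross_entropy([[10.0] + [0.0] * 7], [label])
print(uninformed, confident)
```

A network that assigns equal scores to all 8 classes yields a loss of ln 8, and the loss shrinks towards zero as the score of the correct class grows.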

Testing
To make the network spiking, the LIF neurons in the trained model are replaced with spiking LIF neurons. The connection weights and neuron biases for the spiking LIF neurons are extracted from the trained model. To obtain an accurate estimate of the spiking neuron output over time, the test samples are adjusted for testing and presented to the network repeatedly over several time steps.

Dataset
We have used the dataset from [57]. This dataset has a total of 4800 hand gesture swipes collected with 5 people. The dataset is a collection of 8 different gestures as shown in Fig. 5.
The dataset has 600 samples for each gesture. The data used for both training and validation was collected with minimal prior supervision of the users. Furthermore, the data was gathered in a variety of settings, including different locations and environments. We performed 10 trials of the training and testing experiment, and the accuracy is averaged over those trials. For each trial, we randomly select 80% of the total dataset for training and 20% for testing.
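The evaluation protocol above (10 random 80/20 splits with averaged accuracy) can be sketched as follows. The `evaluate` callback stands in for training and testing a model and is purely hypothetical, as are the function names and the use of the trial index as a random seed.

```python
import random


def split_indices(n_samples, train_fraction=0.8, seed=None):
    """Randomly split sample indices into a training and a test part."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    cut = int(round(train_fraction * n_samples))
    return indices[:cut], indices[cut:]


def mean_accuracy(evaluate, n_samples=4800, n_trials=10):
    """Average the accuracy returned by `evaluate` over random trials.

    evaluate: hypothetical callback (train_idx, test_idx) -> accuracy.
    """
    scores = []
    for trial in range(n_trials):
        train_idx, test_idx = split_indices(n_samples, seed=trial)
        scores.append(evaluate(train_idx, test_idx))
    return sum(scores) / len(scores)


train_idx, test_idx = split_indices(4800, seed=0)
print(len(train_idx), len(test_idx))
```

With 4800 samples, each trial uses 3840 samples for training and 960 for testing; reshuffling for every trial reduces the dependence of the reported accuracy on one particular split.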

Results
To assess the performance of the proposed system, we use classification accuracy as the measure. Averaged over random trials, our proposed system achieves an accuracy of 98.7%, similar to that of state-of-the-art methods, as shown in Table 2. All the methods were applied to the same dataset. In [58,59,61], 4 out of the 8 gestures from the same dataset are used.

Discussion
In this paper, a radar-based gesture sensing system running on an SNN is proposed. The proposed system does not require the conventional radar signal pre-processing steps, such as fast-time and slow-time FFTs followed by constant false alarm rate (CFAR) detection, and gesture sensing is performed using the raw ADC data only. The fast-time and slow-time FFTs are mimicked within the SNN, and the CFAR and other denoising steps are learned by the SNN itself, demonstrating the full capabilities of SNNs. SNNs are energy-efficient, fast, scalable and hardware-friendly, making them a suitable and cost-effective solution.
Besides avoiding traditional feature engineering techniques (slow-time and fast-time FFTs), which are mimicked within the SNN, the proposed method works with only a single chirp per frame, in contrast to the state-of-the-art methods that rely on 32 chirps per frame. This significantly reduces the computational cost.
The capabilities of the proposed system in terms of firing and prediction are shown in Fig. 6, where the firing over time is shown for some of the examples. For every class, the model starts by firing with more or less similar probabilities but, after a few time steps, quickly starts firing with a higher probability for the correct class. This happens because the spikes are integrated over a longer time period, which yields higher accuracy in predicting the correct class.
Since we aim to keep our model close to biologically plausible and computationally efficient, we opted for the LIF model because of its simplicity. As with conventional deepNets or any machine learning model, the learning performance of the SNN depends on hyperparameters. For SNNs, two types of hyperparameters need to be considered: those of the SNN model itself and those at the neuron level. The optimal parameters used for training are shown in Tables 3 and 4. These parameters were obtained by performing a grid search, allowing the network to overfit and underfit. Furthermore, the optimal parameters used for Adam and the weight initialization of the dense and convolutional layers are shown in Table 5.
[Table note: ROT stands for range-over-time image.]
The convolutional layer weights are drawn from the well-known Xavier initialization [72], which has been shown to be well suited to SNNs [73]. With the parameters given in Tables 3, 4 and 5, the proposed system is capable of classifying 8 gestures with an accuracy (98.7%) similar to that of the state-of-the-art SNNs and deepNets in Table 2. Figure 7(c) shows the confusion matrix obtained by testing the model with the aforementioned test dataset. The system classifies gestures 1 and 2 100% correctly, without any confusion. Gesture 3 is confused with gesture 5 in 1.10% of cases, and gesture 4 is confused with gestures 6 and 7 in 0.87% of cases each. The only high rate of misclassification, around 5.88%, occurs for gesture 5, which is confused with gesture 7. Gesture 6 is confused with gesture 7 in 1.03% of cases, and gesture 7 is confused with gesture 6 in 0.88% of cases. The last gesture, gesture 8, is confused with gestures 4 and 5 in 0.98% of cases each. The classification confusion of the proposed system (Model 3) shown in Fig. 7(c) is significantly lower than that in Figs. 7(a) (Model 1) and 7(b) (Model 2). This lower confusion of the proposed model is attributed to the mimicked FFTs in the starting layers (Table 5).
To further evaluate the performance of our proposed system, we used the t-distributed stochastic neighbour embedding (t-SNE) algorithm to visualize the high-dimensional feature space. This helps us assess how discriminating the features produced by the network are for each class in the dataset. We fed the output of the last layer (before the classification layer) into t-SNE with the associated labels. We varied the number of neurons in this layer from 2 to 64 in powers of 2 and calculated the t-SNE for each case, as shown in Fig. 8. It is observed that increasing the number of neurons (the dimension of the embedding space) in the layer increases the separability. With 32 neurons, the proposed SNN learned both separable and discriminating features and generated close-knit clusters for categorizing the 8 gestures. This indicates that our SNN can correctly categorize the 8 gesture types at a lower dimension of 32.
Furthermore, Fig. 9(c) shows the discriminating features learned layer by layer. Moving along the layers, the features become more and more discriminating and form close clusters. At the last layer, the features are well discriminated and hence easily classified. The importance of mimicking the range and Doppler FFTs in the SNN model is evident from Table 6, which gives the accuracy for training and testing the model with and without the FFTs. Without the FFTs in the SNN, the performance is comparatively poor, with 81.03% accuracy. The performance improves by 7.87% when the range FFT layer is introduced and reaches 98.7% when the Doppler FFT layer is added, which is our proposed method. It is also evident from the t-SNE plots of the CNN model and the range model that the features produced are not well discriminating, resulting in lower performance than the proposed model.
[Figure caption, continued: The class labels are: a) 0 - down-up, b) 1 - up-down, c) 2 - left-right, d) 3 - rubbing, e) 4 - right-left, f) 5 - diagonal southwest to northeast, g) 6 - diagonal southeast to northwest, h) 7 - clapping. After a few time steps the SNN starts firing for the correct class.]
To further increase power efficiency, we examined the effect of post-training quantization on accuracy, as shown in Table 7. Quantization increases power efficiency in two ways: it reduces the memory footprint and the computational cost. Furthermore, quantized data with a lower bit width requires less data movement on-chip and off-chip, resulting in better energy efficiency and reduced memory bandwidth. Table 7 shows the post-training quantization effect on all 3 models of Table 6. The quantization is performed […]. This drop is due to the higher number of neurons used in the proposed model and hence the higher impact of quantization. However, we believe higher accuracy can be achieved with quantization-aware training, which the NengoDL framework currently does not support. Increasing the number of quantization bits increases the accuracy, as expected; with 8-bit and 16-bit quantization, the proposed model achieves 92.68% and 96.46% accuracy, with good precision and recall as indicated by the f1-scores. The "f1-score micro" average is calculated by counting the total true positives, false positives and false negatives. The "f1-score macro" is calculated by computing the metric for each class and taking the unweighted arithmetic mean.
Considering the goal of an energy-efficient system, we looked into the energy consumption per classification of the proposed system. Since an actual hardware-based energy measurement is out of the scope of this research work, in the current study we relied on the hardware metrics of the μBrain chip defined in [74] to estimate the energy consumption. If $SP_N$ is the maximum number of spikes, $SP_E = 2.1$ pJ is the energy per spike and $LK_P = 73$ μW is the static leakage power, then the energy consumption $C_E$ per classification using the μBrain hardware metrics is given as:

$$C_E = SP_N \cdot SP_E + \delta T \cdot LK_P$$

where $\delta T$ is the inference time. Assuming $\delta T = 28$ ms, the energy consumption per classification of our proposed system is approximately $C_E = 2.1$ μJ. For the energy efficiency of SNN hardware, readers can refer to [75], where SNN hardware is compared with other deep learning hardware in terms of energy efficiency. The performance of SNN hardware was tested on a keyword spotting application using the dynamic energy cost per inference on some commercially available energy-efficient accelerators, as shown in Fig. 10. The dynamic energy cost per inference is the difference between the total energy consumed by the hardware in a single inference and the energy consumed while the hardware is idle [75]. Here, an inference means passing an input vector through a feed-forward ANN with two hidden layers to predict a probability distribution over alphabetical characters. They showed up to 10× improvement in power efficiency in their experiments.
Fig. 7 The confusion matrices of the models mentioned in Table 6, obtained with the test dataset. Model 3 (our proposed model) performs well compared to the other models. This performance is attributed to mimicking the FFTs as the first layers.
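The energy estimate discussed above can be reproduced with a short calculation. The sketch assumes the energy is the sum of a dynamic part (spikes times energy per spike) and a static part (leakage power times inference time); the spike count of 26,000 is a hypothetical value chosen for illustration, since the paper does not state it.

```python
def energy_per_classification(n_spikes, energy_per_spike=2.1e-12,
                              leakage_power=73e-6, inference_time=28e-3):
    """Estimate the energy per classification in joules.

    n_spikes:         number of spikes per inference (hypothetical input).
    energy_per_spike: 2.1 pJ, as quoted for the muBrain chip.
    leakage_power:    73 uW static leakage power.
    inference_time:   28 ms assumed inference time.
    """
    dynamic = n_spikes * energy_per_spike      # spike-driven energy
    static = leakage_power * inference_time    # leakage over the inference
    return dynamic + static


leakage_only = energy_per_classification(0)
with_spikes = energy_per_classification(26_000)
print(f"{leakage_only * 1e6:.3f} uJ, {with_spikes * 1e6:.3f} uJ")
```

With these numbers the leakage term alone contributes about 2.044 μJ, so under this model the static leakage dominates the estimate and the spike activity adds only a small share on top.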
Despite its simplicity, the proposed prototype SNN solution can identify hand gestures in real time with high accuracy, comparable to state-of-the-art deepNet and SNN counterparts. Additionally, the use of SNNs makes the proposed system a low-power, hardware-friendly solution suitable for applications where low power is desired.
[Figure caption, continued: Subfigure e) shows that even at lower dimensions the SNN is able to learn both separable and discriminating features.]
Fig. 9 Visualization of the high-dimensional feature spaces of the layers of the network using t-SNE plots. Column a) shows the t-SNE for the input layer, CNN layer and dense layer of the CNN (Model 1). Similarly, column b) shows the t-SNE for the range model (Model 2), and column c) shows the t-SNE for the layers of the proposed method (Model 3). The features become more discriminating deeper in the network. The legend of the subfigures represents the classes as: 0 - down-up, 1 - up-down, 2 - left-right, 3 - rubbing, 4 - right-left, 5 - diagonal southwest to northeast, 6 - diagonal southeast to northwest, 7 - clapping.

Conclusion
A novel spiking neural network (SNN)-based gesture sensing system implemented using a 60 GHz radar system is proposed. Unlike existing methods that use image-based or point-cloud input data, we propose to directly leverage raw ADC data as input to the SNN. The SNN implicitly mimics the Fourier transform processing, which not only removes the overhead of additional FFT accelerators but also makes the FFT pre-processing specific to the task, in this case gesture sensing. Compared to the state of the art, our proposed SNN architecture offers a similar level of accuracy on 8 gestures, making the proposed system suitable for low-latency and low-power embedded implementations. As future work, we would like to investigate how to mimic the non-parametric Fourier transforms. Furthermore, we would also like to mimic the micro-Doppler behavior, which would allow us to classify micro-Doppler gestures.
[Table note: $N_{in}$ and $N_{out}$ are the numbers of input and output units.]
Table 6 A comparison of classifications based on different layers. The comparison is done using the same parameters as shown in Tables 3, 4 and 5.

Model | Layers | Accuracy

Data Availability
The data associated with this research cannot be made publicly available due to company confidentiality constraints.

Consent for Publication
We certify that the submission is original and has not been submitted for publication anywhere else.

Human and animal research disclosure
No animal or human subjects were used while conducting this research.

Competing interest
The authors have no competing interests to declare that are relevant to the content of this paper.

Conflict of Interest
The manuscript was reviewed and approved by all co-authors, and there are no conflicts of interest to report.

Open Access
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.