Introduction

The problem of gesture recognition is encountered in many applications including human computer interaction [38], sign language recognition [10], prosthesis control [9] and rehabilitation gaming [8, 34]. Signals generated from the electrical activity of the forearm muscles, which can be recorded with surface electromyography (sEMG) sensors, contain useful information for decoding muscle activity and hand motion [16].

Machine learning (ML) classifiers have been used extensively for determining the type of hand motion from sEMG data. A complete pattern recognition system based on ML consists of data acquisition, feature extraction, classifier definition and inference from new data. For the classification of gestures from sEMG data, electrodes attached to the arm and/or forearm acquire the sEMG signals, and features such as mean absolute value (MAV), root mean square (RMS), variance, zero crossings and frequency coefficients are extracted and then fed as input to classifiers like k-nearest neighbors (k-NNs), support vector machine (SVM), multilayer perceptron (MLP) or random forests [40].
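As a minimal illustration of the feature extraction step (a NumPy sketch with our own function name and toy window, not code from the cited works), the time-domain features mentioned above can be computed as:

```python
import numpy as np

def time_domain_features(window):
    """Common time-domain sEMG features for a single-channel window."""
    mav = np.mean(np.abs(window))                 # mean absolute value (MAV)
    rms = np.sqrt(np.mean(window ** 2))           # root mean square (RMS)
    var = np.var(window)                          # variance
    # zero crossings: count sign changes between consecutive samples
    zc = int(np.sum(np.signbit(window[:-1]) != np.signbit(window[1:])))
    return np.array([mav, rms, var, zc])

window = np.array([1.0, -1.0, 1.0, -1.0])
print(time_domain_features(window))               # [1. 1. 1. 3.]
```

Feature vectors of this kind, computed per window and per channel, form the input to classifiers such as k-NN, SVM or random forests.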

Over the past years, deep learning (DL) models have been applied with great success in sEMG-based gesture recognition. In these approaches, sEMG data are represented as images and a convolutional neural network (CNN) is used to determine the type of gesture. A typical CNN architecture consists of a stack of convolutional and pooling layers followed by fully connected (i.e., dense) layers and a softmax output. In this way, CNNs transform the input image layer by layer, from the pixel values to the final classification label.

The application of DL methods can also benefit the performance of sEMG interfaces used in rehabilitation. There is a large body of literature on the utilization of sEMG devices in rehabilitation and myoelectric control [17, 28, 36, 37]. A common limitation reported is the lack of classification robustness. To address this, recent studies provide evidence for significant performance improvements (with respect to classification accuracy and latency) achieved with DL approaches [6, 41, 47].

CNNs have made breakthroughs in feature extraction and image classification tasks in 2D problems. Yet, choosing a proper method to convert time-series into images that can be used as inputs to CNN models is not obvious. Among the methods proposed in the literature are the segmentation of multi-channel signals using windows and the application of 1D transformations such as the Fourier and wavelet transforms.

In this work, we extend our previous research [49] on the application of the Hilbert space-filling curve to represent sEMG signals as images. This type of curve is useful because it provides a mapping between 1D and d-dimensional spaces while preserving locality. This approach enables the application of image processing techniques to sequence data such as biomedical signals. Here, CNN models are used to classify Hilbert curve images of sEMG signals for the problem of hand gesture recognition. An overview of the current work is given in Fig. 1. The main contributions presented in this paper are:

  • a detailed performance comparison of the proposed locality-preserving Hilbert curve representation over well-known CNN models

  • a comparison with the state-of-the-art (WeiNet [53]) in hand gesture recognition

  • the development of a new lightweight model (MSHilbNet) based on multiple scale Hilbert curves.

The remainder of the paper is organized as follows. Section 2 provides a literature review of gesture recognition approaches, as well as of applications of the Hilbert curve to classification tasks. In Sect. 3, the details of the proposed method and of the CNN architectures used for experimentation are given. Section 4 describes the experiments performed for the evaluation of the models, while the results, followed by a discussion, are presented in Sects. 5 and 6. Finally, a summary of the outcomes is given in Sect. 7, and Appendix 1 contains additional figures and tables.

Fig. 1

The flowchart of the current work. sEMG data are preprocessed as in previous studies, and then the Hilbert curve mapping (Figs. 3, 4) is applied. During the training session (1), a set of hyper-parameters (Tables 3, 4, 5) is selected based on the performance on the validation set, before optimizing (2) the weights of the CNN model (Figs. 6, 8, 9, 10, 11, 12), which is evaluated (3) on the testing data (Figs. 7, 12, 13, 14, Tables 6, 7, 8, 9, 10)

Related work

Both typical ML approaches and DL practices have been employed to study the problem of sEMG-based hand gesture recognition. An early ML approach is presented in [25], where a set of time-domain features extracted from sEMG signals recorded with two electrodes is used for the classification of four gestures. The authors of [7] achieve 97% accuracy in classifying three types of grasps using the RMS feature extracted from seven electrodes as input to an SVM classifier. A comparison of different types of EMG features and classifiers for the classification of 52 gestures from the Ninapro reference dataset [3, 4] is provided by [3, 19, 30]. A random forest classifier and a combination of statistical and frequency domain features, i.e., MAV, histogram, wavelet and Fourier transform features, yield the best performance, with an accuracy of 75%.

The literature on DL methods for sEMG gesture recognition has been growing continuously over the last years. In [39], the authors evaluate different configurations of recurrent neural networks (RNNs), and their results show that a classifier with a bidirectional recurrent layer composed of long short-term memory (LSTM) units followed by an attention mechanism performs best in an application classifying 18 gestures from the Ninapro database. The authors of [46] use an unsupervised generative flow model to learn comprehensible features classified by a softmax layer, achieving about 64% accuracy on classifying 53 gestures. RNNs are well suited to sequence problems where successive inputs depend on each other; however, this assumption does not fully hold for EMG signals, which are inherently stochastic.

CNN models are the most commonly used DL approach for the task of gesture recognition based on sEMG. In [35], the authors develop a CNN for the categorization of six common gestures that improves the classification accuracy compared to SVM. The model of [2] consisting of convolutional and average pooling layers results in comparable performance to what was achieved using typical ML approaches. The results of our previous work [48] indicate that the use of max pooling rather than average pooling and the addition of dropout [44] layers between the convolutions can produce a 3% increase in accuracy (from 67 to 70%). The works of [18, 53] propose a few novelties compared to previous works not only in network structure but also in the way EMG signals are acquired. This is based on a high-density electrode array, which is considered an effective approach in myoelectric control [27, 33, 45]. Using instantaneous EMG images, the CNN model of [18] correctly classifies a set of eight hand movements with a rate of 89%, whereas the multi-stream CNN described in [53] achieves 85% accuracy on the Ninapro database. In their later works [22, 52], the authors of [53] propose a multi-view approach combining various sEMG representations, including FFT and traditional feature vectors, that achieves a classification improvement of about 3%.

Methods that deal with the adaptation of a pretrained network to new users have been developed as well. The work of [14] utilizes adaptive batch normalization (AdaBN) [31] to distinguish between subject-specific knowledge (normalization layers) and gesture-specific knowledge (convolutional layer’s weights), whereas in [12] a method based on weighted connections between a network trained on one subject (source domain) and a network trained on a different subject (candidate target network) is presented. In addition, [12] compares methods of data augmentation for sEMG signals.

The properties of the Hilbert curve are well known and have been exploited in the past for diverse applications. The authors of [13, 29] employ the Hilbert curve to represent mammographic images as 1D vectors from which a combination of features is extracted in order to detect breast cancer. Similarly, the work of [11] transforms volumetric data into 2D and 1D representations, which are then processed efficiently by typical CNNs. Compared to processing the raw data directly, the method described in [11] reduces training time and can be used on data with an arbitrary number of channels. The performance of recurrent models, such as long short-term memory (LSTM) networks, in the detection of image forgeries depends on the sequence of the extracted image patches. In [5], the order by which image patches are fed into an LSTM is determined by the Hilbert curve in order to better preserve their spatial locality. In our work, the Hilbert curve is not applied for dimensionality reduction, rather it is utilized for representing 1D sEMG signals as 2D images in a way similar to what is done in a few other studies. In the work of [54] that deals with the problem of DNA sequence classification, it was determined that long-term interactions between regions of the sequence are important for high classification accuracy. Instead of using very deep networks or larger filters, the Hilbert curve was used to map the DNA sequence into an image such that proximal elements stay close, while the distance between distant elements is reduced. In addition, the authors of [1] employ the same representation to improve the performance of a deep neural network that detects regions in the DNA sequence that are important for gene transcription.

Methods

Hilbert curve

A Hilbert curve (also known as a Hilbert space-filling curve), first described by the German mathematician David Hilbert in 1891, is a continuous fractal space-filling curve, i.e., a curve that passes through all the points of a d-dimensional space sequentially. Space-filling curves have been widely applied to tasks in data organization and compression. The Hilbert curve is known to be superior in preserving locality compared to alternatives [20, 32], such as the z-order and Peano curves: when two points lying at a given distance on a 1D line are mapped into 2D space with a space-filling curve, their distance in the 2D space tends to be smaller if the Hilbert curve is used.

Following the notation found in [20], a discrete d-dimensional Hilbert curve of order k, denoted as \(H_k^d\) of length \(L^d\), where \(L=2^k\), is a bijective mapping:

$$H_k^d : [L^d ]\rightarrow [L]^d$$
(1)

where \([L]=\{0,\ldots ,L-1\}\). For the 2D case (\(d=2\)), a sequence x of length \(L^2\), \(\{x\}=\{x_0,x_1,\ldots ,x_{L^2-1}\}\), is mapped into a 2D image y with dimensions \(L\times L\), such that \(y_{i,j}=x_l, \forall l \in \{0,\ldots ,L^2-1\}\), where \((i,j)=H_k^2(l)\). For the rest of the paper, we simply denote \(H_k^2\) as \(H_k\).
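For illustration, the mapping \(H_k\) can be computed with a standard iterative index-to-coordinate algorithm; the following is a minimal sketch (function and variable names are ours, not from the paper's code base):

```python
def hilbert_d2xy(k, l):
    """Map index l of the order-k Hilbert curve H_k to 2D coordinates (i, j),
    where 0 <= l < L**2 and L = 2**k. Iterative, bitwise-style computation."""
    i = j = 0
    s = 1
    while s < 2 ** k:
        ri = 1 & (l // 2)              # bit selecting one half of the square
        rj = 1 & (l ^ ri)              # bit selecting the other half
        if rj == 0:                    # rotate/reflect the quadrant if needed
            if ri == 1:
                i, j = s - 1 - i, s - 1 - j
            i, j = j, i
        i += s * ri
        j += s * rj
        l //= 4
        s *= 2
    return i, j

# The order-1 curve traverses the four quadrants along the fundamental pattern
print([hilbert_d2xy(1, l) for l in range(4)])
# [(0, 0), (0, 1), (1, 1), (1, 0)]

# Mapping a sequence x of length L**2 into an L x L image y (here k = 2, L = 4)
L = 4
x = list(range(L * L))
y = [[0] * L for _ in range(L)]
for l, v in enumerate(x):
    i, j = hilbert_d2xy(2, l)
    y[i][j] = v
```

Since the mapping is a bijection, every pixel of the \(L \times L\) image receives exactly one element of the sequence.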

The Hilbert curve can be easily constructed in a recursive manner. Initially, the 2D plane is divided into four quadrants traversed according to a fundamental pattern, as shown in Fig. 2, that constitutes the first-order representation \(H_1\). Higher-order curves are produced by dividing existing sub-squares into four smaller ones that are connected by a pattern obtained by rotation and/or reflection of the fundamental pattern. A visualization of Hilbert curve traversals of the 2D space for orders \(k=\{1, 2, 3\}\) is shown in Fig. 2, where the numbers correspond to the index within the sequence that is mapped to a specific pixel.

Fig. 2

The first three iterations of the Hilbert curve

sEMG representation

In this work, the Hilbert curve is employed to transform multi-channel sEMG signals into 2D image representations. Firstly, the sEMG data of a hand gesture recorded by M electrodes are organized into small segments of length N. Therefore, the dimensions of the data are \(N \times M\). Then, the mapping can be used in two ways: (1) across the time dimension, i.e., for each sEMG channel, map the time sequence into a 2D image, or (2) across the sEMG channels, i.e., for each time instant, map the values of the channels into a 2D image. Examples of these representations are shown in Fig. 3.

Usually in image applications, the input of a CNN is either an image with one channel (grayscale) or an image with three channels (RGB). In our approach, the image depth corresponds to either the number of sEMG electrodes or the duration of the sEMG segment.

In effect, our approach reshuffles the spatiotemporal samples of a 1D image (Fig. 3d) into a multi-channel image with lower spatial dimensions (Fig. 3e, f). Constructing each channel through the Hilbert scanning extends either the time neighborhoods (e) or the electrode neighborhoods (f). Thus, only one of the domains benefits from the Hilbert curve: the time domain in (e), where the Hilbert curve is applied on the rows of (d), and the electrode (spatial) domain in (f), where the Hilbert curve is applied on the columns of (d).

The computation of the proposed representations using the Hilbert space-filling curve requires only bitwise operations performed in constant time [21]. Then, the algorithm that computes the mapping between 1D and 2D takes \(\mathcal {O}(\log_2{K})\), where \(K=N\) (Sect. 3.2.1) or \(K=M\) (Sect. 3.2.2) is the length of the projected sequence. In addition, since the mapping from 1D to 2D is the same for all the generated images, it is computed only once and then used as a look-up table. Thus, considering that K is limited to \(K_{max}=64\), the computational overhead is negligible compared to training the CNN.

We evaluate the Hilbert curve representations of sEMG signals across five CNNs and compare their performance to the baseline approach (i.e., Hilbert mapping is not applied).

Fig. 3

The application of the Hilbert curve mapping to sEMG data. a A 64-sample segment of an sEMG signal, b the ‘HilbTime’ (Hilbert across time) representation of electrodes \(m=\{1, 2, 6, 9\}\) (image size \(8\times 8\)), c the ‘HilbElect’ (Hilbert across electrodes) representation at time instants \(n=\{0, 15, 45, 60\}\) (image size \(4\times 4\)), d the ‘Baseline’ image representation (image size \(64\times 10\)), e the ‘HilbTime’ 3D sEMG representation (image size \(8\times 8\times 10\)) and f the ‘HilbElect’ 3D sEMG representation (image size \(4\times 4\times 64\)). In (c, f), there are fewer electrodes than pixels; thus, the last six pixels on the right of the images are zeros

Hilbert in time

The application of the Hilbert mapping across the time dimension (HilbTime) consists of the following steps (Fig. 4). Given a single electrode sEMG sequence of length N, a 2D Hilbert representation is achieved with maximum dimensions \(L \times L\), where \(N \le L^2\) and L is a power of two, i.e., \(L=2^k\). If there are M sEMG electrodes, this process is repeated for every electrode, and the outputs are stacked into an M-channel image, i.e., an image with dimensions \(L \times L \times M\). For example, an sEMG segment of 10 electrodes with 64 samples is mapped into an \(8 \times 8 \times 10\) image. The output image, y, is initialized with zeros, and then for every electrode m and every timestep n, image coordinates (ij) are generated from the timestep n such that the image value at position (ijm) equals the signal value at timestep n of electrode m. It is important to note that when sequence segments of length smaller than \(L^2\) are used, the final image can be cropped in order to remove rows and columns with only zeros.
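The HilbTime steps above can be sketched as follows (a NumPy sketch with our own naming; the Hilbert routine is a standard iterative index-to-coordinate variant, and the mapping is precomputed once as a look-up table):

```python
import numpy as np

def hilbert_d2xy(k, l):
    """Iterative order-k Hilbert index-to-coordinate mapping."""
    i = j = 0
    s = 1
    while s < 2 ** k:
        ri = 1 & (l // 2)
        rj = 1 & (l ^ ri)
        if rj == 0:
            if ri == 1:
                i, j = s - 1 - i, s - 1 - j
            i, j = j, i
        i += s * ri
        j += s * rj
        l //= 4
        s *= 2
    return i, j

def hilb_time(segment, k):
    """Map an (N, M) sEMG segment to an (L, L, M) image, L = 2**k, N <= L**2.

    The 1D-to-2D mapping is computed once and reused as a look-up table."""
    n_samples, n_electrodes = segment.shape
    L = 2 ** k
    lut = [hilbert_d2xy(k, n) for n in range(n_samples)]
    image = np.zeros((L, L, n_electrodes))
    for n, (i, j) in enumerate(lut):
        image[i, j, :] = segment[n, :]   # same pixel for all electrode channels
    return image

segment = np.random.randn(64, 10)        # 64 samples, 10 electrodes
image = hilb_time(segment, k=3)
print(image.shape)                       # (8, 8, 10)
```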

Hilbert in electrodes

The Hilbert mapping is applied across the sEMG electrodes (HilbElect) (Fig. 4). Specifically, if the number of sEMG electrodes is M, then \(M \le L^2\), where \(L=2^k\). The Hilbert mapping is applied at every time instant of the sequence resulting in an image with dimensions \(L \times L \times N\). For example, an sEMG segment of 16 electrodes with 20 samples is mapped into a \(4 \times 4 \times 20\) image. The output image, y, is initialized with zeros, and then for every electrode m and every timestep n, image coordinates (ij) are generated from the electrode index m such that the image value at position (ijn) equals the sEMG value at timestep n of electrode m. As in the previous case, if the number of electrodes is less than \(L^2\) the final image can be cropped.
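A corresponding sketch for HilbElect (again with our own naming and a standard iterative Hilbert mapping) swaps the roles of the two axes, placing one electrode per pixel:

```python
import numpy as np

def hilbert_d2xy(k, l):
    """Iterative order-k Hilbert index-to-coordinate mapping."""
    i = j = 0
    s = 1
    while s < 2 ** k:
        ri = 1 & (l // 2)
        rj = 1 & (l ^ ri)
        if rj == 0:
            if ri == 1:
                i, j = s - 1 - i, s - 1 - j
            i, j = j, i
        i += s * ri
        j += s * rj
        l //= 4
        s *= 2
    return i, j

def hilb_elect(segment, k):
    """Map an (N, M) sEMG segment to an (L, L, N) image, L = 2**k, M <= L**2.

    Pixels corresponding to curve positions beyond M remain zero."""
    n_samples, n_electrodes = segment.shape
    L = 2 ** k
    image = np.zeros((L, L, n_samples))
    for m in range(n_electrodes):
        i, j = hilbert_d2xy(k, m)
        image[i, j, :] = segment[:, m]   # one pixel per electrode, depth = time
    return image

segment = np.random.randn(20, 16)        # 20 samples, 16 electrodes
print(hilb_elect(segment, k=2).shape)    # (4, 4, 20)
```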

Fig. 4

The steps of the two approaches: a Hilbert in time and b Hilbert in electrodes

Network architectures

In this work, we evaluate our method on the state-of-the-art CNN architecture for the problem of hand gesture recognition (WeiNet [53]), as well as on models based on architectures typically found in image tasks, such as VGGNet [42], DenseNet [24] and SqueezeNet [26]. Since these models were defined for image tasks where the dimensionality of the data is larger, modifications were needed to adapt these architectures to sEMG data. In addition, inspired by the multi-scale dense networks (MSDNet) [23], we propose the application of a similar architecture, which we name multi-scale Hilbert network (MSHilbNet), where the generation of Hilbert curve representations of multiple orders is an inherent feature of the topology. The number of parameters for the model architectures is presented in Table 1, while details of each architecture are given next.

Table 1 Number of parameters for the model architectures

WeiNet [53]

In [53], the authors propose the state-of-the-art model in hand gesture recognition: a shallow network that consists of two convolutional and two locally connected layers that do not decrease the spatial resolution of the input. This architecture is applied to every EMG channel, and the corresponding feature maps are merged into a single feature vector via concatenation. Then, this feature vector passes through a series of fully connected layers followed by a softmax classifier. We include this CNN model in our investigation since it achieves the highest accuracy among the works presented in Sect. 2 on the benchmark dataset (i.e., Ninapro [3]). A graphical representation of the model is shown in Fig. 8.

VGGNet [42]

The original VGGNet [42], developed by the Visual Geometry Group (VGG) at the University of Oxford, won the localization task of ILSVRC 2014. It consists of 16 convolutional layers and follows a very uniform architecture: it uses \(3 \times 3\) convolutions only, while every third convolutional layer is followed by a pooling operation. In this work, we use convolutional blocks (d) that consist of two \(3 \times 3\) convolutional layers with rectified linear unit (ReLU) activations and a pooling layer (pl). The output class label is obtained through a global pooling operation followed by a dense layer with softmax activation. A graphical representation of the model is shown in Fig. 9.

DenseNet [24]

The DenseNet [24] was developed by researchers at Cornell University, Tsinghua University and Facebook AI Research. The architecture is based on the idea of creating short paths from early layers to later layers, thus ensuring maximum information flow between the network layers. An advantage of this is that the network can be compact with fewer parameters, since each layer receives feature maps from all preceding layers. In this work, the building block (d) of the architecture contains three densely connected groups of batch normalization (BN), ReLU and a \(3 \times 3\) convolutional layer, while a transition layer made of BN, ReLU, \(1 \times 1\) convolution and \(2 \times 2\) pooling (pl) connects two successive dense blocks. The classifier contains a \(1 \times 1\) convolution, ReLU, a global pooling layer and a softmax activation. A graphical representation of the model is shown in Fig. 10.

SqueezeNet [26]

The SqueezeNet [26] was developed by UC Berkeley and Stanford University in an effort to achieve equivalent accuracy with smaller CNN architectures. The main component of this model is the ‘fire module.’ This is based on \(1 \times 1\) convolutions (squeeze layer) to reduce the depth of the feature maps before the application of \(3 \times 3\) convolutions (expand layer) that increase the feature depth. In this work, three fire modules followed by a \(2 \times 2\) pooling layer (pl) comprise the main building block (d) of the architecture, while the classifier contains a \(1 \times 1\) convolution, ReLU, a global pooling layer and a softmax activation. A graphical representation of the model is shown in Fig. 11.

Multi-scale Hilbert network (MSHilbNet)

Inspired by the MSDNet [23], we propose the multi-scale Hilbert network (MSHilbNet), a new multi-scale Hilbert curve approach for the problem of gesture recognition. With this model, we investigate the utilization of Hilbert curve representations of multiple scales (i.e., multiple orders of the Hilbert curve), where coarser scales can be constructed via downsampling. We show here that downsampling with a pooling layer of non-overlapping \(2 \times 2\) kernels (\(kernel~size = stride = 2 \times 2\)) retains the pattern of the Hilbert curve (Fig. 5), i.e., the output of this operation is the Hilbert curve representation of the subsampled signal.

Fig. 5

Application of pooling operator to obtain lower-order Hilbert curve representations. In b, z is a first-order representation generated from a second-order representation y, shown in a

Given an image y that is the Hilbert representation of order k of the 1D sequence x of length \(L^2\) (i.e., \(y_{i,j}=x_l\), where \(\{x\}=\{x_0,x_1,\ldots ,x_{L^2-1}\}\), and \((i,j)=H_k(l), \forall l \in \{0,\ldots ,L^2-1\}\)), then the following holds for the subsampled representation \(z=pool_2^2(y)\):

$$\begin{aligned} x'= pool_4^1(x) \end{aligned}$$
(2)
$$\begin{aligned} z_{i',j'} = x_l' \end{aligned}$$
(3)
$$\begin{aligned} (i',j') = H_{k-1}(l), \forall l \in \{0,\ldots ,L^2/4-1\} \end{aligned}$$
(4)

where \(pool_2^2()\) is the 2D pooling operation with kernel size and stride equal to \(2 \times 2\), \(pool_4^1()\) is the 1D pooling with kernel size and stride equal to 4, \(\{x'\}=\{x_0',x_1',\ldots ,x_{L^2/4-1}'\}\).
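The property of Eqs. (2)–(4) can be checked numerically; the following NumPy sketch (our own naming, max pooling as the pooling operator, and a standard iterative Hilbert mapping) verifies it for \(k=2\):

```python
import numpy as np

def hilbert_d2xy(k, l):
    """Iterative order-k Hilbert index-to-coordinate mapping."""
    i = j = 0
    s = 1
    while s < 2 ** k:
        ri = 1 & (l // 2)
        rj = 1 & (l ^ ri)
        if rj == 0:
            if ri == 1:
                i, j = s - 1 - i, s - 1 - j
            i, j = j, i
        i += s * ri
        j += s * rj
        l //= 4
        s *= 2
    return i, j

rng = np.random.default_rng(0)
x = rng.random(16)                      # sequence of length L**2, L = 4 (k = 2)

# Order-2 Hilbert image y of x
y = np.zeros((4, 4))
for l in range(16):
    i, j = hilbert_d2xy(2, l)
    y[i, j] = x[l]

# z = pool_2^2(y): 2x2 max pooling with stride 2
z = y.reshape(2, 2, 2, 2).max(axis=(1, 3))

# x' = pool_4^1(x): 1D max pooling with kernel size and stride 4
x_pooled = x.reshape(4, 4).max(axis=1)

# z equals the order-1 Hilbert image of x'
z_ref = np.zeros((2, 2))
for l in range(4):
    i, j = hilbert_d2xy(1, l)
    z_ref[i, j] = x_pooled[l]
print(np.allclose(z, z_ref))            # True
```

The check works because the order-k curve visits each aligned \(2 \times 2\) block through four consecutive indices, so pooling a block is equivalent to pooling four consecutive sequence elements.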

With this tool, we can create a network architecture, shown in Fig. 6, that exploits multiple resolutions of the Hilbert representation obtained via a pooling (pl) operation. As in MSDNet, regular convolutions increase the depth (d) of the architecture along a resolution, while we adopt strided convolutions to carry information from a higher resolution to a lower resolution (s). Regular convolutions consist of \(3 \times 3\) convolutions and ReLU activations, whereas the strided convolutions use \(2 \times 2\) filters. Finally, a single output label is obtained by merging the outputs (o) of the intermediate classifiers that consist of \(1\times 1\) convolution, ReLU, global pooling layer and a softmax activation. The implementation code will be available at https://github.com/DSIP-UPatras/sEMG-hilbert-curve.

Fig. 6

The MSHilbNet model. Horizontal arrows correspond to regular convolutions between features of the same scale, while the diagonal arrows are strided convolutions that convey information to coarse scales. The initial scales are generated by a pooling operation shown with vertical arrows. The dark shaded blocks correspond to a model with \(s=[8\times 8,4\times 4,2\times 2]\), \(d=3\), \(o=[3,2,1]\). A detailed description of the operations that correspond to each type of arrow is given on the right of the figure

Experiments

Dataset

The evaluation of the models was performed on the first dataset of the Ninapro database [3]. This consists of sEMG recordings of 27 healthy subjects who repeat all 52 gestures 10 times with a relax period between repetitions. The hand movements, which cover the majority of movements found in activities of daily living and rehabilitation exercises, can be grouped into three categories: (1) basic finger movements, (2) isometric, isotonic hand configurations and wrist movements and (3) grasps and functional movements. EMG signals are measured using 10 electrodes, of which eight are equally spaced around the forearm and the remaining two are attached on main activity spots of the large flexor and extensor muscles of the forearm [3].

The sEMG signals are preprocessed with a low-pass filter as in previous studies that involve the Ninapro database [2, 18, 48]. Then, training data are augmented by adding Gaussian noise with a signal-to-noise ratio (SNR) equal to 25 dB. In addition, sEMG signals are augmented with the magnitude-warping method described in [50, 51]. As a last step, sEMG signals from the 10 channels are segmented into overlapping windows of length N with a step of 10 ms (1 sample) and organized into \(N \times 10\) arrays.
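The segmentation step can be sketched as follows (a minimal NumPy sketch; the function name and the toy recording length are ours):

```python
import numpy as np

def segment_windows(emg, window_len, step=1):
    """Split a (T, M) multi-channel sEMG recording into overlapping
    (window_len, M) windows advanced by `step` samples."""
    T = emg.shape[0]
    starts = range(0, T - window_len + 1, step)
    return np.stack([emg[t:t + window_len] for t in starts])

emg = np.random.randn(200, 10)           # 2 s at 100 Hz, 10 channels
windows = segment_windows(emg, window_len=64, step=1)
print(windows.shape)                     # (137, 64, 10)
```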

In the following experiments, window segments of 16, 32 and 64 samples (160 ms, 320 ms and 640 ms, respectively) were used. A validation experiment showed that a window size of 64 samples performs the best, while we included the results for shorter segments considering the guidelines for real-time myoelectric control applications [15, 43]. In the case of \(N=16\), the training set (Fig. 1 path (2)) contains on average 266K instances with a standard deviation of 17K, while the corresponding testing set (Fig. 1 path (3)) is \(3.6K \pm 250\). We attribute this variance across the subjects to the fact that in the Ninapro dataset the duration of the gesture repetitions is not the same among the subjects. The train and test sizes for the remaining input configurations are shown in Table 2. Similar to other classification problems, the gesture labels are encoded with one-hot vectors of dimension equal to 53, i.e., the 52 hand gestures and the relax periods.

Evaluation of methods

The evaluation of the models is identical to existing works that have used the Ninapro dataset [2, 18, 48, 53]. Specifically, a new model is randomly initialized for each subject and trained (Fig. 1 path (2)) on data from seven repetitions (1, 3, 4, 6, 8, 9 and 10) and tested (Fig. 1 path (3)) on the remaining three (2, 5 and 7). As performance metrics, we use the accuracy, precision and recall averaged over all the subjects. An exception was made for the WeiNet which was trained and evaluated as in the original paper [53].
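For reference, the three metrics can be computed from a confusion matrix as in this NumPy sketch (macro averaging over classes; function and variable names are ours, and this is not the paper's evaluation code):

```python
import numpy as np

def macro_metrics(y_true, y_pred, n_classes):
    """Accuracy and macro-averaged precision/recall from integer labels."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1                    # rows: true class, cols: predicted
    accuracy = np.trace(cm) / cm.sum()
    with np.errstate(invalid="ignore", divide="ignore"):
        precision = np.nan_to_num(np.diag(cm) / cm.sum(axis=0))
        recall = np.nan_to_num(np.diag(cm) / cm.sum(axis=1))
    return accuracy, precision.mean(), recall.mean()

acc, prec, rec = macro_metrics([0, 0, 1, 1], [0, 1, 1, 1], n_classes=2)
print(round(acc, 3), round(prec, 3), round(rec, 3))   # 0.75 0.833 0.75
```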

Prior to performing the evaluation, a model selection step (Fig. 1 path (1)) is required to determine the appropriate hyper-parameter values of the models. In the case of VGGNet, DenseNet, SqueezeNet and MSHilbNet, a validation set determined the depth (d) of the network and the type of pooling (pl), as well as the number of Hilbert scales (s) and output layers (o) of the MSHilbNet. The exact parameter space and the selected values are reported in Table 3. A set of ten randomly selected subjects was used. From the training set of each subject (Fig. 1 path (2)), repetition number 6 is held out as a validation set (Fig. 1 path (1)). Then, a grid search is performed for the hyper-parameters in Table 3, where models are trained for every subject using six repetitions and evaluated on the validation data. The models with the best performance (i.e., highest accuracy for the validation set) are selected for further evaluation. At this stage, we consider input sEMG segments of length \(N=16\) (i.e., input dimensions \(16 \times 10\)) for the VGGNet, DenseNet and SqueezeNet, while for the MSHilbNet we use input segments of \(N=64\) (i.e., input dimensions \(8 \times 8 \times 10\)) since it allows evaluating the performance of one to three scales.

The depth hyper-parameter, d, corresponds to the number of basic blocks that the network consists of. Therefore, in the case of VGGNet, the basic block is made of two convolutions, whereas in the DenseNet block there are three convolutions. The basic block of the SqueezeNet contains three fire modules, each one with two levels of convolutions. Finally, in MSHilbNet the number of convolutions depends on both the depth and the number of scales (e.g., for a number of scales equal to 3, every depth \(d > 1\) adds to the network graph three regular and two strided convolutions).

Next, we describe the experimentation we followed to compare the performance of the proposed Hilbert curve mapping.

Baseline

As our baseline, we follow the approach where the Hilbert mapping is not used. Therefore, the \(N \times 10\) arrays are fed into the CNN models as single-channel images (\(N \times 10 \times 1\)). For the window length N, we experiment with three values: 16, 32 and 64 samples using the models in Table 1.

Hilbert in time

In the case of the Hilbert mapping across the time dimension (HilbTime), the \(N \times 10\) segments are organized into \(L \times L \times 10\) images, where \(L=2^k\). For N values equal to 16, 32 and 64, the resulting image sizes are \(4\times 4\times 10\), \(8\times 4\times 10\) (\(8 \times 8\) cropped into \(8\times 4\) to remove zero columns) and \(8\times 8\times 10\), respectively. The models in Table 1 were used.

Hilbert in electrodes

The Hilbert mapping across the sEMG channel dimension (HilbElect) is performed in a similar fashion. Given the number of channels \(M=10\), the \(N \times 10\) segments are organized into \(4 \times 4 \times N\) images. The pixels corresponding to the last six positions that the Hilbert curve traverses are set to zero. In this approach, we keep the spatial resolution constant due to the small number of available electrodes. For the window length, we experimented with \(N=16\), \(N=32\) and \(N=64\). As in the previous case, the models in Table 1 were used.

Model optimization

In the hyper-parameter selection step (Fig. 1 path (1)), the networks were trained using stochastic gradient descent (SGD) for 30 epochs with an initial learning rate of 0.1, halved every 10 epochs, and a batch size of 1024. Due to convergence problems of the SqueezeNet with this learning schedule, it was trained with the SGD optimizer and an initial learning rate of 0.1 that was reduced when the validation loss stopped decreasing. To avoid overfitting the networks due to the small training set, dropout layers were appended after convolutional layers with a drop rate of 0.3. In addition, weight decay regularization with a value of \(l_2=0.0005\) was applied to all convolutional layers.
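The step-decay schedule used in this stage can be written as a plain-Python sketch (the function name and argument defaults are ours):

```python
def lr_schedule(epoch, initial_lr=0.1, drop_every=10, factor=0.5):
    """Learning rate halved every `drop_every` epochs (step decay)."""
    return initial_lr * factor ** (epoch // drop_every)

print([lr_schedule(e) for e in (0, 10, 20)])   # [0.1, 0.05, 0.025]
```

Such a function can typically be plugged into a training loop or a framework callback that queries the learning rate once per epoch.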

The final models were trained (Fig. 1 path (2)) with SGD for 60 epochs as in [49]. The WeiNet model was trained following the procedure described in the original paper [53].

The WeiNet was trained on a workstation with an Intel Xeon, 2.40 GHz (E5-2630v3) processor, 16 GB RAM and a Nvidia GTX1080, 8GB GPU using the MxNet tools for Python. The rest of the models were trained on a workstation with an Intel i9-7920X, 2.90 GHz processor, 128 GB RAM and a Nvidia RTX2080 Ti, 12GB GPU using the Keras and Tensorflow libraries for Python.

Results

The optimal hyper-parameter selection (Fig. 1 path (1)) for the different CNN models is based on the evaluations shown in Tables 4 and 5. The search space along with the selected values for each of the network architectures is presented in Table 3. The number of scales (s) and the output layers (o) hyper-parameters are used only by the MSHilbNet model. In Tables 4 and 5, we show the average accuracy and the standard deviation on the validation set (i.e., repetition 6) of 10 random subjects for every parameter combination across the four architectures, while a bold font denotes the best (i.e., highest accuracy) combination of parameters.

With the optimal values for the hyper-parameters, the networks and the Hilbert representations are evaluated next (Fig. 1 path (3)). The evaluation results of the models and the proposed Hilbert representations are presented in Tables 6, 7, 8, 9 and 10 for different window lengths, N. The metrics’ values are given by the average and the standard deviation over all subjects in the dataset evaluated on the test set (i.e., repetitions 2, 5, and 7). For the VGGNet, DenseNet, SqueezeNet and MSHilbNet, the accuracy, precision and recall are calculated, whereas for the WeiNet, only the accuracy is shown since the code provided by the authors of [53] does not compute the other metrics. The columns correspond to the combination of a representation method (Baseline, HilbTime, HilbElect) and a window length N(16, 32, 64). It should be noted that for the WeiNet, the ‘HilbElect’ representation is not evaluated since this model architecture assumes that the last dimension of the input equals the sEMG electrode dimension. The accuracy curves during training and testing (Fig. 1 paths (2–3)) are shown in Fig. 12. Further, a radar chart in Fig. 7 evaluates other aspects of the investigated architectures when the Hilbert scanning is used.

Fig. 7

Radar chart for the comparison of WeiNet, VGGNet, DenseNet, SqueezeNet and MSHilbNet with the Hilbert curve mapping approach. The axes correspond to classification accuracies for window lengths \(N=16\), \(N=32\) and \(N=64\), the inverse of the number of trainable layers and the inverse of the number of parameters. The first three constitute a metric of the performance, while the last two relate to the models’ complexity

Comparisons between the models and representation methods are statistically evaluated. A repeated measures analysis of variance (ANOVA) with the Greenhouse–Geisser correction is performed, since Mauchly’s test indicated that the assumption of sphericity had been violated. This is followed by post hoc pairwise tests using the Bonferroni correction for multiple comparisons. The significance level was set to \(\alpha = 0.05\) for all tests (Tables 11, 12, 13, 14, 15, 16 and 17). In Tables 11 and 12, the results are analyzed across three variables, i.e., classification model, representation method and window length. Differences in the classifier and the representation method were significant (\(p\ll 0.05\)), which was not the case for the window length (Table 11). Pairwise comparisons revealed significant differences; the combination with the highest performance is a DenseNet model with the ‘HilbElect’ approach (Table 8). When the DenseNet, VGGNet, SqueezeNet and MSHilbNet are compared for the ‘HilbTime’ representation (Tables 13 and 14), significant differences are found between the MSHilbNet and the other models (\(p\ll 0.05\)), while the DenseNet and VGGNet perform similarly (\(p=0.059\)). Regarding the window length, there is no difference between \(N=16\) and \(N=32\) (\(p=1.0\)), while the difference between \(N=16\) and \(N=64\) is significant (\(p=0.002\)). Consequently, for the ‘HilbTime’ representation the MSHilbNet at window length \(N=64\) performs best (Table 10).
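The Greenhouse–Geisser correction rescales the ANOVA degrees of freedom by a sphericity estimate \(\hat{\epsilon}\) computed from the double-centred covariance matrix of the repeated measures. A minimal NumPy sketch of this estimate (an illustration of the standard formula, not the authors' analysis code; the random data are a placeholder):

```python
import numpy as np

def greenhouse_geisser_epsilon(X):
    """X: (n_subjects, k_conditions) matrix of repeated measures.

    Returns epsilon-hat in [1/(k-1), 1]; the ANOVA degrees of freedom
    (k-1) and (k-1)(n-1) are multiplied by this value before looking
    up the F distribution.
    """
    n, k = X.shape
    S = np.cov(X, rowvar=False)                # k x k covariance across subjects
    # Double-centre the covariance matrix (subtract row/column means, add grand mean).
    row = S.mean(axis=1, keepdims=True)
    col = S.mean(axis=0, keepdims=True)
    S_dc = S - row - col + S.mean()
    eps = np.trace(S_dc) ** 2 / ((k - 1) * np.sum(S_dc ** 2))
    return float(np.clip(eps, 1.0 / (k - 1), 1.0))

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))                   # e.g., 10 subjects x 3 representations
eps = greenhouse_geisser_epsilon(X)
# eps is always between 1/(k-1) (maximal violation) and 1 (perfect sphericity)
```

When \(\hat{\epsilon}=1\) the data are spherical and the correction has no effect; the further it falls toward its lower bound, the more conservative the corrected F test becomes.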

Finally, a further analysis of the MSHilbNet is shown in Figs. 13 and 14 and Tables 15, 16 and 17. In Fig. 13, highly activated feature maps are shown for the single-scale (Fig. 13c) and the multi-scale (Fig. 13d, e) approaches when a ‘Thumb up’ gesture is performed. In addition, the softmax distributions of the intermediate classifiers and the final output, calculated as their average, are shown in Fig. 14. Three types of output layers (o) and two model depths (d) are evaluated on the validation set of 10 random subjects (Fig. 1 path (1)). Significant differences were found (Table 16) for both variables (\(p=0.018\) and \(p\ll 0.05\) for depth and output layer, respectively). Further, the pairwise comparisons (Table 17) suggested that there is a marginal difference between the two depths (i.e., \(d=3\) and \(d=5\)), whereas for the output layer the performance measured as the ‘weighted average’ of all the layers is similar to that of the deepest layer (\(p=0.220\)).

Discussion

Hyper-parameter selection

For the hyper-parameter selection (Fig. 1 path (1), Table 4), we see that finding good parameters for the SqueezeNet is rather difficult, since the classification accuracy is low and the models for some of the subjects did not converge. In the VGGNet and DenseNet, we observe that the ‘max’ pooling operation is inferior to ‘average’ pooling, whereas ‘max’ pooling provides better results for the SqueezeNet. Also, increasing the depth of the model has a smaller effect (\(<1\%\)) on the VGGNet compared to the other models (e.g., a \(1\%\) gain in the DenseNet from \(d=3\) to \(d=5\)). This is probably because the VGGNet has many parameters even at shallower depths, which makes the optimization difficult. The optimal hyper-parameter values correspond to the highest achieved classification accuracy. Thus, for the VGGNet we select pooling \(pl=`average'\) and depth \(d=4\), for the DenseNet \(pl=`average'\) and \(d=5\), and finally for the SqueezeNet \(pl=`max'\) and \(d=3\).

In the case of the MSHilbNet (Table 5), we see that in general adding more layers along the depth dimension does not yield any performance improvement. On the contrary, increasing the number of scales increases the accuracy by up to 9% (e.g., for depth \(d=5\) with classifiers attached at all depths \(o=[5, 4, \ldots , 1]\), the accuracy improves from 0.6211 to 0.7127). In addition, averaging the outputs of the intermediate classifiers does not provide a higher classification accuracy (i.e., for a given depth, the accuracy decreases as more intermediate classifiers are added). The performance of the intermediate classifiers does improve as the network deepens, but it still remains lower than in the corresponding single-classifier case. Eventually, the best performance is achieved for depth \(d=3\), three scales \(s=[8\times 8, 4\times 4, 2\times 2]\) and a single classifier at the deepest layer \(o=[3]\).

Main results

The main results of this work are summarized in Tables 6, 7, 8, 9 and 10, which show the performance on the testing set (Fig. 1 path (3)). Across all networks, the performance improves as the window length N increases. This is expected, since a wider window contains more temporal information useful for the classification. In addition, the Hilbert representations generally improve (\(p< 0.05\)) the classification accuracy compared to the corresponding baseline, except in one case, namely the VGGNet with \(N=16\). The reason for the improvement is the locality preservation property of the Hilbert curve [20, 32], which, given a model architecture, allows learning correlations between distant points using fewer convolutional layers. More specifically, for window length \(N=64\) there is a 3% classification gain for ‘HilbTime’ (Hilbert curve mapping across time) in the VGGNet, 5% in the DenseNet and almost 8% for ‘HilbElect’ (Hilbert curve mapping across electrodes) in the SqueezeNet. Since for a CNN classifier the inference time is negligible compared to the acquisition time, the window length N accounts for most of the latency, an important aspect of real-time applications. Although the use of a 640-ms window is not considered appropriate in such cases [15, 43], shorter windows can be applied when a long window inhibits real-time performance. This reduces the accuracy gain obtained by the Hilbert curve mapping significantly, but the accuracy never drops below that of the baseline, as shown in Tables 6, 7, 8, 9 and 10.
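The locality-preserving mapping can be reproduced with the classic index-to-coordinate conversion for a Hilbert curve on an \(n\times n\) grid (n a power of two): a 64-sample window fills an 8×8 image so that consecutive samples always land in adjacent pixels. The sketch below implements the standard algorithm and is not claimed to be the authors' exact implementation:

```python
import numpy as np

def hilbert_d2xy(n, d):
    """Convert index d along a Hilbert curve to (x, y) on an n x n grid."""
    x = y = 0
    s, t = 1, d
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                      # rotate/reflect the quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def window_to_hilbert_image(window):
    """Map a 1-D sEMG window of length n*n onto an n x n image."""
    n = int(np.sqrt(len(window)))
    img = np.empty((n, n), dtype=float)
    for d, v in enumerate(window):
        x, y = hilbert_d2xy(n, d)
        img[y, x] = v
    return img

img = window_to_hilbert_image(np.arange(64.0))
# consecutive samples occupy 4-adjacent pixels (the locality preservation property)
```

Because neighbouring time samples map to neighbouring pixels, a small convolutional kernel already covers a contiguous stretch of the signal, which is the intuition behind the gains reported above.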

Furthermore, we can infer from the results that a model with many parameters benefits less from the Hilbert curve representation than one with few parameters (e.g., there is a \(<1\%\) gain in the WeiNet but about a \(3\%\) increase in accuracy in the VGGNet). Comparing the performance of the MSHilbNet against the VGGNet, DenseNet and SqueezeNet for segments of size \(N=64\) and the ‘HilbTime’ method, we see that the MSHilbNet is always superior (\(p < 0.05\)). This can be attributed to the fact that at every depth the model has access to both fine- and coarse-level features, while in the other topologies coarser feature maps can only be obtained after a sufficient depth.

Apart from the classification metrics, other aspects of the investigated models can be compared. For example, the radar chart in Fig. 7 shows the differences between the models with respect to classification accuracy and model complexity. What we observe is that the higher accuracy of the WeiNet can be partly attributed to its huge model size. The rest of the models, although of different sizes, perform at a similar accuracy level to one another, lower than the WeiNet but achieved with only a fraction of the WeiNet’s size. Overall, we believe that the proposed MSHilbNet achieves a good balance between high accuracy and small model size.

Evaluation of the MSHilbNet

Next, we further analyze the performance of the MSHilbNet. The main advantage of this model is the use of multiple scales of the input image, which allows it to capture both fine and coarse features at every depth. In Fig. 13, we show the feature maps with the highest activation generated from models with depth \(d=3\) and scales \(s=\{8\times 8\}\) (Fig. 13c), \(s=\{8\times 8, 4\times 4\}\) (Fig. 13d) and \(s=\{8\times 8, 4\times 4, 2\times 2\}\) (Fig. 13e), when using as input the middle segment of a ‘Thumb up’ gesture (Fig. 13a, b). Comparing the corresponding features between the three models, we see that when only one scale is used the model needs more layers to locate the region with useful information. In contrast, more scales of the input allow the model to locate significant features faster. For example, the feature map of the first convolutional layer has high-amplitude activations over a larger area when one scale is used (Fig. 13c ‘b1_regular_0/5’), whereas in the case of three scales (Fig. 13e ‘b1_regular_0/28’) high values are limited to a smaller region. In addition, the activation patterns correspond across subsequent layers as well as across coarser scales (e.g., the lower-right region is highly activated), while in the single-scale case subsequent activation maps highlight different regions. Finally, the convolutions of the single-scale network fail to extract other important features (e.g., in the lower-left region) that more scales can identify (e.g., Fig. 13e ‘b2_strided_0/5’).
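The multi-scale input can be sketched as repeated 2×2 average pooling of the Hilbert image, yielding the 8×8, 4×4 and 2×2 scales used here. This is a plausible construction of the pyramid, assumed for illustration rather than taken from the authors' code:

```python
import numpy as np

def avg_pool2x2(img):
    """Halve each spatial dimension by averaging non-overlapping 2x2 blocks."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def hilbert_pyramid(img, n_scales=3):
    """Return the input image at n_scales resolutions: e.g., 8x8, 4x4, 2x2."""
    scales = [img]
    for _ in range(n_scales - 1):
        scales.append(avg_pool2x2(scales[-1]))
    return scales

pyr = hilbert_pyramid(np.arange(64.0).reshape(8, 8))
# shapes: (8, 8), (4, 4), (2, 2); average pooling preserves the overall mean
```

Feeding every scale to every depth is what gives each layer simultaneous access to fine detail (8×8) and coarse context (2×2).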

The second component of the MSHilbNet is the use of multiple classifiers. Figure 14 shows the softmax output of the intermediate and final classifiers for two models with depth \(d=3\) (Fig. 14a) and \(d=5\) (Fig. 14b) for the first 12 gestures of the Ninapro dataset (basic finger movements). We can see that classifiers at the last layer are in general more accurate (i.e., high confidence for the correct label), while earlier classifiers tend to misclassify. To further evaluate whether a multi-classifier approach is helpful, we substituted the final average classifier with a weighted classifier with learnable weights. In particular, the weights are learned during training through an attention mechanism. The classification results on the validation set (Fig. 1 path (1)) are shown in Table 15. Clearly, there is a great improvement from using the ‘weighted average’, since the classification accuracy improves by 11% and 7% for \(d=3\) and \(d=5\), respectively, compared to the ‘average’ classifier. However, the performance is not significantly better than that of a single classifier (\(p = 0.220\)).

Future work

In the proposed methodology, the Hilbert curve provides a better locality-preserving representation in only a single dimension, i.e., either the time (HilbTime) or the electrode (HilbElect) dimension. It would be of interest to investigate the application of higher-dimensional curves that combine both dimensions for the generation of the sEMG image. In particular, this would be beneficial in the case of electrode grids, where the input of the corresponding baseline approach is already a multi-channel image. Another drawback of the current approach might be the requirement for square images with dimensions equal to powers of two, which limits the duration of the sequence to multiples of four. Though smaller sequences can be zero-padded or interpolated to match the required sequence length, experimentation is needed to evaluate the effect of these modifications. Future work will also investigate the generalization of the proposed Hilbert curve mapping to datasets with different sensor configurations and gesture sets.
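The two workarounds mentioned above, zero-padding and interpolation, can be sketched directly. The helper below is an assumption about how a shorter window would be adapted to the length the Hilbert mapping requires; it is not an evaluated configuration from the paper:

```python
import numpy as np

def fit_to_length(window, target_len, mode="interp"):
    """Adapt a 1-D window to the length required by the Hilbert mapping.

    mode="interp" resamples linearly; mode="pad" appends zeros.
    """
    window = np.asarray(window, dtype=float)
    if len(window) == target_len:
        return window
    if mode == "interp":
        old = np.linspace(0.0, 1.0, num=len(window))
        new = np.linspace(0.0, 1.0, num=target_len)
        return np.interp(new, old, window)
    out = np.zeros(target_len)                   # zero-pad at the end
    out[: len(window)] = window
    return out

x = np.arange(48.0)                              # too short for an 8x8 image
y = fit_to_length(x, 64)                         # resampled to 64 samples
z = fit_to_length(x, 64, mode="pad")             # padded to 64 samples
```

Interpolation keeps the full window occupied but distorts the time base, while padding preserves sample timing but introduces an artificial silent region; evaluating this trade-off is exactly the open experimentation noted above.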

Conclusions

This paper investigated the generation of image representations of sEMG using the Hilbert fractal curve. The proposed methodology offers an alternative for the classification of sEMG patterns using image processing methods, while using the Hilbert curve offers the advantage of locality preservation. Two methods (HilbTime and HilbElect) were evaluated and showed superior performance across various networks compared to the window segmentation method (baseline). However, the benefit was smaller for models with many parameters. Then, we presented a model (MSHilbNet) with few trainable parameters that utilizes multiple scales of the initial Hilbert curve representation. The evaluation of this multi-scale topology suggested that in every case it performed better than regular topologies based on VGGNet, DenseNet and SqueezeNet. Finally, an analysis provided insights into the performance of the MSHilbNet architecture.