A novel sEMG-based force estimation method using deep-learning algorithm

This paper addresses force estimation from surface electromyography (sEMG) signals collected with an armband-like acquisition device. The proposed scheme exploits two dimensions of the sEMG signals: spatial and temporal information. On the spatial side, the appropriate number of channels across all subjects is first investigated. During this process, an electrode channel selection method based on Spearman’s rank order correlation coefficient is used to identify signals from active muscles. Then, to reduce computation and emphasize the channel information, a linear regression (LR) algorithm is applied to weight each channel. On the temporal side, a recurrent neural network (RNN) is used to capture the temporal information and model the relation between sEMG and output force. Experiments conducted on four subjects demonstrate that six channels are enough to characterize the muscle activity. By combining the selected channels with different weight coefficients, the LR algorithm fits the output force better than simply averaging them. Furthermore, an RNN with long short-term memory cells shows its superiority in time series modeling and improves the results further. The experimental results support the feasibility of the proposed method.


Introduction
As the main mechanical interaction between humans and the environment, muscle force during movement is essential in daily life [1,2] and in applications such as intelligent prostheses [3] and games [4]. However, using force sensors is impractical and inconvenient in many applications. Surface electromyography (sEMG), owing to its non-invasive nature and rich information content, has become a promising alternative for predicting force [5,6].
It has been shown that muscle force is highly correlated with the degree of muscle activation [7][8][9][10]. Each muscle consists of a number of muscle fibers, which are activated by the central nervous system. During activation, the stronger the required muscular contraction, the larger the muscle fibers recruited by the motor neurons. As the sum of the electric potentials from motor neurons, sEMG reflects the magnitude of muscle force.
There have been many attempts to quantify force on the basis of sEMG. The literature on this issue can be divided into parametric and nonparametric estimation. Parametric estimation starts from the physiology of muscle force generation and introduces a biomechanical model known as the Hill model [11,12]. To estimate muscle force, the Hill model takes the force-velocity, length-tension and activation properties into consideration and incorporates these factors according to the force generation process [13,14]. However, the Hill model suffers from difficulty in parameter determination [15][16][17].
Nonparametric estimation approaches the problem from the perspective of computing science. For example, the way force changes with sEMG can be treated as a regression problem [18][19][20]. Such methods do not rely on the physiological theory relating sEMG and force, but simply learn the input-output relationship with a machine-learning algorithm. Li et al. employed a support vector machine (SVM) to estimate the lumbosacral joint compression force and demonstrated its efficiency when walking with a backpack load between 0 and 20% of body weight [21]. Cao et al. investigated handgrip force prediction with an extreme learning machine (ELM) and concluded that ELM achieves faster learning and better generalization than SVM and multiple nonlinear regression [22]. In addition, methods including the linear dynamic model (LDM) [23], three-domain fuzzy wavelet neural network [24], fast orthogonal search [25] and polynomial fitting [26] have been used for similar tasks.
Deep-learning algorithms, specifically the convolutional neural network (CNN) [27,28] and the recurrent neural network (RNN) [29,30], have recently shown better results than traditional methods in learning complex input-output relationships. Xu et al. [31] analyzed the feasibility of applying CNN and RNN to sEMG-based force estimation. Their results demonstrated that the combination of CNN and a long short-term memory (LSTM) network is an effective method. Apart from hand grasping, Chen et al. [32] explored degrees-of-freedom force prediction on the hand with a similar method. However, both studies acquired sEMG signals with high-density electrodes, which are expensive and unnecessary [23].
This paper proposes a linear regression (LR) and LSTM-integrated method (LR-LSTM) to estimate muscle force under isometric contraction. The design considers the two dimensions of the sEMG signals: the channel dimension and the temporal dimension. The main contributions of this paper can be summarized as follows:
1. Rather than feeding all channels into the regression model or concentrating the electrodes only on the activated muscle groups, this paper selects the channels covering the active muscle groups. Based on the Spearman's rank order correlation coefficient (SROCC) between the output force and the signal in each channel, uninformative sEMG channels are removed to avoid interference.
2. Rather than averaging the sEMG signals directly, this paper first uses linear regression (LR) to approximate the press strength with a weighted channel combination. This highlights the active muscle groups by assigning them large weights and provides a linear approximation in the channel dimension.
3. To take advantage of the temporal information in sEMG signals, an RNN with LSTM cells is used to estimate the force at one time point from a segment of sEMG samples, instead of treating force estimation as a time series problem.
Experiments on four subjects show the superiority of the proposed method.
The remainder of the paper is organized as follows: the next section describes sEMG signal collection, preprocessing and channel selection. The third section introduces a two-stage regression algorithm over the channel and temporal dimensions. The fourth section presents and analyzes the experimental results. The conclusions and future work are given in the last section.

Data acquisition
As shown in Fig. 1, a homemade armband-like sEMG acquisition device is used to obtain the sEMG signals. The device consists of 16 modules placed on 8 circuit boards, covering a distance of 10 cm in the vertical dimension. The module indexed zero serves as the power input, and the other 15 are used to detect sEMG signals. For signal detection, a reference electrode is placed on the wrist to reduce power line interference. The sEMG signals are amplified 960-fold and band-pass filtered at 20-500 Hz. An A/D converter then produces 8-bit values at a sampling frequency of 1000 Hz. Four subjects aged 25-30 years, two males and two females, participated in the data collection. During acquisition, the subjects sat in a chair with an oscilloscope in front of them and performed a press gesture for ten trials. In each trial, the applied force consists of two stages: ascent and retention. That is, the subjects performed the press gesture with the force increasing gradually to 60% of the maximum voluntary contraction (MVC) and then maintained the force until the 5-s trial ended. To achieve controlled contractions, visual guidance was shown on the oscilloscope. Before the experiments, all subjects practiced the force pattern repeatedly until they were accustomed to it.

Signal preprocessing
Due to the intrinsically high spatiotemporal variation in sEMG signals, preprocessing is needed to estimate the neural activity. Following previous work [33,34], the sEMG signals are first full-wave rectified. A low-pass Butterworth filter with a cutoff frequency of 1.5 Hz is then applied to extract the low-frequency information. To eliminate noise, the force signals are filtered with a third-order sliding mean filter. After that, the preprocessed sEMG and force signals are each normalized by the maximum value of the corresponding trial (Fig. 2).
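The rectification, sliding-mean and normalization steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the "three-order" sliding mean is read as a causal window of three samples, and the 1.5-Hz Butterworth low-pass stage is deliberately left out because it would normally come from a signal-processing library.

```python
def rectify(signal):
    """Full-wave rectification: absolute value of every sample."""
    return [abs(v) for v in signal]


def sliding_mean(signal, order=3):
    """Causal sliding mean over the last `order` samples.

    The first samples use a shorter window (assumption; the paper does
    not state how the edges are handled).
    """
    out = []
    for k in range(len(signal)):
        window = signal[max(0, k - order + 1):k + 1]
        out.append(sum(window) / len(window))
    return out


def normalize_by_max(signal):
    """Scale one trial so that its maximum value becomes 1."""
    peak = max(signal)
    if peak == 0:
        return list(signal)
    return [v / peak for v in signal]


# Toy data, not measured signals.
raw_semg = [0.2, -0.4, 0.1, -0.8, 0.6]
force = [0.0, 3.0, 6.0, 9.0, 12.0]

semg_env = normalize_by_max(rectify(raw_semg))
force_smooth = normalize_by_max(sliding_mean(force, order=3))
```

In a full pipeline the low-pass filter would sit between `rectify` and `normalize_by_max`, so the normalization acts on the smoothed envelope rather than on the raw rectified samples.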
In this study, ten-fold cross-validation is used to evaluate the proposed method. That is, each of the ten trials takes turns as the testing set while the others form the training set. Considering the physiological delay between sEMG signals and measured force [35], the data of each trial are segmented with a sliding window of 50 data points (about a 50-ms period). As the ground truth, the force value corresponding to the 50th sEMG value in the window is chosen as the label.
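The segmentation and the trial-wise cross-validation can be sketched as follows. The helper names are hypothetical, and a window stride of one sample is an assumption, since the text gives only the window length of 50 points.

```python
def sliding_windows(semg, force, width=50):
    """Pair every window of `width` sEMG samples with the force value
    aligned with the last (50th) sample of that window."""
    windows, labels = [], []
    for end in range(width, len(semg) + 1):
        windows.append(semg[end - width:end])
        labels.append(force[end - 1])
    return windows, labels


def leave_one_trial_out(n_trials=10):
    """Yield (train_trials, test_trial); each trial is the test set once."""
    for test in range(n_trials):
        train = [t for t in range(n_trials) if t != test]
        yield train, test


# Toy trial: 100 samples of a ramp, force is twice the sEMG value.
semg = list(range(100))
force = [2 * v for v in semg]

X, y = sliding_windows(semg, force, width=50)
folds = list(leave_one_trial_out(10))
```

With 100 samples and a window of 50, this produces 51 windows; the first label is the force at sample index 49 and the last is the force at index 99.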

Electrode channel selection
As introduced in "Data acquisition", an armband-like acquisition device is used to collect sEMG signals and estimate the force information of a press gesture. Since the limb consists of bones and muscle groups (see Fig. 1), the advantage of this device is that it can detect the activity of all the muscles around the forearm. In other words, it can accommodate a large number of gestures, which gives it strong practical value in both gesture recognition and force estimation.
However, when subjects perform a specific gesture, the activity is distributed over the muscle groups in a fixed pattern. The device cannot concentrate its electrode channels only on the active muscle parts as in Refs. [36,37]. Given the dependence between motions and muscle parts, it is necessary to choose the most relevant set of electrode channels [38,39]. Inspired by Ref. [26], the correlation coefficients between sEMG and force signals are employed for electrode channel selection.
Since the correlation coefficient can be affected by outliers and is expensive to compute, the signals are first divided into segments of N data points (N is set to 50), and each segment is summarized by its mean absolute value (MAV). Let the sEMG signals be denoted by $x_{i,k}$, an array of size $I \times K$, where I and K are the numbers of channels and data points, respectively. The MAV values can be obtained as follows:
$$\mathrm{MAV}_{i,j} = \frac{1}{N} \sum_{k=(j-1)N+1}^{jN} \left| x_{i,k} \right|$$
where $i \in \{1, 2, \ldots, I\}$, $k \in \{1, 2, \ldots, K\}$, and j denotes the jth MAV value in each channel. Furthermore, considering the nonlinear relationship between sEMG and force signals, the Spearman's rank order correlation coefficient (SROCC) [40,41] is calculated. Given the MAV arrays of the force and sEMG signals, SROCC first ranks the MAV values from small to large. Then, the SROCC can be expressed as follows:
$$\rho_i = 1 - \frac{6 \sum_{j=1}^{J} d_{i,j}^{2}}{J \left( J^{2} - 1 \right)}$$
where $d_{i,j}$ is the difference between the rank of the jth MAV value of channel i and the rank of the jth MAV value of the force, and J is the number of MAV values.
Because ten-fold cross-validation is used in the experiment, the SROCC calculation process for the testing trial n is summarized in Fig. 3. After this process, the channels are selected in descending order of SROCC value.
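The MAV-plus-SROCC ranking can be illustrated with a short pure-Python sketch. It is not the authors' implementation: the function names are hypothetical, the segments are assumed to be non-overlapping, and SROCC is computed as the Pearson correlation of the ranks, which coincides with the usual rank-difference formula when there are no ties. A channel with constant MAV values is given a score of 0 by convention here.

```python
def mav(signal, n=50):
    """Mean absolute value of consecutive, non-overlapping n-sample segments."""
    return [sum(abs(v) for v in signal[s:s + n]) / n
            for s in range(0, len(signal) - n + 1, n)]


def ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1
        for k in order[pos:end + 1]:
            result[k] = avg
        pos = end + 1
    return result


def srocc(a, b):
    """Spearman correlation as the Pearson correlation of the ranks."""
    ra, rb = ranks(a), ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    if va == 0 or vb == 0:
        return 0.0  # constant input: correlation undefined, scored as 0 here
    return cov / (va * vb) ** 0.5


def select_channels(semg_channels, force, n_select, n=50):
    """Return the indices of the n_select channels with the largest SROCC."""
    force_mav = mav(force, n)
    scores = [srocc(mav(ch, n), force_mav) for ch in semg_channels]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:n_select], scores


# Toy data: a force ramp and three synthetic "channels".
force = [k / 400 for k in range(400)]
ch0 = [v ** 2 for v in force]      # monotone in the force
ch1 = [0.5] * 400                  # carries no force information
ch2 = [1 - v for v in force]       # decreases as the force rises
selected, scores = select_channels([ch0, ch1, ch2], force, n_select=2)
```

In the paper's cross-validation setting, the scores would be computed on the training trials only, so the selection never sees the testing trial.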

The two-step regression method
From a mathematical point of view, sEMG signals can be treated as a two-dimensional matrix spanning the electrode channel and time dimensions. That is, they contain both temporal and spatial (channel) information. To use the spatial information sensibly and reduce computation, this paper first uses the LR algorithm to assign weights to the channels. The combined data are then segmented in the time dimension and fed to an RNN model, which outputs the corresponding force estimate (see Fig. 4).

Linear regression
To reduce computation and improve the estimation results, existing methods usually average the signals from the selected channels to approximate the output force as closely as possible [26,31]. The purpose is to represent the muscle activation so that the force can then be estimated with a nonlinear regression algorithm. In reality, however, not all electrode channels are equally important for the output force. For instance, the most active signals may deserve a larger weight than the others. Therefore, instead of giving equal importance to each channel, this paper uses a channel combination with different importance weights. As in the calculation of SROCC, the training sets for LR are combined because of the ten-fold cross-validation used in this paper. Let $x_{l,c}$ denote the lth sEMG sample from the cth selected channel ($l \in \{1, 2, \ldots, L\}$, $c \in \{c_1, c_2, \ldots, c_s\}$), where L is the length of the combined training data and s is the number of selected channels. Then the combination can be expressed as follows:
$$y_l = \sum_{c \in \{c_1, \ldots, c_s\}} w_c \, x_{l,c}$$
where $w_c$ is the weight of channel c. As the combined signal should resemble the force, this paper calculates the weights with the LR algorithm as the first step to approximate the output force. The loss function of this method can be written as follows:
$$J(\mathbf{w}) = \left\lVert \mathbf{X}\mathbf{w} - \mathbf{f} \right\rVert_2^2$$
where $\mathbf{X} \in \mathbb{R}^{L \times s}$ is the matrix of sEMG signals from the selected channels, $\mathbf{w} \in \mathbb{R}^{s \times 1}$ is the weight vector, and $\mathbf{f} \in \mathbb{R}^{L \times 1}$ is the vector consisting of measured force values.
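Then, the weight matrix should be
$$\mathbf{w} = \left( \mathbf{X}^{\mathsf{T}} \mathbf{X} \right)^{-1} \mathbf{X}^{\mathsf{T}} \mathbf{f}$$
where $\mathbf{X}$ is the matrix of sEMG training signals from the selected channels and $\mathbf{f}$ is the vector of measured force values. After the weight matrix is obtained, the electrode channels are combined according to it so as to reflect, as far as possible, the involvement of the different muscle parts in generating force. Because the processed sEMG signal from skeletal muscles precedes the mechanical tension by 50-100 ms [42], the combined signal is divided into 50-ms segments to make full use of the temporal information, and the RNN model is then used for estimation. In other words, for a predicted force point $\hat{f}_t$ at t ms, the regression model can be expressed as follows:
$$\hat{f}_t = \mathrm{RNN}\left( y_{t-49}, y_{t-48}, \ldots, y_t \right)$$
where $y_t$ is the combined sEMG signal at t ms. The RNN model is detailed in the next subsection.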
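The channel-weighting step of this subsection can be sketched with NumPy as an ordinary least-squares fit. This is a minimal illustration under assumptions: the function names are hypothetical, no intercept term is fitted, and the data are synthetic rather than recorded sEMG. `numpy.linalg.lstsq` is used instead of forming the normal equations explicitly because it is numerically more robust.

```python
import numpy as np


def fit_channel_weights(X, f):
    """Weights w minimizing ||X w - f||^2 for X of shape (L, s), f of shape (L,)."""
    w, *_ = np.linalg.lstsq(X, f, rcond=None)
    return w


def combine_channels(X, w):
    """Weighted channel combination: one value per time sample."""
    return X @ w


# Synthetic example: 200 samples from 6 selected channels.
rng = np.random.default_rng(0)
X = rng.random((200, 6))
true_w = np.array([0.5, 0.3, 0.1, 0.05, 0.03, 0.02])
f = X @ true_w                      # "force" generated from known weights

w = fit_channel_weights(X, f)
y = combine_channels(X, w)          # LR output, later fed to the RNN
avg = X.mean(axis=1)                # plain channel averaging for comparison

mse_weighted = float(np.mean((y - f) ** 2))
mse_average = float(np.mean((avg - f) ** 2))
```

On this synthetic data the fitted weights recover `true_w`, and the weighted combination tracks the target more closely than the plain average, which mirrors the motivation for weighting channels instead of averaging them.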

Long short-term memory (LSTM)
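Owing to its recurrent connections, the RNN has shown powerful performance in modeling temporal dynamics. As a typical variant, an RNN with LSTM cells is more capable of capturing temporal correlations by deciding which information should be preserved or forgotten. In Fig. 4, a two-layer RNN model with LSTM cells, each of which mainly contains an input gate, a forget gate and an output gate, is used to extract the temporal information within the sEMG signals. In this paper, the number of hidden units in both layers is 16. Denoting the tth value of the combined signal as $y_t$, the computations of the LSTM cell are as follows:
$$\begin{aligned}
\mathbf{i}_t &= \sigma\left( \mathbf{W}_i [\mathbf{h}_{t-1}, y_t] + \mathbf{b}_i \right) \\
\mathbf{f}_t &= \sigma\left( \mathbf{W}_f [\mathbf{h}_{t-1}, y_t] + \mathbf{b}_f \right) \\
\mathbf{o}_t &= \sigma\left( \mathbf{W}_o [\mathbf{h}_{t-1}, y_t] + \mathbf{b}_o \right) \\
\tilde{\mathbf{c}}_t &= \tanh\left( \mathbf{W}_c [\mathbf{h}_{t-1}, y_t] + \mathbf{b}_c \right) \\
\mathbf{c}_t &= \mathbf{f}_t \odot \mathbf{c}_{t-1} + \mathbf{i}_t \odot \tilde{\mathbf{c}}_t \\
\mathbf{h}_t &= \mathbf{o}_t \odot \tanh\left( \mathbf{c}_t \right)
\end{aligned}$$
where $\mathbf{i}_t$, $\mathbf{f}_t$ and $\mathbf{o}_t$ are the input gate, forget gate and output gate, respectively. $\mathbf{W}_i$, $\mathbf{b}_i$, $\mathbf{W}_f$, $\mathbf{b}_f$ and $\mathbf{W}_o$, $\mathbf{b}_o$ indicate the weight matrices and bias vectors of the corresponding gates, and $\mathbf{W}_c$, $\mathbf{b}_c$ those of the candidate cell state $\tilde{\mathbf{c}}_t$. $\mathbf{c}_t$ and $\mathbf{h}_t$ are the cell and hidden state outputs, $\sigma$ means a sigmoid nonlinear function, and $\odot$ denotes element-wise multiplication.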
After temporal information extraction with a number of LSTM blocks, a fully connected (FC) layer is usually used to estimate the output value in a regression problem:
$$\hat{f} = g\left( \mathbf{W}_F \mathbf{x}_F + \mathbf{b}_F \right)$$
where $\hat{f}$ is the predicted force, g means the activation function, and $\mathbf{x}_F$, $\mathbf{W}_F$, $\mathbf{b}_F$ are the input, weight and bias vectors, respectively. In this paper, one FC layer is used to estimate the final force value.
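The forward pass of an LSTM cell followed by an FC layer can be sketched in NumPy to make the data flow concrete. This is a simplified illustration, not the trained model of the paper: it uses a single LSTM layer instead of two, random untrained weights, an identity activation in the FC layer, and hypothetical names throughout. The four gate pre-activations are stacked in one weight matrix purely for compactness.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(y_t, h_prev, c_prev, W, b):
    """One LSTM step.

    W has shape (4 * hidden, hidden + input) and maps [h_prev, y_t] to the
    stacked pre-activations of input gate, forget gate, output gate and
    candidate cell state (stacking order is a choice of this sketch).
    """
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, y_t]) + b
    i = sigmoid(z[0:hidden])                 # input gate
    f = sigmoid(z[hidden:2 * hidden])        # forget gate
    o = sigmoid(z[2 * hidden:3 * hidden])    # output gate
    g = np.tanh(z[3 * hidden:4 * hidden])    # candidate cell state
    c = f * c_prev + i * g
    h = o * np.tanh(c)
    return h, c


def predict_force(window, W, b, W_fc, b_fc, hidden=16):
    """Run the LSTM over one window and map the last hidden state to a scalar."""
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for value in window:
        h, c = lstm_step(np.array([value]), h, c, W, b)
    return float(W_fc @ h + b_fc)            # FC layer with identity activation


rng = np.random.default_rng(0)
hidden = 16                                   # hidden units per layer, as in the text
W = rng.normal(scale=0.1, size=(4 * hidden, hidden + 1))
b = np.zeros(4 * hidden)
W_fc = rng.normal(scale=0.1, size=hidden)
b_fc = 0.0

window = np.linspace(0.0, 1.0, 50)            # one 50-sample combined-signal segment
estimate = predict_force(window, W, b, W_fc, b_fc, hidden)
```

Because the weights are random, `estimate` has no physical meaning; in practice the weights would be learned by minimizing the error between the estimate and the measured force.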

Model estimation standard
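Following common practice, the root mean square difference (RMSD) and the R-squared regression score (R2_score) between the estimated and measured force are used to evaluate the algorithm performance [43,44]. RMSD reveals the difference between the true value $f_i$ and the predicted value $\hat{f}_i$, and R2_score reflects how well future samples are likely to be predicted by the model:
$$\mathrm{RMSD} = \sqrt{\frac{1}{T} \sum_{i=1}^{T} \left( f_i - \hat{f}_i \right)^2}$$
$$\mathrm{R2\_score} = 1 - \frac{\sum_{i=1}^{T} \left( f_i - \hat{f}_i \right)^2}{\sum_{i=1}^{T} \left( f_i - \bar{f} \right)^2}$$
where T is the number of samples in the testing set and $\bar{f}$ is the average value of the measured force.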
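Both evaluation measures are easy to compute directly from their standard definitions. The sketch below uses hypothetical function names and toy values, not results from the paper.

```python
import math


def rmsd(measured, predicted):
    """Root mean square difference between measured and predicted force."""
    t = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / t)


def r2_score(measured, predicted):
    """Coefficient of determination: 1 - residual sum / total sum of squares."""
    mean = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


# Toy values: the last prediction misses the target by 0.5.
measured = [0.0, 0.25, 0.5, 0.75, 1.0]
predicted = [0.0, 0.25, 0.5, 0.75, 0.5]

error = rmsd(measured, predicted)      # sqrt(0.25 / 5)
score = r2_score(measured, predicted)  # 1 - 0.25 / 0.625
```

A lower RMSD and an R2_score closer to 1 both indicate a better fit; a perfect prediction gives an RMSD of 0 and an R2_score of 1.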

Experimental results and discussion
This paper proposes a two-step regression procedure to estimate the force information. To verify the effect of the proposed method, experiments are conducted as follows. First, the appropriate number of channels for the press gesture is explored. Then the channel combination and regression algorithm are discussed and compared with other methods from the literature.

Experiments of channel selection
Since SROCC measures the correlation between corresponding points of two types of data, the time delay between the sEMG and force signals can affect the results of SROCC-based channel selection. According to previous experience, the delay between the sEMG signals and the force is generally 50-100 ms. To verify the effectiveness of the proposed method, the sEMG signals were delayed by 10-90 ms in steps of 10 ms to explore the influence on the final selection results. Figure 5 shows the influence of different time delays on the SROCC values of subject 4. Although there are slight differences among the SROCC values, the selection results are roughly fixed. In particular, the first several channels with larger SROCC values (i.e., channels 11-15) are separated from the others by large gaps. In this paper, to ensure accuracy, the time delay is uniformly set to 50 ms when selecting channels. On this basis, the channel selection results of all subjects under ten-fold cross-validation are shown in Fig. 6. As seen in Fig. 6, most selected channels (about 84.52%) appear in all ten folds, which is due to the similar exertion pattern of the subjects in all trials. This also demonstrates the effectiveness of the proposed method. However, for individual channels, the selection results vary slightly among the trials due to factors such as muscle fatigue.

Experiments using different channel numbers
In this section, the LR-LSTM results with different numbers of selected channels are presented, from which the channel number is determined. On this basis, we analyze the relationship between the two steps and compare the LR results under different channel selection and combination methods. The experimental results support the motivation of the proposed method.
To find the appropriate number of sEMG channels, this paper first selects the corresponding sEMG channels according to the SROCCs from the training set. Figure 7 shows the average R2_scores of the different subjects with different numbers of sEMG channels (from 3 to 10).
Figure 7 shows that the best results are obtained with six channels as input. In other words, the six-channel combination performs stably on all subjects. Moreover, for all subjects, both metrics first improve and then level off or get worse as the number of channels grows. This indicates that a small number of EMG channels cannot fully characterize the muscle activity of the press gesture, while too many channels contribute nothing and may even introduce interference into the final results.
Besides, the LR and LR-LSTM algorithms show similar trends in general. This is because LR-LSTM takes the output of the LR algorithm as its input and estimates the force with the LSTM model.

The closer the result of the LR algorithm is to the force, the better the estimate of LR-LSTM. For all channel numbers shown in Fig. 7, LR-LSTM performs better than the LR algorithm. There are two reasons. On the one hand, compared with the LR algorithm, LR-LSTM extracts temporal information, making the estimation more stable and accurate. On the other hand, the nonlinear fitting ability of the proposed method is further strengthened.
Specifically, for the six-channel case, the R2_score and RMSD results of each subject are presented in Tables 1 and 2. Bold type in the tables marks the best result for each subject.
Due to the differences among subjects in physical state, such as forearm diameter and muscle development, the best channel number also differs from subject to subject.

Experiments using different channel selection methods
In the previous subsection, it was shown that the LR and LR-LSTM results are consistent. To justify the proposed electrode channel selection and combination, this paper takes the following methods as input and compares their fitting performance with the LR algorithm.
Method 1: Electrode channel combination with all 15 channels;
Method 2: The average value of all electrode channels;
Method 3: Electrode channel combination corresponding to the six largest SROCCs (the proposed method);
Method 4: The average value of the electrode channels corresponding to the six largest SROCCs;
Method 5: Electrode channel combination corresponding to the next six largest SROCCs.
Figure 8 shows the results of all channel selection and combination methods. Method 3 performs best for all subjects. Compared with averaging the channel values directly, a weighted combination lets the more relevant channels reveal the muscle-force interaction, which improves the results. Compared with Methods 2 and 4, Method 3 reduces the RMSD by about 6.04% and 0.98% and improves the R2_score by about 9.54% and 2.25%, respectively, on average across the subjects.
Besides, in terms of channel selection, the results support the rationality of selecting electrode channels by their SROCCs. Specifically, comparing Method 3 with Method 5, there is a 42.09% increase in R2_score and a 9.45% decrease in RMSD. This paper also includes all channels in the experiment (Method 1), and it is seen that the additional useless channels harm the final results.

Comparison with different methods
The comparison of methods has two aspects. The first is the necessity of combining the channels linearly. The other is a comparison with the other popular methods described in "Introduction".
To examine the proposed method, two separate LSTM models, one fed with the 6 selected channels and one with all 15 channels, are trained on the dataset. In Fig. 9, the number in brackets on the X-axis is the number of channels. The figure shows that even when the spatial information is modeled with a nonlinear function, the results stay at a level similar to that of LR-LSTM. The drawback of this approach is that the computational cost of the LSTM grows with the number of channels.
Besides the LSTM method, Tables 3 and 4 also provide a comparison with common methods. Note that the difference between the LR algorithm and LDM in Tables 3 and 4 is that LDM considers not only the channel information but also the temporal information. For fairness, all these methods use the selected channels. Since the polyfit method in Ref. [26] is based on averaging the selected channel values, it shows the worst performance among all methods. In addition, because of the linear fitting characteristic of the linear dynamic model, it performs worse than ELM, SVM and the proposed method, which indicates a nonlinear relation between force and sEMG signals. Figure 10 shows the typical estimation results of a testing trial with the different methods. Compared with the other methods, LR-LSTM fits the true label more closely. In addition, its predicted force is smoother, making it more applicable in real life. In summary, the experiments show that the LR-LSTM method makes more reasonable use of the information. Owing to its ability to extract temporal information and fit nonlinear relations, it achieves better results than the other methods.

Conclusions and future work
This paper proposes a method for estimating the force of a press gesture. An armband-like collection device is designed as the hardware. The sEMG signals it collects can accommodate different kinds of gestures in real life. For a given gesture, however, channel selection is necessary to reduce computation and improve estimation performance. Based on MAV and sliding windows, SROCC helps to reduce the impact of unrelated channels. At the same time, the combination of LR and LSTM is constructed to use both the channel and the temporal information efficiently, resulting in more accurate force estimation than the other methods.
Although some theoretical conclusions have been obtained, several problems remain for further discussion. Because the signals differ from subject to subject, the number of selected channels could be made adaptive in future work.