A low-cost AR application to control arm prosthesis

This paper presents an augmented reality application to assist people with limb amputations in controlling myoelectric prostheses. For that, we use the low-cost Myo armband coupled with low-level signal processing methods specifically built to control the filter levels and the processing chain. In particular, we use deep learning techniques to process the signals and to accurately identify seven different hand gestures. From that, we have built an augmented reality projection of a hand, based on AprilTag markers, that displays the gesture identified by the deep learning techniques. To properly train the gesture recognition system, we have built our own dataset with nine subjects. This dataset was combined with a publicly available one to work with data from 24 subjects in total. Finally, three different deep learning architectures have been comparatively studied, achieving high accuracy values (the best being 95.56%). This validates our hypothesis that it is possible to have an adaptive platform able to quickly learn personalized hand/arm gestures while projecting a virtual hand in real time. This can reduce the adaptation time to myoelectric prostheses and improve the acceptance levels.


Introduction
Upper limb amputation forces individuals to adapt and overcome unexpected issues in normal daily activities, and it carries significant functional and social consequences. Braza and Martin (2020) found that, in the USA alone, 1.7 million people live with limb loss, or approximately 1 of every 200 people, and 57.7 million worldwide (McDonald et al. 2020). The main etiology for upper limb loss in adults is trauma, followed by cancer.
Studies by Braza and Martin (2020) and Biddiss and Chau (2007) found that upper limb amputees present a high prosthetic rejection rate. The medical community finds that rehabilitation should be performed at a rehabilitation center with therapists, prosthetists, and physicians. Furthermore, the transition to normal life after an amputation usually takes several months to years, even with the help of long-term outpatient rehabilitation sessions and prosthetic programs. Notably, Yiğiter et al. (2002) found that pre-training for a prosthetic limb resulted in a significant improvement in accepting and using it. This suggests that, with specialized training, there could be a faster transition to prosthetics, reducing the post-prosthetics adaptation time.
The arrival of myoelectric prostheses, often called bionic prostheses, radically changed the way that people interact with the world, as the electrical impulses are translated to predetermined movements that have more amplitude and are more precise than the traditional ones. Furthermore, the cost reduction in recent years has allowed more people to use them. Nonetheless, this increase in popularity has revealed problems in adaptation and in the performance of daily activities. These prostheses require very different adaptation plans, as they function very differently from traditional prosthetics and, consequently, the adaptation methods used bear almost no resemblance to the traditional ones. Furthermore, the levels of dissatisfaction among their users are quite high (Franzke et al. 2019; Heerschop et al. 2021; Salminger et al. 2020), showing that the current training programs do not improve the adaptation effort in terms of either time or ease.
Alvaro Sanchez-Rocamora, Ester Martinez-Martin, and Angelo Costa have contributed equally to this work.
Another active area of research in terms of limb amputation is the use of virtual reality (VR) for the treatment of phantom limb pain (PLP), as shown in the comprehensive review by Dunn et al. (2017). Some recent papers show that there is a significant advancement in pain alleviation when using VR systems. Rutledge et al. (2019) showed that people who used VR managed to significantly reduce PLP by means of specialized exercises. Ambron et al. (2018), using low-cost hardware, were able to greatly reduce the perceived pain by presenting several scenarios (games and web browsing) where the users could navigate using "digital" legs, capturing real muscular impulses from the leg extremities. The study was done with only two persons, which greatly limits the scope of the results, so the conclusions should be taken cautiously. The study reported that, after using the system, the pain levels were almost zero with recursive reduction of their normal pain levels. Ambron et al. (2021) extended the previous article using more people and VR scenarios. Nonetheless, the method was the same. So, as only motion-based sensors were used, just simple movements can be reproduced, although muscular impulses would produce richer data and could be used for different actions and degrees of movement. The authors reported a 10% PLP reduction when using VR coupled with the leg movements. Tong et al. (2020) also conducted a study with five persons who suffer from PLP after arm amputation, finding that using a VR system reduced pain and improved anxiety and depression levels. The users were immersed in a VR environment, exposed to a basketball game, and interacted with the game using a hand controller on the intact hand. While the results presented may reveal that the users improved their PLP, the method used, mirroring the intact hand, depends on the cognitive acceptance of the mirrored movements. State-of-the-art approaches have mostly leaned toward solutions that use, to some degree, the remaining parts of the amputated limb. Unfortunately, none of these projects addresses the inclusion of prosthetic limbs, so it is unknown whether, apart from reducing PLP, the users improve their prosthesis usage.
Myoelectric prostheses could be the way to bridge full movement and fine motor skills, being the most similar to a real limb. Nonetheless, adapting to them is quite complex, and studies find that pre-training is the key to a rapid adaptation to prostheses. Phelan et al. (2021) implemented a system that used VR and a myoelectric sensor system to pre-train users for prostheses in a fully immersive environment. This study relied on 7 persons with upper limb amputation and 6 medical experts to validate the experiment. While this was a qualitative study, both users and experts reacted positively and were keen to continue performing exercises with the system. Additionally, Chau et al. (2017) performed a similar experiment where the results were also promising, the main achievement being that the PLP was reduced permanently without requiring further sessions.
Therefore, given these findings, we propose to bring the success of VR to the myoelectric prostheses domain. Thus, the objective of our work is to pre-train the prostheses' receivers in an augmented reality (AR) setting using low-cost sensor systems, in an effort to reduce the adaptation time and improve the acceptance levels of the receivers. To do this, we use an electromyographic armband in tandem with a model generated by deep learning to accurately detect and classify arm movements. We have achieved an accuracy of 95.65% detecting 7 gestures, demonstrating the success of our model and research.

Biosignal acquisition
Each body movement is the result of several parts (e.g., muscles, bones, and nerves, among other elements) working accurately and in synchrony. The energy of those elements changes during the movement, and this change can be measured to obtain information about the function of the different body parts (see Table 1).
In the context of hand prosthesis control, electromyography (EMG) signals are used, since these signals measure the electrical activity produced by the skeletal muscles when they contract or relax. In this regard, two kinds of controlling methods can be considered: invasive interfaces and noninvasive interfaces. Keeping in mind an easy-to-use, low-cost application, a noninvasive interface has been considered in this paper. In particular, the Myo armband was used. This gesture control device was released by Thalmic Labs in 2013 and is now discontinued. As illustrated in Fig. 1, this armband consists of eight superficial electromyographic sensors and an inertial measurement unit (IMU) that includes a gyroscope, an accelerometer, and a magnetometer. Placed on the forearm, its myoelectric sensors provide information about the electrical activity in that area, which covers the movement of the arm and the fingers.
Gestural data are provided as the time representation of rapid voltage oscillations for the eight electromyographic sensors, with a normalized amplitude range going from −128 to 127 mV (see Fig. 2). Note that, given that different active muscles contribute to the same signal, the sampling rate is 200 Hz. Although this frequency does not guarantee the tracking of all the relevant events at a muscular level, it is sensitive enough to properly perceive hand gestures.
It is worth noting that noise filtering is required to accurately recognize hand gestures. With that aim, two pass filters were applied during data capture. Firstly, a low-pass filter with a threshold of 5 was applied to attenuate the signal while suppressing sensor noise. Then, the lower signal values are smoothed, and even amplified, by a high-pass filter with a threshold of 3 (see Fig. 3).

Hand gesture recognition
With the aim of obtaining accurate hand gesture recognition from EMG data, we have focused on deep learning techniques, since they have proven to be very successful at this kind of data processing (e.g., Rim et al. 2020; Xiong et al. 2021; Buongiorno et al. 2019; Côté-Allard et al. 2019). From this starting point, the first issue that arises is the data required to properly train the considered deep neural network. In this regard, the main characteristics to be taken into account are the number of samples per class (i.e., hand gesture), the number of subjects involved in the data capture, the purpose of the data (since it should be as similar as possible to the real data to work with), and the quantity and type of sensors used for the capture. So, despite the existence of several public datasets in the literature, such as Myo-Gym (Koskimäki et al. 2017), NinaPro (DB5) (Pizzolato et al. 2017), or CapgMyo (Du et al. 2017), they do not fulfill the requirements of this work. As a consequence, a new dataset was built.

Our EMG dataset
Based on the Myo limitations, seven hand gestures were chosen for this work (see Fig.). Nine healthy subjects of different gender, age, and physical condition were recorded with one Myo armband while they repeatedly performed the above-mentioned hand gestures. Specifically, 6 men and 3 women in an age range between 18 and 24 years old were first informed about the experiment and its possible risks. Then, each participant took a few seconds to learn how to properly perform each hand gesture, while getting used to the recording process. After that, each hand gesture was performed for 10 s without fully flexing the elbow and without supporting it, followed by a few seconds of rest before starting the next gesture. With a 200-Hz sampling frequency, this protocol resulted in a dataset of about 126,000 measurements, summarized in Table 2.
The measurements were grouped into sequences of 200 measurements, the minimum number of measurements to correctly represent each gesture. This resulted in 626 sequences. However, this number of sequences could be insufficient to robustly train a deep neural architecture.
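The dataset size quoted above can be checked with a short calculation. The sketch below assumes one 10-second recording per gesture and per subject, which is our reading of the protocol rather than a stated fact; the function names are illustrative.

```python
# Assumption: each of the 9 subjects records each of the 7 gestures once for 10 s.
SUBJECTS = 9
GESTURES = 7
SECONDS_PER_GESTURE = 10
SAMPLING_RATE_HZ = 200
SEQUENCE_LENGTH = 200  # measurements per sequence, as stated in the text


def expected_measurements(subjects, gestures, seconds, rate_hz):
    """Nominal number of measurements produced by the recording protocol."""
    return subjects * gestures * seconds * rate_hz


def max_full_sequences(measurements, sequence_length):
    """Upper bound on the number of non-overlapping full sequences."""
    return measurements // sequence_length


total = expected_measurements(SUBJECTS, GESTURES, SECONDS_PER_GESTURE, SAMPLING_RATE_HZ)
print(total)                                       # 126000
print(max_full_sequences(total, SEQUENCE_LENGTH))  # 630
```

The nominal bound of 630 sequences is slightly above the 626 reported, which is consistent with the dataset containing "about" 126,000 measurements rather than exactly that number.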
With the purpose of increasing the amount of data, two actions were performed. On the one hand, these data were combined with a dataset of EMG signals presented by Nasri et al. (2020) and publicly available at http://www.rovit.ua.es/dataset/emgs/. This dataset contains 18,500 samples from 15 healthy subjects (9 men and 6 women) with different physical conditions and in an age range between 20 and 35 years old, distributed into the 7 gestures considered in this paper, as summarized in Table 3.
It is worth noting that the new age range introduces some important differences in the captured signals, as illustrated in Fig. 5. This difference can considerably influence the learning process. However, the double filter applied for noise suppression acts as a normalization process, reducing this difference to a great extent and, consequently, improving the learning accuracy (see Fig. 5).
On the other hand, the sliding window technique was used. Basically, this technique consists of defining overlapping sequences while keeping the contiguity of the original sequence. That is, a window of measurements is defined. This window contains as many measurements as required by the system (200 in our case) and moves through the data, providing the measurements it encloses to the system. Note that the number of measurements by which the window moves determines the number of overlapping measurements between two sequences. Thus, for 150-measurement sequences, moving the window by 50 measurements means that two consecutive sequences share 100 measurements. For our experiments, an overlap of 20 measurements between 200-measurement sequences was proposed (see Fig. 6). In this way, 29,591 sequences were obtained.
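As a minimal sketch of the sliding window technique described above, the following code cuts a recording into fixed-size windows that advance by a configurable step. The function names and the `step` parameter are illustrative and not taken from the original implementation; the paper's "overlap" setting would have to be mapped onto `step` according to the convention actually used by the authors.

```python
def sliding_windows(signal, window_size, step):
    """Return consecutive windows of `window_size` items, advancing by `step`."""
    if window_size <= 0 or step <= 0:
        raise ValueError("window_size and step must be positive")
    last_start = len(signal) - window_size
    # range() is empty when the signal is shorter than one window.
    return [signal[s:s + window_size] for s in range(0, last_start + 1, step)]


def shared_measurements(window_size, step):
    """Number of measurements that two consecutive windows have in common."""
    return max(window_size - step, 0)


# Hypothetical 10-second recording at 200 Hz (2000 measurements).
recording = list(range(2000))

no_overlap = sliding_windows(recording, window_size=200, step=200)
overlapped = sliding_windows(recording, window_size=200, step=20)

print(len(no_overlap))               # 10
print(len(overlapped))               # 91
print(shared_measurements(150, 50))  # 100, the example given in the text
```

With the same recording, reducing the step from 200 to 20 measurements multiplies the number of training sequences roughly ninefold, which illustrates why the technique acts as data augmentation.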

Deep learning architectures
EMG signals can be considered as time series data. For that reason, recurrent neural networks (RNNs) were considered for their classification, since they are able to learn the temporal relationship between the input data. In this regard, the proposed architectures are focused on gated recurrent units (GRU) (Cho et al. 2014) and long short-term memory units (LSTM) (Hochreiter and Schmidhuber 1997), since they can deal with the vanishing gradient problem by using mechanisms called gates. These gates are different tensor operations that can learn what information is stored to properly identify long-term dependencies. Three different architectures have been proposed. As illustrated in Fig. 7, the first proposed RNN architecture (HG-RNN1) is composed of three LSTM layers with 50 units each and a fully connected layer for hand gesture classification. In contrast, the HG-RNN2 architecture is formed by three GRU layers with 50 units each, followed by a fully connected layer. Finally, the HG-RNN3 architecture combines one LSTM layer with two GRU layers and a fully connected layer. In addition, a dropout of 0.2 and a recurrent dropout of 0.5 are applied after each recurrent layer. In terms of network training, the Adam optimizer with a constant learning rate of 0.001 and the categorical cross-entropy loss function were used. The batch size was set to 200, and the number of epochs was 100. Note that a dropout layer is used to randomly deactivate neurons during training. This is used as a regularization method to reduce overfitting and improve the generalization error in deep neural networks. In addition, it is worth noting that all the parameters were experimentally set. In particular, the number of epochs was set based on the training stabilization and the network overfitting. Finally, although three architectures are presented here, some other architectures were implemented by changing the number of recurrent layers and their combinations. However, their accuracy was so low that they have not been included in this paper.
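To give a feel for the relative size of the three architectures, the sketch below counts trainable parameters using the textbook LSTM and GRU formulations. Several points are assumptions on our part: the input has 8 channels per time step (one per Myo sensor), the output layer has 7 units (one per gesture), the LSTM layer comes first in HG-RNN3, and the GRU uses a single bias vector per gate (some library implementations add a second one, which changes the totals).

```python
def lstm_params(input_dim, units):
    """Textbook LSTM: 4 gates, each with input weights, recurrent weights and a bias."""
    return 4 * (units * (input_dim + units) + units)


def gru_params(input_dim, units):
    """Textbook GRU: 3 gates, each with input weights, recurrent weights and a bias."""
    return 3 * (units * (input_dim + units) + units)


def dense_params(input_dim, outputs):
    """Fully connected layer: weight matrix plus one bias per output."""
    return input_dim * outputs + outputs


def stack_params(layer_kinds, n_channels=8, units=50, n_classes=7):
    """Total parameters of stacked recurrent layers followed by a dense classifier."""
    counters = {"lstm": lstm_params, "gru": gru_params}
    total, input_dim = 0, n_channels
    for kind in layer_kinds:
        total += counters[kind](input_dim, units)
        input_dim = units  # the next layer receives the previous layer's output
    return total + dense_params(units, n_classes)


hg_rnn1 = stack_params(["lstm", "lstm", "lstm"])
hg_rnn2 = stack_params(["gru", "gru", "gru"])
hg_rnn3 = stack_params(["lstm", "gru", "gru"])  # layer order is an assumption
print(hg_rnn1, hg_rnn2, hg_rnn3)  # 52557 39507 42457
```

Under these assumptions, the mixed LSTM/GRU stack sits between the pure LSTM and pure GRU stacks in size. Dropout does not add trainable parameters, so it is not part of the count.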
These architectures were trained using 80% of the data (25% of that for validation) and tested with the remaining 20%. The experiment involved raw data (data coming directly from the Myo device without any processing and grouped without using the windowing technique), raw data with sliding window (data coming directly from the Myo device without any processing and grouped by using the windowing technique), and filtered data with sliding window (double-filtered data grouped by using the windowing technique). The obtained results are summarized in Table 4. As shown, the sliding window technique increased the accuracy of all the architectures for both the raw and the filtered data. In contrast, the filtering for noise suppression improved the accuracy obtained with raw data, but not when the sliding window technique was used. Hence, the best accuracy (95.65%) corresponds to the HG-RNN3 architecture, which combines LSTM and GRU units, over raw data when the sliding window technique is applied.
Fig. 6 Generation of the EMG sequences by using the sliding window technique and without using it
Looking at its corresponding confusion matrix for the test evaluation (shown in Fig. 8), it can be observed that most of the data lies on the main diagonal. However, some hand gestures are better recognized than others. For instance, a small percentage of the samples belonging to the "wrist flexion" class is misclassified as "tap" or "victory". Something similar happens with the "open-hand" class, where some samples are misclassified as "tap" or "victory".
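The 80/20 split with 25% of the training portion held out for validation amounts to a 60/20/20 partition. The following sketch shows one way to compute it on sequence indices; random shuffling and the seed are assumptions, since the text does not state whether the split was random or grouped by subject.

```python
import random


def split_indices(n_items, test_fraction=0.2, val_fraction_of_train=0.25, seed=0):
    """Split item indices into train, validation and test lists."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # assumption: random, not per-subject, split
    n_test = int(n_items * test_fraction)
    n_val = int((n_items - n_test) * val_fraction_of_train)
    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test


# 29,591 is the number of sequences obtained with the sliding window.
train_idx, val_idx, test_idx = split_indices(29591)
print(len(train_idx), len(val_idx), len(test_idx))  # 17755 5918 5918
```

Note that with overlapping windows, a purely random split places highly similar sequences in both the training and test sets, so a split by recording or by subject would be a stricter evaluation.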
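The reading of a confusion matrix described above can be made concrete with a small calculation of overall accuracy and per-class recall. The 3-class matrix below is a toy example with invented counts, not the data of Fig. 8.

```python
def overall_accuracy(cm):
    """Fraction of samples on the main diagonal (rows: true class, columns: predicted)."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def per_class_recall(cm):
    """For each true class, the fraction of its samples that were correctly predicted."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(cm)]


# Hypothetical counts for three gestures, 50 test samples each.
toy_cm = [
    [48, 1, 1],
    [2, 45, 3],
    [0, 2, 48],
]

print(round(overall_accuracy(toy_cm), 4))              # 0.94
print([round(r, 2) for r in per_class_recall(toy_cm)])  # [0.96, 0.9, 0.96]
```

Per-class recall highlights which gestures are harder to recognize even when the overall accuracy is high, which is the kind of difference observed between classes in the text.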

Augmented reality
With the purpose of visualizing the performed hand gesture, augmented reality (AR) was used, since it allows digital content to be overlaid on the real world. Consequently, a virtual myoelectric prosthesis is brought into the physical space such that the user can see the hand gesture performed by their muscles in the live view captured by a camera. In this way, users can learn to use the prosthesis more efficiently and quickly. Although AR is usually implemented from a combination of visual, auditory, and tactile/haptic interactions, this work is focused on the vision side due to the nature of our goal. In this sense, based on the relationship between the digital and the real worlds, AR can be classified into marker-based, markerless, and location-based. In marker-based AR, the digital components are displayed and moved according to the real world, while in markerless AR the 3D augmented models are user-controlled. Finally, when the augmented models are used to provide local information based on the user's location, such as walking directions or road signs, location-based AR is required.
Keeping in mind the assistance of users in prosthesis control, the digital prosthesis must be anchored to the user's arm and, consequently, marker-based AR is used. In that way, the prosthesis animation is displayed right onto the user's arm.
To properly locate the user's arm, a marker (i.e., a distinctive picture) is placed on the Myo bracelet such that the animation starts as soon as that marker is recognized, and it is moved and oriented accordingly. In this regard, we have studied two of the most commonly used markers: ArUco and AprilTag.
In particular, ArUco (Romero-Ramirez et al. 2018; Garrido-Jurado et al. 2016) is an OpenCV-based library aimed at accurately detecting and recognizing square planar markers for AR applications. Note that the camera pose is estimated from the pose of the markers. As a consequence, the marker dictionary plays a main role, since its markers should be as different as possible to avoid confusion. In this work, the ArUco DICT_6X6_250 dictionary has been used (see Fig. 9).
Like ArUco markers, AprilTags (Wang and Olson 2016; Olson 2011) consist of a black square with a white foreground with a particular pattern. Although they are similar to QR codes in using 2D barcodes, AprilTags have been designed to encode far smaller data payloads (between 4 and 12 bits), allowing them to be detected more easily, identified more robustly, and recognized at longer ranges. Despite the existence of several AprilTag families, in this work the Tag36h11 family, illustrated in Fig. 10, has been used.
It is worth noting that a marker requires enough visual points to be uniquely recognized, and the pattern detail visually changes based on the image resolution. Thus, an analysis of the recognition of both markers at different distances from the camera and with different orientations was carried out. For that, four markers of each type with a size of 7.5 × 7.5 cm were placed on a rigid surface. This surface was placed at three different distances from the camera: 0.65, 1.65, and 2.5 meters. At each distance and with different orientations, 100 images were taken and analyzed, counting a success when at least one of the four markers was correctly recognized. Based on that, Table 5 summarizes the success rate for both ArUco and AprilTag markers. As can be observed, AprilTag provides a higher accuracy and, consequently, it was used in this work.
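The success criterion used in this analysis (an image counts as a success when at least one of the four placed markers is correctly recognized) can be sketched as follows. The detector itself is left out; the lists of detected IDs are hypothetical inputs, and "correctly recognized" is interpreted here as the detected ID matching one of the placed markers.

```python
def image_success(detected_ids, expected_ids):
    """True when at least one detected marker ID matches a marker placed in the scene."""
    return any(marker_id in expected_ids for marker_id in detected_ids)


def success_rate(detections_per_image, expected_ids):
    """Fraction of images in which at least one expected marker was recognized."""
    if not detections_per_image:
        return 0.0
    hits = sum(image_success(ids, expected_ids) for ids in detections_per_image)
    return hits / len(detections_per_image)


# Hypothetical detector output for four images; IDs 0 to 3 are on the rigid surface.
placed = {0, 1, 2, 3}
detections = [[0, 1], [], [7], [3]]
print(success_rate(detections, placed))  # 0.5
```

In this toy run, the empty detection and the spurious ID 7 both count as failures, so two of four images succeed.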
To blend the virtual content into reality, the next step was to model the prosthesis. As no 3D prosthesis model was available, it was built from scratch. For that, Blender 3D (https://www.blender.org/) was used. This free and open 3D tool provides a wide range of functionalities, from creating animated films or visual effects to 3D print models, interactive 3D apps, and computer games. As a starting point, a 3D hand model was downloaded from Free3D (Dreamer https://free3d.com/). After modeling the hand prosthesis, it was animated to obtain the considered hand gestures, as shown in Fig. 11.
The last step is the projection of the 3D virtual prosthesis onto the AR environment. The open-source Panda3D framework (https://www.panda3d.org/) was chosen for this task, since it provides essential tools for 3D rendering among other developments.
Thus, a virtual environment where the 3D prosthesis model was projected depending on the Myo marker's position and orientation was developed (see Fig. 12). This position adjustment was obtained from the following equations: where (X_r, Y_r, Z_r) represents the prosthesis coordinates in the virtual environment; (X_Ctag, Y_Ctag) indicates the center coordinates of the AprilTag marker in image coordinates; and (640, 480) corresponds to the dimensions of the input image.
In addition, the scale of the virtual prosthesis model is directly related to the distance between the upper corners of the AprilTag marker. As illustrated in Fig. 13, the greater the distance between the upper corners, the larger the prosthesis scale. Note that we have considered a maximum distance between corners of 60 pixels and a maximum scale factor of 0.2.
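One simple way to realize this relation between corner distance and model scale is a clamped linear mapping, sketched below. The linear form is an assumption made for illustration, since the text only states that the scale grows with the distance and gives the two maxima; the function names are hypothetical.

```python
import math

MAX_CORNER_DISTANCE_PX = 60.0  # maximum corner distance stated in the text
MAX_SCALE = 0.2                # maximum scale factor stated in the text


def corner_distance(top_left, top_right):
    """Euclidean distance in pixels between the two upper corners of the marker."""
    return math.hypot(top_right[0] - top_left[0], top_right[1] - top_left[1])


def prosthesis_scale(top_left, top_right):
    """Assumed linear mapping from corner distance to model scale, clamped at the maximum."""
    distance = min(corner_distance(top_left, top_right), MAX_CORNER_DISTANCE_PX)
    return MAX_SCALE * distance / MAX_CORNER_DISTANCE_PX


# Hypothetical corner coordinates in a 640 x 480 image.
print(prosthesis_scale((100, 100), (130, 100)))  # about 0.1 (marker far from the camera)
print(prosthesis_scale((100, 100), (220, 100)))  # 0.2 (clamped, marker close to the camera)
```

Using the upper corners rather than the marker area keeps the computation to a single distance, and the clamp prevents the model from growing without bound when the marker is very close to the camera.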

AR application
With the integration of the previous components, the AR application was developed. As shown in Fig. 14, the application workflow can be summarized as follows: the system starts capturing the user's EMG signals until 200 measurements are taken. That sequence of EMG readings is the input to the HG-RNN3 neural network, which outputs the corresponding hand gesture. This, together with the information about the Myo marker's pose, is sent to the application, which translates it into the appropriate animation and projection of the 3D hand prosthesis model. It is worth noting that, given that parallel programming is used, the delay between the user's gesture and the 3D prosthesis model animation is considerably reduced, and real-time execution is obtained (30 frames per second). Some samples of the final AR application can be observed in Fig. 15 and in the video demo available at https://youtu.be/c8E33AWdRWM. Note that two different versions were developed. On the one hand, the 3D prosthesis model is directly projected on the user's arm. On the other hand, the application shows two different windows: one displaying the RGB camera capture, and

Discussion
Our study starts with a comparative analysis of several recurrent neural architectures with 24 users under different conditions. So, different combinations of recurrent layers together with other learning parameters have been proposed and experimentally analyzed in order to properly recognize hand gestures from EMG signals. In addition, with the aim of being useful for any user, high variability has been included in the data. In particular, the 24 participants were in an age range between 18 and 35, of different gender, and with different physical conditions. Additionally, the data were provided to the architectures in three different versions: raw data (200-measurement consecutive sequences taken directly from the sensor), raw data with sliding window (200-measurement overlapping sequences taken directly from the sensor), and processed data with sliding window (200-measurement overlapping sequences from double-processed sensor data). Note that, although the double processing acts as a data normalization method, the best accuracy results were obtained without using it. So, the experimental results revealed that an accuracy of 95.56% in hand gesture recognition has been obtained. This high accuracy holds promise as an effective system for hand gesture recognition from EMG signals for an AR application to pre-train prostheses' receivers.
After that, the interface of the AR application was designed. A virtual model of the prosthesis is used to illustrate the prosthesis to be received. To properly project it on the user's arm, different markers were studied, in particular, ArUco and AprilTag markers. This study was based on the accuracy of the marker detection at different locations and distances from the camera, obtaining a detection accuracy of 95% at a distance of 0.65 meters. This low failure rate is imperceptible to the user, since the system is continuously taking images and recognizing the marker to properly update the model projection.
Although there is a growing interest in this research area, there is no previous work that has quantitatively measured the hand gesture recognition and the prosthesis projection as in this work. So, a comparison with previous research is not possible. However, the literature has demonstrated a great success of VR in the myoelectric prostheses domain. Based on those findings, the proposed AR application is
The main limitation of this study is the lack of a user test case on amputee subjects, since all the participants were healthy, non-amputee individuals. In future studies, we plan to conduct new experiments by including amputee subjects and, although high variability has been included in the data used to train the neural networks for hand gesture recognition, some data from amputee subjects will also be included. We also plan to qualitatively evaluate the user experience with the AR application and its design, with the aim of improving it and making it a real-world application that can be used to improve the acceptance levels of the prostheses' receivers.

Conclusions
Amputation is one of the major causes of disability, since it can limit the daily life activity of a person. Focusing on upper limb loss, a myoelectric prosthesis can restore the functionality of its user's hand in a noninvasive way. However, prosthetic acceptance is still low due to the lack of dexterous control. Recent studies have demonstrated that well-structured and tailored prosthetic training can improve its acceptance.
This paper presents a low-cost, easy-to-use AR application aimed at helping users acquire the necessary dexterity to control the prosthesis and avoid its rejection. Firstly, the EMG signals from the residual muscles are captured by using a Myo armband. Then, those signals are translated into the corresponding hand gesture. In this work, seven hand gestures were considered: rest, closed hand, open hand, victory sign, wrist flexion, wrist extension, and tap action. In regard to hand gesture recognition, a dataset was first captured such that a comparative study of different neural networks could be carried out. In addition, techniques for input improvement as well as data augmentation were also considered. Nine healthy individuals were recorded, obtaining a dataset of about 126,000 measurements. These data, combined with a public dataset containing the EMG signals from 15 subjects and the improvements considered in this paper, fed the proposed neural architectures, obtaining an accuracy of 95.56% as the best result. (The input data were the mixed dataset when the sliding window method was used.) With respect to the AR implementation, two kinds of markers were studied: ArUco and AprilTags. An analysis of their visual recognition at different distances and with different orientations revealed that AprilTags resulted in a higher rate of success. Thus, some AprilTags were placed on the Myo armband. Then, a 3D prosthesis model was created and animated such that it was projected on the camera's image (or on a virtual environment) based on the AprilTag pose and the recognized hand gesture. Note that the whole process works at 30 frames per second.
Despite the promising results, amputee subjects must be included in the study to properly validate it. For that reason, a user test case will be conducted as future work. Moreover, other EMG sensors will be studied with the aim of increasing the number of hand gestures to be considered. In addition, the AR application functionality will be improved by including tailored rehabilitation programs and game-based tasks.

Fig. 3 Signal noise suppression by applying two consecutive pass filters

Fig. 5 Comparison between the EMG signals corresponding to an 18-year-old subject and a 34-year-old subject when they perform the same hand gesture

Fig. 7 Our RNN architectures for hand gesture recognition

Fig. 8 Confusion matrix for HG-RNN3 architecture when raw data with sliding window was used

Fig. 11 Animation of the 3D prosthesis model representing the considered hand gestures

Fig. 13 3D prosthesis model projection based on the marker's size

Fig. 15 Samples of the final AR application execution

Table 4 Test accuracy for the proposed RNN architectures with different data

Table 5 Recognition rate of AR markers at different distances and with different orientations