Background & Summary

Being able to estimate the Human Muscular Manipulability (HMM) index is very useful for a range of different tasks. HMM provides a measure that describes the relationship of an articulated body with respect to velocity, acceleration or force. As stated in1,2,3,4,5, the HMM index can be used to evaluate the effects of instantaneous variations between joints and limbs, and it is usually represented by a spheroid around the endpoint of the joint mechanism. The dimensions of this spheroid represent the maximal feasible velocity, acceleration or force capacity along the spheroid axes, giving a quantitative measure of the ability to perform the manipulation task. When this analysis is applied to a human limb, HMM is correlated with the comfort of a specific pose for generating movements at the end of the arm. For instance, HMM can be used to track a physical rehabilitation procedure after an injury, under the assumption that the movements of the user are limited. If a person has an injured elbow that limits their movement and they need to grasp an object, they will try to compensate for the reduced mobility by straining the shoulder and the wrist excessively. HMM in the natural set of poses to grasp an object would be lower than in the aforementioned case, where the shoulder and wrist are heavily stressed. In wearable robotics, control algorithms can thus be tuned to decrease unnecessary physical effort; for that reason, HMM can be used to improve the control input through its assessment in exoskeletons for the assistance of disabled people3 or for power augmentation6, and in rehabilitation therapies with collaborative robot arms7. Thus, the ability to estimate HMM from a human pose has several applications in rehabilitation and healthcare, telemedicine, physiotherapy and labour risk prevention.

Moreover, human motion is ultimately commanded by the electrical activation of muscles. This activity can be measured with non-invasive Electromyography (EMG). Muscular information is directly related to postural outcomes during the performance of upper-limb movements, so it is expected that factors such as the aforementioned HMM could be predicted or inferred from EMG recordings. In this context, previous approaches to neuromechanical modelling have explored the possibility of extracting motion kinematics and dynamics from EMG signals in a variety of tasks8,9.

The recent bloom of artificial intelligence methods, and specifically of machine learning and deep learning algorithms, has led to more accurate and efficient processing of EMG data. For instance, some popular deep learning-based approaches consider the temporal factor using Long Short-Term Memory (LSTM) units10. It is also popular to cast the problem in terms of images that can be efficiently processed by a convolutional pipeline11,12. Some other approaches leverage a fusion of features under a deep learning framework13,14. Nonetheless, traditional machine learning algorithms applied to control15 and neuromuscular modelling16 also reportedly provide good results.

Thus, the most common and most accurate approaches to perform predictions are based on machine learning and deep learning methodologies, which are notoriously data-hungry. In order to train these systems, datasets should therefore provide large amounts of annotated data. In this work, we propose KIMHu, a Kinematic, Imaging and electroMyography dataset for HMM prediction. Samples are composed of images, depth maps, a list of keypoints that define the 3D position of the joints of the human body, and EMG signals. The labels include HMM indexes such as the Kinematic Manipulability Index (KMI), the Dynamic Manipulability Index (DMI), and a custom metric named Local Conditioning Index (LCI). The dataset features a number of physical upper-limb exercises, different repetitions of the same exercise, and data recorded from 20 healthy participants. Benchmarking tools based on this dataset are introduced as well: different train and test methodologies are suggested alongside the accuracy metrics to be measured. Finally, a basic baseline has been set for comparison purposes.

Similar datasets

Recent neuromechanical datasets, i.e., those which include both EMG and kinematic and/or kinetic information, are mainly divided into gait analysis and upper-limb activities. Gait datasets usually include 3D ground reaction forces, MoCap-based kinematic estimation and EMG recordings of the leg muscles17,18,19. Another recent dataset also includes ultrasound imaging to infer neuromuscular information of dynamic gait20. Others do not include muscular activity21, thus missing the neuromechanical component needed for further analysis. Regarding upper-limb activities, the variability of related datasets is higher. A variety of contributions focus solely on hand movements22,23,24,25. To our knowledge, only one previous dataset focuses on elbow movements, and it only measures isometric muscle contractions26. Several of the previously cited datasets also include high-density EMG on the forearm22,26. Our dataset includes not only HMM indexes but also raw RGB image frames, depth maps and EMG information. Table 1 summarizes the main differences between these datasets and our proposal.

Table 1 Summary of recent neuromechanical datasets vs Ours.

Methods

In this section, the methodology followed to generate the data is explained. In addition, a description of the capture setup and the techniques applied to synchronize and generate the ground truth are presented.

Overview

The KIMHu dataset is a collection of data composed of images, depth maps, skeleton tracking, electromyography data of the upper limbs and different Human Muscular Manipulability indexes, namely the Kinematic Manipulability Index, the Dynamic Manipulability Index and the Local Conditioning Index. It depicts different participants performing different upper-limb physical exercises. The proposed dataset could be used for, but is not limited to:

  • Prediction of Human Muscular Manipulability based on image data.

  • Prediction of Human Muscular Manipulability based on 3D data or depth maps.

  • Prediction of Human Muscular Manipulability based on EMG data.

Ultimately, it could help develop different applications. For instance, the HMM indexes could be used in physical rehabilitation measurement software or in occupational hazard monitoring applications. Finally, the proposed methodology can be replicated by the scientific community in order to include more movements, more participants or data of a different nature.

Capture setup

In this section, detailed information about the devices used to record the data is introduced. The corresponding software specifications are described as well.

Hardware

The architecture of the system is composed of an RGB-D camera (Microsoft Kinect Xbox One, also known as Kinect V2), a Noraxon TeleMyo Mini DTS system, an Arduino UNO and a laptop with an Intel Core i7-9750H 2.60 GHz processor and 16 GB of RAM. The proposed system is shown in Fig. 1.

Fig. 1 Capture Setup.

The second-generation Kinect V2 by Microsoft (released in July 2014) is equipped with a color camera, a depth sensor (comprising an infrared (IR) camera and an IR projector) and a microphone array. The Kinect V2 can be used to capture color images, user skeleton joint tracking data, depth images, IR images and audio information27. The detailed technical specifications are summarized in Table 2.

Table 2 Kinect V2 hardware specifications.

Regarding the Mini DTS system, it is composed of a Mini DTS receiver and four single-channel DTS EMG wireless sensors with a sample rate of 1500 Hz. The measured data of each sensor is transmitted directly to the receiver over a short-range network using radio-frequency transmissions between 2.4 and 2.5 GHz. The data acquisition system has a 16-bit resolution, the EMG sensors have an input range of 5 mV, and a first-order high-pass filter set to 10 ± 2 Hz is applied internally by the hardware. The baseline noise of the sensors is lower than 5 μV RMS. More technical details are included in Table 3.

Table 3 Noraxon Mini DTS hardware specifications.

Software

The algorithm implemented to capture the Kinect information is written in C# and uses the Microsoft Kinect SDK 2.0 on the Windows 11 operating system. The Microsoft Kinect SDK 2.0 can extract skeleton data at approximately 30 Frames Per Second (FPS); the source code for capturing and processing the information is available at28. EMG signals were captured with Noraxon’s Myo Muscle software and exported into Matlab-compatible files.

Synchronization

To capture the information, a Kinect V2 (30 Hz) and a Noraxon Mini DTS (1500 Hz) are used. A digital pulse is sent to the Noraxon via the Arduino UNO to indicate the beginning and the end of each test. The connection between the devices is shown in Fig. 1.

Procedure

Twenty participants were included in the study: three women and seventeen men (age = 27 ± 8 years, height = 175 ± 6 cm, weight = 75 ± 14 kg), as presented in Table 4. All participants were right-handed and had no known neuromuscular or sensory disorders. Prior to their participation, the participants were informed of the study approach and gave their informed consent in accordance with the code of ethical conduct. The study was conducted in accordance with the Declaration of Helsinki and was approved by the Ethics Committee of the University of Alicante (protocol code UA-2021-06-21-01, approved on 29 June 2021).

Table 4 Anthropometric information of the participants.

Prior to the execution of the exercises, the participants were asked to stand on two markings on the floor, in front of the RGB-D camera, in a resting position. Each participant must perform two tests. In each test, the participant must execute ten repetitions of the proposed exercise, with a resting period of six seconds between repetitions. A visual guide displayed on a smartphone in front of the participant indicated the resting period and the exercise execution period. In total, both exercises took about 15 minutes.

For the first exercise, the participants must extend their arm, i.e., perform an abduction movement with the forearm in pronation. After that, an elbow flexion towards the head must be executed and then the participants must return to the elbow-extension position. Next, the participants must execute a wrist-flexion movement, return the wrist to a neutral position, and then perform a wrist-extension movement. Finally, the participants must bring their arm back to the resting position. The required sequence of movements is depicted in Fig. 2.

Fig. 2 Tests: the leftmost image shows the movements performed in Test 1 (frontal plane) and the rightmost depicts the movements performed in Test 2 (transverse plane).

For the second exercise, the participants must execute an abduction of the arm and then an elbow-flexion movement, placing their wrist right in front of their chest. After that, they must execute a wrist-flexion movement, then return the hand to a neutral wrist position and subsequently execute a wrist-extension movement. At the end, the participants must bring their arm back to the resting position. Some steps of the exercise can be observed in Fig. 2.

Data Records

In this section, the data that KIMHu29 provides is explained in detail, including the EMG data, the skeleton tracking, the images and the HMM metrics. It is worth noting that the KIMHu dataset can be downloaded from ScienceDB at https://doi.org/10.57760/sciencedb.01902.

EMG Data

During the execution of the exercises, the surface EMG signals were recorded using bipolar electrodes over the following muscles: Extensor Carpi Ulnaris, Flexor Carpi Ulnaris, Biceps Brachii and Deltoideus Medius.

Images and user skeleton joint tracking

During the execution of the exercise routine, features of the participant’s posture are extracted for each frame. These features consist of the three-dimensional (3D) points representing the position of each of the skeleton joints. This information is parsed and stored so that representative features can be easily extracted: CSV files store the position and orientation of the 25 joints comprising the skeleton (see Fig. 3), together with additional information such as the timestamp. The depth maps are recorded by the IR camera; they are 16-bit 2D images in which a depth measurement is stored for each pixel. Color images are captured by the second lens of the RGB-D camera. The file names follow the nomenclature color <timestamp>.png and depth <timestamp>.png, respectively.
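As an illustration, the sketch below shows how a color frame and its 16-bit depth map could be read in Python without losing the depth encoding. The timestamp in the file names is a placeholder, and OpenCV is only one possible reader (Pillow would work as well).

```python
import cv2  # any reader that preserves 16-bit PNGs would do

# Placeholder file names following the nomenclature above; real timestamps differ.
color = cv2.imread("color 1623456789.png", cv2.IMREAD_COLOR)       # 8-bit color frame
depth = cv2.imread("depth 1623456789.png", cv2.IMREAD_UNCHANGED)   # keeps the 16-bit depth values

print(color.shape, color.dtype)   # e.g. (1080, 1920, 3) uint8
print(depth.shape, depth.dtype)   # e.g. (424, 512) uint16
```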

Fig. 3
figure 3

Body tracking joints.

Kinematic ground truth

In the following subsections, the different HMM indexes included in the dataset are explained, focusing on what they mean and how they were computed.

Human arm kinematic model

To compute the measure of muscle manipulability, the musculoskeletal system of the human right arm is modeled as a three-segment mechanism (see Fig. 4), where the first joint is the shoulder, the second joint is the elbow, and the third joint is the wrist. This model has six degrees of freedom: shoulder abduction/adduction (q1), shoulder flexion/extension (q2), shoulder rotation (q3), elbow extension/flexion (q4), elbow pronation/supination (q5), and wrist flexion/extension (q6). L1, L2, and L3 represent the lengths of the arm, forearm, and hand, respectively.

Fig. 4 Human Arm Kinematic Model.

Manipulability

HMM is a kinematic concept used to compute the dexterous manipulability of a joint mechanism (such as the human arm). Quantitatively, HMM is a measure that describes the relationship between the joints and the limb endpoint with respect to the velocity, acceleration or force that the mechanism can apply to perform manipulation tasks. This concept can be represented by a spheroid around the endpoint whose dimensions represent the maximal feasible velocity, acceleration or force capacity along the spheroid axes (see Fig. 5). Yoshikawa30 defined this manipulability parameter for a joint manipulator with a scalar value given by:

$$\omega =\sqrt{det\left[J{J}^{T}\right]}$$
(1)

where J is the 6×N Jacobian matrix of the manipulator in a specific position and N is the number of joints of the manipulator. Yoshikawa also defined the concept of dynamic manipulability31, where the dynamics of the joint manipulator are taken into account when determining its manipulability scalar value:

$${\omega }_{d}=\sqrt{det\left[J{\left({M}^{T}M\right)}^{-1}{J}^{T}\right]}$$
(2)

where M is the inertia matrix.
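As a minimal illustration of Eqs. (1) and (2), assuming a 6×N Jacobian J and an inertia matrix M are already available (in this work they were obtained with a Matlab robotics toolbox), both scalar values can be computed with a few lines of NumPy:

```python
import numpy as np

def kinematic_manipulability(J):
    # Eq. (1): w = sqrt(det(J J^T))
    return np.sqrt(np.linalg.det(J @ J.T))

def dynamic_manipulability(J, M):
    # Eq. (2): w_d = sqrt(det(J (M^T M)^-1 J^T))
    return np.sqrt(np.linalg.det(J @ np.linalg.inv(M.T @ M) @ J.T))
```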

Fig. 5 Example of three different arm positions and their corresponding muscular force manipulability ellipses.

For this approach, the human arm was modelled in Matlab by means of the Denavit-Hartenberg (DH) parameters32 and using a robotics toolbox33. The DH representation of the right arm is given in Table 5. Each row of this table lists the four DH parameters ai, αi, di and θi, which describe the transformation between the DH reference systems Si-1 and Si, located at the joints of the upper arm (see Fig. 6). The six rows of Table 5 define the whole kinematics of the arm and allow the position of the endpoint to be computed from given joint values (q1,…,q6).

Table 5 DH table for 6-DOF human arm.
Fig. 6 Visualization of the DH parameters and the DH reference systems.

The angles for each of the degrees of freedom, as well as the lengths of the links, are computed using the points captured by the RGB-D camera: shoulder, elbow, wrist, thumb and hand tip. With the configuration of the arm, the information on the angles and the inertia matrix, the kinematic and dynamic manipulability are computed by applying Eqs. (1) and (2), respectively.
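For readers who prefer a code-level view, the sketch below reproduces this pipeline in Python/NumPy rather than Matlab: a standard DH transform, forward kinematics of the endpoint and a finite-difference positional Jacobian. The DH rows used here are illustrative placeholders, not the values of Table 5, and the lengths do not correspond to any particular participant.

```python
import numpy as np

def dh_transform(a, alpha, d, theta):
    """Homogeneous transform between consecutive DH frames S_{i-1} -> S_i."""
    ct, st, ca, sa = np.cos(theta), np.sin(theta), np.cos(alpha), np.sin(alpha)
    return np.array([[ct, -st * ca,  st * sa, a * ct],
                     [st,  ct * ca, -ct * sa, a * st],
                     [0.0,      sa,       ca,      d],
                     [0.0,     0.0,      0.0,    1.0]])

def endpoint(q, dh_rows):
    """Endpoint position for joint values q; dh_rows holds (a, alpha, d, theta_offset) per joint."""
    T = np.eye(4)
    for qi, (a, alpha, d, th0) in zip(q, dh_rows):
        T = T @ dh_transform(a, alpha, d, th0 + qi)
    return T[:3, 3]

def positional_jacobian(q, dh_rows, eps=1e-6):
    """3xN Jacobian of the endpoint position by central differences
    (the analytic 6xN Jacobian used in the text can be used instead)."""
    q = np.asarray(q, dtype=float)
    J = np.zeros((3, len(q)))
    for i in range(len(q)):
        dq = np.zeros_like(q)
        dq[i] = eps
        J[:, i] = (endpoint(q + dq, dh_rows) - endpoint(q - dq, dh_rows)) / (2.0 * eps)
    return J

# Illustrative placeholder DH rows (a, alpha, d, theta_offset) -- NOT the values of Table 5.
dh_rows = [(0.0,  np.pi / 2, 0.0,  0.0),
           (0.0, -np.pi / 2, 0.0,  0.0),
           (0.0,  np.pi / 2, 0.30, 0.0),   # placeholder arm length L1
           (0.0, -np.pi / 2, 0.0,  0.0),
           (0.0,  np.pi / 2, 0.25, 0.0),   # placeholder forearm length L2
           (0.08, 0.0,       0.0,  0.0)]   # placeholder hand length L3
J = positional_jacobian(np.deg2rad([10, 20, 0, 45, 0, 10]), dh_rows)
print(np.sqrt(np.linalg.det(J @ J.T)))     # kinematic manipulability, Eq. (1)
```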

Local conditioning index

There are other parameters which provide a good and intuitive measure of the dexterity of the joint mechanism when it comes to grasping or manipulating an object: the Dexterity Index (Id), which provides a measure of how comfortably the joint mechanism is working, and the Local Conditioning Index (Icl), which takes a value between 0 and 1; the closer it is to 1, the better the dexterity of the system, and the closer it is to 0, the closer the joint mechanism is to a singularity34.

$${I}_{d}\left(J\right)=\left\Vert J\right\Vert \cdot \left\Vert {J}^{-1}\right\Vert $$
(3)
$${I}_{cl}={I}_{d}{\left(J\right)}^{-1}$$
(4)
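Under the assumption of the 6-DOF model above (a square, generically invertible Jacobian) and using the spectral norm, which turns Id into the condition number of J, Eqs. (3) and (4) reduce to the following sketch; a pseudoinverse could replace the inverse for non-square Jacobians:

```python
import numpy as np

def dexterity_index(J):
    # Eq. (3): I_d = ||J|| * ||J^-1||; with the spectral (2-)norm this is the condition number of J.
    return np.linalg.norm(J, 2) * np.linalg.norm(np.linalg.inv(J), 2)

def local_conditioning_index(J):
    # Eq. (4): I_cl = I_d(J)^-1, in (0, 1]; values near 0 indicate proximity to a singularity.
    return 1.0 / dexterity_index(J)
```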

Organization

Upon downloading, 40 folders can be found in the root of the dataset. Each folder name is composed of the identifier of one participant, as defined in Table 4, followed by T01 or T02, where the number identifies the movement as shown in Fig. 2. Thus, each folder contains the data of a single movement of a single participant. Each folder contains the following folders and files (a minimal loading sketch is given after the list):

  • ColorFramePng (Folder): Contains the color frames corresponding to the user and test.

  • DepthFramePng (Folder): Contains the depth map frames corresponding to the user and test.

  • Performance Index (Folder): Contains a csv file with the KMI, DMI and LCI, and a pdf with the data plotted for visualization.

  • XX_YY_emg_data.mat (File): Contains the EMG data corresponding to the user and test.

  • XX_YY_skeleton_tracking (File): Contains the keypoint tracking data corresponding to the user and test.

  • XX_YY_summary (File): Contains a summary of the gathered data for each timestamp.
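The sketch below loads one participant/test folder following the layout above. The folder name, the extension of the skeleton tracking file and the internal variable and column names are assumptions that should be checked against the downloaded files.

```python
from pathlib import Path

import pandas as pd
from scipy.io import loadmat

root = Path("KIMHu")            # dataset root after download
folder = root / "P01_T01"       # hypothetical folder name: participant identifier + test (T01/T02)

# Skeleton tracking: per-frame 3D position (and orientation) of the 25 joints plus a timestamp.
skeleton = pd.read_csv(folder / "P01_T01_skeleton_tracking.csv")    # extension assumed

# EMG: four channels sampled at 1500 Hz, exported as a Matlab-compatible file.
emg = loadmat(str(folder / "P01_T01_emg_data.mat"))

# Manipulability labels (KMI, DMI and LCI) stored in the Performance Index folder.
labels = pd.read_csv(next((folder / "Performance Index").glob("*.csv")))

print(skeleton.shape, sorted(emg.keys())[:5], list(labels.columns))
```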

Technical Validation

The amount of data (number of files and disk space), categorized by test, is shown in Table 6. In total, the KIMHu dataset comprises 5400304 synchronized samples for each of the color images, depth maps and skeleton data. The same number of samples is also included for each channel of the EMG stream and for the HMM indexes. Twenty participants with different anthropometric features were involved in this study to ensure a proper coverage of the sample space.

Table 6 Dataset metrics for each test.

Benchmark

One of the goals of our dataset is to serve as a benchmark and a common comparison framework for HMM prediction systems. Thus, alongside the data itself, a set of methodologies to enable an easy and fair comparison is also proposed. These methodologies, which are thoroughly described in this section, include different strategies to split the data into training and test sets and the way accuracy metrics should be computed. Finally, a baseline approach that takes EMG signals as input and predicts the corresponding KMI, DMI and LCI is proposed as well.

Train and test splitting methodologies

In order to properly test the learning algorithms and their generalization capabilities, two different train and test splitting methods are proposed.

First, all the samples must be shuffled together. A sample is understood as a piece of data that is correlated to an expected output, so its exact form depends on the approach. For instance, a system could take a window of EMG data as input to predict the corresponding KMI; in this case, a sample is composed of time_steps × emg_channels values. After the samples are randomized, 70% should be used for training and the remaining 30% for testing the algorithms. This methodology, which follows common practice in machine learning, ensures that the algorithm has enough data to be properly trained whilst being tested against unknown data. Both splits ensure the presence of samples covering the whole range of input and output data; however, they do not provide insight into the generalization capabilities towards different human participants, as all of them are present in both the training and the test set. This is inconvenient because the algorithm could perform unexpectedly when deployed in a real use case involving participants that were not considered in the training process. Nonetheless, it is still a valuable way to measure the performance of an approach.

For this reason, a second splitting method is suggested. In this case, the samples of 14 randomly chosen participants should be used for training, and the samples of the remaining 6 for testing. By training and testing with disjoint sets of users, a proper measure of the generalization capabilities of the methods can be obtained, thus providing insight into the real performance in an actual use case.

For the sake of comparison, these methodologies are named “data-centric” and “user-centric” respectively. Finally, it is worth mentioning that all metrics should be reported for both splitting methods.
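A possible implementation of both splits, using synthetic stand-ins for the windowed EMG features X, the labels y and the per-sample participant identifiers, could look as follows (the array names and shapes are illustrative only):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins; in practice X/y come from the windowed EMG features and the HMM labels.
n_samples, n_features = 1000, 2048
X = np.random.rand(n_samples, n_features)
y = np.random.rand(n_samples)
subject_ids = np.random.randint(1, 21, size=n_samples)   # participant label for every sample

# Data-centric split: shuffle all samples together, 70 % train / 30 % test.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.30, shuffle=True, random_state=0)

# User-centric split: 14 randomly chosen participants for training, the remaining 6 for testing.
rng = np.random.default_rng(0)
train_ids = rng.choice(np.unique(subject_ids), size=14, replace=False)
train_mask = np.isin(subject_ids, train_ids)
X_tr_u, y_tr_u = X[train_mask], y[train_mask]
X_te_u, y_te_u = X[~train_mask], y[~train_mask]
```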

Metrics and comparison framework

As stated before, in order to set a proper comparison framework for the scientific community, the metrics that should be reported are proposed here.

At this point, it is worth recalling that the labels of the dataset (KMI, DMI and LCI) are real, continuous values, so the regular categorical accuracy is not meaningful.

This is why the percentage of test samples that yield an error below different thresholds should be reported. These measures provide an easy-to-understand score of the performance of regression models. The thresholds for KMI are T01 = 0.0001, T1 = 0.001 and T2 = 0.002. The thresholds for DMI are T01 = 0.05, T1 = 0.5 and T2 = 1. The thresholds for LCI are T01 = 0.001, T1 = 0.01 and T2 = 0.02. These values are set to provide three difficulty levels, from the most restrictive to the most lenient. For instance, the KMI in the dataset varies from 0 to ~0.02, so an error of T01 represents a deviation of 0.5% of the total range of values.

The error that should be obtained for each sample is the Absolute Error (AE), which is defined as the absolute difference between the actual value of the label y and the predicted value f(x), that is \(AE=| y-f(x)| \).
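For reference, this metric can be computed in a few lines; the helper name and the example values below are purely illustrative, while the thresholds are the KMI triplet defined above:

```python
import numpy as np

def threshold_accuracy(y_true, y_pred, thresholds):
    """Fraction of samples whose absolute error |y - f(x)| falls below each threshold."""
    ae = np.abs(np.asarray(y_true) - np.asarray(y_pred))
    return {t: float(np.mean(ae < t)) for t in thresholds}

# Example with the KMI thresholds T01, T1 and T2.
print(threshold_accuracy([0.010, 0.012, 0.015], [0.0101, 0.013, 0.018],
                         thresholds=(0.0001, 0.001, 0.002)))
```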

Baseline

In order to set a basic baseline, a range of state-of-the-art regression methods was adopted to predict the HMM values from EMG input. To train the algorithms, the proposed dataset was processed using a sliding-window method with a stride of 5, which provides the algorithms with temporal context. Each sample is composed of 512 contiguous readings of the 4 EMG sensors, so each sample has 2048 features. The corresponding label is the KMI, DMI or LCI at the last instant considered in the sample. Both the data-centric and user-centric splitting methodologies were followed, as defined in Section Train and Test Splitting Methodologies. For each regression algorithm, a different model was trained for each of the 3 HMM prediction problems mentioned before. Finally, the obtained results are shown in Table 7. The reported values were computed as explained in Section Metrics and Comparison Framework.
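A possible implementation of this windowing, assuming the EMG stream and the per-instant label have already been aligned on a common timeline (an alignment step that is not detailed here), is sketched below with synthetic data:

```python
import numpy as np

def make_windows(emg, labels, window=512, stride=5):
    """emg: (T, 4) EMG stream; labels: (T,) HMM index aligned with the EMG samples.
    Returns X of shape (n_windows, window * 4) and y with the label at each window's last instant."""
    X, y = [], []
    for end in range(window, emg.shape[0] + 1, stride):
        X.append(emg[end - window:end].reshape(-1))   # 512 readings x 4 channels = 2048 features
        y.append(labels[end - 1])
    return np.asarray(X), np.asarray(y)

# Synthetic example; real data comes from the EMG .mat files and the Performance Index csv.
emg = np.random.randn(10_000, 4)
kmi = np.random.rand(10_000)
X, y = make_windows(emg, kmi)
print(X.shape, y.shape)   # (n_windows, 2048) (n_windows,)
```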

Table 7 Metrics obtained for prediction of the KMI, DMI and LCI from EMG data by different regression approaches to serve as a baseline.

First, LightGBM35 was adopted. This is a framework created by Microsoft that provides a particular implementation of Gradient Boosting Decision Trees. LightGBM introduces two main features. First, gradient-based one-side sampling discards input data instances with small gradients, significantly reducing the number of samples. Second, exclusive feature bundling bundles mutually exclusive features, thereby reducing their number. Both improvements allow LightGBM to provide accurate predictions whilst keeping the computational cost at bay. This method was selected among others because it has reportedly shown performance comparable to deep-learning algorithms in certain problems, and it has been successfully used for a range of tasks such as land cover classification36, regression of structure-activity relationships37 or delivery time prediction38.

A Linear Regressor was also employed. This is an ordinary least squares linear regression, which fits a linear model with coefficients \(w=({w}_{1},...,{w}_{p})\) to minimize the residual sum of squares between the observed targets in the dataset and the targets predicted by the linear approximation. This is a naive but useful regression algorithm.

XGBoost was also included in the benchmark. XGBoost stands for “Extreme Gradient Boosting” and is heavily based on39. Unlike gradient boosting, which works as gradient descent in function space, XGBoost works as Newton-Raphson in function space: a second-order Taylor approximation is used in the loss function to make the connection to the Newton-Raphson method.

Finally, the last algorithm is ElasticNet. The main purpose of ElasticNet regression is to find the coefficients that minimize the sum of squared errors by applying a penalty to these coefficients. ElasticNet combines the L1 and L2 (Lasso and Ridge) approaches and, as a result, performs a more efficient smoothing process. ElasticNet first emerged from critiques of Lasso, whose variable selection can be too dependent on the data and therefore unstable. The solution proposed by the algorithm is to combine the penalties of Ridge regression and Lasso to get the best of both worlds.
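As a final sketch, the four baselines can be instantiated with their default hyperparameters (the exact settings used for Table 7 are not reproduced here) and scored with the thresholded absolute error; the split below is a synthetic placeholder for either the data-centric or the user-centric split:

```python
import numpy as np
from lightgbm import LGBMRegressor
from sklearn.linear_model import ElasticNet, LinearRegression
from xgboost import XGBRegressor

# Placeholder split; see the splitting sketch above for the real data-/user-centric splits.
X_tr, y_tr = np.random.rand(800, 2048), np.random.rand(800)
X_te, y_te = np.random.rand(200, 2048), np.random.rand(200)

models = {
    "LightGBM": LGBMRegressor(),
    "Linear Regressor": LinearRegression(),
    "XGBoost": XGBRegressor(),
    "ElasticNet": ElasticNet(),
}

for name, model in models.items():
    model.fit(X_tr, y_tr)
    ae = np.abs(y_te - model.predict(X_te))
    # Share of test samples under each KMI threshold (T01, T1, T2).
    print(name, [float(np.mean(ae < t)) for t in (0.0001, 0.001, 0.002)])
```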