Introduction

Frailty is a geriatric syndrome characterized by a decreased ability of the body to respond to external stressors. Its consequences include functional decline, an increased risk of falls, dependency, institutionalization, and ultimately death. Frailty is highly prevalent in older adults, occurring in up to 30% of people over 75 years of age, and its appearance can be anticipated, delayed, or avoided (1). Rehabilitation is more successful when it is applied early. A method capable of objectively identifying frailty in clinical practice would therefore greatly increase diagnostic capacity and scope, and directly increase the success of these interventions. The research hypothesis was that the degree of frailty of a patient could be automatically predicted from the signal obtained during a strength test performed using an adapted hand dynamometer, that is, that frailty evaluation could be supported through an objective measure processed by Artificial Intelligence.

Frailty measurements

The most important frailty measurement scales are the following. Fried et al. (2) defined the frailty phenotype by the presence of three or more of the following symptoms: unintended weight loss, weakness, low resistance, slow movement, and low activity. The Frail-VIG index (3), based on the Comprehensive Geriatric Assessment (CGA), is a global diagnostic tool and methodology designed to identify and quantify biomedical and pharmacological data, as well as the functional, psychological, and social problems that older adults may present with. The Barthel scale (4) determines a person’s degree of dependency, or need for help, in carrying out ten basic daily actions such as eating, washing, going to the bathroom, moving, and dressing. The Lawton-Brody scale (5) measures the ability to perform activities of daily living; each area is scored according to the description that best corresponds to the participant.

Fried and other frailty indices or scales use only the maximum value of hand grip strength as one of the symptoms. In clinical practice, however, geriatricians use their experience to make quick diagnoses based on the “form” of the patient’s handshake, that is, how long it lasts and how intense it is. We intend to demonstrate the correlation between the form of the Hand Grip Strength signal and the frailty level of patients. An objective measure of frailty would save medical consultation time and could even allow remote evaluation of patients wearing sensors.

HGS related work

Hand grip strength (HGS) has been widely used by investigators and therapists to diagnose sarcopenia and frailty, as it is a reliable indicator of overall muscle strength that decreases with age. The results obtained from these tests were used to verify whether HGS can function as a predictor of disability in older men (6). Grip strength appears to be an explanatory or predictive biomarker of specific outcomes such as generalized strength and function, bone mineral density, fractures, falls, comorbidity load, cognition, sleep, hospital-related variables, and mortality. In fact, the force-time curve was originally an assessment meant to identify maximal or submaximal effort and is a graphic representation of the force applied over a period of time during a single HGS trial. The function that was obtained had the form of a step.

Several studies have investigated force-time characteristics. One focuses on sustained maximal grip effort according to age and clinical condition (7), using a modified Martin vigorimeter. Another compares protocols for diagnosing sarcopenia and frailty according to HGS (8); this is an important reference for the present research, as the protocol proposed here is contrasted with those proposed in previous studies.

In the preliminary research presented at the IWANN 2021 Congress (9), the starting hypothesis for the present work was already stated as follows: “The analysis of the HGS signal, measured by a hand dynamometer, can objectively diagnose frailty in geriatric patients”. In that previous work, we evaluated the behavior of the HGS signal over short periods of time, obtaining the force-time curve and thus expanding the information provided by the maximum value. Signals were recorded with a modified Deyard dynamometer and processed using machine learning strategies to identify frailty levels.

In the present study, we extended the number of patients from 83 to 138 older adults and used a different machine learning approach to increase the sensitivity and specificity of the previous study. The research hypothesis was that the degree of frailty of a patient could be automatically predicted from the signal obtained during a strength test performed using an adapted hand dynamometer.

Materials and methods

The instrument

The instrument used to measure the HGS in the present study was a modified constant electronic hand dynamometer 14192-709E-EH101–90 kg capacity range, in which the entire electric circuit was replaced by a specific one designed and constructed at the Technical Research Centre for Dependency Care and Autonomous Living (CETpD) of the Universitat Politécnica de Catalunya (UPC). With this modification, the dynamometer continuously measures the HGS over time and provides Bluetooth connectivity and a micro-SD card for data storage, using the Inertial Measurement Unit (IMU) developed by CETpD–UPC (10) for long-term monitoring of human pathological movement. The calibration of the system is fully described in a previous study (9).

Pilot protocol and data acquisition

The proposed pilot protocol was approved by the Ethics Committee for Drug Research of the Community of Madrid (Comité de Ética de la Investigación con Medicamentos de la Comunidad de Madrid – Ref 47/916546.9/19).

Participants, who signed informed consent forms, were selected via convenience (non-probabilistic) sampling from patients who received medical services provided by the collaborating entities (“Hospital Central de la Cruz Roja San José” and “Santa Adela de Madrid”, and “Casa d’Ampara de Vilanova i la Geltrú”) from November 2019 to July 2021. Only researchers from the Health Consortium “Alt Penedès-Garraf” had access to the database during and after data collection.

In the first part of the data acquisition, geriatricians obtained physiological information from each patient. The medical staff participating in the clinical trial labelled each patient in the database as robust or frail, based on all the information provided by the different questionnaires and scales (Fried, Barthel, Lawton-Brody, CGA Index). The second part of the protocol was the HGS test, which consisted of three trials carried out in a sitting position on a chair with the forearm placed on top of the leg in a neutral position (holding the dynamometer perpendicular to the leg), feet firmly on the floor at a shoulder-width distance, shoulders adducted, body neutrally rotated, and dominant hand used (Figure 1). Each test lasted six seconds, with a rest interval of one minute between tests.

Figure 1 Protocol position example

Inclusion criteria

The sample was stratified by the degree of frailty (robust and frail participants). Participants meeting the following criteria were included in the study: age above 70 years, sufficient reading and writing ability to answer questionnaires, willingness to participate in the study, and acceptance of the standards of performance and procedures established by the researchers. Participants meeting any of the following criteria were excluded: alcohol and/or drug abuse, any type of neurological or osteoarticular condition (Parkinson’s disease, osteoarthritis, or stroke), or inability to provide informed consent and to cooperate with study procedures.

The medical staff participating in the clinical trial labelled each patient in the database as robust or frail, based on all the information provided by the different questionnaires and scales (25). Using the original database, the Case Report Form (CRF) provides different possible labels. If we choose the Fried criteria (involuntary loss of weight, low energy or exhaustion, slow mobility, muscle weakness, low physical activity), the classification is binary: frail and not frail. Using the Fragile CGA criteria, we can distribute the patients between robust (CGA index < 0.2), prefrail (between 0.2 and 0.5), and frail (CGA index > 0.5). As a first step toward practical classification results, we decided to combine frail and prefrail into a single class, leaving only two classes for the final classification. The aim of this simplification was to distinguish robust participants from those with a tendency toward frailty.
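The CGA-based labelling described above can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and the handling of the exact boundary values 0.2 and 0.5 (assigned here to the prefrail band) is an assumption, since the text does not say which class they belong to.

```python
def cga_category(cga_index: float) -> str:
    """Three-level category from the CGA index thresholds quoted in the text."""
    if cga_index < 0.2:
        return "robust"
    if cga_index <= 0.5:  # assumption: both boundary values count as prefrail
        return "prefrail"
    return "frail"


def binary_label(cga_index: float) -> str:
    """Merge prefrail and frail into a single 'frail' class."""
    return "robust" if cga_category(cga_index) == "robust" else "frail"


if __name__ == "__main__":
    for value in (0.10, 0.35, 0.70):
        print(value, cga_category(value), binary_label(value))
```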

The sociodemographic characteristics of the participants, as well as their levels of frailty, are shown in Tables 1 and 2. The present cohort is an unbalanced data set, with more than twice as many frail participants as robust ones, so balancing techniques are required to obtain coherent results. An overview of the data shows no direct relationships between the different variables, which guides us to search for more complex information, such as the temporal variation of the force exerted by the hand (HGS).

Table 1 Age, force peak, and geriatric criteria values for ROBUST and FRAIL groups
Table 2 Sociodemographic data for Frail and Robust groups

Data base creation

First, geriatricians obtained physiological information from each patient and evaluated the following scales in clinical trials: Fried criteria (2), CGA Index (3), Barthel scale (4), and Lawton–Brody scale (5). The second part of the protocol was the HGS test, which consisted of three trials carried out in a sitting position on a chair with the forearm placed on top of the leg in a neutral position. Each test lasted six seconds, with a rest interval of one minute between tests.

In total, 223 valid HGS time series, or force-time curves, were recorded for 138 patients. The validity of an HGS signal was based on simple geometric criteria applied to the recorded shape. Each patient was assigned a set of 19 geriatric variables obtained from the different scales and included in the Case Report Form (CRF).

Two Butterworth filters were used in this study. To obtain the geometric features, a 1 Hz low-pass filter was applied to determine the HGS shape. Three phases can be distinguished in the HGS time series after the first filtering process: the force generation phase (segment 1 in Figure 2), force maintenance phase (segment 2 in Figure 2), and force decay phase (segment 3 in the same Figure). A second 5 Hz filter was applied to obtain the dynamic and frequency features.
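The two filtering steps can be sketched with SciPy as follows. This is a minimal sketch under stated assumptions: the 100 Hz sampling rate, the filter order of 4, the zero-phase (forward-backward) application, and the low-pass type of the 5 Hz filter are not given in the text and are chosen here only for illustration; the test signal is synthetic.

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS_HZ = 100.0  # assumed sampling rate; not stated in the text


def lowpass(signal, cutoff_hz, fs_hz=FS_HZ, order=4):
    """Zero-phase Butterworth low-pass filter (order is an assumption)."""
    b, a = butter(order, cutoff_hz, btype="low", fs=fs_hz)
    return filtfilt(b, a, signal)


# Synthetic six-second, step-like trial: rise, plateau at 30 kg, decay.
t = np.arange(600) / FS_HZ
clean = np.interp(t, [0.0, 1.0, 1.5, 4.5, 5.0, 6.0], [0, 0, 30, 30, 0, 0])
noisy = clean + 2.0 * np.sin(2 * np.pi * 20.0 * t)  # 20 Hz disturbance

shape = lowpass(noisy, 1.0)     # 1 Hz: overall HGS shape (geometric features)
dynamics = lowpass(noisy, 5.0)  # 5 Hz: signal kept for dynamic/frequency features
```

Applying the filter forward and backward avoids the time shift that a single pass would introduce, which matters when the positions of the phase boundaries are read from the filtered curve.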

Figure 2 Strength (kg) vs. time (s) signal and the four characteristic points related to the step shape of the time-series signal

A simple algorithm based on the first and second derivatives identified these phases in the HGS time series. The phases were then used to extract specific characteristics that could help determine whether a patient was prone to developing frailty.
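One way such a derivative-based segmentation could work is sketched below. It is not the authors' algorithm, whose details are not given here: the function name, the use of the steepest rise and fall as anchors, and the choice of curvature extrema as the four characteristic points are all assumptions, and the sketch expects an already smoothed, step-like signal.

```python
import numpy as np


def characteristic_points(force, fs_hz):
    """Return sample indices (p1, p2, p3, p4) bounding the three phases.

    p1-p2: force generation, p2-p3: force maintenance, p3-p4: force decay.
    """
    force = np.asarray(force, dtype=float)
    dt = 1.0 / fs_hz
    d1 = np.gradient(force, dt)  # first derivative (slope)
    d2 = np.gradient(d1, dt)     # second derivative (curvature)

    i_rise = int(np.argmax(d1))  # steepest point of the rise
    i_fall = int(np.argmin(d1))  # steepest point of the decay
    if i_fall <= i_rise + 1:
        raise ValueError("signal does not look like a rise-plateau-decay step")
    mid = (i_rise + i_fall) // 2

    p1 = int(np.argmax(d2[: i_rise + 1]))           # curve starts bending upward
    p2 = i_rise + int(np.argmin(d2[i_rise:mid]))    # rise flattens into plateau
    p3 = mid + int(np.argmin(d2[mid : i_fall + 1])) # plateau bends downward
    p4 = i_fall + int(np.argmax(d2[i_fall:]))       # decay flattens out
    return p1, p2, p3, p4


# Synthetic trial sampled at an assumed 100 Hz.
t = np.arange(600) / 100.0
force = np.interp(t, [0.0, 1.0, 1.5, 4.5, 5.0, 6.0], [0, 0, 30, 30, 0, 0])
points = characteristic_points(force, 100.0)
```

On real recordings the derivatives amplify noise, so the function would be applied to the 1 Hz filtered shape rather than to the raw signal.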

Results

The Cohort

Tables 1 and 2 describe the main characteristics of the cohort for each group of interest, “frail” and “robust”, and for all participants together. Table 1 reports the maximum, minimum, average, and standard deviation values of age, strength peak, and the Fried, Barthel, Lawton-Brody, and Fragile-VIG criteria. Table 2 reports the two significant sociodemographic variables, Marital Status and Cohabitation, with the corresponding percentages for each group.

Feature selection

As the first step of the individual HGS signal processing, we extracted 177 statistical features, such as areas, the number of peak values within a signal, and the number of cycles calculated using the rainflow counting method. To simplify the classification process, we used the Boruta methodology (11) for feature selection. The result is the set of the 20 most significant features shown in Table 3.
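To give a feel for the kind of per-signal features mentioned above, the sketch below computes a handful of them with NumPy. The feature names, the strict local-maximum definition of a peak, and the sampling rate passed in are illustrative assumptions; the sketch does not reproduce the 177 features of the study, the rainflow cycle count, or the Boruta selection step.

```python
import numpy as np


def basic_features(force, fs_hz):
    """A few simple descriptors of one force-time curve."""
    force = np.asarray(force, dtype=float)
    dt = 1.0 / fs_hz

    # Area under the curve by the trapezoidal rule (kg * s).
    area = float(np.sum((force[1:] + force[:-1]) / 2.0) * dt)

    # Strict local maxima: samples larger than both neighbours.
    interior = force[1:-1]
    is_peak = (interior > force[:-2]) & (interior > force[2:])

    return {
        "area": area,
        "peak_force": float(force.max()),
        "time_to_peak": float(np.argmax(force) * dt),
        "n_peaks": int(is_peak.sum()),
    }


features = basic_features([0.0, 1.0, 0.0, 2.0, 0.0], fs_hz=1.0)
```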

Table 3 Set of the 20 selected significant features

Data balancing techniques

To avoid a model biased toward predicting the majority class, owing to the larger number of frail participants than robust ones, different balancing techniques were applied: down-sampling, up-sampling, and hybrid methodologies. The best results, in the form of a well-balanced distribution of data, were provided by a combination of the Synthetic Minority Oversampling Technique (SMOTE), which generates new samples for the smallest group, and Edited Nearest Neighbours (ENN), a technique based on K-nearest neighbours (12) that removes noisy samples.
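The two ideas behind this hybrid approach can be sketched in plain NumPy. This is a simplified illustration, not the implementation used in the study: the function names and default neighbour counts are assumptions, the ENN step uses the majority-vote rule, and a production pipeline would normally rely on a maintained library instead.

```python
import numpy as np


def smote(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolating between a
    minority sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    k = min(k, len(X_min) - 1)
    neighbours = np.argsort(dist, axis=1)[:, :k]

    base = rng.integers(0, len(X_min), size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # position along the connecting segment
    return X_min[base] + gap * (X_min[picked] - X_min[base])


def edited_nearest_neighbours(X, y, k=3):
    """Drop samples whose label disagrees with most of their k neighbours."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    neighbours = np.argsort(dist, axis=1)[:, :k]
    agree = (y[neighbours] == y[:, None]).sum(axis=1)
    keep = 2 * agree > k  # strict majority of neighbours share the label
    return X[keep], y[keep]
```

In use, `smote` would be applied to the robust (minority) samples of the training folds only, and `edited_nearest_neighbours` to the combined original and synthetic samples, so that no synthetic data leak into the validation set.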

Data training and validation

Finally, the training database consists of the 20 features of each HGS signal as the input vector, and the robust or frail label derived from the geriatric indices as the output. The database was randomly divided into training and validation sets (80% and 20%, respectively). A k-fold cross-validation training strategy with k=10 was used (Figure 3). The following algorithms were evaluated on both standardized and non-standardized data: logistic regression, linear discriminant analysis, K-nearest neighbor classifier, decision tree classifier, Gaussian naïve Bayes, and support vector machines (Figure 4). The following ensemble algorithms were evaluated on the standardized data: AdaBoost classifier, gradient boosting classifier, random forest classifier (10 estimators), and extra trees classifier (10 estimators) (Figure 5).
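The training scheme can be sketched with scikit-learn as below. The data here are synthetic stand-ins generated with `make_classification` (223 samples, 20 features, roughly 30% minority class), the random seeds are arbitrary, and the balancing step shown in Figure 3 is omitted for brevity, so the scores carry no meaning beyond showing that the pipeline runs.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic placeholder for the 223 x 20 feature matrix and binary labels.
X, y = make_classification(n_samples=223, n_features=20, n_informative=8,
                           weights=[0.3, 0.7], random_state=0)

# Random 80% / 20% split into training and validation sets.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0)

# "Scaled" random forest: standardization followed by 10 trees.
model = make_pipeline(
    StandardScaler(),
    RandomForestClassifier(n_estimators=10, random_state=0))

# 10-fold cross-validation on the training set only.
cv = KFold(n_splits=10, shuffle=True, random_state=0)
cv_scores = cross_val_score(model, X_train, y_train, cv=cv, scoring="accuracy")

# Final fit and a single evaluation on the held-out validation set.
model.fit(X_train, y_train)
val_accuracy = model.score(X_val, y_val)
```

Placing the scaler inside the pipeline means it is refitted on each training fold, which keeps statistics of the held-out fold from influencing the model.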

Figure 3 Data training preparation (k=10), partitioning, subsampling (balancing with down-sampling, up-sampling and SMOTE-hybrid) and resampling steps

Figure 4 Basic algorithms used as a first round of training and classification: Logistic Regression (LR), Linear Discriminant Analysis (LDA), K-Nearest Neighbors Classifier (KNN), Classification and Regression Tree (CART), Gaussian Naive-Bayes (NB), and Support Vector Machines (SVM)

Figure 5 Advanced algorithms that achieved the best classification results: Random Forest Classifier (RF), Ada Boost Classifier (AB), Gradient Boosting Classifier (GBM), and Extra Trees Classifier (ET)

The balancing techniques that reported the best results were as follows: (1) over-sampling using SMOTE and cleaning using edited nearest neighbors, and (2) a combination of over- and under-sampling using SMOTE and edited nearest neighbors. The scaled random forest classifier achieved the best results, and algorithm optimization improved its performance further.

The confusion matrices obtained for the best algorithm are shown in Figure 6. The methodology achieved high accuracy: in the validation phase (data never seen before by the algorithm), 39 of 42 samples were correctly classified as frail or robust.

Figure 6 Confusion matrices for the best solution: Scaled Random Forest

The percentages shown in the colored cells represent the ratio of cases in each cell to the total number of cases (181 for training and 42 for validation). Precision is computed per column, as the number of correctly classified cases divided by the total number of cases predicted for that class: TP/(TP+FP) for the positive class and TN/(TN+FN) for the negative class. The recall column (equivalent to sensitivity) mirrors precision but works on rows, expressed as TP/(TP+FN) and TN/(TN+FP). The F1-score column combines precision and sensitivity: it is calculated as twice their product divided by their sum, and reflects the balance between the two metrics. The Support column gives the number of cases of each class in the dataset, providing insight into the balance of the class distribution.
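The metric definitions above translate directly into code. The counts used in the example are made up for illustration and are not the values shown in Figure 6; the function name is likewise hypothetical.

```python
def class_metrics(tp: int, fp: int, fn: int):
    """Precision, recall (sensitivity), and F1-score for one class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative counts only: 40 true positives, 10 false positives,
# 5 false negatives.
precision, recall, f1 = class_metrics(tp=40, fp=10, fn=5)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

To obtain the metrics of the other class, the same function is called with the roles swapped: its true positives are the first class's true negatives, its false positives are the first class's false negatives, and vice versa.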

Discussion

The protocol designed and proposed in this study was successfully implemented in the tests conducted in the study population, in which the modified Deyard dynamometer was calibrated effectively and yielded satisfactory results. The resulting force-time signals were similar to those expected from previous studies that used them (1,6,8).

A dataset of 138 patients with 223 valid force-time curves was used to train the different classifiers. The selected Random Forest Classifier was able to predict the frailty label with 92.9% accuracy, achieving sensitivities for robust participants (class 1.0) and frail participants (class 2.0) of 90% and 93.8%, respectively, when the signal information of hand strength and physiological data of the patient were used.

This study included many layers of development, and we were able to obtain a predictor that could relate the hand force signal to the frailty level. All decisions regarding selection of the best classification algorithm were automated. Data preprocessing methods and their separation can be applied to other signals with similar step-like behaviors.

For a support tool that correctly classifies frail conditions using a simple test, predictions need a high level of precision, so that subjects predicted to be frail are truly frail. A value of approximately 93% supports this. Equally important is minimizing the number of False Negatives, that is, participants who are frail but predicted to be robust. The sensitivity index, which measures this, achieved a high score of approximately 93%. When the combination of precision and sensitivity (recall) was considered (F1-score), a high level of performance was observed for both classes (85.7% and 93.8% for robust and frail participants, respectively).

Future directions

The database presented here is sufficient for corroborating the possibility of extracting significant knowledge from simple measurements. However, a larger number of patients is necessary to develop a possible instrument for daily practice in medical settings.

An interesting follow-up study using the general data set obtained would search for correlations between the values of the different criteria and the sociodemographic data. Increasing the granularity of the studied groups, for example by analyzing women and men separately, could also be relevant.

Once the best predictive models are obtained, they should be complemented by medical criteria. The present methodology could be a first step toward a valuable support tool for general practitioners dealing with potentially frail patients.