Physical function is of major relevance for older people’s quality of life, cognitive state and independence in activities of daily living (ADLs) [1,2,3]. Decline in physical function leads to disability in older adults [4], whereas physical activity is associated with greater physical function [5]. As the percentage of older people in our communities steadily grows, it is becoming increasingly important to understand physical function in a bid to avoid care dependency and disability. Loss of independence and the inability to perform ADLs impose a growing burden on the health care system. In general, “physical performance” in older people, including parameters of function, strength/power, balance and endurance, is assessed as part of a comprehensive geriatric assessment using tests such as handgrip strength, the short physical performance battery, the timed up & go test and the stair climb power test. Choosing the most sensitive and meaningful instrument for assessing physical performance is a major challenge, particularly because the group of people aged 70 years and older is very heterogeneous.

Cut-off values and clinically meaningful changes for the risk of functional decline and adverse health outcomes have already been defined for several of the aforementioned established tests. For walking speed, for instance, a cut-off of 0.8 m/s was identified as being associated with adverse health outcomes [6]. Taking more than 20 s for the timed up & go test is associated with low mobility [7]. Participants who took longer than 16.7 s to rise from a chair five times represented the slowest quartile in a cohort of 1122 subjects and exhibited the highest percentages of disability at the four-year follow-up [8]. Muscle strength and muscle power are important determinants of physical performance and mobility skills in older adults [9,10,11]. While strength is defined as the ability to exert force, muscle power is defined as the ability to exert force rapidly (power = force × velocity) [9]. Both parameters have significant effects on the fear of falling and the quality of life [12], and play a special role in the screening and diagnosis of sarcopenia [13], a disease that may lead to a loss of independence. Even a single measurement of handgrip strength has been shown to be predictive of health outcomes [14]. Using data from the Women’s Health and Aging study, Xue et al. sought to predict the risk of falling, physical disability and frailty from the rate of decline in grip strength. However, they concluded that greater baseline handgrip strength was significantly associated with a lower risk of IADL disability and frailty [15]. McKinnon et al. postulated that muscle power declines earlier than muscle strength, due to a reduction of motor unit numbers with aging [16]. Therefore, muscle power may exert a greater influence on physical performance than strength [9,10]. Leg muscle power can be estimated from chair rising or stair climbing [17,18].
Common tests to assess leg muscle power in older adults are the five times chair rise test (which is part of the short physical performance battery) [4] and the stair climb power test [18]. The five times chair rise test has proven useful in clinical decision-making, although it is limited in its ability to discriminate between good and poor performers in terms of balance disorders [19]. Many older adults are unable to perform the stair climb power test due to orthopedic or neurologic problems, a lack of power or fear of falling [20]. Many tests in the geriatric context exhibit ceiling effects, which makes it difficult to differentiate among high performers.

To our knowledge, only a few previous studies have addressed longitudinal data on the physical performance of a high-performing population aged 70+, measured by a comparably broad battery of assessments covering function, strength and power, balance and endurance, with emphasis on the individual course of physical decline. In view of the high functional level of the study population, we did not focus on established geriatric thresholds but rather on changes over time, since we believe that detecting an individual’s risk at a very early stage is of utmost importance for primary prevention. Additionally, most studies have investigated younger subjects or subjects with a wider age range [21]; conducted fewer physical tests; monitored the follow-up examination only by (telephone) interview; or did not consider individual trajectories.


The present longitudinal observational study VERSA (prediction for maintaining self-employment in old age) of independent community-dwelling people aged above 70 years was part of the primary prevention project AEQUIPA (physical activity and health equity: primary prevention for healthy aging). The aim of the study was to describe the development of physical performance in a high-performing group over two years, measured by comprehensive geriatric assessment, and to identify the tests most predictive of individual deterioration so that the assessment can be shortened. This may be useful for identifying the individuals at highest risk who may benefit from early intervention, e.g. fitness programs.


Study population

Community-dwelling older adults without any acute health problems participated in the study. Recruitment took place in sports clubs, senior appointments, music societies, rehab sport centers, physiotherapy departments and via newspaper advertisements. The study inclusion criteria were: a minimum age of 70 years; community-dwelling; no severe acute diseases (e.g. lung, kidney or heart); no difficulties in climbing a flight of ten steps; the ability to attend assessments independently; no pacemaker or other electronic implants; and a timed up & go test < 20 s.

Study design

In this longitudinal observational study, eligible participants were assessed at the study center at the Carl von Ossietzky University Oldenburg for baseline assessment (t0). After a phone call, a written informed consent form was sent to the participants at least one week before the baseline evaluation. All subjects signed informed consent. Subsequent visits took place after six (t1) and 24 (t2) months; the tests outlined below were performed in a standardized manner each time. A health history was recorded, including a semi-structured questionnaire on health status (hypertension, past strokes, chronic diseases such as diabetes mellitus and COPD, falls and general health) and a medication review. Blood pressure was measured for safety reasons; a value of > 180/95 mmHg led to termination of the affected participant’s involvement in the study. The study protocol was approved by the Medical Ethics Committee of the Hannover Medical School (MHH) (Nr. 6948), Germany.

Physical performance

Handgrip strength (HGS): HGS was measured using a JAMAR hand-held dynamometer (Jamar, Bolingbrook, IL). The participants performed the test while seated, using both hands alternately with three trials per hand. The mean of the three measurements was calculated for each hand, and the higher of the two means (i.e., that of the stronger hand) was used. Reduced HGS according to frailty criteria was related to body mass index (BMI) and defined according to Fried et al. [22].
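The scoring rule for HGS can be illustrated with a minimal sketch. It assumes that the score is the mean of the three trials of the stronger hand, which is one reading of the description above; the function name `handgrip_score` and the readings are hypothetical and not taken from the study.

```python
from statistics import mean


def handgrip_score(left_trials, right_trials):
    """Return the mean grip strength (kg) of the stronger hand.

    Assumption: the score is the higher of the two per-hand means.
    """
    if not left_trials or not right_trials:
        raise ValueError("each hand needs at least one trial")
    return max(mean(left_trials), mean(right_trials))


# Hypothetical readings in kg, three trials per hand.
score = handgrip_score([28.0, 30.0, 29.0], [31.0, 33.0, 32.0])
print(score)
```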

The stair climb power test (SCPT): SCPT was used to measure leg power. The time in seconds (SCPTT) taken by the subject to climb a flight of ten stairs was measured, and the related power in watts (SCPTP) was calculated as P = m × g × h/t, where m is body mass, g is gravitational acceleration, h is the vertical height of the stairs and t is the time taken, according to Bean et al. [18].
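As a worked illustration of P = m × g × h/t, the following sketch computes stair climb power. The value of g (9.81 m/s²), the function name and the example inputs (body mass, stair height, time) are assumptions chosen for illustration; they are not data from the study.

```python
G = 9.81  # gravitational acceleration in m/s^2 (assumed standard value)


def stair_climb_power(mass_kg, height_m, time_s, g=G):
    """Return stair climb power in watts: P = m * g * h / t."""
    if time_s <= 0:
        raise ValueError("time must be positive")
    return mass_kg * g * height_m / time_s


# Hypothetical subject: 75 kg, total stair height 1.7 m, climbed in 5 s.
power_w = stair_climb_power(75.0, 1.7, 5.0)
print(f"{power_w:.1f} W")
```

Because power is inversely proportional to time, halving the climbing time doubles the calculated power for the same subject and staircase.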

The timed up & go test (TUG): The TUG is an established test in community-dwelling older people to reliably assess functional mobility and its clinical change over time [23]. TUG was measured in seconds, and the results were evaluated according to Podsiadlo and Richardson [7].

Short physical performance battery (SPPB): SPPB included the five-times chair rise test (5TCR), 4-m gait speed (4mGS) and balance tests (semi-tandem stand, tandem stand) according to Guralnik et al. [8]. The time (in seconds) taken to perform each component and the cumulative score were used to assess the results.

The six minute walking test (6mWT): 6mWT is a common test for assessing functional exercise capacity and endurance performance over a period of time [24]. In this study, the participants were instructed to walk continuously at their individual habitual pace along a 20 m corridor until the tester asked them to stop; distances were recorded in meters (m). For safety reasons, the instructor accompanied the subject for the full distance. The time taken was measured in seconds using a stopwatch.

Statistical analysis

To describe the cohort, categorical data are given as absolute numbers and percentages, and continuous data as the mean, standard deviation, minimum, maximum and 1st to 3rd quartiles. Follow-up results are presented as percentage changes, with a negative sign indicating a deterioration of physical performance and a positive sign or no sign indicating an improvement. The Friedman test, a non-parametric test for repeated measurements, was used to detect differences across the repeated assessments. A p value of ≤ 0.01 was considered statistically significant.
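To make the Friedman test concrete, the sketch below computes the test statistic for three repeated measurements per subject. It is an illustration only and not the software routine used in the study: it omits the tie correction that statistical packages usually apply, and it relies on the fact that, with three time points, the chi-square approximation has 2 degrees of freedom, for which the p value is exp(−statistic/2). The function names and the example data are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1 to k; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1..end+1
        for j in range(pos, end + 1):
            ranks[order[j]] = avg
        pos = end + 1
    return ranks


def friedman_three_timepoints(rows):
    """rows: one (t0, t1, t2) tuple per subject. Returns (statistic, p)."""
    n, k = len(rows), 3
    if n == 0 or any(len(row) != k for row in rows):
        raise ValueError("need at least one subject with exactly three values")
    rank_sums = [0.0] * k
    for row in rows:
        for j, rank in enumerate(average_ranks(list(row))):
            rank_sums[j] += rank
    stat = 12.0 / (n * k * (k + 1)) * sum(s * s for s in rank_sums) - 3.0 * n * (k + 1)
    p_value = math.exp(-stat / 2.0)  # chi-square survival function for 2 df
    return stat, p_value


# Hypothetical timed up & go results (s) at t0, t1 and t2 for six subjects.
tug = [(8.1, 8.4, 9.0), (7.5, 7.9, 8.2), (9.0, 9.3, 9.9),
       (6.8, 7.0, 7.7), (8.8, 9.1, 9.6), (7.2, 7.6, 8.3)]
stat, p_value = friedman_three_timepoints(tug)
print(stat, p_value, p_value <= 0.01)
```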

Reduction of the number of assessments

Principal component analysis (PCA) is an exploratory multivariate procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of linearly uncorrelated variables called principal components. The idea of this analysis is to “reduce the dimensionality of a data set, which consists of a large number of interrelated variables, while retaining as much as possible of the variation present in the data set” [25]. In the present study, PCA was applied to identify the most relevant components of physical performance, and therefore the most relevant assessments, and to reduce the number of variables of the comprehensive geriatric assessment. Non-metric measures such as the categorical variable “semi- and tandem stand” were excluded from PCA [26].
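The principle of PCA can be sketched as follows. This is an illustration under stated assumptions, not the procedure run in the statistical software used for the study: the variables are standardized, the components are extracted from the correlation matrix without rotation, and the data are simulated (three correlated "time" variables and one independent "strength" variable). The function name `pca_loadings` is hypothetical.

```python
import numpy as np


def pca_loadings(X, n_components=2):
    """PCA on the correlation matrix.

    Returns (loadings, explained_ratio); a loading is the correlation
    between an original variable and a principal component.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize columns
    corr = (Z.T @ Z) / (len(Z) - 1)                    # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(corr)            # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]   # largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs * np.sqrt(eigvals)
    return loadings, eigvals / corr.shape[0]


# Simulated data: three mobility-like variables and one strength-like variable.
rng = np.random.default_rng(0)
mobility = rng.normal(size=200)
strength = rng.normal(size=200)
X = np.column_stack([
    mobility + 0.3 * rng.normal(size=200),
    mobility + 0.3 * rng.normal(size=200),
    mobility + 0.3 * rng.normal(size=200),
    strength,
])
loadings, explained = pca_loadings(X)
print(np.round(np.abs(loadings), 2))  # the sign of a component is arbitrary
print(np.round(explained, 2))
```

In this simulated example the three correlated variables load on the first component and the independent variable on the second, which mirrors the idea of a mobility axis and a strength axis.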

Sub-group identification

The data were visualized using vector graphs. A k-means cluster analysis [27] based on the t2 data was used to divide n observations into k clusters in which each observation belongs to the cluster with the nearest mean. In the present analysis, clustering was used to group subjects with comparable physical function and thereby to distinguish subjects with different levels of function according to the reduced set of assessments. The appropriate number of clusters was decided based on content, as described later.
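The k-means procedure described above can be sketched with a basic implementation of Lloyd's algorithm. This is a minimal illustration and not the routine used in the study: the random initialization, the stopping rule, the function name `kmeans` and the two-dimensional example points are all assumptions.

```python
import numpy as np


def kmeans(points, k, n_iter=100, seed=0):
    """Basic k-means: assign each point to the nearest mean, then update means."""
    pts = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    centers = pts[rng.choice(len(pts), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(pts[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            pts[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment so that labels always match the returned centers.
    dists = np.linalg.norm(pts[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1), centers


# Hypothetical positions in a two-dimensional (mobility, strength) plane.
positions = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centers = kmeans(positions, k=2)
print(labels)
print(centers)
```

The cluster numbering produced by such an algorithm is arbitrary, so cluster labels have to be interpreted from the cluster centers rather than from the label values themselves.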

Individual predictive value

Three hypothetical predictors were developed to identify subjects with low function (see “Results”: Cluster 4 contains the subjects with the lowest functional status). For Predictor 1, we used the t2 cluster centers of the cluster with the lowest physical function and of the adjacent clusters to determine the position and orientation of the dividing lines. We then plotted the positions of the subjects at t0 and identified which subjects were already in the low-function area (Cluster 4) at baseline. For Predictor 2, the mean t0 vector positions of the subjects in each cluster were calculated, and new dividing lines were derived from these mean values for the cluster with the lowest physical function and the adjacent clusters. The individual positions of the subjects at baseline (t0) were then plotted and matched to the cluster of lowest physical function. For Predictor 3, the delta values between t0 and t1 were examined, and subjects were considered if they deteriorated in terms of strength (y-axis) and mobility (x-axis); otherwise, the procedure was identical to that of Predictor 2. Sensitivity and specificity for identifying subjects with the lowest physical function (“test positive”) were calculated for all three predictors.
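Sensitivity and specificity follow directly from the counts of correctly and incorrectly classified subjects. The sketch below uses the standard definitions; the function names are illustrative. The sensitivity example uses the counts reported in the Results (23 subjects in Cluster 4, of whom Predictor 2 identified 22, Predictor 1 missed 11 and Predictor 3 missed 12). No specificity example is given because the underlying counts are not stated in the text.

```python
def sensitivity(true_pos, false_neg):
    """Share of actual positives (Cluster 4 subjects) that the predictor flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of actual negatives (all other subjects) that the predictor clears."""
    return true_neg / (true_neg + false_pos)


cluster4_size = 23
missed = {"Predictor 1": 11, "Predictor 2": 1, "Predictor 3": 12}
for name, fn in missed.items():
    value = sensitivity(cluster4_size - fn, fn)
    print(f"{name}: {value:.0%}")
```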


Statistical analysis was performed using IBM SPSS Statistics for Windows, version 25.0 (IBM Corp., Armonk, NY; released 2017) and Matlab R2019 (The MathWorks Inc.).


We included 251 participants (mean age 75.4 years) in the study at baseline: 148 (59%) women and 103 (41%) men. High blood pressure was present in 51.8%, diabetes in 8.4%, past stroke and COPD in 6.8%. Table 1 shows the baseline and follow-up characteristics. Four (1.6%) subjects dropped out of the study after six months, and a further 19 after 24 months (23 in total, 9.1%). The majority of the subjects (n = 196, 78.1%) had an age-associated normal BMI between 20 and 30 kg/m²; one (0.4%) subject was malnourished (BMI < 20 kg/m²); and n = 54 (21.5%) were obese (BMI > 30 kg/m²). According to the established geriatric thresholds, n = 53 (21.1%) of the subjects exhibited a low handgrip strength at baseline, n = 67 (26.7%) scored fewer than 3 points in the 5TCR test and none of the subjects needed more than 20 s for the timed up & go test (in accordance with the inclusion criteria) (data not shown).

Table 1 Characteristics at the baseline (t0), after six months (t1) and after 24 months (t2)

Table 2a, b show percentage changes in physical performance tests compared to the baseline for the first and second follow-up, respectively, with the highest percentage changes at the beginning. The highest percentage changes in the first follow-up occurred in 5TCR [2.18 (SD 17.41)%] and 6mWT [1.70 (SD 8.18)%], and in the second follow-up in HGS [−16.95 (SD 11.55)%] and SCPTT [−9.15 (SD 16.84)%]. A negative sign indicates a decline of function; a positive sign or no sign indicates an improvement.

Table 2 Changes of physical performance (percentage)

According to the Friedman test, the data differed significantly across the three measurements (significance level of p < 0.01, data not shown), except for BMI, SPPB and the SPPB tandem stand test. Only subjects with complete data at all three time points were included in the analysis (n = 208).

Table 3 shows the results of PCA (t2 data) to identify the most relevant variables (assessments) for describing physical performance. BMI, SPPB and the SPPB semi-tandem stand test were excluded from further analysis because they did not differ over time or they were ordinally scaled.

Table 3 Principal component analysis (PCA) at (t2)

Two main components were identified for the 24-month follow-up: first (x-axis), a combined time axis strongly associated with mobility, measured via the variables SCPTT, TUG and 4mGS; second (y-axis), a component dominated by HGS. Variables were integrated into a component if their value exceeded the cut-off of 0.8.

The vector positions at t2 were analysed by k-means cluster analysis to identify sub-groups of comparable physical function. We decided to continue the analysis with five clusters for content-related reasons, as no decisive quality criterion was available. With three clusters, the clusters included only men or only women. With four clusters, the delta values (t0–t1) of physical performance differed only weakly. Six clusters resulted in an uneven distribution in terms of the number of subjects in each cluster. The silhouette coefficient, an established quality criterion for cluster analysis, was approximately comparable for all numbers of clusters (approx. 0.5 on a scale between −1 and +1) (data not shown). Figure 1 presents the trajectories of physical function of all subjects at all three time points (t0, t1 and t2) in the new two-dimensional coordinate system derived from the PCA. The first follow-up (t0t1) is always shown as a black arrow, the second follow-up (t1t2) in a color that depends on cluster membership (see the legend).

Fig. 1 Vector graph showing physical function at all three time points (black line t0t1, t1t2 colored according to the five clusters)

Table 4 shows the characteristics of the subjects from the five clusters. Cluster 4 is characterized by the highest age of the subjects (79.2 years) and a high percentage of women (87%). Regarding physical function, all tests showed the lowest level in this cluster in comparison with the other clusters.

Table 4 Baseline (t0) characteristics according to cluster membership [mean (SD)]

We calculated the sensitivity and specificity of the three predictors by testing how many subjects would already have been assigned to cluster 4 at baseline (see Fig. 2). Because cluster 4 contains the subjects with the lowest physical function, this approach yields the predictive value of the three predictors. Figure 2 shows the dividing lines of predictor 1 (blue) and predictor 2 (red); predictor 3 was defined as a negative delta value of the vector positions from t0 to t1 (i.e., a deterioration of physical function) in addition to the dividing lines of predictor 2, so it has no dividing lines of its own. Cluster 4 is colored magenta. As indicated in the legend, subjects who were plotted at baseline and identified as positive by predictor 1, 2 or 3 are colored differently (blue for predictor 1, red for predictor 2 and green for predictor 3). In addition to the figure, Table 5a, b, c present the results for sensitivity and specificity. Predictor 2 showed the highest sensitivity, as 22 of the 23 subjects in cluster 4 were also identified at baseline. Predictor 1 missed 11 subjects and predictor 3 missed 12, corresponding to sensitivities of 52% and 48%, respectively. To judge assessment results in clinical settings with regard to the risk of a person i being a low performer (“cluster 4”), the following coordinates [t0x(i), t0y(i)] have to be calculated:

Fig. 2 Vector graph showing physical function at all three time points. The relevant cluster is presented (t1t2) in magenta. The dividing lines of predictor 1 and predictor 2 are shown in blue and red, respectively. As indicated in the legend, the subjects identified by the three predictors are presented in different colors. As predictor 3 is defined as predictor 2 with the additional condition of negative delta values from t0 to t1, no separate dividing lines can be shown for it. The subjects identified by predictor 2 (n = 22) include those identified by predictor 1 (n = 12); the subjects identified by predictor 3 (n = 11) are a subset of those identified by predictors 1 and 2

Table 5 Predictor 1, Predictor 2, Predictor 3

t0x(i) = 0.912 × SCPTT + 0.905 × TUG + 0.887 × 4mGS and t0y(i) = 0.836 × HGS.

Based on this individual coordinate of a person, the following conditions have to be checked in order to decide on cluster 4 membership:

xmin = (t0y(i) − 97.282)/(−4.52) < t0x(i) and ymin = 1.887 × t0x(i) − 17.931 > t0y(i).
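The coordinates and the two conditions can be combined into a small check, sketched below. Several points are assumptions and are not stated in the text: that the test results enter the equations as measured (without prior standardization), that SCPTT, TUG and 4mGS are times in seconds, and that HGS is given in the unit used in the study. The function names and the example values are hypothetical.

```python
def baseline_coordinates(scptt, tug, gait_4m, hgs):
    """Return (t0x, t0y) using the coefficients given in the text."""
    t0x = 0.912 * scptt + 0.905 * tug + 0.887 * gait_4m
    t0y = 0.836 * hgs
    return t0x, t0y


def in_cluster4_area(t0x, t0y):
    """Check both dividing-line conditions for cluster 4 membership."""
    x_min = (t0y - 97.282) / -4.52   # first dividing line, solved for x
    y_min = 1.887 * t0x - 17.931     # second dividing line, value of y at t0x
    return x_min < t0x and y_min > t0y


# Hypothetical assessment results (not data from the study).
x, y = baseline_coordinates(scptt=7.0, tug=10.0, gait_4m=5.0, hgs=18.0)
print(round(x, 3), round(y, 3), in_cluster4_area(x, y))
```

Both conditions have to hold at the same time: the point must lie to the right of the first dividing line and below the second.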


The purpose of this research was to describe the longitudinal physical performance of older community-dwelling adults above the age of 70 over 2 years, and to develop a data-driven model that reduces the comprehensive geriatric assessment to the most relevant tests for predicting individual physical decline among the subjects of the observational study VERSA within the primary prevention project AEQUIPA. To this end, inclusion criteria were initially selected with the idea that the subjects should be able to participate in interventions outside the home; we therefore addressed the fittest older adults in our area. The cohort of 251 community-dwelling people (mean age 75.4 years) initially exhibited physical limitations only occasionally. Within the observational period of two years, the majority of the cohort were physically stable in terms of strength, power, balance and endurance, starting from a high level of performance for this age group. Considering the mean values of the present assessment, handgrip strength in women after two years was close to the BMI-associated cut-offs for low HGS [22]. In the 5TCR, the subjects initially did not, on average, achieve the maximum SPPB score of 4 points, which requires a duration of under 11.7 s: the mean was 12.4 s initially and 11.1 s after 2 years. The variables with the highest percentage changes after 6 months were 5TCR and 6mWT; those after 24 months were HGS and SCPTT. The divergent development of 5TCR over 2 years has to be discussed. Generally, the regular assessment of physical performance should be considered a low-grade intervention because it motivates the largely fit cohort of older people to perform well in the tests. We assume that the subjects started to exercise before the study commenced, as we experienced a highly motivated and interested study group. However, the effect was not sustained throughout the study period and was only observed for 5TCR by the end of the study.

Many tests of the geriatric assessment exhibit ceiling effects in high performers. Nevertheless, the developed method enabled us to identify the subjects with the lowest function. The subjects with the lowest values of physical performance were identified by clustering (Cluster 4) based on a reduced comprehensive geriatric assessment containing the most relevant mobility and strength tests (SCPTT, TUG, 4mGS and HGS). These subjects were close to the threshold for relevant functional decline, or exceeded it; the majority were women (87%). They exhibited a greater decline of HGS (y-axis) than men, and started out, as expected, from a lower level. We developed a predictor that identifies, with a very high degree of sensitivity (96%) and specificity (82%), subjects who were grouped in Cluster 4. This predictor is based on baseline values: we used the mean values of the baseline vectors, grouped by cluster membership at t2, to predict assignment to Cluster 4 (which was defined on the basis of follow-up data). Regarding clinical relevance, a patient’s data on HGS, SCPTT, TUG and 4mGS can be judged using the presented equations, which describe the conditions for being located within the dividing lines of the predictor; measurements taken in clinical settings can thus be assessed with regard to the risk of low physical function. “Cluster 4 subjects” show the highest deterioration and the lowest baseline levels in terms of function. Since the data were collected within a primary prevention project, no endpoints such as “hospitalization”, “death” or “sustainable disability” were relevant, and they occurred only rarely. Identifying older adults in need of prevention at an early stage is most important in this context. After all, identifying a group for which participating in a fitness program (e.g. FITT) could prevent deterioration was the key downstream target.
Further intervention studies are required to determine whether intervening at an early stage in the identified group by encouraging them to participate in fitness programs would prolong their independence and increase their quality of life. We believe that by choosing specific tools from the established geriatric assessment, which was developed for use in an extremely heterogeneous population, a high predictive quality can also be achieved for a functionally high-performing group of older adults. Technology-based approaches also offer high potential in this regard [28]. Identifying persons at risk in this apparently competent group would enable primary prevention interventions to postpone functional disability and the need for care.