Introduction

Cardiorespiratory fitness (CRF) is established as a strong predictor of health (Kodama et al. 2009; Harber et al. 2017). A single measurement of CRF is a stronger predictor of mortality than high blood pressure, smoking, obesity and type 2 diabetes (Myers et al. 2002). CRF typically decreases with age (Betik and Hepple 2008); the rate of decline accelerates at 45 years and is even faster at 65 years (Jackson et al. 2009). Decreased CRF in the elderly can significantly impair functional capacity in everyday life and increase the risk of cardiovascular mortality (Kokkinos et al. 2010). From a health perspective, it is therefore particularly useful to monitor CRF in the elderly.

The maximal oxygen uptake (VO2max) test is the gold standard (Fletcher et al. 2001) for measuring CRF. Performing a VO2max test is time-consuming and requires expertise and expensive ventilatory gas-exchange equipment. The test also requires the participant to perform a maximal effort, which can be intimidating for some parts of the population. It is especially challenging for an elderly population prone to abnormal gait (Mahlknecht et al. 2013), impaired balance (Lin and Bhattacharyya 2012), and muscular weakness (Julius et al. 1967). In addition, the elderly are often more apprehensive about performing maximal effort than younger age groups. Since CRF is such an important predictor of health outcomes, increasing its availability may enable identification of elderly individuals with low VO2max in need of medical care or lifestyle interventions. The American Heart Association has stated that CRF should be used as a clinical evaluation tool (Ross et al. 2016).

Submaximal tests estimate VO2max based on heart rate response at one or more submaximal work rates (Noonan and Dean 2000). The Åstrand test (Å-test) (Astrand and Ryhming 1954; Astrand 1960) is one of the most commonly used submaximal cycle ergometer tests and utilizes the heart rate response to one submaximal work rate. This test has been validated for a population up to 65 years. The validity of this method for individuals older than 65 years is to a large extent unknown.

Another predictive submaximal VO2max ergometer cycle test is the Ekblom-Bak (EB) test (Ekblom-Bak et al. 2014; Bjorkman et al. 2016). The EB-test consists of exercise at one standardized, low work rate followed by a higher, individually set work rate. This test has been developed and validated in a mixed sample of men and women (aged 20–86 years), with a wide range of VO2max (ranging from 1.33 to 5.97 L min−1). This test has shown reasonably strong validity (Ekblom-Bak et al. 2014; Bjorkman et al. 2016) with a coefficient of variation (CV) of 9.2% and 8.4% for women and men, respectively and a standard error of the estimate (SEE) of 0.24 L min−1 and 0.31 L min−1 for women and men, respectively.

Since neither test has been validated for use in an elderly population of men and women with a lower VO2max, the aim of this study was to validate the submaximal EB-test and Å-test in an elderly (> 65 years) population, using directly measured VO2max as a reference. Based on previous research, we hypothesized that both the EB-test and the Å-test would give valid estimates of VO2max in an elderly population.

Methods

Participants

Participants were recruited through local newspapers and flyers. Exclusion criteria were severe joint problems, very high blood pressure, or other cardiovascular problems, psychiatric illness or neurological disease. Of the initially screened 170 volunteers, 120 performed the required maximal and submaximal tests for the present study. After being assessed for the VO2max criteria and the submaximal heart rate criteria for the Å method, a final sample of 104 participants was included (52 women and 52 men, age range 65–75 years with a mean age of 70.6 ± 2.9 years). Prior to undertaking the physical tests, participants answered a single item categorical answer mode questionnaire (Olsson et al. 2016), where they self-rated moderate to vigorous physical activity.

All participants visited the Åstrand Laboratory at the Swedish School of Sport and Health Sciences on one occasion to perform the submaximal and maximal testing. Test duration was approximately 60 min. Each participant provided written consent before the start of the tests. The present study is a part of a larger study, which was approved by the Stockholm ethical committee (2017/1115-31/4). Table 1 shows participant characteristics.

Table 1 Anthropometry and physiological characteristics of the study sample, mean ± standard deviation

Submaximal and maximal aerobic tests

Participants were instructed to abstain from eating for 90 min and not to consume caffeine or nicotine for 2 h prior to testing. Furthermore, they were instructed not to perform any heavy or prolonged physical activity the day before or on the day of the test. The test started with participants resting in a seated position for 15 min, during which they received oral information about the test procedure. Instructions were also given on how to use Borg's scale of perceived exertion (Borg 1970). Participants were equipped with a heart rate (HR) monitor (Polar model H7, Kempele, Finland) and watch (Polar model M400, Kempele, Finland). The test was initiated at resting heart rate.

Submaximal test

The submaximal test was performed on a mechanically braked cycle ergometer (Monark model 828E, Varberg, Sweden). Participants were instructed to pedal at a cadence of 60 RPM and not to speak or adjust position for the duration of the test. Total duration of the test was 8 min, with the initial 4 min performed at a fixed work rate of 0.5 kiloponds (kp), directly followed by 4 min at a higher individualized work rate that varied between 1 and 3 kp. The individualized work rate was subjectively chosen by the test leader with regard to gender, body size, training background, and training status. Mean HR was recorded for the final minute of each work rate (calculated as the average of the heart rate recorded at 3:15, 3:30, 3:45, and 4:00). At the high work rate, if the Borg RPE was lower than 12 after the first minute, the load was further increased and the test duration was increased by 1 min.

Maximal test

A maximal incremental treadmill test was performed directly following the submaximal test. A maximal test uses the directly measured gas-exchange during an incremental effort to assess the gas-exchange threshold and hence the oxygen consumption (VO2max). To measure gas-exchange (O2 and CO2), a mask with a flow meter connected to a gas analyzer (Jaeger Oxycon Pro, Hoechberg, Germany) was used. Flow meter volume and gas were calibrated prior to each test using a precision gas mixture (15.00 ± 0.01% O2 and 6.00 ± 0.01% CO2, Air Liquid, Kungsängen, Sweden) and ambient indoor air. All participants wore a safety harness attached to the roof when performing the maximal treadmill test.

Before the maximal test, participants were allowed a short rest (~ 2 min) and a familiarization/warmup session (~ 1–2 min) followed by another short rest (~ 1–2 min). When performing the familiarization/warmup session, participants initially walked at an individualized comfortable speed and 1° inclination and progressed to an individualized higher speed and incline. The maximal incremental VO2max test started at a comfortable pace and 1° inclination, most often at a walking speed of 3.5–5.0 km/h. The protocol for each participant was individually set, with the aim of reaching respiratory exchange ratio (RER) 1.0 at 5–6 min and RER 1.1 at 6–8 min. A few participants were unable to run due to previous injuries and instead walked at a moderate speed and steep inclination. At the end of the maximal incremental test, speed and inclination were increased more frequently to ensure that a maximal plateau of oxygen consumption (leveling off) was reached. Criteria for an approved test were: a leveling off in VO2 despite an increase in speed or incline, an RER ≥ 1.1, RPE ≥ 17, and test duration ≥ 5 min. When the plateau criterion and two of the remaining three criteria were met, the test was accepted as a VO2max. A small coefficient of variation (2.7%) between a first and second test using the same protocol as mentioned above has previously been reported (Howley et al. 1995; Ekblom-Bak et al. 2014).
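The acceptance rule for the maximal test can be written as a short check. The following Python sketch is illustrative only: the class and function names are hypothetical, and it assumes that "two of the remaining three criteria" means at least two.

```python
from dataclasses import dataclass


@dataclass
class MaxTestResult:
    """Summary of one maximal treadmill test (field names are hypothetical)."""
    plateau: bool        # leveling off in VO2 despite increased speed or incline
    rer: float           # respiratory exchange ratio reached
    rpe: int             # Borg rating of perceived exertion reached
    duration_min: float  # test duration in minutes


def is_accepted_vo2max(test: MaxTestResult) -> bool:
    """Plateau is mandatory; at least two of the three secondary criteria must hold."""
    secondary = [test.rer >= 1.1, test.rpe >= 17, test.duration_min >= 5.0]
    return test.plateau and sum(secondary) >= 2
```

For example, a test with a plateau, RER 1.05, RPE 17 and a duration of 6 min would be accepted under this reading, whereas a test without a plateau would be rejected regardless of the other values.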

Data analysis

The EB-test calculations estimated VO2max (in L min−1) using the gender-specific equations (Bjorkman et al. 2016). Equation for women: ln VO2max = 1.84390 − 0.00673 (age) − 0.62578 (ΔHR/ΔPO) + 0.00175 (ΔPO) − 0.00471 (HR at standard work rate). Equation for men: ln VO2max = 2.04900 − 0.00858 (age) − 0.90742 (ΔHR/ΔPO) + 0.00178 (ΔPO) − 0.00290 (HR at standard work rate). The equations use the increase in heart rate (HR) in relation to the increase in work rate (PO), sex, age, and the HR at the lower and higher work rates. For the Å method, HR was taken from the higher work rate of the EB-test. The Å-test calculations estimated VO2max using the Åstrand nomogram and the Åstrand extrapolated age correction factors (with a decreasing factor from 65 years of 0.006 per year).
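As an illustration, the two EB equations can be evaluated with a few lines of Python. This is a minimal sketch using the coefficients quoted above; the function name and the example inputs are hypothetical, and it assumes that ΔPO is expressed in watts and HR in beats per minute, which the text does not state explicitly.

```python
import math

# Coefficients as quoted in the text (Bjorkman et al. 2016).
EB_COEFFS = {
    "women": {"intercept": 1.84390, "age": -0.00673, "slope": -0.62578,
              "dpo": 0.00175, "hr_std": -0.00471},
    "men": {"intercept": 2.04900, "age": -0.00858, "slope": -0.90742,
            "dpo": 0.00178, "hr_std": -0.00290},
}


def eb_vo2max(sex: str, age: float, hr_std: float, hr_high: float,
              delta_po: float) -> float:
    """Estimate absolute VO2max (L/min) from the EB equations.

    hr_std   : mean HR at the standard (low) work rate
    hr_high  : mean HR at the higher work rate
    delta_po : difference in work rate between the two stages
               (assumed to be in watts)
    """
    c = EB_COEFFS[sex]
    delta_hr = hr_high - hr_std
    ln_vo2 = (c["intercept"]
              + c["age"] * age
              + c["slope"] * (delta_hr / delta_po)
              + c["dpo"] * delta_po
              + c["hr_std"] * hr_std)
    return math.exp(ln_vo2)  # the equations predict ln(VO2max)


# Hypothetical example: a 70-year-old woman, HR 90 at the standard work rate,
# HR 120 at the higher work rate, and a 60 W difference between the stages.
print(round(eb_vo2max("women", 70, 90, 120, 60), 2))
```

Because the equations predict the natural logarithm of VO2max, the estimate is obtained by exponentiating the linear combination.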

For relative VO2max, the absolute value in L min−1 was divided by body weight and multiplied by 1000 to obtain ml kg−1 min−1. Low fit was defined as the quartile with the lowest measured VO2max in women and men, respectively. High fit was defined as the quartile with the highest measured VO2max in women and men, respectively.
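The unit conversion and the quartile grouping can be sketched as follows. The function names and the "mid" label are hypothetical, and the quartile cut-offs use the default method of Python's `statistics.quantiles`, which is an assumption since the text does not specify how the quartiles were computed.

```python
import statistics


def relative_vo2max(abs_l_min: float, body_mass_kg: float) -> float:
    """Convert absolute VO2max (L/min) to relative VO2max (ml/kg/min)."""
    return abs_l_min / body_mass_kg * 1000.0


def fitness_labels(measured: list[float]) -> list[str]:
    """Label each value as lowest quartile, highest quartile, or in between.

    Intended to be applied separately to women and men.
    """
    q1, _, q3 = statistics.quantiles(measured, n=4)
    labels = []
    for value in measured:
        if value <= q1:
            labels.append("low fit")
        elif value >= q3:
            labels.append("high fit")
        else:
            labels.append("mid")
    return labels
```

For instance, an absolute VO2max of 1.89 L min−1 in a person weighing 63 kg corresponds to 30 ml kg−1 min−1.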

Statistical analysis

SPSS Inc (Chicago, IL, US) was used for all statistical analyses. Descriptive data are presented as mean ± SD (range). A Shapiro–Wilk test was used to determine normal distribution, which was present for all tested parameters.

Pearson’s correlation coefficient (r) was calculated between the variables estimated from the submaximal test and those directly measured during the maximal test. Paired Student’s t tests were used to determine differences between measured and estimated VO2max. To determine whether validity differed across fitness levels, HR levels, or self-reported physical activity levels, Pearson’s or Spearman’s correlation coefficients were calculated between the difference of estimated and measured VO2max and VO2max, maximal heart rate, and self-rated physical activity. We regarded the Pearson’s r and Spearman’s ρ as weak (< 0.10), modest (0.1–0.3), moderate (0.3–0.5), strong (0.5–0.8), or very strong (0.8–1.0). Standard errors of the estimate (SEE) were derived from a linear regression model to show the variation around the regression line. To determine the variation in relation to its mean, we used the coefficient of variation (CV), which was calculated as the SD of the differences between measured and estimated VO2max divided by the mean of the two methods. 95% confidence intervals (95% CI) were calculated for the difference between estimated and measured VO2max. Limits of Agreement (LoA) were calculated using the equation: mean of the difference between estimated and measured VO2max ± 1.96 multiplied by the SD of the difference between the two methods. LoA are expected to include 95% of the differences between the two measurement methods. Significance level was set at p < 0.05 and we regarded 0.05 ≤ p < 0.1 as a trend (Curran-Everett and Benos 2004).
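The agreement statistics described above can be illustrated with a short Python sketch. It is not the SPSS procedure used in the study; the function name is hypothetical, the sample SD (n − 1) is used, and the SEE is computed by regressing measured on estimated VO2max, which is an assumption because the text does not state which variable was the dependent one.

```python
import math
import statistics


def agreement_stats(estimated: list[float], measured: list[float]) -> dict:
    """Bias, limits of agreement, CV (%) and SEE for paired VO2max values."""
    n = len(measured)
    diffs = [e - m for e, m in zip(estimated, measured)]
    bias = statistics.mean(diffs)          # mean difference (estimated - measured)
    sd_diff = statistics.stdev(diffs)      # sample SD of the differences

    # Limits of agreement: mean difference +/- 1.96 * SD of the differences.
    loa = (bias - 1.96 * sd_diff, bias + 1.96 * sd_diff)

    # CV: SD of the differences relative to the mean of the two methods.
    mean_est = statistics.mean(estimated)
    mean_meas = statistics.mean(measured)
    cv_percent = 100.0 * sd_diff / ((mean_est + mean_meas) / 2.0)

    # SEE: residual standard deviation of a simple linear regression
    # (measured regressed on estimated; an assumption, see lead-in).
    sxx = sum((x - mean_est) ** 2 for x in estimated)
    sxy = sum((x - mean_est) * (y - mean_meas)
              for x, y in zip(estimated, measured))
    slope = sxy / sxx
    intercept = mean_meas - slope * mean_est
    ss_res = sum((y - (intercept + slope * x)) ** 2
                 for x, y in zip(estimated, measured))
    see = math.sqrt(ss_res / (n - 2))

    return {"bias": bias, "loa": loa, "cv_percent": cv_percent, "see": see}
```

With a constant offset between the two methods, the bias equals the offset while the SD of the differences, and therefore the width of the LoA and the CV, is zero. This is why bias and precision are reported separately.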

Results

Mean measured VO2max in women was 1.89 ± 0.22 L min−1 and mean estimated VO2max was 1.88 ± 0.3 L min−1 using the EB method and 1.76 ± 0.44 L min−1 using the Å method. In men, mean measured VO2max was 2.83 ± 0.4 L min−1 and mean estimated VO2max was 2.88 ± 0.3 L min−1 using the EB method and 2.54 ± 0.55 L min−1 using the Å method.

There were no significant differences between estimated and measured VO2max in either women or men for the EB method. However, the Å method significantly underestimated women’s fitness by − 0.12 L min−1 (95% CI − 0.22 to − 0.02) and men’s fitness by − 0.28 L min−1 (95% CI − 0.42 to − 0.14) (Table 2 and Fig. 1). LoA for women were − 0.43 to 0.39 L min−1 when using the EB method and − 0.83 to 0.59 L min−1 when using the Å method. Corresponding values for men were − 0.60 to 0.70 L min−1 and − 1.28 to 0.71 L min−1, respectively (see online resources 1 to 4).

Table 2 Validity of the EB-test and the Å-test, absolute VO2max, mean ± standard deviation
Fig. 1 The submaximal EB method and the Å method vs. absolute (L min−1) and relative (ml kg−1 min−1) measured VO2max. The correlation coefficients and equations in the figure are for both genders together. Correlation coefficient, r, for absolute estimated VO2max in women was 0.64 (EB method) and 0.55 (Å method). Correlation coefficient, r, for absolute estimated VO2max in men was 0.44 (EB method and Å method). Correlation coefficient, r, for relative estimated VO2max in women was 0.81 (EB method) and 0.70 (Å method). Correlation coefficient, r, for relative estimated VO2max in men was 0.56 (EB method) and 0.49 (Å method)

CV was lower for the EB method than for the Å method in both women (11.1% vs. 19.8%) and men (11.6% vs. 18.9%), accompanied by a smaller SEE in both women (0.20 L min−1 vs. 0.36 L min−1) and men (0.25 L min−1 vs. 0.50 L min−1) (Table 2).

Similar tendencies were found for relative estimated VO2max (ml kg−1 min−1) as for estimated absolute VO2max (L min−1) (Table 3). The EB method showed no bias for women (− 0.02 ml kg−1 min−1, 95% CI − 0.08 to 0.04) or men (0.05 ml kg−1 min−1, 95% CI − 0.04 to 0.14), while the Å method significantly underestimated women (− 0.12 ml kg−1 min−1, 95% CI − 0.22 to − 0.02) and men (− 0.28 ml kg−1 min−1, 95% CI − 0.42 to − 0.14). CV for the EB method and the Å method was 11.1% vs. 19.8% for women and 11.6% vs. 18.9% for men, respectively.

Table 3 Validity of the EB-test and the Å-test, relative VO2max, mean ± standard deviation

In women, the estimation error (difference between estimated and measured VO2max) for the EB method displayed a trend toward being associated with VO2max level (p = 0.051, r = − 0.27) and self-rated physical activity level (p = 0.059, ρ = 0.26). It was also significantly associated with maximal HR level (p < 0.01, r = − 0.60). In men, the estimation error was associated with VO2max level (p < 0.01, r = − 0.67) and maximal HR level (p < 0.05, r = − 0.32), but not with self-rated physical activity level (p = 0.22, ρ = − 0.17). Estimation error for the Å method in women was not associated with VO2max level (p = 0.50, r = − 0.10), but was associated with maximal HR level (p < 0.01, r = − 0.654) and self-rated physical activity level (p < 0.05, ρ = 0.28). In men, estimation error was not associated with VO2max level (p = 0.17, r = − 0.19) or self-rated physical activity level (p > 0.05, ρ = 0.05), but was associated with maximal HR level (p < 0.01, r = − 0.60).

Discussion

The main finding was that there was good agreement between measured and estimated VO2max when using the EB method in a population of elderly men and women. In addition, there was good agreement between estimated, using the EB method, and measured VO2max over the full VO2max spectrum for women. Low fit men were overestimated and high fit men were partly underestimated using the EB method. The Å method significantly underestimated VO2max in both women and men. Precision expressed as CV and SEE for the EB method was almost half that of the Å method in both men and women.

No previous studies have validated the EB or the Å-test in a large elderly population (> 65 years). However, there are other submaximal tests commonly used for the elderly and one of them is the 6 min walk test (6MWT). The 6MWT was developed to estimate VO2max from a single test (Ebbeling et al. 1991). This test has shown varying results in individuals with cardiopulmonary disorders (r = 0.21–0.70, mean r = 0.59) (Ross et al. 2010) and does not seem to be a valid test for relatively healthy elderly populations (r between estimated and measured VO2max for women was not significant, men r = 0.8) (Andersson et al. 2011). The 6MWT is easy to perform and has a high correlation with measured VO2max for elderly men but not for elderly women, and the test has a high variability with a relative SEE of ~ 27% (Ross et al. 2010). The 5 min pyramid test (5MPT) has a strong correlation to measured VO2max in elderly women (r = 0.78) and a very strong correlation for elderly men (r = 0.98) (Andersson et al. 2011). However, the 5MPT is a maximal test where factors such as motivation and anaerobic capacity may impact results. In comparison, both the EB-test and Å-test are performed submaximally, making them more accessible for populations that are not willing or able to perform maximal effort. The EB method and the Å method had a relative SEE of 11.4% and 20.5%, respectively, in the present study which is better than in the 6MWT.

Although estimated VO2max using the EB method agreed with measured VO2max in the overall male and female population, estimated VO2max for men with low fitness was found to be overestimated. In a previous validation study, including young girls and boys (age 10–15 years), estimated VO2max in pre-pubertal boys with similar levels of measured VO2max as the low fitness men in the present study was also found to be significantly overestimated (Bjorkman et al. 2018). However, when the EB equation for estimating VO2max in women was applied to the pre-pubertal boys in the Bjorkman et al. study, a significantly higher validity and agreement with measured VO2max was seen. Hence, we reanalyzed the data from the low fitness men using the EB equation for women. This resulted in a decreased estimation error to non-significant levels [0.04 L min−1 (− 0.02–0.09)] and a subsequent higher correlation in all men (r = 0.73 using the EB equation for women, compared to r = 0.44 using the EB equation for men). However, it was not possible to distinguish the men with low cardiorespiratory fitness using any other variables in the current study other than measured VO2max.

We speculate that a decline in testosterone levels with age could affect physiological variables (DeFina et al. 2018; Hosick et al. 2018; Kelsey et al. 2014), which may influence results when using the EB equation for men. Another explanation might be that both pre-pubertal boys and elderly men with low absolute VO2max are outside or at the lower end of the VO2max range in the EB sample for men in Bjorkman et al. (2016), indicating a more uncertain estimation of VO2max in all low fit men (< 2.55 L min−1). For clinical practice, the overestimation of elderly men in the present study with low absolute CRF means that elderly men with a general low fitness level and small body stature should be considered at risk of having their VO2max overestimated when using the EB-test. Therefore, it would be advantageous for this population to use the EB equation for women.

The Å-test was initially developed for 18–30 year old fairly well-trained individuals (Astrand and Ryhming 1954). Later, an age correction formula was developed to be able to apply the test to a wider range of ages (Astrand 1960). However, even with the age correction formula, many studies have reported that the Å-test still underestimates VO2max (Jessup et al. 1977; Jette 1979; Kasch 1984; Hartung et al. 1993), in agreement with the present study on elderly. This could be partly due to the Å-test having been developed with data from both maximal cycle and treadmill tests, since measured VO2max tends to be lower when using a cycle ergometer compared to a treadmill (Hermansen et al. 1970). Moreover, in 1960, when the age correction factor was developed, the creators raised a concern that it may underestimate the VO2max of older adults by 10% since they displayed lower lactate levels compared to the young adults in the study (Astrand 1960). This led to the belief that the older participants may not have reached their true VO2max. Another important factor that could go toward explaining the underestimation of the Å-test in the present study is that the test was designed to be performed on two occasions to eliminate variables such as nervousness and other factors that affect the absolute submaximal HR level on the first test occasion. The EB and the Å-tests both assume a decline in VO2max with age, which is usually the case in larger populations (Jackson et al. 2009). A discrepancy between biological and chronological age is a source of error that increases the uncertainty of the EB-test and the Å-test.

In previous studies (Ekblom-Bak et al. 2014; Bjorkman et al. 2016), the difference between estimated and measured VO2max using the EB-test was not dependent on maximal HR level. However in the present study, the difference between estimated and measured VO2max was found to be dependent on maximal HR for both women and men, when using the EB method. Incidentally, the same result was also seen when using the Å method. The lack of agreement between the current and previous studies is possibly due to the age difference in the sample populations (elderly vs. age mixed), resulting in a higher number of individuals with a low maximal HR in the present study. The present study showed that an individual with low maximal HR has a higher risk of having their VO2max overestimated. In previous studies using the EB-test, the difference between estimated and measured VO2max was dependent on VO2max, as it was in the present study for men, while for women there was a tendency (p = 0.051). In other words, elderly men with low measured absolute VO2max are at risk of being overestimated and elderly men with high VO2max are at risk of being underestimated, when using the EB method. The difference between estimated and measured VO2max had a tendency toward a relationship (p = 0.059), for the EB method, and was significantly correlated, for the Å method, with self-rated physical activity for women, but not for men. This suggests that the self-rated single item questionnaire may be useful for further strengthening the female, but not the male, algorithms for the estimation of VO2max.

Strengths and limitations

A strength of the study is the modality with which the measured VO2max tests were undertaken, i.e. walking or running on a treadmill. It is a modality that most people are familiar with (Lear et al. 1999). A maximal test on a treadmill recruits a greater exercising muscle mass than a VO2max test performed on a cycle ergometer, where local fatigue in leg musculature could lead to an up to 20% lower VO2max (Myers et al. 1991). On the other hand, when performing a submaximal VO2max test, it is better to use a cycle ergometer because of the low variability in energy expenditure at a certain work rate between individuals (Ekblom and Gjessing 1968).

This study adds to the pool of studies investigating directly measured CRF in an elderly population. Mean VO2max in the present study sample was similar to an elderly group in a previous study using the EB-test (Bjorkman et al. 2016). In comparison, a large Norwegian study, where a sample (n = 129) of > 70 year olds was tested on a treadmill, reported similar VO2max levels (women 1.85 ± 0.35 L min−1, men 2.81 ± 0.5 L min−1) (Loe et al. 2013) to the present study. Other Nordic studies where VO2max was measured directly in the elderly using a cycle ergometer have shown slightly lower VO2max values (Andersson et al. 2011; Eriksen et al. 2016). A limitation in the present study could be the self-selection and exclusion criteria resulting in a selected sample of participants. Most likely, this resulted in the present sample being biased toward a higher CRF than would generally be seen in the elderly population in Sweden. The present findings indicate that the EB-test is a good test for the elderly population, but that population-based studies will ultimately be required to ensure generalizability.

Future perspectives

It has been shown that the EB-test can be used to monitor long-term changes in CRF in an age and gender mixed population (Bjorkman 2017). Future intervention studies are needed to evaluate the ability of the EB-test to also monitor CRF in an elderly population and to identify subtle changes that may result from health-promoting interventions such as physical activity on prescription (Kallings et al. 2008). Another important topic for future research is whether the EB-test is affected by certain medicines, such as beta-adrenergic blockers and stimulators, that affect the function of the cardiorespiratory system. Lastly, it would be advantageous to further study the relationship between the EB-test and measured VO2max in low fit men and thereafter possibly adjust the EB equation for better VO2max estimation.

Conclusion

The validity of the EB-test in an elderly population was satisfactory both in women and men combined and in women alone, but not in men. We found a moderate correlation between the EB method and measured VO2max in men; however, there was an overestimation of VO2max in men with low fitness. The moderate correlation and overestimation of VO2max in low fit men, in contrast to the good correlation and the similarity between estimated and measured VO2max in women, could be due to a gender difference in the physiological variables that affect VO2max with increasing age. Alternatively, the EB equation for men is unable to correctly estimate VO2max in men of all ages with low absolute VO2max. The Å method significantly underestimated VO2max in both women and men and had a variability that was almost twice that of the EB method. The current study therefore supports using both submaximal methods for population-based studies aiming to evaluate cardiorespiratory fitness as a predictor of health outcomes. On an individual level, the EB method appears suitable for estimating CRF in elderly women, but has insufficient precision for this purpose in elderly men. While the Å method was more accurate than the EB method at identifying the low fit men, its high variability still suggests that it should not be used alone for identifying individuals in need of lifestyle or medical support in elderly populations.