Introduction

Energy requirements of non-reproductive adults are defined as the amount of dietary energy needed to balance energy expenditure, so as to maintain body mass and composition and to support a level of physical activity consistent with long-term health1. When body mass is stable, energy requirements are equivalent to total energy expenditure (TEE)1. Accurate assessment of TEE is therefore crucial not only to determine nutritional needs, but also to understand many physiological, biological, and evolutionary processes2,3,4. The doubly labeled water (DLW) technique is the gold standard for measuring free-living TEE5. However, the isotopes and their quantification are costly, so other approaches are needed when DLW is unavailable or unaffordable.

Physical activity energy expenditure (PAEE) is the most variable component of TEE3,6, which makes the estimation of daily requirements challenging. Several approaches have been used to quantify the cost of physical activity, including behavioral observation, questionnaires, heart rate, and motion sensors7,8, but these have shown limited accuracy. The physical activity level (PAL) can also be calculated as the quotient of TEE and resting metabolic rate (RMR); conversely, TEE can be estimated as a multiple of RMR. The latter is a practical approach for controlling for age, sex, body mass, and body composition, and for expressing energy requirements across individuals varying in habitual physical activity1. The limitations of this approach have been discussed in previous studies9,10 and include the multiplication of errors in estimated RMR by PAL, the implicit assumption of a zero intercept in the TEE–RMR relationship (which is violated because the intercept is non-zero), and the assumption that PAEE depends on the same factors that influence RMR. Using a similar additive conception of TEE, others have added PAEE estimated from accelerometry to RMR11,12. However, this approach ignores sources of expenditure such as the thermic effect of food and thermoregulation, and it has been associated with considerable variability and limited accuracy13. More importantly, it assumes that activity expenditure is additive to resting expenditure and does not consider the possibility that TEE could be constrained4,14,15 or that compensation might occur16,17.

Several prediction equations have been developed in an attempt to provide accurate estimates of TEE9,10,18. Although such equations are limited in number10, body mass, body composition, sex, age, height, and other factors have been used to predict TEE3,19,20,21. Interestingly, including PA measured by accelerometry in these predictive equations does not contribute significantly to the explained variability in TEE10,14,21. This raises the possibility of testing how equations that include and exclude PA perform when applied to a sample of individuals with different objectively measured PAL and PAEE.

The purpose of the present study was to determine the accuracy and precision of the available equations to estimate TEE (see Table S1) compared to DLW outcomes in a sample of females and males varying in physical activity levels.

Methods

Participants

Fifty-six healthy individuals (20–58 years; 27 females) with a wide range of habitual physical activity levels were recruited as part of a larger study. The participants were uniformly distributed across levels of self-reported physical activity, walking and/or running from 0 km per week to more than 120 km per week22.

Individuals were excluded if they smoked, were pregnant or breastfeeding, followed fad diets, took medications that could influence TEE or its components, or had a medical history that prevented their participation in the study. The complete experimental study was approved by the Institutional Review Board at the Virginia Polytechnic Institute and State University (Virginia Tech) (IRB #21-567). All experiments were performed in accordance with relevant guidelines and regulations. All volunteers were informed about the study and provided verbal and written consent. Detailed experimental procedures have been described previously22.

Anthropometry, body composition, and resting metabolic rate (RMR)

Body mass (BM; to the nearest 0.1 kg) and height (to the nearest cm) were obtained using a stand-on scale with stadiometer (Welch Allyn, Scale-Tronix 5002, Skaneateles Falls, NY, USA). Body mass index (BMI) was calculated as kg/m². Body composition (fat mass, FM, and fat-free mass, FFM) was measured by dual-energy X-ray absorptiometry (DXA scan, Lunar Digital Prodigy Advance, software enCORE version 15, GE Healthcare; Madison, WI, USA).

RMR was measured by indirect calorimetry (Parvo Medics, TrueOne 2400 Metabolic Measurement System, OUSW 4.3.4; Murray, Utah, USA) using a ventilated canopy in a rested state (at least 12 h without exercise and after a 12-h fast), as described previously22,23. The last 30 min of a 45-min measurement period were used for analysis. RMR (kcal/day) was measured on two occasions 14 days apart. The second measurement was used for analysis after documenting stability of body mass and excellent test–retest reliability (r = 0.93; p < 0.001).

Physical activity

Physical activity (PA) was assessed by self-report (walking/running km per week) and with a triaxial accelerometer (ActiGraph GT3X, Actigraph Corporation, Pensacola, FL). Participants wore the accelerometer at the waist continuously for 14 days, removing it only for swimming, showering/bathing, or sleeping. Data collection has been described in Prado-Nóvoa et al.22. Only individuals with at least 10 h of wear time per day on at least 4 days of each week were included in the analysis; fifty-three individuals met these wear-time criteria. Mean vector magnitude counts per minute per day (VM CPM) on valid monitoring days were used to quantify physical activity levels objectively. Self-reported physical activity (km/week) was correlated with mean daily steps (r = 0.72, p < 0.001) and VM CPM (r = 0.62, p < 0.001) measured with accelerometry.
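The wear-time rule and the VM CPM summary can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's processing pipeline: the `DayRecord` structure and its fields are hypothetical, days are assumed to be indexed 0–13 so that days 0–6 form week 1 and days 7–13 form week 2, and daily VM CPM is assumed to be total vector-magnitude counts divided by wear minutes before averaging over valid days.

```python
from dataclasses import dataclass
from statistics import mean

MIN_WEAR_MINUTES = 10 * 60      # a valid day needs at least 10 h of wear time
MIN_VALID_DAYS_PER_WEEK = 4     # and each week needs at least 4 valid days


@dataclass
class DayRecord:
    """One monitoring day (hypothetical structure, not ActiGraph output)."""
    day_index: int        # 0-13 across the 14-day protocol (assumed indexing)
    wear_minutes: int     # minutes classified as wear time
    vm_counts: float      # total vector-magnitude counts during wear time


def valid_days(days):
    """Keep only days that reach the minimum wear time."""
    return [d for d in days if d.wear_minutes >= MIN_WEAR_MINUTES]


def meets_wear_criteria(days, n_weeks=2):
    """True if every week contains at least MIN_VALID_DAYS_PER_WEEK valid days."""
    valid = valid_days(days)
    for week in range(n_weeks):
        n_valid = sum(1 for d in valid if d.day_index // 7 == week)
        if n_valid < MIN_VALID_DAYS_PER_WEEK:
            return False
    return True


def mean_vm_cpm(days):
    """Average of daily VM counts per wear minute, over valid days only."""
    valid = valid_days(days)
    if not valid:
        raise ValueError("no valid monitoring days")
    return mean([d.vm_counts / d.wear_minutes for d in valid])
```

Applying the weekly check before averaging mirrors the order in the text: participants failing the rule are excluded, and VM CPM is summarized only for those who remain.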

Total energy expenditure (TEE), physical activity energy expenditure (PAEE), and physical activity level (PAL)

TEE (kcal/day) was measured with doubly labeled water (DLW) following standard procedures24,25. After collection of a baseline urine specimen, participants were orally dosed with deuterium (²H) and oxygen-18 (¹⁸O) in the form of water (²H₂¹⁸O). Doses were calculated according to each participant's body mass, using dose water enriched to 10% ¹⁸O and 5% ²H, as follows (1):

$$\text{dose }\left(\text{ml}\right)=\frac{\text{Body mass }\left(\text{in g}\right)\times\text{desired excess enrichment}}{\text{dose enrichment}}$$
(1)

where the desired excess enrichment (ppm) was calculated from body mass as

$$\text{desired excess enrichment}=618.923\times\left(\text{body mass, kg}\right)^{-0.305}$$

and the dose enrichment (10%) is 100,000 ppm26. Each participant was provided with a glass containing the precise dose required (weighed to three decimal places) and asked to consume the entire dose. To ensure that the entire dose of DLW was consumed, additional water was added to the dosing glass and also consumed. The time of dosing was recorded.
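As a worked illustration of Eq. (1), the sketch below computes the dose volume from body mass, assuming the desired excess enrichment is 618.923 × (body mass, kg)^−0.305 ppm (our reading of the expression above) and a dose enrichment of 100,000 ppm (10% ¹⁸O). It does not model the ²H component or the gravimetric weighing of the dose, and the function names are illustrative.

```python
def desired_excess_enrichment_ppm(body_mass_kg: float) -> float:
    """Target excess enrichment (ppm), read from the text as 618.923 * BM(kg)^-0.305."""
    return 618.923 * body_mass_kg ** -0.305


def dose_ml(body_mass_kg: float, dose_enrichment_ppm: float = 100_000.0) -> float:
    """Eq. (1): dose (ml) = body mass (g) * desired excess enrichment / dose enrichment."""
    if body_mass_kg <= 0:
        raise ValueError("body mass must be positive")
    body_mass_g = body_mass_kg * 1000.0
    return body_mass_g * desired_excess_enrichment_ppm(body_mass_kg) / dose_enrichment_ppm


if __name__ == "__main__":
    for bm in (55.0, 70.0, 85.0):
        print(f"{bm:.0f} kg -> {dose_ml(bm):.1f} ml")
```

For a 70-kg participant this gives roughly 119 ml. Because body mass enters the product with a net exponent of 1 − 0.305 = 0.695, heavier participants receive larger doses, but less than proportionally so.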

A second urine specimen was collected 3 h after dosing. Urine specimens from the second void of each day were then collected daily over 14 days, and the time of each collection was recorded. Urine samples were encapsulated into capillaries and vacuum-distilled27. The resulting water was analyzed using a liquid water isotope analyzer (Los Gatos Research28). Samples were run alongside three laboratory standards and three international standards (SLAP2, Standard Light Antarctic Precipitation 2; VSMOW2, Vienna Standard Mean Ocean Water 2; and GRESP, Greenland Summit Precipitation26,29) to correct for daily variation and convert delta values to parts per million. Isotope elimination rates were converted to TEE using Equation 1 from Speakman et al.25. PAEE (kcal/day) (2) and PAL (3) were then calculated from TEE as follows6:

$$\text{PAEE}=\left(\text{TEE}\times 0.9\right)-\text{RMR}$$
(2)
$$\text{PAL}=\text{TEE}/\text{RMR}$$
(3)
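Eqs. (2) and (3) translate directly into code. In this minimal sketch, the factor 0.9 in Eq. (2) is read as removing an assumed thermic effect of food equal to 10% of TEE; the function names and the example values are illustrative, not study data.

```python
def paee_kcal_per_day(tee: float, rmr: float) -> float:
    """Eq. (2): PAEE = 0.9 * TEE - RMR (the 0.9 removes an assumed 10% TEF)."""
    return 0.9 * tee - rmr


def pal(tee: float, rmr: float) -> float:
    """Eq. (3): PAL = TEE / RMR."""
    if rmr <= 0:
        raise ValueError("RMR must be positive")
    return tee / rmr


if __name__ == "__main__":
    tee, rmr = 3000.0, 1600.0   # illustrative values, not study data
    print(paee_kcal_per_day(tee, rmr), pal(tee, rmr))
```

With TEE = 3000 kcal/day and RMR = 1600 kcal/day, PAEE is 1100 kcal/day and PAL is 1.875.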

Prediction of TEE

TEE was predicted in our participants with 10 different models. We applied four equations published by Plucker et al.20, two equations published by Pontzer et al.3, and three equations published by Vinken et al.19. The tenth model estimated TEE from the PAEE estimated by accelerometry, an assumed thermic effect of food (TEF) of 10%, and the measured RMR (RMR + ACC PAEE). A detailed description of all the models applied is provided in Supplementary Table 1. These models3,19,20 were selected because they are established predictive equations that estimate TEE from individual characteristics and do not rely exclusively on an additive conception of TEE. In addition, some of them have been evaluated previously10.
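The exact form of the RMR + ACC PAEE model is given in Supplementary Table 1. One reading consistent with Eq. (2) is TEE = RMR + PAEE_acc + 0.1 × TEE, that is, TEE = (RMR + PAEE_acc)/0.9. The sketch below assumes that reading, and the function name is hypothetical. An alternative reading that applies the 10% to (RMR + PAEE_acc) would give a slightly smaller estimate.

```python
def tee_rmr_plus_acc(rmr: float, paee_acc: float, tef_fraction: float = 0.10) -> float:
    """Hypothetical 'RMR + ACC PAEE' model (kcal/day in, kcal/day out).

    Assumes TEE = RMR + PAEE_acc + tef_fraction * TEE, which is the inverse of
    Eq. (2) when tef_fraction = 0.10.
    """
    if not 0.0 <= tef_fraction < 1.0:
        raise ValueError("tef_fraction must be in [0, 1)")
    return (rmr + paee_acc) / (1.0 - tef_fraction)


if __name__ == "__main__":
    # Illustrative values: measured RMR and an accelerometry-derived PAEE
    print(round(tee_rmr_plus_acc(1600.0, 700.0)))
```

Under this reading the model inherits any error in accelerometry-derived PAEE one-for-one (scaled by 1/0.9), which is one way to see why its accuracy depends so heavily on the accelerometer calibration.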

Statistical analysis

The statistical analysis was similar to that previously described by Prado-Nóvoa et al.22. Sample descriptive characteristics were compared by sex using t-tests. A one-way repeated-measures analysis of variance (ANOVA) with Bonferroni post-hoc tests was used to compare measured and estimated TEE means (p < 0.05). Agreement between measured and predicted TEE was analyzed with Bland–Altman plots30. The association between the magnitude of TEE and the difference between predicted and measured TEE (heteroscedasticity) was examined by regression analysis for the entire sample and for each sex separately; when the relationship was significant (p < 0.05), its slope (β) is shown in the Bland–Altman plots. Bias was calculated as the mean difference between measured and predicted TEE and is reported with its standard deviation (SD).
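The Bland–Altman quantities and the heteroscedasticity slope can be sketched in plain Python. Two conventions here are assumptions: the difference is taken as predicted − measured (so a negative bias means underestimation, matching how Pontzer2's −44 kcal is described as an underestimate in the Results), and TEE magnitude is the mean of the two methods, the classic Bland–Altman x-axis. The sketch returns only the ordinary least-squares slope; a p-value for it could be obtained from, for example, `scipy.stats.linregress`.

```python
from statistics import mean, stdev


def bland_altman(measured, predicted):
    """Bland-Altman summary for paired TEE values (kcal/day).

    Assumed sign convention: difference = predicted - measured, so a
    negative bias means the equation underestimates TEE.
    """
    if len(measured) != len(predicted) or len(measured) < 3:
        raise ValueError("need at least three paired observations")
    diffs = [p - m for m, p in zip(measured, predicted)]
    # Magnitude axis: mean of the two methods (classic Bland-Altman choice)
    avgs = [(m + p) / 2 for m, p in zip(measured, predicted)]
    bias = mean(diffs)
    sd = stdev(diffs)
    # OLS slope of difference on magnitude, used as a heteroscedasticity check
    x_bar = mean(avgs)
    sxx = sum((x - x_bar) ** 2 for x in avgs)
    sxy = sum((x - x_bar) * (d - bias) for x, d in zip(avgs, diffs))
    return {
        "bias": bias,
        "sd": sd,
        "lloa": bias - 1.96 * sd,   # lower limit of agreement
        "uloa": bias + 1.96 * sd,   # upper limit of agreement
        "slope": sxy / sxx,
    }
```

For measured values of 2000, 2500, 3000, and 3500 kcal/day and predictions of 2000, 2400, 2800, and 3200 kcal/day, the bias is −150 kcal/day and the slope is −2/9, meaning the underestimation grows with TEE.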

Other accuracy metrics calculated were the lower (LLOA) and upper (ULOA) limits of agreement (Formula 1 in Supplementary Material), mean absolute percent error (MAPE) (Formula 2 in Supplementary Material), mean difference as a percentage (%) (Formula 3 in Supplementary Material), and root mean square error (RMSE) and its percentage (RMSE%) (Formulas 4 and 5, respectively, in Supplementary Material). In previous studies, no significant difference between means (p ≥ 0.05), a mean difference (%) ≤ 10%, and an RMSE% ≤ 10% were used as indicators of accuracy for RMR predictive equations22,31,32,33,34. Equivalent benchmarks are lacking for TEE predictive equations, so the same criteria were used in this study to describe accuracy. In addition, accuracy at the individual level was calculated as the percentage of individuals with a predicted TEE within ± 10% of the measured TEE.
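The group- and individual-level metrics can be sketched as below. The exact formulas are in the Supplementary Material, so the definitions used here are assumed common forms: MAPE as the mean of |measured − predicted|/measured, mean difference (%) as the mean of (measured − predicted)/measured (positive = underestimation), RMSE% as RMSE divided by mean measured TEE, and individual accuracy as the share of predictions within ±10%. The `meets_criteria` helper is hypothetical and checks only the two percentage criteria; the ANOVA-based test of means is not reproduced.

```python
from math import sqrt
from statistics import mean


def accuracy_metrics(measured, predicted):
    """Group- and individual-level accuracy of predicted TEE (kcal/day).

    Definitions are assumed common forms, not the study's Supplementary formulas.
    """
    pairs = list(zip(measured, predicted))
    rel = [(m - p) / m for m, p in pairs]          # positive = underestimation
    rmse = sqrt(mean([(p - m) ** 2 for m, p in pairs]))
    return {
        "mape": mean([abs(r) for r in rel]) * 100,
        "mean_diff_pct": mean(rel) * 100,
        "rmse": rmse,
        "rmse_pct": rmse / mean(measured) * 100,
        # share of individuals predicted within +/-10% of measured TEE
        "within_10_pct": 100 * sum(abs(r) <= 0.10 for r in rel) / len(rel),
    }


def meets_criteria(metrics, limit=10.0):
    """Mean difference (%) and RMSE% both within the limit; the p >= 0.05 test is not computed."""
    return abs(metrics["mean_diff_pct"]) <= limit and metrics["rmse_pct"] <= limit
```

Keeping group-level metrics (mean difference %, RMSE%) separate from the ±10% individual count makes it easy to see the pattern reported in the Results, where an equation can look acceptable on average while still missing many individuals.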

One-way ANOVA was used to test the effect of sex on the equations' accuracy. The biases of the predicted TEE were examined against age, sex, BM, height, FM, FFM, percentage of FM, percentage of FFM, VM CPM, and PAL by multiple regression (general linear models, GLM) with backward elimination, avoiding multicollinearity. This analysis was performed in the entire sample and separately by sex, which allowed us to assess whether our participants' characteristics and PA affected the magnitude of the estimation error. Lastly, GLMs with backward elimination were also applied with measured TEE as the dependent variable in the total sample and separately by sex.
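The backward-elimination step can be illustrated with a dependency-free sketch. It deliberately swaps in a different stopping rule: predictors are dropped while doing so lowers the Akaike information criterion (AIC) of an ordinary least-squares fit, whereas the study's GLMs used significance-based backward deletion. Multicollinearity screening is not implemented, and the variable names in any example are synthetic. A p-value-based version could be built on a regression library that reports coefficient p-values, such as statsmodels.

```python
from math import log


def _solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular design matrix (collinear predictors?)")
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols_rss(data, y, predictors):
    """Fit y ~ intercept + predictors by least squares; return the residual sum of squares."""
    rows = [[1.0] + [data[p][i] for p in predictors] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = _solve(xtx, xty)
    return sum((yi - sum(b * x for b, x in zip(beta, r))) ** 2 for r, yi in zip(rows, y))


def aic(data, y, predictors):
    """Gaussian AIC (up to a constant): n*log(RSS/n) + 2*(number of coefficients)."""
    n = len(y)
    rss = max(ols_rss(data, y, predictors), 1e-12)  # guard against log(0) on a perfect fit
    return n * log(rss / n) + 2 * (len(predictors) + 1)


def backward_eliminate(data, y, predictors):
    """Repeatedly drop the predictor whose removal lowers AIC most; stop when none does."""
    current = list(predictors)
    best = aic(data, y, current)
    while current:
        trials = [(aic(data, y, [p for p in current if p != drop]), drop) for drop in current]
        score, drop = min(trials)
        if score >= best:
            break
        current.remove(drop)
        best = score
    return current
```

If the outcome depends on one predictor only and a second predictor is unrelated to both that predictor and the residual noise, the unrelated predictor is removed and the informative one is retained.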

As expected, the PAL of our participants affected the performance of the predictive equations, so the subset of individuals with PAL ≤ 1.89 (n = 28) was re-analyzed following the same procedures described above. The PAL cut point was set at 1.89 to exclude very active subjects35 from the calculations. To avoid redundancy, these analyses were repeated only for the three most accurate equations in the entire sample (Plucker3 from Plucker et al.20, Pontzer2 from Pontzer et al.3, and Vinken1 from Vinken et al.19). This subset analysis may improve the applicability of our results by allowing comparisons with more sedentary populations.

Results

Accuracy of the predictive equations in the whole sample

The characteristics of our sample are shown in Table 1. Males had a significantly higher BM, height, FFM, and %FFM, but a lower %FM compared with females (p < 0.001). RMR and TEE were also significantly higher for males (p < 0.001). However, there were no significant differences in the remainder of the characteristics (age, BMI, FM, PAEE, PAL, Steps/d, and VM CPM) between males and females.

Table 1 Summary characteristics of the sample.

The comparisons between estimated and measured TEE, positive MAPE, and positive mean difference (%) indicated that all predictive models underestimated TEE in the entire sample (Table 2) and in females (Table S2). However, Plucker3 was the only model that overestimated TEE in males, although with notable individual variability (average of 68 ± 613 kcal) (Fig. 1). Based on the accuracy criteria established in this study, the Plucker3 equation performed best in the entire sample (Table 2) and better in males than in females (Tables 3 and S2). Plucker4 also predicted TEE more accurately in males than in females (Tables 3 and S2). However, all models had an RMSE% > 10%, indicating generally low performance at the individual level. Even Plucker3, which showed the highest percentage of individuals with a predicted TEE within ± 10% of the measured value, reached only about 43% in the entire sample and 55% in males (Tables 2 and S2). This indicates low precision of the equations overall.

Table 2 Validity and accuracy of equations to estimate TEE (kcal/day) in the whole sample and for those subjects with physical activity levels (PAL) ≤ 1.89.
Figure 1

Bland–Altman plots for measured and predicted TEE in the whole sample (n = 56). Purple (♀ = females), blue (♂ = males), and black (both sexes) dotted lines represent the relationship between the magnitude of TEE and the error of the predictive equations (homoscedasticity or heteroscedasticity). When β (slope of the line) is shown, heteroscedasticity is significant (p < .01). The green solid line shows the mean difference between measured and predicted TEE for each model. Orange dashed lines show the limits of agreement (bias ± 1.96 × standard deviation).

Table 3 Summary of equations meeting 2 criteria to be considered accurate when applied to the whole sample and dichotomized by sex (♀ = Females; ♂ = Males) and also for those subjects with physical activity levels (PAL) ≤ 1.89.

All the equations showed large limits of agreement and RMSE (Table 2 and Fig. 1). Some equations showed heteroscedasticity (p < 0.01, black dotted line in Fig. 1), especially when each sex was considered separately (purple [females] and blue [males] dotted lines in Fig. 1). Sex influenced some indicators of accuracy; the performance of the equations was generally poorer when applied to females: significantly higher bias, MAPE, mean difference %, and lower accuracy (%) (see details in Table S2).

GLM regressions showed that, in addition to age or body composition, physical activity (VM CPM and/or PAL) affected the bias of the models in the whole sample and by sex (Table S3); overall, the estimation error was higher for more active participants.

GLM with measured TEE as the dependent variable showed that BM, FM, and VM CPM explained 61% of the variability in our total sample (Table S4). For males, FFM and VM CPM explained 73% of TEE variability. However, only 52% of TEE variability was explained for females, with RMR as the only significant variable in the model (Table S4).

Accuracy of the predictive equations in subjects with PAL ≤ 1.89

When the subset of subjects with PAL ≤ 1.89 was compared with the whole sample, significant differences were found only for PAEE and mean PAL, both of which were lower in the subset (p < 0.05) (Table 1). The sex differences detected in the whole sample remained in this subset of participants (Table 1).

The three equations applied (Plucker3, Pontzer2, and Vinken1) performed significantly better in these subjects than in the entire sample (Table 2). Although Pontzer2 was the only equation that slightly underestimated TEE (average of −44 ± 358 kcal), predicted TEE means did not differ significantly from measured TEE for any equation. The mean difference (%) was < 10% for all three equations. The only accuracy criterion not met was %RMSE ≤ 10%, although the values were better than for the entire sample. Pontzer2 was the most accurate equation, followed by Vinken1 (lower bias, mean difference %, and %RMSE; Table 2).

Lower performance at the individual level was still detected in these subjects (large limits of agreement) (Table 2), but Pontzer2 did not show heteroscedasticity when applied to these subjects, and heteroscedasticity was reduced for Vinken1 (Fig. 2). The percentage of individuals with a predicted TEE within ± 10% of the measured value (precision) was close to 50% for both sexes, and notably higher (67%) for females with the Plucker3 equation (Accuracy (%), Tables 2 and S2). Although higher in this subset, the precision of the equations can still be considered low.

Figure 2

Bland–Altman plots for measured and predicted TEE for subjects with physical activity levels ≤ 1.89 (n = 28). The three equations shown are those that performed best in the entire sample. Purple (♀ = females), blue (♂ = males), and black (both sexes) dotted lines represent the relationship between the magnitude of TEE and the error of the predictive equations (homoscedasticity or heteroscedasticity). When β (slope of the line) is shown, heteroscedasticity is significant (p < .01). The green solid line shows the mean difference between measured and predicted TEE for each model. Orange dashed lines show the limits of agreement (bias ± 1.96 × standard deviation).

Sex influenced some indicators of accuracy (Table S2): Plucker3 and Vinken1 were more accurate for females, and Pontzer2 for males (Tables S2 and 3). In this subset of analyses, Vinken1 was the only equation that was accurate for both sexes (Tables S2 and 3).

GLM regressions for the bias of the models also showed that, besides characteristics such as age, BM, or body composition, physical activity (VM CPM and/or PAL) positively influenced the error of the estimations (Table S3). Measured TEE, in turn, was more predictable in this subset of individuals (GLM), with adjusted R² ranging from 73.59% to 76.98% using BM, RMR, and VM CPM as independent variables (see details in Table S4).

Discussion

The major finding of this study was that, on average, all models underestimated TEE in our entire sample, and none met all the criteria to be considered accurate. This underestimation generally increased with TEE, which in our sample also reflected higher physical activity levels (as PAL or VM CPM). The equation published by Plucker et al.20 based on age, body mass, height, and RMR (Plucker3) was the most accurate in predicting TEE in our entire sample (average bias of 195 kcal). However, the accuracy and precision of the equations improved significantly when less active individuals (PAL ≤ 1.89) were considered separately. In this case, Pontzer2 from Pontzer et al.3 (underestimating by an average of 44 kcal) and Vinken1 from Vinken et al.19 (overestimating by an average of 58 kcal) were the most accurate. The latter observation was independent of the sex of the participants.

The RMSE% > 10% reflected sizable errors at the individual level for all the equations. The heteroscedasticity detected (Figs. 1 and 2) and the influence of our participants' PA on the equations' accuracy (Table S3) are consistent with differences between our population and those used to develop the predictive models (Table S1). Our participants were younger, had a lower BMI, and had notably higher TEE than those in Plucker et al.20, Pontzer et al.3, and Vinken et al.19 (Tables 1 and S1). The higher TEE was a consequence of the higher levels of PA in our sample.

Notably, the equations that included accelerometry-derived PAEE (Vinken2 and Vinken3) performed poorly (Table 2 and Fig. 1). The limitations of using accelerometry PAEE to predict TEE include the use of devices from different manufacturers, different wear locations, and different sampling frequencies (see references in Fernández-Verdejo and Galgani10). This lack of standardization may reduce the applicability of these predictive equations. In this regard, our findings (RMR + ACC PAEE in Table 2) are consistent with the significant underestimation of TEE derived from accelerometry PAEE reported in free-living conditions11,36. As previously noted10, improving TEE estimations by including objectively measured PA remains a challenge, and the accuracy of accelerometry-derived PAEE is highly variable13. Standardizing units of measurement, sharing manufacturers' technical specifications and computational methods, and calibrating PAEE more rigorously against gold-standard techniques may improve the performance of these equations in future studies.

Interestingly, the three accuracy metrics were differentially affected by sex in the equations evaluated (Table S2). In contrast to RMR prediction in our sample22, the equations evaluated in the present study were more accurate in males than in females (Table S2), especially when the whole sample was considered (Table S4). Finally, subject characteristics and PA accounted for a greater portion of the variability in bias for males than for females (Table S3). Future studies need to address sex in the development of equations to predict TEE, as has been suggested for RMR prediction22,37.

As also noted by Fernández-Verdejo and Galgani10 and Macena et al.38, there is a gap in the literature regarding validation of the accuracy of TEE equations across different populations. We acknowledge that some of the equations with higher accuracy and precision in our study may not be easily applicable in other settings because they require RMR or body composition measurements. However, in our sample, these are also the factors retained in the models predicting TEE (Table S4). This agrees with previous studies20,39,40,41 (but see Tudor-Locke et al.42), but challenges the possibility of accurately predicting TEE from simple factors such as age, sex, height, and body mass, especially when individuals with different physical activity levels are considered (Table 2).

Although a single predictive model for energy requirements across populations may be difficult to find20 and predictive equations perform poorly at the individual level (low precision), these equations remain essential for many clinical interventions20,43, e.g., to determine nutritional needs, target energy balance, or support health improvements and lifestyle changes. Moreover, the average bias of the best-performing equations in our sample did not exceed 195 kcal (Table 2). Together with the other accuracy metrics in Table 2, this suggests that these equations may be superior to other techniques for approximating TEE, such as self-reported energy intake44, motion sensors45, and heart rate monitoring46, without requiring individual calibration.

Practical applications and strengths

Our analyses indicated that, across a wide range of physical activity levels (from sedentary to very active), the Plucker3 equation was the most accurate in predicting TEE at the population level, but less so at the individual level (low precision) and in females. Additionally, the Plucker3 model requires RMR, which may not be available or, if estimated, may add error to the TEE estimate.

The Pontzer2 equation was the most accurate model, particularly for males, when less active individuals (PAL ≤ 1.89) were considered separately. However, among the equations that did not rely on body composition, Vinken1 was the most accurate for both males and females. Nonetheless, caution is warranted, as our analyses suggested that prediction error was influenced by sample characteristics such as body composition and physical activity.

Our study has several strengths. First, we used gold-standard techniques to measure body composition, RMR, and TEE. In addition, we quantified the habitual physical activity of our participants over a 14-day period using accelerometry42,47,48. Second, the inclusion of similar numbers of males and females and a uniform distribution of physical activity levels allowed us to examine whether these variables affect predictions. Third, body mass and composition stability were documented over the measurement period, avoiding a potential confound of energy imbalance. Lastly, we used objective metrics to evaluate the accuracy of published prediction equations, which may serve as a reference for others seeking to develop and validate new equations.

Limitations of the study

There are some limitations that should be considered. First, our sample size was relatively small, so different results may be obtained in a larger sample. Second, the participants in our study were primarily Caucasian, young, of normal weight, and generally more active than the US population49. As such, the accuracy of the predictive equations may differ when applied to other groups. Finally, the respiratory quotient (RQ) assumed in the DLW calculations25 may have affected the accuracy of the predictive equations.

Conclusions

The present study showed that available published equations tended to underestimate TEE in our sample. Although some models were accurate in predicting TEE across a wide spectrum of habitual physical activity and in the less active participants (i.e., PAL ≤ 1.89), precise prediction of TEE at the individual level remains a challenge. More studies are needed to develop and validate predictive equations that do not rely on a classic additive conception of TEE, and validation of these equations in diverse populations is necessary to broaden their applicability.