Introduction

Studies on frailty are increasing in literature on aging. There is no consensus on its definition, but it is generally recognized as a state of increased vulnerability that is associated with high risk of adverse outcomes, such as falls, disability, and even mortality [1, 2]. Traditionally, frailty was considered a unidimensional physical construct [3]. A broader paradigm is supported by other researchers who refer to a multidimensional approach with physical, psychological, and social factors, which interact and disturb the physiological balance [4].

Within this construct, the Tilburg Frailty Indicator (TFI) is a multi-domain frailty instrument, developed in 2010 as a screening tool for frailty [4, 5]. It has been translated into and validated in multiple languages, such as Portuguese [6, 7], Polish [8, 9], Italian [10], German [11], Danish [12], Spanish [13], Arabic [14], Persian [15], Greek, and Croatian [16]. Several authors have reported low internal consistency estimates for the psychological and particularly the social frailty domains [4, 6, 7, 9, 10, 11, 13, 14, 16, 17, 18]. Additionally, lower predictive capacity has been found for the psychological and social components, especially the social one [19, 20].

Among all validations, only the Spanish [13], Turkish [18], and Taiwanese [21] studies have analyzed TFI’s structural validity. Confirmatory Factor Analysis (CFA) gave some support for the three domains of frailty but found poor indicators of the physical and the social domain with low factor loadings [13, 18, 21], suggesting that the TFI model in its current form is not entirely supported by the data [13]. In addition, the Turkish CFA also had some limitations, as no information about the estimation method or the correlations among factors was provided. On the other hand, a recent systematic psychometric review [22] of this measurement instrument concludes that, despite the large number of validation studies available, it is necessary to continue accumulating evidence on metric properties such as the structural validity of this tool.

Additionally, research on the TFI has been limited to community-dwelling older adults. Thus, further studies involving institutionalized older adults could help to test its applicability to other contexts [15, 22]. Therefore, the aim of this study is to further validate the Spanish version of the TFI by Vrotsou et al. [13] in both institutionalized and community-dwelling older adults. The following psychometric properties are assessed, with an emphasis on the factor structure: (1) structural validity; (2) internal consistency; and (3) convergent and divergent validity.

Materials and methods

Population and study design

This cross-sectional study was carried out between 2018 and 2021. A convenience sample of 457 older adults aged ≥ 65 years was included. Community-dwelling older adults were recruited from several community settings (n = 322), and institutionalized participants from nursing homes (n = 135). Exclusion criteria included Mini-Mental State Examination < 18 points, acute disease, inability to walk, and hospital admission or unstable chronic disease in the last month. All participants signed an informed consent form. Ethical approval was given by the Ethics Committee for Human Research of the University of Valencia (H1542733812827). The research was conducted in accordance with the principles of the Declaration of Helsinki and was registered at http://www.clinicaltrials.gov (ID: NCT03832608).

Measurements

The Tilburg Frailty Indicator is a 15-item questionnaire addressing the physical (8 questions), psychological (4 questions), and social (3 questions) domains. All items were dichotomized and scored with 0 points (absence) or 1 point (presence), and summed to obtain a total score ranging from 0 to 15 [4].
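The scoring rule described above can be illustrated with a minimal sketch. It assumes that responses are already dichotomized and that items are numbered 1 to 15 in the order physical (1–8), psychological (9–12), and social (13–15); the names `DOMAINS` and `score_tfi` are hypothetical and are not part of the instrument or of any published software.

```python
# Assumed item layout (taken from the domain description in the text):
# items 1-8 physical, 9-12 psychological, 13-15 social.
DOMAINS = {
    "physical": range(1, 9),
    "psychological": range(9, 13),
    "social": range(13, 16),
}


def score_tfi(responses):
    """Return domain scores and the total score for one respondent.

    `responses` maps item number (1..15) to 0 (absence) or 1 (presence).
    """
    if set(responses) != set(range(1, 16)):
        raise ValueError("expected responses for items 1..15")
    if any(value not in (0, 1) for value in responses.values()):
        raise ValueError("items must be dichotomized to 0 or 1")
    # Sum the dichotomous items within each domain.
    scores = {
        name: sum(responses[item] for item in items)
        for name, items in DOMAINS.items()
    }
    # The total score is the sum of the three domain scores (range 0-15).
    scores["total"] = sum(scores.values())
    return scores


# Hypothetical respondent with problems on items 1, 3, 10, and 13.
example = {item: 0 for item in range(1, 16)}
for item in (1, 3, 10, 13):
    example[item] = 1
example_scores = score_tfi(example)
```

Keeping the domain layout in a single mapping makes it easy to recompute the scores after dropping poorly behaving items.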

Alternative frailty assessment tools were included: Fried’s Frailty Phenotype [3] has five criteria assessing unintentional weight loss, exhaustion, low physical activity, reduced grip strength, and reduced gait speed; the Edmonton Frailty Scale [23] evaluates nine domains of frailty: cognition, general health status, functional independence, social support, medication usage, nutrition, mood, continence and functional performance; the FRAIL Scale [24] is a five-item screening tool including fatigue, resistance, ambulation, illness, and weight loss components; and finally, the Kihon Checklist (KCL) [25] is a self-report multidimensional screening tool with seven domains: instrumental activities of daily living, physical strength, nutrition, eating, socialization/isolation, memory, and mood. All participants were interviewed for the questionnaire’s completion and assessed for physical tests by trained researchers in a single session.

Statistical analysis

SPSS 26 was used to calculate descriptive statistics for the variables under study, and to obtain Cronbach’s alpha coefficients, corrected item-total correlations, and correlations among the TFI dimensions and external criteria. Additionally, an R function was used to obtain confidence intervals for the alpha coefficients. Given the nature of the study, with voluntary participation and interviewers present, the percentage of missing data was very low: there was only one missing data point (0.2%), in a single indicator in the institutionalized sample. With such a low level of missingness, no specific missing-data treatment was needed, and listwise deletion was therefore employed across the statistical analyses. The factor structure was tested with CFAs estimated with Weighted Least Squares Mean and Variance corrected (WLSMV) estimation in Mplus 8.6. WLSMV was selected because the variables are binary and lack multivariate normality. Several fit indices were used to assess model fit: the chi-square statistic; the Comparative Fit Index (CFI); the Root Mean Square Error of Approximation (RMSEA); and the Standardized Root Mean Square Residual (SRMR). The criteria for reasonable fit were [26]: a CFI of at least 0.90, together with an RMSEA and an SRMR below 0.08. The Composite Reliability Index (CRI) was calculated for each dimension of the scale, as a measure of internal consistency superior to alpha. Values ≥ 0.70 represent good internal consistency [27].
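To make the two internal consistency measures concrete, the sketch below computes Cronbach’s alpha from raw item scores and a composite reliability value from standardized factor loadings. The text does not state which composite reliability formula was used; the version here assumes the common one, (Σλ)² / ((Σλ)² + Σ(1 − λ²)), with uncorrelated residuals. The function names and the toy data are hypothetical and are not the study data.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Requires at least two respondents and a non-zero variance of the sum score.
    """
    k = len(rows[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of the sum score.
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def composite_reliability(loadings):
    """Composite reliability from standardized loadings (assumed formula)."""
    squared_sum = sum(loadings) ** 2
    # With standardized loadings, each residual variance is 1 - loading^2.
    residual_sum = sum(1 - loading ** 2 for loading in loadings)
    return squared_sum / (squared_sum + residual_sum)


# Toy data: two items that always agree, and two items that are unrelated.
alpha_consistent = cronbach_alpha([[1, 1], [0, 0], [1, 1], [0, 0]])
alpha_unrelated = cronbach_alpha([[1, 1], [1, 0], [0, 1], [0, 0]])
# Three hypothetical indicators, each with a standardized loading of 0.7.
cr_example = composite_reliability([0.7, 0.7, 0.7])
```

Because composite reliability weights each item by its own loading, it does not rely on the equal-loading assumption behind alpha, which is one reason the two measures can differ for the same domain.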

Finally, Spearman’s correlations were used to study the convergent and divergent validity of the physical, psychological, and social domains of the TFI with other frailty assessment tools. Based on Cohen’s criteria, correlation coefficients from 0.10 to 0.30, from 0.30 to 0.50, and ≥ 0.50 indicated weak, moderate, and strong correlations, respectively [28]. The COSMIN guide offers very similar guidelines: adequate validity is shown if r ≥ 0.50 for similar constructs, r = 0.30–0.50 for related constructs, and r < 0.30 for unrelated constructs.
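As an illustration of this step, the sketch below computes a Spearman correlation (Pearson correlation of average ranks) and labels it with the COSMIN-style bands quoted above. Two points are assumptions rather than statements from the text: the sign of the coefficient is ignored when grading, and a value of exactly 0.30 or 0.50 is assigned to the higher band. All function names are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sum((a - mean_x) ** 2 for a in rx) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in ry) ** 0.5
    return cov / (sd_x * sd_y)


def cosmin_label(r):
    """Grade a correlation with the bands quoted in the text."""
    size = abs(r)  # assumption: only the magnitude is graded
    if size >= 0.50:
        return "similar"
    if size >= 0.30:
        return "related"
    return "unrelated"
```

For example, `cosmin_label(0.42)` returns `"related"`, and `spearman([1, 2, 3, 4], [4, 3, 2, 1])` is a perfect negative monotonic association.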

Results

Descriptive statistics are presented as means and standard deviations or percentages for the variables in Table 1, and for each of the items of the TFI in Table 2.

Table 1 Main descriptive characteristics of the sample (n = 457)
Table 2 Descriptive data for the TFI items

Structural validity

Two CFA models were estimated in both community-dwelling and institutionalized older adults. These models were: a one-factor solution (frailty); and a three-factor solution with the three frailty domains: physical (items 1–8), psychological (items 9–12), and social (items 13–15).

In the community-dwelling sample, the one-factor solution had a poor fit: χ2(90) = 167.60, p < 0.001; RMSEA = 0.052; CFI = 0.847; and SRMR = 0.102. The three-factor model had a better fit, but it was still unsatisfactory: χ2(87) = 138.06, p < 0.001; RMSEA = 0.043; CFI = 0.899; and SRMR = 0.099. Additionally, no theoretically sound modification index could improve the fit. The factor loadings for items 2, 5, and 6 in the physical domain were all lower than 0.4. When setting this limit, it must be borne in mind that a factor loading of 0.4 indicates that only 16% of the variance of the indicator is shared with the dimension it is intended to measure. Specifically, the standardized factor loadings for the three items were: 0.057 (p = 0.763) for item 2; 0.186 (p = 0.028) for item 5; and 0.372 (p < 0.001) for item 6. Apart from the statistical consideration of a weak relation with the dimension, there are substantive reasons that may also explain why these items behaved poorly. Regarding item 2, it may be difficult for an older adult to estimate what counts as a large involuntary weight loss. Items 5 and 6 refer to worsening of hearing and vision, respectively. Such worsening is natural in old age, but it may not be followed by functional problems and may therefore be unrelated to frailty. Given that the items do not link this worsening to functional problems in these areas, this may explain their poor functioning. Therefore, we removed these items and estimated the CFA again. This time model fit was better and reasonable, as two of the three fit indexes were acceptable: χ2(51) = 92.38, p < 0.001; RMSEA = 0.050; CFI = 0.918; and SRMR = 0.094. Standardized factor loadings for this final model are shown in Fig. 1.

The same CFA models were estimated for institutionalized older adults. The one-factor model had a poor fit: χ2(90) = 143.37, p < 0.001; RMSEA = 0.066; CFI = 0.920; and SRMR = 0.135. The three-factor model had a better fit, and two out of three fit indexes were in the acceptable range: χ2(87) = 119.43, p < 0.001; RMSEA = 0.053; CFI = 0.951; and SRMR = 0.124. Nevertheless, items 2, 5, and 13 had very poor factor loadings, and item 13 (living alone) had almost no variability (it was almost constant). Specifically, the standardized factor loadings were: 0.37 (p = 0.002) for item 2; 0.39 (p < 0.001) for item 5; and 0.21 (p = 0.11) for item 13. The substantive reasons for the poor functioning of items 2 and 5 were already mentioned. The case of item 13 in institutionalized people is clear: by definition they do not live alone, and the item should be dropped from the scale altogether when it is used in this population. Therefore, these items were removed, and a new three-factor model was estimated. The new model had a better fit, as only the SRMR was slightly above the acceptable cut-off: χ2(51) = 92.38, p < 0.001; RMSEA = 0.050; CFI = 0.910; and SRMR = 0.094. Standardized factor loadings are shown in Fig. 1.
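The two decision rules used in this section can be written as a short sketch: the share of indicator variance implied by a standardized loading (its square), and the cut-offs for reasonable fit stated in the statistical analysis section (CFI of at least 0.90, RMSEA and SRMR below 0.08). The function names are hypothetical, and the values passed in are the fit indices reported above for the community-dwelling sample.

```python
def shared_variance(loading):
    """Proportion of indicator variance shared with its factor."""
    return loading ** 2


def check_fit(cfi, rmsea, srmr):
    """Flag which fit indices meet the cut-offs quoted in the text."""
    return {
        "CFI": cfi >= 0.90,
        "RMSEA": rmsea < 0.08,
        "SRMR": srmr < 0.08,
    }


# A loading of 0.4 corresponds to 16% shared variance.
share_at_cutoff = shared_variance(0.4)

# Community-dwelling sample, three-factor model before and after item removal.
fit_initial = check_fit(cfi=0.899, rmsea=0.043, srmr=0.099)
fit_final = check_fit(cfi=0.918, rmsea=0.050, srmr=0.094)
```

With these inputs, the final model meets the CFI and RMSEA criteria but not the SRMR criterion, which matches the description of two acceptable indices out of three.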

Fig. 1 Final confirmatory factor analysis (CFA) standardized parameter estimates for the Tilburg Frailty Indicator in community-dwelling and institutionalized older adults

Reliability estimates

Internal consistencies for the community-dwelling older adults were: for the physical domain, alpha = 0.629, 95% CI [0.560, 0.689] and CRI = 0.803; for the psychological domain, alpha = 0.410, 95% CI [0.297, 0.508] and CRI = 0.662; and for the social domain, alpha = 0.315, 95% CI [0.174, 0.435] and CRI = 0.518. The estimates for the institutionalized older adults were: for the physical domain, alpha = 0.764, 95% CI [0.696, 0.820] and CRI = 0.894; for the psychological domain, alpha = 0.608, 95% CI [0.487, 0.705] and CRI = 0.769; and for the social domain, alpha = 0.378, 95% CI [0.126, 0.557] and CRI = 0.682.

Convergent and divergent validity

Spearman’s correlations were calculated among the three dimensions of the TFI (physical, psychological, and social) and the domains of the alternative frailty scales (Fried’s Frailty Phenotype, Frail Scale, KCL, and Edmonton Scale). As there are several multidimensional instruments and dimensions that somehow relate to the three dimensions in the TFI, convergent validity will show if correlations among clearly related dimensions are large, and divergent validity will show if these correlations are lower for dimensions not so closely related. These correlations are presented in Table 3 (the items with poor behavior were removed prior to calculating the correlations).

Table 3 Correlation coefficients of the domains of the Tilburg frailty indicator with alternative frailty measures

Convergent validity of the physical domain was fair: it correlated significantly, as expected, with physical measures, but those correlations did not exceed 0.5. The results for this domain also suggested reasonable divergent validity, showing that its construct was unrelated to cognitive function (Edmonton cognition, KCL-memory) and to social dimensions. The exceptions were the KCL-depressive mood and Edmonton mood psychological domains, whose constructs were similar to the TFI physical domain in both samples.

The psychological domain correlated with other domains of the alternative frailty instruments, demonstrating some convergent and divergent validity with a similar pattern in both samples. However, the magnitude of the correlations cannot be considered adequate according to the COSMIN guide. Thus, this domain was related or similar to the psychological domains of the comparison scales (Edmonton mood, KCL-depressive mood, KCL-memory), and it was unrelated to the physical and social domains.

The social domain suffers from both convergent and divergent validity problems. It is unrelated to almost all other measures of frailty, and this pattern of correlations does not even meet the COSMIN adequacy criterion for related constructs. The exception, in terms of convergent validity, is Edmonton’s social support dimension, which behaves as a related construct in both samples, whereas KCL-socialization/isolation is unrelated.

Discussion

Our findings aim to offer further insights on the TFI structure and provide evidence to improve its use in psychometric terms.

The need to confirm the structure of the scale is clear, given the evidence gaps in some relevant measurement properties [13, 15, 21, 22]. Our results showed that the one-factor model is not adequate, in line with the results of Vrotsou et al. [13]. The three-factor model showed better fit, but items 2, 5, and 6 showed loadings < 0.40 on the physical domain, as previously shown by other authors [13, 18, 21]. Moreover, previous studies also found low factor loadings for items 13 and 15 (social domain) [13, 18], and for item 14 (social domain) [18, 21]. It must be considered that a low factor loading indicates that the item (indicator) does not relate to the rest of the indicators in the factor or dimension, and therefore cannot be aggregated with them. These findings, in line with Vrotsou et al. [13], indicate that the current TFI theoretical structure for the complete scale is not appropriate. Indeed, when certain items were removed, the fit of the factor structure became reasonable.

Similarly, in the institutionalized sample, the three-factor solution fit better than the one-factor solution, but it only fit well after removing the poorly behaving items (2, 5, and 13). To our knowledge, no studies have previously analyzed the adequacy of the TFI in institutionalized older adults.

The need to review the TFI model and refine some indicators of the scale has also been suggested by other studies, when analyzing the poor correlations with other items or other similar measures. Among the indicators to be checked, the following have been pointed out: unexplained weight loss (2) [4, 10, 29], poor hearing (5) [10, 29], poor vision (6) [10, 29], ability to cope with problems (12) [10, 11], problems with memory (9) [10], living alone (13) [10], and social support (15) [10]. In the same vein, a recent longitudinal study testing predictive validity of the TFI excluded from the multivariate analyses the indicators poor hearing (5), poor vision (6), feeling down (10), and living alone (13), because they had p > 0.20 in the bivariate analyses [30].

A recent systematic review of the psychometric properties of the TFI [22] concluded that, despite the 63 validation studies available, evidence on relevant metric properties such as structural validity must continue to accumulate to strengthen its use as a clinical decision-making tool. This review included two validation studies, in Spanish [13] and Taiwanese [21]. According to COSMIN guidelines, given the existence of two high-quality studies, the available evidence on the structural validity of this measurement instrument was graded as “sufficient”. However, this rating should be interpreted with caution. The Spanish validation [13] found poor values in 5 out of 15 items, whereas the Taiwanese validation [21] found very low loadings in 7 items. This was also true for the Turkish validation [18], which was not included in this systematic review. Therefore, the adequacy of the factor structure of the TFI needs more attention, which is in line with the conclusions of the aforementioned review [22].

Reliability estimates were not equal in the two samples, with better values for the institutionalized older adults. The CRI values for the physical and psychological domains were good (CRI > 0.70) in the institutionalized sample, while only the physical domain was satisfactory in the community-dwelling older adults. In both cases, social domain reliability estimates were not acceptable. These differences between the two groups may be due to a higher mean age, greater variability in the variables, or a larger number of frail people in the institutionalized group. These findings suggest that the TFI seems to be a good assessment tool to detect physical frailty, as indicated by other authors [9]. Furthermore, these findings are in line with the systematic psychometric review of Zamora-Sánchez et al. [22], in which only the TFI physical domain showed sufficient internal consistency of its scores. However, the psychological and social components of this scale should be considered cautiously, depending on the context. The indicators of these domains should be carefully analyzed within the construct of frailty. Some items are not homogeneous enough (in terms of covariance) with their intended domains. Perhaps the wording of the question does not highlight the key point, or these indicators could be antecedents or consequences of the process of frailty itself. Therefore, this issue should be studied in detail. Regarding the physical domain, some studies have shown good internal consistency, varying from 0.70 to 0.79 [4, 6, 7, 9, 16, 17, 18], while others have shown low values, varying between 0.57 and 0.68 [10, 11, 13, 14]. For the psychological and social domains, internal consistency was not satisfactory in any study, with Cronbach’s alpha varying between 0.43 and 0.67 for the psychological domain and between 0.05 and 0.49 for the social domain [4, 6, 7, 9, 10, 11, 13, 14, 16, 17, 18]. One plausible explanation could be the small number of items in these two domains, as stated by Gobbens et al. [4]. However, another possible explanation could be that the components of these domains, especially the social one, do not seem to measure what the scale intends to measure [18]. The mode of administration of the instrument could also influence the internal consistency of the scores [22].

In addition, although some authors refer to the adequate reliability of the scale based on estimates for the total TFI [6, 7, 9, 14, 16, 17, 18, 21], alpha assumes unidimensionality and the TFI has several dimensions. An overall alpha is therefore not justified, and separate alphas should be reported for each dimension.

Convergent and divergent validity for the physical and psychological domains are acceptable given the obtained results, except for some psychological measures whose constructs are similar to the TFI physical domain (KCL-depressive mood and Edmonton mood). These results are in line with several studies in which the construct of alternative psychological measures was related or similar to the TFI physical component [6, 7, 10, 11, 16, 17]. These findings could be explained by the documented relationship between mental health and physical function [7, 17]. Regarding the social domain, our results showed unrelated constructs with most of the alternative measures used. These findings contradict some studies. Nevertheless, when analyzing the values obtained more thoroughly, some of them did not show a clear correlational pattern established in favor of its validity for at least one of the alternative measures related to social dimension, being unrelated (values below 0.30) [6, 10] or related with the rest of the psychological and physical domains [7, 16]. The available evidence shows inconsistent results regarding the association between TFI scores and different variables measuring related or similar constructs [22].

As mentioned before, previous validations have involved community-dwelling older adults. Thus, the use of the TFI in geriatrics still needs to be tested in different settings to explore its potential applications [4, 7, 10, 15, 18, 20, 22]. To the best of our knowledge, this is the first study investigating the validation of the TFI in a sample of institutionalized older adults. The data were analyzed separately (community-dwelling and nursing home samples) to compare the results and to assess the validity of the TFI scale as a measure of frailty in institutionalized older adults. Our findings show that a three-factor model is the most suitable one in this context, after removing items in the physical domain (unexplained weight loss and poor hearing) and in the social domain (living alone), since they do not covary adequately with the other indicators or have no variability. Internal consistency, and convergent and divergent validity, were good for the physical and psychological domains. Therefore, the TFI scale could be an acceptable instrument to assess frailty in nursing homes, provided that the social domain is interpreted with caution.

Conclusion

In conclusion, there is a need to revise the TFI structure in more detail and refine some items. Removing items such as weight loss, poor hearing, and poor vision improves the psychometric characteristics of the scale. The physical domain is a cornerstone as a frailty measure in both community-dwelling and institutionalized older people. However, the social component needs further clarification, both in psychometric terms and in how it stands within the construct of frailty.