Background

This article reports on the cost-effectiveness of a proactive, integrated primary care approach compared with usual primary care for community-dwelling frail older persons in the Netherlands. We evaluated the Finding and Follow-up of Frail older persons (FFF) approach, which aims to maintain or improve older people’s well-being and is implemented by some Dutch general practitioners (GPs). The FFF approach consists of proactive identification of frail older persons in the community and subsequent multidisciplinary (including professionals with geriatric expertise) consultations and individualized follow-up coordinated by case managers. Integrated care and support is widely acknowledged to be a key initiative in improving care and support for older persons [1]. In addition, integrated care approaches, like the FFF program, may help to maintain community-dwelling frail older persons’ well-being [2]. Over the years, a shift has occurred from a disease-oriented care model toward a more proactive and integrated approach [3]. Traditional disease-specific care delivery approaches for frail older persons, who often have multiple conditions, do not meet these individuals’ comprehensive (healthcare) needs [4,5,6,7,8]. Moreover, frailty has been associated with increased utilization of primary, hospital, and nursing home care [9, 10]. The provision of high-quality care and support to the growing number of frail older persons poses a challenge [11, 12], and the comprehensive (healthcare) needs of this population place a burden on healthcare resources [13]. Integrated care initiatives are assumed to improve quality of care and ultimately aim to enhance patient outcomes while making efficient use of healthcare resources [14, 15].
Important elements of integrated care are: (i) a proactive approach that is coordinated effectively around a person’s health and social care needs; (ii) a patient-centered approach in which a person is involved in decision-making and care processes, and the person’s needs are taken into consideration; (iii) an approach in which multiple interventions are delivered (simultaneously); and (iv) a multidisciplinary approach in which professionals from multiple disciplines are involved [3]. GPs are considered to be key actors in the implementation of promising initiatives targeting frail older persons [9]. Many integrated care initiatives have emerged and are implemented in the primary healthcare sector, but evidence of their effectiveness and cost-effectiveness remains mixed [3, 16,17,18,19,20,21]. Integrated primary care programs for frail older persons have shown no effect on the majority of outcomes, and evidence for their cost-effectiveness is limited [16]. Although the FFF approach has been found to have positive effects on the quality of care as perceived by healthcare professionals [22], and to achieve improvements in older persons’ perceived care quality and coproduction of care over time [23], its cost-effectiveness has yet to be investigated. Therefore, the aim of the present study was to evaluate the cost-effectiveness of the FFF approach in a population of community-dwelling frail older persons.

Results

Table 2 shows the background characteristics of the study population at baseline. Across both groups combined, 72.4% of participants were female, 41.8% had a low educational level, and 94.4% were considered to be frail according to the TFI (mean TFI score, 7.38). At baseline, older persons in the intervention group were significantly less often single than participants in the control group (p < 0.05). No significant difference in mean age or the proportion of older persons with multimorbidity was observed between the groups.

Table 2 Background characteristics of older persons in the two study groups at baseline

Table 3 shows the mean QALYs (with utilities based on the EQ-5D) and mean well-being scores (SPF-ILs) at T0 and T1 using the imputed dataset. Independent samples t-tests showed no statistically significant differences in QALYs between groups at T0 or T1 (univariate analysis). Paired sample t-tests showed a statistically significant improvement in QALYs over time in the control group (∆0.05; p < 0.05), but not in the intervention group (∆0.04; p = 0.07). Without imputation of missing values, the data also showed a significant improvement in terms of QALYs in the intervention group over time (paired sample t-test, ∆0.05; p < 0.05). Well-being did not differ significantly at T0 or T1 between the control and intervention groups, or over time in either group. Additional file 2: Table S2 displays the mean QALYs and SPF-ILs results of the univariate analyses based on data without imputation of missing values. Analyses based on data of matched participants, i.e., pairs with complete data, yielded comparable findings; independent samples t-tests showed no significant differences in mean QALYs and mean well-being scores between the groups at T0 and T1 (see Additional file 3: Tables S4–S7).

Table 3 Well-being and QALYs at baseline (T0) and 12 months (T1)

Multilevel analyses of SPF-ILs scores adjusted for background variables and baseline values showed a small but significant difference between the intervention group and control group for well-being at follow-up, in favor of the control group (−0.09 with imputation and −0.10 without imputation). No significant differences between the groups in terms of QALYs were observed (−0.03 with imputation and −0.02 without imputation) (Additional file 4: Tables S8–S11). Regression analyses to investigate multivariable relationships among the variables yielded results comparable to those of the multilevel analyses (details not shown). The multilevel analyses were repeated for the propensity score matched group and again showed a significant difference between the groups for well-being in favor of the control group (−0.09 with imputation and −0.10 without imputation). We found no significant differences in QALYs between the intervention group and control group (−0.03 with imputation and −0.02 without imputation). For details see Additional file 5: Tables S12–S19.

For the imputed dataset, mean total costs were 7717 euros (SD, 9824 euros) in the control group and 9182 euros (SD, 11,754 euros) in the intervention group at baseline (independent samples t-test, p = 0.15; Mann–Whitney U-test, U = 28,618.50, t = 1.18, p = 0.24; Table 4). At 12 months, mean total costs were significantly higher in the intervention group (11,659 [SD, 14,600] euros; including intervention costs) than in the control group (8902 [SD, 11,227] euros) (independent samples t-test, p < 0.05; Table 5). In addition, differences in the median total costs at follow-up were statistically significant (Mann–Whitney U-test, U = 29,952.00, t = 2.11, p < 0.05). The mean total costs increased significantly over time in the intervention group (paired sample t-test, p < 0.05), but not in the control group (paired sample t-test, p = 0.14). The difference in median costs between T0 and T1 was significant in the intervention group (related-samples Wilcoxon signed rank test, t = 3.18, p < 0.05) and in the control group (related-samples Wilcoxon signed rank test, t = 2.34, p < 0.05). Based on the data without imputation of missing values, no statistically significant differences in total costs between the control and intervention groups were found at baseline (independent samples t-test, p = 0.15; Mann–Whitney U-test, U = 17,165.50, t = 0.60, p = 0.55) and 12 months (independent samples t-test, p = 0.09; Mann–Whitney U-test, U = 14,375.50, t = 1.09, p = 0.28). For details see Additional file 2: Table S3. In addition, univariate analyses for the propensity score matched group yielded comparable results; for the imputed dataset mean total costs were significantly higher in the intervention group compared with the control group at 12 months. Based on data without imputation of missing values, no significant differences in mean total costs between groups were observed at both time points. 
Univariate analyses based on data of matched participants, i.e., pairs with complete data, showed no significant difference in mean total costs between the groups at baseline and follow-up. See Additional file 3: Tables S4–S7.
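
As an unadjusted illustration of the mean total costs reported above for the imputed dataset, the following minimal Python sketch recomputes the between-group differences at each time point and the change over time in each group. The function names and the difference-in-differences summary are illustrative assumptions; they are not the adjusted estimates or the statistical tests used in the study.

```python
# Mean total costs per patient per year (euros), imputed dataset,
# copied from the figures reported in the text above.
MEAN_COSTS = {
    "control": {"T0": 7717, "T1": 8902},
    "intervention": {"T0": 9182, "T1": 11659},
}


def between_group_difference(costs, time_point):
    """Intervention minus control mean cost at one time point."""
    return costs["intervention"][time_point] - costs["control"][time_point]


def change_over_time(costs, group):
    """Mean cost at 12 months (T1) minus mean cost at baseline (T0) for one group."""
    return costs[group]["T1"] - costs[group]["T0"]


def difference_in_differences(costs):
    """Extra cost increase in the intervention group relative to the control group.

    This is a descriptive summary chosen for illustration (an assumption),
    not the adjusted cost difference used for the ICER and ICUR.
    """
    return change_over_time(costs, "intervention") - change_over_time(costs, "control")


if __name__ == "__main__":
    for time_point in ("T0", "T1"):
        diff = between_group_difference(MEAN_COSTS, time_point)
        print(f"Difference at {time_point}: {diff} euros")
    for group in ("control", "intervention"):
        print(f"Change in {group} group: {change_over_time(MEAN_COSTS, group)} euros")
    print(f"Difference in differences: {difference_in_differences(MEAN_COSTS)} euros")
```

These raw differences ignore baseline adjustment, the multilevel structure of the data, and sampling uncertainty, so they serve only as a descriptive check on the reported means.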

Table 4 Healthcare use and costs (in euros) per patient per year in the intervention and control groups at baseline
Table 5 Healthcare use and costs (in euros) per patient per year in the intervention and control groups at 12 months

Using the imputed dataset, estimated differences in effectiveness and costs were both in favor of usual care, producing an ICER of −14,788 euros per SPF-ILs point and an ICUR of −126,711 euros per QALY, indicating that the FFF approach is inferior on both outcome measures. In Fig. 2 (cost-effectiveness plane for costs versus effects in terms of well-being; SPF-ILs), 0.9% of all bootstrapped ICERs appear in the southeast quadrant (dominance; FFF approach is more effective and less costly), 78.9% appear in the northwest quadrant (inferiority; FFF intervention is more expensive and less effective), 1.5% appear in the northeast quadrant (FFF intervention is more effective, but also more expensive) and 18.7% appear in the southwest quadrant (FFF intervention is less costly, but also less effective). The probability that the FFF approach is cost-effective ranges between 0.9% and 21.1%, depending on the cost-effectiveness ratio a decision maker could apply for policy decisions. In Fig. 3 (cost-effectiveness plane for costs versus effects in terms of QALYs), 9.0% of bootstrapped ICURs are located in the southeast quadrant, 54.4% in the northwest quadrant, 26.1% in the northeast quadrant, and 10.5% in the southwest quadrant. The probability that the FFF approach is cost-effective ranges between 9.0% and 45.6%, depending on the cost-effectiveness ratio applied.
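
The quadrant shares and probability ranges above can be illustrated with a minimal Python sketch. It assumes that each bootstrap replicate yields a pair of differences (intervention minus usual care) in effects and costs, and that the reported probability range runs from the share of replicates in the southeast quadrant (dominance) to the share outside the northwest quadrant (inferiority). The reported figures are consistent with that reading, with the 10.5% share in Fig. 3 taken as the southwest quadrant. The function names and the tie-handling rule are illustrative assumptions, not the authors' code.

```python
from collections import Counter


def quadrant(delta_effect, delta_cost):
    """Return the cost-effectiveness plane quadrant for one replicate.

    Differences are intervention minus usual care; effects are on the
    horizontal axis and costs on the vertical axis. Ties on an axis are
    assigned to the west or south side (a simplifying assumption).
    """
    north_south = "N" if delta_cost > 0 else "S"
    east_west = "E" if delta_effect > 0 else "W"
    return north_south + east_west


def quadrant_shares(pairs):
    """Percentage of (delta_effect, delta_cost) replicates in each quadrant."""
    counts = Counter(quadrant(d_effect, d_cost) for d_effect, d_cost in pairs)
    total = len(pairs)
    return {q: 100.0 * counts.get(q, 0) / total for q in ("NE", "NW", "SE", "SW")}


def probability_bounds(shares):
    """Lower and upper bound on the probability of cost-effectiveness.

    Lower bound: only dominant replicates (SE) count as cost-effective.
    Upper bound: every replicate except the inferior ones (NW) counts.
    """
    return shares["SE"], 100.0 - shares["NW"]


# Quadrant shares (percent) as reported in the text.
REPORTED_SPF_IL = {"SE": 0.9, "NW": 78.9, "NE": 1.5, "SW": 18.7}
REPORTED_QALY = {"SE": 9.0, "NW": 54.4, "NE": 26.1, "SW": 10.5}

if __name__ == "__main__":
    for label, shares in (("SPF-ILs", REPORTED_SPF_IL), ("QALYs", REPORTED_QALY)):
        lower, upper = probability_bounds(shares)
        print(f"{label}: {lower:.1f}% to {upper:.1f}%")
```

Where a replicate in the northeast or southwest quadrant falls relative to a given willingness-to-pay threshold determines whether it counts as cost-effective, which is why the probability is reported as a range rather than a single value.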

Fig. 2 Cost-effectiveness plane for costs (in euros) versus effects (SPF-ILs; range 1–4) adjusted for baseline differences; data after imputation of missing values

Fig. 3 Cost-effectiveness plane for costs (in euros) versus effects (QALYs; range −0.33 to 1) adjusted for baseline differences; data after imputation of missing values

Although different analyses (e.g., univariate, multilevel, and propensity score matched analyses, with and without imputation) showed slightly different results with respect to estimated costs and effects, the data suggest that the FFF approach is most likely not cost-effective compared with usual primary care in the Netherlands in terms of well-being and QALYs over a 12-month period, irrespective of analytical approach and method of handling missing values.

Discussion

The results of our economic evaluation indicate that proactive, integrated care for community-dwelling frail older persons as provided in the FFF program is most likely not a cost-effective initiative compared with usual primary care in the Netherlands, in terms of well-being and QALYs over a 12-month period. Our results are in line with outcomes of other studies investigating the cost-effectiveness of integrated care for frail older persons in the primary care setting in the Netherlands [43]. The comparability of integrated care programs and evaluation studies is limited due to differences in study populations, interventions and outcomes [16].

One explanation for the lack of effect may be the conceivably small difference between the FFF approach and usual primary care services in the Netherlands. There are indications that reforms in the primary care system in the Netherlands resulted in developments in the control GP practices to improve their care delivery. Although these practices did not provide care and support according to the FFF approach, several control GP practices implemented interventions, such as systematic follow-up of older adults and multidisciplinary consultation, during the study period [22]. In addition, the lack of effectiveness of complex interventions may be partly due to failure to (fully) implement the programs as intended [44]. Indeed, we have found suboptimal implementation of intervention components in GP practices organizing care according to the FFF approach [22]. Most interventions, and especially complex care programs like the FFF approach, require extensive time and effort to achieve full implementation [45, 46]. We noted differences among intervention GP practices with respect to the implementation and execution of the FFF program, including differences in the selection of older persons for proactive screening, the (number of) professionals involved in screening procedures, the organization of multidisciplinary consultations (e.g., frequency, number of patients discussed, (type of) professionals involved), and the organization of long-term follow-up of frail older persons [22]. These differences may obscure the added value of the FFF approach in terms of QALYs and well-being. Analyses based on matched participants of intervention GP practices with a high degree of implementation of (FFF-related) interventions (i.e., practices that implemented more interventions than average) [22], showed that the mean SPF-ILs score was higher, indicating greater subjective well-being, compared with participants in other intervention GP practices (Additional file 6: Tables S20–S23). 
Therefore, the degree of implementation may have an effect on effectiveness of complex interventions like the FFF approach. For a detailed description of implemented (FFF-related) interventions in the GP practices see Vestjens, Cramm and Nieboer [22]. Even with optimal implementation of such interventions, clinically meaningful improvement in outcomes is not guaranteed in the short term [45]. The length of the study period, being 12 months, may have been too short to detect improvements in older persons’ outcomes [20]. Especially in the short term, variations in costs and effects can be expected [47]. Patterns of healthcare utilization show, for example, a substantial increase in primary and hospital care utilization in frail older persons [9]. Consequently, the identification of frailty and introduction of interventions to postpone or prevent a decline into worse health states [48] may result in higher healthcare costs in the short term, but might reduce use of more expensive healthcare services and adverse outcomes in the long term [9]. Another explanation might be related to the heterogeneity of the population of older persons considered to be frail. No consensus has been reached about the conceptualization and measurement of frailty in older persons. Major approaches include the frailty phenotype, which focuses on physical aspects of frailty [49, 50], and a multidimensional approach to frailty including, for example, physical, social, and psychological factors [51]. Although we used a multidimensional approach to assess frailty in this study, Looman and colleagues [52] showed that distinction among domains of frailty does not fully capture its complexity. The TFI [30], which we used to measure (the degree of) frailty in older persons, does not discern among types of underlying problems in these domains or weigh different domains [52]. 
Researchers have suggested that the heterogeneity of frailty should be taken into account in the evaluation of integrated care programs [52], especially to better understand how interventions can be optimally aligned with different well-being needs of frail older persons [2].

Strengths and limitations

One strength of this study is that we measured the subjective well-being of community-dwelling frail older persons along with health-related quality of life. QALY measures in economic evaluations are based predominantly on aspects of health-related quality of life alone. Care programs for older persons may also aim to improve non-health related domains of quality of life. Thus, the sole use of health-related quality of life measures in economic evaluations may not be appropriate, as it may not capture broader benefits of such interventions beyond health [53]. Consequently, Makai and colleagues [53] recommended the inclusion of well-being measures with health measures like the EQ-5D in economic evaluations of care programs for older persons. We did so, although the different perspectives did not lead to different recommendations regarding the preference of the FFF intervention. Another strength of our study is the quality of the data gathered. We used dedicated, trained interviewers who collected the data in face-to-face interviews during home visits. All interviewers lived in the western North Brabant Province, assuring a cultural fit, and had backgrounds in healthcare. Moreover, we used a detailed resource use questionnaire covering a wide range of healthcare categories to assess healthcare utilization at the individual level. We included care disciplines that are frequently not included in studies, such as paramedical (e.g., physiotherapy) and psychological care, which may have increased content validity. Our study also has several potential limitations. First, we used a quasi-experimental design, which is more susceptible to bias due to the absence of randomization [54]. To increase comparability of the intervention and control groups, we used one-to-one matching based on key covariables. Despite this effort, the control group contained significantly more single persons than did the intervention group. 
Moreover, we noted indications (based on interviews with healthcare professionals and (project) managers) of a strong motivation to organize care and support for the elderly population in some control GP practices. Professionals in these practices may have perceived that the FFF program would not add value to their usual care practices and were therefore perhaps especially eager to participate in the control group. Second, recall bias might have occurred due to the retrospective assessment of service use in the preceding 12 months. Under-reporting and over-reporting of effects have been found in previous research in which health service utilization was assessed retrospectively [55]. Unfortunately, we were not able to include administrative or registry data to complement the reported healthcare service use. Nonetheless, given the same data collection procedure in both groups, we have no indication that recall bias varied significantly between the intervention and control groups. Third, mean standard costs of the FFF program were estimated, instead of assessing intervention costs for individual participants. We attempted to avoid duplicate inclusion of costs by including service use related to the follow-up of older patients in the FFF context only in healthcare costs, and not in intervention costs. The implementation and execution of (elements of) the FFF approach differed among intervention GP practices [22]. However, results of sensitivity analyses in which intervention costs were varied to test the robustness of the estimated ICER and ICUR did not affect the overall recommendation regarding the preference of the FFF program. Fourth, despite recommendations [37], we were unable to collect data on informal care due to practical considerations. 
The impact of informal care costs on the mean total costs in the intervention and control groups remains unknown, although we found no indication (based on interviews with healthcare professionals, (project) managers, and frail older persons) of unequal distribution of informal care costs between groups. In addition, we did not account for medication costs in either group or intervention training and implementation costs in the FFF group. We have noted no indication that medication use differed between groups.

Conclusions

Our study findings add to the current unconvincing body of evidence with respect to the cost-effectiveness of integrated primary care aimed at community-dwelling frail older persons. Future economic evaluations should use sufficiently long follow-up periods to assess durable costs and effects, adopt a societal perspective, and take into account the degree of implementation and the target population. Continued effort is required to unravel the black box of integrated care and find (cost-)effective (components of) programs for community-dwelling frail older persons.