Introduction

Pulmonary rehabilitation (PR) is the cornerstone of nonpharmacological treatment in chronic respiratory diseases to reduce the burden of the symptoms [1, 2]. Defined as a ‘comprehensive intervention based on a thorough patient assessment followed by patient-tailored therapies that include, but are not limited to, exercise training, education and behavior change’ [2], PR aims to 1) improve physical condition, exercise tolerance, health-related quality of life (QoL), and psychological condition, 2) reduce symptoms of dyspnea, and 3) favor long-term adherence to health-enhancing behaviors [2].

Owing in particular to its multidisciplinary approach, which requires a coordinated team of health professionals, PR was historically developed in a hospital context, either in an outpatient setting (outPR) or in a full-hospitalization setting (inpatient PR; inPR). Although alternative models, such as community-based PR and home-based PR, have also proven to be effective [3,4,5], they remain marginal to date (provided in less than 5% of organizations [6]). The PR statements give no clear criteria for favoring one modality over the others [2, 7]. The choice to refer a patient to one modality or the other hence appears to depend primarily on the resources available in each territory. Of the two hospital-based settings, outPR is the most common worldwide (provided in nearly 9 out of 10 organizations) [6]. However, inPR is the most common in some European countries (e.g., in France, 90% of stays are inPR - data from the French Technical Agency for Information in Hospitalization, 2018).

In the literature, these two distinct modalities are generally considered to be the same intervention. For example, in the Cochrane systematic review and meta-analysis of the effects of PR on health status, the two interventions were grouped together under the term “hospital-based programs” and the data were compiled jointly [7]. Furthermore, the setting is sometimes not disclosed in studies [8, 9]. Surprisingly, to our knowledge, no systematic review or meta-analysis of the effects of PR has yet addressed the equivalence between inPR and outPR.

Yet notable intrinsic elements indicate that inPR and outPR are likely to induce different effects. For example, in terms of program organization, inPR and outPR present different respective durations of programs and frequencies of sessions. While the majority of outPR programs last between 8 and 12 weeks, with 2 to 3 sessions per week [7], inPR programs are generally shorter (from 4 to 5 weeks) and the sessions are, therefore, spaced closer together (generally every weekday). The frequency of training sessions is an important parameter that can modulate the adaptations induced by exercise. For instance, at the same workload (i.e., the same intensity and the same total amount of exercise over the entire program), a greater frequency of resistance training (3 vs. 1 session per week) may result in greater improvement [10]. Another important intrinsic difference between inPR and outPR lies in the support model, whereby inPR requires that the patient is physically present 24/7 in the hospital. Notably, while inPR generates a definite break in patient routines for several weeks, outPR maintains patients in their usual environment. In addition, by its specific setting, inPR also offers more social support (e.g., informal caregiver support, other patient support, and health-caregivers present at all times). Yet, the level of social support has a key role in the efficacy of a therapeutic intervention [11]. More specifically, after a stay in outPR, it was shown that improvement in dyspnea was correlated with the level of social support [12]. Moreover, while it has been reported that, on average, one-third of patients do not complete their PR program in outPR (e.g., [13,14,15,16]), this issue is rarely described for inPR. This could be explained by the inPR environment that requires patients to be constantly present (day and night). In contrast, in outPR, the temptation not to return could be greater. This issue has, unfortunately, been investigated very little to date.

In the current state of knowledge, it remains unknown whether patients lose an opportunity for benefit when one modality is applied in preference to the other. It is also unclear whether analyzing outPR and inPR studies as equivalent carries a risk of erroneous conclusions. In light of these considerations, we set out to perform a systematic review to identify studies that directly compared inPR with outPR and to synthesize the evidence on the comparative effectiveness of the two modalities.

Materials and methods

The protocol for this systematic review was developed according to the guidelines of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA). The protocol has been registered in the international prospective register of systematic reviews PROSPERO (registration number CRD42020166546).

Literature search strategy

The literature search strategies were developed using medical subject headings (MeSH) and text words (Table 1). The MEDLINE (PubMed platform), Web of Science, and Cochrane Central Register of Controlled Trials (CENTRAL) bibliographic databases were searched from database inception through March 24, 2022. The bibliographies of eligible articles as well as existing systematic reviews in the field were also screened.

Table 1 Search methodology for Systematic Review

Study selection

Articles were included if they met the following criteria: (1) the sample population consisted of adults (age > 18 years); (2) the participants had a chronic respiratory disease; (3) the participants were enrolled in a pulmonary rehabilitation program (i.e., according to the international recommendations, PR must include exercise training and at least one of the following components: patient therapeutic education, breathing exercises, peer-group interaction, self-management skill development, or other recognized PR interventions along with optimization of pharmacotherapy); and (4) the article compared inPR and outPR on at least one PR outcome. Eligible designs were prospective and retrospective cohort studies and randomized trials that directly compared the two modalities (outpatient versus inpatient pulmonary rehabilitation). Book chapters, systematic reviews (except for their reference lists, which were checked as mentioned above), non-English articles, and conference abstracts without the full text were excluded. Two reviewers (F.A. and V.M.) screened the titles and abstracts of the retrieved studies for relevance, each blinded to the other’s decision to include or exclude an article; discrepancies were resolved by consensus. Articles published in languages other than English were excluded after screening of the title and abstract. The same two reviewers then read the remaining articles in full to check their consistency with the study protocol. Discrepancies at this stage were resolved by a third reviewer (N.H.).

Data extraction

The data extraction form for this systematic review was developed by the authors. The data collected included the following: (1) Type of study; (2) Study objectives; (3) Sample size; (4) Group assignment criteria; (5) Sample size per group; (6) Anthropometric characteristics; (7) Respiratory disease diagnosis; (8) Disease severity; (9) PR Program content; (10) Duration; (11) Number of sessions; (12) Intensity of exercise training; (13) Outcomes of interest found; and (14) Results from inPR versus outPR comparisons for each outcome.

Risk of bias assessment

The risk of bias was assessed with the modified Cochrane tool [17,18,19,20]. It comprised 13 criteria covering five types of bias: selection bias (criteria 1, 2, 9), performance bias (criteria 3, 4, 10, 11), attrition bias (criteria 6, 7), detection (or measurement) bias (criteria 5, 12), and reporting bias (criterion 8). The last criterion, “other” (criterion 13), was reserved for any potential bias not captured by the previous items. Two reviewers independently scored all the included studies according to the list of questions. If they could not reach a consensus, a third reviewer made the final decision. A low risk of bias was defined as 1) ‘yes’ answered to at least 10 questions and 2) at least one ‘yes’ in each risk category. A moderate risk of bias was defined as 1) ‘yes’ answered to at least eight questions and 2) at least one ‘yes’ in two categories. All other cases were considered to be ‘high risk of bias’.
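The classification rule above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' tool: the function name `classify_risk_of_bias` is hypothetical, treating "other" (criterion 13) as a risk category of its own is an assumption, and "in two categories" is read here as "in at least two categories".

```python
# Criterion numbers per bias category, as listed in the text.
# Assumption: "other" (criterion 13) counts as its own risk category.
CATEGORIES = {
    "selection": (1, 2, 9),
    "performance": (3, 4, 10, 11),
    "attrition": (6, 7),
    "detection": (5, 12),
    "reporting": (8,),
    "other": (13,),
}


def classify_risk_of_bias(yes_criteria):
    """Return 'low', 'moderate', or 'high' from the criteria answered 'yes'."""
    yes = set(yes_criteria)
    unknown = yes - set(range(1, 14))
    if unknown:
        raise ValueError(f"unknown criteria: {sorted(unknown)}")

    n_yes = len(yes)
    # Number of categories with at least one 'yes'.
    covered = sum(1 for crits in CATEGORIES.values() if yes & set(crits))

    if n_yes >= 10 and covered == len(CATEGORIES):
        return "low"
    # Assumption: "in two categories" means "in at least two categories".
    if n_yes >= 8 and covered >= 2:
        return "moderate"
    return "high"
```

For example, a study with ten 'yes' answers but none for the reporting or "other" criteria would be rated 'moderate' rather than 'low' under this reading.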

Best-evidence synthesis

Since a meta-analysis could not be performed due to the lack of homogeneity in the measured outcomes and a lack of data, a best-evidence synthesis (BES) [21, 22] was performed using the methodology from van Tulder et al. and Eijgenraam et al. [18, 23]. When reported, statistical values were included in our systematic review and the BES. The levels of evidence regarding the significance or non-significance of a relationship among studies were ranked according to the following statements:

  (1) ‘strong evidence’ was assigned if a result was reported by two or more studies with a low risk of bias, with findings generally consistent in all studies (≥ 75% of the studies had consistent findings);

  (2) ‘moderate evidence’ was assigned if a result was reported by:

    • one low risk of bias study and two or more moderate/high risk of bias studies;

    • or two or more moderate/high risk of bias studies, with consistent findings in all studies (≥ 75%);

  (3) ‘limited evidence’ was assigned if a result was reported by:

    • one or more moderate/high risk of bias studies, or one low risk of bias study, with consistent findings in all studies (≥ 75%);

  (4) ‘conflicting evidence’ was assigned in case of conflicting findings (< 75% of the studies reported consistent findings);

  (5) ‘no evidence’ was assigned when no studies could be found.
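The ranking rules above can be illustrated with a short Python sketch. It is a simplified reading under stated assumptions, not the published methodology itself: the function name `evidence_level` and the input format are hypothetical, consistency is measured as the share of entries reporting the most frequent finding, and when several rules match, the highest level is returned.

```python
from collections import Counter


def evidence_level(results, threshold=0.75):
    """Rank the evidence for one outcome.

    results: list of (risk_of_bias, finding) pairs, where risk_of_bias is
    'low', 'moderate', or 'high' and finding is any label (e.g. 'inPR').
    """
    if not results:
        return "no evidence"

    # Share of entries reporting the most frequent finding (assumption:
    # this is how "consistent findings" is operationalized).
    counts = Counter(finding for _, finding in results)
    top_share = counts.most_common(1)[0][1] / len(results)
    if top_share < threshold:
        return "conflicting evidence"

    n_low = sum(1 for risk, _ in results if risk == "low")
    n_other = len(results) - n_low  # moderate or high risk of bias

    if n_low >= 2:
        return "strong evidence"
    if n_other >= 2:
        return "moderate evidence"
    return "limited evidence"
```

Under these assumptions, two high risk of bias studies with the same finding yield 'moderate evidence', and a single high risk of bias study yields 'limited evidence'. Whether each entry represents a study or an individual result within a study is left to the caller.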

Results

Selection and search results

After examining a total of 936 abstracts (732 after removal of duplicates), we retrieved 19 full-text publications for possible inclusion. Among these publications, we identified six studies comparing the effects of outPR versus inPR: Bowen et al. (2000) [24], Braeken et al. (2017) [25], Clini et al. (2001) [26], Guler et al. (2021) [27], Hjalmarsen et al. (2014) [28], and Stoffels et al. (2021) [29]. A flowchart describing the selection process is presented in Fig. 1. During the full-text assessment for eligibility, the reasons for article exclusion were: absence of direct comparisons between modalities (n = 11), not a real PR program according to the official recommendations (n = 1), and articles not in English (n = 1). Among the included studies, several outcomes of interest were found [24,25,26,27,28,29]: health-related quality of life, exercise tolerance, muscle strength, psychological status, dyspnea, dropouts, economic costs, and survival rate.

Fig. 1 PRISMA flow diagram

General description of the included studies

A general description of each study is presented in Table 2.

Table 2 Study characteristics and main results of the included studies

Study design

All six included studies were retrospective (Table 2). The number of patients included in each study varied between 86 and 701: n = 149 in Bowen et al., n = 419 in Braeken et al., n = 86 in Clini et al., n = 701 in Guler et al., n = 144 in Hjalmarsen et al., and n = 632 in Stoffels et al. All six studies included exhibited a high risk of bias. Details of the risk of bias are provided in Supplementary Data 1.

Population characteristics

The population characteristics are listed in Table 2. The populations of three studies were exclusively composed of COPD patients [25, 28, 29]. In the other studies, the populations also included patients with asthma [30, 31], chest-wall disease [30], and pulmonary fibrosis [30, 32]. The baseline disease severity was reported in some studies using different indicators, such as the FEV1 [24, 26,27,28,29], dyspnea [26, 29], and exercise tolerance [24,25,26,27,28,29]. However, the number and type of comorbidities were never reported.

Group assignment criteria

In the study by Braeken et al. [25], the two PR groups were formed according to disease severity, co-morbid conditions, and access to nearby facilities (but no details were provided). In the study by Clini et al. [26], the two PR groups were formed according to travel time from home (> 1 h inPR, or outPR otherwise). For the four other studies, the assignment criteria were not specified.

Pulmonary rehabilitation characteristics

The program duration ranged from 2 to 16 weeks across the studies. The number of sessions also differed (12 to 80 sessions). In all six studies, the authors stated that the PR programs followed current guidelines, but the program details were sometimes lacking. Indeed, only three studies provided additional details about the PR program contents and intensities [26,27,28] (Table 2). In these studies, the intensity of the exercise training was similar between inPR and outPR [26,27,28].

Comparison of outpatient and inpatient pulmonary rehabilitation programs and best-evidence synthesis assessment

Table 3 lists the best-evidence synthesis for each outcome. The best-evidence synthesis provided:

  • moderate evidence in favor of inPR for health-related quality of life and psychological status

  • moderate evidence in favor of no difference between the two modalities for dropouts/adherence and survival status

  • limited evidence in favor of no difference between the two modalities for muscle strength

  • conflicting evidence in favor of inPR or in favor of no difference between the two modalities for exercise tolerance and dyspnea

  • conflicting evidence in favor of inPR or in favor of outPR for economic costs

  • no evidence for healthcare burden and refusals

Table 3 Best-evidence synthesis

Health-related quality of life

Data on health-related quality of life (HRQoL; main outcome) were available in only two studies (Braeken et al. and Stoffels et al.) [25, 29]. Braeken et al. [25] assessed HRQoL with three different tools (CAT, SGRQ-C, and CCQ), versus only one in Stoffels et al. (CAT) [29]. In total, four results were available regarding HRQoL.

In both studies, inPR resulted in a greater improvement in health-related quality of life than outPR (Tables 2 and 3). Given that the two studies reported consistent findings with a high risk of bias, the best-evidence synthesis provided moderate evidence in favor of inPR for health-related quality of life improvement.

Exercise tolerance

Data on exercise tolerance were available in four studies (Braeken et al., Clini et al., Guler et al., and Stoffels et al.) [25,26,27, 29]. Braeken et al. [25] studied exercise tolerance with 6MWD and CWRT, Stoffels et al. [29] with 6MWD, CWRT TTE, 4MGS, and 5STS, Clini et al. [26] with peak workload, and Guler et al. [27] with 6MWD. In total, eight results were available regarding exercise tolerance.

For five results, inPR resulted in greater exercise tolerance improvement than outPR, whereas for the three others, no difference was observed between inPR and outPR (Tables 2 and 3). Given these results and the high risk of bias assessed for each study, the best-evidence synthesis provided conflicting evidence in favor of inPR or in favor of no difference between the two modalities.
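As a quick check of the arithmetic behind this rating, and assuming that the 75% consistency threshold of the best-evidence synthesis is applied to the eight individual results rather than to the four studies, the share of results in favor of inPR is:

\[
\frac{5}{8} = 62.5\% < 75\%
\]

The most frequent finding therefore falls short of the consistency threshold, which corresponds to the ‘conflicting evidence’ level.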

Muscle strength

Data on muscle strength were only available in the study by Stoffels et al. [29]. No significant difference in isokinetic quadriceps peak torque improvement was found between inPR and outPR (Tables 2 and 3). Therefore, the best-evidence synthesis provided limited evidence in favor of no difference between the two modalities.

Dyspnea

Data on dyspnea were available in two studies (Clini et al. and Stoffels et al. [26, 29]). Clini et al. [26] found no significant difference in dyspnea improvement between inPR and outPR, whereas Stoffels et al. [29] showed greater dyspnea improvement in inPR compared to outPR (Tables 2 and 3). Given that both studies presented a high risk of bias and reported different results, the best-evidence synthesis provided conflicting evidence for dyspnea, with one result in favor of inPR and one in favor of no difference between the two modalities.

Psychological status

Data on psychological status were available in two studies, Braeken et al. and Stoffels et al. [25, 29]. Both studies used HADS, with one anxiety score and one depression score. In total, four results for psychological status were available. In both studies, inPR yielded greater improvement in anxiety and depression scores compared to outPR (Tables 2 and 3). Given that the two studies reported consistent results with a high risk of bias [25, 29], the best-evidence synthesis provided moderate evidence in favor of inPR for psychological status improvement.

Healthcare burden

No study from this systematic review provided healthcare burden data.

Refusals

Refusal data were not available in the studies. Braeken et al. [25] reported the number of non-attendances (n = 32), but it was not possible to infer the number of refusals from it.

Dropouts/adherence

Dropouts/Adherence data were only available in three studies [25, 27, 29], but Guler et al. [27] did not provide any statistics in their article regarding this comparison. Braeken et al. [25] found that the PR setting had no significant impact on dropout rates, and Stoffels et al. [29] reported no significant differences in adherence between inPR and outPR (Tables 2 and 3). Given the high risk of bias reported for both studies and consistent results in both studies, moderate evidence in favor of no difference between the two modalities was obtained.

Economic costs

Economic cost data were available in one study (Clini et al.) [26]. In this study, the authors calculated three different total costs (total per session, total per program, and the grand total including transport costs) (Tables 2 and 3). For the total per program and the grand total, inPR was cheaper than outPR, whereas for the total per session, the opposite was observed. Given these results and the high risk of bias assessed for this study, the best-evidence synthesis provided conflicting evidence in favor of inPR or in favor of outPR.

Survival status

Data on survival status were available in three studies [24, 27, 28], but Guler et al. [27] did not perform any statistical comparison on this outcome. Bowen et al. [24] and Hjalmarsen et al. [28] found that the PR setting had no impact on survival rates for COPD patients with a time period ranging from 1 to 10 years (Tables 2 and 3). Given the high risk of bias for both studies and consistent results observed, moderate evidence in favor of no difference between the two modalities was obtained for survival status.

Discussion

We performed a systematic review in order to identify studies that directly compared inpatient pulmonary rehabilitation (inPR) versus outpatient pulmonary rehabilitation (outPR), as well as to synthesize the evidence on the effectiveness comparison of both modalities. Six retrospective studies were identified after the systematic review process, all of which had a high risk of bias. No meta-analysis was possible due to the lack of homogeneity in the reported outcomes. However, a general best-evidence synthesis was carried out. Of the ten targeted outcomes, eight could be found in the included papers. For health-care burden and refusals, no data could be extracted and, thus, no best-evidence synthesis (BES) was performed. For the eight remaining target outcomes, two results were in favor of inPR with moderate evidence (health-related quality of life - HRQoL and psychological status), three in favor of no difference between inPR and outPR with moderate or limited evidence (muscle strength, dropouts/adherence, and survival status), and three led to conflicting results (exercise tolerance, dyspnea, and economic costs).

The BES indicated moderate evidence in favor of greater effects of inPR on HRQoL and psychological status. Despite a consensus on these two outcomes, the level of evidence was only moderate due to numerous biases in the studies. Indeed, they were both retrospective, with no randomization and with heterogeneous groups, especially regarding disease severity. Unfortunately, the difference in disease severity was not taken into account in the analyses. Since severe patients usually exhibit more progress than the less severe patients, due to more scope for improvement [33], the higher severity in the inPR group represents a potential confounding factor, making it impossible to definitively conclude that inPR is superior to outPR for improving HRQoL and psychological status, despite a clear statistical trend.

For three other outcomes (muscle strength, survival status, and dropouts), the BES reported the absence of a difference between inPR and outPR, with limited or moderate levels of evidence. Although muscle strength is a key outcome in PR as an independent predictor of survival [34, 35], it is disappointing that only one study investigated it. Thus, the limited evidence for this result, in favor of an absence of a difference between the PR settings, does not allow any conclusions to be drawn. Regarding survival, the results revealed comparable survival rates, with moderate evidence, between both PR settings over a wide time frame after rehabilitation (up to 10 years) [24, 28]. No prospective follow-up was performed between the end of the PR programs and the collection of survival status, leaving a real black box regarding any events that could have occurred during the follow-up. For example, neither study reported medium-term health indicators such as exacerbation and hospitalization rates (i.e., healthcare burden), even though these clearly influence survival status. As it stands, neither modality can be considered more effective than the other in terms of changing the vital prognosis of patients. A prospective study including several control variables appears to be essential to avoid any unwarranted conclusions regarding survival status.

Regarding dropouts/adherence, the BES again showed no difference between inPR and outPR. To the best of our knowledge, the studies in the literature whose main objective was to investigate dropouts in PR were all carried out in outPR only [15, 30,31,32, 36, 37]. At first sight, given the lack of studies in inPR, this may suggest that a high PR dropout rate is a phenomenon specific to outPR. Unexpectedly, our systematic review identified three studies that provided dropouts/adherence data in both PR settings, with a significant number of dropouts in inPR in two of them [25, 29]. Unfortunately, they did not provide the reasons for dropping out. In previous studies performed in outPR, the main reasons for dropping out were related to daily transport issues, lack of social support, and session times [15, 38]. By definition, these issues are unlikely in inPR. Beyond the dropout rate, the reasons for dropping out are of high importance in order to identify potential ways to minimize dropouts. More specifically, if the reasons for dropping out differ between inPR and outPR, personalizing the PR setting according to each patient’s characteristics could become a way to limit dropouts in PR and thereby increase PR efficacy. Hence, there is an urgent need for future studies to investigate and compare the reasons for dropping out between the two PR settings. Regarding the other three studies included in the review, the absence of data on dropout rates is problematic. Indeed, the efficacy of PR programs was only evaluated through the changes in the health status of patients who completed the program. However, the performance of a healthcare system must be assessed not only in terms of the health status of patients but also in terms of its capacity to be efficient for the majority of patients [39]. More specifically, if analyses were performed per protocol (this information was not clearly provided in the three studies), failure to verify that completion or dropout rates were similar between inPR and outPR would be problematic. Indeed, in another study in which patients who dropped out were followed and evaluated (intention-to-treat analyses), the results revealed differences in exercise tolerance according to the type of analysis: + 30 m per protocol versus + 10 m in intention-to-treat analyses [39].

The BES also provided conflicting results between the two PR settings for exercise tolerance, dyspnea, and economic costs, which prevented any trend from being identified. Concerning exercise tolerance, several different tests were used to evaluate its progression. Some of them, such as the 6MWD, were in favor of a greater improvement in inPR. Interestingly, if only the 6MWD data had been analyzed, moderate evidence would have been obtained. However, for some of the other variables, no difference between inPR and outPR was observed. Thus, the conflicting results could be explained by a lack of sensitivity of certain measures to detect differences.

Concerning dyspnea, Stoffels et al. reported greater inPR efficacy, while Clini et al. did not find any difference between the two PR settings. Dyspnea is usually described as a multidimensional outcome, including impact, sensory, and emotional dimensions [40, 41]. In the two aforementioned studies, it was assessed only with unidimensional tools, and on two different dimensions: ‘impact’ in Stoffels et al. [29] vs. ‘sensory’ in Clini et al. [26]. Therefore, our systematic review could only provide a restricted evaluation of dyspnea, with conflicting results that could be explained by a difference in PR setting efficacy according to the dyspnea dimension. Finally, the BES showed conflicting evidence in favor of inPR or in favor of outPR for economic costs. Only one study [26] investigated this outcome, and it analyzed costs with different methods. First, the authors found that inPR was cheaper than outPR in terms of the grand total (i.e., cost per program plus transport). This was expected since the inPR program included half as many sessions as the outPR program (12 vs. 24 sessions, respectively). Given that no significant difference between inPR and outPR in PR outcomes was found in parallel, this result was logically interpreted as a possible indication of a better cost-benefit ratio for the inPR setting. However, the cost per session was lower in outPR than in inPR, even without taking into account hospital bed costs for inPR. Consequently, it is likely that the inPR costs were underestimated. With different cost calculations yielding conflicting results, it is impossible to draw a conclusion regarding economic costs.

Overall, despite the existence of biases and several different levels of evidence, the results of our comparison are either in favor of inPR or in favor of the absence of differences between inPR and outPR. The lack of a unanimous consensus regarding the superiority of one modality over the other raises questions about how these results should inform the referral of patients toward one modality or the other. Indeed, in light of the results, it would appear that the comparative effectiveness of inPR vs. outPR could depend on the outcome considered. Beyond a differentiated effect between the two modalities on some PR outcomes, it is also conceivable that inPR is better suited to certain patient profiles and outPR to others. Unfortunately, this type of analysis at the individual level was not possible with the data available for the BES. Future research should also consider individual responses according to patient profiles, with a view to identifying the predictors of success in each PR setting.

Methodological considerations of the systematic review

Best-evidence synthesis is a relevant alternative that makes it possible to discern trends. It allows results to be qualified by parameters such as the risk of bias of each included study, the number of studies, and the number of consistent results. Thus, although this approach does not replace a meta-analysis, it extends knowledge using the available data. Several limitations of our systematic review warrant consideration. First, best-evidence synthesis highlights the level of interest in a field found in the literature, and here we found a low number of studies. This suggests a lack of interest from the scientific community in this topic, which is nonetheless an essential question for patients. Second, performing a best-evidence synthesis introduces a degree of subjectivity, since the criteria for the levels of evidence have to be defined, even if they are based on a published methodology [18,19,20,21,22,23]. However, as explained by Slavin in 1986, when a meta-analysis is not possible, best-evidence synthesis provides a means to combine the strengths of meta-analytic and traditional reviews [23]. A final weakness of our study is that we did not search for unpublished papers or articles written in languages other than English.

Conclusion

In conclusion, in the current state of knowledge, the available studies converge towards either the absence of differences between inPR and outPR or an advantage for inPR for 7 out of 8 outcomes, but with moderate, limited, or conflicting evidence. In addition, owing to the retrospective nature of the studies, the absence of randomization, and the lack of comparable severity between groups, no definite conclusions can be drawn from our systematic review and best-evidence synthesis. A well-designed RCT is needed to confirm or refute this trend in favor of inPR, in order to orient public health policies on the development of PR with a best-evidence-based medicine approach.