1 Introduction

Congestive heart failure (CHF) is a chronic or acute progressive condition that impairs the ability of the heart muscle to pump blood to other parts of the body because of fluid accumulation around the heart. CHF is caused by a broad range of non-communicable diseases (NCDs), with cardiovascular diseases (CVDs) topping the list. About 2% of the adult population in developed countries suffers from this condition [1,2,3], and it is a major cause of increased morbidity and mortality in aging populations all over the world. For instance, it is documented in [4] that 23 million individuals are living with CHF worldwide. The occurrence of the condition increases considerably with age, coupled with an ever-increasing symptom burden [5]. According to the evidence in [6] and [7], CHF is also the leading cause of hospital admission and rehospitalization in the United States and Europe [8]. Although the survival rate of patients with CHF has improved markedly in recent years, no known cure has been reported. It is encouraging to note, however, that even though CHF gets worse over time, it is not necessarily a death sentence. Proper management of the condition can improve quality of life (QoL) and slow down the rate of progression. Treatment is generally aimed at controlling symptoms for as long as possible, and includes lifestyle changes (smoking cessation, exercise, healthy diet), pharmacological therapy (beta-blockers, diuretics, vasoactive drugs), device implantation (invasively in the chest) and, if all else fails, surgery (heart transplant or bypass operation) [6, 9, 10].

1.1 Telemonitoring opportunities for CHF

Digital technology is now used more widely than at any other period in human history. Some of the outputs of this digital revolution include the global adoption of social networking, wireless connectivity, and mobile devices. These technologies are presented through interfaces using artificial intelligence (AI), big data, machine learning methods, and, more recently, edge computing. This creates a strong case for the development and use of telehealth and telemonitoring devices for the management and monitoring of patients in an emerging branch of clinical practice known as telemedicine. Telemedicine allows health providers to care for their patients remotely using information and communication technologies (ICTs). An overarching definition is given by the World Health Organization (WHO) in [11], where opportunities and developments in telemedicine are well articulated. The slow adoption of novel practices in telemedicine may be traced to factors ranging from evaluation processes, cost, infrastructure, and cost-effectiveness [11] to data security [12].

1.2 Call for action

Due to the rising burden of morbidity and mortality caused by CHF on a global scale, and the huge costs involved in providing quality care to the ever-increasing elderly population who are most affected by this condition, there have been mounting calls for more effective means of managing and controlling patient characteristics, disease severity, and other factors [13, 14], which ultimately lead to higher rates of hospitalizations and readmissions [15,16,17]. Telemonitoring and telehealth procedures seem to have found a footing in the area of disease management programs (DSPs), which are highly recommended by regulatory agencies in both the United States and Europe. An objective assessment of the effectiveness of the particular telemonitoring method(s) employed is sometimes difficult owing to conflicting outcomes from different randomized controlled trials (RCTs). With this in mind, we deemed it necessary to carry out an all-encompassing systematic review and meta-analysis of RCTs employing telemonitoring practices in one form or another to manage CHF patients. This was done in order to provide a holistic and evidence-based view of four key areas: the complexity of intervention, patient characteristics, patient severity, and the key enabling technologies (KETs) adopted, thus capturing all possible aspects of the documented interventions and outcomes. In some previous works [18,19,20,21], the trend had been to focus on one or two of these areas, thereby failing to capture the entirety of the body of evidence available. Similarly, owing to previous reports of mixed effects on common themes, for example, as detailed in Giamouzis et al. [22], we thought it of great importance to identify new studies that could possibly tip the balance by providing clearer evidence.
Beyond these, however, there is also the need to continually provide updated evidence on telemonitoring interventions and outcomes using revised methods of evaluation, especially when considering the constant advancement in technology, and how such advances may or may not have affected the management of CHF patients. Finally, it is hoped that the observed effects from the synthesis of relevant studies in this comprehensive review will aid the formulation of new practice guidelines and policies on a global scale that will significantly improve clinical efficacy and prognosis, reduce healthcare utilization, and effectively cut down on the overall costs of managing CHF patients.

2 Materials and methods

We conducted a systematic literature review with meta-analysis following the preferred reporting items for systematic reviews and meta-analyses (PRISMA) guidelines, to compare the effects of home telemonitoring practices as opposed to traditional approaches of managing and monitoring patients affected by CHF. With this study, we aim to formulate a model that can be easily adopted for novel areas of CHF research and practice going forward and can provide reliable outcomes.

2.1 Eligibility criteria

To be included in the review, published studies had to be RCTs, deal with home telemonitoring of patients with CHF, be in a language known by the authors (i.e., English, Spanish or Italian), and be published after the first of January 2012. Full details of the inclusion and exclusion criteria are shown in Table 1.

Table 1 Study inclusion and exclusion criteria

2.2 Information sources

We searched a number of electronic data resources containing citations for indexed journals, articles/manuscripts, online books, etc. Most citations contained links to full text content from publisher websites and other associated online utilities. In Table 2, we present a list of databases that were searched, as well as a short description of their functions. Additionally, the coverage range (i.e., specific time interval) for retrieving eligible studies, and the particular date when each source was last consulted are shown. Although not presented in the table, key researchers for some eligible studies were also contacted to provide supplementary data if available.

Table 2 Sources searched for systematic review

2.3 Search strategy

The identification and selection of studies included in this review were based on the patient/problem, intervention, comparison and outcome (PICO) search strategy [23,24,25]. The study was performed according to the updated PRISMA 2020 statement for systematic literature reviews and meta-analysis [26,27,28]. Based on the research question of interest, an initial search of pertinent literature reviews was carried out on PubMed to identify relevant keywords, refine the search string, and clearly identify research gaps. From this early search, 21 papers were identified and scrutinized. Specifically, a combination of Boolean operators (i.e., AND or OR) was used to create the search string, which included the following terms: “remote monitoring”, “mHealth”, “telemedicine”, “eHealth”, “telemonitoring”, “telehealth”, “home monitoring”, “eTechnologies”, “telecare”, “telehomecare”, “THC”, “ICT”, “Information and communication technologies”, “IT”, “information technologies”, “congestive heart failure”, “Heart failure”, “CHF”, “RCT”, “randomized control trial”, “RT”, “randomized trial”, “randomized controlled trial”. Further information on the search strategy, including the complete search string, can be found in Online Resource 1 (ESM_1).
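The way such a string is typically assembled can be sketched in a few lines of Python. This is only an illustration: the grouping of the listed terms into three concept blocks (technology, condition, study design) and the helper names `or_block` and `build_query` are assumptions made here, and the authors' complete search string is the one given in Online Resource 1.

```python
# Illustrative sketch only. The three concept blocks below are an assumed
# grouping of the terms listed in the text, not the authors' exact string.
technology_terms = [
    "remote monitoring", "mHealth", "telemedicine", "eHealth", "telemonitoring",
    "telehealth", "home monitoring", "eTechnologies", "telecare", "telehomecare",
    "THC", "ICT", "Information and communication technologies", "IT",
    "information technologies",
]
condition_terms = ["congestive heart failure", "Heart failure", "CHF"]
design_terms = [
    "RCT", "randomized control trial", "RT", "randomized trial",
    "randomized controlled trial",
]


def or_block(terms):
    """Join synonyms with OR and wrap the block in parentheses."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(*blocks):
    """Join concept blocks with AND so a record must match every concept."""
    return " AND ".join(or_block(block) for block in blocks)


query = build_query(technology_terms, condition_terms, design_terms)
print(query)
```

Synonyms within a concept are joined with OR to widen recall, while the concepts themselves are joined with AND so that only records touching all three are returned.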

Once the string was refined, the Scopus online citation database and Ovid search interface (Embase, CENTRAL and MEDLINE) were selected to conduct electronic searches for eligible studies, in order to identify previous meta-analyses and RCTs comparing the effects of home telemonitoring in patients with CHF to usual care (UC), starting from 01/01/2012 up to 06/06/2019. Furthermore, the snowball sampling method was applied to the references of the retrieved articles, as well as forward citation tracking on Google Scholar, to identify additional relevant studies. Finally, we contacted the corresponding authors of some of the included studies to request related supplementary data where necessary, and also searched ClinicalTrials.gov to retrieve registered protocols where available in order to confirm whether each study followed the laid down rules and guidelines for implementing the study.

2.4 Selection process

The lists of articles recovered from Scopus and Ovid (Embase, MEDLINE(R), Your journals@Ovid) were exported as separate Excel sheets, which were later combined into a single Excel document. Replication of articles across the listed databases was controlled in the initial database search by screening for duplicates using Microsoft Excel. After this initial duplicate screening, remaining duplicates and multiple reports belonging to the same study were removed by comparing author and cohort names, study locations, and sample characteristics. Thereafter, the remaining titles were independently screened for eligibility by two researchers (BO and LL), followed by a careful review of abstracts to determine probable exclusions. The same researchers independently carried out a critical assessment of the full text of each screened paper to determine its relevance to the objectives of the study and its consistency with the inclusion and exclusion criteria. A third researcher (DP) independently checked the results of the selection process. Where inconsistencies were observed, they were resolved by discussion with the senior researchers (GF and LP).
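The first, automated pass of duplicate screening can be pictured as follows. The review itself used Microsoft Excel; the Python sketch below is a stand-in under stated assumptions: the record fields (`title`, `year`, `source`), the matching key, and the example records are all hypothetical.

```python
import re
import unicodedata


def normalize_title(title):
    """Lowercase and strip accents and punctuation so trivial variants match."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9]+", " ", text.lower())
    return text.strip()


def drop_duplicates(records):
    """Keep the first record seen for each (normalized title, year) key."""
    seen = set()
    unique = []
    for record in records:
        key = (normalize_title(record["title"]), record.get("year"))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical rows standing in for the combined Scopus and Ovid exports.
records = [
    {"title": "Home Telemonitoring in Heart Failure: A Trial.", "year": 2015, "source": "Scopus"},
    {"title": "Home telemonitoring in heart failure - a trial", "year": 2015, "source": "Ovid"},
    {"title": "Telephone support for heart failure", "year": 2014, "source": "Ovid"},
]
unique = drop_duplicates(records)
print(len(records) - len(unique), "duplicate(s) removed")
```

A key of this kind only catches exact or near-exact repeats. Multiple reports of the same study under different titles still need the manual comparison of authors, cohorts, and locations described above.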

2.5 Data collection process

The eligible studies were shared proportionally between two authors (BO and LL) for data extraction. Data were independently extracted from the allotted titles by the two authors and entered in a predefined Excel sheet. Because of the need to capture as much data as possible, an exploratory method of data extraction was adopted whereby the Excel sheet was expanded over time to accommodate any new data item that seemed relevant to the purpose of the review. A third author (DP) then reviewed the extracted values to detect errors, inconsistencies, and misalignment in the reported values or their units of measurement. In the second phase of the data extraction process, the two authors (BO and LL) reviewed their extracted values and also verified each other’s work in order to eliminate errors and align units of measurement for different variables. In the event of missing or incomplete data, or any ambiguity in the intervention procedure, study authors were contacted to request more details.

2.6 Data items

The examined studies covered an extensive range of endpoints (both primary and secondary) and were fairly similar in construction. However, a few studies had only a general categorization of outcomes without any clear-cut distinction between the primary and secondary outcomes. Data extracted were categorized into 13 macro-groups, namely, intervention details, patient characteristics, differences in clinical variables at follow-up, psychological status, QoL, adverse events at follow-up, drug utilization, cost analysis, self-care behavior, healthcare/home care utilization, cardiopulmonary exercise testing (CPX), six-minute walk test (6MWT), and adherence statistics. A categorization code was introduced by two authors (BO and LL) to make the outcomes easier to identify. The outcomes of all eligible studies for this review are presented in Table A (Online Resource 2 – ESM_2) and the interpretation of the codes can be found at the bottom of the table. Only baseline results and the final follow-up values of the studies for each arm of the RCTs were sought. In cases where either of these two sets of results was unavailable, the extracted data were not included in the ensuing meta-analysis for the affected outcomes. In other situations where enough data could not be extracted from a particular study, relevant data were extracted from earlier studies (published before January 2012) that were referenced in the text of the study, as long as it could be ascertained that they were implemented as part of the same protocol.

2.7 Study risk-of-bias (RoB) assessment

As noted in Section 2.5, the studies were proportionally distributed between two researchers (BO and LL) for independent assessment. The RoB assessment was performed (by BO and LL, under the supervision of DP) by scoring the included studies against the relevant items described in the Cochrane handbook [29], using an ad-hoc Excel file. Moreover, the two authors appended notes justifying the value assigned to each major item in the tool. Afterwards, the authors met to review all allocated assessments of bias for each study, and to resolve any differences in judgement. Conflicts of opinion or uncertainties which could not be resolved were discussed with the third author (DP) for final resolution. The RoB evaluation for all included trials, showing the domain assessment for individual trials, is summarized in Fig. 2.

2.8 Measures of intervention effects

In this meta-analysis, continuous outcomes (e.g., ejection fraction (EF), diastolic function, B-type natriuretic peptide (BNP), 6MWT, all-cause hospitalization, visit to the cardiologist, QoL (physical component), self-care behavior, peak VO2) were expressed by weighted mean differences (WMD), together with a relative 95% confidence interval (CI). Dichotomous outcomes (e.g., beta blockers, angiotensin converting enzyme inhibitors and angiotensin receptor blockers (ACE-i/ARB), number of heart failure (HF) death and/or rehospitalization, cardiovascular death, emergency admissions, unscheduled visits, death from any cause, etc.) were expressed as risk ratios (RR), together with the relative 95% CI. Results with p-values lower than 0.05 were considered statistically significant.
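As an illustration of how a dichotomous outcome is summarized, the sketch below computes a risk ratio with a 95% CI and a two-sided p-value for a single trial. It assumes the usual Wald interval on the log scale and a normal approximation; the function name `risk_ratio` and the event counts are hypothetical and are not taken from any included study.

```python
import math


def risk_ratio(events_int, n_int, events_ctl, n_ctl, z_crit=1.96):
    """Risk ratio with a Wald 95% CI on the log scale and a two-sided p-value."""
    rr = (events_int / n_int) / (events_ctl / n_ctl)
    # Standard error of log(RR) for two independent binomial samples.
    se_log = math.sqrt(1 / events_int - 1 / n_int + 1 / events_ctl - 1 / n_ctl)
    lower = math.exp(math.log(rr) - z_crit * se_log)
    upper = math.exp(math.log(rr) + z_crit * se_log)
    z = math.log(rr) / se_log
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return rr, lower, upper, p_value


# Hypothetical counts: 30/200 events with telemonitoring, 45/200 with usual care.
rr, lower, upper, p = risk_ratio(events_int=30, n_int=200, events_ctl=45, n_ctl=200)
significant = p < 0.05
print(f"RR = {rr:.2f}; 95% CI ({lower:.2f}, {upper:.2f}); p-value = {p:.3f}")
```

An RR below 1 favors the intervention, and the result counts as statistically significant under the 0.05 threshold only when the 95% CI excludes 1.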

2.9 Synthesis methods

We identified four broad areas of interest and classified the included studies accordingly. This was done in order to properly understand the complexities inherent in the studies, and to provide an easier way of navigating the multiple components making up each unit. These include:

  1. Patient characteristics: These are key attributes used to define individual patients or groups of patients, and they express the underlying relationships between the patient, physician, and treatment prescribed. For this study, some of the more commonly considered characteristics include the number of patients, age, gender, comorbidities, additional patient information, and frequency of monitoring.

  2. Patient severity: This was an area of great importance in our research considerations because of its potential to determine the focus and outcomes of any intervention. Most studies usually specify a severity of illness (SOI) index [30, 31] in their eligibility criteria, and in our particular experience, almost all the studies included in this systematic review and meta-analysis specified the New York Heart Association (NYHA) functional classification of HF for their respective sample populations.

  3. Complexity of intervention: We were interested in knowing the number of elements in the intervention, capturing interactions between components of the intervention and/or between the intervention and its context, and, finally, recognizing the wider system within which the intervention is introduced [32, 33].

  4. KETs adopted: Since we were interested in assessing the effects of home telemonitoring on CHF patients, it was only natural that we would want to discover the type and breadth of KETs adopted for each intervention.

The meta-analysis was performed using the MetaXL add-on for Microsoft Excel. Each outcome of interest was reported in a separate Excel sheet. When studies did not report the same unit of measure, appropriate conversions were made, if possible. Only 12 such cases were encountered in this study. Outcomes that lacked information, for example, those for which no standard deviation or other measure of dispersion was given, were discarded from the meta-analysis. However, if the CI was given for a certain outcome, and the related study had more than 100 subjects, then the standard deviation was estimated as follows (this formula is valid for a 95% CI):

$$SD = \frac{\sqrt{N}\cdot \left(upper\; limit\;-\;lower \;limit\right)}{3.92}$$
(1)

where \(SD\) is the standard deviation, \(N\) is the number of subjects, and the limits are those of the CI.
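Equation (1) follows from the fact that a 95% CI for a mean spans \(2 \times 1.96 = 3.92\) standard errors, each equal to \(SD/\sqrt{N}\). A minimal sketch of the conversion is given below; the function name `sd_from_ci` and the example arm are hypothetical.

```python
import math


def sd_from_ci(lower, upper, n, z_crit=1.96):
    """Equation (1): SD = sqrt(N) * (upper - lower) / 3.92 for a 95% CI."""
    return math.sqrt(n) * (upper - lower) / (2 * z_crit)


# Hypothetical arm: 144 subjects with a 95% CI for the mean of (48.0, 52.0).
sd = sd_from_ci(lower=48.0, upper=52.0, n=144)
print(round(sd, 2))
```

The divisor 3.92 relies on the normal approximation, which is consistent with the restriction of this conversion to studies with more than 100 subjects.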

Moreover, for the continuous variables, the mean differences between post- and pre-treatment values in each arm (telemonitoring or usual care) were used to evaluate the overall efficacy of the treatment compared to usual care. If the mean differences and the related standard deviations were not available, they were calculated. In particular, the standard deviation of the difference was calculated as follows:

$$SDd=\sqrt{\frac{{\sigma }_{1}^{2}}{{n}_{1}}+\frac{{\sigma }_{2}^{2}}{{n}_{2}}}$$
(2)

where \(SDd\) is the standard deviation of the difference, \({\sigma }_{1}\) and \({\sigma }_{2}\) are the standard deviations, and \({n}_{1}\) and \({n}_{2}\) are the corresponding sample sizes.
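A short numerical illustration of Eq. (2) may help. In the sketch below, the two standard deviations are taken to be the baseline and follow-up values of one arm, which is an assumption about how the formula was applied; the function name and all numbers are hypothetical.

```python
import math


def sd_of_difference(sd_1, n_1, sd_2, n_2):
    """Equation (2): sqrt(sd_1^2 / n_1 + sd_2^2 / n_2)."""
    return math.sqrt(sd_1 ** 2 / n_1 + sd_2 ** 2 / n_2)


# Hypothetical arm: 6MWT-like values at baseline and at final follow-up.
mean_pre, sd_pre, n_pre = 400.0, 90.0, 100
mean_post, sd_post, n_post = 425.0, 120.0, 100

mean_change = mean_post - mean_pre
sdd = sd_of_difference(sd_pre, n_pre, sd_post, n_post)
print(mean_change, sdd)  # 25.0 15.0
```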

Heterogeneity was assessed using the I² statistic, which describes the percentage of variation across studies that is due to heterogeneity rather than chance. I² values of less than 25%, 25% to 75%, and greater than 75% were associated with low, moderate, and high heterogeneity, respectively. For outcomes showing moderate or high heterogeneity, a random effects model was used. For all other outcomes, i.e., those showing an I² of less than 25%, a fixed effect model relying on inverse variance was selected. Results were presented in the form of forest plots to aid visualization and comparison. Moreover, sensitivity analysis was performed on the variables with significant pooled results in order to assess the robustness of those results. This analysis was performed by leaving one study out at a time and then evaluating changes in the statistical significance, pooled effect, or heterogeneity of the remaining studies, thus allowing us to highlight studies that might have quality issues. These flagged studies were then reassessed, taking into consideration the RoB, to ascertain whether they should be excluded from the variables for which they appeared problematic.
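The decision rule and the leave-one-out procedure can be sketched as follows. This is not MetaXL's implementation: the random effects weights here use the DerSimonian-Laird estimate of the between-study variance, which is an assumption because the estimator is not named in the text, and the study effects and standard errors are hypothetical.

```python
import math


def pool(effects, ses, tau2=0.0):
    """Inverse-variance pooled estimate; tau2 > 0 gives random effects weights."""
    weights = [1.0 / (se ** 2 + tau2) for se in ses]
    estimate = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return estimate, math.sqrt(1.0 / sum(weights))


def heterogeneity(effects, ses):
    """Cochran's Q, I² in percent, and the DerSimonian-Laird tau² (assumed)."""
    w = [1.0 / se ** 2 for se in ses]
    fixed, _ = pool(effects, ses)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    return q, i2, tau2


def meta_analyse(effects, ses):
    """Fixed effect model if I² < 25%, otherwise random effects, as in the text."""
    _, i2, tau2 = heterogeneity(effects, ses)
    model = "fixed" if i2 < 25 else "random"
    estimate, se = pool(effects, ses, tau2 if model == "random" else 0.0)
    p_value = math.erfc(abs(estimate / se) / math.sqrt(2))
    ci = (estimate - 1.96 * se, estimate + 1.96 * se)
    return {"model": model, "i2": i2, "estimate": estimate, "ci": ci, "p": p_value}


def leave_one_out(effects, ses):
    """Repeat the synthesis with each study omitted in turn."""
    results = []
    for i in range(len(effects)):
        rest_effects = effects[:i] + effects[i + 1:]
        rest_ses = ses[:i] + ses[i + 1:]
        results.append((i, meta_analyse(rest_effects, rest_ses)))
    return results


# Hypothetical mean differences and standard errors for four studies.
effects = [-40.0, -10.0, -35.0, 5.0]
ses = [8.0, 6.0, 10.0, 7.0]
overall = meta_analyse(effects, ses)
print(overall["model"], round(overall["i2"], 1))
for index, result in leave_one_out(effects, ses):
    print("without study", index, result["model"], round(result["estimate"], 2))
```

A study whose omission changes the significance, the pooled estimate, or I² markedly is the kind that would be flagged for reassessment against its RoB.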

2.10 Reporting bias and certainty assessment

As reported in Section 2.7, we employed the Cochrane RoB tool for randomized trials [8] to assess shortcomings in the design, management, evaluation, and reporting of all eligible studies based on our knowledge of the trial methods and their likelihood of introducing bias. For the sake of uniformity and simplicity, we did not employ additional guidance to separately analyze the one cluster-randomized trial [34] that was included in the review. To assess selective reporting bias, we confirmed that the study protocol was available, and that all pre-specified outcomes had been reported. Similarly, incomplete outcome data bias was judged by determining whether the missing outcome data were relatively even across the intervention and control groups, with similar reasons given for missing data. Alternatively, we confirmed that missing data had been properly accounted for by using appropriate methods (e.g., intention-to-treat approach). Finally, for small-sample studies, we planned to observe any asymmetry in the Doi plots, resulting from the sensitivity analysis, in order to initiate a review of trial characteristics that could have contributed to the asymmetry. No assessment of certainty in the body of evidence was carried out.

3 Results

3.1 Study selection

Altogether, the initial online database search, using OvidSP and Scopus, yielded 1249 citations (OvidSP: n = 734, Scopus: n = 515). After the removal of duplicates, 1205 records remained. The titles and abstracts of the remaining articles were then screened to determine whether they satisfied the inclusion criteria (see Table 1). Forty-one articles were thus determined to be potentially relevant to the aims of the study. A full text review of each retained article was then carried out, during which additional RCTs were identified from the reference lists of selected articles. The most recent RCT in this collection yielded 16 trials that had already been retrieved in the search. A total of 28 studies involving 10,258 patients (6,830 males, 3,428 females) were included in the qualitative synthesis, out of which 24 studies were used for the quantitative synthesis. Figure 1 illustrates the online database search and study selection procedure.

Fig. 1 PRISMA flow chart for the systematic literature review

The full-text articles (n = 13) removed from the final selection for the qualitative analysis originally appeared to meet the inclusion criteria but were later excluded due to reasons given in Table 3.

Table 3 List of full-text exclusions and reasons for exclusion

The four articles that were excluded from the quantitative analysis are: Young et al. [88] (data extracted but not comparable with the rest of the studies), Maru et al. [86] (incomplete data for cost analysis), Oksman et al. [87] (incomplete cost data), Hofmann et al. [37] (no results reported).
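The counts reported in this section can be cross-checked with simple arithmetic. The sketch below assumes that the 13 full-text exclusions were drawn from the 41 articles retained after title and abstract screening; the variable names are introduced here for illustration only.

```python
# Counts as reported in Section 3.1.
ovid, scopus = 734, 515
after_duplicates = 1205
full_text_assessed = 41
full_text_excluded = 13
excluded_from_quantitative = 4
males, females = 6830, 3428

identified = ovid + scopus                                 # 1249
duplicates_removed = identified - after_duplicates         # 44
qualitative = full_text_assessed - full_text_excluded      # 28
quantitative = qualitative - excluded_from_quantitative    # 24
patients = males + females                                 # 10258

print(identified, duplicates_removed, qualitative, quantitative, patients)
```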

3.2 Study characteristics

Included studies focused on the use of non-invasive telemonitoring practices to provide home monitoring of CHF patients, usually with the designed intervention being compared with usual care through the deployment of different kinds of electronic/mobile devices, with most having wireless communication capabilities. Moreover, studies focusing on implantable monitoring devices in terms of inputs, data and patient performance were also included. To make the study as broad as possible, no limitation was set on the duration of the intervention, length of follow-up, NYHA class for CHF, EF, population size, age, ethnicity, education, economic status, family situation, and religion. Table B (Online Resource 3 – ESM_3) gives full details of each study as extracted from the original articles.

3.3 Risk of bias in studies

Out of all 28 RCTs included, only 16 were judged to be at a low RoB. In particular, allocation concealment and blinding of participants and personnel were very problematic for the majority of studies. However, selective reporting seemed to have been managed properly by almost all the studies included. Moreover, with regard to other forms of bias that are not specifically stated in Fig. 2 (column 8), only Blum et al. [35] was assessed to be free of such bias. Some examples of other forms of real or implied biases that were commonly noted in the study texts include: (a) underpowered studies, (b) sample size calculation not done, (c) no method specified for handling missing data, (d) multiple hypothesis testing, (e) early termination of study or relatively short follow-up period, (f) generalizability and reproducibility of results not guaranteed, (g) potential bias by informative dropout, (h) limitation of using administrative databases due to assumptions about the accuracy of the data entry and coding, (i) the problem of outliers, (j) limitations to questionnaire data, (k) telemonitoring system only available in English, and so on.

Table 4 Luis Furuya-Kanamori (LFK) index

Of all the studies assessed for RoB, Hale et al. [36] and Hofmann et al. [37] performed most poorly. Both had issues with allocation concealment and blinding of participants and personnel, with Hale et al. failing to record any form of sequence generation, while Hofmann et al. had a high bias for selective reporting and blinding of outcome assessment. Likewise, Harter et al. [38], Koelling et al. [39], and Domenichini et al. [40], though with slightly lower bias assessments, still had issues with study quality. Figure 2 shows the full range of assessment for all the studies, with an interpretation of the color codes used. Furthermore, in order to assess publication bias, we evaluated the Doi plot of all variables that resulted in significant pooled effects through the Luis Furuya-Kanamori (LFK) index [41]. This tool captures and quantifies visual asymmetry of study effects more effectively than the funnel plot and Egger’s regression [42,43,44,45,46]. An LFK index within ±1 indicates no asymmetry (a symmetrical plot); an LFK index beyond ±1 but within ±2 indicates minor asymmetry; and an LFK index beyond ±2 suggests major asymmetry [47]. Table 4 shows the LFK index for all variables evaluated.
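The interpretation rule for the LFK index amounts to a simple threshold check, sketched below. The function name is hypothetical, the example values are not taken from Table 4, and assigning an index of exactly 1 or 2 to the lower category is an assumption made here.

```python
def classify_lfk(lfk_index):
    """Map an LFK index to the asymmetry categories quoted in the text."""
    magnitude = abs(lfk_index)
    if magnitude <= 1:  # boundary assigned to the lower category (assumption)
        return "no asymmetry"
    if magnitude <= 2:
        return "minor asymmetry"
    return "major asymmetry"


# Hypothetical index values for illustration.
for value in (0.4, -1.6, 2.7):
    print(value, classify_lfk(value))
```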

Fig. 2 RoB assessment

3.4 Results of synthesis

All outcomes detailed in Table A (Online Resource 2 – ESM_2) were considered for meta-analysis. However, only six outcomes (BNP, rehospitalization/hospitalization for HF, visits to a nurse, 6MWT, cardiovascular death/HF hospitalization, and visit to a cardiologist) were statistically significant after synthesis, giving a strong indication that the observed effects are unlikely to have arisen solely by chance. This provides evidence of the overall effect in terms of both the value and precision of the effect estimate. Results of five variables (two continuous, three dichotomous) are presented here, with outcomes separated into primary and secondary outcomes based on clinical significance and on how frequently they recurred in the included studies for each synthesis. The synthesis result obtained for ‘visit to a cardiologist’ has been omitted from this review, primarily because it was considered analogous to ‘visit to a nurse’, and secondarily because the heterogeneity of the pooled results was much higher than for the latter outcome. Results for some additional outcomes which we considered clinically important, but which were not statistically significant (EF, aggregate QoL, QoL - Physical component, number of all cause hospitalizations, days alive and unhospitalized, self-care behavior, peak VO2 (VO2max), diastolic function, all cause death-rehospitalization, and death from any cause), are also presented in summary form. These were rated as either primary or secondary outcomes based on frequency of mention in included studies, relevance to patient wellbeing, and cost indices of healthcare.

3.4.1 Primary outcomes

3.4.1.1 Brain natriuretic peptide (BNP (pg/mL))

Four studies [40, 48,49,50] comparing usual care (control) versus telemonitoring (intervention) for CHF patients reported BNP. The participants enrolled and randomized in these RCTs comprised 345 males and 96 females. The mean (SD) age of participants ranged from 52.3 (13.7) to 73 (5) years. Comorbidities identified included ischemic heart disease (IHD), diabetes, hypercholesterolaemia, cerebrovascular disease, hypertension, atrial fibrillation (AF), and hyperlipidemia. A variety of telemonitoring tools such as handheld PDAs, remote data transmission and storage facilities, land/mobile phones, accelerometer-embedded watches, and special medication event monitoring systems were employed. Eligible NYHA classes were heterogeneous (NYHA Class II-IV) across all studies. Two of the trials [48, 50] were assessed to have low RoB, while the remaining two had high RoB [40, 49]. Domenichini et al. [40] had issues with blinding of the outcome assessment, participants and personnel, while Seto et al. [49] recorded incomplete outcome data. Both studies were deemed to have irrevocable forms of other bias. As heterogeneity was high (I² = 83.5%), the random effect model was applied. Our meta-analysis suggests that the intervention decreased BNP significantly more than usual care (WMD = -27.75; 95% CI (-53.36, -2.14); p-value = 0.034) (see Fig. 3).
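The reported p-value can be related to the reported interval with a back-of-the-envelope check. The sketch below assumes a symmetric, normal-based 95% CI on the raw WMD scale, so that the standard error is the CI width divided by 3.92; the function name `p_from_ci` is introduced here for illustration.

```python
import math


def p_from_ci(estimate, lower, upper, z_crit=1.96):
    """Two-sided normal p-value recovered from an estimate and its 95% CI."""
    se = (upper - lower) / (2 * z_crit)
    z = estimate / se
    return math.erfc(abs(z) / math.sqrt(2))


# Pooled BNP result as reported above.
p_bnp = p_from_ci(estimate=-27.75, lower=-53.36, upper=-2.14)
print(round(p_bnp, 3))  # 0.034
```

For outcomes expressed as RR, the same check would be applied to the logarithms of the estimate and of the CI limits.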

Fig. 3 Forest plot for BNP

3.4.1.2 Rehospitalization/hospitalization for heart failure (n (%))

Six studies [36, 48, 51,52,53] comparing usual care (control) versus telemonitoring for CHF patients (intervention) reported the number of rehospitalizations/hospitalizations for HF. These studies enrolled 2,972 participants across the control and intervention arms. There were 2,048 males and 924 females with ages varying between 68 and 73 years. Some of the telemonitoring tools used include electronic balances and blood pressure gauges, an interactive telecommunication software tool (TeleWatch), Health-Buddy system, land/mobile telephone, remotely monitored electronic pillboxes, home tele-monitoring equipment (Medic4All®), handheld PDAs, and remote data transmission and storage facilities. Three out of the six studies [34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53] included patients with NYHA classes II-IV, while the remaining had a mix of NYHA classes I-IV. Comorbidities/risk factors/complications included IHD, diabetes, smoking, hypercholesterolaemia, hypertension, AF, hyperlipidemia, peripheral vascular disease/stroke, depression, chronic obstructive pulmonary disease (COPD), renal dysfunction, and dyslipidemia. Four studies [48, 51, 52] were judged to have low RoB. The remaining two exhibited high bias due to failure to implement allocation concealment and blinding of participants and personnel [36, 53], as well as the absence of random sequence generation and a high incidence of other bias [36]. Since heterogeneity was low (I² = 18.7%), the fixed effect model was applied. Our meta-analysis suggests that the intervention significantly decreased the RR for rehospitalization/hospitalization for HF (RR = 0.88; 95% CI (0.79, 0.98); p-value = 0.015) (see Fig. 4).

Fig. 4 Forest plot for HF rehospitalization/hospitalization

3.4.1.3 Visit to a nurse (n (%))

Two studies [51, 54] reported the number of visits to a nurse. The total number of participants enrolled in these studies comparing usual care (control) and telemonitoring treatments (intervention) for CHF patients was 809, with ages ranging from 57 to 69 years. There were 583 males and 226 females in the cohort. Many of the participants exhibited different kinds of comorbidities, while others had no diagnosed complications. Telemonitoring tools used for the intervention included mobile/land phones, mobile phone apps, weight scales, blood pressure meters, server and web user interfaces, as well as electronic health record systems. The RoB was assessed to be low in [51], but high in [54] owing to the absence of random sequence generation and the presence of other forms of bias. Furthermore, in [54], NYHA class I patients were excluded from the intervention, while all available NYHA classes (I-IV) were included in [51]. As the heterogeneity was moderate (I² = 45.5%), the random effect model was applied. Our meta-analysis suggests that the intervention significantly increased the visits to a nurse compared to usual care (WMD = 1.42; 95% CI (0.33, 2.52); p-value = 0.011) (see Fig. 5).

Fig. 5 Forest plot for visit to a nurse

3.4.2 Secondary outcomes

3.4.2.1 Six-minute walk test (6MWT (m))

Four studies [40, 50, 55, 56] comparing usual care versus telemonitoring for CHF patients reported the 6MWT, which is the distance walked in six minutes, measured in meters. In total, there were 423 participants: 351 males and 72 females. Recorded ages (mean (SD)) across both male and female categories ranged between 54.5 (10.9) and 66 (13) years. Some of the devices, tools and procedures used for the intervention incorporated wireless EHO mini devices, blood pressure measuring and weighing tele-rehabilitation sets, mobile phones, data servers, accelerometers, Medication Event Monitoring System bottle cap in-home, cardioverter defibrillators, as well as CPX on a treadmill using the Naughton protocol. Comorbidities were identified at baseline. Some of the conditions detected included IHD, diabetes, AF, coronary heart disease (CHD), myocardial infarction (MI), arterial hypertension, HF etiology, hyperlipidemia, and some other unspecified conditions. In [55] and [50], patients with NYHA classes I to III were involved in the study, while in [40], participants fell within the NYHA functional class III or better. Smolis et al. [56] only dealt with NYHA class III patients. The RoB was appraised to be low for only two studies [50, 56]. In [55] and [40], two major biases were identified. These had to do with non-implementation of blinding measures for participants and healthcare personnel, coupled with some noticeable forms of other bias. In addition to these, investigators in [40] failed to blind the outcome assessment. As the heterogeneity was high (I² = 91.3%), the random effect model was applied. Our meta-analysis suggests that the intervention significantly increased the 6MWT (WMD = 25.61; 95% CI (9.22, 41.99); p-value = 0.002) (see Fig. 6).

Fig. 6

Forest plot for 6MWT

3.4.2.2 Cardiovascular death/heart failure hospitalization (n (%))

Three studies [48, 51, 57] reported the number of study participants who died from other cardiovascular complications not necessarily related to HF and/or the number of rehospitalizations for HF, for both the usual care and telemonitoring groups. Overall, 1,506 participants were randomized into the intervention and control groups based on predefined eligibility criteria. The age of participants in mean (SD) ranged from 66.9(10.5) to 73(5), with a gender distribution of 1,141 males and 365 females. Comorbidities and risk factors recorded at baseline included peripheral vascular disease/stroke, the existence or history of IHD, diabetes, COPD, AF, MI, hyperlipidemia, hypercholesterolaemia, hypertension, history of depression, smoking, anemia, renal dysfunction, and uncured malignancy. An assortment of telemonitoring devices and tools was used. These included electronic balances, blood pressure gauges, portable ECG devices, handheld personal digital assistants (PDAs), remote/wireless data transmission services, web servers, along with phones/text messages. All studies assessed were found to be of good quality, but there was no uniformity in the NYHA classes of study participants (Villani et al.: III-IV; Angermann et al.: I-IV; Koehler et al.: II-III). The heterogeneity was moderate (I2 = 68.1%), and hence the random effect model was applied. Our meta-analysis suggests that the intervention significantly decreased the risk of cardiovascular death/rehospitalization due to HF (RR = 0.70; 95% CI (0.51, 0.97); p-value = 0.03) (see Fig. 7).

Fig. 7

Forest plot for cardiovascular death/HF hospitalization
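
As a consistency check on the reported figures, a p-value can be approximately recovered from a risk ratio and its 95% CI by working on the log scale. The sketch below assumes a normal approximation and a critical value of 1.96. The function name is hypothetical, and this is not necessarily how the p-value in this review was computed.

```python
import math

def p_value_from_ratio_ci(ratio, ci_low, ci_high, z_crit=1.96):
    """Approximate two-sided p-value from a ratio estimate and its 95% CI."""
    # The CI of a ratio is symmetric on the log scale, so its width gives the SE.
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = math.log(ratio) / se
    return math.erfc(abs(z) / math.sqrt(2.0))

# Values reported above: RR = 0.70; 95% CI (0.51, 0.97).
p = p_value_from_ratio_ci(0.70, 0.51, 0.97)
print(round(p, 2))
```

With the values reported for this outcome, the recovered p-value rounds to the reported 0.03, so the RR, CI and p-value are mutually consistent under these assumptions.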

3.4.3 Other outcomes

Summaries of results for other analyzed outcomes, which were popularly mentioned among several studies, and were thought to be clinically important but not statistically significant, are presented as follows:

  • Ejection fraction (primary)

    The EF variable (measured in % units) was reported by 7 studies [48–51, 54, 56, 58]. There were 1,285 participants (males = 948, females = 337) with ages in mean (SD) ranging from 52.3(13.7) to 73(5). Two of the studies [49, 54] were assessed to have a poor RoB, while the SOI index varied from minimal to severe across all studies. Only in Seto et al. [49] was one category of HF considered, i.e. NYHA functional class III. The heterogeneity was high (I2 = 96.26%). Consequently, we applied a random effect model. The meta-analysis suggests that, although the EF seemed to increase more in respect to the usual care group, there were no significant differences in the treatment effect between the intervention and usual care (WMD = 0.45; 95% CI (-1.11, 2.02); p-value = 0.57).

  • Aggregate QoL (primary)

    Using the Minnesota living with heart failure questionnaire (MLHFQ), the aggregate QoL index was assessed in five studies [35, 36, 40, 49, 58]. The total number of patients enrolled for both the intervention and control was 478 (males = 371, females = 107). Mean (SD) age varied between 52.3(13.7) and 73(8). Three studies [36, 40, 49] had a poor RoB rating, while the NYHA classes were split between minimal to moderate [36, 40] (NYHA Classes I-III) and moderate to severe [35, 49, 58] (NYHA Classes II-IV). Given the high heterogeneity (I2 = 96.99%), a random effect model was applied. The treatment effect was not significantly different between the two arms of the studies (WMD = 2.14; 95% CI (-6.55, 10.82); p-value = 0.63).

  • QoL - Physical component (primary)

    Three studies [35, 36, 49] reported this variable scored with the MLHFQ (presented in mean (SD)). The number of patients totaled 335, with the cohort registering 248 males and 87 females. The age of participants was reported in mean (SD) and ranged from 52.3(13.7) to 73(8). In [35] and [49], only NYHA Classes II-IV were enrolled, while [36] focused on participants with minimal to moderate HF (NYHA Classes I-III). Two of the studies [36, 49] were judged to have a high RoB. Since the heterogeneity was high (I2 = 93.84%), a random effect model was applied. Our meta-analysis suggests that this index increased more in the intervention group compared to usual care; however, the increase was not significant (WMD = 0.27; 95% CI (-2.64, 3.18); p-value = 0.857).

  • Days alive & unhospitalized (primary)

    The days alive and unhospitalized variable was reported by 4 studies [35, 51, 59, 60]. The sample population that participated in the RCTs totaled 1,435 (males = 890, females = 545). The mean age of participants included in the studies was between 62.7(12.5) - 83(7) in mean (SD). Only one of the studies, Ritchie et al. [60], was assessed to have high bias. Moreover, the NYHA HF class for this particular study was not documented. Given the low heterogeneity (I2 = 22.4%), a fixed effect model was applied. The pooled results did not show any noticeable variation in the treatment effect between the active and control arms of the studies (WMD = 0.24; 95% CI (-0.67, 1.14); p-value = 0.61).

  • Self-Care Behavior (primary)

    Two studies [49, 61] evaluated self-care behavior, reported as the self-care for heart failure index (SCHFI) score for self-care maintenance (I2 = 68.11%), SCHFI self-care management (I2 = 0%) and SCHFI self-care confidence (I2 = 0%). Altogether, 198 participants (males=144, females=54) took part in the 2-armed RCTs and were randomized between usual care and telemonitoring groups. In [49], the age bracket of participants for both the control and intervention groups varied between 52.3(13.7) and 55.1(13.7), reported in mean (SD), and 72 (62–83) to 72 (60–77), reported as median (interquartile range) in [61], possibly due to the non-Gaussian nature of the distribution. Seto et al. [49] recorded a poor RoB assessment. NYHA classification of selected patients was II–IV and III-IV for [49] and [61] respectively. The random effect model was selected to synthesize the self-care maintenance variable so as to properly capture the varying treatment effect the intervention might have had due to between-study heterogeneity. Our meta-analysis suggests that though the SCHFI self-care maintenance score [62] increased for the intervention (cut-point of ≥70 on each SCHFI scale recommended) compared to usual care, this increase was not significant (WMD = 4.18; 95% CI (-2.04, 10.40); p-value = 0.188). On the other hand, given the extremely low heterogeneity observed in both self-care management and confidence, a fixed effects model was applied. Our meta-analysis suggests that the SCHFI self-care management index score for the intervention decreased compared to usual care (WMD = -2.89; 95% CI (-7.88, 10.40); p-value = 0.256), while the SCHFI self-care confidence score increased (WMD = 0.87; 95% CI (-4.03, 5.77); p-value = 0.728). Notably, these treatment effects were not significantly different between both arms of the studies.

  • Peak VO2 (VO2max, primary)

    Peak VO2 is the maximum rate of oxygen consumption measured in milliliters of oxygen used in one minute per kilogram of body weight (mL/kg/min) during exercise of increasing intensity, where V stands for volume and O2 for oxygen. Two studies [55, 56] comparing usual care (control) with telemonitoring for CHF patients (intervention) reported this value. The total number of randomized participants was 163, with an age range of 54.5(10.9) to 62(9.3) reported in mean (SD). This sample population consisted of 142 males and 21 females. CHF patients in NYHA class III [56] and classes II-III [55] were recruited and randomized. The RoB evaluation was poor in [55], while [56] was relatively free of bias. Since heterogeneity was high (I2 = 99.74%), a random effect model was applied. The synthesis results did not show significant differences between the telemonitoring intervention and traditional care (WMD = -0.45; 95% CI (-6.23, 5.35); p-value = 0.880).

  • Diastolic function

    Two studies [48, 51] reported the diastolic function, measured as diastolic blood pressure in millimeters of mercury (mmHg). The selected sample population for the two studies totaled 796 participants, comprising 564 males and 232 females. The average age documented in mean (SD) ranged from 68.6(12.2) to 73(5). Reported comorbidities across the two studies were fairly similar, and risk factors were mainly linked to smoking. NYHA classes III to IV patients were assessed in [48], and I to IV in [51]. The RoB for both studies was judged to be good. A random effect model was used for the analysis due to high heterogeneity (I2 = 82.6%). Our meta-analysis suggests that the intervention decreased diastolic blood pressure compared to usual care, but not enough to make a significant difference (WMD = -4.30; 95% CI (-9.18, 0.59); p-value = 0.085).

  • All-cause death/rehospitalization (secondary)

    The all-cause death/rehospitalization variable was reported by 4 studies [34, 51, 60, 63]. In all, 2,903 participants were enrolled in the telemonitoring and control groups, consisting of 1,710 males and 1,193 females. In [34, 51, 60], the mean age range was between 62.7(12.5) and 73(10), while in [63], age values were given in median (interquartile range), i.e. 73 (62-84) - 74 (63-82). Three studies [34, 51, 63] had a good RoB rating. Patient severity in Angermann et al. [51] and Ritchie et al. [60] was set to NYHA classes I-IV, and to II-IV in Krum et al. [34] and Ong et al. [63]. Given the moderate heterogeneity (I2 = 59.81%), a random effect model was applied. The results of the synthesis did not significantly vary between the two arms (RR = 0.91; 95% CI (0.75, 1.11); p-value = 0.36).

  • Number of all-cause hospitalizations (secondary)

    The number of all-cause hospitalizations variable was reported by 8 studies [35, 36, 40, 49, 55, 57, 58, 64]. A total of 1,509 participants (males = 1,204, females = 305) within the age range of 52.3(13.7) to 73(8) in mean (SD) were enrolled in the studies. Patient severity varied widely across all studies. The RoB was split evenly between good and poor judgements, i.e. four good judgements [35, 57, 58, 64] and four poor judgements [36, 40, 49, 55] were recorded. Given the low heterogeneity (I2 = 24.98%) obtained from the pooled results, a fixed effect model was applied. The results of the synthesis were not significantly different between the intervention and control arms (WMD = 0.06; 95% CI (-0.05, 0.17); p-value = 0.26).

  • Death from any cause (secondary)

    The death from any cause variable was reported by 15 studies [38, 39, 48, 49, 51, 52, 53, 54, 57, 60, 61, 63, 65, 66] and was reported in n (%). The number of participants in this cohort totaled 6,974 (males = 4,558, females = 2,416), with ages in mean (SD) ranging from 52.3(13.7) to 76(10) for all studies apart from Ong et al. [63], which recorded median ages between 72 and 74 with an interquartile range of 60 to 84. The NYHA classes were heterogeneous across all studies except for Härter et al. [38] and Koelling [39], which did not specify any SOI index. The majority of the studies (9) had a good RoB rating. Given the moderate heterogeneity (I2 = 53.16%), a random effect model was applied. The results were not significant (RR = 0.81; 95% CI (0.65, 1.01); p-value = 0.066).

3.4.4 Sensitivity analysis

All analyses carried out were for outcomes that were statistically significant and had more than two studies included in the pooled results. The sensitivity analysis was performed by leaving out one study at a time, and then evaluating the resulting effects. Analysis results are presented in Table 5. All Doi plots for the analyzed variables can be found in the supplementary documentation S4.

Table 5 Results of sensitivity analysis

The meanings of the symbols shown in Table 5 are given below:

  (a) Significancy:

    1. (=) - means that by removing a particular study, the significance (p value less than 0.05) is not lost.

    2. (!=) - means that by removing a particular study, the significance (p value less than 0.05) is lost.

  (b) Pooled effect:

    1. (=) - means that by removing a particular study, the pooled effect is the same (i.e., on the same side).

    2. (!=) - means that by removing a particular study, the pooled effect is not the same, but now falls on the opposite side.

  (c) Heterogeneity:

    1. (+) - means that by removing a particular study, the heterogeneity increases.

    2. (-) - means that by removing a particular study, the heterogeneity decreases.
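
The leave-one-out procedure and the symbols listed above can be sketched as follows. This is an illustrative outline only: the helper names, the simple inverse-variance pooling, the extra "=" symbol for unchanged heterogeneity, and the three input studies are all assumptions made for the example and are not taken from Table 5.

```python
import math

def pool(effects, std_errors):
    """Inverse-variance pooling; returns (pooled effect, two-sided p, I2 in %)."""
    w = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    z = pooled * math.sqrt(sum(w))
    p = math.erfc(abs(z) / math.sqrt(2.0))
    q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, p, i2

def leave_one_out(labels, effects, std_errors, alpha=0.05):
    """Re-pool with each study removed and report Table 5 style symbols."""
    full_effect, full_p, full_i2 = pool(effects, std_errors)
    rows = {}
    for k, label in enumerate(labels):
        ys = effects[:k] + effects[k + 1:]
        ses = std_errors[:k] + std_errors[k + 1:]
        eff, p, i2 = pool(ys, ses)
        rows[label] = {
            # "=" if significance status is unchanged, "!=" if it changes
            "significancy": "=" if (p < alpha) == (full_p < alpha) else "!=",
            # "=" if the pooled effect stays on the same side of zero
            "pooled_effect": "=" if eff * full_effect > 0 else "!=",
            # "+" / "-" for higher / lower I2; "=" (assumed) if unchanged
            "heterogeneity": "+" if i2 > full_i2 else "-" if i2 < full_i2 else "=",
        }
    return rows

# Hypothetical studies A, B and C (mean differences with standard errors).
table = leave_one_out(["A", "B", "C"], [0.5, 0.5, 8.0], [1.0, 1.0, 1.0])
for study, symbols in table.items():
    print(study, symbols)
```

In this made-up example, removing study C both removes the significance and lowers the heterogeneity, which is the pattern the symbols (!=) and (-) are meant to flag.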

3.4.4.1 Brain natriuretic peptide

For this variable, an LFK index of -3.15 was recorded from the Doi plot, signifying major asymmetry. When either Seto et al. [49] or Domenichini et al. [40] was removed, the pooled result for BNP lost statistical significance. Removing [40] also decreased the heterogeneity. The pooled effect remained the same in all instances where a study was removed. We are of the opinion that some of the variations observed in this analysis could have been caused by differences in quality and by non-uniform baseline characteristics across some of the studies.

3.4.4.2 Rehospitalization/hospitalization for HF

The LFK index obtained from the Doi plot for this variable was -6.38, reflecting major asymmetry of the results. Excluding Leibovici et al. [53] and Villani et al. [48] from the synthesis further reduced the heterogeneity in each instance. However, this behavior has little or no impact on the original results since heterogeneity is already low when all studies are taken into account. The variable under analysis remained statistically significant in all instances, and the pooled effect also remained the same.

3.4.4.3 Six-minute walk test

For this analysis, the Doi plot exhibited minor asymmetry with an LFK index of 1.62. When either Piotrowicz et al. [55] or Sherwood et al. [50] was removed, the synthesized outcome lost statistical significance. When Domenichini et al. [40] was removed, the heterogeneity as expressed by I2 decreased, while the significance and pooled effect remained unchanged. This might have been due to quality issues with the study, since three instances of high bias (blinding of participants and personnel, blinding of outcome assessment, other bias) resulted in a poor overall RoB judgement. The pooled effect remained the same for all studies removed.

3.4.4.4 Cardiovascular death and/or HF hospitalization

For this analysis, the Doi plot showed minor asymmetry with an LFK index of -0.41. The pooled results lost statistical significance when either Villani et al. [48] or Angermann et al. [51] was discarded. Furthermore, the heterogeneity of the synthesized outcome decreased when Koehler et al. [57] and Villani et al. [48] were each removed. In all instances under consideration, the pooled effect remained the same. We propose that the small sample size of participants in [57] and the different NYHA classes of CHF patients enrolled into the studies could have contributed to this behavior.

4 Discussions

As mentioned earlier, this study focused on all aspects of documented evidence for interventions and outcomes on telemonitoring practices for CHF (as specified in the eligibility criteria) within a period of about seven years (2012–2019), thus providing a broad and holistic appraisal and analysis of the body of evidence available. This is in contrast to some previous systematic reviews and meta-analyses, which assessed only a few areas of home telemonitoring. For example, Feltner et al. [67] evaluated the impact of transitional care using structured telephone support (STS) on efficacy, comparative effectiveness, and harms, with a focus on the number of readmissions and mortality. In the same vein, Inglis et al. [18] analyzed the effects of telemonitoring and STS on all-cause mortality, and all-cause and CHF/cardiac-related hospitalizations. The three studies found that telemonitoring significantly reduced HF-specific hospitalization and all-cause death in the intervention arm. All-cause hospitalization was unaffected by the intervention in [67], while there were considerable reductions in HF mortality. However, no significant difference was found between the two arms of the study in Zhu et al. [68].

Additionally, Clarke et al. [69] conducted a systematic review and meta-analysis somewhat similar in construction to our review, but with a much smaller number of selected articles and sample population, and a limited number of outcomes. The authors found that there was an overall reduction in all-cause mortality and CHF hospital admission, while there was no significant effect on all-cause hospital admission. These results were contradicted in a more recent systematic review and meta-analysis [70], where the authors observed that the pooled effect estimate of telemonitoring interventions on all-cause hospitalizations and all-cause death in patients with recently decompensated heart failure was neutral. In line with these results, our findings also show that there were no significant differences between intervention and usual care in all-cause mortality, all-cause hospitalization, and a composite of all-cause death and rehospitalization, although the sample population for these variables was fairly large. Moreover, we observed an appreciable relative risk reduction of 30% in cardiovascular death/hospitalization due to HF, which is similar to the results obtained in [68], and our results also suggest that 12% fewer patients are likely to be hospitalized for CHF if methods and KETs similar to those adopted in the specified studies are implemented. These are patient-important outcomes in that the observed reduction in the risk of adverse events can greatly improve the QoL of patients, as well as relieve usage of healthcare facilities.
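
The 30% figure follows directly from the pooled risk ratio reported in Sect. 3.4.2.2 (RR = 0.70; 95% CI (0.51, 0.97)), since the relative risk reduction is 1 minus the RR. The short sketch below shows the arithmetic and also converts the CI bounds in the same way. The function name is hypothetical, and the converted interval is a derived illustration rather than a figure reported in this review.

```python
def relative_risk_reduction(rr, ci_low, ci_high):
    """Convert a risk ratio and its CI into a relative risk reduction in percent."""
    def to_pct(r):
        return round((1.0 - r) * 100.0, 1)
    # The bounds swap places: the smaller RR bound gives the larger reduction.
    return to_pct(rr), (to_pct(ci_high), to_pct(ci_low))

rrr, rrr_ci = relative_risk_reduction(0.70, 0.51, 0.97)
print(rrr, rrr_ci)
```

Under this conversion the point estimate is a 30% reduction, with an interval running from roughly 3% to 49%, which is a reminder that the lower end of the plausible benefit is small.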

Another key finding from our results, which is again important for both patients and healthcare personnel, is that the BNP levels of the telemonitoring group were significantly lowered by 27.75 picograms (pg) per milliliter (mL) compared to the control group. Since a BNP blood test usually leads to an accurate diagnosis of HF and values less than 100 pg/mL are considered normal [71, 72], the marked improvement in this index for the intervention group suggests that the intervention is clinically efficacious when administered to populations similar to those reported in the analysis. We did not find this variable reported to any great extent in any of the systematic reviews and meta-analyses we consulted. Similarly, concerning exercise-related outcomes, our findings align with those of Cavalheiro et al. [21], showing that the intervention significantly improved the functional status/fitness of participants and their capacity for aerobic exercise, these being specific measures of the 6MWT. The generalizability of this treatment effect has to be carefully considered, however, in relation to population size, age range, and disease severity (mild to moderate). The benefit of this effect usually translates to an overall improvement in clinical efficacy for the population under consideration.

Furthermore, we found a notable increase in the number of visits to a nurse (or healthcare personnel) for the intervention arm compared to the control for the studies examined for this variable. Other patient-important variables which are normally used to assess clinical efficacy, but which did not result in significant differences between the intervention and control groups in our synthesis, include EF, aggregate QoL, QoL - physical component, days alive and unhospitalized, self-care behavior, peak VO2, and diastolic function. We observed that there was a marginal improvement in the EF in the control group compared to the intervention group, even though this change was not significant. Also, the physical component of QoL was found to have improved for the intervention group even though the aggregate QoL for both the intervention and control groups in the synthesized studies did not witness any significant changes. A fairly large pool of participants (1,435) was meta-analyzed in the studies selected for the ‘days alive and unhospitalized’ variable. This variable was of interest because of its potential to contribute appreciably to the QoL of the patient. However, the pooled results showed no noticeable difference between the two arms of the studies considered.

Furthermore, we found that for the self-care behavior variable, there was an improvement in the SCHFI self-care maintenance and confidence scores for the intervention compared to the control group, while conversely, the SCHFI self-care management score was observed to decrease. Self-care behavior is commonly linked to variations in clinical outcomes and healthcare costs. It is important to note that better self-care behavior could also dramatically increase patient function and wellbeing. Another variable commonly used to evaluate exercise capacity is the peak VO2. No significant differences were observed between the telemonitoring intervention and usual care. Likewise, the estimate of the average effect on diastolic function shows that the intervention group witnessed some slight improvement, evidenced by a marginal reduction in the diastolic blood pressure compared to usual care. However, this did not result in any significant differences between the two groups. Both studies synthesized for this outcome were judged to be free of bias, thus providing relative evidence of the treatment effect.

The results presented in this review should be carefully interpreted with regard to observed limitations in the body of evidence, some of which include wide variations in the following: sample populations, SOI index across outcome variable clusters, RoB assessment, gender distribution, telemonitoring tools used, and follow-up periods for synthesized studies. Comorbidities were fairly similar across all studies except in a few cases [36, 39, 49, 58, 59, 64, 66], where none were documented. Another limitation we wish to highlight is the paucity of data for synthesizing particular outcomes of interest. This was especially true where data for these outcomes were reported in very few studies (e.g., all-cause emergency admission, length of stay in hospital, cost analysis and medication adherence). Further to the above, because our objective was to provide broad evidence of telemonitoring effects as compared to usual care, it is possible that we might have overlooked certain outcomes which were limited to just a few studies, or which could not be pooled together because of differences in terminology and/or unit of measure used. This situation led us to sometimes make informed assumptions about the terms to adopt and the categorization of these outcomes.

5 Conclusion

Overall, the evidence presented in this systematic review and meta-analysis for home telemonitoring of CHF patients using non-invasive methods compared to conventional approaches demonstrates an appreciable reduction in the relative risk of cardiovascular death and hospitalization due to CHF. We also found that the clinical efficacy of the telemonitoring group significantly improved compared to the control group. Patient-important variables contributing significantly to this outcome included notable decreases in BNP levels and improvements in the functional status/fitness of participants and their capacity for aerobic exercise. Marginal, non-significant improvements were also recorded in the physical component of QoL and in the SCHFI self-care maintenance and confidence scores for the intervention group. In addition, the intervention group witnessed some slight improvement in the diastolic function, evidenced by a marginal reduction in the diastolic blood pressure compared to usual care. The observed decrease in the risk of adverse events is very important to CHF patients’ wellbeing and QoL, and also provides a means of relieving usage of healthcare facilities and allows better allocation of healthcare resources. The body of evidence presented here has the potential to help policy makers and guideline developers to make adjustments to existing policies and healthcare guidelines on CHF treatment and management in a way that improves clinical efficacy and patients’ wellbeing. Also, our findings will help healthcare professionals to update their knowledge on what telemonitoring treatments will likely work for different categories of HF patients. Further areas of research that could be explored include determining the types of interventions better suited to specific follow-up periods, telemonitoring methods and tools, as well as patient needs. Also of importance is the geographical location of the intervention. Some study authors actually recommended particular geographical locations (i.e., rural versus urban) for the implementation of their results. Further, considering that BNP levels can provide an accurate diagnosis of HF, and that this variable was only properly reported in a very small number of the studies we consulted, we recommend that this variable should be assigned more importance in future studies. For all syntheses in which we observed high heterogeneity, we plan to update the meta-analysis in future by expanding the number of quality studies included in order to reduce heterogeneity, improve the pooled effect estimate, and potentially increase the certainty of evidence and reliability of the results. Correspondingly, for outcomes for which we experienced paucity of data, our intention is to carry out more extensive searches that will include longitudinal and unpublished data, as well as grey literature.