Introduction

Patients’ performance status is a significant prognostic and predictive factor for clinically relevant outcomes, such as progression-free and overall survival of patients with cancer [1,2,3]. It is therefore one of the key inclusion criteria for clinical trials and often serves as a stratification factor in trial design and analyses. Moreover, in daily clinical practice it is used to decide whether a patient is fit for systemic therapy [4] or eligible for early phase clinical trials. Patients’ performance status is determined by healthcare professionals using either the Karnofsky Performance Status (KPS) or the Eastern Cooperative Oncology Group Performance Status (ECOG-PS, also known as WHO PS) [5, 6]. Both methods have proven their clinical relevance over the past decades and are widely used. However, these methods are also subject to potential bias and limitations [7]. First, performance status scoring depends on the oncologist’s subjective rating of a patient’s health and functioning with no standardized process for this assessment, making it prone to under- and overestimation and to inter-observer differences [8,9,10,11,12]. Second, performance status assessment may be susceptible to response and recall bias as it relies on patient-reported physical activity and functioning [13]. Third, both KPS and ECOG-PS are static measurements that are only captured during scheduled visits, whereas a patient’s physical performance is a dynamic process that may change on a daily basis during the course of treatment. As a result, recent reviews have accentuated the need for a tool that can assess a patient’s physical performance objectively and in a more dynamic fashion [7, 14].

The expanding armamentarium of wearable activity monitors (e.g., accelerometers, pedometers, fitness trackers, smartwatches, and smartphones) offers new opportunities to supplement physician-assessed performance status with objective assessments of physical activity and sedentary behavior, which are passively gathered in a non-obtrusive manner. It is even hypothesized that wearable activity monitor metrics might prove superior to clinician-rated performance status or patient-reported data in terms of accurately discriminating between the heterogeneous spectrum of cancer patients [7, 14]. Therefore, wearable activity monitor-derived data may assist healthcare professionals in making treatment decisions (e.g., mono vs doublet vs triplet chemotherapy) for individual patients [15, 16] and could be useful in assuring that performance status of patients enrolled in clinical trials is recorded accurately [7]. Multiple recent clinical studies have demonstrated the feasibility of using wearable activity monitors to assess physical activity and sedentary behavior in patients with cancer. However, no aggregated evidence is available about the use of wearable activity monitor-derived physical activity and sedentary behavior metrics to supplement physician-assessed performance status. As a first step towards this purpose, we conducted a systematic review on the association between wearable activity monitor physical activity and sedentary behavior metrics and performance status in patients with cancer.

Methods

A systematic review of available literature was conducted in agreement with the guideline for preferred reporting items for systematic reviews and meta-analyses (PRISMA statement) [17]. This review has been registered in PROSPERO (CRD4202013865).

Literature search

MEDLINE® and Embase databases were searched from inception until April 2020 to identify all relevant published articles. An experienced clinical librarian from the Amsterdam UMC was consulted for the development of the search strategy. Relevant keywords included terms related to wearable activity monitor metrics AND cancer population AND wearable activity monitors (e.g., fitness trackers, smartwatches, accelerometers, pedometers, actigraphs, and inclinometers). The complete search strategies are presented in Supplementary Table 1. Moreover, cross-referencing was performed to identify additional relevant studies for the systematic review.

Eligibility criteria

Studies were included if they (1) were conducted among adults (≥ 18 years) with cancer, (2) objectively measured physical activity or sedentary behavior using wearable activity monitors in the outpatient setting, (3) measured physician-assessed performance status, (4) quantitatively assessed an association between wearable activity monitor metrics and clinician-assessed performance status, and (5) had a full text available in English.

Definitions of wearable activity monitor metrics for physical activity, sedentary behavior, and circadian rest-activity rhythm

Many different activity-related wearable activity monitor metrics are used in research, and the metrics reported often depend on the device used. Four main categories of wearable activity monitor metrics relevant for this review can be identified: (1) Accelerometer-related activity count-based metrics that capture the duration and intensity of accelerations in counts per minute and/or hour. These intensities can also be used to determine absolute or relative time spent in sedentary behavior, light physical activity (LPA), or moderate-to-vigorous physical activity (MVPA) based on predefined cut-points. (2) Posture-based measures that define hours or percentage of time per day spent sitting/lying (i.e., sedentary), standing, or stepping. (3) Steps-based measurements that estimate the number of steps per day using an algorithm that determines whether accelerometer measurements meet the threshold to be counted as a step. (4) Circadian rest-activity rhythm actigraphy parameters, such as the dichotomy in-bed versus out-of-bed index (I < O) and mean activity level (MeanAct).
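To illustrate how count-based metrics are turned into time spent per intensity category, the following minimal sketch classifies each minute of wear time by two cut-points. The function name, the argument names, and the cut-point values in the usage example are hypothetical placeholders and are not taken from any included study; as noted in this review, cut-points differ between devices and protocols.

```python
from typing import Dict, Sequence


def minutes_per_intensity(
    counts_per_minute: Sequence[float],
    sedentary_max: float,
    mvpa_min: float,
) -> Dict[str, int]:
    """Count the minutes spent in each intensity category.

    sedentary_max and mvpa_min are device-specific cut-points that the
    caller must supply; no default values are assumed here.
    """
    if sedentary_max >= mvpa_min:
        raise ValueError("sedentary_max must be below mvpa_min")

    minutes = {"sedentary": 0, "LPA": 0, "MVPA": 0}
    for cpm in counts_per_minute:
        if cpm <= sedentary_max:
            minutes["sedentary"] += 1
        elif cpm >= mvpa_min:
            minutes["MVPA"] += 1
        else:
            # Everything between the two cut-points is light activity.
            minutes["LPA"] += 1
    return minutes


# Placeholder cut-points, chosen only to make the example concrete.
example = minutes_per_intensity([0, 50, 300, 2500], sedentary_max=99, mvpa_min=2000)
print(example)  # {'sedentary': 2, 'LPA': 1, 'MVPA': 1}
```

Dividing each value by the total number of valid wear minutes would give the relative time per category mentioned above.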

Selection process and eligibility criteria

Titles and abstracts of articles identified by the electronic database searches were extracted and duplicates were removed. Two reviewers (MK and EP or BB) independently screened the records of the initial search to select potentially relevant articles that were subsequently subjected to full-text screening for eligibility. Any discrepancies between reviewers were discussed in person. If no agreement was reached, discrepancies were referred to a third reviewer (MvO) before a final decision was made on inclusion.

Data extraction

Data on study design and population, physical activity/sedentary behavior measurement characteristics and protocols, wearable activity monitors, and outcome measures were extracted using a standardized form including the following items: first author, year of publication, country, study design, number of included patients, type of cancer and disease stage, current treatment, comorbidities, performance status scale used, measure of physical activity or sedentary behavior (including definitions and cutoff points), devices used for physical activity/sedentary behavior measurements, wear location of devices, statistical methods and analyses, and results on association between wearable activity monitor metrics and performance status. If point estimates (e.g., mean, median) were only depicted in figures, authors were first contacted and asked to provide these point estimates. If we did not receive data from the authors, we used the open-source software WebPlotDigitizer (version 4.3) to derive the point estimates and corresponding measures of variability (e.g., IQR or SD) from the figures [18,19,20]. The widely used empirical classifications proposed by Evans were used for interpreting correlation strengths [21]. Correlation coefficients of 0–0.19 were interpreted as very weak, 0.2–0.39 as weak, 0.4–0.59 as moderate, 0.6–0.79 as strong, and 0.8–1 as very strong.
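The classification of correlation strengths can be written as a small lookup. This is a sketch under two assumptions that the text does not state explicitly: the bands are applied to the absolute value of the coefficient, so negative correlations are labeled by magnitude, and values falling between two listed bands (e.g., 0.195) are assigned to the lower band. The function name is hypothetical.

```python
def correlation_strength(r: float) -> str:
    """Label a correlation coefficient using the bands quoted in the text."""
    magnitude = abs(r)
    # This comparison also rejects NaN, which fails every ordering test.
    if not 0 <= magnitude <= 1:
        raise ValueError("correlation coefficient must lie in [-1, 1]")
    if magnitude < 0.2:
        return "very weak"
    if magnitude < 0.4:
        return "weak"
    if magnitude < 0.6:
        return "moderate"
    if magnitude < 0.8:
        return "strong"
    return "very strong"


print(correlation_strength(0.45))   # moderate
print(correlation_strength(-0.65))  # strong
```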

Risk of bias assessment

The risk of bias of included studies was scored independently by two reviewers (MK and EP) using a risk of bias assessment tool based on the guideline for assessing quality in prognostic studies [22]. This guideline comprises 6 potential biases (i.e., study participation, study attrition, prognostic factor measurement, outcome measurement, confounding measurement and account, and analysis), which were applied on the basis of relevance to this systematic review. Subsequently, the potential biases were translated into a 15-point quality criteria list, based on previously published risk of bias criteria lists (Supplementary Table 2) [23,24,25,26]. Furthermore, quality items were categorized as informative (I, 5 items) or validity/precision (V/P, 10 items) [23, 25]. If the study provided adequate information and met the criterion of the quality item, one point was awarded. If the study provided insufficient information or did not meet the criterion of the quality item, no point was awarded. If the study referred to another article for relevant information regarding the quality item, that article was reviewed to score the item. Disagreements regarding the risk of bias assessment were resolved by discussion and, if no agreement could be reached, by consulting the third reviewer (MvO). A total risk of bias score was calculated for each included study by dividing the number of positively scored validity/precision items by the total number of validity/precision items (i.e., 10), resulting in a score between 0 and 1, presented as a percentage. The five informative items were not included in the final calculation, as these items represent descriptive information only [23, 25]. In line with previous reviews [23, 27], a study with a score ≥ 70% was considered to be of “low risk of bias,” whereas a study with a score < 70% was considered to be of “high risk of bias.”
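The scoring rule described above amounts to simple arithmetic, sketched here for illustration. The function and constant names are hypothetical; the 10 validity/precision items and the 70% threshold are taken from the text, and the informative items are deliberately left out of the input.

```python
from typing import Sequence, Tuple

N_VALIDITY_ITEMS = 10       # validity/precision (V/P) items on the criteria list
LOW_RISK_THRESHOLD = 70.0   # percentage at or above which risk of bias is "low"


def risk_of_bias(validity_items: Sequence[bool]) -> Tuple[float, str]:
    """Return the percentage score and risk label for one study.

    validity_items holds one boolean per V/P item: True if the study gave
    adequate information and met the criterion, False otherwise.
    """
    if len(validity_items) != N_VALIDITY_ITEMS:
        raise ValueError(f"expected {N_VALIDITY_ITEMS} validity/precision items")

    positives = sum(1 for met in validity_items if met)
    score = 100 * positives / N_VALIDITY_ITEMS
    label = "low risk of bias" if score >= LOW_RISK_THRESHOLD else "high risk of bias"
    return score, label


# A study meeting 4 of the 10 V/P criteria.
print(risk_of_bias([True] * 4 + [False] * 6))  # (40.0, 'high risk of bias')
```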

Level of evidence

A 3-level scoring system for best evidence synthesis based on the number, methodological quality, and consistency of outcomes of the different studies was applied to synthesize the evidence from the included studies [23, 27, 28]: (1) strong evidence, provided by generally consistent findings in multiple (≥ 2) studies with low risk of bias; (2) moderate evidence, provided by generally consistent findings in 1 study with low risk of bias and 1 or more studies with high risk of bias, or generally consistent findings in multiple (≥ 2) studies with high risk of bias; and (3) insufficient evidence, when only one study was available or findings were inconsistent in multiple (≥ 2) studies. Results were considered generally consistent when at least 75% of studies showed results in the same direction.
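The decision rule for the level of evidence can be sketched as follows. This is one possible reading, not a definitive implementation: it assumes that consistency is evaluated across all studies reporting on a given metric and that the number of low risk of bias studies then separates strong from moderate evidence. The function name and the input format (one direction label and one risk flag per study) are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def evidence_level(studies: List[Tuple[str, bool]]) -> str:
    """Classify the evidence for one metric.

    Each study is a (direction, is_low_risk) pair, where direction is a
    label such as "positive", "negative", or "none".
    """
    if len(studies) < 2:
        return "insufficient"  # only one study available

    directions = Counter(direction for direction, _ in studies)
    _, top_count = directions.most_common(1)[0]
    # Generally consistent means at least 75% of studies agree in direction;
    # integer arithmetic avoids floating-point edge cases at exactly 75%.
    if top_count * 4 < len(studies) * 3:
        return "insufficient"

    n_low_risk = sum(1 for _, is_low_risk in studies if is_low_risk)
    return "strong" if n_low_risk >= 2 else "moderate"


# Three high risk of bias studies pointing in the same direction.
print(evidence_level([("positive", False)] * 3))  # moderate
```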

Results

The combined literature search yielded 1511 unique records. The result of the systematic literature search and subsequent selection of studies is depicted in Fig. 1. After initial screening, 373 articles were retrieved in full text and checked for eligibility. Cross-referencing provided one additional study [29] that could be included in the systematic review. Ultimately, 14 studies met the eligibility criteria and were included for further analysis. Table 1 provides an overview of the baseline characteristics of the included articles.

Fig. 1 Flowchart of literature search and inclusion of studies. PA, physical activity; SB, sedentary behavior; KPS, Karnofsky Performance Status; ECOG-PS, Eastern Cooperative Oncology Group Performance Status; CircAct, circadian rest-activity rhythm; PS, performance status; n, number of studies

Table 1 Baseline characteristics of the included studies (n = 14)

Risk of bias assessment

Results of the risk of bias assessment are depicted in Table 2. The median methodological quality of the included studies was 40% and ranged from 20 to 60%. Only 29% of included studies had an adequate participation rate (> 80%). Most (64%) studies had a small sample size (n < 100). None of the studies adequately described methods used for dealing with missing physical activity data, and the majority of studies (79%) used a combination of device and wear-time protocol that has not been adequately validated in the studied population.

Table 2 Risk of bias assessment tool and quality score of the included studies

Wearable activity monitors and physical activity/sedentary behavior metrics

The characteristics of the used wearable activity monitors and physical activity/sedentary behavior metrics are summarized in Table 3. A total of 9 different devices were used in the included articles and worn on the wrist (9 studies [29,30,31,32,33,34,35,36,37]), thigh (3 studies [38,39,40]), hip (1 study [41]), or waist (1 study [42]). Furthermore, different wear-time protocols were used between studies. Included articles reported on a total of 12 different wearable activity monitor metrics: 8 studies reported on steps taken [33,34,35, 37,38,39,40,41], 6 on sedentary behavior (i.e., posture-based and activity counts-based) [34, 36, 38, 39, 41, 42], 4 on mean daily activity counts [29,30,31,32], 3 on time spent in MVPA [40,41,42], 3 on the dichotomy in-bed versus out-of-bed index I < O [30,31,32], and 2 on time spent in light physical activity (LPA) [41, 42]. Other reported PA metrics included: time spent stepping [38,39,40], time spent standing [29, 31], distance walked [33], total metabolic expenditure per day [38], and daily floors climbed [33] (Table 3). Different methods, definitions, and cut-points were used for sedentary behavior, LPA, and MVPA based on the devices used.

Table 3 Characteristics of used wearable activity monitors and metrics in included studies

Physical activity and sedentary behavior outcomes per ECOG-PS group

Table 4 provides an overview of the included studies that reported on the physical activity and sedentary behavior outcomes per ECOG-PS group. All studies that reported on mean daily steps per ECOG-PS group found significant between-group differences, with more daily steps in patients with better performance status. Moreover, two studies [38, 39] reported that patients with better performance status spent significantly less time sitting/lying (i.e., sedentary) and significantly more time standing and stepping as compared to patients with worse performance status. The three studies reporting on mean total activity counts per day and the circadian rest-activity dichotomous index I < O all showed significantly more activity counts per day and less circadian rhythm disruption in patients with better performance status [30,31,32]. Regarding intensity-based wearable activity monitor metrics, one study reported that patients with good performance status spent significantly more time in LPA and MVPA and significantly less time sedentary as compared to patients with poor performance status [41]. Conversely, another study did not show significant differences regarding time spent in LPA and MVPA between groups based on performance status [42]. This study, however, reported that patients with better performance status spent significantly less time sedentary as compared to patients with worse performance status.

Table 4 Physical activity and sedentary behavior outcomes from wearable activity monitors per ECOG-PS group

Evidence synthesis for associations between wearable activity monitor metrics and performance status

In total, we found 14 studies that could be included in the evidence synthesis. Results of evidence synthesis for the association between wearable activity monitor metrics and performance status are compiled in Table 5. We found moderate evidence for a moderate positive association between daily steps and performance status and moderate evidence for a weak positive association between activity counts and performance status. Moreover, we found moderate evidence for moderate positive associations between time spent standing/stepping and performance status and between the circadian rest-activity dichotomous index I < O and performance status. Finally, we found moderate evidence for moderate negative associations between sedentary behavior (intensity- or posture-based) and performance status.

Table 5 Evidence synthesis for association between wearable activity monitor metrics and performance status

Discussion

In this study, we reviewed the available evidence on the association between wearable activity monitor metrics and physician-assessed performance status. Evidence synthesis showed moderate evidence for weak-to-moderate positive associations between performance status and various wearable activity monitor metrics and a moderate negative association between performance status and sedentary behavior.

Several explanations can be offered for the absence of strong associations. First, these weak-to-moderate associations may suggest that wearable activity monitors and performance status scales assess different constructs of physical performance and cannot simply be interchanged. Wearable activity monitors objectively measure physical activity (levels) and can therefore be regarded as performance-based measurements that are independent of judgment. Performance status scales, on the other hand, are evaluation-based measurements that involve judgment using idiosyncratic criteria [5, 6]. Second, from a measurement perspective, physician-assessed performance status has only a few categories. In the studies included in this review, only 12% of patients had poor performance status (ECOG-PS 2–3). The limited variation in scoring could have contributed to the absent or weak-to-moderate associations. Third, substantial heterogeneity across studies in terms of devices used, wear-time protocols, study population, and methodology could be a potential source for the absence of strong associations between wearable activity monitor metrics and performance status.

More detailed, objectively and passively gathered activity data from wearable activity monitors might be of added value in clinical practice. Wearable activity monitor-assessed physical activity/sedentary behavior might serve as a dynamic and objective supplement to physician-assessed performance status and, as such, might prove to be of added value in clinical decision making and evaluation of treatment options in oncology. This hypothesis is supported by observations that more daily steps are associated with lower risk of hospitalization during cancer treatments [33, 43], longer survival [33, 41, 44], and lower chance of serious adverse events [33, 45]. Interestingly, Fujisawa et al. demonstrated that among patients with good performance status (ECOG-PS 0–1), ECOG-PS was not predictive of survival, while sedentary behavior was a significant predictor of 6-month survival [36]. Moreover, Jeffery et al. reported that patients with a survival longer than 3 months spent significantly less time sedentary as compared to those who survived less than 3 months [41]. Together, these results suggest potential value of objective sedentary behavior measurement in predicting survival outcomes, especially in patients with good performance status. In this way, wearable activity monitors might assist physicians in clinical decision making, such as determining whether a patient is fit for treatment.

Most currently available wearable activity monitors (e.g., Fitbit Charge HR) are multisensor devices that have a built-in 3D accelerometer as well as other sensors that measure, for example, heart rate. In the era of advancing artificial intelligence and machine learning, it is conceivable that data from combinations of wearable activity monitor sensors and data patterns over time might prove superior to physician-assessed performance status in assessing performance status and predicting outcomes for patients with cancer.

Recently, various pilot studies have demonstrated the feasibility of using wearable activity monitors in the context of an ambulatory monitoring platform that longitudinally assesses treatment-related adverse events, unplanned healthcare encounters, and survival in patients with cancer [33, 46, 47]. Results from these studies suggest potential for wearable activity monitors in the early detection of adverse events and unplanned healthcare encounters. The application of wearable activity monitors in this context has the potential to be clinically impactful and to improve cancer care. Future research should focus on proving the efficacy of wearable activity monitors as part of ambulatory monitoring platforms.

An important finding of our systematic review is the high heterogeneity between included studies regarding study population, devices used, wear-time protocols, definitions and cut-points used for different physical activity metrics, and reporting of outcomes, thereby hindering adequate comparison of results on the association between physical activity metrics and performance status and complicating best evidence synthesis. A second limitation of the studies included in this review is the high risk of bias scores. Major factors contributing to the high risk of bias scores were unvalidated methods regarding physical activity measurements, low sample sizes, and the lack of multivariable analyses to adjust for relevant confounders. It should be noted that the majority of these studies investigated the association between wearable activity monitor metrics and performance status as secondary or exploratory analysis, which may have contributed to the high risk of bias scores. Consequently, it is currently also unclear whether the association between wearable activity monitor metrics and performance status varies by cancer type or stage. With regard to the physical activity measurements, none of the studies adequately reported on the handling of missing physical activity data. Different studies have emphasized the need for missing accelerometer data imputation and suggested statistical methods of handling missing data [48, 49]. Moreover, the majority of studies used devices, wear-time protocols, or cutoff points that have not adequately been validated in comparable populations. Taken together, results should be interpreted with caution and emphasize that standardization of wearable activity monitor-measured physical activity and sedentary behavior methodology is warranted to decrease risk of bias in future studies on the subject.

Strengths of this systematic review include the in-depth risk of bias assessment that was adjusted specifically for studies using a wearable activity monitor for physical activity and sedentary behavior measurements and the subsequent best evidence synthesis. However, most of the included studies were not designed to investigate the association between wearable activity monitor metrics and performance status, complicating risk of bias assessment and evidence synthesis. More than half of the included studies were designed to investigate physical activity levels in specific cancer populations, study the feasibility of wearable activity monitors, or explore associations between other wearable activity monitor metrics, such as circadian rest-activity rhythm parameters, and various outcomes. Therefore, the association between wearable activity monitor metrics and performance status was often analyzed in a secondary or exploratory analysis, resulting in suboptimal presentation of results. Moreover, results may be prone to reporting bias as non-significant associations are less likely to be reported, resulting in an overestimation of the associations between wearable activity monitor metrics and performance status.

In conclusion, we found moderate evidence for a positive weak-to-moderate association between various physical activity metrics and performance status and for an inverse moderate association between sedentary behavior and performance status. The strength of the associations should be interpreted with caution given the aforementioned limitations of the available evidence. Nevertheless, our results suggest that objectively measured physical activity may serve as a dynamic and objective supplement to the assessment of a patient’s functional performance status and may be of added value in clinical decision making and evaluation of treatment options in oncology. Next steps include studying the association between wearable activity monitor metrics and clinical outcomes and directly comparing the predictive value of objectively measured physical activity with that of performance status for relevant clinical outcomes. Finally, consensus is warranted on the methodology of objective physical activity measurement, and efforts should be made to validate the different methods (i.e., device, parameters, wear-time protocols) in relevant patient populations.