Background

Globally, depressive disorders contribute to 14.3% of all-age years lived with disability (YLD), making them the third leading cause of YLD [1]. Major depressive disorder (MDD) is a severe form of depression characterised by prolonged periods of low mood and anhedonia combined with a range of other symptoms including changes in sleep quality, appetite, cognitive function, energy levels and activity, feelings of guilt or worthlessness, and thoughts of death [2]. MDD is associated with a wide range of negative outcomes, including loss of occupational function [3], reduced quality of life [4], and premature mortality [5]. Whilst some may experience a single lifetime episode of MDD, it is becoming more widely recognised as a chronic condition characterised by periods of relapse and recovery [6, 7]. The management of chronic illnesses requires ongoing monitoring of symptoms, for example to track response to treatment or identify early indicators of relapse. This monitoring is dependent on self-reported questionnaires or clinical interviews, which are typically infrequent (e.g. conducted at clinic visits), reliant on individuals’ recollection of symptoms, and therefore subject to recall bias [8].

The use and ownership of smartphones and wearable technology has increased exponentially in the last decade. These technologies provide the opportunity to collect data using unobtrusive, inbuilt sensors requiring minimal input from users [9, 10]. In addition to unobtrusive passive data collection, there is scope for more frequent self-report information to be collected. Many features of MDD are amenable to assessment via remote measurement technologies (RMT): for example, heart rate from photoplethysmography (PPG) sensors and activity from accelerometry sensors in wrist-worn wearable devices can give information indicative of sleep patterns and physical activity levels. Data such as Global Positioning System (GPS), Bluetooth, gyroscope, phone screen interactions, ambient noise and light levels have also been used to collect information from smartphones relating to sociability, movement and activity associated with low mood [11]. In contrast to this passive RMT (pRMT) form of data collection, which requires little or no input from the user, active RMT (aRMT), deliverable by smartphone, requires the user to respond to a notification and complete, for example, short questionnaires, cognitive tasks or speech sampling tasks. Combining these active and passive data streams could potentially provide a real-time overview of the patient’s health status which could inform treatment delivery. It could further be used to predict future changes in health states; for example, signals might be identified to predict a relapse in an otherwise healthy individual [12]. A key challenge in the use of smartphones and wearables to track health is that these technologies require considerable commitment from participants and/or patients. Not only must they consent to their personal smartphone data being used, they must also be motivated to wear wrist-worn devices, to maintain such devices (e.g. to keep them charged) and to interact with their phones to provide active RMT data.
Whilst the wider field of digital medicine has seen vast growth and investment, many technologies have poor uptake [13, 14]. In depression, the illness itself, characterised by loss of motivation, may pose a further barrier to adherence with digital medicine protocols [15, 16]. If such technologies are to be used in real-world settings, they must therefore have high acceptability. A key question for the field is the extent to which people with depression will adhere to such protocols. In a recent systematic review we identified 52 publications testing RMT in depression [17]; the literature was characterised by inconsistent reporting, and data on adherence to protocol were very rarely reported.

The study reported here, Remote Assessment of Disease and Relapse in Major Depressive Disorder (RADAR-MDD) [18], is a longitudinal cohort study examining the utility of multi-parametric RMT to measure changes in symptoms and predict relapse in people with MDD. The study was designed with patient involvement from the outset (including systematic reviews [19, 20], focus groups [21] and a Patient Advisory Board) with the aim of developing a protocol which meets the needs of the target population. RADAR-MDD offers an opportunity to explore the recruitment of people with MDD into a complex digital technology study, and describe the long-term retention rates and adherence to a protocol which includes passive data collection via smartphone and wearable sensors, app-based questionnaires, experience sampling method (ESM) and traditional web-based outcome assessments [18].

Throughout this paper, we have used the term data “availability” instead of “completeness” as we describe all data provided throughout the study, regardless of quality or completeness. Data labelled as “available” in this paper may include i) complete, valid data which are usable for analysis; ii) partial data which are incomplete but potentially usable; and iii) data which have been corrupted or are invalid for any reason. We believe it is essential to include partial or incomplete data as part of this paper, as they are indicative not only of participant characteristics and study burden, but also of the underlying technical infrastructure. We decided to not withdraw participants for not providing data via the smartphone apps or wearable devices. This concession gives greater insight into how data availability may fluctuate with changes in depressive state and provides a truer representation of the feasibility of implementing RMT protocols in people with MDD.

The aims of this paper are to: 1) summarise study recruitment, retention, and completion rates of primary and secondary participant-reported outcomes throughout the course of follow-up; 2) describe the sociodemographic and clinical characteristics of the cohort for the RADAR-MDD study; 3) describe the availability of data throughout a multi-parametric RMT study protocol including active and passive assessments of symptoms, behaviour and cognitive function; and 4) determine whether participants with depression at baseline had poorer data availability.

Methods

Study design

The full protocol for RADAR-MDD has been reported elsewhere [18]. In short, RADAR-MDD is a multi-centre, prospective observational cohort study. The study aimed to examine whether data collected via multiparametric RMT can be used to reliably track illness course and predict relapse in MDD. The study sought to recruit 600 individuals with a recent history of recurrent MDD (with the latest episode within the past 2 years) and follow them up for a maximum of 24 months. The study has three recruitment sites: King’s College London (KCL; London, UK), Amsterdam University Medical Centre (VUmc; Amsterdam, The Netherlands), and Centro de Investigación Biomédica en Red (CIBER; Barcelona, Spain).

Study population

To be eligible for participation in RADAR-MDD, individuals must: 1) have met DSM-5 diagnostic criteria for non-psychotic MDD within the past 2 years; 2) have recurrent MDD (a lifetime history of at least two episodes); 3) be able and willing to complete self-reported assessments via smartphone; 4) be able to give informed consent; 5) be fluent in English, Dutch, Spanish or Catalan; 6) have an existing Android smartphone, or willingness to swap to Android as their only phone; 7) be aged 18 or over. Depression diagnosis was determined using the Lifetime Depression Assessment – Self-Report (LIDAS; [22]) in addition to a review of medical records.

Exclusion criteria were: 1) having a self-reported lifetime history of bipolar disorder, schizophrenia, MDD with psychotic features, or schizoaffective disorder; 2) dementia; 3) having received treatment for drug or alcohol use in the 6 months prior to enrolment; 4) a major medical diagnosis which might impact an individual’s ability to participate in normal daily activities for more than 2 weeks; 5) pregnancy (although once enrolled, becoming pregnant did not result in withdrawal as pre-pregnancy baseline data had already been obtained).

Eligible participants were identified via several recruitment channels, including through existing research cohorts whose members had consented to be contacted for future research opportunities (in the UK [23] and the Netherlands), through primary and secondary mental health services (in the UK and Barcelona), or through advertisements for the study placed on mental health charity websites, circulars or Twitter notices (at all sites). Participants in Amsterdam were partially recruited through Hersenonderzoek.nl (https://hersenonderzoek.nl/). All participants provided written consent and completed detailed baseline assessments including sociodemographic, social environment, medical history, medical comorbidities and technology use questionnaires.

Data collection

Remote data collection

Data collection started in London (UK) in November 2017 in a pilot phase of app development, with additional assessments being added to the protocol throughout the first 18 months of the study period to allow small-scale functionality testing and quality control before international large-scale data collection commenced. Data collection started in Barcelona and Amsterdam in September 2018 and February 2019, respectively. Data were collected using RADAR-base, an open-source platform designed to leverage data from wearables and mobile technologies [24]. RADAR-base provides both passive and active data collection via two apps: the RADAR active and passive monitoring apps.

Passive RMT

The passive RMT (pRMT) app unobtrusively collected information about phone usage throughout participation, requiring no input from the participant. It collected data on ambient noise, ambient light, location, app usage, Bluetooth connectivity, phone usage, and battery life. Some data sources were removed from the protocol during follow-up (summarised in Supplementary file 1) due to unavoidable changes in smartphone operating systems. Changes to Google’s Play Store permissions prevented access to text and call log data as of January 2019. Data pertaining to text and call logs have not been reported in the current paper because data collection from this sensor ceased when one site had recruited only 30 individuals and another site had not yet started recruitment. Participants were additionally asked to wear a Fitbit Charge 2/3 device for the duration of participation, providing information about individuals’ sleep and physical activity levels. Participants could keep the Fitbit at the end of their time in the study.

Active RMT

The RADAR-base active RMT (aRMT) app administered validated measurements of depression and self-esteem every 2 weeks via the 8-item Patient Health Questionnaire (PHQ8; [25]) and the Rosenberg Self-Esteem Scale (RSES; [26]). Items on the PHQ8 can be summed and used as a continuous score, with higher scores indicating increased depression severity and scores of ≥10 indicating significant symptoms [25]. The RSES requires reverse-scoring of 5 of its 10 items, which are then summed with the remainder to create a total score, with higher scores representing higher self-esteem [26].
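As an illustration only (not study code), the scoring conventions above can be sketched as follows. The item scales assumed here (PHQ8 items 0–3, RSES items 1–4) follow common conventions, and the positions of the reverse-scored RSES items are placeholders rather than taken from the study protocol:

```python
def score_phq8(items):
    """Sum the eight PHQ-8 items (each scored 0-3); a total >= 10
    flags significant depressive symptoms."""
    assert len(items) == 8 and all(0 <= i <= 3 for i in items)
    total = sum(items)
    return total, total >= 10

def score_rses(items, reversed_idx=(2, 5, 6, 8, 9)):
    """Sum the ten RSES items (assumed scored 1-4 here), reversing five
    of them; higher totals represent higher self-esteem. The indices of
    the reverse-scored items are hypothetical placeholders."""
    assert len(items) == 10
    return sum((5 - v) if i in reversed_idx else v for i, v in enumerate(items))

total, significant = score_phq8([2, 1, 3, 0, 2, 1, 1, 2])  # total 12, significant
```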

The aRMT app also delivered a speech task every 2 weeks, asking participants to read aloud a predetermined text from “The North Wind and the Sun” (see Supplementary file 2), an Aesop’s fable which is phonetically balanced across all three languages and has been shown to provide linguistic parameters indicative of low mood [27]. Participants were also asked to provide a sample of speech in answer to a question relating to plans for the upcoming week. Finally, the aRMT app included an ESM protocol [18], requiring participants to complete brief questions relating to mood, stress, sociability, activity and sleep, multiple times per day for 6 days at scheduled times throughout the course of follow-up.

Cognitive function

Cognitive function was measured every 6 weeks via an additional app, THINC-it®, which was integrated into the RADAR-base platform. The app has been validated to identify cognitive dysfunction within the context of depressive disorder [28]. The app contains the 5-item Perceived Deficits Questionnaire (PDQ-5; [29]), alongside computerised versions of the Choice Reaction Time Identification Task (“Code Breaker”), One-Back Test (“Spotter”), Digit Symbol Substitution Test (“Symbol Check”) and Trail Making Test-Part B (“Trails”) tasks to assess processing speed, working memory, concentration and attention [28].

Primary and secondary outcome assessments

All primary and secondary outcome measurements were collected via automatic surveys sent every 3 months via the Research Electronic Data Capture (REDCap) software [30]. A full description of the outcome assessment schedule is provided in our published protocol paper [18].

Depression

Depressive state was measured using the Inventory of Depressive Symptomatology – Self Report (IDS-SR; [31]) to capture changes in symptom severity, and the World Health Organisation’s Composite International Diagnostic Interview – Short Form (CIDI-SF; [32]) to identify people meeting DSM-5 criteria for MDD at each timepoint. These two measurements were used to identify different operationalisations of depression across follow-up, summarised in Supplementary file 3. Briefly, participants were categorised as being “symptomatic” (scoring ≥26 on the IDS-SR and meeting CIDI-SF criteria for MDD), having “some symptoms” (scoring ≤25 on the IDS-SR and meeting CIDI-SF criteria for MDD; or >21 on the IDS-SR and not meeting CIDI-SF criteria for MDD) or having “no/mild symptoms” (scoring ≤21 on the IDS-SR and not meeting CIDI-SF criteria for MDD).
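The three-state operationalisation described above can be sketched as a simple decision rule (an illustrative sketch, not study code; Supplementary file 3 remains the authoritative definition):

```python
def depression_state(ids_sr: int, meets_cidi_mdd: bool) -> str:
    """Classify a timepoint from the IDS-SR total score and the binary
    CIDI-SF MDD criterion."""
    if ids_sr >= 26 and meets_cidi_mdd:
        return "symptomatic"
    if ids_sr <= 21 and not meets_cidi_mdd:
        return "no/mild symptoms"
    # Remaining combinations: IDS-SR <= 25 with CIDI-SF MDD met,
    # or IDS-SR > 21 without CIDI-SF MDD met.
    return "some symptoms"

def relapsed(state_6_months_ago: str, state_now: str) -> bool:
    """Primary outcome: a switch from "no/mild symptoms" to "symptomatic"
    over a 6-month period."""
    return state_6_months_ago == "no/mild symptoms" and state_now == "symptomatic"
```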

As described previously [18], the primary outcome of interest in RADAR-MDD is depressive relapse, defined here as switching from a state of “no/mild symptoms” to “symptomatic” over a period of 6 months. Secondary depression outcomes are: remission (switching from a state of “symptomatic” to “no/mild symptoms” over a period of 6 months); and change in the severity of depressive symptoms (measured via the continuous IDS-SR).

Anxiety

Anxiety was measured via the 7-item Generalised Anxiety Disorder questionnaire (GAD7; [33]), used as a continuous indicator of anxiety symptom severity (scored out of a total of 21, with higher scores indicating increased anxiety severity), with a total score ≥10 indicating significant symptoms. This threshold has previously been shown to have good levels of sensitivity and specificity [34].

Functional ability

Functional ability was measured using the Work and Social Adjustment Scale (WSAS; [35]), using a continuous score from 0 to 40 to describe the level of impairment, with scores of 0–10, 11–20 and > 20 to indicate no, some and significant impairment respectively [35].

Alcohol use

The Alcohol Use Disorders Identification Test (AUDIT; [36]) was used to measure alcohol use across timepoints. A total score out of 40 describes the level of alcohol use: scores of 0–7 indicate low risk alcohol consumption; 8–15 indicate hazardous alcohol consumption; 16–19 indicate harmful alcohol consumption; and scores ≥20 indicate likely alcohol dependence [37].

Illness perceptions

The Brief Illness Perceptions Questionnaire (BIPQ; [38]) measured emotional and cognitive representations of illness, capturing perceptions relating to illness identity, causes, control, consequences, timeline, concern, understanding and emotional response. Scores for each domain can be used individually or summed into an overall score, with higher scores representing a more threatening view of the illness.

Health service use

Access to health services, changes in treatment, and care received were measured via a modified Client Service Receipt Inventory (CSRI; [39]), adapted for online delivery and participant self-report.

Covariates

Life events

Any significant life events which may have happened between outcome assessments were measured via the List of Threatening Experiences Questionnaire (LTE-Q; [40]). Changes in employment status were recorded regularly as part of the CSRI [39].

Medication adherence

Self-reported adherence to depression medication was measured with the 5-item Medication Adherence Report Scale (MARS-5; [41]).

Patient and Public Involvement

The study was co-developed with service users in our Patient Advisory Board. They were involved in the choice of measures, the timing of assessments and issues of engagement, and in developing the analysis plan; representatives are authors of this paper and critically reviewed it.

Statistical analyses

Baseline characteristics of the sample were described using means and standard deviations or numbers and percentages as appropriate. To examine whether depressed mood is associated with the availability of data across all modes of data collection, participants were divided using scores on the IDS-SR and CIDI-SF (see Supplementary file 3 for operationalisation) into those who were symptomatic at baseline and those who were not (those with no/mild symptoms and some symptoms were pooled together due to the low number of people with no/mild symptoms at baseline (n = 4)). Chi-squared tests examined differences between those with and without baseline symptoms of depression in categorical data, and linear regressions in continuous data.

The number and percentage of people who provided any data via the aRMT and pRMT apps and the wearable device throughout the course of follow-up were summarised, then divided into quartiles to examine the numbers of people who provided 0–25%, 26–50%, 51–75% and >75% of expected data throughout follow-up. Fitbit wear-time estimates were calculated based on the presence of a single heart rate value, greater than zero, per 15-min window.
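These two calculations can be sketched as follows (illustrative only, not study code; representing heart-rate samples as minute offsets from the start of follow-up is a simplifying assumption):

```python
def wear_time_fraction(hr_samples, total_minutes):
    """Estimate Fitbit wear-time as the fraction of 15-minute windows that
    contain at least one heart-rate value greater than zero.
    hr_samples: iterable of (minute_offset, heart_rate) pairs."""
    n_windows = total_minutes // 15
    worn_windows = {m // 15 for m, hr in hr_samples if hr > 0 and m < total_minutes}
    return len(worn_windows) / n_windows if n_windows else 0.0

def availability_band(pct):
    """Map a participant's percentage of expected data provided to the
    four bands reported in this paper."""
    if pct <= 25:
        return "0-25%"
    if pct <= 50:
        return "26-50%"
    if pct <= 75:
        return "51-75%"
    return ">75%"
```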

P-values comparing the amount of data available between people with symptoms of depression at baseline and those without symptoms of depression at baseline were created using Chi-Squared tests. T-tests compared the number of ESM questions completed in total across all follow-up timepoints between those with and without depression symptoms at baseline. Data were analysed using STATA v16.0.

Results

Recruitment and retention rates

The first person was enrolled in RADAR-MDD on 30th November 2017, and recruitment ended on 3rd June 2020, representing a total of 30 months of recruitment. Figure 1 shows the participation rate, detailing the total number of participants contacted and the reasons for non-participation.

Fig. 1
figure 1

STROBE flowchart for recruitment into RADAR-MDD

Figure 2 shows the participant retention rate throughout the period of follow-up. At each timepoint, the number of people eligible for contact for an outcome assessment decreased as: 1) more people reached the end of the data collection period; and 2) people were withdrawn from the study. As the last participant was recruited in June 2020 and the study finished in April 2021, the minimum and maximum lengths of possible follow-up were 11 months and 24 months respectively. The completion rate of the primary and secondary outcomes in those who were eligible to complete them (those who had not already completed the study or been previously withdrawn) was approximately 80% throughout follow-up assessments.

Fig. 2
figure 2

Participants “not contacted” because they had already completed the maximum amount of follow-up time or had already withdrawn from the study. Participants were “contacted” when they were still active participants. *Reasons for withdrawal provided in Supplementary file 4. **Invalid outcomes collected ±21 days of due date

Of the 623 participants enrolled in the study, 445 (71.4%) provided outcome data at 1-year follow-up and 181 (29.1%) participated for a full 2 years. A total of 497 people (79.8%) participated for the maximum possible duration (from their enrolment until the end of data collection in April 2021), and 126 people (20.2%) withdrew prematurely. Reasons for withdrawal are provided in Supplementary file 4. The most common reasons for withdrawal across all timepoints were loss to follow-up (n = 47) and problems using the Android study phone among those who had switched from an iPhone for the purposes of the study (n = 14), representing 37.3% and 11.1% of all withdrawals respectively. A total of 8 participants identified study burden as the main reason for withdrawal, finding the study “too demanding” (n = 6) or the study “not meeting expectations” (n = 2).

Sample characteristics

The target sample size of 600, across the three sites, was exceeded, with 623 individuals successfully enrolled in the study. The baseline sociodemographic and clinical characteristics of this sample are displayed in Table 1, with comparisons made between those with no/some symptoms at baseline and those who were symptomatic at baseline (see Supplementary file 5 for between-site stratification).

Table 1 Sociodemographic and clinical baseline data and comparisons between those with no/some depression symptoms at baseline, and those who are symptomatic at baseline

In comparison to those with no/some depression symptoms at baseline, the symptomatic group were significantly younger, and had a higher proportion of individuals who were female, on long-term sick leave or unemployed, receiving benefits, and earning less than £/€15,000 per annum. Regarding clinical characteristics, the symptomatic group had higher proportions of current smokers and medical comorbidities, as well as increased levels of current depression, anxiety and functional disability and more threatening illness perceptions, although lower levels of alcohol use. Throughout RADAR-MDD, a total of 341 risk assessments were conducted (9.0% of the 3777 depression measurements taken).

Data collection with RMT

Data collection started on 30th November 2017 and continued until the last participant was unenrolled from the study on 1st May 2021, resulting in a median duration of participation of 541 days (interquartile range (IQR): 401–730 days, range: 0–1217 days). A total of 2.9 terabytes of compressed data were collected, with 110 (17.7%) participants having more than 50% available data across all modes of data collection.

Data collected via aRMT

Figure 3a-c display active RMT data collection stratified by baseline depression status. Overall, participants completed a median of 21 (IQR: 9–31) PHQ-8 questionnaires, 20 (IQR: 9–30) RSES questionnaires and 12 (IQR: 2–23) speech tasks. A total of 95.3%, 94.5% and 82.2% of participants had any data available for the PHQ8, RSES and speech tasks respectively. Chi-squared tests found no significant differences in data availability between those with and without depression symptoms at baseline for the PHQ8 (X2 (622, n = 623) = 3.0, p = 0.38), RSES (X2 (622, n = 623) = 3.83, p = 0.28), or speech task (X2 = 4.8, p = 0.19). The mean numbers of ESM items completed throughout the study by those with and without depression symptoms at baseline were 11.8 (SD = 23.7) and 11.9 (SD = 23.7) respectively, with t-tests demonstrating no significant difference in ESM data availability between these groups (p = 0.158).

Fig. 3
figure 3

Questionnaires triggered every 2 weeks; maximum number of possible responses: 52. 3a: 8-item Patient Health Questionnaire (PHQ8); 3b: Rosenberg Self-Esteem Scale (RSES); 3c: Speech data

Figure 4 displays THINC-it app® data collection stratified by baseline depression symptom status. Overall, participants completed a median of 5 (IQR: 2–10) THINC-it app® PDQ5 questionnaires, 5 (IQR: 2–9) Code Breaker tasks, 5 (IQR: 2–9) Spotter tasks, 5 (IQR: 2–9) Symbol Check tasks, and 5 (IQR: 2–10) Trails tasks. Over 84% of participants had any data available for the PDQ5 (90.5%), Code Breaker (84.4%), Spotter (84.8%), Symbol Check (84.6%) and Trails (89.9%) tests. Chi-squared tests found no significant differences in data availability between those with and without depression at baseline for the PDQ5 (X2 (622, n = 623) = 2.5, p = 0.48), Code Breaker (X2 (622, n = 623) = 0.91, p = 0.82), Spotter (X2 (622, n = 623) = 1.28, p = 0.73), Symbol Check (X2 (622, n = 623) = 1.26, p = 0.74) or Trails (X2 (622, n = 623) = 2.0, p = 0.58) tasks.

Fig. 4
figure 4

Questionnaires triggered every 6 weeks; maximum number of possible responses: 17. 4a: 5-item Perceived Deficits Questionnaire (PDQ5); 4b: Code Breaker; 4c: Spotter; 4d: Symbol Check; 4e: Trails

Data collected via wearable technology

Table 2 displays wearable RMT data collection using Fitbit, stratified by baseline depression status. Data collection relied on: 1) participants wearing the Fitbit device; 2) participants regularly charging and syncing the device; and 3) data being returned by the Fitbit servers. Fitbit wear-time varied during the study (Fig. 5a), with the average participant wear-time across the entire duration of follow-up estimated as 62.5% (SD: 9.1 percentage points, Fig. 5b), and the average number of hours per day as 15.1 h (SD: 2.2 h). Wear-time decreased over time and did not significantly differ between those with and without depression symptoms at baseline (X2 (622, n = 623) = 525616, p = 0.24).

Table 2 Wearable remote measurement technology data availability stratified by baseline depression status
Fig. 5
figure 5

A) Heatmap representing study day and data points per hour. B) percentage wear time stratified by baseline depression status

Step count data were the most frequently available, with almost 50% of participants providing >75% of expected data throughout the course of follow-up. Activity data (comprising a combination of data derived from Fitbit proprietary algorithms and activities input manually by participants) were the least readily available, with only 5% of participants having >75% data availability. Activity data were also the only data type found to have significantly different levels of availability according to the presence of depression at baseline (X2 (622, n = 623) = 14.1, p = 0.002): in comparison to those without depression at baseline, the symptomatic group had a significantly larger percentage of people providing <26% of expected activity data. Figure 5a shows a paler horizontal band of colour between days 290 and 380 of study participation, indicating lower levels of wear-time during these time-points, and Fig. 5b shows a dip in percentage wear-time in people with symptoms of depression at baseline after the first year of participation.

Data collected via pRMT

Table 3 displays passive data collection across all smartphone sensors, stratified by the presence of baseline depression. Data availability was highest for the GPS location and battery level sensors, and lowest for phone usage. No evidence of a difference in data availability between those with and without depression at baseline was identified.

Table 3 Passive remote measurement technology data availability stratified by baseline depression status and measurement

Discussion

Study recruitment and retention

Recruitment into RADAR-MDD was highly successful, with the flexibility of face-to-face and remote enrolments resulting in the study exceeding its recruitment targets despite the COVID-19 pandemic [42]. Attrition rates in longitudinal research vary widely [43] and whilst there is no recognised threshold for “acceptable” versus “unacceptable” dropout, follow-up levels of 50, 60 and 70% have previously been described as adequate, good and very good respectively [44]. Here we report ~ 80% completion rates of our outcomes across all follow-up timepoints, with 79.8% of all enrolled individuals completing the study protocol for the maximum amount of time possible, representing excellent availability of our primary and secondary outcome measures.

Sociodemographic characteristics

The RADAR-MDD cohort has a higher proportion of White and female individuals than would typically be seen in the general or depressed population [45], reflecting the tendency for White females to attend mental health services more often than their male/non-White counterparts, and their greater likelihood of participating in research studies [46, 47]. The mean age and gender distribution in our participants are comparable to other MDD samples, such as Sequenced Treatment Alternatives to Relieve Depression (STAR*D; [48]) and the Rhode Island Methods to Improve Diagnostic Assessment and Services (MIDAS; [49]). These characteristics may limit the generalisability of our findings to the wider population. It is also worth noting that the ethnic groupings used in the two countries that collected ethnicity data are challenging to compare, meaning that in-depth interrogation of racial differences in outcomes will be affected by small cell sizes unless ethnic groups are merged into larger, less descriptive categories. In terms of clinical presentation, our sample have slightly lower levels of current depression severity and reduced WSAS functional disability than those recruited into STAR*D [48].

RMT domains and data availability

Data availability varied across the RMT domains. Over 90% of participants had data available for analysis from the aRMT, with the PHQ-8 and RSES having the largest amount of data available for the most people. The least amount of aRMT data was available for the assessments conducted via the THINC-it® app, with <26% of expected data available in approximately 60% of participants. There are several explanations for this difference in data availability in comparison to our other aRMT assessments. Firstly, due to the technical requirements of integrating data from the separate THINC-it® app into the RADAR-base platform, the first THINC-it® data were received in March 2018, meaning the first 4 months of data collection excluded THINC-it® data. There were also initial challenges syncing data collected via the THINC-it® app with the RADAR-base platform, meaning there was potential for data loss in the early months of data collection. Secondly, the THINC-it® app is separate from the other RADAR-base apps, with different branding, design and feel. This may have made the tasks appear separate or “other” to the main protocol and reduced adherence to these tasks. The THINC-it® app does not have an inbuilt notification system: participants received notifications to complete the cognitive tasks via the RADAR-base aRMT app. Participants were therefore required to switch between apps, which increases the number of points at which interest or motivation may be lost [50]. Finally, the cognitive tasks offered as part of the THINC-it® app require more attention than conventional questionnaires, which may be more challenging for those who are experiencing depression symptoms [51].

We report an overall Fitbit wear-time of 62.5% across a median study participation of 541 days, and a mean wear-time of 15.1 h per day. This is lower than the wear-time of 22.6 h per day across a two-year follow-up period in a recent United States population-based Fitbit study by Radin and colleagues [52]. However, Radin et al. omitted missing wear-time data and excluded measurements with a wear-time lower than 1000 min per day, which inflates their wear-time statistics. In contrast to our sample, Radin et al. [52] used a non-clinical population, and the barriers to long-term use of a wearable device are likely to be different in an MDD versus general population sample [21]. Comparatively, Pedrelli et al. [53] report Empatica E4 wear-time estimates of 92–94% in a study involving 31 individuals with MDD; however, their follow-up period was limited to only 8 weeks. Although their sample was similar in clinical characteristics, our longer duration of follow-up and integration of a wearable into a more complex set of data collection sources likely explain the differences in wear-time reported.

To the best of our knowledge, no previous remote measurement studies have reported the quantity of data collected via smartphone sensors. The largest amount of data was available for the battery level and GPS sensors. For a multiparametric analysis, data across multiple sensor types will be needed: we report a total of 110 individuals (17.7% of the sample) with > 50% of expected data across all data types. This is an important indicator of the resource and data collection effort required for multiparametric analyses. Although remote by design, the study involved close contact between participants and the research team throughout: researchers were available for technical support and questionnaire reminders, in addition to conducting risk assessments based on questionnaire answers. Future work will need to investigate the minimum amount of contact time required to acquire usable data for real-world implementation to be viable.
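A completeness criterion of this kind can be sketched as follows; the participant identifiers, data types, expected totals and counts below are all hypothetical, and serve only to show the shape of the computation, not the study's actual pipeline.

```python
# Hypothetical sketch of a multiparametric completeness criterion:
# count participants whose available data exceed 50% of expected
# for every data type. All names and numbers are illustrative.

expected_days = 100  # hypothetical expected days of data per data type

# participant -> {data_type: days with data available} (made-up records)
availability = {
    "p01": {"battery": 90, "gps": 80, "accelerometer": 70},
    "p02": {"battery": 95, "gps": 40, "accelerometer": 60},
    "p03": {"battery": 30, "gps": 20, "accelerometer": 10},
}

def meets_threshold(per_type: dict, expected: int, frac: float = 0.5) -> bool:
    """True if every data type exceeds `frac` of the expected data."""
    return all(days / expected > frac for days in per_type.values())

eligible = [p for p, d in availability.items() if meets_threshold(d, expected_days)]
print(eligible)  # only p01 exceeds 50% on all three data types
```

The stringency of requiring the threshold on *every* data type simultaneously is what drives the eligible fraction down, which is consistent with only a minority of the sample qualifying for multiparametric analyses.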

Limitations

There are several limitations and challenges presented by the current paper. Firstly, each of the sensor and data types collected has different temporal validity and aggregation requirements. For example, sleep data are only meaningful when aggregated from midday to midday, whereas activity data are more relevant when calculated from midnight to midnight. At the most granular level, data from smartphone and wearable sensors are so fine-grained that no meaningful inferences can be drawn, requiring some form of aggregation that may not be the same across sensors. For example, whereas heart-rate data might be collected every 5 s and summarised across an hour, the aggregation of GPS data is dependent on the smartphone device being used. In the current paper we have endeavoured to summarise data availability as coherently as possible within these constraints, aiming to provide an easily replicable, comparable, and interpretable description of the data available within our dataset.
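The windowing issue can be made concrete with a minimal sketch: the same timestamp is assigned to different "days" depending on the sensor, with sleep epochs bucketed midday to midday and activity epochs midnight to midnight. The function names and the example timestamp are our own illustrative choices, not part of the study pipeline.

```python
# Minimal sketch of sensor-specific aggregation windows: a 3 a.m.
# reading belongs to different "days" for sleep versus activity.
from datetime import date, datetime, timedelta

def sleep_day(ts: datetime) -> date:
    """Midday-to-midday window: shift back 12 h so an overnight
    sleep epoch is attributed to the day the sleep began."""
    return (ts - timedelta(hours=12)).date()

def activity_day(ts: datetime) -> date:
    """Midnight-to-midnight window: the calendar date as-is."""
    return ts.date()

ts = datetime(2020, 5, 2, 3, 0)  # 3 a.m. on 2 May
print(sleep_day(ts))     # 2020-05-01 (the night that began on 1 May)
print(activity_day(ts))  # 2020-05-02
```

Because each sensor requires its own window, per-"day" summaries from different sensors do not line up exactly, which is one reason a single uniform availability metric across modalities is not straightforward.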

It is also essential to acknowledge the technical challenges inherent to multimodal data collection over long periods of time. RADAR-base and its associated apps were developed and piloted within the main data collection period, with iterative changes and updates made throughout the course of follow-up. These changes may have been implemented to overcome a system-related issue introduced by updates to the Android operating system, or in direct response to participant or researcher feedback. This flexibility in app design and development is essential to maintaining app compatibility, but it means that an individual participating throughout 2019–2020 will have had a different user experience from an individual participating throughout 2020–2021.

Whilst the majority of our recruitment occurred before the global pandemic, the threat posed by COVID-19 may have affected existing participants’ research experience and data completion. Recent evidence suggests that people with moderate to severe levels of depression who are already enrolled in a research study show a reduced ability and desire to adhere to research protocols due to COVID-19 [54]. Given the high level of depressive symptoms in our sample, the pandemic and its associated social interventions may have added a burden to participants resulting in an increased dropout rate and reduced adherence to the study protocol. We have previously reported the impact of the pandemic and associated social interventions on the data collected via RMT across the RADAR-CNS clinical studies [55] and future work will extend this to examine how the pandemic may have affected data availability.

Despite these limitations, RADAR-MDD remains the largest, most ambitious multimodal RMT study in depression. A recent systematic review summarising studies using passive and active smartphone-based measurements in affective disorders found only 5 studies in people with MDD; these studies had a median sample size of 5 and a median follow-up time of 4 weeks, with huge variability in the quality of reporting [17].

Future research

There are some vital next steps in the exploration of RADAR-MDD data, which will be examined in addition to the primary objectives of the RADAR-MDD study [18]. Firstly, the present paper reports the amount of data available across all modes of data collection; a more thorough investigation into the quality of the data is warranted before more complex analyses are conducted. Furthermore, whilst we found no evidence of a link between baseline depression status and data availability, fluctuations in depression symptoms over time are likely to be more relevant for predicting technology use than a static baseline status. For example, future work will explore whether missing data due to reduced participant adherence might be an early sign of depressive relapse. Finally, we have not described the sociodemographic, clinical and technical predictors of data availability, which will be the subject of a future paper.

Conclusion

The data collected in RADAR-MDD indicate that collecting RMT data from clinical populations is feasible. We found comparable levels of data availability in active (requiring input from the participant) and passive (requiring no input from the participant) forms of data collection, demonstrating that both are feasible in this patient group. However, data availability depends on the data type, with higher-burden data sources (such as cognitive tasks, or keeping wearable devices charged) reducing data availability. In this sample, there was no convincing indication that the severity of depression symptoms at baseline was associated with data availability. The next steps are to assess the predictive value of these data, which will be the focus of our future data analysis aims.