Introduction

In the European Union (EU), a disease is defined as rare if it affects fewer than five in 10,000 people across the EU [1]. In the United States (US), the Orphan Drug Act defines a rare disease as a disease or condition that affects fewer than 200,000 people in the US [2]. Around 10,000 rare diseases have been identified according to these different definitions, and about 5% of the European population is thought to be affected by one of them; many of these diseases have neurological manifestations. These diseases are often associated with high mortality and disability, which result in large societal costs [3]. Drug development for rare diseases is especially challenging due to the small number of subjects eligible for inclusion in clinical trials. Trials in this field tend to be small, single-arm, non-randomized, and open label [4]. As defined by the US Food and Drug Administration [5], a clinical outcome assessment is a measure that describes or reflects how a patient feels, functions, or survives. In clinical trials for rare diseases, outcome measures are often subjective, depending on both patient motivation and rater [6]. Moreover, test results only reflect the patient’s condition at a specific timepoint, and severity of symptoms often varies with time.

Nevertheless, the precision and the objectivity of the outcome measures are critical [7]. Digital biomarkers could, to a certain extent, fill this unmet need. Digital biomarkers have been defined by the European Medicines Agency (EMA) as an objective, quantifiable measure of physiology or behavior, derived from a digital measure, that is used as an indicator of a biological or pathological process or of a response to an exposure or an intervention [8] (e.g., step length). Outcomes are defined as measures chosen to assess the impact of an intervention [9]. For most rare diseases, clinical outcomes have not been qualified or validated with the same rigor or methodological approach, or in cohorts as large, as they have in more common diseases. We define an outcome as validated or partially validated if it has been studied in adequate and well-controlled studies with full characterization of its psychometric properties after original discovery studies. Outcome qualification is a longer and codified process to obtain qualification of the outcome by regulatory authorities within a well-defined context of use [5].

In recent years, an increasing number of technologies have been used for remote monitoring of health [10]. Remote monitoring provides the opportunity to reduce cohort sizes and time-to-endpoint for clinical trials [11] while providing a more accurate representation of the patient’s condition than evaluations conducted in clinic facilities at a few time points.

Efforts to develop digital outcome measures have focused mostly on neurological disorders that are not classified as rare [12, 13], but these measures are increasingly used to evaluate patients who have rare diseases. The use of digital outcome measures is expected to increase with the development of innovative technologies and artificial intelligence [14].

In neuromuscular disorders, a recent literature review highlighted the increasing use of sensors to assess motor activity in real life and the ability of these sensors to overcome many issues of traditional evaluations [15]. Nevertheless, only one digital outcome measure, the 95th centile of stride velocity, is currently qualified by a regulatory agency for use in Duchenne muscular dystrophy [16].
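As an illustration of how a centile-based outcome of this kind can be summarized, the following minimal Python sketch computes a 95th centile from a list of stride velocities. It assumes that strides have already been detected and their velocities estimated by the sensor pipeline. The function names, the nearest-rank percentile convention, and the sample values are illustrative assumptions and are not taken from the qualified procedure [16].

```python
import math


def percentile_nearest_rank(values, pct):
    """Return the pct-th percentile of values (nearest-rank convention).

    Assumption: pct is an integer between 1 and 100. Other percentile
    conventions (e.g., linear interpolation) can give slightly
    different results on the same data.
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 1 <= pct <= 100:
        raise ValueError("pct must be between 1 and 100")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def stride_velocity_95th_centile(stride_velocities):
    """Summarize a recording period by the 95th centile of stride velocity."""
    return percentile_nearest_rank(stride_velocities, 95)


# Hypothetical stride velocities in m/s (not real patient data).
example_strides = [i / 10 for i in range(1, 21)]  # 0.1, 0.2, ..., 2.0
sv95c = stride_velocity_95th_centile(example_strides)
print(f"95th centile of stride velocity: {sv95c} m/s")
```

A high centile summarizes the fastest strides a patient takes spontaneously rather than the average stride, which is one plausible reason for preferring it over a mean; the rationale given in the qualification documents may differ.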

The purpose of this systematic review is to summarize the current state of progress with regard to the use of digital outcome measures for real-life motor function assessment of patients affected by rare neurological diseases. We also summarize the psychometric properties that have been assessed for each outcome, providing a snapshot of where we currently stand in the process of developing and qualifying digital outcomes in rare diseases.

Methods

Methods for identification of relevant clinical studies are described in Additional file 1. Data extracted from individual studies are presented in Additional file 2.

Results

Selection of clinical studies

This review provides a summary of the use of digital outcome measures in clinical trials listed in Medline and Embase. Figure 1 is a flow-chart of the process used for inclusion of clinical studies in our analysis. Our search of published literature identified 3826 records, of which 139 were included. Amongst these studies, 51 focused on neuromuscular disorders, 42 on movement disorders, 16 on genetic ataxias, nine on multisystemic rheumatological diseases, and 21 on miscellaneous neurodevelopmental disorders. Information on study designs and collected data is summarized in Table 1.

Fig. 1 Flow-chart of selection process

Table 1 Summary by disease of number of studies and their design, population and main findings

Types of outcomes

In the studies evaluated, very few outcomes were precisely defined and even fewer were completely validated and ready for qualification. Event detection, or frequency directly derived from sequential detection of an event, was often measured. Physical activity was measured in various ways, including number of steps and time spent in different levels of physical activity. Outcomes related to gait ranged from time-distance variables (e.g., stride length, velocity) to more elaborate variables resulting from machine-learning approaches. Methods for upper limb assessments were also heterogeneous. Other functions monitored included balance and body sway.

Choice and position of the sensors

Wearable or portable devices based on inertial technology were by far the most frequently used sensors, although pressure or surface electromyographic sensors were also employed. Technologies included applications and software. The number and position of the sensors are critical to effective motor assessment, as sensors only assess the movement of the segment where they are placed, and sensor number and position may influence patient compliance. There was large variability in the numbers and positions of sensors in the same patient populations. The locations of the sensors and the numbers of relevant studies are presented in Fig. 2.

Fig. 2 Numbers and placements of wearable and portable inertial sensors. Abbreviations: Duchenne Muscular Dystrophy (DMD), Amyotrophic Lateral Sclerosis (ALS), Charcot-Marie-Tooth (CMT), Myotonic Dystrophy (DM), Facioscapulohumeral dystrophy (FSHD), Myasthenia Gravis (MG), Spinal Muscular Atrophy (SMA), Spinal and bulbar muscular atrophy (SBMA), Huntington Disease (HD), Progressive supranuclear palsy (PSP), focal dystonia (FD), Spinocerebellar ataxias (SCA), Hereditary spastic paraplegia (HSP), Fragile X Syndrome (FXS), Friedreich’s ataxia (FRDA), Prader–Willi syndrome (PWS), mucopolysaccharidosis (MPS), GM2 gangliosidosis (GM2), Niemann–Pick type C (NP-C), Tuberous sclerosis complex (TSC)

Psychometric properties

Psychometric properties that were evaluated in this review are summarized in Table 2. We found 92 studies that described use of psychometric evaluation. Most focused on validity assessment, through correlation with standard outcomes and discrimination between normal and pathological performance, whereas reliability and sensitivity to change were assessed in only 16 and 21 studies, respectively.

Table 2 Psychometric properties

Spectrum of pathological conditions assessed

Neuromuscular disorders

Duchenne muscular dystrophy (DMD)

Among the 18 studies that involved subjects with DMD, two also included subjects with other conditions (Becker muscular dystrophy [17] and Niemann–Pick type C [18]); the rest involved only subjects with DMD. One study showed that participants who were ambulatory 2 years after study initiation had almost double the level of step activity at baseline compared to subjects who were not ambulatory after 2 years [19]. Two studies demonstrated that limb coordination (homolateral-limb coupling coefficient) and physical activity (stride count and time spent in different levels of activity) declined with age [20]. Another study showed that total physical activity did not significantly change over 1 year [17]. Validity was demonstrated for numerous gait parameters (e.g., stride length, stride velocity) [21]. One outcome, stride velocity 95th centile (SV95C), which was qualified as a secondary outcome measure [22], was in public consultation for qualification as a primary endpoint until April 2023 and is very likely to be qualified in 2023. Of the two interventional studies identified, one, a randomized controlled trial, failed to show an effect of a nutritional supplement on steps or inactive minutes per day [23], and the other showed that duration of walking episodes or the succession of walking episodes increased with prednisolone treatment [24]. One research team used a smartphone maze game to evaluate and improve upper limb performance [25]. In another study, the feasibility of a patient-led initiative to assess upper and lower limb function through video analysis of four motor tasks was demonstrated [26].

Amyotrophic lateral sclerosis (ALS)

We found 15 studies that used various sensors, mainly inertial technologies, in the ALS population. One study showed an inverse association between overall acceleration average and ALS risk based on genome-wide association study and inertial data [27]. Raw parameters such as vector magnitude count and variation in the vertical axis showed less variability than clinical assessment, leading to a potential reduction of sample sizes by 30.3% for 12-month trials [28]. Another study showed that daily home measurements such as step count, electrical impedance myography, and grip strength tracked progression more accurately and could reduce sample sizes for trials [29]. Two studies used apps that integrated different tasks to monitor motor control in the upper body, speech, and cognition. One of these apps detected two metrics, error metrics and velocity rate, that are useful for inference of clinical variables [30]. Changes in typing activity (e.g., acceleration at key press, key release) were correlated with progression of dysfunction [31]. Assessment of head movements through four parameters confirmed the efficacy of the Head Up collar by showing a significant improvement in the control of movement expressed as the median ratio of movement coupling value [32]. Another study showed that it is feasible to assess ALSFRS-R, a score that stratifies severity of ALS, remotely through an app [33]. Studies have also demonstrated that accelerometers and electromyography can be used for evaluation of tremor frequency, involuntary movement [34], and fasciculation [35].

Charcot-Marie-Tooth (CMT)

Six studies were found that used inertial sensors in subjects with CMT disease; four employed the sensors in the home setting. Four studies collected information on physical activity based on various variables (e.g., power of different activities, steps taken) without any ranking in terms of importance of these variables. One study attempted to characterize gait fatigability and demonstrated that velocity, cadence, trunk range of motion, step time, and stride length variability showed statistically significant differences during the 6-min walk test [36]. One study used the detection of tremor characteristics (e.g., frequency, spectral power) to study possible mechanisms for tremor [37].

Myotonic dystrophy type 1 (DM1)

Four studies focused on differences between patients with myotonic dystrophy and controls, mainly in terms of active minutes, limb accelerations, and gait abnormalities (i.e., walking speed, stride frequency, stride length). Three used inertial technology. The fourth showed the feasibility of remote video assessment of timed-up-and-go and of digitally measured hand grip strength [38].

Facioscapulohumeral muscular dystrophy (FSHD)

Four studies in subjects with FSHD focused on psychometric evaluation of outcome measures gained with inertial sensors. In one of them, data were acquired for 6 weeks through multiple technologies (accelerometer, apps, GPS, Google Places calls, microphone), and analyses resulted in an accurate classifier [39].

Myasthenia gravis (MG)

Two studies of subjects with myasthenia gravis, who were continuously assessed for 7 days, examined the validity of physical activity variables, defined as total physical activity [40] or levels of physical activity (e.g., low, moderate, high) [41].

Spinal muscular atrophy (SMA) types 2 and 3

In a natural history study of subjects with SMA2 and SMA3, wrist angular velocity, wrist acceleration, wrist vertical acceleration, power, and percentage of active time were extracted from a magneto-inertial sensor worn at the wrist [42]. The reported results were similar to those described in DMD [43]. In a limited number of ambulant patients, the device was placed on the lower limb and the SV95C, which was qualified in DMD [22] and studied in other dystrophies [44], remained stable over 12 months.

Spinal and bulbar muscular atrophy (SBMA)

The single study of subjects with SBMA assessed the use of accelerometry to identify changes in the physical activity of patients after 10 days of functional exercises [45].

Movement disorders

Huntington disease (HD)

Among the 26 studies on HD patients, 10 reported continuous [46,47,48,49,50,51] or discontinuous [52,53,54,55] real-life assessments. Physical activity was assessed with different sets of variables in each study (e.g., activity profiles, step count, and time spent sitting). Contradictory results were obtained regarding discriminant validity of levels of physical activity [47, 51]. Two studies found no difference in physical activity between patients and controls, but there were differences in spatiotemporal gait variables [50, 56]. Validity and accuracy of estimating gait events and various gait spatiotemporal parameters were assessed in controlled environments [50, 56,57,58,59,60,61,62,63] and in real-life settings in several studies [49, 50, 55, 56]. These studies used a selection of outcome measures (e.g., cadence, stride length, gait speed, variability) with no prioritization of outcomes in terms of metric properties. In a cross-sectional study, compliance and psychometric properties of upper and lower limb variables (e.g., sway path, spiral drawing speed variability, median turn speed, and step frequency variance) were extracted from a digital monitoring platform [55]. Balance and body sway (as root mean square [61, 64], jerk, sway area and power [65], peak and mean thoracic and pelvic excursion [66]) were impaired even at the premanifest stage of disease.

Upper limb function was evaluated in three studies in real-life settings using software to record speed of finger movement during computer typing [53], inertial technology to compute a chorea score, and a smartphone touchscreen sensor to assess the tap rate [53]. One study demonstrated the validity of a model chorea score based on accelerometer data [46]. Three studies used machine learning approaches to extract composite scores [67] (i.e., movement impairment scores) as well as complex [68] and standard [59] spatiotemporal variables (e.g., stride length, velocity) for discriminant validity analysis. Using machine learning based on inertial data, a number of studies also demonstrated the accuracy of gait event detection [63], the ability to predict impaired upper limb reaction (reaction time and error distance) [69], and the potential of root mean square to serve as a biomarker of postural control impairments [61]. Three studies demonstrated that compliance was good [54, 70] and reported that participants would be willing to wear the sensors again [47].
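To make the root mean square sway measure mentioned above concrete, the following minimal Python sketch computes the root mean square of an acceleration signal around its mean. It is an illustration only: the function name, the choice to subtract the mean before squaring, and the sample values are assumptions, and the reviewed studies [61, 64] may have used different preprocessing (e.g., filtering, axis selection).

```python
import math


def sway_rms(acceleration_samples):
    """Root mean square of a sway signal around its mean.

    Assumption: the input is a single-axis trunk acceleration trace
    (e.g., medio-lateral), sampled at a constant rate. The mean is
    removed so that a constant sensor offset does not inflate the value.
    """
    if not acceleration_samples:
        raise ValueError("acceleration_samples must not be empty")
    n = len(acceleration_samples)
    mean = sum(acceleration_samples) / n
    squared_deviations = sum((a - mean) ** 2 for a in acceleration_samples)
    return math.sqrt(squared_deviations / n)


# Hypothetical acceleration samples in m/s^2 (not real patient data).
quiet_stance = [0.02, -0.01, 0.01, -0.02, 0.00, 0.01]
unstable_stance = [0.30, -0.25, 0.40, -0.35, 0.20, -0.30]
print(f"RMS, quiet stance:    {sway_rms(quiet_stance):.3f}")
print(f"RMS, unstable stance: {sway_rms(unstable_stance):.3f}")
```

A larger value indicates larger excursions around the mean position during the recording, which is how this measure is typically read as an index of postural instability.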

Progressive supranuclear palsy (PSP)

Seven studies focused on evaluation of subjects with PSP using inertial sensors in a clinical setting. Validity assessments via comparison of gait length or speed between subjects with PSP and those with Parkinson’s disease (PD) were not in agreement [71, 72]. Patients with PSP have significant impairment in postural control, expressed as root mean square, which can result in falls during challenging test conditions [73]. Cerebellar repetitive transcranial magnetic stimulation had a significant effect on inertial parameters correlated with stability (i.e., area, velocity, acceleration, and jerk in the medio-lateral direction) compared to placebo [74]. Upper limb assessment was restricted to the discriminant validity assessment of finger tapping-specific features [75]. Another study reported that a machine learning approach identified classifiers that discriminate PSP from PD with high sensitivity and specificity [76]. In the only longitudinal study of PSP subjects that we found, the three out of 150 gait and postural features that were the most sensitive to change (i.e., mean toe off angle, mean turn velocity, standard deviation of stride length) were integrated in a regression model able to detect early progression [77].

Focal dystonia (FD)

We found nine studies regarding hand dystonia [78,79,80,81] or cervical dystonia [82,83,84,85,86]. One study showed that instrumental detection of dystonic tremor outperforms clinical examination [80]. Two studies aimed to define the characteristics of dystonia [79] and dystonia-associated tremor [78] with multiple measures including force, power, and frequency. In the latter study, the authors built an accurate classifier (95.1%) that discriminated between essential tremor, dystonic tremor, and controls using two tri-axial accelerometers and four pairs of surface EMG electrodes. One study used simultaneous MRI and accelerometer recording of tremor power to investigate its pathophysiology [81]. In a single-participant interventional study, tremor magnitude assessed by 3D-kinematics was concordant with accelerometry and improved with deep brain stimulation [86].

Accelerations in three directions and speed measured with an inertial device during a Timed Up and Go Test showed abnormalities in turning, standing-up, and sitting [83].

Genetic syndromes with manifestations of ataxia

Spinocerebellar ataxias (SCA)

We found nine studies in subjects with SCA that used inertial technologies. Evaluations in clinical settings demonstrated that wearable sensors are able to accurately capture gait parameters (i.e., mean step velocity, length, and swing and stance time) [87] and turn parameters (i.e., angle, duration, steps, mean velocity, lateral velocity change, outward acceleration, inward acceleration, average and maximum rate, and hesitations) [88, 89] in patients with SCA. Gait parameters and body sway measures (i.e., stride length and its variability, stride duration and speed, cadence, swing and stance time or percentage, double-support time variability, pelvis, ankle, and hip sway, turn duration, lateral velocity change, outward acceleration, and toe-out angle variability) consistently identified ataxic gait changes in the clinical setting [87, 89,90,91,92]. Inertial technology also detected gait abnormalities such as variability of the swing period, toe-off and toe-out angles, and elevation of feet at mid-swing, as well as ranges of motion of the trunk and arm, in patients with premanifest SCA type 2 [91, 93].

Hereditary spastic paraplegia (HSP)

We found three studies led by the same team of researchers that studied subjects with hereditary spastic paraplegia using inertial sensors fixed on the feet. Two of them, a pilot study and its validation [94, 95], validated a machine learning approach against a GAITRite system and manual sensor data labeling. The third study used the same approach on a large transversal cohort (n = 112) and a small 1-year longitudinal cohort (n = 11) [96].

Fragile X Syndrome (FXS)

In two studies in patients with FXS, investigators performed inertial detection of stride length and velocity, swing time, peak turn velocity, double limb support time, and number of steps to turn. A fast-paced gait exacerbated gait deficits, and stride velocity variability when gait was fast paced was significantly associated with the number of self-reported falls in the past year [97]. Further, cognitive performance was significantly associated with shorter stride length and slower turn-to-sit times in premanifest patients [98].
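Gait variability measures such as those above are often summarized with a coefficient of variation, although the reviewed studies do not all specify their definition. The following minimal Python sketch shows one common convention (sample standard deviation divided by the mean, expressed as a percentage); the function name and the sample values are illustrative assumptions.

```python
import statistics


def coefficient_of_variation(values):
    """Coefficient of variation in percent: 100 * sample SD / mean.

    Assumption: values are per-stride measurements (e.g., stride
    velocity in m/s) from a single walking bout with a non-zero mean.
    """
    if len(values) < 2:
        raise ValueError("at least two values are required")
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("mean must be non-zero")
    return 100 * statistics.stdev(values) / mean


# Hypothetical stride velocities in m/s (not real patient data).
self_paced = [1.00, 1.02, 0.98, 1.01, 0.99]
fast_paced = [1.40, 1.15, 1.55, 1.20, 1.50]
print(f"CV self-paced: {coefficient_of_variation(self_paced):.1f}%")
print(f"CV fast-paced: {coefficient_of_variation(fast_paced):.1f}%")
```

Because the coefficient of variation is normalized by the mean, it allows variability to be compared between walking conditions with different average speeds.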

Friedreich’s ataxia (FRDA)

We found two studies of patients with FRDA that used inertial technology. Remote monitoring of several voice parameters, upper limb function through 14 parameters that can be grouped into parameters related to movement velocity, spectral frequency, and parameters related to deviation of the ideal trajectory, and 15 spatiotemporal gait parameters was feasible over 1 week [99]. The sensitivity of an upper limb composite score, the AIM-S, obtained through sensors contained in a spoon designed to detect deterioration in upper limb function, was greater than other measures [100].

Multisystemic rheumatological diseases

Five studies in subjects with sarcoidosis reported transversal evaluation of physical activity, expressed in various ways such as number of steps [101,102,103,104], sit-to-stand transition [102], or acceleration per day [105], in real-life settings over seven to 14 days. Patient-reported physical activity and fatigue collected through an app correlated with smartphone-tracked physical activity [104]. In dermatomyositis, three studies focused on real-life transversal physical activity assessment for 7–8 days. One study used vigorous physical activity detection for the validation of the stage of exercise scale in several rheumatic diseases [106]. The single study on scleroderma, which used an accelerometer for a 6-day period, explored validity of physical activity level detection [107].

Miscellaneous diseases

Prader–Willi syndrome (PWS)

Patients with PWS were included in eight different studies using accelerometers and/or gyroscopes. One study simply demonstrated significant agreement between mean stride length, mean stance percentage, and stance percentage coefficients of variation measured during two different walking tasks [108]. One study showed that because of the poor gait symmetry in PWS, upper body accelerations and the harmonic ratio (i.e., measure of step-to-step symmetry based on trunk acceleration) can be used as innovative parameters for gait analysis, providing information that cannot be extracted from spatiotemporal parameters only [109]. Two studies showed that patients with PWS do not meet healthy physical activity recommendations [110, 111]. Another group showed that moderate physical activity accounts for the variability found in bone mineral content and density and its z-score [112]. Two studies assessed the effect of training exercise programs. One reported that training increases moderate‐to‐vigorous physical activity and walking capacity without effect on body composition [113], whereas the other found that training did not result in changes in moderate‐to‐vigorous physical activity but did cause improvements in body coordination, strength, and agility [114].

Metabolic diseases

One study, in subjects with Pompe disease, focused on concurrent validity of physical activity outcomes (step count and peak 1-min activity) [115]. Another study showed that measurements performed by patients at home with a handheld electrical impedance device did not differ significantly from those performed in the clinic setting. These measurements correlated with measures of muscle strength and function and quantitative muscle ultrasonography [116]. Two trials compared physical activity and rhythmicity of children with mucopolysaccharidosis type III and Fabry disease to controls using continuous inertial recording [117, 118]. One study of patients with GM2 gangliosidosis used both a wearable device and a smartphone application and focused on compliance. This study failed to show significant changes in average daily maximum, average daily steps, or average daily steps per epoch over 6 months [119]. We found one study that assessed the feasibility of the use of accelerometers on patients with Niemann–Pick type C, DMD, and with juvenile idiopathic arthritis using the same outcome measures as used in the study of patients with GM2 gangliosidosis [18]. In another study, variables used in similar studies in PD (i.e., bradykinesia, dyskinesia, fluctuation score, percentage time immobile, and percent time with tremors) were used, demonstrating that bradykinesia and percentage time immobile are features of Niemann–Pick type C [120].

Rett syndrome

Of the five studies of subjects with Rett syndrome, all but one were observational, and no study had controls. Four studies used the same three inertial devices [121,122,123,124], and one study compared these sensors [125]. In a real-life setting, patients with Rett syndrome have sedentary behavior based on a 4-day accelerometer recording on the ankle [121]. One study was interventional and assessed the effects of a program that aimed to increase enjoyable activities; positive effects on sedentary time, daily step count, and walking capacity were observed after the intervention [122]. Using machine learning algorithms and data derived from heart rate variability and activity metrics, a research team built a classifier that accurately distinguished between high- and low-severity Rett patients [123].

Tuberous sclerosis complex (TSC)

We identified a single study in this patient population that used accelerometry to assess movement levels in controlled settings and correlated these with clinical parameters [126].

Narcolepsy

Our review identified one study that compared energy expenditure and outcomes of physical activity (e.g., metabolic equivalent of task, step count, total energy expenditure), between a group of participants with narcolepsy type 1 and a group of participants with narcolepsy type 2 or idiopathic narcolepsy [127]. A tri-axial accelerometer was used, and there were no significant differences between the two groups that could account for weight gain.

Quality assessment

Finding a quality assessment tool suitable to the type of studies included in this review was challenging. The quality of the studies we found was generally good, even though the data needed to answer all items of the quality assessment questionnaire were missing for most studies. The results of the quality assessment are presented in Additional file 3.

Discussion

This review presents the developments in the field of digital outcome measures for a range of rare diseases with neurological manifestations. The increase in the number of studies on this topic attests to the growing interest in the field: in 2011, there were two published studies that included digital outcome measures, whereas in 2022 there were 25. A key finding of this review is that the use of digital outcome measures for motor function outside the clinical setting is feasible and is being employed to evaluate subjects with a broad range of diseases. Although this is very encouraging, a publication bias cannot be excluded, as studies that failed to recruit participants or to collect robust data are less likely to be published.

An asset of technologies that provide digital measures of motor function is their potential application to the home setting, which is even more appealing since the COVID-19 pandemic. Although these devices are meant to be used at home, more than half of the evaluations in the studies took place in a clinical setting. This indicates that many of these technologies are not yet ready to be used unsupervised. This could partially explain why the use of such outcomes for interventional studies remains uncommon (n = 14) and limited to a small number of diseases (i.e., PWS, Rett syndrome, FSHD, DMD, SBMA, ALS, PSP, FD, and HD). Future research should focus only on sensors that are meant to be used outside the clinical environment, unless there is a clear plan for technology evolution toward that goal or the purpose of the device is not remote monitoring (e.g., a device that aims, through digitalization, to remove part of the subjectivity of standard in-clinic assessments).

Another key finding is that most studies investigated a number of potential outcomes, mostly to demonstrate the validity of the technology via group comparisons and correlations with gold standard outcomes, whereas other psychometric properties were largely neglected. It is generally assumed that digital outcomes will be more sensitive to change in patient condition than standard tests and scales; however, this has yet to be proven, as suggested by the small number of studies reporting longitudinal data (n = 34) or sensitivity to positive or negative change (n = 21). A few research teams have performed psychometric evaluations, in CMT [128, 129], ALS [28], FSHD [44], and FRDA [130], regarding the use of accelerometers for physical activity assessment, but the psychometric evaluations were not complete.

Despite the number of studies, we found that few outcome measures had been robustly studied and adopted as secondary or primary endpoints in clinical trials. An exception is the recent qualification of the SV95C in DMD [131]. SV95C has been shown to be discriminative and to have concurrent validity, reliability, clinical relevance as assessed through interviews with patients, a precise context of use, and sensitivity to positive and negative change. Normative data have been collected [132]. As stated on the FDA website [5], the collection of information to formally qualify a clinical outcome assessment requires a long-term commitment. SV95C has been used in evaluation of subjects with limb-girdle muscular dystrophies, FSHD, and SMA, but how the qualification in DMD can be extended to less common diseases remains to be determined. This review did not identify other digital outcomes for which the available description of metric properties would be sufficient to complete the qualification process.

Outside diseases such as DMD and Rett syndrome, which are observed almost exclusively in males and females, respectively, very few studies paid specific attention to balancing gender representation in the studied population, which may call into question the generalizability of the conclusions. Nevertheless, gender may clearly influence the phenotype of some autosomal conditions such as SMA type 3. In this view, balancing the sample according to gender rather than to phenotype or phenotype/age could introduce a bias towards specific, and in particular milder, phenotypes.

Of the 139 relevant papers, the vast majority used wearable sensors and fewer studies used portable sensors or non-wearable/portable technologies. The most common sensor type is magneto-inertial technology (used in 126/139 studies). This technology is at a more mature stage than other types of sensors. As shown in Fig. 2, most studies reported the use of sensors that were attached on the trunk or the back, which allow for gross evaluation of motor function and ambulation. In contrast, sensors fixed at the extremities allow for refined and more precise analysis of movement. Physical activity was the most frequently assessed parameter (n = 58), followed by gait (n = 39). This is likely because ambulation is critical from patient, caregiver, and clinician perspectives. Technical reasons could also contribute, as gait is easier to record digitally than movements that are more erratic. Development and validation of outcomes for the upper limb are one step behind. Given the heterogeneity of variables, there appears to be no consensus on an efficient way to quantify upper limb motor function or activity. Quantification of abnormal movements such as tremor may be slightly easier to harmonize.

Although promising, machine learning approaches were used in few studies; they were employed almost exclusively in studies of subjects with HD [59, 61, 63, 67,68,69]. These studies demonstrated that machine learning algorithms can distinguish between normal and pathological motor performance. In rare diseases, machine learning faces the issue of the paucity of data. A major obstacle is the difficulty associated with gathering clinically valid information through interviews with patients, as the outcomes that result from machine learning are often non-intuitive. The solution resides in a clear association with a hard disease milestone or a clinically significant event, but this requires large amounts of high-quality data. Recent papers in DMD [14] and Friedreich’s ataxia [133] showed the feasibility and the potential of machine learning, but these proof-of-mechanism studies were conducted using different sensors in a controlled setting [14].

Although we used a pre-defined search strategy, our search had several limitations. Four papers from 2021 were not accessible in their full versions, and we had to discard them. Other limitations could be related to methodological reasons (e.g., we included only manuscripts published in English, did not search grey literature, and only included references cited by selected papers). For this review, we excluded studies using technology-based devices with no potential to be used outside the hospital setting. Examples of excluded articles include studies using laboratory motion analysis, such as hip kinetic and spatio-temporal gait parameters in boys with DMD [134, 135], and Kinect-based stereo camera assessments that have been used in studies of subjects with ALS [136] and FSHD [137]. Motion analysis is a robust alternative to standard clinical evaluation and has the potential to be used as a complementary approach to remote movement monitoring, even in the home setting.

Conclusion

This review highlights several issues limiting the full integration of digital outcome measures into clinical trials and practice. Despite promising initiatives, most studies included in this review are monocentric, and almost all studies were performed in small groups of subjects. Another shortcoming is that it is difficult to compare data between studies, as different manufacturers provide different algorithms to analyze the recorded data.

Future research should focus on the systematic validation leading to the qualification of devices, variables, and algorithms to allow remote evaluation of diseases. To address the issue of generalizability, open-source platforms that facilitate data collection, sharing, and interpretation are necessary. The qualification of a first digital outcome could pave the way to development of digital outcome measures. It is clear from the current state of the field that digital outcome measures have great potential to positively impact clinical trials and accelerate drug development processes.