1 Introduction

The availability of low-cost, low-power, accessible Information and Communication Technology (ICT) can be profitably applied in the field of life sciences. Increasing evidence supports the effectiveness of tele-health systems from both the social and the economic standpoint [1]. Remote monitoring can provide constant, objective and cost-effective follow-up of patients, thus addressing an important health demand.

Parkinson’s disease (PD) is the second most common neurodegenerative disorder after Alzheimer’s disease [2]. Recent surveys estimate that 7–10 million people are affected by PD worldwide, and the prevalence of the disease exceeds 1.9% over the age of 80 years. PD patients suffer from both motor and non-motor symptoms, related to the degeneration of dopamine neurons, especially in (but not limited to) the substantia nigra region of the midbrain. The reason for this cell death is not completely understood, but it is related to the accumulation of degenerate proteins into the so-called Lewy bodies in the neurons. The most common form, idiopathic PD, stems from a complex interplay between genetic predisposition and environmental factors, such as exposure to pesticides and a history of head injuries.

The cardinal motor symptoms of PD define the so-called parkinsonian syndrome, first described in 1817 by Dr. James Parkinson as the paralysis agitans (shaking palsy). They encompass rigidity, tremor at rest (i.e., disappearing during voluntary movements), bradykinesia and postural instability [2]. The onset of motor symptoms is typically asymmetric, with an arm affected first. Then, signs and symptoms spread to the other limb on the same side, and with the progression of the disease also affect the opposite side. On the other hand, non-motor symptoms include olfactory impairment, orthostatic hypotension, constipation, sleep disturbances and speech impairment. Behavioral problems, depression and anxiety frequently occur, and dementia is quite common in the advanced stages of the disease [3].

\(\textsc {l}\)-Dopa has been the main PD treatment for over 40 years. However, in advanced stages, patients may alternate periods of good disease control (on state) with others of poor control and significant PD symptoms (off state). Up to 50% of patients can develop mild motor fluctuations within 2 years of \(\textsc {l}\)-dopa therapy, and 70% after 9 years [4]. Another symptom in later stages of the disease is freezing of gait (FOG), a form of akinesia defined as a brief, episodic absence or marked reduction of forward progression of the feet despite the intention to walk [5, 6]. It may be elicited by activities such as initiation of gait, turning, passing through narrow spaces, negotiating obstacles or performing dual tasks [7]. FOG is closely related to the risk of falls [8]; it limits the patient’s autonomy and impairs his/her quality of life (QoL).

Even though the control of motor fluctuations is a key clinical need for PD patients, such symptoms are difficult to appreciate in the outpatient department, and their assessment is often based on poorly reliable self-reports and questionnaires [9, 10]. On the other hand, lightweight, wearable inertial sensors may enable an effective identification and tracking of motor complications. The remote monitoring of motor fluctuations in PD patients can allow the clinicians to develop a thorough clinical picture of each patient, using the large amount of data made available by modern sensors and processing technologies. This can enable personalized follow-up of each patient, proper and timely adjustment of drug posology, and ultimately a cost reduction due to fewer emergency department visits and hospitalizations. This is in line with the modern paradigm of personalized medicine.

We focus on two classes of symptoms that can be effectively remote-monitored:

  • Bradykinesia;

  • FOG.

Bradykinesia. Bradykinesia is a pathologically slow movement and an impaired ability to move following a command. Besides unusually slow movements, other common aspects of bradykinesia include shuffling and dragging one or both feet when walking, reduced facial expressions, and difficulties in performing repetitive tasks. The circadian evolution of bradykinesia depends on the fluctuating response to drugs.

At present, bradykinesia, as well as many other aspects of PD, is assessed by means of clinical scores such as the MDS-UPDRS (Unified Parkinson’s Disease Rating Scale), promoted by the Movement Disorder Society (MDS) [11]. This rating scale encompasses six parts. Part I aims at assessing mental state, behavior, mood, pain and autonomic functions. Part II is a patient self-evaluation of several daily activities. Part IV scores possible complications such as dyskinesia and fluctuations of several clinical conditions, and Parts V and VI take into account the severity of the disease (Hoehn and Yahr scale) and the disability degree.

On the other hand, Part III is the clinical evaluation of several motor skills, and takes into account speech impairment, facial expressiveness, tremor, rigidity, hand sensitivity and mobility, leg agility, dexterity in getting up from a chair, posture and postural stability, gait characteristics, bradykinesia and hypo-akinesia. The clinician assigns a score between 0 and 4 according to the severity of the symptom at hand, following proper guidelines. This approach is not free of criticism, as it may lack repeatability and is affected by both intra- and inter-rater variability [12, 13]. Furthermore, the clinical evaluation, performed only within pre-scheduled follow-up sessions, makes it virtually impossible for the neurologist to appreciate short-term variations of the patient’s disability level and to plan proper therapy adjustments.

Leg agility (LA) is included in the MDS-UPDRS Part III for motor evaluation of lower limbs. This test consists of raising and stomping each foot on the ground at least 10 times, as high and as fast as possible, starting from a sitting posture. According to the MDS guidelines, the task scores should encompass the following parameters: speed of execution, amplitude, slowing, hesitations and interruptions. LA is closely correlated with bradykinesia [14]. For this reason, and due to its simplicity of execution and relatively easy assessment, in this paper we have selected LA as an indirect measure of bradykinesia.

FOG. A thorough knowledge of FOG frequency, duration and response to drug therapy provides the neurologist with valuable information on motor fluctuations and disease progression. Nevertheless, the clinical assessment of FOG is very difficult. Its episodic nature makes it hard to appreciate FOG events during pre-scheduled follow-up sessions in the medical office. The probability of suffering a FOG event increases with the time elapsed since the last \(\textsc {l}\)-dopa administration; moreover, it depends on the patient’s attention, emotional state and many other cognitive factors [15,16,17]. Inevitably, the FOG assessment is based on self-reports, which correlate poorly with the events appreciated in the outpatient department [18, 19]. From these considerations, it becomes clear that only long-term observation, possibly carried out during activities of daily living (ADL), can provide a reliable assessment of this phenomenon.

The motivation of this work lies in the need for quantitative methods to assess the motor conditions of PD patients in their ADL. Monitoring the daily fluctuations of the motor symptoms can allow the clinicians to adapt the drug posology to the specific response of the single patient. The use of low-cost, low-power wearable sensors, as well as powerful machine learning (ML)-based classification techniques can provide effective and supportive treatment, accessible to all patients also in a context of overall cost reduction. Our objective is to implement a sort of electronic diary of the patient during ADL, and to check the effectiveness of such an electronic diary, in terms of reliability of measures during non-supervised activities, overall user acceptance and cost. We use low-cost, easy-to-use instrumentation and keep the patient’s acceptance and comfort as a key issue.

As a feasibility study, we have first investigated the use of common smartphone sensors to collect accelerometer and gyroscope data, and have evaluated the performance of several ML algorithms to assess PD bradykinesia and FOG. For bradykinesia, we have carried out measures related to the LA test. As for FOG detection, participants have performed a 6-min walking test (6MWT) with frequent pauses and turning. Moreover, to evolve towards ADL, providing information about the number, duration and daily distribution of FOG episodes, we have also monitored patients during the main activities of their pre-scheduled follow-up visit, so as to capture different activities.

The rest of this paper is organized as follows. In Sect. 2, we discuss the state of the art in the field of PD symptom monitoring with automated tools.

In Sect. 3, we describe the experimental setup and the cohort of PD patients enrolled for the preliminary experiments using smartphones for LA evaluation and FOG detection, as well as the ML algorithms implemented, along with the achieved results. In Sect. 4, we describe the SensorTile IoT module, along with the preliminary results achieved with this platform for PD patients’ monitoring. Finally, in Sect. 6, we discuss the evolutions of the present work towards the creation of an effective electronic diary, and we draw the conclusions.

2 Background

Many papers address the use of wearable devices in healthcare applications, to promote physical activity [20], evaluate the condition of elderly people [21], and enable real-time inertial data processing [22]. Moreover, many recent papers address remote monitoring, analysis and quantification of PD motor symptoms using wearable inertial measurement units (IMUs). The main differences concern the number and localization of such sensors, the experimental setup and the achieved sensitivity and specificity of the addressed algorithms. A thorough review of works in this field can be found in [23]. Several papers make use of data-mining techniques and ML systems to estimate the severity of PD cardinal motor symptoms. In Ref. [24], ten clinically diagnosed PD patients and ten control participants were provided with a low-cost Android OS smartphone, and were asked to perform short exercises related to voice, posture, gait, and finger tapping, four times a day. The method could effectively discriminate PD patients from controls, and the authors claim that it is useful for monitoring the disease progression, even though this has not been demonstrated.

Many papers address upper limb bradykinesia by implementing the finger-tapping task or digital spiral analysis [25,26,27,28]. Lower limb bradykinesia is estimated in Refs. [29,30,31], using several IMUs placed on different body segments, and implementing motion analysis. In Refs. [32, 33], an estimation of some MDS-UPDRS Part III tasks, namely LA, gait analysis and sit-to-stand, is performed using three IMUs located on the patient’s chest and thighs. Several ML algorithms are employed, to assess their capability of reproducing the MDS-UPDRS evaluation yielded by expert neurologists; the results have been validated using a state-of-the-art optoelectronic system. These studies are comparable with ours in terms of methodology (e.g., type of sensors employed and position). Furthermore, the distribution of patients across the UPDRS classes is similar: in our study, we address class 0: 21.8%, class 1: 45.6%, class 2: 21.8%, class 3: 8.1%, whereas in Refs. [32, 33] the distribution is class 0: 19%, class 1: 46%, class 2: 30%, and class 3: 5%. Hence, in this paper our results are compared with those in Refs. [32, 33].

In Ref. [34], a single IMU, placed on the patient’s chest, estimates the MDS-UPDRS sit-to-stand score in a controlled environment. However, to the best of our knowledge, little attention has been devoted to the assessment of lower limb bradykinesia using a common smartphone and its sensors as the measurement unit, to be used during ADL, hence in non-supervised conditions.

A variety of wearable IMUs have been proposed for objective assessment of FOG. Yet, there is little agreement concerning their number, location, experimental protocols and data-processing algorithms. Dedicated tri-axial accelerometers are commonly addressed, either alone [35,36,37], or combined with gyroscopes [38, 39] or magnetometers [40]. Both a single location and a combination of two or more locations have been addressed. The shin [38, 40] and waist [35, 41] are considered suitable as single locations. When two or more sensor sets are employed, they are placed also on feet [42], knee [43], thigh [44], chest [45] or wrist [46].

Smartphones are used as assistants in fitness applications [47, 48], heart rate monitoring [49], gait recognition [50], and Human Activity Recognition (HAR) [51,52,53]. The use of smartphones for detecting FOG episodes is proposed in [39, 41, 54, 55]. Experimental protocols encompass timed up-and-go on a standardized 5-m course [41, 42], walking tasks and turns with or without FOG provocation [44, 46, 56] or dual tasking (e.g., carrying a full glass of water while walking) [36, 39]. Some experimental protocols emulate unconstrained ADL in laboratory [36, 44] or at patient’s home [57, 58]. All protocols encompass video recordings during data acquisition, with videos labeled by clinicians offline.

The data-processing techniques are various. Threshold algorithms are very simple and provide reasonable performance [56], but require the optimal threshold to be tuned on every single patient [42]. Naive Bayes (NB) [45], random forest (RF) [57] and support vector machine (SVM) [36, 58] are commonly addressed ML algorithms. NB and RF are computationally less demanding than SVM, but the latter exhibits higher accuracy and robustness [59]. In Ref. [41], 20 PD subjects wearing a waist-worn smartphone execute a walking test, and accelerometer data are processed by means of a threshold-based algorithm, achieving sensitivity, specificity, precision, and accuracy of 87.6%, 95.0%, 69.6% and 84.4%, respectively. In Ref. [58], SVM is employed, with data measured on 21 PD patients equipped with a waist-worn IMU; sensitivity and specificity turned out to be 74.7% and 79.0%. In Ref. [37], the Continuous Wavelet Transform is used for FOG identification on 10 PD patients wearing three accelerometers on shank, thigh and waist, achieving sensitivity and specificity of 83.0% and 67.0%, at the expense of a remarkable computational burden. In [60], a model based on convolutional neural networks is applied to data from 21 PD patients wearing an IMU on their waist, achieving sensitivity, specificity, and accuracy of 92.6%, 88.0% and 87.0%, respectively. In [55], a smartphone is placed in the patient’s trouser pocket, and data are represented as 2D images and transferred to a remote server for processing. The method achieves sensitivity, specificity and accuracy of 93.8%, 90.1% and 91.8%, respectively, at the expense of a quite high computational complexity. In general, we believe that these results are very encouraging, even though direct comparison is not straightforward, as the studies address different experimental setups and rather small patient cohorts for algorithm validation.

3 PD monitoring using smartphones

In this section, we describe the experimental setup and the cohort of PD patients enrolled for the preliminary experiments using smartphones for LA evaluation and FOG detection. The ML algorithms implemented are described, along with the achieved results.

3.1 Dataset

The study has been carried out at the Regional Reference Center for Parkinson’s Disease and Movement Disorders, University Hospital Città della Salute e della Scienza, Turin, Italy, and at the Associazione Amici Parkinson Piemonte, a private association of PD patients in Turin. The experiments are in accordance with the Declaration of Helsinki. Participants received detailed information on the study purposes and execution, and written informed consent was obtained. Demographic and clinical data were noted anonymously. The study was approved by the local Ethics Committee.

A total of 93 patients (mean age: 69, range: 43–87; mean age at diagnosis: 60.1, range: 39–81; Hoehn and Yahr score: 2.3, range: 1–4) were recruited for the LA experiment. The FOG study instead involved 38 subjects (mean age: 70.7, range: 44–84; mean age at diagnosis: 59.9, range: 36–77; Hoehn and Yahr score: 2.5, range: 2–4). The inclusion criteria for PD patients were a clinical diagnosis of PD with motor symptoms [61], and no major comorbidities, dementia or vision impairments. During the test execution, most PD patients were in different stages of their daily on state: they had taken their usual drug dose, and they had not yet reached the time of the subsequent dose, even though a different time interval had elapsed for each patient. Subjects needing gait assistance (e.g., walking stick, crutch) were included in both groups (patients: 21.1%). The characteristics of the two populations are summarized in Table 1. Please note that about 70% of PD patients are male in both studies; this is in line with the well-known fact that the relative risk is 1.5 times greater in men than in women [62].

Table 1 Participants’ demographic and clinical characteristics

In the first stage of this study, we employed a Samsung S5 mini smartphone. The technical characteristics of the embedded inertial sensors are summarized in Table 2. Since acceleration signals related to human activity lie in the 0–20 Hz band [63], with amplitudes ranging between ± 1 g during walking [64] and ± 2 g during running [65], the sensors included in the smartphone largely meet the requirements in terms of sample frequency, resolution and dynamic range.

Table 2 Smartphone sensors technical characteristics

3.2 LA detection: experimental protocol

In the evaluation of the LA task, and following the MDS-UPDRS guidelines, subjects were asked to sit in a straight-backed chair and place their feet on the ground in a comfortable position. Then, after being properly instructed by the expert neurologists, the subjects performed the LA task from the UPDRS Motor Section III with each leg separately. A Velcro armband equipped with the Samsung S5 mini smartphone was fixed to the patient’s thigh, with the y-axis parallel to the femur direction. Figure 1 shows the experimental setup. Data were locally stored by means of SensorLog, a commercial app for Android 6.0 [66]. Overall, we have measured 184 LA tests, as two patients were able to perform the test with a single leg only.

Fig. 1 Smartphone position adopted for the LA task

During the experiment, the LA task was scored by an expert neurologist, and the obtained UPDRS Motor Section III ratings have been employed as class labels for the supervised classification algorithms. Figure 2 shows the distribution of the assigned scores. As in most similar measures, the dataset encompasses few cases in classes UPDRS 3 and 4, as the clinical conditions of patients belonging to such classes are severe, and may even prevent them from executing the task. Hence, the distribution of the UPDRS scores is right-skewed.

Fig. 2 Distribution of the UPDRS scores assigned to the LA tasks. 0: normal (40 repetitions). 1: slight (84). 2: mild (40). 3: moderate (15). 4: severe (5) [11]

Data from accelerometer, gyroscope and orientation sensors have been collected, exported in CSV format and processed offline using Matlab, version 2018a for Windows 10. First, raw signals were recalibrated to compensate for slight deviations from the ideal positioning (i.e., gravity acting only on the vertical component, perpendicular to the smartphone screen). Then, after being preprocessed using a Chebyshev type I low-pass filter with a cutoff frequency of 3.5 Hz and stopband attenuation equal to 20 dB at 4 Hz, data have been processed to extract the relevant kinematic features to feed the ML algorithms.

Features were selected to be representative of the major traits that distinguish motion in PD patients and unaffected controls, reflecting the same aspects that the clinician is called to evaluate in the UPDRS scoring. Similar studies [29, 30, 32, 67, 68] have also been taken into account. The 16 selected features are listed in Table 3 and briefly described in the following.

Table 3 List of features used for LA detection
Fig. 3 Time trajectories of x-axis angular velocity for LA UPDRS scores ranging from 0 to 4. For each class a patient typical signal is displayed

The number of leg movements is the number of peaks in the pitch signal. Since the UPDRS rating requires at least 10 foot stomps for the LA task, this feature can distinguish between patients who can accomplish the task and those who cannot. The ith time interval I(i) between consecutive thigh movements is defined as \(I(i) \triangleq t_P(i+1)-t_P(i)\), where \(t_P(i)\) is the time associated with the peak of the ith repetition. The feature used in this analysis is the mean interval over 10 repetitions of the foot stomping on the ground (or over the number of actually executed foot movements, if the patient is unable to perform 10 repetitions). The standard deviation of these time intervals yields information about the steadiness of movements, which is expected to be poorer in PD patients than in controls. The thigh inclination trend, i.e., the trend of the thigh inclination from the beginning to the end of the LA session, is represented by the evolution of the peak amplitudes over time. We have considered the percentage decrease of the average peak amplitude from the first five movements to the last five movements; a high value suggests severe difficulty in maintaining a constant maximum inclination. The mean and maximum values of thigh inclination are motivated by the fact that one of the common traits of bradykinesia is the inability to perform wide movements. One can expect to measure wider thigh movements (i.e., larger values of the inclination signal) in mildly affected patients, and vice versa. The maximum acceleration and angular velocity reflect slowness in the execution of movements: the maximum absolute values of the measured angular velocities and accelerations are taken as features representing this aspect of bradykinesia. The RMS values of both acceleration and gyroscope signals are also included in the study. In addition, the ranges of acceleration and angular velocity have proven to exhibit smaller values in patients suffering from bradykinesia than in controls, due to the limited range of movements performed by PD patients.
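As an illustration of the time-domain features described above, the following minimal sketch computes the number of movements, the mean and standard deviation of the inter-peak intervals, the percentage amplitude decrease and the maximum inclination from a pitch signal. It is not the Matlab implementation used in the study: the three-point peak test, the `min_height` threshold, the function names and the synthetic signal are all assumptions introduced for the example.

```python
import math
import statistics


def find_peaks(signal, min_height):
    """Return indices of local maxima higher than min_height (three-point test)."""
    return [
        i
        for i in range(1, len(signal) - 1)
        if signal[i] > min_height and signal[i - 1] < signal[i] >= signal[i + 1]
    ]


def la_time_features(pitch, fs, min_height):
    """Time-domain LA features from a thigh-inclination (pitch) signal.

    Assumes at least two peaks are found; min_height is a hypothetical
    tuning parameter, not a value taken from the paper.
    """
    peaks = find_peaks(pitch, min_height)
    peak_times = [i / fs for i in peaks]
    peak_amps = [pitch[i] for i in peaks]
    # I(i) = t_P(i+1) - t_P(i)
    intervals = [t1 - t0 for t0, t1 in zip(peak_times, peak_times[1:])]
    first = statistics.mean(peak_amps[:5])   # average of the first five peaks
    last = statistics.mean(peak_amps[-5:])   # average of the last five peaks
    return {
        "n_movements": len(peaks),
        "mean_interval_s": statistics.mean(intervals),
        "std_interval_s": statistics.stdev(intervals) if len(intervals) > 1 else 0.0,
        "amplitude_decrease_pct": 100.0 * (first - last) / first,
        "max_inclination": max(pitch),
    }


# Synthetic example: 10 s of 1 Hz leg raises whose amplitude fades linearly.
fs = 100.0
pitch = [
    30.0 * (1.0 - 0.05 * (n / fs)) * math.sin(2.0 * math.pi * 1.0 * (n / fs))
    for n in range(1000)
]
features = la_time_features(pitch, fs, min_height=5.0)
```

On real recordings the peak detector would need to be more robust to noise (e.g., a minimum distance between peaks), which is omitted here for brevity.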

The dominant frequency \(f_\mathrm{d}\), i.e., the frequency for which the PSD of both acceleration and angular velocity is maximum, is selected as the relevant feature in the frequency domain, along with the ratio of dominant frequency power to total power, also addressed in Ref. [30] for the classification of tremor, dyskinesia and bradykinesia.

Finally, spectral signal entropy, i.e., Shannon entropy evaluated in the frequency domain, was used for the quantification of bradykinesia, dyskinesia and tremor with help of accelerometer measurements in Refs. [30, 69]. It is defined as:

$$\begin{aligned} H = -\sum _{m=1}^{N}{P(m)\log _2 P(m)}, \end{aligned}$$
(1)

where P(m) is the estimated probability mass function of the signal power spectrum of acceleration and angular velocity data.
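The three frequency-domain features (dominant frequency, ratio of dominant-frequency power to total power, and the spectral entropy of Eq. 1) can be sketched as follows. This is only an illustration under stated assumptions: the PSD is estimated with a plain periodogram computed by a direct DFT, the mean is removed before the transform, and P(m) is obtained by normalizing the one-sided power spectrum to unit sum. The paper does not specify its PSD estimator, and the function names and the test signal are hypothetical.

```python
import cmath
import math


def power_spectrum(signal, fs):
    """One-sided periodogram from a direct DFT (O(N^2); adequate for short records)."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [v - mean for v in signal]  # remove the DC offset
    freqs, power = [], []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            centered[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        freqs.append(k * fs / n)
        power.append(abs(coeff) ** 2 / n)
    return freqs, power


def spectral_features(signal, fs):
    """Dominant frequency, dominant-to-total power ratio and spectral entropy."""
    freqs, power = power_spectrum(signal, fs)
    total = sum(power)
    k_max = max(range(len(power)), key=power.__getitem__)
    pmf = [p / total for p in power]  # P(m): normalized power spectrum
    entropy = -sum(p * math.log2(p) for p in pmf if p > 0.0)  # Eq. (1)
    return {
        "dominant_frequency_hz": freqs[k_max],
        "dominant_power_ratio": power[k_max] / total,
        "spectral_entropy_bits": entropy,
    }


# Synthetic example: a 2 Hz movement plus a weaker 5 Hz component, 4 s at 50 Hz.
fs = 50.0
signal = [
    math.sin(2.0 * math.pi * 2.0 * n / fs) + 0.3 * math.sin(2.0 * math.pi * 5.0 * n / fs)
    for n in range(200)
]
spec = spectral_features(signal, fs)
```

A signal concentrated at a single frequency yields an entropy close to zero, whereas power spread over many bins increases H, which is why the entropy can serve as a measure of movement regularity.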

A preliminary analysis has revealed that the severity of bradykinesia is closely correlated with the angular velocity along the direction perpendicular to the femur in the coronal plane. Figure 3 reports typical trends of this velocity, measured on patients scored UPDRS = 0 or 1 and UPDRS = 2–4. Both the signal amplitude and regularity decrease as the severity of bradykinesia increases. The same trend can be appreciated considering the accelerometer signals along the z-axis (Fig. 4).

Fig. 4 Time trajectories of z-axis acceleration for LA MDS-UPDRS scores ranging from 0 to 4. For each class a patient typical signal is displayed

On the other hand, in Fig. 5, boxplots are depicted for two acceleration-based features: maximum value and signal range. Data have been split in subsets, grouping patients characterized by the same UPDRS LA evaluation.

Fig. 5 Boxplot of the maximum acceleration value (a) and the range of the signal values (b)

A clear decreasing trend with increasing severity of bradykinesia can be appreciated, even though no single feature alone is able to uniquely identify the UPDRS class (the interquartile ranges indeed overlap).

Fig. 6 Boxplot of the maximum angular velocity value (a) and the range of the signal values (b)

Similar considerations hold for the maximum angular velocity and the range of angular velocity, evaluated on the gyroscope signal (boxplots in Fig. 6). Many features addressed in this paper exhibit similar trends (diagrams not reported for brevity).

3.3 FOG detection: experimental protocol

According to Refs. [36, 58], the best position of a single smartphone on the body for detecting gait impairments is the waist, at lower back level, as it is close to the body center of mass and ensures maximum patient comfort.

In our experiments, participants executed the 6MWT; originally employed in cardiological rehabilitation, it is also widely employed in neurology [70] due to its ease of setup, patient tolerance and reproducibility. After the smartphone was secured on the participants’ waist by means of a Velcro belt (see Fig. 7), they were asked to walk back and forth along a 10-m hallway for 6 min at their preferred pace. Inertial data were locally stored by means of SensorLog. Participants were free to use their usual walking aids, and to quit the test at any moment. The tests were performed under the supervision of clinical personnel.

Fig. 7 Experimental setup for FOG detection

Activities included walking, turning, standing, and possibly FOG episodes; no FOG provocation test was employed. To guarantee the patients’ privacy, as well as to test our tool in realistic working conditions, we have decided not to rely on any video recording. Instead, a chronometer was run simultaneously with the data-recording session, and FOG episodes, voluntary gait pauses, hesitations and gait asymmetries were manually annotated by the clinician. Once collected, data were exported in CSV format and processed offline using Matlab, version 2018a for Windows 10. We recorded more than 3.5 h of acceleration signals, gathering 33 FOG events for a total of 4 min of recording. The histogram of the duration of the captured FOG episodes is reported in Fig. 8. Most episodes lasted less than 5 s, even though some of them lasted 20 s or more. This is in line with the expected distribution of FOG event duration [71].

Fig. 8 Recorded FOG episode duration

Even though gyroscope data have also been recorded, in this paper we only present results related to accelerometer data, as they have proven to provide good classification performance.

Signals were band-pass filtered (0.5–15 Hz) with a 10th-order Butterworth filter. Cutoff frequencies were chosen taking into account that the acceleration signal during locomotion lies in the band 0.5–3 Hz, whereas the signal during FOG episodes lies in the band 3–8 Hz [44, 56].

Each session of data acquisition was manually labeled using a Matlab graphical user interface (GUI), to identify signal segments related to different activity classes: walking (WA), turning (TU), standing (ST), and FOG. To reduce the computational time, from the total 3.5 h recording, for each patient, only a characteristic segment of 15 s was extracted from each of WA, TU and ST, while the whole FOG signal was kept. The resulting 33-min-long track was segmented into 2-s windows, with 50% overlap. This window duration is a tradeoff between resolution and computational efficiency, keeping in mind that the shortest FOG episodes carrying significant clinical information exhibit a duration of about 2 s [56]. The set of candidate features taken into account in the present work has been defined after a direct analysis of the signal kinematic parameters most suitable to distinguish FOG, and taking into account the literature on similar studies [36, 58]. The list of the features selected in this paper is reported in Table 4. Among them, the total power is defined as the signal power lying in the whole signal band. The freeze ratio is defined as the ratio between the signal power in the freeze band (3–10 Hz) and the total signal power. The PSD peak is defined as the frequency value corresponding to the maximum of the PSD of the acceleration signal. The number of peaks is defined as the number of peaks exceeding a given threshold, experimentally selected to be equal to the standard deviation of the corresponding signal. We have evaluated all the addressed candidate features in each signal window, for all subjects, and for the three signal components (vertical, medio-lateral, anterior-posterior). We kept each class of activity performed during the test, i.e., WA, TU, ST and FOG, separate.
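The segmentation into 2-s windows with 50% overlap and three of the features defined above (total power, freeze ratio and number of peaks) can be sketched as follows. This is an illustrative example rather than the code used in the study: the power is estimated with a simple periodogram from a direct DFT, the peak threshold is taken as the standard deviation of the window itself, and the sampling rate, function names and synthetic signals are assumptions. The input is assumed to be already band-pass filtered.

```python
import cmath
import math
import statistics


def sliding_windows(signal, fs, length_s=2.0, overlap=0.5):
    """Split a signal into fixed-length windows with the given fractional overlap."""
    size = int(length_s * fs)
    step = int(size * (1.0 - overlap))
    return [signal[s:s + size] for s in range(0, len(signal) - size + 1, step)]


def band_power(window, fs, f_lo, f_hi):
    """Sum of periodogram bins with f_lo <= f <= f_hi (direct DFT, mean removed)."""
    n = len(window)
    mean = sum(window) / n
    total = 0.0
    for k in range(1, n // 2 + 1):
        if f_lo <= k * fs / n <= f_hi:
            coeff = sum(
                (window[t] - mean) * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n)
            )
            total += abs(coeff) ** 2 / n
    return total


def fog_features(window, fs):
    """Total power, freeze ratio (3-10 Hz over total) and number of peaks."""
    total = band_power(window, fs, 0.0, fs / 2.0)
    freeze = band_power(window, fs, 3.0, 10.0)
    threshold = statistics.pstdev(window)  # assumption: per-window threshold
    n_peaks = sum(
        1
        for i in range(1, len(window) - 1)
        if window[i] > threshold and window[i - 1] < window[i] > window[i + 1]
    )
    return {
        "total_power": total,
        "freeze_ratio": freeze / total if total > 0.0 else 0.0,
        "n_peaks": n_peaks,
    }


# Synthetic example at an assumed 50 Hz sampling rate: 10 s of a 1.5 Hz
# "walking" oscillation and 10 s of a 6 Hz "freezing" oscillation.
fs = 50.0
walking = [math.sin(2.0 * math.pi * 1.5 * n / fs) for n in range(500)]
freezing = [math.sin(2.0 * math.pi * 6.0 * n / fs) for n in range(500)]
walk_windows = sliding_windows(walking, fs)
walk_features = fog_features(walk_windows[0], fs)
freeze_features = fog_features(sliding_windows(freezing, fs)[0], fs)
```

In this toy case the freeze ratio is close to 0 for the walking-like signal and close to 1 for the freezing-like one, which is the contrast the feature is meant to capture.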

3.4 LA estimation: results using the smartphone

In this section, the selected features for LA estimation are input to several classification methods. A leave-one-out validation criterion has been employed, i.e., each element of the dataset is used in turn to test each algorithm, which is trained on the remaining elements. The performance of each model in terms of correct classification rate of each UPDRS class, and overall accuracy, is summarized in Table 5. For each tested ML approach, model parameters (e.g., kernel function and scale for SVM, neighborhood dimension, distance metric and weight for kNN) were tuned to minimize the misclassification rate; for the sake of brevity, only the best resulting models are listed.
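The leave-one-out criterion can be sketched in a few lines. The example below is a generic illustration, not the authors' Matlab code: the classifier is a small k-nearest-neighbour rule with Euclidean distance and majority vote, and the two-dimensional toy dataset is invented for the example.

```python
def knn_predict(train_x, train_y, x, k=3):
    """Majority vote among the k training points closest to x (Euclidean distance)."""
    ranked = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)),
    )
    votes = [train_y[i] for i in ranked[:k]]
    return max(set(votes), key=votes.count)


def leave_one_out(features, labels, predict):
    """Test each sample with a model trained on all the remaining samples."""
    predictions = []
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        predictions.append(predict(train_x, train_y, features[i]))
    return predictions


# Toy dataset (hypothetical): two well-separated clusters of feature vectors.
features = [
    [0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.15, 0.05],
    [1.0, 1.0], [0.9, 1.1], [1.1, 0.9], [1.05, 1.0],
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
predictions = leave_one_out(features, labels, knn_predict)
accuracy = sum(p == t for p, t in zip(predictions, labels)) / len(labels)
```

With N samples the model is trained N times, which is affordable for a dataset of a few hundred LA tests and makes full use of the scarce examples in the higher UPDRS classes.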

Table 4 List of features used for FOG detection
Table 5 LA estimation: correct classification rate and accuracy for each UPDRS class (leave-one-out validation)

We can notice that the NN model (two hidden layers of 16 neurons each, with hyperbolic tangent sigmoid transfer function) exhibits the best performance for both overall (accuracy 76.1%) and single class classification, with the only exception of UPDRS 4 patients. However, this case is not significant, as such patients are hardly able to perform LA due to the severity of their condition. The correct classification rate of patients belonging to UPDRS classes 0–3 ranges between 70 and 86.7%. SVM and kNN yield satisfactory results in classifying UPDRS 1 patients (77.4% and 79.8%), but their performance is lower for other UPDRS classes. LDA and Decision Tree algorithms yield overall less satisfactory performance.

Due to the random initialization of weights between neurons, the NN model was run 20 times for each test. Continuous classification outcomes are averaged and then discretized to be compared with the UPDRS scale; Fig. 9 reports the final confusion matrix. The performance is very sound, with the best classification results achieved with true UPDRS = 1 and true UPDRS = 3. An interesting aspect is that the misclassifications of this method are off by only one step on the UPDRS scale. As also discussed in Ref. [32], such variability is comparable with the inter-rater variability: the UPDRS evaluation performed by several neurologists is often non-homogeneous, due to the difficulty in discriminating between adjacent classes in cases of intermediate severity. Hence, NN seems a good candidate to mimic the UPDRS clinical evaluation as for LA.
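The averaging and discretization step, followed by the construction of the confusion matrix, can be illustrated as follows. This is a sketch under assumptions: the rounding rule (round half up, clipped to the 0–4 range) is not stated in the paper, the continuous outputs are invented, and only three runs per test are shown instead of 20 to keep the example short.

```python
import math


def discretize(score, lo=0, hi=4):
    """Round half up and clip to the 0..4 range of a UPDRS item (assumed rule)."""
    return min(hi, max(lo, math.floor(score + 0.5)))


def averaged_prediction(run_outputs):
    """Average the continuous outputs of repeated runs, then discretize."""
    return discretize(sum(run_outputs) / len(run_outputs))


def confusion_matrix(true, pred, n_classes=5):
    """cm[t][p] counts the tests with true class t that were predicted as p."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true, pred):
        cm[t][p] += 1
    return cm


# Hypothetical continuous outputs of repeated NN runs for five LA tests.
runs = [
    [0.2, 0.4, 0.1],
    [1.1, 0.9, 1.3],
    [1.6, 1.7, 1.4],
    [2.2, 1.9, 2.4],
    [2.9, 3.3, 3.1],
]
true_scores = [0, 1, 1, 2, 3]
pred_scores = [averaged_prediction(r) for r in runs]
cm = confusion_matrix(true_scores, pred_scores)
max_error = max(abs(t - p) for t, p in zip(true_scores, pred_scores))
```

Here `max_error` is the largest distance between true and predicted class; a value of 1 corresponds to the situation described above, in which every misclassification falls in an adjacent UPDRS class.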

Fig. 9 LA estimation: confusion matrix based on the results of the supervised neural network model

To gain better insight into the behavior of the proposed classifiers, Fig. 10 reports, for each method addressed in this paper, the cumulative distribution function (CDF) of the absolute error between the UPDRS value yielded by the algorithm and that provided by the neurologist. The independent variable takes only integer values, in accordance with the MDS-UPDRS scale. All the proposed algorithms yield a rather steep curve, but the NN method consistently outperforms the competing algorithms.
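As an illustration of how such a curve is obtained, the following sketch computes the empirical CDF of the absolute error on the integer UPDRS grid. The function name and the toy inputs in the usage note are hypothetical.

```python
def error_cdf(true_classes, predicted_classes, max_error=4):
    """Fraction of samples whose absolute error is <= e, for e = 0..max_error."""
    errors = [abs(t - p) for t, p in zip(true_classes, predicted_classes)]
    n = len(errors)
    return [sum(err <= e for err in errors) / n for e in range(max_error + 1)]
```

The first value of the returned list is the overall accuracy (zero error), and a steeper curve means that the remaining errors are concentrated on adjacent classes. For instance, `error_cdf([0, 1, 2, 3], [0, 2, 2, 1])` gives a curve that reaches 1.0 at an absolute error of 2.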

Fig. 10 LA estimation: CDF vs. absolute error achieved by the proposed ML methods

Finally, Table 6 summarizes the performance of the proposed methods in terms of accuracy, sensitivity, specificity and area under the curve (AUC). For the sake of comparison, the best results achieved by the ML methods addressed in Refs. [33, 34] for the same LA test are also reported. From these results, we can conclude that the NN largely outperforms the other tested ML techniques, achieving very satisfactory performance scores.

Our classification method does not aim to replace the clinicians in the assessment of the clinical stage of the disease. Instead, it is a supporting tool that helps the clinicians to achieve a better follow-up of their patients. Hence, the algorithm should be able to detect variations in the patient’s condition, so that the neurologist can optimize the drug treatment. To this end, we have also tested a classification model able to distinguish between slight (true UPDRS 0 and 1) and severe (true UPDRS 2, 3 and 4) motor conditions. We refer to the NN model, as it has shown the best overall performance. Figure 11 reports the confusion matrix achieved by this method in the binary classification task.
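The grouping of the UPDRS classes into the two conditions can be sketched as below. This only illustrates the relabeling and the resulting 2 × 2 confusion counts; whether the binary model was retrained on the merged labels or obtained by merging the five-class outputs is not assumed here, and the function names are illustrative.

```python
def to_binary(updrs_class, threshold=2):
    """Return 0 (slight) for UPDRS 0-1 and 1 (severe) for UPDRS 2-4."""
    return 1 if updrs_class >= threshold else 0


def binary_confusion(true_classes, predicted_classes):
    """Count (tp, fn, fp, tn), taking 'severe' as the positive class."""
    tp = fn = fp = tn = 0
    for truth, pred in zip(true_classes, predicted_classes):
        t, p = to_binary(truth), to_binary(pred)
        if t == 1 and p == 1:
            tp += 1
        elif t == 1:
            fn += 1
        elif p == 1:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def accuracy(tp, fn, fp, tn):
    """Share of samples assigned to the correct group."""
    return (tp + tn) / (tp + fn + fp + tn)
```

With this grouping, a one-step error inside a group (e.g., UPDRS 2 classified as 3) no longer counts as a misclassification; only confusions across the boundary between UPDRS 1 and 2 remain visible.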

Table 6 LA estimation: performance measures for several ML methods
Fig. 11 LA estimation: confusion matrix based on the results from the supervised Neural Network model for binary classification

The correct classification percentage is 92.9%, with only 3 misclassifications over 121 true mild cases and 10 over 60 true severe cases. We believe that this result is very promising: it opens the possibility of remotely monitoring the patient’s bradykinesia using a single smartphone, with a simple and safe procedure that patients can manage by themselves.

3.5 FOG detection: results using the smartphone

In this section, we present the classification results achieved for the detection of FOG episodes using a single smartphone. The features selected in Sect. 3.5 are input to a Support Vector Machine (SVM). SVMs are devised to obtain high performance on unseen data; they have the highest recognition rate among many known classifiers [59] and provide good results in FOG detection problems [?] [36]. In this work, an SVM with a linear kernel was found to provide the highest accuracy. A tenfold cross validation was run on a training data set encompassing, for each PD participant, a typical segment of each activity class (WA, TU, ST) and the available FOG episodes (total signal duration: 33 min). This validation method was selected to ease the comparison with other published work [37, 41, 58, 60]. Sensitivity, specificity, precision and accuracy are reported in Table 7.

Table 7 FOG estimation: results of tenfold cross validation on training data
Fig. 12 FOG estimation: ROC curve of the proposed SVM model

Table 8 FOG estimation: comparison between the proposed method and similar recent studies

We can appreciate that specificity, precision and accuracy are very satisfactory. In particular, specificity exceeds 98%. The sensitivity is somewhat lower (81.2%); however, the main goal of the algorithm is to gather information about the daily trend of FOG episodes (e.g., whether or not they tend to cluster at specific times of the day). Hence, in agreement with the clinicians, we have set the method to favor high specificity at the cost of a possibly lower sensitivity, to avoid a high rate of false FOG detections. Furthermore, a leave-one-patient-out validation was performed to assess classification robustness, yielding a mean detection rate of 80.0% of FOG windows per patient.
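The four measures and the trade-off between specificity and sensitivity can be made concrete with a short sketch. It assumes that each window receives a continuous score (for an SVM, typically the signed distance from the separating hyperplane) and that the trade-off is set by moving a decision threshold; how the operating point was actually set in this study is not specified, so the threshold mechanism and all names below are illustrative.

```python
def decide(scores, threshold):
    """Label a window as FOG (1) when its score reaches the threshold."""
    return [1 if s >= threshold else 0 for s in scores]


def count_outcomes(labels, decisions):
    """Return (tp, fn, fp, tn) for binary labels, with 1 meaning a FOG window."""
    tp = sum(1 for y, d in zip(labels, decisions) if y == 1 and d == 1)
    fn = sum(1 for y, d in zip(labels, decisions) if y == 1 and d == 0)
    fp = sum(1 for y, d in zip(labels, decisions) if y == 0 and d == 1)
    tn = sum(1 for y, d in zip(labels, decisions) if y == 0 and d == 0)
    return tp, fn, fp, tn


def ratio(numerator, denominator):
    """A rate is undefined (NaN) when its reference class is empty."""
    return numerator / denominator if denominator else float("nan")


def detection_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity, precision and accuracy from confusion counts."""
    return {
        "sensitivity": ratio(tp, tp + fn),   # detected FOG windows / true FOG windows
        "specificity": ratio(tn, tn + fp),   # rejected non-FOG / true non-FOG windows
        "precision": ratio(tp, tp + fp),     # true FOG / windows classified as FOG
        "accuracy": ratio(tp + tn, tp + fn + fp + tn),
    }
```

Raising the threshold makes the detector more conservative: fewer non-FOG windows are flagged (higher specificity) while more true FOG windows are missed (lower sensitivity), which is the direction of the compromise described above.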

The receiver operating characteristic (ROC) curve of the classifier is shown in Fig. 12. It is worth noticing that the AUC is 97.96%, which confirms the excellent performance of the implemented model. Furthermore, high specificity (over 97.5%) persists for a wide range of sensitivity values (up to about 87%).
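For reference, a ROC curve and its AUC can be derived from per-window scores as in the following sketch, which sweeps the decision threshold over the sorted scores and integrates with the trapezoidal rule. It assumes binary labels (1 for FOG) with at least one window per class; the function names are illustrative.

```python
def roc_points(labels, scores):
    """(false positive rate, true positive rate) pairs, one per distinct threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        score = ranked[i][0]
        # Windows sharing the same score are admitted together (ties).
        while i < len(ranked) and ranked[i][0] == score:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area
```

Since specificity equals one minus the false positive rate, a curve that stays close to the left axis up to a high true positive rate reflects the behavior noted above, i.e., high specificity maintained over a wide range of sensitivity values.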

It is also worth noticing that the processing time in the test phase is less than 0.5 s for a 10-s-long signal. This makes the algorithm suitable for a real-time implementation.

Finally, the proposed method is compared with other recent studies addressing FOG detection and employing a waist-mounted device. The results are shown in Table 8. The specificity, precision and accuracy achieved by the proposed method outperform the best results of the algorithms under comparison by 4%, 19% and 9%, respectively. Although other algorithms achieve a higher sensitivity, sensitivity alone is of little significance unless it is associated with a high precision (i.e., a high probability that a window classified as FOG actually contains FOG); the high precision of the proposed method therefore makes its sensitivity more meaningful.

A limitation of the present study is that few FOG episodes have been identified. As the measures were obtained during the pre-scheduled annual neurological visit, we could not employ any FOG provocation test. Most participants were in their daily on state, and no selection based on previous history of FOG was implemented. For the sake of comparison, in Refs. [41, 55] about 10 min of FOG were recorded thanks to FOG provocation (e.g., dual tasking). In Ref. [37], 20 min of FOG were collected from hospitalized patients in their off state. In Refs. [58, 60], up to 93 min of FOG were recorded thanks to a complex experimental protocol in which the tests were carried out under levodopa suppression, hence on hospitalized patients; only 12 min are related to patients in the on state.

Another possible limitation is that the present data refer to supervised activities. To achieve continuous and long-term monitoring of patients during ADL, new data are being collected to enlarge the set of performed activities so as to encompass most ADL situations.

4 PD monitoring using SensorTile™

Smartphones are widespread, easy to use and well-accepted devices. Moreover, the characteristics of the embedded inertial sensors are sufficient for human motion analysis. On the other hand, their significant size, limited battery autonomy and the interference of other applications during data acquisition make smartphones unsuitable for long-term monitoring. Thus, moving towards the acquisition of data related to ADL, we consider the option of replacing the smartphone with a SensorTile module.

4.1 SensorTile™

SensorTile is a compact wearable device integrating the functionalities of an IMU (accelerometers, gyroscopes and magnetometer are included), together with powerful processing capabilities provided by an 80 MHz MCU and additional environmental sensors (barometer, humidity and temperature sensors) plus a microphone. Sensor data can be streamed via USB, logged on an SD card, or transmitted via Bluetooth Low Energy.

Fig. 13 SensorTile™

The device consists of a plastic box hosting a 100 mAh Li-Ion battery and the SensorTile module soldered on the SensorTile cradle (Fig. 13), all provided with the STLKT01V1 development kit by STMicroelectronics™.

The STEVAL-STLCS01V1 SensorTile core system is a tiny, 13.5 mm \(\times \) 13.5 mm square-shaped IoT module (Fig. 14) equipped with a low-power 80-MHz MCU, Bluetooth low-energy connectivity and a wide spectrum of motion and environmental MEMS sensors. The main components include:

  1. MP34DT05-A digital microphone.

  2. STM32L476JG 32-bit ultra-low-power MCU with Cortex®-M4F.

  3. LSM6DSM iNEMO inertial module integrating a 3-axis accelerometer and a 3-axis gyroscope.

  4. LSM303AGR ultra-compact high-performance eCompass module integrating an ultra-low-power 3-axis accelerometer and a 3-axis magnetometer.

  5. LPS22HB MEMS nano-pressure sensor.

  6. BlueNRG-MS Bluetooth low-energy network processor.

Fig. 14 SensorTile core system

The STLCR01V1 SensorTile compact cradle (Fig. 15) is designed to host the SensorTile core system and to expand the functionalities of the module. Its main components are:

  1. Solderable SensorTile footprint.

  2. HTS221 temperature and humidity sensor.

  3. Standalone linear Li-Ion battery charger.

  4. MicroSD card socket.

  5. Battery connector.

  6. USB type A to Mini-B USB connector for power supply and communication.

  7. Power switch.

  8. SWD connector for programming and debugging.

Fig. 15 SensorTile cradle

Table 9 iNemo inertial module main characteristics

Compared with commercial smartphones, this tiny wearable device provides noticeable benefits for the long-term monitoring of PD patients:

  • Small size: The reduced dimensions and weight of the device enable long-term monitoring. In fact, the device is comfortable to wear and does not interfere with the patient’s ADL.

  • Positioning: Given the very small size of the device, many body locations, such as the wrist and the ankle, can be used for data acquisition in an unobtrusive way. Thus, the detection and monitoring of further PD symptoms (e.g., tremor, upper limb dyskinesia) and more accurate measures of gait parameters may be enabled.

  • Sensor characteristics: The iNemo inertial module provides a state-of-the-art low-power 3D accelerometer and 3D gyroscope with settable full-scale range, very high sensitivity and settable sample frequency (as shown in Table 9).

  • Power consumption: The device is extremely energy efficient, given the low power consumption of the integrated MCU and inertial sensors, together with the energy-saving power modes provided (the iNemo inertial module provides a low-power mode for data acquisition with a sample frequency of up to 52 Hz, which is suitable for the purposes of our study).

  • No interference: Since SensorTile is dedicated hardware, there is no interference from background applications running on the device. Data acquisition on smartphones, instead, may present gaps; this occurs when other applications installed on the device employ the inertial sensors.

4.2 SensorTile™ background

Due to its versatility, the SensorTile module has been applied in many different areas, from health (e.g., [72, 73]) to sport (e.g., [74, 75]). In [72], the authors deploy a platform (i.e., neMEMSi-Smart), produced by STMicroelectronics and similar to the SensorTile module, to assess gait and balance in elderly people. The platform is the core of a wireless Body Sensor Network, mounted on different body segments depending on the task performed. To evaluate the gait parameters, one module was mounted on the ankle, and the subjects were asked to perform the 6MWT. For the stability evaluation, instead, two modules were employed, the first one placed on the trunk and the other one on the forearm. In both cases, the signals collected through the platform (3D orientation, acceleration, angular velocities and 3D compass) were stored on an SD card and post-processed.

In Ref. [76], human activities are classified in free-living conditions using 3D acceleration signals. In this work, two feed-forward artificial neural networks are loaded into the MCU of a SensorTile mounted on a wristband. Communication is provided by the embedded BLE module. Furthermore, the authors overcome power consumption issues by coupling the module with organic photovoltaic solar cells installed on the same wristband, which improves the battery autonomy.

Torti et al. [73] have developed a recurrent neural network suitable for being embedded in the MCU. The authors aim to detect falls in real time during ADL, using a tri-axial accelerometer. After verifying the project feasibility in terms of data storage, memory consumption and battery lifetime, the developed software was embedded in the SensorTile MCU for validation.

In Refs. [74, 75], SensorTile was deployed to objectively evaluate athletic performance and monitor sport activity. Iervolino et al. [74] performed a preliminary gait analysis in both walking and running conditions, with the device located on the runner’s ankle. Acceleration data were processed to achieve an early activity recognition. Then, gait parameters (e.g., stride period, stance period) were gathered during running/walking sessions.

Finally, the work described in Ref. [77] validates the use of BLE technology in applications that require high throughput. The monitoring of human activities through a body area network (BAN) is addressed, and the BLE performance is evaluated on different sensor nodes, including SensorTile. The latter turns out to provide the best performance in terms of efficiency, which corroborates the use of such a platform for biomedical applications.

To the best of our knowledge, there is still no SensorTile application for PD patients aimed at building an electronic diary to keep a record of PD symptom progression in an unsupervised environment.

4.3 Inertial signal assessment using SensorTile™

The performance of the algorithms developed for LA and FOG assessment using the smartphone remains valid, or even improves, if the smartphone is replaced with a SensorTile module. In fact, the technical characteristics of the inertial sensors embedded in SensorTile are superior to those of the smartphone sensors, which in turn already fully meet our motion analysis requirements. Hence, we can conclude that the inertial signals acquired through the two devices are both valid for the application at hand. This is of fundamental importance, as it allows us to achieve interoperability and to run the classification algorithms independently of the platform in use.

Fig. 16 Smartphone and SensorTile comparison in typical ADL

To validate these claims, we have performed gait data acquisition on a subject performing ADL (e.g., walking, turning, postural transitions). Both devices were located in the same smartphone belt and secured to the subject’s lower back. Data were saved on an SD card and exported in CSV format for offline processing in MATLAB. The records of the two nodes were synchronized, and the residual delay was removed offline. Figure 16 shows the resulting inertial signals related to different activities.
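One common way to remove a residual delay between two records is to look for the shift that maximizes their cross-correlation. The sketch below illustrates this idea only; it is not the procedure used in this study, it assumes that both records share the same sampling rate, and the function names are hypothetical.

```python
def best_lag(reference, other, max_lag):
    """Shift (in samples) of `other` relative to `reference` with the highest correlation.

    A positive result means that `other` is delayed with respect to `reference`.
    max_lag must be smaller than the length of both records.
    """
    ref_mean = sum(reference) / len(reference)
    oth_mean = sum(other) / len(other)

    def score(lag):
        # Pair reference[i] with other[i + lag] wherever both indices are valid.
        products = [(reference[i] - ref_mean) * (other[i + lag] - oth_mean)
                    for i in range(len(reference))
                    if 0 <= i + lag < len(other)]
        return sum(products) / len(products)

    return max(range(-max_lag, max_lag + 1), key=score)


def align(reference, other, lag):
    """Drop leading samples so that both records start at the same instant."""
    if lag >= 0:
        other = other[lag:]
    else:
        reference = reference[-lag:]
    n = min(len(reference), len(other))
    return reference[:n], other[:n]
```

In practice the lag would be estimated on a segment containing a sharp common event (e.g., a postural transition) and then applied to all channels of the delayed record.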

As expected, the difference between the two inertial signals can hardly be appreciated. Still, the higher sensitivity of the SensorTile inertial sensors, together with the slightly different position of the two devices, can explain the resulting small gap in terms of signal shift. Such a gap is almost undetectable in the angular velocity signal (Fig. 16b), due to its lower sensitivity to rapid small-scale vibrations. In any case, it is worth emphasizing that this difference is so small that the performance of the developed algorithms, validated on previous data acquisitions, can be considered valid also when SensorTile is employed. The added value of this result is that the to-be-implemented electronic diary can be fully interoperable between the two platforms.

5 Discussion

Recently, the Movement Disorder Society Task Force on Technology pointed out that a clinical assessment of PD based on pre-scheduled outpatient visits is suboptimal. They stated that obtaining reliable longitudinal data during ADL is crucial, and proposed a roadmap to make the introduction of mobile technologies more effective for patients, clinicians and caregivers [78]. The main objectives should be to support clinical diagnosis, implement a better follow-up of the disease progression, and detect subtle yet significant signs and symptoms. They also emphasized that algorithm validation is a delicate step. In fact, the gold standard clinical scales (such as the UPDRS addressed in this paper) are prone to inter- and intra-rater variability, and their accuracy may be outperformed by digital algorithms; this may account for the suboptimal correlation between the two classes of measures. Moreover, data taken during ADL are affected by several confounding variables. Nevertheless, these issues should be largely outweighed by the advantages of continuous monitoring [78].

5.1 The patient’s and caregiver’s points of view

The main advantage for patients is that the disease progression could be finely monitored without requiring them to move to the outpatient department. Frequent visits, besides being impossible for economic and resource allocation reasons, would also be impractical for the patients themselves: moving to the hospital is a cause of stress and implies logistic issues. On the other hand, a fine posology adjustment can extend the period in which the disease is well controlled, yielding an improved QoL. Psychological benefits must also be considered; the patient feels more monitored and safer, and the long-term adherence to the therapy is promoted. Similar benefits hold for the caregivers. The patient is expected to have a better QoL, a more stable mood and more residual autonomy, and fewer logistic issues should be faced.

Acceptance deserves some more discussion. We have interviewed about 100 patients, and they agree that a noninvasive technology could be well accepted. Of course, it should not impair ADLs, and it should be unobtrusive, able to catch subtle changes in physical conditions, to function as a personalized electronic diary, and to paint a global picture of each single patient.

Using the smartphone to collect data may be suboptimal from the patient’s point of view, due to its relatively high weight and size. On the other hand, the smartphone is a widespread instrument and does not imply significant additional cost for the patient. We have performed a preliminary study in which some patients were asked to perform the LA measures at home, five times per day. This task proved to be feasible and not very bothersome, even though the supervision of the caregiver was necessary. Less cumbersome technologies such as SensorTile are expected to be even more manageable, even though they imply an extra cost. This is the reason why we pursue interoperability between different technological devices, matching different patients’ needs and requirements.

5.2 The professional point of view

Professionals are aware that a PD patient follow-up based on a yearly, pre-scheduled visit is not adequate, especially in the intermediate stages of the disease. They are ready to accept technological support to improve patient management, facilitate decision-making, and catch early signs of impairment. On the other hand, clinicians are not ready to manage large amounts of data; hence, the information should be condensed in periodic reports, plus possible alarms to be received in the hospital or in the outpatient department. It must be noticed that, as also discussed in Ref. [78], at present we lack a proper infrastructure able to transmit and store such data. Future research must move in the direction of designing a scalable, open-source, web-based architecture, able to integrate several types of information, to manage different technology standards, and to meet regulatory and security rules for medical data. The cost effectiveness, in terms of improved health and reduction of hospital admissions, also needs to be properly quantified.

6 Conclusion and future work

Home monitoring of PD motor symptoms can provide valuable information to check PD progression, motor fluctuations and response to drug therapy. To this end, in this study we have implemented algorithms using inertial data from wearable sensors to monitor bradykinesia and FOG. In a first phase, as a feasibility study, we have employed the inertial sensors embedded in a common smartphone; then, we have addressed the SensorTile platform by STMicroelectronics, to move towards a seamless solution for implementing a PD patient’s electronic diary. The preliminary results achieved are very promising, as the addressed ML methods have shown excellent performance, robustness, limited computational burden and high generalization capability.

As both the smartphone and the SensorTile solutions exhibit pros and cons, we intend to pursue a software solution that guarantees interoperability between the two systems. Future developments concern the acquisition of more training data, especially for patients with severe conditions (UPDRS 3–4). Furthermore, a patient-specific training of the classifier could be addressed, leading to further improvement of the detection performance. A real-time implementation will be provided, given the computational efficiency of the proposed algorithm. We plan to include more monitoring tasks in the same wearable device, to provide a thorough patient follow-up. Finally, we plan to use our tool in an experimental trial addressing the new drug opicapone [79], to measure to what extent this drug is able to limit daily fluctuations in Parkinson’s disease patients.