1 Introduction

Neurological disorders are the leading source of disability worldwide, and Parkinson’s disease (PD) is the second most common of these disorders, affecting over 10 million people [1, 2]. PD is characterized by the degeneration of dopaminergic neurons, which alters voluntary movements and produces motor symptoms. The most common symptoms include slowness of movement (bradykinesia), rigidity, and tremor [3]. Studies indicate that such motor symptoms are asymmetric from the onset of the disease [4], an observation supported, for instance, by a large cohort study involving more than 350 patients [5, 6].

Unilateral involvement of motor symptoms is a prevalent characteristic of Parkinson’s disease [7, 8]. However, this unilaterality phenomenon is poorly understood [8]. Recent works indicate that asymmetrical neurodegeneration in the substantia nigra affects the function of the hemisphere contralateral to the predominantly affected side of the body [7, 9]. These predominantly one-sided symptoms may persist during disease progression, affecting cognitive function and other motor and non-motor features [4, 9]. Therefore, quantifying and analyzing lateralization enables a better correlation analysis between motor and non-motor impairment in PD [10, 11], supporting diagnosis, prognosis, and treatment planning [8]. The body side of symptom onset can indicate different disease involvement: right-side tremor is associated with a better prognosis, whereas left-side bradykinesia-rigidity is associated with cognitive decline [5, 8]. However, one-sided motor impairments (bradykinesia, resting tremor, and rigidity) typically manifest at advanced stages, when roughly 50% of dopaminergic cells have already been lost [12].

Nowadays, oculomotor patterns have emerged as a promising PD biomarker due to their high sensitivity in capturing eye motor impairments during the prodromal stages of the disease [13,14,15]. Dopaminergic neurons connected to the retinal pathway can be affected in the early stages of the disease, causing visual impairments [15]. Experimental studies support this statement. For instance, a study of 112 diagnosed patients found pervasive ocular microtremors, characterized as oscillatory movements of 5.7 ± 1.5 Hz with a mean magnitude of 0.27°, preventing stability during a gaze fixation task [16]. However, standard protocols in these studies lack dynamic analysis, which may lead to the loss of important information about microtremor impairments. Therefore, computational and machine learning schemes have been proposed to quantify eye-tremor patterns, providing an enhanced description of eye movement and supporting PD diagnosis [17, 18]. To the best of our knowledge, no studies have yet explored the lateralization of oculomotor abnormalities captured from fixational eye studies, despite the sensitivity of eye movement for early detection.

This work proposes a deep convolutional representation that captures oculomotor patterns associated with PD, allowing unilateral modeling and the respective PD classification. Firstly, the proposed method encodes videos of an ocular fixation task into spatiotemporal slices that record fixational patterns. These slices are then projected into the convolutional architecture to learn embedding descriptors that better discriminate Parkinsonian patterns. The approach was validated in a retrospective study with 13 PD patients and 13 control subjects, including an independent analysis of each eye regarding its ability to quantify patterns associated with the disease. The dominant eye, defined as the one exhibiting stronger PD patterns, was estimated to indicate the diagnosis and the unilateral involvement of the disease.

Fig. 1

Pipeline of the proposed approach. a The inputs of the study are videos of each eye during a single fixational period of 5 seconds. Each eye was manually cropped and represented by four 2D slices. b The deep convolutional model receives the slices and outputs the PD probability. c The dominant side is assigned to the eye with the largest PD probability

Fig. 2

Flowchart of the study. Using left and right eye representations, we follow a stratified leave-one-patient-out (LOO) cross-validation to quantify PD from each eye and estimate a dominant eye

2 Data

The capability of the proposed approach to relate fixational movement to the dominant side was validated in a retrospective study with control and PD participants. To recover oculomotor fixation patterns, eye video sequences were recorded from 26 recruited participants, including 13 PD patients (average age 72.3 ± 7.4) and 13 control adults (average age 72.2 ± 6.1). Five samples were recorded per participant, resulting in a total of 130 videos. Instead of using RGB videos, we extracted the three color channels (red, green, and blue) independently to prevent bias related to iris color, resulting in a total of 780 video sequences, 390 per eye. In this study, a physical therapist coarsely categorized the PD patients on the Hoehn and Yahr rating scale: five patients in stage two, six patients in stage three, and two patients in stage four.
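A minimal sketch of this per-channel decomposition, assuming OpenCV and NumPy are available (the function name and file handling are illustrative, not taken from the original implementation):

```python
import cv2
import numpy as np

def split_color_channels(video_path):
    """Read an eye video and return three single-channel (N, H, W) volumes.

    Treating each color channel as an independent sample is intended to
    reduce bias related to iris color, as described in the text.
    """
    cap = cv2.VideoCapture(video_path)
    channels = {"red": [], "green": [], "blue": []}
    while True:
        ok, frame = cap.read()          # frame is H x W x 3 in BGR order
        if not ok:
            break
        b, g, r = cv2.split(frame)
        channels["red"].append(r)
        channels["green"].append(g)
        channels["blue"].append(b)
    cap.release()
    return {name: np.stack(frames) for name, frames in channels.items()}
```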

The experimental setup aims to investigate the behavior and triggering of eye movement abnormalities during ocular fixation in PD patients through a fixation task protocol designed to capture such oculomotor patterns. The protocol first projects a white dot stimulus at the center of a screen located 150 cm in front of the patient. The digital screen (32 inches diagonal) shows the white dot over a completely dark background. A sound signal triggered the appearance of the stimulus, which remained fixed for 10 seconds. From the recorded sequences we only consider the central interval of five seconds (starting 2.5 seconds after the start and ending 2.5 seconds before the end of the recording) to omit abrupt movements related to the task of focusing the gaze on the stimulus. During the experiment, participants were instructed to sit in a resting position with their heads facing forward while maintaining their gaze on the stimulus. The screen was placed at the same horizontal level as the eyes, and a camera was positioned at an angle of 37° below the participant’s field of view. Ocular abnormalities such as head movement reflexes have been discussed in past works; videos with abrupt head movements were discarded.

This markerless setup was implemented to avoid altering the natural behavior of eye movements. Eye movements were recorded using a conventional camera (Nikon D3200) with a spatial resolution of 1280×720 pixels and a temporal resolution of 60 fps. The camera was set to capture the upper face region, and the recorded videos were saved in an uncompressed format. Furthermore, we cropped individual video samples of each eye, centering the first frame on the pupil, with a spatial resolution of 210×140 pixels. This experimental protocol was also designed to facilitate integration into routine clinical setups.
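The temporal trimming and eye cropping described above could be sketched as follows; the window and patch sizes follow the text, the integer pupil coordinates are assumed to be annotated manually on the first frame, and everything else is illustrative:

```python
import numpy as np

FPS = 60  # recording rate reported in the text

def trim_central_window(volume, seconds=5):
    """Keep only the central `seconds` of a (N, H, W) video volume."""
    n = volume.shape[0]
    half = int(seconds * FPS) // 2
    center = n // 2
    return volume[center - half:center + half]

def crop_eye(volume, pupil_xy, width=210, height=140):
    """Crop a width x height patch centered on the pupil of the first frame.

    Assumes the pupil is far enough from the image border for the patch to fit.
    """
    x, y = pupil_xy
    x0, y0 = x - width // 2, y - height // 2
    return volume[:, y0:y0 + height, x0:x0 + width]
```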

3 Proposed approach

In this work, the unilateral involvement of PD is studied from eye movement abnormalities during ocular fixation tasks. Firstly, we compute 2D slices of each video to highlight temporal and spatial patterns of eye movement abnormalities. The proposed deep learning model takes these slices and retrieves discriminative descriptors that allow classification between Parkinson’s patients and a control population. In addition, from the model prediction, the dominant eye of each patient was estimated as the one presenting more Parkinsonian patterns according to the model. The analysis of unilateral involvement estimated by the model was performed from the output probability distribution. The model’s classification performance was evaluated under various configurations using several metrics. In addition, a low-dimensional visualization was implemented to observe the model-learned representations of the study population. A general diagram of the proposed method is presented in Fig. 1.

3.1 Video slices coding

The proposed fixation experiment aims to capture promising indicators of PD. However, classical volumetric analysis of videos may be redundant and complex, may fail to measure the level of gaze stability, and often ignores patterns such as microtremors [18]. Consequently, we capture spatiotemporal image slices that code temporal variations along a particular spatial direction, as illustrated in Fig. 1(a). For this purpose, 2D slices \(S_{\theta }(x,t)\) were coded to describe fixational patterns as temporal signals that record ocular fixation patterns. Each video sequence is considered as a volume \(I(x, y, t)^{W,H,N}_{x=1,y=1,t=1}\) with N frames and a spatial resolution \(W\times H\). To codify the slices, we used a radial axis as a reference to record temporal variations along this axis. To achieve this, we selected a direction \(\theta\), centered at the patient’s pupil coordinates on the xy-plane. Then, we cut the video volume along the \(\theta\) direction, as illustrated in Fig. 1(a). The result is a 2D slice denoted by \(S_{\theta }(x,t)\), which captures temporal variations along the axis of the selected direction. In this work, we considered four directions \(\theta \in \{ 0^\circ , 45^\circ , 90^\circ , 135^\circ \}\) for a robust quantification of different movement profiles.
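A minimal sketch of this slice coding, assuming bilinear sampling along the chosen direction; the slice length and interpolation scheme are assumptions, not details of the original implementation:

```python
import numpy as np
from scipy.ndimage import map_coordinates

def radial_slice(volume, pupil_xy, theta_deg, length=140):
    """Build S_theta(x, t): sample `length` points along direction theta,
    centered on the pupil, in every frame and stack them over time.

    volume   : (N, H, W) single-channel eye video
    pupil_xy : (x, y) pupil coordinates in the first frame
    returns  : (N, length) 2D slice
    """
    cx, cy = pupil_xy
    theta = np.deg2rad(theta_deg)
    r = np.linspace(-length / 2, length / 2, length)
    xs = cx + r * np.cos(theta)          # sampling line through the pupil
    ys = cy + r * np.sin(theta)
    rows = [map_coordinates(frame, [ys, xs], order=1) for frame in volume]
    return np.stack(rows)                # temporal axis first

# One slice per direction, as in the text:
# slices = {t: radial_slice(vol, pupil_xy, t) for t in (0, 45, 90, 135)}
```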

In this study, we preprocessed videos using a motion magnification strategy to capture tiny displacements over time [19]. This technique applies the acceleration magnification scheme proposed in [19] to optically amplify spatiotemporal ocular tremors, enhancing subtle motion even in the presence of large motion displacements. The result is a new dataset of magnified video sequences, which are coded into video slices in the same way. These magnified slices may better depict ocular fixation patterns and visually differentiate between control and PD eye motion.

3.2 Deep fixational characterization and the unilateral involvement

We implement a deep learning model that automatically fits representations to discriminate between PD and control slices. This model is trained using either raw slices or magnified slices, allowing different configurations that adjust the deep representations to capture fixation patterns. The model outputs a probability value indicating whether the input 2D slice corresponds to the PD class. In addition to supporting the diagnosis of PD, the model enables the quantification of the disease and determines its most affected side.

For this purpose, we fit a convolutional neural network that transforms the input 2D slices into output probabilities through a hierarchical feature representation. The architecture includes convolutional, pooling, normalization, and rectification layers to preserve relevant information while achieving generalization in the discrimination task. Additionally, the proposed model incorporates depthwise convolutions, producing a more compact representation and addressing the challenge of the limited number of samples in the training dataset.
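Since the exact architecture is not detailed here, the following Keras sketch only illustrates the general scheme of convolutional, pooling, normalization, and depthwise separable blocks with a two-class softmax output; the layer sizes, filter counts, and input shape are assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_slice_classifier(input_shape=(300, 140, 1)):
    """Compact CNN over 2D slices; depths and filter counts are illustrative."""
    inputs = layers.Input(shape=input_shape)
    x = layers.Conv2D(16, 3, padding="same", activation="relu")(inputs)
    x = layers.BatchNormalization()(x)
    x = layers.MaxPooling2D()(x)
    # Depthwise separable blocks keep the representation compact,
    # which helps with the limited number of training samples.
    for filters in (32, 64):
        x = layers.SeparableConv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.BatchNormalization()(x)
        x = layers.MaxPooling2D()(x)
    x = layers.GlobalAveragePooling2D()(x)
    embedding = layers.Dense(256, activation="relu", name="embedding")(x)
    outputs = layers.Dense(2, activation="softmax")(embedding)  # [P_C, P_PD]
    return tf.keras.Model(inputs, outputs)
```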

After training the model, we use it to analyze the slices of each test subject’s eyes and determine the unilateral involvement. For this, the deep learning model provides two scalar outputs, \(P_{PD}\) and \(P_C\), representing the probability that the input eye slice belongs to the PD or control group, respectively. These values range between 0 and 1, and their sum equals 1. From this, we determined the dominant side of PD for each patient by analyzing the \(P_{PD}\) value for each slice from the left and right eyes accordingly. The side with the highest \(P_{PD}\) value is considered the dominant side, while the other side is referred to as the non-dominant side. This allows us to assess the evaluation of unilateral involvement by measuring the severity of PD for each slice and identifying the most affected side. In addition, we used this approach to estimate a dominant side for control subjects by analyzing \(P_C\) and determining the dominant and non-dominant sides analogously.
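A minimal sketch of the dominant-side assignment for a PD patient (the analogous rule with \(P_C\) applies to controls); aggregating per-slice probabilities with the mean is an assumption, since the aggregation rule is not specified above:

```python
import numpy as np

def dominant_side(p_pd_left, p_pd_right):
    """Assign the dominant side as the eye with the highest aggregated P_PD.

    p_pd_left / p_pd_right: per-slice PD probabilities for one patient's eyes.
    Returns the side label and its aggregated probability.
    """
    left_score, right_score = np.mean(p_pd_left), np.mean(p_pd_right)
    if left_score > right_score:
        return "left", left_score
    return "right", right_score
```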

3.3 Embedding space representations

The proposed method includes a visualization tool that may support PD diagnosis. Once the deep learning model is trained and its representations are adjusted, we can recover an embedding from the last layers of the architecture. These embeddings correspond to column vectors that encode hidden and complex relationships in the input slices and serve as low-dimensional disease descriptors. Therefore, all patient samples, represented as video slices, can be projected into the model to obtain the corresponding embedding descriptors. From such descriptors, we can carry out the characterization and geometrical visualization of the subjects included in this study. To compute a geometrical visualization, we used a UMAP projection [20] to reduce the embedding descriptors to a two-dimensional representation. This visualization allows the analysis of the geometrical distribution of the samples and the discrimination capability of the network. In fact, this strategy constitutes a promising tool that may support PD diagnosis by characterizing a new patient according to the closest samples in this space. Regarding classification, we can quantitatively estimate the discrimination (separation) between the Parkinson and control classes by following a K-nearest neighbors strategy over the embedding descriptor space.
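Assuming the classifier sketched in Section 3.2, the embedding descriptor could be read from the penultimate layer as in the following sketch (the layer name is hypothetical):

```python
import tensorflow as tf

def embedding_extractor(trained_model):
    """Return a sub-model mapping input slices to 256-d embedding descriptors.

    Assumes the classifier has a Dense layer named "embedding", as in the
    earlier sketch; the actual layer name in the original code is unknown.
    """
    return tf.keras.Model(trained_model.input,
                          trained_model.get_layer("embedding").output)

# descriptors = embedding_extractor(model).predict(slice_batch)  # shape (n, 256)
```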

4 Evaluation and results

4.1 Experimental setup

The main objective of this work is to propose a computational tool to distinguish PD and characterize unilateral involvement by using the discriminative power given by abnormal ocular fixation patterns. For this purpose, we used the proposed deep-learning model to diagnose PD and then analyze the disease lateralization from output probabilities as shown in Fig. 1(c). The experimental workflow is presented in Fig. 2, showcasing a step-by-step representation of our methodology.

The principal model of this work was fitted using information from both eyes. In addition, we considered configurations to analyze the advantages of using the acceleration magnification technique. Besides, models trained using data from only one side were considered in order to fit models that best extract patterns from each eye independently. In that sense, a total of six models are proposed:

  • LE+RE. The principal model, trained using left and right eye raw slices.

  • LE-mag + RE-mag. Trained using left and right eye slices calculated after applying acceleration magnification over eye video sequences.

  • LE. Trained using only the left eye slices of the study population.

  • RE. Trained using only the right eye slices of the study population.

  • LE-mag. Trained using left eye slices calculated after applying acceleration magnification over eye video sequences.

  • RE-mag. Trained using right eye slices calculated after applying acceleration magnification over eye video sequences.

Regarding the parameters and configuration of the proposed models, they were trained for 50 epochs using a binary cross-entropy loss, and an Adam optimizer was employed during each iteration of the training process. A learning rate of \(1\times 10^{-4}\) and a batch size of 100 were set. The weights were randomly initialized, and an early stopping technique was used to prevent overfitting.
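A hedged sketch of this training configuration, reusing the illustrative build_slice_classifier from Section 3.2; the early-stopping patience and the validation data used to monitor it are assumptions:

```python
import tensorflow as tf

def train_classifier(train_x, train_y, val_x, val_y):
    """Training setup reported in the text: Adam (lr 1e-4), batch size 100,
    binary cross-entropy, 50 epochs, early stopping. The patience value and
    the validation split used for early stopping are assumptions."""
    model = build_slice_classifier()                       # CNN sketch above
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
                  loss="binary_crossentropy", metrics=["accuracy"])
    stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                            restore_best_weights=True)
    model.fit(train_x, tf.keras.utils.to_categorical(train_y, 2),
              validation_data=(val_x, tf.keras.utils.to_categorical(val_y, 2)),
              epochs=50, batch_size=100, callbacks=[stop])
    return model
```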

To validate the ability of the proposed models to discriminate between PD patients and control adults, we follow a leave-one-patient-out (LOO) cross-validation scheme. During each fold of the cross-validation, all slice samples of one patient are left out to test the model’s performance, and the remaining samples are used to train the model. This process is repeated for all patients included in the study. The performance of the models was measured using the Precision (Pre), Sensitivity (Sen), F1-score, Accuracy (Acc), and area under the ROC curve (AUC) metrics.
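The cross-validation loop could be sketched as follows, grouping slices by patient; the internal validation split carved from the training fold for early stopping and the PD-column index of the prediction are assumptions:

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, train_test_split

def loo_predictions(slices, labels, patient_ids):
    """Leave-one-patient-out loop: every fold holds out all slices of one
    patient for testing and trains on the remaining slices."""
    p_pd = np.zeros(len(labels), dtype=float)
    for train_idx, test_idx in LeaveOneGroupOut().split(slices, labels, patient_ids):
        tr_x, va_x, tr_y, va_y = train_test_split(
            slices[train_idx], labels[train_idx],
            test_size=0.1, stratify=labels[train_idx])
        model = train_classifier(tr_x, tr_y, va_x, va_y)        # sketch above
        p_pd[test_idx] = model.predict(slices[test_idx])[:, 1]  # assumed PD column
    return p_pd
```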

After training the models, the embedding descriptors were recovered for the whole population included in this study. To do so, the vector of the last layer of the proposed model was recovered and taken as the embedding descriptor of the fixational patterns; each projected slice sample thus corresponds to a 256-dimensional vector. To evaluate the discrimination capability of this embedding representation, we randomly selected eight patients for testing (approximately 30%) and used the rest to train a K-Nearest Neighbors (KNN) classification algorithm. Given a set of vectors, this algorithm uses information from the nearest neighbors to predict the group to which each vector belongs. In this way, the KNN algorithm builds decision boundaries based on the clustering qualities of the data. This approach was evaluated over the embedding space of the dominant eyes, the non-dominant eyes, and the space considering all slice samples.
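A minimal sketch of this KNN evaluation over the embedding descriptors (the number of neighbors is an assumption):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def knn_separability(descriptors, labels, patient_ids, test_patients, k=5):
    """Accuracy of a KNN classifier on the embedding descriptors, holding out
    all slices belonging to `test_patients`."""
    test_mask = np.isin(patient_ids, test_patients)
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(descriptors[~test_mask], labels[~test_mask])
    return knn.score(descriptors[test_mask], labels[test_mask])
```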

In addition, we calculated a non-linear projection of the embedding vectors onto a 2D space to visualize the embeddings’ distribution. For this, a Uniform Manifold Approximation and Projection (UMAP) strategy was implemented [20]. This strategy uses approximations of the data space to associate a low-dimensional representation with each descriptor. UMAP fits the geometrical distribution of the points, preserving the important geometric differences between the locations of the two classes: the Parkinson’s population and the control subjects.
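The projection could be computed as in the following sketch, using the umap-learn package with illustrative hyperparameters rather than the original settings:

```python
import umap

def project_2d(descriptors):
    """Non-linear 2D projection of the embedding descriptors with UMAP."""
    return umap.UMAP(n_components=2, n_neighbors=15,
                     min_dist=0.1).fit_transform(descriptors)  # (n_samples, 2)
```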

4.2 Results

In the first part of the evaluation, we measured the performance of the proposed methods; the results are presented in Table 1. In general, the proposed approach shows a remarkable capability to code fixational patterns and establish a discriminative output Parkinson probability, achieving average scores above 80% in all experiments. Despite the relatively small number of training samples, for the independent-eye experiments the proposed approach with magnified slices achieves a remarkable Precision and Sensitivity of 92.02% and 88.72%, respectively. We hypothesize that the motion magnification enhances ocular micro-tremors and consequently helps the model to better discriminate both populations, detecting subjects with the disease more accurately. For the LE+RE and LE-mag+RE-mag models, the proposed approach takes advantage of the data variability, extending the range of micro-tremor behaviors and inducing a robust learning that results in superior performance. In fact, in these experiments, the raw inputs were sufficient to achieve an average accuracy of 91.92%, while the magnified sequences achieved 91.03%. In addition, compared with the LE and RE experiments, the approach achieved an accuracy gain of 5.25% using LE+RE, while the magnified sequences (LE-mag + RE-mag) achieved an accuracy gain of 1.03%.

Table 1 Classification results (%) for raw eye slices of the left (LE) and right (RE) eyes, as well as for the joined data (LE+RE). Results for the left (LE-mag), right (RE-mag), and joined (LE-mag + RE-mag) magnified data are also included
Fig. 3

Prediction probability distributions from the model output. a, b Violin plots for the LE+RE and LE-mag+RE-mag models, respectively, showing the dominant and non-dominant eye distributions separately. c ROC curves of the proposed models

We also computed ROC (receiver operating characteristic) curves, shown in Fig. 3(c). This figure plots the TPR (true positive rate) against the FPR (false positive rate) at various thresholds, supporting the outstanding discrimination capability of the LE+RE configuration, which reports an AUC of 0.976.

The main interest of this work is to measure the capability of the proposed strategy to support the task of determining the dominant side, where the involvement of the disease principally manifests. For this purpose, from the best model (LE+RE), we recover the prediction probabilities of each eye sample individually. This way, we can determine the dominant and non-dominant sides for each pair of left and right eye slices. Figure 3(a) summarizes the prediction probability distributions for dominant and non-dominant eyes, with respect to the Parkinson and control populations. As expected, the dominant-side distribution shows a greater Parkinson discrimination capability, with predictions concentrated around a confident mean. More specifically, for Parkinson’s patients, the model presents an average prediction probability \(P_{PD}\) of 93.3% (95% CI: [91.61, 95.07]) using only the dominant side and raw slices. In contrast, using the non-dominant side, the model predicts an average probability \(P_{PD}\) of 79.01% (95% CI: [75.72, 82.3]). This analysis was also carried out for the motion-magnified video sequences, see Fig. 3(b). Similarly, using the magnification technique, the model predicts with an average probability of 92.36% (95% CI: [90.69, 94.03]) using only the dominant side, and with an average probability of 79.39% (95% CI: [76.4, 82.39]) using the non-dominant side.

To assess the statistical difference between the probability distributions of dominant and non-dominant eyes (Fig. 3(a, b)), we conducted a Mann-Whitney U test. This test was performed on the output predictions to determine the statistical significance of the Parkinson discrimination between the two groups. The statistical test using the original slices indicates a significant distinction (p-value < 0.0001) at a 95% confidence level between the dominant and non-dominant diagnoses. The magnified slices exhibit a similar statistical difference (p-value < 0.0001) at a 95% confidence level.
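A minimal sketch of this test using SciPy (the two-sided alternative is an assumption):

```python
from scipy.stats import mannwhitneyu

def compare_sides(p_dominant, p_non_dominant, alpha=0.05):
    """Mann-Whitney U test between dominant and non-dominant per-slice
    P_PD predictions; returns the p-value and the significance decision."""
    _, p_value = mannwhitneyu(p_dominant, p_non_dominant, alternative="two-sided")
    return p_value, p_value < alpha
```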

Fig. 4

Embedding space visualizations. Low-dimensional projections from a Uniform Manifold Approximation and Projection (UMAP) strategy. Panel a shows the dominant eye embeddings, panel b shows the non-dominant eye embeddings, and panel c shows the projection of the embeddings of both eyes

Finally, the embedding vectors of the eye slices of the best model (LE+RE), captured from the network’s last layer, are recovered as digital biomarkers that describe fixational patterns. The resulting 2D low-dimensional geometrical space of the UMAP strategy (obtained from the embedding descriptors) is illustrated in Fig. 4. This visualization may work as a complementary tool to support diagnosis, classification, and severity stratification according to close samples related to other patient conditions. Figure 4(a) illustrates only the points corresponding to the dominant-side eye slices of PD patients and control subjects. As expected, there is a clear separation between the Parkinson and control groups, achieving an accuracy of 83% with the KNN classifier. Figure 4(b) illustrates the geometrical space corresponding to the non-dominant-side samples. It should be noted that, even for non-dominant samples, the embedding representation achieves an accuracy of 77% in the clustering classification. Figure 4(c) shows the complete geometrical space with all samples considered in this study, achieving an accuracy of 80%. The performance of the KNN classifier over these embedding spaces highlights the ability of the dominant-side eye to produce a more discriminant space. This result points to a promising tool to support diagnosis based on close embedding points with a confirmed diagnosis.

5 Discussion and conclusions

The study described in this work aims to identify the unilateral involvement of Parkinsonian patterns by analyzing ocular fixation patterns with a deep convolutional model. In this regard, a fixational task was designed in which 13 Parkinson’s patients and 13 control subjects were asked to maintain their gaze on an immobile white dot projected on a dark screen, while their eye movements were recorded with a conventional camera. Then, 2D spatiotemporal slices were used to recover the micro-tremor patterns that occur during the recording.

Then, a deep convolutional representation was trained with Parkinson and control eye slices to solve an automatic patient classification task. The proposed approach achieves a remarkable sensitivity of 92.31% in discriminating between the Parkinson and control populations. Interestingly, the deep representation labels each eye slice with an output probability of Parkinson’s disease. Such probabilities were used to determine the dominant eye of each patient, serving as a digital descriptor that approximates the unilateral involvement of the disease.

In the experimental results, the dominant Parkinson’s eye slices showed a remarkable prediction of the disease, with a reported statistical difference with respect to the non-dominant group. The proposed approach thus recovers and codes relevant micro-tremor patterns from the eyes, being a potential alternative to support early diagnosis. Besides, given the reported sensitivity, we hypothesize that the developed methodology could be used to quantitatively follow patient progression. In the literature, there is evidence that eye characterization is a potential biomarker of the disease, but typical technical resources are limited to tracking eye movement patterns. These approaches, however, lack the sensitivity to capture the tiny micro-tremor patterns that may occur in the very early stages of the disease. In contrast, the proposed approach uses conventional cameras to capture eye micro-tremor patterns, which may be augmented through image analysis processing. Besides, the deep representation exploits dense and non-linear correlations among the observed eye slices to produce an output probability of the disease.

Today, there is an open discussion about the correlation of oculomotor patterns with head movements. For instance, Kaski et al. [21] studied the possibility that parkinsonian eye alterations during fixation could be related to vestibulo-ocular reflex responses. In contrast, Gitchel et al. [22] argue that the effects of head motion are only relevant at lower frequencies. This discussion is very challenging to conduct with standard protocols and typical eye motion capture devices, especially for describing microtremors during fixation. In this sense, the proposed embedding representation may describe complex fixational patterns that could explain and determine the main spatiotemporal components associated with Parkinson’s disease. Alternatively, the exploited embedding representation can support the analysis and characterization of patients with respect to a diagnosed population that shares ocular fixation patterns, which translates into close points in this low-dimensional space.

Future work should include a deeper analysis over a larger Parkinson’s population that also includes a stratification of the severity of the disease. Such a study may determine the sensitivity among degrees of the disease, as well as the contribution of identifying unilateral involvement for analyzing disease progression in each patient. Finally, from the achieved results, it is essential to establish the capability of the proposed approach as a potential digital biomarker by including in the study other motor pathologies whose motor symptoms may overlap with those of Parkinson’s disease.