High-throughput classification of clinical populations from natural viewing eye movements
Many high-prevalence neurological disorders involve dysfunctions of oculomotor control and attention, including attention deficit hyperactivity disorder (ADHD), fetal alcohol spectrum disorder (FASD), and Parkinson’s disease (PD). Previous studies have examined these deficits with clinical neurological evaluation, structured behavioral tasks, and neuroimaging. Yet, time and monetary costs prevent deploying these evaluations to large at-risk populations, which is critically important for earlier detection and better treatment. We devised a high-throughput, low-cost method where participants simply watched television while we recorded their eye movements. We combined eye-tracking data from patients and controls with a computational model of visual attention to extract 224 quantitative features. Using machine learning in a workflow inspired by microarray analysis, we identified critical features that differentiate patients from control subjects. With eye movement traces recorded from only 15 min of videos, we classified PD versus age-matched controls with 89.6 % accuracy (chance 63.2 %), and ADHD versus FASD versus control children with 77.3 % accuracy (chance 40.4 %). Our technique provides new quantitative insights into which aspects of attention and gaze control are affected by specific disorders. There is considerable promise in using this approach as a potential screening tool that is easily deployed, low-cost, and high-throughput for clinical disorders, especially in young children and elderly populations who may be less compliant with traditional evaluation tests.
Keywords: ADHD, FASD, Parkinson’s disease, Attention deficits, Eye tracking
Visual attention and eye movements enable us to interact with complex environments by selecting relevant information to be processed in the brain. To properly allocate attention, a network of brain resources is engaged, from low-level visual processing to motor control of gaze orienting. This renders visual attention vulnerable to neurological disorders. Several neuropsychological and neuroimaging studies have demonstrated that damage in different areas of the attentional network can impair distinct aspects of task performance or can reveal unusual patterns of brain activity in laboratory tasks that test for specific aspects of attention. However, while in-depth clinical evaluation, structured behavioral tasks, and neuroimaging are extremely valuable and are the current gold standard for identifying particular impairments, they suffer from limitations that prevent their large-scale deployment: the time and cost imposed by the limited number of medical experts, and the inability of some patients (e.g., young children or some elderly individuals) to understand or comply with structured task instructions, the testing machinery, or the protocol.
Our core hypothesis is that natural attention and eye movement behavior, like a drop of saliva, contains a biometric signature of an individual and of her/his state of brain function or dysfunction. However, such individual signatures, and especially the potential biomarkers of particular neurological disorders that they may contain, have not yet been successfully decoded. This is likely because of the high dimensionality and complexity of the natural stimulus (input space), of the stimulus-to-behavior transfer function (brain function), and of the behavioral repertoire itself (output space). We devised a simple paradigm that does not require expensive machinery, involves no preparation and no cognitive task for participants, is completed in 15 min, is portable for use outside large medical centers, and (after initial training of the machine learning algorithms) autonomously provides detailed decoding of an individual’s signature.
We validated our technique with one neurodegenerative and two neurodevelopmental disorders that have been shown to involve deficits in visual attention and oculomotor functions. These deficits were exploited by our algorithm with features corresponding to oculomotor control, stimulus-driven (bottom-up) attention, and voluntary, contextual (top-down) attention. We first tested the algorithm on elderly participants with the neurodegenerative disorder Parkinson’s disease (PD) and validated the signature of PD discovered by our algorithm, because the behavioral deficits of PD are well understood. In short, PD is characterized by degeneration of dopaminergic neurons in the substantia nigra pars compacta, affecting basal ganglia processes, which subsequently impairs body movement (tremor, bradykinesia) and oculomotor movement (slower and shorter saccades) [3–5]. PD also impairs the prefrontal, premotor, motor, and basal ganglia networks, leading to deficits in attentional control; in particular, PD patients are less successful than controls in inhibiting automatic saccades to a salient stimulus [3, 4]. Therefore, we expected PD patients to show deficient oculomotor control, weakened top-down control, and stronger bottom-up guidance in natural viewing.
Next, we tested the algorithm on the two neurodevelopmental disorders at the other end of the age spectrum: attention deficit hyperactivity disorder (ADHD) and fetal alcohol spectrum disorder (FASD). Patients with ADHD or FASD demonstrate comparable deficits in visual attention tasks [7–11], but for different reasons. ADHD in childhood is characterized by delayed cortical maturation, dysfunction in dopamine transmission in the frontal cortex and/or basal ganglia, and decreased activity in frontal and striatal regions [13, 14]. These deficits result in difficulties in inhibiting premature responses (weakened top-down control), and thus patients appear more stimulus-driven (stronger bottom-up guidance). Oculomotor function seems relatively unimpaired, though previous studies have shown inconsistent findings. On the other hand, FASD is caused by excessive maternal alcohol consumption, which results in malformation of the cerebral cortex, basal ganglia, and cerebellum, and reduced overall brain and white-matter volumes [10, 15]. Deficits include impaired oculomotor functions, decreased top-down attentional control, and weakened bottom-up attention, possibly due to deficient visual sensory processing. The weakened bottom-up guidance of children with FASD could be a differentiating factor between FASD and ADHD, because children with ADHD appear to be more stimulus-driven. For example, in pro-/anti-saccade tasks (where a pro-saccade requires participants to initiate an automatic eye movement to a visual stimulus, and an anti-saccade requires participants to make a voluntary eye movement in the opposite direction), children with ADHD and children with FASD both made more directional errors in the anti-saccade task (implying difficulty in inhibiting automatic responses), but only children with FASD made more directional errors and had longer reaction times in the pro-saccade task (implying weakened stimulus-driven guidance) [7, 9].
While diagnosis of some subtypes of FASD is often assisted by the presence of dysmorphic facial features, the majority of affected children do not exhibit facial dysmorphology, and when these features are not obvious, there is a significant risk of misdiagnosis with ADHD. Thus, the differential classification of ADHD versus FASD provides a difficult challenge for our method.
Standard protocol approvals and patient consent
All experimental procedures were approved by the Human Research and Ethics Board at Queen’s University, adhering to the guidelines of the Declaration of Helsinki and the Canadian Tri-Council Policy Statement on Ethical Conduct for Research Involving Humans.
Demographic data (see Supplementary Table S1 for full demographic data)
70.33 ± 7.53
67.43 ± 6.62
Hoehn and Yahr stage: Stage 2: 6; Stage 2.5: 6; Stage 3: 2
23.17 ± 2.60
10.67 ± 1.82
11.19 ± 1.83
12.31 ± 2.10
From eye traces recorded while participants viewed short videos, we extracted three types of features that we hypothesized would be differentially affected by disorders. First, oculomotor-based features were computed (e.g., distributions of saccade amplitudes and fixation durations) as they might reveal deficiencies in motor control of attention and gaze. Second, saliency-based features correlated participants’ gaze to predictions from a computational model of visual salience, which has previously been shown to significantly predict which locations in a scene may more strongly attract the attention of control subjects. We hypothesized that these features would reveal deficits in reflexive, stimulus-driven, or so-called “bottom-up” attention. The third type, group-based features, captured deviations in participants’ gaze allocation onto our stimuli compared to a normative group of young adult controls. These features, we posited, might reveal impaired volitional, subject-dependent, or “top-down” attentional control, especially if differences were observed in group-based but not saliency-based features. Together, we used all these features to classify participants into clinical groups based on natural viewing behavior, the complexity of which imposed challenges in data analysis but also revealed rich information about the different populations.
The classifiers were built to discriminate patients from controls based on 15 core features from our three types: four oculomotor-based core features (distributions of saccade duration, inter-saccade interval, saccadic peak velocity, and saccade amplitude), ten saliency-based core features (differential distributions of salience values at human gaze vs other locations, using the ten saliency maps of Fig. 1b), and one group-based core feature (correlation between a patient’s gaze and aggregate eye traces from a normative group of young adult controls, Fig. 1a). Each core feature was represented by several sub-features to capture the dynamics of free-viewing: each oculomotor-based core feature was subdivided into 12 sub-features [3 measures (lower quartile, median, upper quartile) × 4 saccades (the 1st, 2nd, 3rd, and all saccades on each 2–4-s clip snippet) = 12 sub-features]; each saliency-based core feature was subdivided into 16 sub-features: 4 measures [area under the ROC curve (AUC; see Supplementary Methods: Computing Features) and low/medium/high salience bins] × 4 saccades, as was the group-based core feature: 4 measures (AUC, low/medium/high similarity bins) × 4 saccades. Thus, in total, 15 core features subdivided into 224 sub-features (4 × 12 + 10 × 16 + 1 × 16 = 224) were used (Supplementary Table S2).
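As a minimal sketch of the AUC measure, assume that salience values have already been sampled at gaze locations and at comparison locations. The area under the ROC curve then equals the probability that a randomly chosen gaze sample has a higher salience value than a randomly chosen comparison sample, with ties counted as one half. The function name `auc` and the toy values below are hypothetical and are not taken from the study.

```python
def auc(gaze_values, other_values):
    """Area under the ROC curve, computed by pairwise comparison."""
    wins = 0.0
    for g in gaze_values:
        for o in other_values:
            if g > o:
                wins += 1.0   # gaze sample ranks above the comparison sample
            elif g == o:
                wins += 0.5   # ties count as half a win
    return wins / (len(gaze_values) * len(other_values))


# Hypothetical salience values (arbitrary units) for illustration only.
at_gaze = [0.9, 0.7, 0.6, 0.4]
elsewhere = [0.5, 0.3, 0.2, 0.1]
score = auc(at_gaze, elsewhere)
print(score)  # 15 of the 16 pairs favor the gaze sample -> 0.9375
```

A score of 0.5 would mean that the salience map does not separate gaze locations from other locations, whereas values approaching 1 would mean that gaze tends to land on locations the map rates as highly salient.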
Classification and feature selection
Feature selection is a popular machine learning method to identify useful features and overcome situations where the number of features is possibly larger than the number of samples when training a classifier. We performed feature selection with support vector machine-recursive feature elimination (SVM-RFE), which has been used with great success in other fields (e.g., cancer classification with microarrays). SVM-RFE consists of training a classifier and discarding the weakest feature iteratively until all features are eliminated. We used SVM-RFE to differentiate PD patients from elderly controls (binary classification), and multiple SVM-RFE (MSVM-RFE) to distinguish children in the ADHD, FASD, and control groups (3-way classification). All classification accuracies reported were obtained using these two feature selection methods.
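The elimination loop of SVM-RFE can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the linear SVM is fitted with a simple subgradient method on the regularized hinge loss, features are ranked by their squared weight (a common criterion for linear SVM-RFE), and the function names, hyperparameters, and toy data are all hypothetical.

```python
def train_linear_svm(X, y, epochs=200, lr=0.1, lam=0.01):
    """Fit a linear SVM (labels +1/-1) by subgradient descent on the hinge loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # L2 regularization shrinks the weights on every step.
            w = [wj * (1.0 - lr * lam) for wj in w]
            if margin < 1.0:
                # Margin violation: move the hyperplane toward this sample's side.
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def svm_rfe(X, y):
    """Return feature indices ordered from most to least important."""
    remaining = list(range(len(X[0])))
    eliminated = []
    while remaining:
        X_sub = [[row[j] for j in remaining] for row in X]
        w, _ = train_linear_svm(X_sub, y)
        # The feature with the smallest squared weight is the weakest one.
        weakest = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        eliminated.append(remaining.pop(weakest))
    return eliminated[::-1]  # last eliminated = most important


# Toy data: feature 0 separates the classes; features 1 and 2 are small noise.
X = [[2.0, 0.1, -0.2], [1.5, -0.3, 0.1], [2.5, 0.2, 0.3], [1.8, -0.1, -0.1],
     [-2.0, 0.2, -0.1], [-1.6, -0.2, 0.2], [-2.4, 0.1, 0.1], [-1.9, -0.1, -0.3]]
y = [1, 1, 1, 1, -1, -1, -1, -1]
ranking = svm_rfe(X, y)
print(ranking)  # feature 0 is ranked first
```

In practice, classification accuracy would be evaluated at each step of the elimination so that the best-performing subset of features can be identified; that evaluation is omitted here for brevity.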
Performance of each classifier that used a particular selected subset of features was computed using 30 iterations of a repeated leave-one-out bootstrap validation. This validation method was similar to standard leave-one-out validation, which leaves one participant out for testing, except that the classifier was trained on a bootstrap sample of the remaining participants (drawn with replacement, with ten times as many samples as there were remaining participants). Performance was tested against permuted chance, which was the classification accuracy of a classifier trained on the same bootstrap structure but with randomly permuted class labels (class labels were randomly rearranged). Because classification accuracy varied with the number of features retained during RFE, we compared the maximum accuracy obtained by the classifier trained with true labels to that obtained by the classifier trained with randomly permuted labels (permuted chance, the chance referred to in this article unless stated otherwise), regardless of how many features each classifier used to obtain its maximum accuracy (one-tail paired t test; Supplementary Methods: Classification and Feature Selection). All tests were Bonferroni corrected.
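The validation scheme can be illustrated with a short sketch. Several simplifications are assumed and should not be read as the procedure of the study: a nearest-centroid rule stands in for the SVM, the feature elimination step is left out, the permuted-label classifier is scored against the true label of the held-out participant, and all function names and data are hypothetical. The paired t statistic over 30 iterations has 29 degrees of freedom, which matches the t(29) values reported in this article.

```python
import math
import random
import statistics


def nearest_centroid_predict(train_X, train_y, x):
    """Stand-in classifier (not an SVM): predict the class with the closest mean."""
    best_label, best_dist = None, math.inf
    for label in sorted(set(train_y)):
        rows = [r for r, l in zip(train_X, train_y) if l == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - c) ** 2 for a, c in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def one_iteration(X, y, rng, factor=10):
    """Return (true-label accuracy, permuted-label accuracy) for one repetition."""
    n = len(X)
    permuted = y[:]
    rng.shuffle(permuted)  # randomly rearranged class labels
    hits_true = hits_perm = 0
    for held_out in range(n):
        rest = [i for i in range(n) if i != held_out]
        # Bootstrap: sample with replacement, `factor` times the remaining count.
        boot = [rng.choice(rest) for _ in range(factor * len(rest))]
        boot_X = [X[i] for i in boot]
        # Both classifiers share the same bootstrap indices; only labels differ.
        pred_true = nearest_centroid_predict(boot_X, [y[i] for i in boot], X[held_out])
        pred_perm = nearest_centroid_predict(boot_X, [permuted[i] for i in boot], X[held_out])
        hits_true += pred_true == y[held_out]
        hits_perm += pred_perm == y[held_out]
    return hits_true / n, hits_perm / n


def paired_t(a, b):
    """Paired t statistic and degrees of freedom for the differences a - b."""
    diffs = [p - q for p, q in zip(a, b)]
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


rng = random.Random(0)
# Hypothetical one-feature data: 6 "patients" (label 1) and 6 "controls" (label 0).
X = [[v] for v in (2.0, 2.2, 1.8, 2.5, 1.9, 2.1, 0.0, 0.3, -0.2, 0.1, -0.1, 0.2)]
y = [1] * 6 + [0] * 6
results = [one_iteration(X, y, rng) for _ in range(30)]
true_acc = [r[0] for r in results]
chance_acc = [r[1] for r in results]
t_stat, df = paired_t(true_acc, chance_acc)
print(f"accuracy {statistics.mean(true_acc):.3f}, permuted chance "
      f"{statistics.mean(chance_acc):.3f}, t({df}) = {t_stat:.2f}")
```

Converting the t statistic into a one-tailed p value requires the t distribution, which is omitted here; a statistics package would normally supply it.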
Classifying PD and controls
Our method not only differentiated PD from elderly controls [one-tail paired t test, t(29) = 23.07, p < 0.01], but also provided information about how PD affects eye movements, obtained by separately studying classification accuracy for oculomotor-based, saliency-based, or group-based features (Fig. 2b). PD patients demonstrated motor deficits as revealed by classification differences between them and controls in oculomotor features [considering only the 48 oculomotor-based sub-features, accuracy was 86.4 %, t(29) = 28.02, p < 0.01]. Oculomotor deficits have been attributed to dysfunction in the basal ganglia [28–30], crucial for voluntary saccade control. Patients’ top-down attention also differed from that of elderly controls [16 group-based sub-features, 74.6 %, t(29) = 11.58, p < 0.01], in agreement with previously reported impairment in voluntary attention, involving cortical and sub-cortical attention networks [28, 29, 31–33]. However, counter to our expectation that lower top-down control may give rise to higher reliance upon stimulus-driven salience, bottom-up attention of PD patients seemed unaffected, as saliency-based features showed no overall differences [160 saliency-based sub-features, 63.16 %, t(29) = −4.10, n.s.]. It is possible that any higher reliance upon visually salient stimuli to guide gaze may have been offset by impaired salience computation because of deficient early visual processing in PD patients, as reported in previous laboratory studies [see Supplementary Discussion of Neurological Implications: Parkinson’s disease (PD) for more details relating the findings from previous studies to the results from classification].
At a finer granularity, our method also permitted investigating whether each of our 15 core features was affected by PD. We tested 15 separate classifiers, each using only the 12 or 16 sub-features of a given core feature (with SVM-RFE). This yielded a 15-component biometric signature of PD (Fig. 2c). During natural viewing, PD patients demonstrated motor deficits as their saccades were of shorter amplitude and duration [classification accuracy: t(29) > 9.62, p < 0.01; direction of the effect: two-sample t test, t(36) > 2.73, p < 0.01]; peak velocity and inter-saccade interval were also affected [t(29) > 6.31, p < 0.01], but without a unified upward or downward direction of effect among the 12 sub-features (Supplementary Methods: Direction of Effect). These observations are consistent with earlier structured-task studies, which showed shorter and slower voluntary saccades of PD patients toward pre-determined visual locations [3, 5, 28, 35], with less impairment for visually guided saccades [28, 35]. The classifier also found that PD and elderly controls differed in intensity variance [t(29) = 4.96, p < 0.01] and texture contrast [t(29) = 8.36, p < 0.01], though with mixed upward and downward effects among the involved sub-features, suggesting complex interactions between deficits that affect behavior in opposite directions: e.g., weakened top-down control (stronger bottom-up) and impaired saliency computation (weaker bottom-up). Deficits in voluntary control and top-down attention were also revealed by different similarities to our normative young observers between PD patients and elderly controls [t(29) = 7.06, p < 0.01].
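The direction-of-effect comparison can be illustrated with a two-sample t statistic. The sketch below assumes the equal-variance (pooled) form, in which the degrees of freedom are n1 + n2 − 2; whether the study used this exact form is an assumption, and the function name and the toy saccade amplitudes are hypothetical.

```python
import math
import statistics


def pooled_two_sample_t(a, b):
    """Student two-sample t statistic (pooled variance) and degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / df
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        pooled_var * (1.0 / na + 1.0 / nb))
    return t, df


# Hypothetical median saccade amplitudes (degrees) for illustration only.
patients = [4.0, 5.0, 6.0]
controls = [7.0, 8.0, 9.0]
t_value, dof = pooled_two_sample_t(patients, controls)
print(f"t({dof}) = {t_value:.3f}")  # negative t: patients' values are lower
```

The sign of the statistic gives the direction of the effect: a negative value here indicates smaller values in the patient group than in the control group.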
Classifying ADHD, FASD, and control children
Our method further examined which of the three feature types contained differential information among the three groups of children (Fig. 3b). Classification accuracies were significantly above chance with the saliency-based features [50.8 %, t(29) = 4.04, p < 0.05], but not with the oculomotor-based features [40.5 %, t(29) = −5.28, n.s.] or the group-based features [45.7 %, t(29) = 1.03, n.s.]. When comparing each pair of the three child groups, first, children with ADHD and controls were distinguished significantly in saliency-based features [78.2 %, t(29) = 12.68, p < 0.01]; second, children with FASD and controls differed in both saliency-based features [77.6 %, t(29) = 9.95, p < 0.01] and group-based features [69.8 %, t(29) = 6.01, p < 0.01]; lastly, children with ADHD and FASD could not be differentiated by any feature type alone, but they could be distinguished with all feature types together [t(29) < 22.96, p < 0.01]. Although we focus on classification performance, these results are in line with earlier studies that showed how children with ADHD have difficulties in inhibiting premature responses and thus appear more stimulus-driven, as well as studies that demonstrated how children with FASD have atypical top-down [8, 9, 17] and bottom-up attentional control (see Supplementary Discussion of Neurological Implications: ADHD, FASD, and ADHD versus FASD for more details pertaining to previous studies and the present results). However, when we examined whether the saliency-based and group-based sub-features showed larger feature values in one population than in the other, we found mixed directions of effect among the sub-features of both feature types, indicating that the disorder impacts natural viewing behavior in more than one unified manner (e.g., impaired response inhibition [7, 9], but also possibly weakened early visual processing [36–38]).
The quantitative predictions of our classifier for every sub-feature provide for the first time a rich basis to further investigate these complex effects from a neurological viewpoint.
At the level of the 15 core features, our method yielded clearly distinct biometric signatures for ADHD versus FASD (Fig. 3c), thus successfully teasing apart the two disorders along 15 important dimensions. For children with ADHD, the best feature differentiating them from control children was texture processing [t(29) = 15.67, p < 0.01]; children with ADHD showed a higher correlation with texture contrast [two-sample t test, t(37) = 2.75, p < 0.01; Fig. 3c], in line with previously reported tactile texture sensitivity [39–41]. Thus, the current results suggest this may not be limited to the tactile domain. Propensity to look toward color contrast [36, 37] [t(29) = 5.63, p < 0.01] and oriented edges [t(29) = 6.72, p < 0.01] was also discriminative between children with ADHD and controls. Oriented edges are important to perceptually construct the contour and shape of objects. For children with FASD, line junctions, overall salience, and texture contrast were discriminative [t(29) > 4.92, p < 0.01]. To our knowledge, no previous study has investigated how ADHD might affect processing of oriented edges, nor how different domains of salient features may be affected by FASD. The discovery of these features by our classifier thus suggests interesting new research directions.
Sub-features selected by the SVM-RFE process
This study revealed different biometric profiles of oculomotor function and attention allocation among PD, ADHD, and FASD patient groups through quantitative analysis of natural viewing eye traces. Our automated SVM-RFE process discovered that PD patients were best discriminated from elderly controls by oculomotor-based features, implying that motor deficits are more apparent than attention deficits for PD patients during free viewing. In contrast, children with ADHD or FASD were best distinguished from controls by saliency-based features, suggesting that the disorders affect their bottom-up attention. The disorders also influence overall attention allocation in every patient group, as group-based features showed differentiability for clinical and control populations (see Supplementary Discussion for our interpretations of the particular features identified by our method and the corresponding neurological implications in each disorder). By identifying features that are most discriminative among populations, our technique provides new insights into the nature of the different disorders and their interactions with attentional control. The encouraging results obtained here with diseases that lie on both ends of the age spectrum suggest that the proposed approach may generalize to additional disorders that affect attention and oculomotor systems. The fact that our paradigm alleviates the need for structured tasks is of great importance because the approach can be applied to a wider range of populations, including very young children who cannot understand the instructions of experiments or individuals who have cognitive impairment.
Our method robustly differentiates disorders that may have overlapping behavioral phenotypes (ADHD and FASD) but that nonetheless affect visual processing differently. Overall, we suggest that with natural scene videos, participants’ natural viewing behaviors are evoked, and their eye movement patterns contain unique and revealing information about their cognitive and motor processes. One of the strengths of this study is that it is a general framework that could identify such information in several patient populations. In the future, with better understanding of differences in cognitive control, attention, and oculomotor systems of patients with these disorders, the experiment could be further shortened by selecting stimuli that maximally evoke different eye movement patterns between populations. This would also provide for a better understanding of novel behavioral differences that were revealed by this study, such as the discovery of edge processing differences in children with ADHD (see Supplementary Discussion: Future Directions and Study Limitations). In summary, our method provides for the first time an objective, automated, high-throughput, time- and cost-effective tool that can screen large populations and that, through clustering, may further discover new disease subtypes and assist making more precise medical diagnoses. Future benefits of our method may include earlier and more accurate identification of neurological disorders and subtypes.
We thank the National Science Foundation (CRCNS grant number BCS-0827764), the Army Research Office (grant nos. W911NF-08-1-0360 and W911NF-11-1-0046), the Human Frontier Science Program (grant RGP0039/2005-C), and the Canadian Institutes of Health Research (grant no. ELA 80227) for supporting this study. IGMC was supported by a scholarship from the Canadian Institutes of Health Research, and DPM was supported by the Canada Research Chair program.
Conflicts of interest