
1 Introduction

Cognitive impairments may be caused by a large group of neurological disorders with heterogeneous clinical and pathological expressions. They are defined as a cognitive decline greater than expected for an individual’s age and education level, but one that does not interfere notably with activities of daily life. Cognitive impairment symptoms can remain stable or even disappear, but in more than half of the cases they evolve into dementia [6]. Cognitive impairment can thus be regarded as a risk state for dementia, and its identification could contribute to the prevention of dementia. Moreover, the amnestic subtype of cognitive impairment has a high risk of progression to Alzheimer’s disease and could constitute a prodromal stage of this disorder. To date, cognitive impairments are diagnosed by physicians. However, when the diagnosis is difficult to confirm, biomarker tests such as brain imaging and cerebrospinal fluid tests may be performed to determine whether the patient’s cognitive impairment is due to Alzheimer’s disease.

Handwriting, which relies on cognitive and perceptual-motor skills [21], is among the daily activities affected by cognitive impairments. Deterioration in writing skills had already emerged in the first diagnosis of Alzheimer’s disease (AD) in 1907 [11]. In recent decades, researchers have established more precisely that the handwriting of Alzheimer’s patients shows alterations in spatial organization and poor control of movement [13]. Several studies have also examined the effectiveness of handwriting analysis as a tool for the diagnosis and monitoring of Parkinson’s disease (PD) [20]. Recently, it has also been observed that some aspects of the writing process are more vulnerable than others and may present diagnostic signs. For example, during the clinical course of AD, dysgraphia occurs both in the initial phase and in the subsequent phase of the progression of the disorder. However, most of the studies published so far on the effects of cognitive impairments on handwriting have been conducted in the medical field, where statistical tools such as ANOVA are typically used to investigate the relationship between the disease and each of the variables taken into account [8, 12, 16, 18, 23]. In contrast, very few published studies use classification algorithms to analyze people’s handwriting and detect those affected by cognitive impairments. Moreover, almost all of these studies have involved only a few dozen subjects, thus limiting the effectiveness of classification algorithms such as neural networks, SVMs and decision trees [7, 22]. To try to overcome this problem, we proposed [3] a protocol consisting of twenty-five handwriting tasks (copy, reverse copy, free writing, drawing, etc.) to investigate how cognitive impairments affect the different motor and cognitive skills involved in the handwriting process.

In this paper, we present the results of a preliminary study in which we considered nine of the tasks included in the above-mentioned protocol, with the aim of characterizing the handwriting of patients affected by cognitive impairments. We collected the data produced by 130 subjects using a graphic tablet. From these data, we extracted the most common features used in the literature [5], both on-air and on-paper. As for the classification algorithms, we considered four well-known and widely used classifiers and characterized their performance in terms of recognition rate and false negative rate. The achieved results support our hypothesis that handwriting analysis can be used to develop machine learning tools to support the diagnosis of cognitive impairments. The paper is organized as follows: Sect. 2 describes the data collection, the protocol developed to collect the handwriting traits of patients, and the feature extraction method. Section 3 describes the experiments and presents the results obtained. We conclude the paper in Sect. 4 with some perspectives for future work.

2 Data Collection and Protocol

The following subsections detail the dataset collection procedure, the protocol designed for collecting handwriting samples, and the segmentation and feature extraction methods.

2.1 Data Collection

The 130 subjects who participated in the experiments, namely 68 AD patients and 62 healthy controls, were recruited with the support of the geriatric ward, Alzheimer unit, of the “Federico II” hospital in Naples. As concerns the recruiting criteria, we took into account clinical tests (such as PET, CT and enzymatic analyses) and standard cognitive tests (such as MMSE). In these tests, the cognitive skills of the examined subject were assessed by means of questionnaires including questions and problems in many areas, ranging from orientation to time and place to registration and recall. As for the healthy controls, in order to have a fair comparison, demographic as well as educational characteristics were considered and matched with the patient group. Finally, for both patients and controls, it was necessary to check whether they were on therapy, excluding those who used psychotropic drugs or any other drug that could influence their cognitive abilities. The dataset employed is slightly unbalanced both in the total number of patients and controls (68 vs. 62) and in the average age within each group (73.16 vs. 63.67). This is due to the difficulty in recruiting young patients. However, we preferred not to use a subset of subjects because, although the results may be affected by these characteristics, the aim of the work, as discussed further below, is to evaluate the contribution of three groups of features extracted from the handwriting (on-paper, on-air and all features; for more details see Sect. 2.3).

The data were collected using a graphic tablet, which allowed the recording of pen movements during the handwriting process. During the trial, images and sound stimuli were also provided to the subject to guide the execution of the tasks. Moreover, the white sheets on which the subjects were asked to write contained the instructions for the tasks and the letters, words or phrases to be copied. Finally, the subjects were also asked to follow the indications provided by the experimenter.

2.2 The Protocol

The proposed protocol has been defined with the aim of recording the dynamics of the handwriting, in order to investigate whether there are specific features that allow us to distinguish subjects affected by the above-mentioned diseases from healthy ones. The nine tasks considered in this study were selected from a larger experimental protocol presented in [3], and they are arranged in increasing order of difficulty in terms of the cognitive functions required. The goal of these tasks is to test the patients’ abilities in repeating complex graphic gestures that have a semantic meaning, such as letters and words of different lengths and with different spatial organizations. The tasks have been selected according to the literature, which suggests that:

  (i) graphical tasks and free spaces allow the assessment of the spatial organization skills of the patient;

  (ii) copy and dictation tasks allow the comparison of variations in the writing with respect to different stimuli (visual or auditory);

  (iii) tasks involving different pen-ups allow the analysis of in-air movements, which are known to be altered in AD patients;

  (iv) tasks involving different graphic arrangements, e.g. words with ascenders and/or descenders, or complex graphic shapes, allow the testing of fine motor control capabilities.

Furthermore, in order to evaluate patient responses under different fatigue conditions, these tasks should be administered with varying intensity and duration. The nine tasks are the following:

  (1) As in [22] or in [11], in the first task the subjects must copy three letters that have different graphic compositions and present ascenders and descenders in the stroke.

  (2) The second task consists in copying four letters on adjacent rows. The aim of the cues is to test the spatial organization abilities of the subject [15].

  (3–4) Tasks 3 and 4 require the participants to write continuously, four times and in cursive, a single letter and a bigram, respectively [10, 19]. These letters have been chosen because they can be written with a single continuous stroke and contain ascenders, descenders and loops. These characteristics allow the testing of the motion control alternation.

  (5–8) Tasks 5, 6, 7 and 8 involve word copying, which is the most explored activity in the analysis of handwriting for individuals with cognitive impairment [10, 14, 22]. Moreover, to observe the variation of the spatial organization, we have introduced a copy of the same word with and without a cue.

  (9) In the ninth task, subjects are asked to write, above a line (the cue), a simple phrase dictated to them by the experimenter. The phrase has a complete meaning and describes an action that is easy to memorize. As in [8], the hypothesis is that the movements can be modified because of the lack of visualization of the stimulus.

2.3 Segmentation and Feature Extraction

The features extracted during the handwriting process have been exploited to investigate the presence of cognitive impairment in the examined subjects. We used the MovAlyzer tool [9] to process the handwritten trace, considering both on-paper and on-air traits and then segmenting them into elementary strokes. The feature values were computed for each stroke and aggregated over all the strokes belonging to a single task: for each feature we considered both the mean value and the maximum value for that task. Note that, as suggested in [22], we computed the features separately over on-paper and on-air traits, since the literature shows significant differences in motor performance in these two conditions. As for the features, we used those related to the subject’s handwriting movements such as, for example, velocity, acceleration and jerk. Moreover, we also took into account the age and the level of education of the subjects.
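As an illustration of the per-stroke computation and per-task aggregation described above, the following is a minimal sketch and not the MovAlyzer implementation. It assumes that each stroke is a list of (x, y) pen positions sampled at a fixed interval `dt`, and it uses simple finite differences; the function names and the choice of averaging the magnitudes within a stroke are assumptions made only for this example.

```python
import math

def finite_difference(values, dt):
    """First-order finite difference of a uniformly sampled signal."""
    return [(b - a) / dt for a, b in zip(values, values[1:])]

def stroke_features(points, dt):
    """Mean velocity, acceleration and jerk magnitudes for one stroke.

    points: list of (x, y) pen positions sampled every dt seconds
    (hypothetical input format, assumed for this sketch).
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    vx, vy = finite_difference(xs, dt), finite_difference(ys, dt)
    ax, ay = finite_difference(vx, dt), finite_difference(vy, dt)
    jx, jy = finite_difference(ax, dt), finite_difference(ay, dt)

    def mean_magnitude(us, ws):
        mags = [math.hypot(u, w) for u, w in zip(us, ws)]
        return sum(mags) / len(mags) if mags else 0.0

    return {
        "velocity": mean_magnitude(vx, vy),
        "acceleration": mean_magnitude(ax, ay),
        "jerk": mean_magnitude(jx, jy),
    }

def task_features(strokes, dt):
    """Aggregate per-stroke features into mean and max values for one task."""
    per_stroke = [stroke_features(s, dt) for s in strokes]
    summary = {}
    for name in ("velocity", "acceleration", "jerk"):
        values = [f[name] for f in per_stroke]
        summary[name + "_mean"] = sum(values) / len(values)
        summary[name + "_max"] = max(values)
    return summary
```

Under these assumptions, `task_features` would be called twice per task, once with the on-paper strokes and once with the on-air strokes, so that the two conditions are kept separate.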

Finally, as detailed below, we merged all tasks into a single dataset, adding the information that identifies each specific task.

3 Experiments and Results

Three different groups of data were considered in the experiments: the data obtained by selecting only on-air features, those obtained by selecting only on-paper features, and those obtained by using both types of features. The data were produced by 130 subjects, each performing the 9 tasks illustrated in Subsect. 2.2. As for the classification stage, we used four different classification schemes included in the Weka tool: Random Forest (RF), Decision Tree (DT) [17], Neural Network (NN), and Support Vector Machine (SVM). The Random Forest is composed of 100 Random Trees (for more details see [1]). For the Neural Network classifier, the number of hidden nodes is equal to (number of features + number of classes)/2. Finally, an RBF kernel with parameter \(\gamma \) equal to 0.5 is used for the SVM classifier [2]. For all of them, 500 iterations were performed in the training phase and a 5-fold cross-validation strategy was adopted.
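The hidden-layer rule and the 5-fold partition mentioned above can be illustrated with a short sketch. This is not Weka code: the integer division used when the sum is odd, the contiguous and unshuffled folds, and the function names are assumptions made for the example, and the feature count used in the usage note is hypothetical.

```python
def hidden_nodes(n_features, n_classes):
    """Hidden-layer size: (number of features + number of classes) / 2.

    Integer division is an assumption; the text does not state how an
    odd sum is rounded.
    """
    return (n_features + n_classes) // 2

def k_fold_indices(n_samples, k=5):
    """Split sample indices 0..n_samples-1 into k nearly equal folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def train_test_splits(n_samples, k=5):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    folds = k_fold_indices(n_samples, k)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

With the 130 subjects of this study, each fold would hold 26 test subjects and 104 training subjects. With a hypothetical set of 20 features and the two classes (patient and control), `hidden_nodes(20, 2)` gives 11 hidden nodes.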

Since, in this preliminary study, the dataset is still unbalanced by age and education, the results could be biased by the non-uniform distribution of these features, as discussed in Subsect. 2.1. Thus, we performed a further set of experiments in which these features were discarded.

The tables shown below summarize the values of Recognition Rate (RR) and False Negative Rate (FNR) for each task. In each table, the first column reports the types of features used, the second one the classifier employed, while the following columns report, for each task, the value of RR and FNR, respectively. Finally, the last two columns respectively show RR and FNR obtained without considering age and education (column labeled as “Reduced” in all the tables).

It is worth noting that the false negative rate is highly relevant in medical diagnosis applications: it measures the proportion of subjects affected by cognitive impairments who are wrongly discarded by the system. Keeping this number as low as possible allows their inclusion in the appropriate therapeutic pathway.
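The two measures can be made concrete with a small sketch. It assumes the usual definitions, with patients as the positive class: RR is the percentage of correctly classified subjects, and FNR is the percentage of patients classified as healthy. The function names and the counts in the usage note are hypothetical and serve only as an illustration.

```python
def recognition_rate(tp, tn, fp, fn):
    """Percentage of correctly classified subjects (patients and controls)."""
    return 100.0 * (tp + tn) / (tp + tn + fp + fn)

def false_negative_rate(tp, fn):
    """Percentage of patients that the classifier labels as healthy."""
    return 100.0 * fn / (tp + fn)
```

For example, with hypothetical counts of 62 correctly detected patients, 6 missed patients, 50 correctly classified controls and 12 misclassified controls, `recognition_rate(62, 50, 12, 6)` is about 86.15 and `false_negative_rate(62, 6)` is about 8.82.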

Table 1. Classification results of tasks 1 and 2.
Table 2. Classification results of tasks 3 and 4.
Table 3. Classification results of tasks 5 and 6.
Table 4. Classification results of tasks 7 and 8.
Table 5. Classification results of task 9.
Table 6. Classification results of all tasks.

The preliminary results are very promising and encourage the use of classification systems based on these features for supporting the diagnosis of cognitive impairments. From Tables 1, 2, 3, 4, and 5 we can point out the following. Firstly, for each task the maximum value of RR (in bold) is over 70%, with peaks of about 76% in some tasks, such as the fifth one. Secondly, on average, the Random Forest classifier provides higher recognition rates. This result is consistent with the theory, considering that the Random Forest is an ensemble of classifiers. However, as reported in the last column, the FNR is lower when using the DT classifier. In particular, the lowest value of FNR, 8.82%, is obtained with the on-paper traits of the second task.

Finally, Table 6 shows the results obtained by merging, for each subject, the features derived from the whole set of tasks. For the sake of comparison, three groups of data were generated using the same criteria as in the previous set of experiments (on-air, on-paper and both types of features). Furthermore, to avoid the above-mentioned bias, we also excluded the age and education features, as in the previous case. Using these datasets, we repeated all the classification experiments: the results indicate an increase in the overall performance, with higher recognition rates and lower false negative rates. In particular, the best value is obtained using the RF classifier with on-paper features. The FNR is always very low, reaching its minimum value using DT with on-paper features. The exclusion of the features related to age and education does not yield particularly encouraging results.

Although the best performing group of features is the on-paper one, if we reduce the dataset by excluding the age and education features (Reduced condition), the performance drops drastically. However, it is noteworthy that this does not happen in the all-features condition. This leads us to claim that the on-air features have the greatest weight in the classification of patients, and that the on-paper features contribute very little to increasing the RR. If we consider all the tasks, in fact, the RR obtained with all features differs by just two percentage points from that obtained with only on-air features. On the other hand, a similar argument does not apply to the classification values obtained on single tasks, where the general performance is good also in the reduced condition, in most cases using on-paper or all features, and does not differ significantly from the all-features classification values.

4 Conclusions and Future Works

In this paper, we presented a novel solution for the early diagnosis of Alzheimer’s disease by analyzing features extracted from handwriting. The preliminary results obtained are encouraging and the work is in progress to increase general performance.

To date, this work represents the state of the art in the diagnosis of AD by means of machine learning techniques with such a large dataset. Nonetheless, in future work we will try to better balance the data by recruiting both young patients and aged healthy controls, in order to make the dataset as homogeneous as possible in terms of the employed features. We will also investigate feature selection techniques to detect the most informative features and to better explain the relevance of each feature in the classification process. Finally, we will try to aggregate the responses obtained on the different tasks by combining the results of the classifiers [4]. In other words, we will combine the results of the four classifiers taken into account, each trained on one of the nine tasks, and we will introduce a reject option to improve classification reliability (reducing the risk of false negatives).
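One possible form of such a combination with a reject option is sketched below. The text does not specify the combining rule, so plain majority voting, the agreement threshold `min_agreement`, and the label names are all assumptions made for illustration.

```python
from collections import Counter

REJECT = "reject"

def vote_with_reject(predictions, min_agreement=0.75):
    """Majority vote over per-classifier (or per-task) predictions.

    Returns the winning label, or REJECT when the share of votes for the
    winner is below min_agreement. A threshold above 0.5 also guarantees
    that ties are rejected.
    """
    if not predictions:
        return REJECT
    label, count = Counter(predictions).most_common(1)[0]
    if count / len(predictions) < min_agreement:
        return REJECT
    return label
```

Under this rule, a subject for whom three of four classifiers answer "patient" is assigned to the patient class, whereas a two-against-two split is rejected and could be referred for further clinical examination instead of being counted as healthy.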