1 Introduction

Parkinson’s disease (PD) is a progressive neurodegenerative disorder characterised histologically by the death of dopaminergic neurons in the substantia nigra pars compacta (SNpc) and the presence of Lewy bodies in various parts of the brain [17]. The SNpc is a compact structure in the midbrain that plays a vital role in motor coordination and movement control by producing a chemical substance called dopamine, which is integral for controlling the initiation, velocity, and fluidity of voluntary movement sequences [83]. The causes of most cases of PD (known as ‘sporadic’ or ‘idiopathic’ PD) are still unknown, but involve complex interactions between genetic and environmental factors [46].

PD is the second most common neurodegenerative disorder after Alzheimer’s disease, affecting \(1\%\) of the population over the age of 60 and approximately \(5\%\) of those aged 85 [69]. The prevalence is rising due to ageing populations. According to the Parkinson Disease Foundation [63], about 10 million people worldwide have PD, one million of them in the USA, 1.2 million in Europe [59], and two million projected in China by 2030 [19]. One out of 500 individuals in the UK is affected, and this number is expected to rise threefold in the next 50 years [61]. There is currently no proven disease-modifying therapy [24]. The diagnosis of PD requires the presence of bradykinesia (slowness of movement) in addition to muscle rigidity, tremor or postural instability [62]. Approximately 20% of patients do not develop a tremor [37]. The manifestations of PD are not limited to motor impairments.

Prompt diagnosis of PD is important in order to provide patients with appropriate treatment and information on prognosis. However, an accurate early diagnosis can be challenging because the movement symptoms can overlap with other conditions [72]. Doctors make the diagnosis of PD based on clinical evaluation, interpreting information gained predominantly through history-taking and examination of the patient. Sometimes brain imaging may be requested to help support the clinical diagnosis, but there are currently no tests that are wholly sensitive or specific for Parkinson’s. The rate of misdiagnosis of PD is approximately 10–25% [38], and the average time required to achieve 90% accuracy is 2.9 years [36]. Autopsy is still the gold standard for the confirmation of the disease.

There remains a need for quick and non-invasive tests that provide objective results to support a clinician’s diagnosis. We address this in our work, with the aim of developing a medical device that can assist with early diagnosis of PD, focusing on the primary care context where the rate of misdiagnosis is particularly high [38]. Patients with suspected PD could then be forwarded for expert assessment by movement disorder specialists. The approach is based on a graphics tablet on which a patient traces or copies a cognitive assessment figure; this has the benefit of collecting a large amount of information about the patient’s movements and cognitive processes in a short period of time using an inexpensive device. The system then uses a deep learning model to detect whether the patient’s drawing shows signs of Parkinson’s disease.

In this paper, we describe the training and selection of the deep learning model. Unlike earlier work in this area (see Sect. 2.2), we focus on developing a model that can diagnose Parkinson’s disease from a single drawing. This is important, because elderly patients fatigue quickly, meaning that it is not practical within a primary care context to ask them to carry out multiple drawing tasks. In particular, we show that the use of dynamic movement data (rather than static images) combined with data augmentation techniques allows us to build a highly predictive model without having to integrate information from multiple drawings. Also of importance from a clinical perspective, we show that PD can be diagnosed using an intentionally simple CNN model. Simple models are more likely to generalise beyond their training data and hence are considered more trustworthy for medical diagnosis.

1.1 Figure-drawing tasks for assessing Parkinson’s disease

Due to the lack of accepted definitive biomarkers [53] and specific neuroimaging findings [51], the diagnosis of PD is typically based on patient history, observations, judgements against clinical examination criteria and specific symptom questionnaires. These test outcomes are highly examiner-dependent (based on training and experience), with variability among different groups of observers [68]. The need for systematic kinematic tests to aid clinical decision making led to the development of independent and objective quantitative assessments, more suitable for statistical analysis and data processing. Some of these tools, such as the systematic analysis of data from the finger-tapping test [5], handwriting [20] and sketching abilities [73], have already been proposed to evaluate motor and cognitive function in the clinical setting to assess and diagnose PD.

Kinematic aspects of handwriting movements such as size, speed, acceleration and stroke length are affected in PD from its early stages [82]. As PD progresses, changes in handwriting occur, with reductions in writing size (micrographia) [16] and a decreased ability to write in general (dysgraphia) [47]. These deficits can be used to diagnose and monitor PD. Research to date has investigated signature writing [67] and the writing of short phrases [41]. The disadvantage of using handwriting for PD diagnosis is that this skill is correlated with culture and penmanship, along with the level of literacy and education of the individual [22]. By contrast, the execution of drawing tasks is considered an education-independent measure and may be more sensitive in detecting early signs of PD [80]. Drawing tasks are also fast, non-invasive and relatively easy to perform. Several graphonometric methods are used as tests, in which patients draw figures of different levels of complexity, such as a spiral [73], cube [8], pentagon [6], interlocking pentagons [4], meander [64], star [78], the Bender–Gestalt test [54] and more complex figures like the clock [9], the Benson or the Rey–Osterrieth figure copy test [76]. Each test probes particular aspects of PD. For instance, the pentagon task has been used for the analysis of cognitive decline [40], to assess both motor and cognitive levels at the same time [85] and to compare PD with other neurodegenerative diseases [15].

The analysis of drawings provides significant motor function data, reflecting the force, speed, timing, tightness and uniformity produced by the patient over a period of time. However, diagnosing PD from a simple visual inspection of a drawing is not straightforward for clinicians and requires detailed analysis. Although tremor may be visually apparent, tremor is not a required symptom of PD. Some 30% of patients do not develop this sign, and it is even less predominant at the early stages of the disease. However, this information can be used as the input for a computational model designed to support the diagnosis of PD. Computational models have been effectively applied to classification problems in the area of health care for a long time [88]. One successful and widely used complex model with a multi-layer structure is the deep neural network (DNN). The learning methods that support multi-layer models are generally categorised as deep learning (DL). DL is a multi-level feature learning method that can deal with multimodal data and high-dimensional search spaces [31, 44]. Its performance and versatility are two reasons why this technology has been extended to a variety of different domains, including image classification [33] and speech recognition [34], among many others.

The goal of this work is to use DL to analyse the information collected from patients’ drawings in the form of images as a basis for discriminating PD patients from healthy controls. The architecture selected for this work is a convolutional neural network (CNN), a form of DNN that is known to work well with image data. Specifically, we aim to develop DNN models to achieve the following objectives:

  • Selecting the most suitable model structure for our CNN classifier to automatically learn significant features from drawing assessments in order to differentiate between PD patients and healthy controls.

  • Developing a reliable set of tests to investigate which data representation is the most informative option for training predictive models.

  • Comparing two different drawing tasks (pentagon and cube drawing) to examine which is more informative as input to a CNN classifier for discriminating PD.

  • Analysing the effect of applying augmentation techniques on the classification performance and its level of stability (variance).

The remainder of this paper is organised as follows: Sect. 2 introduces DL as a tool to support learning in DNN models, presents a general overview of the CNN topology and illustrates the way in which other studies have applied these techniques to medical diagnosis. Section 3 outlines the datasets and the methods employed in this work, the description of the experiments performed and the procedure used to validate our results. Section 4 shows the set of experiments conducted and the results obtained from the analysis of the multiple classification scenarios. Section 5 comments on the experimental results in detail. Finally, Sect. 6 summarises this paper and lays out directions for future work.

2 Deep neural networks

DNNs are advanced multi-layer network models that are able to deal with complex, nonlinear and unstructured data such as audio, video, images and text by transforming them into a hierarchical structure of features with multiple levels of abstraction [44]. A crucial advantage of such models is that this transformation is performed without the intervention of human expertise and without the need for manual feature extraction or extensive data preprocessing: the feature extraction is, instead, automatic [31].

2.1 Convolutional neural network topology

The way in which the multiple layers of a DNN are linked and arranged characterises its topology, also called its architecture. A CNN is a feed-forward DNN inspired by the structure of the cat’s visual cortex. Using only the local connectivity of nodes arranged in adjacent layers, the CNN specialises in processing grid-like data such as images [32] and performs this learning by extracting features from raw data automatically [12]. The CNN architecture has shown remarkable performance on hard classification problems [33]. A typical CNN topology consists of a combination of several convolution layers that can extract features from input data based on the local underlying spatial patterns, allowing for learning features with a higher level of abstraction [44]. Each layer is composed of three cardinal stages: (1) convolution, (2) activation function (nonlinear transformation) and (3) pooling (nonlinear down-sampling). By stacking these layers together, the network is able to extract progressively more abstract patterns while reducing the number of connections in the network. Afterwards, the extracted features are transformed into a one-dimensional vector using a flattening layer, and finally, the CNN combines these convolutional layers with traditional dense layers to produce the output of the classifier.
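As an illustration, these stages and the closing flattening and dense layers can be expressed in a few lines of Keras (the framework used later in this paper); the layer sizes below are placeholders for illustration only, not the configuration used in our experiments.

```python
from tensorflow.keras import layers, models

# Illustrative only: one convolution / activation / pooling stage, followed by the
# flattening and dense layers that close a typical CNN classifier. Sizes are placeholders.
model = models.Sequential([
    layers.Conv2D(32, (3, 3), padding="same", activation="relu",   # stages 1-2: convolution + nonlinearity
                  input_shape=(64, 64, 1)),
    layers.MaxPooling2D((2, 2)),                                   # stage 3: nonlinear down-sampling
    layers.Flatten(),                                              # map the feature maps to a 1-D vector
    layers.Dense(64, activation="relu"),                           # traditional dense layer
    layers.Dense(1, activation="sigmoid"),                         # classifier output
])
```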

2.2 Deep learning for medical diagnosis

DL has been successfully applied in the broad area of medical diagnosis [48], including medical imaging [87]. For image-related problems, CNNs and their variants have been widely used due to their extraordinary ability to exploit image data [43].

The use of drawing data and DL techniques was first proposed by Pereira et al. [64]. The research group investigated the use of a five-layer CNN to aid PD discrimination, using as input 264 scanned images of \(256\times 256\) pixels showing meander and spiral tasks gathered from 35 individuals. The authors achieved higher recognition ability, measured by the accuracy per class metric, processing spiral images (\(90.38\%\)) than meander figures (\(83.11\%\)). Another, more recent work using scanned data was conducted by Seedat et al. [74]. The most important contribution of these authors is the size of the dataset, which is significantly larger than those used in other works, with data from 370 PD subjects and 357 controls. However, the paper-based tests meant that only XY coordinates could be collected, with pressure captured only indirectly as variations in shade intensity. Despite that, the authors reported accuracies of over \(98\%\) using a pretrained, hyperparameter-optimised CNN approach with data augmentation.

In [66], Pereira’s group explored the use of different well-known CNN architectures to analyse a set of 308 images gathered from 35 individuals performing the same type of tests. The HandPD dataset, gathered initially as a time series from a biometric pen, was first transformed into a set of vectors composed of six signal channels. These per-time-step vectors were then stacked together to form an image. The approach achieved a performance level of \(87.14\%\) for the meander images and \(80.19\%\) for the spirals using a CaffeNet topology. Pereira et al. [65] extended their work using the same sensors, a larger dataset called NewHandPD with information from 92 individuals, and a time series-based image pattern representation. The paper compared three different CNN architectures (CaffeNet, CIFAR-10_quick and LeNet), three baselines and a combination of six different tests that were linked in a fusion approach to reach an average accuracy of \(95.74\%\) for \(128\times 128\) pixel images with the CaffeNet architecture.

Recurrence plots were applied by members of the same research group, led this time by Afonso et al. [2], to map the signals gathered from the NewHandPD dataset onto the image domain. These images were then used as input to the same three CNN topologies. The experiments also compared the same two image resolutions (\(64\times 64\) and \(128 \times 128\)), achieving the best results (\(88.05\%\)) with the meander \(64\times 64\) pixel figures and the CaffeNet architecture.

Two similar unsupervised clustering approaches using a deep optimum-path forest (OPF) model were then proposed by Afonso et al. in [1, 3], using the NewHandPD dataset. In both works, the OPF was used as a feature extractor for three traditional machine learning algorithms, namely a Bayesian classifier, supervised OPF and a support vector machine (SVM). In [3], accuracies for the meander and spiral tests were rather similar, with values around \(81\%\), whereas in [1] the accuracy on the meander dataset outperformed the spiral by over \(2\%\), reaching almost \(84\%\). Linked to this research is the work of De Souza et al. [79], where a fuzzy OPF is used, merging the HandPD and NewHandPD datasets and using restricted Boltzmann machines as feature extractors, reaching accuracies of 79.57% and 77.94% for meander and spiral, respectively.

Four recent papers approached the diagnosis of PD using deep recurrent neural networks (RNN). A bidirectional gated recurrent unit network, along with an attention mechanism, was investigated using the NewHandPD dataset [70], achieving superior results with the meander figures (\(92.24\%\)) compared to the spiral (\(89.48\%\)) and outperforming previous works on this dataset. Gallicchio et al. [27] proposed another type of deep RNN architecture, a 10-layered deep echo state network (ESN), applied to a different, significantly imbalanced public dataset called ParkinsonHW with 61 PD patients and 15 controls, reaching accuracies of up to \(89.3\%\). This dataset contains information about pen position (x and y components), pressure and grip angle. Szumilas et al. [81] also suggested the use of an ESN-ensemble model to quantify kinetic tremor in PD by drawing circles on a digitising tablet, using, in this case, a dataset of 64 PD patients. Finally, in [75], the authors compared an ESN with a long short-term memory (LSTM) model using our dataset, reaching accuracies of 91% for the LSTM and 93.7% for the ESN.

Considering the same ParkinsonHW dataset, Canturk [11] employed a CNN-based approach, selecting the pre-trained AlexNet and GoogleNet models as feature extractors to achieve an accuracy of \(94\%\). In this case, the author applied a fuzzy recurrence plot to convert time-series signals into greyscale texture images, with K-Nearest Neighbour (KNN) and SVM as final classifiers, reporting the superiority of SVM over KNN by only \(1\%\). In [29], this accuracy was increased to \(96.5\%\) with the same AlexNet approach, but using spectrum points as input data, since PD symptomatology is better reflected in the frequency domain. A similar work, with a lower final accuracy (88%), was published by Khatamino et al. [42], inspired by the time-series image representation of [65].

Moetesum et al. [55] used a set of eight pre-trained CNNs (AlexNet) as a feature extraction system feeding an SVM classifier. The networks were trained with the PaHaW dataset [20], which comprises 72 subjects (37 controls and 38 PD patients) performing eight different tests, one of them being a spiral drawing. Afterwards, using fusion techniques, the eight outputs were combined to provide a single final metric. Information was collected as sequential data by a digitising pen and transformed into images using XY coordinates and zero-pressure information, achieving \(83\%\) overall accuracy and \(62\%\) for the spiral data.

Using the same dataset, Diaz et al. [18] integrated the features extracted from three parallel VGG16 CNNs, which shared the same 16-layer architecture but were trained with different data representations and transfer learning. The extracted features were then given as input to a combination of traditional ML models (SVM, random forest and AdaBoost). This work reported a maximum accuracy of \(86.67\%\), obtained by the ML ensemble using a majority voting scheme.

Further work with the PaHaW dataset was conducted by Naseer et al. [57]. In this case, the authors proposed a deep 25-layer CNN classifier (AlexNet), with transfer learning and data augmentation, achieving an outstanding accuracy of \(98.28\%\). The authors fine-tuned networks pre-trained on ImageNet and MNIST using the spiral data of the PaHaW dataset and reported that the AlexNet-ImageNet approach outperformed the MNIST pre-trained version by over \(3\%\).

In the work of Vasquez et al. [86], data collected from speech, handwriting and gait were used together in a multimodal ensemble mechanism to distinguish between PD patients and healthy controls. The handwriting data consisted of 14 tasks, including circle, cube, rectangle and spiral drawings, gathered as time series from a total of 84 subjects (44 PD patients and 40 controls). From these, a feature extraction step collected the transitions in handwriting. A one-dimensional CNN with four layers was designed to extract spatial features from these transitions and send them as input to an SVM model. The approach achieved high accuracy (\(97.6\%\)) when information from speech, handwriting and gait was combined. However, using the handwriting data alone was not very effective, resulting in only \(67.1\%\) accuracy.

Much of the existing work in this area has been done using a small number of publicly-available datasets, containing relatively few data points. In addition, the focus has been on using increasingly complex predictive models to raise accuracy rates, with the best accuracies achieved using deep architectures and ensemble models. All of these factors contribute to the likelihood of overfitting. The use of small datasets to train and test deep neural architectures is particularly concerning, since this will likely lead to many model parameters being under-specified. However, large datasets are very difficult to acquire. Hence, going against this trend, our work focuses on using shallower CNNs, where the number of trainable parameters is much smaller, and hence the generality is likely to be greater when trained on small datasets. Rather than focusing on more complex models, we instead investigate the features within the data that are most significant for accurate classification, and tailor the representation of the data to emphasise these.

Another important consideration that has not really been addressed by the existing literature is the burden placed upon patients when collecting data within a clinical setting. The most accurate existing models have been achieved by forming ensembles from multiple data modalities. This, in turn, requires patients to undergo a corresponding number of data collection exercises, something that may be difficult to achieve in practice with elderly and physically infirm patients. In our work, we focus on training models that require only a single drawing as their input, hence minimising the burden placed upon patients in the clinic, and providing a more practical predictive model for use in a primary care setting.

A summary of the related work introduced in this section can be seen in Table 1, in chronological order. An extended comparison of these studies can be found in Sect. 5, in Table 16.

Table 1 List of works included in the literature review

3 Methodology

The methodology used in this paper is illustrated in Fig. 1.

Fig. 1
figure 1

Workflow followed in this study

3.1 Data acquisition

For this study, the data were collected by clinicians at Leeds Teaching Hospitals NHS Trust. The dataset comprises information acquired from 87 subjects (58 patients and 29 age-matched healthy controls). Patients were recruited from neurology clinics and had been diagnosed by PD specialist consultants according to the Queen Square Brain Bank Criteria [28]. Controls were the spouses or friends of patients and were included if they had no neurological disorder. The study was conducted in accordance with the requirements of the corresponding institutional review board. Every subject provided written informed consent before the tests.

All the subjects were asked to copy the wire cube from a sample image and draw the Archimedean spiral pentagon on top of a template image, using an inking stylus and a digitising, pressure-sensitive Wacom tablet (Wacom Technology Corporation) of size 20.3 cm \(\times\) 32.5 cm. In the cube task, each subject performed one drawing with the dominant hand, whilst in the pentagon task, they carried out four drawings, two with each hand. The instructions indicated that the figures should be drawn as accurately and as fast as possible. Figure 2a shows the spiral pentagon template that subjects were asked to follow. Figure 2b–d are examples of pentagon and cube drawings.

Fig. 2
figure 2

a The spiral pentagon template, b a pentagon drawing from a patient, c a cube without zero-pressure information and d a cube with zero-pressure information

The collected data were stored as time series so that the movements performed during the drawing process could be assessed offline. The tablet recorded data at a constant sampling rate of 200 Hz. Each sample stores the time (starting at zero), the X and Y coordinates of the pen location, the angles of the pen with respect to the X and Y planes and the relative pressure exerted against the tablet, forming a multivariate time series dataset.

X and Y coordinates and pressure values are represented in the range [0, 1], pen angles in the range [−1, 1], and timestamp entries are monotonically increasing integers starting from zero. A zero-pressure value indicates that the pen was not in contact with the tablet at that location.
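For illustration, a minimal sketch of how one such recording might be loaded and checked is shown below; the file and column names are hypothetical, as the exact storage format is not specified here.

```python
import pandas as pd

# Hypothetical file and column names for the multivariate time series described above.
cols = ["time", "x", "y", "tilt_x", "tilt_y", "pressure"]
recording = pd.read_csv("subject_001_pentagon.csv", names=cols)

# Sanity checks matching the stated value ranges.
assert recording["time"].is_monotonic_increasing              # integer timestamps from zero
for col in ("x", "y", "pressure"):
    assert recording[col].between(0, 1).all()
for col in ("tilt_x", "tilt_y"):
    assert recording[col].between(-1, 1).all()

# Samples with zero pressure correspond to in-air (pen-up) movement.
in_air = recording[recording["pressure"] == 0]
on_tablet = recording[recording["pressure"] > 0]
print(f"{len(in_air)} in-air samples, {len(on_tablet)} on-tablet samples")
```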

Along with the time series dataset, other general information about the subjects was gathered, including whether they were a patient or a control, age, gender, hand used in the test and handedness. Baseline diagnosis and movement severity were assessed by clinicians using the Movement Disorder Society Unified Parkinson’s Disease Rating Scale (MDS-UPDRS) part 3 [56] for motor-related skills and the Montreal Cognitive Assessment (MoCA) score [58] for measuring cognition levels. All the information was stored in files for further analysis. A summary of the age, gender, disease duration, MDS-UPDRS score, MoCA score and Levodopa Equivalent Daily Dose (LEDD) is shown in Table 2.

Table 2 Participants information

There is a small difference \((p=0.09)\) between the mean ages of the control and PD groups, which does not reach statistical significance at the 0.05 level. PD is more common in males, and the gender gap has been exaggerated by the fact that the control subjects were the spouses or friends of the PD participants. The mean scores for both UPDRS and MoCA differ significantly (\(p<0.001\) for both) between the control and PD groups.

After a preliminary inspection of the dataset, it was seen that the complete set of samples was imbalanced, with the number of patients significantly higher than the number of control subjects. This factor has significant implications for the training of the classifier. In addition, for the pentagon dataset, we only used the collected data of the first and the second repetition of the subjects’ dominant hand. The rationale for this decision was that the ability to complete the non-dominant hand tasks varied greatly between individuals, presumably related to their degree of ambidexterity, and was not felt to reliably reflect motor control.

3.2 Data preprocessing

Following the preliminary inspection of the data, all incomplete drawings (two from patients and three from controls) were removed, and the image-based dataset was then created by representing the time series of each subject as a two-dimensional image, connecting the coordinates of the trajectory described by the pen [55]. Other alternatives have also been investigated. In Camps et al. [10], the data gathered by a wearable IMU device (accelerometer, gyroscope and magnetometer sensors) were formatted as a grid structure using a spectral window stacking procedure and transformed into images. In Pereira et al. [66], a five-column dataset gathered by a digitising pen was transformed into an image to be the input of a CNN. The pen sensors included a microphone, finger grip, axial pressure of the ink refill, tilt and acceleration in the X, Y and Z directions.

In the present work, we investigate different data representations for the transformed set of images. Specifically, we cover three data representations with increasing levels of complexity. The first and simplest approach extracts the X and Y coordinate data, discarding zero-pressure values and angles, and transforms this information into a two-dimensional black and white image. The next version adds zero-pressure information (coordinates where the pen passed without touching the tablet) as grey strokes on the black and white image. We include this information following the findings of Drotár et al. [20], who highlighted the importance of in-air trajectories in handwriting tasks for PD patients.

Finally, as our third approach, we are interested in introducing the whole range of pressure values into the image, since it is known that pressure decreases with the progression of PD [84]. Here, we extend the black and white representation to a greyscale image, where the grey information has been generated by scaling the pressure values from [0, 1] to [0, 254]. We are also interested in differentiating between areas where the pen did not pass and areas where the pen passed without touching the tablet. Based on this, we created the new images by using a value of zero (black) to represent minimum pressure, 254 to draw points with the maximum pressure that the subject can exert on the tablet and 255 (white) to depict non-touching points.
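A minimal sketch of how a recording could be rendered under these three representations is given below, using the hypothetical column names from the loading sketch above. The grey levels follow the description in the text, while the white background value, the single-pixel strokes (no interpolation between consecutive samples) and the treatment of the untouched background are simplifying assumptions of the sketch.

```python
import numpy as np

def drawing_to_image(df, size=256, mode="greyscale"):
    """Render a pen-trajectory time series as a 2-D image (simplified sketch).

    mode: "bw"        - black strokes for on-tablet points only
          "bw_zero"   - additionally marks in-air (zero-pressure) points in grey
          "greyscale" - stroke intensity derived from the scaled pressure value
    """
    img = np.full((size, size), 255, dtype=np.uint8)          # assumed white background
    xs = (df["x"] * (size - 1)).round().astype(int)
    ys = (df["y"] * (size - 1)).round().astype(int)
    for x, y, p in zip(xs, ys, df["pressure"]):
        if p > 0:                                             # pen in contact with the tablet
            # greyscale: 0 = minimum pressure (black) ... 254 = maximum pressure, per the text
            img[y, x] = int(p * 254) if mode == "greyscale" else 0
        elif mode == "bw_zero":                               # in-air trajectory as grey strokes
            img[y, x] = 128                                   # grey level is an assumption
        elif mode == "greyscale":
            img[y, x] = 255                                   # in-air points, per the description above
    # A full implementation would also connect consecutive samples with line segments.
    return img
```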

After trimming away the outer edges around the drawing (white space), which are not of interest for the classification, these images were resized and normalised to zero mean and unit standard deviation. The data were finally formatted appropriately to be used as input to a CNN. The resize step created three different versions of each image, of sizes \(32\times 32\), \(64\times 64\) and \(128\times 128\) pixels, to study how resolution influences the classification. Afterwards, additional images were produced using augmentation techniques [12].
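A sketch of the trimming, resizing and normalisation steps is shown below; the use of Pillow for resizing and the exact cropping criterion are assumptions.

```python
import numpy as np
from PIL import Image

def preprocess(img, size):
    """Crop the surrounding white space, resize and z-score normalise one drawing image."""
    # Trim away the outer white border that carries no information (criterion assumed).
    rows = np.where((img < 255).any(axis=1))[0]
    cols = np.where((img < 255).any(axis=0))[0]
    img = img[rows.min():rows.max() + 1, cols.min():cols.max() + 1]

    # Resize to one of the three resolutions studied: 32x32, 64x64 or 128x128 pixels.
    img = np.asarray(Image.fromarray(img).resize((size, size)), dtype=np.float32)

    # Zero-mean, unit-standard-deviation normalisation.
    return (img - img.mean()) / img.std()
```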

When the amount of labelled data is limited, which is often the case in the medical field, data augmentation is a critical preprocessing step for training CNNs: it teaches the network the desired invariances, provides robustness [71] and avoids the performance deterioration linked with class imbalance in the training data [50]. The process of augmentation involves transforming the existing images to create new ones. Choosing an augmentation strategy is not trivial and can be even more crucial than the selection of the architecture [31]. The suitability of each technique can only be tested by trial and error, since no single strategy is superior to the rest [45]. Advanced techniques, such as texture transfer, selective blending, kernel filtering and directional lighting addition, require significant expert knowledge, or can be computationally expensive, as with generative adversarial networks [49]. By contrast, traditional geometric transformations are fast, reproducible and easy to implement [52]. Flipping and rotation have proven useful on datasets such as CIFAR-10 and ImageNet. For some datasets, the usefulness of rotation can be heavily influenced by the rotation degree, e.g. in [77], where rotations greater than 20 degrees were found to be problematic.

In this work, new copies were generated by applying random rotation, random zoom within a fixed range and random horizontal flips. In our case, we did not find that rotation caused drawings to be misclassified, and we implemented this transformation with a random rotation of up to 40 degrees. The amount and distribution of the new set of images are defined as follows: for each control image (cube or pentagon drawing), 23 perturbed copies were created, and 11 for each patient image (cube or pentagon drawing). Table 3 shows the initial and final numbers for each type (cube-pentagon and control-patients).
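A sketch of this augmentation step, using the Keras ImageDataGenerator, is shown below; the zoom range is an assumption, since only the rotation limit (40 degrees) is specified above.

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rotation_range=40,        # random rotation of up to 40 degrees
    zoom_range=0.1,           # random zoom; the range is an assumption
    horizontal_flip=True,     # random horizontal flip
)

def augment(images, copies_per_image):
    """Create `copies_per_image` randomly perturbed copies of each image.

    23 copies were generated per control drawing and 11 per patient drawing
    in order to balance the two classes.
    """
    out = []
    for img in images:                          # img: array of shape (H, W, 1)
        batch = np.repeat(img[np.newaxis], copies_per_image, axis=0)
        out.append(next(augmenter.flow(batch, batch_size=copies_per_image, shuffle=False)))
    return np.concatenate(out)
```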

Table 3 Images created over the original datasets (number of samples in brackets) by applying augmentation techniques

3.3 Architecture and training

The CNN architecture consists of two convolutional layers with 32 filters, followed by two convolutional layers with 64 filters and another two convolutional layers with 128 filters, three max-pooling layers of size (\(2\times 2\)), six dropout layers, three dense layers and one flattening layer. All the activation functions are ReLU (rectified linear unit), except for the last dense layer, where a sigmoid activation function was selected to map the binary output. ReLU is the most commonly used activation function for CNNs [43]. In each convolutional layer, we used same padding to maintain the size of the layers after applying a series of convolutional operations. Finally, we use a stride of size (\(3\times 3\)). Figure 3 shows the CNN architecture.
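A sketch of this architecture in Keras is shown below. The placement and rates of the six dropout layers and the sizes of the first two dense layers are assumptions, and the quoted \((3\times 3)\) is interpreted here as the convolution kernel size.

```python
from tensorflow.keras import layers, models

def build_cnn(input_shape=(32, 32, 1)):
    """Sketch of the CNN described above: three blocks of two convolutions
    (32, 64 and 128 filters), each followed by 2x2 max-pooling and dropout,
    then a flattening layer and three dense layers ending in a sigmoid."""
    m = models.Sequential()
    m.add(layers.Conv2D(32, (3, 3), padding="same", activation="relu",
                        input_shape=input_shape))
    m.add(layers.Conv2D(32, (3, 3), padding="same", activation="relu"))
    m.add(layers.MaxPooling2D((2, 2)))
    m.add(layers.Dropout(0.25))                   # dropout placement and rates are assumptions
    for filters in (64, 128):
        m.add(layers.Conv2D(filters, (3, 3), padding="same", activation="relu"))
        m.add(layers.Conv2D(filters, (3, 3), padding="same", activation="relu"))
        m.add(layers.MaxPooling2D((2, 2)))
        m.add(layers.Dropout(0.25))
    m.add(layers.Flatten())
    m.add(layers.Dropout(0.5))
    m.add(layers.Dense(128, activation="relu"))   # dense layer sizes are assumptions
    m.add(layers.Dropout(0.5))
    m.add(layers.Dense(64, activation="relu"))
    m.add(layers.Dropout(0.5))
    m.add(layers.Dense(1, activation="sigmoid"))  # sigmoid output for the binary decision
    return m
```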

Fig. 3
figure 3

CNN architecture with a pentagon image as input (left), the convolutional and max-pooling layers (middle) and the schematic representation of the feature reduction that occurs from the flattened to the output layer (right)

For performing the image classification between subjects, the CNN model was trained using backpropagation on the images that were produced from the time-series datasets and through the application of augmentation techniques. After the training, the model was tested as a classifier to differentiate between healthy subjects and patients using a test set of previously unseen images.

DNN models trained in supervised mode require separate datasets for the training and testing procedures. Accordingly, we employed 90% of the samples, extracted from the main datasets, for training and validation and the remaining 10% for testing. The samples contained in each group were randomly selected. This procedure allows us to evaluate the accuracy of our framework. We conducted a tenfold cross-validation [23].
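A sketch of this split is shown below, assuming image array X and label array y have already been built and that the split is stratified by class.

```python
from sklearn.model_selection import StratifiedKFold, train_test_split

# Hold out 10% of the samples as an unseen test set; the remaining 90% is used
# for training and validation (stratification by class is an assumption).
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.10, stratify=y, random_state=0)

# Tenfold cross-validation over the training/validation portion.
folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
for fold, (train_idx, val_idx) in enumerate(folds.split(X_trainval, y_trainval)):
    X_train, y_train = X_trainval[train_idx], y_trainval[train_idx]
    X_val, y_val = X_trainval[val_idx], y_trainval[val_idx]
    # ... one CNN is trained and evaluated per fold (see the training sketch below) ...
```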

The layers of the CNN are initialised using the Xavier/Glorot initialisation scheme [30]. The remaining hyperparameter values are an initial learning rate of 0.003, a learning-rate decay of \(10^{-6}\) and a momentum of 0.9. The training algorithm aims at minimising a binary cross-entropy loss function between the predicted and the real diagnosis. The optimisation algorithm uses mini-batch learning with a batch size of 16 to speed up the learning. The training uses an early stopping mechanism as a regularisation technique to avoid overfitting, with a twofold stopping condition: a maximum of 150 epochs, and stopping after 25 epochs without improvement on the validation set.
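A sketch of the corresponding training configuration in Keras is shown below (build_cnn refers to the architecture sketch above). Monitoring the validation loss for early stopping is an assumption, and in recent Keras versions the learning-rate decay is configured through a schedule rather than the decay argument.

```python
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.optimizers import SGD

model = build_cnn(input_shape=(32, 32, 1))   # Glorot/Xavier initialisation is the Keras default

model.compile(
    optimizer=SGD(learning_rate=0.003, momentum=0.9, decay=1e-6),
    loss="binary_crossentropy",
    metrics=["accuracy"],
)

# Stop after 25 epochs without improvement on the validation set, up to 150 epochs in total.
early_stop = EarlyStopping(monitor="val_loss", patience=25, restore_best_weights=True)

history = model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    batch_size=16,            # mini-batch learning
    epochs=150,               # maximum number of epochs
    callbacks=[early_stop],
)
```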

3.4 Experimental set-up

In this subsection, we explain the experimental set-up and how the different test sets were defined and grouped. We used Python 3 to run our experiments and analyse the results. We worked under the Keras deep-learning framework [13] to take advantage of its straightforward configuration of DL pipelines. We also used supporting libraries such as NumPy and Pandas to process the datasets and Sklearn to extract the results from the models. The experiments were grouped based on four factors:

  • Experiments with black and white images with and without zero-pressure information to investigate whether keeping zero-pressure information is crucial in the discrimination process.

  • Experiments with greyscale images with zero-pressure information.

  • Experiments with balanced and imbalanced datasets to investigate the impact of the class distribution on classification performance and stability.

  • Experiments with a variety of image resolutions including \(32\times 32\), \(64\times 64\) and \(128\times 128\) pixels.

In total, we completed 36 different experiments on both pentagon and cube datasets.

3.5 CNN assessment

The CNN models were assessed as follows:

  1. 1.

    Phase 1: Evaluating the results of the ten runs performed for each configuration described in the previous sections. The topology and configuration that achieve the best performance are then selected for further analysis.

  2. 2.

    Phase 2: Using the previous top-performing configuration, we select the best of the ten different models (set of weights) produced by the application of cross-validation when training. This model will be further evaluated and reported as the final performance output of this paper.

In the first phase, we analysed the results of the experiments using several nonparametric statistical tests, including the two-tailed Mann–Whitney U test, the Kruskal–Wallis test and Tukey’s honest significant difference test as a post hoc test based on the studentised range distribution. All tests used a significance level of \(p<0.05\).
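These tests are available in standard Python libraries; a sketch comparing the per-run Kappa values of several configurations might look as follows, assuming arrays kappas_a, kappas_b and kappas_c hold the ten Kappa values of three hypothetical configurations.

```python
import numpy as np
from scipy.stats import kruskal, mannwhitneyu
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Two-tailed Mann-Whitney U test between two configurations.
u_stat, p_pair = mannwhitneyu(kappas_a, kappas_b, alternative="two-sided")

# Kruskal-Wallis test across all configurations.
h_stat, p_all = kruskal(kappas_a, kappas_b, kappas_c)

# Tukey's honest significant difference test as a post hoc comparison.
values = np.concatenate([kappas_a, kappas_b, kappas_c])
groups = ["A"] * len(kappas_a) + ["B"] * len(kappas_b) + ["C"] * len(kappas_c)
print(pairwise_tukeyhsd(values, groups, alpha=0.05))   # significance level p < 0.05
```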

We used Kappa [14] as our primary performance metric. Kappa is a statistical measurement of the agreement between two raters. It is a robust metric, simple to compute, with an output range of \([-1,1]\). Kappa values K are calculated as follows:

$$\begin{aligned} K=\frac{p_0 - p_c}{1 - p_c} \end{aligned}$$
(1)

where \(p_0\) is the total agreement probability among raters and \(p_c\) is the agreement probability due to chance. In our case, the raters are the original class (ground truth) and the predicted class generated by the trained classifier. There is no standard method for interpreting Kappa values, but Fleiss et al. [25] considered that a Kappa value \(> 0.75\) is excellent, 0.4–0.75 is fair to good, and \(< 0.4\) indicates poor agreement. A Kappa value can be negative, but this is unlikely in practice.

The reason for using Kappa instead of the traditional accuracy measure of classification performance is that, for a significant part of the experiments, our datasets are imbalanced. This implies that using traditional metrics to calculate the classification accuracy can be misleading [35]. For example, consider a test set of three controls and six patients. If the classifier predicts all samples as patients, then the classification accuracy will be about 66%, whereas the Kappa statistic for the same configuration will be 0. In this case, Kappa gives a more reliable indication of classification performance than the traditional accuracy metric.
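This example can be reproduced directly with scikit-learn:

```python
from sklearn.metrics import accuracy_score, cohen_kappa_score

# Three controls (0) and six patients (1); the classifier predicts "patient" for everyone.
y_true = [0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [1] * 9

print(accuracy_score(y_true, y_pred))     # ~0.67 -- looks deceptively good
print(cohen_kappa_score(y_true, y_pred))  # 0.0   -- no agreement beyond chance
```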

The comparison procedure of phase one starts by evaluating the multiple configurations listed in the previous section, grouped as tuples. A summary of the process is illustrated in Figs. 4 and 5. The assessments are performed level by level until a winner is reached. Figure 4 represents the different configurations tested for the cube and pentagon datasets, and Fig. 5 summarises the last comparison level and the network option finally selected as our best approach. For simplicity, note that each box includes experiments with and without zero-pressure information.

Fig. 4
figure 4

Visual representation of the set of configurations being compared. Green arrows represent the order of the comparisons, blue boxes inferior configurations, and yellow options are the winning counterparts

Fig. 5
figure 5

Final test configurations (left) and the option selected as the best approach (right)

To analyse the results, we used boxplots and descriptive statistics (five-number summary) to illustrate the distribution differences for our best six balanced configurations (Fig. 6) and for the three best balanced against the three best imbalanced configurations (Fig. 7), and to comment on the stability of their performances.

Once this step concludes, we focus on our best configuration to further determine its performance and analyse its efficiency. The selected traditional assessments include the accuracy as a measure of how well the predictor classifies both classes, the confusion matrix (actual vs predicted classification), specificity and sensitivity/recall (the recognition rate of each class), precision (positive predictive value), F1-score (the harmonic mean of precision and recall) and the average precision score.
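These assessments map directly onto standard scikit-learn metrics; a sketch for the binary control/patient case (with "patient" as the positive class) is shown below.

```python
from sklearn.metrics import (accuracy_score, average_precision_score, confusion_matrix,
                             f1_score, precision_score, recall_score)

def assess(y_true, y_pred, y_score):
    """y_pred: hard 0/1 predictions; y_score: the sigmoid outputs of the CNN."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "sensitivity/recall": recall_score(y_true, y_pred),        # TP / (TP + FN)
        "specificity": tn / (tn + fp),                              # TN / (TN + FP)
        "precision": precision_score(y_true, y_pred),               # TP / (TP + FP)
        "f1_score": f1_score(y_true, y_pred),
        "average_precision": average_precision_score(y_true, y_score),
    }
```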

4 Results and evaluation

This section presents and analyses the results generated by the two validation phases. Afterwards, the major findings are discussed. The most accurate results in the tables below, based on the nonparametric statistical tests described in the previous section, are highlighted in bold.

4.1 Evaluating the experimental results

Using the comparison approach described in Sect. 3.5, we performed the experiments designed and summarised in Fig. 4. Results are shown in Tables 4, 5 and 6. Table 4 shows the results for the CNN classifier on the pentagon and cube datasets, using a black and white representation without zero-pressure information over the validation set. The table shows the averaged Kappa values, considering imbalanced and balanced cases and a variety of image resolutions.

Table 4 Summary of the average Kappa values over ten runs using only the coordinates of the pen when the pressure was greater than zero over the validation set

In the next set of experiments, we continue with the black and white representation, but including areas where the pen was not in touch with the tablet. Table 5 shows the averaged Kappa values over the validation set, considering imbalanced and balanced cases and a variety of image resolutions.

Table 5 Average Kappa values for the black and white representation with zero-pressure information over the validation set

In addition to the inclusion of the in-air information, Table 6 incorporates the whole range of pressure values in the representation of the image using a greyscale representation.

Table 6 Summary of the average Kappa values over ten runs using a greyscale representation with zero-pressure information over the validation set

From the results reported in these tables, we can study and comment on the effects of the different configurations with respect to the final performance of the classification.

4.1.1 The effect of applying augmentation

In all the configurations, the datasets with augmentation (balanced) led to better results than the imbalanced options. If we average the results of the balanced datasets on one side and the imbalanced datasets on the other across the different resolutions and calculate the difference between them, Table 7 shows that, by applying augmentation techniques, the CNN classifier is able to outperform the imbalanced versions.

Table 7 Effect of applying augmentation

The improvement in Kappa values across the three data representations shows that the importance of this technique increases with the complexity of the data representation. This tendency holds for both datasets, especially for the cube drawings.

4.1.2 The effect of adding zero-pressure information on the input data

To analyse the consequences of adding zero-pressure values to the representation of the images, attention should be focused on Tables 4 and 5, which correspond to performances without and with this particular information. If we concentrate on the total average performance for each table, we can see that the zero-pressure information positively affects the overall performance, adding almost \(0.08 (0.401 - 0.323)\) to the averaged Kappa values for each configuration.

If, instead, we take into account the consequences of this addition for each dataset, the improvements are summarised in Table 8.

Table 8 Improvement achieved by adding zero pressure information over the black and white representation

The results suggest that adding zero-pressure information increases the averaged Kappa values for both types of test. We can see, however, that the cube task benefits more from adding this extra information to the image representation (\(\approx 0.13\)) than the pentagon dataset (\(\approx 0.02\)).

4.1.3 The effect of adding the range of pressure values

Applying the same procedure to Tables 5 and 6, the global averaged Kappa value over all the configurations is 0.401 for the black and white representation with zero-pressure information and 0.54 for the greyscale with zero-pressure information. The general improvement achieved by this addition is 0.138.

If, instead of analysing the performances globally, we consider the performances for the pentagon and cube tasks independently, it can be observed in Table 9 that the pentagon task benefits greatly from incorporating the greyscale representation (\(\approx 0.25\)) in comparison with the cube task (\(\approx 0.02\)).

Table 9 Improvement achieved by adding the full range of pressure values over the black and white with zero-pressure information

4.1.4 The effect of the image resolution on the CNN architecture

The image size that generates the best Kappa results differs between the pentagon and cube datasets, and it is linked with the use of augmentation techniques, in-air information and pressure values. Variance values are outlined for balanced and imbalanced datasets in Table 10.

Table 10 Variance values of the performances gathered among image resolutions, grouped into imbalanced and balanced datasets for each data representation

The effect of the image resolution is not homogeneous from the point of view of the use of balanced and imbalanced datasets. For imbalanced images:

  • Pentagon images do not improve significantly with the use of different image sizes, independently of the representation used. The variance of performance between image sizes is minimal for the three types of image representation.

  • The best absolute performance of the pentagon task in each configuration is rather poor (0.086, 0.084, 0.296), with higher performance values generally achieved with small image sizes \((32\times 32)\).

  • Variance values decrease as the complexity of the data representation rises. This can be interpreted as meaning that the size of the image is a very important factor for the black and white representation but insignificant for the greyscale images.

  • For the first two configurations, the cube dataset achieves better results with a small image size \((32\times 32)\). For the greyscale images, the tendency shifts towards a larger size (although the other two configurations are rather close); this is the only configuration where the \(128\times 128\) pixel option achieves the best performance.

Focusing on the results gathered by applying augmentation techniques, the use of the three different resolutions shows the following:

  • Performance values for pentagon and cube vary among resolutions, with heterogeneous behaviour for both tests in the different configurations. Variance values for the pentagon show that the performance of the grey images is more prone to variability across resolutions. In the case of the cube task, the variance indicates that the black and white representation with nonzero pressure information is the approach that changes most between resolutions.

  • In terms of preferred sizes, pentagon images are most successful at the medium image size \((64\times 64)\), except in the greyscale scenario, where it is surpassed by \(32\times 32\), but only by \(\approx 0.03\). Cube images tend to achieve better performance with smaller images \((32\times 32)\), again with the exception of the greyscale representation, where \(64\times 64\) achieves slightly higher performance, by only \(\approx 0.02\).

4.1.5 The effect of augmentation techniques on the variability of performance among runs

A classifier is considered stable if the variance between multiple training runs is low, which is accepted as a desirable characteristic of any learning algorithm [7]. Stability is linked with the randomness of the system that comes from the sampling of the training set. To measure this, we look at the standard deviation (SD) between cross-validation folds, focusing our analysis on the best-performing configurations listed in Fig. 5. Figure 6 illustrates the shape of the cross-validation distributions for these six best configurations. Numerical values are shown in Table 11.

Fig. 6
figure 6

Distribution of the Kappa values of the six best configurations detailed in Fig. 5

The boxplots show a limited range of variability for all the CNN configurations, with an average of 0.056 and a maximum value of 0.088. The most stable results, which are also the two best performers, correspond to the greyscale representation for the pentagon and cube datasets, with an averaged Kappa SD value of 0.03. We can also observe that both configurations have similar shapes. The remaining approaches achieve values with an SD of \(\approx 0.06\).

It is also interesting to note similarities in the shape of the distributions generated from the black and white pentagon with and without zero-pressure information. By contrast, the distributions of the cube with and without zero-pressure information have very distinctive shapes, which reinforces the idea that the addition of the zero-pressure information in the cube task affects its performance noticeably.

Table 11 Summary of the average Kappa and accuracy values and their corresponding SD for the best six configurations

To study the variability between balanced and imbalanced configurations, a similar plot is included. Figure 7 shows the three best balanced configurations along with the three best imbalanced options. Numerical values of the three best imbalanced configurations are shown in Table 12. From a visual inspection of the Kappa SD values, we can see that all the balanced configurations provide more stable results than the best imbalanced counterparts.

Fig. 7
figure 7

Distribution of the Kappa values of the best three configurations for balanced and imbalanced datasets

In the table, it can be seen that all the balanced configurations have a Kappa SD value lower than 0.07, with an average of 0.054. By contrast, the imbalanced models have SD values no lower than 0.22, with an average of 0.304.

Table 12 Average kappa and accuracy values and their SD for the best three imbalanced configurations

4.2 Evaluating the final model

The best-performing configuration is the CNN architecture using \(32\times 32\) pixel images of the pentagon drawing task, including zero-pressure information and the greyscale representation. To get a better idea of its generality, we take the best model trained during cross-validation (as measured by the validation set), re-evaluate it on the test set and consider various performance metrics. Table 13 shows Kappa values, classification accuracy, specificity and average precision for the validation and test sets.

Table 13 Different metrics applied to the best CNN model

The Kappa value and the accuracy of the best single model over the validation set reach 0.926 and \(96.31\%\), and for the test set these figures drop slightly to 0.9 and \(95.02\%\), respectively. In addition, Table 14 shows further metrics gathered from the same model, such as specificity, sensitivity, F1-score and support (the number of samples in the test set).

Table 14 Results from multiple performance metrics calculated for the best single CNN configuration

Finally, Table 15 shows the confusion matrix of this model using the validation and the testing sets. The matrix shows the number of samples that the system classifies as true positive (TP), true negative (TN), false positive (FP) and false negative (FN). We can see that this model successfully classified 116 out of 118 control images and 113 out of 123 patient images in the test set.

Table 15 Confusion matrix resulted from the best CNN model using images of \(32\times 32\) pixels of the pentagon task with zero-pressure information and greyscale representation

It is notable that the CNN correctly classifies patients who are in the early stages of the disease, with the misclassified patients all having had the disease for more than three years. This suggests that the model could be useful for the early detection of PD, something that is particularly challenging for clinicians. Furthermore, analysis of the misclassified images suggests that they were misclassified because the patient did not press sufficiently hard against the tablet; see, for instance, Fig. 8, where parts of the drawing are not visible due to the low pressure values, obscuring the movement signal from the CNN. This issue could be mitigated by preprocessing, or potentially by using a colour gradient rather than greyscale.

Fig. 8
figure 8

Misclassified images: the top two were drawn by patients, and the bottom two were drawn by controls. Note that the lack of intensity reflects the absence of pressure when the subjects carried out the drawings

5 Discussion

This paper has approached the automated diagnosis of PD using drawing tasks and DL techniques under multiple configurations. The factors analysed, such as the effect of applying augmentation techniques, the resolution of the images, and the data representation used to create the images, show a rich and complex performance profile. One of the most crucial factors is the comparison of classifiers trained with balanced and imbalanced data. The augmentation process has a very significant effect, considerably improving the diagnostic performance of the classifier. This is especially true for the most complex representations and when the cube task is used. The equal contribution of both classes to the learning process helps add robustness to the network. Consequently, we agree with Pereira et al. [66] that an imbalanced dataset negatively affects classification performance.

Augmentation also improves the generality of DL models. In this context, numerous works did not report any mechanisms to increase generality, such as augmentation techniques [57] or transfer learning [11, 18, 55, 57]. Due to the reduced size of all the datasets reviewed in this paper, if no strong measures against overfitting are implemented, high accuracy results can easily be a consequence of overfitting. Under these circumstances, this risk should be considered when comparing final reported results.

Regarding the size of the images, we observed that higher-resolution images (\(128\times 128\)) tend to reduce performance. However, this could be a direct consequence of the limited size of our CNN architecture. There is no dominant size with the best results: images with \(32\times 32\) and \(64\times 64\) pixels showed approximately similar behaviour regarding PD discrimination. Their differences depend on other external factors, such as the type of drawing task. Pentagon and cube drawings could require different numbers of pixels to encode the features required to perform an adequate PD classification. As a comparison, only two other papers investigated different resolutions. In [66] and later in [2], the same research team gathered metric values for \(64\times 64\) and \(128\times 128\) pixel images and reported similar accuracy values, with slightly higher results for meander images of \(64\times 64\) in size and the opposite for the spiral, where \(128\times 128\) images outperformed the reduced \(64\times 64\) version [2]. The opposite behaviour was reported in [66], both works using an 8-layer CNN architecture.

Our results indicate that the subject’s movement signals while the pen was in contact with the tablet were insufficient to fully differentiate between PD patients and healthy controls. If we focus on the role of the non-pressure data in the classification, this information can be very effective in boosting performance, above all in the case of the cube task with black and white images. We consider that the planning and visuospatial reasoning involved in constructing a three-dimensional cube might be significant factors for identifying PD patients, which helps in reaching higher performance. This mechanism is not present in the pentagon task, which involves a two-dimensional figure that is usually drawn without raising the pen. Adding the full range of pressure values was important, especially for the pentagon task, since it did not benefit much from including in-air information. For the cube task, pressure information contributes to its performance as much as the in-air information. Together, both characteristics help the cube task reach a performance that is very close to that of the best pentagon configuration.

Previous works give a mixed view of which test is most discriminative for PD. Pereira et al. [64] attributed differences in performance between drawing tasks (meanders and spirals) to their complexity. They claimed that the hardest test (in their case the spiral) was more discriminative. However, the same authors, in their next work [66], drew the opposite conclusion, achieving better results with the meander task. Other authors, such as [1, 70], also agreed on the superiority of the meander drawings using the same NewHandPD dataset. Regarding the PaHaW dataset, Moetesum et al. [55] found the spiral task more effective than seven other handwritten tasks, the opposite of Drotár et al. [21], who considered the same dataset. These authors also mentioned that results can be influenced by the features under consideration or by how the data are represented.

From the analysis of our results, we conclude that our two tests have similar capabilities to distinguish PD patients from healthy controls. However, each of them needs different information included in the representation of the images: the pentagon drawing relies more on pressure information and the cube on in-air movements.

Direct comparison of our results against previous studies poses some problems. The papers of Pereira et al. [64,65,66] on the use of a CNN for classifying PD proposed an alternative accuracy metric to deal with imbalanced data [60], two drawing tasks (meanders and spirals) that differ from our selected tests, and alternative sensors. In [64, 66], the images were extracted from scanned tests that also include the trace of the template, and in [65] a very imbalanced dataset (18 controls, 74 PD patients) was used, with samples collected with a biometric pen. Moetesum et al. [55] applied a data acquisition method similar to the present work, creating images using only the X and Y coordinates and in-air information. However, it was not explicitly mentioned whether the in-air information was represented differently from the areas where the pen did not pass. Apart from that, their dataset was balanced, and a traditional accuracy metric was used to measure performance. Overall, the results from this approach can be more directly compared with the outcomes reported here. Finally, the ParkinsonHW dataset [39] used in [11, 27] can similarly be considered comparable to our dataset, but with a significantly more imbalanced and smaller set of samples (62 PD patients and 15 healthy subjects) and without in-air information.

Our best performance over tenfold cross-validation, \(93.53\%\), calculated using the traditional accuracy metric (see Table 11), is almost as good as the best performance reported by Pereira et al. [65] (\(95.74\%\)), who used an ensemble classifier, and by Canturk [11] (\(94\%\)), who used a more complex CNN architecture. It is significantly better (\(\approx 10\%\) improvement) than the performance reported by Moetesum [55], using both different CNNs and fusion techniques, and the accuracy reported by Vasquez et al. [86] if only the results for spiral data are considered (\(67.1\%\)). The same is the case for Afonso [1,2,3] and Diaz [18], with reported accuracies lower by 5–10%. However, the accuracy achieved in our work is lower than that of [57], whose complex fine-tuned ImageNet and AlexNet approach reached \(98.28\%\) accuracy. A comparative summary can be seen in Table 16. However, it should be noted that comparing approaches based on published accuracies is problematic, since it does not account for differences in the datasets and in the ways in which models are assessed, both of which are likely to dominate over small numerical differences in the performance metrics.

It is arguable that many of the published methodologies are already sufficient in terms of accuracy, especially given the low diagnostic accuracies achieved by many human raters. Nevertheless, accuracy is only part of the picture and, for a model to be useful in practice, it must meet the broader requirements of clinical diagnosis. One of these is the burden placed on the subject. Many models reported in the literature require a patient to undergo multiple tests in order to generate the required data: for instance, works based on the PaHaW dataset and other multimodal approaches like [86]. Whilst the use of fusion techniques that integrate data from multiple tests for each patient may be advantageous in terms of accuracy, sourcing this data could be very difficult for patients with significant movement impairment. Our approach, by comparison, requires only a single drawing. A second advantage of our approach is the simple architecture used in our CNN. We transfer the complexity to the representation of the data instead of the DNN architecture, and consequently the model requires less data and computational power to be trained properly. A further advantage of the relatively small size of our CNN is that it is more likely to generalise to unseen data than other DL models found in the literature. We also improve the robustness and generalisation of our results by implementing augmentation techniques, as in [57], comparing multiple combinations of configurations, and studying the robustness of the results in terms of variance.

Table 16 Comparison of the major characteristics and the best results reported in the works reviewed in this paper

6 Conclusions and future work

This work investigates the potential for using deep learning within the clinical assessment of PD. Wire cube and pentagon spiral drawing tasks, both designed to assess the motor and visuospatial capabilities of patients with neurodegenerative conditions, were performed by subjects with and without PD. Whilst they performed these tasks, their movements were digitised on a graphics tablet. The resulting dataset was used to train a CNN deep learning architecture, which achieved an accuracy of \(93.53\%\) when discriminating PD subjects from healthy controls on previously unseen data. Significantly, our method requires less data than most DL models used elsewhere in the literature, potentially reducing the burden on patients during the course of undergoing clinical assessment. It is also considerably simpler, meaning that it is more likely to generalise to new data and is more amenable to behavioural analysis. In the course of this work, we have explored the effect of augmentation techniques, different data representations and different image resolutions on the performance of trained CNN models, finding all of these to have significant effects upon the discriminative ability of the deep learning system.

The limitations of this study include (1) its proof-of-concept nature; (2) the limited interpretability of the results, typical of black-box optimisation approaches; and (3) the relatively small size of the dataset and its imbalanced nature. Although the accuracy of the model is competitive against other approaches, and at a level that is likely to be clinically useful, there is likely further scope for improvement, for instance through a broader search for other, perhaps more innovative, CNN configurations, the use of more complex data representations that encode more information, the implementation of transfer learning, and the use of more complex augmentation techniques.

In future work, we aim to investigate other DL models. Notably, deep RNNs are able to work directly on time series data, and could potentially be used to analyse dynamical aspects of a patient’s drawing, as in [27]. However, there are certain obstacles that need to be overcome to use these approaches practically, including the development of suitable augmentation techniques. We also intend to examine whether we can extract useful knowledge from trained deep learning models, with the aim of understanding the basis of their discrimination by interpreting the features that the models use to classify PD patients. Additionally, we aim to investigate whether the developed models can give more information about disease staging (as done in [26]) and disease prognosis, for example whether they can differentiate between patients with and without cognitive impairment. These new experiments are expected to be supported by the gathering of new drawing data in the clinical environment.