1 Introduction

Parkinson’s disease (PD) is the second most common neurodegenerative disease. It manifests with the cardinal motor symptoms of resting tremor, rigidity, bradykinesia and loss of postural stability [1]. The disease may also include a variety of non-motor symptoms, such as cognitive impairment, depression, sleep disturbances and autonomic dysfunction, that affect patients’ quality of life [2].

During the early stages, cognitive functions are impaired in about 40% of PD patients, and almost 80% will develop dementia in the course of the disease [3]. Cognitive impairment may range from mild deficits up to dementia [4]. Early identification of PD patients at risk for the development of dementia is crucial for assuring patients’ well-being and proper medical intervention. Therefore, searching for easily accessible biomarkers is an important endeavour.

Neuroimaging investigation of cognitive impairment in PD is a topic of growing interest. Structural magnetic resonance imaging (MRI) studies have already established that patients who have Parkinson’s disease with dementia (PDD), unlike PD patients without dementia or healthy controls, have atrophy of the parietal–temporal lobes, entorhinal cortex, hippocampus, prefrontal cortex and posterior cingulate [5, 6]. Recently, new methods for improving the performance of MR image classification have started to be applied for medical purposes [7].

Many medical research problems are solved with the use of deep learning. Examples of applications include automatic segmentation of brain tumours [8], brain anatomical structures [9] and lesions [10], as well as the diagnosis of some neurological disorders such as Alzheimer’s disease [11]. These methods have also been applied to the evaluation of Parkinson’s disease symptoms, where data from handwriting tests [12] and FP-CIT SPECT dopamine transporter imaging [13] are used to train a deep neural network for classification. The class activation maps obtained from such a network can help identify the parts of MR images that play a crucial role in the decision-making process.

The study aims to demonstrate that convolutional neural networks may serve as a tool in feature engineering allowing identification of the damaged brain areas visualised by MRI in PD patients suffering from varying severities of cognitive impairment.

2 Materials and methods

Twenty-one consecutive patients with previously diagnosed Parkinson’s disease who reported to the out-patient clinic at the Medical University of Silesia, Katowice, were recruited for the study. Ultimately, 18 PD patients, either without cognitive complaints or with cognitive impairment of differing severity, formed the patient groups. The demographic and clinical data of the patients are presented in Supplementary File 1. A subgroup of nine age-matched patients (three per diagnosis status) was additionally chosen from the initial 21 patients to inspect the identified biomarkers in the absence of age as a confounding factor. These patients were age-matched with the use of a k-nearest neighbour algorithm, treating the PD patients with mild cognitive impairment as the reference group (tests for mean value equality: patient age at the exam \(p=0.1275\), age at the onset \(p=0.2593\), disease duration \(p=0.9746\)). PD was diagnosed according to the UK PD Society Brain Bank criteria (UKPDSBB) [14].
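The age-matching step can be illustrated with a minimal sketch. It assumes a greedy one-nearest-neighbour match without replacement on age alone. The function name `match_by_age` and the ages below are hypothetical; the source does not specify the matching variables or settings beyond the use of a k-nearest neighbour algorithm with PD-MCI as the reference group.

```python
def match_by_age(reference_ages, candidate_ages):
    """Greedy 1-nearest-neighbour matching without replacement.

    For each reference patient, pick the not-yet-used candidate whose age
    is closest. Returns the indices of the selected candidates.
    """
    available = dict(enumerate(candidate_ages))
    matches = []
    for ref in reference_ages:
        # Index of the closest remaining candidate (first one wins on ties).
        idx = min(available, key=lambda i: abs(available[i] - ref))
        matches.append(idx)
        del available[idx]
    return matches


# Hypothetical ages: three reference (PD-MCI) patients, six candidates.
reference = [65, 70, 74]
candidates = [58, 66, 71, 80, 73, 62]
print(match_by_age(reference, candidates))  # [1, 2, 4]
```

Greedy matching depends on the order of the reference patients; an optimal assignment would be an alternative design, at the cost of more machinery for such a small group.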

Informed consent was obtained from all patients to include their medical and personal information in the study while preserving their privacy.

2.1 Clinical assessment

All subjects enrolled in the study were carefully examined by a specialist and assessed with the Hoehn–Yahr scale to estimate the stage of Parkinson’s disease. All available additional data, such as disease onset, duration and information about the treatment, were collected in a specially prepared survey. All patients underwent a comprehensive battery of tests, including a neuropsychological test, the Mini-Mental State Examination, the Clock Drawing Test and the Beck Depression Inventory. Other tests (Rey Auditory Verbal Learning Test, forward and backward Digit Span subscale of the Wechsler Adult Intelligence Scale, Trail Making Test (TMT, parts A and B) and Benton Visual Retention Test (BVRT)) were performed to estimate the subjects’ cognitive abilities.

Based on the results of the neuropsychological assessment, patients were then divided into three groups: cognitively normal (PD-CN), PD with mild cognitive impairment (PD-MCI) and PD with dementia (PDD). Cognitive impairment (MCI or dementia) was diagnosed according to Movement Disorder Society criteria [4, 15].

Fig. 1 Structure of the proposed convolutional neural network

2.2 MRI acquisition

MRI scans were obtained using a General Electric 1.5 Tesla system. T1- and T2-weighted sequences were acquired for each patient. The images had an in-plane resolution of \(0.65 \times 0.65\) mm, a slice thickness of 6 mm, a repetition time of 3.6 s and an echo time of 95 ms. In total, 5984 2D scans were acquired in the group of 18 patients, an average of about 332.4 2D scans per patient.

2.3 Data preprocessing and statistical analysis

The MRI data were subjected to a specialised preprocessing pipeline. As a first step, the T1 and T2 images were co-registered using the FSL-FLIRT program [16, 17]. Data with visible crosstalk (slice overlap) artefacts were then identified and corrected: the median of Hodges–Lehmann estimates of the pairwise between-slice MRI signal intensity difference was calculated for each MRI sequence in each 3D study, and the affected 3D studies were identified with the use of Tukey’s criterion. The next steps were (1) correction of magnetic field inhomogeneity, performed with the use of N4ITK [18]; (2) image intensity normalisation [19] to ensure the same signal range on each MRI sequence; and (3) brain extraction performed with FSL-BET (Brain Extraction Tool) [20]. The images were downsized to a common resolution of \(160 \times 160\) pixels. Additionally, the images were internally normalised with the use of the z-score algorithm. The final number of 2D scans used in the study was 5,760.
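The artefact screening and the z-score step can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used in the study: the function names are hypothetical, the absolute value of the between-slice shift is an assumption, and the quartile convention used for Tukey's fences is the default of Python's `statistics.quantiles`.

```python
from statistics import mean, median, pstdev, quantiles


def hodges_lehmann_shift(a, b):
    """Two-sample Hodges-Lehmann estimate: median of all pairwise differences."""
    return median([x - y for x in a for y in b])


def study_score(slices):
    """Median absolute HL shift between neighbouring slices of one 3D study.

    `slices` is a list of slices, each given as a flat list of intensities.
    Taking the absolute value is an assumption made for this sketch.
    """
    shifts = [abs(hodges_lehmann_shift(s1, s2))
              for s1, s2 in zip(slices, slices[1:])]
    return median(shifts)


def tukey_outliers(scores, k=1.5):
    """Indices of scores outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(scores, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, s in enumerate(scores) if s < low or s > high]


def zscore(pixels):
    """Normalise one image (flat list of intensities) to zero mean, unit SD."""
    mu, sd = mean(pixels), pstdev(pixels)
    return [(p - mu) / sd for p in pixels]


# Hypothetical per-study scores; study 5 has a much larger between-slice shift.
scores = [1.0, 1.2, 0.9, 1.1, 1.0, 5.0, 1.05]
print(tukey_outliers(scores))  # [5]
```

The pairwise-difference estimate grows quadratically with the number of pixels, so a practical version would likely work on a subsample of intensities per slice.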

The Shapiro–Wilk test was applied to verify the hypothesis of numerical descriptor distribution normality within each subgroup. Bartlett’s test was used to check variance homogeneity. Depending on the test results, parametric ANOVA or nonparametric Kruskal–Wallis ANOVA was performed to test the hypothesis of mean/median value equality across all subgroups. The Benjamini–Hochberg procedure was applied to correct for multiple testing. Post hoc Tukey–Kramer testing was used in the pairwise comparisons. Eta-squared, an ANOVA-related effect size measure [21], supported the findings obtained by standard statistical testing. Additionally, the statistical analysis described above was repeated for the subgroup of nine age-matched patients.
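Two of the steps named above, the Benjamini–Hochberg correction and the eta-squared effect size, are simple enough to sketch directly. The function names and the example numbers are hypothetical, and the normality, variance and ANOVA tests themselves are left to a statistics package.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def eta_squared(groups):
    """Eta-squared effect size: between-group SS divided by total SS."""
    values = [v for g in groups for v in g]
    grand = sum(values) / len(values)
    ss_total = sum((v - grand) ** 2 for v in values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    return ss_between / ss_total


# Hypothetical raw p-values for four descriptors.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
# Hypothetical descriptor values in three patient groups.
print(eta_squared([[1, 2, 3], [2, 3, 4], [5, 6, 7]]))  # 0.8125
```

Eta-squared is the share of total variance explained by group membership, which is why it complements p values when the sample is small.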

2.4 Convolutional neural network (CNN) model

MR T1- and T2-weighted modalities were used to train the network.

The applied CNN consisted of 4 convolutional layers (kernel size of \(3\,\times \,3\) pixels), followed by max-pooling (kernel size of \(2\,\times \,2\) pixels, a stride of 2) and batch normalisation modules. Each hidden layer was activated with the use of ReLU. Additionally, an image augmentation routine, including various image transformations such as rotation, rescaling and shifting, was applied, which doubled the training set to 11,520 2D scans. Nadam was used as the optimiser with a learning rate of 0.001. The simplified structure of the proposed network is shown in Fig. 1.
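To give a feel for the spatial dimensions involved, the sketch below traces the feature map size through four convolution and pooling blocks for a \(160 \times 160\) input. It assumes that every convolutional layer has stride 1 and is followed by one pooling step; the padding is not stated in the source, so both a padded and an unpadded variant are shown. The function names are hypothetical.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_feature_map_sizes(input_size=160, n_blocks=4, conv_padding=1):
    """Feature map side length after each conv (3x3) + max-pool (2x2, stride 2) block."""
    sizes = []
    size = input_size
    for _ in range(n_blocks):
        size = conv_output_size(size, kernel=3, stride=1, padding=conv_padding)
        size = conv_output_size(size, kernel=2, stride=2)
        sizes.append(size)
    return sizes


print(trace_feature_map_sizes(conv_padding=1))  # [80, 40, 20, 10]
print(trace_feature_map_sizes(conv_padding=0))  # [79, 38, 18, 8]
```

Under either assumption the last convolutional feature map is roughly \(10 \times 10\), which is why class activation maps have to be upsampled before they can be overlaid on the original image.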

The CNN model was built and trained to estimate the parameters used for creating class activation maps from the network layers. Several ready-to-use solutions exist for generating an activation map. The most commonly used is Grad-CAM [22, 23], which computes the gradient of the score for a considered class with respect to the feature map activations of the last convolutional layer. The resulting map of contributions to the prediction is rescaled and resized to match the original image size. CNN training was performed independently in three one-versus-other classification scenarios. The patient-level leave-one-out cross-validation scheme was applied, as presented in Fig. 2. Gradient-weighted class activation maps, rescaled to the targeted resolution, were constructed for each patient, and the regions with the highest activation values averaged across all patients were identified.
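The core Grad-CAM computation can be sketched in a few lines, assuming the activations of the last convolutional layer and the gradients of the class score with respect to them have already been extracted from the trained network. The function name and the toy inputs are hypothetical, and resizing the map to the image resolution is omitted.

```python
def grad_cam(activations, gradients):
    """Grad-CAM heat map from last-conv-layer activations and gradients.

    Both arguments are nested lists shaped [channels][height][width];
    `gradients` holds d(class score)/d(activation).
    """
    height, width = len(activations[0]), len(activations[0][0])
    # Channel weights: global average of the gradients over spatial positions.
    weights = [sum(sum(row) for row in g) / (height * width) for g in gradients]
    # Weighted sum of feature maps followed by ReLU.
    cam = [[max(0.0, sum(w * a[y][x] for w, a in zip(weights, activations)))
            for x in range(width)] for y in range(height)]
    # Rescale to [0, 1]; an all-zero map is returned unchanged.
    peak = max(max(row) for row in cam)
    return [[v / peak for v in row] for row in cam] if peak > 0 else cam


# Toy example: two 2x2 feature maps with constant gradients 0.5 and -1.0.
activations = [[[1, 2], [3, 4]], [[1, 1], [1, 1]]]
gradients = [[[0.5, 0.5], [0.5, 0.5]], [[-1, -1], [-1, -1]]]
print(grad_cam(activations, gradients))  # [[0.0, 0.0], [0.5, 1.0]]
```

The ReLU keeps only locations that push the class score up, so the map highlights evidence for the considered class rather than against it.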

The segmentation of the identified distinguishing brain area was performed with MiMSeg [24]. The relative volume and folding of the segmented areas were estimated, and basic descriptive statistics within each patient group were calculated. As in the statistical analysis of the clinical parameters, the hypothesis of mean value equality was verified by applying ANOVA after checking that the test assumptions were upheld. Post hoc pairwise comparisons were performed to identify group differences. A p value of less than 0.05 was considered statistically significant.

Fig. 2 Diagram of a single loop of the leave-one-out validation experiment. The experiment was repeated for each patient independently
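The patient-level leave-one-out scheme can be sketched as a generator of train/test splits. This is a minimal illustration with hypothetical names and toy records; the point it shows is that all 2D scans of the held-out patient are kept out of training, so no patient contributes to both sides of a split.

```python
def leave_one_patient_out(records):
    """Yield (held_out_patient, train, test) splits at the patient level.

    `records` is a list of (patient_id, scan) pairs; all scans of the held-out
    patient go to the test set so that no patient appears on both sides.
    """
    patients = sorted({pid for pid, _ in records})
    for held_out in patients:
        train = [r for r in records if r[0] != held_out]
        test = [r for r in records if r[0] == held_out]
        yield held_out, train, test


# Toy data: three hypothetical patients with two scans each.
records = [("P1", "a"), ("P1", "b"), ("P2", "c"),
           ("P2", "d"), ("P3", "e"), ("P3", "f")]
for held_out, train, test in leave_one_patient_out(records):
    print(held_out, len(train), len(test))
```

Splitting by scan instead of by patient would let neighbouring slices of the same brain appear in both sets and would inflate the apparent performance.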

3 Results

The severity of cognitive impairment could be assessed based on significant brain areas that are identified by visualising activation maps for a given disease entity. Regions of the highest importance were found in the frontal, posterior, temporal and cingulate cortex, gyrus lingualis, cerebellum, caudate nucleus and thalamus; exemplary maps are shown in Figs. 3, 4 and 5.

Fig. 3 Class activation maps for the strongest features reflecting the brain region identified in the comparison of patients with no dementia and patients diagnosed with MCI

Fig. 4 Class activation maps for the strongest features reflecting the brain region identified in the comparison of patients with no dementia and patients with dementia

Fig. 5 Class activation maps for the strongest features reflecting the brain region identified in the comparison of patients with dementia and patients diagnosed with MCI

Fig. 6 Average activation maps of the cerebellum for the three different patient groups

Fig. 7 Distribution of the relative cerebellum volume and folding among three patient groups

The cerebellum was chosen for further detailed analysis because it showed the highest importance in the automated diagnosis performed by the trained convolutional neural network. The average activation maps of the cerebellum for the three patient groups are presented in Fig. 6. The cerebella were automatically segmented with MiMSeg. Cerebellum volume and folding were calculated and compared across the patient groups to verify the hypothesis of cerebellum importance in the context of dementia development. The distributions of the obtained values are shown in Fig. 7.

The hypothesis of among-group mean value equality of relative cerebellum volume and folding was investigated with the use of ANOVA tests (Supplementary File 2). Statistically significant changes in relative cerebellum volume (Kruskal–Wallis ANOVA p value = 0.0176) and cerebellum folding (p value = 0.0057), both of large effect size [25], were identified among the groups of patients, showing that volume and folding decrease with the progress of dementia. The findings remain significant in the subgroup of age-matched patients (p = 0.0273 in both cases), with an even larger effect size.
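As an illustration of the Kruskal–Wallis test for three groups, the sketch below computes the H statistic from average ranks and the asymptotic p value, which for two degrees of freedom reduces to \(\exp(-H/2)\). No tie correction is applied, and the function names and data values are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mid = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mid
        i = j + 1
    return ranks


def kruskal_wallis_three_groups(g1, g2, g3):
    """H statistic and asymptotic p value (chi-square, 2 df, no tie correction)."""
    groups = [g1, g2, g3]
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    p = math.exp(-h / 2)  # chi-square survival function for 2 df
    return h, p


# Hypothetical values, three per group, with no overlap between groups.
h, p = kruskal_wallis_three_groups([0.10, 0.11, 0.12],
                                   [0.07, 0.08, 0.09],
                                   [0.04, 0.05, 0.06])
print(round(h, 2), round(p, 4))  # 7.2 0.0273
```

With three patients per group and complete rank separation, the statistic reaches its maximum of 7.2, giving an asymptotic p value of about 0.0273. This coincides with the value reported for the age-matched subgroup, although the source does not state the underlying ranks, so the correspondence is only an observation.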

4 Discussion

It is well documented that grey matter volume loss and cortical Lewy body pathology are involved in the development of cognitive decline in Parkinson’s disease [26]. In cross-sectional studies of Parkinson’s disease dementia (PDD), greater atrophy was observed in the frontal and temporal lobes compared to controls [27, 28]. Our findings are consistent with these studies, identifying the areas most involved in cognitive decline within the same brain regions.

Widespread atrophic changes are especially present in the limbic/paralimbic area [29, 30]. Abnormalities associated with cognitive impairment within the posterior cingulate cortex (PCC), which is a part of the limbic system, arise from neurodegenerative processes. Moreover, the finding of metabolic deficits in this region in PDD patients strongly suggests the possible influence of Alzheimer’s-like pathology in PD dementia [26, 29, 31], because the region is heavily connected to the entorhinal cortex. Our results have also shown strong differences in the PCC between PD patients with normal cognition and PD patients with dementia.

These subcortical atrophic changes are consistent with pathological studies of PD dementia [28, 30]. The predominant pattern of left hemisphere cortical atrophy in PD was described by Classen et al., who also found that the rates of left hemisphere cortical atrophy were strongly correlated with disease duration [32]. Other reports demonstrate that left-lateralised atrophy and left predominance are found in different neurodegenerative disorders with dementia [33, 34].

In our study, the strongest association between the severity of cognitive impairment and cortical atrophy was detected in the cerebellum. More recent pathological studies have shown that alpha-synuclein aggregation pathology is present in the cerebellum [35].

5 Conclusions

Our analysis showed that convolutional neural networks can be successfully used for feature engineering even in small-sample studies. The proposed method utilises only widely available basic MRI sequences and, after carefully designed image preprocessing and normalisation, can detect changes for which more time-consuming and expensive medical tests are currently being used.

6 Limitations

Some limitations of our study should be recognised. We recruited a relatively small group of PD patients. However, we obtained significant and strong results in all discussed brain regions, and our results are in agreement with related literature. Additionally, the classical p values are supported by sample-size-independent effect size estimates. Another limitation was the fact that our PDD group was the oldest, and the age at disease onset was also higher in this group. To limit the impact of these confounding factors, an additional validation on a smaller group of age-matched patients was performed, which confirmed our findings. Based on the clinical variability in PD, two types of the disease can be distinguished: one with onset at an older age and another with onset at a younger age [36]. The groups are characterised by rapid and slower disease progression, respectively, and with advanced age, the risk of developing dementia in PD patients increases to 80% [37]. However, ageing usually results in diffuse, not focal, cortical thinning [38], and our results showed a specific pattern of focal cortical and subcortical atrophy related strictly to cognitive impairment, suggesting that age was not a confounding factor.