Introduction

Alzheimer’s disease (AD), the most common form of dementia, accounting for 60–70% of cases, typically has its onset after 65 years of age. AD is a progressive, incurable neurodegenerative disease characterized by a gradual decline in cognition, memory, and thinking (Kumar et al. 2022).

Current therapeutic approaches offer only limited relief of symptoms and slowing of disease progression, and there is no definitive treatment (Lane et al. 2018; Brookmeyer et al. 1998). A major challenge is therefore to develop novel methods for early detection, in order to slow or prevent disease progression.

Sporadic cases of AD arise from a complex genetic architecture involving many risk loci, each with a small individual effect (Tesi et al. 2021). Indeed, AD-associated SNPs are shared with other medical conditions and human traits (Tesi et al. 2021). Genetic correlation analysis based on phenome-wide screening generates novel hypotheses about risk conditions and comorbid events of AD (Liu and Crawford 2022). The rapid rise of high-throughput technologies (e.g., microarray and next-generation sequencing) over the past years has led to a significant increase in novel computational methods for many diseases, including AD (van IJzendoorn et al. 2019).

Genome-wide association studies (GWAS) are essential tools for addressing this complexity and opening up new therapeutic avenues. Previous GWAS in AD demonstrated the involvement of the immune system and lipid metabolism and identified several genes and genetic variants related to lipid metabolism (Baloni et al. 2022; Kunkle et al. 2019). In addition, recent studies demonstrated that cardiovascular and lifestyle factors, as well as diabetes, obesity, and hypertension, can increase the risk of developing AD (Broce et al. 2019; Desikan et al. 2015). Adewuyi et al. showed an association between gut and brain, suggesting a shared genetic susceptibility between gastrointestinal disorders and AD risk (Adewuyi et al. 2022). Another group of traits genetically associated with AD relates to dietary habits (Squitti et al. 2014). For example, a lower incidence of AD has been reported in subjects following a Mediterranean diet (Gardener et al. 2012). A dietary lack of micronutrients such as vitamins B1, C, and folate has been related to cognitive decline in elderly people (Solfrizzi et al. 2011). However, genetic correlation is not a diagnostic tool, but a method to establish the genetic similarity between complex traits (van IJzendoorn et al. 2019).

Most machine learning algorithms proposed for AD classification are based on phenotypic data such as imaging, and few studies use genetic data (Lee et al. 2018). Since 2012, deep learning, a branch of machine learning, has shown good performance in several areas outside biology (Abiodun et al. 2018). Lately, several studies have demonstrated the potential of deep learning to address biological questions as well, for example as diagnostic tools (Zhu et al. 2020a; Rukhsar et al. 2022). Deep learning refers to machine learning algorithms composed of deep neural networks. Several studies address biological applications of neural networks, but few focus on how network architectures could improve model performance (Bellot et al. 2018; Yu et al. 2019; Wilentzik Müller and Gat-Viks 2020). There are several challenges in obtaining the optimal neural network model for a classification problem (Rukhsar et al. 2022). In particular, the performance of neural network models can be limited by the amount of data, which can lead to overfitting (Esteva et al. 2019). To address this problem, researchers have developed several techniques such as regularization methods, dropout, class balancing, and feature selection (Esteva et al. 2019). However, there is no single best method, because it is difficult to compare the performance of each network architecture on the same dataset (Nusrat and Jang 2018; Moolayil 2019). In addition, unlike for imaging or text data, classifiers based on neural networks are still novel in gene expression analysis (Hanczar et al. 2022).

In this work, using genome-wide association statistics from public datasets, we explored the genetic correlation between AD and many other human traits. In addition, we compared different methods to reduce overfitting using gene expression profiles of Alzheimer patients derived from 5 published datasets. The dimensionality of the gene expression profiles was reduced with principal component analysis (PCA), as it transforms the features into a lower-dimensional space while accounting for the relationships between them. Furthermore, after a procedure to avoid unbalanced classes, we evaluated 4 network models, assessing classifier performance with accuracy, sensitivity, and specificity.

The aims of our study are (i) to identify mechanisms of AD genetic liability that could be connected to different human traits, (ii) to explore artificial neural network models for AD diagnosis, and (iii) to compare different artificial neural network models using gene expression profiles of AD patients. In particular, the present findings could (i) motivate future studies assessing the impact of several traits on AD risk and (ii) open new potential frontiers in the study of AD.

Materials and methods

Genome-wide association studies

GWAS summary statistics of AD were downloaded from GWAS Atlas (Jansen et al. 2019). The dataset consists of a European cohort and includes participants of both sexes. We used the summary statistics of 71,880 AD cases and 383,378 controls (Jansen et al. 2019).

We also considered genome-wide association statistics of other phenotypes and diseases, derived from the UK Biobank (UKB) (Bycroft et al. 2018). This dataset was downloaded from http://www.nealelab.is/uk-biobank/ (accessed on 4 February 2022).

The UK Biobank enrolled approximately 500,000 participants aged 40–69 years, of both sexes from the UK (Bycroft et al. 2018). UKB participants were analyzed for a wide range of phenotypic information such as diet, educational status, cognitive function, social activities, health status, and other phenotypes.

Data quality control was performed separately for each dataset (AD and UKB).

In particular, we calculated SNP-heritability for AD and UKB phenotypes and considered for further analyses only phenotypes with SNP-heritability z > 4.

In addition, the AD and UKB genome-wide association statistics were processed by removing SNPs with a minor allele frequency (MAF) < 1%. More detailed descriptions of the quality control steps are available at https://github.com/Nealelab/UK_Biobank_GWAS.

Genetic correlation analysis

We estimated the genetic correlation between the AD phenotype and the other phenotypes included in UKB. To perform the genetic correlation analysis, we used the linkage disequilibrium score regression (LDSC) package (https://github.com/bulik/ldsc, accessed May 2022). LDSC uses linkage disequilibrium (LD) patterns to model the distribution of SNP effect sizes, thereby estimating the magnitude and direction of the correlation between phenotypes.

We used SNPs present in the HapMap 3 reference panel and, as reference data, the individuals of European ancestry from the 1000 Genomes Project. We performed the genetic correlation analysis between the AD phenotype and the UKB traits with SNP-based heritability z score > 4, in line with the guidelines of the LDSC developers. We considered genetic correlations statistically significant when their FDR was less than 0.05.

Gene expression data

Five publicly available datasets of gene expression profiles of Alzheimer patients (GSE1297, GSE5281, GSE36980, GSE29378, and GSE48350) were downloaded from the Gene Expression Omnibus (GEO). These datasets contain gene expression profiles of the hippocampus of Alzheimer patients, as this brain area is involved in the early stages of the disease (Quarato et al. 2022). We focused on the hippocampus because we hypothesize that this brain region plays a fundamental role in several traits associated with AD. Table 1 shows the number of samples for Alzheimer patients and controls in each dataset.

Table 1 Number of samples for each class

Training and testing sets

We split each GEO dataset into two sets: a training set and a testing set. The neural network was trained on the training set (70% of the original dataset) and evaluated on the testing set (30%).

To avoid unbalanced datasets, in which the number of samples in one class (e.g., Alzheimer) is greater than in another (e.g., control), we performed random oversampling. This approach resamples the minority class until it matches the majority class.

In addition, we standardized each dataset separately, scaling each feature to zero mean and unit variance.
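The split, oversampling, and standardization steps described above can be sketched together as follows (a minimal NumPy illustration; the function name and fixed random seed are our assumptions, and the standardization statistics are fitted on the training set only):

```python
import numpy as np

rng = np.random.default_rng(0)

def split_oversample_standardize(X, y, train_frac=0.7):
    """70/30 split, random oversampling of the minority class,
    then per-feature standardization using training-set statistics."""
    n = len(y)
    idx = rng.permutation(n)
    cut = int(train_frac * n)
    tr, te = idx[:cut], idx[cut:]
    X_tr, y_tr, X_te, y_te = X[tr], y[tr], X[te], y[te]

    # random oversampling: duplicate minority-class samples until balanced
    classes, counts = np.unique(y_tr, return_counts=True)
    minority = classes[np.argmin(counts)]
    deficit = counts.max() - counts.min()
    if deficit > 0:
        pool = np.flatnonzero(y_tr == minority)
        extra = rng.choice(pool, size=deficit, replace=True)
        X_tr = np.vstack([X_tr, X_tr[extra]])
        y_tr = np.concatenate([y_tr, y_tr[extra]])

    # standardize: zero mean, unit variance per feature
    mu, sd = X_tr.mean(axis=0), X_tr.std(axis=0) + 1e-8
    return (X_tr - mu) / sd, y_tr, (X_te - mu) / sd, y_te
```

Applying the training-set mean and standard deviation to the testing set avoids information leaking from the testing data into the preprocessing.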

Feature selection

The presence of unrelated features in a dataset can decrease the accuracy of the models. In the feature selection step, we selected a subset of features, which contributes to reducing overfitting (Moolayil 2019). PCA was used to decrease the dimensionality of the datasets and to identify the key components on the standardized training data (Moolayil 2019). The number of principal components was chosen to retain 95% of the variance of the training data. The same components, fitted on the training data, were applied to the testing data.
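The PCA step can be sketched as follows (a NumPy implementation via singular value decomposition; in practice a library routine such as scikit-learn's PCA would typically be used, and the function names here are ours):

```python
import numpy as np

def fit_pca(X_train, var_target=0.95):
    """Fit PCA on the (standardized) training data, keeping the smallest
    number of components that explains >= var_target of the variance."""
    mean = X_train.mean(axis=0)
    Xc = X_train - mean
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    var_ratio = S**2 / np.sum(S**2)            # variance explained per component
    k = int(np.searchsorted(np.cumsum(var_ratio), var_target) + 1)
    return mean, Vt[:k], k

def apply_pca(X, mean, components):
    """Project new data (e.g., the testing set) onto the training components."""
    return (X - mean) @ components.T
```

Fitting the components on the training data and reusing them on the testing data keeps the two sets in the same reduced feature space.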

Neural network models

Similar to other machine learning methods, a neural network model comprises (i) a training step, in which the parameters of the network are estimated from a given training dataset, and (ii) a testing step, in which the trained network predicts the classes of new input data.

The neurons in our models are organized in 4 fully connected layers, where every node in one layer is connected to every node in the next layer.

In all models, the four layers were defined as follows: the input (first) layer consists of a number of neurons equal to the number of features (i.e., key components derived from PCA); the first hidden layer has 17 neurons and the second hidden layer 8 neurons; the output layer returns the predicted class.

Each neuron calculates a weighted sum of its inputs and applies an activation function. We used the rectified linear unit (ReLU) at each node of the network for all models (Glorot et al. 2011). It is the most commonly used activation function and outputs 0 for negative inputs, following the formula:

$$f\left(x\right)=\max\left(0,x\right)$$

A sigmoid activation function is used in the output layer to identify the class to be predicted for all models:

$$\mathrm{sigmoid}\left(x\right)=\frac{1}{1+\exp\left(-x\right)}$$

where x is the weighted input to the neuron.
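The two activation functions, combined into the 4-layer architecture described above, can be sketched as a minimal NumPy forward pass (the weight initialization and function names are ours, for illustration only):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)            # f(x) = max(0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))      # sigmoid(x) = 1 / (1 + exp(-x))

def init_params(n_features, rng):
    """Random weights and zero biases for the input -> 17 -> 8 -> 1 network."""
    sizes = [(n_features, 17), (17, 8), (8, 1)]
    return [(rng.normal(scale=0.1, size=s), np.zeros(s[1])) for s in sizes]

def forward(x, params):
    """Forward pass through the 4-layer fully connected network."""
    (W1, b1), (W2, b2), (W3, b3) = params
    h1 = relu(x @ W1 + b1)               # first hidden layer (17 neurons)
    h2 = relu(h1 @ W2 + b2)              # second hidden layer (8 neurons)
    return sigmoid(h2 @ W3 + b3)         # output layer: probability of class 1
```

The output is a probability in (0, 1), which is thresholded (typically at 0.5) to obtain the predicted class.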

The Adam stochastic gradient descent optimization is used as the optimizer in all our models to train the network (Kingma and Ba 2014). It updates the parameters so as to decrease the loss function. Gradient descent uses the first derivative of the loss with respect to the parameters to modify the parameters of the model. Specifically, Adam iteratively adjusts the weights of the model on the training set to minimize the loss (Kingma and Ba 2014).
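The Adam update rule can be illustrated with a minimal NumPy sketch (a single parameter update with the default hyper-parameters of Kingma and Ba 2014; the function name and interface are ours):

```python
import numpy as np

def adam_step(w, grad, state, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: bias-corrected running averages of the
    gradient (m) and the squared gradient (v) scale the step size."""
    m, v, t = state
    t += 1
    m = b1 * m + (1 - b1) * grad          # first-moment estimate
    v = b2 * v + (1 - b2) * grad**2       # second-moment estimate
    m_hat = m / (1 - b1**t)               # bias correction
    v_hat = v / (1 - b2**t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, (m, v, t)
```

For example, repeatedly applying this step to the gradient of a simple quadratic loss drives the parameter toward its minimum.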

Table 2 shows the parameters considered for all 4 models. We set the parameters as in Izadkhah (2022).

Table 2 Description of parameters used in artificial neural network (ANN) models

We tested 4 different neural network architectures for the binary classification problem, which differ in loss function, metrics, dropout, and weight regularization.

The loss functions minimized during training were binary cross-entropy or mean squared logarithmic error. The loss function is used to evaluate the classifier through the model error and to quantify how well the model fits (Rengasamy et al. 2020).
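The two loss functions can be written explicitly as NumPy sketches (the clipping constant in the cross-entropy is our addition for numerical stability):

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy between true labels and predicted probabilities."""
    p = np.clip(y_pred, eps, 1 - eps)    # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def mean_squared_log_error(y_true, y_pred):
    """Mean squared difference of log(1 + x) between true and predicted values."""
    return np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2)
```

Both losses are smallest when the predictions match the true labels; binary cross-entropy, however, penalizes confident wrong predictions much more strongly, which may partly explain the performance differences reported below.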

Common strategies to reduce the overfitting of a neural network are adding dropout or introducing a penalty (weight regularization).

Dropout can be used to decrease the overfitting of the model; it consists of randomly removing a subset of nodes during training (Srivastava et al. 2014).
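As an illustration, inverted dropout (the variant used by Keras, which rescales the kept activations at training time and does nothing at test time) can be sketched as:

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero a random fraction `rate` of the nodes
    during training and rescale the remainder by 1 / (1 - rate)."""
    if not training or rate == 0.0:
        return activations               # identity at test time
    keep = rng.random(activations.shape) >= rate
    return activations * keep / (1.0 - rate)
```

The rescaling keeps the expected activation magnitude unchanged, so no adjustment is needed when dropout is switched off at test time.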

Table 3 shows the 4 different models of neural networks used.

Table 3 Description of neural network models

Summarizing, the 4 models are organized as follows:

  • First model: binary cross-entropy as loss function, Adam as optimization algorithm, and binary accuracy as metric. Binary cross-entropy measures the cross-entropy loss between the predicted classes and the true labels.

  • Second model: mean squared logarithmic error as loss function, Adam as optimization algorithm, and accuracy as metric. The mean squared logarithmic error is calculated between the true and predicted classes.

  • Third model: mean squared logarithmic error as loss function, Adam as optimization algorithm, and accuracy as metric, with dropout applied between the second and third layers to reduce overfitting (dropout rate 0.5). Dropout trains a randomly selected subset of nodes instead of all nodes, changing the selected nodes regularly during the training process (Srivastava et al. 2014).

  • Fourth model: mean squared logarithmic error as loss function, Adam as optimization algorithm, and accuracy as metric, with weight regularization to reduce overfitting (regularization hyper-parameter 0.001). Weight regularization reduces overfitting by constraining the weight distribution, adding a regularization term to the cost (loss) function (Maki 2019). In our study, the penalty follows the L2 criterion: the summed squared weights are added to the loss function (Maki 2019).

In all models, we used the “early stopping” callback of the Keras package (https://keras.io/callbacks/#earlystopping), which regularly checks the loss on the testing data and stops the training process when there is no significant improvement. The minimum acceptable improvement was set to 0.005, and if the loss does not improve over the last 5 epochs, training terminates. To reduce time and memory usage, the model was trained with a batch size of 8 and run for a maximum of 200 epochs.
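The early-stopping logic can be illustrated with a minimal pure-Python sketch (simplified relative to the Keras callback, which also supports restoring the best weights and monitoring different quantities):

```python
def early_stopping(losses, min_delta=0.005, patience=5):
    """Return the epoch at which training stops: when the monitored loss
    has not improved by at least `min_delta` for `patience` epochs."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(losses):
        if best - loss > min_delta:      # significant improvement
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch             # stop training here
    return len(losses) - 1               # ran to the last epoch
```

With min_delta = 0.005 and patience = 5, training halts 5 epochs after the loss plateaus.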

Finally, we compared the performance of the 4 models (sensitivity, specificity, and accuracy) on the testing set for each GEO dataset. It must be noted that neural networks rely on stochastic algorithms, so performance on the same data with the same model can differ slightly between runs. To obtain more realistic results, we calculated the average sensitivity, specificity, and accuracy over 10 runs of the same model.
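The reported metrics can be computed from the confusion-matrix counts as follows (a NumPy sketch; the function name is ours):

```python
import numpy as np

def classification_metrics(y_true, y_pred):
    """Sensitivity, specificity, and accuracy from binary predictions."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))   # true positives
    tn = np.sum((y_true == 0) & (y_pred == 0))   # true negatives
    fp = np.sum((y_true == 0) & (y_pred == 1))   # false positives
    fn = np.sum((y_true == 1) & (y_pred == 0))   # false negatives
    sensitivity = tp / (tp + fn)                 # true positive rate
    specificity = tn / (tn + fp)                 # true negative rate
    accuracy = (tp + tn) / len(y_true)
    return sensitivity, specificity, accuracy
```

Averaging these three quantities over the 10 runs yields the values reported in the Results.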

The neural network model code was implemented in Python using the keras package (version 2.10).

Results

Genetic correlation

After the quality control step, the number of SNPs in the AD GWAS was reduced from 13,367,299 to 9,736,043. Of the 4000 UKB phenotypes, only 957 passed quality control.

Genetic correlation analysis can indicate whether AD genetic liability is shared with other traits and external factors. We found 65 traits correlated with AD (Table 4).

Table 4 The table shows genetic correlation (GC) with the respective standard error and associated FDR

Neural network models

As the first aim of this part of our work, we investigated whether neural networks can be used as a tool for Alzheimer diagnosis (i.e., classifying Alzheimer vs. control) using the 5 gene expression datasets. For each of these datasets, we explored different neural network models.

All neural network models consist of three layers: the number of input nodes equals the number of input features (i.e., key components derived from PCA); the hidden layer contains 50% as many nodes as the input layer; and, as this is a binary classification task, the models require only one output node. Figure 1 shows the described neural network.

Fig. 1
figure 1

Neural network structure used in the study. The input layer contains the results of the principal component analysis; the output layer consists of one node describing the class of the sample

We investigated 4 neural network models. As the most basic structure, we examined a neural network that uses binary cross-entropy as loss function and binary accuracy as the metric to evaluate the model during training. This classification model was most accurate on GSE5281 (accuracy 0.78, sensitivity 0.68, and specificity 0.88) and achieved a good average performance across all datasets (accuracy 0.66, sensitivity 0.62, and specificity 0.712).

We then tested classification using a neural network model based on mean squared logarithmic error as loss function and accuracy as metric. The average performance across all GEO datasets dropped markedly: accuracy 0.546, sensitivity 0.524, and specificity 0.566. Similar results were obtained with the third model, which used dropout: accuracy 0.554, sensitivity 0.492, and specificity 0.628.

A slight improvement was achieved with weight regularization in the fourth model: accuracy 0.582, sensitivity 0.6, and specificity 0.566.

Table 5 shows the performance of the classifier for each GEO dataset.

Table 5 Performance (accuracy, sensitivity, and specificity with standard deviation) for each neural network model and for each GEO dataset

Overall, the best performances were achieved with the first and fourth model (Fig. 2).

Fig. 2
figure 2

Comparison of performance for each neural network model

Discussion

Sporadic AD is the most common form of dementia. It results from the effects of many risk loci with small individual effects. In the present study, we performed a genetic correlation analysis between genome-wide association statistics of AD derived from GWAS Atlas and human traits from the UK Biobank.

We observed that AD was mainly associated with fluid intelligence score, medical conditions, diet, and activities. Regarding diet, AD is positively associated with cereal and salt intake and inversely correlated with dried fruit and alcohol intake. Further studies should be performed to understand the potential beneficial effect of alcohol consumption and the negative effect of salt intake.

Another macro-area with multiple AD genetically correlated phenotypes is anthropometric measurements: AD is positively associated with standing height and inversely correlated with leg fat percentage (left), high light scatter reticulocyte count, leg fat percentage (right), and body mass index (BMI).

As the hippocampus, a part of the cerebral cortex, plays a central role in several traits that we found to be associated with AD, we applied artificial neural network models to gene expression profiles of the hippocampus of AD patients.

Regarding the development of diagnostic tools for AD, we explored the role of artificial neural networks based on gene expression of the hippocampus of Alzheimer patients.

An artificial neural network, an emergent field of machine learning, is a computational model of interconnected nodes, inspired by neurons in the human brain, designed to solve complex problems. It uses one or more hidden layers, an activation function, and hyper-parameters to process the input and generate the output.

Recent studies in bioinformatics have proposed the use of neural networks for molecular classification of diseases from gene expression and multi-omics data (Qiu et al. 2022; Shao et al. 2021). Many studies have focused on the comparison between artificial neural networks and other machine learning methods, demonstrating that artificial neural networks are more flexible and work on different types of data (e.g., discrete or continuous) (Esteva et al. 2019; Biganzoli et al. 1998; Zhu et al. 2020b).

However, few studies have evaluated different procedures to avoid overfitting and improve the performance of artificial neural networks on gene expression datasets (Hanczar et al. 2022; Zhu et al. 2020b; Chen et al. 2016). This may be explained by the great number of hyper-parameters to test.

Our study compared 4 neural network models applied to gene expression datasets of Alzheimer patients, showing that the simple basic neural network model achieves better performance than more complex models with dropout or weight regularization (accuracy 0.66, sensitivity 0.62, and specificity 0.712). However, increasing the number of samples in the datasets could further improve model performance and confirm these results. Indeed, dataset size is a critical aspect that can influence model performance: larger datasets typically lead to better performance, and small datasets may cause overfitting (Prusa et al. 2015). Supervised machine learning methods also depend on the diversity and quality of the dataset to generalize well (Leguy et al. 2021).

In line with our results, a previous study found that simple neural network models obtained performance similar to other, more complex methods (Zhu et al. 2020b). Although the hyper-parameter values used in this study are closely tied to our data, we can suggest the use of a simple basic neural network for gene expression classification. In addition, binary cross-entropy as loss function seems to perform better than mean squared logarithmic error. Notably, weight regularization seems to reduce overfitting better than dropout.

Conclusion

In conclusion, the present study used genetic correlation analysis to suggest several mechanisms by which AD could be associated with different human traits. These traits can be grouped into 9 clusters: medical conditions, fluid intelligence, education, anthropometric measures, employment status, activity, diet, lifestyle, and sexuality. However, correlation does not necessarily imply causation, namely a cause-and-effect relationship between two variables. To establish causality, further studies are needed that can identify cause-effect relationships more reliably. In addition, further studies should be conducted to fully understand the impact of the underlying SNPs on these relationships.

Regarding the neural network models, we compared the most suitable schemes for artificial neural networks applied to gene expression datasets of patients with Alzheimer. Our results showed that the simple basic neural network model achieved the best performance (66% accuracy). To our knowledge, there is no similar research in the literature, and more studies are needed to fully define standard procedures for achieving more efficient results. It would also be interesting to explore more sophisticated deep neural networks while increasing the size of the datasets.