Identifying Alzheimer’s disease-related proteins by LRRGD

Zhao, Tianyi; Hu, Yang; Zang, Tianyi; Cheng, Liang

doi:10.1186/s12859-019-3124-7

Identifying Alzheimer’s disease-related proteins by LRRGD

Research
Open access
Published: 25 November 2019

Volume 20, article number 570, (2019)
Cite this article

Download PDF

You have full access to this open access article

BMC Bioinformatics Aims and scope Submit manuscript

Identifying Alzheimer’s disease-related proteins by LRRGD

Download PDF

Tianyi Zhao¹^na1,
Yang Hu¹^na1,
Tianyi Zang¹ &
…
Liang Cheng²

2634 Accesses
14 Citations
11 Altmetric
1 Mention
Explore all metrics

Abstract

Background

Alzheimer’s disease (AD) imposes a heavy burden on society and every family. Therefore, diagnosing AD in advance and discovering new drug targets are crucial, while these could be achieved by identifying AD-related proteins. The time-consuming and money-costing biological experiment makes researchers turn to develop more advanced algorithms to identify AD-related proteins.

Results

Firstly, we proposed a hypothesis “similar diseases share similar related proteins”. Therefore, five similarity calculation methods are introduced to find out others diseases which are similar to AD. Then, these diseases’ related proteins could be obtained by public data set. Finally, these proteins are features of each disease and could be used to map their similarity to AD. We developed a novel method ‘LRRGD’ which combines Logistic Regression (LR) and Gradient Descent (GD) and borrows the idea of Random Forest (RF). LR is introduced to regress features to similarities. Borrowing the idea of RF, hundreds of LR models have been built by randomly selecting 40 features (proteins) each time. Here, GD is introduced to find out the optimal result. To avoid the drawback of local optimal solution, a good initial value is selected by some known AD-related proteins. Finally, 376 proteins are found to be related to AD.

Conclusion

Three hundred eight of three hundred seventy-six proteins are the novel proteins. Three case studies are done to prove our method’s effectiveness. These 308 proteins could give researchers a basis to do biological experiments to help treatment and diagnostic AD.

ReRF-Pred: predicting amyloidogenic regions of proteins based on their pseudo amino acid composition and tripeptide composition

Article Open access 09 November 2021

AMYPred-FRL is a novel approach for accurate prediction of amyloid proteins by using feature representation learning

Article Open access 11 May 2022

Machine learning prediction and tau-based screening identifies potential Alzheimer’s disease genes relevant to immunity

Article Open access 11 February 2022

Background

Alzheimer’s disease [1] has become the greatest threat to the elderly. At present, there is no effective drug for AD. Many studies have reported that neurodegenerative diseases such as Alzheimer’s disease are closely related to aging diseases and can interact with each other [2, 3]. Many scholars reported that abnormal behavior of specific proteins is the key to cause AD [4, 5]. This is because the main pathological feature of AD patients is that a large number of beta amyloid (A beta) deposits are formed outside the neurons in the cortex and hippocampus and neurofibrillary tangles (NFT) are formed in neurons with tau protein as the main component [6, 7].

Recently, finding alternatives to diagnosing AD has become a hot issue [8]. Ray et al. found 18 plasma proteins have high specificity in AD patients. They then found that these proteins were associated with Aβ and tau levels in CSF. Then the Human Discovery Multi-Analyte Profile (MAP) has become a popular tool to identify plasma analytes. But, these exciting results raise a major issue that it is hard to reproduce these protein panels [8]. Gisslen M et al. [9] found that the correlation between CSF and plasma NFL was stronger than tau. Olsson B et al. [10] confirmed this view, and they found that the NFL was increasing in both AD patients and MCI’s CSF. Studies have found this phenomenon in serum and plasma samples as well [11]. O’Bryant et al. [12] used a serum-based algorithm to distinguish AD from Parkinson’s disease and cross-validated this algorithm. At present, biological experiments and bioinformatics methods are the most widely used methods. Lista et al. [13] reviewed the blood biomarkers of AD disease based on mass spectrometry. They concluded that about 20 proteins may be potential biomarkers of AD diseases. They also emphasized that the molecular level of neurodegenerative diseases (such as AD) may change 20 years before the onset of clinical symptoms.

Complex protein interactions could be researched by Protein-protein interaction (PPI) network [2, 14, 15]. Most PPI networks are built based on genes’ relationship. Shubhabrata et al. [16] used dense module searching (DMS) method to integrate gene-wide association results into PPI network and identified candidate genes or sub-networks for AD. However, most of protein networks are static network which has highly average and idealized network structures. In fact, with the change of external conditions, some proteins will be degraded, while others will be translated [17]. This would result in the new protein interactions and disappearance of old protein interactions.

Based on the prior knowledge of protein interaction and biology, some researchers use machine learning [17, 18] and pattern classification methods [19] to predict diseases-related protein interaction. Machine learning methods include Bayesian network method [20], Markov model method [21], Random Forest method [22] and Support Vector Machine method [23] etc. Barber et al. [24] uses Simulated Annealing (SA) to select the proteins most relevant to AD and uses Random Forest (RF) to classify patients based on these proteins. The best model trained in serum can significantly predict disease status with AUC of 0.66. At the same time, training with serum data and testing by CSF data, the AUC is 0.77. However, machine learning method usually needs negative samples, but in fact, negative samples are hard to obtain.

Therefore, in this paper, we consider the problem of identifying AD-related proteins as a regression problem, which makes it unnecessary for us to obtain negative sets. This can greatly improve the accuracy of recognition and reduce the false positive rate.

Methods

Data collection and database content

Disease ontology

Three thousand five hundred twenty-four kinds of diseases are downloaded from Disease Ontology (DO) which is an authoritative website that contains comprehensive disease related knowledge [25]. The concept of each disease or disease is a node in DO. Each node has an ID. There is a subordinate relationship between nodes. Similarity between AD and other diseases could be obtained based on DO using similarity calculation methods.

Uniprot

UniProt [26] consists of three parts: UniProt Knowledgebase (UniProt), which is the information access center of protein sequence, function, classification, cross-reference, etc. UniProt Non-redundant Reference (UniRef) database, which combines closely related protein sequences into a single record to improve search speed; currently, three sub-libraries are formed according to sequence similarity, namely UniRef100, UniRef90 and UniRef50; UniProt Archive (UniParc) is a repository that records the history of all protein sequences. Users can query database by text, search database by BLAST program, or download data directly by FTP. All known diseases-related proteins could be obtained by UniProt.

Gene ontology

Gene ontology (GO) is one of the most successful ontology in the field of biomedicine. It provides a standard and accurate term set for describing the molecular function, biological process and other related information, which is widely used in the field of biomedical research.

Disease similarity

Firstly, similarity between AD and other diseases are calculated by five methods. At present, these five methods are widely used: Resnik’s [27], Lin’s [28], Wang’s [5], Process-similarity Based (PSB) [29], SemFunSim [30].

The principle of Resnik’s method and Lin’s method is same. Both of them calculate similarity by GO terms, but Resnik’s method uses the information content (IC) of the most informative common ancestor (MICA) between two terms. However, Wang’s method improves Resnik’s method. It considers multiple common ancestors. PSB: associations of GO terms are considered. Semfensim: semantic and gene functional association are intergrated to calculate similarity. Since it is hard to recognize which method is the best, all of them are used to calculate similarities. Finally, 3524 diseases’ similarity with AD are calculated. Therefore, each disease gets 5 different similarity values, and we add these five values together as the final similarity.

Figure 1 shows all the similarities which are higher than 1 between 3524 diseases and AD. Two thousand six hundred sixty-three of three thousand five hundred twenty-four diseases’ similarity is lower than 1, so they did not show in the Fig. 1. As we can see, since 99% diseases’ similarities are less than 3.5, 3.5 is set as a threshold to retain only a small number of diseases most associated with AD.

Finally, there are 34 diseases left. Table 1 shows their similarity with AD and the names of them.

Table 1 Similarities between AD and other diseases by five different methods

Full size table

Extracting features

Firstly, the 34 disease’s name are obtained by the ID of DO. Then, we obtained 34 disease-related proteins on the Uniprot. To ensure the accuracy of the results, only human and reviewed proteins are selected.

We excluded two disease: DOID: 936 ‘brain disease’ and DOID: 14332 ‘postencephalitic Parkinson disease’. Brain disease is related to more than 2000 proteins and it is a large group of diseases and includes AD. postencephalitic Parkinson disease has no related information in Uniprot, so we removed this disease from data too. Therefore, 32 diseases are left and we obtained 32 diseases-related proteins by Uniprot.

Figure 2 shows the number of proteins for each protein. AD is related to 299 proteins. Therefore, 33 kinds of diseases are related to 2827 proteins. Some of the 2827 proteins are duplicated, which indicates that similar diseases share similar proteins. Firstly, we removed the redundant proteins and 1608 kinds of proteins are left. To our surprise, 43.1% proteins are redundant. So there must be some AD-related proteins that we have not known that they are related to AD, but we have known that they act on AD’s similar diseases.

As we mentioned before, proteins are the features for similarity. Therefore, the dimension of feature’s matrix is 1608. Each disease corresponds to a 1608*1 feature matrix.

Each protein has a weight for similarity and it represents its relationship with AD. Constantly iterating over these weights so that they can map to similarities and get their relationship with AD.

Map features to similarity by logistics regression

Firstly, we normalized all diseases’ similarity. All similarities are transformed into a number between 0 and 1. The similarity between AD and AD itself should be the max number in all methods. For Resnik method, the max number is 4 and other methods are 1. Therefore, the max similarity is 8. Then we could normalize all other diseases’ similarity by eq. (1).

$$ {similarity}_{normalized}= similarity/8 $$

(1)

Thirty-two diseases are 32 samples and 1608 proteins are 1608 dimensions of feature. It is a typical high dimension and small samples problem. LR could hardly solve this problem. Therefore, we borrowed the idea of Random Forests (RF). Forty features (proteins) are randomly selected to build model each time. The 40 features (proteins) would be put back after building model. We selected 40 features because $ \sqrt{1608}\approx 40 $. This is the typical way to select the number of features in RF. We would repeat 400 times so that each protein would be selected nearly 10 times.

After building models every time, GD is used to find out optimal result. Since GD is easy to get local optimal solution rather than global optimal solution, we used the known AD-related protein as the initial value of the iteration. In this way, the initial value is very close to the global optimal solution so we can get the global optimal solution with fewer iterations.

Figure 3 shows the work flow of selecting features and building models.

The workflow of LR is shown in Table 2.

Table 2 Work flow of LR

Full size table

Through the above steps, we can build a logistic regression function: $ {h}_{\theta }(x)=\frac{1}{1+{e}^{-{\theta}^Tx}} $. X which is our input is 1608 proteins for each disease, the output h(x) is the similarity between each disease and AD.

Obviously, the similarity between disease and AD is not the result we hope to obtain. So if we can find a suitable weight for each protein, the similarity between AD and AD itself would be 1. Then, the weight is reasonable and we can obtain the AD-related protein by these weights.

Find AD-related proteins by gradient descent

Therefore, Gradient Descent (GD) is introduced to solve the model obtained by LR.

GD is a kind of optimization method. The work flow of GD is shown in Table 3.

Table 3 Work flow of GD

Full size table

Through the above steps, feature matrix of AD-related proteins are obtained. The 1 in matrix represents that this protein is related to AD.

Figure 4 shows our workflow. Firstly, the similarity between AD and other diseases could be calculated. Then We can get diseases similar to AD. In addition, these diseases-related proteins could be obtained by Uniprot. Finally, LR could be used to build models. After that, GD should be used to obtain the optimal results.

Results

A.
Data process
B.
Result

Since 400 models are built by LR, 400 kinds of results are obtained. Each protein has 10 times chances to be selected as features and algorithm can judge whether it is related to AD. Therefore, the maximum number of times for each protein to be related to AD is 10, and the minimum number is 0.

Figure 5 shows the times that proteins are thought to be related to AD.

As we can see in Fig. 5, more than 500 kinds of proteins are unrelated to AD. Algorithm never gets results that they are AD-related proteins. However, about 50 kinds of proteins are identified to be related to AD for 10 times.

Seven times is set as a threshold to select AD-related proteins. If proteins are thought to be related to AD more than 7 times by algorithm, the proteins are related to AD. Otherwise, we did not consider them as AD-related proteins. There are 376 such proteins.

The Fig. 6 shows the proportion of newly discovered proteins and known proteins.

As we can see, 18% of 376 proteins are known AD-related proteins. Most of proteins are associated with AD-like diseases and researchers do not know that they are associated with AD.

C.
Case study

Three case studies are done to verify our method’s effectiveness. We selected three novel proteins from 308 novel AD-related proteins.

1.
SUMO-conjugating enzyme UBC9

In UniProt, there is no information about the relationship between this protein and AD. Our method identifies the strong correlation between AD and AD. (10 times). Several research have found that UBC9 plays an important role in AD due to its function is associated with the aggregation of beta-amyloid protein (Aβ). It can interact with target protein and change their localization, activity, or stability. LE Mcmillan et al. [31] demonstrated this in 2011.
2.
Kinesin light chain 2 (KLC2)

APP is known important gene to AD. KLC2 can interact with APP and it is considered to be related to AD. Kamal et al. [32] reported that KLC2 can affect transport of APP into axons. S Matsuda et al. ‘s study [33] also demonstrated that KLC2 causes AD by affecting APP.
3.
Kinesin heavy chain isoform 5C (KIF5C)

KIF5A showed pan-neuronal distribution in the nervous system. KIF5B plays an important roles in the maintenance of motor neurons rather than in their formation. D Sepulvedafalla et al. [34] found that KIF5C are highly related to familial AD and neurodegeneration.

Discussion

Identifying the AD-related proteins can help us treatment and diagnose AD better. It saves lots of researchers’ time and money. Doing biological experiments by the priority is an efficient way to understand the mechanism of AD.

Here we purposed a method to identify the AD-related proteins based on a hypothesis which is similar disease share similar proteins. Here is no doubt that proteins have contribution to the similarity of symptoms between diseases.

Therefore, the first step is to calculate the similarity between other diseases and AD. We totally used 5 methods to obtain the similarity. 3.5 was set as threshold to screen diseases which are most related to AD. There are 34 diseases left. Then, we downloaded these diseases-related proteins by Uniprot. Due to the reason mentioned in method section C, 2 diseases are excluded.

Then we aggregate the proteins that correspond to these diseases. Each protein is a one-dimensional feature, and we try to map these features to similarity. Because this is a small sample of high-dimensional problems, the use of LR alone is not enough to solve this problem. Here, we borrowed the idea of RF: randomly selected 40 features to build model by LR each time. Then, GD is introduced to find out the optimal result. After 400 models are built, we summarized the whole results and set 7 as threshold to screen the AD-related proteins.

Finally, we obtained 376 proteins which are related to AD. Three hundred eight of three hundred seventy-six proteins are novel. We selected three of them to do case studies to prove our method’s effectiveness.

Conclusions

Identification of disease-related proteins is essential for developing new drugs and understanding the pathogenesis. In view of the shortcomings of current machine learning methods and protein interaction networks, we propose a regression method, which can effectively avoid the shortcomings of obtaining negative samples and the inability of the network to change dynamically. It provides a new way to solve disease-related proteins, that is, to transform classification or clustering problems into regression problems.

This paper proposes a hypothesis that similar diseases share similar proteins. A total of 2827 proteins were obtained by searching 32 disease-related proteins in Uniprot, but they are only 1608 kinds of proteins, which shows that this hypothesis is valid. Similar diseases have multiple protein duplications.

In the aspect of algorithm innovation, we combine LR with RF to solve the problem of small sample and high dimension. In order to overcome the problem that GD often falls into local optimum, we get a very reasonable initial iteration value.

The results show that this method has certain practical value and is helpful for further research. Through our method, we can find more disease-related proteins.

Availability of data and materials

All the datasets used in this paper could be downloaded from website.

Abbreviations

AD:: Alzheimer’s disease
DO:: Disease Ontology
GD:: Gradient Descent
GO:: Gene Ontology
IC:: Information Content
KIF5C:: Kinesin heavy chain isoform 5C
LR:: Logistic Regression
MICA:: Most Informative Common Ancestor
NFT:: Neurofibrillary Tangles
PPI:: Protein-protein interaction
PSB:: Process-similarity Based
RF:: Random Forests
SUMO:: Small Ubiquitin-related Modifier

References

Cummings J, Lee G, Mortsdorf T, Ritter A, Zhong K. Alzheimer’s disease drug development pipeline: 2017. Alzheimers Dement Transl Res Clin Interv. 2017;3(3):367–84.
Article Google Scholar
Peng J, Guan J, Shang X. Predicting Parkinson's disease genes based on node2vec and autoencoder. Front Genet. 2019;10:226.
Article CAS PubMed PubMed Central Google Scholar
Hu Y, Zhao T, Zang T, Zhang Y, Cheng L. Identification of Alzheimer’s disease-related genes based on data integration method. Front Genet. 2018;9:730.
Jellinger KA. General aspects of neurodegeneration. J neural Transm Suppl. J Neural Transm Suppl. 2003;65(65):101–44.
Article Google Scholar
Wang JZ, Zhidian D, Rapeeporn P, Yu PS, Chin-Fu C. A new method to measure the semantic similarity of GO terms. Bioinformatics. 2007;23(10):1274–81.
Article CAS PubMed Google Scholar
Navarromabarak C, Camachocarranza R, Espinosaaguirre JJ. Cytochrome P450 in the central nervous system as a therapeutic target in neurodegenerative diseases. Drug Metab Rev. 2018;50(2):1–14.
Google Scholar
Leon MJD, Convit A, Wolf OT, Tarshish CY, Desanti S, Rusinek H, Tsui W, Kandil E, Scherer AJ, Roche A. Prediction of cognitive decline in normal elderly subjects with 2-[18F]fluoro-2-deoxy-d-glucose/positron-emission tomography (FDG/PET). Proc Natl Acad Sci U S A. 2001;98(19):10966–71.
Article PubMed PubMed Central Google Scholar
Henriksen K, O’Bryant SE, Hampel H, Trojanowski JQ, Montine TJ, Jeromin A, Blennow K, Lönneborg A, Wyss-Coray T, Soares H. The future of blood-based biomarkers for Alzheimer's disease. Alzheimers Dement. 2014;10(1):115–31.
Article PubMed Google Scholar
Zetterberg H, Wilson D, Andreasson U, Minthon L, Blennow K, Randall J, Hansson O. Plasma tau levels in Alzheimer's disease. Alzheimers Res Ther. 2013;5(2):9.
Article CAS PubMed PubMed Central Google Scholar
Olsson B, Lautner R, Andreasson U, Öhrfelt A, Portelius E, Bjerke M, Hölttä M, Rosén C, Olsson C, Strobel G. CSF and blood biomarkers for the diagnosis of Alzheimer's disease: a systematic review and meta-analysis. Lancet Neurol. 2016;15(7):673–84.
Article CAS PubMed Google Scholar
Bacioglu M, Maia LF, Preische O, Schelle J, Apel A, Kaeser SA, Schweighauser M, Eninger T, Lambert M, Pilotto A. Neurofilament light chain in blood and CSF as marker of disease progression in mouse models and in neurodegenerative diseases. Neuron. 2016;91(1):56–66.
Article CAS PubMed Google Scholar
O'Bryant SE, Xiao G, Zhang F, Edwards M, German DC, Yin X, Como T, Reisch J, Huebinger RM, Graff-Radford N. Validation of a serum screen for Alzheimer's disease across assay platforms, species, and tissues. J Alzheimers Dis. 2014;42(4):1325–35.
Article CAS PubMed PubMed Central Google Scholar
Lista S, Dubois B, Hampel H. Paths to Alzheimer’s disease prevention: from modifiable risk factors to biomarker enrichment strategies. J Nutr Health Aging. 2015;19(2):154–63.
Article CAS PubMed Google Scholar
Peng J, Wang Y, Jin C. Towards integrative gene functional similarity measurement. BMC Bioinformatics. 2014;15(2):1–10.
Google Scholar
Peng J, Wang X, Shang X. Combining gene ontology with deep neural networks to enhance the clustering of single cell RNA-Seq data. BMC Bioinformatics. 2019;20(8):284.
Article PubMed PubMed Central Google Scholar
Mukherjee S, Kaeberlein M, Kauwe J, Naj AC, Crane P. A systems-biology approach to identify candidate genes for Alzheimer's disease by integrating protein-protein interaction network and subsequent in vivo validation of candidate genes using a C. elegans model of ab toxicity. Alzheimers Dement. 2014;10(4):P298–9.
Article Google Scholar
Peng J, Hui W, Li Q, Chen B, Hao J, Jiang Q, Shang X, Wei Z. A learning-based framework for miRNA-disease association identification using neural networks. Bioinformatics. 2019;21(1):4364-71.
Cheng L, Hu Y, Sun J, Zhou M, Jiang Q. DincRNA: a comprehensive web-based bioinformatics toolkit for exploring disease associations and ncRNA function. Bioinformatics. 2018;34(11):1953–6.
Article CAS PubMed Google Scholar
Cheng L, Wang P, Tian R, Wang S, Guo Q, Luo M, Zhou W, Liu G, Jiang H, Jiang Q. LncRNA2Target v2. 0: a comprehensive database for target genes of lncRNAs in human and mouse. Nucleic Acids Res. 2018;47(D1):D140–4.
Article PubMed Central CAS Google Scholar
Fu C, Deng S, Jin G, Wang X, Yu Z-G. Bayesian network model for identification of pathways by integrating protein interaction with genetic interaction data. BMC Syst Biol. 2017;11(4):81.
Article PubMed PubMed Central CAS Google Scholar
Krejci A, Hupp TR, Lexa M, Vojtesek B, Muller P. Hammock: a hidden Markov model-based peptide clustering algorithm to identify protein-interaction consensus motifs in large datasets. Bioinformatics. 2015;32(1):9–16.
PubMed PubMed Central Google Scholar
Xu L, Liao C, Chen G-D, Chang C-C. k-skip-n-gram-RF: a random Forest based method for Alzheimer’s disease protein identification. Front Genet. 2019;10:33.
Article CAS PubMed PubMed Central Google Scholar
Cui Y, Cai M, Stanley HE. Discovering disease-associated genes in weighted protein–protein interaction networks. Physica A Stat Mech Appl. 2018;496:53–61.
Article CAS Google Scholar
Barber IS, Nevado-Holgado AJ, Lovestone S. A Parkinson’s disease protein biomarker panel using the Somamer assay and machine learning. Alzheimers Dement. 2017;13(7):P1027.
Article Google Scholar
Schriml LM, Arze C, Nadendla S, Chang YWW, Mazaitis M, Felix V, Feng G, Kibbe WA. Disease ontology: a backbone for disease semantic integration. Nucleic Acids Res. 2012;40(Database issue):940–6.
Article CAS Google Scholar
Consortium UP. UniProt: a hub for protein information. Nucleic Acids Res. 2015;43(Database issue):204–12.
Article CAS Google Scholar
Resnik P. Using information content to evaluate semantic similarity in a taxonomy; 1999. p. 448–53.
Google Scholar
Lin D. An information-theoretic De nition of similarity; 1998.
Google Scholar
Mathur S, Dinakarpandian D. Finding disease similarity based on implicit semantic similarity. J Biomed Inform. 2012;45(2):363–71.
Article PubMed Google Scholar
Cheng L, Li J, Ju P, Peng J, Wang Y. SemFunSim: a new method for measuring disease similarity by integrating semantic and gene functional association. PLoS One. 2014;9(6):e99415.
Article PubMed PubMed Central Google Scholar
Mcmillan LE, Brown JT, Henley JM, Cimarosti H. Profiles of SUMO and ubiquitin conjugation in an Alzheimer's disease model. Neurosci Lett. 2011;502(3):201–8.
Article CAS PubMed PubMed Central Google Scholar
Kamal A, Stokin GB, Yang Z, Xia CH, Goldstein LSB. Axonal transport of amyloid precursor protein is mediated by direct binding to the Kinesin light chain subunit of Kinesin-I. Neuron. 2000;28(2):449–59.
Article CAS PubMed Google Scholar
Matsuda S, Matsuda Y, D'Adamio L. Amyloid beta protein precursor (AbetaPP), but not AbetaPP-like protein 2, is bridged to the kinesin light chain by the scaffold protein JNK-interacting protein 1. J Biol Chem. 2003;278(40):38601.
Article CAS PubMed Google Scholar
Sepulvedafalla D, Barreraocampo A, Hagel C, Korwitz A, Vinuezaveloz MF, Zhou K, Schonewille M, Zhou H, Velazquezperez L, Rodriguezlabrada R. Familial Alzheimer’s disease–associated presenilin-1 alters cerebellar activity and calcium homeostasis. J Clin Investig. 2014;124(4):1552–67.
Article CAS Google Scholar

Download references

Acknowledgements

The authors thank the anonymous referees for their many useful suggestions.

About this supplement

This article has been published as part of BMC Bioinformatics Volume 20 Supplement 18, 2019: Selected articles from the Biological Ontologies and Knowledge bases workshop 2018. The full contents of the supplement are available online at https://bmcbioinformatics.biomedcentral.com/articles/supplements/volume-20-supplement-18.

Funding

The publication cost of this article was funded by the Fundamental Research Funds for the Central Universities (Grant No. HIT NSRIF 201856), National Natural Science Foundation of China (Grant No. 61502125), Heilongjiang Postdoctoral Fund (Grant No. LBH-Z6064 and LBH-Z15179), and China Postdoctoral Science Foundation (Grant No. 2016 M590291).

Author information

Tianyi Zhao and Yang Hu contributed equally to this work.

Authors and Affiliations

School of Life Science and Technology, Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
Tianyi Zhao, Yang Hu & Tianyi Zang
College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150001, China
Liang Cheng

Authors

Tianyi Zhao
View author publications
You can also search for this author in PubMed Google Scholar
Yang Hu
View author publications
You can also search for this author in PubMed Google Scholar
Tianyi Zang
View author publications
You can also search for this author in PubMed Google Scholar
Liang Cheng
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

TY Zang, and LC helped revise this paper. YH and TY Zhao wrote this paper and did the experiments. All authors have read and approved the final manuscript.

Corresponding authors

Correspondence to Tianyi Zang or Liang Cheng.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Reprints and permissions

About this article

Cite this article

Zhao, T., Hu, Y., Zang, T. et al. Identifying Alzheimer’s disease-related proteins by LRRGD. BMC Bioinformatics 20 (Suppl 18), 570 (2019). https://doi.org/10.1186/s12859-019-3124-7

Download citation

Published: 25 November 2019
DOI: https://doi.org/10.1186/s12859-019-3124-7

Identifying Alzheimer’s disease-related proteins by LRRGD

Abstract

Background

Results

Conclusion

Similar content being viewed by others

ReRF-Pred: predicting amyloidogenic regions of proteins based on their pseudo amino acid composition and tripeptide composition

AMYPred-FRL is a novel approach for accurate prediction of amyloid proteins by using feature representation learning

Machine learning prediction and tau-based screening identifies potential Alzheimer’s disease genes relevant to immunity

Background

Methods

Data collection and database content

Disease ontology

Uniprot

Gene ontology

Disease similarity

Extracting features

Map features to similarity by logistics regression

Find AD-related proteins by gradient descent

Results

Discussion

Conclusions

Availability of data and materials

Abbreviations

References

Acknowledgements

About this supplement

Funding

Author information

Authors and Affiliations

Contributions

Corresponding authors

Ethics declarations

Ethics approval and consent to participate

Consent for publication

Competing interests

Additional information

Publisher’s Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation