Frontiers of Electrical and Electronic Engineering in China

Volume 5, Issue 2, pp 107–118

Comparative study of network-based prioritization of protein domains associated with human complex diseases

Research Article

DOI: 10.1007/s11460-010-0018-x

Cite this article as:
Zhang, W., Chen, Y. & Jiang, R. Front. Electr. Electron. Eng. China (2010) 5: 107. doi:10.1007/s11460-010-0018-x

Abstract

Domains are the basic structural and functional units of proteins. Exploring associations between protein domains and human inherited diseases should therefore improve our understanding of the pathogenesis of human complex diseases and benefit their prevention, diagnosis, and treatment. Based on the assumption that deleterious nonsynonymous single nucleotide polymorphisms (nsSNPs) underlying human complex diseases may change the structures of protein domains, affect the functions of the corresponding proteins, and ultimately result in these diseases, we compile a dataset of 1174 associations between 433 protein domains and 848 human disease phenotypes. With this dataset, we compare two approaches (guilt-by-association and correlation coefficient) that use a domain-domain interaction network and a phenotype similarity network to prioritize associations between candidate domains and human disease phenotypes. We implement these methods with three distance measures (direct neighbor, shortest path with Gaussian kernel, and diffusion kernel), demonstrate their effectiveness in three large-scale leave-one-out cross-validation experiments (random control, simulated linkage interval, and whole-genome scan), and evaluate their performance by three criteria (mean rank ratio, precision, and AUC score). Results show that both methods effectively rank disease-associated domains at the top of the candidate list, with the correlation coefficient approach achieving slightly higher performance in most cases. Finally, taking advantage of the fact that the correlation coefficient method does not require known disease-domain associations, we use it to calculate a genome-wide landscape of associations between 4036 protein domains and 5080 human disease phenotypes, and we offer a freely accessible web interface to this landscape.
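To make the evaluation criteria more concrete, the following is a minimal Python sketch of how a mean rank ratio and a top-1 precision could be computed over leave-one-out validation runs. The abstract does not give the exact definitions, so this assumes that the rank ratio is the rank of the held-out domain divided by the number of candidates, and that precision is the fraction of runs in which the held-out domain ranks first. The function names, domain labels, and scores are hypothetical and are not taken from the article.

```python
from statistics import mean


def rank_of(scores, held_out):
    """Return the 1-based rank of the held-out domain among all candidates.

    scores: dict mapping candidate domain -> score (higher = stronger association).
    Ties are resolved pessimistically: candidates with an equal score
    are counted as ranking ahead of the held-out domain.
    """
    target = scores[held_out]
    return sum(1 for s in scores.values() if s >= target)


def summarize(runs):
    """Compute (mean rank ratio, top-1 precision) over validation runs.

    runs: list of (scores, held_out) pairs, one per leave-one-out run.
    Both definitions are assumptions made for illustration.
    """
    ranks = [(rank_of(scores, held_out), len(scores)) for scores, held_out in runs]
    mean_rank_ratio = mean(r / n for r, n in ranks)
    top1_precision = mean(1.0 if r == 1 else 0.0 for r, _ in ranks)
    return mean_rank_ratio, top1_precision


# Hypothetical scores for two validation runs with four candidate domains each.
runs = [
    ({"D1": 0.9, "D2": 0.1, "D3": 0.5, "D4": 0.2}, "D1"),  # held-out domain ranks 1st
    ({"D1": 0.3, "D2": 0.8, "D3": 0.5, "D4": 0.2}, "D3"),  # held-out domain ranks 2nd
]
mean_rank_ratio, top1_precision = summarize(runs)
```

Under these assumptions a lower mean rank ratio indicates better prioritization, since the held-out domain sits closer to the top of the candidate list.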

Keywords

protein domains, disease phenotypes, prioritization, guilt-by-association, correlation coefficient

Copyright information

© Higher Education Press and Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  1. MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST/Department of Automation, Tsinghua University, Beijing, China
  2. School of Sciences, University of Jinan, Jinan, China