Data Mining and Knowledge Discovery, Volume 11, Issue 3, pp 213–222

Accurate Prediction of Protein Disordered Regions by Mining Protein Structure Data

  • Jianlin Cheng
  • Michael J. Sweredoski
  • Pierre Baldi

DOI: 10.1007/s10618-005-0001-y

Cite this article as:
Cheng, J., Sweredoski, M.J. & Baldi, P. Data Min Knowl Disc (2005) 11: 213. doi:10.1007/s10618-005-0001-y

Abstract

Intrinsically disordered regions in proteins are relatively frequent and important for our understanding of molecular recognition and assembly, and of protein structure and function. From an algorithmic standpoint, flagging large disordered regions is also important for ab initio protein structure prediction methods. Here we first extract a curated, non-redundant data set of protein disordered regions from the Protein Data Bank and compute relevant statistics on the length and location of these regions. We then develop an ab initio predictor of disordered regions called DISpro, which uses evolutionary information in the form of profiles, predicted secondary structure and relative solvent accessibility, and ensembles of 1D-recursive neural networks. DISpro is trained and cross-validated using the curated data set. The experimental results show that DISpro achieves an accuracy of 92.8% with a false positive rate of 5%. DISpro is a member of the SCRATCH suite of protein data mining tools available through http://www.igb.uci.edu/servers/psss.html.
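
As a rough illustration of how the two reported figures relate to per-residue predictions, the sketch below computes accuracy and false positive rate from binary labels. The function name `disorder_metrics`, the 0/1 label encoding, and the toy sequences are assumptions made for illustration; they are not taken from DISpro, and the evaluation protocol used in the paper may differ.

```python
from typing import Dict, Sequence


def disorder_metrics(true_labels: Sequence[int],
                     predicted: Sequence[int]) -> Dict[str, float]:
    """Per-residue accuracy and false positive rate.

    Assumed encoding (hypothetical): 1 = disordered residue, 0 = ordered residue.
    """
    if len(true_labels) != len(predicted):
        raise ValueError("label sequences must have the same length")

    tp = fp = tn = fn = 0
    for truth, pred in zip(true_labels, predicted):
        if truth == 1 and pred == 1:
            tp += 1          # disordered residue correctly flagged
        elif truth == 0 and pred == 1:
            fp += 1          # ordered residue wrongly flagged as disordered
        elif truth == 0 and pred == 0:
            tn += 1          # ordered residue correctly left unflagged
        else:
            fn += 1          # disordered residue missed

    total = tp + fp + tn + fn
    negatives = fp + tn
    return {
        # fraction of all residues classified correctly
        "accuracy": (tp + tn) / total if total else 0.0,
        # fraction of ordered residues wrongly predicted as disordered
        "false_positive_rate": fp / negatives if negatives else 0.0,
    }


# Toy example: 10 residues, 2 of them truly disordered.
true_labels = [0, 0, 0, 0, 1, 1, 0, 0, 0, 0]
predicted = [0, 0, 0, 1, 1, 0, 0, 0, 0, 0]
metrics = disorder_metrics(true_labels, predicted)
print(metrics)
```

Because ordered residues typically outnumber disordered ones, accuracy alone can look high even for a weak predictor, which is one reason to report it together with the false positive rate.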

Keywords

  • protein structure prediction
  • disordered regions
  • recursive neural networks

Copyright information

© Springer Science + Business Media, Inc 2005

Authors and Affiliations

  • Jianlin Cheng (1)
  • Michael J. Sweredoski (1)
  • Pierre Baldi (1)
  1. School of Information and Computer Science, Institute for Genomics and Bioinformatics, University of California Irvine, Irvine, USA