Dynamic Outlier Exclusion Training Algorithm for Sequence Based Predictions in Proteins Using Neural Network

  • Shandar Ahmad
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4774)


Many structural and functional properties of proteins can be described as a one-dimensional, one-to-one mapping between the residues of a protein sequence and a target structure or function. These residue-level properties (RLPs) have frequently been predicted using neural networks and other machine learning algorithms. Here we present an algorithm that dynamically excludes from neural network training the examples that are most difficult to separate. The algorithm automatically filters out statistical outliers that cause noise and makes training faster without reducing the network's ability to generalize. Different methods of sampling data for neural network training were tried, and their impact on learning was analyzed.
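
The abstract does not state the exclusion criterion, so the following is only a minimal sketch of the general idea under stated assumptions, not the paper's algorithm. A single logistic unit stands in for the neural network, "most difficult to separate" is taken to mean the largest absolute prediction error, and the exclusion set is recomputed every epoch so that an example can re-enter training later. The names `train_with_dynamic_exclusion`, `exclude_frac`, and `warmup` are hypothetical.

```python
import numpy as np


def sigmoid(z):
    """Logistic activation."""
    return 1.0 / (1.0 + np.exp(-z))


def train_with_dynamic_exclusion(X, y, epochs=200, lr=0.5,
                                 exclude_frac=0.1, warmup=20):
    """Gradient-descent training of one logistic unit with per-epoch exclusion.

    Assumption (not from the paper): after `warmup` epochs, the
    `exclude_frac` fraction of examples with the largest absolute error
    is left out of that epoch's weight update.
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    n_drop = int(round(exclude_frac * n))
    keep = np.ones(n, dtype=bool)

    for epoch in range(epochs):
        p = sigmoid(X @ w + b)          # current predictions
        err = np.abs(p - y)             # per-example difficulty score

        # Recompute the mask from scratch each epoch (the "dynamic" part).
        keep = np.ones(n, dtype=bool)
        if epoch >= warmup and n_drop > 0:
            hardest = np.argsort(err)[-n_drop:]
            keep[hardest] = False

        # Cross-entropy gradient over the retained examples only.
        grad = (p - y)[keep]
        w -= lr * (X[keep].T @ grad) / keep.sum()
        b -= lr * grad.mean()

    return w, b, keep
```

Calling `train_with_dynamic_exclusion(X, y)` with a feature matrix `X` of shape `(n, d)` and binary labels `y` returns the weights, the bias, and the boolean mask of examples retained in the final epoch. Each update touches fewer examples, which is one way the faster training mentioned in the abstract could arise. Whether generalization is preserved depends on the data and on the chosen fraction.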


Binding sites, Neural networks, Sequence information, Outliers



Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Shandar Ahmad (1)
  1. National Institute of Biomedical Innovation, Saito Asagi, Ibaraki-shi, Osaka, Japan
