GENN: A GEneral Neural Network for Learning Tabulated Data with Examples from Protein Structure Prediction

Protocol
Part of the Methods in Molecular Biology book series (MIMB, volume 1260)

Abstract

We present a GEneral Neural Network (GENN) for learning trends from existing data and predicting unknown information. The main novelty of GENN lies in its generality, its simplicity of use, and its specific handling of windowed input/output. Its main strength is its efficient handling of input data, which enables learning from large datasets. GENN is built on a two-layered neural network and can use either separate input–output pairs or window-based data, with data structures that represent the input–output pairs efficiently. The program was tested on predicting the accessible surface area of globular proteins, scoring proteins according to their similarity to the native structure, and predicting protein disorder, and it performed remarkably well. In this paper we describe the program and its use. As a specific example, we describe the construction of a similarity-to-native protein scoring function using GENN. The source code and Linux executables for GENN are available from Research and Information Systems at http://mamiris.com and from the Battelle Center for Mathematical Medicine at http://mathmed.org. Bugs and problems with the GENN program should be reported to EF.
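
To make the idea of window-based data concrete, the following is a minimal sketch of one way windowed inputs can be represented without storing a copy of every window: the table of per-position feature rows is kept once, and each window is assembled on demand. The class name `WindowedTable`, the `half_width` parameter, the zero padding at the ends, and the toy values are all assumptions made for illustration; the abstract does not specify GENN's actual data structures or padding scheme.

```python
from typing import List, Sequence


class WindowedTable:
    """Builds fixed-width windows over a table of per-position feature rows.

    Hypothetical illustration only; this is not GENN's actual data structure.
    The table is stored once and each window is assembled when requested.
    """

    def __init__(self, rows: Sequence[Sequence[float]], half_width: int):
        if half_width < 0:
            raise ValueError("half_width must be non-negative")
        self.rows = rows
        self.half_width = half_width
        self.n_features = len(rows[0]) if rows else 0

    def __len__(self) -> int:
        return len(self.rows)

    def window(self, center: int) -> List[float]:
        """Concatenate the rows in [center - w, center + w].

        Positions that fall outside the table are zero-padded (an assumption).
        """
        if not 0 <= center < len(self.rows):
            raise IndexError(center)
        flat: List[float] = []
        for i in range(center - self.half_width, center + self.half_width + 1):
            if 0 <= i < len(self.rows):
                flat.extend(self.rows[i])
            else:
                flat.extend([0.0] * self.n_features)
        return flat


# Toy table: 4 positions with 2 features each (values are made up).
table = WindowedTable([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]], half_width=1)
first = table.window(0)   # window centered on the first position
middle = table.window(2)  # window centered on an interior position
```

In this sketch every window has length `(2 * half_width + 1) * n_features`, so each one could be fed to a network input layer of fixed size, while memory use grows with the table rather than with the number of windows.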

Key words

Neural network; Protein scoring; Windowed input; Automatic learning; GENN; Protein structure prediction

Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  1. Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, Indiana, USA
  2. Battelle Center for Mathematical Medicine, Nationwide Children’s Hospital, Columbus, USA
  3. Physics Division, Research and Information Systems, LLC, Carmel, USA
  4. Battelle Center for Mathematical Medicine, Nationwide Children’s Hospital, Columbus, USA
  5. Department of Pediatrics, The Ohio State University, Columbus, USA