GENN: A GEneral Neural Network for Learning Tabulated Data with Examples from Protein Structure Prediction

  • Protocol
Artificial Neural Networks

Part of the book series: Methods in Molecular Biology (MIMB, volume 1260)

Abstract

We present a GEneral Neural Network (GENN) for learning trends from existing data and making predictions of unknown information. The main novelty of GENN lies in its generality, its simplicity of use, and its specific handling of windowed input/output. Its main strength is its efficient handling of the input data, which enables learning from large datasets. GENN is built on a two-layered neural network and can use either separate input–output pairs or window-based data, with data structures that represent the input–output pairs efficiently. The program was tested on predicting the accessible surface area of globular proteins, scoring proteins according to their similarity to the native structure, and predicting protein disorder, and it performed remarkably well on these tasks. In this paper we describe the program and its use. Specifically, we give as an example the construction, using GENN, of a scoring function that measures similarity to the native protein structure. The source code and Linux executables for GENN are available from Research and Information Systems at http://mamiris.com and from the Battelle Center for Mathematical Medicine at http://mathmed.org. Bugs and problems with the GENN program should be reported to EF.
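
The window-based input mentioned above can be illustrated with a minimal sketch. It is not taken from the GENN source code: the class name `WindowedTable`, its parameters, and the zero padding at the ends of the table are all assumptions made for illustration. The idea shown is one plausible reading of "efficiently represent input–output pairs": each row of the table is stored once, and the window around a row is assembled only when that row is requested, instead of every window being written out in advance.

```python
from typing import List, Sequence, Tuple


class WindowedTable:
    """Hypothetical container (not part of GENN): stores each table row once
    and builds the window around row i only when it is requested."""

    def __init__(self, rows: Sequence[Sequence[float]], targets: Sequence[float],
                 half_width: int, pad_value: float = 0.0) -> None:
        if not rows:
            raise ValueError("rows must not be empty")
        if len(rows) != len(targets):
            raise ValueError("rows and targets must have the same length")
        self.rows = rows
        self.targets = targets
        self.half_width = half_width
        # One shared padding row stands in for positions beyond either end
        # of the table (an assumption of this sketch).
        self.pad_row = [pad_value] * len(rows[0])

    def __len__(self) -> int:
        return len(self.rows)

    def __getitem__(self, i: int) -> Tuple[List[float], float]:
        if not 0 <= i < len(self.rows):
            raise IndexError(i)
        window: List[float] = []
        # Concatenate the rows from i - half_width to i + half_width.
        for j in range(i - self.half_width, i + self.half_width + 1):
            row = self.rows[j] if 0 <= j < len(self.rows) else self.pad_row
            window.extend(row)
        return window, self.targets[i]


# Three rows with two features each, e.g. one row per residue.
table = WindowedTable(rows=[[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]],
                      targets=[10.0, 20.0, 30.0], half_width=1)
first_input, first_target = table[0]
```

With N rows of F features and a half-width of w, this layout keeps N × F stored values, whereas writing out every window explicitly would take N × (2w + 1) × F values.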

Acknowledgements

We gratefully acknowledge the financial support provided by the National Institutes of Health (NIH) through Grants R01GM072014 and R01GM073095 and by the National Science Foundation through Grant NSF MCB 1071785. Both authors would like to thank the organizers of the CASP10 conference in Gaeta, Italy, for inviting them to the conference and providing free registration to EF. EF would also like to thank Yaoqi Zhou and Keith Dunker for hosting him at IUPUI and for general discussions.

Author information

Corresponding authors

Correspondence to Eshel Faraggi or Andrzej Kloczkowski.

Copyright information

© 2015 Springer Science+Business Media New York

About this protocol

Cite this protocol

Faraggi, E., Kloczkowski, A. (2015). GENN: A GEneral Neural Network for Learning Tabulated Data with Examples from Protein Structure Prediction. In: Cartwright, H. (eds) Artificial Neural Networks. Methods in Molecular Biology, vol 1260. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-2239-0_10

  • DOI: https://doi.org/10.1007/978-1-4939-2239-0_10

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4939-2238-3

  • Online ISBN: 978-1-4939-2239-0

  • eBook Packages: Springer Protocols
