GENN: A GEneral Neural Network for Learning Tabulated Data with Examples from Protein Structure Prediction
We present a GEneral Neural Network (GENN) for learning trends from existing data and predicting unknown information. The main novelty of GENN lies in its generality, its simplicity of use, and its specific handling of windowed input/output. Its main strength is the efficient handling of input data, which enables learning from large datasets. GENN is built on a two-layered neural network and can use either separate input–output pairs or window-based data, in which case dedicated data structures represent the input–output pairs efficiently. The program has been tested on predicting the accessible surface area of globular proteins, scoring protein models by their similarity to the native structure, and predicting protein disorder, and it performed remarkably well on these tasks. In this paper we describe the program and its use. Specifically, as an example, we describe the construction with GENN of a scoring function that ranks protein models by their similarity to the native structure. The source code and Linux executables for GENN are available from Research and Information Systems at http://mamiris.com and from the Battelle Center for Mathematical Medicine at http://mathmed.org. Bugs and problems with the GENN program should be reported to EF.
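The abstract does not specify how GENN's window-based data structures are laid out. As a rough illustration only, the Python sketch below shows one common way to make windowed input memory-efficient: each protein's per-residue feature table is stored once, and the input for residue i is assembled on demand from residues i-w through i+w. The function names `window_input` and `windowed_pairs`, the parameter `half_width` (w), and zero-padding past the chain ends are all hypothetical choices, not GENN's actual implementation.

```python
from typing import Iterator, List, Sequence, Tuple

# Hypothetical layout: one protein stored once as a per-residue feature table,
# i.e. a list with one feature vector (list of floats) per residue.
Features = Sequence[Sequence[float]]


def window_input(features: Features, center: int, half_width: int) -> List[float]:
    """Concatenate the feature vectors of residues center-half_width .. center+half_width.

    Positions outside the chain are filled with zeros (an assumed padding choice).
    """
    n_feat = len(features[0])
    window: List[float] = []
    for pos in range(center - half_width, center + half_width + 1):
        if 0 <= pos < len(features):
            window.extend(features[pos])
        else:
            window.extend([0.0] * n_feat)
    return window


def windowed_pairs(
    features: Features, targets: Sequence[float], half_width: int
) -> Iterator[Tuple[List[float], float]]:
    """Lazily yield one (input window, target) pair per residue.

    Only the original table plus the current window is held in memory, instead of
    a pre-built copy of every window (which would cost about 2*half_width+1 times
    as much storage).
    """
    for center in range(len(features)):
        yield window_input(features, center, half_width), targets[center]
```

For example, with the table `[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]` (two features per residue) and `half_width=1`, the window for the first residue is `[0.0, 0.0, 1.0, 2.0, 3.0, 4.0]`: one padded position followed by residues 0 and 1. Generating windows on demand trades a little recomputation for a memory footprint that stays proportional to the raw data, which matters when training on large sets of proteins.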
Key words: Neural network; Protein scoring; Windowed input; Automatic learning; GENN; Protein structure prediction
We gratefully acknowledge the financial support provided by the National Institutes of Health (NIH) through Grants R01GM072014 and R01GM073095 and by the National Science Foundation through Grant NSF MCB 1071785. Both authors thank the organizers of the CASP10 conference in Gaeta, Italy, for inviting them to the conference and for providing free registration to EF. EF also thanks Yaoqi Zhou and Keith Dunker for hosting him at IUPUI and for general discussions.