Classification of Proteins Using Naïve Bayes Classifier and Surface-Invariant Coordinates

  • Babasaheb S. Satpute
  • Raghav Yadav
Part of the Studies in Computational Intelligence book series (SCI, volume 771)


Protein classification is one of the challenging problems in computational biology and bioinformatics. Our aim here is to classify proteins into families using the similarity of their surface roughness as the criterion. Because Protein Data Bank (PDB) [1] coordinates are given in an arbitrary reference frame that says nothing about the orientation of the protein, we designed an invariant coordinate system (ICS) whose origin is the protein's center of gravity (CG). From the PDB entries we extracted the coordinates of the surface residues and divided them among the eight octants of the ICS according to the signs of their x, y and z coordinates. For the residues in each octant we computed the standard deviation of the coordinates and defined this quantity as the surface-invariant coordinate (SIC), so that every protein is described by eight SIC values. We also made use of the Structural Classification of Proteins (SCOP) [2] database, which classifies proteins hierarchically according to their structural and evolutionary relationships. Since this is a classification problem, we used the naïve Bayes classifier algorithm, aiming for better classification results.
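The pipeline summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The summary does not say how surface residues are detected, so they are taken as given input here. It also does not say whether the center of gravity is mass-weighted; an unweighted mean is used. How the three coordinates of an octant are combined into one SIC value is likewise unstated, so the sketch hypothetically uses the standard deviation of the residues' distances from the CG. Finally, the naïve Bayes variant is not named; a Gaussian naïve Bayes is assumed, with SCOP family names as class labels. The names `center_of_gravity`, `octant`, `sic_values` and `GaussianNaiveBayes` are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev, pvariance

Point = tuple[float, float, float]


def center_of_gravity(coords: list[Point]) -> Point:
    """Unweighted mean position (ASSUMPTION: every point has equal mass)."""
    n = len(coords)
    return tuple(sum(p[i] for p in coords) / n for i in range(3))


def octant(p: Point) -> int:
    """Octant index 0..7 from the signs of x, y, z (zero counts as positive)."""
    x, y, z = p
    return 4 * (x < 0) + 2 * (y < 0) + (z < 0)


def sic_values(all_coords: list[Point], surface_coords: list[Point]) -> list[float]:
    """Eight SIC values for one protein (hypothetical reading of the SIC).

    ASSUMPTION: the SIC of an octant is the population standard deviation of
    the distances of that octant's surface residues from the CG.
    Empty octants get 0.0.
    """
    cg = center_of_gravity(all_coords)
    distances = defaultdict(list)
    for p in surface_coords:
        q = tuple(p[i] - cg[i] for i in range(3))  # translate into the ICS
        distances[octant(q)].append(math.hypot(*q))
    return [pstdev(distances[k]) if distances[k] else 0.0 for k in range(8)]


class GaussianNaiveBayes:
    """Naive Bayes with one Gaussian per (class, feature).

    ASSUMPTION: the Gaussian variant is chosen for illustration only.
    """

    def __init__(self, var_smoothing: float = 1e-9):
        self.var_smoothing = var_smoothing  # keeps every variance > 0
        self.priors: dict = {}
        self.params: dict = {}

    def fit(self, X, y):
        rows_by_class = defaultdict(list)
        for features, label in zip(X, y):
            rows_by_class[label].append(features)
        for label, rows in rows_by_class.items():
            self.priors[label] = len(rows) / len(X)
            # Per-feature mean and variance within this class.
            self.params[label] = [
                (mean(col), pvariance(col) + self.var_smoothing)
                for col in zip(*rows)
            ]
        return self

    def log_score(self, x, label) -> float:
        """Unnormalized log posterior: log P(label) + sum_j log N(x_j | mu, var)."""
        score = math.log(self.priors[label])
        for xj, (mu, var) in zip(x, self.params[label]):
            score -= 0.5 * math.log(2 * math.pi * var) + (xj - mu) ** 2 / (2 * var)
        return score

    def predict(self, x):
        return max(self.priors, key=lambda label: self.log_score(x, label))


if __name__ == "__main__":
    # Made-up data for illustration only: two-feature vectors stand in for
    # eight-value SIC vectors, and the family labels are placeholders.
    X = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [5.0, 5.0], [5.2, 4.8], [4.9, 5.1]]
    y = ["family_A"] * 3 + ["family_B"] * 3
    clf = GaussianNaiveBayes().fit(X, y)
    print(clf.predict([1.1, 1.0]))  # family_A
```

The sketch works in log space so that multiplying the eight per-feature likelihoods does not underflow. The small `var_smoothing` term prevents a division by zero when a feature has zero within-class variance, for example an octant that is empty in every training protein of a family.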


Keywords: Protein classification; Structural classification of proteins (SCOP); Protein Data Bank (PDB); Surface-invariant coordinate (SIC); Naïve Bayes classifier


References

  1.
  2.
  3. Connolly, M.L. 1986. Measurement of protein surface shape by solid angles. Journal of Molecular Graphics 4: 3–6.
  4. Bandyopadhyay, S. 2005. An efficient technique for superfamily classification of amino acid sequences: Feature extraction, fuzzy clustering and prototype selection. Fuzzy Sets and Systems (Elsevier) 152: 5–16.
  5. Vipsita, S., B.K. Shee, and S.K. Rath. 2010. An efficient technique for protein classification using feature extraction by artificial neural networks. In IEEE India Conference: Green Energy, Computing and Communication (INDICON).
  6. Wang, D., and G.B. Huang. 2005. Protein sequence classification using extreme learning machine. In Proceedings of the International Joint Conference on Neural Networks (IJCNN 2005), Montreal, Canada.
  7. Brink, H., J.W. Richards, and M. Fetherolf. Real-World Machine Learning. ISBN 9781617291920.
  8. Datta, A., V. Talukdar, A. Konar, and L.C. Jain. 2009. A neural network based approach for protein structural class prediction. Journal of Intelligent and Fuzzy Systems 20: 61–71.

Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. Department of Computer Science & IT, SIET, SHUATS, Allahabad, India
