Protein Sequence Classification Involving Data Mining Technique: A Review

  • Suprativ Saha
  • Tanmay Bhattacharya
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 767)


In bioinformatics, the size of biological databases is increasing at an exponential rate. At this scale, traditional data analysis procedures fail to classify the data. Currently, many classification techniques based on data mining are used to classify biological data such as protein sequences. This paper reviews the most popular of these techniques, namely the neural network-based classifier, the fuzzy ARTMAP-based classifier, and the rough set classifier, along with their limitations. The accuracy and computational time of each technique are also analyzed. Finally, an idea is proposed that can increase accuracy with low computational overhead.
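To make the idea of classifying protein sequences concrete, the following is a minimal sketch, not the method proposed in the paper. It assumes a simple pipeline: each sequence is turned into a fixed-length vector of hashed k-mer frequencies, and an unseen sequence is assigned to the class whose mean vector is nearest. The function names, the parameters `k` and `n_buckets`, and the toy sequences and family labels are all hypothetical and chosen only for illustration.

```python
import hashlib
from collections import Counter


def kmer_hash_features(sequence, k=2, n_buckets=64):
    """Map a sequence to a fixed-length vector of hashed k-mer frequencies."""
    vec = [0.0] * n_buckets
    kmers = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    for kmer in kmers:
        # md5 gives a stable hash, so bucket assignment is reproducible.
        bucket = int(hashlib.md5(kmer.encode()).hexdigest(), 16) % n_buckets
        vec[bucket] += 1.0
    if kmers:
        # Normalise counts to frequencies so sequence length matters less.
        vec = [v / len(kmers) for v in vec]
    return vec


def train_centroids(labelled_sequences, k=2, n_buckets=64):
    """Average the feature vectors of each class (nearest-centroid training)."""
    sums, counts = {}, Counter()
    for seq, label in labelled_sequences:
        feats = kmer_hash_features(seq, k, n_buckets)
        if label not in sums:
            sums[label] = [0.0] * n_buckets
        sums[label] = [a + b for a, b in zip(sums[label], feats)]
        counts[label] += 1
    return {lab: [v / counts[lab] for v in s] for lab, s in sums.items()}


def classify(sequence, centroids, k=2, n_buckets=64):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    feats = kmer_hash_features(sequence, k, n_buckets)

    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(feats, centroid))

    return min(centroids, key=lambda lab: sq_dist(centroids[lab]))


if __name__ == "__main__":
    # Toy training data; the sequences and family names are made up.
    training = [("AAAAAAAA", "family_a"), ("GGGGGGGG", "family_b")]
    model = train_centroids(training)
    print(classify("AAAAAA", model))
```

Hashing keeps the feature vector at a fixed size regardless of how many distinct k-mers occur, which is one way to keep computational overhead low; the price is that different k-mers can collide in the same bucket.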


Keywords: Data mining, Neural network, Fuzzy ARTMAP, Rough set, String kernel, Protein-hashing, SVM/GA



Copyright information

© Springer Nature Singapore Pte Ltd. 2020

Authors and Affiliations

  1. Department of Computer Science and Engineering, Brainware University, Barasat, Kolkata, India
  2. Department of Information Technology, Techno India, Salt Lake, Kolkata, India
