Application of Intelligent Techniques for Classification of Bacteria Using Protein Sequence-Derived Features

Applied Biochemistry and Biotechnology

Abstract

Standard molecular experimental methodologies and mathematical procedures often fail to resolve many phylogeny- and classification-related issues. Modern artificial intelligence-based techniques, such as radial basis function, genetic algorithm, artificial neural network, and support vector machine methods, have ample potential in this regard. Relying on a large number of essential parameters, rather than a single molecular parameter, should improve robustness, reliability, and accuracy. This study was conducted with a dataset of computed protein physicochemical properties belonging to 20 different bacterial genera. A total of 57 sequential and structural parameters derived from protein sequences were considered for the initial classification. Feature selection techniques were employed to identify the most important features influencing the dataset. Various amino acids, hydrophobicity, relative sulfur percentage, and codon number were selected as important parameters during the study. Comparative analyses were performed using the RapidMiner data mining platform. The support vector machine proved to be the best method, with a maximum accuracy of more than 91%.
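To make the idea of protein sequence-derived features concrete, the following is a minimal Python sketch, not the authors' pipeline. It computes amino acid percentages, a mean hydropathy score on the standard Kyte-Doolittle scale, and a sulfur-related percentage for one sequence. The function name `sequence_features`, the feature names, and the definition of the sulfur feature as the share of Cys and Met residues are assumptions made for illustration; the abstract does not define how the study computed its 57 parameters.

```python
from collections import Counter

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sequence_features(seq: str) -> dict:
    """Return an illustrative feature vector for one protein sequence.

    Hypothetical helper: the feature definitions here are assumptions,
    not the parameter set used in the study.
    """
    # Keep only the 20 standard residues; other symbols are ignored.
    residues = [aa for aa in seq.upper() if aa in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")

    n = len(residues)
    counts = Counter(residues)

    # Percentage of each amino acid in the sequence.
    features = {
        f"pct_{aa}": 100.0 * counts[aa] / n for aa in sorted(KYTE_DOOLITTLE)
    }
    # Mean hydropathy over all residues (often called GRAVY).
    features["gravy"] = sum(KYTE_DOOLITTLE[aa] for aa in residues) / n
    # Assumed stand-in for "relative sulfur percentage":
    # share of sulfur-containing residues (Cys and Met).
    features["pct_sulfur_residues"] = 100.0 * (counts["C"] + counts["M"]) / n
    features["length"] = n
    return features
```

Feature vectors of this kind, one per protein, could then be passed to a feature selection step and a classifier such as a support vector machine; the study itself ran these comparisons in RapidMiner.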



Author information

Corresponding author

Correspondence to Vadlamani Ravi.

Electronic Supplementary Material

ESM 1 (DOC, 80 kb)

About this article

Cite this article

Banerjee, A.K., Ravi, V., Murty, U.S.N. et al. Application of Intelligent Techniques for Classification of Bacteria Using Protein Sequence-Derived Features. Appl Biochem Biotechnol 170, 1263–1281 (2013). https://doi.org/10.1007/s12010-013-0268-1

  • DOI: https://doi.org/10.1007/s12010-013-0268-1
