Application of Intelligent Techniques for Classification of Bacteria Using Protein Sequence-Derived Features

Banerjee, Amit Kumar; Ravi, Vadlamani; Murty, U. S. N.; Sengupta, Neelava; Karuna, Batepatti

doi:10.1007/s12010-013-0268-1

Published: 09 May 2013

Applied Biochemistry and Biotechnology, Volume 170, pages 1263–1281 (2013)

Amit Kumar Banerjee¹, Vadlamani Ravi², U. S. N. Murty¹, Neelava Sengupta¹ & Batepatti Karuna¹

Abstract

Standard molecular experimental methods and mathematical procedures often fail to resolve many questions of phylogeny and classification. Modern artificial intelligence-based techniques, such as radial basis function networks, genetic algorithms, artificial neural networks, and support vector machines, have considerable potential in this regard. Relying on a large number of essential parameters, rather than on a single molecular parameter, can improve robustness, reliability, and accuracy. This study used a dataset of physicochemical properties computed for proteins belonging to 20 bacterial genera. A total of 57 sequential and structural parameters derived from protein sequences were considered for the initial classification. Feature selection techniques were then used to identify the features that most influence the dataset. Various amino acids, hydrophobicity, relative sulfur percentage, and codon number were selected as important parameters. Comparative analyses were performed using the RapidMiner data mining platform. The support vector machine performed best, with a maximum accuracy of more than 91 %.
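As a rough illustration of the kind of sequence-derived features and feature ranking the abstract describes, the Python sketch below computes a few per-protein features (percentage composition of the 20 standard residues, the GRAVY hydropathy score on the Kyte-Doolittle scale, and sequence length) and then ranks them with a Fisher score. This is not the authors' pipeline: they computed 57 parameters and ran feature selection and classification in RapidMiner. The Fisher-score filter is a swapped-in technique, and the function names `sequence_features` and `fisher_scores`, the toy sequences, and the genus labels are hypothetical. Features such as relative sulfur percentage and codon number are not modeled here.

```python
from statistics import mean, pvariance

# Kyte-Doolittle hydropathy scale, used here for the GRAVY score.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AMINO_ACIDS = sorted(KYTE_DOOLITTLE)


def sequence_features(seq: str) -> dict[str, float]:
    """Percent composition of the 20 standard residues, GRAVY, and length.

    Non-standard letters (X, B, Z, U, ...) are ignored in this sketch.
    """
    residues = [aa for aa in seq.upper() if aa in KYTE_DOOLITTLE]
    n = len(residues)
    if n == 0:
        raise ValueError("sequence contains no standard amino acids")
    features = {f"pct_{aa}": 100.0 * residues.count(aa) / n for aa in AMINO_ACIDS}
    features["gravy"] = sum(KYTE_DOOLITTLE[aa] for aa in residues) / n
    features["length"] = float(n)
    return features


def fisher_scores(rows: list[dict[str, float]], labels: list[str]) -> list[tuple[str, float]]:
    """Rank features by between-class over within-class variance (higher = more discriminative)."""
    classes = sorted(set(labels))
    scores = {}
    for name in rows[0]:
        overall = mean(r[name] for r in rows)
        between = within = 0.0
        for c in classes:
            vals = [r[name] for r, y in zip(rows, labels) if y == c]
            between += len(vals) * (mean(vals) - overall) ** 2
            within += len(vals) * pvariance(vals)
        if within > 0:
            scores[name] = between / within
        else:
            # Zero within-class spread: perfectly separating if the class means differ.
            scores[name] = float("inf") if between > 0 else 0.0
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy sequences and genus labels, for illustration only.
    seqs = ["MKLLIVAAL", "MKILVAALL", "MDEEKRDSE", "MDEKRSEDE"]
    genera = ["genus_A", "genus_A", "genus_B", "genus_B"]
    rows = [sequence_features(s) for s in seqs]
    for name, score in fisher_scores(rows, genera)[:5]:
        print(f"{name}: {score:.2f}")
```

A filter score like this is cheap and classifier-independent. In a setup resembling the one described, the top-ranked features would then be passed to a classifier such as a support vector machine and compared under cross-validation.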

Author information

Authors and Affiliations

Bioinformatics Group, Biology Division, Indian Institute of Chemical Technology (CSIR), Tarnaka, Uppal Road, Hyderabad, AP, India
Amit Kumar Banerjee, U. S. N. Murty, Neelava Sengupta & Batepatti Karuna
Institute for Development and Research in Banking Technology (IDRBT), Masab Tank, Hyderabad, AP, India
Vadlamani Ravi

Corresponding author

Correspondence to Vadlamani Ravi.

Electronic Supplementary Material

ESM 1 (DOC 80 kb)

About this article

Cite this article

Banerjee, A.K., Ravi, V., Murty, U.S.N. et al. Application of Intelligent Techniques for Classification of Bacteria Using Protein Sequence-Derived Features. Appl Biochem Biotechnol 170, 1263–1281 (2013). https://doi.org/10.1007/s12010-013-0268-1

Received: 16 December 2012
Accepted: 24 April 2013
Published: 09 May 2013
Issue Date: July 2013
DOI: https://doi.org/10.1007/s12010-013-0268-1