Classification of Protein Sequences by Means of an Ensemble Classifier with an Improved Feature Selection Strategy

Sriram, Aditya; Sanapala, Mounica; Patel, Ronak; Patil, Nagamma

doi:10.1007/978-981-10-8636-6_18

Conference paper
First Online: 05 November 2018

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 708)

Abstract

As the cost of biological sequencing falls, new sequences are entering biological databases such as NCBI, SwissProt, and UniProt at an ever-growing pace. Annotating these newly sequenced proteins will aid groundbreaking discoveries in the development of novel drugs and potential therapies for diseases. Previous work in this field has harnessed the high computational power of modern machines to achieve good prediction quality, but at the cost of high dimensionality. To address this trade-off, we propose a novel word segmentation-based feature selection strategy that classifies protein sequences using a highly condensed feature set. An incremental classifier selection strategy was found to yield better results than all existing methods. The antioxidant protein data curated in previous work was used to provide a level ground for evaluation and comparison of results. On this data, the proposed method outperformed all existing works, with an accuracy of 95%.
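The abstract does not spell out the word segmentation procedure, so the following is only a minimal sketch of the general idea of treating a protein sequence as text made of short "words". It counts overlapping k-mers and keeps a small vocabulary of the most frequent ones as a condensed feature set. This is a simple stand-in and not the authors' segmentation or feature selection strategy; the function names, the toy sequence, the choice of k = 2, and the vocabulary size of 3 are all assumptions made for illustration.

```python
from collections import Counter


def kmer_counts(sequence: str, k: int = 2) -> Counter:
    """Count overlapping k-length 'words' in a protein sequence."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def feature_vector(sequence: str, vocabulary: list, k: int = 2) -> list:
    """Relative frequency of each vocabulary word in the sequence."""
    counts = kmer_counts(sequence, k)
    total = sum(counts.values()) or 1  # guard against sequences shorter than k
    return [counts[word] / total for word in vocabulary]


# Hypothetical toy sequence, not taken from the antioxidant data set.
toy_sequence = "MKVLAAGMKV"
counts = kmer_counts(toy_sequence, k=2)

# Keep only the most frequent words as a (very) condensed vocabulary.
vocabulary = [word for word, _ in counts.most_common(3)]
vector = feature_vector(toy_sequence, vocabulary, k=2)
```

A vector like this could then be passed to any standard classifier. Restricting the vocabulary is what keeps the dimensionality low, which is the trade-off the abstract highlights.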

Author information

Authors and Affiliations

National Institute of Technology Karnataka, Surathkal, Mangalore, 575025, Karnataka, India
Aditya Sriram, Mounica Sanapala, Ronak Patel & Nagamma Patil

Corresponding author

Correspondence to Aditya Sriram.

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, National Institute of Technology, Rourkela, Rourkela, Odisha, India
Pankaj Kumar Sa
Department of Computer Science and Engineering, National Institute of Technology, Rourkela, Rourkela, Odisha, India
Sambit Bakshi
Department of Computer Engineering and Informatics, University of Patras, Patras, Greece
Ioannis K. Hatzilygeroudis
Department of Computer Science and Engineering, National Institute of Technology, Rourkela, Rourkela, Odisha, India
Manmath Narayan Sahoo

About this paper

Cite this paper

Sriram, A., Sanapala, M., Patel, R., Patil, N. (2018). Classification of Protein Sequences by Means of an Ensemble Classifier with an Improved Feature Selection Strategy. In: Sa, P., Bakshi, S., Hatzilygeroudis, I., Sahoo, M. (eds) Recent Findings in Intelligent Computing Techniques. Advances in Intelligent Systems and Computing, vol 708. Springer, Singapore. https://doi.org/10.1007/978-981-10-8636-6_18

DOI: https://doi.org/10.1007/978-981-10-8636-6_18
Published: 05 November 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-8635-9
Online ISBN: 978-981-10-8636-6
eBook Packages: Engineering, Engineering (R0)
