Classification of Protein Sequences by Means of an Ensemble Classifier with an Improved Feature Selection Strategy

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 708)

Abstract

With the decreasing cost of sequencing, new sequences are flowing into biological databases such as NCBI, SwissProt and UniProt at an ever-growing pace. Annotating these newly sequenced proteins can aid groundbreaking discoveries in the development of novel drugs and potential therapies for diseases. Previous work in this field has used the computational power of modern machines to achieve good prediction quality, but at the cost of high-dimensional feature sets. To address this trade-off, we propose a novel word segmentation-based feature selection strategy that classifies protein sequences using a highly condensed feature set. An incremental classifier selection strategy was found to yield better results than all existing methods. To allow a fair comparison of results, we used the antioxidant protein dataset curated in previous work. On this dataset, the proposed method outperformed all existing works, achieving an accuracy of 95%.
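
The abstract names two ingredients, a word segmentation-based feature selection that yields a condensed feature set and an incremental classifier selection strategy, but it does not give their details. The Python sketch below illustrates one plausible reading under loudly stated assumptions: "words" are overlapping k-mers, the condensed vocabulary keeps the `top_n` k-mers with the highest document frequency, and the ensemble is built by ranking base classifiers on validation accuracy and adding them one at a time under majority voting. The function names and the parameters `k` and `top_n` are hypothetical and are not taken from the paper.

```python
# Hypothetical sketch of k-mer features plus incremental ensemble selection;
# the selection criterion and voting scheme are assumptions, not the authors' method.
from collections import Counter


def kmer_counts(seq, k):
    """Count overlapping k-mers ("words") in a protein sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def select_vocabulary(seqs, k, top_n):
    """Keep the top_n k-mers by document frequency (an assumed criterion).

    Ties are broken alphabetically so the vocabulary is deterministic.
    """
    doc_freq = Counter()
    for s in seqs:
        doc_freq.update(set(kmer_counts(s, k)))  # count each k-mer once per sequence
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_n]]


def featurize(seq, vocab, k):
    """Map a sequence to relative frequencies of the selected k-mers only."""
    counts = kmer_counts(seq, k)
    total = max(sum(counts.values()), 1)  # guard against sequences shorter than k
    return [counts[w] / total for w in vocab]


def accuracy(y_true, y_pred):
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)


def majority_vote(predictions):
    """Combine per-classifier label lists; ties go to the label seen first."""
    return [Counter(col).most_common(1)[0][0] for col in zip(*predictions)]


def incremental_selection(candidate_preds, y_val):
    """Rank classifiers by validation accuracy, grow the ensemble one member
    at a time, and keep the prefix with the best majority-vote accuracy.

    candidate_preds: dict mapping a classifier name to its validation predictions.
    """
    ranked = sorted(candidate_preds,
                    key=lambda n: accuracy(y_val, candidate_preds[n]),
                    reverse=True)  # stable sort: equal scores keep input order
    best_members, best_acc = [], -1.0
    for size in range(1, len(ranked) + 1):
        members = ranked[:size]
        acc = accuracy(y_val, majority_vote([candidate_preds[n] for n in members]))
        if acc > best_acc:  # strict comparison prefers the smaller ensemble on ties
            best_members, best_acc = members, acc
    return best_members, best_acc
```

In a full pipeline, the vectors from `featurize` would be used to train several base classifiers (for example, a multinomial naive Bayes model), and their predictions on a held-out validation set would be passed to `incremental_selection`. Evaluating every prefix of the ranked list, rather than stopping at the first non-improving addition, avoids stalling on two-member ensembles whose votes often tie.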

Author information

Corresponding author

Correspondence to Aditya Sriram.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Sriram, A., Sanapala, M., Patel, R., Patil, N. (2018). Classification of Protein Sequences by Means of an Ensemble Classifier with an Improved Feature Selection Strategy. In: Sa, P., Bakshi, S., Hatzilygeroudis, I., Sahoo, M. (eds) Recent Findings in Intelligent Computing Techniques. Advances in Intelligent Systems and Computing, vol 708. Springer, Singapore. https://doi.org/10.1007/978-981-10-8636-6_18

  • DOI: https://doi.org/10.1007/978-981-10-8636-6_18

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-8635-9

  • Online ISBN: 978-981-10-8636-6

  • eBook Packages: Engineering, Engineering (R0)