A study to find a potent feature by combining the various disulphide bonds of protein using data mining technique

Saha, Suprativ; Paul, Twinkle; Bhattacharya, Tanmay

doi:10.1007/s13721-021-00311-9

Original Article
Published: 25 May 2021

Volume 10, article number 36, (2021)

Network Modeling Analysis in Health Informatics and Bioinformatics

Abstract

In bioinformatics, identifying an effective protein feature is essential, because the success of any classification technique relies heavily on informative and distinct features. Various existing classifiers use a single type of disulphide bond (viz. parallel or alternate) as a feature. However, computational efficiency may be increased by identifying an appropriate combination of disulphide bonds to serve as a single feature. Hence, in this paper, the various combinations of disulphide bonds are studied in order to formulate a potent protein feature that can be used in other studies to achieve better protein classification results without incorporating redundant data. A data mining approach is applied to the seven different combinations of the three disulphide bond types (viz. parallel, alternate and quad) to identify the best feature. A statistical analysis based on the confusion matrix and various point metrics (such as sensitivity, specificity, recall and precision) yielded a high accuracy and F score for the feature formed by combining two disulphide bonds, i.e. the alternate and quad bonds. The average F score achieved by this combination is approximately 0.9, and the average accuracy is more than 93%. These represent an unprecedented level of precision for any individual feature considered so far. Also, combining two disulphide bonds instead of three requires less computational time. Overall, the analytical results of this study show that the combination of alternate and quad disulphide bonds can be used as an effective feature in protein classification.
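
As an illustration only, the following Python sketch shows how the seven combinations of the three bond types can be enumerated, and how the point metrics named in the abstract follow from the counts of a binary confusion matrix. The function names and the example counts are hypothetical; this is not the authors' implementation, and the classifier that would produce the counts is not shown.

```python
from itertools import combinations

# The three disulphide bond types named in the abstract.
BOND_TYPES = ("parallel", "alternate", "quad")


def bond_combinations(bond_types=BOND_TYPES):
    """Return every non-empty subset of the bond types, smallest first."""
    return [
        combo
        for size in range(1, len(bond_types) + 1)
        for combo in combinations(bond_types, size)
    ]


def point_metrics(tp, fp, fn, tn):
    """Compute point metrics from binary confusion-matrix counts.

    tp, fp, fn, tn are the numbers of true positives, false positives,
    false negatives and true negatives. Undefined ratios are reported as 0.0.
    """
    total = tp + fp + fn + tn
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # identical to recall
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    denom = precision + sensitivity
    f_score = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "recall": sensitivity,
        "accuracy": accuracy,
        "f_score": f_score,
    }


if __name__ == "__main__":
    for combo in bond_combinations():
        print(" + ".join(combo))
    # Made-up counts, used only to exercise the function.
    print(point_metrics(tp=8, fp=2, fn=2, tn=8))
```

Three bond types give 2^3 - 1 = 7 non-empty subsets, which matches the seven combinations mentioned above. Sensitivity and recall are the same quantity, so the sketch computes it once and reports it under both names.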

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Brainware University, Barasat, Kolkata, 700125, India
Suprativ Saha & Twinkle Paul
Department of Information Technology, Techno India, Saltlake, Kolkata, 700091, India
Tanmay Bhattacharya

Corresponding author

Correspondence to Suprativ Saha.

About this article

Cite this article

Saha, S., Paul, T. & Bhattacharya, T. A study to find a potent feature by combining the various disulphide bonds of protein using data mining technique. Netw Model Anal Health Inform Bioinforma 10, 36 (2021). https://doi.org/10.1007/s13721-021-00311-9

Received: 05 October 2020
Revised: 29 April 2021
Accepted: 29 April 2021
Published: 25 May 2021
DOI: https://doi.org/10.1007/s13721-021-00311-9
