APSLAP: An Adaptive Boosting Technique for Predicting Subcellular Localization of Apoptosis Protein

Abstract

Apoptotic proteins play key roles in understanding the mechanism of programmed cell death. Knowledge about the subcellular localization of apoptotic protein is constructive in understanding the mechanism of programmed cell death, determining the functional characterization of the protein, screening candidates in drug design, and selecting protein for relevant studies. It is also proclaimed that the information required for determining the subcellular localization of protein resides in their corresponding amino acid sequence. In this work, a new biological feature, class pattern frequency of physiochemical descriptor, was effectively used in accordance with the amino acid composition, protein similarity measure, CTD (composition, translation, and distribution) of physiochemical descriptors, and sequence similarity to predict the subcellular localization of apoptosis protein. AdaBoost with the weak learner as Random-Forest was designed for the five modules and prediction is made based on the weighted voting system. Bench mark dataset of 317 apoptosis proteins were subjected to prediction by our system and the accuracy was found to be 100.0 and 92.4 %, and 90.1 % for self-consistency test, jack-knife test, and tenfold cross validation test respectively, which is 0.9 % higher than that of other existing methods. Beside this, the independent data (N151 and ZW98) set prediction resulted in the accuracy of 90.7 and 87.7 %, respectively. These results show that the protein feature represented by a combined feature vector along with AdaBoost algorithm holds well in effective prediction of subcellular localization of apoptosis proteins. The user friendly web interface “APSLAP” has been constructed, which is freely available at http://apslap.bicpu.edu.in and it is anticipated that this tool will play a significant role in determining the specific role of apoptosis proteins with reliability.

This is a preview of subscription content, log in to check access.

Fig. 1

References

  1. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215(3):403–410. doi:10.1016/S0022-2836(05)80360-2

    Google Scholar 

  2. Binkowski TA, Adamian L, Liang J (2003) Inferring functional relationships of proteins from local sequence and spatial surface patterns. J Mol Biol 332(2):505–526

    Article  Google Scholar 

  3. Breiman L (2001) Random forests. Mach Learn 45(1):5–32

    Article  Google Scholar 

  4. Bulashevska A, Eils R (2006) Predicting protein subcellular locations using hierarchical ensemble of Bayesian classifiers based on Markov chains. BMC Bioinform 7:298. doi:10.1186/1471-2105-7-298

    Article  Google Scholar 

  5. Carr K, Murray E, Armah E, He RL, Yau SS (2010) A rapid method for characterization of protein relatedness using feature vectors. PLoS One 5(3):e9550. doi:10.1371/journal.pone.0009550

    Article  Google Scholar 

  6. Chen Y, Li Q (2004) Prediction of the subcellular location apoptosis proteins using the algorithm of measure of diversity. Acta Sci Nat Univ NeiMongol 25:413–417

    Google Scholar 

  7. Chen YL, Li QZ (2007) Prediction of the subcellular location of apoptosis proteins. J Theor Biol 245(4):775–783. doi:10.1016/j.jtbi.2006.11.010

    Article  Google Scholar 

  8. Chou KC (1995) A novel approach to predicting protein structural classes in a (20-1)-D amino acid composition space. Proteins 21(4):319–344. doi:10.1002/prot.340210406

    Article  Google Scholar 

  9. Chou KC, Shen HB (2007) Recent progress in protein subcellular location prediction. Anal Biochem 370(1):1–16. doi:10.1016/j.ab.2007.07.006

    Article  Google Scholar 

  10. Deng M, Yu C, Liang Q, He RL, Yau SS (2011) A novel method of characterizing genetic sequences: genome space with biological distance and applications. PLoS ONE 6(3):e17293. doi:10.1371/journal.pone.0017293

    Article  Google Scholar 

  11. Dietterich TG (2000) An experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting, and randomization. Mach Learn 40(2):139–157

    Article  Google Scholar 

  12. Ding CH, Dubchak I (2001) Multi-class protein fold recognition using support vector machines and neural networks. Bioinformatics 17(4):349–358

    Article  Google Scholar 

  13. Ding Y-S, Zhang T-L (2008) Using Chou’s pseudo amino acid composition to predict subcellular localization of apoptosis proteins: an approach with immune genetic algorithm-based ensemble classifier. Pattern Recogn Lett 29(13):1887–1892

    Article  Google Scholar 

  14. Dubchak I, Muchnik I, Holbrook SR, Kim SH (1995) Prediction of protein folding class using global description of amino acid sequence. Proc Natl Acad Sci USA 92(19):8700–8704

    Article  Google Scholar 

  15. Freund Y, Schapire R (1996) Experiments with a new boosting algorithm. In: International conference on machine learning, pp 148–156

  16. Freund Y, Schapire RE (1997) A decision-theoretic generalization of on-line learning and an application to boosting. J Comput Syst Sci 55(1):119–139

    Article  Google Scholar 

  17. Gu Q, Ding YS, Jiang XY, Zhang TL (2010) Prediction of subcellular location apoptosis proteins with ensemble classifier and feature selection. Amino Acids 38(4):975–983. doi:10.1007/s00726-008-0209-4

    Article  Google Scholar 

  18. Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH (2009) The WEKA data mining software: an update. ACM SIGKDD Explor Newslett 11(1):10–18

    Article  Google Scholar 

  19. Huang J, Shi F (2005) Support vector machines for predicting apoptosis proteins types. Acta Biotheor 53(1):39–47. doi:10.1007/s10441-005-7002-5

    Article  Google Scholar 

  20. Hunter S, Jones P, Mitchell A, Apweiler R, Attwood TK, Bateman A, Bernard T, Binns D, Bork P, Burge S, de Castro E, Coggill P, Corbett M, Das U, Daugherty L, Duquenne L, Finn RD, Fraser M, Gough J, Haft D, Hulo N, Kahn D, Kelly E, Letunic I, Lonsdale D, Lopez R, Madera M, Maslen J, McAnulla C, McDowall J, McMenamin C, Mi H, Mutowo-Muellenet P, Mulder N, Natale D, Orengo C, Pesseat S, Punta M, Quinn AF, Rivoire C, Sangrador-Vegas A, Selengut JD, Sigrist CJ, Scheremetjew M, Tate J, Thimmajanarthanan M, Thomas PD, Wu CH, Yeats C, Yong SY (2012) InterPro in 2011: new developments in the family and domain prediction database. Nucleic acids research 40 (Database issue):D306-312. doi:10.1093/nar/gkr948

  21. Jiang X, Wei R, Zhang T, Gu Q (2008) Using the concept of Chou’s pseudo amino acid composition to predict apoptosis proteins subcellular location: an approach by approximate entropy. Protein Pept Lett 15:392–396

    Google Scholar 

  22. Kandaswamy KK, Pugalenthi G, Moller S, Hartmann E, Kalies KU, Suganthan PN, Martinetz T (2010) Prediction of apoptosis protein locations with genetic algorithms and support vector machines through a new mode of pseudo amino acid composition. Protein Pept Lett 17(12):1473–1479

    Article  Google Scholar 

  23. Kerr JF, Wyllie AH, Currie AR (1972) Apoptosis: a basic biological phenomenon with wide-ranging implications in tissue kinetics. Br J Cancer 26(4):239–257

    Article  Google Scholar 

  24. Liao B, Jiang JB, Zeng QG, Zhu W (2011) Predicting apoptosis protein subcellular location with PseAAC by incorporating tripeptide composition. Protein Pept Lett 18(11):1086–1092

    Article  Google Scholar 

  25. Lin H, Wang H, Ding H, Chen YL, Li QZ (2009) Prediction of subcellular localization of apoptosis protein using Chou’s pseudo amino acid composition. Acta Biotheor 57(3):321–330. doi:10.1007/s10441-008-9067-4

    Article  Google Scholar 

  26. Matsuda S, Vert JP, Saigo H, Ueda N, Toh H, Akutsu T (2005) A novel representation of protein sequences for prediction of subcellular location using support vector machines. Protein Sci Publ Protein Soc 14(11):2804–2813. doi:10.1110/ps.051597405

    Article  Google Scholar 

  27. Petersen TN, Brunak S, von Heijne G, Nielsen H (2011) SignalP 4.0: discriminating signal peptides from transmembrane regions. Nat Methods 8(10):785–786. doi:10.1038/nmeth.1701

    Article  Google Scholar 

  28. Raff M (1998) Cell suicide for beginners. Nature 396(6707):119–122. doi:10.1038/24055

    Article  Google Scholar 

  29. Saravanan V, Lakshmi PT (2013) SCLAP: an adaptive boosting method for predicting subchloroplast localization of plant proteins. OMICS 17(2):106–115. doi:10.1089/omi.2012.0070

    Article  Google Scholar 

  30. Schapire RE, Singer Y (1999) Improved boosting algorithms using confidence-rated predictions. Mach Learn 37(3):297–336

    Article  Google Scholar 

  31. Schulz JB, Weller M, Moskowitz MA (1999) Caspases as treatment targets in stroke and neurodegenerative diseases. Ann Neurol 45(4):421–429

    Article  Google Scholar 

  32. Shen HB, Chou KC (2006) Ensemble classifier for protein fold pattern recognition. Bioinformatics 22(14):1717–1722. doi:10.1093/bioinformatics/btl170

    Article  Google Scholar 

  33. Suzuki M, Youle RJ, Tjandra N (2000) Structure of Bax: coregulation of dimer formation and intracellular localization. Cell 103(4):645–654

    Article  Google Scholar 

  34. Tantoso E, Li KB (2008) AAIndexLoc: predicting subcellular localization of proteins based on a new representation of sequences using amino acid indices. Amino Acids 35(2):345–353. doi:10.1007/s00726-007-0616-y

    Article  Google Scholar 

  35. Thompson CB (1995) Apoptosis in the pathogenesis and treatment of disease. Science 267(5203):1456–1462

    Article  Google Scholar 

  36. Wang G, Dunbrack RL Jr (2003) PISCES: a protein sequence culling server. Bioinformatics 19(12):1589–1591

    Article  Google Scholar 

  37. Yau SS, Yu C, He R (2008) A protein map and its application. DNA Cell Biol 27(5):241–250. doi:10.1089/dna.2007.0676

    Article  Google Scholar 

  38. Yu C, Liang Q, Yin C, He RL, Yau SS (2010) A novel construction of genome space with biological geometry. DNA Res Int J Rapid Publ Reports Genes Genomes 17(3):155–168. doi:10.1093/dnares/dsq008

    Google Scholar 

  39. Yu C, Cheng SY, He RL, Yau SS (2011) Protein map: an alignment-free sequence comparison method based on various properties of amino acids. Gene 486(1–2):110–118. doi:10.1016/j.gene.2011.07.002

    Article  Google Scholar 

  40. Yu X, Zheng X, Liu T, Dou Y, Wang J (2012) Predicting subcellular location of apoptosis proteins with pseudo amino acid composition: approach from amino acid substitution matrix and auto covariance transformation. Amino Acids 42(5):1619–1625. doi:10.1007/s00726-011-0848-8

    Article  Google Scholar 

  41. Yu C, Deng M, Cheng SY, Yau SC, He RL, Yau SS (2013) Protein space: a natural method for realizing the nature of protein universe. J Theor Biol 318:197–204. doi:10.1016/j.jtbi.2012.11.005

    Article  Google Scholar 

  42. Zhang H, Gu C (2006). Support Vector Machines versus Boosting. Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, USA

  43. Zhang ZH, Wang ZH, Zhang ZR, Wang YX (2006) A novel method for apoptosis protein subcellular localization prediction combining encoding based on grouped weight and support vector machine. FEBS Lett 580(26):6169–6174. doi:10.1016/j.febslet.2006.10.017

    Article  Google Scholar 

  44. Zhou GP, Doctor K (2003) Subcellular location prediction of apoptosis proteins. Proteins 50(1):44–48. doi:10.1002/prot.10251

    Article  Google Scholar 

  45. Zou KH, O’Malley AJ, Mauri L (2007) Receiver-operating characteristic analysis for evaluating diagnostic tests and predictive models. Circulation 115(5):654–657. doi:10.1161/CIRCULATIONAHA.105.594929

    Article  Google Scholar 

Download references

Acknowledgments

Saravanan Vijayakumar is supported by the DBT-BINC junior research fellow. The authors thank Borhan Kazimipour of Shiraz University for suggestions in algorithm design. The authors thank Dr. Archana Pan (Centre for Bioinforamtics, Pondicherry University) and Dr. Sivasathya (Department of Computer Science, Ponidcherry University) for their valuable suggestions and P. Elavarasan, E. Malanco and S. Manikandan of Centre for Bioinformatics, Pondicherry University for their technical support in hosting the tool onto the web server.

Conflict of interest

The author declare no conflict of interest.

Author information

Affiliations

Authors

Corresponding author

Correspondence to P. T. V. Lakshmi.

Electronic supplementary material

Rights and permissions

Reprints and Permissions

About this article

Cite this article

Saravanan, V., Lakshmi, P.T.V. APSLAP: An Adaptive Boosting Technique for Predicting Subcellular Localization of Apoptosis Protein. Acta Biotheor 61, 481–497 (2013). https://doi.org/10.1007/s10441-013-9197-1

Download citation

Keywords

  • AdaBoost
  • Apoptosis protein
  • Jack-knife test
  • Physio-chemical parameres
  • Random forest
  • Subcellular localization
  • Web-server