Prediction of protein subcellular localization by incorporating multiobjective PSO-based feature subset selection into the general form of Chou’s PseAAC

  • Original Article
  • Medical & Biological Engineering & Computing

Abstract

In this article, the subcellular location of a protein is predicted using a multiobjective particle swarm optimization (PSO)-based feature selection technique. In the general form of pseudo-amino acid composition (PseAAC), protein features are constructed from the protein sequences. Here, different amino acid compositions are used to construct the feature sets, so the data are represented as a matrix of protein samples versus amino acid composition features. The proposed algorithm simultaneously maximizes feature relevance and minimizes feature redundancy. Executing the algorithm on the multiclass dataset yields a selected feature subset. Tenfold cross-validation is then applied to this subset, and the corresponding accuracy, F score, entropy, representation entropy and average correlation are calculated. The performance of the proposed method is compared with that of its single-objective versions, sequential forward search, sequential backward search, minimum redundancy maximum relevance (mRMR) with two schemes, CFS, CBFS, \(\chi^2\), Fisher discriminant and a cluster-based technique.
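The two competing objectives can be illustrated with a minimal Python sketch. The abstract does not state which relevance and redundancy measures are used, so the choices below are assumptions made only for illustration: relevance is taken as the mean correlation ratio (eta squared) between each selected feature and the class label, and redundancy as the mean absolute pairwise Pearson correlation among the selected features. The function names (`amino_acid_composition`, `relevance`, `redundancy`, `dominates`) are hypothetical, and the PSO search itself is not shown.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence):
    """Fraction of each standard residue in a protein sequence (20 values)."""
    counts = np.array([sequence.count(a) for a in AMINO_ACIDS], dtype=float)
    total = counts.sum()
    return counts / total if total > 0 else counts


def relevance(X, y, mask):
    """Assumed relevance measure: mean eta squared of the selected features.

    X is (samples x features), y holds class labels, mask is a boolean
    vector marking the selected features (one particle position).
    """
    if not mask.any():
        return 0.0
    Xs = X[:, mask]
    grand_mean = Xs.mean(axis=0)
    total_ss = ((Xs - grand_mean) ** 2).sum(axis=0)
    between_ss = np.zeros(Xs.shape[1])
    for label in np.unique(y):
        rows = Xs[y == label]
        between_ss += len(rows) * (rows.mean(axis=0) - grand_mean) ** 2
    # Constant features have zero total variance; score them as 0.
    eta2 = np.divide(between_ss, total_ss,
                     out=np.zeros_like(between_ss), where=total_ss > 0)
    return float(eta2.mean())


def redundancy(X, mask):
    """Assumed redundancy measure: mean |Pearson r| over selected feature pairs."""
    Xs = X[:, mask]
    k = Xs.shape[1]
    if k < 2:
        return 0.0
    with np.errstate(invalid="ignore", divide="ignore"):
        corr = np.nan_to_num(np.abs(np.corrcoef(Xs, rowvar=False)))
    return float(corr[np.triu_indices(k, k=1)].mean())


def dominates(a, b):
    """Pareto dominance for (relevance, redundancy) pairs.

    a dominates b if it is no worse on both objectives (higher relevance,
    lower redundancy) and strictly better on at least one.
    """
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better
```

In a multiobjective search, each candidate feature subset would be scored as a `(relevance, redundancy)` pair, and only subsets that no other candidate dominates would be retained as the non-dominated front.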

Acknowledgments

The work is partially supported by DST-PURSE scheme of University of Kalyani.

Author information

Corresponding author

Correspondence to Monalisa Mandal.

About this article

Cite this article

Mandal, M., Mukhopadhyay, A. & Maulik, U. Prediction of protein subcellular localization by incorporating multiobjective PSO-based feature subset selection into the general form of Chou’s PseAAC. Med Biol Eng Comput 53, 331–344 (2015). https://doi.org/10.1007/s11517-014-1238-7

  • DOI: https://doi.org/10.1007/s11517-014-1238-7
