IDM-PhyChm-Ens: Intelligent decision-making ensemble methodology for classification of human breast cancer using physicochemical properties of amino acids

  • Original Article
  • Amino Acids

Abstract

Developing accurate and reliable intelligent decision-making methods for cancer diagnosis systems is one of the fastest-growing research areas in the health sciences. Such decision-making systems can provide useful information for cancer diagnosis and drug discovery. Descriptors derived from the physicochemical properties of protein sequences are very useful for classifying cancerous proteins, and several studies on breast cancer classification have been reported recently. We propose exploiting physicochemical properties of the amino acids in protein primary sequences, such as hydrophobicity (Hd) and hydrophilicity (Hb), for breast cancer classification. Recent literature reports that the Hd and Hb properties are quite effective in characterizing the constituent amino acids, and they are used to study protein folding, interactions, structures, and sequence-order effects. In particular, using these physicochemical properties, we observed that proline, serine, tyrosine, cysteine, arginine, and asparagine offer high discrimination between cancerous and healthy proteins. In addition, unlike traditional ensemble classification approaches, the proposed ‘IDM-PhyChm-Ens’ method was developed by combining the decision spaces of a specific classifier trained on different feature spaces: amino acid composition, split amino acid composition, and pseudo amino acid composition. We thus exploited different feature spaces, using the Hd and Hb properties of amino acids, to develop an accurate method for classifying cancerous protein sequences. We developed ensemble classifiers using diverse learning algorithms such as random forest (RF), support vector machines (SVM), and K-nearest neighbor (KNN), each trained on the different feature spaces. For cancer classification, ensemble-RF performed better than ensemble-SVM and ensemble-KNN. Our analysis demonstrates that ensemble-RF, ensemble-SVM, and ensemble-KNN are more effective than their individual counterparts. The proposed ‘IDM-PhyChm-Ens’ method has shown improved performance compared to existing techniques.
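
As a rough illustration of the idea described in the abstract, the following Python sketch computes two of the three named feature spaces (amino acid composition and split amino acid composition) and combines the predictions of classifiers trained on separate feature spaces. It is not the authors' implementation. The function names, the 25-residue terminal segment lengths, and the use of simple majority voting as the combination rule are all assumptions made here for illustration, and pseudo amino acid composition is omitted.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def amino_acid_composition(seq):
    """Return the relative frequency of each standard residue (20 values)."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def split_amino_acid_composition(seq, n_term=25, c_term=25):
    """Concatenate the compositions of the N-terminal, middle, and
    C-terminal segments (60 values).

    The segment lengths are hypothetical defaults, not values from the
    paper. Short sequences yield an empty middle and a shortened tail.
    """
    cut = max(n_term, len(seq) - c_term)
    head, middle, tail = seq[:n_term], seq[n_term:cut], seq[cut:]
    features = []
    for segment in (head, middle, tail):
        features.extend(amino_acid_composition(segment))
    return features


def majority_vote(predictions):
    """Combine per-classifier label lists into one label per sample.

    `predictions` holds one list of labels per classifier, for example
    one classifier per feature space. An odd number of voters avoids
    ties for binary labels.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Three hypothetical classifiers voting on three samples (1 = cancerous).
print(majority_vote([[1, 0, 1], [1, 1, 0], [0, 0, 1]]))  # [1, 0, 1]
```

In practice each feature vector would be fed to a trained classifier (RF, SVM, or KNN in the abstract), and only the resulting decisions would be combined. The voting step here stands in for whatever combination rule the full article specifies.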

Acknowledgments

The authors are very grateful to the Pakistan Institute of Engineering and Applied Sciences (PIEAS) for providing useful resources for this work.

Conflict of interest

None.

Author information

Corresponding author

Correspondence to Abdul Majid.

Additional information

The MATLAB-based code developed for this study can be provided to academics on request.

About this article

Cite this article

Ali, S., Majid, A. & Khan, A. IDM-PhyChm-Ens: Intelligent decision-making ensemble methodology for classification of human breast cancer using physicochemical properties of amino acids. Amino Acids 46, 977–993 (2014). https://doi.org/10.1007/s00726-013-1659-x

  • DOI: https://doi.org/10.1007/s00726-013-1659-x
