Abstract
Mass spectrometry is currently the most commonly used technology in biochemical research for proteomic analysis. The main goal of proteomic profiling using mass spectrometry is the classification of samples from different clinical states. This requires the identification of proteins or peptides (biomarkers) that are differentially expressed between clinical states. However, the high dimensionality of the data and the small number of samples make the classification of mass spectrometry data a challenging task. Therefore, an effective feature manipulation algorithm, based on either feature selection or feature construction, is needed to enhance classification performance while minimising the number of features. Most feature manipulation methods for mass spectrometry data treat this problem as a single-objective task focused on improving classification performance. This paper presents two new methods for biomarker detection through multi-objective feature selection and feature construction. The results show that the proposed multi-objective feature selection method can obtain better feature subsets than the single-objective algorithm and two traditional multi-objective feature selection approaches. Moreover, the multi-objective feature construction algorithm further improves performance over the multi-objective feature selection algorithm. This paper presents the first multi-objective genetic programming approach for biomarker detection in mass spectrometry data.
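Treating feature selection as a multi-objective task means comparing candidate feature subsets on two objectives at once: classification error and the number of selected features. Both are minimised. The sketch below is a minimal illustration of Pareto dominance for those two objectives. It is not the authors' genetic programming method. The `(error, n_features)` tuple layout, the function names and the candidate values in the usage example are hypothetical.

```python
from typing import List, Tuple

# Hypothetical candidate layout: (classification error rate, number of selected features).
# Both objectives are minimised.
Candidate = Tuple[float, int]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on at least one."""
    no_worse = a[0] <= b[0] and a[1] <= b[1]
    strictly_better = a[0] < b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates, in input order.

    A candidate never dominates itself, so no self-exclusion check is needed.
    """
    return [c for c in candidates if not any(dominates(other, c) for other in candidates)]


if __name__ == "__main__":
    # Made-up subsets, for illustration only.
    subsets = [(0.10, 25), (0.08, 40), (0.12, 10), (0.15, 30), (0.08, 50)]
    print(pareto_front(subsets))  # [(0.1, 25), (0.08, 40), (0.12, 10)]
```

With these made-up values, (0.15, 30) is dropped because (0.10, 25) is better on both objectives. (0.08, 50) is dropped because (0.08, 40) matches its error with fewer features. The three subsets that remain represent different trade-offs between accuracy and subset size. A multi-objective search returns such a set of trade-offs rather than a single best subset.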
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Ahmed, S., Zhang, M., Peng, L., Xue, B. (2016). A Multi-objective Genetic Programming Biomarker Detection Approach in Mass Spectrometry Data. In: Squillero, G., Burelli, P. (eds) Applications of Evolutionary Computation. EvoApplications 2016. Lecture Notes in Computer Science, vol 9597. Springer, Cham. https://doi.org/10.1007/978-3-319-31204-0_8
DOI: https://doi.org/10.1007/978-3-319-31204-0_8
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-31203-3
Online ISBN: 978-3-319-31204-0
eBook Packages: Computer Science; Computer Science (R0)