PRIB 2008: Pattern Recognition in Bioinformatics, pp 98-109
A Clustering Based Hybrid System for Mass Spectrometry Data Analysis
Abstract
Recently, much attention has been given to mass spectrometry (MS) based disease classification, diagnosis, and protein-based biomarker identification. As in microarray-based investigations, the proteomic data generated by such high-throughput experiments often have a high feature-to-sample ratio. Moreover, biological information and patterns are mixed with noise, redundancy, and outliers. The development of algorithms and procedures for analyzing and interpreting such data is therefore of paramount importance. In this paper, we propose a hybrid system for analyzing such high-dimensional data. The proposed method uses a k-means clustering based feature extraction and selection procedure to bridge filter and wrapper selection methods. Potentially informative mass/charge (m/z) markers selected by filters are passed to the k-means clustering algorithm for correlation and redundancy reduction, and a multi-objective genetic algorithm selector is then employed to identify discriminative m/z markers among those produced by the k-means step. Experimental results indicate that the proposed method is suitable for m/z biomarker selection and MS-based sample classification.
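The first two stages described in the abstract (filter ranking followed by k-means redundancy reduction) can be illustrated with a minimal sketch. It is not the authors' implementation: the two-class separation score stands in for whatever filters the paper uses, the rule of keeping the best-scoring marker per cluster is an assumption, and the function names (`filter_score`, `zscore`, `kmeans`, `select_markers`) are hypothetical. The multi-objective genetic algorithm wrapper is not shown.

```python
import random
from statistics import mean, pstdev


def filter_score(values, labels):
    """Two-class separation: |difference of class means| / sum of class std devs.
    Assumed stand-in for the paper's filter criteria."""
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    return abs(mean(a) - mean(b)) / (pstdev(a) + pstdev(b) + 1e-12)


def zscore(values):
    """Standardise one feature profile so clustering reflects shape, not scale."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s if s > 0 else 0.0 for v in values]


def sqdist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assign = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_assign = [min(range(k), key=lambda c: sqdist(p, centroids[c]))
                      for p in points]
        if new_assign == assign:
            break  # assignments stopped changing
        assign = new_assign
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [mean(col) for col in zip(*members)]
    return assign


def select_markers(X, y, n_filter, k, seed=0):
    """X: list of samples (rows of intensities); y: 0/1 class labels.
    Returns sorted column indices of the markers kept after clustering."""
    columns = list(zip(*X))  # one intensity profile per m/z feature
    scores = [filter_score(col, y) for col in columns]
    # Filter stage: keep the n_filter best-scoring features.
    top = sorted(range(len(columns)), key=lambda j: scores[j],
                 reverse=True)[:n_filter]
    # Clustering stage: group features with similar profiles across samples.
    assign = kmeans([zscore(columns[j]) for j in top], k, seed=seed)
    selected = []
    for c in range(k):
        members = [j for j, a in zip(top, assign) if a == c]
        if members:
            # Assumed rule: keep the best-scoring feature of each cluster.
            selected.append(max(members, key=lambda j: scores[j]))
    return sorted(selected)
```

Calling `select_markers(X, y, n_filter, k)` on a samples-by-features intensity matrix returns at most `k` column indices, one per non-empty cluster, so strongly correlated markers collapse to a single representative before any wrapper search is run.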
Keywords
Feature Selection, Information Gain, Feature Subset, Subset Evaluation, Base Feature Selection