Integrated Classifier: A Tool for Microarray Analysis
Abstract
Microarray technology has been developed and applied in many biological contexts, especially to monitor the expression levels of thousands of genes simultaneously. Analyzing such data requires sophisticated computational tools, so we propose a tool for the analysis of microarray data. A feature selection scheme is integrated separately with each of several classical supervised classifiers: Support Vector Machine, K-Nearest Neighbor, Decision Tree and Naive Bayes. Its purpose is to improve classification performance, and we call the resulting models Integrated Classifiers. The feature selection scheme generates bootstrap samples, which are used to create diverse and informative features via Principal Component Analysis. These features are then multiplied with the original data to create training and test data for the classifiers. Final classification results on the test data are obtained by computing posterior probabilities. The performance of the proposed integrated classifiers, relative to their conventional counterparts, is demonstrated on 12 microarray datasets. Compared with the conventional classifiers, the integrated classifiers improve performance by up to 25.90% on a single dataset, with an average gain of 9.74%. The superiority of the results has also been established through a statistical significance test.
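The abstract describes the pipeline only at a high level. The following is one possible reading of it, written as a minimal sketch that assumes rounds are combined by averaging posteriors. In each round, a bootstrap sample is drawn and PCA is fitted on it. Training and test samples are then multiplied by the resulting principal axes, and a classifier is trained on the projected training data. Its posterior probabilities on the test data are averaged over all rounds.

All function names (`fit_pca`, `project`, `fit_gaussian_nb`, `posterior`, `integrated_predict`) are hypothetical. The same goes for the parameters `rounds` and `k`, the power-iteration PCA, and the choice of Gaussian Naive Bayes as the base classifier. These are illustrative assumptions, not the authors' settings. The code is pure Python and meant for toy-sized inputs; a real matrix with thousands of genes would call for a linear-algebra library.

```python
import math
import random


def fit_pca(rows, k, iters=200, seed=0):
    """Return (column means, top-k principal axes) of `rows` via power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[a] * r[b] for r in centred) / max(n - 1, 1) for b in range(d)]
           for a in range(d)]
    rng = random.Random(seed)
    axes = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue; deflate so the next pass finds the next axis.
        lam = sum(v[a] * cov[a][b] * v[b] for a in range(d) for b in range(d))
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
        axes.append(v)
    return means, axes


def project(x, means, axes):
    """Multiply a centred sample by the PCA axes to obtain its new features."""
    return [sum((x[j] - means[j]) * ax[j] for j in range(len(x))) for ax in axes]


def fit_gaussian_nb(X, y):
    """Gaussian Naive Bayes: per-class prior, feature means and variances."""
    model = {}
    for c in sorted(set(y)):
        pts = [x for x, lab in zip(X, y) if lab == c]
        mu = [sum(col) / len(pts) for col in zip(*pts)]
        var = [sum((v - m) ** 2 for v in col) / len(pts) + 1e-6  # small floor avoids zero variance
               for col, m in zip(zip(*pts), mu)]
        model[c] = (len(pts) / len(X), mu, var)
    return model


def posterior(model, x):
    """Normalised class posterior P(c | x), computed with log-sum-exp for stability."""
    logp = {}
    for c, (prior, mu, var) in model.items():
        logp[c] = math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * v) - (xj - m) ** 2 / (2 * v)
            for xj, m, v in zip(x, mu, var))
    top = max(logp.values())
    z = sum(math.exp(s - top) for s in logp.values())
    return {c: math.exp(s - top) / z for c, s in logp.items()}


def integrated_predict(X_train, y_train, X_test, rounds=10, k=2, seed=0):
    """Average test posteriors over `rounds` bootstrap -> PCA -> projection -> classifier passes."""
    rng = random.Random(seed)
    classes = sorted(set(y_train))
    avg = [dict.fromkeys(classes, 0.0) for _ in X_test]
    n = len(X_train)
    for r in range(rounds):
        boot = [X_train[rng.randrange(n)] for _ in range(n)]  # bootstrap sample
        means, axes = fit_pca(boot, k, seed=seed + r)
        model = fit_gaussian_nb([project(x, means, axes) for x in X_train], y_train)
        for acc, x in zip(avg, X_test):
            for c, p in posterior(model, project(x, means, axes)).items():
                acc[c] += p / rounds
    return [max(classes, key=acc.get) for acc in avg], avg


if __name__ == "__main__":
    # Made-up toy matrix: 8 samples x 3 genes, two well-separated classes.
    X_train = [[0.0, 0.0, 0.0], [0.1, 0.2, 0.0], [0.2, 0.1, 0.1], [0.1, 0.0, 0.2],
               [5.0, 5.0, 5.0], [5.1, 4.9, 5.2], [4.8, 5.2, 5.0], [5.2, 5.1, 4.9]]
    y_train = [0, 0, 0, 0, 1, 1, 1, 1]
    preds, probs = integrated_predict(X_train, y_train, [[0.05, 0.1, 0.05], [5.0, 5.1, 5.0]])
    print(preds)  # prints [0, 1] on this toy data
```

Averaging posteriors, rather than taking a majority vote, lets a confident round outweigh an uncertain one. It also yields a probability per class, in line with the abstract's mention of posterior probabilities. Any of the four classifiers named above could replace `fit_gaussian_nb`, provided it can produce class probabilities.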
Keywords
Feature selection, Microarray, Principal component analysis, Supervised classifiers, Statistical significance test