Abstract
In this paper, we investigate the classification accuracy of different cancers based on microarray expression values. For this purpose, we combine a filter selection method with a clustering method to select relevant features in each cancer dataset. Our work is carried out in two steps. First, we examine the effect of the filter selection methods on classification accuracy before clustering. The filter selection methods studied are signal-to-noise ratio (SNR), ReliefF, Correlation Coefficient and Mutual Information. K Nearest Neighbor, Support Vector Machine and Linear Discriminant Analysis classifiers were used for the supervised classification task.
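To make the filter step concrete, the following is a minimal sketch of SNR-based gene ranking for a two-class problem. It assumes the commonly used definition SNR = |mean_0 - mean_1| / (std_0 + std_1); the abstract does not state the exact formula the authors used, and the function names and toy data are hypothetical.

```python
from statistics import mean, pstdev


def snr_scores(samples, labels):
    """Score each gene (column) by |mean_0 - mean_1| / (std_0 + std_1).

    Assumed definition; the paper's exact SNR formula is not given here.
    """
    class0 = [row for row, y in zip(samples, labels) if y == 0]
    class1 = [row for row, y in zip(samples, labels) if y == 1]
    scores = []
    for g in range(len(samples[0])):
        a = [row[g] for row in class0]
        b = [row[g] for row in class1]
        spread = pstdev(a) + pstdev(b)
        # Guard against constant genes, which would divide by zero.
        scores.append(abs(mean(a) - mean(b)) / spread if spread > 0 else 0.0)
    return scores


def top_genes(samples, labels, k):
    """Return the indices of the k genes with the highest SNR score."""
    scores = snr_scores(samples, labels)
    return sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)[:k]


# Hypothetical toy data: 6 samples (rows) x 4 genes (columns).
toy_samples = [
    [1.0, 5.0, 3.0, 0.2],
    [1.2, 5.1, 2.0, 0.9],
    [0.9, 4.9, 4.0, 0.5],
    [3.0, 5.0, 3.5, 0.4],
    [3.1, 5.2, 2.5, 0.8],
    [2.9, 4.8, 3.0, 0.6],
]
toy_labels = [0, 0, 0, 1, 1, 1]
selected = top_genes(toy_samples, toy_labels, 2)
```

In this toy example only gene 0 separates the two classes clearly, so it receives by far the highest score; the selected columns would then be passed to a classifier such as KNN, SVM or LDA.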
In the second step, the same investigation is carried out, but the feature selection task is preceded by a k-means clustering operation.
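The abstract does not say how the k-means output feeds the filter, so the sketch below shows only one plausible reading: genes are clustered by their expression profiles, and the best-scoring gene of each cluster is kept so that redundant genes are not selected together. The helper names, the deterministic first-k initialisation, the toy profiles and the relevance scores are all assumptions made for illustration.

```python
def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means; returns one cluster index per point.

    Centers start at the first k points (an assumption chosen for
    reproducibility, not a recommended initialisation).
    """
    centers = [list(p) for p in points[:k]]
    assign = None
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        new_assign = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
            for p in points
        ]
        if new_assign == assign:
            break  # converged
        assign = new_assign
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def best_gene_per_cluster(assign, scores):
    """Keep, for each cluster, the gene with the highest filter score."""
    best = {}
    for gene, cluster in enumerate(assign):
        if cluster not in best or scores[gene] > scores[best[cluster]]:
            best[cluster] = gene
    return sorted(best.values())


# Hypothetical gene profiles: one row per gene, one column per sample.
# Genes 0 and 1 are near-duplicates, as are genes 2 and 3.
gene_profiles = [
    [1.0, 1.0, 5.0, 5.0],
    [1.1, 0.9, 5.1, 4.9],
    [5.0, 5.0, 1.0, 1.0],
    [4.9, 5.1, 0.9, 1.1],
]
# Hypothetical relevance scores from any filter (SNR, ReliefF, ...).
filter_scores = [0.9, 0.8, 0.3, 0.4]

clusters = kmeans(gene_profiles, k=2)
kept = best_gene_per_cluster(clusters, filter_scores)
```

Here the two near-duplicate pairs fall into separate clusters, so one representative is kept from each pair rather than the two top-scoring but redundant genes.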
The results show that the best classification accuracies for the leukemia, colon, prostate, lung and lymphoma cancer datasets were obtained with the SNR method. Adding the clustering step to the feature subset selection phase increased the classification accuracy for all four selection methods: SNR, ReliefF, Correlation Coefficient and Mutual Information.
Notes
- 1. broadinstitute.org/cgi-bin/cancer/publications/pub_paper.cgi?mode=view&paper_id=43.
- 2. genomics-pubs.princeton.edu/oncology/affydata/insdex.html.
- 3. broadinstitute.org/cgi-bin/cancer/publications/pub_paper.cgi?mode=view&paper_id=75.
- 4.
- 5.
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Haddou Bouazza, S., Auhmani, K., Zeroual, A., Hamdi, N. (2018). Cancer Classification Using Gene Expression Profiling: Application of the Filter Approach with the Clustering Algorithm. In: Abraham, A., Haqiq, A., Muda, A., Gandhi, N. (eds) Proceedings of the Ninth International Conference on Soft Computing and Pattern Recognition (SoCPaR 2017). SoCPaR 2017. Advances in Intelligent Systems and Computing, vol 737. Springer, Cham. https://doi.org/10.1007/978-3-319-76357-6_9
DOI: https://doi.org/10.1007/978-3-319-76357-6_9
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-76356-9
Online ISBN: 978-3-319-76357-6
eBook Packages: Engineering, Engineering (R0)