Abstract
Several research areas face data matrices that are unsuitable for traditional clustering, regression, or classification strategies. For example, so-called biological omic problems involve matrices with thousands or millions of rows and fewer than a hundred columns. This matrix structure hinders traditional data analysis methods and therefore calls for some means of reducing the number of rows. This article presents PreCLAS, an unsupervised approach for preprocessing matrices with such dimensionality problems in order to obtain data suitable for clustering and classification strategies. PreCLAS was implemented as an unsupervised strategy that aims to find a submatrix with a drastically reduced number of rows, favoring rows that together present some group structure. Experimentation was carried out in two stages. First, to assess its functionality, a benchmark dataset was studied in a clustering context. Then, a microarray dataset with genomic information was analyzed, and PreCLAS was used to select informative genes for classification strategies. The experiments showed that the new method succeeds at drastically reducing the number of rows of a matrix, thereby performing unsupervised feature selection for both classification and clustering problems.
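The abstract describes PreCLAS only at a high level. The following is a minimal, hypothetical sketch of the general idea (an evolutionary search for a small subset of rows that shows group structure), not the authors' implementation. It assumes a fixed subset size k, a simple genetic algorithm (tournament selection, union-based crossover, swap mutation, elitism), and a Hopkins-style clustering-tendency score, in the spirit of the index of Lawson and Jurs (1990), as the fitness, with the selected rows treated as points. The names `hopkins` and `select_rows`, their parameters, and the synthetic data are all illustrative assumptions.

```python
import math
import random


def hopkins(points, m, rng):
    """Hopkins-style clustering-tendency score in [0, 1] (assumed fitness).

    Values near 1 suggest group structure; values near 0.5 suggest points
    scattered uniformly over their bounding box. Requires len(points) >= 2.
    """
    n, dims = len(points), len(points[0])
    m = min(m, n - 1)
    lows = [min(p[d] for p in points) for d in range(dims)]
    highs = [max(p[d] for p in points) for d in range(dims)]

    # u: distance from uniform random probes in the bounding box to the nearest data point
    u = 0.0
    for _ in range(m):
        probe = [rng.uniform(lows[d], highs[d]) for d in range(dims)]
        u += min(math.dist(probe, p) for p in points)

    # w: distance from sampled data points to their nearest other data point
    w = 0.0
    for i in rng.sample(range(n), m):
        w += min(math.dist(points[i], points[j]) for j in range(n) if j != i)

    return u / (u + w) if u + w > 0 else 0.5


def select_rows(matrix, k, pop_size=30, generations=50, m=5, seed=0):
    """Hypothetical GA: choose k row indices whose rows show group structure."""
    rng = random.Random(seed)
    n = len(matrix)
    cache = {}

    def fitness(subset):
        key = frozenset(subset)
        if key not in cache:
            points = [matrix[i] for i in sorted(key)]
            # A fixed seed per evaluation keeps the stochastic score reproducible
            cache[key] = hopkins(points, m, random.Random(seed))
        return cache[key]

    def tournament(pop):
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    def crossover(a, b):
        # Child: k distinct indices drawn from the union of both parents
        return rng.sample(sorted(set(a) | set(b)), k)

    def mutate(ind, rate=0.2):
        # Swap one selected row for a currently unselected one
        ind = list(ind)
        if rng.random() < rate:
            outside = [i for i in range(n) if i not in ind]
            if outside:
                ind[rng.randrange(k)] = rng.choice(outside)
        return ind

    population = [rng.sample(range(n), k) for _ in range(pop_size)]
    for _ in range(generations):
        elite = max(population, key=fitness)
        children = [elite]  # elitism: keep the best subset found so far
        while len(children) < pop_size:
            child = crossover(tournament(population), tournament(population))
            children.append(mutate(child))
        population = children

    best = max(population, key=fitness)
    return sorted(best), fitness(best)


if __name__ == "__main__":
    gen = random.Random(1)
    # Synthetic matrix: 6 rows in two tight groups plus 14 scattered "noise" rows
    grouped = [[c + gen.gauss(0, 0.05) for _ in range(4)] for c in (0.0,) * 3 + (5.0,) * 3]
    noise = [[gen.uniform(0, 5) for _ in range(4)] for _ in range(14)]
    rows, score = select_rows(grouped + noise, k=6)
    print("selected rows:", rows, "score:", round(score, 3))
```

Caching fitness by the set of selected indices avoids re-scoring subsets that survive across generations; a real tool would likely also let k vary and use a more robust clustering-tendency measure.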
This work is supported by CONICET (Grant number 112-2017-0100829) and Secretaría de Ciencia y Tecnología (UNS) (Grant number 24/N042).
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Carballido, J.A., Ponzoni, I., Cecchini, R.L. (2020). PreCLAS: An Evolutionary Tool for Unsupervised Feature Selection. In: de la Cal, E.A., Villar Flecha, J.R., Quintián, H., Corchado, E. (eds) Hybrid Artificial Intelligent Systems. HAIS 2020. Lecture Notes in Computer Science, vol 12344. Springer, Cham. https://doi.org/10.1007/978-3-030-61705-9_15
DOI: https://doi.org/10.1007/978-3-030-61705-9_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-61704-2
Online ISBN: 978-3-030-61705-9
eBook Packages: Computer Science; Computer Science (R0)