Abstract
Data mining methods are frequently applied to data classification. Among these methods, feature selection (FS) algorithms are essential for handling data sets whose dimensionality ranges from small to medium to large. A large number of features raises concerns about both classifier accuracy and running time. This chapter proposes a novel hybrid feature selection technique built on symmetrical uncertainty and a genetic algorithm. Experiments with this hybrid framework on UCI datasets show that the proposed feature selector is efficient, because it reduces the number of initial features, and accurate, because it yields better detection performance in the classification algorithms than other feature selectors in the literature. Compared with earlier research work, the proposed method improves both the optimization and the overall performance. In summary, the proposed feature selection method outperformed the other methods in reducing the number of selected features, in classification performance, and in execution time.
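The hybrid filter-wrapper idea summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes discrete (already discretized) features and ranks them by symmetrical uncertainty with the class, SU(X, Y) = 2 · I(X; Y) / (H(X) + H(Y)), where I(X; Y) = H(X) + H(Y) − H(X, Y). It keeps the top-ranked features and then runs a simple genetic algorithm over bit-string masks of the retained features. The function names, the GA operators (binary tournament selection, one-point crossover, bit-flip mutation, elitism), and all parameter values are hypothetical choices. The caller-supplied `fitness` function stands in for whatever classifier-based score one would actually use, for example cross-validated accuracy penalized by subset size.

```python
import math
import random
from collections import Counter
from typing import Callable, Hashable, List, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy, in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), with I(X; Y) = H(X) + H(Y) - H(X, Y)."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables constant: no information to share
    hxy = entropy(list(zip(x, y)))  # joint entropy H(X, Y)
    return 2.0 * (hx + hy - hxy) / (hx + hy)


def su_rank(columns: Sequence[Sequence[Hashable]], labels: Sequence[Hashable], keep: int) -> List[int]:
    """Filter stage (hypothetical helper): indices of the `keep` columns with highest SU."""
    scores = [(symmetrical_uncertainty(col, labels), j) for j, col in enumerate(columns)]
    scores.sort(key=lambda s: (-s[0], s[1]))  # highest SU first, ties broken by index
    return [j for _, j in scores[:keep]]


def ga_select(
    candidates: Sequence[int],
    fitness: Callable[[List[int]], float],
    pop_size: int = 20,
    generations: int = 30,
    p_mut: float = 0.05,
    seed: int = 0,
) -> List[int]:
    """Wrapper stage (hypothetical helper): evolve bit masks over `candidates` to maximize `fitness`."""
    if pop_size < 2:
        raise ValueError("pop_size must be at least 2")
    rng = random.Random(seed)
    n = len(candidates)

    def decode(mask: List[bool]) -> List[int]:
        return [f for f, bit in zip(candidates, mask) if bit]

    def score(mask: List[bool]) -> float:
        return fitness(decode(mask))

    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=score, reverse=True)
        next_pop = [ranked[0][:], ranked[1][:]]  # elitism: keep the two best masks
        while len(next_pop) < pop_size:
            # Binary tournament selection for each parent.
            a = max(rng.sample(ranked, 2), key=score)
            b = max(rng.sample(ranked, 2), key=score)
            cut = rng.randrange(1, n) if n > 1 else 0  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation: toggle each gene with probability p_mut.
            child = [(not bit) if rng.random() < p_mut else bit for bit in child]
            next_pop.append(child)
        pop = next_pop
    return decode(max(pop, key=score))


if __name__ == "__main__":
    labels = [0, 0, 1, 1, 0, 0, 1, 1]
    columns = [
        list(labels),              # feature 0: identical to the class
        [0, 1, 0, 1, 0, 1, 0, 1],  # feature 1: independent of the class
        [5] * 8,                   # feature 2: constant
    ]
    kept = su_rank(columns, labels, keep=2)

    # Hypothetical toy fitness: reward keeping feature 0, penalize subset size.
    def toy_fitness(subset: List[int]) -> float:
        return (1.0 if 0 in subset else 0.0) - 0.1 * len(subset)

    print("SU-ranked features kept:", kept)
    print("GA-selected subset:", ga_select(kept, toy_fitness))
```

The filter stage is cheap because SU needs only frequency counts, so it prunes irrelevant features before the more expensive wrapper stage evaluates candidate subsets. In a real experiment the toy fitness would be replaced by training and validating a classifier (for instance one from Weka) on the selected columns.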
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Venkataraman, S., Selvaraj, R. (2018). Optimal and Novel Hybrid Feature Selection Framework for Effective Data Classification. In: Konkani, A., Bera, R., Paul, S. (eds) Advances in Systems, Control and Automation. Lecture Notes in Electrical Engineering, vol 442. Springer, Singapore. https://doi.org/10.1007/978-981-10-4762-6_48
DOI: https://doi.org/10.1007/978-981-10-4762-6_48
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-4761-9
Online ISBN: 978-981-10-4762-6
eBook Packages: Engineering, Engineering (R0)