Attribute reduction on real-valued data in rough set theory using hybrid artificial bee colony: extended FTSBPSD algorithm
Discretization and attribute reduction are two preprocessing steps required by most induction algorithms. Discretizing before attribute reduction incurs a high computational cost, because many irrelevant and redundant attributes are discretized needlessly. Reducing attributes before discretization may over-fit the data and degrade the performance of the induction algorithm. In this paper, we propose a hybrid algorithm that combines the artificial bee colony (ABC) algorithm with the extended forward tentative selection with backward propagation of selection decision (EFTSBPSD) algorithm for attribute reduction on real-valued data in rough set theory (RST). Based on the principle of indiscernibility, the hybrid ABC–EFTSBPSD algorithm performs discretization and attribute reduction together. It takes as input a decision system consisting of real-valued attributes and determines a near-optimal set of irreducible cuts, where optimality is defined in terms of the cardinality of the set. The reduct is then obtained by extracting the attributes that correspond to the cuts in this set. The proposed hybrid algorithm is tested on various data sets from the University of California, Irvine (UCI) Machine Learning Repository. Compared with the Q-MDRA, ACO-RST and IMCVR algorithms described in the literature, it gives better classification accuracy when tested using (1) the C4.5 classifier and (2) an SVM classifier, and it also produces shorter reducts.
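The link between cuts, indiscernibility and the reduct can be illustrated with a small sketch. It is not the ABC–EFTSBPSD algorithm: it swaps in a plain backward elimination over candidate cuts, which yields an irreducible but not necessarily minimum-cardinality set. The function names, the midpoint rule for candidate cuts, and the toy decision system are all assumptions made for illustration, and the data are assumed to be consistent (no two identical objects with different decisions).

```python
from itertools import combinations

def candidate_cuts(rows):
    """Assumed rule: midpoints between consecutive distinct values of each attribute."""
    cuts = []
    n_attrs = len(rows[0][0])
    for a in range(n_attrs):
        values = sorted({x[a] for x, _ in rows})
        cuts += [(a, (lo + hi) / 2) for lo, hi in zip(values, values[1:])]
    return cuts

def discerns(cut, x, y):
    """A cut (attribute, value) discerns x and y if they fall on opposite sides."""
    a, c = cut
    return (x[a] < c) != (y[a] < c)

def is_consistent(cuts, rows):
    """True if every pair of objects with different decisions is discerned by some cut."""
    return all(
        any(discerns(cut, x, y) for cut in cuts)
        for (x, dx), (y, dy) in combinations(rows, 2)
        if dx != dy
    )

def irreducible_cuts(rows):
    """Backward elimination (illustrative only): drop each cut whose removal
    still leaves the decision system consistent."""
    cuts = candidate_cuts(rows)
    for cut in list(cuts):
        remaining = [c for c in cuts if c != cut]
        if is_consistent(remaining, rows):
            cuts = remaining
    return cuts

def reduct_from_cuts(cuts):
    """The reduct is the set of attributes that still own at least one cut."""
    return sorted({a for a, _ in cuts})

# Hypothetical decision system: (real-valued attributes, decision class).
rows = [
    ((0.8, 2.0, 5.1), 0),
    ((1.4, 0.5, 5.0), 1),
    ((1.3, 3.0, 5.2), 0),
    ((1.6, 2.0, 4.9), 1),
]

cuts = irreducible_cuts(rows)
reduct = reduct_from_cuts(cuts)
print("irreducible cuts:", cuts)
print("reduct (attribute indices):", reduct)
```

On this toy data a single cut on the third attribute survives, so the reduct contains only that attribute. Which cuts survive depends on the elimination order, which is why a search procedure is needed to approach the smallest irreducible set.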
Keywords: Rough set theory; Indiscernibility; Boolean reasoning; Discretization; Attribute reduction; Artificial bee colony algorithm; FTSBPSD algorithm
This work is supported by Tata Consultancy Services, under TCS Research Scholar Program.
Compliance with ethical standards
Conflict of interest
The authors declare that they have no conflict of interest.
- Alcalá-Fdez J, Fernández A, Luengo J, Derrac J, García S, Sánchez L, Herrera F (2011) KEEL data-mining software tool: data set repository, integration of algorithms and experimental analysis framework. Mult Valued Logic Soft Comput 17:255–287
- Chebrolu S, Sanjeevi SG (2015) Attribute reduction on continuous data in rough set theory using ant colony optimization metaheuristic. In: Proceedings of the third international symposium on women in computing and informatics, WCI ’15, pp 17–24. ACM, New York
- Chebrolu S, Sanjeevi SG (2015) Forward tentative selection with backward propagation of selection decision algorithm for attribute reduction in rough set theory. Int J Reason Based Intell Syst 7(3/4):221–243
- IBM Corp (2013) IBM SPSS Statistics for Windows, Version 22.0. IBM Corp, Armonk
- Karaboga D (2005) An idea based on honey bee swarm for numerical optimization. Technical Report TR06, Erciyes University, October 2005
- Karaboga D, Akay B, Ozturk C (2007) Artificial bee colony (ABC) optimization algorithm for training feed-forward neural networks. In: Torra V, Narukawa Y, Yoshida Y (eds) Modeling decisions for artificial intelligence: 4th international conference, MDAI 2007, Kitakyushu, Japan, August 16–18, 2007. Proceedings, pp 318–329
- Kent ridge bio-medical data set repository. http://sdmc.lit.org.sg/gedatasets/datasets.html
- Komorowski J, Pawlak Z, Polkowski L, Skowron A (1999) Rough sets: a tutorial. In: Pal SK, Skowron A (eds) Rough fuzzy hybridization: a new trend in decision-making. Springer, Singapore, pp 3–98
- Lichman M (2013) UCI machine learning repository. School of Information and Computer Sciences, University of California, Irvine. http://archive.ics.uci.edu/ml
- Pawlak Z (2002) Rough set theory and its applications. J Telecommun Inf Technol 3:7–10
- Pawlak Z, Skowron A (2007c) Rudiments of rough sets. Inf Sci 177(1):3–27
- Quinlan JR (1993) C4.5: programs for machine learning. Morgan Kaufmann Publishers Inc, San Francisco
- Skowron A, Rauszer C (1992) The discernibility matrices and functions in information systems. In: Słowiński R (ed) Intelligent decision support: handbook of applications and advances of the rough sets theory, volume 11 of theory and decision library. Springer, Dordrecht, pp 331–362