Feature Selection in an Electric Billing Database Considering Attribute Inter-dependencies
With the increasing size of databases, feature selection has become a relevant and challenging problem in knowledge discovery in databases. An effective feature selection strategy can significantly reduce data mining processing time, improve predictive accuracy, and make the induced models easier to understand, since they tend to be smaller and more meaningful to the user. Many feature selection algorithms assume that the attributes are independent of each other given the class, which can produce models with redundant attributes and/or exclude sets of attributes that are relevant only when considered together. In this paper, an effective best-first search algorithm for feature selection, called buBF, is described. buBF uses a novel heuristic function based on n-way entropy to capture inter-dependencies among variables. It is shown that buBF produces more accurate models than other state-of-the-art feature selection algorithms when compared on several real and synthetic datasets. Specifically, we apply buBF to a Mexican electric billing database and obtain satisfactory results.
Keywords: Feature Selection; Feature Subset; Feature Selection Method; Synthetic Dataset; Feature Selection Algorithm
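To illustrate why attribute inter-dependencies matter, the following minimal Python sketch computes the joint (n-way) entropy of a set of columns and uses it to obtain the conditional entropy of the class given an attribute subset. It is not the buBF heuristic, whose exact form the abstract does not give; the function names, the toy dataset, and the column names `a`, `b`, `c`, and `y` are assumptions made purely for illustration.

```python
from collections import Counter
from math import log2


def joint_entropy(rows, columns):
    """Shannon entropy (in bits) of the joint distribution over `columns`."""
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    total = len(rows)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def conditional_class_entropy(rows, attributes, class_col):
    """H(class | attributes) = H(attributes, class) - H(attributes)."""
    attributes = list(attributes)
    if not attributes:
        return joint_entropy(rows, [class_col])
    return (joint_entropy(rows, attributes + [class_col])
            - joint_entropy(rows, attributes))


# Hypothetical toy data: y = a XOR b, and c is a redundant copy of a.
rows = [
    {"a": 0, "b": 0, "c": 0, "y": 0},
    {"a": 0, "b": 1, "c": 0, "y": 1},
    {"a": 1, "b": 0, "c": 1, "y": 1},
    {"a": 1, "b": 1, "c": 1, "y": 0},
]

# Taken one at a time, neither a nor b reduces uncertainty about y.
h_a = conditional_class_entropy(rows, ["a"], "y")
h_b = conditional_class_entropy(rows, ["b"], "y")

# Taken together, a and b determine y; adding the redundant c to a does not help.
h_ab = conditional_class_entropy(rows, ["a", "b"], "y")
h_ac = conditional_class_entropy(rows, ["a", "c"], "y")

print(h_a, h_b, h_ab, h_ac)
```

On this toy data, a method that scores attributes individually would see `a` and `b` as uninformative and could not distinguish `c` from `a`, whereas scoring subsets jointly exposes both the relevant pair and the redundant one. A subset search such as best-first search could use a score of this kind to rank candidate subsets, under the assumptions stated above.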