An ICA-Based Multivariate Discretization Algorithm
Discretization is an important preprocessing technique in data mining. Univariate discretization is the most commonly used approach: it discretizes one attribute of a dataset at a time, without considering how that attribute interacts with the others. Because the target class attribute is determined by several attributes jointly rather than by any single attribute, the result of univariate discretization is not optimal. In this paper, a new multivariate discretization algorithm is proposed. It uses ICA (Independent Component Analysis) to transform the original attributes into a space of independent attributes, and then applies univariate discretization to each attribute in the new space. Data mining tasks can then be carried out on the new discretized dataset with independent attributes. Numerical experiments show that our method improves discretization performance, especially for nongaussian datasets, and that it is competitive with the PCA-based multivariate method.
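The two-step idea described above (transform the attributes into an independent space, then discretize each new attribute on its own) can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes exactly two continuous attributes, performs ICA by whitening the data and then searching for the rotation that maximizes the summed absolute excess kurtosis, and uses equal-width binning as a stand-in for whichever univariate discretizer is actually applied. All function names, the grid size `steps`, the bin count `k`, and the synthetic data are assumptions made for the example.

```python
import math
import random


def whiten_2d(rows):
    """Center two-attribute data and rescale it to identity covariance."""
    n = len(rows)
    mx = sum(r[0] for r in rows) / n
    my = sum(r[1] for r in rows) / n
    xs = [r[0] - mx for r in rows]
    ys = [r[1] - my for r in rows]
    a = sum(x * x for x in xs) / n               # var(x)
    b = sum(x * y for x, y in zip(xs, ys)) / n   # cov(x, y)
    d = sum(y * y for y in ys) / n               # var(y)
    # Rotation angle that diagonalizes the covariance [[a, b], [b, d]].
    theta = 0.5 * math.atan2(2 * b, a - d)
    c, s = math.cos(theta), math.sin(theta)
    l1 = a * c * c + 2 * b * c * s + d * s * s   # variance along axis 1
    l2 = a * s * s - 2 * b * c * s + d * c * c   # variance along axis 2
    return [((c * x + s * y) / math.sqrt(l1),
             (-s * x + c * y) / math.sqrt(l2)) for x, y in zip(xs, ys)]


def excess_kurtosis(values):
    """Fourth standardized moment minus 3 (zero for a Gaussian)."""
    n = len(values)
    m = sum(values) / n
    var = sum((v - m) ** 2 for v in values) / n
    return sum((v - m) ** 4 for v in values) / n / var ** 2 - 3.0


def ica_2d(rows, steps=180):
    """Whiten, then pick the rotation with the most nongaussian components."""
    white = whiten_2d(rows)
    best_angle, best_score = 0.0, -1.0
    for i in range(steps):
        # Angles in [0, pi/2) suffice: a quarter turn only swaps components.
        ang = (math.pi / 2) * i / steps
        c, s = math.cos(ang), math.sin(ang)
        u = [c * p + s * q for p, q in white]
        v = [-s * p + c * q for p, q in white]
        score = abs(excess_kurtosis(u)) + abs(excess_kurtosis(v))
        if score > best_score:
            best_angle, best_score = ang, score
    c, s = math.cos(best_angle), math.sin(best_angle)
    return [(c * p + s * q, -s * p + c * q) for p, q in white]


def equal_width_bins(values, k=4):
    """Stand-in univariate discretizer: k equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0  # guard against a constant column
    return [min(int((v - lo) / width), k - 1) for v in values]


def ica_discretize(rows, k=4):
    """Discretize each independent component separately."""
    comps = ica_2d(rows)
    cols = [equal_width_bins([c[j] for c in comps], k) for j in range(2)]
    return list(zip(*cols))


# Synthetic nongaussian data: two uniform sources, linearly mixed.
rng = random.Random(0)
sources = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(400)]
rows = [(2.0 * s1 + 1.0 * s2, 1.0 * s1 + 3.0 * s2) for s1, s2 in sources]
codes = ica_discretize(rows, k=4)
```

Each entry of `codes` is a pair of bin indices, one per independent component, and can be fed to a downstream learner in place of the original continuous attributes. For more than two attributes, a general-purpose ICA routine and a supervised univariate discretizer would take the place of the rotation search and the equal-width binning used here.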
Keywords: Data mining, Multivariate Discretization, Independent Component Analysis, Nongaussian