LR-SDiscr: An Efficient Algorithm for Supervised Discretization
Discretization is the process of transforming continuous attributes into discrete ones. It is of great importance today, as continuous data are common in many domains, such as health and industry. This paper describes a new supervised discretization method, LR-SDiscr (Left to Right Supervised Discretization), based on an LR (Left to Right) scanning technique. Using both merging and partitioning operations, LR-SDiscr discretizes the data in a single pass, which reduces the complexity of the process and ensures scalability. Because the algorithm accepts any discretization measure as input, various measures can be tested and compared. Preliminary results of experiments designed for classification purposes are encouraging.
Keywords: Data mining, Data pre-processing, Supervised classification, Supervised discretization, Division and merging framework, Scanner
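The abstract gives no pseudocode, so the following is only a minimal illustrative sketch of the general idea it describes: one left-to-right scan over the sorted values of an attribute, with a merge-or-partition decision at each boundary and the discretization measure passed in as a parameter. It should not be read as the authors' LR-SDiscr procedure. The function names (`lr_scan_discretize`, `chi2`), the decision rule (cut when the measure exceeds a threshold), and the default threshold of 2.706 (the chi-square critical value at the 0.10 level with one degree of freedom) are all assumptions made for illustration.

```python
from collections import Counter
from itertools import groupby
from typing import Callable, Hashable, List, Sequence

# A measure compares the class counts of two adjacent intervals.
Measure = Callable[[Counter, Counter], float]


def chi2(left: Counter, right: Counter) -> float:
    """Pearson chi-square statistic for the class counts of two intervals."""
    classes = set(left) | set(right)
    n_left, n_right = sum(left.values()), sum(right.values())
    total = n_left + n_right
    stat = 0.0
    for c in classes:
        col = left[c] + right[c]
        for counts, n_row in ((left, n_left), (right, n_right)):
            expected = n_row * col / total
            if expected > 0:
                stat += (counts[c] - expected) ** 2 / expected
    return stat


def lr_scan_discretize(
    values: Sequence[float],
    labels: Sequence[Hashable],
    measure: Measure = chi2,      # assumption: any two-interval measure works
    threshold: float = 2.706,     # assumption: illustrative cut-off
) -> List[float]:
    """Return cut points found in one left-to-right pass over sorted data."""
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    cuts: List[float] = []
    current: Counter = Counter()  # class counts of the interval being built
    prev_value = None
    for value, group in groupby(pairs, key=lambda p: p[0]):
        block = Counter(label for _, label in group)
        if prev_value is not None and measure(current, block) > threshold:
            # Partition: the next block differs enough, so close the interval.
            cuts.append((prev_value + value) / 2)
            current = Counter()
        # Merge: absorb the block into the (possibly fresh) current interval.
        current += block
        prev_value = value
    return cuts
```

For example, `lr_scan_discretize([1, 2, 3, 10, 11, 12], ["a", "a", "a", "b", "b", "b"])` places a single cut between 3 and 10. Passing a different callable as `measure` swaps the criterion without touching the scan, which mirrors the pluggable-measure idea stated in the abstract.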