Discretization

  • Chapter
Data Preprocessing in Data Mining

Part of the book series: Intelligent Systems Reference Library (ISRL, volume 72)

Abstract

Discretization is an essential preprocessing technique used in many knowledge discovery and data mining tasks. Its main goal is to transform a set of continuous attributes into discrete ones by associating categorical values with intervals, thus transforming quantitative data into qualitative data. Sections 9.1 and 9.2 provide an overview of discretization, together with a complete outlook and a taxonomy of methods. Section 9.4 presents an experimental study in supervised classification involving the most representative discretizers, different types of classifiers and a large number of data sets.

Author information

Corresponding author

Correspondence to Salvador García.

Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

García, S., Luengo, J., Herrera, F. (2015). Discretization. In: Data Preprocessing in Data Mining. Intelligent Systems Reference Library, vol 72. Springer, Cham. https://doi.org/10.1007/978-3-319-10247-4_9

  • DOI: https://doi.org/10.1007/978-3-319-10247-4_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-10246-7

  • Online ISBN: 978-3-319-10247-4

  • eBook Packages: Engineering, Engineering (R0)