Discretization

Chapter
Part of the Cognitive Intelligence and Robotics book series (CIR)

Abstract

Discretization is the process of transforming continuous functions, variables, data, and models into discrete form. Real-world processes usually involve continuous variables, but the data sets they generate need to be discretized before a computer can process them.
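
As a rough illustration of the idea, the sketch below maps continuous values to integer bin labels using equal-width binning. This is only one simple scheme, chosen here for illustration and not necessarily the method developed in the chapter; the function name `equal_width_bins` and the sample `readings` are hypothetical.

```python
def equal_width_bins(values, n_bins):
    """Map each continuous value to a bin index in 0..n_bins-1.

    Illustrative equal-width scheme: the range [min, max] is split
    into n_bins intervals of the same width.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so a single bin is enough.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would fall just past the last interval,
    # so clamp its index to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical continuous readings, discretized into 4 intervals.
readings = [0.0, 2.5, 5.0, 7.5, 10.0]
labels = equal_width_bins(readings, 4)
print(labels)
```

Each label stands for an interval of the original range, so downstream methods that expect discrete attributes can work with `labels` in place of the raw readings.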

Copyright information

© Springer Nature Singapore Pte Ltd. 2020

Authors and Affiliations

  1. Department of Information Technology, University College of Bahrain, Manama, Bahrain
  2. Department of Computer Science and Technology, Indian Institute of Engineering Science and Technology (IIEST), Shibpur, Howrah, India
