
Data Reduction

Chapter
Part of the Cognitive Intelligence and Robotics book series (CIR)

Abstract

An information table comprises a set of data items, presented as tuples (rows), where each tuple includes a set of attributes. Data reduction refers to the removal of redundancy in both the data items (instances) and the attributes, and is therefore an important topic of study in pattern recognition.
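The two kinds of redundancy named in the abstract can be illustrated on a toy information table. The sketch below is not the chapter's own method. It assumes rows are plain Python tuples, treats repeated tuples as redundant instances, and uses one simple, illustrative criterion for attribute redundancy: an attribute can be dropped if every pair of distinct rows remains distinguishable without it. The function names `remove_duplicate_rows`, `project` and `greedy_attribute_reduction` are hypothetical.

```python
from typing import List, Sequence, Tuple

# A row (tuple) of an information table; attribute values may be of any hashable type.
Row = Tuple[object, ...]


def remove_duplicate_rows(rows: Sequence[Row]) -> List[Row]:
    """Instance reduction: drop repeated tuples, keeping the first occurrence."""
    seen = set()
    unique: List[Row] = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


def project(rows: Sequence[Row], attrs: Sequence[int]) -> List[Row]:
    """Keep only the attribute columns whose indices are listed in attrs."""
    return [tuple(row[i] for i in attrs) for row in rows]


def greedy_attribute_reduction(rows: Sequence[Row]) -> List[int]:
    """Attribute reduction (hypothetical greedy criterion).

    An attribute is treated as redundant if removing it leaves the number of
    distinct projected rows unchanged, i.e. no two distinct rows collapse.
    Attributes are tried in index order, so the result depends on that order.
    """
    n_attrs = len(rows[0]) if rows else 0
    kept = list(range(n_attrs))
    target = len(set(project(rows, kept)))
    for a in range(n_attrs):
        candidate = [i for i in kept if i != a]
        if len(set(project(rows, candidate))) == target:
            kept = candidate  # attribute a adds no discerning power; drop it
    return kept


# Toy table: column 3 is an exact copy of column 1, and the first row is repeated.
table: List[Row] = [
    (1, 0, 0, 0),
    (1, 1, 0, 1),
    (0, 0, 1, 0),
    (0, 1, 1, 1),
    (1, 0, 1, 0),
    (1, 0, 0, 0),  # duplicate instance
]

rows = remove_duplicate_rows(table)
kept = greedy_attribute_reduction(rows)
reduced = project(rows, kept)
print(len(rows), kept)  # 5 [0, 2, 3]
```

Because the greedy pass visits attributes in index order, it drops column 1 and keeps its copy, column 3; visiting column 3 first would keep column 1 instead. Both outcomes are equally valid reductions under this criterion.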


Copyright information

© Springer Nature Singapore Pte Ltd. 2020

Authors and Affiliations

  1. Department of Information Technology, University College of Bahrain, Manama, Bahrain
  2. Department of Computer Science and Technology, Indian Institute of Engineering Science and Technology (IIEST), Shibpur, Howrah, India
