Categorical Feature Reduction Using Multi Objective Genetic Algorithm in Cluster Analysis

Part of the book series: Lecture Notes in Computer Science (TCOMPUTATSCIE, volume 8160)

Abstract

This paper studies a real coded multi objective genetic algorithm (MOGA) based K-clustering method, where K is the number of clusters and is known in advance. The search power of the Genetic Algorithm (GA) is exploited to find suitable clusters and cluster centers such that intra-cluster distance (Homogeneity, H) and inter-cluster distance (Separation, S) are optimized simultaneously. H and S are measured with the Mod distance per feature metric, which is suitable for categorical features (attributes). Three benchmark data sets containing only categorical features were selected from the UCI Machine Learning Repository.
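The two objectives can be illustrated with a small sketch. The abstract does not define the Mod distance per feature metric, so the code below assumes a simple stand-in: the fraction of features on which two categorical rows disagree. The function names (`mismatch_per_feature`, `homogeneity`, `separation`) and the toy data are hypothetical and are not taken from the chapter.

```python
from typing import List, Sequence

Row = Sequence[str]


def mismatch_per_feature(a: Row, b: Row) -> float:
    """Assumed stand-in metric: fraction of features on which a and b differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def homogeneity(data: List[Row], labels: List[int], centers: List[Row]) -> float:
    """H: summed distance from each row to its assigned center (smaller = tighter)."""
    return sum(mismatch_per_feature(row, centers[k]) for row, k in zip(data, labels))


def separation(centers: List[Row]) -> float:
    """S: summed distance over all pairs of centers (larger = better separated)."""
    return sum(
        mismatch_per_feature(centers[i], centers[j])
        for i in range(len(centers))
        for j in range(i + 1, len(centers))
    )


# Toy data set with two categorical features and K = 2.
data = [("red", "small"), ("red", "large"), ("blue", "large"), ("blue", "large")]
labels = [0, 0, 1, 1]
centers = [("red", "small"), ("blue", "large")]

H = homogeneity(data, labels, centers)
S = separation(centers)
print(H, S)
```

In this sketch a multi objective search would look for solutions with small H and large S at the same time, rather than folding the two into a single weighted score.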

The paper proposes two versions of the MOGA based K-clustering algorithm. In MOGA (H, S), all features take part in building the chromosomes and in calculating H and S. In MOGA_Feature_Selection (H, S), only the selected features, those relevant to the clusters, take part in building the chromosomes. K-modes is hybridized with the GA to combine the global search capability of the GA with the local search capability of K-modes. To account for context sensitivity, we use a special crossover operator called “pairwise crossover”, together with “substitution”. The main contribution of this paper is simultaneous dimensionality reduction and optimization of the objectives using MOGA.
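The local search role of K-modes in such a hybrid can be sketched as a single refinement step applied to the centers a chromosome encodes. This is a minimal illustration under stated assumptions, not the chapter's procedure: the distance is a plain mismatch fraction over the selected features, the `selected` index list stands in for whatever feature-selection encoding the authors use, and `kmodes_step` is a hypothetical name.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Row = Sequence[str]


def distance(a: Row, b: Row, selected: List[int]) -> float:
    """Assumed metric: mismatch fraction computed over the selected features only."""
    return sum(a[f] != b[f] for f in selected) / len(selected)


def kmodes_step(
    data: List[Row], centers: List[Row], selected: List[int]
) -> Tuple[List[int], List[Tuple[str, ...]]]:
    """One K-modes iteration: assign rows to the nearest center, then recompute modes."""
    labels = [
        min(range(len(centers)), key=lambda k: distance(row, centers[k], selected))
        for row in data
    ]
    new_centers = []
    for k, old in enumerate(centers):
        members = [row for row, lab in zip(data, labels) if lab == k]
        if not members:
            # Keep an empty cluster's center unchanged.
            new_centers.append(tuple(old))
            continue
        # The mode is taken for every feature so the center stays full length,
        # even though only the selected features influence the assignment.
        new_centers.append(
            tuple(
                Counter(m[f] for m in members).most_common(1)[0][0]
                for f in range(len(old))
            )
        )
    return labels, new_centers


data = [
    ("red", "small", "x"), ("red", "small", "y"), ("red", "large", "y"),
    ("blue", "large", "y"), ("blue", "large", "x"), ("blue", "large", "y"),
]
centers = [("red", "large", "x"), ("blue", "large", "y")]
labels, refined = kmodes_step(data, centers, selected=[0, 1])
print(labels, refined)
```

In a hybrid of this kind, a step like this would typically be applied to each candidate solution between the genetic operators, so that the GA explores globally while K-modes tightens each candidate locally.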

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Dutta, D., Dutta, P., Sil, J. (2013). Categorical Feature Reduction Using Multi Objective Genetic Algorithm in Cluster Analysis. In: Gavrilova, M.L., Tan, C.J.K., Abraham, A. (eds) Transactions on Computational Science XXI. Lecture Notes in Computer Science, vol 8160. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45318-2_7

  • DOI: https://doi.org/10.1007/978-3-642-45318-2_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-45317-5

  • Online ISBN: 978-3-642-45318-2

  • eBook Packages: Computer Science, Computer Science (R0)
