New Representations in Genetic Programming for Feature Construction in k-Means Clustering

  • Andrew Lensen
  • Bing Xue
  • Mengjie Zhang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10593)


k-means is one of the most fundamental and well-known algorithms in data mining. It has been widely used in clustering tasks, but suffers from a number of limitations on large or complex datasets. Genetic Programming (GP) has been used to improve the performance of data mining algorithms by performing feature construction: the process of combining multiple attributes (features) of a dataset to produce more powerful constructed features. In this paper, we propose novel representations for using GP to perform feature construction in order to improve the clustering performance of the k-means algorithm. Our experiments show a significant performance improvement over k-means across a variety of difficult datasets. Several GP programs are also analysed to provide insight into how feature construction is able to improve clustering performance.
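As a rough illustration of why a constructed feature can help k-means, the sketch below clusters two concentric rings, once on the raw attributes and once on a single constructed feature. Everything in it is an assumption made for illustration and is not taken from the paper: the toy ring dataset, the helper names (`kmeans`, `ring`, `constructed_feature`, `agreement`), the deterministic farthest-point initialisation, and the expression x0*x0 + x1*x1, which is written by hand as a stand-in for the kind of tree a GP run might evolve. No GP search is performed here.

```python
import math

def kmeans(points, k=2, iters=20):
    """Plain Lloyd-style k-means; returns one cluster label per point."""
    # Assumption: deterministic farthest-point initialisation, for reproducibility.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def ring(radius, n):
    """n evenly spaced points on a circle of the given radius (toy data)."""
    return [
        [radius * math.cos(2 * math.pi * i / n), radius * math.sin(2 * math.pi * i / n)]
        for i in range(n)
    ]

def constructed_feature(p):
    # Hand-written stand-in for an evolved GP tree: (x0 * x0) + (x1 * x1).
    return [p[0] * p[0] + p[1] * p[1]]

def agreement(labels, truth):
    """Fraction of points matching the true grouping, up to swapping the two labels."""
    same = sum(l == t for l, t in zip(labels, truth)) / len(truth)
    return max(same, 1 - same)

data = ring(1.0, 40) + ring(5.0, 40)
truth = [0] * 40 + [1] * 40

raw_score = agreement(kmeans(data), truth)
cf_score = agreement(kmeans([constructed_feature(p) for p in data]), truth)
print("raw attributes:", raw_score)
print("constructed feature:", cf_score)
```

With two centroids, k-means separates the raw attributes by a straight line, which cannot split one ring from the ring surrounding it. In the constructed feature the two rings collapse to two well-separated values, so the same algorithm can recover them.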


Keywords: Cluster analysis, Feature construction, Genetic programming, k-means, Evolutionary computation



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. School of Engineering and Computer Science, Victoria University of Wellington, Wellington, New Zealand
