Advertisement

Annals of Operations Research

, Volume 168, Issue 1, pp 225–245 | Cite as

Chameleon based on clustering feature tree and its application in customer segmentation

  • Jinfeng Li
  • Kanliang Wang
  • Lida Xu
Article

Abstract

Clustering analysis plays an important role in the filed of data mining. Nowadays, hierarchical clustering technique is becoming one of the most widely used clustering techniques. However, for most algorithms of hierarchical clustering technique, the requirements of high execution efficiency and high accuracy of clustering result cannot be met at the same time. After analyzing the advantages and disadvantages of the hierarchical algorithms, the paper puts forward a two-stage clustering algorithm, named Chameleon Based on Clustering Feature Tree (CBCFT), which hybridizes the Clustering Tree of algorithm BIRCH with algorithm CHAMELEON. By calculating the time complexity of CBCFT, the paper argues that the time complexity of CBCFT increases linearly with the number of data. By experimenting on sample data set, this paper demonstrates that CBCFT is able to identify clusters with large variance in size and shape and is robust to outliers. Moreover, the result of CBCFT is as similar as that of CHAMELEON, but CBCFT overcomes the shortcoming of the low execution efficiency of CHAMELEON. Although the execution time of CBCFT is longer than BIRCH, the clustering result of CBCFT is much satisfactory than that of BIRCH. Finally, through a case of customer segmentation of Chinese Petroleum Corp. HUBEI branch; the paper demonstrates that the clustering result of the case is meaningful and useful.

Keywords

Data mining Clustering Hierarchical clustering Customer segmentation 

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Ankerst, M., Breunig, M., & Kriegel, H. P. Sander, J. (1999). OPICS: Ordering points to identify the clustering structure. In Proceedings of 1999 ACM-SIGMOD international conference on management of data (SIGMOD’99), Philadelphia, June 1999 (pp. 49–60). Google Scholar
  2. Chen, M. S., Han, J., & Yu, P. S. (1996). Data mining: an overview from a database perspective. IEEE Transactions on Knowledge and Data Engineering, 8(6), 866–883. CrossRefGoogle Scholar
  3. Chicc, G., Napoli, R. C., & Piglione, F. (2006). Comparisons among clustering techniques for electricity customer classification. IEEE Transactions on Power Systems, 21(2). Google Scholar
  4. Ester, M., Kriegel, H. P., Sander, J., & Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the second international conference of knowledge discovery and data mining (pp. 226–231). Google Scholar
  5. Friedman, J. H., Bentley, J. L., & Finkel, R. A. (1977). An algorithm for finding best matches in logarithmic expected time. ACM Transactions on Mathematical Software, 3, 209–226. CrossRefGoogle Scholar
  6. Guha, S., Rastogi, R., & Shim, K. (1998) CURE: An efficient clustering algorithm for large databases. In Proceedings of ACM–SIGMOD international conference on management of data (pp. 73–84). Google Scholar
  7. Guha, S., Rastogi, R., & Shim, K. (1999). ROCK: a robust clustering algorithm for categorical attributes. In Proceedings of the 15th international conference on data engineering (p. 512). Google Scholar
  8. Halkidi, M., Batistakis, Y., & Vazirgiannis, M. (2001). Clustering algorithms and validity measures. In Proceedings of the thirteenth international conference on scientific and statistical database management (Vol. 3, pp. 1099–3371). Google Scholar
  9. Han, J., & Kamber, M. (2001). Data mining concepts and techniques. BeiJing: Higher Education Press. Google Scholar
  10. Hinneburg, A., & Keim, D. A. (1998). An efficient approach to clustering in large multimedia databases with noise. In Proc. of 1998 int. conf. on knowledge discovery and data mining (KDD’98), New York, August 1998 (pp. 58–65). Google Scholar
  11. Hu, T. L., & Sheu, J. B. (2003). A fuzzy-based customer classification method for demand-responsive logistical distribution operations. Fuzzy Sets and Systems, 139, 431–450. CrossRefGoogle Scholar
  12. Jain, A. K., Murty, M. N., & Flynn, P. J. (1999). Data clustering: a review. ACM Computing Surveys (CSUR), 31(3), 264–323. CrossRefGoogle Scholar
  13. Jung, S. Y., & Kim, T. S. (2001). An agglomerative hierarchical clustering using partial maximum array and incremental similarity computation method. In First IEEE international conference on data mining (p. 265). Google Scholar
  14. Karypis, G., Han, E. H., & Kumar, V. (1999). Chameleon: Hierarchical clustering using dynamic modeling. IEEE Computer, 32(8), 68–75 (Special Issue on Data Analysis and Mining). Google Scholar
  15. Karypis, G., & Kumar, V. (1998). hMETIS 1.5: A hypergraph partitioning package. Technical report, Department of Computer Science, University of Minnesota. http://www.cs.umn.edu/~metis.
  16. Lewis, J., & Chase, J. (2004). Data structures. Java edition. Englewood Cliffs: Prentice-Hall. Google Scholar
  17. Li, C., Becerra, V. M., & Deng, J. (2004). Extension of fuzzy c-means algorithm. In Proceedings of the IEEE conference on cybernetics and intelligent systems, Singapore, 1–3 December 2004. Google Scholar
  18. Lin, C. R., & Chen, M. S. (2005). Combining partitional and hierarchical algorithms for robust and efficient data clustering with cohesion self-merging. IEEE Transactions on Knowledge and Date Engineering, 17(2). Google Scholar
  19. Ng, R., & Han, J. (1994). Efficient and effective clustering method for spatial data mining. In Proceedings of international conference of very large data bases (VLDB’94), Santiago, Chile, September 1994 (pp. 144–155). Google Scholar
  20. Olson, D. L., & Zhao, F. (2007). CIOs’ perspectives of critical success factors in ERP upgrade projects. Enterprise Information Systems, 1(1), 129–138. CrossRefGoogle Scholar
  21. Palomino, M. A., & Whitley, E. A. (2007). The effects of national culture on ERP implementation: a study of Colombia and Switzerland. Enterprise Information Systems, 3(1), 301–325. Google Scholar
  22. Qian, Y. T., Shi, Q. S., & Wang, Q. (2002). CURE-NS: A hierarchical clustering algorithm with new shrinking scheme. In Proceedings of the first international conference on machine learning and cybernetics, Beijing, 4–5 November 2002. Google Scholar
  23. Sheikholeslami, G., Chatterjee, S., & Zhang, A. (1998). WaveCluster: A multi-resolution clustering approach for very large spatial databases. In Proceedings of the IEEE conference on very large data bases (VLDB’98), New York, August 1998 (pp. 428–439). Google Scholar
  24. Wan, L. H., Li, Y. J., Liu, W. Y., & Zha, D. Y. (2005). Application and study of spatial cluster and customer partitioning. In Proceedings of the fourth international conference on machine learning and cybernetics, Guangzhou, 18–21 August 2005. Google Scholar
  25. Wang, W., Yang, J., & Muntz, R. (1997). STING: A statistical information grid approach to spatial data mining. In Proceedings of 1997 international conference on very large data bases (VLDB’97), Athens, Greece, August 1997 (pp. 186–195). Google Scholar
  26. Wang, Z., He, P. L., Guo, L. S., & Zheng, X. S. (2004). Clustering analysis of customer relationship in securities trade. In Proceedings of the third international conference on machine learning and cybernetics, Shanghai, August 2004 (pp. 26–29). Google Scholar
  27. Warfield, J. N. (2007). Systems science serves enterprise integration: a tutorial. Enterprise Information Systems, 2(1), 235–254. CrossRefGoogle Scholar
  28. Weiss, M. A. (2004). Data structures and algorithm analysis in Java. Englewood Cliffs: Prentice-Hall. Google Scholar
  29. Zhang, T., Ramakrishnan, R., & Linvy, M. (1996). Birch: an efficient data clustering method for large databases. In Proc. of 1996 ACM–SIGMOD international conference on management of data, Montreal, Quebec (pp. 103–114). Google Scholar

Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  1. 1.The School of ManagementXi’an Jiaotong UniversityXi’anChina
  2. 2.Department of Information Technology and Decision ScienceOld Dominion UniversityNorfolkUSA
  3. 3.College of Economics and ManagementBeijing Jiaotong UniversityBeijingChina

Personalised recommendations