Customer Segmentation Based on Transactional Data Using Stream Clustering

  • Matthias Carnein
  • Heike Trautmann
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11439)


Customer segmentation aims to identify groups of customers that share similar interests or behaviour. It is an essential tool in marketing and can be used to target customer segments with tailored marketing strategies. Customer segmentation is often based on clustering techniques. The analysis is typically performed as a snapshot, where segments are identified at a specific point in time. However, this ignores the fact that customer segments are highly volatile and change over time. Once segments change, the entire analysis needs to be repeated and the strategies adapted. In this paper we explore stream clustering as a tool to alleviate this problem. We propose a new stream clustering algorithm that makes it possible to identify and track customer segments over time. The biggest challenge is that customer segmentation often relies on the transaction history of a customer. Since this data changes over time, it is necessary to update customers that have already been incorporated into the clustering. We show how to perform this step incrementally, without the need for periodic re-computations. As a result, customer segmentation can be performed continuously, is faster, and scales better. We demonstrate the performance of our algorithm using a large real-life case study.
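The incremental update described in the abstract can be illustrated with a minimal sketch. It is not the algorithm proposed in the paper. It only assumes that each cluster keeps an additive summary (a count and a linear sum), so that a customer's outdated feature vector can be subtracted and the updated one re-inserted without re-clustering. The class names, the two features (number of purchases, total spend) and the `radius` threshold are hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
import math


@dataclass
class MicroCluster:
    """Additive cluster summary (hypothetical helper, not from the paper)."""
    n: int = 0
    linear_sum: list = field(default_factory=lambda: [0.0, 0.0])

    def add(self, x):
        self.n += 1
        self.linear_sum = [s + v for s, v in zip(self.linear_sum, x)]

    def remove(self, x):
        self.n -= 1
        self.linear_sum = [s - v for s, v in zip(self.linear_sum, x)]

    def centre(self):
        return [s / self.n for s in self.linear_sum]


class IncrementalSegmenter:
    """Tracks one feature vector per customer and keeps clusters in sync."""

    def __init__(self, radius):
        self.radius = radius   # assumed distance threshold for joining a cluster
        self.clusters = []
        self.customers = {}    # customer id -> (feature vector, cluster)

    def _assign(self, x):
        # Join the nearest cluster if it is close enough, else open a new one.
        nearest = min(self.clusters,
                      key=lambda c: math.dist(c.centre(), x),
                      default=None)
        if nearest is None or math.dist(nearest.centre(), x) > self.radius:
            nearest = MicroCluster()
            self.clusters.append(nearest)
        nearest.add(x)
        return nearest

    def observe(self, customer_id, amount):
        """Process one transaction: update the customer, then the clustering."""
        known = self.customers.get(customer_id)
        if known is None:
            x = [1.0, amount]
        else:
            (count, total), cluster = known
            # Subtract the outdated vector from the cluster it was counted in.
            cluster.remove([count, total])
            if cluster.n == 0:
                self.clusters = [c for c in self.clusters if c is not cluster]
            x = [count + 1.0, total + amount]
        self.customers[customer_id] = (x, self._assign(x))


seg = IncrementalSegmenter(radius=5.0)
seg.observe("a", 10.0)
seg.observe("b", 11.0)
seg.observe("a", 50.0)   # customer "a" changes and moves to a new cluster
```

In this sketch every transaction costs one removal and one insertion, independent of how many customers have been seen so far. A practical version would also need to scale the features and forget inactive customers, which are left out here.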


Customer segmentation, Market segmentation, Stream clustering, Data streams, Machine learning



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. University of Münster, Münster, Germany
