Tracking recurring contexts using ensemble classifiers: an application to email filtering


Concept drift is a challenging problem for the machine learning and data mining community that frequently appears in real-world stream classification tasks. It is usually defined as the unforeseeable change of the target concept in a prediction task. In this paper, we focus on the problem of recurring contexts, a special sub-type of concept drift that has not yet received proper attention from the research community. When contexts recur, concepts may reappear in the future, so older classification models can be beneficial for future classifications. We propose a general framework for classifying data streams that exploits stream clustering to dynamically build and update an ensemble of incremental classifiers. To this end, we propose a transformation function that maps batches of examples into a new conceptual representation model. A clustering algorithm is then applied to group batches of examples into concepts and identify recurring contexts. The ensemble is produced by creating and maintaining an incremental classifier for every concept discovered in the data stream. An experimental study is performed using (a) two new real-world concept-drifting datasets from the email domain, (b) an instantiation of the proposed framework and (c) five methods for dealing with drifting concepts. The results indicate the effectiveness of the proposed representation and the suitability of the concept-specific classifiers for problems with recurring contexts.
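The core loop the abstract describes can be sketched in a few lines: each labeled batch is mapped to a conceptual summary vector, the vector is matched against the centroids of previously seen concepts (reusing a concept when a context recurs, spawning a new one otherwise), and one incremental classifier per concept is then trained on the batches routed to it. The sketch below is a minimal, hypothetical illustration of that idea, not the authors' actual algorithm; the summary representation (class-conditional feature means), the distance threshold, and all names are assumptions made for the example, and the per-concept classifiers themselves are omitted.

```python
# Hypothetical sketch of the recurring-context idea from the abstract:
# batches -> conceptual vectors -> nearest-concept assignment.
# Names (ConceptTracker, threshold, ...) are illustrative only.
import math
from collections import defaultdict


def conceptual_vector(batch, n_features):
    """Map a labeled batch [(features, label), ...] to a summary vector:
    the per-feature mean of the examples in each class (a simple stand-in
    for the paper's conceptual representation)."""
    sums = defaultdict(lambda: [0.0] * n_features)
    counts = defaultdict(int)
    for x, y in batch:
        counts[y] += 1
        for i, v in enumerate(x):
            sums[y][i] += v
    vec = []
    for y in sorted(counts):  # assumes the same label set in every batch
        vec.extend(s / counts[y] for s in sums[y])
    return vec


def euclidean(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


class ConceptTracker:
    """Assign each batch to an existing concept (nearest centroid within a
    distance threshold) or spawn a new concept. In the full framework, one
    incremental classifier per concept would be trained on the batches
    routed to it (omitted here)."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.centroids = []  # one conceptual vector per concept
        self.counts = []     # number of batches seen per concept

    def assign(self, vec):
        best, best_d = None, float("inf")
        for k, c in enumerate(self.centroids):
            d = euclidean(vec, c)
            if d < best_d:
                best, best_d = k, d
        if best is None or best_d > self.threshold:
            # No sufficiently close concept: create a new one. A later
            # batch from a recurring context will match this centroid.
            self.centroids.append(list(vec))
            self.counts.append(1)
            return len(self.centroids) - 1
        # Recurring context: incrementally update the matched centroid.
        n = self.counts[best]
        self.centroids[best] = [(c * n + v) / (n + 1)
                                for c, v in zip(self.centroids[best], vec)]
        self.counts[best] = n + 1
        return best
```

Under this toy representation, a stream of batches drawn from concept A, then concept B, then concept A again would be assigned concepts 0, 1, and 0: the third batch is recognized as a recurrence, so the classifier attached to concept 0 could be reused instead of being rebuilt from scratch.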




Author information



Corresponding author

Correspondence to Ioannis Katakis.

Additional information

A preliminary version of this paper appears in the proceedings of the 18th European Conference on Artificial Intelligence, Patras, Greece, 2008.



Cite this article

Katakis, I., Tsoumakas, G. & Vlahavas, I. Tracking recurring contexts using ensemble classifiers: an application to email filtering. Knowl Inf Syst 22, 371–391 (2010).



Keywords

  • Data stream
  • Machine learning
  • Concept drift
  • Cluster assignment
  • True label