Clustering-Based Optimised Probabilistic Active Learning (COPAL)

  • Georg Krempl
  • Tuan Cuong Ha
  • Myra Spiliopoulou
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9356)


Facing ever-increasing volumes of data but limited human annotation capacities, active learning approaches that allocate these capacities to the labelling of the most valuable instances gain in importance. A particular challenge is the active learning of arbitrary, user-specified adaptive classifiers in evolving datastreams. We address this challenge by proposing a novel clustering-based optimised probabilistic active learning (COPAL) approach for evolving datastreams. It uses established clustering techniques, inspired by semi-supervised learning, to capture the structure of the unlabelled data, and combines them with the recently introduced probabilistic active learning approach, which selects among the clusters. The labels actively selected by COPAL are then available for training an arbitrary adaptive stream classifier. The performance of our algorithm is evaluated on several synthetic and real-world datasets. The results show that it achieves better accuracy for the same labelling budget than other recently proposed active learning approaches for such evolving datastreams.
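The selection among clusters described above can be illustrated with a minimal Python sketch for the binary case. It is not the COPAL implementation: the clustering itself is omitted (each cluster is summarised only by its size and the labels acquired in it), and the following are assumptions made for illustration: a majority-vote prediction within each cluster, accuracy as the performance measure, a Beta posterior over each cluster's unknown share of positives, a lookahead over up to `max_labels` further labels with the gain averaged per label, and weighting by each cluster's share of the stream as a density estimate. The names `ClusterStats`, `expected_gain`, `per_label_gain` and `select_cluster` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


def _acc(n_pos: int, n_neg: int) -> Tuple[float, float]:
    """Accuracy of a majority-vote prediction, written as a + b * p,
    where p is the unknown true share of positives in the cluster."""
    if n_pos > n_neg:
        return 0.0, 1.0    # predict positive: accuracy = p
    if n_neg > n_pos:
        return 1.0, -1.0   # predict negative: accuracy = 1 - p
    return 0.5, 0.0        # tie (including no labels): random guess


def _beta_moment(alpha: float, beta: float, k: int, l: int) -> float:
    """E[p**k * (1 - p)**l] for p ~ Beta(alpha, beta), via log-gamma."""
    lg = math.lgamma
    return math.exp(
        lg(alpha + k) + lg(beta + l) - lg(alpha + beta + k + l)
        - lg(alpha) - lg(beta) + lg(alpha + beta)
    )


def expected_gain(n_pos: int, n_neg: int, m: int) -> float:
    """Expected accuracy gain from m more labels in a cluster that already
    holds n_pos positive and n_neg negative labels (hypothetical sketch).

    p ~ Beta(n_pos + 1, n_neg + 1), the normalised likelihood of the labels
    seen; the number k of positives among the m new labels is Binomial(m, p).
    """
    alpha, beta = n_pos + 1, n_neg + 1
    a, b = _acc(n_pos, n_neg)
    total = 0.0
    for k in range(m + 1):
        a_new, b_new = _acc(n_pos + k, n_neg + m - k)
        c, d = a_new - a, b_new - b          # accuracy change = c + d * p
        total += math.comb(m, k) * (
            c * _beta_moment(alpha, beta, k, m - k)
            + d * _beta_moment(alpha, beta, k + 1, m - k)
        )
    return total


def per_label_gain(n_pos: int, n_neg: int, max_labels: int = 3) -> float:
    """Best average gain per label over lookaheads of 1..max_labels labels."""
    return max(expected_gain(n_pos, n_neg, m) / m
               for m in range(1, max_labels + 1))


@dataclass
class ClusterStats:
    size: int    # stream instances currently assigned to the cluster
    n_pos: int   # positive labels acquired in the cluster so far
    n_neg: int   # negative labels acquired in the cluster so far


def select_cluster(clusters: List[ClusterStats], max_labels: int = 3) -> int:
    """Index of the cluster with the highest size-weighted per-label gain."""
    total = sum(c.size for c in clusters)
    scores = [(c.size / total) * per_label_gain(c.n_pos, c.n_neg, max_labels)
              for c in clusters]
    return max(range(len(clusters)), key=scores.__getitem__)
```

For example, with `ClusterStats(50, 3, 0)`, `ClusterStats(30, 1, 1)` and `ClusterStats(20, 0, 0)`, `select_cluster` returns `2`: the largest cluster is already label-pure and its gain is about zero, while the unlabelled cluster's gain of 1/6, weighted by 0.2, slightly exceeds the tied cluster's gain of 0.1 weighted by 0.3. The lookahead matters for a cluster with `(n_pos, n_neg) = (1, 0)`: one more label can at most produce a tie, so the single-label gain is zero, whereas two labels can flip the majority, giving 1/60 per label.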


Keywords

Probabilistic active learning; Selective sampling; Evolving datastreams; Nonstationary environments; Concept drift; Adaptive classification; Clustering



Acknowledgments

We thank our colleagues, in particular Daniel Kottke from the University of Magdeburg, Christian Beyer from IBM Germany and Vincent Lemaire from Orange Labs France, as well as Dino Ienco, Albert Bifet, Bernhard Pfahringer and the anonymous reviewers.

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Georg Krempl (1), corresponding author
  • Tuan Cuong Ha (1)
  • Myra Spiliopoulou (1)

  1. Knowledge Management and Discovery, Otto-von-Guericke University, Magdeburg, Germany