Efficient Estimation of AUC in a Sliding Window

  • Nikolaj Tatti
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11051)

Abstract

In many applications, monitoring area under the ROC curve (AUC) in a sliding window over a data stream is a natural way of detecting changes in the system. The drawback is that computing AUC in a sliding window is expensive, especially if the window size is large and the data flow is significant.

In this paper we propose a scheme for maintaining an approximate AUC in a sliding window of length \(k\). More specifically, we propose an algorithm that, given \(\epsilon\), estimates AUC within \(\epsilon / 2\) and maintains this estimate in \(\mathcal{O}((\log k) / \epsilon)\) time per update as the window slides. This provides a speed-up over the exact computation of AUC, which requires \(\mathcal{O}(k)\) time per update. The speed-up becomes more significant as the size of the window increases. Our estimate is based on grouping the data points together and using these groups to calculate AUC. The grouping is designed carefully such that (i) the groups are small enough that the error stays small, (ii) the number of groups is small, so that enumerating them is not expensive, and (iii) the definition is flexible enough that we can maintain the groups efficiently.
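
To make the grouping idea concrete, here is a minimal Python sketch. It is an illustration only, not the algorithm from the paper: the function names, the fixed-size chunking rule, and the toy stream are all assumptions, and the window is re-sorted on every call rather than maintained incrementally. It shows that AUC can be computed from per-group counts of positives and negatives, with pairs inside one group treated as ties. Assuming all scores are distinct, the estimate differs from the exact AUC by at most \(\sum_g p_g n_g / (2PN)\), where \(p_g\) and \(n_g\) are the group counts and \(P\) and \(N\) the window totals.

```python
from collections import deque
from itertools import groupby


def auc_from_groups(groups):
    """AUC from (positives, negatives) counts per group, with groups ordered
    by increasing score. Pairs inside one group count as ties (weight 1/2)."""
    total_pos = sum(p for p, _ in groups)
    total_neg = sum(n for _, n in groups)
    if total_pos == 0 or total_neg == 0:
        return float("nan")  # AUC is undefined without both classes
    wins = 0.0
    neg_below = 0  # negatives in strictly lower groups
    for p, n in groups:
        wins += p * (neg_below + 0.5 * n)
        neg_below += n
    return wins / (total_pos * total_neg)


def exact_groups(window):
    """One group per distinct score, so only true ties are merged."""
    out = []
    for _, grp in groupby(sorted(window), key=lambda point: point[0]):
        labels = [label for _, label in grp]
        pos = sum(labels)
        out.append((pos, len(labels) - pos))
    return out


def coarse_groups(window, size):
    """Hypothetical grouping rule: consecutive chunks of `size` sorted points.
    Assumes distinct scores, so no tie is split across two chunks."""
    ordered = sorted(window)
    out = []
    for i in range(0, len(ordered), size):
        chunk = ordered[i:i + size]
        pos = sum(label for _, label in chunk)
        out.append((pos, len(chunk) - pos))
    return out


def error_bound(groups):
    """Worst-case gap between grouped and exact AUC for distinct scores."""
    total_pos = sum(p for p, _ in groups)
    total_neg = sum(n for _, n in groups)
    return sum(p * n for p, n in groups) / (2 * total_pos * total_neg)


# Toy stream of (score, label) pairs; the deque keeps the last k = 4 points.
window = deque(maxlen=4)
for point in [(0.9, 1), (0.1, 0), (0.4, 0), (0.35, 1), (0.8, 1)]:
    window.append(point)

exact = auc_from_groups(exact_groups(window))
groups = coarse_groups(window, size=2)
approx = auc_from_groups(groups)
print(exact, approx, error_bound(groups))
```

Larger groups mean fewer terms to enumerate but a looser bound, which is the trade-off that conditions (i) and (ii) describe.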

Our experimental evaluation demonstrates that the average approximation error in practice is much smaller than the approximation guarantee \(\epsilon / 2\), and that we can achieve significant speed-ups with only a modest sacrifice in accuracy. Code related to this paper is available at: https://bitbucket.org/orlyanalytics/streamauc.

Keywords

AUC, Approximation guarantee, Sliding window

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. F-Secure, Helsinki, Finland
