Efficient Top Rank Optimization with Gradient Boosting for Supervised Anomaly Detection

  • Jordan Frery
  • Amaury Habrard
  • Marc Sebban
  • Olivier Caelen
  • Liyun He-Guelton
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10534)

Abstract

In this paper we address the anomaly detection problem in a supervised setting where positive examples might be very sparse. We tackle this task with a learning-to-rank strategy by optimizing a differentiable smoothed surrogate of the so-called Average Precision (AP). Despite its non-convexity, we show how to use it efficiently in a stochastic gradient boosting framework. We show that optimizing AP is much better suited to ranking the top alerts than state-of-the-art measures. We demonstrate on anomaly detection tasks that the advantage of our method is even more pronounced in highly unbalanced scenarios.
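The abstract compresses the method into two ingredients: a smoothed, differentiable surrogate of Average Precision, and a stochastic gradient boosting loop in which each new tree is fit to the surrogate's gradient. The sketch below (Python, using JAX for automatic differentiation and scikit-learn regression trees as weak learners) is a minimal illustration of that combination, not the authors' exact construction; the sigmoid smoothing with sharpness alpha, the tree depth, the subsampling rate, and all function names are illustrative assumptions.

```python
# Minimal sketch (assumptions, not the paper's exact formulation): the 0/1
# indicator [s_j >= s_i] inside AP's rank and precision terms is replaced by
# sigmoid(alpha * (s_j - s_i)), making the surrogate differentiable, and its
# gradient supplies the targets for each stochastic boosting round.
import jax
import jax.numpy as jnp
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def smoothed_ap(scores, y, alpha=10.0):
    # diff[i, j]: smoothed indicator that example j is ranked above example i.
    diff = jax.nn.sigmoid(alpha * (scores[None, :] - scores[:, None]))
    diff = diff * (1.0 - jnp.eye(scores.shape[0]))         # drop the j == i term
    rank = 1.0 + jnp.sum(diff, axis=1)                     # smoothed rank of i
    pos_above = 1.0 + jnp.sum(diff * y[None, :], axis=1)   # smoothed positives above i
    prec = pos_above / rank                                # smoothed precision at i
    return jnp.sum(prec * y) / jnp.sum(y)                  # mean over the positives


# Gradient of the surrogate w.r.t. the current scores (AP is *maximized*).
grad_ap = jax.grad(smoothed_ap)


def fit_boosted_ranker(X, y, n_rounds=100, lr=0.1, subsample=0.5, max_depth=3, seed=0):
    """Stochastic gradient boosting on the smoothed-AP surrogate.

    Each round fits a small regression tree to the gradient evaluated on a
    random subsample (which must contain at least one positive for the
    surrogate to be defined).
    """
    rng = np.random.default_rng(seed)
    scores = np.zeros(len(y), dtype=np.float32)
    trees = []
    for _ in range(n_rounds):
        idx = rng.choice(len(y), size=int(subsample * len(y)), replace=False)
        g = grad_ap(jnp.asarray(scores[idx]), jnp.asarray(y[idx], dtype=jnp.float32))
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X[idx], np.asarray(g))
        trees.append(tree)
        scores += lr * tree.predict(X).astype(np.float32)  # ascent step on AP
    return trees


def score(trees, X, lr=0.1):
    # Additive ensemble score; higher means more anomalous.
    return lr * sum(t.predict(X) for t in trees)
```

The pairwise sigmoid matrix makes each evaluation O(n^2) in the sample size, which is one reason the stochastic (subsampled) variant of gradient boosting fits naturally here; the exact, unsmoothed AP can still be monitored on a held-out set during training.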

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Jordan Frery (1, 2)
  • Amaury Habrard (1)
  • Marc Sebban (1)
  • Olivier Caelen (2)
  • Liyun He-Guelton (2)
  1. Univ. Lyon, Univ. St-Etienne, UMR CNRS 5516, Laboratoire Hubert-Curien, Saint-Étienne, France
  2. Worldline, Bezons, France
