Efficient Top Rank Optimization with Gradient Boosting for Supervised Anomaly Detection

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10534)


In this paper, we address the anomaly detection problem in a supervised setting where positive examples may be very sparse. We tackle this task with a learning-to-rank strategy by optimizing a differentiable, smoothed surrogate of the so-called Average Precision (AP). Despite its non-convexity, we show how to use this surrogate efficiently in a stochastic gradient boosting framework. We show that optimizing AP yields substantially better top-ranked alerts than state-of-the-art measures, and we demonstrate on anomaly detection tasks that the advantage of our method grows in highly unbalanced scenarios.
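The core idea of a smoothed AP surrogate can be sketched as follows: the hard indicator "item j is ranked above item i" is replaced by a sigmoid of the score difference, which makes the precision-at-each-positive terms of AP differentiable in the scores. This is an illustrative sketch only, not the paper's exact formulation; the temperature parameter `tau` and the function name `smoothed_ap` are assumptions for the example.

```python
import numpy as np

def smoothed_ap(scores, labels, tau=0.1):
    """Sigmoid-smoothed surrogate of Average Precision (illustrative sketch).

    For each positive example i, the hard rank indicator 1[s_j >= s_i] is
    replaced by sigmoid((s_j - s_i) / tau), giving soft counts of items
    ranked at or above i. `tau` (assumed here) controls the smoothing:
    smaller values approach exact AP but sharpen the gradients.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos = scores[labels == 1]
    if pos.size == 0:
        return 0.0
    ap = 0.0
    for s_i in pos:
        # soft number of items (all / positive) scored at or above s_i
        soft_above = 1.0 / (1.0 + np.exp(-(scores - s_i) / tau))
        soft_pos_above = 1.0 / (1.0 + np.exp(-(pos - s_i) / tau))
        # soft precision at the rank of positive i
        ap += soft_pos_above.sum() / soft_above.sum()
    return ap / pos.size
```

With a small `tau`, a ranking that places all positives first gives a smoothed AP close to 1, while a reversed ranking scores strictly lower; the gradient of this quantity with respect to the scores is what a stochastic gradient boosting round can then fit.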



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Univ. Lyon, Univ. St-Etienne, UMR CNRS 5516, Laboratoire Hubert-Curien, Saint-Étienne, France
  2. Worldline, Bezons, France
