
An Unsupervised Boosting Strategy for Outlier Detection Ensembles

  • Guilherme O. Campos
  • Arthur Zimek
  • Wagner Meira Jr.
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10937)

Abstract

Ensemble techniques have been applied to the unsupervised outlier detection problem in some scenarios. The main challenges are the generation of diverse ensemble members and the combination of their individual results into an ensemble. For the latter challenge, some methods have tried to build smaller ensembles from a wealth of possible ensemble members in order to improve the diversity and accuracy of the ensemble (related to the ensemble selection problem in classification). We propose a boosting strategy for this combination step and show improvements on benchmark datasets.
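The two generic ingredients mentioned in the abstract, combining the normalized score vectors of several unsupervised detectors and then keeping only a smaller, well-agreeing subset of members, can be illustrated in a few lines of Python. The sketch below is a minimal illustration of that generic combination-and-selection scheme under assumed min-max normalization and a correlation-based greedy selection criterion; it is not the boosting strategy proposed in this paper, and the toy detectors and score values are made up.

    # Minimal sketch (NOT the paper's algorithm): combine normalized outlier
    # scores from several detectors and greedily select a smaller subset of
    # ensemble members.  Normalization and selection criterion are assumptions.
    import numpy as np

    def normalize(scores):
        """Min-max scale a raw score vector to [0, 1] so detectors are comparable."""
        s = np.asarray(scores, dtype=float)
        rng = s.max() - s.min()
        return (s - s.min()) / rng if rng > 0 else np.zeros_like(s)

    def greedy_select(score_matrix, k):
        """Greedily pick k members whose combined scores correlate best with
        the consensus (the average of all normalized members)."""
        normalized = np.array([normalize(row) for row in score_matrix])
        target = normalized.mean(axis=0)          # provisional consensus ranking
        selected = []
        for _ in range(k):
            best_i, best_corr = None, -np.inf
            for i in range(len(normalized)):
                if i in selected:
                    continue
                candidate = normalized[selected + [i]].mean(axis=0)
                corr = np.corrcoef(candidate, target)[0, 1]
                if corr > best_corr:
                    best_i, best_corr = i, corr
            selected.append(best_i)
        # Final ensemble score: average over the selected members only.
        return selected, normalized[selected].mean(axis=0)

    # Toy usage: three hypothetical detectors scoring five data points.
    scores = [[0.1, 0.2, 0.9, 0.3, 0.2],   # e.g. a kNN-distance-style detector
              [0.2, 0.1, 0.8, 0.4, 0.1],   # e.g. an LOF-style detector
              [0.9, 0.8, 0.1, 0.7, 0.9]]   # a member that contradicts the others
    members, ensemble = greedy_select(scores, k=2)
    print(members, np.round(ensemble, 2))

On the toy data the two agreeing detectors are kept and the contradicting one is discarded, which is the intuition behind selecting a smaller ensemble out of many candidate members.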

Keywords

Outlier detection · Ensembles · Boosting · Ensemble selection

Acknowledgments

This work was partially supported by CAPES - Brazil, Fapemig, CNPq, and by projects InWeb, MASWeb, EUBra-BIGSEA (H2020-EU.2.1.1 690116, Brazil/MCTI/RNP GA-000650/04), INCT-Cyber, and Atmosphere (H2020-EU 777154, Brazil/MCTI/RNP 51119).


Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Guilherme O. Campos (1, 2)
  • Arthur Zimek (2)
  • Wagner Meira Jr. (1)
  1. Department of Computer Science, Federal University of Minas Gerais, Belo Horizonte, Brazil
  2. Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
