An Unsupervised Boosting Strategy for Outlier Detection Ensembles

Campos, Guilherme O.; Zimek, Arthur; Meira, Wagner

doi:10.1007/978-3-319-93034-3_45

Guilherme O. Campos, Arthur Zimek & Wagner Meira Jr.

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10937)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

Ensemble techniques have been applied to unsupervised outlier detection in some scenarios. Two challenges are generating diverse ensemble members and combining their individual results into an ensemble. To address the latter, some methods build smaller ensembles from a large pool of candidate members in order to improve the diversity and accuracy of the ensemble, a problem related to ensemble selection in classification. We propose a boosting strategy for such combinations and show improvements on benchmark datasets.
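The abstract mentions two steps: combining member results and selecting a smaller ensemble from a pool of candidates. The sketch below is a generic illustration under our own assumptions, not the boosting strategy proposed in this paper. It rank-normalizes each member's scores so that different detectors become comparable, averages all members into a pseudo ground truth, and then greedily admits candidates, trying the least-correlated (most diverse) ones first and keeping a candidate only if it increases the ensemble's correlation with that pseudo target. All function names (`rank_normalize`, `pearson`, `mean_combine`, `greedy_select`) are hypothetical and use only the standard library.

```python
from typing import List, Sequence


def rank_normalize(scores: Sequence[float]) -> List[float]:
    """Map raw outlier scores to ranks scaled to [0, 1]; ties get their average rank."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied scores.
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    denom = max(n - 1, 1)
    return [r / denom for r in ranks]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def mean_combine(members: List[List[float]]) -> List[float]:
    """Average normalized score vectors point by point."""
    return [sum(col) / len(members) for col in zip(*members)]


def greedy_select(raw_scores: List[List[float]]) -> List[int]:
    """Return indices of selected members (generic greedy selection sketch)."""
    norm = [rank_normalize(s) for s in raw_scores]
    target = mean_combine(norm)  # pseudo ground truth built from all members
    # Start from the member that agrees most with the pseudo target.
    remaining = sorted(range(len(norm)),
                       key=lambda i: pearson(norm[i], target), reverse=True)
    selected = [remaining.pop(0)]
    current = mean_combine([norm[i] for i in selected])
    best = pearson(current, target)
    while remaining:
        # Try diverse candidates first: least correlated with the current ensemble.
        remaining.sort(key=lambda i: pearson(norm[i], current))
        cand = remaining.pop(0)  # each candidate is considered exactly once
        trial = mean_combine([norm[i] for i in selected + [cand]])
        score = pearson(trial, target)
        if score > best:
            selected.append(cand)
            current, best = trial, score
    return selected


if __name__ == "__main__":
    # Three hypothetical detectors scoring five points (higher = more outlying).
    members = [
        [0.1, 0.2, 0.3, 0.4, 5.0],
        [1.0, 2.0, 3.0, 4.0, 50.0],
        [5.0, 4.0, 3.0, 2.0, 1.0],
    ]
    print(greedy_select(members))  # → [0]
```

In this toy input the second detector ranks points identically to the first, so it adds nothing, and the third contradicts both, so it lowers agreement with the pseudo target; only the first member is kept. Rank normalization is one simple choice for making scores comparable; other score-unification schemes could be substituted.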



Acknowledgments

This work was partially supported by CAPES - Brazil, Fapemig, CNPq, and by projects InWeb, MASWeb, EUBra-BIGSEA (H2020-EU.2.1.1 690116, Brazil/MCTI/RNP GA-000650/04), INCT-Cyber, and Atmosphere (H2020-EU 777154, Brazil/MCTI/RNP 51119).

Author information

Authors and Affiliations

Department of Computer Science, Federal University of Minas Gerais, Belo Horizonte, Brazil
Guilherme O. Campos & Wagner Meira Jr.
Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
Guilherme O. Campos & Arthur Zimek


Corresponding author

Correspondence to Guilherme O. Campos.

Editor information

Editors and Affiliations

Deakin University, Geelong, Victoria, Australia
Dinh Phung
National Chiao Tung University, Hsinchu City, Taiwan
Vincent S. Tseng
Monash University, Clayton, Victoria, Australia
Geoffrey I. Webb
Japan Advanced Institute of Science and Technology, Nomi, Ishikawa, Japan
Bao Ho
University of Melbourne, Melbourne, Victoria, Australia
Mohadeseh Ganji
University of Melbourne, Melbourne, Victoria, Australia
Lida Rashidi

About this paper

Cite this paper

Campos, G.O., Zimek, A., Meira, W. (2018). An Unsupervised Boosting Strategy for Outlier Detection Ensembles. In: Phung, D., Tseng, V., Webb, G., Ho, B., Ganji, M., Rashidi, L. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2018. Lecture Notes in Computer Science, vol 10937. Springer, Cham. https://doi.org/10.1007/978-3-319-93034-3_45

DOI: https://doi.org/10.1007/978-3-319-93034-3_45
Published: 19 June 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93033-6
Online ISBN: 978-3-319-93034-3
eBook Packages: Computer Science (R0)