
Randomizing Greedy Ensemble Outlier Detection with GRASP

  • Lediona Nishani
  • Marenglen Biba
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 611)

Abstract

Ensemble methods have recently been applied in many areas of machine learning, and outlier detection is one area where they have drawn increasing attention. This paper studies randomization in ensemble methods for outlier detection. We present a novel algorithm that uses stochastic local search heuristics to induce diversity in an ensemble outlier detection algorithm. Specifically, we use GRASP (Greedy Randomized Adaptive Search Procedure) to inject diversity into the search process while keeping a good balance between exploitation and diversification when building the ensemble. Our experiments show improvements over the greedy ensemble method and open new directions for research in this area.
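The idea of replacing a purely greedy ensemble construction with GRASP can be illustrated with a small sketch. This is not the authors' algorithm; it rests on several assumptions that are ours: each base detector yields a score vector already normalized to a comparable range, members are combined by averaging, and ensemble quality is measured by plain Pearson correlation with a given pseudo-ground-truth `target` vector. All function names (`fitness`, `grasp_construct`, `local_search`, `grasp_ensemble`) are hypothetical. The GRASP parts are the restricted candidate list (RCL) controlled by `alpha` (with `alpha = 0` the construction degenerates to pure greedy selection), a local search phase, and multiple restarts that keep the best ensemble found.

```python
import random
from statistics import mean


def pearson(x, y):
    """Plain (unweighted) Pearson correlation; returns 0.0 for a constant vector."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def fitness(scores, members, target):
    """Assumed quality measure: correlation of the members' mean score with the target."""
    combined = [mean(col) for col in zip(*(scores[i] for i in members))]
    return pearson(combined, target)


def grasp_construct(scores, target, alpha, rng):
    """Greedy randomized construction: alpha=0 is pure greedy, alpha=1 is fully random."""
    remaining = list(range(len(scores)))
    ensemble, best = [], float("-inf")
    while remaining:
        gains = {c: fitness(scores, ensemble + [c], target) for c in remaining}
        g_max, g_min = max(gains.values()), min(gains.values())
        # Restricted candidate list: candidates whose gain lies within alpha of the best.
        threshold = g_max - alpha * (g_max - g_min)
        rcl = [c for c in remaining if gains[c] >= threshold]
        choice = rng.choice(rcl)
        if gains[choice] <= best:
            break  # the sampled member does not improve the ensemble: stop building
        ensemble.append(choice)
        remaining.remove(choice)
        best = gains[choice]
    return ensemble, best


def local_search(scores, target, ensemble, current):
    """First-improvement local search: drop any member whose removal raises the fitness."""
    improved = True
    while improved and len(ensemble) > 1:
        improved = False
        for m in ensemble:
            trial = [i for i in ensemble if i != m]
            value = fitness(scores, trial, target)
            if value > current:
                ensemble, current, improved = trial, value, True
                break
    return ensemble, current


def grasp_ensemble(scores, target, alpha=0.3, iterations=20, seed=0):
    """Repeat construction + local search and keep the best ensemble found."""
    rng = random.Random(seed)
    best_ensemble, best_value = [], float("-inf")
    for _ in range(iterations):
        ensemble, value = grasp_construct(scores, target, alpha, rng)
        ensemble, value = local_search(scores, target, ensemble, value)
        if value > best_value:
            best_ensemble, best_value = ensemble, value
    return best_ensemble, best_value
```

As a toy usage example, with scores `[[0.0, 0.0, 0.0, 1.0], [0.0, 1.0, 0.0, 0.0], [0.1, 0.0, 0.2, 0.9]]` and target `[0, 0, 0, 1]`, detector 0 alone reproduces the target, so `grasp_ensemble(scores, target, alpha=0.5, iterations=10)` should settle on `[0]` even when a randomized construction first picks detector 2, because the local search later removes it. Raising `alpha` widens the RCL and trades greedy exploitation for diversification, which is the balance the abstract refers to.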

Keywords

Outlier detection, Ensemble methods, Machine learning, Stochastic local search, GRASP

Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  1. University of New York in Tirana, Tirana, Albania