Visual Verification and Analysis of Outliers Using Optimal Outlier Detection Result by Choosing Proper Algorithm and Parameter

  • Bilkis Jamal Ferdosi
  • Muhammad Masud Tarek
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 813)

Abstract

Outlier detection is a nontrivial but important task in many application areas. The literature offers several methods for finding outliers, but no single method outperforms the others in all cases. Choosing a proper algorithm and the value of its relevant parameter is therefore crucial. In addition, no method is perfect, so verification by domain experts is needed to confirm whether the detected outliers are meaningful. A proper visual representation of the detected outliers may help experts resolve anomalies. In this paper, we propose a visual analytics system that uses a training set to find a proper algorithm and the value of its relevant parameter for a specific dataset. The chosen method is then applied to the test data to obtain an outlier ranking. The data points are then visualized in a parallel coordinate plot (PCP), in which the color of each line is derived from the outlier factor of the corresponding data point. PCP is a popular high-dimensional data visualization technique in which the coordinate axes are parallel to each other and each data point is represented by a line. Using the visualization, experts can provide feedback and update the result. Experiments with different datasets demonstrate the strength of our system.
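The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and everything specific in it is an assumption made for illustration: the two candidate scorers (distance to the k-th nearest neighbor and mean distance to the k nearest neighbors), the use of precision at n as the selection criterion, the blue-to-red color scale, the toy data, and all function names. The sketch picks the (algorithm, k) pair that scores best on a labeled training set, applies it to test data to rank the points, and derives one line color per point from its outlier score, as a PCP renderer might consume it.

```python
import math

def kth_nn(points, k):
    """Score each point by the distance to its k-th nearest neighbor."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

def mean_knn(points, k):
    """Score each point by the mean distance to its k nearest neighbors."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

# Hypothetical registry of candidate algorithms (an assumption of this sketch).
SCORERS = {"kth_nn": kth_nn, "mean_knn": mean_knn}

def precision_at_n(scores, labels):
    """Fraction of true outliers among the n top-scored points,
    where n is the number of labeled outliers."""
    n = sum(labels)
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:n]
    return sum(labels[i] for i in top) / n

def choose_method(points, labels, candidates):
    """Return the (algorithm name, k) pair with the best precision at n.
    Ties go to the candidate listed first."""
    return max(
        candidates,
        key=lambda c: precision_at_n(SCORERS[c[0]](points, c[1]), labels),
    )

def score_to_rgb(score, lo, hi):
    """Map a score linearly onto a blue (low) to red (high) scale."""
    t = 0.0 if hi == lo else (score - lo) / (hi - lo)
    return (round(255 * t), 0, round(255 * (1 - t)))

# Toy labeled training set: five inliers and two outliers that sit close together.
train_points = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (1.0, 1.2), (1.2, 1.0),
                (5.0, 5.0), (5.1, 5.0)]
train_labels = [0, 0, 0, 0, 0, 1, 1]

candidates = [(name, k) for name in SCORERS for k in (1, 2, 3)]
best = choose_method(train_points, train_labels, candidates)

# Apply the chosen method to unlabeled test data and rank the points.
test_points = [(1.0, 1.1), (0.9, 0.9), (1.1, 1.1), (1.2, 0.9), (6.0, 0.5)]
name, k = best
test_scores = SCORERS[name](test_points, k)
ranking = sorted(range(len(test_points)), key=lambda i: test_scores[i], reverse=True)

# One color per data point; a PCP would draw each point's line in this color.
lo, hi = min(test_scores), max(test_scores)
colors = [score_to_rgb(s, lo, hi) for s in test_scores]
```

On this toy training set, k = 1 performs poorly for both scorers because the two outliers are each other's nearest neighbor, which is the kind of parameter sensitivity that motivates selecting the parameter from labeled data. The resulting colors list could be passed to any plotting routine that draws one line per data point across parallel axes.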

Keywords

Outlier detection, Parallel coordinates, Human-centered computing, Visual analytics

Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. Department of CSE, University of Asia Pacific, Dhaka, Bangladesh
  2. Department of CSE, State University of Bangladesh, Dhaka, Bangladesh
