Outliers on Concept Lattices

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8417)

Abstract

Outlier detection in mixed-type data, which contain both discrete and continuous features, remains a challenging problem. Here we introduce concept-based outlierness, which is defined on the concept lattice: a hierarchy of clusters of data points and features obtained by formal concept analysis (FCA). Intuitively, this outlierness measures the degree of isolation of clusters on the hierarchy. We also investigate the discretization of continuous features, which embeds the original continuous (Euclidean) space into the concept lattice. Our experiments show that the proposed method, which detects concept-based outliers, is more effective than popular distance-based outlier detection methods, which ignore the discreteness of features and do not take cluster relationships into account.
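The abstract does not state the formal definition of concept-based outlierness, so the following is only a minimal sketch of the general pipeline under stated assumptions. Continuous features are discretized by equal-width binning into binary attributes. This is a hypothetical choice, and the paper's own discretization may differ. The resulting formal context is then closed into its formal concepts by brute force. Finally, a hypothetical isolation proxy scores each object by how small the extent of its object concept (g'', g') is. The function names `discretize`, `build_context`, `formal_concepts`, and `isolation_scores` are illustrative only. The proxy is not the authors' outlierness measure.

```python
from itertools import combinations


def discretize(values, n_bins):
    """Equal-width binning of one continuous feature (hypothetical choice).
    Returns the bin index (0 .. n_bins-1) of each value."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # constant feature: avoid division by zero
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def build_context(rows, discrete_cols, continuous_cols, n_bins=2):
    """Formal context: object index -> frozenset of binary attributes."""
    intents = [set() for _ in rows]
    for col in discrete_cols:
        for i, row in enumerate(rows):
            intents[i].add(f"{col}={row[col]}")
    for col in continuous_cols:
        bins = discretize([row[col] for row in rows], n_bins)
        for i, b in enumerate(bins):
            intents[i].add(f"{col}:bin{b}")
    return [frozenset(s) for s in intents]


def common_attributes(objs, context, all_attrs):
    """Derivation A': attributes shared by every object in objs."""
    result = set(all_attrs)
    for g in objs:
        result &= context[g]
    return frozenset(result)


def objects_having(attrs, context):
    """Derivation B': objects that have every attribute in attrs."""
    return frozenset(g for g, intent in enumerate(context) if attrs <= intent)


def formal_concepts(context):
    """All (extent, intent) pairs, found by closing every subset of objects.
    Exponential in the number of objects: for tiny toy contexts only."""
    all_attrs = frozenset().union(*context)
    concepts = set()
    for k in range(len(context) + 1):
        for objs in combinations(range(len(context)), k):
            intent = common_attributes(objs, context, all_attrs)
            concepts.add((objects_having(intent, context), intent))
    return concepts


def isolation_scores(context):
    """Hypothetical proxy (NOT the paper's definition): 1 - |g''| / n.
    A small object-concept extent means few objects share all of g's attributes."""
    n = len(context)
    return [1 - len(objects_having(intent, context)) / n for intent in context]


if __name__ == "__main__":
    rows = [
        {"color": "red", "size": 1.0},
        {"color": "red", "size": 1.2},
        {"color": "red", "size": 1.1},
        {"color": "blue", "size": 9.0},
    ]
    ctx = build_context(rows, ["color"], ["size"], n_bins=2)
    print(len(formal_concepts(ctx)))  # 4
    print(isolation_scores(ctx))      # [0.25, 0.25, 0.25, 0.75]
```

With two bins, the three small `size` values share one bin and 9.0 falls in the other. The blue object therefore forms a singleton object concept and receives the highest score. Closing every object subset is exponential and only suitable for illustration. Practical FCA implementations use dedicated enumeration algorithms such as NextClosure or Close-by-One.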

Keywords

Outlier · Formal concept analysis · Concept lattice · Cluster · Discretization

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  1. Machine Learning and Computational Biology Research Group, Max Planck Institute for Intelligent Systems and Max Planck Institute for Developmental Biology, Tübingen, Germany
