A Sampling-Based Approach for Discovering Subspace Clusters

  • Sandy Moens
  • Boris Cule
  • Bart Goethals
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11828)

Abstract

Subspace clustering aims to discover clusters in projections of high-dimensional numerical data. In this paper, we focus on discovering small collections of interesting subspace clusters that do not try to cluster all data points, leaving noisy data points unclustered. To this end, we propose a randomised method that first converts the high-dimensional database to a binarised one using projected samples of the original database. This binarised database is then mined for frequent itemsets, which we show can be translated back to subspace clusters. In our extensive experimental analysis, we show on synthetic as well as real-world data that our method is capable of discovering highly interesting subspace clusters.
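The three stages named in the abstract (binarise using projected samples, mine frequent itemsets, translate itemsets back to subspace clusters) can be illustrated with a minimal Python sketch. The abstract does not specify how the binarisation is done, so everything concrete here is an assumption made for illustration: a sample is taken to be a random (anchor point, dimension) pair, an item is the one-dimensional neighbourhood of that anchor within a fixed `width`, and the miner is a small brute-force level-wise search. The function names `draw_samples`, `binarise`, `frequent_itemsets` and `to_subspace_cluster` are hypothetical and are not taken from the paper.

```python
import random
from itertools import combinations

def draw_samples(data, n_samples, rng):
    """Assumed sampling scheme: random (anchor point index, dimension) pairs."""
    n_dims = len(data[0])
    return [(rng.randrange(len(data)), rng.randrange(n_dims))
            for _ in range(n_samples)]

def binarise(data, samples, width):
    """Map each point to a transaction of items (dimension, sample index).

    A point receives an item when it lies within `width` of the sample's
    anchor point in the sample's dimension (an illustrative choice).
    """
    transactions = []
    for point in data:
        items = frozenset(
            (dim, s) for s, (anchor, dim) in enumerate(samples)
            if abs(point[dim] - data[anchor][dim]) <= width
        )
        transactions.append(items)
    return transactions

def frequent_itemsets(transactions, min_support, max_size=3):
    """Brute-force level-wise miner; adequate only for tiny examples."""
    items = sorted(set().union(*transactions))
    current = [frozenset([item]) for item in items]
    result, size = {}, 1
    while current and size <= max_size:
        survivors = []
        for candidate in current:
            support = sum(1 for t in transactions if candidate <= t)
            if support >= min_support:
                result[candidate] = support
                survivors.append(candidate)
        size += 1
        # Join frequent itemsets that differ in exactly one item.
        current = list({a | b for a, b in combinations(survivors, 2)
                        if len(a | b) == size})
    return result

def to_subspace_cluster(itemset, transactions):
    """Translate an itemset into (supporting point indices, dimensions)."""
    objects = [i for i, t in enumerate(transactions) if itemset <= t]
    dims = sorted({dim for dim, _ in itemset})
    return objects, dims

if __name__ == "__main__":
    # Toy data: points 0-3 agree in dimensions 0 and 1; points 4-5 are noise.
    data = [
        [1.0, 1.0, 0.0],
        [1.1, 0.9, 5.0],
        [0.9, 1.1, 9.0],
        [1.0, 1.0, 3.0],
        [8.0, 4.0, 7.0],
        [5.0, 7.0, 1.0],
    ]
    rng = random.Random(0)
    samples = draw_samples(data, n_samples=6, rng=rng)
    transactions = binarise(data, samples, width=0.5)
    for itemset, support in frequent_itemsets(transactions, 3).items():
        print(support, to_subspace_cluster(itemset, transactions))
```

In this sketch, points that never fall into a frequent neighbourhood end up in no cluster, which mirrors the abstract's stated goal of leaving noisy points unclustered. The fixed `width` and the exhaustive miner are simplifications for a toy example and say nothing about the choices made in the paper.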

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. University of Antwerp, Antwerp, Belgium
  2. Monash University, Melbourne, Australia
