Data Mining and Knowledge Discovery, Volume 27, Issue 2, pp 193–224

How to “alternatize” a clustering algorithm

  • M. Shahriar Hossain
  • Naren Ramakrishnan
  • Ian Davidson
  • Layne T. Watson
Article

Abstract

Given a clustering algorithm, how can we adapt it to find multiple, nonredundant, high-quality clusterings? We focus on algorithms based on vector quantization and describe a framework for automatic ‘alternatization’ of such algorithms. Our framework works in both simultaneous and sequential learning formulations and can mine an arbitrary number of alternative clusterings. We demonstrate its applicability to various clustering algorithms—k-means, spectral clustering, constrained clustering, and co-clustering—and effectiveness in mining a variety of datasets.

Keywords

Clustering, Alternative clustering

Copyright information

© The Author(s) 2012

Authors and Affiliations

  • M. Shahriar Hossain (1)
  • Naren Ramakrishnan (2)
  • Ian Davidson (4)
  • Layne T. Watson (2, 3)

  1. Department of Mathematics and Computer Science, Virginia State University, Petersburg, USA
  2. Department of Computer Science, Virginia Polytechnic Institute and State University, Blacksburg, USA
  3. Department of Mathematics, Virginia Polytechnic Institute and State University, Blacksburg, USA
  4. Department of Computer Science, University of California, Davis, USA
