Machine Learning

Volume 98, Issue 1–2, pp 121–155

The blind men and the elephant: on meeting the problem of multiple truths in data from clustering and pattern mining perspectives


Abstract

In this position paper, we discuss how different branches of research on clustering and pattern mining, while rather different at first glance, in fact have much in common and can learn a great deal from each other's solutions and approaches. First, we give brief introductions to the fundamental problems of different sub-fields of clustering, focusing especially on subspace clustering, ensemble clustering, alternative clustering (as a variant of constraint clustering), and multiview clustering (as a variant of alternative clustering). Second, we relate a representative of these areas, subspace clustering, to pattern mining. We show that, while these areas use different vocabularies and intuitions, they share common roots and are exposed to essentially the same fundamental problems; in particular, we detail how certain problems currently faced by one field have already been solved by the other, and vice versa.

The purpose of our survey is to take first steps towards bridging the linguistic gap between different (sub-)communities and to make researchers from different fields aware of similar problems, and in part of similar or transferable solutions, in the literature of the other research topic.

Keywords

Subspace clustering · Pattern mining · Ensemble clustering · Alternative clustering · Constraint clustering · Multiview clustering


Copyright information

© The Author(s) 2013

Authors and Affiliations

  1. Department of Computing Science, University of Alberta, Edmonton, Canada
  2. Advanced Database Research and Modelling, Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium
