Geometric and Combinatorial Tiles in 0–1 Data

  • Aristides Gionis
  • Heikki Mannila
  • Jouni K. Seppänen
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3202)

Abstract

In this paper we introduce a simple probabilistic model, hierarchical tiles, for 0–1 data. A basic tile (X,Y,p) specifies a subset X of the rows and a subset Y of the columns of the data, i.e., a rectangle, and gives a probability p for the occurrence of 1s in the cells of X × Y. A hierarchical tile additionally has a set of exception tiles that specify the probabilities for subrectangles of the original rectangle. If the rows and columns are ordered and X and Y consist of consecutive elements in those orderings, then the tile is geometric; otherwise it is combinatorial. We give a simple randomized algorithm for finding good geometric tiles. Our main result shows that, using spectral ordering techniques, one can find good orderings that turn combinatorial tiles into geometric tiles. We give empirical results on the performance of the methods.
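The definitions in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation: the class name `Tile`, the helpers `is_geometric` and `log_likelihood`, and the rule that the first exception tile covering a cell decides its probability are all assumptions made for illustration, since the abstract does not say how overlapping exceptions are resolved. Rows and columns are assumed to be indexed by integers in their natural order.

```python
from dataclasses import dataclass, field
from math import log


@dataclass
class Tile:
    """A tile (X, Y, p) with optional exception tiles (hypothetical layout)."""
    rows: frozenset          # X: subset of row indices
    cols: frozenset          # Y: subset of column indices
    p: float                 # probability of a 1 inside X x Y
    exceptions: list = field(default_factory=list)  # nested Tile objects

    def covers(self, i, j):
        return i in self.rows and j in self.cols

    def prob(self, i, j):
        # Assumption: the first exception tile covering (i, j) overrides p.
        for t in self.exceptions:
            if t.covers(i, j):
                return t.prob(i, j)
        return self.p


def is_geometric(tile):
    """True if X and Y are runs of consecutive indices (top-level tile only)."""
    def consecutive(s):
        return len(s) > 0 and max(s) - min(s) + 1 == len(s)
    return consecutive(tile.rows) and consecutive(tile.cols)


def log_likelihood(data, tile):
    """Log-likelihood of the 0-1 cells covered by the tile under the model."""
    eps = 1e-12  # clamp probabilities to avoid log(0)
    total = 0.0
    for i in tile.rows:
        for j in tile.cols:
            q = min(max(tile.prob(i, j), eps), 1.0 - eps)
            total += log(q) if data[i][j] else log(1.0 - q)
    return total


# Usage: a sparse 4 x 4 background with a dense 2 x 2 exception block.
data = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
dense = Tile(frozenset({0, 1}), frozenset({0, 1}), 0.9)
root = Tile(frozenset(range(4)), frozenset(range(4)), 0.1, [dense])
ll = log_likelihood(data, root)
```

Under this sketch, a tile over rows {0, 2} would be combinatorial in the given ordering; reordering the rows so that 0 and 2 become adjacent is what would turn it into a geometric tile.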

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Aristides Gionis (1)
  • Heikki Mannila (1)
  • Jouni K. Seppänen (1)
  1. Helsinki Institute for Information Technology, University of Helsinki and Helsinki University of Technology, Finland
