Multi-way set enumeration in weight tensors

Georgii, Elisabeth; Tsuda, Koji; Schölkopf, Bernhard

doi:10.1007/s10994-010-5210-y

Open access
Published: 25 September 2010

Machine Learning, Volume 82, pages 123–155 (2011)

Abstract

The analysis of n-ary relations has received attention in many fields, for instance biology, web mining, and social studies. In the basic setting, there are n sets of instances, and each observation associates n instances, one from each set. A common approach to exploring such n-way data is the search for n-set patterns, the n-way equivalent of itemsets. More precisely, an n-set pattern consists of specific subsets of the n instance sets such that all possible associations between the corresponding instances are observed in the data. In contrast, traditional itemset mining approaches consider only two-way data, namely items versus transactions. The n-set patterns provide a higher-level view of the data, revealing associative relationships between groups of instances. Here, we generalize this approach in two respects. First, we tolerate missing observations to a certain degree; that is, we are also interested in n-sets where most (although not all) of the possible associations have been recorded in the data. Second, we take association weights into account. Specifically, we propose a method to enumerate all n-sets whose average association weight meets a minimum threshold. Technically, we solve the enumeration task using a reverse search strategy, which allows for effective pruning of the search space. In addition, our algorithm provides a ranking of the solutions and can incorporate further constraints. We show experimental results on artificial and real-world datasets from different domains.
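The abstract describes enumerating every n-set whose average association weight meets a threshold, using reverse search. The sketch below illustrates that idea on a small dense NumPy weight tensor. It is not the authors' implementation: the representation of an n-set as a tuple of sorted index tuples, the parent rule (remove the element whose slice has the lowest average weight, ties broken by mode and then index), and the function names `average_weight`, `parent`, and `enumerate_nsets` are hypothetical choices made for illustration. The parent rule is chosen because, within any mode, the slice averages have a mean equal to the overall average, so removing the lowest-scoring element never lowers the average weight. Consequently every qualifying n-set is reachable from a qualifying single-cell root, and a branch can be pruned as soon as it falls below the threshold.

```python
import itertools
import numpy as np


def average_weight(W, nset):
    """Mean weight over all cells spanned by an n-set.

    `nset` is a tuple of n sorted index tuples, one per tensor mode.
    For a 0/1 tensor, an average of 1.0 means every association is
    observed (an exact n-set pattern); lower values tolerate gaps.
    """
    return float(W[np.ix_(*nset)].mean())


def parent(W, nset):
    """Hypothetical parent rule: drop the element whose slice has the lowest
    average weight (ties broken by mode, then index). Returns None for a
    single-cell n-set, which serves as a root of the search tree."""
    sub = W[np.ix_(*nset)]
    best = None
    for d, members in enumerate(nset):
        if len(members) < 2:  # never empty a mode
            continue
        other_axes = tuple(a for a in range(W.ndim) if a != d)
        slice_avgs = sub.mean(axis=other_axes)  # one average per member of mode d
        for pos, v in enumerate(members):
            key = (float(slice_avgs[pos]), d, v)
            if best is None or key < best:
                best = key
    if best is None:
        return None
    _, d, v = best
    reduced = tuple(m for m in nset[d] if m != v)
    return nset[:d] + (reduced,) + nset[d + 1:]


def enumerate_nsets(W, theta):
    """Reverse search: list every n-set whose average weight is >= theta."""
    W = np.asarray(W, dtype=float)
    # Roots: single cells (one index per mode) that already meet the threshold.
    stack = [tuple((i,) for i in cell)
             for cell in itertools.product(*(range(s) for s in W.shape))
             if W[cell] >= theta]
    found = []
    while stack:
        nset = stack.pop()
        found.append(nset)
        for d in range(W.ndim):
            for v in range(W.shape[d]):
                if v in nset[d]:
                    continue
                child = nset[:d] + (tuple(sorted(nset[d] + (v,))),) + nset[d + 1:]
                # Prune: descend only into children that meet the threshold
                # and whose unique parent is the current n-set.
                if average_weight(W, child) >= theta and parent(W, child) == nset:
                    stack.append(child)
    # Simple ordering by average weight (highest first).
    return sorted(found, key=lambda s: -average_weight(W, s))
```

For a 0/1 tensor, `theta=1.0` returns only exact n-set patterns (every association observed), while a threshold such as `0.8` tolerates some missing observations. The final sort by average weight is an ordering chosen for this sketch and is not necessarily the ranking used in the article.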

Author information

Elisabeth Georgii
Present address: Department of Information and Computer Science, Helsinki Institute for Information Technology, HIIT, Aalto University School of Science and Technology, P.O. Box 15400, 00076, Aalto, Finland

Authors and Affiliations

Department of Empirical Inference, Max Planck Institute for Biological Cybernetics, Tübingen, Germany
Elisabeth Georgii
Friedrich Miescher Laboratory of the Max Planck Society, Tübingen, Germany
Elisabeth Georgii
Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology, AIST, Tokyo, Japan
Koji Tsuda
ERATO Minato Project, Japan Science and Technology Agency, Tokyo, Japan
Koji Tsuda
Department of Empirical Inference, Max Planck Institute for Biological Cybernetics, Tübingen, Germany
Bernhard Schölkopf

Corresponding author

Correspondence to Elisabeth Georgii.

Additional information

Editors: S.V.N. Vishwanathan, Samuel Kaski, Jennifer Neville, and Stefan Wrobel.

Rights and permissions

Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

About this article

Cite this article

Georgii, E., Tsuda, K. & Schölkopf, B. Multi-way set enumeration in weight tensors. Mach Learn 82, 123–155 (2011). https://doi.org/10.1007/s10994-010-5210-y

Received: 01 June 2009
Accepted: 01 May 2010
Published: 25 September 2010
Issue Date: February 2011
DOI: https://doi.org/10.1007/s10994-010-5210-y
