Learning Approximate MRFs from Large Transactional Data

Wang, Chao; Parthasarathy, Srinivasan

doi:10.1007/978-3-540-73133-7_16

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCCN, volume 4503)

Abstract

In this abstract we address the problem of learning approximate Markov Random Fields (MRFs) from large transactional data. Examples of such data include market basket data and co-authorship network data. Such data can be represented by a binary data matrix in which entry (i, j) takes the value one if item j is in basket i and zero otherwise. “Large” means that the data matrix can have many rows or many columns. To model such data effectively and answer queries about it efficiently, we consider the use of probabilistic models; specifically, we employ frequent itemsets to learn approximate global MRFs on large transactional data. We conduct an empirical study on real datasets to show the efficiency and effectiveness of our model in solving the query selectivity estimation problem, that is, approximately computing the marginal probability of sets of items (see [1] for the experimental results). In the market basket domain, this is the problem of computing the likelihood of seeing a particular combination of grocery items; translated into the social network domain, it is the probability of a group of professors coauthoring a paper in a co-authorship network. This marginal probability computation is also useful for anomalous link detection [2] in social network analysis. A link in a social network corresponds to a pair of items, and links whose associated marginal probabilities are significantly low can be thought of as anomalous.
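
To make the quantities in the abstract concrete, the following is a minimal sketch of the binary data matrix, the marginal probability of an itemset, and the flagging of low-probability links. It computes marginals by exact counting over the rows, which is the quantity the approximate MRF is meant to estimate; it is not the MRF learning procedure of the paper. The toy baskets, the function names, and the `threshold` parameter are all assumptions introduced for illustration.

```python
from itertools import combinations


def to_binary_matrix(transactions, items):
    """Entry (i, j) is 1 if items[j] is in transactions[i], else 0."""
    return [[1 if item in basket else 0 for item in items]
            for basket in transactions]


def empirical_marginal(matrix, items, itemset):
    """Fraction of rows (baskets) that contain every item in `itemset`."""
    cols = [items.index(item) for item in itemset]
    hits = sum(1 for row in matrix if all(row[c] == 1 for c in cols))
    return hits / len(matrix)


def low_probability_links(matrix, items, threshold):
    """Pairs of items that co-occur, but with marginal probability below
    `threshold` (a hypothetical cutoff, not taken from the paper)."""
    flagged = []
    for a, b in combinations(items, 2):
        p = empirical_marginal(matrix, items, (a, b))
        if 0 < p < threshold:
            flagged.append((a, b, p))
    return flagged


# Toy market basket data (illustrative only).
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
items = ["bread", "butter", "milk"]
matrix = to_binary_matrix(baskets, items)

print(empirical_marginal(matrix, items, ("bread", "milk")))
print(low_probability_links(matrix, items, threshold=0.3))
```

Exact counting needs a pass over all rows for every query, which is the cost a learned probabilistic model aims to avoid on large data.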

This work is supported by DOE Award No. DE-FG02-04ER25611 and NSF CAREER Grant IIS-0347662. We refer the reader to a longer version of this paper for experimental results and complete proofs and discussions.

References

[1] Wang, C., Parthasarathy, S.: Learning approximate MRFs from large transactional data. Technical Report: OSU-CISRC-5/06–TR59, The Ohio State University (2006)
[2] Rattigan, M.J., Jensen, D.: The case for anomalous link discovery. ACM SIGKDD Explorations Newsletter 7, 41–47 (2005)
[3] Pavlov, D., Mannila, H., Smyth, P.: Beyond independence: probabilistic models for query approximation on binary transaction data. IEEE Transactions on Knowledge and Data Engineering 15, 1409–1421 (2003)
[4] Goldenberg, A., Moore, A.: Tractable learning of large Bayes net structures from sparse data. In: Proceedings of the Twenty-First International Conference on Machine Learning (2004)
[5] Karypis, G., Kumar, V.: Multilevel k-way partitioning scheme for irregular graphs. J. Parallel Distrib. Comput. 48, 96–129 (1998)

Author information

Authors and Affiliations

Department of Computer Science and Engineering, The Ohio State University,
Chao Wang & Srinivasan Parthasarathy

Editor information

Edoardo Airoldi, David M. Blei, Stephen E. Fienberg, Anna Goldenberg, Eric P. Xing, Alice X. Zheng

About this paper

Cite this paper

Wang, C., Parthasarathy, S. (2007). Learning Approximate MRFs from Large Transactional Data. In: Airoldi, E., Blei, D.M., Fienberg, S.E., Goldenberg, A., Xing, E.P., Zheng, A.X. (eds) Statistical Network Analysis: Models, Issues, and New Directions. ICML 2006. Lecture Notes in Computer Science, vol 4503. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73133-7_16

DOI: https://doi.org/10.1007/978-3-540-73133-7_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73132-0
Online ISBN: 978-3-540-73133-7
eBook Packages: Computer Science, Computer Science (R0)
