Learning Approximate MRFs from Large Transactional Data

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCCN, volume 4503)

Abstract

In this abstract we address the problem of learning approximate Markov Random Fields (MRFs) from large transactional data. Examples of such data include market basket data and co-authorship network data. Such data can be represented by a binary data matrix in which entry (i, j) takes the value one if item j is in basket i and zero otherwise. “Large” means that the data matrix can have many rows or many columns. To model such data effectively and to answer queries about it efficiently, we consider the use of probabilistic models. Specifically, we employ frequent itemsets to learn approximate global MRFs on large transactional data. We conduct an empirical study on real datasets to show the efficiency and effectiveness of our model in solving the query selectivity estimation problem, that is, approximately computing the marginal probability of a set of items (see [1] for the experimental results). In application terms, this is the problem of computing the likelihood of seeing a particular combination of grocery items in the market basket domain, or the probability of a group of professors coauthoring a paper in a co-authorship network. This marginal probability computation is also useful for anomalous link detection [2] in social network analysis. A link in a social network corresponds to a pair of items, and links whose associated marginal probabilities are significantly low can be regarded as anomalous.
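To make the data representation and the query concrete, the following minimal Python sketch builds the binary data matrix from a handful of baskets, computes the exact empirical marginal of an itemset by scanning every row, and lists item pairs with a low joint marginal. It is not the MRF learning procedure of the paper: the toy baskets, the function names, and the threshold are all hypothetical, and the independence baseline is included only as a naive point of comparison. The full scan it performs is the exact but costly computation that an approximate model is meant to replace on large data.

```python
from itertools import combinations

# Hypothetical toy baskets; item names are illustrative only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

# Fixed column order for the matrix.
items = sorted(set().union(*transactions))

# Binary data matrix: entry (i, j) is 1 if item j is in basket i, else 0.
matrix = [[1 if item in basket else 0 for item in items] for basket in transactions]


def empirical_marginal(itemset):
    """Fraction of baskets that contain every item in `itemset` (full scan)."""
    cols = [items.index(item) for item in itemset]
    hits = sum(1 for row in matrix if all(row[c] == 1 for c in cols))
    return hits / len(matrix)


def independence_estimate(itemset):
    """Naive baseline: product of single-item marginals, ignoring correlations."""
    estimate = 1.0
    for item in itemset:
        estimate *= empirical_marginal([item])
    return estimate


def low_probability_links(threshold):
    """Item pairs (links) whose empirical joint marginal is below `threshold`."""
    return [
        pair
        for pair in combinations(items, 2)
        if empirical_marginal(pair) < threshold
    ]


if __name__ == "__main__":
    query = ["bread", "milk"]
    print("exact marginal:", empirical_marginal(query))
    print("independence estimate:", independence_estimate(query))
    print("low-probability links:", low_probability_links(0.3))
```

On these five baskets the exact marginal of {bread, milk} is 3/5, whereas the independence baseline gives roughly 0.64, which illustrates why a model that captures item correlations can answer such queries more accurately. The threshold of 0.3 is arbitrary; in practice a cutoff for "significantly low" would have to be chosen for the data at hand.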

This work is supported by DOE Award No. DE-FG02-04ER25611 and NSF CAREER Grant IIS-0347662. We refer the reader to a longer version of this paper for experimental results and complete proofs and discussions.

References

  1. Wang, C., Parthasarathy, S.: Learning approximate MRFs from large transactional data. Technical Report: OSU-CISRC-5/06–TR59, The Ohio State University (2006)

  2. Rattigan, M.J., Jensen, D.: The case for anomalous link discovery. ACM SIGKDD Explorations Newsletter 7, 41–47 (2005)

  3. Pavlov, D., Mannila, H., Smyth, P.: Beyond independence: probabilistic models for query approximation on binary transaction data. IEEE Transactions on Knowledge and Data Engineering 15, 1409–1421 (2003)

  4. Goldenberg, A., Moore, A.: Tractable learning of large Bayes net structures from sparse data. In: Proceedings of the Twenty-First International Conference on Machine Learning (2004)

  5. Karypis, G., Kumar, V.: Multilevel k-way partitioning scheme for irregular graphs. J. Parallel Distrib. Comput. 48, 96–129 (1998)

Editor information

Edoardo Airoldi, David M. Blei, Stephen E. Fienberg, Anna Goldenberg, Eric P. Xing, Alice X. Zheng

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wang, C., Parthasarathy, S. (2007). Learning Approximate MRFs from Large Transactional Data. In: Airoldi, E., Blei, D.M., Fienberg, S.E., Goldenberg, A., Xing, E.P., Zheng, A.X. (eds) Statistical Network Analysis: Models, Issues, and New Directions. ICML 2006. Lecture Notes in Computer Science, vol 4503. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73133-7_16

  • DOI: https://doi.org/10.1007/978-3-540-73133-7_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-73132-0

  • Online ISBN: 978-3-540-73133-7

  • eBook Packages: Computer Science, Computer Science (R0)
