Skip to main content

Actionable Mining of Large, Multi-relational Data Using Localized Predictive Models

  • Conference paper
  • 807 Accesses

Part of the Communications in Computer and Information Science book series (CCIS,volume 272)

Abstract

Many large datasets associated with modern predictive data mining applications are quite complex and heterogeneous, possibly involving multiple relations, or exhibiting a dyadic nature with associated side-information. For example, one may be interested in predicting the preferences of a large set of customers for a variety of products, given various properties of both customers and products, as well as past purchase history, a social network on the customers, and a conceptual hierarchy on the products. This article provides an overview of recent innovative approaches to predictive modeling for such types of data, and also provides some concrete application scenarios to highlight the issues involved. The common philosophy in all the approaches described is to pursue a simultaneous problem decomposition and modeling strategy that can exploit heterogeneity in behavior, use the wide variety of information available and also yield relatively more interpretable solutions as compared to global ”one-shot” approaches. Since both the problem domains and approaches considered are quite new, we also highlight the potential for further investigations on several occasions throughout this article.

Keywords

  • Recommender System
  • Latent Dirichlet Allocation
  • Side Information
  • Cluster Assignment
  • Mean Field Approximation

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

This is a preview of subscription content, access via your institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • DOI: 10.1007/978-3-642-29764-9_1
  • Chapter length: 20 pages
  • Instant PDF download
  • Readable on all devices
  • Own it forever
  • Exclusive offer for individuals only
  • Tax calculation will be finalised during checkout
eBook
USD   79.99
Price excludes VAT (USA)
  • ISBN: 978-3-642-29764-9
  • Instant PDF download
  • Readable on all devices
  • Own it forever
  • Exclusive offer for individuals only
  • Tax calculation will be finalised during checkout
Softcover Book
USD   99.99
Price excludes VAT (USA)

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Abernethy, J., Bach, F., Evgeniou, T., Vert, J.P.: A new approach to collaborative filtering: Operator estimation with spectral regularization. The Journal of Machine Learning Research 10, 803–826 (2009)

    MATH  Google Scholar 

  2. Agarwal, D., Chen, B.: Regression-based latent factor models. In: KDD 2009, pp. 19–28 (2009)

    Google Scholar 

  3. Agarwal, D., Chen, B., Elango, P.: Spatio-temporal models for estimating click-through rate. In: WWW 2009: Proceedings of the 18th International Conference on World Wide Web, pp. 21–30 (2009)

    Google Scholar 

  4. Agarwal, D., Chen, B.: flda: matrix factorization through latent dirichlet allocation. In: Proc. ACM International Conference on Web Search and Data Mining 2010, pp. 91–100 (2010)

    Google Scholar 

  5. Agarwal, D., Merugu, S.: Predictive discrete latent factor models for large scale dyadic data. In: KDD 2007, pp. 26–35 (2007)

    Google Scholar 

  6. Dempster, A.P., Laird, N., Rubin, D.: Maximum likelihood from incomplete data via the em algorithm. J. Royal Statistical Society, Series B(Methodological) 39(1), 1–38 (1977)

    MathSciNet  MATH  Google Scholar 

  7. Banerjee, A., Merugu, S., Dhillon, I., Ghosh, J.: Clustering with Bregman divergences. Jl. Machine Learning Research (JMLR) 6, 1705–1749 (2005)

    MathSciNet  MATH  Google Scholar 

  8. Banerjee, A., Basu, S., Merugu, S.: Multi-way clustering on relation graphs. In: SDM (2007)

    Google Scholar 

  9. Basilico, J., Hofmann, T.: Unifying collaborative and content-based filtering. In: ICML (2004)

    Google Scholar 

  10. Bertsekas, D.: Nonlinear Programming. Athena Scientific (1999)

    Google Scholar 

  11. Chamberlain, D.E., Gough, S., Vickery, J.A., Firbank, L.G., Petit, S., Pywell, R., Bradbury, R.B.: Rule-based predictive models are not cost-effective alternatives to bird monitoring on farmland. Agriculture, Ecosystems & Environment 101(1), 1–8 (2004)

    CrossRef  Google Scholar 

  12. Deodhar, M., Ghosh, J.: A framework for simultaneous co-clustering and learning from complex data. In: KDD 2007, pp. 250–259 (2007)

    Google Scholar 

  13. Deodhar, M., Ghosh, J.: Simultaneous co-clustering and modeling of market data. In: Workshop for Data Mining in Marketing, Industrial Conf. on Data Mining 2007, pp. 73–82 (2007)

    Google Scholar 

  14. Deodhar, M., Ghosh, J.: Simultaneous co-segmentation and predictive modeling for large, temporal marketing data. In: Data Mining for Marketing Workshop, ICDM 2008 (2008)

    Google Scholar 

  15. Deodhar, M., Ghosh, J.: Mining for most certain predictions from dyadic data. In: Proc. 15th ACM SIGKDD Conf. on Knowledge Discovery and Data Mining, KDD 2009 (2009)

    Google Scholar 

  16. Deodhar, M., Ghosh, J., Tsar-Tsansky, M.: Active learning for recommender systems with multiple localized models. In: Proc. Fifth Symposium on Statistical Challenges in Electronic Commerce Research, SCECR 2009 (2009)

    Google Scholar 

  17. Dietterich, T.G., Domingos, P., Getoor, L., Muggleton, S., Tadepalli, P.: Structured machine learning: the next ten years. Machine Learning 73(1), 3–23 (2008)

    CrossRef  Google Scholar 

  18. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent dirichlet allocation. JMLR 3, 993–1022 (2003)

    MATH  Google Scholar 

  19. Dzeroski, S.: Multi-relational data mining: an introduction. SIGKDD Explorations 5(1), 1–16 (2003)

    CrossRef  Google Scholar 

  20. Airoldi, E., Blei, D.M., Fienberg, S.E., Xing, E.P.: Mixed membership stochastic blockmodels. JMLR 9, 1981–2014 (2008)

    MATH  Google Scholar 

  21. Gelman, A., Hill, J.: Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press (2007)

    Google Scholar 

  22. George, T., Merugu, S.: A scalable collaborative filtering framework based on co-clustering. In: Proceedings of the Fifth IEEE International Conference on Data Mining, pp. 625–628 (2005)

    Google Scholar 

  23. Getoor, L., Friedman, N., Koller, D., Taskar, B.: Learning probabilistic models of relational structure. In: Proc. 18th International Conf. on Machine Learning, pp. 170–177. Morgan Kaufmann, San Francisco (2001), citeseer.ist.psu.edu/article/getoor01learning.html

    Google Scholar 

  24. Grover, R., Srinivasan, V.: A simultaneous approach to market segmentation and market structuring. Journal of Marketing Research, 139–153 (1987)

    Google Scholar 

  25. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning, 2nd edn. Springer, Heidelberg (2009)

    MATH  CrossRef  Google Scholar 

  26. Herlocker, J., Konstan, J., Borchers, A., Riedl, J.: An algorithmic framework for performing collaborative filtering. In: SIGIR 1999: Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 230–237. ACM, Berkeley (1999)

    CrossRef  Google Scholar 

  27. Kim, B., Rossi, P.: Purchase frequency, sample selection, and price sensitivity: The heavy-user bias. Marketing Letters, 57–67 (1994)

    Google Scholar 

  28. Kim, B., Sullivan, M.: The effect of parent brand experience on line extension trial and repeat purchase. Marketing Letters, 181–193 (1998)

    Google Scholar 

  29. Kolda, T.: Tensor decompositions and data mining. In: Tutorial at ICDM (2007)

    Google Scholar 

  30. Kolda, T.G., Sun, J.: Scalable tensor decompositions for multi-aspect data mining. In: ICDM, pp. 363–372 (2008)

    Google Scholar 

  31. Lim, Y., Teh, Y.: Variational bayesian approach to movie rating prediction. In: Proc. KDD Cup and Workshop (2007)

    Google Scholar 

  32. Lokmic, L., Smith, K.A.: Cash flow forecasting using supervised and unsupervised neural networks. IJCNN 06, 6343 (2000)

    Google Scholar 

  33. Lu, Z., Agarwal, D., Dhillon, I.: A spatio-temporal approach to collaborative filtering. In: RecSys 2009 (2009)

    Google Scholar 

  34. Madeira, S.C., Oliveira, A.L.: Biclustering algorithms for biological data analysis: A survey. IEEE/ACM Trans. Comput. Biology Bioinform. 1(1), 24–45 (2004)

    CrossRef  Google Scholar 

  35. Moe, W., Fader, P.: Modeling hedonic portfolio products: A joint segmentation analysis of music compact disc sales. Journal of Marketing Research, 376–385 (2001)

    Google Scholar 

  36. Munson, M.A., et al.: The ebird reference dataset. Tech. Report, Cornell Lab of Ornithology and National Audubon Society (June 2009)

    Google Scholar 

  37. Murray-Smith, R., Johansen, T.A.: Multiple Model Approaches to Modelling and Control. Taylor and Francis, UK (1997)

    Google Scholar 

  38. Nowicki, K., Snijders, T.A.B.: Estimation and prediction for stochastic blockstructures. Journal of the American Statistical Association 96(455), 1077–1087 (2001), http://www.ingentaconnect.com/content/asa/jasa/2001/00000096/00000455/art00025

    MathSciNet  MATH  CrossRef  Google Scholar 

  39. Oh, K., Han, I.: An intelligent clustering forecasting system based on change-point detection and artificial neural networks: Application to financial economics. In: HICSS-34, vol. 3, p. 3011 (2001)

    Google Scholar 

  40. Reutterer, T.: Competitive market structure and segmentation analysis with self-organizing feature maps. In: Proceedings of the 27th EMAC Conference, pp. 85–115 (1998)

    Google Scholar 

  41. Salakhutdinov, R., Mnih, A.: Probabilistic matrix factorization. In: NIPS 2007 (2007)

    Google Scholar 

  42. Salakhutdinov, R., Mnih, A.: Bayesian probabilistic matrix factorization using markov chain monte carlo. In: Proc. ICML 2008, pp. 880–887 (2008)

    Google Scholar 

  43. Sanderson, F.J., Kloch, A., Sachanowicz, K., Donald, P.F.: Predicting the effects of agricultural change on farmland bird populations in poland. Agriculture, Ecosystems & Environment 129(1-3), 37–42 (2009)

    CrossRef  Google Scholar 

  44. Seetharaman, P., Ainslie, A., Chintagunta, P.: Investigating household state dependence effects across categories. Journal of Marketing Research, 488–500 (1999)

    Google Scholar 

  45. Shan, H., Banerjee, A.: Residual bayesian co-clustering and matrix approximation. In: Proc. SDM 2010, pp. 223–234 (2010)

    Google Scholar 

  46. Shan, H., Banerjee, A.: Bayesian co-clustering. In: ICDM, pp. 530–539 (2008)

    Google Scholar 

  47. Sharma, A., Ghosh, J.: Side information aware bayesian affinity estimation. Technical Report TR-11, Department of ECE, UT Austin (2010)

    Google Scholar 

  48. Takcs, G., Pilszy, I., NÈmeth, B., Tikk, D.: Investigation of various matrix factorization methods for large recommender systems. In: 2nd KDD-Netflix Workshop (2008)

    Google Scholar 

  49. Vasilescu, M.A.O., Terzopoulos, D.: Multilinear Analysis of Image Ensembles: TensorFaces. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002, Part I. LNCS, vol. 2350, pp. 447–460. Springer, Heidelberg (2002)

    CrossRef  Google Scholar 

  50. Wainwright, M.J., Jordan, M.I.: Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning 1(1-2), 1–305 (2008)

    MATH  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and Permissions

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ghosh, J., Sharma, A. (2013). Actionable Mining of Large, Multi-relational Data Using Localized Predictive Models. In: Fred, A., Dietz, J.L.G., Liu, K., Filipe, J. (eds) Knowledge Discovery, Knowledge Engineering and Knowledge Management. IC3K 2010. Communications in Computer and Information Science, vol 272. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29764-9_1

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-29764-9_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-29763-2

  • Online ISBN: 978-3-642-29764-9

  • eBook Packages: Computer ScienceComputer Science (R0)