Machine Learning

, Volume 106, Issue 12, pp 1867–1904 | Cite as

Relational data factorization

  • Sergey Paramonov
  • Matthijs van Leeuwen
  • Luc De Raedt
Article
  • 99 Downloads

Abstract

Motivated by an analogy with matrix factorization, we introduce the problem of factorizing relational data. In matrix factorization, one is given a matrix and has to factorize it as a product of other matrices. In relational data factorization, the task is to factorize a given relation as a conjunctive query over other relations, i.e., as a combination of natural join operations. Given a conjunctive query and the input relation, the problem is to compute the extensions of the relations used in the query. Thus, relational data factorization is a relational analog of matrix factorization; it is also a form of inverse querying as one has to compute the relations in the query from the result of the query. The result of relational data factorization is neither necessarily unique nor required to be a lossless decomposition of the original relation. Therefore, constraints can be imposed on the desired factorization and a scoring function is used to determine its quality (often similarity to the original data). Relational data factorization is thus a constraint satisfaction and optimization problem. We show how answer set programming can be used for solving relational data factorization problems.

Keywords

Answer set programming Inductive logic programming Pattern mining Relational data Factorization Data mining Declarative modeling 

References

  1. Aftrati, F., Das, G., Gionis, A., Mannila, H., Mielikäinen, T., & Tsaparas, P. (2012). Mining chains of relations. In D. E. Holmes & L. C. Jain (Eds.), Data mining: foundations and intelligent paradigms, intelligent systems reference library (Vol. 24, pp. 217–246). Berlin, Heidelberg: Springer.CrossRefGoogle Scholar
  2. Arimura, H., Medina, R., & Petit, J.M. (Eds.). (2012). In: Proceedings of the IEEE ICDM Workshop on Declarative Pattern Mining.Google Scholar
  3. Aykanat, C., Pinar, A., & Catalyurek, Ü. V. (2002). Permuting sparse rectangular matrices into block-diagonal form. SIAM Journal on Scientific Computing, 25, 1860–1879.MathSciNetCrossRefMATHGoogle Scholar
  4. Bache, K., & Lichman, M. (2013). UCI machine learning repository. http://archive.ics.uci.edu/ml.
  5. Berzal, F., Cubero, J. C., Cuenca, F., & Medina, J. M. (2002). Relational decomposition through partial functional dependencies. Data and Knowledge Engineering, 43(2), 207–234.CrossRefMATHGoogle Scholar
  6. Biskup, J., Paredaens, J., Schwentick, T., & den Bussche, J. V. (2004). Solving equations in the relational algebra. SIAM Journal on Computing, 33(5), 1052–1066.MathSciNetCrossRefMATHGoogle Scholar
  7. Brewka, G., Eiter, T., & Truszczyński, M. (2011). Answer set programming at a glance. Communications of the ACM, 54(12), 92–103.CrossRefGoogle Scholar
  8. Chang, M. W., Ratinov, L. A., Rizzolo, N., & Roth, D. (2008). Learning and inference with constraints. Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence, AAAI, 2008, 1513–1518.Google Scholar
  9. Codd, E. F. (1970). A relational model of data for large shared data banks. Communications of the ACM, 13(6), 377–387.CrossRefMATHGoogle Scholar
  10. Date, C. J. (2006). Date on database: Writings 2000–2006. Berkely, CA, USA: Apress.Google Scholar
  11. De Raedt, L. (2008). Logical and relational learning. Berlin: Cognitive Technologies, Springer.CrossRefMATHGoogle Scholar
  12. De Raedt, L. (2012). Declarative modeling for machine learning and data mining. In: The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, pp 2–3.Google Scholar
  13. De Raedt, L. (2015). Languages for learning and mining. In: Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, January 25–30, 2015 (pp. 4107–4111). USA.: Austin, Texas.Google Scholar
  14. Denecker, M., & Kakas, A. (2002). Abduction in logic programming. In A. Kakas & F. Sadri (Eds.), Computational logic: Logic programming and beyond, lecture notes in computer science (Vol. 2407, pp. 402–436). Berlin, Heidelberg: Springer.CrossRefGoogle Scholar
  15. Eiter, T., Ianni, G., & Krennwallner, T. (2009). Answer set programming: A primer. In: 5th International Reasoning Web Summer School (RW 2009), Brixen/Bressanone, Italy, August 30 – September 4, 2009, Springer, LNCS, vol 5689.Google Scholar
  16. Elmasri, R., & Navathe, S. B. (2010). Fundamentals of database systems (6th ed.). Boston, MA, USA: Addison-Wesley Longman Publishing Co. Inc.MATHGoogle Scholar
  17. Fan, W., Geerts, F., & Zheng, L. (2012). View determinacy for preserving selected information in data transformations. Information Systems, 37(1), 1–12.CrossRefGoogle Scholar
  18. Feige, U. (1996). A threshold of ln n for approximating set cover. In: Proceedings of the Twenty-eighth Annual ACM Symposium on Theory of Computing, ACM, New York, NY, USA, STOC ’96, pp. 314–318.Google Scholar
  19. Flach, P. A., & Kakas, A. C. (2000). On the relation between abduction and inductive learning. In: D. M. Gabbay & R. Kruse (Eds.), Abductive reasoning and learning. Handbook of defeasible reasoning and uncertainty management systems (Vol. 4, pp. 1–33). Springer NetherlandsGoogle Scholar
  20. Gebser, M., Kaminski, R., Kaufmann, B., Schaub, T., Schneider, M., & Ziller, S. (2011a). A portfolio solver for answer set programming: Preliminary report. In: Delgrande, J., Faber, WT (Eds.) Proceedings of the Eleventh International Conference on Logic Programming and Nonmonotonic Reasoning (LPNMR’11), Springer-Verlag, Lecture Notes in Artificial Intelligence, vol 6645, pp 352–357Google Scholar
  21. Gebser, M., Kaufmann, B., Kaminski, R., Ostrowski, M., Schaub, T., & Schneider, M. (2011b). Potassco: The potsdam answer set solving collection. AI Communications, 24(2), 107–124.MathSciNetMATHGoogle Scholar
  22. Gebser, M., Kaminski, R., Kaufmann, B., & Schaub, T. (2012). Answer set solving in practice. Synthesis lectures on artificial intelligence and machine learning. San Rafael: Morgan and Claypool Publishers.Google Scholar
  23. Gebser, M., Kaufmann, B., Romero, J., Otero, R., Schaub, T., & Wanko, P. (2013). Domain-specific heuristics in answer set programming. In M. desJardins & M. L. Littman (Eds.), Association for the advancement of artificial intelligence. Palo Alto: AAAI Press.Google Scholar
  24. Geerts, F., Goethals, B., & Mielikäinen, T. (2004). Tiling databases. In: E. Suzuki & S. Arikawa (Eds.), Discovery science: 7th international conference, DS 2004, Springer Berlin Heidelberg pp. 278–289.Google Scholar
  25. Golub, G. H., & Van Loan, C. F. (1996). Matrix computations (3rd ed.). Baltimore, MD, USA: Johns Hopkins University Press.MATHGoogle Scholar
  26. Gopalan, P. K., & Blei, D. M. (2013). Efficient discovery of overlapping communities in massive networks. Proceedings of the National Academy of Sciences, 110(36), 14,534–14,539.MathSciNetCrossRefMATHGoogle Scholar
  27. Guns, T., Nijssen, S., & De Raedt, L. (2011). Itemset mining: A constraint programming perspective. Artificial Intelligence, 175(12–13), 1951–1983.MathSciNetCrossRefMATHGoogle Scholar
  28. Guns, T., Dries, A., Tack, G., Nijssen, S., & De Raedt, L. (2013a). Miningzinc: A modeling language for constraint-based mining. In: International Joint Conference on Artificial Intelligence, Beijing, ChinaGoogle Scholar
  29. Guns, T., Nijssen, S., & De Raedt, L. (2013b). k-pattern set mining under constraints. IEEE Transactions on Knowledge and Data Engineering, 25(2), 402–418.CrossRefGoogle Scholar
  30. Guns, T., Nijssen, S., & De Raedt, L. (2013c). k-pattern set mining under constraints. IEEE Transactions on Knowledge and Data Engineering, 25(2), 402–418.CrossRefGoogle Scholar
  31. Heath, I.J. (1971). Unacceptable file operations in a relational data base. In: Proceedings of the 1971 ACM SIGFIDET (Now SIGMOD) Workshop on Data Description, Access and Control, ACM, New York, NY, USA, SIGFIDET ’71, pp. 19–33.Google Scholar
  32. Hochbaum, D. S., & Pathria, A. (1998). Analysis of the greedy approach in problems of maximum k-coverage. Naval Research Logistics, 45, 615–627.MathSciNetCrossRefMATHGoogle Scholar
  33. Järvisalo, M. (2011). Itemset mining as a challenge application for answer set enumeration. In: Logic Programming and Non-Monotonic Reasoning, pp 304–310.Google Scholar
  34. Jones, T.H., Song, I.Y., & Park, E.K. (1996). Ternary relationship decomposition and higher normal form structures derived from entity relationship conceptual modeling. In: Proceedings of the 1996 ACM 24th Annual Conference on Computer Science, ACM, New York, NY, USA, CSC ’96, pp. 96–104.Google Scholar
  35. Kemp, C., Tenenbaum, J.B., Griffiths, T.L., Yamada, T., & Ueda, N. (2006). Learning systems of concepts with an infinite relational model. In: Proceedings of the 21th National Conference on Artificial Intelligence, AAAI Press, pp. 381–388.Google Scholar
  36. Kim, M., & Candan, K.S. (2011). Approximate tensor decomposition within a tensor-relational algebraic framework. In: Proceedings of the 20th ACM International Conference on Information and Knowledge Management, ACM, New York, NY, USA, CIKM ’11, pp. 1737–1742.Google Scholar
  37. Knobbe, A.J., & Ho, E.K.Y. (2006). Pattern teams. In: Fürnkranz J, Scheffer T, Spiliopoulou M (eds) Principles and practice of knowledge discovery in databases, Springer, Lecture Notes in Computer Science, vol 4213, pp. 577–584.Google Scholar
  38. Koehler, H. (2007). Domination normal form: Decomposing relational database schemas. In: Proceedings of the Thirtieth Australasian Conference on Computer Science - Volume 62, Australian Computer Society, Inc., Darlinghurst, Australia, Australia, ACSC ’07, pp. 79–85.Google Scholar
  39. Kok, S., & Domingos, P. (2007). Statistical predicate invention. In: Proceedings of The 24th International Conference on Machine Learning, pp. 433–440.Google Scholar
  40. Leone, N., Pfeifer, G., Faber, W., Eiter, T., Gottlob, G., Perri, S., et al. (2002). The dlv system for knowledge representation and reasoning. ACM Transactions on Computational Logic, 7, 499–562.MathSciNetCrossRefMATHGoogle Scholar
  41. Li, T. (2005). A general model for clustering binary data. ACM SIGKDD (pp. 188–197). New York, NY, USA: ACM.Google Scholar
  42. Lifschitz, V. (2008). What is answer set programming? Association for the Advancement of Artificial Intelligence, 8, 1594–1597.Google Scholar
  43. Liu, B., Hsu, W., & Ma, Y. (1998). Integrating classification and association rule mining. In: ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 80–86.Google Scholar
  44. Lu, H., Vaidya, J., & Atluri, V. (2008). Optimal boolean matrix decomposition: Application to role engineering. In: IEEE 24th ICDE, pp. 297–306.Google Scholar
  45. Métivier, J.P., Boizumault, P., Crémilleux, B., Khiari, M., & Loudni, S. (2012), A constraint language for declarative pattern discovery. In: Ossowski, S., Lecca, P. (eds) Proceedings of the ACM Symposium on Applied Computing, pp. 119–125.Google Scholar
  46. Miettinen, P. (2009). Matrix decomposition methods for data mining: computational complexity and algorithms. Department of Computer Science, series of publications A, report A-2009-4, University of Helsinki 2009 (Ph.D. thesis, monograph).Google Scholar
  47. Miettinen, P. (2012). Dynamic boolean matrix factorizations. In: Zaki, M.J., Siebes, A., Yu, J.X., Goethals, B., Webb, G.I., Wu, X. (eds). Proceedings of International Conference on Data Mining, IEEE Computer Society, pp. 519–528.Google Scholar
  48. Miettinen, P., Mielikäinen, T., Gionis, A., Das, G., & Mannila, H. (2008). The discrete basis problem. IEEE Transactions on Knowledge and Data Engineering, 20(10), 1348–1362.CrossRefGoogle Scholar
  49. Miyata, Y., Furuhashi, T., & Uchikawa, Y. (1995). A study on fuzzy abductive inference. In: Proceedings of 1995 IEEE International Conference on Fuzzy Systems, Citeseer, vol. 1, pp. 337–342.Google Scholar
  50. Muggleton, S., & De Raedt, L. (1994). Inductive logic programming: Theory and methods. The Journal of Logic Programming, 19(20), 629–679.MathSciNetCrossRefMATHGoogle Scholar
  51. Muggleton, S. H., Lin, D., & Tamaddoni-Nezhad, A. (2015). Meta-interpretive learning of higher-order dyadic datalog: Predicate invention revisited. Machine Learning, 100(1), 49–73.MathSciNetCrossRefMATHGoogle Scholar
  52. Osherson, D., Stern, J., Wilkie, O., Stob, M., & Smith, E. (1991). Default probability. Cognitive Science, 15(2), 251–269.CrossRefGoogle Scholar
  53. Paatero, P., & Tapper, U. (1994). Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values. Environmetrics, 5(2), 111–126.CrossRefGoogle Scholar
  54. Paramonov, S., van Leeuwen, M., Denecker, M., & De Raedt, L. (2015). An exercise in declarative modeling for relational query mining. In: International Conference on Inductive Logic Programming, ILP, Kyoto, 20–22 August 2015, SpringerGoogle Scholar
  55. Singh, A.P., & Gordon, G.J. (2008). Relational learning via collective matrix factorization. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, pp. 650–658.Google Scholar
  56. Van den Broeck, G., & Darwiche, A. (2013). On the complexity and approximation of binary evidence in lifted inference. In: The Neural Information Processing Systems, pp. 2868–2876.Google Scholar
  57. Vojtás, P. (1999). Fuzzy logic abduction. In: Proceedings of the EUSFLAT-ESTYLF Joint Conference, Palma de Mallorca, Spain, September 22–25, 1999, pp. 319–322.Google Scholar

Copyright information

© The Author(s) 2017

Authors and Affiliations

  1. 1.Machine Learning, Department of Computer ScienceKU LeuvenLeuvenBelgium
  2. 2.Leiden Institute of Advanced Computer ScienceLeiden UniversityLeidenThe Netherlands

Personalised recommendations