## Abstract

Motivated by an analogy with matrix factorization, we introduce the problem of factorizing relational data. In matrix factorization, one is given a matrix and has to factorize it as a product of other matrices. In relational data factorization, the task is to factorize a given relation as a conjunctive query over other relations, i.e., as a combination of natural join operations. Given a conjunctive query and the input relation, the problem is to compute the *extensions* of the relations used in the query. Thus, relational data factorization is a relational analog of matrix factorization; it is also a form of *inverse* querying as one has to compute the relations in the query from the result of the query. The result of relational data factorization is neither necessarily unique nor required to be a lossless decomposition of the original relation. Therefore, constraints can be imposed on the desired factorization and a scoring function is used to determine its quality (often similarity to the original data). Relational data factorization is thus a constraint satisfaction and optimization problem. We show how answer set programming can be used for solving relational data factorization problems.
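The problem statement above can be illustrated with a minimal sketch. Everything specific in it is an assumption for illustration only: the query shape `Q(x, y) :- left(x, z), right(z, y)`, the toy relations, and the scoring function (the number of tuples on which the reconstruction and the input relation disagree). It shows only how a candidate factorization is evaluated and scored, not how it is searched for, and it is not the answer set programming encoding the paper refers to.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]


def evaluate_query(left: Set[Pair], right: Set[Pair]) -> Set[Pair]:
    """Evaluate Q(x, y) :- left(x, z), right(z, y).

    This is a natural join on the shared variable z, followed by
    projecting z away (hypothetical query shape, chosen for illustration).
    """
    return {(x, y) for (x, z1) in left for (z2, y) in right if z1 == z2}


def score(target: Set[Pair], reconstruction: Set[Pair]) -> int:
    """Count tuples on which the two relations disagree (lower is better).

    Assumed scoring function: size of the symmetric difference.
    """
    return len(target ^ reconstruction)


# Toy input relation (hypothetical data): who buys what.
target = {
    ("ann", "milk"), ("ann", "bread"),
    ("bob", "milk"), ("bob", "bread"),
    ("eve", "beer"),
}

# One candidate factorization: extensions of the two relations in the query.
left = {("ann", "g1"), ("bob", "g1")}
right = {("g1", "milk"), ("g1", "bread")}

reconstruction = evaluate_query(left, right)
error = score(target, reconstruction)
print(sorted(reconstruction), error)
```

In this toy case the candidate covers the four tuples for `ann` and `bob` but misses `("eve", "beer")`, so the factorization is not lossless and the score is what ranks it against other candidates.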

## Keywords

Answer set programming · Inductive logic programming · Pattern mining · Relational data · Factorization · Data mining · Declarative modeling

## Notes

### Acknowledgements

We would like to thank Marc Denecker, Tias Guns, Benjamin Negrevergne, Siegfried Nijssen, and Behrouz Babaki for their help and assistance. This work was funded by the ICON project (FP7-ICT-2011-C) and FWO.
