Memorizing Transactional Databases Compressively in Deep Neural Networks for Efficient Itemset Support Queries

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10635)

Abstract

Can a deep neural network memorize a database? Although deep artificial neural networks are known for a large memory capacity that makes it possible to fit any dataset, memorizing a database is a novel learning task, unlike other popular tasks that intrinsically model mappings rather than “memorize” information internally. We answer the question positively by showing that, through training with maximal/minimal and frequent/infrequent patterns of a transactional database, a dynamically constructed deep net can answer random itemset support queries with relatively high precision with respect to the data compression ratio. Because the memorization is compressive, the query time cost of our efficient method does not depend on the number of transactions in the database. We further discuss a potential interpretation of the learnt database representation by analyzing the corresponding statistical features of the database and the activation patterns of the neural network.
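To make the query task concrete, here is a minimal Python sketch of the exact computation that a memorizing network would approximate. It answers an itemset support query by scanning every transaction, so its cost grows linearly with the number of transactions. It also extracts one plausible reading of the "maximal/minimal and frequent/infrequent patterns": maximal frequent itemsets and minimal infrequent itemsets. Several things here are assumptions, not details from the paper: the relative-support definition, the `min_sup` threshold, and all function names are hypothetical. This is not the authors' training pipeline, and it does not model the neural network itself.

```python
from itertools import combinations
from typing import FrozenSet, List, Set

# Hypothetical helpers for illustration only; not the paper's implementation.
Itemset = FrozenSet[str]


def support(db: List[Set[str]], itemset: Itemset) -> float:
    """Exact relative support: fraction of transactions containing every item.

    This is a full scan, so its cost grows linearly with len(db).
    """
    if not db:
        return 0.0
    return sum(1 for t in db if itemset <= t) / len(db)


def frequent_itemsets(db: List[Set[str]], min_sup: float) -> Set[Itemset]:
    """All itemsets with support >= min_sup, found level by level (Apriori-style)."""
    items = set().union(*db)
    frequent: Set[Itemset] = set()
    candidates = {frozenset([i]) for i in items}
    k = 1
    while candidates:
        level = {c for c in candidates if support(db, c) >= min_sup}
        frequent |= level
        k += 1
        # Join frequent (k-1)-itemsets into k-itemsets, then prune any candidate
        # that has an infrequent (k-1)-subset (support is anti-monotone).
        candidates = {
            a | b
            for a in level
            for b in level
            if len(a | b) == k
            and all(frozenset(s) in level for s in combinations(a | b, k - 1))
        }
    return frequent


def maximal_frequent(frequent: Set[Itemset]) -> Set[Itemset]:
    """Frequent itemsets that have no frequent proper superset."""
    return {f for f in frequent if not any(f < g for g in frequent)}


def minimal_infrequent(frequent: Set[Itemset], items: Set[str]) -> Set[Itemset]:
    """Infrequent itemsets whose immediate subsets are all frequent.

    The empty itemset is treated as frequent (its support is 1).
    """
    base = frequent | {frozenset()}
    border: Set[Itemset] = set()
    for f in base:
        for i in items:
            x = f | {i}
            if i in f or x in frequent:
                continue
            if all(x - {j} in base for j in x):
                border.add(x)
    return border


# Usage on a toy database of four transactions.
db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b"}]
freq = frequent_itemsets(db, min_sup=0.5)
print(support(db, frozenset({"a", "b"})))  # 0.5
print(sorted(sorted(s) for s in maximal_frequent(freq)))  # [['a', 'b'], ['a', 'c']]
print(sorted(sorted(s) for s in minimal_infrequent(freq, set().union(*db))))  # [['b', 'c']]
```

Together, the maximal frequent and minimal infrequent itemsets form the two borders of the frequent itemsets. This is why a compact set of such patterns can carry much of the information needed to answer arbitrary support queries. The brute-force `support` scan is the per-query cost that a compressive, memorizing model avoids.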

Keywords

Transactional database; Artificial neural network; Approximation query; Pattern mining; Data compression

Notes

Acknowledgments

This work was supported by JST CREST Grant Number JPMJCR1304 and JSPS KAKENHI Grant Numbers JP16H01836 and JP16K12428.


Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. The University of Tokyo, Tokyo, Japan
