If we view data as a set of queries, each with its answer, what would a model be? In this paper we explore this question. The motivation is that more and more kinds of data have to be analysed, data of such a diverse nature that it is not easy to define precisely what data analysis actually is. Since all these different types of data share one characteristic (they can be queried), it seems natural to base a notion of data analysis on this characteristic.

The discussion in this paper is preliminary at best. No attempt is made to connect the basic ideas to other, well-known foundations of data analysis. Rather, the paper simply explores some consequences of its central tenet: data is a set of queries with their answers.
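To make the central tenet concrete, the following minimal sketch views a toy transaction database as nothing more than the set of all itemset queries paired with their answers (here, the number of transactions containing the itemset). The dataset, the choice of itemset-count queries, and the helper names `answer` and `as_query_answer_set` are illustrative assumptions, not the paper's own formalization.

```python
from itertools import combinations

# Hypothetical toy transaction database (an assumption for illustration only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def answer(query: frozenset, data: list) -> int:
    """Answer an itemset query: count transactions containing every queried item."""
    return sum(1 for t in data if query <= t)


def as_query_answer_set(data: list) -> dict:
    """View the data as the set of all (itemset query, answer) pairs."""
    items = sorted(set().union(*data))
    view = {}
    # Enumerate every subset of the items, including the empty query.
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            q = frozenset(combo)
            view[q] = answer(q, data)
    return view


view = as_query_answer_set(transactions)
print(view[frozenset({"bread", "milk"})])  # 2
```

Under this view, two datasets that give the same answer to every query are indistinguishable, which is one simple consequence of treating data purely as query/answer pairs.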







Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

Arno Siebes, Algorithmic Data Analysis Group, Universiteit Utrecht, The Netherlands
