If we view data as a set of queries, each with an answer, what would a model be? This paper explores that question. The motivation is that more and more kinds of data have to be analysed, data of such a diverse nature that it is not easy to define precisely what data analysis actually is. Since all these different types of data share one characteristic, namely that they can be queried, it seems natural to base a notion of data analysis on this characteristic.

The discussion in this paper is preliminary at best. No attempt is made to connect the basic ideas to other, well-known foundations of data analysis. Rather, the paper explores some simple consequences of its central tenet: data is a set of queries together with their answers.





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Arno Siebes (Algorithmic Data Analysis Group, Universiteit Utrecht, The Netherlands)
