A Metadata Diagnostic Framework for a New Approximate Query Engine Working with Granulated Data Summaries

  • Agnieszka Chądzyńska-Krasowska
  • Sebastian Stawicki
  • Dominik Ślęzak
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10313)


This paper concerns a new database engine that acquires and uses granulated data summaries for fast approximate execution of analytical SQL statements. We focus on creating a relational metadata repository that lets engine developers and users investigate the collected data summaries independently of the engine itself. We discuss how the design of this repository evolved over time from both conceptual and software engineering perspectives, addressing the challenges of converting and accessing internal engine contents that can represent hundreds of terabytes of original data. We show some usage scenarios of the resulting metadata repository for both diagnostic and analytical purposes. We pay particular attention to how these scenarios relate to the principles of rough sets, one of the theories that strongly influenced the presented solutions. We also report some empirical results obtained for relatively small fragments (\(100 \times 2^{16}\) rows each) of data sets coming from two organizations that use the considered engine.


Keywords: Big data, Approximate query, Data granulation, Metadata, Data visualization, Software tools, Business analytics



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Agnieszka Chądzyńska-Krasowska (1)
  • Sebastian Stawicki (2)
  • Dominik Ślęzak (2)

  1. Polish-Japanese Academy of Information Technology, Warsaw, Poland
  2. Institute of Informatics, University of Warsaw, Warsaw, Poland
