Intelligent Data Granulation on Load: Improving Infobright’s Knowledge Grid

  • Dominik Ślęzak
  • Marcin Kowalski
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5899)

Abstract

A key aspect of Infobright’s relational database technology is the automatic decomposition of each data table into Rough Rows, each consisting of 64K original rows. Rough Rows are automatically annotated with Knowledge Nodes, which hold compact information about the rows’ values. Query performance depends on the quality of Knowledge Nodes, i.e., on how effectively they minimize access to the compressed portions of data stored on disk under the specific query optimization procedures. We show how to implement a mechanism that organizes incoming data into Rough Rows so as to maximize the quality of the corresponding Knowledge Nodes. Given clear business-driven requirements, the mechanism must be fully integrated with the data load process and cause no decrease in data load speed. The performance gain from better data organization is illustrated by tests on our benchmark data. We also discuss how the proposed mechanism differs from well-known procedures for database clustering and partitioning. The paper is a continuation of our patent application [22].
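
To make the role of Knowledge Nodes concrete, the following is a minimal sketch and not the mechanism from the paper. It assumes that a Knowledge Node stores only the minimum and maximum of a single numeric column, and it uses a plain sort as a stand-in for the data organization step; the abstract itself states that the proposed mechanism differs from ordinary clustering or partitioning. The names `build_knowledge_nodes` and `rough_rows_to_access` and the toy Rough Row size of 4 are hypothetical and chosen for illustration only.

```python
from typing import List, Tuple

# Toy size for illustration; the abstract describes 64K rows per Rough Row.
ROUGH_ROW_SIZE = 4


def build_knowledge_nodes(values: List[int],
                          size: int = ROUGH_ROW_SIZE) -> List[Tuple[int, int]]:
    """Split values into consecutive Rough Rows and keep (min, max) for each.

    Assumption: a Knowledge Node is just a (min, max) pair for one column.
    """
    nodes = []
    for start in range(0, len(values), size):
        chunk = values[start:start + size]
        nodes.append((min(chunk), max(chunk)))
    return nodes


def rough_rows_to_access(nodes: List[Tuple[int, int]], lo: int, hi: int) -> int:
    """Count Rough Rows whose [min, max] interval overlaps the range [lo, hi].

    Rough Rows that do not overlap can be skipped without reading their data.
    """
    return sum(1 for mn, mx in nodes if mx >= lo and mn <= hi)


# Values of one column in arrival order.
incoming = [7, 1, 12, 3, 9, 2, 11, 5, 8, 0, 10, 4]

# Knowledge Nodes when rows are packed in arrival order.
as_loaded = build_knowledge_nodes(incoming)

# Knowledge Nodes after reorganizing rows (sorting is only a stand-in here).
reorganized = build_knowledge_nodes(sorted(incoming))

# A range condition such as "value BETWEEN 0 AND 3".
print(rough_rows_to_access(as_loaded, 0, 3))    # 3: every Rough Row must be read
print(rough_rows_to_access(reorganized, 0, 3))  # 1: two Rough Rows are skipped
```

In this toy case the same twelve values produce three overlapping intervals in arrival order and three disjoint intervals after reorganization, so the range condition touches one Rough Row instead of three. That is the sense in which better organization on load improves Knowledge Node quality.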

References

  1. Aggarwal, C.C. (ed.): Data Streams: Models and Algorithms. Springer, Heidelberg (2007)
  2. Bhattacharjee, B., Padmanabhan, S., Malkemus, T., Lai, T., Cranston, L., Huras, M.: Efficient query processing for multi-dimensionally clustered tables in DB2. In: Proc. of VLDB, pp. 963–974 (2003)
  3. Bruno, N., Nehme, R.V.: Configuration-parametric query optimization for physical design tuning. In: Proc. of SIGMOD, pp. 941–952 (2008)
  4. Cannataro, M., Talia, D.: The knowledge grid. Commun. ACM 46(1), 89–93 (2003)
  5. Charikar, M., Chekuri, C., Feder, T., Motwani, R.: Incremental clustering and dynamic information retrieval. SIAM J. Comput. 33(6), 1417–1440 (2004)
  6. Chaudhuri, S., Narasayya, V.R.: Self-tuning database systems: A decade of progress. In: Proc. of VLDB, pp. 3–14 (2007)
  7. Grondin, R., Fadeitchev, E., Zarouba, V.: Searchable archive. US Patent 7,243,110 (2007)
  8. Guha, S., Rastogi, R., Shim, K.: Cure: An efficient clustering algorithm for large databases. In: Proc. of SIGMOD, pp. 73–84 (1998)
  9. Hellerstein, J.M., Stonebraker, M., Hamilton, J.R.: Architecture of a database system. Foundations and Trends in Databases 1(2), 141–259 (2007)
  10.
  11. Ioannidis, Y.E.: The history of histograms (abridged). In: Proc. of VLDB, pp. 19–30 (2003)
  12. Jain, A.K., Murty, M.N., Flynn, P.J.: Data clustering: A review. ACM Comput. Surv. 31(3), 264–323 (1999)
  13. Kerdprasop, N., Kerdprasop, K.: Semantic knowledge integration to support inductive query optimization. In: Song, I.-Y., Eder, J., Nguyen, T.M. (eds.) DaWaK 2007. LNCS, vol. 4654, pp. 157–169. Springer, Heidelberg (2007)
  14. Kersten, M.L.: The database architecture jigsaw puzzle. In: Proc. of ICDE, pp. 3–4 (2008)
  15. Kloesgen, W., Żytkow, J.M. (eds.): Handbook of Data Mining and Knowledge Discovery. Oxford University Press, Oxford (2002)
  16. Metzger, J.K., Zane, B.M., Hinshaw, F.D.: Limiting scans of loosely ordered and/or grouped relations using nearly ordered maps. US Patent 6,973,452 (2005)
  17.
  18. Pawlak, Z., Skowron, A.: Rudiments of rough sets. Inf. Sci. 177(1), 3–27 (2007)
  19. Pedrycz, W., Skowron, A., Kreinovich, V. (eds.): Handbook of Granular Computing. Wiley, Chichester (2008)
  20. Rasin, A., Zdonik, S., Trajman, O., Lawande, S.: Automatic vertical-database design. WO Patent Application, 2008/016877 A2 (2008)
  21. Ślęzak, D., Eastwood, V.: Data warehouse technology by Infobright. In: Proc. of SIGMOD, pp. 841–845 (2009)
  22. Ślęzak, D., Kowalski, M., Eastwood, V., Wróblewski, J.: Methods and systems for database organization. US Patent Application, 2009/0106210 A1 (2009)
  23. Ślęzak, D., Wróblewski, J., Eastwood, V., Synak, P.: Brighthouse: an analytic data warehouse for ad-hoc queries. PVLDB 1(2), 1337–1345 (2008)
  24. Stonebraker, M., Abadi, D., Batkin, A., Chen, X., Cherniack, M., Ferreira, M., Lau, E., Lin, A., Madden, S., O’Neil, E., O’Neil, P., Rasin, A., Tran, N., Zdonik, S.: C-Store: A column-oriented DBMS. In: Proc. of VLDB, pp. 553–564 (2005)
  25. Wojnarski, M., Apanowicz, C., Eastwood, V., Ślęzak, D., Synak, P., Wojna, A., Wróblewski, J.: Method and system for data compression in a relational database. US Patent Application, 2008/0071818 A1 (2008)
  26. Wróblewski, J., Apanowicz, C., Eastwood, V., Ślęzak, D., Synak, P., Wojna, A., Wojnarski, M.: Method and system for storing, organizing and processing data in a relational database. US Patent Application, 2008/0071748 A1 (2008)
  27. Zhang, T., Ramakrishnan, R., Livny, M.: BIRCH: An efficient data clustering method for very large databases. In: Proc. of SIGMOD, pp. 103–114 (1996)

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Dominik Ślęzak (1)
  • Marcin Kowalski (2)
  1. Institute of Mathematics, University of Warsaw, Warsaw, Poland
  2. Infobright Inc., Poland, Warsaw, Poland