Intelligent Data Granulation on Load: Improving Infobright’s Knowledge Grid

  • Conference paper
Future Generation Information Technology (FGIT 2009)

Part of the book series: Lecture Notes in Computer Science (LNCCN, volume 5899)

Abstract

A major aspect of Infobright’s relational database technology is the automatic decomposition of each data table into Rough Rows, each consisting of 64K original rows. Rough Rows are automatically annotated with Knowledge Nodes, which hold compact information about the rows’ values. Query performance depends on the quality of the Knowledge Nodes, i.e., on how effectively they minimize access to the compressed portions of data stored on disk under the specific query optimization procedures. We show how to implement a mechanism that organizes the incoming data into Rough Rows that maximize the quality of the corresponding Knowledge Nodes. Given clear business-driven requirements, the mechanism must be fully integrated with the data load process and must not decrease the data load speed. The performance gain resulting from better data organization is illustrated by tests over our benchmark data. We also discuss the differences between the proposed mechanism and well-known procedures of database clustering and partitioning. The paper is a continuation of our patent application [22].
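
To make the role of Knowledge Nodes concrete, the following is a minimal sketch, not the mechanism described in the paper. It assumes the simplest conceivable Knowledge Node, a per-chunk (min, max) pair for one numeric column; the abstract does not specify the node types. The function names, the toy chunk size of 4 (the abstract states 64K), and the use of sorting as a stand-in for data organization on load are all assumptions made for illustration.

```python
from typing import List, Tuple

# Toy chunk size for readability; the abstract states 64K rows per Rough Row.
ROUGH_ROW_SIZE = 4


def build_knowledge_nodes(values: List[int],
                          size: int = ROUGH_ROW_SIZE) -> List[Tuple[int, int]]:
    """Split a column into consecutive chunks and keep (min, max) per chunk.

    Hypothetical helper: a (min, max) pair is an assumed, simplified
    Knowledge Node, not the structure used by Infobright.
    """
    nodes = []
    for start in range(0, len(values), size):
        chunk = values[start:start + size]
        nodes.append((min(chunk), max(chunk)))
    return nodes


def rough_rows_to_read(nodes: List[Tuple[int, int]],
                       low: int, high: int) -> List[int]:
    """Return indices of chunks whose [min, max] overlaps the range [low, high].

    Chunks that do not overlap cannot contain matching rows, so they
    would not need to be read from disk.
    """
    return [i for i, (lo, hi) in enumerate(nodes) if hi >= low and lo <= high]


# The same twelve values, first in arrival order, then reorganized.
arrival_order = [7, 1, 12, 3, 9, 2, 11, 5, 8, 4, 10, 6]
# Sorting is only a stand-in for "better organization"; it is an assumption
# of this sketch and not the granulation procedure proposed in the paper.
reorganized = sorted(arrival_order)

# Query: values between 1 and 4 inclusive.
unorganized_hits = rough_rows_to_read(build_knowledge_nodes(arrival_order), 1, 4)
organized_hits = rough_rows_to_read(build_knowledge_nodes(reorganized), 1, 4)

print("chunks read, arrival order:", unorganized_hits)
print("chunks read, reorganized:  ", organized_hits)
```

In this toy setting every chunk of the arrival-order data overlaps the query range, while only one chunk of the reorganized data does, which is the kind of reduction in data access that the abstract attributes to higher-quality Knowledge Nodes.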


References

  1. Aggarwal, C.C. (ed.): Data Streams: Models and Algorithms. Springer, Heidelberg (2007)

  2. Bhattacharjee, B., Padmanabhan, S., Malkemus, T., Lai, T., Cranston, L., Huras, M.: Efficient query processing for multi-dimensionally clustered tables in DB2. In: Proc. of VLDB, pp. 963–974 (2003)

  3. Bruno, N., Nehme, R.V.: Configuration-parametric query optimization for physical design tuning. In: Proc. of SIGMOD, pp. 941–952 (2008)

  4. Cannataro, M., Talia, D.: The knowledge grid. Commun. ACM 46(1), 89–93 (2003)

  5. Charikar, M., Chekuri, C., Feder, T., Motwani, R.: Incremental clustering and dynamic information retrieval. SIAM J. Comput. 33(6), 1417–1440 (2004)

  6. Chaudhuri, S., Narasayya, V.R.: Self-tuning database systems: A decade of progress. In: Proc. of VLDB, pp. 3–14 (2007)

  7. Grondin, R., Fadeitchev, E., Zarouba, V.: Searchable archive. US Patent 7,243,110 (2007)

  8. Guha, S., Rastogi, R., Shim, K.: Cure: An efficient clustering algorithm for large databases. In: Proc. of SIGMOD, pp. 73–84 (1998)

  9. Hellerstein, J.M., Stonebraker, M., Hamilton, J.R.: Architecture of a database system. Foundations and Trends in Databases 1(2), 141–259 (2007)

  10. Infobright: http://www.infobright.com

  11. Ioannidis, Y.E.: The history of histograms (abridged). In: Proc. of VLDB, pp. 19–30 (2003)

  12. Jain, A.K., Murty, M.N., Flynn, P.J.: Data clustering: A review. ACM Comput. Surv. 31(3), 264–323 (1999)

  13. Kerdprasop, N., Kerdprasop, K.: Semantic knowledge integration to support inductive query optimization. In: Song, I.-Y., Eder, J., Nguyen, T.M. (eds.) DaWaK 2007. LNCS, vol. 4654, pp. 157–169. Springer, Heidelberg (2007)

  14. Kersten, M.L.: The database architecture jigsaw puzzle. In: Proc. of ICDE, pp. 3–4 (2008)

  15. Kloesgen, W., Żytkow, J.M. (eds.): Handbook of Data Mining and Knowledge Discovery. Oxford University Press, Oxford (2002)

  16. Metzger, J.K., Zane, B.M., Hinshaw, F.D.: Limiting scans of loosely ordered and/or grouped relations using nearly ordered maps. US Patent 6,973,452 (2005)

  17. MySQL manual: Storage engines, http://dev.mysql.com/doc/refman/6.0/en/storage-engines.html

  18. Pawlak, Z., Skowron, A.: Rudiments of rough sets. Inf. Sci. 177(1), 3–27 (2007)

  19. Pedrycz, W., Skowron, A., Kreinovich, V. (eds.): Handbook of Granular Computing. Wiley, Chichester (2008)

  20. Rasin, A., Zdonik, S., Trajman, O., Lawande, S.: Automatic vertical-database design. WO Patent Application, 2008/016877 A2 (2008)

  21. Ślęzak, D., Eastwood, V.: Data warehouse technology by Infobright. In: Proc. of SIGMOD, pp. 841–845 (2009)

  22. Ślęzak, D., Kowalski, M., Eastwood, V., Wróblewski, J.: Methods and systems for database organization. US Patent Application, 2009/0106210 A1 (2009)

  23. Ślęzak, D., Wróblewski, J., Eastwood, V., Synak, P.: Brighthouse: an analytic data warehouse for ad-hoc queries. PVLDB 1(2), 1337–1345 (2008)

  24. Stonebraker, M., Abadi, D., Batkin, A., Chen, X., Cherniack, M., Ferreira, M., Lau, E., Lin, A., Madden, S., O’Neil, E., O’Neil, P., Rasin, A., Tran, N., Zdonik, S.: CStore: A column oriented DBMS. In: Proc. of VLDB, pp. 553–564 (2005)

  25. Wojnarski, M., Apanowicz, C., Eastwood, V., Ślęzak, D., Synak, P., Wojna, A., Wróblewski, J.: Method and system for data compression in a relational database. US Patent Application, 2008/0071818 A1 (2008)

  26. Wróblewski, J., Apanowicz, C., Eastwood, V., Ślęzak, D., Synak, P., Wojna, A., Wojnarski, M.: Method and system for storing, organizing and processing data in a relational database. US Patent Application, 2008/0071748 A1 (2008)

  27. Zhang, T., Ramakrishnan, R., Livny, M.: BIRCH: An efficient data clustering method for very large databases. In: Proc. of SIGMOD, pp. 103–114 (1996)

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ślęzak, D., Kowalski, M. (2009). Intelligent Data Granulation on Load: Improving Infobright’s Knowledge Grid. In: Lee, Yh., Kim, Th., Fang, Wc., Ślęzak, D. (eds) Future Generation Information Technology. FGIT 2009. Lecture Notes in Computer Science, vol 5899. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10509-8_3

  • DOI: https://doi.org/10.1007/978-3-642-10509-8_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-10508-1

  • Online ISBN: 978-3-642-10509-8

  • eBook Packages: Computer Science, Computer Science (R0)
