Intelligent Data Granulation on Load: Improving Infobright’s Knowledge Grid

Ślęzak, Dominik; Kowalski, Marcin

doi:10.1007/978-3-642-10509-8_3

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 5899)

Included in the following conference series:

International Conference on Future Generation Information Technology

1020 Accesses
7 Citations

Abstract

One of the major aspects of Infobright’s relational database technology is the automatic decomposition of each data table into Rough Rows, each consisting of 64K original rows. Rough Rows are automatically annotated by Knowledge Nodes, which represent compact information about the rows’ values. Query performance depends on the quality of the Knowledge Nodes, i.e., on how effectively, under the specific query optimization procedures, they minimize access to the compressed portions of data stored on disk. We show how to implement a mechanism that organizes incoming data into Rough Rows that maximize the quality of the corresponding Knowledge Nodes. Given clear business-driven requirements, the mechanism must be fully integrated with the data load process and cause no decrease in data load speed. The performance gain resulting from better data organization is illustrated by tests on our benchmark data. The differences between the proposed mechanism and well-known database clustering and partitioning procedures are discussed. The paper is a continuation of our patent application [22].
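The abstract does not spell out the granulation algorithm itself. The following Python is a minimal sketch of why row organization matters. It assumes Knowledge Nodes that keep per-Rough-Row min/max values, and a range query that classifies each Rough Row as relevant, irrelevant, or suspect, where only suspect Rough Rows need to be decompressed. It then compares cutting Rough Rows in arrival order with a simple bounded-buffer sort. The buffer sort is a swapped-in stand-in, not the mechanism proposed in the paper. It handles a single column only and does not address the no-slowdown requirement. All names (`ROUGH_ROW_SIZE`, `MinMaxNode`, `cut_rough_rows`, `granulate_with_buffer`, `count_suspects`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical parameter: real Rough Rows hold 64K (65,536) rows; a tiny size
# keeps the illustration readable.
ROUGH_ROW_SIZE = 4


@dataclass
class MinMaxNode:
    """Hypothetical min/max summary of one column within one Rough Row."""
    lo: int
    hi: int

    def classify(self, q_lo: int, q_hi: int) -> str:
        """Classify the Rough Row for the condition q_lo <= value <= q_hi."""
        if self.hi < q_lo or self.lo > q_hi:
            return "irrelevant"  # no row can match: skip without decompressing
        if q_lo <= self.lo and self.hi <= q_hi:
            return "relevant"    # every row matches: the summary suffices
        return "suspect"         # must decompress and scan this Rough Row


def cut_rough_rows(values: Sequence[int], size: int) -> List[List[int]]:
    """Split a stream of values into consecutive Rough Rows of `size` rows."""
    return [list(values[i:i + size]) for i in range(0, len(values), size)]


def granulate_with_buffer(values: Sequence[int], size: int,
                          buffer_rows: int) -> List[List[int]]:
    """Stand-in for load-time granulation (NOT the paper's algorithm):
    collect `buffer_rows` incoming values, sort them, then cut Rough Rows,
    so each Rough Row tends to cover a narrower range of values."""
    rough_rows: List[List[int]] = []
    for start in range(0, len(values), buffer_rows):
        chunk = sorted(values[start:start + buffer_rows])
        rough_rows.extend(cut_rough_rows(chunk, size))
    return rough_rows


def count_suspects(rough_rows: List[List[int]], q_lo: int, q_hi: int) -> int:
    """Count Rough Rows that would have to be decompressed for the query."""
    nodes = [MinMaxNode(min(r), max(r)) for r in rough_rows]
    return sum(n.classify(q_lo, q_hi) == "suspect" for n in nodes)


if __name__ == "__main__":
    stream = [5, 90, 12, 77, 3, 95, 40, 60, 8, 85, 15, 70, 1, 99, 33, 66]
    print(count_suspects(cut_rough_rows(stream, ROUGH_ROW_SIZE), 0, 20))
    print(count_suspects(granulate_with_buffer(stream, ROUGH_ROW_SIZE, 8), 0, 20))
    print(count_suspects(granulate_with_buffer(stream, ROUGH_ROW_SIZE, 16), 0, 20))
```

For the sample stream and the condition 0 ≤ v ≤ 20, arrival order leaves all 4 Rough Rows suspect. An 8-row buffer leaves 2 suspect and a 16-row buffer leaves 1, because the min/max ranges tighten as similar values are grouped. A real loader would have to move whole rows (all columns together) and keep up with the load rate, which a plain buffer sort does not guarantee.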

References

[1] Aggarwal, C.C. (ed.): Data Streams: Models and Algorithms. Springer, Heidelberg (2007)
[2] Bhattacharjee, B., Padmanabhan, S., Malkemus, T., Lai, T., Cranston, L., Huras, M.: Efficient query processing for multi-dimensionally clustered tables in DB2. In: Proc. of VLDB, pp. 963–974 (2003)
[3] Bruno, N., Nehme, R.V.: Configuration-parametric query optimization for physical design tuning. In: Proc. of SIGMOD, pp. 941–952 (2008)
[4] Cannataro, M., Talia, D.: The knowledge grid. Commun. ACM 46(1), 89–93 (2003)
[5] Charikar, M., Chekuri, C., Feder, T., Motwani, R.: Incremental clustering and dynamic information retrieval. SIAM J. Comput. 33(6), 1417–1440 (2004)
[6] Chaudhuri, S., Narasayya, V.R.: Self-tuning database systems: A decade of progress. In: Proc. of VLDB, pp. 3–14 (2007)
[7] Grondin, R., Fadeitchev, E., Zarouba, V.: Searchable archive. US Patent 7,243,110 (2007)
[8] Guha, S., Rastogi, R., Shim, K.: CURE: An efficient clustering algorithm for large databases. In: Proc. of SIGMOD, pp. 73–84 (1998)
[9] Hellerstein, J.M., Stonebraker, M., Hamilton, J.R.: Architecture of a database system. Foundations and Trends in Databases 1(2), 141–259 (2007)
[10] Infobright: http://www.infobright.com
[11] Ioannidis, Y.E.: The history of histograms (abridged). In: Proc. of VLDB, pp. 19–30 (2003)
[12] Jain, A.K., Murty, M.N., Flynn, P.J.: Data clustering: A review. ACM Comput. Surv. 31(3), 264–323 (1999)
[13] Kerdprasop, N., Kerdprasop, K.: Semantic knowledge integration to support inductive query optimization. In: Song, I.-Y., Eder, J., Nguyen, T.M. (eds.) DaWaK 2007. LNCS, vol. 4654, pp. 157–169. Springer, Heidelberg (2007)
[14] Kersten, M.L.: The database architecture jigsaw puzzle. In: Proc. of ICDE, pp. 3–4 (2008)
[15] Kloesgen, W., Żytkow, J.M. (eds.): Handbook of Data Mining and Knowledge Discovery. Oxford University Press, Oxford (2002)
[16] Metzger, J.K., Zane, B.M., Hinshaw, F.D.: Limiting scans of loosely ordered and/or grouped relations using nearly ordered maps. US Patent 6,973,452 (2005)
[17] MySQL manual: Storage engines, http://dev.mysql.com/doc/refman/6.0/en/storage-engines.html
[18] Pawlak, Z., Skowron, A.: Rudiments of rough sets. Inf. Sci. 177(1), 3–27 (2007)
[19] Pedrycz, W., Skowron, A., Kreinovich, V. (eds.): Handbook of Granular Computing. Wiley, Chichester (2008)
[20] Rasin, A., Zdonik, S., Trajman, O., Lawande, S.: Automatic vertical-database design. WO Patent Application, 2008/016877 A2 (2008)
[21] Ślęzak, D., Eastwood, V.: Data warehouse technology by Infobright. In: Proc. of SIGMOD, pp. 841–845 (2009)
[22] Ślęzak, D., Kowalski, M., Eastwood, V., Wróblewski, J.: Methods and systems for database organization. US Patent Application, 2009/0106210 A1 (2009)
[23] Ślęzak, D., Wróblewski, J., Eastwood, V., Synak, P.: Brighthouse: an analytic data warehouse for ad-hoc queries. PVLDB 1(2), 1337–1345 (2008)
[24] Stonebraker, M., Abadi, D., Batkin, A., Chen, X., Cherniack, M., Ferreira, M., Lau, E., Lin, A., Madden, S., O’Neil, E., O’Neil, P., Rasin, A., Tran, N., Zdonik, S.: C-Store: A column-oriented DBMS. In: Proc. of VLDB, pp. 553–564 (2005)
[25] Wojnarski, M., Apanowicz, C., Eastwood, V., Ślęzak, D., Synak, P., Wojna, A., Wróblewski, J.: Method and system for data compression in a relational database. US Patent Application, 2008/0071818 A1 (2008)
[26] Wróblewski, J., Apanowicz, C., Eastwood, V., Ślęzak, D., Synak, P., Wojna, A., Wojnarski, M.: Method and system for storing, organizing and processing data in a relational database. US Patent Application, 2008/0071748 A1 (2008)
[27] Zhang, T., Ramakrishnan, R., Livny, M.: BIRCH: An efficient data clustering method for very large databases. In: Proc. of SIGMOD, pp. 103–114 (1996)

Author information

Authors and Affiliations

Institute of Mathematics, University of Warsaw, Banacha 2, 02-097, Warsaw, Poland
Dominik Ślęzak
Infobright Inc., Poland, Krzywickiego 34, room 219, 02-078, Warsaw, Poland
Marcin Kowalski

Editor information

Editors and Affiliations

Sogang University, South Korea
Young-hoon Lee
Hannam University, Daejeon, South Korea
Tai-hoon Kim
National Chiao Tung University, Hsinchu, Taiwan
Wai-chi Fang
University of Warsaw & Infobright Inc., Poland
Dominik Ślęzak

About this paper

Cite this paper

Ślęzak, D., Kowalski, M. (2009). Intelligent Data Granulation on Load: Improving Infobright’s Knowledge Grid. In: Lee, Yh., Kim, Th., Fang, Wc., Ślęzak, D. (eds) Future Generation Information Technology. FGIT 2009. Lecture Notes in Computer Science, vol 5899. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10509-8_3

DOI: https://doi.org/10.1007/978-3-642-10509-8_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-10508-1
Online ISBN: 978-3-642-10509-8
eBook Packages: Computer Science, Computer Science (R0)