Abstract
One of the major aspects of Infobright’s relational database technology is the automatic decomposition of each data table into Rough Rows, each consisting of 64K original rows. Rough Rows are automatically annotated with Knowledge Nodes that hold compact information about the rows’ values. Query performance depends on the quality of Knowledge Nodes, i.e., on how effectively they minimize access to the compressed portions of data stored on disk under the specific query optimization procedures. We show how to implement a mechanism that organizes the incoming data into Rough Rows that maximize the quality of the corresponding Knowledge Nodes. Given clear business-driven requirements, the implemented mechanism needs to be fully integrated with the data load process and must not decrease the data load speed. The performance gain resulting from better data organization is illustrated by tests over our benchmark data. We also discuss the differences between the proposed mechanism and some well-known procedures of database clustering or partitioning. The paper is a continuation of our patent application [22].
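To illustrate why the organization of rows affects Knowledge Node quality, the following is a minimal sketch under stated assumptions, not the mechanism proposed in the paper. It assumes the simplest possible Knowledge Node, a (min, max) pair per Rough Row over a single numeric column, and uses a toy Rough Row size of 4 instead of 64K. The names `build_min_max_nodes` and `candidate_rough_rows` are hypothetical, and sorting is used only as a stand-in for "better organized" data.

```python
from typing import List, Tuple

ROUGH_ROW_SIZE = 4  # toy value for illustration; the abstract states 64K rows


def build_min_max_nodes(values: List[int],
                        size: int = ROUGH_ROW_SIZE) -> List[Tuple[int, int]]:
    """Split values into consecutive Rough Rows and keep (min, max) for each.

    Assumption: a (min, max) pair stands in for a Knowledge Node here.
    """
    nodes = []
    for start in range(0, len(values), size):
        chunk = values[start:start + size]
        nodes.append((min(chunk), max(chunk)))
    return nodes


def candidate_rough_rows(nodes: List[Tuple[int, int]], lo: int, hi: int) -> int:
    """Count Rough Rows that cannot be excluded for the range query [lo, hi].

    A Rough Row is excluded only if its [min, max] interval does not
    intersect the query range at all.
    """
    return sum(1 for node_min, node_max in nodes
               if node_min <= hi and node_max >= lo)


# The same eight values, loaded in two different orders.
interleaved = [1, 9, 2, 10, 3, 11, 4, 12]
organized = sorted(interleaved)  # stand-in for better-organized load order

interleaved_hits = candidate_rough_rows(build_min_max_nodes(interleaved), 1, 4)
organized_hits = candidate_rough_rows(build_min_max_nodes(organized), 1, 4)
print(interleaved_hits, organized_hits)
```

In this toy case the interleaved order leaves both Rough Rows as candidates for the range 1 to 4, while the organized order lets one of them be excluded from its (min, max) pair alone, without touching the underlying data.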
References
Aggarwal, C.C. (ed.): Data Streams: Models and Algorithms. Springer, Heidelberg (2007)
Bhattacharjee, B., Padmanabhan, S., Malkemus, T., Lai, T., Cranston, L., Huras, M.: Efficient query processing for multi-dimensionally clustered tables in DB2. In: Proc. of VLDB, pp. 963–974 (2003)
Bruno, N., Nehme, R.V.: Configuration-parametric query optimization for physical design tuning. In: Proc. of SIGMOD, pp. 941–952 (2008)
Cannataro, M., Talia, D.: The knowledge grid. Commun. ACM 46(1), 89–93 (2003)
Charikar, M., Chekuri, C., Feder, T., Motwani, R.: Incremental clustering and dynamic information retrieval. SIAM J. Comput. 33(6), 1417–1440 (2004)
Chaudhuri, S., Narasayya, V.R.: Self-tuning database systems: A decade of progress. In: Proc. of VLDB, pp. 3–14 (2007)
Grondin, R., Fadeitchev, E., Zarouba, V.: Searchable archive. US Patent 7,243,110 (2007)
Guha, S., Rastogi, R., Shim, K.: Cure: An efficient clustering algorithm for large databases. In: Proc. of SIGMOD, pp. 73–84 (1998)
Hellerstein, J.M., Stonebraker, M., Hamilton, J.R.: Architecture of a database system. Foundations and Trends in Databases 1(2), 141–259 (2007)
Infobright: http://www.infobright.com
Ioannidis, Y.E.: The history of histograms (abridged). In: Proc. of VLDB, pp. 19–30 (2003)
Jain, A.K., Murty, M.N., Flynn, P.J.: Data clustering: A review. ACM Comput. Surv. 31(3), 264–323 (1999)
Kerdprasop, N., Kerdprasop, K.: Semantic knowledge integration to support inductive query optimization. In: Song, I.-Y., Eder, J., Nguyen, T.M. (eds.) DaWaK 2007. LNCS, vol. 4654, pp. 157–169. Springer, Heidelberg (2007)
Kersten, M.L.: The database architecture jigsaw puzzle. In: Proc. of ICDE, pp. 3–4 (2008)
Kloesgen, W., Żytkow, J.M. (eds.): Handbook of Data Mining and Knowledge Discovery. Oxford University Press, Oxford (2002)
Metzger, J.K., Zane, B.M., Hinshaw, F.D.: Limiting scans of loosely ordered and/or grouped relations using nearly ordered maps. US Patent 6,973,452 (2005)
MySQL manual: Storage engines, http://dev.mysql.com/doc/refman/6.0/en/storage-engines.html
Pawlak, Z., Skowron, A.: Rudiments of rough sets. Inf. Sci. 177(1), 3–27 (2007)
Pedrycz, W., Skowron, A., Kreinovich, V. (eds.): Handbook of Granular Computing. Wiley, Chichester (2008)
Rasin, A., Zdonik, S., Trajman, O., Lawande, S.: Automatic vertical-database design. WO Patent Application, 2008/016877 A2 (2008)
Ślęzak, D., Eastwood, V.: Data warehouse technology by Infobright. In: Proc. of SIGMOD, pp. 841–845 (2009)
Ślęzak, D., Kowalski, M., Eastwood, V., Wróblewski, J.: Methods and systems for database organization. US Patent Application, 2009/0106210 A1 (2009)
Ślęzak, D., Wróblewski, J., Eastwood, V., Synak, P.: Brighthouse: an analytic data warehouse for ad-hoc queries. PVLDB 1(2), 1337–1345 (2008)
Stonebraker, M., Abadi, D., Batkin, A., Chen, X., Cherniack, M., Ferreira, M., Lau, E., Lin, A., Madden, S., O’Neil, E., O’Neil, P., Rasin, A., Tran, N., Zdonik, S.: C-Store: A column-oriented DBMS. In: Proc. of VLDB, pp. 553–564 (2005)
Wojnarski, M., Apanowicz, C., Eastwood, V., Ślęzak, D., Synak, P., Wojna, A., Wróblewski, J.: Method and system for data compression in a relational database. US Patent Application, 2008/0071818 A1 (2008)
Wróblewski, J., Apanowicz, C., Eastwood, V., Ślęzak, D., Synak, P., Wojna, A., Wojnarski, M.: Method and system for storing, organizing and processing data in a relational database. US Patent Application, 2008/0071748 A1 (2008)
Zhang, T., Ramakrishnan, R., Livny, M.: BIRCH: An efficient data clustering method for very large databases. In: Proc. of SIGMOD, pp. 103–114 (1996)
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ślęzak, D., Kowalski, M. (2009). Intelligent Data Granulation on Load: Improving Infobright’s Knowledge Grid. In: Lee, Yh., Kim, Th., Fang, Wc., Ślęzak, D. (eds) Future Generation Information Technology. FGIT 2009. Lecture Notes in Computer Science, vol 5899. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10509-8_3
DOI: https://doi.org/10.1007/978-3-642-10509-8_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-10508-1
Online ISBN: 978-3-642-10509-8
eBook Packages: Computer Science (R0)