Abstract
In recent years, considerable research effort has been devoted to approximate query answering techniques for OLAP. These techniques compress the data cube in order to obtain approximate answers to OLAP queries whose approximation error is tolerable in real-life Business Intelligence scenarios. In this chapter, we introduce a novel approximate OLAP query answering technique based on an analytical interpretation of multidimensional data cubes and on the well-known Least Squares Approximation (LSA) method, which is used to build the analytical synopsis data structure Δ-Syn. The benefits of adopting Δ-Syn within the core layer of modern OLAP server platforms are confirmed by a comprehensive experimental evaluation on synthetic, benchmark and real-life data cubes, which shows that Δ-Syn outperforms state-of-the-art approximate query answering techniques such as histograms, wavelets and random sampling.
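The general idea of answering range-SUM queries from a least squares fit rather than from the raw cells can be illustrated with a minimal sketch. It is not the Δ-Syn structure itself, whose construction is not described in this abstract. It assumes a one-dimensional array of measures, fixed-size buckets, and a straight-line fit per bucket. The names `fit_line`, `build_synopsis` and `approx_range_sum` are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Line = Tuple[float, float]  # (intercept a, slope b) of y = a + b*x


def fit_line(values: List[float]) -> Line:
    """Closed-form least squares fit of y = a + b*x over x = 0..n-1."""
    n = len(values)
    if n == 1:
        # A single point is reproduced exactly by a constant.
        return values[0], 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def build_synopsis(data: List[float], bucket_size: int) -> List[Line]:
    """Replace each bucket of raw cells with two fitted coefficients."""
    return [
        fit_line(data[start:start + bucket_size])
        for start in range(0, len(data), bucket_size)
    ]


def approx_range_sum(synopsis: List[Line], bucket_size: int,
                     lo: int, hi: int) -> float:
    """Estimate sum(data[lo..hi]) (inclusive) from the fitted lines only."""
    total = 0.0
    for i in range(lo, hi + 1):
        a, b = synopsis[i // bucket_size]
        total += a + b * (i % bucket_size)
    return total


if __name__ == "__main__":
    # Illustrative values only, not data from the chapter.
    data = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
    synopsis = build_synopsis(data, bucket_size=4)
    exact = sum(data[2:7])
    estimate = approx_range_sum(synopsis, 4, 2, 6)
    print(f"exact={exact}, estimate={estimate:.3f}")
```

In this sketch each bucket of cells is stored as two numbers, so the space saving grows with the bucket size, while the error depends on how far the data deviate from a line within each bucket. Because a least squares fit with an intercept has residuals that sum to zero, queries aligned with bucket boundaries are answered exactly (up to floating-point rounding); only partially covered buckets contribute error.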
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Cuzzocrea, A. (2010). LSA-Based Compression of Data Cubes for Efficient Approximate Range-SUM Query Answering in OLAP. In: Ras, Z.W., Tsay, LS. (eds) Advances in Intelligent Information Systems. Studies in Computational Intelligence, vol 265. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-05183-8_5
DOI: https://doi.org/10.1007/978-3-642-05183-8_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-05182-1
Online ISBN: 978-3-642-05183-8
eBook Packages: Engineering, Engineering (R0)