Abstract
We consider the problem of representing multidimensional data where the domain of each dimension is organized hierarchically, and the queries require summary information at a different node in the hierarchy of each dimension. This is the typical case of OLAP databases. A basic approach is to represent each hierarchy as a one-dimensional line and recast the queries as multidimensional range queries. This approach can be implemented compactly by generalizing the \(k^2\)-treap, a compact representation of two-dimensional points that allows for efficient summarization queries along generic ranges, to more dimensions. We propose instead a more flexible generalization that, rather than using a generic quadtree-like partition of the space, follows the domain hierarchy of each dimension to organize the partitioning. The resulting structure is much more efficient than a generic multidimensional structure, since queries are resolved by aggregating far fewer nodes of the tree.
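The basic approach mentioned in the abstract, laying each hierarchy out on a line and turning a query into a range query, can be illustrated with a minimal Python sketch. It is not the compact structure of the paper: the nested-dict hierarchies, the dense `sales` matrix, and the helper names `leaf_ranges` and `range_sum` are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: a hierarchy is a nested dict, leaves map to {}.
Hierarchy = Dict[str, "Hierarchy"]


def leaf_ranges(tree: Hierarchy) -> Dict[str, Tuple[int, int]]:
    """Map every node to the inclusive range of leaf positions below it."""
    ranges: Dict[str, Tuple[int, int]] = {}
    next_pos = 0  # position of the next leaf on the one-dimensional line

    def visit(name: str, children: Hierarchy) -> None:
        nonlocal next_pos
        start = next_pos
        if not children:          # a leaf occupies exactly one position
            next_pos += 1
        for child, sub in children.items():
            visit(child, sub)
        ranges[name] = (start, next_pos - 1)

    for name, children in tree.items():
        visit(name, children)
    return ranges


def range_sum(matrix: List[List[int]],
              rows: Tuple[int, int], cols: Tuple[int, int]) -> int:
    """Naive two-dimensional range sum over inclusive row and column ranges."""
    (r0, r1), (c0, c1) = rows, cols
    return sum(matrix[r][c]
               for r in range(r0, r1 + 1)
               for c in range(c0, c1 + 1))


# Illustrative data only: rows are products, columns are places.
products = {"all_products": {"food": {"bread": {}, "milk": {}},
                             "drinks": {"tea": {}}}}
places = {"all_places": {"north": {"oslo": {}},
                         "south": {"rome": {}, "cairo": {}}}}
sales = [
    [1, 2, 3],  # bread
    [4, 5, 6],  # milk
    [7, 8, 9],  # tea
]

prod_r = leaf_ranges(products)
place_r = leaf_ranges(places)
# A query "food sold in the south" becomes one rectangle in the matrix.
total = range_sum(sales, prod_r["food"], place_r["south"])
```

Each hierarchy node owns a contiguous interval of leaves, so a query naming one node per dimension is a single hyperrectangle. A generic partition of the space has no reason to align with those intervals, which is the mismatch the hierarchy-driven partitioning is meant to avoid.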
Funded in part by Fondecyt 1-140796 (for Gonzalo Navarro); and, for the Spanish group, by MINECO (PGE and FEDER) [TIN2013-46238-C4-3-R]; CDTI, AGI, MINECO [IDI-20141259/ITC-20151305/ITC-20151247]; ICT COST Action IC1302; and by Xunta de Galicia (co-funded with FEDER) [GRC2013/053]. This article was prepared in the context of BIRDS, a European project that has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie GA No. 690941.
Notes
1. The implemented algorithm is recursive, and each sum is actually computed only once, when returning from the recursive calls.
2. We do not actually need to represent the nodes of the last level in \(T_{a}\). This structure is only used to identify a node whose children are then located in another bit array (\(T_{c}\)), and the last-level nodes are already matrix cells, with no children.
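The first note, on computing each sum once while returning from the recursion, can be sketched in Python as follows. This is only an illustration under stated assumptions: it uses a plain quadtree partition of a square matrix whose side is a power of two, not the hierarchy-driven partition of the paper, and the function name `build_sums` and the post-order output list are hypothetical.

```python
from typing import List


def build_sums(matrix: List[List[int]], row: int, col: int,
               size: int, out: List[int]) -> int:
    """Return the sum of the size x size submatrix whose top-left corner
    is (row, col), appending one aggregated sum per tree node to `out`
    in post-order."""
    if size == 1:
        total = matrix[row][col]          # a single matrix cell
    else:
        half = size // 2
        total = 0
        # The node's sum is assembled from the values returned by its
        # four children, so no submatrix is ever scanned a second time.
        for dr in (0, half):
            for dc in (0, half):
                total += build_sums(matrix, row + dr, col + dc, half, out)
    out.append(total)                     # recorded once, on the way back up
    return total


# Illustrative 4 x 4 matrix (assumed data).
cells = [
    [1, 0, 0, 2],
    [0, 0, 0, 0],
    [3, 0, 0, 0],
    [0, 0, 4, 5],
]
sums: List[int] = []
grand_total = build_sums(cells, 0, 0, 4, sums)
```

Because each call adds up the values returned by its children instead of rescanning its submatrix, the work is proportional to the number of tree nodes; the root's sum is the last entry appended to `sums`.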
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Brisaboa, N.R., Cerdeira-Pena, A., López-López, N., Navarro, G., Penabad, M.R., Silva-Coira, F. (2016). Efficient Representation of Multidimensional Data over Hierarchical Domains. In: Inenaga, S., Sadakane, K., Sakai, T. (eds) String Processing and Information Retrieval. SPIRE 2016. Lecture Notes in Computer Science, vol 9954. Springer, Cham. https://doi.org/10.1007/978-3-319-46049-9_19
DOI: https://doi.org/10.1007/978-3-319-46049-9_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46048-2
Online ISBN: 978-3-319-46049-9
eBook Packages: Computer Science, Computer Science (R0)