Abstract
In Big Data cubes with hundreds of dimensions and billions of tuples, indexing and querying are challenging because computing a full cube has exponential time and space complexity. Therefore, RAM-only solutions may be impractical, and solutions based on hybrid memory (RAM and disk) become viable alternatives. In this paper, we propose a hybrid approach, named bCubing, to index and query high-dimension data cubes with a large number of tuples on a single machine, using both RAM and disk. We evaluated bCubing in terms of runtime and memory consumption, comparing it with the Frag-Cubing, HIC and H-Frag approaches. bCubing was faster and used less RAM than Frag-Cubing, HIC and H-Frag. bCubing indexed a data cube with 1.2 billion tuples and 60 dimensions and answered queries over it while consuming only 84 GB of RAM, 35% less memory than HIC. When computing the complex holistic measures mode and median in multidimensional queries, bCubing was, on average, 50% faster than HIC.
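Two ideas from the abstract can be made concrete. First, a full cube over d dimensions (without hierarchies) has one cuboid per subset of dimensions, i.e. 2^d cuboids, so for 60 dimensions full materialization would mean about 1.15 × 10^18 cuboids. Second, holistic measures such as median and mode cannot be assembled from fixed-size partial aggregates, so a query engine has to gather the measure values of every matching tuple. The following is a minimal, RAM-only Python sketch of on-the-fly querying with per-dimension inverted indices (tuple-id lists), in the spirit of the Frag-Cubing approach named above. It is not bCubing's actual design and ignores the disk side of a hybrid-memory system; the names `InvertedIndexCube`, `query` and `full_cube_cuboids` are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import median


def full_cube_cuboids(num_dims):
    # A full cube without hierarchies has one cuboid per subset of dimensions.
    return 2 ** num_dims


class InvertedIndexCube:
    """Hypothetical sketch: one inverted index (value -> ascending tid list) per dimension."""

    def __init__(self, tuples, measures):
        # tuples: list of dimension-value tuples; measures: one numeric measure per tuple
        self.num_dims = len(tuples[0]) if tuples else 0
        self.measures = list(measures)
        # index[d][value] lists, in ascending order, the tuple ids (tids) having that value on dimension d
        self.index = [defaultdict(list) for _ in range(self.num_dims)]
        for tid, row in enumerate(tuples):
            for d, value in enumerate(row):
                self.index[d][value].append(tid)

    def _matching_tids(self, cell):
        # A cell uses "*" for an aggregated (unconstrained) dimension.
        result = None
        for d, value in enumerate(cell):
            if value == "*":
                continue
            tids = set(self.index[d].get(value, ()))
            result = tids if result is None else result & tids
            if not result:
                return set()  # early exit: intersection is already empty
        # If every dimension is aggregated, all tuples match.
        return set(range(len(self.measures))) if result is None else result

    def query(self, cell):
        tids = sorted(self._matching_tids(cell))
        values = [self.measures[t] for t in tids]
        if not values:
            return {"count": 0, "median": None, "mode": None}
        # Holistic measures need the full multiset of matching values.
        # Counter.most_common breaks ties by first occurrence.
        mode = Counter(values).most_common(1)[0][0]
        return {"count": len(values), "median": median(values), "mode": mode}


if __name__ == "__main__":
    rows = [("a1", "b1", "c1"), ("a1", "b2", "c1"), ("a2", "b1", "c2"), ("a1", "b1", "c2")]
    sales = [10, 20, 30, 10]
    cube = InvertedIndexCube(rows, sales)
    print(cube.query(("a1", "*", "*")))   # {'count': 3, 'median': 10, 'mode': 10}
    print(cube.query(("a1", "*", "c1")))  # {'count': 2, 'median': 15.0, 'mode': 10}
    print(full_cube_cuboids(60))          # 1152921504606846976
```

The design choice here is to intersect tid lists at query time instead of precomputing cuboids, which trades query work for avoiding the exponential 2^d storage; a hybrid-memory system would additionally decide which index fragments stay in RAM and which are read from disk.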
Acknowledgements
This work was partially supported by FAPESP under Grant No. 2012/04260-4. The second author was supported by CNPq under grants CNPq Universal 01/2016 403921/2016-3 and CNPq PQ 306186/2018-7.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Silva, R.R., Hirata, C.M. & de Castro Lima, J. Big high-dimension data cube designs for hybrid memory systems. Knowl Inf Syst 62, 4717–4746 (2020). https://doi.org/10.1007/s10115-020-01505-9
DOI: https://doi.org/10.1007/s10115-020-01505-9