Big high-dimension data cube designs for hybrid memory systems

  • Regular Paper
Knowledge and Information Systems

Abstract

In Big Data cubes with hundreds of dimensions and billions of tuples, indexing and querying are challenging because computing a full cube has exponential time and space complexity in the number of dimensions. Solutions that rely on RAM alone may therefore be impractical, and solutions based on hybrid memory (RAM and disk) become viable alternatives. In this paper, we propose a hybrid approach, named bCubing, to index and query high-dimension data cubes with a high number of tuples on a single machine using both RAM and disk. We evaluated bCubing in terms of runtime and memory consumption, comparing it with the Frag-Cubing, HIC and H-Frag approaches. bCubing was faster and used less RAM than all three. It indexed and supported queries over a data cube with 1.2 billion tuples and 60 dimensions while consuming only 84 GB of RAM, 35% less memory than HIC. The complex holistic measures mode and median were computed in multidimensional queries, for which bCubing was, on average, 50% faster than HIC.
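To make the terms in the abstract concrete, the following is a minimal sketch of a generic inverted-index approach to cube queries with holistic measures. It is not bCubing's design, which the abstract does not describe; the function names (`build_index`, `point_query`, `cuboid_count`) and the toy data are hypothetical and chosen only for illustration. The sketch also shows why a full cube is exponential: a relation with d dimensions has 2^d cuboids.

```python
from collections import defaultdict
from statistics import median, multimode


def cuboid_count(num_dimensions):
    """Number of group-by combinations (cuboids) in a full cube."""
    return 2 ** num_dimensions


def build_index(tuples):
    """Map each (dimension position, value) pair to the IDs of tuples holding it."""
    index = defaultdict(set)
    for tid, (dims, _measure) in enumerate(tuples):
        for position, value in enumerate(dims):
            index[(position, value)].add(tid)
    return index


def point_query(index, tuples, constraints):
    """Return the measures of tuples matching every {position: value} constraint."""
    if not constraints:
        return [measure for _dims, measure in tuples]
    # Intersect the tuple-ID sets of all constrained dimensions.
    id_sets = [index.get((pos, val), set()) for pos, val in constraints.items()]
    matching = set.intersection(*id_sets)
    return [tuples[tid][1] for tid in sorted(matching)]


# Toy relation (assumed data): three dimensions A, B, C and one numeric measure.
relation = [
    (("a1", "b1", "c1"), 10),
    (("a1", "b2", "c1"), 20),
    (("a1", "b1", "c2"), 20),
    (("a2", "b1", "c1"), 40),
    (("a1", "b1", "c1"), 30),
]
index = build_index(relation)

# Holistic measures need the matching values themselves, not partial aggregates.
measures_a1 = point_query(index, relation, {0: "a1"})
median_a1 = median(measures_a1)
mode_a1 = multimode(measures_a1)
measures_a1_b1 = point_query(index, relation, {0: "a1", 1: "b1"})
```

Median and mode are holistic because they cannot be derived from fixed-size partial results of sub-groups, so the query must gather the matching measure values before aggregating. In this sketch everything sits in RAM; a hybrid design would keep part of such an index on disk, and how to split it is exactly the design question the paper addresses.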

Acknowledgements

This work was partially supported by FAPESP under Grant No. 2012/04260-4. The second author was supported by CNPq under Grants CNPq Universal 01/2016 403921/2016-3 and CNPq PQ 306186/2018-7.

Author information

Corresponding author

Correspondence to Rodrigo Rocha Silva.

About this article

Cite this article

Silva, R.R., Hirata, C.M. & de Castro Lima, J. Big high-dimension data cube designs for hybrid memory systems. Knowl Inf Syst 62, 4717–4746 (2020). https://doi.org/10.1007/s10115-020-01505-9

  • DOI: https://doi.org/10.1007/s10115-020-01505-9