Big high-dimension data cube designs for hybrid memory systems

Silva, Rodrigo Rocha; Hirata, Celso Massaki; de Castro Lima, Joubert

doi:10.1007/s10115-020-01505-9

Regular Paper
Published: 26 August 2020

Knowledge and Information Systems, Volume 62, pages 4717–4746 (2020)

Rodrigo Rocha Silva¹ (ORCID: orcid.org/0000-0002-5741-6897),
Celso Massaki Hirata² &
Joubert de Castro Lima³

Abstract

In Big Data cubes with hundreds of dimensions and billions of tuples, indexing and querying are challenging because computing a full cube has exponential time and space complexity. Solutions based solely on RAM may therefore be impractical, and solutions based on hybrid memory (RAM and disk) become viable alternatives. In this paper, we propose a hybrid approach, named bCubing, to index and query high-dimension data cubes with a high number of tuples on a single machine, using both RAM and disk. We evaluated bCubing in terms of runtime and memory consumption, comparing it with the Frag-Cubing, HIC and H-Frag approaches. bCubing was faster and used less RAM than Frag-Cubing, HIC and H-Frag. It indexed and supported queries over a data cube with 1.2 billion tuples and 60 dimensions while consuming only 84 GB of RAM, which is 35% less memory than HIC. The complex holistic measures mode and median were computed in multidimensional queries, and bCubing was, on average, 50% faster than HIC.
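
To make the abstract's two technical points concrete, the following minimal sketch shows why a full cube grows exponentially (a relation with d dimensions has 2^d cuboids) and how a holistic measure such as median or mode can be answered from an inverted index of tuple IDs instead of from precomputed aggregates. It is an illustration under stated assumptions, not bCubing's actual design: the inverted-index idea is the general one used by Frag-Cubing-style approaches, the names `build_index` and `point_query` and the toy relation are hypothetical, and everything here lives in RAM, whereas the paper's contribution concerns splitting such structures between RAM and disk.

```python
from collections import defaultdict
from statistics import median, mode

# Toy relation (hypothetical data): each row is (dimension values, measure).
rows = [
    (("a1", "b1", "c1"), 10),
    (("a1", "b2", "c1"), 20),
    (("a1", "b1", "c2"), 10),
    (("a2", "b1", "c1"), 40),
    (("a1", "b1", "c1"), 30),
]

# A full cube over d dimensions has 2**d cuboids (one per subset of dimensions).
num_dimensions = len(rows[0][0])
num_cuboids = 2 ** num_dimensions


def build_index(rows):
    """Map each (dimension position, value) pair to the set of tuple IDs holding it."""
    index = defaultdict(set)
    for tid, (dims, _measure) in enumerate(rows):
        for pos, value in enumerate(dims):
            index[(pos, value)].add(tid)
    return index


def point_query(index, rows, constraints):
    """Return the measures of all tuples matching every {position: value} constraint."""
    tid_sets = [index.get((pos, value), set()) for pos, value in constraints.items()]
    if not tid_sets:
        matching = range(len(rows))  # no constraint: every tuple qualifies
    else:
        # Intersect the smallest sets first to keep intermediate results small.
        matching = set.intersection(*sorted(tid_sets, key=len))
    return [rows[tid][1] for tid in sorted(matching)]


index = build_index(rows)
measures = point_query(index, rows, {0: "a1", 1: "b1"})

# Holistic measures need the full list of matching values, not partial aggregates.
query_median = median(measures)
query_mode = mode(measures)
```

For the query A = a1 and B = b1, the matching tuples carry the measures 10, 10 and 30, so both the median and the mode are 10. Because median and mode cannot be assembled from partial sums or counts, the list of matching values has to be materialized at query time, which is one reason memory consumption matters so much at the scale the abstract reports.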

Acknowledgements

This work was partially supported by FAPESP under Grant No. 2012/04260-4. The second author was supported by CNPq under the Grant numbers CNPq Universal 01/2016 403921/2016-3 and CNPq PQ 306186/2018-7.

Author information

Authors and Affiliations

Faculdade de Tecnologia de São Paulo, Universidade de Coimbra, Rua Carlos Barattino, 908 Vila Nova Mogilar, 08773-600, Mogi das Cruzes, SP, Brazil
Rodrigo Rocha Silva
Instituto Tecnológico de Aeronáutica, São José dos Campos, SP, Brazil
Celso Massaki Hirata
Universidade Federal de Ouro Preto, Ouro Prêto, SP, Brazil
Joubert de Castro Lima

Corresponding author

Correspondence to Rodrigo Rocha Silva.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Silva, R.R., Hirata, C.M. & de Castro Lima, J. Big high-dimension data cube designs for hybrid memory systems. Knowl Inf Syst 62, 4717–4746 (2020). https://doi.org/10.1007/s10115-020-01505-9

Received: 11 November 2019
Revised: 04 August 2020
Accepted: 09 August 2020
Published: 26 August 2020
Issue Date: December 2020
DOI: https://doi.org/10.1007/s10115-020-01505-9
