Data and Systems Heterogeneity: Analysis on Data, Processing, Workload, and Infrastructure

Stan, Roxana-Gabriela; Negru, Catalin; Bajenaru, Lidia; Pop, Florin

doi:10.1007/978-3-030-38836-2_4

Roxana-Gabriela Stan⁹,
Catalin Negru⁹,
Lidia Bajenaru¹⁰ &
Florin Pop⁹,¹⁰

Part of the book series: Computer Communications and Networks (CCN)

Abstract

This paper surveys the requirements for handling large volumes of heterogeneous data and presents an overview of the computing techniques and technologies used to analyze and process such datasets. To show how data heterogeneity meets systems heterogeneity, we summarize the key issues identified along multiple dimensions, including data, processing, workload, and infrastructure.

Author information

Authors and Affiliations

University Politehnica of Bucharest, Bucharest, Romania
Roxana-Gabriela Stan, Catalin Negru & Florin Pop
National Institute for Research and Development in Informatics (ICI), Bucharest, Romania
Lidia Bajenaru & Florin Pop

Authors

Roxana-Gabriela Stan
Catalin Negru
Lidia Bajenaru
Florin Pop

Corresponding author

Correspondence to Catalin Negru.

Editor information

Editors and Affiliations

University Politehnica of Bucharest, Bucharest, Romania
Florin Pop
National Institute for Research and Development in Informatics, Bucharest, Romania
Gabriel Neagu

About this chapter

Cite this chapter

Stan, RG., Negru, C., Bajenaru, L., Pop, F. (2021). Data and Systems Heterogeneity: Analysis on Data, Processing, Workload, and Infrastructure. In: Pop, F., Neagu, G. (eds) Big Data Platforms and Applications. Computer Communications and Networks. Springer, Cham. https://doi.org/10.1007/978-3-030-38836-2_4

DOI: https://doi.org/10.1007/978-3-030-38836-2_4
Published: 29 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-38835-5
Online ISBN: 978-3-030-38836-2
eBook Packages: Computer Science, Computer Science (R0)
