Data and Systems Heterogeneity: Analysis on Data, Processing, Workload, and Infrastructure

  • Chapter
Big Data Platforms and Applications

Part of the book series: Computer Communications and Networks (CCN)

Abstract

This chapter surveys the requirements for handling large volumes of heterogeneous data and gives an overview of the computing techniques and technologies used to analyze and process such datasets. To show how data heterogeneity meets systems heterogeneity, we summarize the key issues identified along four dimensions: data, processing, workload, and infrastructure.

Author information

Corresponding author

Correspondence to Catalin Negru.

Copyright information

© 2021 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Stan, RG., Negru, C., Bajenaru, L., Pop, F. (2021). Data and Systems Heterogeneity: Analysis on Data, Processing, Workload, and Infrastructure. In: Pop, F., Neagu, G. (eds) Big Data Platforms and Applications. Computer Communications and Networks. Springer, Cham. https://doi.org/10.1007/978-3-030-38836-2_4

  • DOI: https://doi.org/10.1007/978-3-030-38836-2_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-38835-5

  • Online ISBN: 978-3-030-38836-2

  • eBook Packages: Computer Science, Computer Science (R0)
