Encyclopedia of Big Data Technologies

Living Edition
Editors: Sherif Sakr, Albert Zomaya

Languages for Big Data analysis

  • Marco Aldinucci
  • Maurizio Drocco
  • Claudia Misale
  • Guy Tremblay
Living reference work entry
DOI: https://doi.org/10.1007/978-3-319-63962-8_142-1

Overview

Boosted by the popularity of Big Data, new languages and frameworks for data analytics are appearing at an increasing pace. Each introduces its own concepts and terminology and claims a (real or alleged) superiority over its predecessors in performance or expressiveness. Amid this hype, a user approaching Big Data analytics, even an educated computer scientist, may find it difficult to get a clear picture of the programming model underlying these tools and of the expressiveness they offer for solving a user-defined problem.

To bring some order to the world of Big Data processing, a toolkit of models is introduced to identify the features these tools have in common, starting from data layout.

Data processing applications are divided into batch and stream processing. Batch programs process one or more finite datasets to produce a finite output dataset, whereas stream programs process possibly unbounded sequences of data, called streams, doing so in an incremental...
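
The batch/stream distinction can be illustrated with a small sketch. The example below is not taken from the entry: the word-count task and the function names `batch_word_count` and `stream_word_count` are assumptions chosen for illustration. The batch version consumes a whole finite dataset and returns one result; the stream version updates its result incrementally and emits it after every incoming item, so it also works on an input that never ends.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List


def batch_word_count(dataset: List[str]) -> Dict[str, int]:
    """Batch (hypothetical example): read the entire finite dataset,
    then return a single finite result."""
    counts: Counter = Counter()
    for line in dataset:
        counts.update(line.split())
    return dict(counts)


def stream_word_count(stream: Iterable[str]) -> Iterator[Dict[str, int]]:
    """Stream (hypothetical example): update the result incrementally
    and emit a snapshot after each incoming item."""
    counts: Counter = Counter()
    for line in stream:  # the stream may be unbounded
        counts.update(line.split())
        yield dict(counts)  # copy of the running counts so far


if __name__ == "__main__":
    lines = ["a b a", "b c"]
    print(batch_word_count(lines))
    for snapshot in stream_word_count(lines):
        print(snapshot)
```

Because `stream_word_count` is a generator, a consumer can stop reading at any point, which is one simple way to model processing over a possibly unbounded sequence.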

Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  • Marco Aldinucci (1)
  • Maurizio Drocco (1)
  • Claudia Misale (2)
  • Guy Tremblay (3)

  1. Computer Science Department, University of Turin, Turin, Italy
  2. Cognitive and Cloud, Data-Centric Solutions, IBM T.J. Watson Research Center, New York, USA
  3. Département d’Informatique, Université du Québec à Montréal, Montréal, Canada

Section editors and affiliations

  • Domenico Talia
  • Paolo Trunfio (1)

  1. DIMES, University of Calabria, Rende, Italy