Definition
Apache SystemML (Ghoting et al. 2011; Boehm et al. 2016) is a system for declarative, large-scale machine learning (ML) that aims to increase the productivity of data scientists. ML algorithms are expressed in a high-level language with R- or Python-like syntax, and the system automatically generates efficient, hybrid execution plans of single-node CPU or GPU operations, as well as distributed operations using data-parallel frameworks such as MapReduce (Dean and Ghemawat 2004) or Spark (Zaharia et al. 2012). SystemML’s high-level abstraction provides the necessary flexibility to specify custom ML algorithms while ensuring physical data independence, independence of the underlying runtime operations and technology stack, and scalability for large data. Separating the concerns of algorithm semantics and execution plan generation is essential for the automatic optimization of execution plans regarding different data and cluster characteristics, without the need for algorithm...
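The following is a minimal, highly simplified Python sketch of separating algorithm semantics from execution plan generation. It mimics the kind of decision an optimizer makes when generating hybrid plans. The sketch estimates the memory footprint of a matrix multiplication from data characteristics (dimensions and sparsity). If that estimate fits a memory budget, it picks single-node in-memory execution; otherwise it picks a distributed backend. Everything in the sketch is an illustrative assumption and not SystemML's actual API or cost model: the names (`MatrixStats`, `estimate_bytes`, `matmul_output_stats`, `choose_exec_type`), the labels `"CP"` and `"SPARK"`, the per-entry byte costs, the sparsity threshold, and the worst-case dense output.

```python
from dataclasses import dataclass


# Hypothetical, simplified description of a matrix's size characteristics.
@dataclass(frozen=True)
class MatrixStats:
    rows: int
    cols: int
    sparsity: float  # fraction of non-zero cells, in [0, 1]

    def nnz(self) -> int:
        return int(round(self.rows * self.cols * self.sparsity))


def estimate_bytes(m: MatrixStats, sparse_threshold: float = 0.4) -> int:
    """Rough in-memory size in bytes (illustrative assumptions only).

    Dense: 8 bytes per cell (one double). Sparse, assumed CSR-like:
    12 bytes per non-zero (8-byte value + 4-byte column index) plus
    4-byte row pointers. The smaller format wins below the threshold.
    """
    dense = m.rows * m.cols * 8
    if m.sparsity < sparse_threshold:
        sparse = m.nnz() * 12 + (m.rows + 1) * 4
        return min(dense, sparse)
    return dense


def matmul_output_stats(a: MatrixStats, b: MatrixStats) -> MatrixStats:
    if a.cols != b.rows:
        raise ValueError("incompatible dimensions for matrix multiplication")
    # Conservative (worst-case) assumption: the product is fully dense.
    return MatrixStats(a.rows, b.cols, 1.0)


def choose_exec_type(a: MatrixStats, b: MatrixStats, budget_bytes: int) -> str:
    """Pick single-node execution if inputs plus output fit the budget,
    otherwise fall back to a distributed backend."""
    out = matmul_output_stats(a, b)
    total = estimate_bytes(a) + estimate_bytes(b) + estimate_bytes(out)
    return "CP" if total <= budget_bytes else "SPARK"


if __name__ == "__main__":
    w = MatrixStats(1_000, 1, 1.0)
    small_x = MatrixStats(10_000, 1_000, 1.0)       # about 80 MB dense
    large_x = MatrixStats(100_000_000, 1_000, 1.0)  # about 800 GB dense
    budget = 2**30                                   # 1 GiB, an arbitrary choice
    print(choose_exec_type(small_x, w, budget))
    print(choose_exec_type(large_x, w, budget))
```

The same script-level operation, such as `X %*% w`, thus maps to a single-node or a distributed plan depending only on data and cluster characteristics, and the algorithm stays unchanged. SystemML's optimizer (Boehm et al. 2014a) makes decisions of this kind with a considerably more detailed model.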
References
Abadi M et al (2016) TensorFlow: a system for large-scale machine learning. In: OSDI
Ashari A, Tatikonda S, Boehm M, Reinwald B, Campbell K, Keenleyside J, Sadayappan P (2015) On optimizing machine learning workloads via kernel fusion. In: PPoPP
Boehm M, Burdick DR, Evfimievski AV, Reinwald B, Reiss FR, Sen P, Tatikonda S, Tian Y (2014a) SystemML’s optimizer: plan generation for large-scale machine learning programs. IEEE Data Eng Bull 37(3):52–62
Boehm M, Tatikonda S, Reinwald B, Sen P, Tian Y, Burdick D, Vaithyanathan S (2014b) Hybrid parallelization strategies for large-scale machine learning in SystemML. PVLDB 7(7):553–564
Boehm M, Dusenberry M, Eriksson D, Evfimievski AV, Manshadi FM, Pansare N, Reinwald B, Reiss F, Sen P, Surve A, Tatikonda S (2016) SystemML: declarative machine learning on Spark. PVLDB 9(13):1425–1436
Dean J, Ghemawat S (2004) MapReduce: simplified data processing on large clusters. In: OSDI
Elgamal T, Luo S, Boehm M, Evfimievski AV, Tatikonda S, Reinwald B, Sen P (2017) SPOOF: sum-product optimization and operator fusion for large-scale machine learning. In: CIDR
Elgohary A, Boehm M, Haas PJ, Reiss FR, Reinwald B (2016) Compressed linear algebra for large-scale machine learning. PVLDB 9(12):960–971
Ghoting A, Krishnamurthy R, Pednault EPD, Reinwald B, Sindhwani V, Tatikonda S, Tian Y, Vaithyanathan S (2011) SystemML: declarative machine learning on MapReduce. In: ICDE
Huang B, Boehm M, Tian Y, Reinwald B, Tatikonda S, Reiss FR (2015) Resource elasticity for large-scale machine learning. In: SIGMOD
Kumar A, Boehm M, Yang J (2017) Data management in machine learning: challenges, techniques, and systems. In: SIGMOD
Low Y, Gonzalez J, Kyrola A, Bickson D, Guestrin C, Hellerstein JM (2012) Distributed GraphLab: a framework for machine learning in the cloud. PVLDB 5(8)
Tian Y, Tatikonda S, Reinwald B (2012) Scalable and numerically stable descriptive statistics in SystemML. In: ICDE
Zaharia M, Chowdhury M, Das T, Dave A, Ma J, McCauley M, Franklin MJ, Shenker S, Stoica I (2012) Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In: NSDI
Copyright information
© 2018 Springer International Publishing AG
About this entry
Cite this entry
Boehm, M. (2018). Apache SystemML. In: Sakr, S., Zomaya, A. (eds) Encyclopedia of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-63962-8_187-1
DOI: https://doi.org/10.1007/978-3-319-63962-8_187-1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63962-8
Online ISBN: 978-3-319-63962-8
eBook Packages: Springer Reference Mathematics, Reference Module Computer Science and Engineering