  • Reference work
  • © 2019

Encyclopedia of Big Data Technologies

  • Presents 300+ entries covering key concepts and terms in the broad field of big data technologies

  • Updates and informs through in-depth essays and definitions, historical background, key applications, and bibliographies

  • Supports quick and efficient discovery of information through extensive cross-references

  • Opens the field to those inquiring into this fast-growing area of research

  • Includes supplementary material

Buying options

eBook USD 849.99
Price excludes VAT (USA)
  • ISBN: 978-3-319-77525-8
  • Instant PDF download
  • Readable on all devices
  • Own it forever
  • Exclusive offer for individuals only
  • Tax calculation will be finalised during checkout
Hardcover Book USD 999.99
Price excludes VAT (USA)

Table of contents (632 entries)

  1. Front Matter

    Pages i-xlvi
  2. A

    1. Achieving Low Latency Transactions for Geo-replicated Storage with Blotter

      • Henrique Moniz, João Leitão, Ricardo J. Dias, Johannes Gehrke, Nuno Preguiça, Rodrigo Rodrigues
      Pages 1-10
    2. Active Disk

      Pages 10-10
    3. Active Storage

      • Ilia Petrov, Tobias Vinçon, Andreas Koch, Julian Oppermann, Sergey Hardock, Christian Riegger
      Pages 11-18
    4. Ad Hoc Benchmark

      Pages 18-18
    5. Adaptive Partitioning

      Pages 18-18
    6. Adaptive Windowing

      • Ricard Gavaldà
      Pages 18-23
    7. Advancements in YARN Resource Manager

      • Konstantinos Karanasos, Arun Suresh, Chris Douglas
      Pages 23-32
    8. ADWIN Algorithm

      Pages 32-32
    9. Alignment Creation

      Pages 32-32
    10. Analytics Benchmarks

      • Todor Ivanov, Roberto V. Zicari
      Pages 32-41
    11. Apache Apex

      • Ananth Gundabattula, Thomas Weise
      Pages 41-51
    12. Apache Flink

      • Fabian Hueske, Timo Walther
      Pages 51-58
    13. Apache Hadoop

      Pages 58-58
    14. Apache Kafka

      • Matthias J. Sax
      Pages 58-66
    15. Apache Mahout

      • Andrew Musselman
      Pages 66-70
    16. Apache Samza

      • Martin Kleppmann
      Pages 70-77
    17. Apache Spark

      • Alexandre da Silva Veith, Marcos Dias de Assuncao
      Pages 77-81

About this book

The Encyclopedia of Big Data Technologies gives researchers, educators, students, and industry professionals a comprehensive, authoritative reference on the most relevant big data technology concepts. With over 300 articles written by subject matter experts worldwide from both industry and academia, the encyclopedia covers topics such as big data storage systems, NoSQL databases, cloud computing, distributed systems, data processing, data management, machine learning and social technologies, and data science. Each peer-reviewed, highly structured entry provides the reader with basic terminology, subject overviews, key research results, application examples, future directions, cross-references, and a bibliography. The entries are expository and tutorial, making this reference a practical resource for students, academics, and professionals. In addition, the encyclopedia's distinguished international editorial board consists of well-respected scholars, each developing topics based on their expertise.




  • Big Data
  • Data Science
  • Data Analytics
  • NoSQL
  • Big SQL

Editors and Affiliations

  • Institute of Computer Science, University of Tartu, Tartu, Estonia

    Sherif Sakr

  • School of Information Technologies, Sydney University, Sydney, Australia

    Albert Y. Zomaya

About the editors

Editorial Board:

Sherif Sakr (Editor-in-Chief), Institute of Computer Science, University of Tartu, Tartu, Estonia  

Albert Y. Zomaya (Editor-in-Chief), School of Information Technologies, Sydney University, Sydney, Australia


Pramod Bhatotia, School of Informatics, University of Edinburgh, Edinburgh, UK

Rodrigo N. Calheiros, School of Computing, Engineering and Mathematics, Western Sydney University, Penrith, NSW, Australia

Aamir Cheema, Monash University, Australia

Jinjun Chen, School of Software and Electrical Engineering, Swinburne University of Technology, Hawthorn, VIC, Australia

Philippe Cudré-Mauroux, eXascale Infolab, University of Fribourg, Fribourg, Switzerland

Marcos Dias de Assuncao, Inria, LIP, ENS Lyon, Lyon, France

Marlon Dumas, Institute of Computer Science, University of Tartu, Tartu, Estonia

Paolo Ferragina, Department of Computer Science, University of Pisa, Pisa, Italy

George Fletcher, Technische Universiteit Eindhoven, Eindhoven, Netherlands

Olaf Hartig, Linköping University, Linköping, Sweden

Bingsheng He, National University of Singapore, Singapore

Asterios Katsifodimos, TU Delft, Delft, Netherlands

Alessandro Margara, Politecnico di Milano, Milano, Italy

Kamran Munir, Computer Science and Creative Technologies, University of the West of England, Bristol, UK

Behrooz Parhami, Department of Electrical and Computer Engineering, University of California, Santa Barbara, CA, USA

Antonio Pescapè, Department of Electrical Engineering and Information Technology, University of Napoli Federico II, Napoli, Italy

Meikel Poess, Server Technologies, Oracle, Redwood Shores, California, United States

Deepak Puthal, Faculty of Engineering and Information Technologies, School of Electrical and Data Engineering, University of Technology Sydney, Ultimo, NSW, Australia

Tilmann Rabl, Technische Universität Berlin, Database Systems and Information Management Group, Berlin, Germany

Mohammad Sadoghi, University of California, Davis, CA, USA

Timos Sellis, Swinburne University of Technology, Data Science Research Institute, Hawthorn, Victoria, Australia

Domenico Talia, University of Calabria, Italy

Maik Thiele, Database Systems Group, Technische Universität Dresden, Dresden, Saxony, Germany

Yuanyuan Tian, IBM Almaden Research Center, San Jose, CA, United States

Paolo Trunfio, University of Calabria, DIMES, Rende, Italy

Hannes Voigt, Dresden Database Systems Group, Technische Universität Dresden, Dresden, Germany

Matthias Weidlich, Humboldt-Universität zu Berlin, Department of Computer Science, Berlin, Germany

Fatma Özcan, IBM Research – Almaden, San Jose, CA, USA


Sherif Sakr is the Head of the Data Systems Group at the Institute of Computer Science, University of Tartu. He received his PhD degree in Computer and Information Science from Konstanz University, Germany, in 2007. He received his BSc and MSc degrees in Computer Science from the Information Systems department at the Faculty of Computers and Information in Cairo University, Egypt, in 2000 and 2003, respectively. During his career, Prof. Sakr has held appointments in several reputable international organizations, including University of New South Wales, Macquarie University, Data61/CSIRO, Microsoft Research, Nokia Bell Labs and King Saud bin Abdulaziz University for Health Sciences.

Prof. Sakr's research interest is data and information management in general, particularly big data processing systems, big data analytics, data science and big data management in cloud computing platforms. Prof. Sakr has published more than 100 refereed research publications in international journals and conferences such as: Proceedings of the VLDB Endowment (PVLDB), IEEE Transactions on Parallel and Distributed Systems (IEEE TPDS), IEEE Transactions on Services Computing (IEEE TSC), IEEE Transactions on Big Data (IEEE TBD), ACM Computing Surveys (ACM CSUR), Journal of Computer and System Sciences (JCSS), Information Systems, Cluster Computing, Grid Computing, IEEE Communications Surveys and Tutorials (IEEE COMST), IEEE Software, Scientometrics, VLDB, SIGMOD, ICDE, EDBT, WWW, CIKM, ISWC, BPM, ER, ICWS, ICSOC, IEEE SCC, IEEE Cloud, TPCTC, DASFAA, ICPE and JCDL. Prof. Sakr has co-authored 5 books and co-edited 3 other books in the areas of data and information management and processing. He is an associate editor of the Cluster Computing journal and Transactions on Large-Scale Data and Knowledge-Centered Systems (TLDKS). He is also an editorial board member of many reputable international journals. Prof. Sakr is an ACM Senior Member and an IEEE Senior Member. In 2017, he was appointed to serve as an ACM Distinguished Speaker and as an IEEE Distinguished Speaker.

Albert Y. Zomaya is currently the Chair Professor of High Performance Computing & Networking in the School of Information Technologies, University of Sydney. He is also the Director of the Centre for Distributed and High Performance Computing, which was established in late 2009. Dr. Zomaya was an Australian Research Council Professorial Fellow during 2010–2014, was the CISCO Systems Chair Professor of Internetworking during 2002–2007, and was also Head of School for 2006–2007 in the same school.

Prior to his current appointment, he was a Full Professor in the Electrical and Electronic Engineering Department at the University of Western Australia, where he also led the Parallel Computing Research Laboratory during the period 1990–2002. He served as Associate, Deputy, and Acting Head in the same department. He has held numerous visiting positions and has extensive industry involvement.

Dr. Zomaya has published more than 600 scientific papers and articles and is author, co-author or editor of more than 20 books. He served as the Editor-in-Chief of the IEEE Transactions on Computers (2011–2014). Currently, he serves as a Founding Editor-in-Chief of the IEEE Transactions on Sustainable Computing, a Co-Founding Editor-in-Chief of IET Cyber-Physical Systems: Theory and Applications, and Associate Editor-in-Chief (Special Issues) of the Journal of Parallel and Distributed Computing.

Dr. Zomaya is an Associate Editor for several leading journals, such as ACM Transactions on Internet Technology, ACM Computing Surveys, IEEE Transactions on Cloud Computing, IEEE Transactions on Computational Social Systems, and IEEE Transactions on Big Data. He is also the Founding Editor of several book series, such as the Wiley Book Series on Parallel and Distributed Computing, the Springer Scalable Computing and Communications series, and the IET Book Series on Big Data.

Dr. Zomaya was the Chair of the IEEE Technical Committee on Parallel Processing (1999–2003) and currently serves on its executive committee. He is the Vice Chair of the IEEE Task Force on Computational Intelligence for Cloud Computing and serves on the advisory board of the IEEE Technical Committee on Scalable Computing and the steering committee of the IEEE Technical Area in Green Computing.

Dr. Zomaya has delivered more than 180 keynote addresses, invited seminars, and media briefings and has been actively involved, in a variety of capacities, in the organization of more than 700 conferences.

Dr. Zomaya is a Fellow of the IEEE, the American Association for the Advancement of Science, and the Institution of Engineering and Technology (UK). He is a Chartered Engineer and an IEEE Computer Society Golden Core member. He received the 1997 Edgeworth David Medal from the Royal Society of New South Wales for outstanding contributions to Australian Science. Dr. Zomaya is the recipient of the IEEE Technical Committee on Parallel Processing Outstanding Service Award (2011), the IEEE Technical Committee on Scalable Computing Medal for Excellence in Scalable Computing (2011), the IEEE Computer Society Technical Achievement Award (2014), and the ACM MSWIM Reginald A. Fessenden Award (2017). His research interests span several areas in parallel and distributed computing and complex systems.
