Compilation and Synthesis in Big Data Analytics
Databases and compilers are two long-established and quite distinct areas of computer science. With the advent of the big data revolution, the two areas are moving closer together, to the point that they overlap and merge. Researchers in programming languages and compiler construction want to take part in this revolution, and must also respond to programmers' need for suitable tools for developing data-driven software for data-intensive tasks and analytics. Database researchers cannot ignore the fact that most big data analytics is performed in systems such as Hadoop, which run code written in general-purpose programming languages rather than query languages. To remain relevant, each community has to move closer to the other. In the first part of this keynote, I illustrate this trend and describe a number of interesting and inspiring research efforts underway in the two communities, as well as open research challenges. In the second part, I present several research projects in this space from my group at EPFL, including work on the static and just-in-time compilation of analytics programs and database systems, and on the automatic synthesis of out-of-core algorithms that exploit the memory hierarchy efficiently.