Integrating Big Data and Relational Data with a Functional SQL-like Query Language

  • Carlyna Bondiombouy
  • Boyan Kolev
  • Oleksandra Levchenko
  • Patrick Valduriez
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9261)


Multistore systems have recently been proposed to provide integrated access to multiple, heterogeneous data stores through a single query engine. In particular, much attention is being paid to the integration of unstructured big data, typically stored in HDFS, with relational data. One main solution is to use a relational query engine that allows SQL-like queries to retrieve data from HDFS. This requires the system to provide a relational view of the unstructured data, which is not always feasible. In this paper, we introduce a functional SQL-like query language that can integrate data retrieved from different data stores and take full advantage of the functionality of the underlying data processing frameworks by allowing the ad hoc use of user-defined map/filter/reduce operators in combination with traditional SQL statements. Furthermore, the query language allows for optimization by enabling subquery rewriting, so that filter conditions can be pushed inside subqueries and executed at the data store as early as possible. Our approach is validated with two data stores and a representative query that demonstrates the usability of the query language and evaluates the benefits of query optimization.
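The combination described above (a user-defined map/filter/reduce pipeline over unstructured data in one store, joined with relational data, plus a rewrite that pushes a filter into the subquery so it runs at the data store) can be illustrated with a small Python sketch. It is not the paper's query language or optimizer: the `Subquery` class, `run_query`, the in-memory `LOG` and `USERS` stores and the `shipped` counter are hypothetical stand-ins, and the rewrite rule is an assumption that simply pushes a predicate down whenever every column it references is produced by the subquery.

```python
from dataclasses import dataclass, field
from functools import reduce
from typing import Callable, Dict, List, Optional, Set, Tuple

Row = Dict[str, object]
Step = Tuple[str, Callable]  # ("map", fn) or ("filter", predicate)


@dataclass
class Subquery:
    """Hypothetical model of a subquery executed inside a single data store."""
    name: str
    source: Callable[[], List[object]]  # yields the store's raw records
    steps: List[Step] = field(default_factory=list)
    shipped: int = 0  # number of rows that left the store on the last execute()

    def map(self, fn: Callable) -> "Subquery":
        self.steps.append(("map", fn))
        return self

    def filter(self, pred: Callable) -> "Subquery":
        self.steps.append(("filter", pred))
        return self

    def execute(self) -> List[Row]:
        rows = list(self.source())
        for kind, fn in self.steps:
            if kind == "map":
                rows = [fn(r) for r in rows]
            else:
                rows = [r for r in rows if fn(r)]
        self.shipped = len(rows)
        return rows


def run_query(sub: Subquery, table: List[Row], key: str,
              pred: Callable[[Row], bool], pred_cols: Set[str],
              sub_cols: Set[str], push_down: bool) -> List[Row]:
    """Join sub with a relational table on key, then keep rows satisfying pred.

    With push_down=True, a predicate that references only columns produced by
    the subquery is rewritten into the subquery so it runs at the data store.
    """
    post_filter: Optional[Callable[[Row], bool]] = pred
    if push_down and pred_cols <= sub_cols:
        sub.filter(pred)  # rewrite: the filter becomes part of the subquery
        post_filter = None
    by_key = {r[key]: r for r in table}
    joined = [{**left, **by_key[left[key]]}
              for left in sub.execute() if left[key] in by_key]
    if post_filter is None:
        return joined
    return [r for r in joined if post_filter(r)]


# Hypothetical data: raw log lines in an HDFS-like store, users in an RDBMS.
LOG = ["u1,/a,120", "u2,/b,40", "u1,/c,300", "u3,/a,15"]
USERS = [{"user_id": "u1", "country": "FR"},
         {"user_id": "u2", "country": "UA"},
         {"user_id": "u3", "country": "FR"}]


def parse(line: str) -> Row:
    """User-defined map operator: turn a raw log line into a tuple."""
    user_id, url, size = line.split(",")
    return {"user_id": user_id, "url": url, "bytes": int(size)}


def make_logs() -> Subquery:
    return Subquery("logs", lambda: LOG).map(parse)


def big_transfer(r: Row) -> bool:
    """User-defined filter predicate on a column produced by the subquery."""
    return r["bytes"] > 100


def total_bytes_by_country(rows: List[Row]) -> Dict[str, int]:
    """User-defined reduce operator: sum transferred bytes per country."""
    return reduce(
        lambda acc, r: {**acc, r["country"]: acc.get(r["country"], 0) + r["bytes"]},
        rows, {})


if __name__ == "__main__":
    for push in (False, True):
        logs = make_logs()
        rows = run_query(logs, USERS, "user_id", big_transfer,
                         {"bytes"}, {"user_id", "url", "bytes"}, push)
        print(f"push_down={push}: shipped={logs.shipped}, "
              f"totals={total_bytes_by_country(rows)}")
```

Both plans return the same totals ({'FR': 420}), but with the filter pushed into the subquery only two records leave the log store instead of four. Reducing the data shipped out of a store is the kind of benefit that executing filter conditions at the data store as early as possible aims for; a real optimizer would also have to check that the rewrite preserves query semantics beyond this simple column test.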


Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Carlyna Bondiombouy (1)
  • Boyan Kolev (1)
  • Oleksandra Levchenko (1, 2)
  • Patrick Valduriez (1)

  1. Inria and LIRMM, University of Montpellier, Montpellier, France
  2. Odessa National Polytechnic University, Odessa, Ukraine
