CloudMdsQL: querying heterogeneous cloud data stores with a common language

Distributed and Parallel Databases

Abstract

The proliferation of cloud data management infrastructures, each specialized for different kinds of data and tasks, has led to a wide diversification of DBMS interfaces and the loss of a common programming paradigm. In this paper, we present the design of a cloud multidatastore query language (CloudMdsQL) and its query engine. CloudMdsQL is a functional SQL-like language, capable of querying multiple heterogeneous data stores (relational and NoSQL) within a single query that may contain embedded invocations of each data store’s native query interface. The query engine has a fully distributed architecture, which provides important opportunities for optimization. The major innovation is that a CloudMdsQL query can exploit the full power of local data stores, by simply allowing some local data store native queries (e.g. a breadth-first search query against a graph database) to be called as functions, and at the same time be optimized, e.g. by pushing down select predicates, using bind join, performing join ordering, or planning intermediate data shipping. Our experimental validation, with three data stores (graph, document and relational) and representative queries, shows that CloudMdsQL satisfies the five important requirements for a cloud multidatastore query language.
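To make two of the optimizations named above more concrete, the following is a minimal Python sketch of a pushed-down selection combined with a bind join between two data stores. It is an illustration of the general technique only, not CloudMdsQL syntax and not the authors' query engine: the in-memory lists `CUSTOMERS` and `ORDERS`, the functions `relational_select`, `document_find` and `bind_join`, and the `shipped` counter are all hypothetical stand-ins for real stores and their native interfaces.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]

# Hypothetical in-memory stand-ins for a relational store and a document store.
CUSTOMERS: List[Row] = [
    {"cust_id": 1, "name": "Ann"},
    {"cust_id": 2, "name": "Bob"},
    {"cust_id": 3, "name": "Cy"},
]
ORDERS: List[Row] = [
    {"order_id": 10, "cust_id": 1, "total": 40},
    {"order_id": 11, "cust_id": 3, "total": 15},
    {"order_id": 12, "cust_id": 4, "total": 99},
    {"order_id": 13, "cust_id": 1, "total": 5},
]

shipped: List[int] = []  # number of rows the mock document store returns per call


def relational_select(min_id: int) -> List[Row]:
    """Selection evaluated inside the (mock) relational store: predicate pushdown."""
    return [c for c in CUSTOMERS if c["cust_id"] >= min_id]


def document_find(cust_ids: List[object]) -> List[Row]:
    """Mock native lookup of the document store, restricted to the given keys."""
    wanted = set(cust_ids)
    rows = [o for o in ORDERS if o["cust_id"] in wanted]
    shipped.append(len(rows))
    return rows


def bind_join(
    left_rows: List[Row],
    fetch_right: Callable[[List[object]], Iterable[Row]],
    left_key: str,
    right_key: str,
) -> List[Row]:
    """Join left_rows with a remote source, shipping only the needed keys to it."""
    # 1. Collect the distinct join-key values of the already retrieved left side.
    keys = list(dict.fromkeys(row[left_key] for row in left_rows))
    # 2. Send the keys to the right store so it returns matching rows only.
    index: Dict[object, List[Row]] = {}
    for r in fetch_right(keys):
        index.setdefault(r[right_key], []).append(r)
    # 3. Combine matching rows locally.
    return [{**l, **r} for l in left_rows for r in index.get(l[left_key], [])]


left = relational_select(2)  # only customers 2 and 3 leave the relational store
result = bind_join(left, document_find, "cust_id", "cust_id")
```

In this sketch the document store returns a single order instead of all four, because the join keys from the left side are bound into its lookup. Whether this pays off in practice depends on the number of distinct keys and on how efficiently the target store can evaluate such a key filter.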

References

  1. Armbrust, M., Xin, R., Lian, C., Huai, Y., Liu, D., Bradley, J., Meng, X., Kaftan, T., Franklin, M., Ghodsi, A., Zaharia, M.: Spark SQL: Relational Data Processing in Spark. ACM SIGMOD Int. Conf. on Management of Data, pp. 1383–1394 (2015)

  2. Binnig, C., Rehrmann, R., Faerber, F., Riewe, R.: FunSQL: It is time to make SQL functional. Int. Conf. on Extending Database Technology / Database Theory (EDBT/ICDT), pp. 41–46 (2012)

  3. Bondiombouy, C., Kolev, B., Levchenko, O., Valduriez, P.: Integrating Big Data and Relational Data with a Functional SQL-like Query Language. Int. Conf. on Databases and Expert Systems Applications (DEXA), pp. 170–185 (2015)

  4. Bugiotti, F., Bursztyn, D., Deutsch, A., Ileana, I., Manolescu, I.: Invisible Glue: Scalable Self-Tuning Multi-Stores. Conf. on Innovative Data Systems Research (CIDR), 7pp (2015)

  5. CoherentPaaS Project, http://coherentpaas.eu. [Last accessed on August 18, 2015]

  6. Danforth, S., Valduriez, P.: A FAD for Data-Intensive Applications. IEEE Trans. on Knowledge and Data Engineering 4(1), 34–51 (1992)

  7. Doan, A., Halevy, A., Ives, Z.: Principles of Data Integration. Morgan Kaufmann, (2012)

  8. Godfrey, P., Gryz, J., Hoppe, A., Ma, W., Zuzarte, C.: Query rewrites with views for XML in DB2. IEEE Int. Conf. on Data Engineering, pp. 1339–1350 (2009)

  9. Gulisano, V., Jiménez-Peris, R., Patiño-Martinez, M., Valduriez, P.: StreamCloud: A Large Scale Data Streaming System. IEEE Int. Conf. on Distributed Computing Systems (ICDCS), pp. 126–137 (2010)

  10. Gulisano, V., Jiménez-Peris, R., Patiño-Martinez, M., Soriente, C., Valduriez, P.: StreamCloud: An Elastic and Scalable Data Streaming System. IEEE Trans. On Parallel and Distributed Systems 23(12), 2351–2365 (2012)

  11. Haas, L. M., Kossmann, D., Wimmers, E. L., Yang, J.: Optimizing Queries across Diverse Data Sources. Int. Conf. on Very Large Databases (VLDB), pp. 276–285 (1997)

  12. Haase, P., Mathäß, T., Ziller, M.: An Evaluation of Approaches to Federated Query Processing over Linked Data. Int. Conf. on Semantic Systems (I-SEMANTICS) (2010)

  13. Hacıgümüs, H., Sankaranarayanan, J., Tatemura, J., LeFevre, J., Polyzotis, N.: Odyssey: A Multi-Store System for Evolutionary Analytics. Proceedings of the VLDB Endowment (PVLDB) 6(11), 1180–1181 (2013)

  14. Hart, B., Valduriez, P., Danforth, S.: Parallelizing FAD using Compile Time Analysis Techniques. IEEE Data Engineering Bulletin 12(1), 9–15 (1989)

  15. JSON Schema and Hyper-Schema, http://json-schema.org. [Last accessed on August 18, 2015]

  16. LeFevre, J., Sankaranarayanan, J., Hacıgümüs, H., Tatemura, J., Polyzotis, N., Carey, M.: MISO: Souping Up Big Data Query Processing with a Multistore System. ACM SIGMOD Int. Conf. on Management of Data, pp. 1591–1602 (2014)

  17. Liu, Z.H., Chang, H.J., Sthanikam, B.: Efficient support of XQuery Update Facility in XML enabled RDBMS. IEEE Int. Conf. on Data Engineering, pp. 1394–1404 (2012)

  18. Martínez-Bazan, N., Muntés-Mulero, V., Gómez-Villamor, S., Águila-Llorente, M.A., Domínguez-Sal, D., Larriba-Pey, J-L.: Efficient Graph Management Based on Bitmap Indices. Int. Database Engineering & Applications Symposium (IDEAS), pp. 110–119 (2012)

  19. Meijer, E., Beckman, B., Bierman, G. M.: LINQ: Reconciling Object, Relations and XML in the .NET Framework. ACM SIGMOD Int. Conf. on Data Management, p. 706 (2006)

  20. NoSQL Databases, http://nosql-database.org. [Last accessed on August 18, 2015]

  21. Özsu, T., Valduriez, P.: Principles of Distributed Database Systems – Third Edition. Springer, 850 pages (2011)

  22. Tomasic, A., Raschid, L., Valduriez, P.: Scaling Access to Heterogeneous Data Sources with DISCO. IEEE Transactions on Knowledge and Data Engineering 10(5), 808–823 (1998)

  23. Valduriez, P., Danforth, S.: Functional SQL, an SQL Upward Compatible Database Programming Language. Information Sciences 62(3), 183–203 (1992)

  24. Wyss, C.M., Robertson, E.L.: Relational Languages for Metadata Integration. ACM Trans. On Database Systems 30(2), 624–660 (2005)

  25. Tian, Y., Zou, T., Özcan, F., Goncalves, R., Pirahesh, H.: Joins for Hybrid Warehouses: Exploiting Massive Parallelism and Enterprise Data Warehouses. Int. Conf. on Extending Database Technology / Database Theory (EDBT/ICDT), pp. 373–384 (2015)

  26. Zhu, M., Risch, T.: Querying Combined Cloud-Based and Relational Databases, Int. Conf. on Cloud and Service Computing, pp. 330–335 (2011)

  27. Zhu, Q., Larson, P.-A.: Global Query Processing and Optimization in the CORDS Multidatabase System, Int. Conf. on Parallel and Distributed Computing Systems, pp. 640–647 (1996)

Acknowledgments

This work was partially funded by the European Commission through the CoherentPaaS FP7 Project under contract FP7-611068 [5]. We want to thank Norbert Martínez-Bazan for his contributions to the first version of the CloudMdsQL query engine. We also thank the editor and reviewers for their careful reading and useful suggestions, which helped improve our design and the paper. The work of Prof. Ricardo Jimenez was also partially funded by the Regional Government of Madrid (CAM) under Project Cloud4BigData (S2013/ICE-2894) cofunded by ESF & ERDF, and the Spanish Research Council (MICCIN) under Project BigDataPaaS (TIN2013-46883).

Author information

Corresponding author

Correspondence to Boyan Kolev.

About this article

Cite this article

Kolev, B., Valduriez, P., Bondiombouy, C. et al. CloudMdsQL: querying heterogeneous cloud data stores with a common language. Distrib Parallel Databases 34, 463–503 (2016). https://doi.org/10.1007/s10619-015-7185-y

  • DOI: https://doi.org/10.1007/s10619-015-7185-y
