CloudMdsQL: querying heterogeneous cloud data stores with a common language

Kolev, Boyan; Valduriez, Patrick; Bondiombouy, Carlyna; Jiménez-Peris, Ricardo; Pau, Raquel; Pereira, José

doi:10.1007/s10619-015-7185-y

Published: 25 September 2015

Volume 34, pages 463–503, (2016)

Distributed and Parallel Databases

Boyan Kolev ORCID: orcid.org/0000-0003-4871-0434¹,
Patrick Valduriez¹,
Carlyna Bondiombouy¹,
Ricardo Jiménez-Peris²,
Raquel Pau³ &
José Pereira⁴

Abstract

The proliferation of cloud data management infrastructures, each specialized for different kinds of data and tasks, has led to a wide diversification of DBMS interfaces and the loss of a common programming paradigm. In this paper, we present the design of a cloud multidatastore query language (CloudMdsQL) and its query engine. CloudMdsQL is a functional SQL-like language, capable of querying multiple heterogeneous data stores (relational and NoSQL) within a single query that may contain embedded invocations of each data store's native query interface. The query engine has a fully distributed architecture, which provides important opportunities for optimization. The major innovation is that a CloudMdsQL query can exploit the full power of local data stores, simply by allowing some local data store native queries (e.g., a breadth-first search query against a graph database) to be called as functions, and at the same time be optimized, e.g., by pushing down select predicates, using bind join, performing join ordering, or planning intermediate data shipping. Our experimental validation, with three data stores (graph, document and relational) and representative queries, shows that CloudMdsQL satisfies the five important requirements for a cloud multidatastore query language.
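Two of the optimizations named in the abstract, pushing down select predicates and bind join, can be illustrated with a minimal Python sketch. This is not the CloudMdsQL engine or its syntax: the two in-memory lists stand in for separate data stores, and every name here (`CUSTOMERS`, `ORDERS`, `scan_customers`, `fetch_orders_for`, `bind_join`) is hypothetical. The idea shown is the general technique: filter the first source at the source, collect the join keys of the surviving rows, and send those keys to the second source so that only matching rows are shipped back.

```python
from typing import Callable, Dict, Iterable, List, Set

Row = Dict[str, object]

# Hypothetical in-memory stand-ins for two separate data stores.
CUSTOMERS: List[Row] = [
    {"cust_id": 1, "name": "Ann"},
    {"cust_id": 2, "name": "Bob"},
    {"cust_id": 3, "name": "Cy"},
]
ORDERS: List[Row] = [
    {"order_id": 10, "cust_id": 1},
    {"order_id": 11, "cust_id": 3},
    {"order_id": 12, "cust_id": 4},
]


def scan_customers(predicate: Callable[[Row], bool]) -> List[Row]:
    """Stand-in for a selection pushed down to the first store."""
    return [c for c in CUSTOMERS if predicate(c)]


def fetch_orders_for(keys: Set[object]) -> List[Row]:
    """Stand-in for a native query with bound keys,
    comparable to a filter of the form cust_id IN (keys)."""
    return [o for o in ORDERS if o["cust_id"] in keys]


def bind_join(
    outer_rows: Iterable[Row],
    key: str,
    fetch_inner: Callable[[Set[object]], List[Row]],
) -> List[Row]:
    """Join outer rows with an inner source, retrieving only the
    inner rows whose join key appears among the outer rows."""
    outer = list(outer_rows)
    keys = {row[key] for row in outer}  # distinct join keys of the outer side
    index: Dict[object, List[Row]] = {}
    for inner in fetch_inner(keys):  # only matching inner rows are shipped
        index.setdefault(inner[key], []).append(inner)
    return [{**o, **i} for o in outer for i in index.get(o[key], [])]


# Selection is applied at the source, then the bind join ships only two keys.
selected = scan_customers(lambda c: c["name"] != "Bob")
result = bind_join(selected, "cust_id", fetch_orders_for)
```

In this sketch the order with `cust_id` 4 never leaves the second store, which is the saving a bind join aims for when the outer side is small relative to the inner side.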

References

Armbrust, M., Xin, R., Lian, C., Huai, Y., Liu, D., Bradley, J., Meng, X., Kaftan, T., Franklin, M., Ghodsi, A., Zaharia, M.: Spark SQL: Relational Data Processing in Spark. ACM SIGMOD Int. Conf. on Management of Data, pp. 1383-1394 (2015)
Binnig, C., Rehrmann, R., Faerber, F., Riewe, R.: FunSQL: It is time to make SQL functional. Int. Conf. on Extending Database Technology / Database Theory (EDBT/ICDT), pp. 41-46 (2012)
Bondiombouy, C., Kolev, B., Levchenko, O., Valduriez, P.: Integrating Big Data and Relational Data with a Functional SQL-like Query Language. Int. Conf. on Databases and Expert Systems Applications (DEXA), pp. 170-185 (2015)
Bugiotti, F., Bursztyn, D., Deutsch, A., Ileana, I., Manolescu, I.: Invisible Glue: Scalable Self-Tuning Multi-Stores. Conf. on Innovative Data Systems Research (CIDR), 7pp (2015)
CoherentPaaS Project, http://coherentpaas.eu. [Last accessed on August 18, 2015]
Danforth, S., Valduriez, P.: A FAD for Data-Intensive Applications. IEEE Trans. on Knowledge and Data Engineering 4(1), 34–51 (1992)
Doan, A., Halevy, A., Ives, Z.: Principles of Data Integration. Morgan Kaufmann (2012)
Godfrey, P., Gryz, J., Hoppe, A., Ma, W., Zuzarte, C.: Query rewrites with views for XML in DB2. IEEE Int. Conf. on Data Engineering, pp. 1339–1350 (2009)
Gulisano, V., Jiménez-Peris, R., Patiño-Martinez, M., Valduriez, P.: StreamCloud: A Large Scale Data Streaming System. IEEE Int. Conf. on Distributed Computing Systems (ICDCS), pp. 126-137 (2010)
Gulisano, V., Jiménez-Peris, R., Patiño-Martinez, M., Soriente, C., Valduriez, P.: StreamCloud: An Elastic and Scalable Data Streaming System. IEEE Trans. on Parallel and Distributed Systems 23(12), 2351–2365 (2012)
Haas, L. M., Kossmann, D., Wimmers, E. L., Yang, J.: Optimizing Queries across Diverse Data Sources. Int. Conf. on Very Large Databases (VLDB), pp. 276-285 (1997)
Haase, P., Mathäß, T., Ziller, M.: An Evaluation of Approaches to Federated Query Processing over Linked Data. Int. Conf. on Semantic Systems (I-SEMANTICS) (2010)
Hacıgümüs, H., Sankaranarayanan, J., Tatemura, J., LeFevre, J., Polyzotis, N.: Odyssey: A Multi-Store System for Evolutionary Analytics. Proceedings of the VLDB Endowment (PVLDB) 6(11), 1180–1181 (2013)
Hart, B., Valduriez, P., Danforth, S.: Parallelizing FAD using Compile Time Analysis Techniques. IEEE Data Engineering Bulletin 12(1), 9–15 (1989)
JSON Schema and Hyper-Schema, http://json-schema.org. [Last accessed on August 18, 2015]
LeFevre, J., Sankaranarayanan, J., Hacıgümüs, H., Tatemura, J., Polyzotis, N., Carey, M.: MISO: Souping Up Big Data Query Processing with a Multistore System. ACM SIGMOD Int. Conf. on Management of Data, pp. 1591-1602 (2014)
Liu, Z.H., Chang, H.J., Sthanikam, B.: Efficient support of XQuery Update Facility in XML enabled RDBMS. IEEE Int. Conf. on Data Engineering, pp. 1394–1404 (2012)
Martínez-Bazan, N., Muntés-Mulero, V., Gómez-Villamor, S., Águila-Llorente, M.A., Domínguez-Sal, D., Larriba-Pey, J-L.: Efficient Graph Management Based on Bitmap Indices. Int. Database Engineering & Applications Symposium (IDEAS), pp. 110-119 (2012)
Meijer, E., Beckman, B., Bierman, G. M.: LINQ: Reconciling Object, Relations and XML in the .NET Framework. ACM SIGMOD Int. Conf. on Data Management, pp. 706-706 (2006)
NoSQL Databases, http://nosql-database.org. [Last accessed on August 18, 2015]
Özsu, T., Valduriez, P.: Principles of Distributed Database Systems – Third Edition. Springer, 850 pages (2011)
Tomasic, A., Raschid, L., Valduriez, P.: Scaling Access to Heterogeneous Data Sources with DISCO. IEEE Transactions on Knowledge and Data Engineering 10(5), 808–823 (1998)
Valduriez, P., Danforth, S.: Functional SQL, an SQL Upward Compatible Database Programming Language. Information Sciences 62(3), 183–203 (1992)
Wyss, C.M., Robertson, E.L.: Relational Languages for Metadata Integration. ACM Trans. on Database Systems 30(2), 624–660 (2005)
Yuanyuan, T., Zou, T., Özcan, F., Goncalves, R., Pirahesh, H.: Joins for Hybrid Warehouses: Exploiting Massive Parallelism and Enterprise Data Warehouses. Int. Conf. on Extending Database Technology / Database Theory (EDBT/ICDT), pp. 373-384 (2015)
Zhu, M., Risch, T.: Querying Combined Cloud-Based and Relational Databases. Int. Conf. on Cloud and Service Computing, pp. 330–335 (2011)
Zhu, Q., Larson, P.-A.: Global Query Processing and Optimization in the CORDS Multidatabase System. Int. Conf. on Parallel and Distributed Computing Systems, pp. 640–647 (1996)

Acknowledgments

This work was partially funded by the European Commission through the CoherentPaaS FP7 Project under contract FP7-611068 [5]. We want to thank Norbert Martínez-Bazan for his contributions to the first version of the CloudMdsQL query engine. We also thank the editor and reviewers for their careful reading and useful suggestions, which helped improve our design and the paper. The work of Prof. Ricardo Jiménez-Peris was also partially funded by the Regional Government of Madrid (CAM) under Project Cloud4BigData (S2013/ICE-2894), cofunded by ESF & ERDF, and by the Spanish Research Council (MICCIN) under Project BigDataPaaS (TIN2013-46883).

Author information

Authors and Affiliations

Zenith team, Inria, Montpellier, France
Boyan Kolev, Patrick Valduriez & Carlyna Bondiombouy
Universidad Politecnica de Madrid (UPM) and LeanXcale, Madrid, Spain
Ricardo Jiménez-Peris
Sparsity Technologies, Barcelona, Spain
Raquel Pau
INESC, Braga, Portugal
José Pereira

Corresponding author

Correspondence to Boyan Kolev.

About this article

Cite this article

Kolev, B., Valduriez, P., Bondiombouy, C. et al. CloudMdsQL: querying heterogeneous cloud data stores with a common language. Distrib Parallel Databases 34, 463–503 (2016). https://doi.org/10.1007/s10619-015-7185-y

Published: 25 September 2015
Issue Date: December 2016
DOI: https://doi.org/10.1007/s10619-015-7185-y
