Abstract
The blooming of different cloud data management infrastructures, specialized for different kinds of data and tasks, has led to a wide diversification of DBMS interfaces and the loss of a common programming paradigm. In this paper, we present the design of a cloud multidatastore query language (CloudMdsQL), and its query engine. CloudMdsQL is a functional SQL-like language, capable of querying multiple heterogeneous data stores (relational and NoSQL) within a single query that may contain embedded invocations to each data store’s native query interface. The query engine has a fully distributed architecture, which provides important opportunities for optimization. The major innovation is that a CloudMdsQL query can exploit the full power of local data stores, by simply allowing some local data store native queries (e.g. a breadth-first search query against a graph database) to be called as functions, and at the same time be optimized, e.g. by pushing down select predicates, using bind join, performing join ordering, or planning intermediate data shipping. Our experimental validation, with three data stores (graph, document and relational) and representative queries, shows that CloudMdsQL satisfies the five important requirements for a cloud multidatastore query language.
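Of the optimizations named above, bind join is perhaps the easiest to illustrate in isolation: the join keys retrieved from one store are shipped to the other store as a filter, so that only matching rows travel back. The sketch below is a minimal illustration under stated assumptions, not the CloudMdsQL engine's implementation; the names `bind_join`, `lookup_orders`, `ORDERS` and `customers` are hypothetical, and plain in-memory lists stand in for the data stores.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]

def bind_join(left_rows: Iterable[Row],
              right_lookup: Callable[[List[object]], Iterable[Row]],
              key: str) -> List[Row]:
    """Join left_rows with a remote source that is queried only for the needed keys."""
    left = list(left_rows)
    # Collect the distinct join keys from the left side.
    keys = sorted({row[key] for row in left})
    # Ship the keys to the right side, so it returns only matching rows
    # (in a real system this would become e.g. an IN-list in a native query).
    right_by_key: Dict[object, List[Row]] = {}
    for r in right_lookup(keys):
        right_by_key.setdefault(r[key], []).append(r)
    # Combine each left row with its matching right rows.
    return [{**l, **r} for l in left for r in right_by_key.get(l[key], [])]

# Hypothetical stand-in for a second data store.
ORDERS = [
    {"customer_id": 1, "total": 30},
    {"customer_id": 2, "total": 45},
    {"customer_id": 3, "total": 12},
    {"customer_id": 1, "total": 8},
]

def lookup_orders(customer_ids: List[object]) -> List[Row]:
    # Simulates a store-side filter on the shipped keys.
    wanted = set(customer_ids)
    return [o for o in ORDERS if o["customer_id"] in wanted]

# Hypothetical result of a selective subquery against a first data store.
customers = [{"customer_id": 1, "name": "Ada"}, {"customer_id": 3, "name": "Lin"}]
joined = bind_join(customers, lookup_orders, "customer_id")
```

The benefit depends on the left side being selective: here the order for customer 2 is never transferred, because its key was never shipped.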
References
Armbrust, M., Xin, R., Lian, C., Huai, Y., Liu, D., Bradley, J., Meng, X., Kaftan, T., Franklin, M., Ghodsi, A., Zaharia, M.: Spark SQL: Relational Data Processing in Spark. ACM SIGMOD Int. Conf. on Management of Data, pp. 1383–1394 (2015)
Binnig, C., Rehrmann, R., Faerber, F., Riewe, R.: FunSQL: It is time to make SQL functional. Int. Conf. on Extending Database Technology / Database Theory (EDBT/ICDT), pp. 41–46 (2012)
Bondiombouy, C., Kolev, B., Levchenko, O., Valduriez, P.: Integrating Big Data and Relational Data with a Functional SQL-like Query Language. Int. Conf. on Databases and Expert Systems Applications (DEXA), pp. 170–185 (2015)
Bugiotti, F., Bursztyn, D., Deutsch, A., Ileana, I., Manolescu, I.: Invisible Glue: Scalable Self-Tuning Multi-Stores. Conf. on Innovative Data Systems Research (CIDR), 7 pp. (2015)
CoherentPaaS Project, http://coherentpaas.eu. [Last accessed on August 18, 2015]
Danforth, S., Valduriez, P.: A FAD for Data-Intensive Applications. IEEE Trans. on Knowledge and Data Engineering 4(1), 34–51 (1992)
Doan, A., Halevy, A., Ives, Z.: Principles of Data Integration. Morgan Kaufmann (2012)
Godfrey, P., Gryz, J., Hoppe, A., Ma, W., Zuzarte, C.: Query rewrites with views for XML in DB2. IEEE Int. Conf. on Data Engineering, pp. 1339–1350 (2009)
Gulisano, V., Jiménez-Peris, R., Patiño-Martinez, M., Valduriez, P.: StreamCloud: A Large Scale Data Streaming System. IEEE Int. Conf. on Distributed Computing Systems (ICDCS), pp. 126–137 (2010)
Gulisano, V., Jiménez-Peris, R., Patiño-Martinez, M., Soriente, C., Valduriez, P.: StreamCloud: An Elastic and Scalable Data Streaming System. IEEE Trans. on Parallel and Distributed Systems 23(12), 2351–2365 (2012)
Haas, L. M., Kossmann, D., Wimmers, E. L., Yang, J.: Optimizing Queries across Diverse Data Sources. Int. Conf. on Very Large Databases (VLDB), pp. 276–285 (1997)
Haase, P., Mathäß, T., Ziller, M.: An Evaluation of Approaches to Federated Query Processing over Linked Data. Int. Conf. on Semantic Systems (I-SEMANTICS) (2010)
Hacıgümüs, H., Sankaranarayanan, J., Tatemura, J., LeFevre, J., Polyzotis, N.: Odyssey: A Multi-Store System for Evolutionary Analytics. Proceedings of the VLDB Endowment (PVLDB) 6(11), 1180–1181 (2013)
Hart, B., Valduriez, P., Danforth, S.: Parallelizing FAD using Compile Time Analysis Techniques. IEEE Data Engineering Bulletin 12(1), 9–15 (1989)
JSON Schema and Hyper-Schema, http://json-schema.org. [Last accessed on August 18, 2015]
LeFevre, J., Sankaranarayanan, J., Hacıgümüs, H., Tatemura, J., Polyzotis, N., Carey, M.: MISO: Souping Up Big Data Query Processing with a Multistore System. ACM SIGMOD Int. Conf. on Management of Data, pp. 1591–1602 (2014)
Liu, Z.H., Chang, H.J., Sthanikam, B.: Efficient support of XQuery Update Facility in XML enabled RDBMS. IEEE Int. Conf. on Data Engineering, pp. 1394–1404 (2012)
Martínez-Bazan, N., Muntés-Mulero, V., Gómez-Villamor, S., Águila-Llorente, M.A., Domínguez-Sal, D., Larriba-Pey, J-L.: Efficient Graph Management Based on Bitmap Indices. Int. Database Engineering & Applications Symposium (IDEAS), pp. 110–119 (2012)
Meijer, E., Beckman, B., Bierman, G. M.: LINQ: Reconciling Object, Relations and XML in the .NET Framework. ACM SIGMOD Int. Conf. on Management of Data, p. 706 (2006)
NoSQL Databases, http://nosql-database.org. [Last accessed on August 18, 2015]
Özsu, T., Valduriez, P.: Principles of Distributed Database Systems – Third Edition. Springer, 850 pages (2011)
Tomasic, A., Raschid, L., Valduriez, P.: Scaling Access to Heterogeneous Data Sources with DISCO. IEEE Transactions on Knowledge and Data Engineering 10(5), 808–823 (1998)
Valduriez, P., Danforth, S.: Functional SQL, an SQL Upward Compatible Database Programming Language. Information Sciences 62(3), 183–203 (1992)
Wyss, C.M., Robertson, E.L.: Relational Languages for Metadata Integration. ACM Trans. on Database Systems 30(2), 624–660 (2005)
Tian, Y., Zou, T., Özcan, F., Goncalves, R., Pirahesh, H.: Joins for Hybrid Warehouses: Exploiting Massive Parallelism and Enterprise Data Warehouses. Int. Conf. on Extending Database Technology / Database Theory (EDBT/ICDT), pp. 373–384 (2015)
Zhu, M., Risch, T.: Querying Combined Cloud-Based and Relational Databases. Int. Conf. on Cloud and Service Computing, pp. 330–335 (2011)
Zhu, Q., Larson, P.-A.: Global Query Processing and Optimization in the CORDS Multidatabase System. Int. Conf. on Parallel and Distributed Computing Systems, pp. 640–647 (1996)
Acknowledgments
This work was partially funded by the European Commission through the CoherentPaaS FP7 Project under contract FP7-611068 [5]. We thank Norbert Martínez-Bazan for his contributions to the first version of the CloudMdsQL query engine. We also thank the editor and reviewers for their careful reading and useful suggestions, which helped improve our design and the paper. The work of Prof. Ricardo Jimenez was also partially funded by the Regional Government of Madrid (CAM) under Project Cloud4BigData (S2013/ICE-2894), cofunded by ESF & ERDF, and by the Spanish Research Council (MICCIN) under Project BigDataPaaS (TIN2013-46883).
Cite this article
Kolev, B., Valduriez, P., Bondiombouy, C. et al. CloudMdsQL: querying heterogeneous cloud data stores with a common language. Distrib Parallel Databases 34, 463–503 (2016). https://doi.org/10.1007/s10619-015-7185-y
DOI: https://doi.org/10.1007/s10619-015-7185-y