Abstract
NoSQL document stores natively support storing documents with different schemas within the same collection. However, this flexibility makes it difficult to formulate queries over, or to manipulate, collections with multiple schemas: the user has to build complex queries or reformulate existing ones whenever new schemas appear in the collection. In this paper, we propose a novel approach, grounded on formal foundations, that enables schema-independent queries for querying and maintaining multi-structured documents. We introduce a query reformulation mechanism which consults a pre-constructed dictionary. This dictionary binds each possible path in the documents to all of its corresponding absolute paths across the documents. We automate query reformulation via a set of rules that reformulate most document store operators, such as select, project and aggregate. In addition, we automate the reformulation of the classical manipulation operators (insert, delete and update queries) so that the dictionary is updated according to the structural changes made in the collection. Both processes produce queries that are compatible with the native query engine of the underlying document store. To evaluate our approach, we conduct experiments on synthetic datasets. Our results show that the overhead induced when querying or updating can be acceptable when compared with the effort required to restructure the data and the time required to execute several queries corresponding to the different schemas inside the collection.
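The dictionary-based reformulation described above can be illustrated with a minimal sketch. It is not the implementation from the paper: the function names, the sample documents, the choice to index every path suffix as a "possible path", and the MongoDB-style `$or` filter used as the target syntax are all assumptions made for illustration, and arrays are not handled.

```python
from collections import defaultdict


def absolute_paths(doc, prefix=""):
    """Yield every dot-separated path from the root of a nested dict."""
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        yield path
        if isinstance(value, dict):
            yield from absolute_paths(value, path)


def build_dictionary(collection):
    """Map each path suffix (assumed notion of 'possible path')
    to the set of absolute paths that end with it."""
    dictionary = defaultdict(set)
    for doc in collection:
        for path in absolute_paths(doc):
            parts = path.split(".")
            for i in range(len(parts)):
                dictionary[".".join(parts[i:])].add(path)
    return dictionary


def reformulate_select(field, value, dictionary):
    """Rewrite an equality select on `field` into a filter that covers
    every absolute path recorded for it (MongoDB-style syntax assumed)."""
    paths = sorted(dictionary.get(field, ()))
    if not paths:
        return {field: value}      # unknown path: leave the query unchanged
    if len(paths) == 1:
        return {paths[0]: value}
    return {"$or": [{p: value} for p in paths]}


# Hypothetical collection holding three different schemas.
collection = [
    {"title": "Heat", "year": 1995},
    {"film": {"title": "Heat", "year": 1995}},
    {"info": {"film": {"title": "Ran"}}},
]

dictionary = build_dictionary(collection)
query = reformulate_select("title", "Heat", dictionary)
print(query)
```

Under these assumptions, a select on `title` is rewritten into a disjunction over `title`, `film.title` and `info.film.title`, so a single user query reaches all three schemas. Keeping the dictionary consistent after insert, delete and update operations, which the abstract also covers, is left out of this sketch.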
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Ben Hamadou, H., Ghozzi, F., Péninou, A., Teste, O. (2019). Schema-Independent Querying and Manipulation for Heterogeneous Collections in NoSQL Document Stores. In: Hammoudi, S., Śmiałek, M., Camp, O., Filipe, J. (eds) Enterprise Information Systems. ICEIS 2018. Lecture Notes in Business Information Processing, vol 363. Springer, Cham. https://doi.org/10.1007/978-3-030-26169-6_16
DOI: https://doi.org/10.1007/978-3-030-26169-6_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-26168-9
Online ISBN: 978-3-030-26169-6
eBook Packages: Computer Science (R0)