CHiSEL: a user-oriented framework for simplifying database evolution

Distributed and Parallel Databases

Abstract

In order to conduct research effectively, scientists must be able to access, organize, describe, and produce data as part of their daily research activities. While relational databases are well suited to the tasks of describing and organizing scientific metadata and results, the difficulties of using relational database management systems effectively have resulted in their limited adoption among scientists. In addition, scientific research changes steadily as new experimental protocols, instruments, and discoveries determine what data are generated and how they must be described and organized according to a relational schema. Unfortunately, evolving a schema is one of the most difficult aspects of database usage. The conventional data definition and manipulation languages offer relatively low-level programming abstractions for performing complex database evolution tasks, and therefore require specialized technical skills not possessed by most scientists. A simplified means of expressing database evolution operations would reduce the effort for non-expert users of databases. This paper presents a high-level, user-oriented, schema evolution framework built on a formal algebra of schema modification operators. The approach allows the introduction of novel operators as motivated by new requirements and is amenable to well-established optimization techniques for efficient planning and execution. We also propose a rigorous evaluation methodology for comparing the user effort of database evolution languages, and we introduce a benchmark for evaluating the execution efficiency of schema evolution expressions. We present the framework and its implementation, and we demonstrate its utility in exemplar use cases and a performance evaluation.

Notes

  1. Scientists are concerned with reuse of data and reproducible results, but these concerns are typically addressed by online sharing of historical snapshots of specific datasets of acquired or derived data from an experiment.

  2. See https://www.w3.org/TR/sparql11-overview/.

  3. See https://www.w3.org/TR/rdf11-primer/.

  4. t is not on the input list of domainify, because the operator always produces a relation with term as its attribute.

  5. t and s are not on the input list of canonicalize, because the operator always produces a relation with term and synonyms as its attributes.

  6. See https://pandas.pydata.org/.

  7. Unlike SRFs that batch up results in memory before returning, UDFs can and should instead pipeline results according to the iterator pattern [21] for efficiency.

  8. The execution of the statements in the evolve block may not be performed in a single transactional unit, as it depends on the capabilities of the underlying database management system.

  9. See https://rstudio.github.io/reticulate/.

  10. See https://pypi.org/project/pyfpm/.

  11. See https://github.com/informatics-isi-edu/chisel/blob/63a72fdee2682a84f7ffbbd3196336d1aebc1fa6/chisel/catalog/semistructured.py and https://github.com/informatics-isi-edu/chisel/blob/63a72fdee2682a84f7ffbbd3196336d1aebc1fa6/chisel/operators/semistructured.py for the code used to extend the implementation with semistructured data source support.

  12. SQL, comprising its Data Query Language (DQL), Data Manipulation Language (DML), and Data Definition Language (DDL) sublanguages, is the standard approach to performing in situ database evolution.

  13. The exact function names vary by database management system implementation.

  14. The database for this use case was our own, built from these real data extracts, and does not reflect in any way the internal data management of the GTEx or EBI projects. The database was initially set up for a demonstration in a separate but affiliated project. For more information on the GTEx project, see https://gtexportal.org/home/. For more information on the EBI data, see https://www.ebi.ac.uk/ena/data/view/PRJEB2784.

  15. For example, COPY in PostgreSQL or BULK INSERT in Microsoft SQL Server.

  16. To our knowledge, only one DEL [16] has attempted to support constraint evolution.

References

  1. Armbrust, M., Ghodsi, A., Zaharia, M., Xin, R.S., Lian, C., Huai, Y., Liu, D., Bradley, J.K., Meng, X., Kaftan, T., Franklin, M.J.: Spark SQL: relational data processing in spark. In: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data—SIGMOD ’15, pp. 1383–1394 (2015). https://doi.org/10.1145/2723372.2742797

  2. Begley, C.G., Ellis, L.M.: Drug development: raise standards for preclinical cancer research. Nature 483(7391), 531–3 (2012). https://doi.org/10.1038/483531a

  3. Bernstein, P.A.: Applying model management to classical meta data problems. In: Proceedings of the 2003 CIDR Conference, Asilomar, CA, USA, pp. 209–220 (2003)

  4. Bernstein, P.A., Halevy, A.Y., Pottinger, R.A.: A vision for management of complex models. SIGMOD Rec. 29(4), 55–63 (2000). https://doi.org/10.1145/369275.369289

  5. Bird, S., Klein, E., Loper, E.: Natural Language Processing with Python. O’Reilly Media, Inc., Newton (2009)

  6. Brinkley, J.F., Fisher, S., Harris, M.P., Holmes, G., Hooper, J.E., Jabs, E.W., Jones, K.L., Kesselman, C., Klein, O.D., Maas, R.L., Marazita, M.L., Selleri, L., Spritz, R.A., van Bakel, H., Visel, A., Williams, T.J., Wysocka, J., FaceBase Consortium, Chai, Y.: The FaceBase Consortium: a comprehensive resource for craniofacial researchers. Development 143(14), 2677–88 (2016). https://doi.org/10.1242/dev.135434

  7. Bugacov, A., Czajkowski, K., Kesselman, C., Kumar, A., Schuler, R., Tangmunarunkit, H.: Experiences with Deriva: an asset management platform for accelerating eScience. In: The IEEE 13th International Conference on eScience, Auckland, New Zealand (2017)

  8. Chaudhuri, S., Ganti, V., Kaushik, R.: A primitive operator for similarity joins in data cleaning. In: 22nd International Conference on Data Engineering (ICDE’06), p. 5 (2006)

  9. Cleve, A., Hainaut, J.L.: Co-transformations in Database Applications Evolution, pp. 409–421. Springer, Berlin (2006). https://doi.org/10.1007/11877028_17

  10. Curino, C., Tanca, L., Moon, H., Zaniolo, C.: Schema evolution in wikipedia: toward a web information system benchmark. In: International Conference on Enterprise Information Systems (ICEIS) (2008). https://doi.org/10.5220/0001713003230332

  11. Curino, C.A., Moon, H.J., Zaniolo, C.: Graceful database schema evolution: the PRISM workbench. Proc. VLDB Endow. 1(1), 761–772 (2008). https://doi.org/10.14778/1453856.1453939

  12. Curino, C.A., Tanca, L., Moon, H.J., Zaniolo, C.: Schema evolution in wikipedia: toward a web information system benchmark. In: International Conference on Enterprise Information Systems (ICEIS) (2008)

  13. Curino, C., Moon, H., Zaniolo, C.: Automating database schema evolution in information system upgrades. In: Proceedings of the 2nd International Workshop on Hot Topics in Software Upgrades pp. 1–5 (2009)

  14. Curino, C.A., Moon, H.J., Deutsch, A., Zaniolo, C.: Update rewriting and integrity constraint maintenance in a schema evolution support system: PRISM++. Proc. VLDB Endow. 4(2), 117–128 (2010). https://doi.org/10.14778/1921071.1921078

  15. Curino, C., Moon, H.J., Deutsch, A., Zaniolo, C.: Automating the database schema evolution process. VLDB J. 22(1), 73–98 (2013). https://doi.org/10.1007/s00778-012-0302-x

  16. Curino, C., Moon, H.J., Deutsch, A., Zaniolo, C.: Automating the database schema evolution process. VLDB J. 22(1), 73–98 (2013)

  17. Czajkowski, K., Kesselman, C., Schuler, R.E., Tangmunarunkit, H.: ERMrest: a web service for collaborative data management. In: Proceedings of the 30th International Conference on Scientific and Statistical Database Management, ACM, New York, NY, USA, SSDBM ’18, pp. 13:1–13:12 (2018)

  18. Elmagarmid, A.K., Ipeirotis, P.G., Verykios, V.S.: Duplicate record detection: a survey. IEEE Trans. Knowl. Data Eng. 19(1), 1–16 (2007)

  19. Giannakopoulou, S., Karpathiotakis, M., Gaidioz, B., Ailamaki, A.: CleanM: an optimizable query language for unified scale-out data cleaning. Proc. VLDB Endow. 10(11), 1466–1477 (2017). https://doi.org/10.14778/3137628.3137654

  20. Gobert, M., Maes, J., Cleve, A., Weber, J.: Understanding schema evolution as a basis for database reengineering. In: 2013 IEEE International Conference on Software Maintenance, pp. 472–475 (2013). https://doi.org/10.1109/ICSM.2013.75

  21. Graefe, G.: Query evaluation techniques for large databases. ACM Comput. Surv. 25(2), 73–169 (1993). https://doi.org/10.1145/152610.152611

  22. Graefe, G.: The Cascades framework for query optimization. Data Eng. Bull. 18, 19–29 (1995)

  23. Hartung, M., Terwilliger, J., Rahm, E.: Recent advances in schema and ontology evolution. In: Bellahsene, Z., Bonifati, A., Rahm, E. (eds.) Schema Matching and Mapping, pp. 149–190. Springer, Berlin (2011). https://doi.org/10.1007/978-3-642-16518-4_6

  24. Heidorn, P.B.: Shedding light on the dark data in the long tail of science. Libr. Trends 57(2), 280–299 (2008)

  25. Herrmann, K., Voigt, H., Behrend, A., Lehner, W.: Codel—a relationally complete language for database evolution. In: Tadeusz, M., Valduriez, P., Bellatreche, L. (eds.) Advances in Databases and Information Systems, pp. 63–76. Springer International Publishing, Cham (2015)

  26. Herrmann, K., Voigt, H., Rausch, J., Behrend, A., Lehner, W.: Living in parallel realities—co-existing schema versions with a bidirectional database evolution language. In: SIGMOD’17, Proceedings of the 2017 International Conference on Management of Data, Chicago, IL, USA, May 14–19, 2017 (2017). ACM

  27. Hick, J.M., Hainaut, J.L.: Database application evolution: a transformational approach. Data Knowl. Eng. 59(3), 534–558 (2006). https://doi.org/10.1016/j.datak.2005.10.003

  28. Howe, B., Cole, G., Souroush, E., Koutris, P., Key, A., Khoussainova, N., Battle, L.: Database-as-a-service for long-tail science (2011)

  29. Jain, S., Moritz, D., Halperin, D., Howe, B., Lazowska, E.: SQLShare: results from a multi-year SQL-as-a-service experiment. In: SIGMOD’16, ACM, San Francisco, CA, USA (2016). https://doi.org/10.1145/2882903.2882957

  30. Kandel, S.: Enterprise data analysis and visualization: an interview study. IEEE Trans. Vis. Comput. Graphics 18, 2917–2926 (2012)

  31. Kandel, S., Paepcke, A., Hellerstein, J., Heer, J.: Wrangler: Interactive visual specification of data transformation scripts. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Association for Computing Machinery, New York, NY, USA, CHI ’11, pp. 3363–3372 (2011)

  32. Kandel, S., Paepcke, A., Hellerstein, J., Heer, J.: Wrangler: interactive visual specification of data transformation scripts. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’11, ACM Press, New York, USA, pp. 3363–3372 (2011). https://doi.org/10.1145/1978942.1979444

  33. Kluyver, T., Ragan-Kelley, B., Pérez, F., Granger, B.E., Bussonnier, M., Frederic, J., Kelley, K., Hamrick, J.B., Grout, J., Corlay, S., et al.: Jupyter notebooks—a publishing format for reproducible computational workflows, pp. 87–90. Positioning and Power in Academic Publishing, Players, Agents and Agendas (2016)

  34. Krogh, B., Weisberg, A., Bested, M.: DBLint: a tool for automated analysis of database design (2011)

  35. Maier, D.: Theory of Relational Databases. Computer Science Press, Rockville (1983)

  36. Melnik, S., Rahm, E., Bernstein, P.A.: Rondo: a programming platform for generic model management. In: Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, ACM, New York, NY, USA, SIGMOD ’03, pp. 193–204 (2003). https://doi.org/10.1145/872757.872782

  37. Melnik, S., Bernstein, P.A., Halevy, A., Rahm, E.: Supporting executable mappings in model management. In: Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data, ACM, New York, NY, USA, SIGMOD ’05, pp. 167–178 (2005). https://doi.org/10.1145/1066157.1066177

  38. Meurice, L., Cleve, A.: DAHLIA: a visual analyzer of database schema evolution. In: 2014 Software Evolution Week—IEEE Conference on Software Maintenance, Reengineering, and Reverse Engineering (CSMR-WCRE). IEEE, pp. 464–468 (2014). https://doi.org/10.1109/CSMR-WCRE.2014.6747219

  39. Miller, G.A.: WordNet: a lexical database for English. Commun. ACM 38(11), 39–41 (1995)

  40. Moody, D.L.: Metrics for evaluating the quality of entity relationship models. In: Proceedings of the 17th International Conference on Conceptual Modeling, Springer, London, UK, ER ’98, pp. 211–225 (1998). http://dl.acm.org/citation.cfm?id=647520.727704

  41. Moon, H.J., Curino, C.A., Deutsch, A., Hou, C.Y., Zaniolo, C.: Managing and querying transaction-time databases under schema evolution. Proc. VLDB Endow. 1(1), 882–895 (2008)

  42. Perez, F., Granger, B.E.: IPython: a system for interactive scientific computing. Comput. Sci. Eng. 9(3), 21–29 (2007). https://doi.org/10.1109/MCSE.2007.53

  43. Roddick, J.F.: SQL/SE: a query language extension for databases supporting schema evolution. SIGMOD Rec. 21(3), 1079–1080 (1992). https://doi.org/10.1145/140979.140985

  44. Roddick, J.F.: A survey of schema versioning issues for database systems. Inf. Softw. Technol. 37(7), 383–393 (1995). https://doi.org/10.1016/0950-5849(95)91494-K

  45. Roddick, J.F., Craske, N.G., Richards, T.J.: A taxonomy for schema versioning based on the relational and entity relationship models. In: Proceedings of Twelfth International Conference on Entity-Relationship Approach, Springer-Verlag, Dallas, Texas, pp. 143–154 (1993). https://doi.org/10.1007/BFb0024363

  46. Sansone, S.A., Gonzalez-Beltran, A., Rocca-Serra, P., Alter, G., Grethe, J.S., Xu, H., Fore, I.M., Lyle, J., Gururaj, A.E., Chen, X., Kim, H., Zong, N., Li, Y., Liu, R., Ozyurt, I.B., Ohno-Machado, L.: Dats, the data tag suite to enable discoverability of datasets. Sci. Data 4, 170059 (2017)

  47. Schek, H.J., Scholl, M.: The relational model with relation-valued attributes. Inf. Syst. 11(2), 137–147 (1986)

  48. Schuler, R.E., Kesselman, C.: Towards an efficient and effective framework for the evolution of scientific databases. In: Proceedings of the 30th International Conference on Scientific and Statistical Database Management, ACM, New York, NY, USA, SSDBM ’18, pp. 27:1–27:4 (2018)

  49. Schuler, R.E., Kesselman, C.: A high-level user-oriented framework for database evolution. In: 31st International Conference on Scientific and Statistical Database Management (SSDBM ’19), ACM, New York, NY, USA, p. 12 (2019)

  50. Schuler, R.E., Kesselman, C., Czajkowski, K.: Digital asset management for heterogeneous biomedical data in an era of data-intensive science. In: 2014 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 588–592 (2014). https://doi.org/10.1109/BIBM.2014.6999226

  51. Schuler, R.E., Kesselman, C., Czajkowski, K.: Accelerating data-driven discovery with scientific asset management. In: The IEEE 12th International Conference on eScience, Baltimore, MD, USA, pp. 1–10 (2016)

  52. Schuler, R., Czajkowski, K., D’Arcy, M., Tangmunarunkit, H., Kesselman, C.: Towards co-evolution of data-centric ecosystems. In: 32nd International Conference on Scientific and Statistical Database Management (SSDBM ’20), ACM, New York, NY, USA, p. 12 (2020)

  53. Szalay, A.S., Kunszt, P.Z., Thakar, A., Gray, J., Slutz, D., Brunner, R.J.: Designing and mining multi-terabyte astronomy archives: the sloan digital sky survey. In: Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, ACM, New York, NY, USA, SIGMOD ’00, pp. 451–462 (2000). https://doi.org/10.1145/342009.335439

  54. Terwilliger, J.F., Bernstein, P.A., Unnithan, A.: Worry-free database upgrades: automated model-driven evolution of schemas and complex mappings. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, ACM, New York, NY, USA, SIGMOD ’10, pp. 1191–1194 (2010). https://doi.org/10.1145/1807167.1807316

  55. The PostgreSQL Global Development Group (2018) PostgreSQL 10.5 Documentation. The PostgreSQL Global Development Group. https://www.postgresql.org/docs/10/static/index.html

  56. Van Deursen, A., Klint, P., Visser, J.: Domain-specific languages: an annotated bibliography. ACM SIGPLAN Not. 35(6), 26–36 (2000)

  57. Vassiliadis, P.: A survey of extract-transform-load technology. Int. J. Data Warehous. Min. 5(3), 1–27 (2009)

  58. Vassiliadis, P., Zarras, A.V., Skoulis, I.: How is Life for a Table in an Evolving Relational Schema? Birth, Death and Everything in Between, pp. 453–466. Springer International Publishing, Cham (2015). https://doi.org/10.1007/978-3-319-25264-3_34

Author information

Corresponding author

Correspondence to Robert Schuler.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Approximating CHiSEL SMOs in SQL

First, the SMOs that take the form of conventional relational operators have clear equivalents in CHiSEL and SQL. For select, the CHiSEL expression takes the form t.select().where(...), where ‘t’ is a table name in the database and ‘...’ is a placeholder for the conditional formula, given in an optional where clause, that restricts the tuples. In SQL, it takes the form SELECT * FROM t WHERE .... In both languages, the cost of the operation per Rubric R2 is 1. project takes the similar forms t.select(a, b, c, ...) and SELECT a, b, c, ... FROM t, respectively, where “a, b, c, ...” are the column names of t to be projected. Per Rubric R3, these implicit projections incur a cost of 1. rename differs only slightly, as in t.select(b=a) and SELECT a AS b FROM t, respectively, and again per Rubric R3 incurs a cost of 1. join in CHiSEL is expressed as t.join(s).where(...), using ‘s’ as another named table in the database; in SQL it is expressed as SELECT * FROM t, s WHERE ... or SELECT * FROM t JOIN s ON ..., and per Rubric R2 these incur a cost of 1 in either language. union is expressed as expression1 + expression2 in CHiSEL or expression1 UNION expression2 in SQL, and again applying Rubric R2 both forms incur a cost of 1. Finally, distinct is applied implicitly within CHiSEL composite formulas, so the user does not need to invoke the operation directly, while certain SQL expressions require the DISTINCT keyword at a cost of 1 according to Rubric R2.
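The SQL forms above can be exercised directly. The following sketch, written in Python with its standard sqlite3 module, runs each conventional operator against a small in-memory SQLite database. The tables t and s, their columns, and their rows are hypothetical, SQLite merely stands in for whatever DBMS is actually targeted, and the CHiSEL forms appear only as comments for comparison.

```python
import sqlite3

# Hypothetical tables t and s in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER PRIMARY KEY, a TEXT, b TEXT);
    CREATE TABLE s (id INTEGER PRIMARY KEY, a TEXT, c TEXT);
    INSERT INTO t VALUES (1, 'x', 'red'), (2, 'y', 'blue'), (3, 'x', 'red');
    INSERT INTO s VALUES (10, 'x', 'alpha'), (11, 'z', 'beta');
""")


def run(sql):
    """Execute a query and return its rows as a list of tuples."""
    return conn.execute(sql).fetchall()


# select:  t.select().where(...)  ~  SELECT * FROM t WHERE ...
selected = run("SELECT * FROM t WHERE a = 'x'")
# project: t.select(a, b)         ~  SELECT a, b FROM t
projected = run("SELECT a, b FROM t")
# rename:  t.select(b=a)          ~  SELECT a AS b FROM t
renamed_cols = [d[0] for d in conn.execute("SELECT a AS b FROM t").description]
# join:    t.join(s).where(...)   ~  SELECT ... FROM t JOIN s ON ...
joined = run("SELECT t.id, s.id FROM t JOIN s ON t.a = s.a")
# union:   expr1 + expr2          ~  expr1 UNION expr2
unioned = run("SELECT a FROM t UNION SELECT a FROM s")
# distinct: implicit in CHiSEL composites, an explicit keyword in SQL
distinct = run("SELECT DISTINCT a, b FROM t")

print(sorted(selected), projected, renamed_cols, sorted(joined),
      sorted(unioned), sorted(distinct), sep="\n")
```

Each SQL statement here is a single query clause or keyword, which is why the rubric assigns these operators a cost of 1 in both languages.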

The similarityjoin, deduplicate, nest, and unnest operations are not directly exposed in the CHiSEL language, as they are only used internally by its composite operators. To approximate these operations in SQL, the first three would depend on a hypothetical but plausible SIMILAR(...) function that computes non-trivial similarity measures, going beyond the simple pattern matching of the LIKE comparison or the common regular expression operators available in many database management systems. While this function does not exist in standard SQL, many commercial and open source database management systems provide functions for string or word similarity comparisons based on edit distance or other fuzzy matching algorithms. Per Rubric R4, the use of such a function incurs a cost of 1. Thus similarityjoin could be expressed as:

[Figure: SQL expression approximating similarityjoin]

with a cost of R2 + R4 for a total cost of 2. deduplicate and nest, however, require inherently more complex expressions to approximate in SQL. deduplicate could be approximated in SQL using a similarity self-join, aggregating the resulting relation, and returning the first unique value, as in the following expression:

[Figure: SQL expression approximating deduplicate]

where the innermost expression costs 3 \(\times\) R2 + R3 + R4, for a subtotal of 5; the next innermost expression costs 2 \(\times\) R2 + R3 + R4, for a subtotal of 4; and the outer expression costs R2 + R3, for a subtotal of 2. The whole expression therefore has an overall cost of 11. Likewise, nest is essentially the same expression but with additional projections and aggregations of the nested elements in the inner expressions, such as:

[Figure: SQL expression approximating nest]

which adds one more aggregate function and, per R4, raises the total cost of the expression to 12. The unnest operation would require a function to unpack an array into separate tuples (e.g., the unnest function in PostgreSQL) and a built-in or user-defined function to convert unstructured input into an array (e.g., string_to_array, also from PostgreSQL, can convert delimited strings to arrays, but UDFs would be required to split messier values into discrete units), in an expression such as:

[Figure: SQL expression approximating unnest]

and, applying Rubrics R2, R3, R4, and R5, comes at a cost of 4. These are, of course, only approximations, because the same operations can often be written as different but equivalent expressions. We have attempted here to give a relatively straightforward implementation that one might use to achieve each SMO.
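The SIMILAR(...) function assumed above can be emulated for experimentation. The sketch below registers a Python callable as a SQLite function through sqlite3's create_function and then approximates similarityjoin, deduplicate, nest, and unnest. Every detail is an assumption rather than the paper's expression: the similarity test (difflib's SequenceMatcher ratio of trimmed, lowercased strings against an arbitrary 0.8 threshold), the tables t, s, and u, the choice of the smallest group member as the “first unique value”, group_concat as the nesting aggregate, and a recursive common table expression standing in for PostgreSQL's unnest and string_to_array, which SQLite lacks.

```python
import sqlite3
from difflib import SequenceMatcher


def similar(x, y, threshold=0.8):
    """Hypothetical stand-in for SIMILAR(...): returns 1 when the trimmed,
    lowercased strings have a SequenceMatcher ratio >= threshold, else 0."""
    if x is None or y is None:
        return 0
    a, b = x.strip().lower(), y.strip().lower()
    return int(SequenceMatcher(None, a, b).ratio() >= threshold)


conn = sqlite3.connect(":memory:")
conn.create_function("SIMILAR", 2, similar)
conn.executescript("""
    CREATE TABLE t (term TEXT);                -- messy input terms
    CREATE TABLE s (name TEXT);                -- canonical terms
    CREATE TABLE u (id INTEGER, tags TEXT);    -- delimited multi-values
    INSERT INTO t VALUES ('Kidney'), ('kidney'), ('Kidneys'), ('Liver'), ('liver ');
    INSERT INTO s VALUES ('kidney'), ('liver'), ('heart');
    INSERT INTO u VALUES (1, 'kidney, liver'), (2, 'heart');
""")

# similarityjoin: an ordinary join whose condition calls SIMILAR.
sim_join = conn.execute(
    "SELECT t.term, s.name FROM t JOIN s ON SIMILAR(t.term, s.name)"
).fetchall()

# Similarity self-join, grouped per term; MIN picks one representative.
GROUPS = """
    SELECT t1.term AS term, MIN(t2.term) AS rep
    FROM t AS t1 JOIN t AS t2 ON SIMILAR(t1.term, t2.term)
    GROUP BY t1.term
"""

# deduplicate: keep one distinct representative per similarity group.
deduped = conn.execute(f"SELECT DISTINCT rep FROM ({GROUPS})").fetchall()

# nest: same grouping, but also aggregate the members of each group.
nested = conn.execute(
    f"SELECT rep, group_concat(term, '|') FROM ({GROUPS}) GROUP BY rep"
).fetchall()

# unnest: split comma-delimited strings into one row per item.
unnested = conn.execute("""
    WITH RECURSIVE split(id, item, rest) AS (
        SELECT id, '', tags || ',' FROM u
        UNION ALL
        SELECT id,
               trim(substr(rest, 1, instr(rest, ',') - 1)),
               substr(rest, instr(rest, ',') + 1)
        FROM split WHERE rest <> ''
    )
    SELECT id, item FROM split WHERE item <> ''
""").fetchall()

print(sorted(sim_join))
print(sorted(deduped))
print(nested)
print(sorted(unnested))
```

Because similarity is not transitive in general, the groups formed by the self-join can overlap; taking the minimum member is only one simple policy for choosing a representative.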

Next, we turn our attention to the SMOs that in the CHiSEL formulation are “composites” of other operations. copycol is expressed by joining the target table t with the source table s and then projecting all of the columns t.* along with the addition of the desired column s.a from the source table s. We can use the following expression for this:

[Figure: SQL expression approximating copycol]

and it has a cost of 2 \(\times\) R2 + R3, for a total of 3. reify is expressed by forcing set semantics on a table t: the distinct columns a1, ..., aN are projected as the key columns of the new relation, along with a subset of desired columns b1, ..., bM from t, as in the following expression:

[Figure: SQL expression approximating reify]

and, by 2 \(\times\) R2 + R3, has a total cost of 3. \(\texttt {Reify}^{\texttt {Sub}}\) is expressed by projecting the key of a table t along with a subset of desired columns a1, ..., aN. To approximate this SMO, the expression would depend on a user-defined function KEY() that infers the primary key columns of the given table. The following example:

[Figure: SQL expression approximating \(\texttt {Reify}^{\texttt {Sub}}\)]

consists of R2 + R3 + R5, for a total cost of 3. To approximate align, assume a table t with a column aN whose values we wish to replace with a “canonical” term from column b of table s, based on a fuzzy similarity match between the target and canonical columns. To do so, we would perform a similarity join on t.aN and s.b, then project all of the columns of t except aN (for illustration, the preceding columns are written as a1 through aN_minus_1, although the formula need not be limited to such a sequence of columns) and project s.b in its place. This gives the following example expression:

[Figure: SQL expression approximating align]

and it requires 2 \(\times\) R2 + 2 \(\times\) R3 + R4 for a total cost of 5.
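The composite SMOs copycol, reify, and align can be sketched the same way. As before, this is an illustration under stated assumptions: the tables t, s, and vocab, the join column subject used by copycol, and the SIMILAR stand-in (repeated so that the block runs on its own) are all hypothetical. \(\texttt {Reify}^{\texttt {Sub}}\) is omitted because the KEY() function it relies on would have to be written against a particular system's catalog.

```python
import sqlite3
from difflib import SequenceMatcher


def similar(x, y, threshold=0.8):
    """Hypothetical stand-in for SIMILAR(...): returns 1 when the trimmed,
    lowercased strings have a SequenceMatcher ratio >= threshold, else 0."""
    if x is None or y is None:
        return 0
    a, b = x.strip().lower(), y.strip().lower()
    return int(SequenceMatcher(None, a, b).ratio() >= threshold)


conn = sqlite3.connect(":memory:")
conn.create_function("SIMILAR", 2, similar)
conn.executescript("""
    CREATE TABLE t (id INTEGER, subject TEXT, species TEXT, tissue TEXT);
    CREATE TABLE s (subject TEXT, age INTEGER);
    CREATE TABLE vocab (b TEXT);
    INSERT INTO t VALUES (1, 'S1', 'human', 'Kidney'),
                         (2, 'S1', 'human', 'liver '),
                         (3, 'S2', 'mouse', 'Kidneys');
    INSERT INTO s VALUES ('S1', 40), ('S2', 3);
    INSERT INTO vocab VALUES ('kidney'), ('liver'), ('heart');
""")

# copycol: join t with s, keep t.* and add the copied column s.age.
copycol = conn.execute(
    "SELECT t.*, s.age FROM t JOIN s ON t.subject = s.subject").fetchall()

# reify: set semantics over key column subject plus non-key column species.
reify = conn.execute("SELECT DISTINCT subject, species FROM t").fetchall()

# align: project t's other columns and put the similar vocab.b term
# in place of t.tissue.
align = conn.execute("""
    SELECT t.id, t.subject, t.species, vocab.b AS tissue
    FROM t JOIN vocab ON SIMILAR(t.tissue, vocab.b)
""").fetchall()

print(sorted(copycol))
print(sorted(reify))
print(sorted(align))
```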

The final set of SMOs can be formulated mostly from combinations of the expressions described above. atomize is a composite of unnest and \(\texttt {Reify}^{\texttt {Sub}}\), and combining their respective SQL expressions gives a combined cost of 7. domainify builds on deduplicate by deduplicating a single column and renaming it, which adds a cost per R3 and raises the total cost of the expression to 12. canonicalize begins with a sub-expression that projects a target column twice, such as SELECT a AS term, a AS synonyms FROM t, and then nests synonyms. Thus its cost is that of nest plus 1 per R3 for the renaming, bringing the total cost to 13. tagify is a composite of align and atomize, and combining their respective SQL expressions yields a total cost of 12.

Finally, we return to the cost evaluation for CHiSEL expressions beyond the basic operations evaluated above. In general, the CHiSEL syntax exposes methods corresponding to SMOs in idioms of its host language. For example, to atomize a column, the CHiSEL expression is t.columns['a'].to_atoms(). Similarly, domainify, canonicalize, align, and tagify are expressed through the methods to_domain(), to_vocabulary(), align(domain), and tagify(domain), respectively. Per R2, each of these operations comes at a cost of 1. reify and \(\texttt {Reify}^{\texttt {Sub}}\) are exposed via table-level methods: reify as in t.reify({ ... key columns ...}, { ... non-key columns ... }) and \(\texttt {Reify}^{\texttt {Sub}}\) as in t.reify_sub( ... non-key columns ... ). These expressions incur an additional cost for the projections per R3 and therefore have a total cost of 2 each. The costs of the CHiSEL expressions are summarized, along with the corresponding SQL expressions, in Table 5.
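The rubric arithmetic in this appendix can be checked mechanically. The sketch below assumes, as the stated totals imply, that each rubric item (R2 through R5) adds a cost of 1. It records the rubric counts given above for each SQL approximation, derives the composite SMOs by the sums described in the text, and places the results beside the CHiSEL method-call costs. The dictionary keys (for example, reify_sub for \(\texttt {Reify}^{\texttt {Sub}}\)) are illustrative names, not identifiers from the CHiSEL implementation.

```python
def cost(*parts):
    """Sum rubric items across (sub)expressions; each item costs 1."""
    return sum(sum(part.values()) for part in parts)


# Rubric counts for the SQL approximations, as stated in the appendix.
sql = {
    "similarityjoin": cost(dict(R2=1, R4=1)),
    "deduplicate": cost(dict(R2=3, R3=1, R4=1),   # innermost expression
                        dict(R2=2, R3=1, R4=1),   # next innermost expression
                        dict(R2=1, R3=1)),        # outer expression
    "unnest": cost(dict(R2=1, R3=1, R4=1, R5=1)),
    "copycol": cost(dict(R2=2, R3=1)),
    "reify": cost(dict(R2=2, R3=1)),
    "reify_sub": cost(dict(R2=1, R3=1, R5=1)),
    "align": cost(dict(R2=2, R3=2, R4=1)),
}
# Composites, built from the expressions above.
sql["nest"] = sql["deduplicate"] + 1                  # one more aggregate (R4)
sql["atomize"] = sql["unnest"] + sql["reify_sub"]
sql["domainify"] = sql["deduplicate"] + 1             # renaming (R3)
sql["canonicalize"] = sql["nest"] + 1                 # renaming (R3)
sql["tagify"] = sql["align"] + sql["atomize"]

# CHiSEL: one method call (R2); the reify variants add a projection (R3).
chisel = {name: cost(dict(R2=1))
          for name in ("atomize", "domainify", "canonicalize", "align", "tagify")}
chisel["reify"] = chisel["reify_sub"] = cost(dict(R2=1, R3=1))

for name in sorted(chisel):
    print(f"{name:<13} CHiSEL {chisel[name]:>2}   SQL {sql[name]:>2}")
```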

About this article

Cite this article

Schuler, R., Kesselman, C. CHiSEL: a user-oriented framework for simplifying database evolution. Distrib Parallel Databases 39, 483–543 (2021). https://doi.org/10.1007/s10619-020-07314-x

  • DOI: https://doi.org/10.1007/s10619-020-07314-x
