Abstract
Supporting database schema evolution represents a long-standing challenge of practical and theoretical importance for modern information systems. In this paper, we describe techniques and systems for automating the critical tasks of migrating the database and rewriting the legacy applications. In addition to labor saving, the benefits delivered by these advances are many and include reliable prediction of outcome, minimization of downtime, system-produced documentation, and support for archiving, historical queries, and provenance. The PRISM/PRISM++ system delivers these benefits, by solving the difficult problem of automating the migration of databases and the rewriting of queries and updates. In this paper, we present the PRISM/PRISM++ system and the novel technology that made it possible. In particular, we focus on the difficult and previously unsolved problem of supporting legacy queries and updates under schema and integrity constraints evolution. The PRISM/PRISM++ approach consists in providing the users with a set of SQL-based Schema Modification Operators (SMOs), which describe how the tables in the old schema are modified into those in the new schema. In order to support updates, SMOs are extended with integrity constraints modification operators. By using recent results on schema mapping, the paper (i) characterizes the impact on integrity constraints of structural schema changes, (ii) devises representations that enable the rewriting of updates, and (iii) develop a unified approach for query and update rewriting under constraints. We complement the system with two novel tools: the first automatically collects and provides statistics on schema evolution histories, whereas the second derives equivalent sequences of SMOs from the migration scripts that were used for schema upgrades. These tools were used to produce an extensive testbed containing 15 evolution histories of scientific databases and web information systems, providing over 100 years of aggregate evolution histories and almost 2,000 schema evolution steps.
Similar content being viewed by others
Notes
From the SVN commit 5552 accessible at: http://svn.wikimedia.org/viewvc/mediawiki?view=rev&revision=5552.
PRISM is an acronym for Panta Rhei Information & Schema Manager—‘Panta Rhei’ (Everything is in flux) is often credited to Heraclitus. The project homepage is as follows: http://yellowstone.cs.ucla.edu/schema-evolution/index.php/Prism.
See Ensembl CVS repository at: http://tinyurl.com/ensembl-schema.
An \(exon\) is a nucleic acid sequence related to a portion of DNA.
This information is derived from the CVS logs and from the SQL used for data migration.
These are simple equality assertions about the value of a column and constants, supported by the SQL DDL.
Note that we apply the definition only for the case when \(M\) is a functional mapping. This suffices in our context since we force evolution operators to be invertible (as explained below). In general, however, classical schema mappings [34] may associate several possible \(S_2\)-instances with a given \(S_1\)-instance.
This process is semi-automatic, and the user is guided by the system in the selection –at evolution time, not at query rewriting time– of the inverse for each SMO [23].
Note that some of the updates will now fail due to the stricter constraints. This is unavoidable to maintain the DB instance \(I_2\) consistent with \(IC_2\) and is in general well-accepted consequence of tightening constraints.
This policy is only available for rewriting purposes, that is, for inverses of ICMOs, since the use for data migration would lead to an inconsistent DB instance: \(I_2 \not \models IC_2\).
Note that the evolution is information preserving: (forward) thanks to the primary key on id, and (inverse) since the system automatically declares the integrity constraints valid in the output Footnote 14 continued
schema (two primary keys on the id columns, and two cross foreign keys).
We release the two datasets at: http://db.csail.mit.edu/wikipedia/ and http://db.csail.mit.edu/ensembldb/.
References
http://www.oracle.com/us/products/enterprise-manager/change-management-pack-11g-ds-068451.pdf
Abiteboul, S., Duschka, O.M.: Complexity of answering queries using materialized views. In: PODS, pp. 254–263 (1998)
Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases. Addison Wesley, Reading (1995)
Afrati, F.N., Kolaitis, P.G.: Repair checking in inconsistent databases: Algorithms and complexity. In: ICDT, pp. 31–41 (2009)
Arenas, M., Bertossi, L., Chomicki, J.: Consistent query answers in inconsistent databases. In: PODS, pp. 68–79 (1999)
Bancilhon, F., Spyratos, N.: Update semantics of relational views. ACM Trans. Database Syst. 6(4), 557–575 (1981)
Bernstein, P.A.: Applying model management to classical meta data problems. In: CIDR (2003)
Bernstein, P.A., Green, T.J., Melnik, S., Nash, A.: Implementing mapping composition. VLDB J. 17(2), 333–353 (2008)
Bohannon, A., Pierce, B.C., Vaughan, J.A.: Relational lenses: A language for updatable views. In: PODS, pp. 338–347 (2006)
Cleve A., Hainaut, J.-L.: Co-transformations in database applications evolution. In: GTTSE, pp. 409–421 (2006)
Curino, C., Ham, M., Moroni, F., Zaniolo, C.: Pantha rei data set. http://data.schemaevolution.org/ (2009)
Curino, C., Moon, H.J., Deutsch, A., Zaniolo, C.: Update rewriting and integrity constraint maintenance in a schema evolution support system: Prism++. PVLDB 4(2), 117–128 (2010)
Curino, C., Moon, H.J., Ham, M., Zaniolo, C.: The prism workbench: Database schema evolution without tears. In: ICDE (2009)
Curino, C., Moon, H.J., Tanca, L., Zaniolo, C.: Schema evolution in Wikipedia: Toward a web information system benchmark. ICEIS (2008)
Curino, C., Moon, H.J., Zaniolo, C.: Graceful database schema evolution: The prism workbench. PVLDB 1(1), 761–772 (2008)
Curino, C., Moon, H.J.: C. Zaniolo. Managing the history of metadata in support for db archiving and schema evolution. In: ECDM (2008)
Dayal, U., Bernstein, P.A.: On the correct translation of update operations on relational views. ACM Trans. Database Syst. 7(3), 381–416 (1982)
Deutsch, A., Nash, A., Remmel, J.: The chase revisited. In: PODS, pp. 149–158 (2008)
Deutsch, A., Tannen, V.: Mars: A system for publishing xml from mixed and redundant storage. In: VLDB, pp. 201–212 (2003)
Ensembl development team. Ensembl Genetic DB http://www.ensembl.org, 2009 (Online)
Fagin, R.: Inverting schema mappings. ACM Trans. Database Syst. 32(4), 25:1–25:51 (2007)
Fagin, R., Kolaitis, P.G., Popa, L., Tan, W.-C.: Composing schema mappings: Second-order dependencies to the rescue. ACM Trans. Database Syst. 30(4), 994–1055 (2005)
Fagin, R., Kolaitis, P.G., Popa, L., Tan, W.-C.: Quasi-inverses of schema mappings. In: PODS, pp. 123–132 (2007)
Fagin, R., Kolaitis, P.G., Popa, L., Tan, W. C.: Reverse data exchange: Coping with nulls. In: PODS, pp. 23–32 (2009)
Hartung, M., Terwilliger, J.F., Rahm, E.: Recent advances in schema and ontology evolution. In: Schema Matching and Mapping, pp. 149–190 (2011)
Hernández, M.A., Miller, R.J., Haas, L.M.: Clio: A semi-automatic tool for schema mapping. In: SIGMOD, p. 607 (2001)
Hick, J.-M., Hainaut, J.-L.: Database application evolution: A transformational approach. Data Knowl. Eng. 59(3), 534–558 (2006)
Hull, R.: Non-finite specifiability of projections of functional dependency families. Theor. Comput. Sci. 39, 239–265 (1985)
Kotidis, Y., Srivastava, D., Velegrakis, Y.: Updates through views: A new hope. In: ICDE, p. 2 (2006)
Lenzerini, M.: Data integration: A theoretical perspective. In: PODS, pp. 233–246 (2002)
Liu, Z., He, B., Hsiao, H.-I., Chen, Y.: Efficient and scalable data evolution with column oriented databases. In: EDBT (2011)
Madhavan, J., Halevy, A.Y.: Composing mappings among data sources. In: VLDB, pp. 572–583 (2003)
Melnik, S., Rahm, E., Bernstein, P.A.: Rondo: A programming platform for generic model management. In: SIGMOD (2003)
Miller, R.J., Ioannidis, Y.E., Ramakrishnan, R.: The use of information capacity in schema integration and translation. In: VLDB, pp. 120–133 (1993)
Miller, R.J., Ioannidis, Y.E., Ramakrishnan, R.: Schema equivalence in heterogeneous systems: Bridging theory and practice. Inf. Syst. 19(1), 3–31 (1994)
Moon, H.J., Curino, C., Deutsch, A., Hou, C.-Y., Zaniolo, C.: Managing and querying transaction-time databases under schema evolution. PVLDB 1(1), 882–895 (2008)
Moon, H.J., Curino, C., Zaniolo, C.: Scalable architecture and query optimization for transaction-time dbs with evolving schemas. In: SIGMOD Conference, pp. 207–218 (2010)
Moroni, F.: Schema Evolution Toolsuite: Analysis and Interpretation of Relational Schema Changes. Master’s thesis, Politecnico di Milano—Dipartimento di Elettronica e Informazione (2009)
Papastefanatos, G., Vassiliadis, P., Simitsis, A., Vassiliou, Y.: Hecataeus: Regulating schema evolution. In: ICDE, pp. 1181–1184, March (2010)
Qian, L., LeFevre, K., Jagadish, H.V.: Crius: User-friendly database design. PVLDB 4(2), 81–92 (2010)
Ra, Y.-G.: Relational schema evolution for program independency. In: Proceedings of the 7th international conference on Intelligent Information Technology, pp 273–281, Springer, Heidelberg (2004). doi:10.1007/978-3-540-30561-3_29
Terwilliger, J.F., Bernstein, P.A., Unnithan, A.: Worry-free database upgrades: Automated model-driven evolution of schemas and complex mappings. In: SIGMOD Conference (2010)
Terwilliger, J.F., Fernández-Moctezuma, R., Delcambre, L.M.L., Maier, D.: Support for schema evolution in data stream management systems. J. UCS 16(20), 3073–3101 (2010)
Ullman, J.: Principles of Database System. Computer Science Press, Rockville (1982)
Ullman, J.D.: Information integration using logical views. Theor. Comput. Sci. 239(2), 189–210 (2000)
Velegrakis, Y., Miller, R.J., Popa, L.: Mapping adaptation under evolving schemas. In: VLDB, pp. 584–595 (2003)
Wikimedia Foundation. Wikipedia, the free encyclopedia http://en.wikipedia.org/, 2007 (Online)
Wikimedia Foundation. The mediawiki http://www.mediawiki.org, 2008
Yu, C., Popa, L.: Semantic adaptation of schema mappings when schemas evolve. In: VLDB, pp. 1006–1017 (2005)
Acknowledgments
The authors would like to thank Fabrizio Moroni, MyungWon Ham for their help in developing the tool, and Letizia Tanca for the great feedback and support.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Curino, C., Moon, H.J., Deutsch, A. et al. Automating the database schema evolution process. The VLDB Journal 22, 73–98 (2013). https://doi.org/10.1007/s00778-012-0302-x
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00778-012-0302-x