Abstract
In every data integration project, there is a risk that datasets will change their properties. Columns may be added or removed, data types may change, or the quality of the data itself may shift. The technical name for this is “Schema Evolution,” sometimes known as “Schema Drift,” and whether new columns arrive or known columns drop off, how these situations are handled can have a huge effect on the success of the project. At a basic level, you need to be able to detect and react when a dataset's schema has evolved, and with the vast number of file and database types available, this task is becoming more complex. You need to detect changes not only in tabular data (CSV files, database extracts) but also in semi-structured datasets such as JSON and XML. Building on this basic concept, you need to handle schema drift so that you can continue to integrate the data without managing multiple extraction methods for the same type of data. This may be a manual process to begin with, but there are now tools that can handle schema evolution automatically. As you begin to write ingestion procedures, remember that maintaining these schemas as they evolve needs to be simple. If you reach a point where you are ingesting more than 20 different files or datasets, you do not want to have to visit each script to update its schema. Instead, you need a centralized schema store so that updates can be made easily and in a controlled way.
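As a rough illustration of the two ideas in the abstract (detecting drift and keeping schemas in one central place), the following minimal Python sketch compares the header of an incoming CSV extract against an expected column list. The names `SCHEMA_STORE`, `read_header`, and `detect_drift`, the `customers` dataset, and the sample data are all hypothetical and are not taken from the chapter; a real schema store would more likely be a database table or metadata file than an in-memory dictionary.

```python
import csv
import io

# Hypothetical central schema store: one entry per dataset, held in one place
# so that ingestion scripts never hard-code their own column lists.
SCHEMA_STORE = {
    "customers": ["id", "name", "email"],
}


def read_header(csv_text):
    """Return the column names from the first row of a CSV document."""
    return next(csv.reader(io.StringIO(csv_text)))


def detect_drift(dataset, observed_columns, store=SCHEMA_STORE):
    """Compare observed columns with the stored schema for a dataset."""
    expected = store[dataset]
    added = [c for c in observed_columns if c not in expected]
    dropped = [c for c in expected if c not in observed_columns]
    return {"added": added, "dropped": dropped}


# Illustrative extract in which 'email' has dropped off and 'phone' has arrived.
sample = "id,name,phone\n1,Ada,555-0100\n"
drift = detect_drift("customers", read_header(sample))
print(drift)  # {'added': ['phone'], 'dropped': ['email']}
```

Because every script looks up its expected columns in the same store, accepting a schema change means updating one entry rather than editing each ingestion script. This sketch only compares column names; data type or quality changes would need additional checks.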
Copyright information
© 2020 Matt How
About this chapter
Cite this chapter
How, M. (2020). The Role of the Data Contract. In: The Modern Data Warehouse in Azure. Apress, Berkeley, CA. https://doi.org/10.1007/978-1-4842-5823-1_6
DOI: https://doi.org/10.1007/978-1-4842-5823-1_6
Publisher Name: Apress, Berkeley, CA
Print ISBN: 978-1-4842-5822-4
Online ISBN: 978-1-4842-5823-1
eBook Packages: Professional and Applied Computing, Apress Access Books, Professional and Applied Computing (R0)