
The Role of the Data Contract

The Modern Data Warehouse in Azure

Abstract

Every data integration project has to contend with datasets changing their properties: columns being added or removed, data types changing, or even the level of data quality shifting. The technical name for this is “Schema Evolution,” sometimes known as Schema Drift, and whether it takes the form of new columns arriving or known columns dropping off, how these situations are handled can have a huge effect on the success of the project. At a basic level, you need to be able to detect and react when a dataset's schema has evolved, and with the vast number of file and database types available, this task is becoming more complex. You need to detect changes not only in tabular data (CSV files, database extracts) but also in semi-structured datasets such as JSON and XML. Beyond detection, you need to handle the schema drift so that you can continue to integrate the data without having to maintain multiple extraction methods for the same type of data. This may be manual to begin with, but tools now exist that can handle schema evolution automatically. As you begin to write ingestion procedures, remember that maintaining these schemas as they evolve needs to be simple. Once you are ingesting more than 20 different files or datasets, you do not want to have to visit each script to update its schema. Instead, you need a centralized schema store so that updates can be made easily and in a controlled way.
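To make the idea of detecting drift against a centralized schema store concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the chapter's own implementation: the `SCHEMA_STORE` dictionary, the `DriftReport` class, and the `infer_schema` and `detect_drift` functions are all hypothetical names, and the "store" is just an in-memory dictionary standing in for whatever central location a real project would use. Each record is a plain dictionary, as you would get from a parsed JSON object or a CSV row.

```python
from dataclasses import dataclass

# Hypothetical central schema store: dataset name -> {column name: type name}.
# In practice this would live in one shared place (a table, a file), not in each script.
SCHEMA_STORE = {
    "sales": {"order_id": "int", "amount": "float", "region": "str"},
}


@dataclass
class DriftReport:
    added: dict    # columns in the data but not in the stored schema
    removed: dict  # columns in the stored schema but missing from the data
    retyped: dict  # columns whose type changed: name -> (expected, observed)

    @property
    def has_drift(self) -> bool:
        return bool(self.added or self.removed or self.retyped)


def infer_schema(record: dict) -> dict:
    """Infer a flat schema from one record using Python type names."""
    return {name: type(value).__name__ for name, value in record.items()}


def detect_drift(dataset: str, record: dict, store: dict = SCHEMA_STORE) -> DriftReport:
    """Compare a record's inferred schema with the stored schema for a dataset."""
    expected = store[dataset]
    observed = infer_schema(record)
    added = {c: t for c, t in observed.items() if c not in expected}
    removed = {c: t for c, t in expected.items() if c not in observed}
    retyped = {
        c: (expected[c], observed[c])
        for c in expected.keys() & observed.keys()
        if expected[c] != observed[c]
    }
    return DriftReport(added, removed, retyped)


# Example: a new column arrives, a known column drops off, and a type changes.
report = detect_drift("sales", {"order_id": 1, "amount": "9.99", "currency": "GBP"})
print(report)
```

Because every ingestion script would look up its expected schema in the same store, a schema change is recorded once rather than in each script. The sketch only handles flat records; nested JSON or XML would need a recursive comparison.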


Copyright information

© 2020 Matt How

Cite this chapter

How, M. (2020). The Role of the Data Contract. In: The Modern Data Warehouse in Azure. Apress, Berkeley, CA. https://doi.org/10.1007/978-1-4842-5823-1_6
