Encryption Types

Next up

Technical Description

Continuing in

SQL Server Always Encrypted

This is a preview of subscription content

Your browser needs to be JavaScript capable to view this video

Try reloading this page, or reviewing your browser settings

Autoplay: Autoplay Off Autoplay On

View previous videoPrevious video View next video

This video segment demonstrates the difference between the 2 types of column encryption supported by SQL Server Always Encrypted, randomized and deterministic. It then offers advice on how to make a choice when encrypting a column.

Keywords

Always Encrypted
Encryption Types
Randomized
Deterministic

About this video

Author(s): Eric Blinn
First online: 27 October 2019
DOI: https://doi.org/10.1007/978-1-4842-5565-0_2
Online ISBN: 978-1-4842-5565-0
Publisher: Apress
Copyright information: © Eric Blinn 2019

Video Transcript

Hello and welcome. In this segment, we’re going to look at SQL Server Always Encrypted encryption types. The feature affords us two distinct methods by which to encrypt columns. Coupled with plain text or no encryption, we have three possibilities for each call. It’s important that we understand the difference between the types and the pros and cons of each so that we can make a smart decision when encrypting a column, as we’ll be making this decision on a per column basis.

The two types of encryption are called deterministic and randomized. I’ve plotted them on this small chart from plain text as the least secure through randomized as the most secure. Deterministic isn’t really in the middle. It’s certainly closer to the security of randomized than it is plain text. So it being drawn off center is no accident.

Before this slide is over, there’s going to be a lot of text on the screen. I’m not planning on reading it all. And I would encourage you to pause the video and read it if you feel like there is something in there you need to see. The first distinction we’ll look at is that certainly columns that do not contain sensitive data would likely be stored in plain text, while those containing sensitive data would choose one of the two encryption methods.

The next big difference is the ability to search the columns. A column encrypted with deterministic encryption may be searched, while one with randomized encryption will not be able to be searched in any way. The searching of a deterministic encrypted column allows for only exact match searches. This would be something with an equal sign or an in clause perhaps but not a like or a less than or between, nothing like that. For that reason, it would also be preferred for primary or foreign key columns. If a column needs to be searched completely, it has to be left in plain text.

Randomized is also preferred for columns with low cardinality. That is columns with a small universe of values. We’ll look at an example of that in a moment. There’s a series of data types that are not eligible to be encrypted. And those all must be done in plain text if they’re to stay. There are also a series of properties about columns that make them ineligible for encryption.

So those two types of encryption are called deterministic and randomized. Deterministic is a repeatable encryption. This means that when a cryptographic operation takes place, any given input value will consistently return the same cipher text value. With randomized encryption, this is not true. You could give it the same input value again and again and it will consistently get different cipher text values. That makes for the stronger encryption.

On the plus side, a deterministic encrypted column is indexable. This is precisely because it is repeatable. A client machine can encrypt a value to be searched, send that Cipher text to the server, and ask SQL Server to find the cipher text without SQL Server ever knowing what the underlying value is being searched. With randomized encryption, any search of the column, even an exact match, will result in an error.

On the downside for a column encrypted with deterministic encryption, it is much less secure on a column of low cardinality. That is a column with a small universe of possible values. A good example of this might be a gender column. There are probably only two values available for that column. And even if the application is more liberal and allowed more options, you would still expect the bulk of the data to fall into one of two values.

Let’s see some examples. In this table, we have a list of orders and each order is assigned to a person. And we’ve assigned it to them using their government ID. And I’ve made these look like American social security numbers. Because the deterministic encryption always spits out the same value, we can see that this person has three orders. And each of their three orders gets the exact same cipher text. Even though SQL Server has no idea what the underlying value is and even though someone without access to the appropriate keys doesn’t know what that is, they can tell that that person has three orders. We don’t know who it is, but we know that they have three.

We can also sort by that value, making the data looks something like that. This is sorted by the government ID cipher text. It’s very easy to imagine that an application could take that value, 123-45-6789, using the repeatable encryption, they can turn that into the 0x1352 cipher text and ask SQL Server to find 0x1352, which it would do very quickly.

Notice though that even though this is an index on government ID, they’re not sorted by the plain text. So it can help us find this exact match value. But if we ask it to do something like a value that was like 1% or something that was less than nine, anything like that that would be considered an inequality search, this index is not helpful to us. It is only helpful for an exact match search, meaning an equal sign or an in type of value.

What would this table look like with randomized encryption? Here, we have the same table with the same orders and the same government IDs. But when we look at our repeated input values, each one gets a different and distinct cipher text. Now someone looking at this cipher text can’t even determine that one person had three orders. But now any search run against this government ID column, exact match or otherwise, will result in an error, as the client given an input plain text value cannot possibly determine every randomized value available.

That example shows a higher cardinality table. Let’s look at a lower cardinality example. Here, we have two copies of the same table. On the left, we have encrypted the gender column using deterministic encryption and on the right randomized. On the left, we can see that each person who has a gender M receives the exact same 0x29 cipher text. And everyone with an F gets the 0xc63 cipher text.

It would not take someone looking at that cipher text very long to realize that one of those values was M and one of those values was F. And if they were to crack even one row by finding something else about that value that suggests which one is M or F, they could then figure out every single value. On the right, however, every M and every F gets a separate cipher text. Even if they crack a row, they’re no closer to figuring out anybody else’s value. That’s why for a table of low cardinality, randomized encryption is strongly encouraged.

So how do we choose? Randomized should really be the value chosen for most columns. Deterministic should only be chosen on a column that meets these requirements. It’s going to be searched and an index would be desirable. Those searches will be exact matches not inequality or range based. And if there’s enough cardinality in the table that the security of the column is not compromised.

The previous example of a government ID or social security number is a textbook example for deterministic encryption. It’s a column we would expect to be commonly searched. It is likely to be searched as an exact match. And there should be enough unique values to keep the column secure. The type of encryption chosen for any one column is a one time decision. Beyond the ability to index and search deterministic column and not search a randomized column, there are no further downstream considerations to be made.