
Using constraints in Data Models

When talking to application developers about databases, I have encountered many long-lived myths that just don't seem to die. One of the most persistent is: constraints, especially foreign key constraints, are bad for performance, so use as few constraints as possible.
The contrary is true: use as many constraints as possible. Primary key constraints, foreign key constraints, check constraints, default constraints, unique constraints, NOT NULL constraints.
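As a sketch of what that looks like in practice, here is a hypothetical Orders table (all table and column names are made up for illustration; it assumes a dbo.Customers table exists) that uses every constraint type from the list:

CREATE TABLE dbo.Orders (
    OrderId     int         NOT NULL                          -- NOT NULL constraint
        CONSTRAINT PK_Orders PRIMARY KEY,                     -- primary key constraint
    CustomerId  int         NOT NULL
        CONSTRAINT FK_Orders_Customers
        REFERENCES dbo.Customers (CustomerId),                -- foreign key constraint
    OrderNumber varchar(20) NOT NULL
        CONSTRAINT UQ_Orders_OrderNumber UNIQUE,              -- unique constraint
    Quantity    int         NOT NULL
        CONSTRAINT CHK_Orders_Quantity CHECK (Quantity > 0),  -- check constraint
    OrderDate   datetime    NOT NULL
        CONSTRAINT DF_Orders_OrderDate DEFAULT (GETDATE())    -- default constraint
);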

For two main reasons:

  • Data integrity
  • (and, yes) Performance

Data Integrity

The more constraints you use, the more certain you can be that the data looks exactly as it was intended to look. The more check and key constraints you have, the fewer data inconsistencies you'll have. The additional effort this takes in the design and implementation phase is marginal compared to the effort that comes with data inconsistency issues.
I have never heard of expensive consultants and migration experts having to be hired because the data was too consistent. Implementing as much data integrity as possible in your data model will save you a lot of headaches and maintenance effort.
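To make this concrete, here is a minimal (hypothetical) example of a constraint stopping bad data at write time, before it can turn into an inconsistency someone has to clean up later:

CREATE TABLE dbo.Employees (
    EmployeeId int   NOT NULL CONSTRAINT PK_Employees PRIMARY KEY,
    Salary     money NOT NULL CONSTRAINT CHK_Employees_Salary CHECK (Salary >= 0)
);

-- This insert fails with a CHECK constraint violation:
INSERT INTO dbo.Employees (EmployeeId, Salary) VALUES (1, -500);

The negative salary never reaches the table; without the constraint it would sit there silently until some report or application chokes on it.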

Performance

Yes, there is a performance overhead in maintaining all these constraints, but it is mostly negligible. If you write code where the overhead of a foreign key constraint or a check constraint matters, you already have a database that is fast as hell.

On the contrary: the more constraints you have, the easier it is for the Query Optimizer to find the best execution plan for a query. Every constraint helps define the data and the relationships between the data, without the optimizer having to look at the data itself. The optimizer is thus more likely to find a good execution plan than without constraints. So constraints are more likely to make the database faster than slower.

Here is an example with a simple check constraint:

select
    t1.col1
    ,t2.col1
from
    t1
    left outer join t2 on t1.t2Id = t2.id and t2.col2 = 15

Col2 in table T2 only has values from 1 to 3. The optimizer doesn’t know that, so it creates a plan with a table scan on T2.

Let’s create a check constraint that tells the database that Col2 in T2 can only contain the values 1, 2 and 3.

ALTER TABLE dbo.t2 ADD CONSTRAINT CHK_t2_Col2
CHECK (Col2 >= 1 AND Col2 <= 3)

Now let’s run the same statement again. And, surprise, the execution plan looks completely different.

The optimizer now knows that T2.Col2 = 15 will never ever bring any result back, so it doesn’t even bother to include the table T2 in the execution plan. The execution plan is now far more efficient.
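One caveat worth knowing: SQL Server only uses a check or foreign key constraint for this kind of plan simplification if the constraint is trusted. A constraint created or re-enabled WITH NOCHECK is marked untrusted, and the optimizer ignores it. You can check and fix this in the catalog (using the CHK_t2_Col2 constraint from above):

-- Is the constraint trusted? (is_not_trusted = 0 means yes)
SELECT name, is_not_trusted
FROM sys.check_constraints
WHERE parent_object_id = OBJECT_ID('dbo.t2');

-- Re-validate all existing rows and make the constraint trusted again:
ALTER TABLE dbo.t2 WITH CHECK CHECK CONSTRAINT CHK_t2_Col2;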

This is just a simple example, but it goes to show:

The more the Query Optimizer knows about the data, the better the execution plan will be.

And the way to tell SQL Server about the data is to create constraints. Thus:

Create as many constraints as possible!
