Monthly Archives: June 2012

Bit columns

I am not a big friend of the bit data type. It’s one of these things that seem useful and helpful, but in the end they cause trouble, or at least annoyance. I’m not saying that they shouldn’t be used, but I think one should be aware what that entails.
Like integer, bit is a very simple data type, so there is no problem with the processing time. It only takes up one byte, which is the least any data type can use, so I’m also okay with that (kind of. Why does something that is actually just a bit, even is called a bit, need a byte?). But that’s about it. Everything else about them (or their usage) annoys me.

Nullable bit columns

What’s the sense of that? A bit column makes sense as a flag, defining if a state is true or false. If a third state is needed, like ‘undefined’ or ‘who cares’ use another data type (see further down).

Bit columns without a default

I admit, sometimes the state of something isn’t known or doesn’t matter at insert time. To my understanding this is a third state (see above), but let’s for now assume it is not and we don’t want to include the column in every insert where it is not necessary and we actually don’t know, if the state is true. Then please define a default of 0 = false and make the column NOT NULL.
Just because you’re too lazy to write the column into every insert statement, doesn’t mean it has to be NULL.

Not 0 is 1

Everything you insert into a bit column, that is not 0 (zero) is automatically 1 (true). So even if you insert -42, it will be 1. That’s fine, and part of the data type, but it’s contrary to some other programming languages.
Let’s try this:

declare @a bit = -100
select @a as bitVariable
select case when @a = -100 then 'True' else 'False' end bitVarCheck


So it sets -100 as TRUE, but an equality check on -100 as FALSE. That’s inconsistent. I wouldn’t expect the second query to return TRUE, but I would also expect everything except TRUE being FALSE when inserting data, and not the other way round. Then again it is called bit and not boolean, so the behaviour is actually correct (0 = no value, 1 = (any) value). But still.

And on a related topic:

select * from tablename where bitColumn <> 0;

There are only two possible values. So if it is not 0 it is 1. It’s the same as:

select * from tablename where bitColumn = 1;

Equality operators are always more exact and need less processor time than inequality operators. And they help the query optimizer to decide on its plan faster. I’ll admit that the difference might be minuscule in such a case, but still: do it right. For your karma.

Indexes on bit columns

Again, bit columns only hold two possible values, so an index on a bit column gives you a diversification of data of 2. Most likely the break down will be something like 60-40, 70-30. This index will never be chosen because it returns too many rows. Unless you have a break down of 95-5 and always query on the 5% value, an index on a bit column makes no sense.
A bit column in an index can make sense as

  • A part of a multiple column index with columns with larger diversification before it
  • A where –clause in a filtered index as an additional criteria to keep the index small.
  • The query optimizer sometimes even suggests an index on a bit column. But that’s just because it doesn’t know the actual statistics on the column. So don’t create indexes on bit columns, unless you only query the 5%.

    Data conversion

    This is my biggest issue with bit columns: Almost every query results in a data conversion.

    select * from tablename where bitColumn = 1;
    
    insert into	
    	dbo.tablename
    	(intColumn
    	,nvarcharColumn
    	,bitColumn)
    values
    	(1
    	,N'1'
    	,1)
    
    update
    	dbo.tablename
    set
    	bitColumn = 1
    where
    	intColumn > 10
    
    select 
    	case when bitColumn = 1 then 'True' else 'False' end
    from 
    	dbo.tablename

    This is how we would write those queries, right? In stored procedures or in views, wherever. Each of these queries results in forcing SQL Server to do data conversions. As in most (all?) other programming languages the default for undefined whole numbers is integer. So all the 1s are interpreted as integers. Only when the actual plan is compiled SQL Server realises they are actually meant to be bit and has to convert them.
    In the above cases this is one conversion per statement and converting integer to bit is not the most costly conversion. But still.

    select * from tablename where bitColumn = cast(1 as bit);
    
    select 
    	case when bitColumn = cast(1 as bit) then 'True' else 'False' end
    from 
    	dbo.tablename

    Database Bit in other programming languages

    Every programming language interprets a database bit differently. Some map it to true and false, some return it as 1 and 0. Some return it as byte, some as boolean.
    I know, once you mapped it, you know and can use it accordingly. But still.

    Alternative to bit

    What I like to use instead of bit columns is tinyint with a check constraint that only allows values 0 and 1.
    The advantages are obvious:

  • Integers are just as simple as bits
  • They also need a byte for storage but can actually store eight bits in it, not just one
  • If for some reason at some point in my development I need a third state or even more, I don’t need to change the data type of the column or any of my queries. I simply change the check constraint.
  • The queries I write won’t cause data conversion, because intColumn = 1 is really meant to be an integer.
  • The check constraint will give an error if I try to insert 100 or -42. Which is more consistent.
  • %d bloggers like this: