select… into…

A pattern I see used frequently, especially in conjunction with temporary tables, is this:

select
   ID
  ,Col1
  ,Col2
  ,Col3
into
  dbo.tab2
from
  dbo.tab1
where
  ID > 100

This syntax automatically creates a table matching the result set and inserts the records of the result set into it. It is fast and efficient. And it can be really handy, especially in complex stored procedures or scripts, where it stores the data of one complex query for use in the next.
What makes it so popular is that it can create not only permanent tables, but also temporary tables:

select
   ID
  ,Col1
  ,Col2
  ,Col3
into
  #tmpTab
from
  dbo.tab1
where
  ID > 100

You need not define or create the table beforehand; in fact, it must not exist beforehand.
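The flip side: if the target table does already exist, the statement simply fails. Run the select above a second time while #tmpTab is still around, and it aborts with an error along the lines of:

select ID into #tmpTab from dbo.tab1  -- fails, #tmpTab already exists:
-- Msg 2714: There is already an object named '#tmpTab' in the database.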
This is all very neat, and sometimes I use it myself, but only sometimes, and never in stored procedures; more likely in some ad-hoc scripts.

And here is why:

Schema Locks

select… into… always takes a schema lock on the table it is selecting from. Even with (nolock), it needs a schema stability lock on the source to read the schema for the table it creates. A schema lock may not sound too bad, and by itself it is not that big a deal.
But the problem is not so much the lock itself or what it does, but what it takes to get it. Imagine an OLTP system under high data load, with a table receiving approx. 1,000 inserts per second. And you have a select… into… from this table, possibly joined with two or three other tables, and your statement, even with the nolock hint, needs to be able to place a schema lock on all four tables at once.
This means the query has to wait for a window in which it can actually place these locks. Sometimes that is nanoseconds, sometimes a few milliseconds, and in the worst case, although unlikely, even seconds.
While this may not seem like much, it can add up to quite a lot in the long run. And what you get in any case are wildly varying, unreliable execution times.
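If you want to see this waiting in action, here is a minimal sketch, assuming the dbo.tab1 from the examples above (ExtraCol and #tmpLockDemo are just made-up names): hold a schema modification lock in one session, and the select… into… in the other session blocks, nolock hint or not.

-- Session 1: take a schema modification (Sch-M) lock on dbo.tab1
-- and hold it by leaving the transaction open
begin tran
alter table dbo.tab1 add ExtraCol int null

-- Session 2: blocks until session 1 commits or rolls back, despite (nolock),
-- because select… into… still needs a schema stability (Sch-S) lock
select
   ID
into
  #tmpLockDemo
from
  dbo.tab1 with (nolock)

-- Session 1: release the lock again
rollback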

Crappy Schema

The created table derives its schema from the selected data. So, for instance, if the longest string value selected for column X is 4 characters, the column will be varchar(4). That might be OK at first, but when I later need room for longer values, I am in trouble.
Example:

select
   1      as ID
  ,'XXX'    as Col1
  ,getdate()  as Col2
  ,200.23   as Col3
  ,N'djjdh' as Col4
into
  tabInto

This gives you the following schema:

CREATE TABLE dbo.tabInto
  (ID int NOT NULL
  ,Col1 varchar(3) NOT NULL
  ,Col2 datetime NOT NULL
  ,Col3 numeric(5, 2) NOT NULL
  ,Col4 nvarchar(5) NOT NULL)

A heap with no primary key, every column exactly as wide as the longest value in the data, and every column NOT NULL, just because there happened to be no nulls in the data.
Of course you can create your keys and indexes afterwards, but why make the engine work twice to end up with a decent table? And we all know what widening NOT NULL columns does to a table (it basically tears it apart).
And if you just leave it as the heap it is, the Query Optimizer will have a hard time working with it.
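The alternative is cheap: create the table yourself and fill it with a plain insert… select…. Here is a sketch of what the table from the example could look like instead; the column sizes, the nullability and the key are my own assumptions about the data, chosen to leave room to grow:

CREATE TABLE dbo.tabInto
  (ID int NOT NULL CONSTRAINT PK_tabInto PRIMARY KEY
  ,Col1 varchar(50) NOT NULL
  ,Col2 datetime NOT NULL
  ,Col3 numeric(9, 2) NOT NULL
  ,Col4 nvarchar(50) NULL)

insert into dbo.tabInto (ID, Col1, Col2, Col3, Col4)
select
   1      as ID
  ,'XXX'    as Col1
  ,getdate()  as Col2
  ,200.23   as Col3
  ,N'djjdh' as Col4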

And speaking of execution plans:

Non-Reusable Execution Plans

If you have a CREATE TABLE command in your stored procedure, which you implicitly have with select… into…, the procedure has to be recompiled for every execution. It will never get a reusable execution plan, which increases the execution time of the procedure immensely. And it does not matter whether it is a real table or a temporary table; they are treated the same way.

Whereas if you use a well-defined table variable and insert… select… into it, you will get a reusable execution plan for your procedure. And if you define all the data types correctly and use constraints where possible, you will also get the best possible execution plan for your data.
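A minimal sketch of that pattern, based on the first example; the data types and the primary key are, again, my assumptions about the data:

declare @tab table
  (ID int NOT NULL PRIMARY KEY
  ,Col1 varchar(50) NOT NULL
  ,Col2 datetime NOT NULL
  ,Col3 int NOT NULL)

insert into @tab (ID, Col1, Col2, Col3)
select
   ID
  ,Col1
  ,Col2
  ,Col3
from
  dbo.tab1
where
  ID > 100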
