24 Hours of PASS

March 21st brings a new edition of 24 Hours of PASS: 24 back-to-back one-hour sessions of free SQL Server training by leading experts from around the world.
Don’t miss out and register now!

select… into…

A pattern I see used frequently, especially in conjunction with temporary tables, is this:

select
   ID
  ,Col1
  ,Col2
  ,Col3
into
  dbo.tab2
from
  dbo.tab1
where
  ID > 100

This syntax automatically creates a table for the result set and inserts the records of the result set into it. It is fast and efficient, and it can be really handy, especially in complex stored procedures or scripts, where it stores the results of one complex query for use in another.
What makes it so popular is that it can create not only permanent tables, but also temporary tables:

select
   ID
  ,Col1
  ,Col2
  ,Col3
into
  #tmpTab
from
  dbo.tab1
where
  ID > 100

You don’t need to define or create the table beforehand; in fact, it must not exist beforehand.
This is all very neat, and sometimes I use it myself, but only sometimes and never in stored procedures, more likely in ad-hoc scripts.

And here is why:

Schema Locks

select… into… always takes a schema lock on the table it is selecting from. Even with (nolock) it needs a schema lock to read the schema for the table it creates. A schema lock may not sound too bad, and in itself it is not that big a deal.
But the problem is not so much the lock itself or what it does, but what it takes to get it. Imagine an OLTP system under high load, with a table receiving approx. 1,000 inserts per second. Now you run a select… into… from this table, possibly joined with two or three other tables, and your statement, even with the nolock hint, needs to be able to place a schema lock on all four tables at once.
This means the query has to wait for a window in which it can actually place these locks. Sometimes that is nanoseconds, sometimes a few milliseconds, in the worst case, although unlikely, even seconds.
While this may not seem like much, it can add up to quite a lot in the long run. And what you get in any case are highly variable, unreliable execution times.
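If you want to see this for yourself, here is a minimal sketch (dbo.tab1 is the placeholder table from the example above): run the select… into… in one session and, while it runs, look at sys.dm_tran_locks from a second session. You should see a Sch-S lock on the source table even though the nolock hint is used.

-- session 1: select ... into from the source table
select ID, Col1, Col2, Col3
into #tmpLockDemo
from dbo.tab1 with (nolock)
where ID > 100;

-- session 2 (while session 1 is running): show the locks it holds
select
     request_session_id
    ,resource_type
    ,request_mode      -- expect Sch-S on the source table
    ,request_status
from sys.dm_tran_locks
where resource_database_id = db_id();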

Crappy Schema

The created table takes its schema from the selected data. So, for instance, if the longest selected string value for column X has 4 characters, the column will be varchar(4). That might be OK, but when I later need space for longer values, I am in trouble.
Example:

select
   1      as ID
  ,'XXX'    as Col1
  ,getdate()  as Col2
  ,200.23   as Col3
  ,N'djjdh' as Col4
into
  tabInto

This gives you this schema

CREATE TABLE dbo.tabInto
  (ID int NOT NULL
  ,Col1 varchar(3) NOT NULL
  ,Col2 datetime NOT NULL
  ,Col3 numeric(5, 2) NOT NULL
  ,Col4 nvarchar(5) NOT NULL)

A heap with no primary key, every column exactly the maximum length of the data, and every column NOT NULL, just because there happened to be no NULLs in the data.
Of course you can create your keys and indexes afterwards, but why make the engine work twice to end up with a decent table? And we all know what increasing the size of NOT NULL columns does to a table (it basically tears it apart).
If you just leave it as the heap it is, the Query Optimizer will have a hard time working with it.
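If you do use select… into…, you can at least clean up the result afterwards. A minimal sketch for the tabInto example above (the new column size is an assumption):

-- widen the column before it bites you
alter table dbo.tabInto alter column Col1 varchar(50) not null;

-- turn the heap into a proper table with a clustered primary key
alter table dbo.tabInto add constraint PK_tabInto primary key clustered (ID);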

And speaking of execution plans:

Non-Reusable Execution Plans

If you have a create table command in your stored procedure, which you implicitly have with select… into…, the procedure has to be recompiled for every execution. It will never get a reusable execution plan, which increases the execution time of the procedure immensely. And it doesn’t matter whether it is a real table or a temporary table; they are treated the same way.

Whereas if you use a well-defined table variable and insert… select into it, you get a reusable execution plan for your procedure, and, if you define all the data types correctly and use constraints where possible, you also get the best possible execution plan for your data.
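A minimal sketch of that alternative, reusing the columns from the example above (the sizes and types are assumptions):

declare @tab table
    (ID   int          not null primary key
    ,Col1 varchar(50)  not null
    ,Col2 datetime     not null
    ,Col3 numeric(9,2) not null);

insert into @tab (ID, Col1, Col2, Col3)
select ID, Col1, Col2, Col3
from dbo.tab1
where ID > 100;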

SQL Saturday in Dublin, March 24th

Just a quick reminder for all you Developers in and around Dublin: on Saturday, March 24th, SQLSaturday #105 will be hosted in Dublin. Make sure to visit, there will be some great sessions. And it’s free.

Pitfalls of SQL Server Replication

If you are planning replication for one of your SQL Server databases, there are some pitfalls you will encounter and need to solve. There are little things you need to take care of that can be hard to find, since the error messages SQL Server gives may describe the error but don’t help at all in finding the source or how to resolve it.

1. Inform yourself about replication.
Before you even start planning replication, read up on the principal concepts: snapshot replication, transactional replication, merge replication. How they work. What resources they need.

2. Plan your strategy for identity columns.
You need to plan your replication especially carefully if you plan to let users update the replicated data. In that case you need to take special care of how both databases – master data and replicated data – handle identity columns.
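One common approach is to give each side its own identity range and mark the column as NOT FOR REPLICATION, so replicated inserts keep their original IDs and don’t consume the local range. Just a sketch; the table, column and range values are assumptions:

-- on the publisher: local identities start at 1
create table dbo.Customer
    (CustomerID int identity(1, 1) not for replication
        constraint PK_Customer primary key
    ,Name nvarchar(100) not null);

-- on the subscriber (updatable): move the local identity range far away,
-- so locally generated IDs cannot collide with replicated ones
dbcc checkident (N'dbo.Customer', reseed, 1000000000);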

3. Only replicate the data you really need replicated.
Don’t just replicate your whole database. This might seem the easiest solution, especially if your database is small. But databases tend to grow, and what is a small database at one customer’s instance may be a huge database at another customer’s. Always design your databases (and therefore also your replication) to scale, so they can grow without performance loss.
That said, replicate only the tables, views, functions and stored procedures you need. Replication adds additional stress to the system, so keep that stress to the absolute minimum.

4. Check also what the application needs.
Depending on how you plan to use the replicated data, what you actually need to replicate will differ a lot. It is not only necessary to check all the dependencies within the database, and maybe even with other databases, but also to check the applications you are planning to run on the replicated data.
For instance, the database and its views, procedures and functions may show no dependency on a user table or a lookup table, yet the application you are trying to run on this data absolutely needs that data to work properly (or at all).
So always check with the application developers which data, apart from the obvious database dependencies, you will also need to replicate (or substitute in some other way).

5. Check your objects for their “replication-fitness”.
There are some (simple) prerequisites to take care of before replicating a table:

a. It needs a Primary Key (for transactional replication).
You might shrug at this point, but there are databases out there, especially legacy databases, where not every table has a primary key. If you need to replicate such a table, you need to create a primary key. In many cases you will find unique columns on such tables, either defined by a unique constraint or index, or undefined but clearly detectable. Talk to the application developers about what might be unique, whether adding a column might work, and what consequences a primary key might have.
If there is no way of adding a primary key, you cannot replicate that table and you might need to find another way to copy the data – some automated ETL process, maybe.
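If adding a surrogate key turns out to be acceptable, the fix itself is small. A sketch with hypothetical names:

-- add a surrogate key column and make it the primary key
alter table dbo.LegacyTable
    add LegacyTableID int identity(1, 1) not null;

alter table dbo.LegacyTable
    add constraint PK_LegacyTable primary key clustered (LegacyTableID);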
b. Take care of identity columns.
When you add a table with an identity column to the replicated articles, SQL Server automatically alters that column to NOT FOR REPLICATION. This is sufficient if you don’t let users update the replicated data (read-only database). If you do, you need to take special care of the identity columns (see above).
c. Check for triggers.
Everyone always forgets about triggers. An enabled trigger on the target database can make the whole replication process fail. Check the triggers, see what they do, and decide whether you need them on the target database. You might even need to alter them because you only need parts of them.
Tip: If you have a trigger that, on insert into Table1, generates data for Table2, there are two ways of handling this:
i. Replicate Table1 and Table2, and disable the trigger on the target database.
ii. Replicate only Table1 and leave the trigger enabled on the target database. Every replicated row in Table1 then fires the trigger and fills Table2.
Method (i) lets all the resources be used on the master database: first the trigger fires, then both tables need to be distributed. Method (ii) balances the load by letting the target database do some of the work. Choose for yourself which to use. It just goes to show that there is no straight answer on what to do with triggers. Each one has to be checked on its own.
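For method (i), disabling the trigger on the target database is a one-liner (the trigger and table names are hypothetical):

-- on the target (subscriber) database
disable trigger dbo.trg_Table1_FillTable2 on dbo.Table1;

-- to switch to method (ii) later, enable it again:
-- enable trigger dbo.trg_Table1_FillTable2 on dbo.Table1;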

6. You need sufficient rights.
When setting up replication you need to have Windows users available that have sufficient rights. We are talking Windows/Domain users here, not SQL Server users:
a. Your user needs rights to set the executing user context of a service, create a Network share, maybe install components on the servers.
b. A user for each database that can reach the database snapshot share.
c. A user for each database under which the SQL Agent job can run.
d. A user with SQL Server sysadmin rights.

7. SQL Server Replication needs to be installed on both servers.
Don’t laugh! This happens more often than you think. Everything is prepared, all the objects and user rights are checked, you want to start setting up the replication, and the first message you get tells you that the replication components are not installed.

So get your setup files ready and install SQL Server Replication. It is a straightforward process and shouldn’t present any problems, as long as you have sufficient rights to do so (and the setup files handy).
Important: even though setup doesn’t tell you to, you must restart the SQL Server Agent service after the setup is finished. Otherwise replication won’t work, and SQL Server will never tell you why, but throw cryptic error messages about settings being missing or incorrect.

8. Run SQL Agent with a valid user context.
Database snapshots are distributed via a network share that both servers can access. Depending on your security configuration (and best practice implementations) you might need to set SQL Agent service to run under a specific user context.
Important: Don’t change the service’s user context in the computer management screen. Even though this screen automatically gives the user account the Logon as Service right, SQL Server and SQL Agent need some additional rights to work correctly.
Always use SQL Configuration Manager to change the service’s user context. This will set all the rights that SQL Server needs.

9. Replication needs TCP/IP name resolution.
The replication setup screens let you connect to the distribution server via its IP address. No problem: you can set up everything, set the user rights and so on. Only when replication tries to start its SQL Agent jobs will it tell you that it can’t reach the distribution server because it has no name. So always use the server name instead of the IP address.
Tip: Depending on your network and name resolution configuration it might not be possible to reach the server by name. In that case you can edit the hosts file in %WINDOWSDIR%\system32\drivers\etc: enter the IP address and the name there to give SQL Server a name it can use.

10. The owner of the database has to be valid.
This issue only happens if you have an environment with, let’s say, test and production servers with different SQL users, and databases get copied across environments. It is a rare and easily overlooked issue, but it happens, so I mention it.
Even though everything else works and the users seem to have sufficient rights, replication won’t work if the owner’s SID is not recognized. This can be solved simply by resetting the ownership of the database:

alter authorization on database::test to mySAUser;

Even though the user name might be the same, the SID would be different and therefore we need to make this change.
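To spot the problem in the first place, you can check whether the owner SID of each database still resolves to a server login. A quick sketch:

-- databases whose owner SID no longer maps to a login
select
     name
    ,owner_sid
    ,suser_sname(owner_sid) as owner_login   -- NULL means the SID is orphaned
from sys.databases
where suser_sname(owner_sid) is null;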

Calling Oracle stored procedures and functions in C#

Even if you are a SQL Server and/or .NET developer, you will find yourself every now and then in a situation where you have to deal with other databases, for instance Oracle. You might have to, as I do, load data into Oracle Financial Applications, which is an absolute pain in the a… (I will write more about that as soon as the trauma subsides). And so you might find yourself in the situation of having to call Oracle stored procedures from your C#/.NET code.

This is, on paper, no big deal. It works in large part exactly like working with SQL Server stored procedures. You can use

System.Data.OleDb

if this is your preferred provider, but actually you should use

System.Data.OracleClient

So far, so good, but there are some things you need to be aware of.

using System;
using System.Data;
using System.Data.OracleClient;

namespace ClassLibrary1
{
    public class Class1
    {
        public int CallOracleProcedure()
        {
            // create Oracle connection
            OracleConnection con = new OracleConnection("{myOracleConnectionString}");
            OracleCommand comm = new OracleCommand(); // create command
            int retVal = -1;

            try
            {
                con.Open(); // open connection
                comm.Connection = con; // add connection to command
                comm.CommandText = "FND_GLOBAL.APPS_INITIALIZE"; // name of the Oracle procedure
                comm.CommandType = CommandType.StoredProcedure; // set type to stored procedure

                // add input parameters
                comm.Parameters.Add(new OracleParameter("inputVal1", 123));
                comm.Parameters.Add(new OracleParameter("inputVal2", "test"));
                // if necessary set data type
                comm.Parameters.Add(new OracleParameter("inputDate", DateTime.Now));
                comm.Parameters["inputDate"].OracleType = OracleType.DateTime;

                // add output parameter
                comm.Parameters.Add(new OracleParameter("outputVal1", OracleType.Int32));
                comm.Parameters["outputVal1"].Direction = ParameterDirection.Output;

                comm.ExecuteNonQuery(); // execute procedure

                retVal = (int)comm.Parameters["outputVal1"].Value;  // read return value
            }
            finally
            {
                con.Close();
            }

            return retVal;
        }

    }
}

As you can see in the code above, it’s quite simple and much the same as with SQL Server. What you need to take care of are the different data types. Depending on how well the OleDb provider is written, you can have some problems with dates or numbers. The OracleClient provider gives you the Oracle data types directly, but even then, dates are always a pain.
You also need to be aware of the difference between function and stored procedure:

using System;
using System.Data;
using System.Data.OracleClient;

namespace ClassLibrary1
{
    public class Class1
    {
        public int CallOracleFunction()
        {
            // create Oracle connection
            OracleConnection con = new OracleConnection("{myOracleConnectionString}");
            OracleCommand comm = new OracleCommand(); // create command
            int retVal = -1;

            try
            {
                con.Open(); // open connection
                comm.Connection = con; // add connection to command
                comm.CommandText = "myOracleFunction"; // name of the Oracle procedure
                comm.CommandType = CommandType.StoredProcedure; // set type to stored procedure

                // add input parameters
                comm.Parameters.Add(new OracleParameter("inputVal1", 123));
                comm.Parameters.Add(new OracleParameter("inputVal2", "test"));
                // if necessary set data type
                comm.Parameters.Add(new OracleParameter("inputDate", DateTime.Now));
                comm.Parameters["inputDate"].OracleType = OracleType.DateTime;

                // add return parameter
                comm.Parameters.Add(new OracleParameter("returnVal", OracleType.Int32));
                comm.Parameters["returnVal"].Direction = ParameterDirection.ReturnValue;

                comm.ExecuteNonQuery(); // execute procedure

                retVal = (int)comm.Parameters["returnVal"].Value; // read return value
            }
            finally
            {
                con.Close();
            }

            return retVal;
        }

    }
}

As you can see, the code looks almost the same. You even have to set the command type to ‘stored procedure’ to execute a function. But if you were to execute the above code for a stored procedure instead of a function, you would get this exception:

 ORA-06550: line 1, column 25:
    PLS-00222: no function with name 'APPS_INITIALIZE' exists in this scope
    ORA-06550: line 1, column 7:
    PL/SQL: Statement ignored

The main and only difference (from a C# perspective) between an Oracle function and an Oracle stored procedure is the return value: functions have return values, procedures have output parameters. And if you pass a ReturnValue parameter to a stored procedure, it throws an exception.

Generating random numbers in T-SQL

Every once in a while the necessity arises to generate random numbers in your T-SQL code, be it for a random selection of prize winners, be it to randomize a sort order, or whatever the reason.
SQL Server provides a built-in function rand() which returns a random number. Great, you might say, until you try it.

select rand()

This returns a float value between 0 and 1. To make that useful you have to tweak it. I chose to create a function that returns a random integer between the boundaries I need.

-- view that returns a random number
begin try
    exec('create view dbo.vw_RandNumber as select 1 as RandNumber');
end try begin catch end catch
go

alter view dbo.vw_RandNumber
as
    -- return a random number using the built-in rand() function
    select
        rand() as RandNumber    -- returns a float value between 0 and 1
    ;
go

-- function to return a random integer value
begin try
    exec('create function dbo.fn_RandNumber() returns int as begin return 0 end');
end try begin catch end catch
go

alter function dbo.fn_RandNumber
    (@Min int   -- lower boundary of desired random value
    ,@Max int)  -- upper boundary of desired random value
    returns int
AS
begin
    declare
         @rand float    -- the built-in rand() function returns a float,
                        -- so we need one here
        ,@ret int       -- our return value
    ;

    while @ret is null  -- just to be sure we really return something valid
    begin
        -- rand() returns a float between 0 and 1, so we need to tweak it
        -- to get a random number between our boundaries
        -- f.i. @Min = 1, @Max = 100 => random number between 1 and 100
        select      @rand = @Min
                            + (select RandNumber
                            from dbo.vw_RandNumber)
                            * (@Max-@Min)
        ;
        -- now we convert the float into an integer
        -- simply by assigning it to an int variable, thus cutting off the decimals
        -- Note: this is still an implicit conversion, even if no convert() or
        --       cast() is used, meaning it still needs the same resources
        select @ret = @rand   -- e.g. 94.56873936663979 => 94
    end
    -- return the result
    return @ret;
end
go

This way I can get a random number within any range I want (as long as it’s integer).

select dbo.fn_RandNumber(1, 100)
select dbo.fn_RandNumber(1, 500000)
select dbo.fn_RandNumber(800, 900)
select dbo.fn_RandNumber(-50, 0)

You might ask why we need this construct with a view. Simple answer: the use of rand() is not allowed inside a user-defined function. It throws an error message:
Invalid use of a side-effecting operator ‘rand’ within a function.

Therefore we need to ‘fool’ SQL Server by creating a view that returns just the one random number and then select from the view within the function.

Caveats

There are two major caveats to consider:

Firstly, since rand() returns a float value, we also need a float variable in the function. Working with floats is never really a good idea: floating point operations tend to cost more CPU cycles than plain integer arithmetic, so this will take up some extra CPU time.

There are other ways to create random numbers; you’ll find them when you google. Here is a neat example by Jeff Moden:

declare @Min int = 1, @Max int = 100;   -- desired boundaries

select
     abs(checksum(newid())) % (@Max - @Min + 1) + @Min as randNum
    ,col1
    ,col2
    ,col3
from
    dbo.someTable

I prefer the construct with the view and the function, since I think it’s quite flexible (I can have functions for decimal, money, whatever), easy to read in stored procedure code and easy to maintain.

Using constraints in Data Models

When talking to application developers about databases, I have encountered many long-lived myths that just don’t seem to want to die out. One of the most persistent is: constraints, especially foreign key constraints, are bad for performance, so use as few constraints as possible.
The contrary is true: use as many constraints as possible. Primary key, foreign key, check, default, unique and NOT NULL constraints.

For two main reasons:

  • Data integrity
  • (and, yes) Performance

Data Integrity

The more constraints you use, the more certain you can be that the data will look exactly the way it was intended to look. The more check and key constraints you have, the fewer data inconsistencies you’ll have. The additional effort in the design and implementation phase is marginal compared to the effort that comes with data inconsistency issues.
I have never heard of expensive consultants and migration experts having to be hired because the data was too consistent. Implementing as much data integrity as possible in your data model will save you a lot of headaches and maintenance effort.

Performance

Yes, there is a (small) performance overhead in maintaining all these constraints, but it is mostly negligible. If you write code where the performance overhead of a foreign key constraint or a check constraint matters, you already have a database that is fast as hell.

On the contrary, the more constraints you have, the easier it is for the Query Optimizer to find the best execution plan for a query. Every constraint helps define the data and the connections between the data without the optimizer having to look at the data itself. Thus the optimizer is more likely to find a good execution plan than without constraints, and so constraints are more likely to make the database faster than slower.

Here is an example with a simple check constraint:
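A minimal setup for this example might look like this; the column types are assumptions, only the names t1, t2, col1, col2 and t2Id come from the query below.

create table dbo.t2
    (id   int identity(1, 1) primary key
    ,col1 varchar(50) not null
    ,col2 int         not null);  -- only ever contains 1, 2 or 3

create table dbo.t1
    (id   int identity(1, 1) primary key
    ,col1 varchar(50) not null
    ,t2Id int         not null references dbo.t2 (id));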

select
     t1.col1
    ,t2.col1
from
    t1
    left outer join t2 on t1.t2Id = t2.id and t2.col2 = 15

Col2 in table T2 only has values from 1 to 3. The optimizer doesn’t know that, so it creates a plan with a table scan on T2.

Let’s create a check constraint that tells the database that Col2 in T2 can only contain values 1, 2 and 3.

ALTER TABLE dbo.t2 ADD CONSTRAINT CHK_t2_Col2
CHECK (Col2 >= 1 AND Col2 <= 3);

Now let’s run the same statement again. And, surprise, the execution plan looks completely different.

The optimizer now knows that T2.Col2 = 15 will never ever bring any result back, so it doesn’t even bother to include the table T2 in the execution plan. The execution plan is now far more efficient.

This is just a simple example, but it goes to show:

The more the Query Optimizer knows about the data, the better the execution plan will be.

And the way to tell SQL Server about the data is to create constraints. Thus:

Create as many constraints as possible!

Rebuilding fragmented indexes and tables

Over time, indexes and tables get fragmented. That means the data pages containing the table or index data are scattered more and more around our data files instead of sitting neatly side by side, pretty much like disk fragmentation.

The effect is a loss in performance, since data has to be picked up in different places, indexes must be read from different places, and writes go crisscross all over the file(s). A lot of additional I/O operations.

To prevent that from happening we should regularly check our tables and indexes for fragmentation. You can use either DBCC SHOWCONTIG or the DMV sys.dm_db_index_physical_stats to query the fragmentation level of objects.

-- set minimum page count the index should have
-- to be considered for a rebuild.
-- indexes smaller than 100, 120 pages usually don't gain anything from rebuilding.
declare @IPageCnt int = 100;
-- set the minimum fragmentation level (in % of fragmentation)
-- at which an index should be rebuilt.
declare @IMinFragmentation int = 30;
SELECT
     OBJECT_SCHEMA_NAME(FRAG.[object_id]) + '.' + OBJECT_NAME(FRAG.[object_id])  as [table]
    ,SIX.[name] as [index]
    ,FRAG.avg_fragmentation_in_percent as [fragmentation]
    ,FRAG.page_count as [pages]
    ,'alter index ' + SIX.[name]
        + ' on '+OBJECT_SCHEMA_NAME(FRAG.[object_id])
        + '.' + OBJECT_NAME(FRAG.[object_id])
        + ' rebuild;' as [rebuild_sql]
FROM
    sys.dm_db_index_physical_stats
    (
        db_id(),    --use the currently connected database
        0,          --object_id: 0 = all objects
        DEFAULT,    --index_id: DEFAULT = all indexes
        0,          --partition_number: 0 = all partitions
        DEFAULT     --scanning mode: defaults to "LIMITED", which is good enough
    ) FRAG
    INNER JOIN sys.indexes SIX ON FRAG.[object_id] = SIX.[object_id]
        AND FRAG.index_id = SIX.index_id
WHERE
    -- consider only those indexes that need treatment
    FRAG.page_count > @IPageCnt
    AND FRAG.avg_fragmentation_in_percent > @IMinFragmentation
ORDER BY
    FRAG.avg_fragmentation_in_percent DESC

The level of fragmentation you are willing to allow is up to you. You should consider the size of the table, the available space, the time it takes to rebuild tables and indexes, and so on. As a rule of thumb, the bigger the table, the lower the threshold: on a huge table with several billion rows, 10% means several hundred million rows scattered around the data file, while on a small table with some ten thousand rows even 40% or 50% fragmentation might be tolerable.

In this example I am interested in all tables that have a fragmentation level of 30% or more.

NOTE: You should always know, or at least try to find out, why fragmentation is happening, because in many cases it hints at an underlying problem with the database setup and design. Fragmentation can point to disk problems, memory problems or poor clustered key design.

Defragment Indexes

Once you identified the indexes and tables there are a lot of things to take into account before defragmenting. There are several ways to defragment an index.

alter index IX_myIndex on dbo.myTable reorganize

This reorganizes the leaf level of an index, clustered or nonclustered. The operation is always done online: it reorders the index pages and also tries to compact them while the index and the table remain available. It’s better than nothing, but then again not much more than that. With a low level of fragmentation you might want to try a reorganization.

alter index IX_myIndex on dbo.myTable rebuild

This does what it says on the label: it will rebuild the whole index from scratch. New index pages are created in order and compacted together. This operation can be done online or offline.

Depending on the index size, rebuilding will take up a lot of temporary space:

  • the size of the index in the data file
  • enough space in tempdb for sorting
  • or space for sorting in the data file with sort_in_tempdb = off

When rebuilding a clustered index be aware that the clustered index is the table, so you will need free space to move the whole table around! 
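A hedged sketch of such a rebuild, pushing the sort into tempdb and keeping the table available (online rebuilds require Enterprise Edition; the index and table names are placeholders):

alter index IX_myIndex on dbo.myTable
    rebuild with (online = on, sort_in_tempdb = on);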

To make room for this operation you can disable other indexes on the table. Disabling an index removes all the leaf-level data of the index, but not the metadata and the index definition. It also prevents the index from being constantly updated while the rebuild is moving data pages around. But beware: you have to rebuild the disabled indexes afterwards to enable them again.

Also, if you disable a clustered, unique or primary key index, all associated non-clustered, non-unique indexes AND all foreign key constraints referencing the table will also be disabled and will have to be rebuilt afterwards, and the table data will be completely inaccessible until the index is rebuilt. So be careful with disabling indexes!
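As a sketch, the disable/rebuild sequence for a nonclustered index looks like this (the names are placeholders):

-- free the space used by the nonclustered index during the rebuild
alter index IX_someOtherIndex on dbo.myTable disable;

-- rebuild the clustered index (i.e. the table)
alter index PK_myTable on dbo.myTable rebuild;

-- disabled indexes stay disabled until you rebuild them
alter index IX_someOtherIndex on dbo.myTable rebuild;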

Defragment Tables

If the table has a clustered index (which every table should have!) then rebuilding the clustered index IS rebuilding the table, and that’s exactly what you should do.

If there is no clustered index you can use

alter table dbo.myTable rebuild

Note that this will also rebuild ALL indexes on that table.

Important note: After you have rebuilt your indexes/tables, recompile your stored procedures!

After rebuilding indexes and/or tables you should recompile the stored procedures that access those tables. The statistics are now up to date, and the efficiency of the indexes might have changed.

exec sp_recompile N'dbo.myTable'

This command marks the table myTable for recompilation. This means that every stored procedure accessing the table will be recompiled the next time it runs and receive a new execution plan.

SQL Server healthcheck: sp_blitz by Brent Ozar

Just a quick reminder for everyone who hasn’t yet tried it: if you want to check out an unknown server, or find issues on your well-known server, and learn about the caveats of setting up and maintaining a SQL Server, there’s no better way to start than Brent Ozar’s sp_blitz.

It checks for commonly known issues, some obvious, some less so, and guides you to learn more about each issue, fix it and avoid it.

Using Fully Qualified Object Names and DEFAULT_SCHEMA for performance gain

As a rule of thumb you should always use fully qualified object names in your T-SQL code. I know many find it inconvenient, and many don’t even use different schemas, but still, using fully qualified object names can give your database a performance boost.

In SQL Server you can define schemas, and on object creation you can define in which schema the object should be created. When accessing the object you can use its fully qualified name, consisting of up to three parts:

database.schema.object

We will ignore the first part, the database, for now. The database should be defined by the connection anyway. This article concentrates on the schema qualifier.

select ID from myschema.mytable;
exec dbo.uspMyProcedure;
select * from dbo.myView;
select ID, dbo.myFunction(myColumn) as calcVal from dbo.myView;

But why should you when this

select ID from mytable;

works just as well?

Well, that’s the point: it doesn’t work as well.

Reads

Let’s take a look at what happens when we execute the statement without a schema qualifier. We’ll assume we are executing the statement as userX. SQL Server first looks for the object in the user’s default schema, which before SQL Server 2005 was a schema named after the user. So SQL Server looks for userX.mytable. That object does not exist, so SQL Server next looks in dbo (the fallback schema): it looks for dbo.mytable, finds it and returns the data.

Note: If the table were in a schema other than userX or dbo, the second statement wouldn’t work at all, even if userX has access to that other schema; SQL Server stops its search there. In that case you have to use a schema qualifier anyway.

One step to solve this is to set the DEFAULT_SCHEMA for userX. Since SQL Server 2005 the default for DEFAULT_SCHEMA on user creation is dbo. You can look up the default schema of all users in the column default_schema_name of the catalog view sys.database_principals.

select * from sys.database_principals

You can also set the default schema with this

ALTER USER [Username] WITH DEFAULT_SCHEMA=dbo;

This way SQL Server looks directly in dbo for the object dbo.mytable.

So all is good, isn’t it? Well, not quite.

SQL Server still has to look up the user’s default schema. Yes, we saved one of two unnecessary lookups, but why not save both? Just tell SQL Server where the object is. One additional lookup, mostly from cache, may not sound like much, but imagine a big database system with hundreds of transactions and queries per second. Every unnecessary operation you save can make a difference, and by schema-qualifying your objects you save at least one, depending on your user/schema/default-schema architecture even two, operations per statement.

Finding Objects

In an architecture where you actually do have several schemas, fully qualifying the object names is an absolute must! Imagine userX and userY each have their own schema and there is a dbo schema as well.

Your application issues the statement

truncate table customer;

UserX executes the statement against his/her default schema, which contains the private table userX.customer:

truncate table userX.customer;

UserY doesn’t have a table customer in his/her default schema, so dbo.customer is truncated.

truncate table dbo.customer;

UserX, finding the private table truncated, drops the table in the schema userX. He/she executes the statement again, and this time, since userX.customer doesn’t exist anymore, it hits dbo.customer…

And so on. I hope you see where I am going with this.

Reusability of Execution Plans

Last but not least, a big performance advantage of SQL Server is the reusability of execution plans. In a scenario like the one above, SQL Server cannot rely on the targeted object being the same on every execution; it depends on the user executing the command. So in this case SQL Server will not reuse the execution plan but will recompile the statement every time.

All this may not make a difference in every database and every project you do, but if you make a habit of writing schema.object in your T-SQL, you will never have to wonder whether it could.
