Monthly Archives: December 2011

Rebuilding fragmented indexes and tables

Over time, indexes and tables get fragmented. That means the data pages holding the data of a table or index become more and more scattered around the data files instead of sitting neatly side by side, much like disk fragmentation.

The effect is a loss of performance: data has to be picked up from different places, indexes have to be read from different places, and writes go crisscross all over the file(s). In short, a lot of additional I/O operations.

To keep this under control we should regularly check our tables and indexes for fragmentation. You can use either DBCC SHOWCONTIG or the dynamic management function sys.dm_db_index_physical_stats to query the fragmentation level of objects.

-- Minimum page count an index should have to be considered for a rebuild.
-- Indexes smaller than roughly 100-120 pages usually don't gain anything from rebuilding.
declare @IPageCnt int = 100;
-- Minimum fragmentation level (in percent) at which an index should be rebuilt.
declare @IMinFragmentation int = 30;

SELECT
     OBJECT_SCHEMA_NAME(FRAG.[object_id]) + '.' + OBJECT_NAME(FRAG.[object_id]) as [table]
    ,SIX.[name] as [index]
    ,FRAG.avg_fragmentation_in_percent as [fragmentation]
    ,FRAG.page_count as [pages]
    ,'alter index ' + QUOTENAME(SIX.[name])
        + ' on ' + QUOTENAME(OBJECT_SCHEMA_NAME(FRAG.[object_id]))
        + '.' + QUOTENAME(OBJECT_NAME(FRAG.[object_id]))
        + ' rebuild;' as [rebuild_sql]
FROM
    sys.dm_db_index_physical_stats
    (
        db_id(),    -- use the currently connected database
        DEFAULT,    -- object_id: NULL/0/DEFAULT = all tables and views
        DEFAULT,    -- index_id: DEFAULT = all indexes
        DEFAULT,    -- partition_number: NULL/0/DEFAULT = all partitions
        DEFAULT     -- scanning mode: DEFAULT = 'LIMITED', which is good enough here
    ) FRAG
    INNER JOIN sys.indexes SIX ON FRAG.[object_id] = SIX.[object_id]
        AND FRAG.index_id = SIX.index_id
WHERE
    -- consider only those indexes that need treatment
    SIX.index_id > 0    -- exclude heaps, which have no index name to rebuild
    AND FRAG.page_count > @IPageCnt
    AND FRAG.avg_fragmentation_in_percent > @IMinFragmentation
ORDER BY
    FRAG.avg_fragmentation_in_percent DESC;

The level of fragmentation you are willing to allow is up to you. You should consider the size of the table, the available space, the time it takes to rebuild tables and indexes, and so on. As a rule of thumb, the bigger the table, the lower the threshold. On a huge table with several billion rows, 10% means several hundred million rows scattered around the data file; on a small table with a few ten thousand rows, even 40% or 50% fragmentation might be tolerable.

In this example I am interested in all indexes that have a fragmentation level of more than 30%.

NOTE: You should always know, or at least try to find out, why fragmentation is happening, because in many cases it hints at an underlying problem with the database setup and design. Fragmentation can point to disk problems, memory problems or poor clustered key design.

Defragment Indexes

Once you have identified the indexes and tables, there are a lot of things to take into account before defragmenting. There are several ways to defragment an index.

alter index IX_myIndex on dbo.myTable reorganize

This reorganizes the leaf-level data of an index, clustered or nonclustered. It reorders the index pages and also tries to compact them. The operation is always done online, so the index and the table remain available. It’s better than nothing, but not much more than that. With a low level of fragmentation you might want to try a reorganization first.

alter index IX_myIndex on dbo.myTable rebuild

This does what it says on the label: it will rebuild the whole index from scratch. New index pages are created in order and compacted together. This operation can be done online or offline.

Depending on the index size, rebuilding will take up a lot of temporary space:

– free space in the data file roughly the size of the index
– enough space in tempdb for sorting (with sort_in_tempdb = on)
– or space for sorting in the data file (with sort_in_tempdb = off); see the sketch below
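
For illustration, here is a minimal sketch of a rebuild that spells these options out, using a hypothetical index IX_myIndex on dbo.myTable (whether online rebuilds are available depends on your edition):

-- Rebuild with explicit options (hypothetical index and table names).
-- online = on keeps the table accessible during the rebuild,
-- sort_in_tempdb = on does the sort work in tempdb instead of the user database.
alter index IX_myIndex on dbo.myTable
rebuild with (online = on, sort_in_tempdb = on);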

When rebuilding a clustered index be aware that the clustered index is the table, so you will need free space to move the whole table around! 

To make room for this operation you can disable other indexes on the table. This will remove all the leaf data from the index, but not the meta data and the index definition. It will also prevent the index from being constantly updated while the rebuild is moving data pages around. But beware that you have to rebuild the indexes afterwards to enable them again.

Also, if you disable a clustered, unique or primary key index, all associated non-clustered, non-unique indexes AND all foreign key constraints associated with the table will also be disabled and will have to be rebuilt afterwards. The table data will be completely inaccessible until the index is rebuilt. So, careful with disabling indexes!
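
As a minimal sketch (assuming a hypothetical nonclustered index IX_myIndex on dbo.myTable), disabling an index and re-enabling it afterwards looks like this:

-- Disable the nonclustered index: its leaf-level data is dropped
-- and it is no longer maintained while other data is moved around.
alter index IX_myIndex on dbo.myTable disable;

-- ... rebuild the clustered index / do the heavy lifting here ...

-- Re-enable the index by rebuilding it; until then it cannot be used.
alter index IX_myIndex on dbo.myTable rebuild;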

Defragment Tables

If the table has a clustered index (which every table should have!) then rebuilding the clustered index IS rebuilding the table, and that’s exactly what you should do.

If there is no clustered index you can use

alter table dbo.myTable rebuild

Note that this will also rebuild ALL indexes on that table.

Important Note: After you have rebuilt your indexes/tables, recompile your stored procedures!

After rebuilding indexes and/or tables you should recompile the stored procedures that access these tables. The statistics will be up to date and the efficiency of the indexes might have changed.

exec sp_recompile N'dbo.myTable'

This command marks the table myTable for recompilation. This means that every stored procedure accessing the table will be recompiled and receive a new execution plan the next time it runs.

SQL Server healthcheck: sp_blitz by Brent Ozar

Just a quick reminder for everyone who hasn’t tried it yet: if you want to check out an unknown server, or find issues on your well-known server, and learn about the caveats of setting up and maintaining a SQL Server, there’s no better way to start than Brent Ozar’s sp_blitz.

It checks for commonly known issues, some obvious, some not so well known, and points you to resources where you can learn more about them, fix them and avoid them.
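
Once you have installed the stored procedure (the script is available on Brent Ozar’s site), the most basic way to run it is simply:

-- Run the health check with its default settings;
-- each finding points you to further reading.
exec dbo.sp_Blitz;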

Using Fully Qualified Object Names and DEFAULT_SCHEMA for performance gain

As a rule of thumb you should always use fully qualified object names in your T-SQL code. I know that many find it inconvenient, many are not even using different schemas, but still, using fully qualified object names can give your database a performance boost.

In SQL Server you can define schemas, and on object creation you can define in which schema the object should be created. When accessing the object you can use its fully qualified name, consisting of up to three parts:

database.schema.object

We will ignore the first part, the database, for now. The database should be defined by the connection anyway. This article concentrates on the schema qualifier.

select ID from myschema.mytable;
exec dbo.uspMyProcedure;
select * from dbo.myView;
select ID, dbo.myFunction(myColumn) as calcVal from dbo.myView;

But why should you bother, when this

select ID from mytable;

works just as well?

Well, that’s the point: it doesn’t work as well.

Reads

Let’s take a look at what happens when we execute the statement without a schema qualifier. We’ll assume we are executing the statement as userX. SQL Server first looks for the object in the user’s default schema, which before SQL Server 2005 was a schema of the user’s own name. So SQL Server looks for userX.mytable. That object does not exist, so SQL Server then looks in dbo (the overall default schema), finds dbo.mytable and returns the data.

Note: If the table were in a schema other than userX or dbo, the unqualified statement wouldn’t work at all, even if userX has access to that schema. SQL Server stops its search after dbo. In that case you have to use a schema qualifier anyway.

One step to solve this is to set the DEFAULT_SCHEMA for userX. Since SQL Server 2005 the default for DEFAULT_SCHEMA on user creation is dbo. You can look up the default schema of all users in the column default_schema_name of the catalog view sys.database_principals.

select [name], default_schema_name from sys.database_principals;

You can also set the default schema with this

ALTER USER [Username] WITH DEFAULT_SCHEMA=dbo;

This way SQL Server directly looks in dbo for the object dbo.mytable.

So all is good, isn’t it? Well, not quite.

SQL Server still has to look up the user’s default schema. Yes, we saved one of two unnecessary lookups, but why not save both? Just tell SQL Server where the object is. One additional lookup, mostly served from cache, may not sound like much, but imagine a big database system with hundreds of transactions and queries per second. Every unnecessary operation you save can make a difference, and by schema-qualifying your objects you save at least one, and depending on your user/schema/default-schema architecture even two, operations for every statement.
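
By the way, if you want to check which schema SQL Server searches first for your current session, SCHEMA_NAME() without an argument returns the default schema of the caller:

-- Returns the default schema of the current user context.
select schema_name() as default_schema;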

Finding Objects

In an architecture where you actually do have several schemas, fully qualifying the object names is an absolute must! Imagine userX and userY each have their own schema and there is a dbo schema as well.

Your application issues the statement

truncate table customer;

UserX executes the statement in his/her default schema, which contains the private table userX.customer, so what effectively runs is:

truncate table userX.customer;

UserY doesn’t have a table customer in his/her default schema, so dbo.customer is truncated.

truncate table dbo.customer;

UserX, finding the private table truncated, drops the table in the schema userX and executes the statement again. This time, since userX.customer doesn’t exist anymore, it runs against dbo.customer….

And so on.. I hope you see where I am going with this.

Reusability of Execution Plans

Last but not least, a big performance advantage of SQL Server is the reusability of execution plans. In a scenario like the one above, SQL Server cannot rely on the statement targeting the same object on every execution; it depends on the user executing the command. So in this case SQL Server will not simply reuse one execution plan but will compile the statement anew for each user context.
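
You can watch this happen in the plan cache. As a rough illustration (mytable is just the example table from above): run the unqualified query as two users with different default schemas and then look at the cached plans:

-- With different default schemas you will see separate plan_handle entries
-- for the same statement text instead of one shared plan with a growing usecounts.
select st.[text], cp.plan_handle, cp.usecounts
from sys.dm_exec_cached_plans cp
cross apply sys.dm_exec_sql_text(cp.plan_handle) st
where st.[text] like '%from mytable%';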

All this may not make a difference in every database and every project you do, but if you make a habit of writing schema.object in your T-SQL, you will never have to wonder whether it could.

BIDS: The given path’s format is not supported

Yesterday I came across an interesting error that I have not yet found an explanation or solution for, although I did find a workaround. But first things first: the error.

For a new project I need to load data from multiple sources (flat files, SQL Server, Excel) into an Oracle database. I decided to use SSIS, a tool made for exactly this kind of project.

I downloaded the newest Oracle OLEDB provider from the Oracle homepage, installed it and tested the connection to my Oracle target database. Worked fine.

I then started to create my SSIS package in Visual Studio (BIDS), set up my data sources and created the connection to the target database.

I clicked Test Connection and as expected the result was positive.

I expected to get on with my work and clicked OK. How wrong you sometimes can be… because what happened was an error message as cryptic as can be: “The given path’s format is not supported.”

Everything I found on the internet was either unsolved, had to do with file paths in ASP.NET, or was solved by reinstalling the machine. Not very helpful. The most annoying thing was that the connection tested OK, but I couldn’t save it because of the error. And if I can’t create and save the connection, SSIS is useless to me for this project. What to do? Reinstall the servers? Forget SSIS and use another architecture? No time for that.

I still don’t know the cause of the error or what it even means in this context. But I found a workaround.

Workaround

To check if there was something wrong with my connection string I opened SSMS and used the data import and export wizard to connect to the Oracle database with the same connection string and imported some tables into a test database. That worked without any problems.

Next I used the data import and export wizard again to export data to my Oracle database and this time I stored the generated package.

I opened the package in Visual Studio, removed everything except the Oracle destination connection, and behold! it worked just fine.

If you now think the problem is solved: it is not. When I open the connection properties in Visual Studio and try to save them again, I still get that damn error.

So it works for me for this project. But before I start the next one I surely want to know what went wrong here.
