Search This Blog

Saturday, June 19, 2010

Why do we need the Non-clustered Index

I was asked this question by an SME , So I thought let me pass on if it makes sense .

Lets create a table test which has a few columns from Person.address table (Adventureworks database)
select addressid,addressline1,addressline2,postalcode,stateprovinceid into test from Person.Address

On this table I created clustered index on AddressID and stateprovinceID
CREATE UNIQUE CLUSTERED INDEX [combined index] ON [dbo].[testo]
(
[addressid] ASC,
[stateprovinceid] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

Another non-clustered index was created with include option
CREATE NONCLUSTERED INDEX [provinceID] ON [dbo].[testo]
(
[postalcode] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]


*** BUT I WILL DISABLE THE INDEX ********

Now I ran a few queries:
-----------------------------------
select * from test where postalcode >'98501'



select * from test where postalcode >'98501' order by stateprovinceid



select postalcode from testo where postalcode >'98501' order by stateprovinceid




select addressline1,addressline2 from testo where postalcode >'98501' order by stateprovinceid



Notice that In all the cases above we have the clustered index Scan and not a simple SCAN .
So the clustered index is being used .And since Its visible that a clustered index is being used there is no need of any other index .Why ............because the data itself lies in the clustered index and we can not improve the plan anymore.

Lets see why a scan would be used
Your Data is sorted on the bases of 2 columns in the clustered index .So any query will use that Index because the data lies there .So in worst to worst case you will have a Clustered Index Scan (Better than a Index scan [heap scan]).

But why Scan and not Seek
Because the columns you are using in the where clause or the columns you are looking for as an output have no index or you can say they are not indexed (indirectly , working like a
heap).So , You will do a scan for those Rows .But since its ordered already , So it will be a clustered Scan.

Now you enable the non-clustered index and run few queries

select postalcode from testo where postalcode >'98501'


It did an index seek .remember it is not a Clustered index seek like below

select addressid from testo where addressid < 100



The reason is that In the earlier case a non-cluseterd index was used to find out the postalcode very quickly but since the data was in the clustered index it needed to point to that Leaf and hence some more efforts .But these efforts were far less than Clustered index scan .

That is the reason we need to Non-clustered index .TO reduce the ROW Lookups or Bookmark lookups.which are the pointers to the clustered index.

Happy learning!!

No comments: