SQL Tips by Namwar Rizvi

July 18, 2008

How SQL Server Index Statistics works – Part 2

Filed under: Information,SQL Server 2005,TSQL,Tuning — namwar @ 11:36 PM
Tags: ,

In my last article,

How SQL Server Index Statistics works – Part 1

I described the concept behind SQL Server index statistics. Today, I will present the real world example from AdventurWorks database. Launch Management Studio and execute the following script

Use AdventureWorks

Go

DBCC SHOW_STATISTICS (‘Sales.SalesOrderDetail’,‘IX_SalesOrderDetail_ProductID’ )

Go

You will get the result similar to this:

The fist resultset tells you about the basic details of statistics collected for the index IX_SalesOrderDetail_ProductI You can check when was the statistics last updated. If it is too old then you need to update the statistics.

Second resultset guides you about the density of frequent values. Frequent value means the value which appeared more than once in the step. The All density considers frequent values and Density considers non-frequent values only.

Third resultset is actually the most intersting section. This is the histogram of values of the indexed column of the index or the first column of the composit index. Description of its columns is as follows:

 

Column

Details

RANGE_HI_KEY

Upper bound value of a histogram step i.e. the highest value in the step. The first step is the lowest value for the column in the table.

RANGE_ROWS

Number of rows from the sample that fall within a histogram step, excluding the upper bound. By definition, this must be zero in the first sample.

EQ_ROWS

Number of rows from the sample that are equal in value to the upper bound of the histogram step.

DISTINCT_RANGE_ROWS

Number of distinct values within a histogram step, excluding the upper bound.

AVG_RANGE_ROWS

average number of duplicate values within a histogram step, excluding the upper bound. This is defined as:

(RANGE_ROWS / DISTINCT_RANGE_ROWS

for DISTINCT_RANGE_ROWS > 0)

By using this histogram, SQL Server determines the expected no. of rows to be returned for a given query. For example: in the above if you see the highlighted section, you will noticed that in the 80th step we have

RANGE_HI_KEY=819

RANGE_ROWS=218

EQ_ROWS= 195

Firstly, it means that for range of ProductID>815 and ProductID<819 we will get 218 rows. This can be verified by executing the following query:

Select count(*) from Sales.SalesOrderDetail where ProductID>815 and ProductID<819

Result is 218, which confirms the RANGE_ROWS statistics value

Secondly, it also means that for ProductID=819 we will get 195 rows. This can be verified by executing the following query:

Select count(*) from Sales.SalesOrderDetail where ProductID=819

Result is 195, which confirms the EQ_ROWS statistics value.

I hope by seeing these examples, you have got the good idea how SQL Server forecasts the total number of rows to be returned and how they are used in analyzing the cost of index usage.

Advertisements

July 17, 2008

How SQL Server Index Statistics works – Part 1

In my last article,

Selectivity of a Query and how it affects indexing

I discussed the selectivity of a query. Today I will further extend our discussion and will drill down how index Statistics works. I will continue it in my next article where we will see some real world examples but first we need to understand the concept.

Index statistics is one of the most fundamental part of query plan optimization in SQL Server. Better understanding of index statistics helps you optimize and fine tune your queries. So if you have 1000000 rows and you are searching for a range of values of an indexed column then how SQL Server forecasts how many number of rows will match your criteria before even searching for any row? Here is the actual process:

As we all know that when we index a column, SQL Server does two things:

  1. Sorts the values of the column in an internal data structure called “Index” This data structure contains the sorted value and a pointer to its respective row in the table.
  2. Generates a histogram of values.

The histogram shows the distribution of values for the sorted column. For example:

if we have following values in an integer column 1,2,2,3,4,1,4,5,4 then a typical histogram will be similar to this

Value

Rows matched

1

2

2

2

3

1

4

3

5

1

So suppose if you are searching for all rows having column values between 3 and 5 then by looking at the above histogram table you can estimate that you will get total 1+3+1=5 rows. If your table contains just 9 rows in total then 5 rows means approximately 55% of total rows will match the criteria. Based on this calculation you may elect to directly scan the table instead of first reading the index, fetching the pointers of the matching rows and then read the actual rows. Although, it is a very simple example and there are other factors involve in deciding whether to use index or not but this example demonstrates the concept in very simple terms.

In my next article, I will further explain this concept with the help of real world examples.

Blog at WordPress.com.