SQL Tips by Namwar Rizvi

January 15, 2009

How to use more than 4GB RAM on SQL Server?

SQL Server is a highly optimized and well-tuned piece of code. It tries to utilize operating system resources as efficiently as possible. Because of this, most DBAs think that if they increase the RAM on the server, SQL Server will automatically start using it, right? Wrong. SQL Server can use more than 4GB of RAM if, and only if:

  • You are running the Enterprise edition of Windows Server 2003 or the Advanced/Datacenter edition of Windows 2000.
  • You have enabled the operating system to use the extra RAM you have installed. Please check here for the details and how to do it.
  • You are not running the Express or Workgroup edition of SQL Server. Please check here for further details about limitations in the different editions of SQL Server.
  • You have enabled the awe enabled option and set max server memory to the maximum memory you can allocate to SQL Server (see the sketch after this list). This is applicable to 32-bit versions of the OS and is not required on 64-bit versions of Windows Server. Please check here for further details and how to configure the memory for SQL Server.
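
The last step can be done with sp_configure. Below is a minimal sketch, assuming a 32-bit SQL Server 2005 instance that should be capped at roughly 6GB; the 6144 value is only an illustration, so adjust it to whatever you can actually allocate on your own server.

Use master
Go

--Allow the advanced options to be changed
sp_configure 'show advanced options', 1
Reconfigure
Go

--Enable AWE; this takes effect after the instance is restarted
sp_configure 'awe enabled', 1
Reconfigure
Go

--Cap SQL Server memory (value is in MB; 6144 is just an example)
sp_configure 'max server memory', 6144
Reconfigure
Go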

July 18, 2008

How SQL Server Index Statistics works – Part 2

Filed under: Information,SQL Server 2005,TSQL,Tuning — namwar @ 11:36 PM

In my last article,

How SQL Server Index Statistics works – Part 1

I described the concept behind SQL Server index statistics. Today, I will present a real-world example from the AdventureWorks database. Launch Management Studio and execute the following script:

Use AdventureWorks
Go

DBCC SHOW_STATISTICS ('Sales.SalesOrderDetail', 'IX_SalesOrderDetail_ProductID')
Go

You will get a result similar to this:

The first resultset tells you the basic details of the statistics collected for the index IX_SalesOrderDetail_ProductID. You can check when the statistics were last updated; if they are too old, you need to update them.
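
As a side note, refreshing the statistics for this particular index is a one-liner. The sketch below uses UPDATE STATISTICS with FULLSCAN, assuming you can afford a full scan of the table; the default sampling is cheaper.

Use AdventureWorks
Go

--Rebuild the statistics of this index by scanning every row
Update Statistics Sales.SalesOrderDetail IX_SalesOrderDetail_ProductID With Fullscan
Go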

The second resultset describes the density of values. A frequent value is a value that appears more than once within a step. All density takes frequent values into account, while Density considers non-frequent values only.

The third resultset is actually the most interesting section. It is the histogram of values for the indexed column, or for the first column of a composite index. A description of its columns follows:

 

RANGE_HI_KEY: Upper bound value of a histogram step, i.e. the highest value in the step. The first step is the lowest value for the column in the table.
RANGE_ROWS: Number of rows from the sample that fall within a histogram step, excluding the upper bound. By definition, this must be zero for the first step.
EQ_ROWS: Number of rows from the sample that are equal in value to the upper bound of the histogram step.
DISTINCT_RANGE_ROWS: Number of distinct values within a histogram step, excluding the upper bound.
AVG_RANGE_ROWS: Average number of duplicate values within a histogram step, excluding the upper bound. This is defined as RANGE_ROWS / DISTINCT_RANGE_ROWS (for DISTINCT_RANGE_ROWS > 0).

Using this histogram, SQL Server determines the expected number of rows to be returned for a given query. For example, if you look at the highlighted section in the result above, you will notice that in the 80th step we have:

RANGE_HI_KEY=819

RANGE_ROWS=218

EQ_ROWS= 195

Firstly, it means that for the range ProductID > 815 and ProductID < 819 we will get 218 rows. This can be verified by executing the following query:

Select count(*) from Sales.SalesOrderDetail where ProductID>815 and ProductID<819

The result is 218, which confirms the RANGE_ROWS statistics value.

Secondly, it also means that for ProductID=819 we will get 195 rows. This can be verified by executing the following query:

Select count(*) from Sales.SalesOrderDetail where ProductID=819

Result is 195, which confirms the EQ_ROWS statistics value.

I hope these examples have given you a good idea of how SQL Server forecasts the total number of rows to be returned and how those estimates are used in analyzing the cost of index usage.

July 17, 2008

How SQL Server Index Statistics works – Part 1

In my last article,

Selectivity of a Query and how it affects indexing

I discussed the selectivity of a query. Today I will extend that discussion and drill down into how index statistics work. I will continue in my next article, where we will see some real-world examples, but first we need to understand the concept.

Index statistics are one of the most fundamental parts of query plan optimization in SQL Server. A better understanding of index statistics helps you optimize and fine-tune your queries. So if you have 1,000,000 rows and you are searching for a range of values in an indexed column, how does SQL Server forecast how many rows will match your criteria before even reading a single row? Here is the actual process:

When we index a column, SQL Server does two things:

  1. Sorts the values of the column in an internal data structure called an "index". This data structure contains the sorted values and a pointer to each value's respective row in the table.
  2. Generates a histogram of values.

The histogram shows the distribution of values for the sorted column. For example, if we have the following values in an integer column: 1, 2, 2, 3, 4, 1, 4, 5, 4, then a typical histogram will look similar to this:

Value: Rows matched
1: 2
2: 2
3: 1
4: 3
5: 1

So suppose you are searching for all rows with column values between 3 and 5. By looking at the above histogram you can estimate that you will get 1 + 3 + 1 = 5 rows in total. If your table contains just 9 rows, those 5 rows mean that approximately 55% of the total rows will match the criteria. Based on this calculation, the optimizer may elect to scan the table directly instead of first reading the index, fetching the pointers of the matching rows and then reading the actual rows. Although this is a very simple example and there are other factors involved in deciding whether or not to use an index, it demonstrates the concept in very simple terms.
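
If you want to see such a histogram for yourself, here is a minimal sketch; the table TinyNumbers and its index are made up purely for this illustration, and DBCC SHOW_STATISTICS is explained in Part 2 of this series.

--Build a tiny table containing the sample values used above
Create table TinyNumbers (Value int)
Insert into TinyNumbers values (1)
Insert into TinyNumbers values (2)
Insert into TinyNumbers values (2)
Insert into TinyNumbers values (3)
Insert into TinyNumbers values (4)
Insert into TinyNumbers values (1)
Insert into TinyNumbers values (4)
Insert into TinyNumbers values (5)
Insert into TinyNumbers values (4)
Go

--Indexing the column also creates its statistics
Create index IX_TinyNumbers_Value on TinyNumbers (Value)
Go

--The third resultset is the histogram of Value
DBCC SHOW_STATISTICS ('TinyNumbers', 'IX_TinyNumbers_Value')
Go

--Clean up
Drop table TinyNumbers
Go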

In my next article, I will further explain this concept with the help of real world examples.

August 10, 2007

Optimized solution of Paging by using Count(*) OVER() functionality

Paging through a dataset is a requirement of almost any data-oriented application. There are several solutions for implementing paging; some are client based and some are server based.
Every paging solution needs:

  1. The records of the selected page only
  2. The total number of records, to calculate the total number of pages available

Whatever solution you implement, you will end up issuing at least two queries: one to find the total number of records and another to find the records of the selected page.

In one of my previous posts, I showed you a way of implementing paging by using the Row_Number() function of SQL Server 2005. Today, I am showing you an excellent trick to get the total number of records and the records of the selected page in just one query. Following is the sample code to implement this logic. It demonstrates how you can implement paging very effectively without hitting the server twice. Note the use of Count(*) OVER() to count all records.

Use AdventureWorks
Go

--Change these two variables to parameters of your
--actual stored procedure

Declare @PageNumber int
Declare @PageSize int

--Assume we need page 6 i.e. records from 51-60
Select @PageNumber=6
Select @PageSize=10

--Select only those records which
--fit into a single page

Select Top(@PageSize) * from
(
Select
RowID=ROW_NUMBER() OVER (ORDER BY Name),
ProductID,Name,ProductNumber,ListPrice,
TotalRows=Count(*) OVER() --Count all records
from Production.Product
) A
Where A.RowID > ((@PageNumber-1)*@PageSize)
Order by A.RowID

June 28, 2007

Quick delete technique for huge tables

Deleting from a huge table, for example purging millions of rows of old transaction records, takes quite a lot of time, and since DELETE is a fully logged operation, if something goes wrong during deletion the whole process will be rolled back, which is itself time consuming.
SQL Server 2005 enhanced the functionality of the TOP operator so that it can be used with data modification statements like INSERT, UPDATE and DELETE.

By using the TOP operator, you can divide your time-consuming delete operation into batches of small deletions, for example 5000 records per deletion. The benefits of this approach are:

- Log space will be re-used again and again, which keeps your log file size under control.
- If the operation fails at any point, only the last batch will be rolled back and you can resume purging quickly after recovering from the failure.

Following is the TSQL code to perform batch deletions:

Use AdventureWorks
Go

--Create a copy of Sales.SalesOrderHeader as a test table
Select *
into testOrderHeaderTable
from Sales.SalesOrderHeader

--Delete 5000 rows at a time
While 1=1
Begin
Delete Top(5000) from testOrderHeaderTable

--Check for exit condition
If @@rowcount < 5000 break
End

--Remove the test table
Drop table testOrderHeaderTable

June 27, 2007

Statement Level Recompilation–Excellent new feature of SQL Server 2005

Recompilation of a query plan sometimes harms your server's performance more than you expect. Prior to SQL Server 2005, SQL Server used an algorithm that could decide to recompile a full stored procedure because of just one statement. This strategy of full recompilation made the DBA's life difficult, because he or she had to investigate the full stored procedure and sometimes break it into separate small stored procedures just to reduce recompilations.

SQL Server 2005 has a great feature called statement-level recompilation. It automatically detects the statements that need to be recompiled and recompiles only those statements, resulting in fewer CPU cycles and no need to break the stored procedure into sub-stored procedures.
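
One way to observe this behaviour is to look at recompile counts per statement rather than per procedure. The query below is one possible approach, not the only one, and assumes you have VIEW SERVER STATE permission: sys.dm_exec_query_stats keeps a plan_generation_num for every cached statement, so a value greater than 1 means that individual statement has been recompiled.

--Statements (not whole procedures) that have been recompiled at least once
Select
qs.plan_generation_num,
qs.execution_count,
Substring(st.text, (qs.statement_start_offset/2) + 1,
((Case qs.statement_end_offset
When -1 Then Datalength(st.text)
Else qs.statement_end_offset End - qs.statement_start_offset)/2) + 1) as statement_text
from sys.dm_exec_query_stats qs
cross apply sys.dm_exec_sql_text(qs.sql_handle) st
where qs.plan_generation_num > 1
order by qs.plan_generation_num desc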

An excellent article about recompilations is available at Microsoft here

June 14, 2007

Dynamic Date Search without writing dynamic SQL

Filed under: Information,Performance,Query,Search columns,tips,TSQL,Tuning — namwar @ 7:29 PM

Many of us sometimes get stuck when our application requires a flexible search mechanism where one, several or all of the search parameters can be NULL. These types of queries require you to implement a mechanism that takes care of all possible combinations of input to the stored procedure.
Following is a quick, simple and efficient way of implementing a flexible date search. The technique used here is applicable to any type of parameter.

We will create a stored procedure that takes two parameters, FromDate and ToDate. Both, either or neither of these parameters can be NULL. I am using the AdventureWorks sample database, and the table we are searching is Sales.SalesOrderHeader, which contains a column called OrderDate. The objective here is to return:
1. all orders whose OrderDate falls in the given range, if both FromDate and ToDate are provided;
2. all orders whose OrderDate is greater than or equal to the given FromDate, if ToDate is NULL;
3. all orders whose OrderDate is less than or equal to the given ToDate, if FromDate is NULL;
4. and finally all orders, if both FromDate and ToDate are NULL.

Following is the stored procedure:

Use AdventureWorks
Go

Create proc usp_GetSalesOrderHeaderInfo
(
@p_FromDate datetime,
@p_ToDate datetime
)
as
Begin

--Set date format to Day/Month/Year
Set dateformat dmy

Select * from Sales.SalesOrderHeader
Where OrderDate between
--If the respective parameter is Null then OrderDate is compared to itself, which is always true
Isnull(@p_FromDate,OrderDate) and Isnull(@p_ToDate,OrderDate)

End

Following are some sample calls to the above stored procedure:

--Set date format to Day/Month/Year
Set dateformat dmy

--Get all orders of one day i.e. 17-March-2004
exec usp_GetSalesOrderHeaderInfo '17/03/2004','17/03/2004'

--Get all orders up to 17-March-2004
exec usp_GetSalesOrderHeaderInfo NULL,'17/03/2004'

--Get all orders from 17-March-2004 onwards
exec usp_GetSalesOrderHeaderInfo '17/03/2004',NULL

--Get all orders
exec usp_GetSalesOrderHeaderInfo NULL,NULL

How does it work?
The crux of the logic is this: if a parameter is NULL, the column is compared with itself, which always evaluates to TRUE, so you have effectively removed that parameter's WHERE clause condition without creating dynamic SQL for the different combinations.

June 5, 2007

Clearing Cache of SQL Server 2005

Query tuning exercises require flushing the cache again and again. Following are the quick ways to flush the SQL Server 2005 cache:

1. To clear data globally from the SQL Server cache, use
DBCC DropCleanBuffers

2. To clear execution plans globally from the SQL Server cache, use
DBCC FreeProcCache

3. To clear the execution plans of a particular database from the SQL Server cache, use
DBCC FlushProcInDB(<database id>)
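
As a small sketch of the last command, assuming you want to flush only the plans of the AdventureWorks database, you can pass its database id to DBCC FlushProcInDB:

--Flush cached execution plans for the AdventureWorks database only
Declare @dbid int
Select @dbid = db_id('AdventureWorks')
DBCC FLUSHPROCINDB(@dbid)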

June 4, 2007

Hash Join: The clear indicator of Index requirement

Many people ask me: what is a hash join? When does SQL Server use the hash join algorithm? The answer is: if you do not have an index on either column of a join, then SQL Server has to scan both tables to match the rows. Is that really feasible? Can there be anything better than this?
Yes, there is one algorithm: the hash algorithm. If you recall the data structures course from your school days, you will easily understand the importance of data hashing here. The idea is simple:
1. Take a sample set of data values from the column participating in the join (each value represents a data bucket).
2. Distribute the rows into their respective buckets according to their values (each row gets its bucket number by applying the hash function to the value).
3. Check the existence of any data value by applying the hash function to the value, getting the bucket number as a result, and searching in that bucket only.

It looks complex but it is much more efficient than a simple table scan of a huge table.

Always remember: the quickest structure for searching is the B-tree, which is the default for SQL Server indexes, but it is only applicable to indexed columns (although SQL Server sometimes builds one dynamically for very small tables).

Please note: whenever you see a Hash join operator in your query plan, most of the time it is a clear indicator that an index is required on that column. But beware: if it is an ad hoc query and you are not going to run it on production, do not create excessive indexes just because of a Hash Join operator.
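
If you want to see a hash join for yourself, the following sketch is one way to provoke it; the tables JoinDemoOrders and JoinDemoDetails are made up for this illustration, and because they are plain heaps with no index on the join column, the optimizer will normally choose a Hash Match join (turn on "Include Actual Execution Plan" in Management Studio to see it).

Use AdventureWorks
Go

--Copy two tables as heaps, so the join columns have no indexes
Select SalesOrderID, OrderDate into JoinDemoOrders from Sales.SalesOrderHeader
Select SalesOrderID, ProductID into JoinDemoDetails from Sales.SalesOrderDetail
Go

--With no index on SalesOrderID on either side, expect a Hash Match join in the plan
Select o.SalesOrderID, o.OrderDate, d.ProductID
from JoinDemoOrders o
inner join JoinDemoDetails d on o.SalesOrderID = d.SalesOrderID
Go

--Clean up
Drop table JoinDemoOrders
Drop table JoinDemoDetails
Go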

June 3, 2007

Use Try Catch for better Error Handling in TSQL

Proper error handling is the crux of good programming style, and this applies to TSQL as well. Prior to SQL Server 2005 we had to check the value of the @@Error variable after each statement to make sure no error had occurred. This approach ends up in spaghetti-style code full of GOTO statements.
Along with many other enhancements, SQL Server 2005 introduced a proper way of handling errors with TRY CATCH blocks, similar to most modern languages like C#. You can enclose your code in a TRY block and errors will be handled in the CATCH block. This gives you cleaner and more manageable code. Following is a simple example of using TRY CATCH for error handling:

--Disables the intermediate "rows affected" messages
Set NoCount On

Create table #tempTable
(
SalesOrderId int,
SalesOrderNumber varchar(50)
)

Begin Try
--This insert fails: 'A' cannot be converted to int for SalesOrderId
Insert into #tempTable values('A',3)
End Try

Begin Catch
Print 'Error Number:'+CAST(ERROR_NUMBER() as varchar)
Print 'Error Message:'+ERROR_MESSAGE()
End Catch

For further details please refer to Microsoft documentation at
How to use TRY CATCH

