SQL Tips by Namwar Rizvi

June 20, 2009

What is not Data Mining? – A Myth Buster

Filed under: Uncategorized — namwar @ 11:38 PM
Tags: , , ,

While watching a web cast by John Weston, I noticed a very important thing which is the clarification of what is not Data Mining? It is very common that people sometime get confused with what actually is Data Mining and start referring different terms and techniques used in normal data processing as Data Mining activities.

Following is a list which describes different data processing techniques and why they can not be reffered as Data Mining?

1. Ad Hoc Query

Ad Hoc queries just examines the current data set and gives you result based on that. This means you can check what is the maximum price of a product but you can not predict what will be the maximum price of that product in near future? A Data Mining Algorithm can do it.

2. Event Notification:

You can set different alerts based on some threshold values which will inform you as soon as that threshold will reach by actual transactional data but again you can not predict when that threshold will reach? A Data Mining Algorithm can do it.

3. Multidimensional Analysis:

You can find the value of an item based on different dimensions like Time, Area, Color but you can not predict what will be the value of the item when its color will be Blue and Area will be UK and Time will be First Quarter of the year? A Data Mining Algorithm can do it.

4. Statistics:

Item Statistics can tell you the history of price changes, moving averages, maximum values, minimum values etc. but it can not tell you how price will change if you start selling another product in the same season. A Data Mining Algorithm can do it.

So in simple words…Data Mining is not history…It is Future!

Advertisements

April 13, 2009

SQL Server Client Tools Setup Fails due to MSXML 6

If you are trying to install SQL Server 2005 client tools and continuously getting failure message because of MSXML 6 installation which already exists on your machine then it may be due to the corrupted installation of MSXML 6 which blocks the installation of client tools. You can fix it as follows:

Go to Control Panel-> Administrative Tools->Add Remove Programs and try to remove it. If it removes smoothly then just install the Client Tools as before and you are done. 

OR

If you found that you are unable to uninstall the MSXML 6 because of some wiered error then you need a tool called Windows Install Clean up from here

Donwload it and remove the corrupt installation of MSXML 6. Once removed, you can now install the Client tools as usual.

January 8, 2009

Search all columns of all tables for a particular value

Filed under: How To,Object Search,tips,TSQL — namwar @ 7:30 PM
Tags: , , , , ,

Several times I have came across with the question that how can I search all columns of all tables in a SQL Server database for a particular value. Unfortunately, you can not perform this query directly by any already provided system table or object. 

There are some fundamental points to remember:

 

  1. You have to iterate the list of tables and for each table you have to iterate each column and perform an exist query against it. 
  2. You will need three types of queries due to the data types i.e. string, number and date and time.
I have found an excellent article here which provides a good explanation along with the necessary sql script. I would recommend every reader to read this article.

 

July 17, 2008

How SQL Server Index Statistics works – Part 1

In my last article,

Selectivity of a Query and how it affects indexing

I discussed the selectivity of a query. Today I will further extend our discussion and will drill down how index Statistics works. I will continue it in my next article where we will see some real world examples but first we need to understand the concept.

Index statistics is one of the most fundamental part of query plan optimization in SQL Server. Better understanding of index statistics helps you optimize and fine tune your queries. So if you have 1000000 rows and you are searching for a range of values of an indexed column then how SQL Server forecasts how many number of rows will match your criteria before even searching for any row? Here is the actual process:

As we all know that when we index a column, SQL Server does two things:

  1. Sorts the values of the column in an internal data structure called “Index” This data structure contains the sorted value and a pointer to its respective row in the table.
  2. Generates a histogram of values.

The histogram shows the distribution of values for the sorted column. For example:

if we have following values in an integer column 1,2,2,3,4,1,4,5,4 then a typical histogram will be similar to this

Value

Rows matched

1

2

2

2

3

1

4

3

5

1

So suppose if you are searching for all rows having column values between 3 and 5 then by looking at the above histogram table you can estimate that you will get total 1+3+1=5 rows. If your table contains just 9 rows in total then 5 rows means approximately 55% of total rows will match the criteria. Based on this calculation you may elect to directly scan the table instead of first reading the index, fetching the pointers of the matching rows and then read the actual rows. Although, it is a very simple example and there are other factors involve in deciding whether to use index or not but this example demonstrates the concept in very simple terms.

In my next article, I will further explain this concept with the help of real world examples.

Create a free website or blog at WordPress.com.