SQL Tips by Namwar Rizvi

May 31, 2007

Finding all misspelled string values quickly

Filed under: Information, Query, string manipulation, tips, TSQL (namwar @ 9:10 PM)

When cleansing data imported from a legacy system, we often find that some values are spelled incorrectly and are therefore not included in the results of a query. Finding every misspelled version of a given value is quite difficult when you have thousands or millions of records.
Fortunately, there is a quick and easy solution: the SOUNDEX function in T-SQL. Please note that Soundex is not a guaranteed way of finding every incorrect version, but it is one of the quickest ways to do it and is nearly 90% accurate.
Soundex is a phonetic algorithm based on the idea that similar-sounding words should receive the same alphanumeric code. The code keeps the first letter of the word and encodes the consonants that follow as digits, padded with zeros to four characters; for example, SOUNDEX('Red') returns R300. So if you have a column called “Color” that contains different spellings of the same color name, such as Red, Redd and Redh, Soundex assigns the same code to all of them, and you can find these variations by comparing their Soundex codes. The following full working example illustrates this concept.
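To see the codes themselves, you can ask SOUNDEX for each spelling directly. This is a minimal sketch using only the sample words from the text; the codes in the comments follow the standard Soundex rules (first letter kept, vowels and H dropped, then d = 3, l = 4, n = 5, r = 6, padded with zeros). UNION ALL is used instead of a VALUES list so that it also runs on older SQL Server versions.

```sql
-- Show the Soundex code of each spelling.
-- The three spellings of Red share one code; Blue and Green get their own.
Select 'Red'   As Word, Soundex('Red')   As SoundexCode  -- R300
Union All
Select 'Redd',          Soundex('Redd')                   -- R300
Union All
Select 'Redh',          Soundex('Redh')                   -- R300
Union All
Select 'Blue',          Soundex('Blue')                   -- B400
Union All
Select 'Green',         Soundex('Green')                  -- G650
```

Because only the first letter and the next few consonant sounds are kept, a match on the code means "sounds alike", not "is the same word".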

-- Suppress the "n row(s) affected" messages
Set NoCount On

-- Create a table variable holding the test data
Declare @m_TestTable table (ItemId int, Color varchar(50))

-- Insert some sample values, including misspellings of Red
Insert into @m_TestTable values(1, 'Red')
Insert into @m_TestTable values(2, 'Reddh')
Insert into @m_TestTable values(3, 'Redd')
Insert into @m_TestTable values(4, 'Blue')
Insert into @m_TestTable values(5, 'Green')
Insert into @m_TestTable values(6, 'Dark Red')

-- Select all items whose color sounds like Red.
-- Returns ItemId 1, 2 and 3; 'Dark Red' is not returned because its Soundex code starts with D
Select * from @m_TestTable Where Soundex(Color) = Soundex('Red')
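The query above finds the variants of one value you already know. To find every group of possible misspellings without knowing the correct spelling in advance, one option is to group the rows by their Soundex code and keep only the codes shared by more than one distinct spelling. This is a sketch on hypothetical sample data: the @Colors table variable and the extra 'Blu' row are illustrative, and the GROUP BY/HAVING step is an extension of the tip rather than part of the original example.

```sql
Set NoCount On

-- Hypothetical sample data; in practice this would be the imported legacy table
Declare @Colors table (ItemId int, Color varchar(50))
Insert into @Colors values(1, 'Red')
Insert into @Colors values(2, 'Reddh')
Insert into @Colors values(3, 'Redd')
Insert into @Colors values(4, 'Blue')
Insert into @Colors values(5, 'Green')
Insert into @Colors values(6, 'Blu')

-- List every row whose Soundex code is shared by more than one distinct spelling;
-- each code is a candidate group of misspellings to review
Select Soundex(c.Color) As SoundexCode, c.ItemId, c.Color
From @Colors c
Where Soundex(c.Color) In (
    Select Soundex(Color)
    From @Colors
    Group By Soundex(Color)
    Having Count(Distinct Color) > 1
)
Order By SoundexCode, c.Color
```

With this data the result contains the B400 group (Blu, Blue) and the R300 group (Red, Redd, Reddh), while Green stands alone and is filtered out. Each group still needs a human review, because unrelated words can share a code; 'Rat', for example, is also R300.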
