Home > SSIS > Data cleansing using Fuzzy Lookup -SSIS

Data cleansing using Fuzzy Lookup -SSIS

Fuzzy lookup transformation is one of the useful transformation used in data cleansing purpose.

In previous post we seen how to remove duplicate rows in source data using fuzzy grouping transformation. This Fuzzy Lookup transformation used identify the similar data in columns. For example User may enter data as spelling mistake like user name as “Robin” or “Robins” for the same person. So using reference data we can do fuzzy lookup to get the appropriate name.

Let see how we can use Fuzzy Lookup:-

We may have data which has spell is not consistent like

Fruits Month Cost
Mango Jan 10
Mangoees March 15
Mangoes Feb 20


SELECT ‘Mango’ Fruits ,’Jan’ Month,   10 Cost


SELECT ‘Mangoes’,  ‘Feb’, 20


SELECT ‘Mangoees’, ‘March’, 15


Above data when reference with look up table in ETL process, let say we have Fruit Dimension


FruitId FruitName
101 Mangoes
102 Apples
103 Grapes


Let Say we will lookup with the data in Dimension before any cleanup

Open SSIS designer and create one sample package and copy the source query into the OLE DB Source task and then lookup with the Fruit Dimension data.



Lets drag the normal lookup with having DIMFruit Query as below:-

SELECT 101 FruitId, ‘Mangoes’FruitName


SELECT 102 FruitId, ‘Apples’FruitName


SELECT 103 FruitId, ‘Grapes’FruitName



Go to the Column section in Transformation Editor as shown below screen:-



In above screen we lookup on names and getting the value of FruitId which want to place in the Fact table.



Add one unionall component with dataviewer control to get to know how many rows are matched.

Lets run the package, it will show as in below screen:-



So we got 1 record as matched, however we want 3 records, but data is not correct due to miss spell.

Let’s use the FuzzyLookup:-

For Fuzzy Lookup we need to create DimFruit table physically as it will not take as query as we did in above lookup.

CREATE TABLE dbo.DimFruit(FruitId INT ,FruitName VARCHAR(100))

insert into dbo.DimFruit

values( 101 , ‘Mangoes’)

,( 102 , ‘Apples’)

,( 103 , ‘Grapes’)

Fuzzy lookup ask for table to reference and it will store as index for better performance during lookup.



Go to the Column tab and set which source column should map with the reference column.



Go to the advance tab and set the Similarity threshold to .50 , it depends upon how much similarity we want to check.



Lets add unionall and data viewer control to know the records.



In above screen we can see we get the Fruitid for all the source data.


Learn Different BI Tools
Subscribe my YouTube Channel : BI Tools

Categories: SSIS Tags: ,
  1. No comments yet.
  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s

%d bloggers like this: